Recommender System for Restaurants Based on Comments

Yuemamsi
9 min read · Dec 17, 2020


Don’t know how to choose restaurants? Don’t worry, let a recommender “robot” help you!

Introduction

Restaurant recommendation systems are very popular nowadays, but most of them are based on restaurant categories and customer ratings. The novelty of our system is that it is based on users’ Yelp comments about each restaurant: you can freely enter keywords that have nothing to do with restaurant attributes, such as mood, weather or feelings, and the system will recommend well-matched restaurants by using these keywords to search the Yelp comments. Simply put, the recommendation logic treats each restaurant as a document: the restaurant’s name is the document’s name, the comments the restaurant has received are the document’s paragraphs, and the system recommends the best-matching “documents” for the given keywords. Our system largely removes the difficulty of choosing a restaurant: whenever you cannot decide where to go, it can always recommend a restaurant based on whatever keywords come to mind at that moment.

Data

The whole dataset comes from Yelp’s official website (https://www.yelp.com/dataset), and we download the corresponding files directly. For this project, we focus on only two of the six files: Business.json and Reviews.json. Business.json contains the basic information of each business, and the features we need are business_id, categories, city, name, review_count, stars and state. Reviews.json contains the customer comments for each business in Business.json, and the features we need are business_id, cool, funny, stars, useful and text. The two files can be merged on the business_id feature.

2.1 Data Preprocessing

As these two files are quite large, we upload them to the Hadoop File System and process them with PySpark, sorting out the restaurants’ information and their corresponding reviews. We first use SQL with a GROUP BY statement to count the businesses in each state. The data turns out to be highly unbalanced: the top 10 states account for 99% of the records. In our proposal we had planned to build the recommender system for Michigan, but unfortunately there are not enough records to support it. We therefore decide to use the data from Arizona, which has the most data with 60,603 businesses. Since we are only interested in restaurants, we then filter the categories by the keyword ‘restaurant’ and require review_count to be at least 200, narrowing the data down to 1,983 businesses. This gives us a sample business dataset of 1,983 restaurants that each have at least 200 customer reviews, exported as business_az.csv. Finally, we use the business_id values from the filtered business dataset to find the corresponding review records, which yields a sample reviews dataset of 1,215,290 records, about 61% of the entire reviews file. The sample reviews dataset is exported as reviews_az.csv.
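As a rough illustration of this preprocessing step, a minimal PySpark sketch could look like the following. The HDFS paths are placeholders and the exact script we ran is not reproduced here, but the state, category and review_count filters mirror the description above.

from pyspark.sql import SparkSession
from pyspark.sql import functions as F

spark = SparkSession.builder.appName("yelp-preprocessing").getOrCreate()

# Load the raw Yelp files from HDFS (paths are placeholders)
business = spark.read.json("hdfs:///yelp/business.json")
reviews = spark.read.json("hdfs:///yelp/review.json")

# Count the businesses in each state to inspect the geographic imbalance
business.groupBy("state").count().orderBy(F.desc("count")).show(10)

# Keep Arizona restaurants with at least 200 reviews
business_az = business.filter(
    (F.col("state") == "AZ")
    & F.lower(F.col("categories")).contains("restaurant")
    & (F.col("review_count") >= 200)
).select("business_id", "categories", "city", "name", "review_count", "stars", "state")

# Keep only the reviews belonging to the filtered restaurants
reviews_az = reviews.join(business_az.select("business_id"), on="business_id")

business_az.write.option("sep", "\t").csv("business_az.csv")
reviews_az.write.option("sep", "\t").csv("reviews_az.csv")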

2.2 Data Cleaning

import string
import numpy as np
import pandas as pd

# Read the sample business and reviews datasets produced by the PySpark step
business = pd.read_csv("business_az.csv", delimiter='\t', names=["business_id","categories","city","name","review_count","average_stars","state"])
review = pd.read_csv("reviews_az.csv", delimiter='\t', names=["business_id","cool","funny","personal_stars","useful","text"])

# Load the stop-word list and clean each comment: lowercase, drop stop words, drop punctuation
f = open('lemur-stopwords.txt','r')
stoplist = [i.strip() for i in f.readlines()]
review['clean'] = review['text'].apply(lambda x: ' '.join([i for i in x.lower().split() if i not in stoplist]))\
    .apply(lambda x: ''.join([i for i in x if i not in string.punctuation]))

# Merge the two dataframes on business_id and build the "agree" features
data = pd.merge(business, review.drop(columns=['text']), on='business_id')
data['agree'] = data['cool'] + data['funny'] + data['useful']
data['agree_weighted'] = np.log(data['agree'] + 1)

# Drop rows with no cleaned text, fill the remaining nulls with 0, and save
data = data.dropna(subset=['clean']).fillna(0)
data.to_pickle("sample.pkl")

We use the Pandas library to read the sample business dataset and the sample reviews dataset as dataframes. We then import a stop-word list called lemur-stopwords.txt (https://raw.githubusercontent.com/meta-toolkit/meta/master/data/lemur-stopwords.txt) and use it to remove stop words and punctuation from each comment in the sample reviews dataset. Then we merge the sample business dataframe and the sample reviews dataframe on the business_id column. Next, we check the merged dataset for null values; only the cool, funny, useful and text columns contain nulls. We drop the rows with no text and fill the null values in the cool, funny and useful columns with 0. Since cool, funny and useful measure how much other customers agree with a review, we insert a new column called “agree” that sums the three of them.

From the above figure, we can see that the distribution of agree resembles a power-law distribution: a large portion of the agree values is clustered at small values. To reduce the impact of these large value differences, we insert a new column called “agree_weighted”, containing the log-transformed value of agree, log(agree + 1). The agree_weighted feature is an important basis for setting our baseline. The cleaned, merged data is finally exported as sample.pkl.

2.3 Ground Truth Relevance

To facilitate the evaluation of our recommender system, we create a training dataset by assuming 3 queries, namely “happy tonight”, “pop music” and “Chinese restaurant”. For the 3 queries respectively, we find 1,000 comments that contain the query words. After reading each comment ourselves, we assign each of the 1,000 comments a relevance score according to the following scale (a small sketch of how the candidate comments can be pulled out for labeling follows the scale). This training dataset is finally exported as groundtruth.csv.

- 2: It’s a very meaningful comment, and strongly related to the query

- 1: It’s a pertinent comment, and is weakly related to the query

- 0: It’s unrelated to the query, or it’s an abusive or vexatious comment
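The labeling itself is done by hand, but a minimal sketch of how the candidate comments for one query could be selected from the cleaned data is shown below; the helper name and the sample size are purely illustrative.

import pandas as pd

data = pd.read_pickle("sample.pkl")

def candidate_comments(query, n=1000):
    # Keep comments that contain at least one query word, then take n of them for manual labeling
    words = query.lower().split()
    mask = data['clean'].apply(lambda text: any(w in text.split() for w in words))
    return data.loc[mask, ['business_id', 'name', 'clean']].head(n)

candidates = candidate_comments("happy tonight")
candidates.to_csv("groundtruth_candidates.csv", index=False)  # labeled by hand afterwards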

Methods

The document ranking algorithm is the core of our recommender system. The logic is to find the 10 comments most relevant to the query entered by the user and recommend the restaurants those 10 comments belong to. To this end, we explore 5 different document ranking algorithms.
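As a rough sketch of that logic, the snippet below scores every cleaned comment against a query with the BM25Okapi class from the rank_bm25 library (introduced in the next paragraph) and returns the restaurants behind the 10 highest-scoring comments. It is an illustration built on the columns described earlier, not the exact code of our system.

import pandas as pd
from rank_bm25 import BM25Okapi

data = pd.read_pickle("sample.pkl")

# Treat each cleaned comment as a document in the corpus
tokenized_corpus = [comment.split() for comment in data['clean']]
bm25 = BM25Okapi(tokenized_corpus)

def recommend(query, k=10):
    # Score every comment against the query and take the top-k comments
    scores = bm25.get_scores(query.lower().split())
    top_idx = scores.argsort()[::-1][:k]
    # Recommend the restaurants those comments belong to
    return data.iloc[top_idx][['name', 'city', 'average_stars']]

print(recommend("happy tonight"))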

We import the rank_bm25 library to help us set up the models. The rank_bm25 library contains a revised BM25 algorithm that differs from the standard BM25 algorithm in that it ignores the parameter k3 and the query term frequency (QTF). In addition to this revised BM25 model, we build 4 more models on top of the rank_bm25 library by modifying the score calculation formula, corresponding to the normal BM25 algorithm, the TF-IDF algorithm, the Pivoted Normalization algorithm and the InL2 algorithm. The first three of these algorithms are common and widely used, while the InL2 algorithm is a Divergence From Randomness model, which measures the global informativeness of a term in the document collection: the more the term’s occurrences diverge from randomness throughout the collection, the more informative the term is (Benkoussas & Bellot, 2015). The formulas of the 5 algorithms are as follows.
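Written out in standard notation, the five scoring functions are given below; here N is the number of documents (comments) in the collection, df(q) is the number of documents containing term q, f(q, D) is the frequency of q in document D, qtf(q) is its frequency in the query, |D| is the length of D and avgdl is the average document length. Parameter names follow common usage and may differ slightly from the original figures; the InL2 form matches the implementation shown afterwards.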

Revised BM25 (from the rank_bm25 library):
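score(D, Q) = \sum_{q \in Q} \mathrm{IDF}(q) \cdot \frac{f(q, D)\,(k_1 + 1)}{f(q, D) + k_1 \left(1 - b + b \frac{|D|}{avgdl}\right)}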

Normal BM25:
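score(D, Q) = \sum_{q \in Q} \mathrm{IDF}(q) \cdot \frac{f(q, D)\,(k_1 + 1)}{f(q, D) + k_1 \left(1 - b + b \frac{|D|}{avgdl}\right)} \cdot \frac{(k_3 + 1)\, qtf(q)}{k_3 + qtf(q)}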

TF-IDF:
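score(D, Q) = \sum_{q \in Q} f(q, D) \cdot \log \frac{N}{df(q)}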

Pivoted Normalization:
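score(D, Q) = \sum_{q \in Q} qtf(q) \cdot \frac{1 + \ln\!\left(1 + \ln f(q, D)\right)}{1 - s + s \frac{|D|}{avgdl}} \cdot \ln \frac{N + 1}{df(q)}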

InL2:
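score(D, Q) = \sum_{q \in Q} qtf(q) \cdot \frac{tfn}{tfn + 1} \cdot \log \frac{N + 1}{df(q) + 0.5}, \qquad tfn = f(q, D) \cdot \log\!\left(1 + \frac{b \cdot avgdl}{|D|}\right)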

import numpy as np
from rank_bm25 import BM25

class InL2(BM25):
    def __init__(self, corpus, tokenizer=None, b=0.1):
        # b controls the document-length normalization of the term frequency
        self.b = b
        super().__init__(corpus, tokenizer)

    def _calc_idf(self, nd):
        # Informativeness of a term: log((N + 1) / (n_t + 0.5)),
        # where n_t is the number of documents containing the term
        for word, freq in nd.items():
            idf = np.log((self.corpus_size + 1) / (freq + 0.5))
            self.idf[word] = idf

    def get_scores(self, query):
        score = np.zeros(self.corpus_size)
        doc_len = np.array(self.doc_len)
        for q in query:
            q_freq = np.array([(doc.get(q) or 0) for doc in self.doc_freqs])
            # Normalized term frequency: tf * log(1 + b * avgdl / dl)
            tfn = q_freq * np.log(1 + self.b * self.avgdl / doc_len)
            score += query.count(q) * \
                     (1 / (tfn + 1)) * \
                     (tfn * (self.idf.get(q) or 0))
        return score
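Assuming the tokenized corpus from the earlier sketch, the custom class can be used just like the library’s built-in models:

inl2 = InL2(tokenized_corpus, b=0.1)
scores = inl2.get_scores("happy tonight".split())
top10 = scores.argsort()[::-1][:10]  # indices of the 10 highest-scoring comments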

We test each of these 5 models on the created training dataset. By adjusting the parameters of each model (except the TF-IDF model), we find the most suitable parameters for each model according to our evaluation metrics. Finally, to keep the experience fresh and engaging and to attract more young people, we present the recommender system, built on the model with the best evaluation results, in the form of a robot dialogue. Here, we use the Flask library and adapt a chat-robot program from MIT to implement the visualization.
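The chat interface itself is not reproduced here, but a minimal Flask endpoint wrapping the recommender could look like the sketch below; the route name and the recommend() helper (from the earlier sketch) are assumptions rather than our actual implementation.

from flask import Flask, request, jsonify

app = Flask(__name__)

@app.route("/recommend")
def recommend_route():
    # The user's free-form keywords arrive as a query parameter,
    # e.g. /recommend?q=happy+tonight
    query = request.args.get("q", "")
    results = recommend(query)  # top-10 restaurants from the ranking model
    return jsonify(results.to_dict(orient="records"))

if __name__ == "__main__":
    app.run(debug=True)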

Results and Discussions

We use two metrics to evaluate the performance of algorithms in a binary way and a numerical way respectively:

Precision@10 (P@10). The P@10 score measures the proportion of the first 10 returned comments that have a relevance score of 1 or 2. A higher P@10 score means higher accuracy.

Mean Absolute Error (MAE). The MAE score reflects the error between each returned comment’s relevance score and the ideal relevance score of 2. A lower MAE score means smaller errors.
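A minimal sketch of how these two metrics can be computed for one query, assuming the labeled relevance scores from groundtruth.csv are aligned with the 10 returned comments:

import numpy as np

def precision_at_10(relevance_scores):
    # Proportion of the top 10 comments judged relevant (score 1 or 2)
    top10 = np.asarray(relevance_scores[:10])
    return np.mean(top10 >= 1)

def mae_at_10(relevance_scores):
    # Mean absolute error against the ideal relevance score of 2
    top10 = np.asarray(relevance_scores[:10])
    return np.mean(np.abs(top10 - 2))

# Example: labeled scores of the 10 comments returned for one query
scores = [2, 2, 1, 2, 0, 2, 1, 2, 2, 1]
print(precision_at_10(scores), mae_at_10(scores))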

To evaluate our recommender system more intuitively, we set a baseline. We find the comments containing the query words and return the first 10 comments ranked from high to low by agree_weighted. Based on the results obtained by this method, we calculate the P@10 score and the MAE score following our evaluation metrics, and use these two scores as the baseline. The baseline P@10 score and MAE score are 0.7667 and 0.8, respectively.
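A rough sketch of this baseline, again built on the cleaned dataframe described earlier (column names assumed from the cleaning step):

def baseline_top10(query, data):
    # Keep comments that contain at least one of the query words
    words = query.lower().split()
    mask = data['clean'].apply(lambda text: any(w in text.split() for w in words))
    # Return the 10 comments that other customers agreed with the most
    return data[mask].sort_values('agree_weighted', ascending=False).head(10)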

According to our evaluation metrics, the performances of the five models with their best parameters are shown in the two figures above. In the end, we choose the normal BM25 model as the core algorithm of our recommender system.

Based on the ground-truth data we generated, our final model beats the baseline and achieves a very high P@10 score of 0.9, which means that 9 of the top 10 restaurants recommended by our system have comments related to the user’s query. A lower MAE score means that our system returns more comments with the highest relevance score of 2 than the baseline does. Such success is beyond our expectations. We believe it comes from having a sufficient number of comments, and from the fact that the BM25 algorithm performs better when documents contain the query terms.

On the other hand, the success of our model rests on many preconditions. For example, we filter out restaurants with too few comments, and we assume that the words users enter are common everyday words, most of which appear in the 889,363 comments. Therefore, we believe this recommendation system needs more research before it is actually put into use.

What’s Next

After comparing 5 different document ranking algorithms, we successfully use the BM25 algorithm to build a restaurant recommender system based on Yelp comments and beat the baseline with good P@10 and MAE scores. Our system treats the reviews of each restaurant as a document and, given the user’s query, recommends restaurants whose comments match it most closely. Here, we assume that users will like restaurants whose comments express the same feelings they have. However, our system still has limitations, so adding latent semantic analysis would be a good next step. Besides, in practice we find that without a large number of comments as support, our recommendation system loses much of its practicality. At the same time, for new restaurants without comments, we need other methods to make recommendations. Therefore, in addition to comments, we consider adding the “stars” given by customers as a basis in the future, and recommending restaurants from multiple feature dimensions, such as location and opening year, to make up for the limitations of our recommender system.

Reference

Benkoussas, C., & Bellot, P. (2015). Information Retrieval and Graph Analysis Approaches for Book Recommendation. The Scientific World Journal, 2015, 926418. https://doi.org/10.1155/2015/926418
