
Nowadays, modern technology affects people’s daily lives not only by providing information efficiently, but also by influencing decision making. Yelp is an application people refer to when selecting businesses and restaurants to patronize. It serves as a platform for viewing restaurant reviews, ratings out of five stars from other users, and photos. Studies have shown that an extra half-star on a review nets restaurants 19% more reservations (Tepper). Yelp has not only changed how people behave and interact with each other, but has also had a major impact on restaurants and their business.

Thus, I wanted to investigate the reviews and ratings each restaurant received and how they relate to each other in three major areas: polarities, bag-of-words/bigrams, and the length of the review texts.

Polarity classifies whether the opinion expressed in a sentence is positive, negative, or neutral by assigning a number ranging from -1 to 1 that takes the strength of the opinion into account. I wanted to see whether the relationship between polarity and rating was consistent, and if not, what the trends actually were. Diving into the sentiments of the review texts allows us to see actual users’ thoughts.

Conducting sentiment analysis involves bag-of-words and bigram techniques, which collect all the unique words or bigrams (pairs of consecutive words) and represent each text as a sequence of 0s and 1s, where 1 indicates that the feature is present. This allows us to extract the key factors that matter most to the customer experience.

I scraped Yelp review data using the Yelp API. The files I used are “Business.json”, which includes a “Business ID” and the average “Stars” for each ID, and “Review.json”, which includes the “Business ID”, “Stars”, and “Text” from each user.

For the first analysis, I translated each review text into English and then calculated the polarity of each user review with the TextBlob Python package. Then, I ran a linear regression model from the sklearn package on the polarity and the number of stars to calculate the correlation between the two.

For the second and third analyses, I wanted to find out which individual words and bigrams have more impact on the rating than others. To find the importance of individual words, I first used the bag-of-words technique: with the textmining package, I collected the unique words in the universe of reviews and represented each review as a vector after filtering out stop words (such as “and”, “it”, etc.). To find the importance of bigrams, I used BigramCollocationFinder from the NLTK package to first filter out stop words and then identify the 50 bigrams with the highest PMI (Pointwise Mutual Information), a measure of how strongly the two words of a bigram are associated. Lastly, I used ExtraTreesClassifier from the sklearn package to find the importance of each feature (a unique word or a bigram, depending on the case) in terms of its impact on the rating.

For the last analysis, I calculated the length of each review text, paired it with the rating from the same user review, and ran a linear regression model on length and stars with the sklearn package.

In the first analysis of the relationship between stars and polarities, I noticed that although reviews of 4 stars and above have mostly positive polarities, reviews with negative polarities do not necessarily fall below 2 stars. Rather, they are spread out between 2 and 3.5 stars. This shows that people tend to give higher ratings even when they write negative comments. Typically, they rate in the range of 2 to 3.5 stars, and rarely 1 or 1.5. In the second analysis, words such as ‘food’, ‘service’ and ‘staff’ appear on the list, which indicates that they are the key elements for a business in the service industry to succeed. Indeed, service-related words like ‘friendly’ and ‘professional’ are also key words that led to higher ratings.
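To make the 0/1 representation concrete, here is a minimal sketch of a binary bag-of-words and bigram encoding in plain Python. It is not the textmining package used in the analysis; the stop-word list, the helper names (`tokenize`, `bigrams`, `binary_vectors`) and the two sample reviews are all invented for illustration.

```python
import re

# Illustrative stop-word list (an assumption; the analysis does not list its own).
STOP_WORDS = {"a", "an", "and", "the", "it", "is", "was", "were", "to", "of"}

def tokenize(text):
    """Lowercase the text, split it into word tokens, and drop stop words."""
    return [w for w in re.findall(r"[a-z']+", text.lower()) if w not in STOP_WORDS]

def bigrams(tokens):
    """Return pairs of consecutive tokens, each joined with a space."""
    return [f"{a} {b}" for a, b in zip(tokens, tokens[1:])]

def binary_vectors(reviews, use_bigrams=False):
    """Return (vocabulary, vectors); vectors[i][j] is 1 if feature j occurs in review i."""
    feature_sets = []
    for review in reviews:
        tokens = tokenize(review)
        feature_sets.append(set(bigrams(tokens) if use_bigrams else tokens))
    vocabulary = sorted(set().union(*feature_sets))
    vectors = [[1 if f in fs else 0 for f in vocabulary] for fs in feature_sets]
    return vocabulary, vectors

# Hypothetical reviews, not taken from the Yelp data.
reviews = ["The staff was friendly", "Friendly service and great food"]
vocabulary, vectors = binary_vectors(reviews)
bigram_vocabulary, bigram_vectors = binary_vectors(reviews, use_bigrams=True)
```

Each row of `vectors` corresponds to one review and each column to one vocabulary entry, so the same matrix can be passed to a classifier later.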
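The PMI ranking of bigrams can also be illustrated by hand. The sketch below is a simplified stand-in for NLTK's BigramCollocationFinder, not its implementation: it assumes the reviews are already tokenized with stop words removed, estimates probabilities from raw counts, and uses hypothetical function names and toy tokens.

```python
import math
from collections import Counter

def bigram_pmi(tokenized_reviews):
    """Score each bigram (x, y) with PMI = log2(P(x, y) / (P(x) * P(y))).

    Probabilities are estimated as counts divided by the total number of
    tokens, which is a simplifying assumption of this sketch.
    """
    word_counts = Counter()
    bigram_counts = Counter()
    for tokens in tokenized_reviews:
        word_counts.update(tokens)
        # Pair each token with its successor inside the same review only.
        bigram_counts.update(zip(tokens, tokens[1:]))
    n = sum(word_counts.values())
    return {
        (x, y): math.log2(c_xy * n / (word_counts[x] * word_counts[y]))
        for (x, y), c_xy in bigram_counts.items()
    }

def top_bigrams(tokenized_reviews, k):
    """Return the k bigrams with the highest PMI."""
    scores = bigram_pmi(tokenized_reviews)
    return sorted(scores, key=scores.get, reverse=True)[:k]

# Toy tokens, invented for illustration.
toy_reviews = [
    ["great", "food"],
    ["great", "service"],
    ["friendly", "staff"],
    ["great", "food"],
]
pmi_scores = bigram_pmi(toy_reviews)
best = top_bigrams(toy_reviews, 1)
```

In this toy data the pair that occurs only once scores highest, because both of its words appear nowhere else. PMI tends to favor rare pairs in this way, which is why a minimum-frequency filter is often applied before ranking.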
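For the feature-importance step, a minimal sketch with sklearn's ExtraTreesClassifier might look like the following. The feature names, the tiny 0/1 matrix, the star labels, and the `n_estimators` and `random_state` settings are all assumptions made for illustration; they are not the data or settings of the original analysis.

```python
from sklearn.ensemble import ExtraTreesClassifier

# Hypothetical features and binary review vectors (1 = feature present).
feature_names = ["friendly", "rude", "food"]
X = [
    [1, 0, 1],
    [1, 0, 1],
    [1, 0, 1],
    [0, 1, 1],
    [0, 1, 1],
    [0, 1, 1],
]
# Hypothetical star ratings, used here as class labels.
y = [5, 5, 4, 1, 2, 1]

# Fit an ensemble of randomized trees; random_state makes the run repeatable.
model = ExtraTreesClassifier(n_estimators=50, random_state=0)
model.fit(X, y)

# feature_importances_ holds one score per column of X.
importances = model.feature_importances_
ranked = sorted(zip(feature_names, importances), key=lambda p: p[1], reverse=True)
for name, score in ranked:
    print(f"{name}: {score:.3f}")
```

A feature that takes the same value in every review, such as "food" in this toy matrix, can never be used for a split and therefore receives zero importance.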
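The regressions of stars on polarity and of stars on review length both fit a straight line and report a correlation. The analysis used sklearn for this; the sketch below is a hand-rolled ordinary least squares fit in plain Python, with a hypothetical `fit_line` helper and invented, deliberately perfect toy values that are not results from the Yelp data.

```python
import math

def fit_line(xs, ys):
    """Ordinary least squares fit of y = slope * x + intercept.

    Returns (slope, intercept, r), where r is the Pearson correlation.
    """
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    r = sxy / math.sqrt(sxx * syy)
    return slope, intercept, r

# Invented toy values: one polarity score and one star rating per review.
polarity = [-0.5, 0.0, 0.5, 1.0]
stars = [2.0, 3.0, 4.0, 5.0]

slope, intercept, r = fit_line(polarity, stars)
print(f"stars = {slope:.2f} * polarity + {intercept:.2f}, r = {r:.2f}")
```

Replacing `polarity` with a list of review lengths gives the corresponding fit for the length analysis.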
