Hi everyone! In our last two tutorials we studied the Euclidean Distance and the Pearson Correlation Score for finding out *similarity* among people. Now it's time to recommend some items to people which they have never tried.

Have you ever wondered how different shopping or social media websites recommend items to us which we have never tried? There are multiple approaches to solve that problem, some quite complex, but at present we will look into one of the easiest ways to get the basic idea.

**Approach for Recommending Items**

| | Mango | Banana | Strawberry | Pineapple | Orange | Apple |
|---|---|---|---|---|---|---|
| John | 4.5 | 3.5 | 4 | 4 | | |
| Martha | | 2.5 | 4.5 | | 5 | 3.5 |
| Mathew | 3.75 | | 4.25 | 3 | | 3.5 |
| Nick | 4 | 3 | | 4.5 | 4.5 | |

We will use a similarity score for finding out the similarity among people, and then check which items other people have rated but a given person hasn't. Before I move on further in depth, let's take an example for better mapping. Using the above data-set, we are required to recommend items to 'John'.

- Calculate the similarity score of everyone with respect to 'John'.
- List out the items which others have rated but 'John' hasn't.
- We will use a weighted rating to get a better result, that is, multiply each other person's similarity score with the rating they gave each item.
- In the case of 'Martha', the fruits which 'John' didn't rate are Orange and Apple.
- The similarity score between 'Martha' and 'John' is 0.4721359549995794.
- Weighted score = *(Similarity_Score * Rating)*. For Orange, the weighted score = 0.4721359549995794 * 5 = 2.360679774997897.
- Calculate the weighted score corresponding to each fruit and for every other person.
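The weighted-score step above can be sketched in a few lines of Python. The similarity scores are the ones computed earlier in the text, and the ratings come from the data-set table; the dictionary names are just for illustration:

```python
# Similarity scores of each other person with respect to John,
# as computed with the Euclidean-distance-based similarity
similarity = {'Martha': 0.4721359549995794,
              'Nick': 0.5358983848622454,
              'Mathew': 0.439607805437114}

# Ratings given by the others for fruits John has not tried
unrated_by_john = {'Martha': {'Orange': 5.0, 'Apple': 3.5},
                   'Nick': {'Orange': 4.5},
                   'Mathew': {'Apple': 3.5}}

# Weighted score = Similarity_Score * Rating
weighted = {person: {fruit: similarity[person] * rating
                     for fruit, rating in fruits.items()}
            for person, fruits in unrated_by_john.items()}

print(weighted['Martha']['Orange'])  # 2.360679774997897
```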

- Calculate the sum of all the similarity scores corresponding to each item.
- For Orange:
  *Sum of Similarity per Item (sspi)* = Sum of Similarity Scores of 'Martha' and 'Nick'
  *sspi* = 0.4721359549995794 + 0.5358983848622454
  *sspi* = 1.008034339861825
- For Apple:
  *sspi* = Sum of Similarity Scores of 'Martha' and 'Mathew'
  *sspi* = 0.4721359549995794 + 0.439607805437114
  *sspi* = 0.9117437604366934
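The same sums can be checked with a short Python sketch, using the similarity scores from the walkthrough above:

```python
# Similarity scores with respect to John, from the walkthrough
similarity = {'Martha': 0.4721359549995794,
              'Nick': 0.5358983848622454,
              'Mathew': 0.439607805437114}

# Who rated each fruit that John hasn't tried
raters = {'Orange': ['Martha', 'Nick'], 'Apple': ['Martha', 'Mathew']}

# Sum of Similarity per Item (sspi)
sspi = {fruit: sum(similarity[p] for p in people)
        for fruit, people in raters.items()}

print(sspi['Orange'])  # ~ 1.008034339861825
print(sspi['Apple'])   # ~ 0.9117437604366934
```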

- Calculate the *Sum of Weighted Score per Item (swcpi)*.
- For Orange:
  *swcpi* = (Martha_Similarity_Score * Rating) + (Nick_Similarity_Score * Rating)
  *swcpi* = (0.4721359549995794 * 5) + (0.5358983848622454 * 4.5)
  *swcpi* = 4.772222506878001
- For Apple:
  *swcpi* = (0.4721359549995794 * 3.5) + (0.439607805437114 * 3.5)
  *swcpi* = 3.191103161528427

- For a better result we will take the *average of the weighted score* with respect to the *Sum of Similarity per Item*.
- For Orange:
  *Average Weighted Score (aws)* = (*Sum of Weighted Score per Item*) / (*Sum of Similarity per Item*)
  *aws* = 4.772222506878001 / 1.008034339861825
  *aws* = 4.734186444017519
- For Apple:
  *aws* = 3.191103161528427 / 0.9117437604366934
  *aws* = 3.5


The ranking of fruits for 'John' is given by the *Average Weighted Score (aws)*: Orange first, then Apple.

Below is a Python implementation of the above algorithm (here I have used the Euclidean Distance formula for calculating similarity; you can use any other mathematical model for doing the same, such as the Pearson Correlation Score).

```python
import pandas as pd
from math import sqrt

# Dictionary of people's ratings for fruits
choices = {'John': {'Mango': 4.5, 'Banana': 3.5, 'Strawberry': 4.0, 'Pineapple': 4.0},
           'Nick': {'Mango': 4.0, 'Orange': 4.5, 'Banana': 3.0, 'Pineapple': 4.5},
           'Martha': {'Orange': 5.0, 'Banana': 2.5, 'Strawberry': 4.5, 'Apple': 3.5},
           'Mathew': {'Mango': 3.75, 'Strawberry': 4.25, 'Apple': 3.5, 'Pineapple': 3.0}}

class testClass():
    def create_csv(self):
        df = pd.DataFrame.from_dict(choices, orient='index')
        df.to_csv('fruits.csv')

    # Finding similarity among people using the Euclidean Distance formula
    def choice_distance(self, cho, per1, per2):
        # Collect the items that are common to the two persons
        sample_data = {}  # starts as an empty dictionary, that is, length 0
        for items in cho[per1]:
            if items in cho[per2]:
                sample_data[items] = 1  # value 1 marks items rated by both persons
        # If the two persons have no items in common, return 0
        if len(sample_data) == 0:
            return 0
        # Calculating the Euclidean distance
        final_sum = sum([pow(cho[per1][items] - cho[per2][items], 2)
                         for items in cho[per1] if items in cho[per2]])
        # The value returned always lies between 0 and 1.
        # 1 is added to the sqrt to prevent division by zero and to normalize the result.
        return 1 / (1 + sqrt(final_sum))

    # Calculating the similarity value of a person with respect to other people
    def scoreForAll(self, cho, similarity=choice_distance):
        for others in cho:
            if others != 'John':
                # Remember to pass self explicitly in the call below
                score = similarity(self, cho, 'John', others), others
                print(score)

    # Recommending which fruit a person should try, which he or she has never tried
    def recommendation(self, cho, per, sim_method=choice_distance):
        sumS = {}
        total = {}
        for others in cho:
            # Skip comparing the person who needs recommendations with himself/herself
            if others == per:
                continue
            similarVal = sim_method(self, cho, per, others)
            if similarVal == 0:
                continue
            # If you are using the Pearson Correlation Score, replace the
            # line above with: if similarVal <= 0: continue
            for fruits in cho[others]:
                if fruits not in cho[per] or cho[per][fruits] == 0:
                    # Multiply similarity score with rating
                    total.setdefault(fruits, 0)
                    total[fruits] += cho[others][fruits] * similarVal
                    # Calculate sum of similarities
                    sumS.setdefault(fruits, 0)
                    sumS[fruits] += similarVal
        # Generating normalized data
        result = [(totalVal / sumS[fruits], fruits) for fruits, totalVal in total.items()]
        result.sort()
        result.reverse()
        return result

def main():
    ob = testClass()
    ob.create_csv()
    ob.scoreForAll(choices)
    print(ob.recommendation(choices, 'John'))

if __name__ == "__main__":
    main()
```

#### Output

```
(0.5358983848622454, 'Nick')
(0.4721359549995794, 'Martha')
(0.439607805437114, 'Mathew')
[(4.734186444017522, 'Orange'), (3.5, 'Apple')]
```

In our next tutorial I will come up with some new interesting techniques, to show you how easy and interesting Machine Learning is!!

Stay tuned and keep learning!!

For more updates and news related to this blog, as well as to data science, machine learning and data visualization, please follow our Facebook page.