
Recommending Items


Hi everyone! In our last two tutorials we studied the Euclidean Distance and the Pearson Correlation Score for measuring similarity between people. Now it's time to recommend items to people that they have never tried.

Have you ever wondered how shopping or social media websites recommend items to us that we have never tried? There are multiple approaches to this problem, some quite complex, but for now we will look at one of the easiest, to get the basic idea.

Approach for Recommending Items

         Mango   Banana   Strawberry   Pineapple   Orange   Apple
John     4.5     3.5      4            4           -        -
Martha   -       2.5      4.5          -           5        3.5
Mathew   3.75    -        4.25         3           -        3.5
Nick     4       3        -            4.5         4.5      -

We will use a similarity score to measure how alike people are, and then check which items others have rated that a given person has not. Before going further in depth, let's take an example for better mapping. Using the data set above, we are required to recommend items to 'John'.

  1. Calculate the similarity score of everyone with respect to 'John'.
  2. List out the items which others have rated but 'John' hasn't.
  3. Use a weighted rating to get a better result, that is, multiply the similarity score of each other person by their rating for each such item.
    • In the case of 'Martha', the fruits which 'John' didn't rate are Orange and Apple
    • The similarity score between 'Martha' and 'John' is '0.4721359549995794'
    • Weighted score = (Similarity_Score * Rating)
    • For Orange, weighted score = 0.4721359549995794 * 5 = 2.360679774997897
    • Calculate the weighted score corresponding to each fruit and for every other person.
  4. Calculate the sum of the similarity scores corresponding to each such item
    • For Orange Sum of Similarity per Item (sspi) = Sum of Similarity Score of ‘Martha’ and ‘Nick’
      • sspi = 0.4721359549995794 + 0.5358983848622454
      • sspi = 1.008034339861825
    • For Apple Sum of Similarity per Item (sspi) = Sum of Similarity Score of ‘Martha’ and ‘Mathew’
      • sspi = 0.4721359549995794 + 0.439607805437114
      • sspi = 0.9117437604366934
  5. Calculate Sum of Weighted Score per Item (swcpi)
    • For Orange swcpi = (Martha_Similarity_Score * Rating) + (Nick_Similarity_Score * Rating) 
      • swcpi = (0.4721359549995794 * 5) + (0.5358983848622454*4.5)
      • swcpi = 4.772222506878001
    • For Apple swcpi = 3.191103161528427
  6. For a better result, we take the average of the weighted score with respect to the Sum of Similarity per Item.
    • For Orange Average Weighted Score (aws) = (Sum of Weighted Score per Item)/(Sum of Similarity per Item)
      • aws = (4.772222506878001) / (1.008034339861825)
      • aws = 4.734186444017519
    • For Apple Average Weighted Score (aws) = 3.5

The ranking of fruits for John is given by the Average Weighted Score (aws).
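The arithmetic in steps 3-6 can be checked with a short snippet. The similarity scores below are the ones computed in the text:

```python
# Worked check of steps 3-6: compute the Average Weighted Score (aws)
# for the fruits John hasn't rated, using the similarity scores from the text.
sim = {'Martha': 0.4721359549995794,
       'Nick': 0.5358983848622454,
       'Mathew': 0.439607805437114}

# Other people's ratings for the fruits John hasn't tried
ratings = {'Orange': {'Martha': 5.0, 'Nick': 4.5},
           'Apple': {'Martha': 3.5, 'Mathew': 3.5}}

aws = {}
for fruit, rated_by in ratings.items():
    swcpi = sum(sim[p] * r for p, r in rated_by.items())  # sum of weighted scores
    sspi = sum(sim[p] for p in rated_by)                  # sum of similarities
    aws[fruit] = swcpi / sspi

print(aws)  # Orange comes out around 4.734, Apple at 3.5
```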

Python implementation of the above algorithm

Here I have used the Euclidean Distance formula for calculating similarity; you can use any other mathematical model, such as the Pearson Correlation Score, to do the same.

[sourcecode language="python" wraplines="false" collapse="false"]
#Dictionary of people's ratings for fruits
choices={'John': {'Mango':4.5, 'Banana':3.5, 'Strawberry':4.0, 'Pineapple':4.0},
'Nick': {'Mango':4.0, 'Orange':4.5, 'Banana':3.0, 'Pineapple':4.5},
'Martha': {'Orange':5.0, 'Banana':2.5, 'Strawberry':4.5, 'Apple':3.5},
'Mathew': {'Mango':3.75, 'Strawberry':4.25, 'Apple':3.5, 'Pineapple':3.0}}

import pandas as pd

from math import sqrt

class testClass():
    def create_csv(self):
        df = pd.DataFrame.from_dict(choices, orient='index')
        df.to_csv('fruits.csv')

    #Finding similarity among people using the Euclidean Distance formula
    def choice_distance(self, cho, per1, per2):
        #Dictionary of the items both persons have rated; starts empty
        sample_data={}

        for items in cho[per1]:
            if items in cho[per2]:
                #Value is set to 1 for items rated by both persons
                sample_data[items]=1

        #If the two persons have no items in common, return 0
        if len(sample_data)==0: return 0

        #Calculating Euclidean Distance
        final_sum = sum([pow(cho[per1][items]-cho[per2][items],2) for items in cho[per1] if items in cho[per2]])
        #The value returned always lies between 0 and 1
        #1 is added to the sqrt to prevent division by zero and to normalize the result
        return(1/(1+sqrt(final_sum)))

    #Calculating the similarity value of a person with respect to all other people
    def scoreForAll(self, cho, similarity=choice_distance):
        for others in cho:
            if others!='John':
                #Remember to pass self explicitly in the call below
                score=similarity(self, cho, 'John', others),others
                print(score)

    #Recommending fruits a person has never tried
    def recommendation(self, cho, per, sim_method=choice_distance):
        sumS={}
        total={}

        for others in cho:
            #Skip comparing the person who needs recommendations to themselves
            if others==per: continue
            similarVal=sim_method(self,cho,per,others)
            if similarVal == 0: continue
            #If you are using the Pearson Correlation Score, comment out the
            #line above and uncomment the line below
            #if similarVal<=0: continue

            for fruits in cho[others]:
                if fruits not in cho[per] or cho[per][fruits]==0:
                    #Multiply the similarity score by the rating
                    total.setdefault(fruits,0)
                    total[fruits]+=cho[others][fruits]*similarVal

                    #Calculate the sum of similarities
                    sumS.setdefault(fruits,0)
                    sumS[fruits]+=similarVal

        #Generating normalized data
        result=[(totalVal/sumS[fruits],fruits) for fruits,totalVal in total.items()]
        result.sort()
        result.reverse()
        return result

def main():
    ob = testClass()
    ob.create_csv()
    ob.scoreForAll(choices)
    print(ob.recommendation(choices,'John'))

if __name__ == "__main__":
    main()
[/sourcecode]
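If you prefer the Pearson Correlation Score mentioned earlier, a drop-in replacement for `choice_distance` could look like the sketch below. The function name `pearson_score` is my own choice, not part of the code above; unlike the Euclidean version, it returns values in [-1, 1], which is why the recommendation method would then need to skip scores <= 0.

```python
from math import sqrt

# Hypothetical drop-in similarity function: Pearson Correlation Score
# over the items two people have both rated. Returns a value in [-1, 1].
def pearson_score(cho, per1, per2):
    common = [item for item in cho[per1] if item in cho[per2]]
    n = len(common)
    if n == 0:
        return 0

    # Sums, sums of squares, and sum of products over the common items
    sum1 = sum(cho[per1][it] for it in common)
    sum2 = sum(cho[per2][it] for it in common)
    sum1sq = sum(cho[per1][it] ** 2 for it in common)
    sum2sq = sum(cho[per2][it] ** 2 for it in common)
    psum = sum(cho[per1][it] * cho[per2][it] for it in common)

    # Pearson correlation coefficient
    num = psum - (sum1 * sum2 / n)
    den = sqrt((sum1sq - sum1 ** 2 / n) * (sum2sq - sum2 ** 2 / n))
    if den == 0:
        return 0
    return num / den
```

Note that it is a plain function rather than a method, so inside `testClass` you would call it without passing `self`.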

Output

[sourcecode language="python" wraplines="false" collapse="false"]
(0.5358983848622454, 'Nick')
(0.4721359549995794, 'Martha')
(0.439607805437114, 'Mathew')
[(4.734186444017522, 'Orange'), (3.5, 'Apple')]
[/sourcecode]

In our next tutorial I will come up with some new and interesting techniques, to show you how easy and interesting Machine Learning is!

Stay tuned and keep learning!!

For more updates and news related to this blog as well as to data science, machine learning and data visualization, please follow our facebook page by clicking this link.
