RECOMMENDING ITEMS

Hi everyone! In our last two tutorials we studied Euclidean Distance and Pearson Correlation Score for measuring similarity among people. Now it's time to recommend items to people that they have never tried.

Have you ever wondered how shopping websites or social media websites recommend items to us that we have never tried? There are multiple approaches to this problem, some of them complex, but for now we will look at one of the easiest ways to get the basic idea.

Approach for Recommending Items

         Mango   Banana   Strawberry   Pineapple   Orange   Apple
John     4.5     3.5      4            4           -        -
Martha   -       2.5      4.5          -           5        3.5
Mathew   3.75    -        4.25         3           -        3.5
Nick     4       3        -            4.5         4.5      -

A dash (-) means the person has not rated that fruit.

We will use a similarity score to measure how similar people are, and then check which items a person is missing in comparison to others. Before going into more depth, let's take an example. Using the data set above, we need to recommend items to ‘John’.

  1. Calculate the similarity score of everyone with respect to ‘John’.
  2. List the items that others have rated but ‘John’ hasn’t.
  3. Use a weighted rating for a better result: for each other person, multiply their similarity score by their rating for each such item.
    • In case of ‘Martha’, fruits which ‘John’ didn’t rate are Orange and Apple
    • Similarity score between ‘Martha’ and ‘John’ is ‘0.4721359549995794’
    • Weighted score = (Similarity_Score * Rating)
    • For Orange weighted score = 0.4721359549995794 * 5 = 2.360679774997897
    • Calculate weighted score corresponding to each fruit and for every other person.
  4. Calculate the sum of the similarity scores for each item, counting only the people who rated that item
    • For Orange Sum of Similarity per Item (sspi) = Sum of Similarity Score of ‘Martha’ and ‘Nick’
      • sspi = 0.4721359549995794 + 0.5358983848622454
      • sspi = 1.008034339861825
    • For Apple Sum of Similarity per Item (sspi) = Sum of Similarity Score of ‘Martha’ and ‘Mathew’
      • sspi = 0.4721359549995794 + 0.439607805437114
      • sspi = 0.9117437604366934
  5. Calculate Sum of Weighted Score per Item (swcpi)
    • For Orange swcpi = (Martha_Similarity_Score * Rating) + (Nick_Similarity_Score * Rating) 
      • swcpi = (0.4721359549995794 * 5) + (0.5358983848622454*4.5)
      • swcpi = 4.772222506878001
    • For Apple swcpi = 3.191103161528427
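  6. For a better result, divide the Sum of Weighted Score per Item by the Sum of Similarity per Item to get an average weighted score. This keeps an item from ranking higher just because more people rated it.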
    • For Orange Average Weighted Score (aws) = (Sum of Weighted Score per Item)/(Sum of Similarity per Item)
      • aws = (4.772222506878001) / (1.008034339861825)
      • aws = 4.734186444017519
    • For Apple Average Weighted Score (aws) = 3.5

The fruits are ranked for John by their Average Weighted Score (aws), highest first: Orange (about 4.73), then Apple (3.5).
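
As a quick check of the arithmetic in steps 3 to 6, here is a minimal sketch that hard-codes the similarity scores and ratings quoted in the walkthrough. The names `similarity`, `ratings` and `average_weighted_score` are made up for this illustration and are not part of the implementation that follows.

```python
# Similarity of each person to John, copied from the walkthrough.
similarity = {"Martha": 0.4721359549995794,
              "Nick": 0.5358983848622454,
              "Mathew": 0.439607805437114}

# Ratings given by the others for the fruits John has not rated.
ratings = {"Orange": {"Martha": 5.0, "Nick": 4.5},
           "Apple": {"Martha": 3.5, "Mathew": 3.5}}

def average_weighted_score(item_ratings, similarity):
    """Return sum(similarity * rating) / sum(similarity) for one item."""
    weighted_sum = sum(similarity[p] * r for p, r in item_ratings.items())
    similarity_sum = sum(similarity[p] for p in item_ratings)
    return weighted_sum / similarity_sum

# Average weighted score per fruit, then the fruits sorted best first.
aws = {fruit: average_weighted_score(r, similarity)
       for fruit, r in ratings.items()}
ranking = sorted(aws, key=aws.get, reverse=True)
print(ranking)
```

Only the people who actually rated a fruit contribute to its two sums, which is why Orange uses Martha and Nick while Apple uses Martha and Mathew.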

Python implementation of the above algorithm (here I have used the Euclidean Distance formula for calculating similarity; you can use any other mathematical model as well, such as Pearson Correlation Score)

#Dictionary of People rating for fruits
choices={'John': {'Mango':4.5, 'Banana':3.5, 'Strawberry':4.0, 'Pineapple':4.0},
'Nick': {'Mango':4.0, 'Orange':4.5, 'Banana':3.0, 'Pineapple':4.5},
'Martha': {'Orange':5.0, 'Banana':2.5, 'Strawberry':4.5, 'Apple':3.5},
'Mathew': {'Mango':3.75, 'Strawberry':4.25, 'Apple':3.5, 'Pineapple':3.0}}

import pandas as pd

from math import sqrt

class testClass():
    def create_csv(self):
        df = pd.DataFrame.from_dict(choices, orient='index')
        df.to_csv('fruits.csv')

    #Finding similarity among people using the Euclidean Distance formula

    def choice_distance(self, cho, per1, per2):
        #Will hold the items that both persons have rated
        sample_data={}
        #The dictionary above starts empty, that is, length = 0

        for items in cho[per1]:
            if items in cho[per2]:
                sample_data[items]=1
                #Value is set to 1 for items rated by both persons

        #If the two persons have no items in common, return 0
        if len(sample_data)==0: return 0

        #Calculating Euclidean Distance
        final_sum = sum([pow(cho[per1][items]-cho[per2][items],2) for items in cho[per1] if items in cho[per2]])
        return(1/(1+sqrt(final_sum)))
        #The value returned above is always greater than 0 and at most 1
        #1 is added to the sqrt to prevent division by zero and to normalize the result.

    #Calculating similarity value for a person with respect to other people
    def scoreForAll(self,cho,similarity=choice_distance):
        for others in cho:
            if others!='John':
                score=similarity(self, cho, 'John', others),others
                #Remember to add self keyword in above call
                print(score)

    #Recommending which fruit should a person try, which he or she has never tried
    def recommendation(self, cho, per, sim_method=choice_distance):
        sumS={}
        total={}

        for others in cho:
            #Skip comparing the person who needs recommendations with themselves
            if others==per: continue
            similarVal=sim_method(self,cho,per,others)
            if similarVal == 0: continue
            #If you are using Pearson Correlation Score, uncomment the line below
            #and comment out the line above, so that negative scores are skipped too
            #if similarVal<=0: continue

            for fruits in cho[others]:
                if fruits not in cho[per] or cho[per][fruits]==0:
                    #multiply similarity score with rating
                    total.setdefault(fruits,0)
                    total[fruits]+=cho[others][fruits]*similarVal

                    #calculate sum of similarities
                    sumS.setdefault(fruits,0)
                    sumS[fruits]+=similarVal

        #Generating normalized data
        result=[(totalVal/sumS[fruits],fruits) for fruits,totalVal in total.items()]
        result.sort()
        result.reverse()
        return result

def main():

    ob = testClass()
    ob.create_csv()
    ob.scoreForAll(choices)
    print(ob.recommendation(choices,'John'))

if __name__ == "__main__":
    main()

Output

(0.5358983848622454, 'Nick')
(0.4721359549995794, 'Martha')
(0.439607805437114, 'Mathew')
[(4.734186444017522, 'Orange'), (3.5, 'Apple')]
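
If you want to try Pearson Correlation Score instead, the sketch below shows one possible way to write it for the same dictionary layout. It is only an illustration: `pearson_similarity` is a name I made up, it is written as a plain function rather than a method, and returning 0 when there are no common items or no variation in the ratings is an assumption of this sketch.

```python
from math import sqrt

def pearson_similarity(cho, per1, per2):
    """Pearson correlation over the items both persons have rated."""
    common = [item for item in cho[per1] if item in cho[per2]]
    n = len(common)
    # Assumption: no common items means no similarity.
    if n == 0:
        return 0

    x = [cho[per1][item] for item in common]
    y = [cho[per2][item] for item in common]
    mean_x = sum(x) / n
    mean_y = sum(y) / n

    # Numerator: how the two sets of ratings vary together.
    covariance = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    # Denominator: product of the two spreads.
    spread = sqrt(sum((a - mean_x) ** 2 for a in x) *
                  sum((b - mean_y) ** 2 for b in y))
    # Assumption: if one person rated every common item the same,
    # the correlation is undefined, so return 0.
    if spread == 0:
        return 0
    return covariance / spread

ratings = {'John': {'Mango': 4.5, 'Banana': 3.5, 'Strawberry': 4.0, 'Pineapple': 4.0},
           'Martha': {'Orange': 5.0, 'Banana': 2.5, 'Strawberry': 4.5, 'Apple': 3.5}}
print(pearson_similarity(ratings, 'John', 'Martha'))
```

To pass something like this as `sim_method` to `recommendation`, it would need a leading `self` parameter, because that method calls `sim_method(self, cho, per, others)`, and the `similarVal<=0` check should be used so that negative correlations are skipped. Keep in mind that when two people share only two items, Pearson can only come out as 1, -1 or undefined, so it is more informative on larger data sets.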

In our next tutorial I will come up with some new interesting techniques, to show you how easy and interesting Machine Learning is!!

Stay tuned and keep learning!!

For more updates and news related to this blog, as well as to data science, machine learning and data visualization, please follow our Facebook page.
