PEARSON CORRELATION SCORE

Hello everyone, in our last Machine Learning tutorial we learnt about that how we can use Euclidean Distance formula to find out similarity among people. In this tutorial we learn a new way to do the same thing but in a bit complex or rather I should say in advanced manner. We will use Pearson Correlation Score for calculating similarity among people. It has one major difference in the result being generated by it, in comparison to Euclidean Distance, that even if the distance between the values of fruits provided by two persons is high, but if it is consistent, that is difference is nearly is consistent through out all fruits, then Pearson Correlation Score will mark both persons highly similar or totally same.

Mango Banana Strawberry Pineapple Orange Apple
John 4.5 3.5 4 4
Martha 2.5 4.5 5 3.5
Mathew 3.75 4.25 3 3.5
Nick 4 3 4.5 4.5

For example in the above data if we look at ‘John’ and ‘Martha’ the distance between the fruits between them is nearly same, as a result Pearson Correlation Value will be around ‘1’ for them.

Pearson Correlation Score

We will calculate Pearson Correlation Score only for those fruits which are common for both the persons.

Pearson_correlation

Above formula provides us the Pearson Correlation Coefficient or Score, where ‘n’ is the sample size or total number of fruits, ‘x’ and ‘y’ are the values corresponding to each fruit.

Python code for the above method:

#Dictionary of People rating for fruits
choices={'John': {'Mango':4.5, 'Banana':3.5, 'Strawberry':4.0, 'Pineapple':4.0},
'Nick': {'Mango':4.0, 'Orange':4.5, 'Banana':3.0, 'Pineapple':4.5},
'Martha': {'Orange':5.0, 'Banana':2.5, 'Strawberry':4.5, 'Apple':3.5},
'Mathew': {'Mango':3.75, 'Strawberry':4.25, 'Apple':3.5, 'Pineapple':3.0}}

from math import sqrt
#Finding Similarity among people using Eucledian Distance Formula
class testClass():
    def pearson(self, cho, per1, per2):
        #Will set the following dictionary if data is common for two persons
        sample_data={}
        #Above mentioned varibale is an empty dictionary, that is length =0

        for items in cho[per1]:
            if items in cho[per2]:
                sample_data[items]=1
                #Value is being set 1 for those items which are same for both persons

        #Calculating length of sample_data dictionary
        length = len(sample_data)
        #If both person does not have any similarity or similar items return 0
        if length==0: return 0

        #Remember one thing we will calculate all the below values only for common items
        #   or the items which are being shared by both person1 and person2, that's why
        #   we will be using sample_data dictionary in below loops 

        #Calculating Sum of all common elements for Person1 and Person2
        sum1=sum([cho[per1][val] for val in sample_data])
        sum2=sum([cho[per2][val] for val in sample_data])

        #Calculating Sum of squares of all common elements for both
        sumSq1=sum([pow(cho[per1][val],2) for val in sample_data])
        sumSq2=sum([pow(cho[per2][val],2) for val in sample_data])

        #Calculating Sum of Products of all common elements for both
        sumPr=sum([cho[per1][val]*cho[per2][val] for val in sample_data])

        #Calculating Person Correlation Score
        x = sumPr-(sum1*sum2/length)
        y = sqrt((sumSq1-pow(sum1,2)/length)*(sumSq2-pow(sum2,2)/length))

        if y==0 : return 0

        return(x/y)
        #Value being returned above always lies between -1 and 1
        #Value of 1 means maximum similarity

def main():
    ob = testClass()

    print(ob.pearson(choices, 'John', 'Nick'))
    print(ob.pearson(choices, 'John', 'Martha'))
    print(ob.pearson(choices, 'John', 'John'))

if __name__ == "__main__":
    main()

Output

0.6546536707079778

1.0

1.0
In our next tutorial we learn about a way for recommending items to people which they have never tried! For example a book or movie being recommended by some social media website to you depending upon your taste.
Stay tuned and keep learning!!
For more updates and news related to this blog as well as to data science, machine learning and data visualization, please follow our facebook page by clicking this link.

 

Advertisements

One comment

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out /  Change )

Google+ photo

You are commenting using your Google+ account. Log Out /  Change )

Twitter picture

You are commenting using your Twitter account. Log Out /  Change )

Facebook photo

You are commenting using your Facebook account. Log Out /  Change )

Connecting to %s