Hi MLEnthusiasts! Today we will work through a case study on the Credit Card Dataset for Clustering. The goal is to discover customer segments that can be used to define a marketing strategy. The dataset comes from Kaggle and has the following variables:
– CUST_ID
– BALANCE
– BALANCE_FREQUENCY
– PURCHASES
– ONEOFF_PURCHASES
– INSTALLMENTS_PURCHASES
– CASH_ADVANCE
– PURCHASES_FREQUENCY
– ONEOFF_PURCHASES_FREQUENCY
– PURCHASES_INSTALLMENTS_FREQUENCY
– CASH_ADVANCE_FREQUENCY
– CASH_ADVANCE_TRX
– PURCHASES_TRX
– CREDIT_LIMIT
– PAYMENTS
– MINIMUM_PAYMENTS
– PRC_FULL_PAYMENT
– TENURE
Let’s start our analysis by importing the data into R with the read.csv() function and then inspecting its variables with the View() and summary() functions.
creditCardData <- read.csv("CreditCard.csv")
summary(creditCardData)
## CUST_ID BALANCE BALANCE_FREQUENCY PURCHASES
## C10001 : 1 Min. : 0.0 Min. :0.0000 Min. : 0.00
## C10002 : 1 1st Qu.: 128.3 1st Qu.:0.8889 1st Qu.: 39.63
## C10003 : 1 Median : 873.4 Median :1.0000 Median : 361.28
## C10004 : 1 Mean : 1564.5 Mean :0.8773 Mean : 1003.20
## C10005 : 1 3rd Qu.: 2054.1 3rd Qu.:1.0000 3rd Qu.: 1110.13
## C10006 : 1 Max. :19043.1 Max. :1.0000 Max. :49039.57
## (Other):8944
## ONEOFF_PURCHASES INSTALLMENTS_PURCHASES CASH_ADVANCE
## Min. : 0.0 Min. : 0.0 Min. : 0.0
## 1st Qu.: 0.0 1st Qu.: 0.0 1st Qu.: 0.0
## Median : 38.0 Median : 89.0 Median : 0.0
## Mean : 592.4 Mean : 411.1 Mean : 978.9
## 3rd Qu.: 577.4 3rd Qu.: 468.6 3rd Qu.: 1113.8
## Max. :40761.2 Max. :22500.0 Max. :47137.2
##
## PURCHASES_FREQUENCY ONEOFF_PURCHASES_FREQUENCY
## Min. :0.00000 Min. :0.00000
## 1st Qu.:0.08333 1st Qu.:0.00000
## Median :0.50000 Median :0.08333
## Mean :0.49035 Mean :0.20246
## 3rd Qu.:0.91667 3rd Qu.:0.30000
## Max. :1.00000 Max. :1.00000
##
## PURCHASES_INSTALLMENTS_FREQUENCY CASH_ADVANCE_FREQUENCY CASH_ADVANCE_TRX
## Min. :0.0000 Min. :0.0000 Min. : 0.000
## 1st Qu.:0.0000 1st Qu.:0.0000 1st Qu.: 0.000
## Median :0.1667 Median :0.0000 Median : 0.000
## Mean :0.3644 Mean :0.1351 Mean : 3.249
## 3rd Qu.:0.7500 3rd Qu.:0.2222 3rd Qu.: 4.000
## Max. :1.0000 Max. :1.5000 Max. :123.000
##
## PURCHASES_TRX CREDIT_LIMIT PAYMENTS MINIMUM_PAYMENTS
## Min. : 0.00 Min. : 50 Min. : 0.0 Min. : 0.02
## 1st Qu.: 1.00 1st Qu.: 1600 1st Qu.: 383.3 1st Qu.: 169.12
## Median : 7.00 Median : 3000 Median : 856.9 Median : 312.34
## Mean : 14.71 Mean : 4494 Mean : 1733.1 Mean : 864.21
## 3rd Qu.: 17.00 3rd Qu.: 6500 3rd Qu.: 1901.1 3rd Qu.: 825.49
## Max. :358.00 Max. :30000 Max. :50721.5 Max. :76406.21
## NA's :1 NA's :313
## PRC_FULL_PAYMENT TENURE
## Min. :0.0000 Min. : 6.00
## 1st Qu.:0.0000 1st Qu.:12.00
## Median :0.0000 Median :12.00
## Mean :0.1537 Mean :11.52
## 3rd Qu.:0.1429 3rd Qu.:12.00
## Max. :1.0000 Max. :12.00
##
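The NA counts shown at the bottom of the summary can also be pulled out directly. The snippet below is a minimal sketch on a made-up data frame (`toy` and its values are assumptions for illustration); on the real data the same call would be `colSums(is.na(creditCardData))`.

```r
# Toy data frame standing in for creditCardData (values are made up).
toy <- data.frame(
  CREDIT_LIMIT     = c(1000, NA, 7500, 1200),
  MINIMUM_PAYMENTS = c(139.5, NA, NA, 244.8),
  TENURE           = c(12, 12, 12, 12)
)

# Count missing values per column, then keep only the columns that have any.
na_counts <- colSums(is.na(toy))
na_counts[na_counts > 0]
```

This gives a compact list of the columns that need imputation without scanning the full summary.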
As you can see, the dataset has missing values, shown as NA's in the summary: CREDIT_LIMIT has 1 and MINIMUM_PAYMENTS has 313. We will impute them, choosing the replacement value by looking at the distribution of each variable. Let’s have a look at the distribution using the hist() function.
hist(creditCardData$CREDIT_LIMIT)
The data is right-skewed. So, here we will replace the missing values with the median of the variable.
creditCardData$CREDIT_LIMIT[is.na(creditCardData$CREDIT_LIMIT)] <- median(creditCardData$CREDIT_LIMIT, na.rm = TRUE)
summary(creditCardData)
## CUST_ID BALANCE BALANCE_FREQUENCY PURCHASES
## C10001 : 1 Min. : 0.0 Min. :0.0000 Min. : 0.00
## C10002 : 1 1st Qu.: 128.3 1st Qu.:0.8889 1st Qu.: 39.63
## C10003 : 1 Median : 873.4 Median :1.0000 Median : 361.28
## C10004 : 1 Mean : 1564.5 Mean :0.8773 Mean : 1003.20
## C10005 : 1 3rd Qu.: 2054.1 3rd Qu.:1.0000 3rd Qu.: 1110.13
## C10006 : 1 Max. :19043.1 Max. :1.0000 Max. :49039.57
## (Other):8944
## ONEOFF_PURCHASES INSTALLMENTS_PURCHASES CASH_ADVANCE
## Min. : 0.0 Min. : 0.0 Min. : 0.0
## 1st Qu.: 0.0 1st Qu.: 0.0 1st Qu.: 0.0
## Median : 38.0 Median : 89.0 Median : 0.0
## Mean : 592.4 Mean : 411.1 Mean : 978.9
## 3rd Qu.: 577.4 3rd Qu.: 468.6 3rd Qu.: 1113.8
## Max. :40761.2 Max. :22500.0 Max. :47137.2
##
## PURCHASES_FREQUENCY ONEOFF_PURCHASES_FREQUENCY
## Min. :0.00000 Min. :0.00000
## 1st Qu.:0.08333 1st Qu.:0.00000
## Median :0.50000 Median :0.08333
## Mean :0.49035 Mean :0.20246
## 3rd Qu.:0.91667 3rd Qu.:0.30000
## Max. :1.00000 Max. :1.00000
##
## PURCHASES_INSTALLMENTS_FREQUENCY CASH_ADVANCE_FREQUENCY CASH_ADVANCE_TRX
## Min. :0.0000 Min. :0.0000 Min. : 0.000
## 1st Qu.:0.0000 1st Qu.:0.0000 1st Qu.: 0.000
## Median :0.1667 Median :0.0000 Median : 0.000
## Mean :0.3644 Mean :0.1351 Mean : 3.249
## 3rd Qu.:0.7500 3rd Qu.:0.2222 3rd Qu.: 4.000
## Max. :1.0000 Max. :1.5000 Max. :123.000
##
## PURCHASES_TRX CREDIT_LIMIT PAYMENTS MINIMUM_PAYMENTS
## Min. : 0.00 Min. : 50 Min. : 0.0 Min. : 0.02
## 1st Qu.: 1.00 1st Qu.: 1600 1st Qu.: 383.3 1st Qu.: 169.12
## Median : 7.00 Median : 3000 Median : 856.9 Median : 312.34
## Mean : 14.71 Mean : 4494 Mean : 1733.1 Mean : 864.21
## 3rd Qu.: 17.00 3rd Qu.: 6500 3rd Qu.: 1901.1 3rd Qu.: 825.49
## Max. :358.00 Max. :30000 Max. :50721.5 Max. :76406.21
## NA's :313
## PRC_FULL_PAYMENT TENURE
## Min. :0.0000 Min. : 6.00
## 1st Qu.:0.0000 1st Qu.:12.00
## Median :0.0000 Median :12.00
## Mean :0.1537 Mean :11.52
## 3rd Qu.:0.1429 3rd Qu.:12.00
## Max. :1.0000 Max. :12.00
##
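Why the median rather than the mean? On a right-skewed variable, a few very large values pull the mean upward, while the median stays close to the bulk of the data. Here is a minimal sketch on made-up numbers. Note that `skewness_coef()` is a helper defined here for illustration (a moment-based formula dividing by n), not a base R function.

```r
# Made-up right-skewed sample: most values are small, one is very large.
x <- c(500, 1000, 1200, 1500, 3000, 30000)

# Moment-based skewness: mean cubed deviation divided by the
# (population) standard deviation cubed. Positive = long right tail.
skewness_coef <- function(v) {
  v <- v[!is.na(v)]
  d <- v - mean(v)
  mean(d^3) / mean(d^2)^1.5
}

mean(x)           # pulled up by the single large value
median(x)         # stays near the bulk of the data
skewness_coef(x)  # positive for a right-skewed sample
```

Other definitions of sample skewness apply small-sample corrections, so the exact number can differ slightly from what a package reports.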
Now, let’s do the same for MINIMUM_PAYMENTS.
hist(creditCardData$MINIMUM_PAYMENTS)
The data is right-skewed here as well, with a very high skewness coefficient. Values greater than 500 can be treated as outliers here. Let’s replace the missing values with the median in this case too.
creditCardData$MINIMUM_PAYMENTS[is.na(creditCardData$MINIMUM_PAYMENTS)] <- median(creditCardData$MINIMUM_PAYMENTS, na.rm = TRUE)
summary(creditCardData)
## CUST_ID BALANCE BALANCE_FREQUENCY PURCHASES
## C10001 : 1 Min. : 0.0 Min. :0.0000 Min. : 0.00
## C10002 : 1 1st Qu.: 128.3 1st Qu.:0.8889 1st Qu.: 39.63
## C10003 : 1 Median : 873.4 Median :1.0000 Median : 361.28
## C10004 : 1 Mean : 1564.5 Mean :0.8773 Mean : 1003.20
## C10005 : 1 3rd Qu.: 2054.1 3rd Qu.:1.0000 3rd Qu.: 1110.13
## C10006 : 1 Max. :19043.1 Max. :1.0000 Max. :49039.57
## (Other):8944
## ONEOFF_PURCHASES INSTALLMENTS_PURCHASES CASH_ADVANCE
## Min. : 0.0 Min. : 0.0 Min. : 0.0
## 1st Qu.: 0.0 1st Qu.: 0.0 1st Qu.: 0.0
## Median : 38.0 Median : 89.0 Median : 0.0
## Mean : 592.4 Mean : 411.1 Mean : 978.9
## 3rd Qu.: 577.4 3rd Qu.: 468.6 3rd Qu.: 1113.8
## Max. :40761.2 Max. :22500.0 Max. :47137.2
##
## PURCHASES_FREQUENCY ONEOFF_PURCHASES_FREQUENCY
## Min. :0.00000 Min. :0.00000
## 1st Qu.:0.08333 1st Qu.:0.00000
## Median :0.50000 Median :0.08333
## Mean :0.49035 Mean :0.20246
## 3rd Qu.:0.91667 3rd Qu.:0.30000
## Max. :1.00000 Max. :1.00000
##
## PURCHASES_INSTALLMENTS_FREQUENCY CASH_ADVANCE_FREQUENCY CASH_ADVANCE_TRX
## Min. :0.0000 Min. :0.0000 Min. : 0.000
## 1st Qu.:0.0000 1st Qu.:0.0000 1st Qu.: 0.000
## Median :0.1667 Median :0.0000 Median : 0.000
## Mean :0.3644 Mean :0.1351 Mean : 3.249
## 3rd Qu.:0.7500 3rd Qu.:0.2222 3rd Qu.: 4.000
## Max. :1.0000 Max. :1.5000 Max. :123.000
##
## PURCHASES_TRX CREDIT_LIMIT PAYMENTS MINIMUM_PAYMENTS
## Min. : 0.00 Min. : 50 Min. : 0.0 Min. : 0.02
## 1st Qu.: 1.00 1st Qu.: 1600 1st Qu.: 383.3 1st Qu.: 170.86
## Median : 7.00 Median : 3000 Median : 856.9 Median : 312.34
## Mean : 14.71 Mean : 4494 Mean : 1733.1 Mean : 844.91
## 3rd Qu.: 17.00 3rd Qu.: 6500 3rd Qu.: 1901.1 3rd Qu.: 788.71
## Max. :358.00 Max. :30000 Max. :50721.5 Max. :76406.21
##
## PRC_FULL_PAYMENT TENURE
## Min. :0.0000 Min. : 6.00
## 1st Qu.:0.0000 1st Qu.:12.00
## Median :0.0000 Median :12.00
## Mean :0.1537 Mean :11.52
## 3rd Qu.:0.1429 3rd Qu.:12.00
## Max. :1.0000 Max. :12.00
##
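The two imputation steps above follow the same pattern, so they could be folded into one small helper. This is only a sketch under the assumption that every numeric column should get its own median; `impute_median()` and the `toy` data are hypothetical names and values, not part of the original analysis.

```r
# Replace NAs in every numeric column with that column's median.
# impute_median() is a hypothetical helper, not a base R function.
impute_median <- function(df) {
  for (col in names(df)) {
    if (is.numeric(df[[col]]) && anyNA(df[[col]])) {
      df[[col]][is.na(df[[col]])] <- median(df[[col]], na.rm = TRUE)
    }
  }
  df
}

# Made-up data with one non-numeric column that must be left alone.
toy <- data.frame(
  CUST_ID          = c("C1", "C2", "C3", "C4", "C5"),
  CREDIT_LIMIT     = c(1000, NA, 3000, 7000, 7500),
  MINIMUM_PAYMENTS = c(100, 300, NA, 900, NA)
)
imputed <- impute_median(toy)
imputed
```

Checking each histogram first, as done above, is still worthwhile: the median is a sensible default for skewed columns, but not necessarily for every variable.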
Missing value imputation is now done. CUST_ID is only a row identifier and carries no information about customer behaviour, so we don’t need it in our analysis. Let’s remove it.
creditCardData <- creditCardData[, -c(1)]
head(creditCardData)
## BALANCE BALANCE_FREQUENCY PURCHASES ONEOFF_PURCHASES
## 1 40.90075 0.818182 95.40 0.00
## 2 3202.46742 0.909091 0.00 0.00
## 3 2495.14886 1.000000 773.17 773.17
## 4 1666.67054 0.636364 1499.00 1499.00
## 5 817.71434 1.000000 16.00 16.00
## 6 1809.82875 1.000000 1333.28 0.00
## INSTALLMENTS_PURCHASES CASH_ADVANCE PURCHASES_FREQUENCY
## 1 95.40 0.000 0.166667
## 2 0.00 6442.945 0.000000
## 3 0.00 0.000 1.000000
## 4 0.00 205.788 0.083333
## 5 0.00 0.000 0.083333
## 6 1333.28 0.000 0.666667
## ONEOFF_PURCHASES_FREQUENCY PURCHASES_INSTALLMENTS_FREQUENCY
## 1 0.000000 0.083333
## 2 0.000000 0.000000
## 3 1.000000 0.000000
## 4 0.083333 0.000000
## 5 0.083333 0.000000
## 6 0.000000 0.583333
## CASH_ADVANCE_FREQUENCY CASH_ADVANCE_TRX PURCHASES_TRX CREDIT_LIMIT
## 1 0.000000 0 2 1000
## 2 0.250000 4 0 7000
## 3 0.000000 0 12 7500
## 4 0.083333 1 1 7500
## 5 0.000000 0 1 1200
## 6 0.000000 0 8 1800
## PAYMENTS MINIMUM_PAYMENTS PRC_FULL_PAYMENT TENURE
## 1 201.8021 139.5098 0.000000 12
## 2 4103.0326 1072.3402 0.222222 12
## 3 622.0667 627.2848 0.000000 12
## 4 0.0000 312.3439 0.000000 12
## 5 678.3348 244.7912 0.000000 12
## 6 1400.0578 2407.2460 0.000000 12
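Removing the identifier by position works here because CUST_ID is the first column. A slightly more defensive variant, sketched below on a made-up data frame, drops it by name so the code keeps working if the column order ever changes.

```r
# Made-up stand-in for creditCardData.
toy <- data.frame(CUST_ID = c("C10001", "C10002"),
                  BALANCE = c(40.9, 3202.5),
                  TENURE  = c(12, 12))

# Drop the identifier by name rather than by position.
toy_no_id <- toy[, setdiff(names(toy), "CUST_ID"), drop = FALSE]
names(toy_no_id)
```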
Now that the missing values are handled, let’s deal with the outliers in this dataset. I will cap those data points that show a dramatic increase or decrease compared to the other data points, mostly at the 1st-2nd or 98th-99th percentile level. This has two benefits. First, capping keeps every row, so no data is lost. Second, the quality of our clustering improves, because extreme values distort the Euclidean distances that k-means relies on. So, let’s move ahead.
box1 = boxplot(creditCardData$BALANCE)
As can be seen, there are a lot of outliers in this variable.
quantile(creditCardData$BALANCE, seq(0, 1, 0.02))
## 0% 2% 4% 6% 8%
## 0.000000 2.140114 6.791763 11.735999 17.201403
## 10% 12% 14% 16% 18%
## 23.575529 31.452780 40.225116 49.857097 62.077146
## 20% 22% 24% 26% 28%
## 77.238026 94.013827 116.139987 141.418853 169.842053
## 30% 32% 34% 36% 38%
## 207.176552 245.192376 293.739058 348.041747 407.315239
## 40% 42% 44% 46% 48%
## 467.021989 535.938751 624.680672 709.474515 801.696530
## 50% 52% 54% 56% 58%
## 873.385231 946.573749 1016.507930 1080.748668 1137.698757
## 60% 62% 64% 66% 68%
## 1207.815587 1299.268313 1389.589016 1478.072900 1585.909487
## 70% 72% 74% 76% 78%
## 1698.588855 1827.325615 1967.656363 2142.242732 2362.829739
## 80% 82% 84% 86% 88%
## 2571.434263 2773.562692 3019.835517 3363.675352 3804.151958
## 90% 92% 94% 96% 98%
## 4338.563657 4881.429707 5544.159539 6460.903714 7969.618588
## 100%
## 19043.138560
box1$stats
## [,1]
## [1,] 0.0000
## [2,] 128.2540
## [3,] 873.3852
## [4,] 2054.3728
## [5,] 4940.1139
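The five numbers in `$stats` are, in order, the lower whisker, lower hinge, median, upper hinge and upper whisker. The upper whisker is the largest observation that lies within 1.5 times the hinge spread above the upper hinge; anything beyond it is drawn as an individual point. The sketch below reproduces that rule by hand on made-up numbers, using base R's `boxplot.stats()` and `fivenum()`.

```r
# Made-up sample with one obvious extreme value.
x <- c(1, 2, 3, 4, 5, 6, 7, 8, 9, 100)

s      <- boxplot.stats(x)$stats     # the five numbers boxplot() draws
hinges <- fivenum(x)[c(2, 4)]        # lower and upper hinge
fence  <- hinges[2] + 1.5 * diff(hinges)

# Largest observation that does not exceed the fence.
upper_whisker <- max(x[x <= fence])

s
upper_whisker
boxplot.stats(x)$out                 # points beyond the whiskers
```

Reading the output above the same way, every BALANCE value over the fifth number is plotted as an outlier point, which is why the quantile table is used to pick a less aggressive cap.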
There is a jump above the 98th percentile, so we cap BALANCE at that value.
creditCardData$BALANCE = ifelse(creditCardData$BALANCE > 7969, 7969, creditCardData$BALANCE)
boxplot(creditCardData$BALANCE)
The plot looks much better than before. Let’s do the same for the other variables.
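Since the same capping step is repeated for each variable, it can be written once as a helper. This is a sketch, not the code used in this post: `cap_upper()` is a hypothetical function, and it caps at the exact percentile returned by `quantile()` rather than at a hand-rounded threshold.

```r
# cap_upper() is a hypothetical helper: values above the chosen percentile
# are pulled down to that percentile; no rows are dropped.
cap_upper <- function(x, prob = 0.98) {
  limit <- quantile(x, prob, na.rm = TRUE, names = FALSE)
  pmin(x, limit)
}

balance <- c(0, 128, 873, 2054, 4940, 19043)  # made-up values
capped  <- cap_upper(balance, prob = 0.98)

length(capped) == length(balance)  # capping keeps every observation
max(capped) < max(balance)         # the extreme value was pulled in
```

The percentile still has to be chosen per variable by looking at where the quantile table jumps, which is why the steps below inspect each column separately.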
box2 <- boxplot(creditCardData$PURCHASES)
box2$stats
## [,1]
## [1,] 0.00
## [2,] 39.58
## [3,] 361.28
## [4,] 1110.17
## [5,] 2711.90
quantile(creditCardData$PURCHASES, seq(0, 1, 0.01))
## 0% 1% 2% 3% 4% 5%
## 0.0000 0.0000 0.0000 0.0000 0.0000 0.0000
## 6% 7% 8% 9% 10% 11%
## 0.0000 0.0000 0.0000 0.0000 0.0000 0.0000
## 12% 13% 14% 15% 16% 17%
## 0.0000 0.0000 0.0000 0.0000 0.0000 0.0000
## 18% 19% 20% 21% 22% 23%
## 0.0000 0.0000 0.0000 0.0000 0.0000 6.9972
## 24% 25% 26% 27% 28% 29%
## 25.6444 39.6350 49.9566 59.6999 68.6952 79.0000
## 30% 31% 32% 33% 34% 35%
## 89.2850 99.7538 110.4680 120.8302 134.9360 148.3305
## 36% 37% 38% 39% 40% 41%
## 159.0448 172.0533 184.9696 200.0000 212.8500 227.6544
## 42% 43% 44% 45% 46% 47%
## 240.3860 254.8661 267.5696 282.3965 298.3566 313.9233
## 48% 49% 50% 51% 52% 53%
## 329.4268 345.6633 361.2800 380.0000 398.6400 419.0961
## 54% 55% 56% 57% 58% 59%
## 435.8488 453.2340 472.1912 494.9790 516.2268 539.8503
## 60% 61% 62% 63% 64% 65%
## 557.5460 584.1632 611.4620 639.1305 673.1180 704.0290
## 66% 67% 68% 69% 70% 71%
## 740.3106 779.6405 807.8096 848.2106 894.3160 933.7895
## 72% 73% 74% 75% 76% 77%
## 975.4060 1016.3939 1060.1300 1110.1300 1168.8100 1218.2049
## 78% 79% 80% 81% 82% 83%
## 1282.8246 1343.5166 1422.4380 1490.7128 1569.9518 1664.6431
## 84% 85% 86% 87% 88% 89%
## 1764.1376 1859.1160 1968.0532 2093.4020 2234.1520 2385.8153
## 90% 91% 92% 93% 94% 95%
## 2542.6240 2721.8562 2967.4824 3222.5813 3589.2156 3998.6195
## 96% 97% 98% 99% 100%
## 4490.7764 5183.4517 6335.7680 8977.2900 49039.5700
creditCardData$PURCHASES = ifelse(creditCardData$PURCHASES > 6336, 6336, creditCardData$PURCHASES)
boxplot(creditCardData$PURCHASES)
box3 <- boxplot(creditCardData$ONEOFF_PURCHASES)
box3$stats
## [,1]
## [1,] 0.00
## [2,] 0.00
## [3,] 38.00
## [4,] 577.83
## [5,] 1443.33
quantile(creditCardData$ONEOFF_PURCHASES, seq(0, 1, 0.01))
## 0% 1% 2% 3% 4% 5%
## 0.0000 0.0000 0.0000 0.0000 0.0000 0.0000
## 6% 7% 8% 9% 10% 11%
## 0.0000 0.0000 0.0000 0.0000 0.0000 0.0000
## 12% 13% 14% 15% 16% 17%
## 0.0000 0.0000 0.0000 0.0000 0.0000 0.0000
## 18% 19% 20% 21% 22% 23%
## 0.0000 0.0000 0.0000 0.0000 0.0000 0.0000
## 24% 25% 26% 27% 28% 29%
## 0.0000 0.0000 0.0000 0.0000 0.0000 0.0000
## 30% 31% 32% 33% 34% 35%
## 0.0000 0.0000 0.0000 0.0000 0.0000 0.0000
## 36% 37% 38% 39% 40% 41%
## 0.0000 0.0000 0.0000 0.0000 0.0000 0.0000
## 42% 43% 44% 45% 46% 47%
## 0.0000 0.0000 0.0000 0.0000 0.0000 0.0000
## 48% 49% 50% 51% 52% 53%
## 0.0000 25.0000 38.0000 45.6500 57.0000 67.8594
## 54% 55% 56% 57% 58% 59%
## 79.0000 90.6905 105.0000 117.3137 134.1582 150.2957
## 60% 61% 62% 63% 64% 65%
## 168.3780 185.0000 204.7210 221.8101 242.6084 264.3005
## 66% 67% 68% 69% 70% 71%
## 290.0000 315.2490 339.8296 366.7169 400.1550 434.5543
## 72% 73% 74% 75% 76% 77%
## 463.0848 497.7885 537.3756 577.4050 626.0096 664.0365
## 78% 79% 80% 81% 82% 83%
## 712.7218 769.3897 816.9920 869.1290 930.0354 999.6964
## 84% 85% 86% 87% 88% 89%
## 1048.4432 1116.4075 1200.0000 1283.5115 1372.9396 1480.7132
## 90% 91% 92% 93% 94% 95%
## 1600.0990 1751.6232 1943.3240 2127.5609 2385.1696 2671.0940
## 96% 97% 98% 99% 100%
## 3076.2372 3609.9608 4432.5868 6689.8982 40761.2500
creditCardData$ONEOFF_PURCHASES = ifelse(creditCardData$ONEOFF_PURCHASES > 4433, 4433, creditCardData$ONEOFF_PURCHASES)
boxplot(creditCardData$ONEOFF_PURCHASES)
box4 <- boxplot(creditCardData$INSTALLMENTS_PURCHASES)
box4$stats
## [,1]
## [1,] 0.00
## [2,] 0.00
## [3,] 89.00
## [4,] 468.65
## [5,] 1170.49
quantile(creditCardData$INSTALLMENTS_PURCHASES, seq(0, 1, 0.01))
## 0% 1% 2% 3% 4% 5%
## 0.0000 0.0000 0.0000 0.0000 0.0000 0.0000
## 6% 7% 8% 9% 10% 11%
## 0.0000 0.0000 0.0000 0.0000 0.0000 0.0000
## 12% 13% 14% 15% 16% 17%
## 0.0000 0.0000 0.0000 0.0000 0.0000 0.0000
## 18% 19% 20% 21% 22% 23%
## 0.0000 0.0000 0.0000 0.0000 0.0000 0.0000
## 24% 25% 26% 27% 28% 29%
## 0.0000 0.0000 0.0000 0.0000 0.0000 0.0000
## 30% 31% 32% 33% 34% 35%
## 0.0000 0.0000 0.0000 0.0000 0.0000 0.0000
## 36% 37% 38% 39% 40% 41%
## 0.0000 0.0000 0.0000 0.0000 0.0000 0.0000
## 42% 43% 44% 45% 46% 47%
## 0.0000 0.0000 13.0336 31.9260 46.7940 57.0000
## 48% 49% 50% 51% 52% 53%
## 68.7260 78.8218 89.0000 99.2395 110.0348 122.5776
## 54% 55% 56% 57% 58% 59%
## 133.7546 145.0095 158.4944 170.9916 183.9000 200.0000
## 60% 61% 62% 63% 64% 65%
## 213.9500 228.1746 239.7328 252.0000 267.1224 284.1105
## 66% 67% 68% 69% 70% 71%
## 299.8138 315.0166 332.5000 350.0000 371.1390 389.8508
## 72% 73% 74% 75% 76% 77%
## 411.3204 427.9954 449.2104 468.6375 494.1984 521.1274
## 78% 79% 80% 81% 82% 83%
## 547.3504 576.8342 603.2920 638.8760 679.9836 726.6076
## 84% 85% 86% 87% 88% 89%
## 775.7280 823.0615 873.6104 934.8400 1006.7812 1071.2159
## 90% 91% 92% 93% 94% 95%
## 1140.0700 1225.5399 1329.3908 1448.2455 1596.8604 1750.0875
## 96% 97% 98% 99% 100%
## 1957.0300 2273.2570 2757.3850 3886.2405 22500.0000
creditCardData$INSTALLMENTS_PURCHASES = ifelse(creditCardData$INSTALLMENTS_PURCHASES > 3886.2, 3886.2, creditCardData$INSTALLMENTS_PURCHASES)
boxplot(creditCardData$INSTALLMENTS_PURCHASES)
box5 <- boxplot(creditCardData$CASH_ADVANCE)
box5$stats
## [,1]
## [1,] 0.000
## [2,] 0.000
## [3,] 0.000
## [4,] 1113.869
## [5,] 2784.295
quantile(creditCardData$CASH_ADVANCE, seq(0, 1, 0.01))
## 0% 1% 2% 3% 4% 5%
## 0.00000 0.00000 0.00000 0.00000 0.00000 0.00000
## 6% 7% 8% 9% 10% 11%
## 0.00000 0.00000 0.00000 0.00000 0.00000 0.00000
## 12% 13% 14% 15% 16% 17%
## 0.00000 0.00000 0.00000 0.00000 0.00000 0.00000
## 18% 19% 20% 21% 22% 23%
## 0.00000 0.00000 0.00000 0.00000 0.00000 0.00000
## 24% 25% 26% 27% 28% 29%
## 0.00000 0.00000 0.00000 0.00000 0.00000 0.00000
## 30% 31% 32% 33% 34% 35%
## 0.00000 0.00000 0.00000 0.00000 0.00000 0.00000
## 36% 37% 38% 39% 40% 41%
## 0.00000 0.00000 0.00000 0.00000 0.00000 0.00000
## 42% 43% 44% 45% 46% 47%
## 0.00000 0.00000 0.00000 0.00000 0.00000 0.00000
## 48% 49% 50% 51% 52% 53%
## 0.00000 0.00000 0.00000 0.00000 19.01261 47.21669
## 54% 55% 56% 57% 58% 59%
## 73.29373 94.92347 113.69394 146.64741 175.86786 198.16096
## 60% 61% 62% 63% 64% 65%
## 238.63372 272.66419 302.20833 357.17746 399.65292 451.94552
## 66% 67% 68% 69% 70% 71%
## 490.53359 554.96069 639.73276 722.62786 797.27230 879.50541
## 72% 73% 74% 75% 76% 77%
## 929.70080 975.00317 1052.93473 1113.82114 1183.11998 1286.85883
## 78% 79% 80% 81% 82% 83%
## 1381.66606 1462.14780 1574.93378 1686.09064 1817.81642 1909.20380
## 84% 85% 86% 87% 88% 89%
## 2037.90189 2194.72201 2370.47308 2529.55907 2710.13105 2859.77980
## 90% 91% 92% 93% 94% 95%
## 3065.53456 3319.42513 3584.39104 3895.23291 4232.21233 4647.16912
## 96% 97% 98% 99% 100%
## 5264.20760 6010.90910 7298.60917 9588.16336 47137.21176
creditCardData$CASH_ADVANCE = ifelse(creditCardData$CASH_ADVANCE > 7299, 7299, creditCardData$CASH_ADVANCE)
boxplot(creditCardData$CASH_ADVANCE)
box6 <- boxplot(creditCardData$PURCHASES_FREQUENCY)
box7 <- boxplot(creditCardData$ONEOFF_PURCHASES_FREQUENCY)
box7$stats
## [,1]
## [1,] 0.000000
## [2,] 0.000000
## [3,] 0.083333
## [4,] 0.300000
## [5,] 0.750000
quantile(creditCardData$ONEOFF_PURCHASES_FREQUENCY, seq(0, 1, 0.01))
## 0% 1% 2% 3% 4% 5% 6%
## 0.0000000 0.0000000 0.0000000 0.0000000 0.0000000 0.0000000 0.0000000
## 7% 8% 9% 10% 11% 12% 13%
## 0.0000000 0.0000000 0.0000000 0.0000000 0.0000000 0.0000000 0.0000000
## 14% 15% 16% 17% 18% 19% 20%
## 0.0000000 0.0000000 0.0000000 0.0000000 0.0000000 0.0000000 0.0000000
## 21% 22% 23% 24% 25% 26% 27%
## 0.0000000 0.0000000 0.0000000 0.0000000 0.0000000 0.0000000 0.0000000
## 28% 29% 30% 31% 32% 33% 34%
## 0.0000000 0.0000000 0.0000000 0.0000000 0.0000000 0.0000000 0.0000000
## 35% 36% 37% 38% 39% 40% 41%
## 0.0000000 0.0000000 0.0000000 0.0000000 0.0000000 0.0000000 0.0000000
## 42% 43% 44% 45% 46% 47% 48%
## 0.0000000 0.0000000 0.0000000 0.0000000 0.0000000 0.0000000 0.0000000
## 49% 50% 51% 52% 53% 54% 55%
## 0.0833330 0.0833330 0.0833330 0.0833330 0.0833330 0.0833330 0.0833330
## 56% 57% 58% 59% 60% 61% 62%
## 0.0833330 0.0833330 0.0833330 0.0833330 0.0833330 0.0909090 0.1250000
## 63% 64% 65% 66% 67% 68% 69%
## 0.1666670 0.1666670 0.1666670 0.1666670 0.1666670 0.1666670 0.1666670
## 70% 71% 72% 73% 74% 75% 76%
## 0.2222220 0.2500000 0.2500000 0.2500000 0.2500000 0.3000000 0.3333330
## 77% 78% 79% 80% 81% 82% 83%
## 0.3333330 0.3333330 0.3333330 0.4166670 0.4166670 0.4166670 0.5000000
## 84% 85% 86% 87% 88% 89% 90%
## 0.5000000 0.5295457 0.5833330 0.5833330 0.6666670 0.6666670 0.7500000
## 91% 92% 93% 94% 95% 96% 97%
## 0.7500000 0.8333330 0.9166670 0.9166670 1.0000000 1.0000000 1.0000000
## 98% 99% 100%
## 1.0000000 1.0000000 1.0000000
box8 <- boxplot(creditCardData$PURCHASES_INSTALLMENTS_FREQUENCY)
box9 <- boxplot(creditCardData$CASH_ADVANCE_FREQUENCY)
box9$stats
## [,1]
## [1,] 0.000000
## [2,] 0.000000
## [3,] 0.000000
## [4,] 0.222222
## [5,] 0.545455
quantile(creditCardData$CASH_ADVANCE_FREQUENCY, seq(0, 1, 0.01))
## 0% 1% 2% 3% 4% 5% 6%
## 0.0000000 0.0000000 0.0000000 0.0000000 0.0000000 0.0000000 0.0000000
## 7% 8% 9% 10% 11% 12% 13%
## 0.0000000 0.0000000 0.0000000 0.0000000 0.0000000 0.0000000 0.0000000
## 14% 15% 16% 17% 18% 19% 20%
## 0.0000000 0.0000000 0.0000000 0.0000000 0.0000000 0.0000000 0.0000000
## 21% 22% 23% 24% 25% 26% 27%
## 0.0000000 0.0000000 0.0000000 0.0000000 0.0000000 0.0000000 0.0000000
## 28% 29% 30% 31% 32% 33% 34%
## 0.0000000 0.0000000 0.0000000 0.0000000 0.0000000 0.0000000 0.0000000
## 35% 36% 37% 38% 39% 40% 41%
## 0.0000000 0.0000000 0.0000000 0.0000000 0.0000000 0.0000000 0.0000000
## 42% 43% 44% 45% 46% 47% 48%
## 0.0000000 0.0000000 0.0000000 0.0000000 0.0000000 0.0000000 0.0000000
## 49% 50% 51% 52% 53% 54% 55%
## 0.0000000 0.0000000 0.0000000 0.0833330 0.0833330 0.0833330 0.0833330
## 56% 57% 58% 59% 60% 61% 62%
## 0.0833330 0.0833330 0.0833330 0.0833330 0.0833330 0.0833330 0.0833330
## 63% 64% 65% 66% 67% 68% 69%
## 0.0833330 0.1000000 0.1250000 0.1666670 0.1666670 0.1666670 0.1666670
## 70% 71% 72% 73% 74% 75% 76%
## 0.1666670 0.1666670 0.1666670 0.1666670 0.1666670 0.2222220 0.2500000
## 77% 78% 79% 80% 81% 82% 83%
## 0.2500000 0.2500000 0.2500000 0.2500000 0.2500000 0.2750647 0.3333330
## 84% 85% 86% 87% 88% 89% 90%
## 0.3333330 0.3333330 0.3333330 0.3333330 0.4000000 0.4166670 0.4166670
## 91% 92% 93% 94% 95% 96% 97%
## 0.4166670 0.5000000 0.5000000 0.5000000 0.5833330 0.6000000 0.6666670
## 98% 99% 100%
## 0.7500000 0.8333330 1.5000000
creditCardData$CASH_ADVANCE_FREQUENCY = ifelse(creditCardData$CASH_ADVANCE_FREQUENCY > 0.833, 0.833, creditCardData$CASH_ADVANCE_FREQUENCY)
boxplot(creditCardData$CASH_ADVANCE_FREQUENCY)
box10 <- boxplot(creditCardData$CASH_ADVANCE_TRX)
box10$stats
## [,1]
## [1,] 0
## [2,] 0
## [3,] 0
## [4,] 4
## [5,] 10
## attr(,"class")
## 1
## "integer"
quantile(creditCardData$CASH_ADVANCE_TRX, seq(0, 1, 0.01))
## 0% 1% 2% 3% 4% 5% 6% 7% 8% 9% 10% 11% 12% 13% 14%
## 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
## 15% 16% 17% 18% 19% 20% 21% 22% 23% 24% 25% 26% 27% 28% 29%
## 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
## 30% 31% 32% 33% 34% 35% 36% 37% 38% 39% 40% 41% 42% 43% 44%
## 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
## 45% 46% 47% 48% 49% 50% 51% 52% 53% 54% 55% 56% 57% 58% 59%
## 0 0 0 0 0 0 0 1 1 1 1 1 1 1 1
## 60% 61% 62% 63% 64% 65% 66% 67% 68% 69% 70% 71% 72% 73% 74%
## 1 1 2 2 2 2 2 2 2 3 3 3 3 3 4
## 75% 76% 77% 78% 79% 80% 81% 82% 83% 84% 85% 86% 87% 88% 89%
## 4 4 4 5 5 5 5 6 6 7 7 7 8 8 9
## 90% 91% 92% 93% 94% 95% 96% 97% 98% 99% 100%
## 10 10 11 12 13 15 17 19 23 29 123
creditCardData$CASH_ADVANCE_TRX = ifelse(creditCardData$CASH_ADVANCE_TRX > 29, 29, creditCardData$CASH_ADVANCE_TRX)
boxplot(creditCardData$CASH_ADVANCE_TRX)
box11 <- boxplot(creditCardData$PURCHASES_TRX)
box11$stats
## [,1]
## [1,] 0
## [2,] 1
## [3,] 7
## [4,] 17
## [5,] 41
## attr(,"class")
## 1
## "integer"
quantile(creditCardData$PURCHASES_TRX, seq(0, 1, 0.01))
## 0% 1% 2% 3% 4% 5% 6% 7% 8% 9%
## 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00
## 10% 11% 12% 13% 14% 15% 16% 17% 18% 19%
## 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00
## 20% 21% 22% 23% 24% 25% 26% 27% 28% 29%
## 0.00 0.00 0.00 1.00 1.00 1.00 1.00 1.00 1.00 1.00
## 30% 31% 32% 33% 34% 35% 36% 37% 38% 39%
## 1.00 2.00 2.00 2.00 2.00 3.00 3.00 3.00 3.00 4.00
## 40% 41% 42% 43% 44% 45% 46% 47% 48% 49%
## 4.00 4.00 5.00 5.00 5.00 6.00 6.00 6.00 6.00 7.00
## 50% 51% 52% 53% 54% 55% 56% 57% 58% 59%
## 7.00 7.00 8.00 8.00 8.00 9.00 9.00 10.00 10.00 10.00
## 60% 61% 62% 63% 64% 65% 66% 67% 68% 69%
## 11.00 11.00 12.00 12.00 12.00 12.00 12.00 12.00 12.00 13.00
## 70% 71% 72% 73% 74% 75% 76% 77% 78% 79%
## 13.00 14.00 15.00 15.00 16.00 17.00 18.00 19.00 20.00 21.00
## 80% 81% 82% 83% 84% 85% 86% 87% 88% 89%
## 22.00 23.00 24.00 25.00 26.00 27.00 29.00 31.00 33.00 35.00
## 90% 91% 92% 93% 94% 95% 96% 97% 98% 99%
## 37.00 40.00 44.00 47.57 51.00 57.00 65.00 75.00 91.00 116.51
## 100%
## 358.00
creditCardData$PURCHASES_TRX = ifelse(creditCardData$PURCHASES_TRX > 116.51, 116.51, creditCardData$PURCHASES_TRX)
boxplot(creditCardData$PURCHASES_TRX)
box12 <- boxplot(creditCardData$CREDIT_LIMIT)
box12$stats
## [,1]
## [1,] 50
## [2,] 1600
## [3,] 3000
## [4,] 6500
## [5,] 13600
quantile(creditCardData$CREDIT_LIMIT, seq(0, 1, 0.01))
## 0% 1% 2% 3% 4% 5% 6% 7% 8%
## 50.0 500.0 700.0 1000.0 1000.0 1000.0 1000.0 1000.0 1000.0
## 9% 10% 11% 12% 13% 14% 15% 16% 17%
## 1000.0 1200.0 1200.0 1200.0 1200.0 1200.0 1200.0 1200.0 1500.0
## 18% 19% 20% 21% 22% 23% 24% 25% 26%
## 1500.0 1500.0 1500.0 1500.0 1500.0 1500.0 1500.0 1600.0 1700.0
## 27% 28% 29% 30% 31% 32% 33% 34% 35%
## 1800.0 1800.0 1910.5 2000.0 2000.0 2000.0 2000.0 2250.0 2500.0
## 36% 37% 38% 39% 40% 41% 42% 43% 44%
## 2500.0 2500.0 2500.0 2500.0 2500.0 2500.0 2722.9 3000.0 3000.0
## 45% 46% 47% 48% 49% 50% 51% 52% 53%
## 3000.0 3000.0 3000.0 3000.0 3000.0 3000.0 3000.0 3500.0 3500.0
## 54% 55% 56% 57% 58% 59% 60% 61% 62%
## 3600.0 4000.0 4000.0 4000.0 4000.0 4000.0 4200.0 4500.0 4500.0
## 63% 64% 65% 66% 67% 68% 69% 70% 71%
## 4500.0 5000.0 5000.0 5000.0 5000.0 5500.0 5500.0 6000.0 6000.0
## 72% 73% 74% 75% 76% 77% 78% 79% 80%
## 6000.0 6000.0 6000.0 6500.0 6500.0 6500.0 7000.0 7000.0 7000.0
## 81% 82% 83% 84% 85% 86% 87% 88% 89%
## 7500.0 7500.0 7500.0 8000.0 8000.0 8500.0 8500.0 9000.0 9000.0
## 90% 91% 92% 93% 94% 95% 96% 97% 98%
## 9500.0 10000.0 10000.0 10500.0 11000.0 12000.0 12500.0 13500.0 15000.0
## 99% 100%
## 17000.0 30000.0
creditCardData$CREDIT_LIMIT <- ifelse(creditCardData$CREDIT_LIMIT > 17000, 17000, creditCardData$CREDIT_LIMIT)
boxplot(creditCardData$CREDIT_LIMIT)
box13 <- boxplot(creditCardData$PAYMENTS)
box13$stats
## [,1]
## [1,] 0.0000
## [2,] 383.2739
## [3,] 856.9015
## [4,] 1901.2793
## [5,] 4177.3248
quantile(creditCardData$PAYMENTS, seq(0, 1, 0.01))
## 0% 1% 2% 3% 4% 5%
## 0.00000 0.00000 0.00000 28.37037 65.42608 89.98892
## 6% 7% 8% 9% 10% 11%
## 111.57822 129.99246 149.73493 164.59309 179.61707 192.86103
## 12% 13% 14% 15% 16% 17%
## 205.79355 219.41584 234.64663 250.99708 265.35043 278.25977
## 18% 19% 20% 21% 22% 23%
## 290.11933 301.77971 313.14103 326.66086 340.67787 353.32117
## 24% 25% 26% 27% 28% 29%
## 367.52652 383.27617 398.92790 414.86863 429.80406 445.64342
## 30% 31% 32% 33% 34% 35%
## 459.43829 476.22344 495.62227 510.07714 527.08293 542.52790
## 36% 37% 38% 39% 40% 41%
## 557.85438 573.87988 588.87340 606.22629 624.26820 647.14105
## 42% 43% 44% 45% 46% 47%
## 667.35812 686.93747 706.35388 725.33038 750.53966 774.50228
## 48% 49% 50% 51% 52% 53%
## 801.43110 825.93060 856.90155 880.27188 911.09918 945.45105
## 54% 55% 56% 57% 58% 59%
## 977.17126 1009.60946 1042.09284 1074.91545 1108.41713 1144.79404
## 60% 61% 62% 63% 64% 65%
## 1185.25927 1223.57597 1263.35793 1301.12816 1334.80118 1369.33208
## 66% 67% 68% 69% 70% 71%
## 1403.42117 1450.08786 1493.02346 1550.44711 1604.09211 1654.84212
## 72% 73% 74% 75% 76% 77%
## 1705.43246 1771.32063 1844.18399 1901.13432 1964.88058 2048.25176
## 78% 79% 80% 81% 82% 83%
## 2136.61981 2221.29513 2314.01765 2418.59665 2526.70801 2645.23371
## 84% 85% 86% 87% 88% 89%
## 2794.96094 2945.83913 3110.37818 3304.40757 3517.96080 3726.96906
## 90% 91% 92% 93% 94% 95%
## 3923.90664 4201.80975 4577.01574 4921.93574 5513.52085 6082.09060
## 96% 97% 98% 99% 100%
## 6925.35328 8084.81289 9801.97521 13608.71554 50721.48336
creditCardData$PAYMENTS <- ifelse(creditCardData$PAYMENTS > 13608.71, 13608.71, creditCardData$PAYMENTS)
boxplot(creditCardData$PAYMENTS)
box14 <- boxplot(creditCardData$MINIMUM_PAYMENTS)
box14$stats
## [,1]
## [1,] 0.019163
## [2,] 170.851668
## [3,] 312.343947
## [4,] 788.721609
## [5,] 1712.713459
quantile(creditCardData$MINIMUM_PAYMENTS, seq(0, 1, 0.01))
## 0% 1% 2% 3% 4%
## 0.019163 20.040613 41.456921 55.311014 67.060567
## 5% 6% 7% 8% 9%
## 74.644117 82.488645 87.502150 95.200493 100.792418
## 10% 11% 12% 13% 14%
## 109.131328 115.562817 122.228152 128.162788 132.242811
## 15% 16% 17% 18% 19%
## 136.812654 140.972198 146.025353 149.855005 153.467444
## 20% 21% 22% 23% 24%
## 157.390750 162.137911 164.105812 166.388178 168.884636
## 25% 26% 27% 28% 29%
## 170.857654 173.055263 175.225481 177.147049 178.998434
## 30% 31% 32% 33% 34%
## 181.647737 184.780824 188.063927 190.847662 194.158800
## 35% 36% 37% 38% 39%
## 198.178921 202.843548 207.677942 212.696770 218.808479
## 40% 41% 42% 43% 44%
## 227.692134 234.793544 245.321491 255.207133 263.747395
## 45% 46% 47% 48% 49%
## 274.080603 285.428608 296.042891 309.501766 312.343947
## 50% 51% 52% 53% 54%
## 312.343947 312.343947 315.534359 329.146890 343.698868
## 55% 56% 57% 58% 59%
## 360.662595 374.443645 390.842218 409.404537 422.555350
## 60% 61% 62% 63% 64%
## 438.826129 454.987986 472.002637 489.173657 508.632190
## 65% 66% 67% 68% 69%
## 527.373644 548.872602 572.633982 597.043009 619.197464
## 70% 71% 72% 73% 74%
## 642.820796 669.546327 697.414238 725.149292 756.162286
## 75% 76% 77% 78% 79%
## 788.713501 830.906616 872.112364 915.688453 952.003611
## 80% 81% 82% 83% 84%
## 994.385464 1039.735325 1093.223398 1150.659627 1210.881642
## 85% 86% 87% 88% 89%
## 1280.879553 1339.241966 1429.947802 1507.031600 1615.390262
## 90% 91% 92% 93% 94%
## 1731.689977 1843.344040 1988.427640 2172.527924 2440.151272
## 95% 96% 97% 98% 99%
## 2719.566935 3068.337274 3658.647023 4800.983569 8626.691541
## 100%
## 76406.207520
creditCardData$MINIMUM_PAYMENTS <- ifelse(creditCardData$MINIMUM_PAYMENTS > 8627, 8627, creditCardData$MINIMUM_PAYMENTS)
boxplot(creditCardData$MINIMUM_PAYMENTS)
box15 <- boxplot(creditCardData$PRC_FULL_PAYMENT)
box15$stats
## [,1]
## [1,] 0.000000
## [2,] 0.000000
## [3,] 0.000000
## [4,] 0.142857
## [5,] 0.333333
quantile(creditCardData$PRC_FULL_PAYMENT, seq(0, 1, 0.01))
## 0% 1% 2% 3% 4% 5% 6%
## 0.0000000 0.0000000 0.0000000 0.0000000 0.0000000 0.0000000 0.0000000
## 7% 8% 9% 10% 11% 12% 13%
## 0.0000000 0.0000000 0.0000000 0.0000000 0.0000000 0.0000000 0.0000000
## 14% 15% 16% 17% 18% 19% 20%
## 0.0000000 0.0000000 0.0000000 0.0000000 0.0000000 0.0000000 0.0000000
## 21% 22% 23% 24% 25% 26% 27%
## 0.0000000 0.0000000 0.0000000 0.0000000 0.0000000 0.0000000 0.0000000
## 28% 29% 30% 31% 32% 33% 34%
## 0.0000000 0.0000000 0.0000000 0.0000000 0.0000000 0.0000000 0.0000000
## 35% 36% 37% 38% 39% 40% 41%
## 0.0000000 0.0000000 0.0000000 0.0000000 0.0000000 0.0000000 0.0000000
## 42% 43% 44% 45% 46% 47% 48%
## 0.0000000 0.0000000 0.0000000 0.0000000 0.0000000 0.0000000 0.0000000
## 49% 50% 51% 52% 53% 54% 55%
## 0.0000000 0.0000000 0.0000000 0.0000000 0.0000000 0.0000000 0.0000000
## 56% 57% 58% 59% 60% 61% 62%
## 0.0000000 0.0000000 0.0000000 0.0000000 0.0000000 0.0000000 0.0000000
## 63% 64% 65% 66% 67% 68% 69%
## 0.0000000 0.0000000 0.0000000 0.0833330 0.0833330 0.0833330 0.0833330
## 70% 71% 72% 73% 74% 75% 76%
## 0.0833330 0.0909090 0.0909090 0.1000000 0.1111110 0.1428570 0.1666670
## 77% 78% 79% 80% 81% 82% 83%
## 0.1666670 0.1818180 0.2222220 0.2500000 0.2727270 0.3000000 0.3333330
## 84% 85% 86% 87% 88% 89% 90%
## 0.3750000 0.4244046 0.5000000 0.5000000 0.5714290 0.6363640 0.6700003
## 91% 92% 93% 94% 95% 96% 97%
## 0.7500000 0.8181820 0.8750000 0.9166670 1.0000000 1.0000000 1.0000000
## 98% 99% 100%
## 1.0000000 1.0000000 1.0000000
box16 <- boxplot(creditCardData$TENURE)
Having handled the outliers in all the variables, let’s now scale the variables.
creditCardData_scaled <- scale(creditCardData, center = TRUE, scale = TRUE)
head(creditCardData_scaled)
## BALANCE BALANCE_FREQUENCY PURCHASES ONEOFF_PURCHASES
## [1,] -0.78204173 -0.2494205 -0.59477303 -0.5349217
## [2,] 0.88877311 0.1343172 -0.66547774 -0.5349217
## [3,] 0.51497162 0.5180549 -0.09245089 0.2845047
## [4,] 0.07713999 -1.0168960 0.44549041 1.0537589
## [5,] -0.37151372 0.5180549 -0.65361951 -0.5179645
## [6,] 0.15279580 0.5180549 0.32266877 -0.5349217
## INSTALLMENTS_PURCHASES CASH_ADVANCE PURCHASES_FREQUENCY
## [1,] -0.4274896 -0.5535258 -0.8064453
## [2,] -0.5685578 3.3983093 -1.2216898
## [3,] -0.5685578 -0.5535258 1.2697723
## [4,] -0.5685578 -0.4273040 -1.0140688
## [5,] -0.5685578 -0.5535258 -1.0140688
## [6,] 1.4029655 -0.5535258 0.4392858
## ONEOFF_PURCHASES_FREQUENCY PURCHASES_INSTALLMENTS_FREQUENCY
## [1,] -0.6786229 -0.7072736
## [2,] -0.6786229 -0.9169440
## [3,] 2.6733017 -0.9169440
## [4,] -0.3992970 -0.9169440
## [5,] -0.3992970 -0.9169440
## [6,] -0.6786229 0.5507533
## CASH_ADVANCE_FREQUENCY CASH_ADVANCE_TRX PURCHASES_TRX CREDIT_LIMIT
## [1,] -0.6853668 -0.5574735 -0.5796820 -0.9779245
## [2,] 0.5931878 0.1666757 -0.6750920 0.7112937
## [3,] -0.6853668 -0.5574735 -0.1026319 0.8520618
## [4,] -0.2591837 -0.3764362 -0.6273870 0.8520618
## [5,] -0.6853668 -0.5574735 -0.6273870 -0.9216172
## [6,] -0.6853668 -0.5574735 -0.2934519 -0.7526954
## PAYMENTS MINIMUM_PAYMENTS PRC_FULL_PAYMENT TENURE
## [1,] -0.6429091 -0.49303098 -0.5255216 0.3606594
## [2,] 1.0896863 0.26363463 0.2342138 0.3606594
## [3,] -0.4562632 -0.09737218 -0.5255216 0.3606594
## [4,] -0.7325325 -0.35283651 -0.5255216 0.3606594
## [5,] -0.4312738 -0.40763191 -0.5255216 0.3606594
## [6,] -0.1107457 1.34644378 -0.5255216 0.3606594
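As a quick sanity check of what scale() does, here is a minimal sketch on a small made-up data frame. The `toy` data frame and its column names are assumptions for illustration only and are not part of the credit card dataset. After scaling, every column should have mean 0 and standard deviation 1.

```r
# Toy data standing in for the real variables (hypothetical values)
toy <- data.frame(
  balance   = c(40.9, 3202.5, 2495.1, 1666.7, 817.7),
  purchases = c(95.4, 0, 773.2, 1499.0, 16.0)
)

toy_scaled <- scale(toy, center = TRUE, scale = TRUE)

# scale() computes (x - mean(x)) / sd(x) column by column
manual <- (toy$balance - mean(toy$balance)) / sd(toy$balance)

col_means <- colMeans(toy_scaled)      # approximately 0 for every column
col_sds   <- apply(toy_scaled, 2, sd)  # 1 for every column, up to rounding
print(round(col_means, 10))
print(col_sds)
```

The same two checks, colMeans() and apply(..., 2, sd), can be run on creditCardData_scaled to confirm the scaling worked as expected.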
Now let’s do the cluster analysis using K-means clustering. First, we will find the optimal number of clusters by the Calinski criterion using the following code.
library(vegan)
## Loading required package: permute
## Loading required package: lattice
## This is vegan 2.5-2
fit <- cascadeKM(creditCardData_scaled, 1, 10, iter = 1000)
## Warning: Quick-TRANSfer stage steps exceeded maximum (= 447500)
## Warning: Quick-TRANSfer stage steps exceeded maximum (= 447500)
plot(fit, sortg = TRUE, grpmts.plot = TRUE)
optimalNoOfClusters = as.numeric(which.max(fit$results[2,]))
cat("Optimal number of clusters by Calinski criterion is ", optimalNoOfClusters, "\n")
## Optimal number of clusters by Calinski criterion is 3
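To see what the Calinski criterion measures, here is a minimal sketch that computes the Calinski-Harabasz index by hand from base R kmeans() output. It uses synthetic two-cluster data rather than the credit card data, and the helper name `calinski` is hypothetical. The index is the between-cluster sum of squares per degree of freedom divided by the within-cluster sum of squares per degree of freedom, so higher values mean tighter, better separated clusters.

```r
set.seed(42)  # fixed seed so the toy example is repeatable

# Two well-separated blobs in 2D (synthetic data, not the credit card data)
x <- rbind(
  matrix(rnorm(100, mean = 0), ncol = 2),
  matrix(rnorm(100, mean = 6), ncol = 2)
)
n <- nrow(x)

# Calinski-Harabasz index: (BSS / (k - 1)) / (WSS / (n - k))
calinski <- function(fit, n) {
  k <- length(fit$size)
  (fit$betweenss / (k - 1)) / (fit$tot.withinss / (n - k))
}

ks <- 2:6
ch <- sapply(ks, function(k) calinski(kmeans(x, centers = k, nstart = 10), n))
names(ch) <- ks
print(round(ch, 1))

# The k with the highest index is the suggested number of clusters
best_k <- ks[which.max(ch)]
cat("Highest Calinski value at k =", best_k, "\n")
```

The index is undefined for a single cluster (division by k - 1 = 0), which is why the sketch starts at k = 2.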
From the plot and the statement, we can see that the optimal number of clusters is 3. Let’s now plot an elbow chart to cross-check this.
# Calculate the within-groups sum of squares (WSS) for 1 to 15 clusters.
# The scaled data is used here because the final clustering is run on it.
wss <- (nrow(creditCardData_scaled)-1)*sum(apply(creditCardData_scaled, 2, var))
for (i in 2:15) wss[i] <- sum(kmeans(creditCardData_scaled, centers = i)$withinss)
plot(1:15, wss, type = "b", xlab = "Number of clusters", ylab = "Within groups sum of squares", col = "red", pch = 2)
From the chart, we see that the curve changes significantly between 2 and 4 clusters. Also, the Calinski criterion says that the optimal number of clusters is 3. So, let’s stick with 3.
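The elbow calculation can be sketched on synthetic data as follows. This is an illustration under assumptions, not the exact code used for the chart: the three-blob matrix `x` is made up, and `nstart = 10` and `set.seed()` are additions that make the curve stable across runs. It also shows why the first WSS value is computed as (n - 1) times the sum of the column variances: with a single cluster, the within-groups sum of squares is simply the total sum of squares.

```r
set.seed(123)  # k-means starts from random centres, so fix the seed

# Synthetic, scaled stand-in for creditCardData_scaled (three blobs)
x <- scale(rbind(
  matrix(rnorm(200, mean = 0), ncol = 2),
  matrix(rnorm(200, mean = 5), ncol = 2),
  matrix(rnorm(200, mean = 10), ncol = 2)
))

max_k <- 8
wss <- numeric(max_k)

# With one cluster, WSS is the total sum of squares around the column means
wss[1] <- (nrow(x) - 1) * sum(apply(x, 2, var))

# nstart = 10 reruns k-means from several random starts and keeps the best fit
for (k in 2:max_k) {
  wss[k] <- kmeans(x, centers = k, nstart = 10)$tot.withinss
}

plot(1:max_k, wss, type = "b",
     xlab = "Number of clusters", ylab = "Within groups sum of squares")
```

The "elbow" is the point after which adding another cluster reduces WSS only slightly.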
k <- kmeans(creditCardData_scaled, 3)
k
## K-means clustering with 3 clusters of sizes 5892, 1672, 1386
##
## Cluster means:
## BALANCE BALANCE_FREQUENCY PURCHASES ONEOFF_PURCHASES
## 1 -0.3983947 -0.1959587 -0.3212162 -0.2936399
## 2 1.1628069 0.3339297 -0.4133965 -0.3144702
## 3 0.2908576 0.4302005 1.8642171 1.6276481
## INSTALLMENTS_PURCHASES CASH_ADVANCE PURCHASES_FREQUENCY
## 1 -0.2178541 -0.3675839 -0.07727405
## 2 -0.3747862 1.5199256 -0.64452579
## 3 1.3782388 -0.2709316 1.10602152
## ONEOFF_PURCHASES_FREQUENCY PURCHASES_INSTALLMENTS_FREQUENCY
## 1 -0.2476700 -0.0603146
## 2 -0.3308258 -0.5504815
## 3 1.4519568 0.9204753
## CASH_ADVANCE_FREQUENCY CASH_ADVANCE_TRX PURCHASES_TRX CREDIT_LIMIT
## 1 -0.3469807 -0.3504420 -0.2780927 -0.3436813
## 2 1.5063303 1.4574325 -0.4281501 0.5601988
## 3 -0.3421170 -0.2684147 1.6986935 0.7852223
## PAYMENTS MINIMUM_PAYMENTS PRC_FULL_PAYMENT TENURE
## 1 -0.3467525 -0.2301229 0.01017108 -0.0366549
## 2 0.4857288 0.6258468 -0.40848594 -0.1032120
## 3 0.8881150 0.2232815 0.44953860 0.2803328
##
## Clustering vector:
## [1] 1 2 1 1 1 1 3 1 1 1 1 1 3 3 1 2 1 1 1 1 1 3 1 3 2 1 1 1 2 1 3 1 2 1
## [35] 2 1 2 3 2 2 1 1 1 1 3 1 1 2 3 1 2 3 1 1 1 1 2 3 1 2 1 1 2 1 3 1 1 1
## ... (output truncated: cluster assignments for the remaining observations omitted; 8950 observations in total)
##
## Within cluster sum of squares by cluster:
## [1] 50873.85 24147.69 27768.07
## (between_SS / total_SS = 32.4 %)
##
## Available components:
##
## [1] "cluster" "centers" "totss" "withinss"
## [5] "tot.withinss" "betweenss" "size" "iter"
## [9] "ifault"
data <- creditCardData
data$cluster <- k$cluster
head(data)
## BALANCE BALANCE_FREQUENCY PURCHASES ONEOFF_PURCHASES
## 1 40.90075 0.818182 95.40 0.00
## 2 3202.46742 0.909091 0.00 0.00
## 3 2495.14886 1.000000 773.17 773.17
## 4 1666.67054 0.636364 1499.00 1499.00
## 5 817.71434 1.000000 16.00 16.00
## 6 1809.82875 1.000000 1333.28 0.00
## INSTALLMENTS_PURCHASES CASH_ADVANCE PURCHASES_FREQUENCY
## 1 95.40 0.000 0.166667
## 2 0.00 6442.945 0.000000
## 3 0.00 0.000 1.000000
## 4 0.00 205.788 0.083333
## 5 0.00 0.000 0.083333
## 6 1333.28 0.000 0.666667
## ONEOFF_PURCHASES_FREQUENCY PURCHASES_INSTALLMENTS_FREQUENCY
## 1 0.000000 0.083333
## 2 0.000000 0.000000
## 3 1.000000 0.000000
## 4 0.083333 0.000000
## 5 0.083333 0.000000
## 6 0.000000 0.583333
## CASH_ADVANCE_FREQUENCY CASH_ADVANCE_TRX PURCHASES_TRX CREDIT_LIMIT
## 1 0.000000 0 2 1000
## 2 0.250000 4 0 7000
## 3 0.000000 0 12 7500
## 4 0.083333 1 1 7500
## 5 0.000000 0 1 1200
## 6 0.000000 0 8 1800
## PAYMENTS MINIMUM_PAYMENTS PRC_FULL_PAYMENT TENURE cluster
## 1 201.8021 139.5098 0.000000 12 1
## 2 4103.0326 1072.3402 0.222222 12 2
## 3 622.0667 627.2848 0.000000 12 1
## 4 0.0000 312.3439 0.000000 12 1
## 5 678.3348 244.7912 0.000000 12 1
## 6 1400.0578 2407.2460 0.000000 12 1
From the above code, we got a summary of the clusters and their important values. Let’s now compare individual variables across the clusters.
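One way to profile the clusters on the original (unscaled) scale is to average every variable within each cluster. The sketch below uses a tiny made-up data frame named `toy` in place of `data`; the values and cluster labels are hypothetical.

```r
# Hypothetical stand-in for `data`: two numeric columns plus a cluster label
toy <- data.frame(
  BALANCE   = c(40, 3200, 2500, 1700, 800, 1800),
  PURCHASES = c(95, 0, 770, 1500, 16, 1330),
  cluster   = c(1, 2, 1, 1, 1, 2)
)

# Mean of every variable within each cluster, on the original scale
profile <- aggregate(. ~ cluster, data = toy, FUN = mean)
print(profile)

# Number of customers per cluster
sizes <- table(toy$cluster)
print(sizes)
```

On the real data, the equivalent call would be aggregate(. ~ cluster, data = data, FUN = mean), which gives a table that is easier to read for marketing purposes than the scaled cluster means.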
boxplot(data$BALANCE~data$cluster)
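Going by the cluster means printed above, cluster 2 has people whose balance is on the higher side, cluster 1 has people whose balance is on the lower side, and cluster 3 has people with balance in the middle range. Keep in mind that k-means cluster numbers are arbitrary and can change from run to run unless you fix the random seed with set.seed(), so always check the labels against your own output.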
boxplot(data$PURCHASES~data$cluster)
By the same cluster means, cluster 3 has purchase amounts in the higher range, while clusters 1 and 2 are both on the lower side, with cluster 2 slightly lower than cluster 1.
boxplot(data$PURCHASES_FREQUENCY~data$cluster)
In the same way, you can check the distribution of all variables on the basis of the cluster they are assigned to.
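Rather than writing one boxplot() call per variable, the check can be looped over all columns. This is a minimal sketch using a hypothetical `toy` data frame in place of `data`; on the real data you would replace `toy` with `data`.

```r
# Hypothetical stand-in for `data`
toy <- data.frame(
  BALANCE   = c(40, 3200, 2500, 1700, 800, 1800),
  PURCHASES = c(95, 0, 770, 1500, 16, 1330),
  cluster   = c(1, 2, 1, 1, 1, 2)
)

# Every column except the cluster label
vars <- setdiff(names(toy), "cluster")

# One boxplot per variable, split by cluster; boxplot() also returns
# the summary statistics it drew, which are kept here for inspection
stats_by_var <- lapply(vars, function(v) {
  boxplot(toy[[v]] ~ toy$cluster, main = v,
          xlab = "Cluster", ylab = v)
})
names(stats_by_var) <- vars
```

Each element of `stats_by_var` holds the five-number summary per cluster in its `stats` component, which is handy when the plots need to be described in words.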
So guys, with this I conclude this post. Please stay tuned for more such interesting case studies.