Clustering and Its Implementation Using R

Hi MLEnthusiasts! Today, we will work through a clustering case study on a credit card dataset, with the goal of discovering customer segments that can inform a marketing strategy. The dataset is taken from Kaggle and has the following variables:

– CUST_ID

– BALANCE

– BALANCE_FREQUENCY

– PURCHASES

– ONEOFF_PURCHASES

– INSTALLMENTS_PURCHASES

– CASH_ADVANCE

– PURCHASES_FREQUENCY

– ONEOFF_PURCHASES_FREQUENCY

– PURCHASES_INSTALLMENTS_FREQUENCY

– CASH_ADVANCE_FREQUENCY

– CASH_ADVANCE_TRX

– PURCHASES_TRX

– CREDIT_LIMIT

– PAYMENTS

– MINIMUM_PAYMENTS

– PRC_FULL_PAYMENT

– TENURE

Let’s start our analysis by importing the data into R with the read.csv() function and then inspecting its variables with the summary() function. (The View() function can also be used to browse the raw rows in a spreadsheet-style viewer.)

creditCardData <- read.csv("CreditCard.csv")

summary(creditCardData)
##     CUST_ID        BALANCE        BALANCE_FREQUENCY   PURCHASES       
##  C10001 :   1   Min.   :    0.0   Min.   :0.0000    Min.   :    0.00  
##  C10002 :   1   1st Qu.:  128.3   1st Qu.:0.8889    1st Qu.:   39.63  
##  C10003 :   1   Median :  873.4   Median :1.0000    Median :  361.28  
##  C10004 :   1   Mean   : 1564.5   Mean   :0.8773    Mean   : 1003.20  
##  C10005 :   1   3rd Qu.: 2054.1   3rd Qu.:1.0000    3rd Qu.: 1110.13  
##  C10006 :   1   Max.   :19043.1   Max.   :1.0000    Max.   :49039.57  
##  (Other):8944                                                         
##  ONEOFF_PURCHASES  INSTALLMENTS_PURCHASES  CASH_ADVANCE    
##  Min.   :    0.0   Min.   :    0.0        Min.   :    0.0  
##  1st Qu.:    0.0   1st Qu.:    0.0        1st Qu.:    0.0  
##  Median :   38.0   Median :   89.0        Median :    0.0  
##  Mean   :  592.4   Mean   :  411.1        Mean   :  978.9  
##  3rd Qu.:  577.4   3rd Qu.:  468.6        3rd Qu.: 1113.8  
##  Max.   :40761.2   Max.   :22500.0        Max.   :47137.2  
##                                                            
##  PURCHASES_FREQUENCY ONEOFF_PURCHASES_FREQUENCY
##  Min.   :0.00000     Min.   :0.00000           
##  1st Qu.:0.08333     1st Qu.:0.00000           
##  Median :0.50000     Median :0.08333           
##  Mean   :0.49035     Mean   :0.20246           
##  3rd Qu.:0.91667     3rd Qu.:0.30000           
##  Max.   :1.00000     Max.   :1.00000           
##                                                
##  PURCHASES_INSTALLMENTS_FREQUENCY CASH_ADVANCE_FREQUENCY CASH_ADVANCE_TRX 
##  Min.   :0.0000                   Min.   :0.0000         Min.   :  0.000  
##  1st Qu.:0.0000                   1st Qu.:0.0000         1st Qu.:  0.000  
##  Median :0.1667                   Median :0.0000         Median :  0.000  
##  Mean   :0.3644                   Mean   :0.1351         Mean   :  3.249  
##  3rd Qu.:0.7500                   3rd Qu.:0.2222         3rd Qu.:  4.000  
##  Max.   :1.0000                   Max.   :1.5000         Max.   :123.000  
##                                                                           
##  PURCHASES_TRX     CREDIT_LIMIT      PAYMENTS       MINIMUM_PAYMENTS  
##  Min.   :  0.00   Min.   :   50   Min.   :    0.0   Min.   :    0.02  
##  1st Qu.:  1.00   1st Qu.: 1600   1st Qu.:  383.3   1st Qu.:  169.12  
##  Median :  7.00   Median : 3000   Median :  856.9   Median :  312.34  
##  Mean   : 14.71   Mean   : 4494   Mean   : 1733.1   Mean   :  864.21  
##  3rd Qu.: 17.00   3rd Qu.: 6500   3rd Qu.: 1901.1   3rd Qu.:  825.49  
##  Max.   :358.00   Max.   :30000   Max.   :50721.5   Max.   :76406.21  
##                   NA's   :1                         NA's   :313       
##  PRC_FULL_PAYMENT     TENURE     
##  Min.   :0.0000   Min.   : 6.00  
##  1st Qu.:0.0000   1st Qu.:12.00  
##  Median :0.0000   Median :12.00  
##  Mean   :0.1537   Mean   :11.52  
##  3rd Qu.:0.1429   3rd Qu.:12.00  
##  Max.   :1.0000   Max.   :12.00  
## 

As the summary shows, the dataset has missing values, reported as NA's: one in CREDIT_LIMIT and 313 in MINIMUM_PAYMENTS. We need to impute them, and a sensible replacement value depends on how each variable is distributed, so let’s look at the distributions using the hist() function.

hist(creditCardData$CREDIT_LIMIT)

The data is right-skewed. So, here we will replace the missing value with the median of the variable, which is less affected by the long right tail than the mean.

creditCardData$CREDIT_LIMIT[is.na(creditCardData$CREDIT_LIMIT)] <- median(creditCardData$CREDIT_LIMIT, na.rm = TRUE)
summary(creditCardData)
##     CUST_ID        BALANCE        BALANCE_FREQUENCY   PURCHASES       
##  C10001 :   1   Min.   :    0.0   Min.   :0.0000    Min.   :    0.00  
##  C10002 :   1   1st Qu.:  128.3   1st Qu.:0.8889    1st Qu.:   39.63  
##  C10003 :   1   Median :  873.4   Median :1.0000    Median :  361.28  
##  C10004 :   1   Mean   : 1564.5   Mean   :0.8773    Mean   : 1003.20  
##  C10005 :   1   3rd Qu.: 2054.1   3rd Qu.:1.0000    3rd Qu.: 1110.13  
##  C10006 :   1   Max.   :19043.1   Max.   :1.0000    Max.   :49039.57  
##  (Other):8944                                                         
##  ONEOFF_PURCHASES  INSTALLMENTS_PURCHASES  CASH_ADVANCE    
##  Min.   :    0.0   Min.   :    0.0        Min.   :    0.0  
##  1st Qu.:    0.0   1st Qu.:    0.0        1st Qu.:    0.0  
##  Median :   38.0   Median :   89.0        Median :    0.0  
##  Mean   :  592.4   Mean   :  411.1        Mean   :  978.9  
##  3rd Qu.:  577.4   3rd Qu.:  468.6        3rd Qu.: 1113.8  
##  Max.   :40761.2   Max.   :22500.0        Max.   :47137.2  
##                                                            
##  PURCHASES_FREQUENCY ONEOFF_PURCHASES_FREQUENCY
##  Min.   :0.00000     Min.   :0.00000           
##  1st Qu.:0.08333     1st Qu.:0.00000           
##  Median :0.50000     Median :0.08333           
##  Mean   :0.49035     Mean   :0.20246           
##  3rd Qu.:0.91667     3rd Qu.:0.30000           
##  Max.   :1.00000     Max.   :1.00000           
##                                                
##  PURCHASES_INSTALLMENTS_FREQUENCY CASH_ADVANCE_FREQUENCY CASH_ADVANCE_TRX 
##  Min.   :0.0000                   Min.   :0.0000         Min.   :  0.000  
##  1st Qu.:0.0000                   1st Qu.:0.0000         1st Qu.:  0.000  
##  Median :0.1667                   Median :0.0000         Median :  0.000  
##  Mean   :0.3644                   Mean   :0.1351         Mean   :  3.249  
##  3rd Qu.:0.7500                   3rd Qu.:0.2222         3rd Qu.:  4.000  
##  Max.   :1.0000                   Max.   :1.5000         Max.   :123.000  
##                                                                           
##  PURCHASES_TRX     CREDIT_LIMIT      PAYMENTS       MINIMUM_PAYMENTS  
##  Min.   :  0.00   Min.   :   50   Min.   :    0.0   Min.   :    0.02  
##  1st Qu.:  1.00   1st Qu.: 1600   1st Qu.:  383.3   1st Qu.:  169.12  
##  Median :  7.00   Median : 3000   Median :  856.9   Median :  312.34  
##  Mean   : 14.71   Mean   : 4494   Mean   : 1733.1   Mean   :  864.21  
##  3rd Qu.: 17.00   3rd Qu.: 6500   3rd Qu.: 1901.1   3rd Qu.:  825.49  
##  Max.   :358.00   Max.   :30000   Max.   :50721.5   Max.   :76406.21  
##                                                     NA's   :313       
##  PRC_FULL_PAYMENT     TENURE     
##  Min.   :0.0000   Min.   : 6.00  
##  1st Qu.:0.0000   1st Qu.:12.00  
##  Median :0.0000   Median :12.00  
##  Mean   :0.1537   Mean   :11.52  
##  3rd Qu.:0.1429   3rd Qu.:12.00  
##  Max.   :1.0000   Max.   :12.00  
## 
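To illustrate why the median is the safer fill value for right-skewed data, here is a minimal sketch on a made-up vector (the numbers are illustrative and not taken from the credit card data). One large value drags the mean well above most observations, while the median stays with the bulk of the data.

```r
# Toy right-skewed sample: most values are small, one is very large.
# These numbers are illustrative, not taken from the credit card data.
x <- c(500, 1000, 1500, 3000, 3000, 6500, NA, 30000)

mean_x   <- mean(x, na.rm = TRUE)    # pulled upward by the 30000 value
median_x <- median(x, na.rm = TRUE)  # stays near the bulk of the data

# Fill the missing entry with the median, the same idea used for CREDIT_LIMIT
x_imputed <- x
x_imputed[is.na(x_imputed)] <- median_x

print(c(mean = mean_x, median = median_x))
print(x_imputed)
```

Here the mean is more than twice the median, so a mean-imputed value would sit above six of the seven observed values.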

Now, let’s do the same for MINIMUM_PAYMENTS.

hist(creditCardData$MINIMUM_PAYMENTS)

The data is right-skewed here too, with a very high skewness coefficient. The summary makes this visible: the median is about 312, while the maximum is about 76,406, so the large values in the tail behave as outliers. Let’s replace all the missing values with the median in this case too.

creditCardData$MINIMUM_PAYMENTS[is.na(creditCardData$MINIMUM_PAYMENTS)] <- median(creditCardData$MINIMUM_PAYMENTS, na.rm = TRUE)
summary(creditCardData)
##     CUST_ID        BALANCE        BALANCE_FREQUENCY   PURCHASES       
##  C10001 :   1   Min.   :    0.0   Min.   :0.0000    Min.   :    0.00  
##  C10002 :   1   1st Qu.:  128.3   1st Qu.:0.8889    1st Qu.:   39.63  
##  C10003 :   1   Median :  873.4   Median :1.0000    Median :  361.28  
##  C10004 :   1   Mean   : 1564.5   Mean   :0.8773    Mean   : 1003.20  
##  C10005 :   1   3rd Qu.: 2054.1   3rd Qu.:1.0000    3rd Qu.: 1110.13  
##  C10006 :   1   Max.   :19043.1   Max.   :1.0000    Max.   :49039.57  
##  (Other):8944                                                         
##  ONEOFF_PURCHASES  INSTALLMENTS_PURCHASES  CASH_ADVANCE    
##  Min.   :    0.0   Min.   :    0.0        Min.   :    0.0  
##  1st Qu.:    0.0   1st Qu.:    0.0        1st Qu.:    0.0  
##  Median :   38.0   Median :   89.0        Median :    0.0  
##  Mean   :  592.4   Mean   :  411.1        Mean   :  978.9  
##  3rd Qu.:  577.4   3rd Qu.:  468.6        3rd Qu.: 1113.8  
##  Max.   :40761.2   Max.   :22500.0        Max.   :47137.2  
##                                                            
##  PURCHASES_FREQUENCY ONEOFF_PURCHASES_FREQUENCY
##  Min.   :0.00000     Min.   :0.00000           
##  1st Qu.:0.08333     1st Qu.:0.00000           
##  Median :0.50000     Median :0.08333           
##  Mean   :0.49035     Mean   :0.20246           
##  3rd Qu.:0.91667     3rd Qu.:0.30000           
##  Max.   :1.00000     Max.   :1.00000           
##                                                
##  PURCHASES_INSTALLMENTS_FREQUENCY CASH_ADVANCE_FREQUENCY CASH_ADVANCE_TRX 
##  Min.   :0.0000                   Min.   :0.0000         Min.   :  0.000  
##  1st Qu.:0.0000                   1st Qu.:0.0000         1st Qu.:  0.000  
##  Median :0.1667                   Median :0.0000         Median :  0.000  
##  Mean   :0.3644                   Mean   :0.1351         Mean   :  3.249  
##  3rd Qu.:0.7500                   3rd Qu.:0.2222         3rd Qu.:  4.000  
##  Max.   :1.0000                   Max.   :1.5000         Max.   :123.000  
##                                                                           
##  PURCHASES_TRX     CREDIT_LIMIT      PAYMENTS       MINIMUM_PAYMENTS  
##  Min.   :  0.00   Min.   :   50   Min.   :    0.0   Min.   :    0.02  
##  1st Qu.:  1.00   1st Qu.: 1600   1st Qu.:  383.3   1st Qu.:  170.86  
##  Median :  7.00   Median : 3000   Median :  856.9   Median :  312.34  
##  Mean   : 14.71   Mean   : 4494   Mean   : 1733.1   Mean   :  844.91  
##  3rd Qu.: 17.00   3rd Qu.: 6500   3rd Qu.: 1901.1   3rd Qu.:  788.71  
##  Max.   :358.00   Max.   :30000   Max.   :50721.5   Max.   :76406.21  
##                                                                       
##  PRC_FULL_PAYMENT     TENURE     
##  Min.   :0.0000   Min.   : 6.00  
##  1st Qu.:0.0000   1st Qu.:12.00  
##  Median :0.0000   Median :12.00  
##  Mean   :0.1537   Mean   :11.52  
##  3rd Qu.:0.1429   3rd Qu.:12.00  
##  Max.   :1.0000   Max.   :12.00  
## 
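The two imputation steps above follow the same pattern, so they could be folded into one helper. The sketch below is only an illustration: impute_median() is a hypothetical name, and the toy data frame stands in for the real dataset. It fills every numeric column that has NAs with that column's median and uses colSums(is.na()) to confirm that nothing is left.

```r
# Replace NAs in every numeric column of a data frame with that column's median.
# impute_median() is a hypothetical helper, not part of the original analysis.
impute_median <- function(df) {
  for (col in names(df)) {
    if (is.numeric(df[[col]]) && anyNA(df[[col]])) {
      df[[col]][is.na(df[[col]])] <- median(df[[col]], na.rm = TRUE)
    }
  }
  df
}

# Made-up data with the same column names as the case study
toy <- data.frame(
  CUST_ID          = c("A", "B", "C", "D"),
  CREDIT_LIMIT     = c(1000, NA, 3000, 7000),
  MINIMUM_PAYMENTS = c(NA, 150, 300, NA)
)

print(colSums(is.na(toy)))         # NA count per column before imputation
toy_filled <- impute_median(toy)
print(colSums(is.na(toy_filled)))  # NA count per column after imputation
```

A blanket median fill is only reasonable when every affected column is skewed in the same way, which is why the text inspects each histogram first.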

The missing value imputation is now complete. Next, note that CUST_ID is just a unique identifier for each customer and carries no information that could help group customers together. So, let’s remove it.

creditCardData <- creditCardData[, -c(1)]
head(creditCardData)
##      BALANCE BALANCE_FREQUENCY PURCHASES ONEOFF_PURCHASES
## 1   40.90075          0.818182     95.40             0.00
## 2 3202.46742          0.909091      0.00             0.00
## 3 2495.14886          1.000000    773.17           773.17
## 4 1666.67054          0.636364   1499.00          1499.00
## 5  817.71434          1.000000     16.00            16.00
## 6 1809.82875          1.000000   1333.28             0.00
##   INSTALLMENTS_PURCHASES CASH_ADVANCE PURCHASES_FREQUENCY
## 1                  95.40        0.000            0.166667
## 2                   0.00     6442.945            0.000000
## 3                   0.00        0.000            1.000000
## 4                   0.00      205.788            0.083333
## 5                   0.00        0.000            0.083333
## 6                1333.28        0.000            0.666667
##   ONEOFF_PURCHASES_FREQUENCY PURCHASES_INSTALLMENTS_FREQUENCY
## 1                   0.000000                         0.083333
## 2                   0.000000                         0.000000
## 3                   1.000000                         0.000000
## 4                   0.083333                         0.000000
## 5                   0.083333                         0.000000
## 6                   0.000000                         0.583333
##   CASH_ADVANCE_FREQUENCY CASH_ADVANCE_TRX PURCHASES_TRX CREDIT_LIMIT
## 1               0.000000                0             2         1000
## 2               0.250000                4             0         7000
## 3               0.000000                0            12         7500
## 4               0.083333                1             1         7500
## 5               0.000000                0             1         1200
## 6               0.000000                0             8         1800
##    PAYMENTS MINIMUM_PAYMENTS PRC_FULL_PAYMENT TENURE
## 1  201.8021         139.5098         0.000000     12
## 2 4103.0326        1072.3402         0.222222     12
## 3  622.0667         627.2848         0.000000     12
## 4    0.0000         312.3439         0.000000     12
## 5  678.3348         244.7912         0.000000     12
## 6 1400.0578        2407.2460         0.000000     12
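Dropping the column by position works here because CUST_ID is the first column. If the column order might change, a slightly more defensive variant is to drop it by name. The following is a small sketch on a made-up data frame, not the real dataset.

```r
# Made-up two-row data frame with an identifier column
toy <- data.frame(
  CUST_ID = c("C10001", "C10002"),
  BALANCE = c(40.90, 3202.47),
  TENURE  = c(12, 12)
)

# Drop the identifier by name; the remaining columns keep their order.
# drop = FALSE keeps the result a data frame even if one column remains.
toy_no_id <- toy[, setdiff(names(toy), "CUST_ID"), drop = FALSE]

print(names(toy_no_id))
```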

Now that we have done this, let’s handle the outliers in this dataset. Rather than dropping rows, I will cap the values that are dramatically higher than the rest, mostly at the 98th or 99th percentile. This has two benefits. First, no observations are lost. Second, the quality of our clustering will improve, because extreme values distort the Euclidean distances on which the clustering is based. So, let’s move ahead.
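To make the effect on Euclidean distances concrete, here is a minimal sketch with three made-up customers and a hypothetical cap of 8000 (neither the values nor the threshold come from the dataset). It compares the pairwise distances from dist() before and after capping the extreme BALANCE.

```r
# Three made-up customers described by two variables.
# Customer C has an extreme BALANCE; A and B differ mainly in PURCHASES.
customers <- data.frame(
  BALANCE   = c(1000, 1200, 19000),
  PURCHASES = c(100, 900, 500),
  row.names = c("A", "B", "C")
)

d_raw <- dist(customers)   # dist() uses Euclidean distance by default

# Cap BALANCE at a hypothetical threshold of 8000 and recompute
capped <- customers
capped$BALANCE <- pmin(capped$BALANCE, 8000)
d_capped <- dist(capped)

print(d_raw)
print(d_capped)
```

The distance between A and B is unchanged, while the distances to C shrink considerably, so C no longer dominates the distance structure on its own.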

box1 = boxplot(creditCardData$BALANCE)


As can be seen, there are a lot of outliers in this variable.

quantile(creditCardData$BALANCE, seq(0, 1, 0.02))
##           0%           2%           4%           6%           8% 
##     0.000000     2.140114     6.791763    11.735999    17.201403 
##          10%          12%          14%          16%          18% 
##    23.575529    31.452780    40.225116    49.857097    62.077146 
##          20%          22%          24%          26%          28% 
##    77.238026    94.013827   116.139987   141.418853   169.842053 
##          30%          32%          34%          36%          38% 
##   207.176552   245.192376   293.739058   348.041747   407.315239 
##          40%          42%          44%          46%          48% 
##   467.021989   535.938751   624.680672   709.474515   801.696530 
##          50%          52%          54%          56%          58% 
##   873.385231   946.573749  1016.507930  1080.748668  1137.698757 
##          60%          62%          64%          66%          68% 
##  1207.815587  1299.268313  1389.589016  1478.072900  1585.909487 
##          70%          72%          74%          76%          78% 
##  1698.588855  1827.325615  1967.656363  2142.242732  2362.829739 
##          80%          82%          84%          86%          88% 
##  2571.434263  2773.562692  3019.835517  3363.675352  3804.151958 
##          90%          92%          94%          96%          98% 
##  4338.563657  4881.429707  5544.159539  6460.903714  7969.618588 
##         100% 
## 19043.138560
box1$stats
##           [,1]
## [1,]    0.0000
## [2,]  128.2540
## [3,]  873.3852
## [4,] 2054.3728
## [5,] 4940.1139
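The five rows of the stats output are, in order, the lower whisker, the lower hinge, the median, the upper hinge and the upper whisker. Anything beyond the whiskers is drawn as an individual outlier point. As a small sketch on made-up numbers, boxplot.stats() returns the same five values without drawing a plot, along with the points that fall outside the whiskers.

```r
# Made-up sample with one extreme value
x <- c(1, 2, 3, 4, 5, 6, 7, 8, 9, 100)

s <- boxplot.stats(x)   # same computation boxplot() uses, without drawing

print(s$stats)  # lower whisker, lower hinge, median, upper hinge, upper whisker
print(s$out)    # points lying beyond the whiskers
```

With the default settings, the whiskers extend to the most extreme data point within 1.5 times the hinge spread from the box, which is why the upper whisker for BALANCE (about 4,940) sits far below the maximum (about 19,043).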

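There is a sharp jump above the 98th percentile: the value goes from about 7,970 at 98% to about 19,043 at 100%. So, let’s cap BALANCE at the 98th percentile value.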

creditCardData$BALANCE = ifelse(creditCardData$BALANCE > 7969, 7969, creditCardData$BALANCE)
boxplot(creditCardData$BALANCE)


The plot looks much better than before. Let’s do the same for the other variables.
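The capping step can also be written as a small reusable helper instead of a hand-typed threshold. The sketch below is an illustration only: cap_upper() is a hypothetical name, and it computes the cut-off from the data with quantile() rather than using the rounded constants in this post, so its results can differ slightly from the ifelse() calls shown here.

```r
# Cap values above the p-th quantile at that quantile (upper tail only).
# cap_upper() is a hypothetical helper, not part of the original analysis.
cap_upper <- function(x, p = 0.98) {
  limit <- quantile(x, probs = p, na.rm = TRUE, names = FALSE)
  pmin(x, limit)  # values at or below the limit are left unchanged
}

x <- c(1:99, 1000)   # made-up data: 99 ordinary values and one extreme value
x_capped <- cap_upper(x, p = 0.98)

print(max(x))         # 1000
print(max(x_capped))  # equals the 98th percentile of x
```

For example, cap_upper(creditCardData$BALANCE, 0.98) would play the same role as the ifelse() call on BALANCE, with the threshold taken directly from the data.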

box2 <- boxplot(creditCardData$PURCHASES)

box2$stats
##         [,1]
## [1,]    0.00
## [2,]   39.58
## [3,]  361.28
## [4,] 1110.17
## [5,] 2711.90
quantile(creditCardData$PURCHASES, seq(0, 1, 0.01))
##         0%         1%         2%         3%         4%         5% 
##     0.0000     0.0000     0.0000     0.0000     0.0000     0.0000 
##         6%         7%         8%         9%        10%        11% 
##     0.0000     0.0000     0.0000     0.0000     0.0000     0.0000 
##        12%        13%        14%        15%        16%        17% 
##     0.0000     0.0000     0.0000     0.0000     0.0000     0.0000 
##        18%        19%        20%        21%        22%        23% 
##     0.0000     0.0000     0.0000     0.0000     0.0000     6.9972 
##        24%        25%        26%        27%        28%        29% 
##    25.6444    39.6350    49.9566    59.6999    68.6952    79.0000 
##        30%        31%        32%        33%        34%        35% 
##    89.2850    99.7538   110.4680   120.8302   134.9360   148.3305 
##        36%        37%        38%        39%        40%        41% 
##   159.0448   172.0533   184.9696   200.0000   212.8500   227.6544 
##        42%        43%        44%        45%        46%        47% 
##   240.3860   254.8661   267.5696   282.3965   298.3566   313.9233 
##        48%        49%        50%        51%        52%        53% 
##   329.4268   345.6633   361.2800   380.0000   398.6400   419.0961 
##        54%        55%        56%        57%        58%        59% 
##   435.8488   453.2340   472.1912   494.9790   516.2268   539.8503 
##        60%        61%        62%        63%        64%        65% 
##   557.5460   584.1632   611.4620   639.1305   673.1180   704.0290 
##        66%        67%        68%        69%        70%        71% 
##   740.3106   779.6405   807.8096   848.2106   894.3160   933.7895 
##        72%        73%        74%        75%        76%        77% 
##   975.4060  1016.3939  1060.1300  1110.1300  1168.8100  1218.2049 
##        78%        79%        80%        81%        82%        83% 
##  1282.8246  1343.5166  1422.4380  1490.7128  1569.9518  1664.6431 
##        84%        85%        86%        87%        88%        89% 
##  1764.1376  1859.1160  1968.0532  2093.4020  2234.1520  2385.8153 
##        90%        91%        92%        93%        94%        95% 
##  2542.6240  2721.8562  2967.4824  3222.5813  3589.2156  3998.6195 
##        96%        97%        98%        99%       100% 
##  4490.7764  5183.4517  6335.7680  8977.2900 49039.5700
creditCardData$PURCHASES = ifelse(creditCardData$PURCHASES > 6336, 6336, creditCardData$PURCHASES)
boxplot(creditCardData$PURCHASES)

box3 <- boxplot(creditCardData$ONEOFF_PURCHASES)

box3$stats
##         [,1]
## [1,]    0.00
## [2,]    0.00
## [3,]   38.00
## [4,]  577.83
## [5,] 1443.33
quantile(creditCardData$ONEOFF_PURCHASES, seq(0, 1, 0.01))
##         0%         1%         2%         3%         4%         5% 
##     0.0000     0.0000     0.0000     0.0000     0.0000     0.0000 
##         6%         7%         8%         9%        10%        11% 
##     0.0000     0.0000     0.0000     0.0000     0.0000     0.0000 
##        12%        13%        14%        15%        16%        17% 
##     0.0000     0.0000     0.0000     0.0000     0.0000     0.0000 
##        18%        19%        20%        21%        22%        23% 
##     0.0000     0.0000     0.0000     0.0000     0.0000     0.0000 
##        24%        25%        26%        27%        28%        29% 
##     0.0000     0.0000     0.0000     0.0000     0.0000     0.0000 
##        30%        31%        32%        33%        34%        35% 
##     0.0000     0.0000     0.0000     0.0000     0.0000     0.0000 
##        36%        37%        38%        39%        40%        41% 
##     0.0000     0.0000     0.0000     0.0000     0.0000     0.0000 
##        42%        43%        44%        45%        46%        47% 
##     0.0000     0.0000     0.0000     0.0000     0.0000     0.0000 
##        48%        49%        50%        51%        52%        53% 
##     0.0000    25.0000    38.0000    45.6500    57.0000    67.8594 
##        54%        55%        56%        57%        58%        59% 
##    79.0000    90.6905   105.0000   117.3137   134.1582   150.2957 
##        60%        61%        62%        63%        64%        65% 
##   168.3780   185.0000   204.7210   221.8101   242.6084   264.3005 
##        66%        67%        68%        69%        70%        71% 
##   290.0000   315.2490   339.8296   366.7169   400.1550   434.5543 
##        72%        73%        74%        75%        76%        77% 
##   463.0848   497.7885   537.3756   577.4050   626.0096   664.0365 
##        78%        79%        80%        81%        82%        83% 
##   712.7218   769.3897   816.9920   869.1290   930.0354   999.6964 
##        84%        85%        86%        87%        88%        89% 
##  1048.4432  1116.4075  1200.0000  1283.5115  1372.9396  1480.7132 
##        90%        91%        92%        93%        94%        95% 
##  1600.0990  1751.6232  1943.3240  2127.5609  2385.1696  2671.0940 
##        96%        97%        98%        99%       100% 
##  3076.2372  3609.9608  4432.5868  6689.8982 40761.2500
creditCardData$ONEOFF_PURCHASES = ifelse(creditCardData$ONEOFF_PURCHASES > 4433, 4433, creditCardData$ONEOFF_PURCHASES)
boxplot(creditCardData$ONEOFF_PURCHASES)

box4 <- boxplot(creditCardData$INSTALLMENTS_PURCHASES)

box4$stats
##         [,1]
## [1,]    0.00
## [2,]    0.00
## [3,]   89.00
## [4,]  468.65
## [5,] 1170.49
quantile(creditCardData$INSTALLMENTS_PURCHASES, seq(0, 1, 0.01))
##         0%         1%         2%         3%         4%         5% 
##     0.0000     0.0000     0.0000     0.0000     0.0000     0.0000 
##         6%         7%         8%         9%        10%        11% 
##     0.0000     0.0000     0.0000     0.0000     0.0000     0.0000 
##        12%        13%        14%        15%        16%        17% 
##     0.0000     0.0000     0.0000     0.0000     0.0000     0.0000 
##        18%        19%        20%        21%        22%        23% 
##     0.0000     0.0000     0.0000     0.0000     0.0000     0.0000 
##        24%        25%        26%        27%        28%        29% 
##     0.0000     0.0000     0.0000     0.0000     0.0000     0.0000 
##        30%        31%        32%        33%        34%        35% 
##     0.0000     0.0000     0.0000     0.0000     0.0000     0.0000 
##        36%        37%        38%        39%        40%        41% 
##     0.0000     0.0000     0.0000     0.0000     0.0000     0.0000 
##        42%        43%        44%        45%        46%        47% 
##     0.0000     0.0000    13.0336    31.9260    46.7940    57.0000 
##        48%        49%        50%        51%        52%        53% 
##    68.7260    78.8218    89.0000    99.2395   110.0348   122.5776 
##        54%        55%        56%        57%        58%        59% 
##   133.7546   145.0095   158.4944   170.9916   183.9000   200.0000 
##        60%        61%        62%        63%        64%        65% 
##   213.9500   228.1746   239.7328   252.0000   267.1224   284.1105 
##        66%        67%        68%        69%        70%        71% 
##   299.8138   315.0166   332.5000   350.0000   371.1390   389.8508 
##        72%        73%        74%        75%        76%        77% 
##   411.3204   427.9954   449.2104   468.6375   494.1984   521.1274 
##        78%        79%        80%        81%        82%        83% 
##   547.3504   576.8342   603.2920   638.8760   679.9836   726.6076 
##        84%        85%        86%        87%        88%        89% 
##   775.7280   823.0615   873.6104   934.8400  1006.7812  1071.2159 
##        90%        91%        92%        93%        94%        95% 
##  1140.0700  1225.5399  1329.3908  1448.2455  1596.8604  1750.0875 
##        96%        97%        98%        99%       100% 
##  1957.0300  2273.2570  2757.3850  3886.2405 22500.0000
creditCardData$INSTALLMENTS_PURCHASES = ifelse(creditCardData$INSTALLMENTS_PURCHASES > 3886.2, 3886.2, creditCardData$INSTALLMENTS_PURCHASES)
boxplot(creditCardData$INSTALLMENTS_PURCHASES)

box5 <- boxplot(creditCardData$CASH_ADVANCE)

box5$stats
##          [,1]
## [1,]    0.000
## [2,]    0.000
## [3,]    0.000
## [4,] 1113.869
## [5,] 2784.295
quantile(creditCardData$CASH_ADVANCE, seq(0, 1, 0.01))
##          0%          1%          2%          3%          4%          5% 
##     0.00000     0.00000     0.00000     0.00000     0.00000     0.00000 
##          6%          7%          8%          9%         10%         11% 
##     0.00000     0.00000     0.00000     0.00000     0.00000     0.00000 
##         12%         13%         14%         15%         16%         17% 
##     0.00000     0.00000     0.00000     0.00000     0.00000     0.00000 
##         18%         19%         20%         21%         22%         23% 
##     0.00000     0.00000     0.00000     0.00000     0.00000     0.00000 
##         24%         25%         26%         27%         28%         29% 
##     0.00000     0.00000     0.00000     0.00000     0.00000     0.00000 
##         30%         31%         32%         33%         34%         35% 
##     0.00000     0.00000     0.00000     0.00000     0.00000     0.00000 
##         36%         37%         38%         39%         40%         41% 
##     0.00000     0.00000     0.00000     0.00000     0.00000     0.00000 
##         42%         43%         44%         45%         46%         47% 
##     0.00000     0.00000     0.00000     0.00000     0.00000     0.00000 
##         48%         49%         50%         51%         52%         53% 
##     0.00000     0.00000     0.00000     0.00000    19.01261    47.21669 
##         54%         55%         56%         57%         58%         59% 
##    73.29373    94.92347   113.69394   146.64741   175.86786   198.16096 
##         60%         61%         62%         63%         64%         65% 
##   238.63372   272.66419   302.20833   357.17746   399.65292   451.94552 
##         66%         67%         68%         69%         70%         71% 
##   490.53359   554.96069   639.73276   722.62786   797.27230   879.50541 
##         72%         73%         74%         75%         76%         77% 
##   929.70080   975.00317  1052.93473  1113.82114  1183.11998  1286.85883 
##         78%         79%         80%         81%         82%         83% 
##  1381.66606  1462.14780  1574.93378  1686.09064  1817.81642  1909.20380 
##         84%         85%         86%         87%         88%         89% 
##  2037.90189  2194.72201  2370.47308  2529.55907  2710.13105  2859.77980 
##         90%         91%         92%         93%         94%         95% 
##  3065.53456  3319.42513  3584.39104  3895.23291  4232.21233  4647.16912 
##         96%         97%         98%         99%        100% 
##  5264.20760  6010.90910  7298.60917  9588.16336 47137.21176
creditCardData$CASH_ADVANCE = ifelse(creditCardData$CASH_ADVANCE > 7299, 7299, creditCardData$CASH_ADVANCE)
boxplot(creditCardData$CASH_ADVANCE)

box6 <- boxplot(creditCardData$PURCHASES_FREQUENCY)

box7 <- boxplot(creditCardData$ONEOFF_PURCHASES_FREQUENCY)

box7$stats
##          [,1]
## [1,] 0.000000
## [2,] 0.000000
## [3,] 0.083333
## [4,] 0.300000
## [5,] 0.750000
quantile(creditCardData$ONEOFF_PURCHASES_FREQUENCY, seq(0, 1, 0.01))
##        0%        1%        2%        3%        4%        5%        6% 
## 0.0000000 0.0000000 0.0000000 0.0000000 0.0000000 0.0000000 0.0000000 
##        7%        8%        9%       10%       11%       12%       13% 
## 0.0000000 0.0000000 0.0000000 0.0000000 0.0000000 0.0000000 0.0000000 
##       14%       15%       16%       17%       18%       19%       20% 
## 0.0000000 0.0000000 0.0000000 0.0000000 0.0000000 0.0000000 0.0000000 
##       21%       22%       23%       24%       25%       26%       27% 
## 0.0000000 0.0000000 0.0000000 0.0000000 0.0000000 0.0000000 0.0000000 
##       28%       29%       30%       31%       32%       33%       34% 
## 0.0000000 0.0000000 0.0000000 0.0000000 0.0000000 0.0000000 0.0000000 
##       35%       36%       37%       38%       39%       40%       41% 
## 0.0000000 0.0000000 0.0000000 0.0000000 0.0000000 0.0000000 0.0000000 
##       42%       43%       44%       45%       46%       47%       48% 
## 0.0000000 0.0000000 0.0000000 0.0000000 0.0000000 0.0000000 0.0000000 
##       49%       50%       51%       52%       53%       54%       55% 
## 0.0833330 0.0833330 0.0833330 0.0833330 0.0833330 0.0833330 0.0833330 
##       56%       57%       58%       59%       60%       61%       62% 
## 0.0833330 0.0833330 0.0833330 0.0833330 0.0833330 0.0909090 0.1250000 
##       63%       64%       65%       66%       67%       68%       69% 
## 0.1666670 0.1666670 0.1666670 0.1666670 0.1666670 0.1666670 0.1666670 
##       70%       71%       72%       73%       74%       75%       76% 
## 0.2222220 0.2500000 0.2500000 0.2500000 0.2500000 0.3000000 0.3333330 
##       77%       78%       79%       80%       81%       82%       83% 
## 0.3333330 0.3333330 0.3333330 0.4166670 0.4166670 0.4166670 0.5000000 
##       84%       85%       86%       87%       88%       89%       90% 
## 0.5000000 0.5295457 0.5833330 0.5833330 0.6666670 0.6666670 0.7500000 
##       91%       92%       93%       94%       95%       96%       97% 
## 0.7500000 0.8333330 0.9166670 0.9166670 1.0000000 1.0000000 1.0000000 
##       98%       99%      100% 
## 1.0000000 1.0000000 1.0000000
box8 <- boxplot(creditCardData$PURCHASES_INSTALLMENTS_FREQUENCY)

box9 <- boxplot(creditCardData$CASH_ADVANCE_FREQUENCY)

box9$stats
##          [,1]
## [1,] 0.000000
## [2,] 0.000000
## [3,] 0.000000
## [4,] 0.222222
## [5,] 0.545455
quantile(creditCardData$CASH_ADVANCE_FREQUENCY, seq(0, 1, 0.01))
##        0%        1%        2%        3%        4%        5%        6% 
## 0.0000000 0.0000000 0.0000000 0.0000000 0.0000000 0.0000000 0.0000000 
##        7%        8%        9%       10%       11%       12%       13% 
## 0.0000000 0.0000000 0.0000000 0.0000000 0.0000000 0.0000000 0.0000000 
##       14%       15%       16%       17%       18%       19%       20% 
## 0.0000000 0.0000000 0.0000000 0.0000000 0.0000000 0.0000000 0.0000000 
##       21%       22%       23%       24%       25%       26%       27% 
## 0.0000000 0.0000000 0.0000000 0.0000000 0.0000000 0.0000000 0.0000000 
##       28%       29%       30%       31%       32%       33%       34% 
## 0.0000000 0.0000000 0.0000000 0.0000000 0.0000000 0.0000000 0.0000000 
##       35%       36%       37%       38%       39%       40%       41% 
## 0.0000000 0.0000000 0.0000000 0.0000000 0.0000000 0.0000000 0.0000000 
##       42%       43%       44%       45%       46%       47%       48% 
## 0.0000000 0.0000000 0.0000000 0.0000000 0.0000000 0.0000000 0.0000000 
##       49%       50%       51%       52%       53%       54%       55% 
## 0.0000000 0.0000000 0.0000000 0.0833330 0.0833330 0.0833330 0.0833330 
##       56%       57%       58%       59%       60%       61%       62% 
## 0.0833330 0.0833330 0.0833330 0.0833330 0.0833330 0.0833330 0.0833330 
##       63%       64%       65%       66%       67%       68%       69% 
## 0.0833330 0.1000000 0.1250000 0.1666670 0.1666670 0.1666670 0.1666670 
##       70%       71%       72%       73%       74%       75%       76% 
## 0.1666670 0.1666670 0.1666670 0.1666670 0.1666670 0.2222220 0.2500000 
##       77%       78%       79%       80%       81%       82%       83% 
## 0.2500000 0.2500000 0.2500000 0.2500000 0.2500000 0.2750647 0.3333330 
##       84%       85%       86%       87%       88%       89%       90% 
## 0.3333330 0.3333330 0.3333330 0.3333330 0.4000000 0.4166670 0.4166670 
##       91%       92%       93%       94%       95%       96%       97% 
## 0.4166670 0.5000000 0.5000000 0.5000000 0.5833330 0.6000000 0.6666670 
##       98%       99%      100% 
## 0.7500000 0.8333330 1.5000000
creditCardData$CASH_ADVANCE_FREQUENCY = ifelse(creditCardData$CASH_ADVANCE_FREQUENCY > 0.833, 0.833, creditCardData$CASH_ADVANCE_FREQUENCY)
boxplot(creditCardData$CASH_ADVANCE_FREQUENCY)
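The quantile output for CASH_ADVANCE_FREQUENCY tops out at 1.5, while the other frequency variables inspected so far stay between 0 and 1. Assuming these columns are meant to be proportions in that range (an assumption, since the variable definitions are not given here), a quick range check like the sketch below can flag suspicious values before capping. The toy data frame is made up.

```r
# Made-up data: one frequency column contains a value above 1
toy <- data.frame(
  PURCHASES_FREQUENCY    = c(0.0, 0.5, 1.0),
  CASH_ADVANCE_FREQUENCY = c(0.0, 0.25, 1.5)
)

# Pick the columns whose names end in "FREQUENCY"
freq_cols <- grep("FREQUENCY$", names(toy), value = TRUE)

# Count values outside [0, 1] in each of those columns
out_of_range <- sapply(toy[freq_cols], function(v) sum(v < 0 | v > 1, na.rm = TRUE))

print(out_of_range)
```

In this post such values are absorbed by the cap at 0.833, so the check is mainly a way to notice possible data-quality issues.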

box10 <- boxplot(creditCardData$CASH_ADVANCE_TRX)

box10$stats
##      [,1]
## [1,]    0
## [2,]    0
## [3,]    0
## [4,]    4
## [5,]   10
## attr(,"class")
##         1 
## "integer"
quantile(creditCardData$CASH_ADVANCE_TRX, seq(0, 1, 0.01))
##   0%   1%   2%   3%   4%   5%   6%   7%   8%   9%  10%  11%  12%  13%  14% 
##    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0 
##  15%  16%  17%  18%  19%  20%  21%  22%  23%  24%  25%  26%  27%  28%  29% 
##    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0 
##  30%  31%  32%  33%  34%  35%  36%  37%  38%  39%  40%  41%  42%  43%  44% 
##    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0 
##  45%  46%  47%  48%  49%  50%  51%  52%  53%  54%  55%  56%  57%  58%  59% 
##    0    0    0    0    0    0    0    1    1    1    1    1    1    1    1 
##  60%  61%  62%  63%  64%  65%  66%  67%  68%  69%  70%  71%  72%  73%  74% 
##    1    1    2    2    2    2    2    2    2    3    3    3    3    3    4 
##  75%  76%  77%  78%  79%  80%  81%  82%  83%  84%  85%  86%  87%  88%  89% 
##    4    4    4    5    5    5    5    6    6    7    7    7    8    8    9 
##  90%  91%  92%  93%  94%  95%  96%  97%  98%  99% 100% 
##   10   10   11   12   13   15   17   19   23   29  123
creditCardData$CASH_ADVANCE_TRX <- ifelse(creditCardData$CASH_ADVANCE_TRX > 29, 29, creditCardData$CASH_ADVANCE_TRX)
boxplot(creditCardData$CASH_ADVANCE_TRX)

box11 <- boxplot(creditCardData$PURCHASES_TRX) 

box11$stats
##      [,1]
## [1,]    0
## [2,]    1
## [3,]    7
## [4,]   17
## [5,]   41
## attr(,"class")
##         1 
## "integer"
quantile(creditCardData$PURCHASES_TRX, seq(0, 1, 0.01))
##     0%     1%     2%     3%     4%     5%     6%     7%     8%     9% 
##   0.00   0.00   0.00   0.00   0.00   0.00   0.00   0.00   0.00   0.00 
##    10%    11%    12%    13%    14%    15%    16%    17%    18%    19% 
##   0.00   0.00   0.00   0.00   0.00   0.00   0.00   0.00   0.00   0.00 
##    20%    21%    22%    23%    24%    25%    26%    27%    28%    29% 
##   0.00   0.00   0.00   1.00   1.00   1.00   1.00   1.00   1.00   1.00 
##    30%    31%    32%    33%    34%    35%    36%    37%    38%    39% 
##   1.00   2.00   2.00   2.00   2.00   3.00   3.00   3.00   3.00   4.00 
##    40%    41%    42%    43%    44%    45%    46%    47%    48%    49% 
##   4.00   4.00   5.00   5.00   5.00   6.00   6.00   6.00   6.00   7.00 
##    50%    51%    52%    53%    54%    55%    56%    57%    58%    59% 
##   7.00   7.00   8.00   8.00   8.00   9.00   9.00  10.00  10.00  10.00 
##    60%    61%    62%    63%    64%    65%    66%    67%    68%    69% 
##  11.00  11.00  12.00  12.00  12.00  12.00  12.00  12.00  12.00  13.00 
##    70%    71%    72%    73%    74%    75%    76%    77%    78%    79% 
##  13.00  14.00  15.00  15.00  16.00  17.00  18.00  19.00  20.00  21.00 
##    80%    81%    82%    83%    84%    85%    86%    87%    88%    89% 
##  22.00  23.00  24.00  25.00  26.00  27.00  29.00  31.00  33.00  35.00 
##    90%    91%    92%    93%    94%    95%    96%    97%    98%    99% 
##  37.00  40.00  44.00  47.57  51.00  57.00  65.00  75.00  91.00 116.51 
##   100% 
## 358.00
creditCardData$PURCHASES_TRX <- ifelse(creditCardData$PURCHASES_TRX > 116.51, 116.51, creditCardData$PURCHASES_TRX)
boxplot(creditCardData$PURCHASES_TRX)

box12 <- boxplot(creditCardData$CREDIT_LIMIT)

box12$stats
##       [,1]
## [1,]    50
## [2,]  1600
## [3,]  3000
## [4,]  6500
## [5,] 13600
quantile(creditCardData$CREDIT_LIMIT, seq(0, 1, 0.01))
##      0%      1%      2%      3%      4%      5%      6%      7%      8% 
##    50.0   500.0   700.0  1000.0  1000.0  1000.0  1000.0  1000.0  1000.0 
##      9%     10%     11%     12%     13%     14%     15%     16%     17% 
##  1000.0  1200.0  1200.0  1200.0  1200.0  1200.0  1200.0  1200.0  1500.0 
##     18%     19%     20%     21%     22%     23%     24%     25%     26% 
##  1500.0  1500.0  1500.0  1500.0  1500.0  1500.0  1500.0  1600.0  1700.0 
##     27%     28%     29%     30%     31%     32%     33%     34%     35% 
##  1800.0  1800.0  1910.5  2000.0  2000.0  2000.0  2000.0  2250.0  2500.0 
##     36%     37%     38%     39%     40%     41%     42%     43%     44% 
##  2500.0  2500.0  2500.0  2500.0  2500.0  2500.0  2722.9  3000.0  3000.0 
##     45%     46%     47%     48%     49%     50%     51%     52%     53% 
##  3000.0  3000.0  3000.0  3000.0  3000.0  3000.0  3000.0  3500.0  3500.0 
##     54%     55%     56%     57%     58%     59%     60%     61%     62% 
##  3600.0  4000.0  4000.0  4000.0  4000.0  4000.0  4200.0  4500.0  4500.0 
##     63%     64%     65%     66%     67%     68%     69%     70%     71% 
##  4500.0  5000.0  5000.0  5000.0  5000.0  5500.0  5500.0  6000.0  6000.0 
##     72%     73%     74%     75%     76%     77%     78%     79%     80% 
##  6000.0  6000.0  6000.0  6500.0  6500.0  6500.0  7000.0  7000.0  7000.0 
##     81%     82%     83%     84%     85%     86%     87%     88%     89% 
##  7500.0  7500.0  7500.0  8000.0  8000.0  8500.0  8500.0  9000.0  9000.0 
##     90%     91%     92%     93%     94%     95%     96%     97%     98% 
##  9500.0 10000.0 10000.0 10500.0 11000.0 12000.0 12500.0 13500.0 15000.0 
##     99%    100% 
## 17000.0 30000.0
creditCardData$CREDIT_LIMIT <- ifelse(creditCardData$CREDIT_LIMIT > 17000, 17000, creditCardData$CREDIT_LIMIT)
boxplot(creditCardData$CREDIT_LIMIT)

box13 <- boxplot(creditCardData$PAYMENTS)

box13$stats
##           [,1]
## [1,]    0.0000
## [2,]  383.2739
## [3,]  856.9015
## [4,] 1901.2793
## [5,] 4177.3248
quantile(creditCardData$PAYMENTS, seq(0, 1, 0.01))
##          0%          1%          2%          3%          4%          5% 
##     0.00000     0.00000     0.00000    28.37037    65.42608    89.98892 
##          6%          7%          8%          9%         10%         11% 
##   111.57822   129.99246   149.73493   164.59309   179.61707   192.86103 
##         12%         13%         14%         15%         16%         17% 
##   205.79355   219.41584   234.64663   250.99708   265.35043   278.25977 
##         18%         19%         20%         21%         22%         23% 
##   290.11933   301.77971   313.14103   326.66086   340.67787   353.32117 
##         24%         25%         26%         27%         28%         29% 
##   367.52652   383.27617   398.92790   414.86863   429.80406   445.64342 
##         30%         31%         32%         33%         34%         35% 
##   459.43829   476.22344   495.62227   510.07714   527.08293   542.52790 
##         36%         37%         38%         39%         40%         41% 
##   557.85438   573.87988   588.87340   606.22629   624.26820   647.14105 
##         42%         43%         44%         45%         46%         47% 
##   667.35812   686.93747   706.35388   725.33038   750.53966   774.50228 
##         48%         49%         50%         51%         52%         53% 
##   801.43110   825.93060   856.90155   880.27188   911.09918   945.45105 
##         54%         55%         56%         57%         58%         59% 
##   977.17126  1009.60946  1042.09284  1074.91545  1108.41713  1144.79404 
##         60%         61%         62%         63%         64%         65% 
##  1185.25927  1223.57597  1263.35793  1301.12816  1334.80118  1369.33208 
##         66%         67%         68%         69%         70%         71% 
##  1403.42117  1450.08786  1493.02346  1550.44711  1604.09211  1654.84212 
##         72%         73%         74%         75%         76%         77% 
##  1705.43246  1771.32063  1844.18399  1901.13432  1964.88058  2048.25176 
##         78%         79%         80%         81%         82%         83% 
##  2136.61981  2221.29513  2314.01765  2418.59665  2526.70801  2645.23371 
##         84%         85%         86%         87%         88%         89% 
##  2794.96094  2945.83913  3110.37818  3304.40757  3517.96080  3726.96906 
##         90%         91%         92%         93%         94%         95% 
##  3923.90664  4201.80975  4577.01574  4921.93574  5513.52085  6082.09060 
##         96%         97%         98%         99%        100% 
##  6925.35328  8084.81289  9801.97521 13608.71554 50721.48336
creditCardData$PAYMENTS <- ifelse(creditCardData$PAYMENTS > 13608.71, 13608.71, creditCardData$PAYMENTS)
boxplot(creditCardData$PAYMENTS)

box14 <- boxplot(creditCardData$MINIMUM_PAYMENTS)

box14$stats
##             [,1]
## [1,]    0.019163
## [2,]  170.851668
## [3,]  312.343947
## [4,]  788.721609
## [5,] 1712.713459
quantile(creditCardData$MINIMUM_PAYMENTS, seq(0, 1, 0.01))
##           0%           1%           2%           3%           4% 
##     0.019163    20.040613    41.456921    55.311014    67.060567 
##           5%           6%           7%           8%           9% 
##    74.644117    82.488645    87.502150    95.200493   100.792418 
##          10%          11%          12%          13%          14% 
##   109.131328   115.562817   122.228152   128.162788   132.242811 
##          15%          16%          17%          18%          19% 
##   136.812654   140.972198   146.025353   149.855005   153.467444 
##          20%          21%          22%          23%          24% 
##   157.390750   162.137911   164.105812   166.388178   168.884636 
##          25%          26%          27%          28%          29% 
##   170.857654   173.055263   175.225481   177.147049   178.998434 
##          30%          31%          32%          33%          34% 
##   181.647737   184.780824   188.063927   190.847662   194.158800 
##          35%          36%          37%          38%          39% 
##   198.178921   202.843548   207.677942   212.696770   218.808479 
##          40%          41%          42%          43%          44% 
##   227.692134   234.793544   245.321491   255.207133   263.747395 
##          45%          46%          47%          48%          49% 
##   274.080603   285.428608   296.042891   309.501766   312.343947 
##          50%          51%          52%          53%          54% 
##   312.343947   312.343947   315.534359   329.146890   343.698868 
##          55%          56%          57%          58%          59% 
##   360.662595   374.443645   390.842218   409.404537   422.555350 
##          60%          61%          62%          63%          64% 
##   438.826129   454.987986   472.002637   489.173657   508.632190 
##          65%          66%          67%          68%          69% 
##   527.373644   548.872602   572.633982   597.043009   619.197464 
##          70%          71%          72%          73%          74% 
##   642.820796   669.546327   697.414238   725.149292   756.162286 
##          75%          76%          77%          78%          79% 
##   788.713501   830.906616   872.112364   915.688453   952.003611 
##          80%          81%          82%          83%          84% 
##   994.385464  1039.735325  1093.223398  1150.659627  1210.881642 
##          85%          86%          87%          88%          89% 
##  1280.879553  1339.241966  1429.947802  1507.031600  1615.390262 
##          90%          91%          92%          93%          94% 
##  1731.689977  1843.344040  1988.427640  2172.527924  2440.151272 
##          95%          96%          97%          98%          99% 
##  2719.566935  3068.337274  3658.647023  4800.983569  8626.691541 
##         100% 
## 76406.207520
creditCardData$MINIMUM_PAYMENTS <- ifelse(creditCardData$MINIMUM_PAYMENTS > 8627, 8627, creditCardData$MINIMUM_PAYMENTS)
boxplot(creditCardData$MINIMUM_PAYMENTS)

box15 <- boxplot(creditCardData$PRC_FULL_PAYMENT)

box15$stats
##          [,1]
## [1,] 0.000000
## [2,] 0.000000
## [3,] 0.000000
## [4,] 0.142857
## [5,] 0.333333
quantile(creditCardData$PRC_FULL_PAYMENT, seq(0, 1, 0.01))
##        0%        1%        2%        3%        4%        5%        6% 
## 0.0000000 0.0000000 0.0000000 0.0000000 0.0000000 0.0000000 0.0000000 
##        7%        8%        9%       10%       11%       12%       13% 
## 0.0000000 0.0000000 0.0000000 0.0000000 0.0000000 0.0000000 0.0000000 
##       14%       15%       16%       17%       18%       19%       20% 
## 0.0000000 0.0000000 0.0000000 0.0000000 0.0000000 0.0000000 0.0000000 
##       21%       22%       23%       24%       25%       26%       27% 
## 0.0000000 0.0000000 0.0000000 0.0000000 0.0000000 0.0000000 0.0000000 
##       28%       29%       30%       31%       32%       33%       34% 
## 0.0000000 0.0000000 0.0000000 0.0000000 0.0000000 0.0000000 0.0000000 
##       35%       36%       37%       38%       39%       40%       41% 
## 0.0000000 0.0000000 0.0000000 0.0000000 0.0000000 0.0000000 0.0000000 
##       42%       43%       44%       45%       46%       47%       48% 
## 0.0000000 0.0000000 0.0000000 0.0000000 0.0000000 0.0000000 0.0000000 
##       49%       50%       51%       52%       53%       54%       55% 
## 0.0000000 0.0000000 0.0000000 0.0000000 0.0000000 0.0000000 0.0000000 
##       56%       57%       58%       59%       60%       61%       62% 
## 0.0000000 0.0000000 0.0000000 0.0000000 0.0000000 0.0000000 0.0000000 
##       63%       64%       65%       66%       67%       68%       69% 
## 0.0000000 0.0000000 0.0000000 0.0833330 0.0833330 0.0833330 0.0833330 
##       70%       71%       72%       73%       74%       75%       76% 
## 0.0833330 0.0909090 0.0909090 0.1000000 0.1111110 0.1428570 0.1666670 
##       77%       78%       79%       80%       81%       82%       83% 
## 0.1666670 0.1818180 0.2222220 0.2500000 0.2727270 0.3000000 0.3333330 
##       84%       85%       86%       87%       88%       89%       90% 
## 0.3750000 0.4244046 0.5000000 0.5000000 0.5714290 0.6363640 0.6700003 
##       91%       92%       93%       94%       95%       96%       97% 
## 0.7500000 0.8181820 0.8750000 0.9166670 1.0000000 1.0000000 1.0000000 
##       98%       99%      100% 
## 1.0000000 1.0000000 1.0000000
box16 <- boxplot(creditCardData$TENURE)
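The capping steps above all follow one pattern: find the 99th percentile and replace anything larger with it. A minimal sketch of that pattern as a reusable function is shown below. The name cap_at_quantile and the toy vector are our own assumptions for illustration, not part of the original analysis, and the function computes the cutoff from the data rather than using the hand-typed values above.

```r
# Hypothetical helper (not from the original analysis): cap values above a
# chosen upper quantile, mirroring the ifelse() calls used above.
cap_at_quantile <- function(x, prob = 0.99) {
  # Upper cutoff taken from the data itself; NAs are ignored when computing it
  upper <- quantile(x, probs = prob, na.rm = TRUE, names = FALSE)
  # pmin() keeps each value unless it exceeds the cutoff
  pmin(x, upper)
}

# Toy vector with one extreme value
x <- c(1, 2, 3, 4, 100)
capped <- cap_at_quantile(x, prob = 0.75)
print(capped)
```

Under these assumptions, a column could be capped with a call such as cap_at_quantile(creditCardData$PAYMENTS), instead of copying the cutoff by hand from the quantile() output.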


Having handled the outliers in all the variables, let’s now scale the variables so that each one contributes on a comparable scale to the distance calculations used in clustering.

creditCardData_scaled <- scale(creditCardData, center = TRUE, scale = TRUE)
head(creditCardData_scaled)
##          BALANCE BALANCE_FREQUENCY   PURCHASES ONEOFF_PURCHASES
## [1,] -0.78204173        -0.2494205 -0.59477303       -0.5349217
## [2,]  0.88877311         0.1343172 -0.66547774       -0.5349217
## [3,]  0.51497162         0.5180549 -0.09245089        0.2845047
## [4,]  0.07713999        -1.0168960  0.44549041        1.0537589
## [5,] -0.37151372         0.5180549 -0.65361951       -0.5179645
## [6,]  0.15279580         0.5180549  0.32266877       -0.5349217
##      INSTALLMENTS_PURCHASES CASH_ADVANCE PURCHASES_FREQUENCY
## [1,]             -0.4274896   -0.5535258          -0.8064453
## [2,]             -0.5685578    3.3983093          -1.2216898
## [3,]             -0.5685578   -0.5535258           1.2697723
## [4,]             -0.5685578   -0.4273040          -1.0140688
## [5,]             -0.5685578   -0.5535258          -1.0140688
## [6,]              1.4029655   -0.5535258           0.4392858
##      ONEOFF_PURCHASES_FREQUENCY PURCHASES_INSTALLMENTS_FREQUENCY
## [1,]                 -0.6786229                       -0.7072736
## [2,]                 -0.6786229                       -0.9169440
## [3,]                  2.6733017                       -0.9169440
## [4,]                 -0.3992970                       -0.9169440
## [5,]                 -0.3992970                       -0.9169440
## [6,]                 -0.6786229                        0.5507533
##      CASH_ADVANCE_FREQUENCY CASH_ADVANCE_TRX PURCHASES_TRX CREDIT_LIMIT
## [1,]             -0.6853668       -0.5574735    -0.5796820   -0.9779245
## [2,]              0.5931878        0.1666757    -0.6750920    0.7112937
## [3,]             -0.6853668       -0.5574735    -0.1026319    0.8520618
## [4,]             -0.2591837       -0.3764362    -0.6273870    0.8520618
## [5,]             -0.6853668       -0.5574735    -0.6273870   -0.9216172
## [6,]             -0.6853668       -0.5574735    -0.2934519   -0.7526954
##        PAYMENTS MINIMUM_PAYMENTS PRC_FULL_PAYMENT    TENURE
## [1,] -0.6429091      -0.49303098       -0.5255216 0.3606594
## [2,]  1.0896863       0.26363463        0.2342138 0.3606594
## [3,] -0.4562632      -0.09737218       -0.5255216 0.3606594
## [4,] -0.7325325      -0.35283651       -0.5255216 0.3606594
## [5,] -0.4312738      -0.40763191       -0.5255216 0.3606594
## [6,] -0.1107457       1.34644378       -0.5255216 0.3606594
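To see what scale() does with center = TRUE and scale = TRUE, here is a small sketch on a toy data frame. The data frame and its column names are made up for illustration. Each column is centered on its mean and divided by its standard deviation.

```r
# Toy data frame; the column names are illustrative only
toy <- data.frame(a = c(1, 2, 3, 4), b = c(10, 20, 30, 50))

# scale() centers each column on its mean and divides by its standard deviation
scaled <- scale(toy, center = TRUE, scale = TRUE)

# The same computation written out column by column
manual <- sapply(toy, function(col) (col - mean(col)) / sd(col))

# Each scaled column ends up with mean 0 and standard deviation 1
print(colMeans(scaled))
print(apply(scaled, 2, sd))

# TRUE if the two approaches agree (attributes added by scale() are ignored)
print(isTRUE(all.equal(scaled, manual, check.attributes = FALSE)))
```

This is why variables measured in very different units, such as BALANCE and BALANCE_FREQUENCY, end up on the same footing after scaling.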

Now, let’s do the cluster analysis using K-means clustering. First, we will find the optimal number of clusters by the Calinski criterion using the following code.

library(vegan)
## Loading required package: permute
## Loading required package: lattice
## This is vegan 2.5-2
fit <- cascadeKM(creditCardData_scaled, 1, 10, iter = 1000)
## Warning: Quick-TRANSfer stage steps exceeded maximum (= 447500)

## Warning: Quick-TRANSfer stage steps exceeded maximum (= 447500)
plot(fit, sortg = TRUE, grpmts.plot = TRUE)

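# Row 2 of fit$results holds the Calinski values; the column index equals the
# number of groups only because the cascade starts at 1 group.
optimalNoOfClusters <- as.numeric(which.max(fit$results[2,]))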
cat("Optimal number of clusters by Calinski criterion is ", optimalNoOfClusters, "\n")
## Optimal number of clusters by Calinski criterion is  3
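To get a feel for what this criterion measures, here is a minimal sketch that computes it by hand with base R on toy data. We assume the standard Calinski-Harabasz definition: the between-cluster sum of squares divided by (k - 1), over the within-cluster sum of squares divided by (n - k). The function name ch_index, the seed, and the toy data are our own choices for illustration.

```r
set.seed(42)  # arbitrary seed so the toy example is repeatable

# Toy data: two well-separated groups in two dimensions
toy <- rbind(matrix(rnorm(100, mean = 0), ncol = 2),
             matrix(rnorm(100, mean = 6), ncol = 2))

# Hypothetical helper: Calinski-Harabasz index for a k-means fit with k clusters
# (between-cluster SS / (k - 1)) / (within-cluster SS / (n - k))
ch_index <- function(data, k) {
  fit <- kmeans(data, centers = k, nstart = 10)
  n <- nrow(data)
  (fit$betweenss / (k - 1)) / (fit$tot.withinss / (n - k))
}

# Evaluate the index for 2 to 5 clusters and pick the k with the largest value
ch <- sapply(2:5, function(k) ch_index(toy, k))
names(ch) <- 2:5
best_k <- as.integer(names(ch)[which.max(ch)])
print(ch)
print(best_k)
```

A larger value means the clusters are compact relative to how far apart they are. The index is not defined for a single cluster, since k - 1 would be zero.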

From the plot and the printed result, we can see that the optimal number of clusters is 3. Let’s now plot an elbow chart to cross-check this.

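# Calculate the within-groups sum of squares (WSS) for 1 to 15 clusters.
# Note: this runs on creditCardData (unscaled), while the clustering further down uses creditCardData_scaled.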
wss <- (nrow(creditCardData)-1)*sum(apply(creditCardData, 2, var))
for (i in 2:15) wss[i] <- sum(kmeans(creditCardData, centers = i)$withinss)
plot(1:15, wss, type = "b", xlab = "Number of clusters", ylab = "Within groups sum of squares", col = "red", pch = 2)
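Since k-means starts from random centers, the elbow curve can change a little from run to run. Below is a sketch of a more repeatable variant on toy data. The seed, the nstart value, and the toy matrix are assumptions made for illustration; in the case study the same loop could be pointed at creditCardData_scaled.

```r
set.seed(123)  # arbitrary seed, an assumption; k-means starts are random

# Toy stand-in for the scaled data: three groups in two dimensions
toy <- scale(rbind(matrix(rnorm(60, mean = 0), ncol = 2),
                   matrix(rnorm(60, mean = 5), ncol = 2),
                   matrix(rnorm(60, mean = 10), ncol = 2)))

max_k <- 8
wss <- numeric(max_k)

# With one cluster, the WSS is the total sum of squares around the column means
wss[1] <- (nrow(toy) - 1) * sum(apply(toy, 2, var))

for (i in 2:max_k) {
  # nstart = 25 keeps the best result out of 25 random starts
  wss[i] <- kmeans(toy, centers = i, nstart = 25)$tot.withinss
}

plot(1:max_k, wss, type = "b",
     xlab = "Number of clusters", ylab = "Within groups sum of squares")
```

The elbow is the point after which adding more clusters reduces the WSS only slightly.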


From the chart, we see that the curve changes significantly between 2 and 4 clusters. Also, the Calinski criterion says that the optimal number of clusters is 3. So, let’s stick with 3.

k <- kmeans(creditCardData_scaled, 3)
k
## K-means clustering with 3 clusters of sizes 5892, 1672, 1386
## 
## Cluster means:
##      BALANCE BALANCE_FREQUENCY  PURCHASES ONEOFF_PURCHASES
## 1 -0.3983947        -0.1959587 -0.3212162       -0.2936399
## 2  1.1628069         0.3339297 -0.4133965       -0.3144702
## 3  0.2908576         0.4302005  1.8642171        1.6276481
##   INSTALLMENTS_PURCHASES CASH_ADVANCE PURCHASES_FREQUENCY
## 1             -0.2178541   -0.3675839         -0.07727405
## 2             -0.3747862    1.5199256         -0.64452579
## 3              1.3782388   -0.2709316          1.10602152
##   ONEOFF_PURCHASES_FREQUENCY PURCHASES_INSTALLMENTS_FREQUENCY
## 1                 -0.2476700                       -0.0603146
## 2                 -0.3308258                       -0.5504815
## 3                  1.4519568                        0.9204753
##   CASH_ADVANCE_FREQUENCY CASH_ADVANCE_TRX PURCHASES_TRX CREDIT_LIMIT
## 1             -0.3469807       -0.3504420    -0.2780927   -0.3436813
## 2              1.5063303        1.4574325    -0.4281501    0.5601988
## 3             -0.3421170       -0.2684147     1.6986935    0.7852223
##     PAYMENTS MINIMUM_PAYMENTS PRC_FULL_PAYMENT     TENURE
## 1 -0.3467525       -0.2301229       0.01017108 -0.0366549
## 2  0.4857288        0.6258468      -0.40848594 -0.1032120
## 3  0.8881150        0.2232815       0.44953860  0.2803328
## 
## Clustering vector:
##    [1] 1 2 1 1 1 1 3 1 1 1 1 1 3 3 1 2 1 1 1 1 1 3 1 3 2 1 1 1 2 1 3 1 2 1
##   [35] 2 1 2 3 2 2 1 1 1 1 3 1 1 2 3 1 2 3 1 1 1 1 2 3 1 2 1 1 2 1 3 1 1 1
##   [69] 1 1 2 2 1 1 1 1 1 1 1 2 1 1 1 1 3 3 2 2 1 2 3 2 1 1 1 1 1 2 1 1 1 1
##  [103] 3 1 1 2 1 2 1 3 1 1 3 2 1 1 1 2 1 1 3 1 3 1 2 3 1 1 2 1 1 3 1 2 1 1
##  [137] 3 1 3 1 1 1 2 3 3 1 1 1 1 1 3 3 1 3 3 1 3 1 3 1 2 1 1 1 1 2 1 3 1 1
##  [171] 2 3 1 2 3 1 1 2 3 1 3 1 2 1 1 1 1 1 1 1 1 1 1 1 1 3 3 3 1 2 1 1 1 2
##  [205] 3 1 2 2 2 1 2 1 1 1 1 1 1 1 2 3 3 3 1 3 3 1 3 3 1 3 1 3 1 1 3 1 2 1
##  [239] 1 1 2 1 3 1 3 1 3 1 2 3 2 1 1 3 2 1 1 3 3 1 1 3 3 3 1 3 3 1 1 3 3 2
##  [273] 3 3 2 1 1 1 1 3 3 1 1 3 1 2 1 1 1 2 3 2 1 2 1 3 2 2 1 1 2 3 2 1 2 3
##  [307] 3 1 1 1 1 3 3 1 2 1 2 1 1 1 2 2 2 1 2 1 3 1 1 3 1 1 3 3 2 2 1 3 1 1
##  [341] 1 2 1 1 3 1 1 1 2 1 1 3 2 1 1 3 3 2 1 1 1 1 1 3 1 1 1 3 2 2 3 2 1 3
##  [375] 1 1 3 1 1 1 2 1 2 1 3 1 1 2 1 1 1 1 1 3 1 1 1 1 2 1 1 1 2 3 2 1 1 1
##  [409] 2 1 1 3 1 2 1 3 2 3 1 2 1 1 2 3 2 1 1 1 2 2 2 2 1 2 1 2 1 3 1 2 1 2
##  [443] 1 1 3 3 3 1 1 1 1 3 3 2 1 1 1 1 1 3 3 3 1 1 1 2 3 1 3 1 2 2 3 1 1 2
##  [477] 1 3 1 1 1 3 2 1 3 3 2 3 2 3 3 2 3 1 1 2 1 2 1 1 1 3 1 1 1 1 1 3 1 3
##  [511] 1 3 2 2 3 1 2 2 1 1 2 3 1 1 3 1 2 2 1 1 1 3 1 1 3 1 3 2 1 2 1 1 2 1
##  [545] 3 3 3 1 3 1 3 3 2 3 2 1 2 2 2 1 3 1 1 3 1 3 1 3 1 2 1 2 1 1 3 1 2 2
##  [579] 1 1 2 3 2 2 3 3 1 3 1 2 3 3 1 2 1 1 1 3 3 1 2 1 2 1 1 1 2 1 2 3 2 1
##  [613] 3 3 1 2 3 1 1 3 3 1 3 1 3 1 1 1 3 3 1 1 1 2 2 1 1 2 1 3 3 2 3 3 3 3
##  [647] 2 3 2 2 1 3 3 1 1 3 3 1 3 1 1 3 3 1 3 1 1 1 3 3 1 2 3 1 2 3 2 3 1 2
##  [681] 1 2 1 1 2 1 1 1 1 3 3 1 1 3 1 1 2 1 3 1 2 1 2 1 3 1 1 2 3 3 2 1 3 1
##  [715] 1 2 2 1 1 1 1 2 1 2 1 3 3 2 3 3 1 1 2 3 3 2 1 1 1 2 1 1 2 2 1 1 1 1
##  [749] 1 3 3 1 2 2 2 2 2 2 3 3 2 1 2 2 1 1 1 1 2 3 1 3 1 1 1 1 3 3 1 3 2 1
##  [783] 3 3 1 1 2 1 2 2 1 1 2 1 1 1 1 1 1 1 2 3 1 2 1 1 3 3 1 1 2 1 2 1 1 1
##  [817] 3 1 1 2 1 1 1 1 3 1 2 1 1 2 2 1 1 1 1 1 1 1 1 1 3 2 2 2 3 2 1 1 2 1
##  [851] 1 2 1 2 3 1 3 3 1 1 1 3 1 1 1 2 2 2 1 1 2 3 1 2 3 1 1 1 2 1 2 1 2 2
##  [885] 3 1 3 3 1 3 1 3 1 2 3 3 1 1 2 1 1 1 2 2 1 1 1 1 3 2 2 1 1 1 1 1 3 3
##  [919] 1 3 1 1 2 1 2 1 3 1 2 1 1 1 1 2 1 1 1 1 3 2 1 3 2 2 1 2 1 1 1 1 1 1
##  [953] 3 1 2 1 1 1 1 1 3 2 2 1 1 1 1 3 1 2 3 3 1 2 1 3 3 2 1 1 1 3 3 1 2 3
##  [987] 1 2 3 2 3 3 1 1 2 1 1 1 1 1 1 1 1 1 1 3 1 2 1 1 3 1 2 1 1 1 1 2 1 3
## [1021] 1 1 1 1 1 1 1 2 1 2 2 3 2 2 1 2 1 2 1 1 1 2 2 1 1 1 1 3 1 1 2 1 1 3
## [1055] 1 1 1 1 3 3 1 3 1 2 1 1 1 2 2 1 1 1 1 1 1 1 2 1 1 3 1 1 1 1 3 1 2 3
## [1089] 2 2 1 2 1 1 1 1 2 1 3 1 2 1 1 3 2 1 1 1 2 2 2 1 1 1 2 1 1 1 3 2 2 1
## [1123] 1 1 1 3 3 1 1 1 2 1 1 2 1 1 1 3 1 1 1 1 1 2 1 2 1 2 1 3 3 3 1 1 1 1
## [1157] 1 1 1 1 1 1 1 2 1 1 1 1 1 1 1 2 1 1 1 2 3 1 1 1 2 2 2 3 1 1 1 1 1 1
## [1191] 3 1 1 3 1 2 1 1 1 3 1 1 3 1 1 1 1 1 2 1 1 2 3 3 1 1 3 2 3 1 1 1 2 1
## [1225] 1 2 3 1 1 2 1 1 1 1 1 1 2 1 1 1 3 3 3 1 2 1 1 3 3 2 1 1 1 1 1 3 3 1
## [1259] 1 2 1 1 2 3 2 1 1 1 3 3 1 3 1 1 1 3 1 3 1 1 1 1 2 1 1 1 3 3 3 1 3 1
## [1293] 3 1 1 1 1 1 1 1 2 1 3 1 3 3 1 2 3 1 1 3 1 1 3 2 1 1 2 2 1 1 3 2 1 1
## [1327] 1 1 2 1 2 3 2 2 1 2 2 1 3 1 3 2 3 3 1 2 1 1 2 3 2 3 1 1 1 1 1 3 3 2
## [1361] 3 3 1 1 1 2 1 2 3 3 1 1 1 3 3 1 1 1 2 1 1 1 3 1 3 1 1 2 1 2 1 1 1 1
## [1395] 1 1 1 1 1 3 2 1 1 3 1 1 3 2 1 1 2 3 3 1 1 1 3 1 1 1 3 1 3 3 2 1 1 3
## [1429] 1 1 1 1 3 3 3 2 1 1 1 3 1 1 2 1 3 1 1 3 1 2 2 2 1 2 3 1 1 1 1 1 3 2
## [1463] 2 3 2 3 3 1 1 1 1 3 1 1 3 2 1 3 3 3 1 2 3 1 3 1 2 2 1 2 3 2 1 2 1 1
## [1497] 1 3 3 1 1 1 1 3 2 2 1 1 1 1 3 1 1 2 3 1 1 2 2 3 1 1 3 3 3 2 1 3 1 1
## [1531] 1 1 3 3 1 1 2 1 1 1 1 1 1 1 1 3 1 2 1 3 1 2 1 1 1 1 1 1 2 3 3 1 2 2
## [1565] 1 1 1 3 1 3 1 1 3 1 1 3 1 2 2 1 1 1 3 1 1 3 1 1 3 1 1 1 1 1 1 1 1 1
## [1599] 1 1 2 1 1 3 3 3 1 1 2 3 3 1 2 3 1 3 3 3 3 1 1 1 1 1 3 1 2 1 1 2 1 2
## [1633] 1 2 1 1 2 1 3 3 1 3 1 1 1 1 2 3 1 3 3 1 2 3 1 2 3 2 1 1 2 3 1 3 1 1
## [1667] 1 3 1 1 1 1 1 1 3 2 2 1 3 1 2 1 2 2 1 1 2 1 2 2 2 1 1 1 1 1 1 3 1 1
## [1701] 2 2 3 1 3 1 2 3 1 3 2 1 2 1 3 1 3 3 2 3 3 3 1 1 1 1 1 3 3 3 1 1 1 1
## [1735] 1 2 3 1 3 3 2 3 3 1 1 1 1 3 1 1 1 3 1 3 3 3 3 1 1 3 2 1 1 3 3 3 1 3
## [1769] 1 2 1 3 3 1 2 3 1 2 2 1 1 2 3 1 3 1 1 2 1 1 1 3 1 1 2 2 1 3 1 2 1 2
## [1803] 1 1 1 1 1 1 1 1 2 2 1 1 3 1 1 3 1 1 1 1 1 1 3 1 3 3 1 2 1 2 2 1 1 2
## [1837] 3 1 1 1 1 2 3 1 3 3 1 1 1 1 1 3 1 1 2 1 2 3 2 1 1 3 1 1 2 1 1 3 1 1
## [1871] 3 3 1 1 1 1 1 1 1 1 1 2 1 2 1 1 1 1 1 1 2 1 1 3 2 1 1 1 1 1 1 2 3 3
## [1905] 2 2 3 2 1 2 1 1 3 2 1 3 1 2 1 2 2 1 1 2 2 2 1 1 1 1 2 1 1 2 2 1 2 2
## [1939] 1 1 1 1 1 3 2 1 1 1 1 1 2 2 1 3 3 1 1 2 1 3 1 3 2 1 1 1 1 1 1 1 1 2
## [1973] 1 1 2 1 1 1 1 3 1 1 1 2 2 1 3 1 2 1 1 3 1 1 1 1 1 3 1 1 2 2 1 2 1 1
## [2007] 1 1 2 1 1 1 1 1 2 3 3 1 3 2 2 1 1 3 3 3 1 1 1 3 1 1 1 1 2 1 1 1 2 2
## [2041] 3 1 1 2 2 2 3 3 2 1 2 3 1 1 3 3 1 1 2 3 1 1 2 1 1 2 1 3 3 1 2 3 2 1
## [2075] 3 2 1 3 1 2 2 3 3 1 3 1 2 2 3 2 2 1 3 1 3 3 1 1 1 1 1 2 1 1 1 1 2 3
## [2109] 2 3 1 1 1 1 1 3 3 1 1 2 2 1 1 1 1 3 1 1 1 3 1 2 3 1 1 1 3 1 3 1 3 1
## [2143] 1 3 1 1 3 1 2 1 1 1 3 1 1 1 1 3 3 2 1 1 3 1 3 1 2 1 1 1 1 2 1 1 2 1
## [2177] 1 3 1 2 1 2 3 1 1 1 1 1 3 2 2 2 1 1 1 1 2 1 2 1 2 1 1 3 3 3 2 1 3 1
## [2211] 2 1 2 1 1 2 1 3 3 1 3 1 1 1 1 3 2 1 1 1 1 1 3 1 1 3 1 1 1 1 2 3 1 1
## [2245] 1 1 2 1 2 2 1 3 1 1 1 1 1 3 2 3 1 3 1 2 1 3 2 2 1 1 1 2 1 1 1 1 3 1
## [2279] 2 1 1 1 1 1 1 1 3 1 2 1 1 3 1 1 1 1 3 1 2 1 1 3 2 1 1 1 2 1 1 3 3 3
## [2313] 1 1 1 1 1 1 1 1 1 1 2 1 2 1 1 1 1 2 1 3 3 1 3 1 3 2 1 3 2 1 2 2 2 3
## [2347] 1 1 2 2 1 1 2 1 1 3 1 1 2 3 3 2 3 3 1 1 2 1 1 1 1 1 1 3 2 1 1 1 1 1
## [2381] 1 1 3 2 3 2 1 2 1 2 1 3 2 1 1 3 1 2 3 2 3 1 1 2 1 1 3 1 2 1 1 3 1 2
## [2415] 1 1 3 1 1 1 3 3 2 2 1 1 1 2 1 1 1 3 2 1 1 1 1 2 1 2 3 2 2 3 3 1 2 1
## [2449] 2 1 1 1 1 3 2 1 1 3 1 1 1 1 1 2 3 2 1 2 3 2 2 3 1 3 2 2 1 3 1 1 1 1
## [2483] 1 1 1 2 1 2 2 2 2 1 1 1 1 2 1 1 3 3 3 1 1 3 1 1 3 1 1 1 1 1 1 2 1 3
## [2517] 2 1 1 2 1 1 3 3 1 1 1 3 1 2 1 2 1 1 3 1 1 1 1 1 1 1 1 1 1 1 2 1 2 1
## [2551] 3 3 1 1 2 2 1 1 1 1 3 2 1 1 1 1 1 3 3 1 1 2 1 3 2 1 2 1 2 2 1 2 1 1
## [2585] 1 2 3 1 1 1 1 3 1 2 2 1 2 1 2 2 1 1 1 1 1 2 2 2 2 1 1 1 2 3 1 1 1 1
## [2619] 1 2 1 1 1 1 1 2 1 1 1 3 1 2 2 2 1 3 1 1 1 2 1 3 2 2 1 2 1 1 1 2 1 1
## [2653] 2 1 1 1 3 2 1 1 2 1 3 1 3 1 1 1 1 2 1 3 2 2 1 1 2 2 2 1 2 2 2 1 2 2
## [2687] 1 3 3 1 2 1 2 2 1 2 2 3 3 2 1 1 1 1 1 1 2 1 1 3 1 1 3 1 1 1 1 3 1 1
## [2721] 1 1 1 3 1 1 2 2 3 2 2 1 1 1 1 1 1 2 1 1 1 2 2 3 1 3 2 1 1 1 3 1 1 1
## [2755] 1 2 2 3 1 3 3 3 2 2 1 3 1 1 1 1 1 1 1 1 1 1 1 1 3 2 2 1 1 3 3 1 1 1
## [2789] 1 1 1 1 2 3 1 1 2 1 1 1 3 3 1 3 1 1 3 1 1 1 1 3 1 1 1 2 3 1 1 1 2 1
## [2823] 3 3 1 1 1 1 2 2 2 2 1 3 3 1 2 3 1 2 2 1 1 3 1 1 2 3 1 1 1 1 2 1 1 1
## [2857] 1 1 1 2 1 1 2 1 1 2 1 2 3 2 3 1 1 1 1 1 1 1 2 1 1 1 1 1 1 1 2 2 2 2
## [2891] 1 1 1 1 2 1 1 3 2 1 1 1 1 3 2 1 1 1 1 1 1 1 1 3 1 2 1 1 2 1 1 1 2 2
## [2925] 1 1 2 3 2 1 1 3 1 1 1 1 2 2 1 1 1 2 1 1 1 1 1 2 1 1 1 3 1 1 2 1 1 1
## [2959] 1 1 2 3 2 1 2 2 2 1 1 1 3 1 1 2 1 1 3 1 3 2 1 1 1 1 1 3 1 1 3 2 1 1
## [2993] 2 1 1 2 1 1 1 1 1 2 1 1 1 1 3 3 1 1 3 1 1 3 2 1 1 1 1 1 3 1 3 3 2 1
## [3027] 1 2 1 1 3 1 2 1 1 2 2 1 2 2 1 1 1 1 1 2 1 1 1 1 1 2 1 1 1 3 1 1 1 3
## [3061] 1 1 2 2 1 1 1 1 1 3 1 1 1 1 1 1 1 1 1 1 2 1 2 2 1 1 1 1 1 2 1 1 1 3
## [3095] 1 2 2 1 1 2 1 1 2 2 1 3 2 2 1 2 1 3 1 1 1 2 1 1 3 1 2 1 1 3 1 2 1 2
## [3129] 1 1 1 1 1 1 1 2 1 1 1 2 3 2 1 1 1 3 3 1 1 1 1 3 2 2 1 2 3 1 2 2 3 1
## [3163] 1 1 1 2 1 1 1 1 1 3 1 1 1 1 1 1 2 1 3 3 1 1 1 3 1 1 1 1 2 1 2 1 1 3
## [3197] 1 2 2 1 3 1 1 3 2 1 1 1 1 1 2 1 2 1 1 1 1 1 1 2 2 2 1 1 3 1 1 1 3 1
## [3231] 1 1 2 1 3 1 1 1 3 1 3 3 1 2 1 1 1 1 1 1 1 3 1 2 1 1 1 1 1 2 3 2 2 1
## [3265] 2 2 1 3 1 2 3 2 1 2 2 3 1 2 1 1 2 1 3 2 1 1 1 3 1 1 1 2 3 1 1 1 1 1
## [3299] 2 1 2 1 2 1 3 1 1 1 2 1 1 1 1 1 3 3 1 1 1 1 3 2 1 1 3 3 1 1 1 1 1 2
## [3333] 1 1 1 3 3 1 1 2 2 1 1 2 1 1 2 3 2 1 3 2 3 1 1 1 3 1 3 2 2 1 1 1 2 1
## [3367] 2 1 1 1 1 1 1 2 1 1 1 1 1 3 1 1 3 1 1 1 1 3 3 1 2 1 1 2 1 1 1 2 3 1
## [3401] 2 1 1 1 3 1 1 1 3 2 1 3 2 2 2 1 1 1 1 1 2 1 2 3 1 1 1 3 1 1 1 1 1 1
## [3435] 1 3 1 1 1 1 2 1 1 1 1 1 3 1 2 1 1 1 3 1 3 1 1 1 2 1 3 2 1 1 1 1 1 3
## [3469] 1 1 2 1 1 1 1 3 3 1 1 3 1 2 1 2 2 3 1 1 1 1 2 3 3 1 1 3 2 2 1 3 2 2
## [3503] 1 3 1 1 1 2 1 1 3 1 1 3 2 1 2 1 1 2 3 3 3 1 1 1 1 3 2 2 3 3 2 2 1 2
## [3537] 2 3 1 2 3 1 1 1 3 2 1 2 2 1 1 1 2 3 1 2 2 1 1 1 2 1 2 1 1 1 1 1 1 3
## [3571] 2 2 1 1 2 1 3 1 1 1 1 1 1 1 3 3 1 1 2 1 1 2 1 2 1 1 1 3 1 1 1 3 1 1
## [3605] 2 1 1 3 1 3 2 1 1 1 2 1 1 1 1 1 1 1 2 2 2 1 1 1 1 2 1 3 1 3 2 1 3 3
## [3639] 3 1 1 2 1 3 2 2 1 2 1 2 3 1 3 1 1 1 2 1 1 3 1 1 1 3 2 1 2 1 1 3 1 3
## [3673] 1 1 1 1 3 2 1 1 3 2 3 1 1 1 1 3 3 1 1 3 3 1 1 3 2 1 3 1 2 1 3 1 1 3
## [3707] 1 3 3 2 1 3 2 1 1 1 2 1 1 3 2 2 2 1 1 3 1 1 1 3 3 1 1 3 3 2 2 2 3 1
## [3741] 1 1 3 1 3 1 1 1 1 2 1 2 1 2 1 1 1 3 2 1 2 1 2 1 1 1 2 2 3 1 3 2 1 1
## [3775] 1 2 2 1 3 1 1 2 1 3 1 3 1 1 3 1 2 3 1 1 3 3 3 1 1 1 3 1 2 3 2 1 2 3
## [3809] 1 1 1 2 1 1 2 1 1 1 2 1 1 2 2 3 1 2 1 1 3 1 3 1 1 1 1 1 3 2 1 1 2 1
## [3843] 1 1 1 1 3 1 2 1 1 3 1 1 1 1 1 3 2 3 2 1 3 1 1 1 1 1 3 1 1 2 1 1 2 3
## [3877] 2 1 3 1 1 1 1 3 1 3 3 3 1 2 1 1 1 1 1 2 1 1 1 1 1 1 1 1 1 3 2 1 2 1
## [3911] 3 1 1 1 1 1 1 1 1 1 1 2 2 1 1 2 1 1 1 3 2 1 3 1 1 1 1 3 1 3 1 1 1 1
## [3945] 3 2 1 1 3 1 1 1 1 1 3 3 3 1 1 1 1 2 1 1 1 1 1 1 1 3 1 3 1 1 1 2 1 1
## [3979] 1 1 1 1 1 1 2 1 1 1 3 2 1 1 1 1 1 1 2 3 1 2 1 1 3 1 3 2 3 3 2 1 3 1
## [4013] 3 3 3 1 2 2 2 3 1 1 2 1 1 3 1 1 2 1 3 1 1 3 1 1 1 1 1 1 2 2 1 2 1 1
## [4047] 2 1 1 1 1 3 1 1 1 1 2 1 1 1 1 2 2 1 1 2 1 1 1 2 1 1 1 1 2 1 1 1 1 2
## [4081] 1 3 1 1 2 1 3 1 1 2 1 2 3 1 1 2 1 2 1 1 2 1 3 2 1 3 1 1 1 3 1 1 3 1
## [4115] 1 1 1 1 1 3 1 1 3 2 3 2 3 1 1 2 1 1 2 3 1 3 1 2 3 2 3 1 1 2 2 1 3 1
## [4149] 2 1 3 1 1 1 1 3 1 1 1 1 1 3 1 1 1 1 1 3 1 1 1 1 1 1 1 2 1 1 2 2 1 1
## [4183] 1 1 3 1 1 1 1 1 1 1 1 1 2 2 2 1 1 1 1 1 1 1 3 1 3 1 1 1 2 1 3 3 1 2
## [4217] 3 1 1 2 3 1 1 2 1 1 1 1 2 3 2 1 3 1 1 3 2 1 1 2 3 2 1 1 1 1 1 1 1 2
## [4251] 1 1 2 1 1 1 2 1 1 1 3 1 1 2 3 1 3 1 1 1 1 1 3 2 2 1 3 1 1 1 1 1 3 3
## [4285] 3 3 1 3 1 1 1 3 1 1 1 2 3 3 1 3 1 2 1 1 1 1 1 1 1 1 1 1 2 1 2 1 2 2
## [4319] 1 1 1 1 2 1 1 1 3 1 1 1 2 1 1 3 1 1 1 1 1 3 2 1 1 1 1 1 1 1 2 2 2 2
## [4353] 1 1 1 1 3 1 1 2 3 1 1 2 3 1 1 1 2 1 1 3 1 1 1 1 3 1 3 1 2 1 3 1 1 1
## [4387] 1 3 1 3 1 1 1 1 2 1 2 3 1 1 1 1 1 3 1 1 1 1 1 3 1 2 1 2 1 3 1 1 1 2
## [4421] 1 1 3 1 1 1 2 3 3 1 1 1 1 1 1 1 1 2 1 1 3 1 2 1 2 1 3 1 1 1 1 1 1 1
## [4455] 2 1 3 3 1 1 1 2 3 1 3 1 3 1 1 1 1 1 1 1 2 1 3 2 1 2 3 2 1 1 1 2 1 1
## [4489] 1 2 2 1 1 2 1 1 1 1 1 1 1 3 1 1 1 3 2 1 3 2 1 1 1 1 2 3 1 3 3 1 1 1
## [4523] 1 1 2 1 1 1 1 1 1 3 3 1 1 3 1 1 1 1 3 1 3 1 1 1 1 1 1 1 2 1 3 3 1 1
## [4557] 2 3 2 2 1 1 2 1 2 2 2 1 1 2 2 1 1 1 1 3 1 1 1 2 1 2 2 1 3 2 3 1 3 1
## [4591] 1 1 1 1 1 2 1 2 2 1 1 1 2 1 1 1 1 1 1 2 3 1 1 1 3 1 1 2 3 2 1 1 2 1
## [4625] 1 1 2 1 1 1 3 1 2 1 1 1 1 1 3 3 2 1 2 3 1 3 2 1 1 1 1 2 1 1 1 1 2 3
## [4659] 2 1 2 3 1 1 1 1 1 1 3 1 1 1 1 1 1 2 1 1 1 1 1 1 1 1 2 1 2 1 1 1 1 1
## [4693] 1 1 1 1 1 1 1 1 2 1 1 1 1 1 3 3 2 3 2 1 3 1 2 2 1 2 1 1 2 2 2 1 2 1
## [4727] 1 1 1 1 2 1 1 1 1 1 1 2 1 2 1 1 1 2 2 2 1 2 1 3 1 1 1 2 2 1 3 1 3 3
## [4761] 3 1 1 3 1 1 1 1 3 1 1 1 2 3 1 1 1 1 1 1 1 1 1 1 3 1 1 1 1 1 3 1 1 1
## [4795] 1 1 2 3 1 1 2 1 1 2 1 1 1 1 1 1 1 1 2 1 1 1 1 1 1 1 3 1 1 2 1 1 3 2
## [4829] 3 1 3 1 1 1 1 1 2 1 1 1 1 1 1 1 1 3 2 1 1 3 1 2 1 1 1 1 2 1 1 1 1 1
## [4863] 3 1 1 1 1 1 1 1 2 1 1 1 1 1 1 2 1 3 1 1 1 1 1 1 1 1 1 1 1 3 1 1 1 2
## [4897] 1 1 2 1 3 3 1 1 3 3 3 2 1 1 1 3 1 3 1 1 1 3 1 3 3 3 1 1 1 1 1 1 1 1
## [4931] 1 1 1 1 2 1 1 1 1 1 1 1 2 2 1 2 1 1 2 1 3 1 2 2 1 1 1 1 1 3 2 1 2 1
## [4965] 1 3 2 1 1 1 1 1 2 1 1 1 1 2 1 3 1 3 2 1 3 1 1 1 1 2 1 2 1 1 1 2 1 1
## [4999] 3 1 1 3 1 1 2 3 1 1 1 1 3 2 1 1 1 1 3 1 3 1 1 1 3 3 1 1 1 3 3 1 1 2
## [5033] 2 2 1 2 1 1 2 3 1 1 2 1 3 2 3 1 3 1 1 3 2 2 3 1 1 1 1 2 1 2 1 1 1 3
## [5067] 1 1 2 2 1 1 1 1 1 1 1 1 3 2 1 1 1 2 1 1 3 1 3 3 1 3 1 1 3 1 1 2 1 1
## [5101] 1 2 1 1 1 1 2 1 1 1 1 3 1 1 1 1 2 1 1 1 3 1 1 2 1 1 1 1 3 3 1 1 2 1
## [5135] 2 3 3 3 1 2 2 1 1 1 2 1 1 1 1 1 1 1 1 3 2 1 3 1 1 1 1 1 1 2 1 1 2 1
## [5169] 3 1 1 2 1 1 1 3 1 2 2 1 1 2 1 1 1 1 1 1 1 3 1 1 1 2 3 1 1 3 2 1 1 2
## [5203] 1 1 2 1 1 1 3 1 1 1 3 1 2 1 1 3 3 1 2 3 1 1 1 2 3 1 1 3 1 1 1 1 1 1
## [5237] 1 1 1 1 3 1 1 2 1 1 1 1 1 3 1 3 1 1 1 2 1 1 1 1 3 1 1 1 3 1 2 3 3 1
## [5271] 1 2 1 3 2 3 3 1 1 1 3 2 2 1 1 1 3 2 1 1 1 1 1 1 1 2 2 1 3 1 1 1 1 1
## [5305] 1 1 1 1 1 1 3 2 2 1 1 1 1 1 1 2 1 1 1 1 1 1 1 3 1 1 2 1 1 1 3 1 1 1
## [5339] 3 1 3 2 1 1 3 2 1 3 1 2 1 3 2 1 1 1 2 1 2 1 2 1 1 2 1 1 2 1 1 1 2 2
## [5373] 1 1 2 1 1 3 1 1 1 2 1 1 1 2 1 2 1 2 1 1 1 1 1 2 1 3 1 1 3 3 1 1 3 1
## [5407] 1 1 2 3 2 1 1 1 1 1 1 1 3 1 1 3 2 2 1 1 1 3 1 1 1 1 1 1 2 1 1 1 1 1
## [5441] 2 1 1 1 3 2 2 3 1 1 3 1 2 3 3 1 1 1 1 1 2 2 1 1 1 1 1 1 2 1 1 1 3 1
## [5475] 1 3 3 1 1 1 1 2 1 2 1 1 2 1 2 2 1 1 1 2 1 3 1 2 1 2 3 1 1 1 2 1 1 2
## [5509] 1 1 2 2 1 1 3 3 3 1 1 1 2 1 2 3 3 3 1 1 1 1 1 3 1 1 1 1 1 1 2 1 1 1
## [5543] 1 2 3 3 1 3 2 3 1 1 1 2 1 1 1 2 1 1 1 1 1 3 1 1 2 3 2 2 1 1 2 3 1 3
## [5577] 1 1 1 1 2 2 1 1 1 2 3 2 1 1 1 1 1 1 1 1 2 3 2 3 1 1 2 1 1 1 1 2 1 1
## [5611] 2 1 1 1 3 1 1 1 1 1 1 1 2 1 1 1 1 1 1 1 1 1 2 1 1 1 2 1 2 1 1 1 1 1
## [5645] 1 3 2 1 1 3 1 1 1 1 1 1 1 3 1 1 1 2 2 1 1 2 2 2 1 1 1 1 1 3 3 1 1 1
## [5679] 1 1 2 1 1 1 1 1 1 3 1 1 1 1 1 1 3 3 1 1 1 1 2 1 2 1 2 2 1 1 1 1 1 2
## [5713] 1 3 1 1 2 2 1 1 2 2 1 1 1 1 2 1 1 1 1 1 1 2 1 1 2 2 1 1 2 2 3 1 3 1
## [5747] 1 1 1 1 1 3 1 1 2 1 1 1 1 1 1 1 3 1 1 1 1 1 1 1 1 3 2 1 1 1 1 1 1 2
## [5781] 2 1 1 1 1 1 2 1 1 2 1 1 1 1 3 1 2 2 1 1 1 2 1 1 3 3 1 3 1 1 1 1 3 1
## [5815] 2 3 1 2 1 1 1 1 1 2 1 1 3 1 1 1 2 3 3 1 1 2 3 1 2 3 2 1 2 1 1 1 1 3
## [5849] 1 1 2 2 3 1 1 1 2 1 1 1 3 3 1 1 1 2 3 1 1 1 1 1 1 1 1 1 1 1 1 1 2 3
## [5883] 1 3 1 1 1 2 1 1 3 1 1 2 1 1 1 1 1 1 1 1 1 2 1 1 1 1 1 2 1 1 1 2 1 1
## [5917] 1 1 1 2 1 1 1 1 1 1 2 1 1 3 1 1 1 2 2 3 3 3 1 1 1 1 2 1 2 3 1 1 2 1
## [5951] 1 1 2 1 1 1 1 1 1 1 1 1 1 1 3 3 1 1 2 1 2 1 3 1 1 1 1 2 1 1 1 3 2 1
## [5985] 1 1 1 1 2 2 2 1 1 2 2 1 1 3 1 2 3 1 1 1 2 1 1 1 1 3 1 1 2 1 1 2 1 1
## [6019] 1 1 1 3 1 1 1 1 1 3 3 1 1 1 3 3 3 1 1 2 3 1 3 3 1 3 1 2 2 2 1 3 3 1
## [6053] 1 2 1 3 1 1 1 1 3 1 1 3 1 3 1 2 1 1 1 1 1 2 2 1 3 1 2 2 1 1 1 1 3 1
## [6087] 1 1 1 1 3 3 1 2 1 1 3 1 1 1 1 1 2 1 1 1 2 1 1 3 1 1 2 3 1 1 2 2 1 2
## [6121] 2 3 1 2 2 1 2 1 3 2 1 3 2 1 3 1 1 1 2 1 2 3 2 2 2 1 2 1 1 1 1 3 1 3
## [6155] 1 3 1 1 2 1 1 2 1 1 1 2 2 2 1 1 1 1 2 1 3 1 3 1 1 1 1 2 1 1 3 1 1 1
## [6189] 1 1 1 1 1 1 1 1 1 1 3 1 2 1 1 1 1 2 3 1 2 1 1 2 2 1 1 2 3 1 1 2 1 1
## [6223] 1 2 1 1 1 1 2 1 3 1 1 1 1 1 1 1 1 1 2 2 1 1 2 1 1 2 1 3 1 2 1 1 1 2
## [6257] 1 1 3 2 1 1 1 1 3 1 2 1 1 1 1 1 2 3 1 3 1 1 1 3 1 1 2 1 3 2 1 3 2 1
## [6291] 3 1 3 2 2 1 3 3 1 1 1 1 1 1 3 3 1 1 2 1 1 1 1 1 1 1 1 1 2 3 3 1 1 3
## [6325] 2 1 2 1 1 3 2 3 1 2 3 1 3 1 1 1 1 2 1 1 2 1 2 1 1 3 1 2 3 1 1 2 1 1
## [6359] 1 1 2 2 1 1 2 1 1 1 2 1 1 2 1 1 1 1 1 1 2 1 3 1 2 1 1 1 1 1 1 1 3 1
## [6393] 1 2 1 1 1 2 1 1 1 1 1 1 3 2 2 1 2 1 1 2 2 1 1 1 1 2 1 2 1 2 2 3 3 1
## [6427] 3 1 3 3 2 3 1 2 1 3 1 1 3 1 3 3 1 3 2 2 1 1 1 1 1 2 1 3 1 2 1 1 1 3
## [6461] 1 1 1 1 1 1 1 1 1 1 1 1 1 1 2 1 1 2 1 2 3 1 1 1 2 2 1 1 2 1 2 1 1 1
## [6495] 2 1 3 1 1 3 1 1 1 1 1 1 1 1 1 1 1 1 3 1 1 1 3 1 1 1 1 2 1 1 1 1 1 1
## [6529] 1 1 1 1 1 1 1 1 1 1 2 1 1 1 1 1 1 1 1 1 1 1 2 1 1 1 3 1 1 2 1 1 3 1
## [6563] 1 1 2 1 2 1 2 1 1 1 1 1 1 1 2 3 1 1 1 3 1 1 1 1 2 3 1 2 1 1 1 2 1 1
## [6597] 2 1 2 1 2 2 1 3 1 1 1 1 1 1 1 1 2 1 1 1 3 1 1 1 1 1 2 3 3 1 3 3 1 3
## [6631] 1 1 3 2 1 3 1 1 1 1 3 1 1 1 1 1 1 3 1 1 3 1 1 2 2 1 1 1 1 1 1 1 1 1
## [6665] 1 1 3 2 1 1 3 1 1 1 1 1 2 1 3 2 1 2 2 2 1 1 3 1 1 3 1 1 1 3 1 1 2 1
## [6699] 1 1 1 2 2 1 3 1 1 2 1 3 1 1 1 3 1 1 2 1 2 1 1 1 1 1 1 1 2 1 2 1 1 1
## [6733] 1 1 1 1 1 1 1 1 1 1 2 2 1 3 3 3 1 3 3 1 1 1 1 1 2 1 2 2 1 1 1 1 1 1
## [6767] 1 2 1 1 1 1 1 1 1 1 1 1 1 1 1 2 1 1 1 2 2 1 3 1 1 1 2 1 1 1 1 1 3 1
## [6801] 1 1 1 1 1 1 2 1 1 1 1 1 1 1 1 1 1 1 3 1 1 1 2 1 1 1 2 3 1 2 1 1 1 1
## [6835] 1 3 1 1 2 1 1 1 1 1 1 1 1 1 1 1 1 2 1 1 1 1 1 3 2 1 1 3 1 2 2 1 1 1
## [6869] 2 1 1 1 2 1 1 1 1 1 1 1 1 1 1 2 2 1 1 1 1 2 1 1 1 2 1 3 1 1 2 1 2 1
## [6903] 1 1 1 2 3 1 1 2 2 1 1 1 2 1 3 1 1 1 1 1 1 1 1 1 2 1 1 3 1 1 1 1 1 1
## [6937] 1 1 1 1 1 2 1 2 3 1 1 1 1 2 1 2 1 1 1 1 2 1 1 2 1 1 1 2 2 2 1 1 1 2
## [6971] 1 1 1 1 1 1 1 1 3 1 1 1 1 1 1 1 3 1 1 1 3 1 3 1 1 2 1 2 1 1 1 1 1 1
## [7005] 1 1 1 1 3 1 1 2 1 3 1 2 1 1 1 1 1 2 1 1 1 1 1 2 1 1 3 1 1 1 1 1 1 1
## [7039] 1 1 2 2 1 1 1 1 3 1 1 1 2 1 1 1 1 1 1 1 1 1 1 1 1 1 2 1 1 1 1 1 1 1
## [7073] 1 2 1 1 1 2 3 1 2 1 1 1 1 1 2 1 1 1 1 1 1 1 1 1 1 1 3 1 1 1 1 1 1 1
## [7107] 1 1 1 1 1 3 1 1 1 1 1 1 3 1 1 1 3 1 3 1 1 3 1 1 1 1 2 2 1 1 1 1 1 1
## [7141] 1 1 1 1 1 1 1 1 1 1 3 1 3 1 1 1 2 2 1 1 1 2 2 1 1 2 1 1 1 1 1 1 1 1
## [7175] 1 3 1 1 2 1 1 2 3 1 3 1 1 1 1 1 1 1 2 2 1 2 1 1 2 2 2 1 2 1 2 1 1 1
## [7209] 1 1 2 2 2 1 1 1 1 2 1 2 1 1 2 1 1 1 1 1 1 1 1 2 1 2 2 1 2 1 1 2 1 1
## [7243] 1 1 1 2 1 1 1 1 1 1 3 1 2 1 1 1 1 1 1 1 1 3 1 3 1 1 1 1 3 1 2 2 1 1
## [7277] 2 3 1 2 1 1 1 2 1 1 3 1 1 1 2 2 1 1 1 1 1 2 1 1 1 1 1 2 1 1 1 1 1 1
## [7311] 1 1 1 1 1 1 1 1 1 3 1 1 1 2 1 1 1 1 1 1 3 1 1 2 1 2 3 3 2 2 1 1 3 1
## [7345] 2 1 1 1 1 1 3 1 1 1 1 1 1 1 1 1 1 3 1 2 1 1 1 2 1 1 1 1 3 3 1 2 1 1
## [7379] 3 1 1 1 1 1 2 1 1 1 1 2 1 1 1 1 1 1 2 1 1 1 1 1 1 2 1 1 1 1 1 1 1 1
## [7413] 2 1 1 1 1 3 1 3 2 1 1 1 1 1 2 2 2 3 1 1 2 2 1 1 1 1 1 1 1 1 1 2 1 1
## [7447] 3 1 1 1 1 2 1 1 1 3 1 1 3 1 1 2 1 1 1 1 3 2 1 1 1 1 1 1 2 1 1 1 1 1
## [7481] 1 1 1 2 3 1 1 2 1 1 3 3 1 1 1 1 1 1 3 1 1 1 1 2 1 1 1 1 3 1 1 3 1 3
## [7515] 3 1 1 1 1 1 1 3 1 1 1 1 1 3 1 1 1 1 1 1 2 1 1 1 2 1 1 1 1 1 1 1 1 1
## [7549] 1 1 2 2 1 3 1 2 1 1 1 1 2 1 1 1 1 3 1 1 3 3 1 1 1 1 1 1 1 1 2 1 1 1
## [7583] 1 2 1 1 3 3 1 2 1 2 1 1 1 1 2 1 1 1 1 2 1 1 1 1 1 1 1 3 1 2 1 3 1 1
## [7617] 1 1 2 1 2 2 1 1 1 1 1 1 1 3 1 1 1 1 2 1 1 1 1 1 1 1 1 1 1 2 1 1 2 2
## [7651] 1 3 1 1 3 3 2 1 1 1 1 3 1 1 1 1 1 2 2 1 1 1 2 1 1 1 3 2 1 1 3 2 1 2
## [7685] 2 2 1 1 2 1 1 1 3 1 1 1 1 1 1 2 1 1 1 1 1 2 3 1 1 1 2 1 1 1 1 1 1 1
## [7719] 1 1 1 1 1 2 1 1 1 1 1 3 1 1 2 1 1 1 1 1 1 1 1 1 3 1 3 1 2 1 1 1 1 2
## [7753] 1 1 1 1 1 1 1 3 1 3 1 1 2 1 1 1 1 1 1 2 1 1 1 2 1 1 1 1 1 1 1 1 1 2
## [7787] 3 1 1 1 1 1 1 1 3 1 1 1 1 1 1 1 1 1 1 2 1 1 2 3 1 2 1 3 2 1 3 1 3 3
## [7821] 1 1 1 1 1 1 2 3 1 1 2 1 1 1 1 1 2 1 2 2 2 2 1 3 1 1 1 3 1 1 1 1 3 1
## [7855] 1 1 1 1 1 1 1 2 1 2 1 1 1 1 1 1 2 1 1 1 1 1 1 3 1 1 1 1 1 1 1 1 1 1
## [7889] 1 1 1 1 1 1 2 1 3 2 1 3 1 2 3 1 2 1 1 1 1 1 1 3 1 1 1 1 1 1 1 1 1 3
## [7923] 1 3 2 1 1 1 1 1 3 1 3 1 1 1 1 1 1 1 1 3 1 1 1 1 1 1 1 1 1 1 1 1 1 1
## [7957] 1 2 1 1 2 1 3 2 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 2 1 2 1 1
## [7991] 1 2 1 1 1 2 1 1 1 2 1 1 1 1 1 3 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 3 1 1
## [8025] 1 1 3 1 1 1 1 1 1 1 1 1 1 2 1 1 1 2 1 1 1 1 2 1 1 1 1 1 2 1 1 2 1 1
## [8059] 1 1 1 1 1 1 2 3 1 1 1 1 1 1 2 3 3 2 1 1 1 1 1 1 1 1 1 1 2 1 1 2 1 1
## [8093] 1 1 3 1 1 1 1 1 1 2 1 1 1 1 1 1 1 1 3 2 1 1 1 1 1 1 1 2 2 1 2 1 1 1
## [8127] 1 2 3 1 1 1 1 1 1 1 1 1 2 1 2 1 1 1 1 2 1 1 2 1 2 2 1 1 2 1 1 1 1 1
## [8161] 2 1 1 1 1 1 1 1 1 1 1 1 1 1 3 1 1 1 1 1 2 1 3 1 1 1 1 1 1 1 1 1 1 2
## [8195] 1 2 1 1 1 1 1 1 1 1 2 1 1 1 2 1 1 1 1 1 1 3 1 2 1 1 1 1 1 2 1 1 1 2
## [8229] 1 1 1 1 1 1 1 1 1 2 1 3 2 1 1 2 1 1 1 1 1 1 1 2 1 1 1 1 3 1 2 3 1 1
## [8263] 1 1 1 2 2 1 2 1 1 1 2 1 2 1 1 1 1 1 1 1 1 2 2 1 1 2 1 1 1 1 3 1 1 1
## [8297] 2 1 2 1 1 2 2 1 1 1 1 2 2 2 1 1 1 3 1 2 1 1 2 1 1 1 1 1 1 3 1 1 1 1
## [8331] 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 2 1 1 1 1 1 1 1 1 1 1 1 1 1
## [8365] 1 2 1 1 1 1 1 2 1 1 1 1 1 2 1 1 1 1 1 1 1 1 1 1 1 1 1 2 1 1 1 1 3 1
## [8399] 1 1 1 1 1 1 1 1 1 1 1 1 2 2 1 1 1 1 1 1 1 1 1 1 2 1 1 1 1 1 1 1 1 1
## [8433] 1 1 1 1 2 1 2 1 1 1 1 1 1 1 2 1 1 1 1 2 3 2 3 1 1 2 1 1 1 1 1 1 1 3
## [8467] 1 1 1 3 2 1 1 1 1 1 1 1 1 2 1 1 1 1 1 1 3 1 1 3 1 1 1 1 1 3 1 1 1 1
## [8501] 1 1 2 2 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 2 1 1 1 1
## [8535] 1 1 1 1 1 1 1 3 1 1 1 2 1 1 1 2 1 1 3 1 1 1 1 1 3 1 1 1 1 3 1 1 1 2
## [8569] 1 1 1 1 1 1 1 1 1 1 1 1 1 1 2 1 2 1 1 1 1 1 1 1 1 1 1 1 1 1 1 2 1 1
## [8603] 1 1 1 1 1 1 1 2 1 3 1 3 1 1 1 1 1 1 1 1 1 1 1 1 2 1 1 1 1 1 1 1 1 1
## [8637] 1 1 1 2 1 1 1 1 1 1 1 1 1 2 1 2 1 1 1 1 1 1 1 1 1 3 3 1 1 1 1 1 1 1
## [8671] 1 1 2 1 1 1 1 1 2 1 1 1 1 1 2 1 1 1 2 3 1 1 1 1 2 1 1 1 3 1 1 1 1 1
## [8705] 1 2 1 3 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 2 1 1 1 1 2 1 1 2 1 1 1 2 3
## [8739] 1 1 2 1 1 1 1 1 1 1 3 1 1 1 1 1 1 1 1 2 1 1 1 1 1 1 1 1 1 1 1 1 1 1
## [8773] 1 1 1 1 1 2 1 1 2 1 1 1 1 2 1 1 1 1 1 1 2 1 1 2 1 1 1 1 1 1 1 1 2 1
## [8807] 1 1 1 1 1 1 1 2 1 1 2 1 1 1 1 1 1 1 1 1 1 1 2 1 1 1 1 1 1 2 1 1 2 2
## [8841] 2 1 1 1 1 2 1 1 1 1 1 1 1 1 1 1 3 2 2 1 1 1 1 1 1 1 1 1 2 2 1 1 1 1
## [8875] 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 3 1 1 1 1 1 1 1 1 1 1
## [8909] 1 1 1 1 1 1 2 2 1 2 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 2
## [8943] 1 1 1 1 1 1 1 1
## 
## Within cluster sum of squares by cluster:
## [1] 50873.85 24147.69 27768.07
##  (between_SS / total_SS =  32.4 %)
## 
## Available components:
## 
## [1] "cluster"      "centers"      "totss"        "withinss"    
## [5] "tot.withinss" "betweenss"    "size"         "iter"        
## [9] "ifault"
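The components listed above can be read straight off the fitted object. Here is a minimal sketch, assuming a k-means fit on R's built-in iris measurements rather than the credit card data (the names `x` and `fit` are hypothetical stand-ins for the post's input and its `k` object). It shows where the between_SS / total_SS figure comes from:

```r
# Stand-in data: the built-in iris measurements, scaled (NOT the credit card data).
x <- scale(iris[, 1:4])

set.seed(123)                               # k-means starts from random centers
fit <- kmeans(x, centers = 3, nstart = 10)  # `fit` plays the role of `k` in the post

fit$size      # number of observations in each cluster
fit$centers   # one row of variable means per cluster
fit$withinss  # within-cluster sum of squares, one value per cluster

# The percentage printed by kmeans() is betweenss divided by totss.
ratio <- fit$betweenss / fit$totss
round(100 * ratio, 1)
```

The same expression applied to the post's object would be `k$betweenss / k$totss`. A higher ratio means the clusters account for more of the total variation in the data.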
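# Work on a copy so the original data frame stays unchanged
data <- creditCardData
# Attach each customer's cluster label from the k-means result
data$cluster <- k$cluster
head(data)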
##      BALANCE BALANCE_FREQUENCY PURCHASES ONEOFF_PURCHASES
## 1   40.90075          0.818182     95.40             0.00
## 2 3202.46742          0.909091      0.00             0.00
## 3 2495.14886          1.000000    773.17           773.17
## 4 1666.67054          0.636364   1499.00          1499.00
## 5  817.71434          1.000000     16.00            16.00
## 6 1809.82875          1.000000   1333.28             0.00
##   INSTALLMENTS_PURCHASES CASH_ADVANCE PURCHASES_FREQUENCY
## 1                  95.40        0.000            0.166667
## 2                   0.00     6442.945            0.000000
## 3                   0.00        0.000            1.000000
## 4                   0.00      205.788            0.083333
## 5                   0.00        0.000            0.083333
## 6                1333.28        0.000            0.666667
##   ONEOFF_PURCHASES_FREQUENCY PURCHASES_INSTALLMENTS_FREQUENCY
## 1                   0.000000                         0.083333
## 2                   0.000000                         0.000000
## 3                   1.000000                         0.000000
## 4                   0.083333                         0.000000
## 5                   0.083333                         0.000000
## 6                   0.000000                         0.583333
##   CASH_ADVANCE_FREQUENCY CASH_ADVANCE_TRX PURCHASES_TRX CREDIT_LIMIT
## 1               0.000000                0             2         1000
## 2               0.250000                4             0         7000
## 3               0.000000                0            12         7500
## 4               0.083333                1             1         7500
## 5               0.000000                0             1         1200
## 6               0.000000                0             8         1800
##    PAYMENTS MINIMUM_PAYMENTS PRC_FULL_PAYMENT TENURE cluster
## 1  201.8021         139.5098         0.000000     12       1
## 2 4103.0326        1072.3402         0.222222     12       2
## 3  622.0667         627.2848         0.000000     12       1
## 4    0.0000         312.3439         0.000000     12       1
## 5  678.3348         244.7912         0.000000     12       1
## 6 1400.0578        2407.2460         0.000000     12       1

From the above code, we got a summary of the clusters and their important values, and every customer now carries a cluster label.
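To view cluster-level values as a compact table instead of scanning the printed output, one option is `table()` together with `aggregate()`. This is only a sketch: the `toy` data frame below is a made-up stand-in with two of the post's column names, not the real credit card data.

```r
# Made-up stand-in for the post's `data` frame (illustrative values only).
toy <- data.frame(
  BALANCE   = c(40, 3200, 2500, 1660, 820, 1810, 5000, 6000, 100),
  PURCHASES = c(95,    0,  773, 1499,  16, 1333,  200,  300,  50),
  cluster   = c( 1,    2,    1,    1,   1,    1,    2,    3,   3)
)

# How many customers fall in each cluster
cluster_sizes <- table(toy$cluster)
print(cluster_sizes)

# Mean of each variable within each cluster
cluster_means <- aggregate(cbind(BALANCE, PURCHASES) ~ cluster,
                           data = toy, FUN = mean)
print(cluster_means)
```

With the real data, the same calls would use `data` in place of `toy` and list whichever variables are of interest inside `cbind()`.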

boxplot(data$BALANCE~data$cluster)


From this plot, we can see that cluster 2 holds people whose balance is mostly on the lower side, cluster 3 holds more people with a balance on the higher side, and cluster 1 holds people whose balance is in the middle range.

boxplot(data$PURCHASES~data$cluster)


In cluster 1, the purchase amounts are in the higher range. In cluster 2, they are more in the middle, and in cluster 3, they are on the lower side.

boxplot(data$PURCHASES_FREQUENCY~data$cluster)


In the same way, you can check the distribution of all variables on the basis of the cluster they are assigned to.
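Rather than typing one `boxplot()` call per variable, the check can be looped over every column. The sketch below assumes a made-up `toy` data frame standing in for the post's `data` (the values are illustrative only), and it also collects per-cluster medians so the plots can be read alongside numbers.

```r
# Made-up stand-in for the post's `data` frame (illustrative values only).
toy <- data.frame(
  BALANCE   = c(40, 3200, 2500, 1660, 820, 1810, 5000, 6000, 100),
  PURCHASES = c(95,    0,  773, 1499,  16, 1333,  200,  300,  50),
  cluster   = c( 1,    2,    1,    1,   1,    1,    2,    3,   3)
)

# Every column except the cluster label itself
vars <- setdiff(names(toy), "cluster")

# One boxplot per variable, split by cluster
for (v in vars) {
  boxplot(reformulate("cluster", response = v), data = toy,
          main = v, xlab = "cluster", ylab = v)
}

# Median of each variable within each cluster (rows = clusters, columns = variables)
medians <- sapply(vars, function(v) tapply(toy[[v]], toy$cluster, median))
print(medians)
```

With the real data, replacing `toy` with `data` should work as long as every remaining column is numeric.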

So guys, with this I conclude this post. Please stay tuned for more such interesting case studies.
