• +91-9872993883
• +91-8283824812
• info@ris-ai.com

# Customer Behaviour Analysis using Python ¶

Project Objective: If any company or show-room manager have to find out whether their old or new customer want to buy there product (let say car here) how can they find out that. This can be achieved if they have data (Social_Network_Ads.csv) about customer salary age and other factor field(independent variables) to find out if customer will purchased(dependent variable) the car or not.

Here we use Logistic Regression to find out the prediction. We use Linear Regression , most commonly used when the data in question has binary output, so when it belongs to one class or another, or is either a 0 or 1. If we come to know in advance that this customer have chances to buy the product then marketing team can target them to achieve their salaes.

## Importing the libraries ¶

Firstly, we import necessary library(numpy, matplotlib and pandas) for this model.

In [1]:
import numpy as np
import matplotlib.pyplot as plt
import pandas as pd


#### Importing the dataset ¶

Now we read CSV file Social_Network_Ads.csv It contains data of 400 rows and 3 columns - "Age", "EstimatedSalary", "Purchased" The first 2 columns shows customer Age and Estimated Salary. And the last column i.e. Purchased have Binary Value 0 and 1 if the value is 0 then customer estimated to not buy the product and if value is 1 then customer is estimated to buy the product.

In [2]:
dataset = pd.read_csv('Social_Network_Ads.csv')
print(dataset)
dataset.shape
X = dataset.iloc[:, :-1].values  #which simply means take all rows and all columns except last one
y = dataset.iloc[:, -1].values   #which simply means take all rows and only columns with last column

     Age  EstimatedSalary  Purchased
0     19            19000          0
1     35            20000          0
2     26            43000          0
3     27            57000          0
4     19            76000          0
5     27            58000          0
6     27            84000          0
7     32           150000          1
8     25            33000          0
9     35            65000          0
10    26            80000          0
11    26            52000          0
12    20            86000          0
13    32            18000          0
14    18            82000          0
15    29            80000          0
16    47            25000          1
17    45            26000          1
18    46            28000          1
19    48            29000          1
20    45            22000          1
21    47            49000          1
22    48            41000          1
23    45            22000          1
24    46            23000          1
25    47            20000          1
26    49            28000          1
27    47            30000          1
28    29            43000          0
29    31            18000          0
..   ...              ...        ...
370   60            46000          1
371   60            83000          1
372   39            73000          0
373   59           130000          1
374   37            80000          0
375   46            32000          1
376   46            74000          0
377   42            53000          0
378   41            87000          1
379   58            23000          1
380   42            64000          0
381   48            33000          1
382   44           139000          1
383   49            28000          1
384   57            33000          1
385   56            60000          1
386   49            39000          1
387   39            71000          0
388   47            34000          1
389   48            35000          1
390   48            33000          1
391   47            23000          1
392   45            45000          1
393   60            42000          1
394   39            59000          0
395   46            41000          1
396   51            23000          1
397   50            20000          1
398   36            33000          0
399   49            36000          1

[400 rows x 3 columns]


#### Splitting the dataset into the Training set and Test set ¶

Next we have to split the dataset into training and testing. We will use the training dataset for training the model and then check the performance of the model on the test dataset.

For this we will use the train_test_split method from library model_selection We are providing a test_size of 0.25 (for better performance because here dataset contain large amount of data) which means test set will contain 100 observations and training set will contain 300 observations. The random_state=0 is required only if you want to compare your results with mine.

In [3]:
from sklearn.model_selection import train_test_split
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size = 0.25, random_state = 0)

In [4]:
print(X_train)  # X_train dataset containg 300 observation for training

[[    44  39000]
[    32 120000]
[    38  50000]
[    32 135000]
[    52  21000]
[    53 104000]
[    39  42000]
[    38  61000]
[    36  50000]
[    36  63000]
[    35  25000]
[    35  50000]
[    42  73000]
[    47  49000]
[    59  29000]
[    49  65000]
[    45 131000]
[    31  89000]
[    46  82000]
[    47  51000]
[    26  15000]
[    60 102000]
[    38 112000]
[    40 107000]
[    42  53000]
[    35  59000]
[    48  41000]
[    48 134000]
[    38 113000]
[    29 148000]
[    26  15000]
[    60  42000]
[    24  19000]
[    42 149000]
[    46  96000]
[    28  59000]
[    39  96000]
[    28  89000]
[    41  72000]
[    45  26000]
[    33  69000]
[    20  82000]
[    31  74000]
[    42  80000]
[    35  72000]
[    33 149000]
[    40  71000]
[    51 146000]
[    46  79000]
[    35  75000]
[    38  51000]
[    36  75000]
[    37  78000]
[    38  61000]
[    60 108000]
[    20  82000]
[    57  74000]
[    42  65000]
[    26  80000]
[    46 117000]
[    35  61000]
[    21  68000]
[    28  44000]
[    41  87000]
[    37  33000]
[    27  90000]
[    39  42000]
[    28 123000]
[    31 118000]
[    25  87000]
[    35  71000]
[    37  70000]
[    35  39000]
[    47  23000]
[    35 147000]
[    48 138000]
[    26  86000]
[    25  79000]
[    52 138000]
[    51  23000]
[    35  60000]
[    33 113000]
[    30 107000]
[    48  33000]
[    41  80000]
[    48  96000]
[    31  18000]
[    31  71000]
[    43 129000]
[    59  76000]
[    18  44000]
[    36 118000]
[    42  90000]
[    47  30000]
[    26  43000]
[    40  78000]
[    46  59000]
[    59  42000]
[    46  74000]
[    35  91000]
[    28  59000]
[    40  57000]
[    59 143000]
[    57  26000]
[    52  38000]
[    47 113000]
[    53 143000]
[    35  27000]
[    58 101000]
[    45  45000]
[    23  82000]
[    46  23000]
[    42  65000]
[    28  84000]
[    38  59000]
[    26  84000]
[    29  28000]
[    37  71000]
[    22  55000]
[    48  35000]
[    49  28000]
[    38  65000]
[    27  17000]
[    46  28000]
[    48 141000]
[    26  17000]
[    35  97000]
[    39  59000]
[    24  27000]
[    32  18000]
[    46  88000]
[    35  58000]
[    56  60000]
[    47  34000]
[    40  72000]
[    32 100000]
[    19  21000]
[    25  90000]
[    35  88000]
[    28  32000]
[    50  20000]
[    40  59000]
[    50  44000]
[    35  72000]
[    40 142000]
[    46  32000]
[    39  71000]
[    20  74000]
[    29  75000]
[    31  76000]
[    47  25000]
[    40  61000]
[    34 112000]
[    38  80000]
[    42  75000]
[    47  47000]
[    39  75000]
[    19  25000]
[    37  80000]
[    36  60000]
[    41  52000]
[    36 125000]
[    48  29000]
[    36 126000]
[    51 134000]
[    27  57000]
[    38  71000]
[    39  61000]
[    22  27000]
[    33  60000]
[    48  74000]
[    58  23000]
[    53  72000]
[    32 117000]
[    54  70000]
[    30  80000]
[    58  95000]
[    26  52000]
[    45  79000]
[    24  55000]
[    40  75000]
[    33  28000]
[    44 139000]
[    22  18000]
[    33  51000]
[    43 133000]
[    24  32000]
[    46  22000]
[    35  55000]
[    54 104000]
[    48 119000]
[    35  53000]
[    37 144000]
[    23  66000]
[    37 137000]
[    31  58000]
[    33  41000]
[    45  22000]
[    30  15000]
[    19  19000]
[    49  74000]
[    39 122000]
[    35  73000]
[    39  71000]
[    24  23000]
[    41  72000]
[    29  83000]
[    54  26000]
[    35  44000]
[    37  75000]
[    29  47000]
[    31  68000]
[    42  54000]
[    30 135000]
[    52 114000]
[    50  36000]
[    56 133000]
[    29  61000]
[    30  89000]
[    26  16000]
[    33  31000]
[    41  72000]
[    36  33000]
[    55 125000]
[    48 131000]
[    41  71000]
[    30  62000]
[    37  72000]
[    41  63000]
[    58  47000]
[    30 116000]
[    20  49000]
[    37  74000]
[    41  59000]
[    49  89000]
[    28  79000]
[    53  82000]
[    40  57000]
[    60  34000]
[    35 108000]
[    21  72000]
[    38  71000]
[    39 106000]
[    37  57000]
[    26  72000]
[    35  23000]
[    54 108000]
[    30  17000]
[    39 134000]
[    29  43000]
[    33  43000]
[    35  38000]
[    41  45000]
[    41  72000]
[    39 134000]
[    27 137000]
[    21  16000]
[    26  32000]
[    31  66000]
[    39  73000]
[    41  79000]
[    47  50000]
[    41  30000]
[    37  93000]
[    60  46000]
[    25  22000]
[    28  37000]
[    38  55000]
[    36  54000]
[    20  36000]
[    56 104000]
[    40  57000]
[    42 108000]
[    20  23000]
[    40  65000]
[    47  20000]
[    18  86000]
[    35  79000]
[    57  33000]
[    34  72000]
[    49  39000]
[    27  31000]
[    19  70000]
[    39  79000]
[    26  81000]
[    25  80000]
[    28  85000]
[    55  39000]
[    50  88000]
[    49  88000]
[    52 150000]
[    35  65000]
[    42  54000]
[    34  43000]
[    37  52000]
[    48  30000]
[    29  43000]
[    36  52000]
[    27  54000]
[    26 118000]]

In [5]:
print(y_train)   # y_train dataset containg 300 observation for training

[0 1 0 1 1 1 0 0 0 0 0 0 1 1 1 0 1 0 0 1 0 1 0 1 0 0 1 1 1 1 0 1 0 1 0 0 1
0 0 1 0 0 0 0 0 1 1 1 1 0 0 0 1 0 1 0 1 0 0 1 0 0 0 1 0 0 0 1 1 0 0 1 0 1
1 1 0 0 1 1 0 0 1 1 0 1 0 0 1 1 0 1 1 1 0 0 0 0 0 1 0 0 1 1 1 1 1 0 1 1 0
1 0 0 0 0 0 0 0 1 1 0 0 1 0 0 1 0 0 0 1 0 1 1 0 1 0 0 0 0 1 0 0 0 1 1 0 0
0 0 1 0 1 0 0 0 1 0 0 0 0 1 1 1 0 0 0 0 0 0 1 1 1 1 1 0 1 0 0 0 0 0 1 0 0
0 0 0 0 1 1 0 1 0 1 0 0 1 0 0 0 1 0 0 0 0 0 1 0 0 0 0 0 1 0 1 1 0 0 0 0 0
0 1 1 0 0 0 0 1 0 0 0 0 1 0 1 0 1 0 0 0 1 0 0 0 1 0 1 0 0 0 0 0 1 1 0 0 0
0 0 1 0 1 1 0 0 0 0 0 1 0 1 0 0 1 0 0 1 0 1 0 0 0 0 0 0 1 1 1 1 0 0 0 0 1
0 0 0 0]

In [6]:
print(X_test)  # X_test dataset containg 100 observation for testing

[[    30  87000]
[    38  50000]
[    35  75000]
[    30  79000]
[    35  50000]
[    27  20000]
[    31  15000]
[    36 144000]
[    18  68000]
[    47  43000]
[    30  49000]
[    28  55000]
[    37  55000]
[    39  77000]
[    20  86000]
[    32 117000]
[    37  77000]
[    19  85000]
[    55 130000]
[    35  22000]
[    35  47000]
[    47 144000]
[    41  51000]
[    47 105000]
[    23  28000]
[    49 141000]
[    28  87000]
[    29  80000]
[    37  62000]
[    32  86000]
[    21  88000]
[    37  79000]
[    57  60000]
[    37  53000]
[    24  58000]
[    18  52000]
[    22  81000]
[    34  43000]
[    31  34000]
[    49  36000]
[    27  88000]
[    41  52000]
[    27  84000]
[    35  20000]
[    43 112000]
[    27  58000]
[    37  80000]
[    52  90000]
[    26  30000]
[    49  86000]
[    57 122000]
[    34  25000]
[    35  57000]
[    34 115000]
[    59  88000]
[    45  32000]
[    29  83000]
[    26  80000]
[    49  28000]
[    23  20000]
[    32  18000]
[    60  42000]
[    19  76000]
[    36  99000]
[    19  26000]
[    60  83000]
[    24  89000]
[    27  58000]
[    40  47000]
[    42  70000]
[    32 150000]
[    35  77000]
[    22  63000]
[    45  22000]
[    27  89000]
[    18  82000]
[    42  79000]
[    40  60000]
[    53  34000]
[    47 107000]
[    58 144000]
[    59  83000]
[    24  55000]
[    26  35000]
[    58  38000]
[    42  80000]
[    40  75000]
[    59 130000]
[    46  41000]
[    41  60000]
[    42  64000]
[    37 146000]
[    23  48000]
[    25  33000]
[    24  84000]
[    27  96000]
[    23  63000]
[    48  33000]
[    48  90000]
[    42 104000]]

In [7]:
print(y_test)  # y_test dataset containg 100 observation for testing

[0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 1 0 0 1 0 1 0 1 0 0 0 0 0 1 1 0 0 0 0
0 0 1 0 0 0 0 1 0 0 1 0 1 1 0 0 0 1 1 0 0 1 0 0 1 0 1 0 1 0 0 0 0 1 0 0 1
0 0 0 0 1 1 1 0 0 0 1 1 0 1 1 0 0 1 0 0 0 1 0 1 1 1]


#### Feature Scaling¶

Feature Scaling refers to putting the values in the same range or same scale so that no variable is dominated by the other. Here we can skip feature scaling as it is not nessecary but we are using it to improve traing performance for final prediction as it convert raw data to in its own encoded form so that machine can improve training the data.

We are using StandardScaler from the library sklearn.preprocessing.We create an object of the StandardScaler class and call the fit method passing the X_train and X_test.

In [8]:
from sklearn.preprocessing import StandardScaler
sc = StandardScaler()
X_train = sc.fit_transform(X_train)
X_test = sc.transform(X_test)

In [9]:
print(X_train)  # X_train dataset containg 300 observation for training after Feature Scaling the data

[[ 0.58164944 -0.88670699]
[-0.60673761  1.46173768]
[-0.01254409 -0.5677824 ]
[-0.60673761  1.89663484]
[ 1.37390747 -1.40858358]
[ 1.47293972  0.99784738]
[ 0.08648817 -0.79972756]
[-0.01254409 -0.24885782]
[-0.21060859 -0.5677824 ]
[-0.21060859 -0.19087153]
[-0.30964085 -1.29261101]
[-0.30964085 -0.5677824 ]
[ 0.38358493  0.09905991]
[ 0.8787462  -0.59677555]
[ 2.06713324 -1.17663843]
[ 1.07681071 -0.13288524]
[ 0.68068169  1.78066227]
[-0.70576986  0.56295021]
[ 0.77971394  0.35999821]
[ 0.8787462  -0.53878926]
[-1.20093113 -1.58254245]
[ 2.1661655   0.93986109]
[-0.01254409  1.22979253]
[ 0.18552042  1.08482681]
[ 0.38358493 -0.48080297]
[-0.30964085 -0.30684411]
[ 0.97777845 -0.8287207 ]
[ 0.97777845  1.8676417 ]
[-0.01254409  1.25878567]
[-0.90383437  2.27354572]
[-1.20093113 -1.58254245]
[ 2.1661655  -0.79972756]
[-1.39899564 -1.46656987]
[ 0.38358493  2.30253886]
[ 0.77971394  0.76590222]
[-1.00286662 -0.30684411]
[ 0.08648817  0.76590222]
[-1.00286662  0.56295021]
[ 0.28455268  0.07006676]
[ 0.68068169 -1.26361786]
[-0.50770535 -0.01691267]
[-1.79512465  0.35999821]
[-0.70576986  0.12805305]
[ 0.38358493  0.30201192]
[-0.30964085  0.07006676]
[-0.50770535  2.30253886]
[ 0.18552042  0.04107362]
[ 1.27487521  2.21555943]
[ 0.77971394  0.27301877]
[-0.30964085  0.1570462 ]
[-0.01254409 -0.53878926]
[-0.21060859  0.1570462 ]
[-0.11157634  0.24402563]
[-0.01254409 -0.24885782]
[ 2.1661655   1.11381995]
[-1.79512465  0.35999821]
[ 1.86906873  0.12805305]
[ 0.38358493 -0.13288524]
[-1.20093113  0.30201192]
[ 0.77971394  1.37475825]
[-0.30964085 -0.24885782]
[-1.6960924  -0.04590581]
[-1.00286662 -0.74174127]
[ 0.28455268  0.50496393]
[-0.11157634 -1.06066585]
[-1.10189888  0.59194336]
[ 0.08648817 -0.79972756]
[-1.00286662  1.54871711]
[-0.70576986  1.40375139]
[-1.29996338  0.50496393]
[-0.30964085  0.04107362]
[-0.11157634  0.01208048]
[-0.30964085 -0.88670699]
[ 0.8787462  -1.3505973 ]
[-0.30964085  2.24455257]
[ 0.97777845  1.98361427]
[-1.20093113  0.47597078]
[-1.29996338  0.27301877]
[ 1.37390747  1.98361427]
[ 1.27487521 -1.3505973 ]
[-0.30964085 -0.27785096]
[-0.50770535  1.25878567]
[-0.80480212  1.08482681]
[ 0.97777845 -1.06066585]
[ 0.28455268  0.30201192]
[ 0.97777845  0.76590222]
[-0.70576986 -1.49556302]
[-0.70576986  0.04107362]
[ 0.48261718  1.72267598]
[ 2.06713324  0.18603934]
[-1.99318916 -0.74174127]
[-0.21060859  1.40375139]
[ 0.38358493  0.59194336]
[ 0.8787462  -1.14764529]
[-1.20093113 -0.77073441]
[ 0.18552042  0.24402563]
[ 0.77971394 -0.30684411]
[ 2.06713324 -0.79972756]
[ 0.77971394  0.12805305]
[-0.30964085  0.6209365 ]
[-1.00286662 -0.30684411]
[ 0.18552042 -0.3648304 ]
[ 2.06713324  2.12857999]
[ 1.86906873 -1.26361786]
[ 1.37390747 -0.91570013]
[ 0.8787462   1.25878567]
[ 1.47293972  2.12857999]
[-0.30964085 -1.23462472]
[ 1.96810099  0.91086794]
[ 0.68068169 -0.71274813]
[-1.49802789  0.35999821]
[ 0.77971394 -1.3505973 ]
[ 0.38358493 -0.13288524]
[-1.00286662  0.41798449]
[-0.01254409 -0.30684411]
[-1.20093113  0.41798449]
[-0.90383437 -1.20563157]
[-0.11157634  0.04107362]
[-1.59706014 -0.42281668]
[ 0.97777845 -1.00267957]
[ 1.07681071 -1.20563157]
[-0.01254409 -0.13288524]
[-1.10189888 -1.52455616]
[ 0.77971394 -1.20563157]
[ 0.97777845  2.07059371]
[-1.20093113 -1.52455616]
[-0.30964085  0.79489537]
[ 0.08648817 -0.30684411]
[-1.39899564 -1.23462472]
[-0.60673761 -1.49556302]
[ 0.77971394  0.53395707]
[-0.30964085 -0.33583725]
[ 1.77003648 -0.27785096]
[ 0.8787462  -1.03167271]
[ 0.18552042  0.07006676]
[-0.60673761  0.8818748 ]
[-1.89415691 -1.40858358]
[-1.29996338  0.59194336]
[-0.30964085  0.53395707]
[-1.00286662 -1.089659  ]
[ 1.17584296 -1.43757673]
[ 0.18552042 -0.30684411]
[ 1.17584296 -0.74174127]
[-0.30964085  0.07006676]
[ 0.18552042  2.09958685]
[ 0.77971394 -1.089659  ]
[ 0.08648817  0.04107362]
[-1.79512465  0.12805305]
[-0.90383437  0.1570462 ]
[-0.70576986  0.18603934]
[ 0.8787462  -1.29261101]
[ 0.18552042 -0.24885782]
[-0.4086731   1.22979253]
[-0.01254409  0.30201192]
[ 0.38358493  0.1570462 ]
[ 0.8787462  -0.65476184]
[ 0.08648817  0.1570462 ]
[-1.89415691 -1.29261101]
[-0.11157634  0.30201192]
[-0.21060859 -0.27785096]
[ 0.28455268 -0.50979612]
[-0.21060859  1.6067034 ]
[ 0.97777845 -1.17663843]
[-0.21060859  1.63569655]
[ 1.27487521  1.8676417 ]
[-1.10189888 -0.3648304 ]
[-0.01254409  0.04107362]
[ 0.08648817 -0.24885782]
[-1.59706014 -1.23462472]
[-0.50770535 -0.27785096]
[ 0.97777845  0.12805305]
[ 1.96810099 -1.3505973 ]
[ 1.47293972  0.07006676]
[-0.60673761  1.37475825]
[ 1.57197197  0.01208048]
[-0.80480212  0.30201192]
[ 1.96810099  0.73690908]
[-1.20093113 -0.50979612]
[ 0.68068169  0.27301877]
[-1.39899564 -0.42281668]
[ 0.18552042  0.1570462 ]
[-0.50770535 -1.20563157]
[ 0.58164944  2.01260742]
[-1.59706014 -1.49556302]
[-0.50770535 -0.53878926]
[ 0.48261718  1.83864855]
[-1.39899564 -1.089659  ]
[ 0.77971394 -1.37959044]
[-0.30964085 -0.42281668]
[ 1.57197197  0.99784738]
[ 0.97777845  1.43274454]
[-0.30964085 -0.48080297]
[-0.11157634  2.15757314]
[-1.49802789 -0.1038921 ]
[-0.11157634  1.95462113]
[-0.70576986 -0.33583725]
[-0.50770535 -0.8287207 ]
[ 0.68068169 -1.37959044]
[-0.80480212 -1.58254245]
[-1.89415691 -1.46656987]
[ 1.07681071  0.12805305]
[ 0.08648817  1.51972397]
[-0.30964085  0.09905991]
[ 0.08648817  0.04107362]
[-1.39899564 -1.3505973 ]
[ 0.28455268  0.07006676]
[-0.90383437  0.38899135]
[ 1.57197197 -1.26361786]
[-0.30964085 -0.74174127]
[-0.11157634  0.1570462 ]
[-0.90383437 -0.65476184]
[-0.70576986 -0.04590581]
[ 0.38358493 -0.45180983]
[-0.80480212  1.89663484]
[ 1.37390747  1.28777882]
[ 1.17584296 -0.97368642]
[ 1.77003648  1.83864855]
[-0.90383437 -0.24885782]
[-0.80480212  0.56295021]
[-1.20093113 -1.5535493 ]
[-0.50770535 -1.11865214]
[ 0.28455268  0.07006676]
[-0.21060859 -1.06066585]
[ 1.67100423  1.6067034 ]
[ 0.97777845  1.78066227]
[ 0.28455268  0.04107362]
[-0.80480212 -0.21986468]
[-0.11157634  0.07006676]
[ 0.28455268 -0.19087153]
[ 1.96810099 -0.65476184]
[-0.80480212  1.3457651 ]
[-1.79512465 -0.59677555]
[-0.11157634  0.12805305]
[ 0.28455268 -0.30684411]
[ 1.07681071  0.56295021]
[-1.00286662  0.27301877]
[ 1.47293972  0.35999821]
[ 0.18552042 -0.3648304 ]
[ 2.1661655  -1.03167271]
[-0.30964085  1.11381995]
[-1.6960924   0.07006676]
[-0.01254409  0.04107362]
[ 0.08648817  1.05583366]
[-0.11157634 -0.3648304 ]
[-1.20093113  0.07006676]
[-0.30964085 -1.3505973 ]
[ 1.57197197  1.11381995]
[-0.80480212 -1.52455616]
[ 0.08648817  1.8676417 ]
[-0.90383437 -0.77073441]
[-0.50770535 -0.77073441]
[-0.30964085 -0.91570013]
[ 0.28455268 -0.71274813]
[ 0.28455268  0.07006676]
[ 0.08648817  1.8676417 ]
[-1.10189888  1.95462113]
[-1.6960924  -1.5535493 ]
[-1.20093113 -1.089659  ]
[-0.70576986 -0.1038921 ]
[ 0.08648817  0.09905991]
[ 0.28455268  0.27301877]
[ 0.8787462  -0.5677824 ]
[ 0.28455268 -1.14764529]
[-0.11157634  0.67892279]
[ 2.1661655  -0.68375498]
[-1.29996338 -1.37959044]
[-1.00286662 -0.94469328]
[-0.01254409 -0.42281668]
[-0.21060859 -0.45180983]
[-1.79512465 -0.97368642]
[ 1.77003648  0.99784738]
[ 0.18552042 -0.3648304 ]
[ 0.38358493  1.11381995]
[-1.79512465 -1.3505973 ]
[ 0.18552042 -0.13288524]
[ 0.8787462  -1.43757673]
[-1.99318916  0.47597078]
[-0.30964085  0.27301877]
[ 1.86906873 -1.06066585]
[-0.4086731   0.07006676]
[ 1.07681071 -0.88670699]
[-1.10189888 -1.11865214]
[-1.89415691  0.01208048]
[ 0.08648817  0.27301877]
[-1.20093113  0.33100506]
[-1.29996338  0.30201192]
[-1.00286662  0.44697764]
[ 1.67100423 -0.88670699]
[ 1.17584296  0.53395707]
[ 1.07681071  0.53395707]
[ 1.37390747  2.331532  ]
[-0.30964085 -0.13288524]
[ 0.38358493 -0.45180983]
[-0.4086731  -0.77073441]
[-0.11157634 -0.50979612]
[ 0.97777845 -1.14764529]
[-0.90383437 -0.77073441]
[-0.21060859 -0.50979612]
[-1.10189888 -0.45180983]
[-1.20093113  1.40375139]]

In [10]:
print(X_test)  # X_test dataset containg 300 observation for testing after Feature Scaling the data

[[-0.80480212  0.50496393]
[-0.01254409 -0.5677824 ]
[-0.30964085  0.1570462 ]
[-0.80480212  0.27301877]
[-0.30964085 -0.5677824 ]
[-1.10189888 -1.43757673]
[-0.70576986 -1.58254245]
[-0.21060859  2.15757314]
[-1.99318916 -0.04590581]
[ 0.8787462  -0.77073441]
[-0.80480212 -0.59677555]
[-1.00286662 -0.42281668]
[-0.11157634 -0.42281668]
[ 0.08648817  0.21503249]
[-1.79512465  0.47597078]
[-0.60673761  1.37475825]
[-0.11157634  0.21503249]
[-1.89415691  0.44697764]
[ 1.67100423  1.75166912]
[-0.30964085 -1.37959044]
[-0.30964085 -0.65476184]
[ 0.8787462   2.15757314]
[ 0.28455268 -0.53878926]
[ 0.8787462   1.02684052]
[-1.49802789 -1.20563157]
[ 1.07681071  2.07059371]
[-1.00286662  0.50496393]
[-0.90383437  0.30201192]
[-0.11157634 -0.21986468]
[-0.60673761  0.47597078]
[-1.6960924   0.53395707]
[-0.11157634  0.27301877]
[ 1.86906873 -0.27785096]
[-0.11157634 -0.48080297]
[-1.39899564 -0.33583725]
[-1.99318916 -0.50979612]
[-1.59706014  0.33100506]
[-0.4086731  -0.77073441]
[-0.70576986 -1.03167271]
[ 1.07681071 -0.97368642]
[-1.10189888  0.53395707]
[ 0.28455268 -0.50979612]
[-1.10189888  0.41798449]
[-0.30964085 -1.43757673]
[ 0.48261718  1.22979253]
[-1.10189888 -0.33583725]
[-0.11157634  0.30201192]
[ 1.37390747  0.59194336]
[-1.20093113 -1.14764529]
[ 1.07681071  0.47597078]
[ 1.86906873  1.51972397]
[-0.4086731  -1.29261101]
[-0.30964085 -0.3648304 ]
[-0.4086731   1.31677196]
[ 2.06713324  0.53395707]
[ 0.68068169 -1.089659  ]
[-0.90383437  0.38899135]
[-1.20093113  0.30201192]
[ 1.07681071 -1.20563157]
[-1.49802789 -1.43757673]
[-0.60673761 -1.49556302]
[ 2.1661655  -0.79972756]
[-1.89415691  0.18603934]
[-0.21060859  0.85288166]
[-1.89415691 -1.26361786]
[ 2.1661655   0.38899135]
[-1.39899564  0.56295021]
[-1.10189888 -0.33583725]
[ 0.18552042 -0.65476184]
[ 0.38358493  0.01208048]
[-0.60673761  2.331532  ]
[-0.30964085  0.21503249]
[-1.59706014 -0.19087153]
[ 0.68068169 -1.37959044]
[-1.10189888  0.56295021]
[-1.99318916  0.35999821]
[ 0.38358493  0.27301877]
[ 0.18552042 -0.27785096]
[ 1.47293972 -1.03167271]
[ 0.8787462   1.08482681]
[ 1.96810099  2.15757314]
[ 2.06713324  0.38899135]
[-1.39899564 -0.42281668]
[-1.20093113 -1.00267957]
[ 1.96810099 -0.91570013]
[ 0.38358493  0.30201192]
[ 0.18552042  0.1570462 ]
[ 2.06713324  1.75166912]
[ 0.77971394 -0.8287207 ]
[ 0.28455268 -0.27785096]
[ 0.38358493 -0.16187839]
[-0.11157634  2.21555943]
[-1.49802789 -0.62576869]
[-1.29996338 -1.06066585]
[-1.39899564  0.41798449]
[-1.10189888  0.76590222]
[-1.49802789 -0.19087153]
[ 0.97777845 -1.06066585]
[ 0.97777845  0.59194336]
[ 0.38358493  0.99784738]]


# Customer Behaviour Analysis using Python ¶

#### Using Logistic Regression model on the Training set ¶

This is a very simple step. We will be using the LogisticRegression class from the library sklearn.linear_model. First we create an object of the LogisticRegression class and call the fit method passing the X_train(independent variable) and y_train(dependent variable) of traing set.

In [11]:
from sklearn.linear_model import LogisticRegression
classifier = LogisticRegression(random_state = 0)
classifier.fit(X_train, y_train)

Out[11]:
LogisticRegression(C=1.0, class_weight=None, dual=False, fit_intercept=True,
intercept_scaling=1, l1_ratio=None, max_iter=100,
multi_class='auto', n_jobs=None, penalty='l2',
random_state=0, solver='lbfgs', tol=0.0001, verbose=0,
warm_start=False)

#### Predicting a new result ¶

Here, we predict first row in dataset i.e. first customer haveing age 30 and estimated salary 87000. We use two Dimensional array [[]] because in LogisticRegression it aspect 2D array. Result is 0 which means customer will not buy the car.

In [12]:
print(classifier.predict(sc.transform([[30,87000]])))

[0]

##### Predicting the Test set results¶

Now, we have the y_pred which are the predicted values from our Model and y_test which are the actual values. Let us compare are see how well our model did. As you can see below - our basic model did pretty well.

In [13]:
y_pred = classifier.predict(X_test)
print(np.concatenate((y_pred.reshape(len(y_pred),1), y_test.reshape(len(y_test),1)),1))

[[0 0]
[0 0]
[0 0]
[0 0]
[0 0]
[0 0]
[0 0]
[1 1]
[0 0]
[1 0]
[0 0]
[0 0]
[0 0]
[0 0]
[0 0]
[0 0]
[0 0]
[0 0]
[1 1]
[0 0]
[0 0]
[1 1]
[0 0]
[1 1]
[0 0]
[1 1]
[0 0]
[0 0]
[0 0]
[0 0]
[0 0]
[0 1]
[1 1]
[0 0]
[0 0]
[0 0]
[0 0]
[0 0]
[0 0]
[1 1]
[0 0]
[0 0]
[0 0]
[0 0]
[1 1]
[0 0]
[0 0]
[1 1]
[0 0]
[1 1]
[1 1]
[0 0]
[0 0]
[0 0]
[1 1]
[0 1]
[0 0]
[0 0]
[0 1]
[0 0]
[0 0]
[1 1]
[0 0]
[0 1]
[0 0]
[1 1]
[0 0]
[0 0]
[0 0]
[0 0]
[1 1]
[0 0]
[0 0]
[0 1]
[0 0]
[0 0]
[1 0]
[0 0]
[1 1]
[1 1]
[1 1]
[1 0]
[0 0]
[0 0]
[1 1]
[1 1]
[0 0]
[1 1]
[0 1]
[0 0]
[0 0]
[1 1]
[0 0]
[0 0]
[0 0]
[0 1]
[0 0]
[0 1]
[1 1]
[1 1]]

###### Making the Confusion Matrix¶

We use Confusion Matrix to find out that how much prediction is correct or not (where we have done right or wrong) that is to find the accuracy via accuracy_score. We will be using confusion_matrix and accuracy_score from the library sklearn.metrics.

In the result we can see that the 65 is correct prediction on class 0 (not buy car) and 3 is incorrect on class 1 (will buy the car) in testing and 8 is incorrect on class 0 (not buy car) and 24 is correct in 1 (will buy the car) in prediction. 0.89 is the prediction i.e. 89% of prediction is correct.

In [14]:
from sklearn.metrics import confusion_matrix, accuracy_score
cm = confusion_matrix(y_test, y_pred)
print(cm)
accuracy_score(y_test, y_pred)

[[65  3]
[ 8 24]]

Out[14]:
0.89

Take input value from user

In [15]:
print("ENTER CUSTOMER AGE AND EstimatedSalary TO FIND OUT CUSTOMER WILL BUY CAR OR NOT:")

age=int(input("Age: "))
salary=int(input("EstimatedSalary: "))
result = classifier.predict(sc.transform([[age,salary]]))
print(result)
if result==[1]:
else:

ENTER CUSTOMER AGE AND EstimatedSalary TO FIND OUT CUSTOMER WILL BUY CAR OR NOT:
Age: 32
EstimatedSalary: 150000
[1]