Support Vector Machines
A Support Vector Machine (SVM) is a supervised Machine Learning Model that uses classification algorithms for two-group classification problems. After giving an SVM model sets of labeled training data for each category, they’re able to categorize new text.
At first approximation what SVMs do is to find a separating line(or hyperplane) between data of two classes. SVM is an algorithm that takes the data as an input and outputs a line that separates those classes if possible.SVM tries to make a decision boundary in such a way that the separation between the two classes(that street) is as wide as possible.
import pandas as pd from sklearn import svm from sklearn.model_selection import train_test_split from sklearn import metrics
*(Comma Separated Values file) is a type of plain text file that uses specific structuring to arrange tabular data Here is the link to download csv file ( )
dataset = pd.read_csv('Social_Network_Ads.csv')
x = dataset.iloc[:, [2, 3]].values # take features data and y = dataset.iloc[:, 4].values # target data
x_train, x_test, y_train, y_test = train_test_split(x, y, test_size=0.3)
model = svm.SVC(kernel='linear', C=100, gamma=1) model.fit(x_train, y_train) model.score(x_train, y_train) y_pred = model.predict(x_test) print(y_pred)
[0 0 0 0 0 1 0 0 0 0 1 0 1 0 0 0 0 0 0 0 0 1 0 0 0 1 1 0 1 0 0 1 0 0 0 1 1 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 1 0 0 1 0 0 0 1 1 1 1 1 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 1 0 0 0 1 0 0 0 0 1 1 1 0 0 0 1 0 1 0 0 0 0 0 1 0 0 0 1 1 0 0 0 0 0 0 0 1 0]
kernel parameters selects the type of hyperplane used to separate the data. Using ‘linear’ will use a linear hyperplane (a line in the case of 2D data). ‘rbf’ and ‘poly’ uses a non linear hyper-plane for linear data it always be ‘linear’ and for non linear data it always be ‘rbf’ or ‘poly’.
It controls the trade off between smooth decision boundary and classifying training points correctly. A large value of c means you will get more training points correctly.
It defines how far the influence of a single training example reaches. If it has a low value it means that every point has a far reach and conversely high value of gamma means that every point has close reach.
print("accuracy:", metrics.accuracy_score(y_test, y_pred))
A support vector machine allows you to classify data that’s linearly separable.
If it isn’t linearly separable, you can use the kernel trick to make it work.
However, for text classification it’s better to just stick to a linear kernel.
This makes the algorithm very suitable for text classification problems, where it’s common to have access to a dataset of at most a couple of thousands of tagged samples.