Difference between Regression and Classification
The main difference between regression and classification algorithms is that regression
algorithms predict continuous values such as salary or age, whereas classification
algorithms assign values to discrete classes such as Male or Female, True or False,
Spam or Not Spam.
Regression:
Before starting with regression, I first go through an overview: its definition, its
advantages, and the different regression techniques.
Definition: Regression analysis is a set of statistical processes for estimating the
relationships between a dependent variable and one or more independent variables. We use
regression analysis when we want to predict a continuous dependent variable from a number
of independent variables.
Advantages:
- Predict sales in the near and long term.
- Understand inventory levels.
- Understand supply and demand.
- Review and understand how different variables impact all of these things.
The regression method of forecasting examines the relationship between two kinds of
variables, known as the dependent and independent variables.
Types of Regression:
- Linear regression
- Logistic regression
- Ridge regression
- Polynomial regression
#### After this, I go through three types of regression: Linear, Logistic, and Polynomial.
Linear Regression comprises a predictor variable and a dependent variable related to each
other in a linear fashion. Logistic Regression uses a sigmoid curve to show the
relationship between the independent variables and the probability of a categorical
target, so despite its name it is used for classification. However, caution should be
exercised: logistic regression works best with large data sets that have an almost equal
occurrence of values in target variables.
Polynomial Regression models a non-linear dataset using a linear model. It adds powers of
the predictor as extra features, so the fitted curve is non-linear in the predictor while
the model stays linear in its coefficients. In that sense it works in a similar way to
multiple linear regression (which is just linear regression but with multiple independent
variables).
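To make the polynomial case concrete, here is a minimal sketch using NumPy. The data is synthetic (an assumption, generated from a known quadratic), and the names `X_poly`, `coeffs` and `sigmoid` are only illustrative. It shows that fitting a curve reduces to ordinary least squares once the powers of x are added as columns, and it also defines the sigmoid function that logistic regression relies on.

```python
import numpy as np

# Synthetic, noise-free data from a known quadratic: y = 1 + 2x + 3x^2 (assumption)
x = np.linspace(-2.0, 2.0, 21)
y = 1.0 + 2.0 * x + 3.0 * x**2

# Design matrix with columns [1, x, x^2]; the model is still linear in its coefficients
X_poly = np.vander(x, N=3, increasing=True)

# Ordinary least squares on the expanded features
coeffs, *_ = np.linalg.lstsq(X_poly, y, rcond=None)


def sigmoid(z):
    """Logistic function: maps any real score to a value between 0 and 1."""
    return 1.0 / (1.0 + np.exp(-z))
```

Because the data has no noise, the recovered coefficients should match the ones used to generate it, up to floating-point error.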
After that I perform Simple Linear Regression: simple linear regression is a linear
regression model with a single explanatory variable. That is, it concerns two-dimensional
sample points with one independent variable and one dependent variable (conventionally,
the x and y coordinates in a Cartesian coordinate system) and finds a linear function (a
non-vertical straight line) that, as accurately as possible, predicts the dependent
variable values as a function of the independent variable. The adjective simple refers
to the fact that the outcome variable is related to a single predictor.
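The fit described above can be sketched in plain Python with the closed-form least-squares solution. The function name `fit_simple_linear` and the sample points are assumptions chosen for illustration, not values from my notebook.

```python
def fit_simple_linear(xs, ys):
    """Return (slope, intercept) of the least-squares line through the points."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of cross-deviations of x and y, and sum of squared deviations of x
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Illustrative points that lie exactly on y = 2x + 1
xs = [1.0, 2.0, 3.0, 4.0, 5.0]
ys = [3.0, 5.0, 7.0, 9.0, 11.0]
slope, intercept = fit_simple_linear(xs, ys)
```

The division by `sxx` is the reason the fitted line can never be vertical: the x values must not all be equal.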
I perform Linear Regression on a simple dataset of random numbers and also calculate the
constraint value |yi - wi*xi|, where yi is the target value, wi is the coefficient and xi
is the predictor. I also calculate the max and min illustrative values, that is
wi*xi + eps (max illustrative) and wi*xi - eps (min illustrative).
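A rough sketch of this calculation with NumPy follows. The random data, the seed, the single coefficient `w` fitted through the origin, and the value `eps = 1.0` are all assumptions made for the example.

```python
import numpy as np

rng = np.random.default_rng(0)

# Random predictor and a noisy linear target (synthetic data, assumption)
x = rng.uniform(0.0, 10.0, size=50)
y = 2.5 * x + rng.normal(0.0, 1.0, size=50)

# Least-squares coefficient for a line through the origin: w = sum(x*y) / sum(x*x)
w = np.sum(x * y) / np.sum(x * x)

eps = 1.0  # width of the band around the fitted line (assumed value)

constraint = np.abs(y - w * x)  # |yi - w*xi| for every point
upper = w * x + eps             # max illustrative value
lower = w * x - eps             # min illustrative value
inside = constraint <= eps      # True where the point lies within the band
```

Points where `inside` is True fall between the two illustrative lines; the rest lie outside the band.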
After that I performed Support Vector Regression (SVR) on the Life Expectancy dataset to
predict life expectancy. Support vector machines are supervised learning models with
associated learning algorithms that analyze data for classification and regression
analysis. I used scikit-learn (sklearn) to import the SVM module, then analysed the
dataset to get information about it, applied support vector regression by fitting the
dataset values, and plotted different graphs to analyse the predicted values.
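The workflow above might look roughly like the sketch below. Since the Life Expectancy dataset is not reproduced here, it uses synthetic data with two made-up features as a stand-in. The kernel, `C` and `epsilon` values are assumptions, not the settings from my notebook.

```python
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVR

rng = np.random.default_rng(42)

# Synthetic stand-in for the real dataset: two hypothetical features and a target
X = rng.uniform(0.0, 10.0, size=(200, 2))
y = 50.0 + 2.0 * X[:, 0] + 1.0 * X[:, 1] + rng.normal(0.0, 0.5, size=200)

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0
)

# SVR is sensitive to feature scale, so standardise the features before fitting
model = make_pipeline(StandardScaler(), SVR(kernel="rbf", C=10.0, epsilon=0.1))
model.fit(X_train, y_train)

y_pred = model.predict(X_test)
r2 = model.score(X_test, y_test)  # coefficient of determination on held-out data
```

Plotting `y_test` against `y_pred` is one simple way to inspect the predicted values.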
After that I perform various types of regression on the same dataset, as follows:
- Random forest regression
- CatBoost regression
- XGBoost regression
- LightGBM regression
- Decision Tree
Before performing these types of regression, I go through the definition of each:
1. Random Forest Regression is a supervised learning algorithm that uses the ensemble
learning method for regression. Ensemble learning is a technique that combines
predictions from multiple machine learning algorithms to make a more accurate prediction
than a single model. A Random Forest operates by constructing several decision trees
during training time and outputting the mean of the individual trees' predictions as the
final prediction.
2. A decision tree builds regression or classification models in the form of a tree
structure. It breaks down a dataset into smaller and smaller subsets while at the same
time an associated decision tree is incrementally developed. The final result is a tree
with decision nodes and leaf nodes.
3. XGBoost is an efficient implementation of gradient boosting that can be used for
regression predictive modeling. Its strength as an approach for building supervised
regression models comes from its objective function, which combines a loss term with a
regularization term, and from its base learners, which are typically decision trees.
4. CatBoost is an open-source gradient boosting library from Yandex. It can integrate
with deep learning frameworks like Google’s TensorFlow and Apple’s Core ML. It can work
with diverse data types, including categorical features, to help solve a wide range of
problems that businesses face today, and it is often reported to give strong accuracy.
5. LightGBM extends the gradient boosting algorithm by adding a type of automatic feature
selection (Exclusive Feature Bundling) as well as focusing on boosting examples with
larger gradients (Gradient-based One-Side Sampling). This can result in a dramatic
speedup of training and improved predictive performance.
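The tree-based models defined above can be sketched with scikit-learn on synthetic data (an assumption, not the Life Expectancy dataset). XGBoost, CatBoost and LightGBM are separate packages, so this sketch uses scikit-learn's own `GradientBoostingRegressor` as a plain stand-in for the boosting family rather than any of those libraries. The last two lines check the random forest definition: its prediction is the mean of its trees' predictions.

```python
import numpy as np
from sklearn.ensemble import GradientBoostingRegressor, RandomForestRegressor
from sklearn.tree import DecisionTreeRegressor

rng = np.random.default_rng(0)

# Synthetic non-linear regression problem with three features (assumption)
X = rng.uniform(0.0, 10.0, size=(300, 3))
y = np.sin(X[:, 0]) + 0.5 * X[:, 1] + rng.normal(0.0, 0.1, size=300)

# A single decision tree, limited in depth to keep it small
tree = DecisionTreeRegressor(max_depth=4, random_state=0).fit(X, y)

# An ensemble of decision trees trained on bootstrap samples
forest = RandomForestRegressor(n_estimators=25, random_state=0).fit(X, y)

# Gradient boosting: trees are added one after another to correct earlier errors.
# This is a stand-in for XGBoost / CatBoost / LightGBM, not those libraries.
boosted = GradientBoostingRegressor(n_estimators=100, random_state=0).fit(X, y)

# The forest's prediction is the average of its individual trees' predictions
forest_pred = forest.predict(X[:5])
mean_of_trees = np.mean([t.predict(X[:5]) for t in forest.estimators_], axis=0)
```

All three estimators share the same `fit` and `predict` interface, so the same pattern extends to the other regressors in the list once their packages are installed.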
After that I perform various operations on the Life Expectancy dataset to predict life
expectancy in the coming years, apply the regression algorithms above to make the
predictions, and calculate the Mean Absolute Error and Mean Squared Error to check how
closely the predictions match the data.
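For reference, the two error measures can be computed as in the sketch below. The numbers are made up for illustration and are not results from the dataset.

```python
def mean_absolute_error(y_true, y_pred):
    """Average of the absolute differences between targets and predictions."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


def mean_squared_error(y_true, y_pred):
    """Average of the squared differences between targets and predictions."""
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)


# Illustrative values only (not taken from the Life Expectancy dataset)
y_true = [70.0, 65.0, 80.0, 75.0]
y_pred = [72.0, 64.0, 77.0, 75.0]

mae = mean_absolute_error(y_true, y_pred)  # (2 + 1 + 3 + 0) / 4 = 1.5
mse = mean_squared_error(y_true, y_pred)   # (4 + 1 + 9 + 0) / 4 = 3.5
```

MSE penalises large errors more heavily than MAE because each error is squared before averaging.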