Logistic Regression — Explained and Implemented
In this article, we’re going to learn about Logistic Regression (LR). The article is divided into two parts:
- Brief Explanation of Logistic Regression
- Implementation of Logistic Regression
If you are new to Logistic Regression, I’d suggest first going through a more detailed article: Logistic Regression Explained
Let’s get started.
1. Brief Explanation of Logistic Regression
Despite having the word ‘regression’ in its name, Logistic Regression is a classification algorithm, most often used for binary classification. It is named ‘Logistic Regression’ because, like Linear Regression, it fits a linear combination of the input features. The term “Logistic” comes from the logistic (sigmoid) function used in this method of classification; the Logit function is its inverse.
It is a binary classification technique, so the output variable is dichotomous in nature, i.e., it takes one of two mutually exclusive values, such as (0 or 1) or (yes or no).
This technique has applications such as spam filtering and checking for the presence of a disease.
Logistic regression performs well only when the features are actually related to the target, i.e., say
[x1, x2, x3 …… xn] are the features of our data, and
[y] is the target label of the data, then
features [x1, x2, x3 …… xn] that are strongly related to the target label [y] will give a much better model than features that have little or no relation to the target label [y].
In layman's terms: predicting the presence of diabetes from relevant features such as body temperature, sugar level, and the patient's age would give more accurate results than predicting it from the patient's age, location, and gender.
Traditionally, logistic regression is used for binary classification, but it can also be extended to multi-class classification.
Multinomial classification: It is used for multiclass/categorical classification, which is a non-traditional extension of the binary logistic regression case. This method uses the log of odds as the dependent variable.
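As a rough sketch of how the multinomial case can turn one linear score per class into class probabilities, the softmax function is commonly used. The scores below are made-up values for illustration, not output from a trained model.

```python
import math

def softmax(scores):
    """Convert per-class linear scores into probabilities that sum to 1."""
    # Subtracting the largest score avoids overflow and does not change the result.
    largest = max(scores)
    exps = [math.exp(s - largest) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

# Hypothetical scores for three classes (illustrative values only).
class_scores = [2.0, 1.0, 0.1]
probabilities = softmax(class_scores)

# The predicted class is the one with the highest probability.
predicted_class = probabilities.index(max(probabilities))
print(probabilities, predicted_class)
```

With only two classes, softmax reduces to the sigmoid function used in ordinary binary logistic regression.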
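LR models the logit (log of odds) of a binary event as a linear function of the features, and uses it to predict the probability that the event occurs.
It uses the sigmoid function, whose output always lies in the interval (0, 1):
p = 1 / (1 + e^(-y))
where (y) has a value of:
y = b0 + b1*x1 + b2*x2 + …… + bn*xn
Here b0 is the intercept and b1 …… bn are the coefficients learned for the features.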
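To make the relationship concrete, here is a minimal sketch of the sigmoid function p = 1 / (1 + e^(-y)), its inverse (the logit, or log of odds), and the linear score y computed from the features. The weights, bias, and feature values are hypothetical numbers chosen only for illustration.

```python
import math

def sigmoid(y):
    """Map a real number y to a probability strictly between 0 and 1.
    Fine for moderate values of y; very large negative y would overflow."""
    return 1.0 / (1.0 + math.exp(-y))

def logit(p):
    """Log of odds: the inverse of the sigmoid, defined for 0 < p < 1."""
    return math.log(p / (1.0 - p))

def linear_score(weights, bias, features):
    """Compute y = bias + w1*x1 + ... + wn*xn."""
    return bias + sum(w * x for w, x in zip(weights, features))

# Hypothetical model parameters and one input row (illustrative only).
weights = [0.8, -0.5]
bias = 0.1
features = [2.0, 1.0]

y = linear_score(weights, bias, features)  # 0.1 + 1.6 - 0.5 = 1.2
p = sigmoid(y)                             # roughly 0.77
print(y, p, logit(p))
```

Applying logit to the predicted probability recovers the linear score, which is why the log of odds is said to be linear in the features.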
Linear vs. Logistic
Linear regression is a regression technique and hence gives continuous output, i.e., the target values are not restricted to two classes. It is therefore a better choice for applications such as house price prediction, stock price prediction, etc.
Logistic regression, being a classification technique, gives a discrete output such as 0 or 1 (obtained by thresholding the predicted probability) to classify the input into one of the target labels. Therefore, LR is a better option for applications such as spam mail detection or cancer detection.
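The step from a predicted probability to a discrete label can be sketched as a simple threshold. The 0.5 cut-off is a common default rather than a fixed rule, and the probabilities below are invented for illustration.

```python
def classify(probability, threshold=0.5):
    """Return label 1 when the probability reaches the threshold, else 0."""
    return 1 if probability >= threshold else 0

# Hypothetical predicted probabilities that four emails are spam.
spam_probabilities = [0.92, 0.08, 0.51, 0.49]
labels = [classify(p) for p in spam_probabilities]
print(labels)  # [1, 0, 1, 0]
```

Lowering the threshold makes the classifier flag more inputs as positive, which can be useful when missing a positive case (for example, a disease) is costly.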
Advantages
- Low computational cost, widely used
Disadvantages
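- Not naturally suited to multi-class problems (it needs an extension such as the multinomial form), and it can’t solve non-linear problems because its decision boundary is linear.
- It will not perform well when the features have little relation to the target.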
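The limitation on non-linear problems can be illustrated with a small from-scratch sketch. It trains logistic regression by plain gradient descent on two toy problems: AND, which a straight line can separate, and XOR, which it cannot. The learning rate and epoch count are arbitrary choices for this illustration, not tuned values.

```python
import math

def sigmoid(y):
    return 1.0 / (1.0 + math.exp(-y))

def predict_proba(weights, bias, features):
    return sigmoid(bias + sum(w * x for w, x in zip(weights, features)))

def train(rows, labels, lr=0.5, epochs=2000):
    """Fit weights and bias by batch gradient descent on the log loss."""
    weights = [0.0] * len(rows[0])
    bias = 0.0
    for _ in range(epochs):
        grad_w = [0.0] * len(weights)
        grad_b = 0.0
        for features, label in zip(rows, labels):
            error = predict_proba(weights, bias, features) - label
            for j, x in enumerate(features):
                grad_w[j] += error * x
            grad_b += error
        # Step against the gradient averaged over all rows.
        weights = [w - lr * g / len(rows) for w, g in zip(weights, grad_w)]
        bias -= lr * grad_b / len(rows)
    return weights, bias

def accuracy(rows, labels, weights, bias):
    hits = 0
    for features, label in zip(rows, labels):
        predicted = 1 if predict_proba(weights, bias, features) >= 0.5 else 0
        hits += int(predicted == label)
    return hits / len(rows)

inputs = [[0, 0], [0, 1], [1, 0], [1, 1]]
and_labels = [0, 0, 0, 1]  # linearly separable
xor_labels = [0, 1, 1, 0]  # not linearly separable

and_acc = accuracy(inputs, and_labels, *train(inputs, and_labels))
xor_acc = accuracy(inputs, xor_labels, *train(inputs, xor_labels))
print(and_acc, xor_acc)
```

No straight line can classify more than three of the four XOR points correctly, so the XOR accuracy cannot exceed 0.75 however long the model is trained.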
2. Implementation
Here we implement Logistic Regression for diabetes detection, using features such as age, body mass, skinfold thickness, blood pressure, etc., to predict whether a person has diabetes (1) or not (0).
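Below is a minimal sketch of what such an implementation might look like with scikit-learn. It assumes synthetic stand-in data generated by make_classification rather than a real diabetes dataset, so the numbers it prints say nothing about actual patients. The choice of 8 features, the 80/20 split, and the scaling step are assumptions made for this illustration.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in for a table of patient features and a 0/1 diabetes label.
X, y = make_classification(
    n_samples=500, n_features=8, n_informative=5, n_redundant=1, random_state=42
)

# Hold out 20% of the rows to evaluate the model on unseen data.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42, stratify=y
)

# Standardize features using statistics from the training set only.
scaler = StandardScaler()
X_train_scaled = scaler.fit_transform(X_train)
X_test_scaled = scaler.transform(X_test)

# Fit the classifier and predict labels and probabilities for the test rows.
model = LogisticRegression(max_iter=1000)
model.fit(X_train_scaled, y_train)
predictions = model.predict(X_test_scaled)
positive_probabilities = model.predict_proba(X_test_scaled)[:, 1]

test_accuracy = accuracy_score(y_test, predictions)
print(f"Test accuracy: {test_accuracy:.3f}")
```

To use real data, X and y would be replaced with the feature columns and the diabetes label loaded from the dataset; the rest of the steps would stay the same.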
Conclusion
Logistic Regression is one of the most used binary classification techniques. Through this article, we briefly learned about logistic regression, its applications, and its advantages and disadvantages. Finally, we implemented LR using ‘scikit-learn’ to detect whether a person has diabetes or not.
You can reach out to me at:
LinkedIn: https://www.linkedin.com/in/manmohan-dogra-4a6655169/
GitHub: https://github.com/immohann
Twitter: https://twitter.com/immohann