Logistic regression is used to analyze the relationship between a dichotomous dependent variable and one or more categorical or continuous independent variables. It specifies the likelihood of the response variable as a function of various predictors. The model is expressed as $\operatorname{logit}(p) = \beta_0 + \beta_1 x_1 + \dots + \beta_k x_k$, where $\beta$ refers to the parameters and $x_i$ represents the independent variables. The $\operatorname{logit}(p)$, or log of the odds, is defined as $\ln\left[\frac{p}{1-p}\right]$. It expresses the natural logarithm of the ratio between the probability that an event will occur, $p(Y=1)$, and the probability that an event will not occur, $p(Y=0)$.
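To make the definition concrete, here is a minimal sketch in base R that walks from a probability to the odds and then to the log-odds. The value of `p` is an arbitrary number picked for illustration, not something taken from a dataset.

```r
# Probability of the event (arbitrary value, for illustration only)
p <- 0.8

# Odds: probability the event occurs over the probability it does not
odds <- p / (1 - p)

# Logit (log-odds): natural logarithm of the odds
log_odds <- log(odds)

# Base R's qlogis() computes the same logit, and plogis() inverts it
same_as_qlogis <- isTRUE(all.equal(log_odds, qlogis(p)))
back_to_p <- plogis(log_odds)
```

Here `qlogis()` and `plogis()` are the logistic quantile and distribution functions from the stats package, which happen to be the logit and its inverse.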
The model's estimates, $\hat{\beta}$, express the relationship between the independent and dependent variables on a log-odds scale. A coefficient of $\beta_1$ would indicate that a one-unit difference in $x_1$ is associated with a change of $\beta_1$ in the log-odds of the occurrence of $Y$. To get a clearer understanding of the constant effect of a predictor on the likelihood that an outcome will occur, odds ratios can be calculated. This can be expressed as $\text{odds}(Y) = \exp(\beta_0 + \beta_1 x_1 + \dots + \beta_k x_k)$, which is the exponentiated form of the model; exponentiating a single coefficient, $\exp(\beta_1)$, gives the odds ratio for a one-unit difference in $x_1$. Alongside the odds ratio, it's often worth calculating predicted probabilities of $Y$ at specific values of key predictors. This is done through $p = \frac{1}{1 + \exp(-z)}$, where $z$ refers to the log-odds regression equation.
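As a rough illustration of how these three scales relate, the sketch below fits a model to simulated data and then moves from log-odds estimates to odds ratios to a predicted probability. Everything here is an assumption made for the example: the data are simulated, and the names `x`, `y`, and `fit`, along with the "true" coefficients, are hypothetical.

```r
set.seed(42)  # simulated, hypothetical data (not the GermanCredit dataset)
n <- 500
x <- rnorm(n)
# Assumed data-generating model for illustration: logit(p) = -0.5 + 1.2 * x
y <- rbinom(n, size = 1, prob = plogis(-0.5 + 1.2 * x))

fit <- glm(y ~ x, family = "binomial")

# Estimates on the log-odds scale
b <- coef(fit)

# exp(slope) is the odds ratio for a one-unit difference in x;
# exp(intercept) is the odds of y = 1 when x = 0
odds_ratios <- exp(b)

# Predicted probability at x = 1, computed by hand from z
z <- b[["(Intercept)"]] + b[["x"]] * 1
p_manual <- 1 / (1 + exp(-z))

# The same probability obtained from predict()
p_predict <- predict(fit, newdata = data.frame(x = 1), type = "response")
```

The hand-computed `p_manual` and the value returned by `predict(..., type = "response")` should agree, which is a handy sanity check when interpreting a fitted model.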
Using the GermanCredit dataset in the caret package, we will construct a logistic regression model to estimate the likelihood of a consumer being a good loan applicant based on a number of predictor variables.
    library(caret)
    data(GermanCredit)

    # split the data into training and testing datasets
    Train <- createDataPartition(GermanCredit$Class, p = 0.6, list = FALSE)
    training <- GermanCredit[Train, ]
    testing <- GermanCredit[-Train, ]

    # use glm to train the model on the training dataset;
    # make sure to set family to "binomial"
    mod_fit_one <- glm(Class ~ Age + ForeignWorker + Property.RealEstate +
                         Housing.Own + CreditHistory.Critical,
                       data = training, family = "binomial")

    summary(mod_fit_one)  # estimates
    exp(coef(mod_fit_one))  # odds ratios
    predict(mod_fit_one, newdata = testing, type = "response")  # predicted probabilities
Great, we’re all done, right? Not just yet. There are some critical questions that still remain. Is the model any good? How well does the model fit the data? Which predictors are most important? Are the predictions accurate? In the next post, I’ll provide an overview of how to evaluate logistic regression models in R.