Logistic Regression¶
- Logistic Regression is a supervised learning algorithm used primarily for binary classification tasks.
- Despite the name "regression," it is not used for predicting continuous values; instead, it predicts class probabilities (e.g., whether an email is spam or not).
Mathematical Notation¶
Linear combination of inputs¶
$z = \mathbf{w}^T \mathbf{x} + b$
Sigmoid Function¶
$\sigma(z) = \frac{1}{1 + e^{-z}}$
Optimization (Gradient Descent)¶
$\mathbf{w} := \mathbf{w} - \alpha \cdot \nabla_\mathbf{w} \mathcal{L}(\mathbf{w})$
$ y = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \beta_3 x_3 + \dots $
- $x_1$, $x_2$, $x_3$, …: independent variables
- $\beta_0$: intercept of the line
- $\beta_1$, $\beta_2$, …: coefficients
Negative Log-Likelihood (NLL) Loss Function¶
Binary classification
$\mathcal{L}_{\text{binary}} = - \left[ y \log(\hat{y}) + (1 - y) \log(1 - \hat{y}) \right]$
Multi-class classification
$\mathcal{L}_{\text{multi-class}} = - \log \hat{y}_c$
$\text{where } \hat{y}_c = \text{predicted probability of the correct class } c$
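Putting these pieces together, the following is a minimal NumPy sketch of one forward pass and one gradient-descent step for a single sample; the weights, feature values, and learning rate below are illustrative, not taken from the dataset:

import numpy as np

w = np.array([0.5, -0.2])  # weights w (illustrative)
b = 0.1                    # bias b
x = np.array([1.0, 2.0])   # one input sample x
y = 1                      # true label

z = w @ x + b                                              # z = w^T x + b
y_hat = 1 / (1 + np.exp(-z))                               # sigmoid(z)
loss = -(y * np.log(y_hat) + (1 - y) * np.log(1 - y_hat))  # binary NLL

# For the sigmoid + NLL pair, the gradient of the loss w.r.t. z simplifies to (y_hat - y)
alpha = 0.1                      # learning rate
w = w - alpha * (y_hat - y) * x  # w := w - alpha * grad_w L(w)
b = b - alpha * (y_hat - y)      # same update rule for the bias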
Dataset¶
- Download the dataset by clicking Social_Network_Ads.csv (10.93 KB)
- Sample notebook code from Kaggle
- Sample code from the textbook
- Or just use the file (07_1_social_network_ads.csv) provided by Professor Giseop Noh
Practice Steps¶
0. Import libraries and packages
1. (Data) Load data and perform feature scaling
2. (Data) Split data into training and testing sets
3. (Model) Build the model
4. (Model Training) Train the model
5. (Model Evaluation) Evaluate the model's performance
6. (Result) Visualize the model's operation
Logistic Regression Model Using scikit-learn Library¶
0. Import necessary libraries and packages¶
In [66]:
# Import numpy, matplotlib, pandas as np, plt, pd respectively.
# Import scikit-learn (sklearn) and the necessary packages.
import sklearn.metrics # imported explicitly so sklearn.metrics.accuracy_score is available below
import numpy as np
import matplotlib.pyplot as plt
from matplotlib.colors import ListedColormap
import pandas as pd
from sklearn.preprocessing import StandardScaler
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LogisticRegression
1. Load data and perform feature scaling¶
In [67]:
# Load the dataset (Social_Network_Ads.csv) and standardize it through feature scaling.
dataset = pd.read_csv('07_1_social_network_ads.csv')
X = dataset.iloc[:, [2, 3]].values
y = dataset.iloc[:, 4].values
# Check shape
print(X.shape, y.shape)
dataset
(400, 2) (400,)
Out[67]:
|   | User ID | Gender | Age | EstimatedSalary | Purchased |
|---|---|---|---|---|---|
| 0 | 15624510 | Male | 19 | 19000 | 0 |
| 1 | 15810944 | Male | 35 | 20000 | 0 |
| 2 | 15668575 | Female | 26 | 43000 | 0 |
| 3 | 15603246 | Female | 27 | 57000 | 0 |
| 4 | 15804002 | Male | 19 | 76000 | 0 |
| ... | ... | ... | ... | ... | ... |
| 395 | 15691863 | Female | 46 | 41000 | 1 |
| 396 | 15706071 | Male | 51 | 23000 | 1 |
| 397 | 15654296 | Female | 50 | 20000 | 1 |
| 398 | 15755018 | Male | 36 | 33000 | 0 |
| 399 | 15594041 | Female | 49 | 36000 | 1 |
400 rows × 5 columns
In [68]:
# Transform the entire dataset X into a standard normal distribution using mean and variance
sc = StandardScaler()
X = sc.fit_transform(X)
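After fit_transform, each column of X should have approximately zero mean and unit variance; a quick optional sanity check:

print(X.mean(axis=0)) # expected: approximately [0, 0]
print(X.std(axis=0))  # expected: approximately [1, 1]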
2. Split data into training and testing sets¶
In [69]:
# Split X (feature vector) and y (actual values) into training and testing datasets.
# Use test_size to set the ratio of training and testing datasets.
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size = 0.25)
print(f'X_train: {X_train.shape}')
print(f'X_test: {X_test.shape}')
print(f'y_train: {y_train.shape}')
print(f'y_test: {y_test.shape}')
X_train: (300, 2)
X_test: (100, 2)
y_train: (300,)
y_test: (100,)
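One caveat: train_test_split shuffles the data with a fresh random seed each run, so the accuracies reported below can vary slightly between executions. Passing scikit-learn's random_state parameter makes the split reproducible, for example:

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size = 0.25, random_state = 0) # fixed seed for reproducibility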
3. Build the model¶
In [70]:
# Load the logistic regression model from the scikit-learn library.
classifier = LogisticRegression()
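LogisticRegression is constructed here with its defaults; if you want to experiment, common constructor arguments include the inverse regularization strength C, the solver, and max_iter. The values below are scikit-learn's defaults, so this line is equivalent to the one above:

classifier = LogisticRegression(C = 1.0, solver = 'lbfgs', max_iter = 100) # same as LogisticRegression()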
4. Train the model¶
In [71]:
# Train the model using the fit function.
# The fit function implements 1) weight initialization, 2) loss function calculation, and 3) weight update.
classifier.fit(X_train, y_train) # Only training data is used during the training phase
w_1 = classifier.coef_
w_0 = classifier.intercept_
print(w_1)
print(w_0)
[[2.31551264 1.14039584]] [-1.09602631]
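Because sigmoid(z) >= 0.5 exactly when z >= 0, the learned decision boundary is the line where w1·x1 + w2·x2 + w0 = 0. As a sanity check, thresholding the linear score should reproduce predict:

z = X_test @ classifier.coef_.ravel() + classifier.intercept_[0] # linear scores w^T x + b
manual_pred = (z >= 0).astype(int)                               # threshold at z = 0 (i.e., probability 0.5)
print((manual_pred == classifier.predict(X_test)).all())         # expected: True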
5. Evaluate the model's performance¶
In [72]:
# Extract predictions obtained through logistic regression using the predict function and evaluate them with the test data.
y_pred = classifier.predict(X_test) # Only test data is used during the prediction phase
result = sklearn.metrics.accuracy_score(y_test, y_pred) # Model accuracy
print(f'Accuracy: {result}')
Accuracy: 0.82
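Accuracy alone can mask which class the errors come from; scikit-learn's confusion_matrix gives a per-class breakdown of the same predictions:

from sklearn.metrics import confusion_matrix
print(confusion_matrix(y_test, y_pred)) # rows: actual classes, columns: predicted classes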
6. Visualize the model's operation¶
In [73]:
# Function to plot the decision boundary of the logistic regression model
def plot_logistic_regression_result(X_set, y_set, classifier, title='Logistic Regression (Training set)'):
    # Create a grid of points covering the feature space (Age and Estimated Salary)
    X1, X2 = np.meshgrid(
        np.arange(start = X_set[:, 0].min() - 1, stop = X_set[:, 0].max() + 1, step = 0.01), # Grid for the first feature (Age)
        np.arange(start = X_set[:, 1].min() - 1, stop = X_set[:, 1].max() + 1, step = 0.01)  # Grid for the second feature (Estimated Salary)
    )
    # Predict the class for each point in the grid and reshape the predictions to match the grid shape
    plt.contourf(
        X1, X2, # Grid points
        classifier.predict(np.array([X1.ravel(), X2.ravel()]).T).reshape(X1.shape), # Predictions reshaped to the grid
        alpha = 0.3, # Transparency of the decision regions
        cmap = ListedColormap(('red', 'green')) # Colors for the two classes
    )
    # Set the limits for the x-axis and y-axis of the plot
    plt.xlim(X1.min(), X1.max())
    plt.ylim(X2.min(), X2.max())
    # Plot the actual data points on top of the decision boundary
    for i, j in enumerate(np.unique(y_set)): # Loop through each unique class in the dataset
        plt.scatter(
            X_set[y_set == j, 0], # Data points for the first feature (Age)
            X_set[y_set == j, 1], # Data points for the second feature (Estimated Salary)
            color = ListedColormap(('red', 'green'))(i), # Color corresponding to the class
            label = 'Purchased' if j == 1 else 'Not Purchased' # Label for the class
        )
    # Add a title and labels for the axes
    plt.title(title) # Title of the plot (parameterized so the test-set plot can be labeled correctly)
    plt.xlabel('Age') # Label for the x-axis
    plt.ylabel('Estimated Salary') # Label for the y-axis
    # Add a legend to distinguish between the classes
    plt.legend()
    # Display the plot
    plt.show()
In [74]:
# Visualize the training dataset
plot_logistic_regression_result(X_train, y_train, classifier)
In [75]:
# Visualize the test dataset.
plot_logistic_regression_result(X_test, y_test, classifier, title='Logistic Regression (Test set)')
[Practice 4-2] Logistic Regression Model Implemented by Defining Functions¶
1. Load data and perform feature scaling¶
In [76]:
# Load the dataset (Social_Network_Ads.csv) and standardize it through feature scaling.
dataset = pd.read_csv('07_1_social_network_ads.csv')
X = dataset.iloc[:, [2, 3]].values
y = dataset.iloc[:, 4].values
sc = StandardScaler()
X = sc.fit_transform(X) # Standardize the feature matrix X using its mean and variance
2. Split data into training and testing sets¶
In [77]:
# Split X (feature vector) and y (actual values) into training and testing datasets.
# Use test_size to set the ratio of training and testing datasets.
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size = 0.25)
3. Build the model¶
In [78]:
# Declare the functions and variables required for logistic regression.
w_2 = 1 # Initialize weights
w_1 = 1
w_0 = 1
lr = 0.1 # Learning rate (typically 0 < lr < 1)

def sigmoid(x):
    return 1.0/(1 + np.exp(-x)) # np.exp(x) computes the exponential function e^x

def BCE_loss(y, y_hat): # Binary Cross Entropy loss
    loss = np.mean(y * (np.log(y_hat)) + (1-y) * np.log(1-y_hat))
    return -loss

def gradient_descent(X, y, y_hat): # Per-sample weight update rule for logistic regression
    global w_2, w_1, w_0
    w_2 = w_2 - lr * (-X[0]) * (y - y_hat)
    w_1 = w_1 - lr * (-X[1]) * (y - y_hat)
    w_0 = w_0 - lr * (-1) * (y - y_hat)
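For reference, the same update can also be written in vectorized (batch) form over the whole training set rather than one sample at a time. This is a minimal sketch reusing the sigmoid above; batch_gradient_step is a hypothetical helper, not part of the practice code:

def batch_gradient_step(X, y, w, b, lr=0.1):
    # X: (n, 2) feature matrix, y: (n,) labels, w: (2,) weights, b: scalar bias
    y_hat = sigmoid(X @ w + b)            # predictions for all n samples at once
    grad_w = X.T @ (y_hat - y) / len(y)   # average gradient w.r.t. the weights
    grad_b = np.mean(y_hat - y)           # average gradient w.r.t. the bias
    return w - lr * grad_w, b - lr * grad_b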
4. Train the model¶
In [79]:
# Train the model using the implemented functions.
# Together with the weight initialization above, this implements 1) weight initialization, 2) loss function calculation, and 3) weight updates.
for epoch in range(100): # Set the desired number of iterations
    for X_sample, y_sample in zip(X_train, y_train): # Loop variables renamed to avoid shadowing the dataset variables X and y
        y_hat = sigmoid(X_sample[0] * w_2 + X_sample[1] * w_1 + w_0) # Extract the model's prediction
        loss = BCE_loss(y_sample, y_hat) # Calculate the value of the loss function
        gradient_descent(X_sample, y_sample, y_hat) # Update w_2, w_1, w_0 through gradient descent
print(w_2, w_1, w_0)
2.6403713018077064 1.5941124018564723 -1.3225304890221656
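These weights are in the same ballpark as scikit-learn's above (2.32, 1.14, -1.10) but not identical: this loop performs plain per-sample updates with a fixed learning rate, while scikit-learn's LogisticRegression solves an L2-regularized problem with the lbfgs solver by default.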
5. Evaluate the model's performance¶
In [80]:
# Extract the predictions of the logistic regression model via the predict function and evaluate them on the test data.
def predict(X):
    global w_2, w_1, w_0
    preds = sigmoid(X[:, 0] * w_2 + X[:, 1] * w_1 + w_0) # Apply the learned weights to the feature vectors
    pred_class = [1 if i > 0.5 else 0 for i in preds] # 1 if the predicted probability exceeds 0.5, otherwise 0
    return np.array(pred_class)

y_hat = predict(X_test)
result = sklearn.metrics.accuracy_score(y_test, y_hat)
print(f'Accuracy: {result}')
Accuracy: 0.82
Refactoring the hard-coded functions above into a class¶
In [81]:
class MyLogisticRegression:
    def __init__(self, X, y, num_epoch=100):
        self.X = X
        self.y = y
        self.w_2 = 1 # weight for the first feature
        self.w_1 = 1 # weight for the second feature
        self.w_0 = 1 # bias term
        self.lr = 0.1
        self.num_epoch = num_epoch

    def sigmoid(self, x):
        return 1.0 / (1 + np.exp(-x))

    def BCE_loss(self, y, y_hat):
        '''Binary Cross Entropy loss'''
        epsilon = 1e-15 # to avoid log(0)
        y_hat = np.clip(y_hat, epsilon, 1 - epsilon)
        loss = -(y * np.log(y_hat) + (1 - y) * np.log(1 - y_hat))
        return loss

    def gradient_descent(self, X_sample, y, y_hat):
        # Update weights and bias using gradient descent for the current sample
        self.w_2 = self.w_2 - self.lr * (-X_sample[0]) * (y - y_hat)
        self.w_1 = self.w_1 - self.lr * (-X_sample[1]) * (y - y_hat)
        self.w_0 = self.w_0 - self.lr * (-1) * (y - y_hat)

    def train(self):
        for epoch in range(self.num_epoch):
            total_loss = 0
            for X_sample, y_sample in zip(self.X, self.y):
                y_hat = self.sigmoid(X_sample[0] * self.w_2 + X_sample[1] * self.w_1 + self.w_0)
                loss = self.BCE_loss(y_sample, y_hat)
                total_loss += loss
                self.gradient_descent(X_sample, y_sample, y_hat)
        print("Final weights:", self.w_2, self.w_1, self.w_0)

    def predict(self, X):
        preds = self.sigmoid(X[:, 0] * self.w_2 + X[:, 1] * self.w_1 + self.w_0)
        return np.array([1 if i > 0.5 else 0 for i in preds])

    def evaluate(self, X, y):
        y_hat = self.predict(X)
        result = sklearn.metrics.accuracy_score(y, y_hat)
        print("Accuracy:", result)
        return result
In [82]:
my_classifier = MyLogisticRegression(X_train, y_train)
my_classifier.train()
my_classifier.predict(X_test)
Final weights: 2.6403713018077064 1.5941124018564723 -1.3225304890221656
Out[82]:
array([0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0,
0, 0, 0, 0, 0, 1, 0, 0, 0, 1, 1, 1, 0, 0, 0, 1, 0, 0, 0, 1, 1, 0,
0, 0, 1, 0, 0, 0, 1, 0, 1, 0, 0, 0, 0, 0, 1, 1, 0, 0, 0, 1, 0, 0,
0, 0, 0, 1, 0, 0, 0, 0, 1, 0, 0, 1, 0, 0, 0, 0, 0, 0, 1, 1, 0, 1,
0, 1, 1, 0, 0, 1, 0, 0, 1, 0, 0, 0])
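Accuracy on the test set can also be checked through the class's own evaluate method, which wraps predict and accuracy_score:

my_classifier.evaluate(X_test, y_test)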
6. Visualize the model's operation¶
In [83]:
# Visualize the training dataset.
plot_logistic_regression_result(X_train, y_train, my_classifier)
In [84]:
# Visualize the test dataset (green).
plot_logistic_regression_result(X_test, y_test, my_classifier)