Logistic Regression in Python from Scratch

Hello everyone! In this post, I'll share the code for the equations used in implementing the Logistic Regression algorithm. Everything is written in plain Python and NumPy, without any prediction libraries. Even if you are a beginner, you should find the implementation easy and intuitive.

Most readers are already acquainted with ML libraries like scikit-learn, TensorFlow, Keras, and Caffe. They simplify our work: three or four lines of code are enough to fit the above-mentioned model to your data. As a result, people often have little idea of what is going on behind the curtain, which is not a good practice. In data science, you should know exactly what is happening to your data; it helps during debugging and acts as a base for understanding more advanced topics.
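
For reference, those few lines look roughly like this in scikit-learn (a minimal sketch on synthetic data, separate from the from-scratch implementation below; note that scikit-learn stores examples as rows, while the code in this post stores them as columns):

from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

# toy data: shape (n_samples, n_features), labels 0/1
X, y = make_classification(n_samples=200, n_features=4, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

clf = LogisticRegression()
clf.fit(X_train, y_train)
print("test accuracy:", clf.score(X_test, y_test))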

I'll jump straight into the steps involved in the regression and jot down the equations used to apply the model to the data. For readers who want to understand each step in depth, I am including useful links along the way. If you do not yet have a basic knowledge of the model, I recommend going through one of those resources before continuing.

Let us start. From here on, I assume the reader knows how the algorithm works.

m_train = Number of training examples
m_test = Number of test examples

Objectives

  • Initialize the parameters of the model
  • Learn the parameters of the model by minimizing the cost
  • Use the learned parameters to make predictions (on the test set)
  • Analyse the results and conclude

Implementation

Equation: Sigmoid Function

sigmoid(z) = 1 / (1 + e^(-z))

Purpose: Compute the sigmoid of z

Arguments:
z — A scalar or numpy array of any size.

Return:
s — sigmoid(z)

import numpy as np

def sigmoid(z):
    # squashes any real input into the (0, 1) range
    s = 1 / (1 + np.exp(-z))
    return s
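
A quick sanity check (output values are approximate):

print(sigmoid(0))                     # 0.5
print(sigmoid(np.array([-2, 0, 2])))  # roughly [0.119 0.5 0.881]
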
Equation: Initializing with Zeros

Purpose: This function creates a vector of zeros of shape (dim, 1) for w and initializes b to 0.

Argument:

dim — size of the w vector we want (or number of parameters in this case)

Returns:
w — initialized vector of shape (dim, 1)
b — initialized scalar (corresponds to the bias)

def initialize_with_zeros(dim):
    # w: column vector of zeros with one entry per feature, b: scalar bias
    w = np.zeros((dim, 1))
    b = 0
    assert w.shape == (dim, 1)
    return w, b
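
For example, with three features (a quick check):

w, b = initialize_with_zeros(3)
print(w.shape, b)   # (3, 1) 0
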
Equation: Forward and Backward Propagation

Forward propagation computes the activations and the cost over all m examples:

A = sigmoid(w.T X + b)
J = -(1/m) * sum_i [ y(i) * log(a(i)) + (1 - y(i)) * log(1 - a(i)) ]

Here are the two formulas you will be using:

dw = dJ/dw = (1/m) * X (A - Y).T
db = dJ/db = (1/m) * sum_i (a(i) - y(i))

Purpose: Implement the cost function and its gradient for the propagation.

Arguments:
w — weights, a numpy array of shape (number of features, 1)
b — bias, a scalar
X — data of shape (number of features, number of examples)
Y — true “label” vector of size (1, number of examples)

Return:
cost — negative log-likelihood cost for logistic regression
dw — gradient of the loss with respect to w, thus same shape as w
db — gradient of the loss with respect to b, thus same shape as b

def propagate(w, b, X, Y):
    m = X.shape[1]

    # FORWARD PROPAGATION (FROM X TO COST)
    A = sigmoid(np.dot(w.T, X) + b)          # activations, shape (1, m)
    cost = -np.sum(Y * np.log(A) + (1 - Y) * np.log(1 - A)) / m

    # BACKWARD PROPAGATION (TO FIND THE GRADIENTS)
    dw = np.dot(X, (A - Y).T) / m            # (n, m) dot (m, 1) -> (n, 1), same shape as w
    db = np.sum(A - Y) / m                   # scalar, same shape as b

    cost = np.squeeze(cost)
    grads = {"dw": dw, "db": db}

    return grads, cost
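
A quick check on toy data (a sketch; the numbers are arbitrary and only illustrate the shapes involved):

w, b = np.array([[1.0], [2.0]]), 2.0
X = np.array([[1.0, 2.0, -1.0],
              [3.0, 4.0, -3.2]])
Y = np.array([[1, 0, 1]])
grads, cost = propagate(w, b, X, Y)
print(grads["dw"].shape, grads["db"], cost)   # dw has shape (2, 1); db and cost are scalars
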
Equation: Optimization

The gradient descent update rule, applied repeatedly, is:

w = w - learning_rate * dw
b = b - learning_rate * db

Purpose: This function optimizes w and b by running a gradient descent algorithm.

Arguments:
w — weights, a numpy array of size (# of features, 1)
b — bias, a scalar
X — data of shape (number of Features, number of examples)
Y — true “label” vector of shape (1, number of examples)
num_iterations — number of iterations of the optimization loop
learning_rate — learning rate of the gradient descent update rule
print_cost — True to print the loss every 100 steps

Return:
params — dictionary containing the weights w and bias b
grads — dictionary containing the gradients of the weights and bias with respect to the cost function.
costs — list of all the costs computed during the optimization, this will be used to plot the learning curve.

Tips:
You basically need to write down two steps and iterate through them:
– Calculate the cost and the gradient for the current parameters. Use propagate().
– Update the parameters using gradient descent rule for w and b.

def optimize(w, b, X, Y, num_iterations, learning_rate, print_cost = False):
    costs = []

    for i in range(num_iterations):
        # Cost and gradient calculation
        grads, cost = propagate(w, b, X, Y)

        # Retrieve derivatives from grads
        dw = grads["dw"]
        db = grads["db"]

        # Update rule (gradient descent step)
        w = w - learning_rate * dw
        b = b - learning_rate * db

        # Record the costs
        if i % 100 == 0:
            costs.append(cost)

        # Print the cost every 100 iterations
        if print_cost and i % 100 == 0:
            print("Cost after iteration %i: %f" % (i, cost))

    params = {"w": w, "b": b}
    grads = {"dw": dw, "db": db}

    return params, grads, costs
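
Continuing the toy example from the propagation step (arbitrary hyperparameters, just to show the call):

params, grads, costs = optimize(w, b, X, Y, num_iterations = 100,
                                learning_rate = 0.009, print_cost = False)
print(params["w"].shape, params["b"])   # (2, 1) and the learned scalar bias
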
Equation: Predicting Test Cases

For each example x(i), compute a(i) = sigmoid(w.T x(i) + b) and predict 1 if a(i) > 0.5, otherwise 0.

Purpose: Predict whether the label is 0 or 1 using learned logistic regression parameters (w, b).

Arguments:
w — weights, a numpy array of shape (number of features, 1)
b — bias, a scalar
X — data of shape (number of features, number of examples)

Return:
Y_prediction — a numpy array (vector) containing all predictions (0/1) for the examples in X

def predict(w, b, X):
    m = X.shape[1]
    Y_prediction = np.zeros((1, m))
    w = w.reshape(X.shape[0], 1)

    # Compute the vector A of probabilities that the label is 1
    A = sigmoid(np.dot(w.T, X) + b)
    for i in range(A.shape[1]):
        # Convert probabilities A[0, i] to actual predictions Y_prediction[0, i]
        Y_prediction[0, i] = 0 if A[0, i] <= 0.5 else 1

    assert Y_prediction.shape == (1, m)
    return Y_prediction
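
For example, with the parameters learned above:

print(predict(params["w"], params["b"], X))   # a (1, m) array of 0/1 predictions
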
Equation: Merge All Functions into Model

Purpose: Builds the logistic regression model by calling the functions implemented above.

Arguments:
X_train — training set represented by a numpy array of shape (# features, m_train)
Y_train — training labels represented by a numpy array (vector) of shape (1, m_train)
X_test — test set represented by a numpy array of shape (# features, m_test)
Y_test — test labels represented by a numpy array (vector) of shape (1, m_test)
num_iterations — hyperparameter representing the number of iterations to optimize the parameters
learning_rate — hyperparameter representing the learning rate used in the update rule of optimize()
print_cost — Set to true to print the cost every 100 iterations

Return:
d — dictionary containing information about the model.

def model(X_train, Y_train, X_test, Y_test, num_iterations = 2000,
          learning_rate = 0.5, print_cost = False):
    # initialize parameters with zeros
    w, b = initialize_with_zeros(X_train.shape[0])

    # Gradient descent
    parameters, grads, costs = optimize(w, b, X_train, Y_train,
                                        num_iterations = num_iterations,
                                        learning_rate = learning_rate,
                                        print_cost = print_cost)

    # Retrieve parameters w and b from dictionary "parameters"
    w = parameters["w"]
    b = parameters["b"]

    # Predict test/train set examples
    Y_prediction_test = predict(w, b, X_test)
    Y_prediction_train = predict(w, b, X_train)

    # Print train/test errors
    print("train accuracy: {} %".format(
        100 - np.mean(np.abs(Y_prediction_train - Y_train)) * 100))
    print("test accuracy: {} %".format(
        100 - np.mean(np.abs(Y_prediction_test - Y_test)) * 100))

    d = {"costs": costs,
         "Y_prediction_test": Y_prediction_test,
         "Y_prediction_train": Y_prediction_train,
         "w": w,
         "b": b,
         "learning_rate": learning_rate,
         "num_iterations": num_iterations}

    return d
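
To train on your own data (a sketch; X_train, Y_train, X_test and Y_test are placeholders for your preprocessed dataset, with features stored as columns as described above):

d = model(X_train, Y_train, X_test, Y_test,
          num_iterations = 2000, learning_rate = 0.005, print_cost = True)
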
Choice of learning rate

Reminder: In order for gradient descent to work, you must choose the learning rate wisely. The learning rate α determines how rapidly we update the parameters. If the learning rate is too large, we may "overshoot" the optimal value. Similarly, if it is too small, we will need too many iterations to converge to the best values. That is why it is crucial to use a well-tuned learning rate.
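
One simple way to see this effect is to train with several learning rates and plot the learning curves (a sketch using matplotlib; the dataset variables are again placeholders):

import matplotlib.pyplot as plt

learning_rates = [0.01, 0.001, 0.0001]
for lr in learning_rates:
    d = model(X_train, Y_train, X_test, Y_test, num_iterations = 1500,
              learning_rate = lr, print_cost = False)
    plt.plot(np.squeeze(d["costs"]), label = str(lr))

plt.ylabel("cost")
plt.xlabel("iterations (hundreds)")
plt.legend(loc = "upper right")
plt.show()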

Conclusion

  • Preprocessing the dataset is important.
  • You implemented each function separately: initialize_with_zeros(), propagate(), optimize() and predict(). Then you combined them into model().
  • Tuning the learning rate (which is an example of a “hyperparameter”) can make a big difference to the algorithm.
