*Linear Regression is one of the fundamental machine learning algorithms, used to predict a continuous variable from one or more explanatory variables (features). In this tutorial, you will learn how to implement a simple linear regression in TensorFlow 2.0 using the Gradient Tape API.*

## Overview

In this tutorial, you will understand:

- Fundamentals of Linear Regression
- How the weights of linear regression are computed
- How to implement linear regression using Gradient Tape (TensorFlow 2.0)

## Fundamentals of Linear Regression

Let’s briefly cover the fundamentals of linear regression.

The equation for simple linear regression is given by:

$$ Y = mX + C + e $$

where `Y` denotes a continuous variable, which is the output you want to predict, and `X` denotes the feature variable (input). `e` is the error, the part of `Y` that `X` is not able to explain.

`m` is the coefficient and `C` is the bias term. Together they are called ‘weights’.

The error term above is nothing but the difference between the actual value (Y) and the predicted value (Y_hat). So, this can be written as:

$$ e = Y - Y_{hat} $$

where Y_hat is the predicted value of Y, that is, `Y_hat = mX + C`.

The objective of Linear Regression is to find the values of `m` and `C` so that Y and Y_hat are as close as possible. This is done by minimizing a loss function.

So, is `e` the loss function?

No, because `e` can be both positive and negative: positive when Y is greater than Y_hat and negative when it is smaller. Summed across observations, positive and negative errors can cancel each other out, so a poor fit could still show a total error close to zero.

So, a sign insensitive loss function is needed.
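To see the cancellation concretely, here is a small NumPy sketch. The data and the deliberately poor flat-line prediction are made up for illustration.

```python
import numpy as np

# Hypothetical data: actual values and a deliberately poor flat prediction
y = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y_hat = np.full_like(y, 3.0)   # predicts 3.0 for every observation

e = y - y_hat                  # individual errors: [-2, -1, 0, 1, 2]
total_error = e.sum()          # positive and negative errors cancel
total_squared_error = (e ** 2).sum()

print(total_error)             # 0.0
print(total_squared_error)     # 10.0
```

The raw errors sum to zero even though four of the five predictions are wrong, while the squared errors do not cancel.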

The loss function used in a linear regression model is the mean squared error (MSE): the average, over the n observations, of the squared differences between the actual and predicted values of Y.

$$ MSE = \frac{1}{n} \sum_{i=1}^{n} (Y_i - Y_{hat,i})^2 $$
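Since the MSE is the mean of the squared differences between actual and predicted values, it can be checked numerically in a few lines of NumPy. The function name `mse` and the sample values below are illustrative, not part of this tutorial's TensorFlow code.

```python
import numpy as np

def mse(y_true, y_pred):
    """Mean of the squared differences between actual and predicted values."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    return np.mean((y_true - y_pred) ** 2)

# Hypothetical actual and predicted values
y_true = [3.0, 5.0, 7.0]
y_pred = [2.0, 5.0, 9.0]

# Squared errors are 1, 0 and 4, so the mean is 5/3
print(mse(y_true, y_pred))
```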

Gradient descent optimization may be used to determine the parameters `m` and `C` by minimizing the loss function. It repeatedly adjusts the values of `m` and `C` in the direction that reduces the loss, until the loss is sufficiently small.

This is the fundamental concept behind Linear Regression.

## How the weights of linear regression are computed

Now the key here is how Gradient Descent iterates the values of `m` and `C` to arrive at the best predictions in the minimum possible time.

But how exactly?

- Start with a random value for the weights `m` and `C`.
- Use them to predict Y and compute the loss (mean squared error).
- From `m` and `C`, subtract the partial derivative of the loss function `J` with respect to each weight, multiplied by a learning rate (α).
- Repeat the previous two steps for a fixed number of iterations (epochs).

$$ m = m - \alpha \frac{\partial J}{\partial m} $$

$$ C = C - \alpha \frac{\partial J}{\partial C} $$
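The steps above can be sketched in plain NumPy, with the derivatives of the MSE written out by hand. The noise-free synthetic data, the learning rate of 0.01, and the 5000 iterations are assumptions chosen for illustration.

```python
import numpy as np

# Hypothetical noise-free data generated from Y = 2X + 1
X = np.linspace(0, 5, 50)
Y = 2.0 * X + 1.0

m, C = 0.0, 0.0   # starting weights (zeros here for reproducibility)
alpha = 0.01      # learning rate (assumed value)
n = len(X)

for step in range(5000):
    Y_hat = m * X + C                # predict with the current weights
    error = Y - Y_hat
    J = np.mean(error ** 2)          # loss: mean squared error
    # Partial derivatives of J with respect to m and C
    dJ_dm = -2.0 / n * np.sum(X * error)
    dJ_dC = -2.0 / n * np.sum(error)
    # Update rule: subtract learning rate times gradient
    m -= alpha * dJ_dm
    C -= alpha * dJ_dC

print(m, C, J)   # m and C should approach 2 and 1
```

Writing the derivatives by hand works for a model this small, but it has to be redone whenever the model or the loss changes.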

The vector that contains the partial derivative of the loss with respect to the weights is called the ‘Gradient’.
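For the MSE loss, these partial derivatives can be worked out by hand with the chain rule. Writing the loss as $J$ over $n$ observations (the index $i$ is notation introduced here for the derivation):

$$ J = \frac{1}{n} \sum_{i=1}^{n} (Y_i - (mX_i + C))^2 $$

$$ \frac{\partial J}{\partial m} = -\frac{2}{n} \sum_{i=1}^{n} X_i (Y_i - (mX_i + C)) $$

$$ \frac{\partial J}{\partial C} = -\frac{2}{n} \sum_{i=1}^{n} (Y_i - (mX_i + C)) $$

When the model under-predicts on average, the errors are mostly positive, so $\frac{\partial J}{\partial C}$ is negative and subtracting it raises `C`, which is the direction you would expect.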

The Gradient Tape provided by Tensorflow can be used to compute this conveniently.
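As a minimal illustration of what the tape does, the sketch below differentiates a simple function, y = x², at x = 3. The variable names are chosen here for illustration.

```python
import tensorflow as tf

x = tf.Variable(3.0)

# Operations on trainable variables inside the context are recorded
with tf.GradientTape() as tape:
    y = x * x

# dy/dx = 2x, which is 6 at x = 3
dy_dx = tape.gradient(y, x)
print(dy_dx.numpy())   # 6.0
```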

This is exactly what I am going to show you how to implement in TensorFlow 2.0 in detail. It’s very easy.

## Implementing Linear Regression using Gradient Tape (TensorFlow 2.0)

First, import the needed packages: tensorflow, numpy and matplotlib.

```
# Import Relevant libraries
import tensorflow as tf
import numpy as np
import matplotlib.pyplot as plt
```

Next, let’s prepare a training dataset simulated with random values, and define the learning rate and the number of epochs (the number of times we will iterate through the dataset to update the weights).

The y dataset is created by adding random noise to the x dataset. The scatter plot produced by the code below shows that y and x have a largely linear relationship.

```
# Learning rate
learning_rate = 0.01
# Number of loops for training through all your data to update the parameters
training_epochs = 100
# the training dataset
x_train = np.linspace(0, 10, 100)
y_train = x_train + np.random.normal(0,1,100)
# plot of data
plt.scatter(x_train, y_train)
```

Next, let’s define the weight and bias, or `m` and `C`, and initialize both to 0. In practice, these are often initialized randomly.

These values will be updated based on the gradients computed using Gradient Tape. After declaring them, we will define the function that predicts Y from the parameters, and then the loss function.

```
# declare weights
weight = tf.Variable(0.)
bias = tf.Variable(0.)
```
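If you prefer a random starting point, one option is to draw the initial values from a normal distribution with `tf.random.normal`. This is only a sketch, not the initialization used in the rest of this tutorial; the seed and the names `weight_rand` and `bias_rand` are arbitrary.

```python
import tensorflow as tf

tf.random.set_seed(42)  # arbitrary seed, only for repeatable runs

# Scalar variables initialized from a standard normal distribution
weight_rand = tf.Variable(tf.random.normal(shape=()))
bias_rand = tf.Variable(tf.random.normal(shape=()))

print(weight_rand.numpy(), bias_rand.numpy())
```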

After this, let’s define the linear regression function to get predicted values of y, or y_pred.

```
# Define linear regression expression y
def linreg(x):
    y = weight*x + bias
    return y
```

Now, define the loss function, which in this case is MSE.

```
# Define loss function (MSE)
def squared_error(y_pred, y_true):
    return tf.reduce_mean(tf.square(y_pred - y_true))
```
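A quick check on hand-picked values can confirm that the loss behaves as expected. The sketch repeats the function definition so that it runs on its own; the sample tensors are made up.

```python
import tensorflow as tf

def squared_error(y_pred, y_true):
    return tf.reduce_mean(tf.square(y_pred - y_true))

# Hypothetical predictions and targets
y_pred = tf.constant([1.0, 2.0, 3.0])
y_true = tf.constant([1.0, 2.0, 5.0])

# Squared errors are 0, 0 and 4, so the mean is 4/3
loss = squared_error(y_pred, y_true)
print(loss.numpy())

# Identical inputs give zero loss
print(squared_error(y_true, y_true).numpy())   # 0.0
```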

Now that you have all functions defined, the next step is to train the model.

We will use gradient tape here to record the operations that compute the loss in each epoch, and then differentiate that loss with respect to the weight and bias to get the gradients.

This gradient will then be multiplied with the learning rate and subtracted from the existing value of parameters to get new optimized parameter values.

This piece of code below is the main workhorse, where we actually implement linear regression in tensorflow.

```
# train model
for epoch in range(training_epochs):
    # Compute loss within Gradient Tape context
    with tf.GradientTape() as tape:
        y_predicted = linreg(x_train)
        loss = squared_error(y_predicted, y_train)
    # Get gradients
    gradients = tape.gradient(loss, [weight,bias])
    # Adjust weights
    weight.assign_sub(gradients[0]*learning_rate)
    bias.assign_sub(gradients[1]*learning_rate)
    # Print output
    print(f"Epoch count {epoch}: Loss value: {loss.numpy()}")
# OUTPUT
#> Epoch count 0: Loss value: 33.03848648071289
#> Epoch count 1: Loss value: 3.971679925918579
#> Epoch count 2: Loss value: 1.0887922048568726
#> Epoch count 3: Loss value: 0.8028544783592224
#> Epoch count 4: Loss value: 0.774485170841217
#> Epoch count 5: Loss value: 0.7716620564460754
#> Epoch count 6: Loss value: 0.7713727355003357
#> Epoch count 7: Loss value: 0.771334707736969
#> Epoch count 8: Loss value: 0.771321713924408
#> ...truncated...
#> Epoch count 94: Loss value: 0.7707335948944092
#> Epoch count 95: Loss value: 0.7707293629646301
#> Epoch count 96: Loss value: 0.7707250118255615
#> Epoch count 97: Loss value: 0.7707207202911377
#> Epoch count 98: Loss value: 0.7707165479660034
#> Epoch count 99: Loss value: 0.7707126140594482
```

The training process is now complete. You can see that the loss dropped close to its minimum by the 6th epoch and changed only marginally after that. Gradient Tape made this easy by computing the gradients automatically. Let’s see the final values of weight and bias.

## Results

```
print(weight.numpy())
print(bias.numpy())
#> 0.9558898
#> 0.17020188
```
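One way to sanity-check values like these is to compare them with the closed-form least-squares fit, for example using `np.polyfit`. The sketch below regenerates data with the same recipe as the training set, with an arbitrary seed added here for repeatability, so its numbers will not match the run shown above exactly.

```python
import numpy as np

rng_seed = 0  # arbitrary seed, an assumption for repeatability
np.random.seed(rng_seed)

# Same recipe as the training data: y = x + Gaussian noise
x = np.linspace(0, 10, 100)
y = x + np.random.normal(0, 1, 100)

# Degree-1 polynomial fit returns [slope, intercept]
slope, intercept = np.polyfit(x, y, 1)
print(slope, intercept)   # slope should be near 1, intercept near 0
```

If gradient descent has converged, its weight and bias should be close to this slope and intercept on the same data.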

The final step of this linear regression model is to plot the best fit line based on our final optimized parameter values.

```
# Plot the best fit line
plt.scatter(x_train, y_train)
plt.plot(x_train, linreg(x_train), 'r')
plt.show()
```

You can see that the trained model gives a pretty good fit to the data.
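Once trained, the parameters can be used to predict new points. The sketch below plugs in the weight and bias printed in the Results section as plain Python numbers; the function name `predict` and the input value 12 are illustrative.

```python
# Final parameters reported in the Results section
weight_value = 0.9558898
bias_value = 0.17020188

def predict(x):
    """Apply the fitted line y = weight * x + bias."""
    return weight_value * x + bias_value

# Prediction for a hypothetical new input outside the training range
y_new = predict(12.0)
print(round(y_new, 4))   # 11.6409
```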