TensorFlow is a great deep learning framework. In fact, it is still the reigning monarch within the deep learning framework kingdom. However, it has some frustrating limitations. One of these is the difficulties that arise during debugging. In TensorFlow, it’s difficult to diagnose what is happening in your model. This is due to its *static graph* structure (for details, see my TensorFlow tutorial) – in TensorFlow the developer has to first create the full set of graph operations, and only then are these operations compiled with a TensorFlow session object and fed data. Wouldn’t it be great if you could define operations, then immediately run data through them to observe what the output was? Or wouldn’t it be great to set standard Python debug breakpoints within your code, so you can step into your deep learning training loops wherever and whenever you like and examine the tensors and arrays in your models? This is now possible using the TensorFlow Eager API, available in the latest version of TensorFlow.

The TensorFlow Eager API allows you to dynamically create your model in an imperative programming framework. In other words, you can create tensors, operations and other TensorFlow objects by typing the command into Python, and run them straight way without the need to set up the usual session infrastructure. This is useful for debugging, as mentioned above, but also it allows dynamic adjustments of deep learning models as training progresses. In fact, in natural language processing, the ability to create dynamic graphs is useful, given that sentences and other utterances in natural language have varying lengths. In this TensorFlow Eager tutorial, I’ll show you the basics of the new API and also show how you can use it to create a fully fledged convolutional neural network.

**Recommended video course – **If you’d like to learn more about TensorFlow, and you’re more of a video learner, check out this cheap online course: Complete Guide to TensorFlow for Deep Learning with Python

# TensorFlow Eager basics

The first thing you need to do to use TensorFlow Eager is to enable Eager execution. To do so, you can run the following (note, you can type this directly into your Python interpreter):

Now you can define TensorFlow operations and run them on the fly. In the code below, a numpy range from 0 to 9 is multiplied by a scalar value of 10, using the TensorFlow *multiply* operation:

This code snippet will output the following:

tf.Tensor([ 0 10 20 30 40 50 60 70 80 90], shape=(10,), dtype=int32)

Notice we can immediately access the results of the operation. If we ran the above without running the *tf.enable_eager_execution()* command, we would instead see the definition of the TensorFlow operation i.e.:

Tensor(“Mul:0”, shape=(10,), dtype=int32)

Notice also how easily TensorFlow Eager interacts with the numpy framework. So far, so good. Now, the main component of any deep learning API is how gradients are handled – this will be addressed in the next section.

## Gradients in TensorFlow Eager

Gradient calculation is necessary in neural networks during the back-propagation stage (if you’d like to know more, check out my neural networks tutorial). The gradient calculations in the TensorFlow Eager API work similarly to the autograd package used in PyTorch. To calculate the gradient of an operation using Eager, you can use the *gradients_function()* operation. The code below calculates the gradient for an $x^3$ function:

Notice the use of *tfe.gradients_function(f_cubed*) – when called, this operation will return the gradient of *df/dx* for the *x* value. The code above returns the value 27 – this makes sense as the derivative of $x^3$ is $3x^2 = 3 * 3^2 = 27$. The final line shows the *grad* operation, and then the conversion of the output to a numpy scalar i.e. a float value.

We can show the use of this *gradients_function* in a more complicated example – polynomial line fitting. In this example, we will use TensorFlow Eager to discover the weights of a noisy 3rd order polynomial. This is what the line looks like:

As can be observed from the code, the polynomial is expressed as $x^3 – 4x^2 – 2x +2$ with some random noise added. Therefore, we want our code to find a “weight” vector of approximately [1, -4, -2, 2]. First, let’s define a few functions:

The first function is a simple randomized batching function. The second is a class definition for our polynomial model. Upon initialization, we create a weight variable *self.w* and set to a TensorFlow Eager variable type, randomly initialized as a 4 length vector. Next, we define a function *f* which returns the weight vector by the third order polynomial form. Finally, we have a loss function defined, which returns the mean squared error between the current model output and the noisy *y *vector.

To train the model, we can run the following:

First, we create a model and then use a TensorFlow Eager function called *implicit_gradients*. This function will detect any upstream or parent gradients involved in calculating the loss, which is handy. We are using a standard Adam optimizer for this task. Finally a loop begins, which supplies the batch data and the model to the gradient function. Then the program applies the returned gradients to the optimizer to perform the optimizing step.

After running this code, we get the following output graph:

The orange line is the fitted line, the blue is the “ground truth”. Not perfect, but not too bad.

Next, I’ll show you how to use TensorFlow Eager to create a proper neural network classifier trained on the MNIST dataset.

# A neural network with TensorFlow Eager

In the code below, I’ll show you how to create a Convolutional Neural Network to classify MNIST images using TensorFlow Eager. If you’re not sure about Convolutional Neural Networks, you can check out my tutorial here. The first part of the code shows you how to extract the MNIST dataset:

In the case above, we are making use of the Keras datasets now available in TensorFlow (by the way, the Keras deep learning framework is now heavily embedded within TensorFlow – to learn more about Keras see my tutorial). The raw MNIST image dataset has values ranging from 0 to 255 which represent the grayscale values – these need to be scaled to between 0 and 1. The function below accomplishes this:

Next, in order to setup the Keras image dataset into a TensorFlow Dataset object, we use the following code. This code creates a scaled training and testing dataset. This dataset is also randomly shuffled and ready for batch extraction. It also applies the *tf.one_hot *function to the labels to convert the integer label to a one hot vector of length 10 (one for each hand-written digit). If you’re not familiar with the TensorFlow Dataset API, check out my TensorFlow Dataset tutorial.

The next section of code creates the MNIST model itself, which will be trained. The best practice at the moment for TensorFlow Eager is to create a class definition for the model which inherits from the tf.keras.Model class. This is useful for a number of reasons, but the main one for our purposes is the ability to call on the model.variables property when determining Eager gradients, and this “gathers together” all the trainable variables within the model. The code looks like:

In the model definition, we create layers to implement the following network structure:

- 32 channel, 5×5 convolutional layer with ReLU activation
- 2×2 max pooling, with (2,2) strides
- 64 channel 5×5 convolutional layer with ReLU activation
- Flattening
- Dense/Fully connected layer with 750 nodes, ReLU activation
- Dropout layer
- Dense/Fully connected layer with 10 nodes, no activation

As stated above, if you’re not sure what these terms mean, see my Convolutional Neural Network tutorial. Note that the *call *method is a mandatory method for the tf.keras.Model superclass – it is where the forward pass through the model is defined.

The next function is the loss function for the optimization:

Note that this function calls the forward pass through the model (which is an instance of our MNISTModel) and calculates the “raw” output. This raw output, along with the labels, passes through to the TensorFlow function *softmax_cross_entropy_with_logits_v2*. This applies the softmax activation to the “raw” output from the model, then creates a cross entropy loss.

Next, I define an accuracy function below, to keep track of how the training is progressing regarding training set accuracy, and also to check test set accuracy:

Finally, the full training code for the model is shown below:

In the code above, we create the model along with an optimizer. The code then enters the training loop, by iterating over the training dataset *train_ds*. Then follows the definition of the gradients for the model. Here we are using the TensorFlow Eager object called *GradientTape()*. This is an efficient way of defining the gradients over all the variables involved in the forward pass. It will track all the operations during the forward pass and will efficiently “play back” these operations during back-propagation.

Using the Python *with *functionality, we can include the *loss**_fn* function, and all associated upstream variables and operations, within the tape to be recorded. Then, to extract the gradients of the relevant model variables, we call *tape.gradient. *The first argument is the “target” for the calculation, i.e. the loss, and the second argument is the “source” i.e. all the model variables.

We then pass the gradients and the variables zipped together to the Adam optimizer for a training step. Every 10 iterations some results are printed and the training loop exits if the iterations number exceeds the maximum number of epochs.

Running this code for 1000 iterations will give you a loss < 0.05, and training set accuracy approaching 100%. The code below calculates the test set accuracy:

You should be able to get a test set accuracy, using the code defined above, on the order of 98% or greater for the trained model.

In this post, I’ve shown you the basics of using the TensorFlow Eager API for imperative deep learning. I’ve also shown you how to use the autograd-like functionality to perform a polynomial line fitting task and build a convolutional neural network which achieves relatively high test set accuracy for the MNIST classification task. Hopefully you can now use this new TensorFlow paradigm to reduce development time and enhance debugging for your future TensorFlow projects. All the best!

**Recommended video course – **If you’d like to learn more about TensorFlow, and you’re more of a video learner, check out this cheap online course: Complete Guide to TensorFlow for Deep Learning with Python