# Introduction to TensorBoard and TensorFlow visualization

Deep learning can be complicated…and sometimes frustrating. Why is my lousy 10 layer conv-net only achieving 95% accuracy on MNIST!? We’ve all been there – something is wrong and it can be hard to figure out why. Often the best solution to a problem can be found by visualizing the issue. This is why TensorFlow’s TensorBoard add-on is such a useful tool, and one reason why TensorFlow is a more mature solution than other deep learning frameworks. It also produces pretty pictures, and let’s face it, everybody loves pretty pictures.

The latest release of TensorFlow (v1.10 at the time of writing) ships TensorBoard with a whole new bunch of functionality. This tutorial is going to cover the basics, so that future tutorials can cover more specific (and complicated) TensorBoard features. The code for this tutorial can be found on this site’s GitHub page.

## Visualizing the graph in TensorBoard

As you are likely to be aware, TensorFlow calculations are performed in the context of a computational graph (if you’re not aware of this, check out my TensorFlow tutorial). To communicate the structure of your network, and to check that structure when the network is complicated, it is useful to be able to visualize the computational graph. Visualizing the graph of your network is very straightforward in TensorBoard. All that is required is to build your network, create a session, then create a TensorFlow FileWriter object.

The FileWriter definition takes the file path of the location you want to store the TensorBoard file in as the first argument, and the TensorFlow graph object, sess.graph, as the second argument. This can be observed in the code below:


    writer = tf.summary.FileWriter(STORE_PATH, sess.graph)



The same FileWriter that can be used to display your computational graph in TensorBoard will also be used for other visualization functions, as will be shown below. In this example, a simple, single hidden layer neural network will be created in TensorFlow to classify MNIST hand-written digits. The graph for this network is what will be visualized. The network, as defined in TensorFlow, looks like:


    import tensorflow as tf

    # learning_rate is a hyperparameter assumed to be defined earlier in the script

    # declare the training data placeholders
    x = tf.placeholder(tf.float32, [None, 28, 28])
    # reshape input x - for 28 x 28 pixels = 784
    x_rs = tf.reshape(x, [-1, 784])
    # scale the input data (maximum is 255.0, minimum is 0.0)
    x_sc = tf.div(x_rs, 255.0)
    # now declare the output data placeholder - 10 digits
    y = tf.placeholder(tf.int64, [None, 1])
    # convert the y data to one hot values
    y_one_hot = tf.reshape(tf.one_hot(y, 10), [-1, 10])

    # hidden layer: 784 inputs -> 300 nodes with sigmoid activation
    W1 = tf.Variable(tf.random_normal([784, 300], stddev=0.01), name='W')
    b1 = tf.Variable(tf.random_normal([300]), name='b')
    hidden_logits = tf.add(tf.matmul(x_sc, W1), b1)
    hidden_out = tf.nn.sigmoid(hidden_logits)

    # output layer: 300 hidden nodes -> 10 class logits
    W2 = tf.Variable(tf.random_normal([300, 10], stddev=0.05), name='W')
    b2 = tf.Variable(tf.random_normal([10]), name='b')
    logits = tf.add(tf.matmul(hidden_out, W2), b2)

    # now let's define the cost function which we are going to train the model on
    cross_entropy = tf.reduce_mean(
        tf.nn.softmax_cross_entropy_with_logits_v2(labels=y_one_hot, logits=logits))

    # add an optimiser
    optimiser = tf.train.GradientDescentOptimizer(learning_rate=learning_rate).minimize(cross_entropy)

    # compare the predicted class with the true class to get the accuracy
    correct_prediction = tf.equal(tf.argmax(y_one_hot, 1), tf.argmax(logits, 1))
    accuracy = tf.reduce_mean(tf.cast(correct_prediction, tf.float32))



I won’t go through the code in detail, but here is a summary of what is going on:

• Input placeholders are created
• The x input image data, which is of size (-1, 28, 28) is flattened to (-1, 784) and scaled from a range of 0-255 (greyscale pixels) to 0-1
• The y labels are converted to one-hot format
• A hidden layer is created with 300 nodes and sigmoid activation
• An output layer of 10 nodes is created
• The logits of this output layer are sent to the TensorFlow softmax + cross entropy loss function
• Gradient descent optimization is used
• Some accuracy calculations are performed
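To make the loss and accuracy steps in the summary above more concrete, here is a minimal plain-Python sketch of what softmax + cross entropy and the accuracy calculation compute for individual samples. It is not the TensorFlow implementation; the function names are my own, and it works on ordinary Python lists rather than tensors.

```python
import math


def softmax(logits):
    """Turn raw scores into probabilities that sum to 1."""
    peak = max(logits)  # subtracting the max avoids overflow in exp()
    exps = [math.exp(z - peak) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def cross_entropy(logits, label):
    """Loss for one sample: -log(probability assigned to the true class).

    With a one-hot target, only the true-class term of the sum survives.
    """
    return -math.log(softmax(logits)[label])


def accuracy(batch_logits, labels):
    """Fraction of samples whose highest-scoring class equals the label."""
    hits = 0
    for logits, label in zip(batch_logits, labels):
        predicted = max(range(len(logits)), key=lambda i: logits[i])
        hits += int(predicted == label)
    return hits / len(labels)


# ten equal logits: the network is guessing uniformly, so the loss is log(10)
print(cross_entropy([0.0] * 10, 3))
```

In the network above, the per-sample losses and per-sample hits are averaged over the batch with tf.reduce_mean.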

So when a TensorFlow session is created, and the FileWriter defined and run, you can start TensorBoard to visualize the graph. To define the FileWriter and send it the graph, run the following:


    # start the session
    with tf.Session() as sess:
        writer = tf.summary.FileWriter(STORE_PATH, sess.graph)



After running the above code, it is time to start TensorBoard. To do this, run the following command in the command prompt, replacing STORE_PATH with the directory that was passed to the FileWriter:


    tensorboard --logdir=STORE_PATH



This will create a local server and the text output in the command prompt will let you know what web address to type into your browser to access TensorBoard. After doing this and loading up TensorBoard, you can click on the Graph tab and observe something like the following:

As you can see, there is a lot going on in the graph above. The most obvious components are the weight variable blocks (W, W_1, b, b_1 etc.), the weight initialization operations (random_normal) and the softmax_cross_entropy nodes. These larger rectangles with rounded corners are called “namespaces”. They are like sub-diagrams in the graph, which contain child operations and can be expanded. More on these shortly.

Surrounding these larger colored blocks are a host of other operations – MatMul, Add, Sigmoid and so on – these operations are shown as ovals. Other nodes which you can probably see are the small circles which represent constants. Finally, if you look carefully, you will be able to observe some ovals and rounded rectangles with dotted outlines. These are an automatic simplification by TensorBoard to reduce clutter in the graph. They show common linkages which apply to many of the nodes – such as all of the nodes requiring initialization (init), those nodes which have gradients associated, and those nodes which will be trained by gradient descent. If you look at the upper right hand side of the diagram, you’ll be able to see these linkages to the gradients, GradientDescent and init nodes:

One final thing to observe within the graph are the linkages or edges connecting the nodes – these are actually tensors flowing around the computational graph. Zooming in more closely reveals these linkages:

As can be observed, the edges between the nodes display the dimensions of the tensors flowing around the graph. This is handy when debugging more complicated graphs. Now that these basics have been reviewed, we’ll examine how to reduce the clutter of your graph visualizations.
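As a rough illustration of why those edge dimensions help with debugging, the sketch below tracks shapes through the two matrix multiplications of this network in plain Python. The helper names are hypothetical, None stands for the unknown batch size, and the "?" label format is only an approximation of what TensorBoard displays.

```python
def matmul_shape(left, right):
    """Shape of left @ right for 2-D shapes; None marks an unknown batch size."""
    if left[1] != right[0]:
        raise ValueError("inner dimensions differ: {} vs {}".format(left[1], right[0]))
    return (left[0], right[1])


def edge_label(shape):
    """Format a shape roughly as an edge label, with ? for an unknown dimension."""
    return "x".join("?" if d is None else str(d) for d in shape)


x_sc = (None, 784)                              # flattened, scaled input
hidden_out = matmul_shape(x_sc, (784, 300))     # after the hidden layer
logits = matmul_shape(hidden_out, (300, 10))    # after the output layer
print(edge_label(hidden_out), edge_label(logits))
```

A mismatch between neighbouring layers, such as feeding the 300-wide hidden output into a weight matrix expecting 784 inputs, shows up as an inconsistent dimension on an edge.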

### Namespaces

Namespaces are scopes which you can surround your graph components with to group them together. By doing so, the detail within the namespace will be collapsed into a single Namespace node within the computational graph visualization in TensorBoard. To create a namespace in TensorFlow, you use the Python with statement like so:


    with tf.name_scope("layer_1"):
        # now declare the weights connecting the input to the hidden layer
        W1 = tf.Variable(tf.random_normal([784, 300], stddev=0.01), name='W')
        b1 = tf.Variable(tf.random_normal([300]), name='b')
        hidden_logits = tf.add(tf.matmul(x_sc, W1), b1)
        hidden_out = tf.nn.sigmoid(hidden_logits)



As you can observe in the code above, the first layer variables have been surrounded with tf.name_scope("layer_1"). This will group all of the operations / variables within the scope together in the graph. After doing the same for the input placeholders and associated operations, the second layer and the accuracy operations, and re-running, we can generate the following, much cleaner visualization in TensorBoard:

As you can see, the use of namespaces drastically cleans up the clutter of a TensorBoard visualization. You can still access the detail within the namespace nodes by double clicking on the block to expand.

Before we move onto other visualization options within TensorBoard, it’s worth noting the following:

• tf.variable_scope() can also be used instead of tf.name_scope(). Variable scope is used as part of the get_variable() variable sharing mechanism in TensorFlow.
• You’ll notice in the first cluttered visualization of the graph, the weights and bias variables/operations had underscored numbers following them i.e. W_1 and b_1. When operations share the same name outside of a name scope, TensorFlow automatically appends a number to the operation name so that no two operations are labelled the same. However, when a name or variable scope is added, you can name operations the same thing and the scope name will be prepended to the name of the operation. For instance, the weight variable in the first layer is called ‘W’ in the definition, but given it is now in the name scope “layer_1” it is now called “layer_1/W”. The “W” in layer 2 is called “layer_2/W”.
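The naming behaviour described in the second point can be mimicked with a toy registry in plain Python. This is only a sketch of the idea, not TensorFlow's actual naming code, and the NameRegistry class is a hypothetical helper invented for illustration.

```python
class NameRegistry:
    """Toy model of graph naming: repeated names get _1, _2, ... suffixes."""

    def __init__(self):
        self._used = {}  # full name -> number of times it has been requested

    def unique(self, name, scope=""):
        # a scope is prepended to the name, separated by a slash
        full = "{}/{}".format(scope, name) if scope else name
        count = self._used.get(full, 0)
        self._used[full] = count + 1
        return full if count == 0 else "{}_{}".format(full, count)


registry = NameRegistry()
print(registry.unique("W"), registry.unique("W"))  # no scope: second W is renamed
print(registry.unique("W", scope="layer_1"), registry.unique("W", scope="layer_2"))
```

Because the scope becomes part of the full name, the two scoped weights no longer clash and no numeric suffix is needed.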

Now that visualization of the computational graph has been covered, it’s time to move onto other visualization functions which can aid in debugging and analyzing your networks.

## Scalar summaries

At any point within your network, you can log scalar (i.e. single, real valued) quantities to display in TensorBoard. This is useful for tracking things like the improvement of accuracy or the reduction in the loss function during training, or studying the standard deviation of your weight distributions and so on. It is executed very easily. For instance, the code below shows you how to log the accuracy scalar within this graph:


    # add a summary to store the accuracy
    tf.summary.scalar('acc_summary', accuracy)



The first argument is the name you choose to give the quantity within the TensorBoard visualization, and the second is the tensor (which must hold a single real value) you want to log. The output of the tf.summary.scalar() call is an operation. In the code above, I have not assigned this operation to any variable within Python, though you can do so if you wish. However, as with everything else in TensorFlow, these summary operations will not do anything until they are run. Given that there are often a lot of summaries in any given graph, depending on what the developer wants to observe, there is a handy helper function called merge_all(). This merges together all the summary operations within the graph, so that you only have to run the single merged operation and it will gather the data from all the others. It looks like this:


    merged = tf.summary.merge_all()



During execution within a Session, the developer can then simply run merged. Running this merged operation returns the summary data as a serialized Summary protocol buffer, which can then be passed to the FileWriter mentioned previously. The training code for the network looks like the following, and you can check to see where the merged operation has been called:


    # start the session
    with tf.Session() as sess:
        # initialise the variables
        sess.run(init_op)
        writer = tf.summary.FileWriter(STORE_PATH, sess.graph)
        total_batch = int(len(y_train) / batch_size)
        for epoch in range(epochs):
            avg_cost = 0
            for i in range(total_batch):
                batch_x, batch_y = get_batch(x_train, y_train, batch_size=batch_size)
                _, c = sess.run([optimiser, cross_entropy], feed_dict={x: batch_x, y: batch_y.reshape(-1, 1)})
                avg_cost += c / total_batch
            # evaluate on the test set and collect all summaries once per epoch
            acc, summary = sess.run([accuracy, merged], feed_dict={x: x_test, y: y_test.reshape(-1, 1)})
            print("Epoch: {}, cost={:.3f}, test set accuracy={:.3f}%".format(epoch + 1, avg_cost, acc*100))
            writer.add_summary(summary, epoch)
        print("\nTraining complete!")



I won’t go through the training details of the code above, as it is similar to that shown in other tutorials of mine like my TensorFlow tutorial. However, there are a couple of lines to note. First, you can observe that after every training epoch two operations are run: accuracy and merged. Running merged returns the serialized summary data, which is stored in summary ready for writing to the FileWriter. This data is then written to the event file by calling writer.add_summary(). The first argument to this function is the summary data, and the second is an optional argument which logs the global training step (here, the epoch number) along with the summary data.

Before showing you the results of this code, it is important to note something. TensorBoard starts to behave badly when output files from multiple runs sit within the same folder that the TensorBoard instance is pointed at. Therefore, if you are running your code multiple times you have two options:

• Delete the FileWriter output file after each run or,
• Use the fact that TensorBoard can perform sub-folder searches for TensorBoard files. So for instance, you could create a separate sub-folder for each run i.e. “Run_1”, “Run_2” etc. and then launch TensorBoard from the command line, pointing it to the parent directory. This is recommended when you are doing multiple runs for cross validation, or other diagnostic and testing runs.
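One way to follow the second option is to generate a fresh sub-folder name for every run. The helper below is a hypothetical sketch using only the standard library; the timestamp format and the "Run" prefix are arbitrary choices of mine, not something TensorBoard requires.

```python
import os
from datetime import datetime


def make_run_dir(parent_dir, prefix="Run", now=None):
    """Build a unique per-run log path such as <parent_dir>/Run_20180901-120000."""
    now = now or datetime.now()
    run_name = "{}_{}".format(prefix, now.strftime("%Y%m%d-%H%M%S"))
    return os.path.join(parent_dir, run_name)


# pass this path to the FileWriter instead of a fixed STORE_PATH
run_path = make_run_dir("tensorboard_logs")
print(run_path)
```

TensorBoard would then be launched with --logdir pointing at the parent folder (tensorboard_logs in this sketch), so that each run appears as a separate entry.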

To access the accuracy scalar summary that was logged, launch TensorBoard again and click on the Scalar tab. You’ll see something like this:

The scalar page in TensorBoard has a few features which are worth checking out. Of particular note is the smoothing slider. This explains why there are two lines in the graph above – the thicker orange line is the smoothed values, and the lighter orange line is the actual accuracy values which were logged. This smoothing can be useful for displaying the overall trend when the summary logging frequency is higher i.e. after every training step rather than after every epoch as in this example.
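The kind of smoothing the slider applies can be approximated with an exponential moving average. The sketch below is an assumption about the general technique rather than TensorBoard's exact formula, which may differ between versions; the weight argument plays the role of the slider value.

```python
def smooth(values, weight=0.6):
    """Exponential moving average: higher weight means a smoother, slower line."""
    smoothed = []
    last = values[0] if values else 0.0  # start from the first logged value
    for value in values:
        last = weight * last + (1.0 - weight) * value
        smoothed.append(last)
    return smoothed


# a noisy accuracy curve; the smoothed version lags behind but shows the trend
raw_accuracy = [0.50, 0.80, 0.70, 0.90, 0.85, 0.95]
print(smooth(raw_accuracy, weight=0.6))
```

With a weight of 0 the smoothed line is identical to the raw values, which matches moving the slider all the way to the left.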

The next useful data visualization in TensorBoard is the histogram summary.

## Histogram summaries

Histogram summaries are useful for examining the statistical distributions of your network variables and operation outputs. For instance, in my weight initialization tutorial, I have used TensorBoard histograms to show how poor weight initialization can lead to sub-optimal network learning due to less than optimal outputs from the node activation function. To log histogram summaries, all the developer needs to do is create a similar operation to the scalar summaries:


    tf.summary.histogram("Hidden_logits", hidden_logits)
    tf.summary.histogram("Hidden_output", hidden_out)



I have added these summaries so that we can examine how the distributions of the inputs and outputs of the hidden layer change over training epochs. After running the code, you can open TensorBoard again. Clicking on the histogram tab in TensorBoard will give you something like the following:

This view in the histogram tab is the offset view, so that you can clearly observe how the distribution changes through the training epochs. On the left hand side of the histogram page in TensorBoard you can choose another option called “Overlay” which looks like the following:

Another view of the statistical distribution of your histogram summaries can be accessed through the “Distributions” tab in TensorBoard. An example of the graphs available in this tab is below:

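This graph gives another way of visualizing the distribution of the data in your histogram summaries. The x-axis is the number of steps or epochs, and the different shadings mark percentile bands of the data (minimum, 7%, 16%, 31%, median, 69%, 84%, 93%, maximum). For normally distributed data, these bands correspond to 0.5, 1 and 1.5 standard deviations either side of the mean.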
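For readers who want to see how standard deviations relate to percentiles, here is a quick standard-library check for a normal distribution. The half-sigma steps follow the bands described in TensorBoard's documentation; the function name is my own, and real activations are of course not guaranteed to be normally distributed.

```python
from statistics import NormalDist

standard_normal = NormalDist(mu=0.0, sigma=1.0)


def percentile_at(sigmas):
    """Percentage of a normal distribution lying below mean + sigmas * std."""
    return 100.0 * standard_normal.cdf(sigmas)


for k in (-1.5, -1.0, -0.5, 0.0, 0.5, 1.0, 1.5):
    print("mean {:+.1f} sigma -> {:.0f}th percentile".format(k, percentile_at(k)))
```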

This covers off the histogram summaries – it is now time to review the last summary type that will be covered off in this tutorial – the image summary.

## Image summaries

The final summary to be reviewed is the image summary. This summary object allows the developer to capture images of interest during training to visualize them. These images can be either grayscale or RGB. One possible application of image summaries is to use them to visualize which images are classified well by the classifier, and which ones are classified poorly. This application will be demonstrated in this example – the additional code can be observed below:


    with tf.variable_scope("getimages"):
        correct_inputs = tf.boolean_mask(x_sc, correct_prediction)
        image_summary_true = tf.summary.image('correct_images', tf.reshape(correct_inputs, (-1, 28, 28, 1)),
                                              max_outputs=5)
        incorrect_inputs = tf.boolean_mask(x_sc, tf.logical_not(correct_prediction))
        image_summary_false = tf.summary.image('incorrect_images', tf.reshape(incorrect_inputs, (-1, 28, 28, 1)),
                                               max_outputs=5)



Ok, there is a bit going on in the code above, which I will explain. In the first line inside the scope, a boolean mask operation is created. This takes the scaled input tensor (representing the hand written digit images) and returns only those images which were correctly classified by the network. The tensor correct_prediction is a boolean vector of True and False values which indicate whether each image was correctly classified or not.

After the correct inputs have been extracted by this process, these inputs are then passed to tf.summary.image() which is how the image summaries are stored. The first argument to this function is the name given to the images. The second is the images themselves. Note that the input tensor x_sc is a flattened version of the 28 x 28 pixel images. We need to reshape the input tensor into a form acceptable to tf.summary.image(). The acceptable form is a 4D tensor of the following structure: (no. samples, image height, image width, color channels). In this case we pass -1 for the first dimension so that it is inferred automatically, and can dynamically adapt to the number of correctly classified images. The next two dimensions are the 28 x 28 pixels of the images. Finally, the last dimension is 1 as the images are greyscale. This would be 3 for RGB color images.
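The way a -1 dimension gets filled in during a reshape can be sketched in a few lines of plain Python. The infer_shape helper is hypothetical and only mirrors the arithmetic; it is not how TensorFlow implements tf.reshape.

```python
def infer_shape(num_elements, shape):
    """Resolve a single -1 entry so the shape accounts for every element."""
    known = 1
    unknown_index = None
    for index, dim in enumerate(shape):
        if dim == -1:
            if unknown_index is not None:
                raise ValueError("only one dimension may be -1")
            unknown_index = index
        else:
            known *= dim
    if unknown_index is None:
        if known != num_elements:
            raise ValueError("shape does not match the number of elements")
        return tuple(shape)
    if known == 0 or num_elements % known != 0:
        raise ValueError("elements cannot be divided evenly into this shape")
    resolved = list(shape)
    resolved[unknown_index] = num_elements // known
    return tuple(resolved)


# five correctly classified images, each flattened to 784 pixels
print(infer_shape(5 * 784, (-1, 28, 28, 1)))
```

However many images survive the boolean mask, the first dimension adjusts to match while the remaining three stay fixed.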

The last argument to tf.summary.image() is the maximum number of images to send to TensorBoard. This is an important memory saving feature. If we export too many images there is the possibility that the event file written by the FileWriter will become unwieldy. In this case, we only want to look at 5 images of correct predictions.

The next line in the code is the exact opposite of what was performed in the correct_inputs operation – in this case, all the incorrectly classified images are extracted. These are likewise sent to tf.summary.image() for storage.
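The two masking steps can be pictured with ordinary Python lists. This is only an analogy for tf.boolean_mask and tf.logical_not, using made-up image names in place of real pixel data.

```python
def boolean_mask(items, mask):
    """Keep only the items whose corresponding mask entry is True."""
    if len(items) != len(mask):
        raise ValueError("items and mask must have the same length")
    return [item for item, keep in zip(items, mask) if keep]


# placeholder names standing in for four input images
images = ["img_a", "img_b", "img_c", "img_d"]
correct_prediction = [True, False, True, False]

correct_inputs = boolean_mask(images, correct_prediction)
# inverting the mask selects the misclassified images instead
incorrect_inputs = boolean_mask(images, [not c for c in correct_prediction])
print(correct_inputs, incorrect_inputs)
```

Every image ends up in exactly one of the two groups, which is why the two image summaries together cover the whole batch before max_outputs trims them.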

If you re-run the code with the addition of these extra lines and go to the Images tab in TensorBoard, you’ll be able to see images such as the following for the correctly classified cases:

This is obviously a nice, clearly written “7” which the network has correctly classified. Alternatively, the image below is an example of the incorrectly classified case:

With the state-of-the-art classifier that is my brain, I can see that this image is a badly written “4”. However, the neural network created in this example hasn’t been able to correctly classify such a poorly written number.

That concludes this introductory TensorBoard visualization tutorial. TensorBoard is expanding with new versions of TensorFlow, and there are now additional summaries and visualizations that can be used such as video summaries, text summaries and even a debugger. These will be topics of future posts. I hope this tutorial assists you in getting a leg up into the great deep learning visualization tool that is TensorBoard.