Google Colaboratory introduction – Learn how to build deep learning systems in Google Colaboratory

It’s a great time to be practising deep learning. The main deep learning frameworks like TensorFlow, Keras and PyTorch are maturing and offer a lot of functionality to streamline the deep learning process. There are also other great tool sets emerging for the deep learning practitioner. One of these is the Google Colaboratory environment. This environment, based on Python Jupyter notebooks, gives the user free access to Tesla K80 GPUs. If your local machine lacks a GPU, there is now no need to rent GPU time on Amazon AWS, at least for prototyping smaller learning tasks. This makes it possible for anybody to experiment with deep learning beyond simple datasets like MNIST. Google has also recently opened up free use of TPUs (Tensor Processing Units) within the environment.

Free access to GPUs and TPUs is just one benefit of Google Colaboratory. This post will explore the capabilities of the environment and show you how to use it efficiently and effectively as a deep learning “home base”. I’ll also be running through an example CIFAR-10 classifier built in TensorFlow to demonstrate its use. The Google Colaboratory notebook for this tutorial can be found here.




Google Colaboratory basics

Google Colaboratory is based on the Jupyter notebook design and operation paradigm. I won’t be reviewing how Jupyter works, as I imagine most Python and TensorFlow users are already familiar with it. To access the environment, you must have a Google Drive account and be signed in. The .ipynb files that you create will be saved in your Google Drive account.

To access Google Colaboratory, go here. Once you open up a new file, the first thing to do is rename the file (File -> Rename) and set up your runtime environment (i.e. whether to use a standard CPU, a GPU or a TPU). Whenever you change your runtime environment, the current notebook session will restart – so it is best to do this first. To do so, go to Runtime -> Change runtime type.

One of the most important and useful components of Google Colaboratory is the ability to share your notebooks with others, and also to allow others to comment on your work. This can be seen in the Share and Comment buttons on the top right of the screen, shown below:

The Comment functionality allows users to make comments on individual cells within the notebook, which is useful for remote collaboration.

Each cell can be selected as either a “code” cell or a “text” cell. The text cells allow the developer to create commentary surrounding the code, which is useful for explaining what is going on or for creating document-like implementations of various algorithms. This is all similar to standard Jupyter notebooks.

Local bash commands can also be run from the cells, and these interact with the virtual machine (VM) that has been created as part of your session. For instance, to install the PyDrive package, which will be used later in this introduction, run the following straight into one of the code cells:

!pip install -U -q PyDrive

This runs a normal pip installation command on the VM and installs the PyDrive package. One important thing to note is that Google Colaboratory will time out and erase your session environment after a period of inactivity. This is to free up VM space for other users. Therefore, it may be necessary to run a series of pip install commands at the beginning of your notebook each time to get your environment ready for your particular use. However, deep learning checkpoints, data and result summaries can be exported to other locations with permanent storage, such as Google Drive or your local hard drive.

You can also run other common Linux commands such as ls, mkdir, rmdir and curl. A fuller list of the commands and functionality available on the Google Colaboratory VM can be found by running !ls /bin.
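For example, the following cell (purely illustrative; the directory name happens to match the one used for TensorBoard logs later in this post) creates a folder on the VM and then lists the contents of the working directory:

!mkdir -p log
!ls -la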

Now that these basics have been covered, it is time to examine how to access TensorBoard while in Google Colaboratory.

Accessing TensorBoard

TensorBoard is a useful visualization tool for TensorFlow (see my previous post for more details on how to use it). It works by reading log files written during or after the training process. In my view, it is most useful to have access to these files during training, so that you can observe whether your current approach is yielding results or not. On a local machine it is easy to start the TensorBoard server and access it through the web browser; however, this can’t be done in a straightforward fashion in Google Colaboratory.

Nevertheless, there is an easy solution – ngrok. This package creates secure tunnels through firewalls and other network restrictions, exposing a local port to the public internet. I am indebted for this solution to the writer of this blog post. To download and install ngrok in Google Colaboratory, run the following commands:

!wget https://bin.equinox.io/c/4VmDzA7iaHb/ngrok-stable-linux-amd64.zip
!unzip ngrok-stable-linux-amd64.zip

The next step is to create a TensorBoard session in the usual way. To do this in Google Colaboratory, one can run the following commands:

LOG_DIR = './log'
get_ipython().system_raw(
    'tensorboard --logdir {} --host 0.0.0.0 --port 6006 &'
    .format(LOG_DIR)
)

The get_ipython() function gives access to IPython commands, and system_raw() executes a command in the underlying shell. The string argument passed to system_raw() starts a TensorBoard session which searches for log files in LOG_DIR and runs on port 6006.
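If you want to confirm that the TensorBoard process actually started before setting up the tunnel, a quick sanity check along these lines (assuming the default port of 6006 used above) can be run from a cell; it should print the beginning of TensorBoard's HTML page:

!curl -s http://localhost:6006 | head -c 200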

The next step is to execute ngrok and print out the link which will take the user to the TensorBoard portal:

get_ipython().system_raw('./ngrok http 6006 &')
! curl -s http://localhost:4040/api/tunnels | python3 -c \
    "import sys, json; print(json.load(sys.stdin)['tunnels'][0]['public_url'])"

The first line in the above starts the ngrok tunnel via the HTTP protocol to port 6006 – the same port that TensorBoard can be accessed from. The second line is a complicated-looking curl command. The curl command in Linux is used to run HTTP requests. In this case, a request is being made to “http://localhost:4040/api/tunnels”. This is an ngrok API running locally that contains information about the tunnels that are operating.

The information received from that curl request is then piped to Python 3 via the Linux pipe “|” operator. The result comes into Python via sys.stdin in JSON format, and the public URL of the tunnel that has been created is printed to screen. Running this command will return a URL in Google Colaboratory that looks something like:

https://712a59dd.ngrok.io

Clicking on the link in Google Colaboratory will send your browser to your TensorBoard portal. So now you can use TensorBoard in Google Colaboratory – which is very handy.

Next we’ll be looking at saving models and loading files from Google Drive – this is the easiest way to checkpoint and restore models while using Google Colaboratory.

Saving and restoring files in Google Colaboratory

In this post, I’m going to introduce the two ways of working with files in Google Colaboratory that I believe will be of most common use. The files to be worked with generally consist of training or testing data, and saved model data (i.e. checkpoints or fully trained model data).

The simplest way of uploading and downloading files in Google Colaboratory is to use the inbuilt file browser. Clicking on View -> Table of contents in the menu will launch a left-hand pane. At the top of this pane, there will be a tab called “Files” – selecting this will show you the file structure of your current runtime session, from which you can upload and download to and from your local PC.

Alternatively, this can be performed programmatically by running the following commands:

from google.colab import files
uploaded = files.upload()

The code above will launch a dialog box which allows you to navigate to a local file to upload to your session. The following code will download a specified file to your downloads area on your PC (if you’re using Windows):

files.download("downloaded_weights.12-1.05.hdf5")

So far so good. However, this is a very manual way of handling files. It isn’t possible during training, so storing checkpoints to your local drive from Google Colaboratory isn’t feasible using this method. Another issue arises when you are running a long-running training session on Google Colaboratory. If your training finishes and you don’t interact with the console for a while (e.g. you run an overnight training session and you’re asleep when it finishes), your runtime will be automatically ended and released to free up resources.

Unfortunately this means that you will also lose all your training progress and model data up to that point. In other words, it is important to be able to programmatically store files / checkpoints while training. In the example below, I’m going to show you how to set up a training callback which automatically stores checkpoints to your Google Drive account, which can then be downloaded and used again later. I’ll demonstrate it in the context of training a TensorFlow/Keras model to classify CIFAR-10 images. For more details on that, see my tutorial or my book.

A file saving example using Keras and callbacks

First off, I’ll show you the imports required, the data preparation using the Dataset API and then the Keras model development. I won’t explain these, as the details are outlined in the aforementioned tutorial, so check that out if you’d like to understand the model better. The Google Colaboratory notebook for this tutorial can be found here. We’re going to use the PyDrive package to do all the talking to Google Drive, so first you have to install it in your session:

!pip install -U -q PyDrive

Next come all the imports, the data preparation and the Keras model development:

import tensorflow as tf
from tensorflow import keras
import datetime as dt
import os
import numpy as np
from google.colab import files
from google.colab import drive
# these are all the Google Drive and authentication libraries required
from pydrive.auth import GoogleAuth
from pydrive.drive import GoogleDrive
from google.colab import auth
from oauth2client.client import GoogleCredentials
# import the CIFAR-10 data then load into TensorFlow datasets
(x_train, y_train), (x_test, y_test) = keras.datasets.cifar10.load_data()
# the training set with data augmentation
train_dataset = tf.data.Dataset.from_tensor_slices((x_train, y_train)).batch(256).shuffle(10000)
train_dataset = train_dataset.map(lambda x, y: (tf.div(tf.cast(x, tf.float32), 255.0), tf.reshape(tf.one_hot(y, 10), (-1, 10))))
train_dataset = train_dataset.map(lambda x, y: (tf.image.central_crop(x, 0.75), y))
train_dataset = train_dataset.map(lambda x, y: (tf.image.random_flip_left_right(x), y))
train_dataset = train_dataset.repeat()
# the validation set
valid_dataset = tf.data.Dataset.from_tensor_slices((x_test, y_test)).batch(5000).shuffle(10000)
valid_dataset = valid_dataset.map(lambda x, y: (tf.div(tf.cast(x, tf.float32),255.0), tf.reshape(tf.one_hot(y, 10), (-1, 10))))
valid_dataset = valid_dataset.map(lambda x, y: (tf.image.central_crop(x, 0.75), y))
valid_dataset = valid_dataset.repeat()
# now the model creation function
def create_model():
    model = keras.models.Sequential([
        keras.layers.Conv2D(96, 3, padding='same', activation=tf.nn.relu,
                            kernel_initializer=keras.initializers.VarianceScaling(distribution='truncated_normal'),
                            kernel_regularizer=keras.regularizers.l2(l=0.001),
                            input_shape=(24, 24, 3)),
        keras.layers.Conv2D(96, 3, 2, padding='same', activation=tf.nn.relu,
                            kernel_initializer=keras.initializers.VarianceScaling(distribution='truncated_normal'),
                            kernel_regularizer=keras.regularizers.l2(l=0.001)),
        keras.layers.Dropout(0.2),
        keras.layers.Conv2D(192, 3, padding='same', activation=tf.nn.relu,
                            kernel_initializer=keras.initializers.VarianceScaling(distribution='truncated_normal'),
                            kernel_regularizer=keras.regularizers.l2(l=0.001)),
        keras.layers.Conv2D(192, 3, 2, padding='same', activation=tf.nn.relu,
                            kernel_regularizer=keras.regularizers.l2(l=0.001)),
        keras.layers.BatchNormalization(),
        keras.layers.Dropout(0.5),
        keras.layers.Flatten(),
        keras.layers.Dense(256, activation=tf.nn.relu,
                           kernel_initializer=keras.initializers.VarianceScaling(),
                           kernel_regularizer=keras.regularizers.l2(l=0.001)),
        keras.layers.Dense(10),
        keras.layers.Softmax()
    ])

    model.compile(optimizer=tf.train.AdamOptimizer(),
                  loss='categorical_crossentropy',
                  metrics=['accuracy'])
    return model
# finally create the model
model = create_model()

The next step in the code is to create the “GoogleDriveStore” callback. This callback inherits from the general keras.callbacks.Callback super class, which allows the class definition below to define methods that are run at the beginning of training, at the start of each epoch and so on. The code below is the initialization step of the callback, which fires at the beginning of training:

class GoogleDriveStore(keras.callbacks.Callback):
    def on_train_begin(self, logs={}, model_folder="."):
        self.first = True
        self.init_date = dt.datetime.now()
        self.model_folder = model_folder
        
        # Authenticate and create the PyDrive client.
        auth.authenticate_user()
        gauth = GoogleAuth()
        gauth.credentials = GoogleCredentials.get_application_default()
        self.drive = GoogleDrive(gauth)
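As an aside, keras.callbacks.Callback exposes a number of hooks besides on_train_begin. The minimal sketch below (purely illustrative, not part of the tutorial model) shows the main methods you can override; the GoogleDriveStore callback uses on_train_begin here and on_epoch_begin further below:

# illustrative only: the hook methods available on keras.callbacks.Callback
class SketchCallback(keras.callbacks.Callback):
    def on_train_begin(self, logs=None):
        print("training is starting")

    def on_epoch_begin(self, epoch, logs=None):
        print("epoch {} is starting".format(epoch))

    def on_epoch_end(self, epoch, logs=None):
        print("epoch {} finished, logs: {}".format(epoch, logs))

    def on_train_end(self, logs=None):
        print("training has finished")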

Returning to the GoogleDriveStore callback, you can observe that in on_train_begin a few initialization variables are first set, whose purpose will become clear shortly. After that, there are four lines related to setting up the Google Drive connection. I’ll confess that I am not 100% sure how these authentication functions work in detail, but I’ll take a shot at explaining them.

First, the auth.authenticate_user() function is run – this is a native Google Colaboratory function which will supply a link for the user to click on. This leads the user to log on to a Google account and supplies a token that needs to be entered into the Colaboratory notebook to complete the authentication.

Next, this authentication needs to be loaded into the PyDrive Google Drive connection. First an OAuth authentication object is created (gauth). Then the credentials for this connection are supplied via the get_application_default() method of GoogleCredentials. I’m not sure how this method works exactly, but it seems to pick up the authentication that was performed in the first step by running auth.authenticate_user(). Finally, a GoogleDrive object is created, and the authentication credentials are passed to this object on creation.

Now the callback has an authenticated Google Drive connection at its disposal. The next step is to create the checkpoint storage to Google Drive after the end of each epoch:

    def on_epoch_begin(self, batch, logs={}):
        if not self.first:
            # find the most recently saved Keras checkpoint (.hdf5) file
            model_files = os.listdir(self.model_folder)
            max_date = self.init_date
            for f in model_files:
                if os.path.isfile(self.model_folder + "/" + f):
                    if f.split(".")[-1] == 'hdf5':
                        creation_date = dt.datetime.fromtimestamp(
                            os.path.getmtime(self.model_folder + "/" + f))
                        if creation_date > max_date:
                            file_name = f
                            latest_file_path = self.model_folder + "/" + f
                            max_date = creation_date
            # upload the latest checkpoint to Google Drive
            uploaded = self.drive.CreateFile({'title': file_name})
            uploaded.SetContentFile(latest_file_path)
            uploaded.Upload()
        else:
            self.first = False

The function above simply loops through all the files within the self.model_folder directory. It searches for files with the hdf5 extension, which is the Keras model save format. It then finds the hdf5 file with the most recent modification date (latest_file_path). Once this has been found, a file is created using the CreateFile method of the GoogleDrive object within PyDrive, and a name is assigned to the file. On the next line, the content of this file is set to be equal to the latest hdf5 file stored locally. Finally, using the Upload() method, the file is saved to Google Drive.

The remainder of the training code looks like the following:

g_drive_callback = GoogleDriveStore()
callbacks = [
    # write TensorBoard logs to the ./log directory
    keras.callbacks.TensorBoard(
        log_dir='./log/{}'.format(dt.datetime.now().strftime("%Y-%m-%d-%H-%M-%S")),
        write_images=True),
    # save a checkpoint whenever the validation loss improves
    keras.callbacks.ModelCheckpoint(
        "./weights.{epoch:02d}-{val_loss:.2f}.hdf5",
        monitor='val_loss', verbose=0, save_best_only=True),
    # the Google Drive callback defined above
    g_drive_callback
]
model.fit(train_dataset, epochs=50, steps_per_epoch=len(x_train)//256,
          validation_data=valid_dataset,
          validation_steps=3, callbacks=callbacks)

As can be observed in the code above, three callbacks in total are included in the training – a TensorBoard callback (which updates the TensorBoard log files), a ModelCheckpoint callback (which creates the hdf5 model checkpoints) and finally the Google Drive callback which I created.

The next thing to cover is how to load a trained model from Google Drive back into your Google Colaboratory session.

Loading a trained model from Google Drive

The easiest way to access files from Google Drive and make them available to your Google Colaboratory session is to mount your Google Drive into the session. This can be performed easily by calling the drive.mount() method (though you’ll have to authenticate if you haven’t already, using the token code as explained previously). The code below shows you how to use this method to reload the saved weights into a model and run some predictions:

drive.mount('/content/gdrive')
model = create_model()
model.load_weights("./gdrive/My Drive/weights.12-1.05.hdf5")
model.predict(valid_dataset, steps=1)

The final line will print out an array of predictions from the newly loaded Keras model. Note that the argument supplied to drive.mount() is the location in the session’s file structure where the drive contents should be mounted; this location is then used to load the weights in the code above.
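If you want to turn those softmax outputs into human-readable labels, a short follow-up along these lines should work (the class names below follow the standard CIFAR-10 label ordering):

import numpy as np
cifar10_classes = ['airplane', 'automobile', 'bird', 'cat', 'deer',
                   'dog', 'frog', 'horse', 'ship', 'truck']
predictions = model.predict(valid_dataset, steps=1)
# take the index of the highest probability for each sample
predicted_labels = [cifar10_classes[i] for i in np.argmax(predictions, axis=1)]
print(predicted_labels[:10])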

That gives you a quick overview of Google Colaboratory, plus a couple of handy code snippets which allow you to run TensorBoard from Colaboratory and to save and load files from Google Drive – a must for long training sessions. I hope this helps you get the most out of this great way to test and build deep learning models, with free GPU time!

