Introduction to ResNet in TensorFlow 2

In previous tutorials, I’ve explained convolutional neural networks (CNNs) and shown how to code them. The convolutional layer has proven to be a great success in the area of image recognition and processing in machine learning. However, state-of-the-art techniques don’t involve just a few CNN layers; rather, they can be very deep, consisting of tens to over a hundred layers. One of the most successful CNN architectures developed to date is ResNet. It was first introduced in 2015 (see this paper) and won the ILSVRC 2015 image classification task. The winning ResNet consisted of a whopping 152 layers, and in order to successfully train a network that deep, a significant innovation in CNN architecture was developed for ResNet. This innovation will be discussed in this post, and an example ResNet architecture will be developed in TensorFlow 2 and compared to a standard architecture. Because of the training requirements for this task, I have developed the code in Google Colaboratory (which gives free GPU time – see my tutorial here), and the notebook can be found on this site’s Github repository.


Eager to build deep learning systems in TensorFlow 2? Get the book here


Introduction to the ResNet architecture

The degradation problem

The vanishing gradient problem was an initial barrier to making neural networks deeper and more powerful. However, as explained in this post, that problem has now largely been solved through the use of ReLU activations and batch normalization. Given that this is true, and given enough computational power and data, we should be able to stack many CNN layers and dramatically increase classification accuracy, right? Well – to a degree. An early architecture, VGG-19, had 19 layers. However, this is a long way off the 152 layers of the version of ResNet that won the ILSVRC 2015 image classification task. The reason deeper networks were not successful prior to ResNet was something called the degradation problem. Note, this is not the vanishing gradient problem, but something else. It was observed that making the network deeper led to higher classification errors. One might think this is due to overfitting of the data – but not so fast: the degradation problem leads to higher training errors too! Consider the diagrams below from the original ResNet paper:

Illustration of the degradation problem that ResNet solves

Note that the 56-layer network has higher test and training errors. Theoretically, this doesn’t make much sense. Let’s say the 20-layer network learns some mapping H(x) that gives a training error of 10%. If another 36 layers are added, we would expect the error to be, at worst, no higher than 10%. Why? Well, the 36 extra layers could, at worst, just learn identity functions. In other words, the extra 36 layers could simply learn to pass through the output from the first 20 layers of the network. This would give the same error of 10%. This doesn’t seem to happen though. It appears neural networks aren’t great at learning the identity function in deep architectures. Not only do they fail to learn the identity function (and hence pass through the 20-layer error rate), they actually make things worse. Beyond a certain number of layers, they begin to degrade the performance of the network compared to shallower implementations. Here is where the ResNet architecture comes in.

The ResNet solution

The ResNet solution relies on making the identity function an explicit option in the architecture, rather than relying on the network itself to learn the identity function where appropriate. It involves building networks out of the following CNN building block:

ResNet building block (from here)

In the diagram above, the input tensor x enters the building block. This input then splits. On one path, the input is processed by two stacked convolutional layers (called a “weight layer” in the above). This path is the “standard” CNN processing part of the building block. The ResNet innovation is the “identity” path. Here, the input x is simply added to the output of the CNN component of the building block, F(x). The output from the block is then F(x) + x with a final ReLU activation applied at the end. This identity path in the ResNet building block allows the neural network to more easily pass through any abstractions learnt in previous layers. Alternatively, it can more easily build incremental abstractions on top of the abstractions learnt in the previous layers. What do I mean by this? The diagram below may help:
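
Written as an equation, with F(x) denoting the output of the stacked weight layers, the block computes:

$$ y = \mathrm{ReLU}\left( F(x) + x \right) $$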

ResNet layers and abstractions

Generally speaking, as CNN layers are added to a network, the network during training will learn lower-level abstractions in the early layers (i.e. lines, colours, corners, basic shapes etc.) and higher-level abstractions in the later layers (groups of geometries, objects etc.). Let’s say that, when trying to classify an aircraft in an image, there are some mid-level abstractions which reliably signal that an aircraft is present – say, the shape of a jet engine near a wing (this is just an example). These abstractions might be learnt in, say, 10 layers.

However, if we add an additional 20 or more layers after these first 10 layers, these reliable signals may get degraded / obfuscated. The ResNet architecture gives the network a more explicit chance of muting further CNN abstractions on some filters by driving F(x) to zero, with the output of the block defaulting to its input x. Not only that, the ResNet architecture allows blocks to “tinker” more easily with the input. This is because the block only has to learn the incremental difference between the previous layer abstraction and the optimal output H(x). In other words, it has to learn F(x) = H(x) – x. This is a residual expression, hence the name ResNet. This, theoretically at least, should be easier to learn than the full expression H(x).
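
To spell the residual idea out: if H(x) is the mapping we would ideally like the block to represent, the block output recovers it from the learnt residual,

$$ y = F(x) + x = \left( H(x) - x \right) + x = H(x), $$

and in the special case where the identity mapping is already optimal, the weights only need to drive F(x) towards zero.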

A (somewhat tortured) analogy might assist here. Say you are trying to draw a picture of a tree. Someone hands you a pencil outline of the main structure of the tree – the trunk, large branches, smaller branches etc. Now say you are somewhat proud, and you don’t want too much help in drawing the picture. So, you rub out parts of the pencil outline of the tree that you were handed. You then proceed to add detail to the picture, but you have to redraw the parts that you already rubbed out. This is somewhat like a standard non-ResNet network. Because layers seem to struggle to reproduce an identity function, at each subsequent layer they essentially erase or degrade some of the previous level’s abstractions, and these need to be re-estimated (at least to an extent).

Alternatively, you, the artist, might not be too proud, and you happily accept the pencil outline that you received. It is much easier to then add new details to what you have already been given. This is like what the ResNet blocks do – they take what they are given, i.e. x, and just make tweaks to it by adding F(x). This analogy isn’t perfect, but it should give you an idea of what is going on here, and how the ResNet blocks help the learning along.

A full 34-layer version of ResNet is (partially) illustrated below (from the original paper):

ResNet-34 architecture (partial)

The diagram above shows roughly the first half of the ResNet 34-layer architecture, along with the equivalent layers of the VGG-19 architecture and a “plain” version of the ResNet architecture. The “plain” version has the same CNN layers, but lacks the identity path previously presented in the ResNet building block. These identity paths can be seen looping around every second CNN layer on the right hand side of the ResNet (“residual”) architecture.

In the next section, I’m going to show you how to build a ResNet architecture in TensorFlow 2/Keras. In the example, we’ll compare both the “plain” and “residual” networks on the CIFAR-10 classification task. Note that for computational ease, I’ll only include 10 ResNet blocks.

Building ResNet in TensorFlow 2

As discussed previously, the code for this example can be found on this site’s Github repository. Importing the CIFAR-10 dataset can be performed easily by using the Keras datasets API:

import tensorflow as tf
from tensorflow import keras
from tensorflow.keras import layers
import numpy as np
import datetime as dt

(x_train, y_train), (x_test, y_test) = tf.keras.datasets.cifar10.load_data()

We then perform some pre-processing of the training and test data. This pre-processing includes image renormalization (converting the data so it resides in the range [0, 1]) and centrally cropping the image to 75% of its normal extent. Data augmentation is also performed by randomly flipping the image horizontally about its centre axis. This is performed using the TensorFlow Dataset API – more details on the code below can be found in this and this post, and in my book.

train_dataset = tf.data.Dataset.from_tensor_slices((x_train, y_train)).batch(64).shuffle(10000)
train_dataset = train_dataset.map(lambda x, y: (tf.cast(x, tf.float32) / 255.0, y))
train_dataset = train_dataset.map(lambda x, y: (tf.image.central_crop(x, 0.75), y))
train_dataset = train_dataset.map(lambda x, y: (tf.image.random_flip_left_right(x), y))
train_dataset = train_dataset.repeat()

valid_dataset = tf.data.Dataset.from_tensor_slices((x_test, y_test)).batch(5000).shuffle(10000)
valid_dataset = valid_dataset.map(lambda x, y: (tf.cast(x, tf.float32) / 255.0, y))
valid_dataset = valid_dataset.map(lambda x, y: (tf.image.central_crop(x, 0.75), y))
valid_dataset = valid_dataset.repeat()
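
As a quick sanity check on the pre-processing (a minimal sketch – this print statement is not part of the original notebook), the dataset element specification should confirm that the 75% central crop has reduced the 32 x 32 images to 24 x 24:

print(train_dataset.element_spec)
# Expect something like:
# (TensorSpec(shape=(None, 24, 24, 3), dtype=tf.float32, name=None),
#  TensorSpec(shape=(None, 1), dtype=tf.uint8, name=None))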

In this example, to build the network, we’re going to use the Keras Functional API, in the TensorFlow 2 context. Here is what the ResNet model definition looks like:

inputs = keras.Input(shape=(24, 24, 3))
x = layers.Conv2D(32, 3, activation='relu')(inputs)
x = layers.Conv2D(64, 3, activation='relu')(x)
x = layers.MaxPooling2D(3)(x)

num_res_net_blocks = 10
for i in range(num_res_net_blocks):
    x = res_net_block(x, 64, 3)

x = layers.Conv2D(64, 3, activation='relu')(x)
x = layers.GlobalAveragePooling2D()(x)
x = layers.Dense(256, activation='relu')(x)
x = layers.Dropout(0.5)(x)
outputs = layers.Dense(10, activation='softmax')(x)

res_net_model = keras.Model(inputs, outputs)

First, we specify the input dimensions to Keras. The raw CIFAR-10 images have a size of (32, 32, 3) – but because we are performing a 75% central crop, the pre-processed images are of size (24, 24, 3), since 32 x 0.75 = 24. Next, we create two standard CNN layers, with 32 and 64 filters respectively (for more on convolutional layers, see this post and my book). The filter window sizes are 3 x 3, in line with the original ResNet architectures. Next, some max pooling is performed, and then it is time to produce some ResNet building blocks. In this case, 10 ResNet blocks are created by calling the res_net_block() function:

def res_net_block(input_data, filters, conv_size):
  # First convolutional layer of the block, followed by batch normalization
  x = layers.Conv2D(filters, conv_size, activation='relu', padding='same')(input_data)
  x = layers.BatchNormalization()(x)
  # Second convolutional layer - no activation here, as ReLU is applied after the residual addition
  x = layers.Conv2D(filters, conv_size, activation=None, padding='same')(x)
  x = layers.BatchNormalization()(x)
  # Residual connection: add the block input x to the CNN output F(x), then apply ReLU
  x = layers.Add()([x, input_data])
  x = layers.Activation('relu')(x)
  return x

The first few lines of this function are standard CNN layers with Batch Normalization, except the 2nd layer does not have an activation function (this is because one will be applied after the residual addition part of the block). After these two layers, the residual addition part, where the input data is added to the CNN output (F(x)), is executed. Here we can make use of the Keras Add layer, which simply adds two tensors together. Finally, a ReLU activation is applied to the result of this addition and the outcome is returned.
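
As a tiny illustration of what the Add layer does (this snippet is not part of the model – it just reuses the tf and layers imports from earlier), it sums a list of tensors element-wise:

a = tf.constant([[1.0, 2.0]])
b = tf.constant([[3.0, 4.0]])
print(layers.Add()([a, b]))  # tf.Tensor([[4. 6.]], shape=(1, 2), dtype=float32)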

After the ResNet block loop is finished, some final layers are added. First, a final CNN layer is added, followed by a Global Average Pooling (GAP) layer (for more on GAP layers, see here). Finally, we have a couple of dense classification layers with a dropout layer in between. This model was trained over 30 epochs and then an alternative “plain” model was also created. This was created by taking the same architecture but replacing the res_net_block function with the following function:

def non_res_block(input_data, filters, conv_size):
  x = layers.Conv2D(filters, conv_size, activation='relu', padding='same')(input_data)
  x = layers.BatchNormalization()(x)
  x = layers.Conv2D(filters, conv_size, activation='relu', padding='same')(x)
  x = layers.BatchNormalization()(x)
  return x

Note that this function is simply two standard CNN layers, with no residual component included.
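
For completeness, here is a sketch of how the “plain” comparison model might be assembled – the same layers as res_net_model, with non_res_block substituted for res_net_block (the variable name plain_model is mine and may differ from the notebook):

inputs = keras.Input(shape=(24, 24, 3))
x = layers.Conv2D(32, 3, activation='relu')(inputs)
x = layers.Conv2D(64, 3, activation='relu')(x)
x = layers.MaxPooling2D(3)(x)

for i in range(num_res_net_blocks):
    x = non_res_block(x, 64, 3)

x = layers.Conv2D(64, 3, activation='relu')(x)
x = layers.GlobalAveragePooling2D()(x)
x = layers.Dense(256, activation='relu')(x)
x = layers.Dropout(0.5)(x)
outputs = layers.Dense(10, activation='softmax')(x)

plain_model = keras.Model(inputs, outputs)

The training code is as follows (the same settings are used for the “plain” model):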

callbacks = [
  # Write TensorBoard logs to `./logs` directory
  keras.callbacks.TensorBoard(log_dir='./log/{}'.format(dt.datetime.now().strftime("%Y-%m-%d-%H-%M-%S")), write_images=True),
]

res_net_model.compile(optimizer=keras.optimizers.Adam(),
              loss='sparse_categorical_crossentropy',
              metrics=['acc'])
res_net_model.fit(train_dataset, epochs=30, steps_per_epoch=195,
          validation_data=valid_dataset,
          validation_steps=3, callbacks=callbacks)
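
A corresponding compile and fit for the plain model might look like the following sketch (assuming plain_model from above; a separate TensorBoard log directory keeps the two runs distinguishable):

plain_model.compile(optimizer=keras.optimizers.Adam(),
              loss='sparse_categorical_crossentropy',
              metrics=['acc'])
plain_model.fit(train_dataset, epochs=30, steps_per_epoch=195,
          validation_data=valid_dataset,
          validation_steps=3,
          callbacks=[keras.callbacks.TensorBoard(
              log_dir='./log/plain_{}'.format(dt.datetime.now().strftime("%Y-%m-%d-%H-%M-%S")),
              write_images=True)])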

ResNet training and validation results

The accuracy results of the training of these two models can be observed below:

ResNet vs "plain" training accuracy

ResNet (red) vs “plain” (pink) training accuracy

 

ResNet vs "plain" testing accuracy

ResNet (blue) vs “plain” (green) training accuracy

As can be observed, there is around a 5-6% improvement in the training accuracy of the ResNet architecture compared to the “plain” non-ResNet architecture. I have run this comparison a number of times and the 5-6% gap is consistent across the runs. These results illustrate the power of the ResNet idea, even for a relatively shallow network of only 10 ResNet blocks. As demonstrated in the original paper, this effect will be more pronounced in deeper networks. Note that this network is not very well optimized, and the accuracy could be improved by running for more iterations. However, it is enough to show the benefits of the ResNet architecture. In future posts, I’ll demonstrate other ResNet-based architectures which can achieve even better results.


Eager to build deep learning systems in TensorFlow 2? Get the book here
