Neural Networks Tutorial – A Pathway to Deep Learning

Chances are, if you are searching for a tutorial on artificial neural networks (ANN) you already have some idea of what they are, and what they are capable of doing.  But did you know that neural networks are the foundation of the new and exciting field of deep learning?  Deep learning is the field of machine learning that is making many state-of-the-art advancements, from beating players at Go and Poker (reinforcement learning), to speeding up drug discovery and assisting self-driving cars.  If these types of cutting-edge applications excite you like they excite me, then you will be interested in learning as much as you can about deep learning.  However, that requires you to know quite a bit about how neural networks work.  This tutorial article is designed to help you get up to speed in neural networks as quickly as possible.

In this tutorial I’ll be presenting some concepts, code and maths that will enable you to build and understand a simple neural network.  Some tutorials focus only on the code and skip the maths – but this impedes understanding. I’ll take things as slowly as possible, but it might help to brush up on your matrices and differentiation if you need to. The code will be in Python, so it will be beneficial if you have a basic understanding of how Python works.  You’ll pretty much get away with knowing about Python functions, loops and the basics of the numpy library.  By the end of this neural networks tutorial you’ll be able to build an ANN in Python that will correctly classify handwritten digits in images with a fair degree of accuracy.

Once you’re done with this tutorial, you can dive a little deeper with the following posts:

Python TensorFlow Tutorial – Build a Neural Network
Improve your neural networks – Part 1 [TIPS AND TRICKS]
Stochastic Gradient Descent – Mini-batch and more

All of the relevant code in this tutorial can be found here.

Here’s an outline of the tutorial, with links, so you can easily navigate to the parts you want:

1 What are artificial neural networks?
2 The structure of an ANN
2.1 The artificial neuron
2.2 Nodes
2.3 The bias
2.4 Putting together the structure
2.5 The notation
3 The feed-forward pass
3.1 A feed-forward example
3.2 Our first attempt at a feed-forward function
3.3 A more efficient implementation
3.4 Vectorisation in neural networks
3.5 Matrix multiplication
4 Gradient descent and optimisation
4.1 A simple example in code
4.2 The cost function
4.3 Gradient descent in neural networks
4.4 A two dimensional gradient descent example
4.5 Backpropagation in depth
4.6 Propagating into the hidden layers
4.7 Vectorisation of backpropagation
4.8 Implementing the gradient descent step
4.9 The final gradient descent algorithm
5 Implementing the neural network in Python
5.1 Scaling data
5.2 Creating test and training datasets
5.3 Setting up the output layer
5.4 Creating the neural network
5.5 Assessing the accuracy of the trained model

1 What are artificial neural networks?

Artificial neural networks (ANNs) are software implementations of the neuronal structure of our brains.  We don’t need to talk about the complex biology of our brain structures, but suffice to say, the brain contains neurons which are kind of like organic switches.  These can change their output state depending on the strength of their electrical or chemical input.  The neural network in a person’s brain is a hugely interconnected network of neurons, where the output of any given neuron may be the input to thousands of other neurons.  Learning occurs by repeatedly activating certain neural connections over others, and this reinforces those connections.  This makes them more likely to produce a desired outcome given a specified input.  This learning involves feedback – when the desired outcome occurs, the neural connections causing that outcome become strengthened.

Artificial neural networks attempt to simplify and mimic this brain behaviour.  They can be trained in a supervised or unsupervised manner.  In a supervised ANN, the network is trained by providing matched input and output data samples, with the intention of getting the ANN to provide a desired output for a given input.  An example is an e-mail spam filter – the input training data could be the count of various words in the body of the e-mail, and the output training data would be a classification of whether the e-mail was truly spam or not.  If many examples of e-mails are passed through the neural network, this allows the network to learn what input data makes it likely that an e-mail is spam or not.  This learning takes place by adjusting the weights of the ANN connections, but this will be discussed further in the next section.

Unsupervised learning in an ANN is an attempt to get the ANN to “understand” the structure of the provided input data “on its own”.  This type of ANN will not be discussed in this post.

2 The structure of an ANN

2.1 The artificial neuron

The biological neuron is simulated in an ANN by an activation function. In classification tasks (e.g. identifying spam e-mails) this activation function has to have a “switch on” characteristic – in other words, once the input is greater than a certain value, the output should change state i.e. from 0 to 1, from -1 to 1 or from 0 to >0. This simulates the “turning on” of a biological neuron. A common activation function that is used is the sigmoid function:

\begin{equation*}
f(z) = \frac{1}{1+\exp(-z)}
\end{equation*}

Which looks like this:

import matplotlib.pylab as plt
import numpy as np
x = np.arange(-8, 8, 0.1)
f = 1 / (1 + np.exp(-x))
plt.plot(x, f)
plt.xlabel('x')
plt.ylabel('f(x)')
plt.show()

As can be seen in the figure above, the function is “activated”, i.e. it moves from 0 to 1, when the input x is greater than a certain value. The sigmoid function isn’t a step function, however – the edge is “soft”, and the output doesn’t change instantaneously. This means that the function has a derivative everywhere, which is important for the training algorithm discussed further in Section 4.

2.2 Nodes

As mentioned previously, biological neurons are connected hierarchical networks, with the outputs of some neurons being the inputs to others. We can represent these networks as connected layers of nodes. Each node takes multiple weighted inputs, applies the activation function to the summation of these inputs, and in doing so generates an output. I’ll break this down further, but to help things along, consider the diagram below:

Figure 2. Node with inputs

The circle in the image above represents the node. The node is the “seat” of the activation function, and takes the weighted inputs, sums them, then inputs them to the activation function. The output of the activation function is shown as h in the above diagram. Note: a node as I have shown above is also called a perceptron in some literature.

What about this “weight” idea that has been mentioned? The weights are real valued numbers (i.e. not binary 1s or 0s), which are multiplied by the inputs and then summed up in the node. So, in other words, the weighted input to the node above would be:

\begin{equation*}
x_1w_1 + x_2w_2 + x_3w_3 + b
\end{equation*}

Here the $w_i$ values are weights (ignore the $b$ for the moment).  What are these weights all about?  Well, they are the variables that are changed during the learning process, and, along with the input, determine the output of the node.  The $b$ is the weight of the +1 bias element – the inclusion of this bias enhances the flexibility of the node, which is best demonstrated in an example.
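As a quick concrete illustration (with some made-up numbers, purely for demonstration), the weighted input and the resulting node output can be computed in a couple of lines of numpy:

import numpy as np

x = np.array([0.5, 1.0, 1.5])   # example inputs (dummy values for illustration)
w = np.array([0.8, 0.2, -0.5])  # example weights
b = 0.1                         # bias weight

z = np.dot(w, x) + b            # the weighted sum plus bias that enters the node
h = 1 / (1 + np.exp(-z))        # sigmoid activation gives the node output
print(z, h)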

2.3 The bias

Let’s take an extremely simple node, with only one input and one output:

 

Figure 2. Simple node

The input to the activation function of the node in this case is simply $x_1w_1$.  What does changing $w_1$ do in this simple network?

w1 = 0.5
w2 = 1.0
w3 = 2.0
l1 = 'w = 0.5'
l2 = 'w = 1.0'
l3 = 'w = 2.0'
for w, l in [(w1, l1), (w2, l2), (w3, l3)]:
    f = 1 / (1 + np.exp(-x*w))
    plt.plot(x, f, label=l)
plt.xlabel('x')
plt.ylabel('h_w(x)')
plt.legend(loc=2)
plt.show()

Figure 4. Effect of adjusting weights

Here we can see that changing the weight changes the slope of the output of the sigmoid activation function, which is obviously useful if we want to model different strengths of relationships between the input and output variables.  However, what if we only want the output to change when x is greater than 1?  This is where the bias comes in – let’s consider the same network with a bias input:

Figure 5. Effect of bias

 

w = 5.0
b1 = -8.0
b2 = 0.0
b3 = 8.0
l1 = 'b = -8.0'
l2 = 'b = 0.0'
l3 = 'b = 8.0'
for b, l in [(b1, l1), (b2, l2), (b3, l3)]:
    f = 1 / (1 + np.exp(-(x*w+b)))
    plt.plot(x, f, label=l)
plt.xlabel('x')
plt.ylabel('h_wb(x)')
plt.legend(loc=2)
plt.show()

Figure 6. Effect of bias adjusments

In this case, $w_1$ has been increased to simulate a more defined “turn on” function.  As you can see, by varying the bias “weight” $b$, you can change when the node activates.  Therefore, by adding a bias term, you can make the node simulate a generic if function, i.e. if (x > z) then 1 else 0.  Without a bias term, you are unable to vary the z in that if statement – it will always be stuck around 0.  This is obviously very useful if you are trying to simulate conditional relationships.
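To make the “if” analogy a bit more concrete: for a single-input sigmoid node, the output crosses 0.5 exactly where $xw + b = 0$, i.e. at $x = -b/w$. A quick check using the $w = 5.0$, $b = -8.0$ values from the plot above:

w = 5.0
b = -8.0
threshold = -b / w                            # the x value where the output crosses 0.5
print(threshold)                              # 1.6
print(1 / (1 + np.exp(-(threshold*w + b))))   # 0.5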

2.4 Putting together the structure

Hopefully the previous explanations have given you a good overview of how a given node/neuron/perceptron in a neural network operates.  However, as you are probably aware, there are many such interconnected nodes in a fully fledged neural network.  These structures can come in a myriad of different forms, but the most common simple neural network structure consists of an input layer, a hidden layer and an output layer.  An example of such a structure can be seen below:

Figure 10. Three layer neural network

The three layers of the network can be seen in the above figure – Layer 1 represents the input layer, where the external input data enters the network. Layer 2 is called the hidden layer as this layer is not part of the input or output. Note: neural networks can have many hidden layers, but in this case for simplicity I have just included one. Finally, Layer 3 is the output layer. You can observe the many connections between the layers, in particular between Layer 1 (L1) and Layer 2 (L2). As can be seen, each node in L1 has a connection to all the nodes in L2. Likewise for the nodes in L2 to the single output node L3. Each of these connections will have an associated weight.

2.5 The notation

The maths below requires some fairly precise notation so that we know what we are talking about.  The notation I am using here is similar to that used in the Stanford deep learning tutorial.  In the upcoming equations, each of these weights is identified with the following notation: $w_{ij}^{(l)}$. Here $i$ refers to the node number of the connection in layer $l+1$ and $j$ refers to the node number of the connection in layer $l$. Take special note of this order. So, for the connection between node 1 in layer 1 and node 2 in layer 2, the weight notation would be $w_{21}^{(1)}$. This notation may seem a bit odd, as you would expect $i$ and $j$ to refer to the node numbers in layers $l$ and $l+1$ respectively (i.e. in the direction of input to output), rather than the opposite. However, this notation makes more sense when you add the bias.

As you can observe in the figure above, the (+1) bias is connected to each of the nodes in the subsequent layer. So the bias in layer 1 is connected to all the nodes in layer 2. Because the bias is not a true node with an activation function, it has no inputs (it always outputs the value +1). The notation of the bias weight is $b_i^{(l)}$, where $i$ is the node number in layer $l+1$ – the same index convention as used for the normal weight notation $w_{21}^{(1)}$. So, the weight on the connection between the bias in layer 1 and the second node in layer 2 is given by $b_2^{(1)}$.

Remember, these values – $w_{ij}^{(l)}$ and $b_i^{(l)}$ – all need to be calculated in the training phase of the ANN.

Finally, the node output notation is ${h_j}^{(l)}$, where $j$ denotes the node number in layer $l$ of the network. As can be observed in the three layer network above, the output of node 2 in layer 2 has the notation of ${h_2}^{(2)}$.

Now that we have the notation sorted out, it is time to look at how you calculate the output of the network when the input and the weights are known. The process of calculating the output of the neural network given these values is called the feed-forward pass or process.

3 The feed-forward pass

To demonstrate how to calculate the output from the input in neural networks, let’s start with the specific case of the three layer neural network that was presented above. Below it is presented in equation form, then it will be demonstrated with a concrete example and some Python code:

\begin{align}
h_1^{(2)} &= f(w_{11}^{(1)}x_1 + w_{12}^{(1)} x_2 + w_{13}^{(1)} x_3 + b_1^{(1)}) \\
h_2^{(2)} &= f(w_{21}^{(1)}x_1 + w_{22}^{(1)} x_2 + w_{23}^{(1)} x_3 + b_2^{(1)}) \\
h_3^{(2)} &= f(w_{31}^{(1)}x_1 + w_{32}^{(1)} x_2 + w_{33}^{(1)} x_3 + b_3^{(1)}) \\
h_{W,b}(x) &= h_1^{(3)} = f(w_{11}^{(2)}h_1^{(2)} + w_{12}^{(2)} h_2^{(2)} + w_{13}^{(2)} h_3^{(2)} + b_1^{(2)})
\end{align}

In the equation above, $f(\bullet)$ refers to the node activation function, in this case the sigmoid function. The first line, $h_1^{(2)}$, is the output of the first node in the second layer, and its inputs are $w_{11}^{(1)}x_1$, $w_{12}^{(1)} x_2$, $w_{13}^{(1)}x_3$ and $b_1^{(1)}$. These inputs can be traced in the three-layer connection diagram above. They are simply summed and then passed through the activation function to calculate the output of the first node. Likewise for the other two nodes in the second layer.

The final line is the output of the only node in the third and final layer, which is the ultimate output of the neural network. As can be observed, rather than taking the weighted input variables ($x_1, x_2, x_3$), the final node takes as input the weighted output of the nodes of the second layer ($h_{1}^{(2)}$, $h_{2}^{(2)}$, $h_{3}^{(2)}$), plus the weighted bias. Therefore, you can see in equation form the hierarchical nature of artificial neural networks.

3.1 A feed-forward example

Now, let’s do a simple first example of the output of this neural network in Python. First things first, notice that the weights between layers 1 and 2 ($w_{11}^{(1)}, w_{12}^{(1)}, \dots$) are ideally suited to representation as a matrix. Observe:

\begin{equation}
W^{(1)} =
\begin{pmatrix}
w_{11}^{(1)} & w_{12}^{(1)} & w_{13}^{(1)} \\
w_{21}^{(1)} & w_{22}^{(1)} & w_{23}^{(1)} \\
w_{31}^{(1)} & w_{32}^{(1)} & w_{33}^{(1)} \\
\end{pmatrix}
\end{equation}

This matrix can be easily represented using numpy arrays:

import numpy as np
w1 = np.array([[0.2, 0.2, 0.2], [0.4, 0.4, 0.4], [0.6, 0.6, 0.6]])

Here I have just filled up the layer 1 weight array with some example weights. We can do the same for the layer 2 weight array:

\begin{equation}
W^{(2)} =
\begin{pmatrix}
w_{11}^{(2)} & w_{12}^{(2)} & w_{13}^{(2)}
\end{pmatrix}
\end{equation}

w2 = np.zeros((1, 3))
w2[0,:] = np.array([0.5, 0.5, 0.5])

We can also set up some dummy values in the layer 1 bias weight array/vector, and the layer 2 bias weight (which is only a single value in this neural network structure – i.e. a scalar):

b1 = np.array([0.8, 0.8, 0.8])
b2 = np.array([0.2])

Finally, before we write the main program to calculate the output from the neural network, it’s handy to set up a separate Python function for the activation function:

def f(x):
    return 1 / (1 + np.exp(-x))

3.2 Our first attempt at a feed-forward function

Below is a simple way of calculating the output of the neural network, using nested loops in Python.  We’ll look at more efficient ways of calculating the output shortly.

def simple_looped_nn_calc(n_layers, x, w, b):
    for l in range(n_layers-1):
        #Setup the input array which the weights will be multiplied by for each layer
        #If it's the first layer, the input array will be the x input vector
        #If it's not the first layer, the input to the next layer will be the 
        #output of the previous layer
        if l == 0:
            node_in = x
        else:
            node_in = h
        #Setup the output array for the nodes in layer l + 1
        h = np.zeros((w[l].shape[0],))
        #loop through the rows of the weight array
        for i in range(w[l].shape[0]):
            #setup the sum inside the activation function
            f_sum = 0
            #loop through the columns of the weight array
            for j in range(w[l].shape[1]):
                f_sum += w[l][i][j] * node_in[j]
            #add the bias
            f_sum += b[l][i]
            #finally use the activation function to calculate the
            #i-th output i.e. h1, h2, h3
            h[i] = f(f_sum)
    return h

This function takes as input the number of layers in the neural network, the x input array/vector, then Python tuples or lists of the weights and bias weights of the network, with each element in the tuple/list representing a layer $l$ in the network.  In other words, the inputs are set up as follows:

w = [w1, w2]
b = [b1, b2]
#a dummy x input vector
x = [1.5, 2.0, 3.0]

The function first checks what the input is to the layer of nodes/weights being considered. If we are looking at the first layer, the input to the second layer nodes is the input vector $x$ multiplied by the relevant weights. After the first layer though, the inputs to subsequent layers are the output of the previous layers. Finally, there is a nested loop through the relevant $i$ and $j$ values of the weight vectors and the bias. The function uses the dimensions of the weights for each layer to figure out the number of nodes and therefore the structure of the network.

Calling the function:

simple_looped_nn_calc(3, x, w, b)

gives the output of 0.8354.  We can confirm this result by manually performing the calculations in the original equations:

\begin{align}
h_1^{(2)} &= f(0.2*1.5 + 0.2*2.0 + 0.2*3.0 + 0.8) = 0.8909 \\
h_2^{(2)} &= f(0.4*1.5 + 0.4*2.0 + 0.4*3.0 + 0.8) = 0.9677 \\
h_3^{(2)} &= f(0.6*1.5 + 0.6*2.0 + 0.6*3.0 + 0.8) = 0.9909 \\
h_{W,b}(x) &= h_1^{(3)} = f(0.5*0.8909 + 0.5*0.9677 + 0.5*0.9909 + 0.2) = 0.8354
\end{align}

3.3 A more efficient implementation

As was stated earlier – using loops isn’t the most efficient way of calculating the feed forward step in Python. This is because the loops in Python are notoriously slow. An alternative, more efficient mechanism of doing the feed forward step in Python and numpy will be discussed shortly. We can benchmark how efficient the algorithm is by using the %timeit function in IPython, which runs the function a number of times and returns the average time that the function takes to run:

%timeit simple_looped_nn_calc(3, x, w, b)

Running this tells us that the looped feed forward takes $40\mu s$. A result in the tens of microseconds sounds very fast, but when applied to very large practical NNs with 100s of nodes per layer, this speed will become prohibitive, especially when training the network, as will become clear later in this tutorial.  If we try a four layer neural network using the same code, we get significantly worse performance – $70\mu s$ in fact.
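If you’re not working in IPython, a rough equivalent using the standard timeit module looks like the following (the exact numbers will, of course, vary from machine to machine):

import timeit
# time 1000 calls of the looped feed-forward and report the average time per call
t = timeit.timeit(lambda: simple_looped_nn_calc(3, x, w, b), number=1000)
print("%.1f us per call" % (t / 1000 * 1e6))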

3.4 Vectorisation in neural networks

There is a way to write the equations even more compactly, and to calculate the feed forward process in neural networks more efficiently, from a computational perspective.  Firstly, we can introduce a new variable $z_{i}^{(l)}$ which is the summated input into node $i$ of layer $l$, including the bias term.  So in the case of the first node in layer 2, $z$ is equal to:

$$z_{1}^{(2)} = w_{11}^{(1)}x_1 + w_{12}^{(1)} x_2 + w_{13}^{(1)} x_3 + b_1^{(1)} = \sum_{j=1}^{n} w_{1j}^{(1)}x_j + b_{1}^{(1)}$$

where n is the number of nodes in layer 1.  Using this notation, the unwieldy previous set of equations for the example three layer network can be reduced to:

\begin{align}
z^{(2)} &= W^{(1)} x + b^{(1)} \\
h^{(2)} &= f(z^{(2)}) \\
z^{(3)} &= W^{(2)} h^{(2)} + b^{(2)} \\
h_{W,b}(x) &= h^{(3)} = f(z^{(3)})
\end{align}

Note the use of capital W to denote the matrix form of the weights.  It should be noted that all of the elements in the above equation are now matrices / vectors.  If you’re unfamiliar with these concepts, they will be explained more fully in the next section. Can the above equation be simplified even further?  Yes, it can.  We can forward propagate the calculations through any number of layers in the neural network by generalising:

\begin{align}
z^{(l+1)} &= W^{(l)} h^{(l)} + b^{(l)}   \\
h^{(l+1)} &= f(z^{(l+1)})
\end{align}

Here we can see the general feed forward process, where the output of layer $l$ becomes the input to layer $l+1$. We know that $h^{(1)}$ is simply the input layer $x$ and $h^{(n_l)}$ (where $n_l$ is the number of layers in the network) is the output of the output layer. Notice in the above equations that we have dropped references to the node numbers $i$ and $j$ – how can we do this? Don’t we still have to loop through and calculate all the various node inputs and outputs?

The answer is that we can use matrix multiplications to do this more simply. This process is called “vectorisation” and it has two benefits – first, it makes the code less complicated, as you will see shortly. Second, we can use fast linear algebra routines in Python (and other languages) rather than using loops, which will speed up our programs. Numpy can handle these calculations easily. First, for those who aren’t familiar with matrix operations, the next section is a brief recap.

3.5 Matrix multiplication

Let’s expand out $z^{(l+1)} = W^{(l)} h^{(l)} + b^{(l)}$ in explicit matrix/vector form for the input layer (i.e. $h^{(l)} = x$):

\begin{align}
z^{(2)} &=
\begin{pmatrix}
w_{11}^{(1)} & w_{12}^{(1)} & w_{13}^{(1)} \\
w_{21}^{(1)} & w_{22}^{(1)} & w_{23}^{(1)} \\
w_{31}^{(1)} & w_{32}^{(1)} & w_{33}^{(1)} \\
\end{pmatrix}
\begin{pmatrix}
x_{1} \\
x_{2} \\
x_{3} \\
\end{pmatrix} +
\begin{pmatrix}
b_{1}^{(1)} \\
b_{2}^{(1)} \\
b_{3}^{(1)} \\
\end{pmatrix} \\
&=
\begin{pmatrix}
w_{11}^{(1)}x_{1} + w_{12}^{(1)}x_{2} + w_{13}^{(1)}x_{3} \\
w_{21}^{(1)}x_{1} + w_{22}^{(1)}x_{2} + w_{23}^{(1)}x_{3} \\
w_{31}^{(1)}x_{1} + w_{32}^{(1)}x_{2} + w_{33}^{(1)}x_{3} \\
\end{pmatrix} +
\begin{pmatrix}
b_{1}^{(1)} \\
b_{2}^{(1)} \\
b_{3}^{(1)} \\
\end{pmatrix} \\
&=
\begin{pmatrix}
w_{11}^{(1)}x_{1} + w_{12}^{(1)}x_{2} + w_{13}^{(1)}x_{3} + b_{1}^{(1)} \\
w_{21}^{(1)}x_{1} + w_{22}^{(1)}x_{2} + w_{23}^{(1)}x_{3} + b_{2}^{(1)} \\
w_{31}^{(1)}x_{1} + w_{32}^{(1)}x_{2} + w_{33}^{(1)}x_{3} + b_{3}^{(1)} \\
\end{pmatrix} \\
\end{align}

For those who aren’t aware of how matrix multiplication works, it is a good idea to brush up on matrix operations – there are many sites which cover this well. However, just quickly: when the weight matrix is multiplied by the input layer vector, each element in a row of the weight matrix is multiplied by the corresponding element in the column of the input vector, and these products are summed to create a new (3 x 1) vector. Then you can simply add the bias weights vector to achieve the final result.
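Using the example w1, b1 and x values from earlier (and the f() activation function from Section 3.1), this matrix multiplication and addition can be checked directly in numpy:

z2 = w1.dot(np.array(x)) + b1   # x is the list [1.5, 2.0, 3.0] defined earlier
print(z2)      # [ 2.1  3.4  4.7] - the arguments of f() in the manual calculation above
print(f(z2))   # [ 0.8909...  0.9677...  0.9909...]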

You can observe how each row of the final result above corresponds to the argument of the activation function in the original non-matrix set of equations above. If the activation function is capable of being applied element-wise (i.e. to each row separately in the $z^{(2)}$ vector), then we can do all our calculations using matrices and vectors rather than slow Python loops. Thankfully, numpy allows us to do just that, with reasonably fast matrix operations and element-wise functions. Let’s have a look at a much more simplified (and faster) version of the simple_looped_nn_calc:

def matrix_feed_forward_calc(n_layers, x, w, b):
    for l in range(n_layers-1):
        if l == 0:
            node_in = x
        else:
            node_in = h
        z = w[l].dot(node_in) + b[l]
        h = f(z)
    return h

Note the line z = w[l].dot(node_in) + b[l], where the matrix multiplication occurs – if you just use the $*$ symbol when multiplying the weights by the node input vector in numpy, it will attempt to perform some sort of element-wise multiplication, rather than the true matrix multiplication that we desire. Therefore you need to use the a.dot(b) notation when performing matrix multiplication in numpy.

If we perform %timeit again using this new function and a simple 4 layer network, we only get an improvement of $24\mu s$ (a reduction from $70\mu s$ to $46\mu s$).  However, if we increase the size of the 4 layer network to layers of 100-100-50-10 nodes the results are much more impressive.  The Python loop-based method takes a whopping $41ms$ – note, that is milliseconds – and the vectorised implementation only takes $84\mu s$ to forward propagate through the neural network.  By using vectorised calculations instead of Python loops we have increased the efficiency of the calculation roughly 500 fold! That’s a huge improvement. There is even the possibility of faster implementations of matrix operations using deep learning packages such as TensorFlow and Theano which utilise your computer’s GPU (rather than the CPU), the architecture of which is more suited to fast matrix computations (I have a TensorFlow tutorial post also).
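For reference, the larger 100-100-50-10 benchmark above can be set up with some randomly initialised dummy weights along the following lines (these weights are purely for timing purposes, and your exact timings will differ):

# layer sizes 100-100-50-10 for the timing comparison
sizes = [100, 100, 50, 10]
w_big = [np.random.random_sample((sizes[i+1], sizes[i])) for i in range(len(sizes)-1)]
b_big = [np.random.random_sample((sizes[i+1],)) for i in range(len(sizes)-1)]
x_big = np.random.random_sample((100,))
%timeit simple_looped_nn_calc(4, x_big, w_big, b_big)
%timeit matrix_feed_forward_calc(4, x_big, w_big, b_big)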

That brings us to an end of the feed-forward introduction for neural networks.  The next section will deal with how to actually train a neural network so that it can perform classification tasks, using gradient descent and backpropagation.

4 Gradient descent and optimisation

As mentioned in Section 1, setting the values of the weights which link the layers in the network is what constitutes the training of the system. In supervised learning, the idea is to reduce the error between the output of the network and the desired output. So if we have a neural network with a single output node, and given some input $x$ we want the neural network to output a 2, yet the network actually produces a 5, a simple expression of the error is $abs(2-5)=3$. For the mathematically minded, this would be the $L^1$ norm of the error (don’t worry about it if you don’t know what this is).

The idea of supervised learning is to provide many input-output pairs of known data and vary the weights based on these samples so that the error expression is minimised. We can specify these input-output pairs as $\{ (x^{(1)}, y^{(1)}), \ldots, (x^{(m)}, y^{(m)}) \}$ where $m$ is the number of training samples that we have on hand to train the weights of the network. Each of these inputs or outputs can be vectors – that is $x^{(1)}$ is not necessarily just one value, it could be an $N$ dimensional series of values. For instance, let’s say that we’re training a spam-detection neural network – in such a case $x^{(1)}$ could be a count of all the different significant words in an e-mail e.g.:

\begin{align}
x^{(1)} &=
\begin{pmatrix}
No. of “prince” \\
No. of “nigeria” \\
No. of “extension” \\
\vdots \\
No. of “mum” \\
No. of “burger” \\
\end{pmatrix} \\
&=
\begin{pmatrix}
2 \\
2 \\
0 \\
\vdots \\
0 \\
1 \\
\end{pmatrix}
\end{align}

$y^{(1)}$ in this case could be a single scalar value, either a 1 or a 0 to designate whether the e-mail is spam or not. Or, in other applications it could be a $K$ dimensional vector. As an example, say we have input $x$ that is a vector of the pixel greyscale readings of a photograph.  We also have an output $y$ that is a 26 dimensional vector that designates, with a 1 or 0, what letter of the alphabet is shown in the photograph i.e. $(1, 0, \ldots, 0)$ for a, $(0, 1, \ldots, 0)$ for b and so on.  This 26 dimensional output vector could be used to classify letters in photographs.

In training the network with these $(x, y)$ pairs, the goal is to get the neural network better and better at predicting the correct $y$ given $x$. This is performed by varying the weights so as to minimize the error. How do we know how to vary the weights, given an error in the output of the network? This is where the concept of gradient descent comes in handy. Consider the diagram below:

 

Figure 8. Simple, one-dimensional gradient descent

In this diagram we have a blue plot of the error depending on a single scalar weight value, $w$. The minimum possible error is marked by the black cross, but we don’t know what $w$ value gives that minimum error. We start out at a random value of $w$, which gives an error marked by the red dot on the curve labelled with “1”. We need to change $w$ in a way to approach that minimum possible error, the black cross. One of the most common ways of approaching that value is called gradient descent.

To proceed with this method, first the gradient of the error with respect to $w$ is calculated at point “1”. For those who don’t know, the gradient is the slope of the error curve at that point. It is shown in the diagram above by the black arrow which “pierces” point “1”. The gradient also gives directional information – if it is positive with respect to an increase in $w$, a step in that direction will lead to an increase in the error. If it is negative with respect to an increase in $w$ (as it is in the diagram above), a step in that direction will lead to a decrease in the error. Obviously, we wish to make a step in $w$ that will lead to a decrease in the error. The magnitude of the gradient, or the “steepness” of the slope, gives an indication of how fast the error curve or function is changing at that point. The higher the magnitude of the gradient, the faster the error is changing at that point with respect to $w$.

The gradient descent method uses the gradient to make an informed step change in $w$ to lead it towards the minimum of the error curve. This is an iterative method that involves multiple steps. Each time, the $w$ value is updated according to:

\begin{equation}
w_{new} = w_{old} - \alpha * \nabla error
\end{equation}

Here $w_{new}$ denotes the new $w$ position, $w_{old}$ denotes the current or old $w$ position, $\nabla error$ is the gradient of the error at $w_{old}$ and $\alpha$ is the step size. The step size $\alpha$ will determine how quickly the solution converges on the minimum error. However, this parameter has to be tuned – if it is too large, you can imagine the solution bouncing around on either side of the minimum in the above diagram. This will result in an optimisation of $w$ that does not converge. As this iterative algorithm approaches the minimum, the gradient or change in the error with each step will reduce. You can see in the graph above that the gradient lines will “flatten out” as the solution point approaches the minimum. As the solution approaches the minimum error, because of the decreasing gradient, it will result in only small improvements to the error.  When the solution approaches this “flattening” out of the error we want to exit the iterative process.  This exit can be performed by either stopping after a certain number of iterations or via some sort of “stop condition”.  This stop condition might be when the change in the error drops below a certain limit, often called the precision.

4.1 A simple example in code

Below is an example of a simple Python implementation of gradient descent for finding the minimum of the function $f(x) = x^4 - 3x^3 + 2$, taken from Wikipedia.  The gradient of this function can be calculated analytically (i.e. we can do it easily using calculus, which we can’t do with many real world cost functions) and is $f'(x) = 4x^3 - 9x^2$. This means that at every value of $x$, we can calculate the gradient of the function using a simple equation. Again, using calculus we know that the exact minimum of this function is at $x = 2.25$.

x_old = 0 # The value does not matter as long as abs(x_new - x_old) > precision
x_new = 6 # The algorithm starts at x=6
gamma = 0.01 # step size
precision = 0.00001

def df(x):
    y = 4 * x**3 - 9 * x**2
    return y

while abs(x_new - x_old) > precision:
    x_old = x_new
    x_new += -gamma * df(x_old)

print("The local minimum occurs at %f" % x_new)

This code prints “The local minimum occurs at 2.249965”, which agrees with the exact solution to within the specified precision.  It implements the weight adjustment algorithm shown above, and can be seen to find the minimum of the function correctly. This is a very simple example of gradient descent, and finding the gradient works quite differently when training neural networks. However, the main idea remains – we figure out the gradient of the neural network, then adjust the weights in a step to try to get closer to the minimum error that we are trying to find. Another difference between this toy example and gradient descent in neural networks is that, in the latter, the weight vector is multi-dimensional, and therefore the gradient descent method must search a multi-dimensional space for the minimum point.

The way we figure out the gradient of a neural network is via the famous backpropagation method, which will be discussed shortly. First however, we have to look at the error function more closely.

4.2 The cost function

Previously, we’ve talked about iteratively minimising the error of the output of the neural network by varying the weights in gradient descent. However, as it turns out, there is a mathematically more generalised way of looking at things that allows us to reduce the error while also preventing things like overfitting (this will be discussed more in later articles). This more general optimisation formulation revolves around minimising what’s called the cost function. The equivalent cost function of a single training pair ($x^z$, $y^z$) in a neural network is:

\begin{align}
J(w,b,x,y) &= \frac{1}{2} \parallel y^z - h^{(n_l)}(x^z) \parallel ^2 \\
&= \frac{1}{2} \parallel y^z - y_{pred}(x^z) \parallel ^2
\end{align}

This shows the cost function of the $z^{th}$ training sample, where $h^{(n_l)}$ is the output of the final layer of the neural network, i.e. the output of the neural network. I’ve also represented $h^{(n_l)}$ as $y_{pred}$ to highlight the prediction of the neural network given $x^z$. The two vertical lines represent the $L^2$ norm of the error, or what is known as the sum-of-squares error (SSE). SSE is a very common way of representing the error of a machine learning system. Instead of taking just the absolute error $abs(y_{pred}(x^z) - y^z)$, we use the square of the error. There are many reasons why the SSE is often used which will not be discussed here – suffice to say that this is a very common way of representing the errors in machine learning. The $\frac{1}{2}$ out the front is just a constant that tidies things up when we differentiate the cost function, which we’ll be doing when we perform backpropagation.
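In code, this single-sample cost is only a line or two of numpy (here y and y_pred are assumed to be numpy arrays holding the training target and the network output):

def calculate_sample_cost(y, y_pred):
    # 0.5 times the squared L2 norm of the error for a single (x, y) training pair
    return 0.5 * np.linalg.norm(y - y_pred) ** 2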

Note that the formulation for the cost function above is for a single $(x,y)$ training pair. We want to minimise the cost function over all of our $m$ training pairs. Therefore, we want to find the minimum *mean squared error* (MSE) over all the training samples:

\begin{align}
J(w,b) &= \frac{1}{m} \sum_{z=1}^m \frac{1}{2} \parallel y^z - h^{(n_l)}(x^z) \parallel ^2 \\
&= \frac{1}{m} \sum_{z=1}^m J(W, b, x^{(z)}, y^{(z)})
\end{align}

So, how do you use the cost function $J$ above to train the weights of our network? Using gradient descent and backpropagation. First, let’s look at gradient descent more closely in neural networks.

4.3 Gradient descent in neural networks

Gradient descent for every weight $w_{ij}^{(l)}$ and every bias $b_i^{(l)}$ in the neural network looks like the following:

\begin{align}
w_{ij}^{(l)} &= w_{ij}^{(l)} - \alpha \frac{\partial}{\partial w_{ij}^{(l)}} J(w,b) \\
b_{i}^{(l)} &= b_{i}^{(l)} - \alpha \frac{\partial}{\partial b_{i}^{(l)}} J(w,b)
\end{align}

Basically, the equation above is similar to the previously shown gradient descent algorithm: $w_{new} = w_{old} - \alpha * \nabla error$. The new and old subscripts are missing, but the values on the left side of the equation are new and the values on the right side are old. Again, we have an iterative process whereby the weights are updated in each iteration, this time based on the cost function $J(w,b)$.

The values $\frac{\partial}{\partial w_{ij}^{(l)}} J(w,b)$ and $\frac{\partial}{\partial b_{i}^{(l)}} J(w,b)$ are the partial derivatives of the cost function with respect to the individual weight and bias values. What does this mean? Recall that for the simple gradient descent example mentioned previously, each step depends on the slope of the error/cost term with respect to the weights. Another word for slope or gradient is the derivative. A normal derivative has the notation $\frac{d}{dx}$. If $x$ in this instance is a vector, then such a derivative will also be a vector, displaying the gradient in all the dimensions of $x$.

4.4 A two dimensional gradient descent example

Let’s take the example of a standard two-dimensional gradient descent problem. Below is a diagram of an iterative two-dimensional gradient descent run:

Figure 9. Two-dimensional gradient descent

The blue lines in the above diagram are the contour lines of the cost function – designating regions with an error value that is approximately the same. As can be observed in the diagram above, each step ($p_1 \to p_2 \to p_3$) in the gradient descent involves a gradient or derivative that is an arrow/vector.  This vector spans both the $[x_1, x_2]$ dimensions, as the solution works its way towards the minimum in the centre. So, for instance, the derivative evaluated at $p_1$ might be $\frac {d}{dx} = [2.1, 0.7]$, where the derivative is a vector to designate the two directions. The partial derivative $\frac {\partial}{\partial x_1}$ in this case would be a scalar $\to [2.1]$ – in other words, it is the gradient in only one direction of the search space ($x_1$). In gradient descent, it is often the case that the partial derivative of all the possible search directions are calculated, then “gathered up” to determine a new, complete, step direction.
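As an aside, the one-dimensional gradient descent code from Section 4.1 extends naturally to two dimensions. Below is a minimal sketch for a made-up bowl-shaped function $f(x_1, x_2) = x_1^2 + 2x_2^2$ (not the cost function in the diagram above, just an illustration), where the gradient is now a vector of partial derivatives:

x_old = np.array([0.0, 0.0])
x_new = np.array([3.0, 2.0])   # the algorithm starts at (3, 2)
gamma = 0.1                    # step size
precision = 0.00001

def grad(x):
    # the vector of partial derivatives of f(x1, x2) = x1**2 + 2*x2**2
    return np.array([2 * x[0], 4 * x[1]])

while np.linalg.norm(x_new - x_old) > precision:
    x_old = x_new
    x_new = x_old - gamma * grad(x_old)

print("The minimum occurs at", x_new)   # converges towards [0, 0]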

In neural networks, we don’t have a simple cost function where we can easily evaluate the gradient, like we did in our toy gradient descent example ($f(x) = x^4 - 3x^3 + 2$). In fact, things are even trickier. While we can compare the output of the neural network to our expected training value, $y^{(z)}$, and feasibly look at how changing the weights of the output layer would change the cost function for the sample (i.e. calculating the gradient), how on earth do we do that for all the hidden layers of the network?

The answer to that is the backpropagation method. This method allows us to “share” the cost function or error to all the weights in the network – or in other words, it allows us to determine how much of the error is caused by any given weight.

4.5 Backpropagation in depth

In this section, I’m going to delve into the maths a little. If you’re wary of the maths of how backpropagation works, then it may be best to skip this section.  The next section will show you how to implement backpropagation in code – so if you want to skip straight on to using this method, feel free to skip the rest of this section. However, if you don’t mind a little bit of maths, I encourage you to push on to the end of this section as it will give you a good depth of understanding in training neural networks. This will be invaluable to understanding some of the key ideas in deep learning, rather than just being a code cruncher who doesn’t really understand how the code works.

First let’s recall some of the foundational equations from Section 3 for the following three layer neural network:

Figure 10. Three layer neural network (again)

The output of this neural network can be calculated by:

\begin{equation}
h_{W,b}(x) = h_1^{(3)} = f(w_{11}^{(2)}h_1^{(2)} + w_{12}^{(2)} h_2^{(2)} + w_{13}^{(2)} h_3^{(2)} + b_1^{(2)})
\end{equation}

We can also simplify the above to $h_1^{(3)} = f(z_1^{(2)})$ by defining $z_1^{(2)}$ as:

$$z_{1}^{(2)} = w_{11}^{(2)}h_1^{(2)} + w_{12}^{(2)} h_2^{(2)} + w_{13}^{(2)} h_3^{(2)} + b_1^{(2)}$$

Let’s say we want to find out how much effect a change in the weight $w_{12}^{(2)}$ has on the cost function $J$. This is to evaluate $\frac {\partial J}{\partial w_{12}^{(2)}}$. To do so, we have to use something called the chain rule:

$$\frac {\partial J}{\partial w_{12}^{(2)}} = \frac {\partial J}{\partial h_1^{(3)}} \frac {\partial h_1^{(3)}}{\partial z_1^{(2)}} \frac {\partial z_1^{(2)}}{\partial w_{12}^{(2)}}$$

If you look at the terms on the right – the numerators “cancel out” the denominators, in the same way that $\frac {2}{5} \frac {5}{2} = \frac {2}{2} = 1$. Therefore we can construct $\frac {\partial J}{\partial w_{12}^{(2)}}$ by stringing together a few partial derivatives (which are quite easy, thankfully). Let’s start with $\frac {\partial z_1^{(2)}}{\partial w_{12}^{(2)}}$:

\begin{align}
\frac {\partial z_1^{(2)}}{\partial w_{12}^{(2)}} &= \frac {\partial}{\partial w_{12}^{(2)}} (w_{11}^{(2)}h_1^{(2)} + w_{12}^{(2)} h_2^{(2)} + w_{13}^{(2)} h_3^{(2)} + b_1^{(2)})\\
&= \frac {\partial}{\partial w_{12}^{(2)}} (w_{12}^{(2)} h_2^{(2)})\\
&= h_2^{(2)}
\end{align}

The partial derivative of $z_1^{(2)}$ with respect to $w_{12}^{(2)}$ only operates on one term within the parentheses, $w_{12}^{(2)} h_2^{(2)}$, as all the other terms don’t vary at all when $w_{12}^{(2)}$ does (so their derivatives are zero). Because the derivative of $w_{12}^{(2)}$ with respect to itself is 1, $\frac {\partial}{\partial w_{12}^{(2)}} (w_{12}^{(2)} h_2^{(2)})$ collapses to just $h_2^{(2)}$, which is simply the output of the second node in layer 2.

The next partial derivative in the chain is $\frac {\partial h_1^{(3)}}{\partial z_1^{(2)}}$, which is the partial derivative of the activation function of the $h_1^{(3)}$ output node. Because we need to be able to calculate this derivative, the activation functions used in neural networks have to be differentiable. For the common sigmoid activation function (shown in Section 2.1), the derivative is:

$$\frac {\partial h}{\partial z} = f'(z) = f(z)(1-f(z))$$

Where $f(z)$ is the activation function. So far so good – now we have to work out how to deal with the first term $\frac {\partial J}{\partial h_1^{(3)}}$. Remember that $J(w,b,x,y)$ is the squared-error cost function for a single sample, which looks like (for our case):

$$J(w,b,x,y) = \frac{1}{2} \parallel y_1 - h_1^{(3)}(z_1^{(2)}) \parallel ^2$$

Here $y_1$ is the training target for the output node. Again using the chain rule:

\begin{align}
&Let\ u = \parallel y_1 - h_1^{(3)}(z_1^{(2)}) \parallel\ and\ J = \frac {1}{2} u^2\\
&Using\ \frac {\partial J}{\partial h} = \frac {\partial J}{\partial u} \frac {\partial u}{\partial h}:\\
&\frac {\partial J}{\partial h} = -(y_1 - h_1^{(3)})
\end{align}

So we’ve now figured out how to calculate $\frac {\partial J}{\partial w_{12}^{(2)}}$, at least for the weights connecting the output layer. Before we move to any hidden layers (i.e. layer 2 in our example case), let’s introduce some simplifications to tighten up our notation and introduce $\delta$:

$$\delta_i^{(n_l)} = -(y_i - h_i^{(n_l)})\cdot f^\prime(z_i^{(n_l)})$$

Where $i$ is the node number of the output layer. In our selected example there is only one such node, therefore $i=1$ always in this case. Now we can write the complete cost function derivative as:

\begin{align}
\frac{\partial}{\partial W_{ij}^{(l)}} J(W,b,x, y) &= h^{(l)}_j \delta_i^{(l+1)} \\
\end{align}

Where, for the output layer in our case, $l$ = 2 and $i$ remains the node number.
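A minimal sketch of these output layer calculations in numpy might look like the following (the variable names and the target value y here are just dummy placeholders for illustration; z_out is the weighted input to the output node calculated back in Section 3.2, and f is the sigmoid function defined earlier):

def f_deriv(x):
    # derivative of the sigmoid activation: f'(z) = f(z) * (1 - f(z))
    return f(x) * (1 - f(x))

y = np.array([1.0])          # dummy training target for the single output node
z_out = np.array([1.62475])  # weighted input to the output node (from Section 3.2)
h_out = f(z_out)             # network output, approximately 0.8354

# delta for the output layer: -(y - h) * f'(z)
delta_out = -(y - h_out) * f_deriv(z_out)
print(delta_out)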

4.6 Propagating into the hidden layers

What about the weights feeding into any hidden layers (layer 2 in our case)? For the weights connecting the output layer, the $\frac {\partial J}{\partial h} = -(y_i - h_i^{(n_l)})$ derivative made sense, as the cost function can be directly calculated by comparing the output layer to the training data. The output of the hidden nodes, however, has no such direct reference – rather, it is connected to the cost function only through mediating weights and potentially other layers of nodes. How can we find the variation in the cost function from changes to weights embedded deep within the neural network? As mentioned previously, we use the backpropagation method.

Now that we’ve done the hard work using the chain rule, we’ll take a more graphical approach. The term that needs to propagate back through the network is the $\delta_i^{(n_l)}$ term, as this is the network’s ultimate connection to the cost function. What about node $j$ in the second layer (the hidden layer)? How does it contribute to $\delta_i^{(n_l)}$ in our test network? It contributes via the weight $w_{ij}^{(2)}$ – see the diagram below for the case of $j=1$ and $i=1$.

Figure 11. Simple backpropagation illustration

As can be observed from above, the output layer $\delta$ is communicated to the hidden node by the weight of the connection. In the case where there is only one output layer node, the generalised hidden layer $\delta$ is defined as:

$$\delta_j^{(l)} = \delta_1^{(l+1)} w_{1j}^{(l)}\ f^\prime(z_j^{(l)})$$

Where $j$ is the node number in layer $l$. What about the case where there are multiple output nodes? In this case, the weighted sum of all the communicated errors is taken to calculate $\delta_j^{(l)}$, as shown in the diagram below:

Figure 12. Backpropagation illustration with multiple outputs

As can be observed from the above, each $\delta$ value from the output layer is included in the sum used to calculate $\delta_1^{(2)}$, but each output $\delta$ is weighted according to the appropriate $w_{i1}^{(2)}$ value. In other words, node 1 in layer 2 contributes to the error of three output nodes, therefore the measured error (or cost function value) at each of these nodes has to be “passed back” to the $\delta$ value for this node. Now we can develop a generalised expression for the $\delta$ values for nodes in the hidden layers:

$$\delta_j^{(l)} = (\sum_{i=1}^{s_{(l+1)}} w_{ij}^{(l)} \delta_i^{(l+1)})\ f^\prime(z_j^{(l)})$$

Where $j$ is the node number in layer $l$ and $i$ is the node number in layer $l+1$ (which is the same notation we have used from the start). The value $s_{(l+1)}$ is the number of nodes in layer $(l+1)$.

So we now know how to calculate: $$\frac{\partial}{\partial W_{ij}^{(l)}} J(W,b,x, y) = h^{(l)}_j \delta_i^{(l+1)}$$ as shown previously. What about the bias weights? I’m not going to derive them as I did with the normal weights in the interest of saving time / space. However, the reader shouldn’t have too many issues following the same steps, using the chain rule, to arrive at:

$$\frac{\partial}{\partial b_{i}^{(l)}} J(W,b,x, y) = \delta_i^{(l+1)}$$

Great – so we now know how to perform our original gradient descent problem for neural networks:

\begin{align}
w_{ij}^{(l)} &= w_{ij}^{(l)} - \alpha \frac{\partial}{\partial w_{ij}^{(l)}} J(w,b) \\
b_{i}^{(l)} &= b_{i}^{(l)} - \alpha \frac{\partial}{\partial b_{i}^{(l)}} J(w,b)
\end{align}

However, to perform this gradient descent training of the weights, we would have to resort to loops within loops. As previously shown in Section 3.4 of this neural network tutorial, performing such calculations in Python using loops is slow for large networks. Therefore, we need to figure out how to vectorise such calculations, which the next section will show.

4.7 Vectorisation of backpropagation

To consider how to vectorise the gradient descent calculations in neural networks, let’s first look at a naïve vectorised version of the gradient of the cost function (warning: this is not in a correct form yet!):

\begin{align}
\frac{\partial J}{\partial W^{(l)}} &= h^{(l)} \delta^{(l+1)}\\
\frac{\partial J}{\partial b^{(l)}} &= \delta^{(l+1)}
\end{align}

Now, let’s look at each element of the above equations. What does $h^{(l)}$ look like? Pretty simple: just a $(s_l \times 1)$ vector, where $s_l$ is the number of nodes in layer $l$. What does the multiplication $h^{(l)} \delta^{(l+1)}$ look like? Well, because we know that $\alpha \times \frac{\partial J}{\partial W^{(l)}}$ must be the same size as the weight matrix $W^{(l)}$, we know that the outcome of $h^{(l)} \delta^{(l+1)}$ must also be the same size as the weight matrix for layer $l$. In other words, it has to be of size $(s_{l+1} \times s_{l})$.

We know that $\delta^{(l+1)}$ has the dimension $(s_{l+1} \times 1)$ and that $h^{(l)}$ has the dimension $(s_l \times 1)$. The rules of matrix multiplication show that a matrix of dimension $(n \times m)$ can only be multiplied by a matrix of dimension $(m \times p)$, giving a product matrix of size $(n \times p)$ – the inner dimensions have to match. If we perform a straight multiplication between $h^{(l)}$ and $\delta^{(l+1)}$, the number of columns of the first vector (i.e. 1 column) will not equal the number of rows of the second vector (i.e. $s_{l+1}$ rows, or 3 in our example), therefore we can’t perform a proper matrix multiplication. The only way we can get a proper outcome of size $(s_{l+1} \times s_{l})$ is by using a matrix transpose. A transpose swaps the dimensions of a matrix around, e.g. an $(s_l \times 1)$ sized vector becomes a $(1 \times s_l)$ sized vector, and is denoted by a superscript $T$. Therefore, we can do the following:

$$\delta^{(l+1)} (h^{(l)})^T = (s_{l+1} \times 1) \times (1 \times s_l) = (s_{l+1} \times s_l)$$

As can be observed, by using the transpose operation we can arrive at the outcome we desired.

A final vectorisation that can be performed is during the weighted addition of the errors in the backpropagation step:

$$\delta_j^{(l)} = (\sum_{i=1}^{s_{(l+1)}} w_{ij}^{(l)} \delta_i^{(l+1)})\ f^\prime(z_j^{(l)}) = \left((W^{(l)})^T \delta^{(l+1)}\right) \bullet f'(z^{(l)})$$

The $\bullet$ symbol in the above designates an element-by-element multiplication (called the Hadamard product), not a matrix multiplication.  Note that the matrix multiplication $\left((W^{(l)})^T \delta^{(l+1)}\right)$ performs the necessary summation of the weights and $\delta$ values – the reader can check that this is the case.
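In numpy, these two vectorised operations look something like the following sketch, using dummy shapes just to illustrate the dimensions (here delta_next stands for $\delta^{(l+1)}$, W for $W^{(l)}$, h and z for the stored layer $l$ values, and f_deriv is the sigmoid derivative function from the sketch above):

# dummy dimensions: 3 nodes in layer l, 2 nodes in layer l+1
h = np.random.random_sample((3,))
z = np.random.random_sample((3,))
W = np.random.random_sample((2, 3))
delta_next = np.random.random_sample((2,))

# gradient of the sample cost w.r.t. W^(l): an outer product of size (2 x 3)
dJ_dW = np.dot(delta_next[:, np.newaxis], h[np.newaxis, :])
print(dJ_dW.shape)   # (2, 3) - the same shape as W

# backpropagated delta for layer l: (W^T delta) element-wise multiplied by f'(z)
delta = np.dot(W.T, delta_next) * f_deriv(z)
print(delta.shape)   # (3,)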

4.8 Implementing the gradient descent step

Now, how do we integrate this new vectorisation into the gradient descent steps of our soon-to-be coded algorithm? First, we have to look again at the overall cost function we are trying to minimise (not just the sample-by-sample cost function shown in the preceding equation):

\begin{align}
J(w,b) &= \frac{1}{m} \sum_{z=1}^m J(W, b, x^{(z)}, y^{(z)})
\end{align}

As we can observe, the total cost function is the mean of all the sample-by-sample cost function calculations. Also remember the gradient descent calculation (showing the element-by-element version along with the vectorised version):

\begin{align}
w_{ij}^{(l)} &= w_{ij}^{(l)} - \alpha \frac{\partial}{\partial w_{ij}^{(l)}} J(w,b)\\
W^{(l)} &= W^{(l)} - \alpha \frac{\partial}{\partial W^{(l)}} J(w,b)\\
&= W^{(l)} - \alpha \left[\frac{1}{m} \sum_{z=1}^{m} \frac {\partial}{\partial W^{(l)}} J(w,b,x^{(z)},y^{(z)}) \right]\\
\end{align}

So that means as we go along through our training samples or batches, we have to have a term that is summing up the partial derivatives of the individual sample cost function calculations. This term will gather up all the values for the mean calculation. Let’s call this “summing up” term $\Delta W^{(l)}$. Likewise, the equivalent bias term can be called $\Delta b^{(l)}$. Therefore, at each sample iteration of the final training algorithm, we have to perform the following steps:

\begin{align}
\Delta W^{(l)} &= \Delta W^{(l)} + \frac {\partial}{\partial W^{(l)}} J(w,b,x^{(z)},y^{(z)})\\
&= \Delta W^{(l)} + \delta^{(l+1)} (h^{(l)})^T\\
\Delta b^{(l)} &= \Delta b^{(l)} + \delta^{(l+1)}
\end{align}

By performing the above operations at each iteration, we slowly build up the previously mentioned sum $\sum_{z=1}^{m} \frac {\partial}{\partial W^{(l)}} J(w,b,x^{(z)},y^{(z)})$ (and the same for $b$). Once all the samples have been iterated through, and the $\Delta$ values have been summed up, we update the weight parameters:

\begin{align}
W^{(l)} &= W^{(l)} - \alpha \left[\frac{1}{m} \Delta W^{(l)} \right] \\
b^{(l)} &= b^{(l)} - \alpha \left[\frac{1}{m} \Delta b^{(l)}\right]
\end{align}

4.9 The final gradient descent algorithm

So, now we’ve finally made it to the point where we can specify the entire backpropagation-based gradient descent training of our neural networks. It has taken quite a few steps to show, but hopefully it has been instructive. The final backpropagation algorithm is as follows:

Randomly initialise the weights for each layer $W^{(l)}$
While iterations < iteration limit:
1. Set $\Delta W$ and $\Delta b$ to zero
2. For samples 1 to m:
a. Perform a feed forward pass through all the $n_l$ layers. Store the activation function outputs $h^{(l)}$
b. Calculate the $\delta^{(n_l)}$ value for the output layer
c. Use backpropagation to calculate the $\delta^{(l)}$ values for layers 2 to $n_l-1$
d. Update the $\Delta W^{(l)}$ and $\Delta b^{(l)}$ for each layer
3. Perform a gradient descent step using:

$W^{(l)} = W^{(l)} - \alpha \left[\frac{1}{m} \Delta W^{(l)} \right]$
$b^{(l)} = b^{(l)} - \alpha \left[\frac{1}{m} \Delta b^{(l)}\right]$

As specified in the algorithm above, we would repeat the gradient descent routine until we are happy that the average cost function has reached a minimum. At this point, our network is trained and (ideally) ready for use.
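To give a feel for what this looks like in code before the full implementation in Section 5, below is a stripped-down sketch of the training loop. The function names (setup_and_init_weights, feed_forward, train_nn), the nn_structure list and the hyperparameter defaults are just my own placeholders; X is assumed to be a 2D numpy array of scaled input samples, y a 2D array of one-hot target vectors, and f and f_deriv are the sigmoid function and its derivative from earlier:

def setup_and_init_weights(nn_structure):
    # randomly initialise a weight matrix and bias vector for each layer
    W, b = {}, {}
    for l in range(1, len(nn_structure)):
        W[l] = np.random.random_sample((nn_structure[l], nn_structure[l-1]))
        b[l] = np.random.random_sample((nn_structure[l],))
    return W, b

def feed_forward(x, W, b):
    # store the h and z values for every layer so backpropagation can use them
    h = {1: x}
    z = {}
    for l in range(1, len(W) + 1):
        z[l+1] = W[l].dot(h[l]) + b[l]
        h[l+1] = f(z[l+1])
    return h, z

def train_nn(nn_structure, X, y, iter_num=3000, alpha=0.25):
    W, b = setup_and_init_weights(nn_structure)
    m = len(y)
    n_l = len(nn_structure)
    for _ in range(iter_num):
        # 1. set the accumulation terms to zero
        tri_W = {l: np.zeros_like(W[l]) for l in W}
        tri_b = {l: np.zeros_like(b[l]) for l in b}
        # 2. loop through the training samples
        for i in range(m):
            h, z = feed_forward(X[i, :], W, b)
            delta = {n_l: -(y[i, :] - h[n_l]) * f_deriv(z[n_l])}
            # backpropagate the deltas through the hidden layers
            for l in range(n_l - 1, 1, -1):
                delta[l] = np.dot(W[l].T, delta[l+1]) * f_deriv(z[l])
            # accumulate the gradients for this sample
            for l in range(n_l - 1, 0, -1):
                tri_W[l] += np.dot(delta[l+1][:, np.newaxis], h[l][np.newaxis, :])
                tri_b[l] += delta[l+1]
        # 3. gradient descent step using the mean gradients
        for l in W:
            W[l] += -alpha * (1.0 / m) * tri_W[l]
            b[l] += -alpha * (1.0 / m) * tri_b[l]
    return W, b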

The next part of this neural networks tutorial will show how to implement this algorithm to train a neural network that recognises hand-written digits.

5 Implementing the neural network in Python

In the last section we looked at the theory surrounding gradient descent training in neural networks and the backpropagation method. In this article, we are going to apply that theory to develop some code to perform training and prediction on the MNIST dataset. The MNIST dataset is a kind of go-to dataset in neural network and deep learning examples, so we’ll stick with it here too. It consists of images of hand-written digits with associated labels that tell us what each digit is. In this example, we’ll be using the small handwritten digits dataset provided in the Python machine learning library scikit-learn – an MNIST-style dataset where each image is only 8 x 8 pixels in size, so each data sample is represented by 64 data points that denote the pixel intensities. An example of an image (and the extraction of the data from the scikit-learn dataset) is shown in the code below (for an image of 1):

from sklearn.datasets import load_digits
digits = load_digits()
print(digits.data.shape)
import matplotlib.pyplot as plt 
plt.gray() 
plt.matshow(digits.images[1]) 
plt.show()

Figure 13. MNIST digit “1”

The code above prints (1797, 64) to show the shape of the input data matrix, along with the pixelated digit “1” in the image above.  The code we are going to write in this neural networks tutorial will try to estimate the digits that these pixels represent (using neural networks of course). First things first, we need to get the input data in shape. To do so, we need to do two things:

  1. Scale the data
  2. Split the data into test and train sets

5.1 Scaling data

Why do we need to scale the input data?  First, have a look at one of the dataset pixel representations:

digits.data[0,:]
Out[2]:
array([  0.,   0.,   5.,  13.,   9.,   1.,   0.,   0.,   0.,   0.,  13.,
        15.,  10.,  15.,   5.,   0.,   0.,   3.,  15.,   2.,   0.,  11.,
         8.,   0.,   0.,   4.,  12.,   0.,   0.,   8.,   8.,   0.,   0.,
         5.,   8.,   0.,   0.,   9.,   8.,   0.,   0.,   4.,  11.,   0.,
         1.,  12.,   7.,   0.,   0.,   2.,  14.,   5.,  10.,  12.,   0.,
         0.,   0.,   0.,   6.,  13.,  10.,   0.,   0.,   0.])

Notice that the input data ranges from 0 up to 15?  It’s standard practice to scale the input data so that it mostly fits either between 0 and 1 or within a small range centred around 0, i.e. -1 to 1.  Why?  Well, it can help the convergence of the neural network and is especially important if we are combining different data types.  Thankfully, this is easily done using scikit-learn:

from sklearn.preprocessing import StandardScaler
X_scale = StandardScaler()
X = X_scale.fit_transform(digits.data)
X[0,:]
Out[3]:
array([ 0.        , -0.33501649, -0.04308102,  0.27407152, -0.66447751,
       -0.84412939, -0.40972392, -0.12502292, -0.05907756, -0.62400926,
        0.4829745 ,  0.75962245, -0.05842586,  1.12772113,  0.87958306,
       -0.13043338, -0.04462507,  0.11144272,  0.89588044, -0.86066632,
       -1.14964846,  0.51547187,  1.90596347, -0.11422184, -0.03337973,
        0.48648928,  0.46988512, -1.49990136, -1.61406277,  0.07639777,
        1.54181413, -0.04723238,  0.        ,  0.76465553,  0.05263019,
       -1.44763006, -1.73666443,  0.04361588,  1.43955804,  0.        ,
       -0.06134367,  0.8105536 ,  0.63011714, -1.12245711, -1.06623158,
        0.66096475,  0.81845076, -0.08874162, -0.03543326,  0.74211893,
        1.15065212, -0.86867056,  0.11012973,  0.53761116, -0.75743581,
       -0.20978513, -0.02359646, -0.29908135,  0.08671869,  0.20829258,
       -0.36677122, -1.14664746, -0.5056698 , -0.19600752])

The scikit-learn StandardScaler standardises the data by subtracting the mean of each feature and dividing by its standard deviation. As can be observed, most of the data points are now centred around zero and lie between -2 and 2. This is a good starting point. There is no real need to scale the output data $y$.
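
If you want to see exactly what the scaler is doing, the transformation is easy to reproduce by hand. The snippet below is purely an illustrative check (the names X_manual, X_sklearn, means and stds exist only for this example): it standardises the data manually and confirms that the result matches the StandardScaler output.

import numpy as np
from sklearn.datasets import load_digits
from sklearn.preprocessing import StandardScaler

digits = load_digits()
data = digits.data
# manual standardisation: subtract each column's mean and divide by its standard deviation
means = data.mean(axis=0)
stds = data.std(axis=0)
stds[stds == 0] = 1.0  # some pixels are constant (always 0), so avoid dividing by zero
X_manual = (data - means) / stds
# StandardScaler performs the same calculation
X_sklearn = StandardScaler().fit_transform(data)
print(np.allclose(X_manual, X_sklearn))  # expect True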

5.2 Creating test and training datasets

In machine learning, there is a phenomenon called “overfitting”. This occurs when a model becomes too well adapted to the training data during training: it predicts the training data very well, but performs poorly when asked to predict on new data it hasn’t “seen” before. In other words, the model doesn’t generalise well. To check that we are not creating models which are too complex, it is common practice to split the dataset into a training set and a test set. The training set is, obviously, the data the model is trained on, and the test set is the data the model is evaluated on after training. The training set is always larger than the test set, usually 60-80% of the total dataset.

Again, scikit learn makes this splitting of the data into training and testing sets easy:

from sklearn.model_selection import train_test_split
y = digits.target
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.4)

In this case, we’ve made the test set 40% of the total data, leaving 60% to train with. The train_test_split function in scikit-learn assigns rows to the two sets at random – in other words, it doesn’t simply take the first 60% of rows as the training set and the last 40% as the test set. This stops any ordering in the way the data was collected from degrading the performance of the model.
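
As a quick sanity check (purely illustrative, assuming the split above has just been run), we can confirm the proportions and see that the rows really have been shuffled:

print(X_train.shape, X_test.shape)  # roughly (1078, 64) and (719, 64) of the 1797 samples
print(y_train[:10])  # the labels come out in a random order, not 0, 1, 2, ...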

5.3 Setting up the output layer

As you will have gathered, we need the output layer to predict which of the digits 0 to 9 the input pixels represent. Therefore, a sensible neural network architecture is an output layer of 10 nodes, with each node representing one digit from 0 to 9. We want to train the network so that when, say, an image of the digit “5” is presented to the network, the node representing 5 has the highest output value. Ideally, we would like to see an output like this: [0, 0, 0, 0, 0, 1, 0, 0, 0, 0]. In reality, we can settle for something like this: [0.01, 0.1, 0.2, 0.05, 0.3, 0.8, 0.4, 0.03, 0.25, 0.02]. In that case, we take the index of the maximum value in the output array and call that our predicted digit, as shown in the snippet below.
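
To make the “take the index of the maximum value” step concrete, here is what it looks like on the example output above (a small illustrative snippet; example_output and predicted_digit are just names for this demonstration):

import numpy as np
example_output = np.array([0.01, 0.1, 0.2, 0.05, 0.3, 0.8, 0.4, 0.03, 0.25, 0.02])
predicted_digit = np.argmax(example_output)  # index of the largest activation
print(predicted_digit)  # prints 5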

For the MNIST data supplied in the scikit-learn dataset, the “targets” – the classifications of the handwritten digits – are given as a single number per sample. We need to convert each number into a vector so that it lines up with our 10-node output layer. In other words, if the target value in the dataset is “1”, we want to convert it into the vector [0, 1, 0, 0, 0, 0, 0, 0, 0, 0]. The code below does just that:

import numpy as np
def convert_y_to_vect(y):
    y_vect = np.zeros((len(y), 10))
    for i in range(len(y)):
        y_vect[i, y[i]] = 1
    return y_vect
y_v_train = convert_y_to_vect(y_train)
y_v_test = convert_y_to_vect(y_test)
y_train[0], y_v_train[0]
Out[8]:
(1, array([ 0., 1., 0., 0., 0., 0., 0., 0., 0., 0.]))

As can be observed above, the MNIST target (1) has been converted into the vector [0, 1, 0, 0, 0, 0, 0, 0, 0, 0], which is what we want.

5.4 Creating the neural network

The next step is to specify the structure of the neural network. For the input layer, we know we need 64 nodes to cover the 64 pixels in the image. As discussed, we need 10 output layer nodes to predict the digits. We’ll also need a hidden layer in our network to allow for the complexity of the task. Usually, the number of hidden layer nodes is somewhere between the number of input nodes and the number of output nodes. Let’s define a simple Python list that designates the structure of our network:

nn_structure = [64, 30, 10]

We’ll use sigmoid activation functions again, so let’s set up the sigmoid function and its derivative:

def f(x):
    return 1 / (1 + np.exp(-x))
def f_deriv(x):
    return f(x) * (1 - f(x))
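
As a quick aside (not part of the training code), you can convince yourself that f_deriv really is the derivative of the sigmoid with a finite-difference check (the point x = 0.5 and the step eps are arbitrary choices for this illustration):

x = 0.5
eps = 1e-6
numerical = (f(x + eps) - f(x - eps)) / (2 * eps)  # central difference approximation
print(numerical, f_deriv(x))  # both are approximately 0.235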

OK, so we now have an idea of what our neural network will look like. How do we train it? Remember the algorithm from Section 4.9, which we’ll repeat here for ease of reference:

Randomly initialise the weights for each layer $W^{(l)}$
While iterations < iteration limit:
1. Set $\Delta W$ and $\Delta b$ to zero
2. For samples 1 to m:
a. Perform a feed forward pass through all the $n_l$ layers. Store the activation function outputs $h^{(l)}$
b. Calculate the $\delta^{(n_l)}$ value for the output layer
c. Use backpropagation to calculate the $\delta^{(l)}$ values for layers 2 to $n_l-1$
d. Update the $\Delta W^{(l)}$ and $\Delta b^{(l)}$ for each layer
3. Perform a gradient descent step using:

$W^{(l)} = W^{(l)} - \alpha \left[\frac{1}{m} \Delta W^{(l)} \right]$
$b^{(l)} = b^{(l)} - \alpha \left[\frac{1}{m} \Delta b^{(l)}\right]$

So the first step is to initialise the weights for each layer. To make it easy to organise the various layers, we’ll use Python dictionary objects (initialised by {}). The weights have to be initialised with random values – this breaks the symmetry between the nodes so that different nodes can learn different things during training (if every weight started at the same value, every node in a layer would compute exactly the same output). We use the numpy random_sample function to do this. The weight initialisation code is shown below:

import numpy.random as r
def setup_and_init_weights(nn_structure):
    W = {}
    b = {}
    for l in range(1, len(nn_structure)):
        # W^(l) feeds layer l into layer l+1, so it has shape (size of layer l+1, size of layer l)
        W[l] = r.random_sample((nn_structure[l], nn_structure[l-1]))
        # one bias value for each node in layer l+1
        b[l] = r.random_sample((nn_structure[l],))
    return W, b

The next step is to set the mean accumulation values $\Delta W$ and $\Delta b$ to zero (they need to be the same size as the weight and bias matrices):

def init_tri_values(nn_structure):
    tri_W = {}
    tri_b = {}
    for l in range(1, len(nn_structure)):
        tri_W[l] = np.zeros((nn_structure[l], nn_structure[l-1]))
        tri_b[l] = np.zeros((nn_structure[l],))
    return tri_W, tri_b

If we now step into the gradient descent loop, the first step is to perform a feed forward pass through the network. The code below is a variation on the feed forward function created in Section 3:

def feed_forward(x, W, b):
    h = {1: x}
    z = {}
    for l in range(1, len(W) + 1):
        # if it is the first layer, then the input into the weights is x, otherwise, 
        # it is the output from the last layer
        if l == 1:
            node_in = x
        else:
            node_in = h[l]
        z[l+1] = W[l].dot(node_in) + b[l] # z^(l+1) = W^(l)*h^(l) + b^(l)  
        h[l+1] = f(z[l+1]) # h^(l+1) = f(z^(l+1))
    return h, z
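
Before moving on, it can be reassuring to check that the dimensions flowing through feed_forward make sense. This is a purely illustrative check, assuming the weights have just been initialised with setup_and_init_weights and that X is the scaled input data from earlier:

W, b = setup_and_init_weights(nn_structure)
h, z = feed_forward(X[0, :], W, b)
for l in h:
    print(l, h[l].shape)
# 1 (64,)  <- layer 1 is just the 64 scaled input pixels
# 2 (30,)  <- the 30 hidden layer nodes
# 3 (10,)  <- the 10 output nodes, one per digit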

Finally, we have to calculate the output layer delta $\delta^{(n_l)}$ and the hidden layer delta values $\delta^{(l)}$ in order to perform the backpropagation pass:

def calculate_out_layer_delta(y, h_out, z_out):
    # delta^(nl) = -(y_i - h_i^(nl)) * f'(z_i^(nl))
    return -(y-h_out) * f_deriv(z_out)

def calculate_hidden_delta(delta_plus_1, w_l, z_l):
    # delta^(l) = (transpose(W^(l)) * delta^(l+1)) * f'(z^(l))
    return np.dot(np.transpose(w_l), delta_plus_1) * f_deriv(z_l)

Now we can put all the steps together into the final function:

def train_nn(nn_structure, X, y, iter_num=3000, alpha=0.25):
    W, b = setup_and_init_weights(nn_structure)
    cnt = 0
    m = len(y)
    avg_cost_func = []
    print('Starting gradient descent for {} iterations'.format(iter_num))
    while cnt < iter_num:
        if cnt%1000 == 0:
            print('Iteration {} of {}'.format(cnt, iter_num))
        tri_W, tri_b = init_tri_values(nn_structure)
        avg_cost = 0
        for i in range(len(y)):
            delta = {}
            # perform the feed forward pass and return the stored h and z values, to be used in the
            # gradient descent step
            h, z = feed_forward(X[i, :], W, b)
            # loop from nl-1 to 1 backpropagating the errors
            for l in range(len(nn_structure), 0, -1):
                if l == len(nn_structure):
                    delta[l] = calculate_out_layer_delta(y[i,:], h[l], z[l])
                    avg_cost += np.linalg.norm((y[i,:]-h[l]))
                else:
                    if l > 1:
                        delta[l] = calculate_hidden_delta(delta[l+1], W[l], z[l])
                    # triW^(l) = triW^(l) + delta^(l+1) * transpose(h^(l))
                    tri_W[l] += np.dot(delta[l+1][:,np.newaxis], np.transpose(h[l][:,np.newaxis])) 
                    # trib^(l) = trib^(l) + delta^(l+1)
                    tri_b[l] += delta[l+1]
        # perform the gradient descent step for the weights in each layer
        for l in range(len(nn_structure) - 1, 0, -1):
            W[l] += -alpha * (1.0/m * tri_W[l])
            b[l] += -alpha * (1.0/m * tri_b[l])
        # complete the average cost calculation
        avg_cost = 1.0/m * avg_cost
        avg_cost_func.append(avg_cost)
        cnt += 1
    return W, b, avg_cost_func

The function above deserves a bit of explanation. First, we don’t terminate the gradient descent process based on some change or precision threshold of the cost function. Rather, we simply run it for a set number of iterations (3,000 in this case) and monitor how the average cost changes as training progresses (the avg_cost_func list in the code above). In each iteration of gradient descent, we cycle through every training sample (range(len(y))) and perform the feed forward pass followed by backpropagation. The backpropagation step is an iteration through the layers, starting at the output layer and working backwards – range(len(nn_structure), 0, -1). We accumulate the cost, which we are tracking during training, at the output layer (l == len(nn_structure)). We also update the mean accumulation values, $\Delta W$ and $\Delta b$, designated tri_W and tri_b, for every layer apart from the output layer (there are no weights connecting the output layer to any further layer).

Finally, after we have looped through all the training samples, accumulating the tri_W and tri_b values, we perform a gradient descent step change in the weight and bias values:
$$W^{(l)} = W^{(l)} - \alpha \left[\frac{1}{m} \Delta W^{(l)} \right]$$
$$b^{(l)} = b^{(l)} - \alpha \left[\frac{1}{m} \Delta b^{(l)}\right]$$

After the process is completed, we return the trained weight and bias values, along with our tracked average cost for each iteration. Now it’s time to run the function – NOTE: this may take a few minutes depending on the capabilities of your computer.

W, b, avg_cost_func = train_nn(nn_structure, X_train, y_v_train)

Now we can have a look at how the average cost function decreased as we went through the gradient descent iterations of the training, slowly converging on a minimum in the function:

plt.plot(avg_cost_func)
plt.ylabel('Average J')
plt.xlabel('Iteration number')
plt.show()

We can see in the above plot that by 3,000 iterations of gradient descent our average cost has started to “plateau”, so further increases in the number of iterations aren’t likely to improve the performance of the network by much (a simple check that could automate this judgement is sketched below).
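
If you wanted to automate that judgement rather than always running a fixed number of iterations, a small convergence check on the cost history would do it. The helper below is a hypothetical sketch, not part of the code above: the tolerance of 1e-4 is an arbitrary choice, and you would call it on avg_cost_func inside the training loop and break out once it returns True.

def has_converged(avg_cost_func, tol=1e-4):
    # purely illustrative: report convergence once the average cost changes
    # by less than tol between successive iterations
    if len(avg_cost_func) < 2:
        return False
    return abs(avg_cost_func[-2] - avg_cost_func[-1]) < tol

print(has_converged([0.9, 0.5, 0.30001, 0.30000]))  # True: the cost has flattened out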

5.5 Assessing the accuracy of the trained model

Now that we’ve trained our MNIST neural network, we want to see how it performs on the test set. Is our model any good? Given a test input (64 pixels), we need to find the output of our neural network – we do that by simply performing a feed forward pass through the network using our trained weight and bias values. As discussed previously, we assess the prediction of the output layer by taking the node with the maximum output as the predicted digit. We can use the numpy argmax function for this, which returns the index of the largest value in an array:

def predict_y(W, b, X, n_layers):
    m = X.shape[0]
    y = np.zeros((m,))
    for i in range(m):
        h, z = feed_forward(X[i, :], W, b)
        y[i] = np.argmax(h[n_layers])
    return y

Finally, we can assess the accuracy of the predictions (i.e. the percentage of times the network predicted the handwritten digit correctly) by using the scikit-learn accuracy_score function:

from sklearn.metrics import accuracy_score
y_pred = predict_y(W, b, X_test, 3)
accuracy_score(y_test, y_pred)*100

This gives an accuracy of around 86% in predicting the digits. Sounds pretty good, right? Actually, no – it’s pretty poor. Current state-of-the-art deep learning algorithms achieve accuracy scores of 99.7% (see here), so we are a fair way off that sort of performance. There are many more exciting things to learn – my next post will cover some tips and tricks on how to improve the accuracy of this simple neural network substantially. Beyond that, there is a whole realm of state-of-the-art deep learning algorithms to learn and investigate, from convolutional neural networks to deep belief nets and recurrent neural networks. If you followed along with this post, you will be in a good position to advance to these newer techniques.

Stick around to find out more about this rapidly advancing area of machine learning.  As a start, check out these posts:
Python TensorFlow Tutorial – Build a Neural Network
Improve your neural networks – Part 1 [TIPS AND TRICKS]
Stochastic Gradient Descent – Mini-batch and more

 

