Create an AI That Can Read Handwriting in Just 26 Lines

In this machine learning tutorial, we will be using Python and Keras to train a CNN (convolutional neural network) on the MNIST dataset. The MNIST (Modified National Institute of Standards and Technology) database is a large collection of images of handwritten digits (60,000 for training and 10,000 for testing) that is available online for free. It is commonly used in tutorials because it sidesteps issues that many data scientists face, such as gathering data or cleaning the dataset before training a machine learning algorithm on it.

Keras is a machine learning library for Python. Of all the machine learning libraries available for Python, Keras is considered to be high level, meaning that it is simpler to create neural networks in it than in lower level libraries. This makes it an excellent starting point for beginners. Another example of a machine learning library is TensorFlow, which is lower level than Keras.

Python is the programming language used in this tutorial. Python is a high level programming language, first released in 1991, which is famous for being easy to use and friendly for beginners. Python's design philosophy is centred around clean and readable code, making it a popular choice in the computer science industry too. It also has a huge repository of third-party packages called modules. A machine learning library is an example of a Python module. The Python programming language can be downloaded here: https://www.python.org/. In this tutorial we will be using version 3.7, so make sure to install that version. You can use whichever editor you want, but PyCharm is recommended for being user friendly, Python oriented and working well out of the box with no need for pre-configuration.

Before we begin writing our code, we need to install the relevant modules in Python. There are several ways to do this. I will demonstrate 4 of them; pick whichever suits your setup.

  1. (This method requires PyCharm) Create your new file and write out the imports required in our machine learning program.

    import keras
    from keras.datasets import mnist
    from keras.models import Sequential
    from keras.layers import Dense, Dropout, Flatten
    from keras.layers import Conv2D, MaxPooling2D
    from keras import backend as K

    Many of these lines will be underlined in red. Hover over the first one, import keras, and click the red light bulb that pops up. A little menu will then show up, where you can click install modules and PyCharm will install the required Python modules automatically. You can keep these lines, since they are the opening lines of our final program.

  2. (This is for Microsoft Windows users who chose to add Python to PATH) Open a CMD/PowerShell window and type in the following:

    python -m pip install Keras

  3. (This is for Microsoft Windows users for whom method 2 did not work) Go to C:\Users\[Your Username]\AppData\Local\Programs\Python\Python37\Scripts, then open a CMD/PowerShell window there and type:

    .\pip3.exe install Keras

  4. (This method is for Linux and Mac users) Open a terminal and type in the following:

    pip3 install Keras
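
Whichever method you used, it can be worth confirming that Python can actually find the modules before going further. The snippet below is a small sketch of one way to check, using only the standard library; the helper name check_modules is made up for this example and is not part of Keras or pip.

```python
import importlib.util


def check_modules(names):
    """Map each top-level module name to True if Python can locate it."""
    return {name: importlib.util.find_spec(name) is not None for name in names}


# 'keras' is the module installed above; the check does not import it,
# it only asks Python whether the module can be found.
for name, found in check_modules(["keras"]).items():
    print(name, "is installed" if found else "is missing")
```

If a module is reported as missing, the install most likely went to a different Python installation than the one running the script.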

Now that the installation process is out of the way, we can begin writing our code.

First we need to import the modules we will be using in the rest of the program. The first import we will be doing is the machine learning library that we will be using.

import keras

This is technically the only import we need. However, adding other imports can make writing code easier. For example, to access the MNIST database after importing keras, we would have to write keras.datasets.mnist, which is long to type and hurts readability. Instead, we can import MNIST directly like this:

from keras.datasets import mnist

This means that if we want to reference the MNIST dataset in our code, instead of typing keras.datasets.mnist all we need to do is type mnist. We can do the same for the other parts of the Keras module we will be using in our code.

from keras.datasets import mnist
from keras.models import Sequential
from keras.layers import Dense, Dropout, Flatten
from keras.layers import Conv2D, MaxPooling2D
from keras import backend as K

The next thing we need to do is define some of the parameters that Keras will be using to train our model.

batch_size = 128
num_classes = 10
epochs = 12
img_rows, img_cols = 28, 28

These parameters will be used later. The batch size is how many samples the neural network processes before updating its weights. In this case it is 128, so the network considers 128 samples at once, and the weight change after that batch is the average of the changes suggested by those 128 samples. The number of classes is how many possible labels there are in the training data. Since the MNIST dataset consists of handwritten digits, there are 10 (from 0 to 9). The number of epochs is how many times the program trains on the full dataset. More epochs generally improve the fit to the training data, but too many can lead to overfitting, where the network memorises the training data and performs worse on new images. The final line gives the dimensions, in pixels, of the images in the MNIST dataset.
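
To get a feel for what these numbers mean in practice, here is a rough back-of-the-envelope calculation. It assumes the standard MNIST training set of 60,000 images and that the weights are updated once per batch; the variable names steps_per_epoch, last_batch_size and total_updates are just for this example.

```python
import math

num_samples = 60000   # size of the MNIST training set
batch_size = 128
epochs = 12

# One weight update happens per batch; the final batch may be smaller.
steps_per_epoch = math.ceil(num_samples / batch_size)
last_batch_size = num_samples - (steps_per_epoch - 1) * batch_size
total_updates = steps_per_epoch * epochs

print(steps_per_epoch)   # 469
print(last_batch_size)   # 96
print(total_updates)     # 5628
```

So with these settings the weights are adjusted a few thousand times over the whole run. A smaller batch size would mean more (but noisier) updates per epoch.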


Next, we need to extract the images from the MNIST dataset. Usually, this process would include splitting the dataset into a training dataset (on which you train your model) and a testing dataset (on which you test your model). However, Keras has already done this for us, making our job a bit easier.

(x_train, y_train), (x_test, y_test) = mnist.load_data()

These will later be used by our model when it is being trained by Keras.

We also need to tell the model how the image arrays are laid out, and reshape our input data (both training and testing) to match. When an image is converted to an array, the colour channels can come first in each sample's shape, or last. Keras defaults to channels last, but it can be configured to use channels first, which is sometimes done for performance purposes. So we add a check that reshapes both datasets according to whichever convention is in use. MNIST images are greyscale, so there is only 1 channel.

if K.image_data_format() == 'channels_first':
    x_train = x_train.reshape(x_train.shape[0], 1, img_rows, img_cols)
    x_test = x_test.reshape(x_test.shape[0], 1, img_rows, img_cols)
    input_shape = (1, img_rows, img_cols)
else:
    x_train = x_train.reshape(x_train.shape[0], img_rows, img_cols, 1)
    x_test = x_test.reshape(x_test.shape[0], img_rows, img_cols, 1)
    input_shape = (img_rows, img_cols, 1)
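
To see what the two branches above produce, here is a small sketch using NumPy with blank stand-in images, so it runs without Keras. The helper add_channel_axis is a made-up name for this illustration.

```python
import numpy as np

# Four blank 28x28 greyscale "images" standing in for MNIST samples.
images = np.zeros((4, 28, 28), dtype=np.uint8)


def add_channel_axis(batch, data_format):
    """Insert a single-channel axis where the given format expects it."""
    n, rows, cols = batch.shape
    if data_format == "channels_first":
        return batch.reshape(n, 1, rows, cols)
    return batch.reshape(n, rows, cols, 1)


print(add_channel_axis(images, "channels_first").shape)  # (4, 1, 28, 28)
print(add_channel_axis(images, "channels_last").shape)   # (4, 28, 28, 1)
```

The pixel values are untouched in both cases; only the position of the extra axis of length 1 changes.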

The next step is the last step of the pre-processing of our input data in our machine learning program. We need to convert the image data to floating point numbers instead of integers. Then we need to scale the values, which are originally between 0 and 255, to be between 0 and 1. Luckily, the data is stored in NumPy arrays, which provide shortcuts for this, so all we need to write is:

x_train = x_train.astype('float32')
x_test = x_test.astype('float32')
x_train /= 255
x_test /= 255
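
The order of these two steps matters: the division is done in place, so the array has to hold floats first. The following sketch shows the same two steps on three hand-picked pixel values (the array pixels is invented for this example).

```python
import numpy as np

# Three sample pixel values: black, mid-grey and white.
pixels = np.array([[0, 128, 255]], dtype=np.uint8)

scaled = pixels.astype("float32")  # float copy of the integer data
scaled /= 255                      # in-place division, as in the tutorial

print(scaled.dtype)                # float32
print(scaled.min(), scaled.max())
```

Trying the in-place division directly on the integer array would normally fail, because the fractional results cannot be stored back into integers.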

Now, onto the pre-processing of the output data. The MNIST dataset is stored in such a way that the images are labelled as the number they are. For example, if there was an image of a handwritten 2, it would be labelled 2. However, our neural network won't output the number 2. It has 10 outputs, one per digit, and will ideally output [0,0,1,0,0,0,0,0,0,0]. This means that we have to convert all the labels to their vector forms. This is called one hot encoding, and it is needed because the loss function we use later compares the network's 10 outputs against a vector of the same length. Fortunately, Keras handles this for us with built-in functions.

y_train = keras.utils.to_categorical(y_train, num_classes)
y_test = keras.utils.to_categorical(y_test, num_classes)
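
For the curious, the conversion itself is simple. The function below is a minimal NumPy sketch of the idea, not the Keras implementation; the name one_hot is made up for this example.

```python
import numpy as np


def one_hot(labels, num_classes):
    """Return a (len(labels), num_classes) array with a single 1 per row."""
    labels = np.asarray(labels, dtype=int)
    encoded = np.zeros((labels.size, num_classes), dtype="float32")
    # For row i, set the column given by labels[i] to 1.
    encoded[np.arange(labels.size), labels] = 1.0
    return encoded


print(one_hot([2, 0, 9], 10)[0])
```

The first row corresponds to the label 2, so the 1 sits at index 2, matching the vector shown in the text.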

At this point, all the pre-processing that needs to be done is done. Next, we have to define the structure of our neural network. Our model type is "sequential", since we have a simple list of layers where each layer feeds into the next. The other type of model is "functional", where a layer can feed into any other layer in the neural network and not just the next one. Our neural network will also rely on convolutional and pooling layers, since they are well suited to image recognition.

model = Sequential()
model.add(Conv2D(32, kernel_size=(3, 3),
                 activation='relu',
                 input_shape=input_shape))
model.add(Conv2D(64, (3, 3), activation='relu'))
model.add(MaxPooling2D(pool_size=(2, 2)))
model.add(Dropout(0.25))
model.add(Flatten())
model.add(Dense(128, activation='relu'))
model.add(Dropout(0.5))
model.add(Dense(num_classes, activation='softmax'))

The structure that I will be using is essentially:

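  1. 2D convolutional layer with 32 filters and a kernel size of 3x3
  2. RELU (Rectified Linear Unit) activation function
  3. 2D convolutional layer with 64 filters and a kernel size of 3x3
  4. RELU activation function
  5. Max pooling layer with a 2x2 pool size
  6. Dropout layer that randomly drops 25% of its inputs during training
  7. Flatten layer that turns the feature maps into a single vector
  8. A normal (dense) layer with 128 neurons
  9. RELU activation function
  10. Dropout layer that randomly drops 50% of its inputs during training
  11. An output layer with 10 neurons
  12. Softmax activation function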
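
It can help to trace how the image size changes as it passes through the layers defined in the model code, and how many trainable weights that implies. The sketch below does the arithmetic by hand, assuming the Keras defaults of no padding and a stride of 1 for the convolutions; the helper names are made up for this example.

```python
def conv_output(size, kernel):
    """Output width/height of an unpadded convolution with stride 1."""
    return size - kernel + 1


def conv_params(kernel, in_channels, filters):
    """Kernel weights plus one bias per filter."""
    return kernel * kernel * in_channels * filters + filters


def dense_params(inputs, units):
    """One weight per input per unit, plus one bias per unit."""
    return inputs * units + units


side = conv_output(28, 3)      # 26 after the first convolution
side = conv_output(side, 3)    # 24 after the second convolution
side = side // 2               # 12 after 2x2 max pooling
flat = side * side * 64        # 9216 values after flattening

# Pooling, dropout and flatten layers have no trainable weights.
total = (conv_params(3, 1, 32) + conv_params(3, 32, 64)
         + dense_params(flat, 128) + dense_params(128, 10))

print(flat)    # 9216
print(total)   # 1199882
```

Almost all of the weights sit in the 128-neuron dense layer. You can compare these figures against the output of model.summary() once the model is built.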

After doing this tutorial, I recommend messing around with the different features of this neural structure to see how performance of the machine learning model changes.

Now that we have described the model topology to Keras, we can get Keras to compile the structure. It is at this point that we specify the loss function and optimiser to be used during the training of this model. The loss function we have chosen is the categorical cross-entropy loss provided in Keras, since the output of our neural network is categorical. The optimiser that we are using is "adadelta". Adadelta is a variant of stochastic gradient descent that adapts the learning rate for each weight as the neural network trains.

model.compile(loss=keras.losses.categorical_crossentropy,
              optimizer=keras.optimizers.Adadelta(),
              metrics=['accuracy', 'categorical_accuracy'])

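The metrics=['accuracy', 'categorical_accuracy'] argument tells Keras to report the accuracy metric and the categorical accuracy while the neural network trains. For this model the two should report the same value, because with a categorical cross-entropy loss Keras interprets 'accuracy' as categorical accuracy. The loss is always reported, so it does not need to be listed. You can change this to other metrics like 'MeanSquaredError' or 'AUC'. To use several, just add multiple elements to the list as shown.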
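
To make the loss function chosen above a little less abstract, here is a plain Python sketch of softmax followed by categorical cross-entropy for a single sample. It illustrates the maths only and is not how Keras implements it internally; the raw scores are invented for the example.

```python
import math


def softmax(scores):
    """Turn raw scores into probabilities that sum to 1."""
    highest = max(scores)  # subtracted for numerical stability
    exps = [math.exp(s - highest) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def categorical_crossentropy(one_hot_target, probabilities):
    """Negative log of the probability given to the true class."""
    return -sum(t * math.log(p)
                for t, p in zip(one_hot_target, probabilities) if t > 0)


target = [0, 0, 1, 0, 0, 0, 0, 0, 0, 0]               # a handwritten 2
confident = softmax([0, 0, 5, 0, 0, 0, 0, 0, 0, 0])   # strongly favours 2
unsure = softmax([0] * 10)                            # all digits equally likely

print(categorical_crossentropy(target, confident))
print(categorical_crossentropy(target, unsure))
```

The confident, correct prediction gets a much lower loss than the uniform guess, whose loss is log(10), roughly 2.3. Training pushes the weights in whichever direction lowers this number.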

With this, Keras now knows how to go about training the model. We can now fit the model to the training data. We pass in the training input and output (x_train and y_train), the constants batch_size and epochs which we specified at the start of the program, and the testing input and output as validation_data. The training dataset will be used to train our model, and the testing dataset will be used to evaluate the performance of the neural network at the end of each epoch.

model.fit(x_train, y_train,
          batch_size=batch_size,
          epochs=epochs,
          verbose=1,
          validation_data=(x_test, y_test))

We can also evaluate the model again, separately from the training process, and print its loss and accuracy on the testing dataset using:

score = model.evaluate(x_test, y_test, verbose=0)
print('Test loss:', score[0])
print('Test accuracy:', score[1])
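
The accuracy figure printed here is simply the fraction of test images whose most likely predicted digit matches the label. A minimal sketch of that calculation, on a made-up toy example with 3 classes and 4 samples rather than real MNIST predictions:

```python
def argmax(values):
    """Index of the largest value in a list."""
    return max(range(len(values)), key=lambda i: values[i])


def accuracy(one_hot_targets, predictions):
    """Fraction of samples whose top-scoring class matches the label."""
    correct = sum(argmax(t) == argmax(p)
                  for t, p in zip(one_hot_targets, predictions))
    return correct / len(one_hot_targets)


# Invented labels and predicted probabilities for illustration only.
targets = [[0, 1, 0], [1, 0, 0], [0, 0, 1], [0, 1, 0]]
predictions = [[0.1, 0.8, 0.1], [0.6, 0.3, 0.1],
               [0.5, 0.2, 0.3], [0.2, 0.7, 0.1]]

print(accuracy(targets, predictions))   # 0.75
```

Three of the four toy predictions pick the right class, giving 0.75. The third sample is the miss: the label is class 2 but class 0 has the highest probability.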

You can now run the script that you have written up over the course of the tutorial. If you hit run it'll start going through the 12 epochs you have specified. Don't worry if it ends up taking a long time - even a $3,000 NVIDIA GPU takes nearly half a minute to do an epoch! Once the process is finished you should have a neural network with around 99.25% accuracy and potentially the first neural network you've made. Congrats! If you wish to keep practising on other datasets, a common next step is the Iris flower dataset. Note that it is not an image dataset: it is a table of 150 samples, 50 from each of 3 species of iris, where each sample has 4 measurements (the length and width of the sepals and petals).

The output from running the script should look something like this:

https://i.paste.pics/8471b5e7464202ff64ae7cd44db41aca.png

However, the tutorial doesn't have to end here. An important part of machine learning and data science is being able to visualise your results. The graphing library we will be using for this is called plotly. You can install it using the same methods as Keras, replacing 'Keras' with 'plotly'. All we have to do is add the following line at the top, along with our other imports for Keras.

import plotly.graph_objects as go

The as go after the import is another shortcut to shorten module names. Instead of having to type plotly.graph_objects every time we want to reference the graph objects, we can just write go. After this we need to change our model fitting line from:

model.fit(x_train, y_train,
          batch_size=batch_size,
          epochs=epochs,
          verbose=1,
          validation_data=(x_test, y_test))

to:

history = model.fit(x_train, y_train,
          batch_size=batch_size,
          epochs=epochs,
          verbose=1,
          validation_data=(x_test, y_test))

This assigns all the history of the model to the variable history. We can then access this when plotting our graphs.
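
The part we care about is history.history, which is a dictionary that maps each metric name to a list with one value per epoch. The sketch below uses a hand-written dictionary with invented numbers in place of a real training run, and assumes the key names follow the 'accuracy' spelling used in this tutorial, with a 'val_' prefix for the values measured on the validation data. The helper best_epoch is made up for this example.

```python
# Made-up numbers for three epochs; a real run produces its own values.
example_history = {
    "loss": [0.30, 0.10, 0.07],
    "accuracy": [0.91, 0.97, 0.98],
    "val_loss": [0.08, 0.05, 0.06],
    "val_accuracy": [0.975, 0.984, 0.982],
}


def best_epoch(history, key="val_accuracy"):
    """Return the 1-based epoch with the highest value for a metric."""
    values = history[key]
    return max(range(len(values)), key=lambda i: values[i]) + 1


print(best_epoch(example_history))               # 2
print(best_epoch(example_history, "accuracy"))   # 3
```

With a real run you would pass history.history instead of example_history. If the validation accuracy peaks well before the training accuracy, that is a hint the network is starting to overfit.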

In this graph, I plan to plot both accuracy and loss at once, so I need to add two traces.

fig = go.Figure()

fig.add_trace(
    go.Scatter(
        y = history.history['loss'],
        mode = 'lines',
        name = 'loss'
    )
)

fig.add_trace(
    go.Scatter(
        y = history.history['accuracy'],
        mode = 'lines',
        name = 'accuracy'
    )
)

fig.show()

This initialises the graph and adds both 'traces' (lines) to the graph. Once the model is trained and evaluated, a graph will open in your browser. The graph will show the model's accuracy and loss with each epoch and show you how the model improved as it trained. After finishing training, the outputted graph should look something like this:

https://i.paste.pics/fb6456c589db15442f9e9dfd713dc1db.png

Now you have a graph, you can finally show off the progress of your neural network.

Final Code:

import keras
from keras.datasets import mnist
from keras.models import Sequential
from keras.layers import Dense, Dropout, Flatten
from keras.layers import Conv2D, MaxPooling2D
import plotly.graph_objects as go
from keras import backend as K

batch_size = 128
num_classes = 10
epochs = 12

# input image dimensions
img_rows, img_cols = 28, 28

# the data, split between train and test sets
(x_train, y_train), (x_test, y_test) = mnist.load_data()

if K.image_data_format() == 'channels_first':
    x_train = x_train.reshape(x_train.shape[0], 1, img_rows, img_cols)
    x_test = x_test.reshape(x_test.shape[0], 1, img_rows, img_cols)
    input_shape = (1, img_rows, img_cols)
else:
    x_train = x_train.reshape(x_train.shape[0], img_rows, img_cols, 1)
    x_test = x_test.reshape(x_test.shape[0], img_rows, img_cols, 1)
    input_shape = (img_rows, img_cols, 1)

x_train = x_train.astype('float32')
x_test = x_test.astype('float32')
x_train /= 255
x_test /= 255
print('x_train shape:', x_train.shape)
print(x_train.shape[0], 'train samples')
print(x_test.shape[0], 'test samples')

# convert class vectors to binary class matrices
y_train = keras.utils.to_categorical(y_train, num_classes)
y_test = keras.utils.to_categorical(y_test, num_classes)

model = Sequential()
model.add(Conv2D(32, kernel_size=(3, 3),
                 activation='relu',
                 input_shape=input_shape))
model.add(Conv2D(64, (3, 3), activation='relu'))
model.add(MaxPooling2D(pool_size=(2, 2)))
model.add(Dropout(0.25))
model.add(Flatten())
model.add(Dense(128, activation='relu'))
model.add(Dropout(0.5))
model.add(Dense(num_classes, activation='softmax'))

model.compile(loss=keras.losses.categorical_crossentropy,
              optimizer=keras.optimizers.Adadelta(),
              metrics=['accuracy', 'categorical_accuracy'])

history = model.fit(x_train, y_train,
          batch_size=batch_size,
          epochs=epochs,
          verbose=1,
          validation_data=(x_test, y_test))

score = model.evaluate(x_test, y_test, verbose=0)
print('Test loss:', score[0])
print('Test accuracy:', score[1])

fig = go.Figure()

fig.add_trace(
    go.Scatter(
        y = history.history['loss'],
        mode = 'lines',
        name = 'loss'
    )
)

fig.add_trace(
    go.Scatter(
        y = history.history['accuracy'],
        mode = 'lines',
        name = 'accuracy'
    )
)

fig.show()