Project completed in week 9 (23.11.-27.11.20) of the Data Science Bootcamp at Spiced Academy in Berlin.
This week we dived into Deep Learning and learned about different types neural networks (NN) and their applications in various domains. The main goal of this project was to learn and understand what each hyperparameter in a NN model does and how to tune it, so this week was more theoretical and math-heavy than usual.
Building a Neural Network
For my first deep learning project, I used the famous Fashion MNIST dataset created by Zalando. The dataset contains 60K images of 10 clothing items (e.g., tops, sandals, trousers). In order to classify the images in the correct item category, I tried two types of NN:
- Artificial Neural Network (ANN): represents a group of multiple perceptrons/ neurons at each layer. ANN is also called Feed-Forward Neural Network, because the inputs are processed only forward. An ANN consists of three layers: input, hidden, and output.
- Convolutional Neural Network (CNN): uses filters to extract features and capture the spatial information from images. CNNs are the go-to model for image classification.
In this post, I will present only the CNN model, since it’s the one that performed best in my project. Here’s an overview of my model:
model = keras.models.Sequential() model.add(Conv2D(32, kernel_size=(3, 3), activation='relu', kernel_initializer='he_uniform', input_shape=(28, 28, 1))) model.add(MaxPooling2D((2, 2))) model.add(Flatten()) model.add(Dense(100, activation='relu', kernel_initializer='he_uniform')) model.add(Dense(10, activation='softmax'))
First [line 1] I instantiated the model. Then I started adding several layers of with different hyperparameters.
Conv2Dis a 2D convolution layer which creates a convolution kernel that is filled with layers input to produce a tensor of outputs. I used 32 filters, as it’s recommended to use powers of 2 as values.
kernel_sizedetermines the height and width of the kernel, passed as a tuple.
activationspecifies the activation function, which transforms the summed weighted input from the node into the activation of the node.
relu(Rectified Linear Activation Function) outputs the input directly if it is positive and 0 if it is negative.
kernel_initializerrefers to the functions for initializing the weights, which in this case is uniform distribution.
input_shaperepresents the dimension of the images (28×28 px) and their color code (1 for black-and-white). This hyperparameter needs to be specified only in the first layer.
Next  I added a
MaxPooling2D layer, which downsamples the input representation by taking the maximum value over the window defined by
pool_size (2, 2) for each dimension along the features axis.
Then  I added a
Flatten layer that flattens the images, so that the pixel values are between 0 an 1. This is done because when working with images, if the values are positive and large, a ReLU neuron becomes almost a linear unit, losing many of its advantages.
Lastly [5,6] I added two
Dense layers, which are fully connected layers, where the first parameter declares the number of desired units. So in  I have a layer with has 100 neurons with ReLU activation. The last layer  has 10 hidden layers (number of clothing items) and
softmax activation, which is used for multi-class classification.
Tuning the hyperparameters
Finally, I compiled the model:
optimizerdefines the stochastic gradient descent algorithm that is used. I’ve tried both
sgd(Stochastic Gradient Descent) and
adam(Adaptive Moment Estimation), and stuck with the latter because it is more advanced it it generally performs better.
lossdefines the cost function.
metricsis a list of all the evaluation scores I want to compute. In this case, accuracy is enough.
model.fit( xtrain, to_categorical(ytrain), epochs=15, batch_size=32, validation_data=(xtest, to_categorical(ytest)) )
epochsrepresents the number of iterations on the training data.
batch_sizeis the number of images to feed tot he model in one go, it normally ranges from 16 to 512, but in any case it’s smaller than the total number of samples.
validation_datarepresents the part of the dataset kept for testing the model.
to_categoricalone-hot-encodes the labels (clothing items).
Evaluating the model performance
The CNN had an accuracy of 99.43% on the train set and 90.69% on the test set. This is a really good score! I think it could’ve been even better if I had let the model train longer (i.e. more epochs).
Friday Lightning Talk
This Friday talk was a bit different from the previous ones. Instead of presenting our projects, we had to read and present a paper about a Deep Learning application, like generative art, object recognition, or text generation. Of course, I chose the latter topic and tried LSTM to generate poems by E.A. Poe. But I talked about GPT-3, a state-of-the-art deep learning model that can generate samples of news articles which human evaluators have difficulty distinguishing from articles written by humans. However, I focused not so much on the technical details, as on the ethics and implications of this technology. This opened an important discussion in our group, which I think should be included in the curriculum of every tech degree.