Week 9/12 #DataScienceBootcamp

Week 9 (23.11.-27.11.)

  • Topic: Deep Learning
  • Lessons: Artificial Neural Networks, Backpropagation, Keras, Convolutional Neural Networks, Pretrained Networks, Transfer Learning, Deep Learning Papers
  • Project: Classify images of clothing items with neural network models
  • Dataset: Fashion MNIST
  • Code: GitHub

This week we dived into Deep Learning and learned about different types of neural networks and their applications in various domains. The main goal of this project was to learn and understand what each hyperparameter in a NN model does and how to tune it, so it was more theoretical and math-heavy than usual.

Building a Neural Network

For my first deep learning project, I used the famous Fashion MNIST dataset created by Zalando, which contains 60K training images (plus 10K test images) of 10 types of clothing items (like T-shirts, sandals, trousers), and I classified the images into the correct item category. I tried two types of NN:

  • Artificial Neural Network (ANN): a group of multiple perceptrons/neurons at each layer. Also called Feed-Forward Neural Network, because the inputs are processed only in the forward direction. It consists of three types of layers: input, hidden, and output.
  • Convolutional Neural Network (CNN): the go-to method for image recognition. CNNs use filters to extract features and capture the spatial information in images.

In this post I will present only the CNN model, since it’s the one that performed best. Here’s an overview of my model:

model = keras.models.Sequential()
model.add(Conv2D(32, kernel_size=(3, 3), activation='relu', kernel_initializer='he_uniform', input_shape=(28, 28, 1)))
model.add(MaxPooling2D((2, 2)))
model.add(Flatten())
model.add(Dense(100, activation='relu', kernel_initializer='he_uniform'))
model.add(Dense(10, activation='softmax'))

First [line 1] I instantiated the model. Then I started adding several layers with different hyperparameters.

  • Conv2D is a 2D convolution layer which creates a convolution kernel that is convolved with the layer’s input to produce a tensor of outputs. I used 32 filters, as it’s common practice to use powers of 2 as values.
  • kernel_size determines the height and width of the kernel, passed as a tuple.
  • activation specifies the activation function, which transforms the summed weighted input of the node into the activation of the node. relu (Rectified Linear Unit) outputs the input directly if it is positive and 0 otherwise.
  • kernel_initializer refers to the function for initializing the weights. In this case it is he_uniform, which draws them from a uniform distribution whose range depends on the number of inputs to the layer.
  • The input_shape represents the dimensions of the images (28×28 px) and their number of color channels (1 for grayscale). This needs to be specified only in the first layer.
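
To make two of these hyperparameters more concrete, here is a minimal pure-Python sketch of what relu computes and of the range he_uniform samples from. It is not how Keras implements them internally: the helper names relu and he_uniform_limit are hypothetical, and the bound limit = sqrt(6 / fan_in) is the formula given in the Keras documentation for the He uniform initializer.

```python
import math
import random


def relu(x):
    """Return x if it is positive, otherwise 0."""
    return max(0.0, x)


def he_uniform_limit(fan_in):
    """Bound of the uniform range [-limit, limit] used by He uniform init."""
    return math.sqrt(6.0 / fan_in)


# A 3x3 kernel over 1 input channel means each filter sees 3 * 3 * 1 = 9 inputs.
fan_in = 3 * 3 * 1
limit = he_uniform_limit(fan_in)

# Draw the 9 initial weights of one filter (fixed seed, for illustration only).
rng = random.Random(0)
weights = [rng.uniform(-limit, limit) for _ in range(fan_in)]

print(relu(-2.0), relu(3.5))
print(round(limit, 4))
```

The more inputs a unit has, the narrower the range becomes, which keeps the summed input to each ReLU unit at a reasonable scale at the start of training.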

Next [3] I added a MaxPooling2D layer, which downsamples its input along the spatial dimensions (height and width) by taking the maximum value over a window of size pool_size (2, 2), separately for each feature map.
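
As an illustration of what a (2, 2) max pooling window does, here is a small pure-Python sketch on a single 4×4 feature map. The function max_pool_2x2 and the toy values are made up for this example; Keras applies the same idea to every feature map in the batch.

```python
def max_pool_2x2(feature_map):
    """Non-overlapping 2x2 max pooling on a 2D grid given as a list of rows."""
    pooled = []
    for r in range(0, len(feature_map) - 1, 2):
        row = []
        for c in range(0, len(feature_map[0]) - 1, 2):
            window = (
                feature_map[r][c], feature_map[r][c + 1],
                feature_map[r + 1][c], feature_map[r + 1][c + 1],
            )
            row.append(max(window))  # keep only the strongest activation
        pooled.append(row)
    return pooled


feature_map = [
    [1, 3, 2, 0],
    [4, 2, 1, 5],
    [0, 1, 7, 2],
    [3, 2, 1, 1],
]
pooled = max_pool_2x2(feature_map)
print(pooled)  # [[4, 5], [3, 7]]
```

Height and width are halved, so the layers that follow have four times fewer values to process.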

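Then [4] I added a Flatten layer, which reshapes the pooled feature maps into a single one-dimensional vector so that they can be fed to the Dense layers. Flattening does not change the values themselves; scaling the pixel values so that they are between 0 and 1 is a separate preprocessing step. That scaling is done because when working with images, if the values are positive and large, a ReLU neuron becomes almost a linear unit, losing many of its advantages.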
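
Flattening and pixel scaling are two different operations, which a tiny pure-Python sketch can show side by side. The helpers scale_pixels and flatten are hypothetical names, and the 2×2 "image" is toy data; dividing by 255 assumes 8-bit grayscale pixels, as in Fashion MNIST.

```python
def scale_pixels(image):
    """Map 8-bit pixel values (0-255) to floats between 0 and 1."""
    return [[pixel / 255.0 for pixel in row] for row in image]


def flatten(grid):
    """Concatenate the rows of a 2D grid into one flat list."""
    return [value for row in grid for value in row]


image = [[0, 128], [255, 64]]   # toy 2x2 grayscale image
scaled = scale_pixels(image)    # changes the values, keeps the shape
flat = flatten(scaled)          # changes the shape, keeps the values

print(scaled)
print(flat)
```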

Lastly [5,6] I added two Dense layers, which are fully connected layers, where the first parameter declares the number of desired units. So in [5] I have a layer with 100 neurons and ReLU activation. The last layer [6] has 10 units (one per clothing category) and softmax activation, which is used for multi-class classification.
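
To see how these layers fit together, the following sketch traces the output shape and the number of trainable parameters through the network by hand. It assumes the Keras defaults for Conv2D (stride 1, no padding) and a Flatten layer between pooling and the first Dense layer; the helper functions are hypothetical and only do the arithmetic.

```python
def conv2d_output(height, width, channels, filters, kernel):
    """Shape and parameter count of a stride-1, unpadded convolution."""
    out_h = height - kernel + 1
    out_w = width - kernel + 1
    params = (kernel * kernel * channels + 1) * filters  # +1 bias per filter
    return (out_h, out_w, filters), params


def dense_params(inputs, units):
    """One weight per input-unit pair, plus one bias per unit."""
    return inputs * units + units


conv_shape, conv_params = conv2d_output(28, 28, 1, filters=32, kernel=3)
pool_shape = (conv_shape[0] // 2, conv_shape[1] // 2, conv_shape[2])
flat_size = pool_shape[0] * pool_shape[1] * pool_shape[2]
hidden_params = dense_params(flat_size, 100)
output_params = dense_params(100, 10)
total_params = conv_params + hidden_params + output_params

print(conv_shape, pool_shape, flat_size)
print(conv_params, hidden_params, output_params, total_params)
```

Almost all of the parameters sit in the first Dense layer, because every one of its 100 units is connected to every value of the flattened feature maps.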

Finally, I compiled the model:

model.compile(optimizer='adam', loss='categorical_crossentropy', metrics=['accuracy'])

  • optimizer defines the optimization algorithm used to update the weights. I’ve tried both sgd (Stochastic Gradient Descent) and adam (Adaptive Moment Estimation), and stuck with the latter because it is more advanced (it adapts the learning rate for each parameter) and often performs better.
  • loss defines the cost function that training tries to minimize; categorical_crossentropy is the usual choice for multi-class classification with one-hot-encoded labels.
  • metrics is a list of all the evaluation scores I want to compute. In this case, accuracy is enough.
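
As a rough illustration of how the softmax output and the categorical_crossentropy loss relate, here is a pure-Python sketch for a single sample with three classes. The function names and the scores are made up for the example, and Keras computes the same quantities on whole batches.

```python
import math


def softmax(logits):
    """Turn raw scores into probabilities that sum to 1."""
    highest = max(logits)
    exps = [math.exp(z - highest) for z in logits]  # shift for numerical stability
    total = sum(exps)
    return [e / total for e in exps]


def categorical_crossentropy(one_hot, probs):
    """Negative log of the probability assigned to the true class."""
    eps = 1e-12  # avoids log(0)
    return -sum(t * math.log(p + eps) for t, p in zip(one_hot, probs))


probs = softmax([2.0, 1.0, 0.1])   # toy scores for 3 classes
target = [1, 0, 0]                 # the true class is class 0
loss = categorical_crossentropy(target, probs)

print([round(p, 3) for p in probs])
print(round(loss, 3))
```

The loss gets smaller as the probability assigned to the correct class approaches 1, which is what the optimizer pushes towards.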

Then I trained the model with model.fit, which received, among other arguments:

    validation_data=(xtest, to_categorical(ytest))
  • epochs represents the number of complete passes over the training data.
  • batch_size is the number of images to feed to the model in one go. It normally ranges from 16 to 512, but in any case it’s smaller than the total number of samples.
  • validation_data represents the part of the dataset kept for testing the model; it is evaluated at the end of each epoch and is not used for training.
  • to_categorical one-hot-encodes the labels (clothing items).
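
Two of these points are easy to check by hand: what a one-hot-encoded label looks like, and how many weight updates one epoch contains for a given batch_size. The sketch below uses hypothetical helpers and an assumed batch size of 32 purely for illustration. Note that to_categorical returns NumPy arrays of floats, while this version uses plain lists of integers.

```python
import math


def one_hot(label, num_classes):
    """Vector of zeros with a single 1 at the index of the label."""
    vector = [0] * num_classes
    vector[label] = 1
    return vector


def batches_per_epoch(num_samples, batch_size):
    """Number of batches needed to see every sample once."""
    return math.ceil(num_samples / batch_size)


encoded = one_hot(3, 10)                  # label 3 out of 10 classes
steps = batches_per_epoch(60000, 32)      # 60K training images, assumed batch size

print(encoded)  # [0, 0, 0, 1, 0, 0, 0, 0, 0, 0]
print(steps)    # 1875
```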

Evaluating the model performance

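The CNN had an accuracy of 99.43% on the train set and 90.69% on the test set. This is a really good score for a first model. I suspected it could’ve been even better if I had let the model train longer (i.e. more epochs), although the gap between train and test accuracy points to overfitting, so regularization (e.g. dropout) would probably help more than extra epochs.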
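
For reference, accuracy is simply the share of samples whose most probable predicted class matches the true label. A minimal sketch with made-up predictions for four samples and three classes (the helper names are hypothetical):

```python
def argmax(values):
    """Index of the largest value."""
    return max(range(len(values)), key=lambda i: values[i])


def accuracy(predicted_probs, true_labels):
    """Fraction of samples where the most probable class is the true one."""
    correct = sum(
        1 for probs, label in zip(predicted_probs, true_labels)
        if argmax(probs) == label
    )
    return correct / len(true_labels)


# Toy softmax outputs, not results from the actual model.
predictions = [
    [0.1, 0.7, 0.2],
    [0.8, 0.1, 0.1],
    [0.3, 0.3, 0.4],
    [0.2, 0.5, 0.3],
]
labels = [1, 0, 1, 1]
score = accuracy(predictions, labels)
print(score)  # 0.75
```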

Friday Lightning Talk

This Friday talk was a bit different from the previous ones. Instead of presenting our projects, we had to read and present a paper about a Deep Learning application, for example generative art, object recognition, or text generation. Quite predictably, I chose the latter topic and tried an LSTM to generate poems in the style of E.A. Poe. For the talk itself, though, I presented GPT-3, a state-of-the-art deep learning model that can generate samples of news articles which human evaluators have difficulty distinguishing from articles written by humans. I focused not so much on the technical details as on the ethics and implications of this technology.
