Intuitively Create CNN for Fashion Image Multi-class Classification

 4 years ago
source link: https://mc.ai/intuitively-create-cnn-for-fashion-image-multi-class-classification/

Img adapted from Pixabay via link

In my previous article, I walked through how to build a Convolutional Neural Network (CNN) for a binary image classification problem. In this article, I will create another CNN for the retail industry. What makes this article different: the input data comes in a different format, which requires different data processing methods, and the CNN architecture is adapted for multi-class classification. It is split into 6 parts.

  1. Problem statement
  2. Data processing
  3. Model building
  4. Model compiling
  5. Model fitting
  6. Model evaluation

1. Problem statement

We are given a set of images from the retail industry. The task is to create a CNN model that predicts the label of a fashion image: 0 as T-shirt; 1 as Trouser; 2 as Pullover; 3 as Dress; 4 as Coat; 5 as Sandal; 6 as Shirt; 7 as Sneaker; 8 as Bag; 9 as Ankle boot.

We use the Fashion-MNIST dataset, which contains 70,000 images: 60,000 in the training set and 10,000 in the test set. All images are grayscale, 28 pixels high and 28 pixels wide. Each pixel is an integer intensity from 0 to 255; when plotted with a grayscale colormap, 0 is displayed as black and 255 as white.

Figure 1 is a snippet of the training data. Note that each row represents one image: a label followed by 784 pixel values (28 × 28).

Fig.1 A snippet of training data
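To make the row layout concrete, here is a minimal NumPy sketch that splits one row into its label and a 28×28 image. The row is a made-up stand-in (label 9, pixel values generated with arange), not real data; it assumes the pixels are stored row by row, so reshaping the 784 values to (28, 28) restores the picture.

```python
import numpy as np

# Hypothetical row in the CSV layout: label first, then 784 pixel values.
row = np.zeros(785, dtype='float32')
row[0] = 9                       # label 9 = Ankle boot
row[1:] = np.arange(784) % 256   # made-up pixel values in 0..255

label = int(row[0])
image = row[1:].reshape((28, 28))  # 28 consecutive values per image row

print(label, image.shape)  # 9 (28, 28)
print(image[1, 0])         # 28.0: value 28 of the flat vector starts the second image row
```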

2. Data processing

First, read in the training and test data and convert each DataFrame to a NumPy array.

import numpy as np
import pandas as pd

fashion_train_df = pd.read_csv('fashion-mnist_train.csv', sep=',')
fashion_test_df = pd.read_csv('fashion-mnist_test.csv', sep=',')
training = np.array(fashion_train_df, dtype='float32')
testing = np.array(fashion_test_df, dtype='float32')

To view an image with the default colormap or in grayscale, try the following:

import random
import matplotlib.pyplot as plt
i = random.randint(0, len(training) - 1)  # random row index (randint includes both endpoints)
plt.imshow(training[i, 1:].reshape((28, 28)))  # reshape the 784 pixels to 28x28, default colormap
plt.show()
plt.imshow(training[i, 1:].reshape((28, 28)), cmap='gray')  # the same image in grayscale

Next, scale the independent variables, namely the pixel values, to the range 0 to 1 by dividing by 255, and take the labels from the first column.

X_train = training[:,1:]/255
y_train = training[:,0]
X_test = testing[:,1:]/255
y_test = testing[:,0]

Then, split the training data into training and validation sets, holding out 20% for validation. The validation set lets us measure how well the model generalizes to data it was not trained on.

from sklearn.model_selection import train_test_split
X_train, X_validate, y_train, y_validate = train_test_split(X_train, y_train, test_size=0.2, random_state=12345)
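As a sanity check on the sizes, the sketch below mirrors what a 20% hold-out of 60,000 rows produces. It is a NumPy shuffle-and-slice illustration, not scikit-learn's own implementation; like train_test_split with a float test_size, it rounds the validation size up.

```python
import math
import numpy as np

n_samples, test_size = 60_000, 0.2
n_validate = math.ceil(test_size * n_samples)  # rows held out for validation
n_train = n_samples - n_validate                # rows left for training

rng = np.random.default_rng(12345)
perm = rng.permutation(n_samples)               # shuffled row indices
train_idx, validate_idx = perm[:n_train], perm[n_train:]

print(n_train, n_validate)  # 48000 12000
```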

Finally, we need to reshape X_train, X_validate and X_test. This is a critical point: by default, Keras's Conv2D layer expects 4-D input of shape (batch size, pixel height, pixel width, number of colour channels). Therefore,

X_train = X_train.reshape((-1, 28, 28, 1))
X_test = X_test.reshape(X_test.shape[0], *(28, 28, 1))
X_validate = X_validate.reshape(X_validate.shape[0], *(28, 28, 1))

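Note that two methods are used above to reshape the data, and both achieve the same result. The first passes -1 as the first dimension so that NumPy infers it; the second sets the first dimension explicitly to the number of samples and uses * to unpack the tuple (28, 28, 1) into the remaining dimensions.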
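A quick NumPy check confirms that the two reshaping styles give identical arrays. The zero-filled array is a stand-in for 12,000 flattened images.

```python
import numpy as np

flat = np.zeros((12_000, 784), dtype='float32')  # e.g. 12,000 flattened validation images

a = flat.reshape((-1, 28, 28, 1))              # NumPy infers the first dimension
b = flat.reshape(flat.shape[0], *(28, 28, 1))  # first dimension explicit; * unpacks the tuple

print(a.shape, b.shape)  # (12000, 28, 28, 1) (12000, 28, 28, 1)
```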

Great, now the data is ready to train the model.

3. Model building

In general, building a CNN involves four steps: convolution, max pooling, flattening and full connection. Here we will build a CNN model with two convolution layers.

3.1 Convolution

Fundamentally, a CNN is based on convolution. In simple terms, a convolution slides a small kernel matrix over an image and computes a weighted sum at each position, which produces an effect such as blurring or sharpening. In a CNN, the kernel weights are learned, so kernels act as feature extractors that pick out the most informative patterns in an image while preserving the spatial relationships between pixels.

If you want a detailed explanation of the concept, please check the previous article here. Feel free to explore this fantastic website to visualize how convolution works. Another great website, by Ryerson University, visually and interactively shows how a CNN works.

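from keras.models import Sequential
from keras.layers import Conv2D, Dense, Dropout, Flatten, MaxPooling2D
classifier = Sequential()
classifier.add(Conv2D(64, (3, 3), input_shape=(28, 28, 1), activation='relu'))  # 64 filters, each 3x3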

Note that the number of feature detectors (filters) is set to 64, and each feature detector is a 3×3 array. input_shape is the shape of the input images to which the feature detectors are applied through convolution. We set it to (28, 28, 1): 1 is the number of channels for a grayscale image, and 28×28 is the image size in each channel. It must match the shape of each sample in X_train, X_validate and X_test.
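To see what a single feature detector computes, here is a minimal NumPy sketch of one 3×3 kernel sliding over a 28×28 image with stride 1 and no padding, which are the Conv2D defaults. The edge-detecting kernel and the random image are illustrative assumptions; in the real model, Keras learns 64 such kernels. Note how the 28×28 input shrinks to a 26×26 feature map.

```python
import numpy as np

def conv2d_valid(image, kernel):
    """Stride-1, no-padding ('valid') 2-D convolution as used in CNNs.
    Like Keras's Conv2D, the kernel is not flipped (cross-correlation)."""
    kh, kw = kernel.shape
    out_h = image.shape[0] - kh + 1
    out_w = image.shape[1] - kw + 1
    out = np.empty((out_h, out_w))
    for i in range(out_h):
        for j in range(out_w):
            # Weighted sum of the kh x kw patch currently under the kernel.
            out[i, j] = np.sum(image[i:i + kh, j:j + kw] * kernel)
    return out

image = np.random.default_rng(0).random((28, 28))  # stand-in for one 28x28 image
edge_kernel = np.array([[-1.0, 0.0, 1.0],
                        [-1.0, 0.0, 1.0],
                        [-1.0, 0.0, 1.0]])         # hypothetical vertical-edge detector
feature_map = conv2d_valid(image, edge_kernel)
print(feature_map.shape)  # (26, 26)
```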

The final argument is the activation function. We use ReLU to remove negative values from the feature maps: depending on the kernel weights, convolution can produce negative values, and ReLU sets them to zero. This adds the non-linearity needed for a non-linear classification problem.
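In NumPy, ReLU is a single element-wise maximum with zero. The feature-map values below are made up for illustration.

```python
import numpy as np

feature_map = np.array([[-3.0, 1.5],
                        [0.0, -0.2]])      # hypothetical values after convolution
activated = np.maximum(feature_map, 0.0)  # ReLU: negatives become 0, others unchanged
print(activated.tolist())  # [[0.0, 1.5], [0.0, 0.0]]
```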

3.2 Max Pooling

Max pooling reduces the size of a feature map produced by convolution by sliding a small window over it and keeping the maximum value in each window. Ultimately, it reduces the number of nodes in the fully connected layers while keeping the most prominent features and the coarse spatial structure of the image.

Specifically, we use the MaxPooling2D() function to add the pooling layer. A 2×2 window is the usual choice; it halves the height and width of each feature map.

classifier.add(MaxPooling2D(pool_size = (2, 2)))
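The NumPy sketch below illustrates 2×2 max pooling with stride 2, matching MaxPooling2D's defaults for pool_size=(2, 2) (stride equal to the pool size, 'valid' padding). It is an illustration of the operation, not Keras's implementation; a trailing odd row or column that cannot fill a window is dropped, so 11×11 becomes 5×5.

```python
import numpy as np

def max_pool_2x2(feature_map):
    """2x2 max pooling with stride 2 and no padding."""
    h, w = feature_map.shape
    h2, w2 = h // 2, w // 2
    trimmed = feature_map[:h2 * 2, :w2 * 2]  # drop a leftover odd row/column
    # Group pixels into 2x2 blocks, then take the maximum of each block.
    return trimmed.reshape(h2, 2, w2, 2).max(axis=(1, 3))

fm = np.array([[1, 3, 2, 0],
               [4, 2, 1, 5],
               [0, 1, 7, 2],
               [3, 2, 6, 8]])
print(max_pool_2x2(fm).tolist())  # [[4, 5], [3, 8]]
```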

3.3 Dropout

Dropout is a common remedy for over-fitting. How does dropout work? During each training iteration, a random subset of neurons is disabled so that neurons cannot depend on each other too much. Because different neurons are dropped each time, the network effectively trains a different architecture at every iteration, which helps it learn independent patterns in the data instead of over-fitting to the training set. Specifically,

classifier.add(Dropout(0.25))

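Note that this randomly disables 25% of the neurons at each training iteration. Dropout is only active during training; at evaluation and prediction time, all neurons are used.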
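Keras's Dropout layer uses inverted dropout: during training it zeroes each input with probability rate and scales the survivors by 1 / (1 - rate), so the expected activation stays the same; at inference it passes inputs through unchanged. The NumPy function below is an illustrative sketch of that behaviour, not Keras's code.

```python
import numpy as np

def dropout(activations, rate, rng, training=True):
    """Inverted dropout: drop each unit with probability `rate` during training
    and rescale the kept units so their expected value is unchanged."""
    if not training or rate == 0:
        return activations
    keep_mask = rng.random(activations.shape) >= rate  # True with probability 1 - rate
    return activations * keep_mask / (1.0 - rate)

rng = np.random.default_rng(42)
x = np.ones(10_000)
y = dropout(x, rate=0.25, rng=rng)
print((y == 0).mean())  # roughly 0.25 of the units are dropped
print(y.mean())         # close to 1.0 thanks to the rescaling
```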

3.4 Convolution & Max Pooling

Based on previous experiments, we add a second convolution and max pooling layer to improve model performance.

classifier.add(Conv2D(32, (3, 3), activation='relu'))  # 32 filters, each 3x3
classifier.add(MaxPooling2D(pool_size = (2, 2)))

3.5 Flattening

Flattening unrolls all the pooled feature maps into a single vector, which becomes the input to the fully connected layers. Specifically,

classifier.add(Flatten())

3.6 Full Connection

With the steps above, we have converted an input image into a one-dimensional vector. Now let's build a classifier that takes this vector as its input. Specifically,

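classifier.add(Dense(units=32, activation='relu'))     # hidden layer
classifier.add(Dense(units=10, activation='softmax'))  # output layer: one node per class

Note that for the first hidden layer, units, the number of nodes in the layer, is set to 32; feel free to try more. It uses ReLU as its activation function. The output layer has 10 nodes, one per class, and uses softmax so that its outputs form a probability distribution over the 10 classes.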
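The two Dense layers can be sketched in NumPy as a matrix multiply plus bias, followed by ReLU and softmax respectively. The random weights are placeholders for what training would learn, and the 800-entry input assumes the flattened size for this architecture (5 × 5 × 32 after the second pooling layer).

```python
import numpy as np

def softmax(z):
    # Subtracting the row max improves numerical stability without changing the result.
    e = np.exp(z - z.max(axis=-1, keepdims=True))
    return e / e.sum(axis=-1, keepdims=True)

rng = np.random.default_rng(0)
x = rng.random((1, 800))                               # one flattened feature vector
W1, b1 = rng.normal(0, 0.05, (800, 32)), np.zeros(32)  # hypothetical hidden-layer weights
W2, b2 = rng.normal(0, 0.05, (32, 10)), np.zeros(10)   # hypothetical output-layer weights

hidden = np.maximum(x @ W1 + b1, 0)  # Dense(32, relu)
probs = softmax(hidden @ W2 + b2)    # Dense(10, softmax): 10 class probabilities
print(probs.shape, probs.sum())      # (1, 10) and a sum of 1.0 up to rounding
```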

With that done, congratulations on finishing the model building! Figure 2 shows what we built.

Fig.2 CNN architecture diagram (Img created by Author)
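As a cross-check of Figure 2, the plain-Python arithmetic below traces the feature-map size through the layers and counts the trainable parameters. It assumes 3×3 kernels with Keras's default stride 1 and 'valid' padding, 2×2 pooling, and Dense layers of 32 and 10 units. Pooling, Dropout and Flatten have no parameters, so the total should agree with what classifier.summary() reports for that configuration.

```python
def conv_out(size, kernel=3):
    return size - kernel + 1  # 'valid' padding, stride 1

def pool_out(size, pool=2):
    return size // pool       # 2x2 window, stride 2, 'valid' padding

size = 28
size = pool_out(conv_out(size))  # Conv2D(64): 28 -> 26, MaxPooling2D: 26 -> 13
size = pool_out(conv_out(size))  # Conv2D(32): 13 -> 11, MaxPooling2D: 11 -> 5
flat = size * size * 32          # Flatten: 5 * 5 * 32 values

params = {
    "conv1": (3 * 3 * 1 + 1) * 64,   # 3x3 weights over 1 channel + bias, 64 filters
    "conv2": (3 * 3 * 64 + 1) * 32,  # 3x3 weights over 64 channels + bias, 32 filters
    "dense1": (flat + 1) * 32,       # every input to every hidden node + bias
    "dense2": (32 + 1) * 10,         # every hidden node to every class + bias
}
print(size, flat, sum(params.values()))  # 5 800 45066
```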

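4. Model compiling

With all layers added, let's configure the CNN for training. An important decision to make is the loss function. Both categorical_crossentropy and sparse_categorical_crossentropy assume that each sample belongs to exactly one class; they differ only in the label format. Use categorical_crossentropy when the labels are one-hot encoded vectors, and sparse_categorical_crossentropy when the labels are integers. Our labels are integers from 0 to 9, so here we use the latter. If a sample could carry several labels at once, binary_crossentropy with sigmoid outputs is the usual choice instead.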

from keras.optimizers import Adam
classifier.compile(loss='sparse_categorical_crossentropy', optimizer=Adam(learning_rate=0.001), metrics=['accuracy'])
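The two losses differ only in how the labels are written. The NumPy sketch below, using made-up probabilities for three classes, computes both and shows they give the same per-sample loss: the negative log-probability of the true class.

```python
import numpy as np

probs = np.array([[0.7, 0.2, 0.1],
                  [0.1, 0.1, 0.8]])  # hypothetical softmax outputs, 3 classes
int_labels = np.array([0, 2])        # integer labels -> sparse_categorical_crossentropy
one_hot = np.eye(3)[int_labels]      # same labels one-hot encoded -> categorical_crossentropy

sparse_ce = -np.log(probs[np.arange(len(int_labels)), int_labels])
categorical_ce = -np.sum(one_hot * np.log(probs), axis=1)

print(sparse_ce, categorical_ce)  # identical per-sample losses
```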

5. Model fitting

Now the model is ready to be trained. We train it for 50 epochs, i.e. 50 full passes over the training data, updating the weights after every batch of 512 samples. (X_validate, y_validate) is used to evaluate the model's loss and accuracy on held-out data after each epoch.

epochs = 50
history = classifier.fit(X_train, y_train, batch_size=512, epochs=epochs, verbose=1, validation_data=(X_validate, y_validate))
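For a sense of scale, the arithmetic below counts weight updates, assuming 48,000 training rows remain after the 20% validation split. The last batch of each epoch is partial, so the count per epoch rounds up.

```python
import math

n_train = 48_000  # 80% of the 60,000 training rows
batch_size = 512
epochs = 50

updates_per_epoch = math.ceil(n_train / batch_size)  # last batch holds only 384 samples
total_updates = updates_per_epoch * epochs
print(updates_per_epoch, total_updates)  # 94 4700
```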

At the end of training, we obtained a training accuracy of 92% and a validation accuracy of 90%. Quite good results!

6. Model evaluation

Now, let's evaluate the model on the test set. Specifically,

evaluation = classifier.evaluate(X_test, y_test)  # returns [test loss, test accuracy]
print('Test accuracy:', evaluation[1])

We obtained a test accuracy of 90%! Figure 3 below compares the predicted and true classes of the images.

Fig.3 Predicted and True class comparison
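classifier.predict() returns one row of 10 class scores per image, so a comparison like Figure 3 can be built by taking the argmax of each row. In the sketch below, the probabilities and true labels are made up; class_names follows the label mapping from the problem statement.

```python
import numpy as np

class_names = ['T-shirt', 'Trouser', 'Pullover', 'Dress', 'Coat',
               'Sandal', 'Shirt', 'Sneaker', 'Bag', 'Ankle boot']

# Hypothetical output of classifier.predict() for two test images.
probs = np.array([[0.01, 0.00, 0.02, 0.01, 0.01, 0.05, 0.02, 0.08, 0.00, 0.80],
                  [0.05, 0.85, 0.01, 0.04, 0.01, 0.00, 0.02, 0.00, 0.02, 0.00]])
y_true = np.array([9, 1])  # hypothetical true labels

y_pred = probs.argmax(axis=1)  # index of the most probable class per image
for p, t in zip(y_pred, y_true):
    print(f'Predicted: {class_names[p]}, True: {class_names[t]}')
accuracy = (y_pred == y_true).mean()
```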

Finally, if you want to tune the model with much more data, feel free to explore this link. If you want to see more advanced data science innovation in the retail industry, check this page.

Great! Congratulations on making it to the end. Hopefully, this gives a sense of how to create a CNN for fashion image classification. If you need the source code, feel free to visit my GitHub page. Many thanks for your time!

