Neural network observations with changes in parameters

William Scott
4 min read · Mar 1, 2019

Implementing neural networks and checking how the accuracy changes as the parameters change.

Find the code here.

Question 1

Self-Implemented NN:

Accuracy:

Parameter values used for sigmoid:

  • _lambda = 0.01
  • lr = 1

Parameter values used for ReLU:

  • _lambda = 1
  • lr = 0.01

For ReLU, the most problematic issue is that the values tend to go to NaN, due to two main reasons:

  • -1 in the exponential value
  • zero divided by zero in the sigmoid

To handle this,

  • I used a little normalization in softmax by subtracting the max value from the whole array (a stable softmax), and
  • I divided the whole randomly generated weight matrix by the square root of its dimensions (a sketch of both fixes follows this list).
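A minimal sketch of these two fixes, assuming NumPy and row-wise inputs (the function names are mine, not from the original code):

```python
import numpy as np

def stable_softmax(z):
    # Subtract the row-wise max before exponentiating so exp() cannot overflow;
    # the shift does not change the softmax output.
    shifted = z - np.max(z, axis=1, keepdims=True)
    exp_z = np.exp(shifted)
    return exp_z / np.sum(exp_z, axis=1, keepdims=True)

def init_weights(n_in, n_out, seed=0):
    # Divide the random weights by the square root of the input dimension
    # so the initial activations stay in a reasonable range.
    rng = np.random.RandomState(seed)
    return rng.randn(n_in, n_out) / np.sqrt(n_in)
```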

Architecture:

  • The notation follows Andrew Ng's YouTube videos.
  • The input is considered the output of the first layer.
  • Then, at every layer, two steps have to be performed:
  • calculating the hypothesis, and
  • passing it through the activation function.
  • The output of the activation is considered the output of that particular layer.
  • The activation functions used are ReLU and sigmoid.
  • Softmax is the last layer of the architecture.
  • I used cross entropy and its derivative for calculating the losses. A sketch of this forward pass is shown after the list.
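A minimal sketch of the forward pass, assuming NumPy, one-hot labels, and the stable_softmax helper from the earlier snippet (all names are mine, not from the original code):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def forward(X, weights, biases):
    # The input X acts as the output of the first layer.
    a = X
    for W, b in zip(weights[:-1], biases[:-1]):
        z = a @ W + b        # step 1: calculate the hypothesis
        a = sigmoid(z)       # step 2: pass it through the activation
    # Last layer: softmax over the class scores.
    return stable_softmax(a @ weights[-1] + biases[-1])

def cross_entropy(probs, y_onehot):
    # Average cross-entropy loss over the batch.
    return -np.mean(np.sum(y_onehot * np.log(probs + 1e-12), axis=1))
```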

Overfit and Underfit:

  • We say there is overfitting when the performance on the test set is much lower than the performance on the training set.
  • But in this case, overfitting doesn't seem to happen.
  • We always have the option of early stopping, in which we stop the iterations once the error stops reducing (a sketch follows this list).
  • Underfitting seems to happen for a few plots, in which the accuracy keeps decreasing.
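A small sketch of that early-stopping idea, assuming the training step and the monitored error are supplied as callables (both are placeholders, not from the original code):

```python
def train_with_early_stopping(step, current_error, max_iters=2500, patience=5):
    """step() runs one training iteration; current_error() returns the error
    being monitored. Stop once the error has not improved for `patience` checks."""
    best, bad_rounds = float("inf"), 0
    for _ in range(max_iters):
        step()
        err = current_error()
        if err < best:
            best, bad_rounds = err, 0
        else:
            bad_rounds += 1
            if bad_rounds >= patience:
                break   # the error has stopped reducing, so stop early
    return best
```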

Sigmoid (1 Hidden Layer): Train (2500 Iterations)

Sigmoid (1 Hidden Layer): Test (2500 Iterations)

Observations:

  • The accuracy was pretty impressive.
  • A larger number of iterations is used together with a reduced learning rate.
  • This is done to avoid NaN values.
  • A stable softmax is used to normalize the values.
  • I used sklearn preprocessing to normalize the values a little (see the sketch after this list).
  • The last layer is considered to be softmax.
  • Sigmoid is used as the activation function.
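A minimal sketch of that preprocessing step; the post only says sklearn preprocessing was used, so the choice of scale() (zero mean, unit variance per feature) is my assumption:

```python
import numpy as np
from sklearn import preprocessing

X = np.random.rand(100, 20)        # stand-in for the real feature matrix
X_scaled = preprocessing.scale(X)  # zero mean, unit variance per column
```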

Sigmoid (3 Hidden Layers): Train (1000 Iterations)

Sigmoid (3 Hidden Layers): Test (1000 Iterations)

Observations:

  • This seems to be a case of vanishing gradients.
  • The accuracy values stay roughly constant, around 43 and 51 respectively.
  • No actual learning happens in this case.
  • With three hidden layers, the last layer is still considered to be softmax. The sketch after this list shows why the sigmoid derivative makes the gradient vanish.
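A short illustration of why this happens: the sigmoid derivative is at most 0.25, so each extra sigmoid layer contributes a factor of at most 0.25 to the back-propagated gradient (this is a general property of sigmoid, not something specific to this code):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def sigmoid_derivative(z):
    s = sigmoid(z)
    return s * (1.0 - s)

# The derivative peaks at z = 0 with a value of 0.25, so three sigmoid layers
# can scale the gradient by up to 0.25**3 before it reaches the earliest weights.
print(sigmoid_derivative(0.0))   # 0.25
print(0.25 ** 3)                 # 0.015625
```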

ReLU (1 Hidden Layer): Train (4000 Iterations)

ReLU (1 Hidden Layer): Test (4000 Iterations)

Observations:

  • For 1 hidden layer, ReLU seems to perform about the same as the sigmoid network.
  • ReLU is used as the activation function.
  • In back propagation, we use the derivative of ReLU (see the sketch after this list).
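A minimal sketch of ReLU and the derivative used in back propagation, assuming NumPy (function names are mine):

```python
import numpy as np

def relu(z):
    return np.maximum(0.0, z)

def relu_derivative(z):
    # 1 where the pre-activation is positive, 0 elsewhere; this is the factor
    # multiplied into the gradient during back propagation.
    return (z > 0).astype(z.dtype)
```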

ReLU (3 Hidden Layers): Train (2500 Iterations)

ReLU (3 Hidden Layers): Test (2500 Iterations)

Observations:

  • 3 hidden layers again seem to be a case of vanishing gradients, as the accuracy is not above average on either training or testing.
  • Initially the accuracy is exactly 0, and from there it rises to a constant value.

1 (d): NN using sklearn

Classifier:

MLPClassifier(activation='logistic', solver='lbfgs', alpha=1e-5, hidden_layer_sizes=(100), random_state=44)

Train/test split: (test_size=0.33, random_state=42)
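A runnable sketch of this setup; the synthetic dataset is only a stand-in to keep the snippet self-contained, since the real features and labels are not shown in the post:

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.neural_network import MLPClassifier

# Synthetic stand-in for the real dataset, just to make the sketch runnable.
X, y = make_classification(n_samples=500, n_features=20, random_state=0)

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.33, random_state=42)

clf = MLPClassifier(activation='logistic', solver='lbfgs', alpha=1e-5,
                    hidden_layer_sizes=(100,), random_state=44)
clf.fit(X_train, y_train)
print("train:", clf.score(X_train, y_train), "test:", clf.score(X_test, y_test))
```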

Observations:

  • Using the lbfgs solver, the accuracy is pretty good, but sgd does not work well for 3 hidden layers.
  • The number of iterations needed is very small, around 3 to 5.

Question 2:

Observations:

  • The given images are RGB images, and we have to extract features from them so that we can feed them to the SVM.
  • I tried using PCA so that I could reduce the dimensionality.
  • The network used for feature extraction is a pre-trained model.
  • The last layer returns 1000 features, which we use for the training process (see the sketch after this list).
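A minimal sketch of this pipeline; the random arrays stand in for the 1000-dimensional outputs of the pre-trained network's last layer, and the PCA size and SVM kernel are my assumptions, not values from the post:

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.svm import SVC

# Stand-in for the 1000 features per image from the pre-trained network's
# last layer, plus binary labels for the two classes.
features = np.random.rand(300, 1000)
labels = np.random.randint(0, 2, size=300)

# Optional dimensionality reduction with PCA, as tried in the post.
reduced = PCA(n_components=50).fit_transform(features)

# Train the SVM on the reduced features.
svm = SVC(kernel='rbf').fit(reduced, labels)
```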

Train Accuracy:

For 300: 0.65 (65%); for 1000: 83%

Test Accuracy:

For 300: 0.65 (65%); for 1000: 83%

Confusion Matrix (300):

[[ 54,  95],
 [  4, 147]]
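For reference, a matrix like this can be produced with sklearn; the dummy labels below are only there to make the snippet self-contained:

```python
import numpy as np
from sklearn.metrics import confusion_matrix

# Dummy true labels and predictions, standing in for the SVM's test-set
# output; rows are true classes and columns are predicted classes.
y_true = np.random.randint(0, 2, size=300)
y_pred = np.random.randint(0, 2, size=300)
print(confusion_matrix(y_true, y_pred))
```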
