Neural network observations with changes in parameters

William Scott
4 min read · Mar 1, 2019

Implementing neural networks and checking how the accuracy changes as the parameters change.

Find the code here.

Question 1

Self-Implemented NN:

Accuracy:

Parameter values used for sigmoid:

  • _lambda = 0.01
  • lr = 1

Parameter values used for ReLU:

  • _lambda = 1
  • lr = 0.01

For ReLU, the most problematic issue is that the values tend to go to NaN, due to two main reasons:

  • -1 in the exponential value
  • zero divided by zero in the sigmoid

To handle this,

  • I used a little normalization in softmax by subtracting the max value from the whole array (a stable softmax), and
  • I divided the whole randomly generated weight matrix by the square root of its dimensions (a sketch of both fixes follows this list).
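A minimal sketch of these two fixes, assuming NumPy and row-wise inputs (the function names are mine, not from the original code):

```python
import numpy as np

def stable_softmax(z):
    # Subtract the row-wise max before exponentiating so exp() cannot overflow;
    # the shift does not change the softmax output.
    shifted = z - np.max(z, axis=1, keepdims=True)
    exp_z = np.exp(shifted)
    return exp_z / np.sum(exp_z, axis=1, keepdims=True)

def init_weights(n_in, n_out, seed=0):
    # Divide the random weights by the square root of the input dimension
    # so the initial activations stay in a reasonable range.
    rng = np.random.RandomState(seed)
    return rng.randn(n_in, n_out) / np.sqrt(n_in)
```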

Architecture:

  • The notation follows Andrew Ng's YouTube videos.
  • The input is considered the output of the first layer.
  • Then, at every layer, two steps have to be performed:
  • calculating the hypothesis, and
  • passing it through the activation function.
  • The output of the activation is considered the output of that particular layer.
  • The activation functions used are ReLU and sigmoid.
  • Softmax is the last layer of the architecture.
  • I used cross entropy and its derivative for calculating the losses. A sketch of this forward pass is shown after the list.
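A minimal sketch of the forward pass, assuming NumPy, one-hot labels, and the stable_softmax helper from the earlier snippet (all names are mine, not from the original code):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def forward(X, weights, biases):
    # The input X acts as the output of the first layer.
    a = X
    for W, b in zip(weights[:-1], biases[:-1]):
        z = a @ W + b        # step 1: calculate the hypothesis
        a = sigmoid(z)       # step 2: pass it through the activation
    # Last layer: softmax over the class scores.
    return stable_softmax(a @ weights[-1] + biases[-1])

def cross_entropy(probs, y_onehot):
    # Average cross-entropy loss over the batch.
    return -np.mean(np.sum(y_onehot * np.log(probs + 1e-12), axis=1))
```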

Overfit and Underfit:

  • We say there is overfitting when the performance on the test set is much lower than the performance on the training set.
  • But in this case, overfitting doesn't seem to happen.
  • We always have the option of early stopping, in which we stop the iterations once the error stops reducing (a sketch follows this list).
  • Underfitting seems to happen for a few plots, in which the accuracy keeps decreasing.
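A small sketch of that early-stopping idea, assuming the training step and the monitored error are supplied as callables (both are placeholders, not from the original code):

```python
def train_with_early_stopping(step, current_error, max_iters=2500, patience=5):
    """step() runs one training iteration; current_error() returns the error
    being monitored. Stop once the error has not improved for `patience` checks."""
    best, bad_rounds = float("inf"), 0
    for _ in range(max_iters):
        step()
        err = current_error()
        if err < best:
            best, bad_rounds = err, 0
        else:
            bad_rounds += 1
            if bad_rounds >= patience:
                break   # the error has stopped reducing, so stop early
    return best
```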

Sigmoid (1 Hidden Layer): Train (2500 Iterations)

Sigmoid (1 Hidden Layer): Test (2500 Iterations)

Observations:

  • The accuracy was pretty impressive.
  • A larger number of iterations is used together with a reduced learning rate.
  • This is done to avoid NaN values.
  • A stable softmax is used to normalize the values.
  • I used sklearn preprocessing to normalize the values a little (see the sketch after this list).
  • The last layer is considered to be softmax.
  • Sigmoid is used as the activation function.
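A minimal sketch of that preprocessing step; the post only says sklearn preprocessing was used, so the choice of scale() (zero mean, unit variance per feature) is my assumption:

```python
import numpy as np
from sklearn import preprocessing

X = np.random.rand(100, 20)        # stand-in for the real feature matrix
X_scaled = preprocessing.scale(X)  # zero mean, unit variance per column
```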

Sigmoid (3 Hidden Layers): Train (1000 Iterations)

Sigmoid (3 Hidden Layers): Test (1000 Iterations)

Observations:

  • This seems to be a case of vanishing gradients.
  • The accuracy values stay roughly constant, around 43 and 51 respectively.
  • No actual learning happens in this case.
  • With three hidden layers, the last layer is still considered to be softmax. The sketch after this list shows why the sigmoid derivative makes the gradient vanish.
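A short illustration of why this happens: the sigmoid derivative is at most 0.25, so each extra sigmoid layer contributes a factor of at most 0.25 to the back-propagated gradient (this is a general property of sigmoid, not something specific to this code):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def sigmoid_derivative(z):
    s = sigmoid(z)
    return s * (1.0 - s)

# The derivative peaks at z = 0 with a value of 0.25, so three sigmoid layers
# can scale the gradient by up to 0.25**3 before it reaches the earliest weights.
print(sigmoid_derivative(0.0))   # 0.25
print(0.25 ** 3)                 # 0.015625
```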

ReLU (1 Hidden Layer): Train (4000 Iterations)

ReLU (1 Hidden Layer): Test (4000 Iterations)

Observations:

  • For 1 hidden layer, ReLU seems to perform about the same as the sigmoid network.
  • ReLU is used as the activation function.
  • In back propagation, we use the derivative of ReLU (see the sketch after this list).
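A minimal sketch of ReLU and the derivative used in back propagation, assuming NumPy (function names are mine):

```python
import numpy as np

def relu(z):
    return np.maximum(0.0, z)

def relu_derivative(z):
    # 1 where the pre-activation is positive, 0 elsewhere; this is the factor
    # multiplied into the gradient during back propagation.
    return (z > 0).astype(z.dtype)
```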

ReLU (3 Hidden Layers): Train (2500 Iterations)

ReLU (3 Hidden Layers): Test (2500 Iterations)

Observations:

  • 3 hidden layers again seem to be a case of vanishing gradients, as the accuracy is not above average on either training or testing.
  • Initially the accuracy is exactly 0, and from there it rises to a constant value.

1 (d): NN using sklearn

Classifier:

MLPClassifier(activation='logistic', solver='lbfgs', alpha=1e-5, hidden_layer_sizes=(100), random_state=44)

Train/test split: (test_size=0.33, random_state=42)
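A runnable sketch of this setup; the synthetic dataset is only a stand-in to keep the snippet self-contained, since the real features and labels are not shown in the post:

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.neural_network import MLPClassifier

# Synthetic stand-in for the real dataset, just to make the sketch runnable.
X, y = make_classification(n_samples=500, n_features=20, random_state=0)

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.33, random_state=42)

clf = MLPClassifier(activation='logistic', solver='lbfgs', alpha=1e-5,
                    hidden_layer_sizes=(100,), random_state=44)
clf.fit(X_train, y_train)
print("train:", clf.score(X_train, y_train), "test:", clf.score(X_test, y_test))
```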

Observations:

  • Using the lbfgs solver, the accuracy is pretty good, but sgd does not work well for 3 hidden layers.
  • The number of iterations needed is very small, around 3 to 5.

Question 2:

Observations:

  • The given images are RGB images, and we have to extract features from them so that we can feed them to the SVM.
  • I tried using PCA so that I could reduce the dimensionality.
  • The network used for feature extraction is a pre-trained model.
  • The last layer returns 1000 features, which we use for the training process (see the sketch after this list).
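A minimal sketch of this pipeline; the random arrays stand in for the 1000-dimensional outputs of the pre-trained network's last layer, and the PCA size and SVM kernel are my assumptions, not values from the post:

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.svm import SVC

# Stand-in for the 1000 features per image from the pre-trained network's
# last layer, plus binary labels for the two classes.
features = np.random.rand(300, 1000)
labels = np.random.randint(0, 2, size=300)

# Optional dimensionality reduction with PCA, as tried in the post.
reduced = PCA(n_components=50).fit_transform(features)

# Train the SVM on the reduced features.
svm = SVC(kernel='rbf').fit(reduced, labels)
```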

Train Accuracy:

For 300: 0.65 (65%); for 1000: 83%

Test Accuracy:

For 300: 0.65 (65%); for 1000: 83%

Confusion Matrix (300):

[[ 54,  95],
 [  4, 147]]
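For reference, a matrix like this can be produced with sklearn; the dummy labels below are only there to make the snippet self-contained:

```python
import numpy as np
from sklearn.metrics import confusion_matrix

# Dummy true labels and predictions, standing in for the SVM's test-set
# output; rows are true classes and columns are predicted classes.
y_true = np.random.randint(0, 2, size=300)
y_pred = np.random.randint(0, 2, size=300)
print(confusion_matrix(y_true, y_pred))
```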
