Neural Network observations with change in parameters
Implementing neural networks and checking how the accuracy changes as the parameters change.
Find the code here.
Question 1
Self-Implemented NN:
Accuracy:
Parameter values used for sigmoid:
_lambda = 0.01
lr = 1
Parameter values used for ReLU:
_lambda = 1
lr = 0.01
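The report only lists the hyperparameter values, so the following is just a sketch of where lr and _lambda typically enter the training step, assuming plain gradient descent with L2 regularization (the function name is mine, not from the original code):

```python
import numpy as np

def update_layer(W, dW, lr, _lambda):
    # One gradient-descent step with L2 regularization:
    # dW is the gradient of the loss, _lambda penalizes large weights.
    return W - lr * (dW + _lambda * W)

# Example with the ReLU settings listed above:
W = np.ones((3, 2))
dW = 0.1 * np.ones((3, 2))
W = update_layer(W, dW, lr=0.01, _lambda=1)
```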
For ReLU, the most problematic issue is that the values tend to go to NaN, due to two main reasons:
- overflow in the exponential (exp(-z) blows up for very negative z)
- a 0/0 division in the sigmoid/softmax denominator
To handle this,
- I added a small normalization step in softmax by subtracting the maximum value from the whole array, and
- I divided the whole randomly generated weight matrix by the square root of its dimensions (a minimal sketch of both fixes is shown below).
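A minimal sketch of both fixes, assuming NumPy (the function names are mine):

```python
import numpy as np

def stable_softmax(Z):
    # Subtract the row-wise maximum before exponentiating, so np.exp
    # never overflows and the denominator never collapses to zero.
    Z_shifted = Z - np.max(Z, axis=1, keepdims=True)
    expZ = np.exp(Z_shifted)
    return expZ / np.sum(expZ, axis=1, keepdims=True)

def init_weights(n_in, n_out, seed=0):
    # Divide the random weights by sqrt(n_in) so the pre-activations
    # stay in a reasonable range as the network gets deeper.
    rng = np.random.default_rng(seed)
    return rng.standard_normal((n_in, n_out)) / np.sqrt(n_in)
```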
Architecture:
- The notation follows Andrew Ng's YouTube videos.
- The input is treated as the output of the first layer.
- At every subsequent layer, two steps have to be performed:
- calculating the hypothesis
- passing it through the activation function
- The output of the activation is taken as the output of that particular layer.
- The activation functions used are ReLU and sigmoid.
- Softmax is the last layer of the architecture.
- I used cross-entropy and its derivative for calculating the losses (a forward-pass sketch follows this list).
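A minimal sketch of this forward pass in NumPy, assuming the weights and biases are kept in lists (all names are mine, not the original code):

```python
import numpy as np

def sigmoid(Z):
    return 1.0 / (1.0 + np.exp(-Z))

def forward(X, weights, biases, activation=sigmoid):
    # The input X is treated as the output of the "first layer".
    A = X
    for W, b in zip(weights[:-1], biases[:-1]):
        Z = A @ W + b        # step 1: calculate the hypothesis
        A = activation(Z)    # step 2: pass it through the activation
    Z_out = A @ weights[-1] + biases[-1]
    Z_out = Z_out - np.max(Z_out, axis=1, keepdims=True)   # stable softmax output layer
    expZ = np.exp(Z_out)
    return expZ / np.sum(expZ, axis=1, keepdims=True)

def cross_entropy(Y_hat, Y_one_hot, eps=1e-12):
    # Average cross-entropy loss over the batch.
    return -np.mean(np.sum(Y_one_hot * np.log(Y_hat + eps), axis=1))
```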
Overfit and Underfit:
- We say that there is overfitting when the performance on the test set is much lower than the performance on the train set.
- In this case, however, overfitting does not seem to happen.
- We always have the option of early stopping, in which we stop iterating once the validation error stops decreasing (a sketch is shown after this list).
- Underfitting seems to happen for a few plots, in which the accuracy stays low or even keeps decreasing.
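A generic early-stopping loop, as a sketch only (the step and validation_error callbacks are hypothetical placeholders, not functions from the original code):

```python
def train_with_early_stopping(step, validation_error, max_iters=2500, patience=20):
    # `step` runs one training iteration; `validation_error` returns the
    # current error on held-out data. Stop once the error has not
    # improved for `patience` consecutive iterations.
    best, since_best = float("inf"), 0
    for _ in range(max_iters):
        step()
        err = validation_error()
        if err < best:
            best, since_best = err, 0
        else:
            since_best += 1
            if since_best >= patience:
                break
    return best
```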
Sigmoid (1 Hidden Layer)
Train — (2500 Iterations)
Sigmoid (1 Hidden Layer)
Test — (2500 Iterations)
Observations:
- The accuracy was pretty impressive.
- A larger number of iterations is chosen together with a reduced learning rate.
- This is done to avoid NaN values.
- Stable softmax is used to normalize the values.
- I used sklearn preprocessing to normalize the input values a little (a minimal sketch follows this list).
- The last layer is softmax.
- Sigmoid is used as the activation function.
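The report does not say which sklearn preprocessing routine was used, so the following is just one plausible choice (StandardScaler), shown on toy data:

```python
import numpy as np
from sklearn.preprocessing import StandardScaler

X_train = np.array([[1.0, 200.0], [2.0, 400.0], [3.0, 600.0]])  # toy data
scaler = StandardScaler()
X_train_scaled = scaler.fit_transform(X_train)  # zero mean, unit variance per column
# At test time the same statistics are reused:
# X_test_scaled = scaler.transform(X_test)
```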
Sigmoid (3 Hidden Layer)
Train — (1000 Iterations)
Sigmoid (3 Hidden Layer)
Test — (1000 Iterations)
Observations:
- This seems to be a case of vanishing gradients (a short illustration of why follows this list).
- The accuracy stays roughly constant, changing only from about 43 to 51.
- No real learning happens in this case.
- In the three-hidden-layer network as well, the last layer is softmax.
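To see why a deeper sigmoid network can stop learning, here is a small numerical check (not part of the original code): the sigmoid derivative never exceeds 0.25, so the gradient is scaled down by roughly that factor at every extra sigmoid layer.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

z = np.linspace(-5, 5, 1001)
d = sigmoid(z) * (1.0 - sigmoid(z))   # derivative of the sigmoid
print(d.max())        # about 0.25, its largest possible value
print(0.25 ** 3)      # rough scale of the factor picked up across 3 hidden layers
```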
Relu (1 Hidden Layer)
Train — (4000 Iterations)
Relu (1 Hidden Layer)
Test — (4000 Iterations)
Observations:
- ReLU seems to perform about the same as the sigmoid network for 1 hidden layer.
- ReLU is used as the activation function.
- In backpropagation, we use the derivative of ReLU (a minimal sketch follows this list).
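A minimal version of the ReLU forward pass and the derivative used in backpropagation could look like this (function names are mine):

```python
import numpy as np

def relu(Z):
    # Forward pass: element-wise max(0, z).
    return np.maximum(0.0, Z)

def relu_derivative(Z):
    # Backward pass: the gradient is 1 where Z > 0 and 0 elsewhere.
    return (Z > 0).astype(float)
```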
Relu (3 Hidden Layer)
Train — (2500 Iterations)
Relu (3 Hidden Layer)
Test — (2500 Iterations)
Observations:
- With 3 hidden layers this again seems to be a case of vanishing gradients, as the accuracy is not above average on either training or testing.
- Initially the value is essentially 0, and from there it rises to a constant value.
1 (d): NN using sklearn
Classifier:
MLPClassifier(activation='logistic', solver='lbfgs', alpha=1e-5, hidden_layer_sizes=(100), random_state=44)
Train/test split: (test_size=0.33, random_state=42)
Observations:
- Using the lbfgs solver the accuracy is pretty good, but sgd does not work well for 3 hidden layers.
- The number of iterations needed is very small, around 3 to 5 (a usage sketch with these settings follows this list).
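A usage sketch with the settings reported above; the digits dataset is only a stand-in, since the report does not show the data loading, and hidden_layer_sizes is written in tuple form here:

```python
from sklearn.datasets import load_digits
from sklearn.model_selection import train_test_split
from sklearn.neural_network import MLPClassifier

X, y = load_digits(return_X_y=True)   # stand-in dataset
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.33, random_state=42)

clf = MLPClassifier(activation='logistic', solver='lbfgs', alpha=1e-5,
                    hidden_layer_sizes=(100,), random_state=44)
clf.fit(X_train, y_train)
print(clf.score(X_train, y_train), clf.score(X_test, y_test))
```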
Question 2:
Observations:
- The given images are RGB images, and we have to extract features from them so that we can feed them to the SVM.
- I tried using PCA so that I can reduce the dimensionality.
- A pre-trained model is used for feature extraction.
- Its last layer returns 1000 features, which we use for the training process (a sketch of one possible pipeline follows this list).
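The report does not name the pre-trained network, only that its last layer returns 1000 features, which matches an ImageNet classifier. The following is therefore only a sketch of one plausible pipeline; ResNet-18 from torchvision, the preprocessing constants, extract_features, and the PCA/SVC calls at the end are all my assumptions, not the original code.

```python
import numpy as np
import torch
from PIL import Image
from torchvision import models, transforms
from sklearn.decomposition import PCA
from sklearn.svm import SVC

# Any ImageNet classifier with a 1000-way output layer fits the description.
# (Older torchvision versions use pretrained=True instead of weights=.)
model = models.resnet18(weights="IMAGENET1K_V1")
model.eval()
preprocess = transforms.Compose([
    transforms.Resize(256),
    transforms.CenterCrop(224),
    transforms.ToTensor(),
    transforms.Normalize(mean=[0.485, 0.456, 0.406],
                         std=[0.229, 0.224, 0.225]),
])

def extract_features(image_paths):
    # The final 1000-way layer output is used directly as a
    # 1000-dimensional feature vector per image.
    feats = []
    with torch.no_grad():
        for p in image_paths:
            x = preprocess(Image.open(p).convert("RGB")).unsqueeze(0)
            feats.append(model(x).squeeze(0).numpy())
    return np.stack(feats)

# X = extract_features(train_paths); y = train_labels
# X = PCA(n_components=50).fit_transform(X)   # optional dimensionality reduction
# svm = SVC().fit(X, y)
```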
Train Accuracy:
For 300 images: 65%; for 1000 images: 83%
Test Accuracy:
For 300 images: 65%; for 1000 images: 83%
Confusion Matrix (300):
[[ 54, 95],
[ 4, 147]]