Decision Tree vs Random Forest

William Scott
3 min read · Mar 1, 2019

This blog covers the results and observations from my work with decision trees and random forests.

Click here to check the code and the problem statement.

Part A:

The accuracies of both models are quite close, but the random forest is slightly more accurate than the decision tree. The decision tree is faster, since the random forest has to fit multiple trees.
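This comparison can be sketched as follows. This is not the assignment code: scikit-learn is assumed, and the dataset is a synthetic stand-in, so the exact numbers will differ.

```python
# Hedged sketch: comparing accuracy and fit time of a single decision tree
# vs. a random forest on synthetic stand-in data (not the assignment data).
import time

from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=2000, n_features=20, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

tree = DecisionTreeClassifier(random_state=0)
forest = RandomForestClassifier(n_estimators=100, random_state=0)

t0 = time.perf_counter()
tree.fit(X_train, y_train)
tree_time = time.perf_counter() - t0

t0 = time.perf_counter()
forest.fit(X_train, y_train)
forest_time = time.perf_counter() - t0

print(f"tree:   acc={tree.score(X_test, y_test):.3f}  fit={tree_time:.3f}s")
print(f"forest: acc={forest.score(X_test, y_test):.3f}  fit={forest_time:.3f}s")
```

With a single-threaded fit, the forest's fit time grows roughly with the number of trees, which is the speed gap described above.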

Part B:

Mode of Selection of values:

The values were selected just before the point where train accuracy keeps increasing while test accuracy starts decreasing. That divergence signals overfitting, so I chose values that don't let the model overfit.

DecisionTreeClassifier(min_samples_split=0.1, min_samples_leaf=19, max_depth=5)

max_depth: range(1, 50)

selected: 5

min_samples_leaf: range(1, 50)

selected: 19

min_samples_split: [0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7]

selected: 0.1
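The selection procedure above can be sketched like this for one hyperparameter (max_depth); the same loop applies to the others. The dataset here is a synthetic assumption, not the assignment data.

```python
# Hedged sketch of the selection procedure: sweep max_depth and look for the
# point where train accuracy keeps rising while test accuracy starts to fall
# (the overfitting signal). Synthetic stand-in data.
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=1000, n_features=20, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

scores = []
for depth in range(1, 50):
    clf = DecisionTreeClassifier(max_depth=depth, random_state=0)
    clf.fit(X_train, y_train)
    scores.append((depth, clf.score(X_train, y_train),
                   clf.score(X_test, y_test)))

# Print the first few rows of the train/test accuracy curve.
for depth, train_acc, test_acc in scores[:8]:
    print(f"depth={depth:2d}  train={train_acc:.3f}  test={test_acc:.3f}")
```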

RandomForestClassifier(n_estimators=8, min_samples_leaf=3, max_depth=10, random_state=None)

max_depth: range(1, 50)

selected: 10

n_estimators: range(1, 50)

selected: 8

min_samples_leaf: range(1, 50)

selected: 3

max_features: range(1, 20)

selected: None

min_samples_split: np.linspace(0.1, 0.9, 11)

selected: None (not necessary for this model)
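Putting the selected values together, fitting the tuned forest might look like the sketch below; the dataset is again a synthetic stand-in, so the reported accuracy is illustrative only.

```python
# Hedged sketch: the forest with the hyperparameter values selected above,
# fitted on synthetic stand-in data (not the assignment data).
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=1000, n_features=20, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

forest = RandomForestClassifier(n_estimators=8, min_samples_leaf=3,
                                max_depth=10, random_state=None)
forest.fit(X_train, y_train)
print(f"test accuracy: {forest.score(X_test, y_test):.3f}")
```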

Part C:

Decision Tree:

Using cross-validation on the dataset, I found little variance in the results compared to my original model.

Random Forest:

Using cross-validation on the dataset, I found little variance in the results compared to my original model.
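One minimal way to run such a cross-validation check is with cross_val_score; the dataset below is a synthetic assumption, and the hyperparameters are the ones selected earlier.

```python
# Hedged sketch of the cross-validation check for both models.
# Synthetic stand-in data; hyperparameters from the earlier selection.
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=1000, n_features=20, random_state=0)

tree_scores = cross_val_score(
    DecisionTreeClassifier(max_depth=5, min_samples_leaf=19, random_state=0),
    X, y, cv=5)
forest_scores = cross_val_score(
    RandomForestClassifier(n_estimators=8, max_depth=10, min_samples_leaf=3,
                           random_state=0),
    X, y, cv=5)

# A small std across folds is the "not much variance" observation.
print(f"tree:   mean={tree_scores.mean():.3f}  std={tree_scores.std():.3f}")
print(f"forest: mean={forest_scores.mean():.3f}  std={forest_scores.std():.3f}")
```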

Part D:

KFold is used to generate 5 folds; for each fold, the validation error is stored along with the model. I then compare the variance of the decision tree against that of the random forest. As the final results show, the variance of the decision tree is greater than that of the random forest.
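The fold-wise procedure can be sketched roughly as below; the dataset and hyperparameters here are illustrative assumptions, not the assignment's.

```python
# Hedged sketch of the fold-wise variance comparison: 5 KFold splits, each
# fold's validation error stored per model, then variances compared.
# Synthetic stand-in data and illustrative hyperparameters.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import KFold
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=1000, n_features=20, random_state=0)
kf = KFold(n_splits=5, shuffle=True, random_state=0)

errors = {"tree": [], "forest": []}
for train_idx, val_idx in kf.split(X):
    models = {"tree": DecisionTreeClassifier(random_state=0),
              "forest": RandomForestClassifier(n_estimators=8, random_state=0)}
    for name, model in models.items():
        model.fit(X[train_idx], y[train_idx])
        # Validation error for this fold = 1 - accuracy.
        errors[name].append(1 - model.score(X[val_idx], y[val_idx]))

for name, errs in errors.items():
    print(f"{name}: errors={np.round(errs, 3)}  variance={np.var(errs):.5f}")
```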

Part E:

I loaded the serialised model using joblib and used the same random state to reproduce the identical train/test split on the dataset. Predicting on that test set gives the same results as before.
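A minimal sketch of that save/reload round trip is below; the file path and dataset are assumptions, not the assignment's actual artifacts.

```python
# Hedged sketch of the joblib save/reload round trip. A fixed random_state in
# train_test_split lets the identical split be recreated later.
import os
import tempfile

import joblib
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=500, n_features=20, random_state=0)
# Same random_state -> same train/test split every run.
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=42)

clf = DecisionTreeClassifier(random_state=0).fit(X_train, y_train)
path = os.path.join(tempfile.gettempdir(), "model.joblib")
joblib.dump(clf, path)

# Later (or in another script): reload and predict on the same test split.
restored = joblib.load(path)
same = (restored.predict(X_test) == clf.predict(X_test)).all()
print("predictions match:", same)
```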
