Question
Random forests are an ensemble learning strategy for training a team of decision tree learners. This is a supervised learning technique that incorporates bagging and the random sub-space method to create a diverse ensemble (or team) of independent models. As we saw, these models work together to solve a given prediction problem. Why are decision trees a good choice for bagging? Why is bagging an ineffective approach to pair with K-nearest neighbours? That is, why would a bagged K-nearest neighbours model provide little benefit compared to an unbagged one? How should you decide the number of sub-models to use in a bagged model? What benefit does incorporating the random sub-space method have in random forests? That is, what gains do random forests have over bagged decision trees? In your opinion, can bagging ever degrade performance compared to not bagging? Explain.
Explanation / Answer
(a) The bagging method creates a sequence of classifiers $H_m$, $m = 1, \ldots, M$, with respect to modifications (bootstrap resamples) of the training set. These classifiers are then combined into a compound classifier. Decision trees are a good choice for bagging because un-pruned trees are low-bias but high-variance learners: small changes in the training set can change the resulting tree considerably, so averaging many trees grown on different bootstrap samples reduces that variance.
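As a minimal sketch of this idea, the snippet below bags un-pruned decision trees and compares them with a single tree by cross-validation. The use of scikit-learn, the breast-cancer dataset, and the parameter values (50 estimators, 5-fold CV) are illustrative assumptions, not part of the original answer.

```python
# Sketch: bagging a high-variance learner (an un-pruned decision tree).
# Library, dataset, and parameter values are illustrative assumptions.
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import BaggingClassifier
from sklearn.model_selection import cross_val_score
from sklearn.tree import DecisionTreeClassifier

X, y = load_breast_cancer(return_X_y=True)

# A single un-pruned tree: low bias, but high variance.
single_tree = DecisionTreeClassifier(random_state=0)

# Bagging: M trees, each trained on a bootstrap resample, combined by majority vote.
bagged_trees = BaggingClassifier(DecisionTreeClassifier(),
                                 n_estimators=50,
                                 bootstrap=True,
                                 random_state=0)

print("single tree :", cross_val_score(single_tree, X, y, cv=5).mean())
print("bagged trees:", cross_val_score(bagged_trees, X, y, cv=5).mean())
```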
(b)
The k-nearest neighbors algorithm (or k-NN for short) is a non-parametric method used for classification and regression. In both cases, the input consists of the k closest training examples in the feature space. The output depends on whether k-NN is used for classification or regression: for classification it is a class membership, decided by a majority vote of the neighbors, while for regression it is the average of the neighbors' values.
k-NN is a type of instance-based learning, or lazy learning, where the function is only approximated locally and all computation is deferred until classification. The k-NN algorithm is among the simplest of all machine learning algorithms.
Both for classification and regression, it can be useful to assign weights to the contributions of the neighbors, so that the nearer neighbors contribute more to the average than the more distant ones. For example, a common weighting scheme consists of giving each neighbor a weight of 1/d, where d is the distance to the neighbor.
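The 1/d scheme above maps directly onto a distance-weighted classifier. The sketch below assumes scikit-learn, whose weights="distance" option weights each neighbor by the inverse of its distance; the iris dataset and k = 5 are illustrative choices.

```python
# Sketch of the 1/d weighting scheme described above (assumes scikit-learn).
from sklearn.datasets import load_iris
from sklearn.neighbors import KNeighborsClassifier

X, y = load_iris(return_X_y=True)

# Unweighted vote vs. neighbors weighted by 1/distance.
uniform_knn = KNeighborsClassifier(n_neighbors=5, weights="uniform").fit(X, y)
weighted_knn = KNeighborsClassifier(n_neighbors=5, weights="distance").fit(X, y)

print("uniform vote :", uniform_knn.predict(X[:1]))
print("1/d weighting:", weighted_knn.predict(X[:1]))
```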
The neighbors are taken from a set of objects for which the class (for k-NN classification) or the object property value (for k-NN regression) is known. This can be thought of as the training set for the algorithm, though no explicit training step is required.
A shortcoming of the k-NN algorithm is that it is sensitive to the local structure of the data. It should not be confused with k-means, a different machine learning technique. As for why bagging helps little here: k-NN is a stable, low-variance learner, so predictors trained on different bootstrap resamples produce very similar predictions, and averaging them gives almost the same result as a single unbagged k-NN.
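To make the stability point concrete, the hedged sketch below compares a plain k-NN classifier with a bagged one; because k-NN changes little under bootstrap resampling, the two cross-validated scores typically come out nearly identical. Scikit-learn, the dataset, and the parameter values are assumptions for the example.

```python
# Sketch: bagging a stable, low-variance learner (k-NN) typically helps very little.
# Library, dataset, and parameter values are illustrative assumptions.
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import BaggingClassifier
from sklearn.model_selection import cross_val_score
from sklearn.neighbors import KNeighborsClassifier

X, y = load_breast_cancer(return_X_y=True)

plain_knn = KNeighborsClassifier(n_neighbors=5)
bagged_knn = BaggingClassifier(KNeighborsClassifier(n_neighbors=5),
                               n_estimators=50, random_state=0)

print("plain k-NN :", cross_val_score(plain_knn, X, y, cv=5).mean())
print("bagged k-NN:", cross_val_score(bagged_knn, X, y, cv=5).mean())
```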
(c) In general, the more trees you use, the better the results get. However, the improvement decreases as the number of trees grows: at a certain point, the benefit in prediction performance from learning more trees will be lower than the cost in computation time of learning those additional trees.
Random forests are ensemble methods, and you average over many trees. Similarly, if you want to estimate the average of a real-valued random variable (e.g. the average height of a citizen in your country), you can take a sample. The standard error of the estimate decreases with the square root of the sample size, and at a certain point the cost of collecting a larger sample will be higher than the benefit in accuracy obtained from that larger sample.
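A quick numerical check of this analogy, under purely illustrative assumptions (a synthetic "population" of heights and a handful of sample sizes): the spread of the sample mean shrinks roughly like 1/sqrt(n).

```python
# Sketch: the standard error of a sample mean shrinks roughly like 1/sqrt(n).
# The synthetic population and sample sizes are illustrative assumptions.
import numpy as np

rng = np.random.default_rng(0)
population = rng.normal(loc=170.0, scale=10.0, size=1_000_000)  # "heights" in cm

for n in (10, 100, 1_000, 10_000):
    sample_means = [rng.choice(population, size=n).mean() for _ in range(200)]
    print(f"n = {n:>6}: std of sample mean ~ {np.std(sample_means):.3f}")
```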
If, in a single experiment on a single test set, a forest of 10 trees performs better than a forest of 500 trees, this may simply be statistical variance. If it happened systematically, I would suspect that something is wrong with the implementation.
Typical values for the number of trees are 10, 30, or 100. In only very few practical cases does the benefit of more than 300 trees outweigh the cost of learning them (except maybe if you have a really huge dataset).
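The diminishing return can be seen directly by sweeping the number of trees. The sketch below assumes scikit-learn; the dataset and tree counts are illustrative rather than prescriptive.

```python
# Sketch: accuracy gains flatten out as the number of trees grows.
# Library, dataset, and tree counts are illustrative assumptions.
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score

X, y = load_breast_cancer(return_X_y=True)

for n_trees in (10, 30, 100, 300):
    forest = RandomForestClassifier(n_estimators=n_trees, random_state=0)
    score = cross_val_score(forest, X, y, cv=5).mean()
    print(f"{n_trees:>3} trees: accuracy ~ {score:.3f}")
```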
(d) To see what the random sub-space method adds on top of bagging, it helps to recall the pieces involved:
Random forest = a learning ensemble consisting of bagged, un-pruned decision tree learners with a randomized selection of features at each split.
Decision trees are one of the most popular learning methods, commonly used for data exploration. One type of decision tree is CART, the classification and regression tree (Breiman 1983): a greedy, top-down, binary, recursive partitioning that divides the feature space into sets of disjoint rectangular regions. The regions should be pure with respect to the response variable, and a simple model is fit in each region: a majority vote for classification, a constant value for regression.
In a random forest, the first randomization comes through bagging. Bagging = bootstrap aggregation; a bootstrap sample creates a new training set by randomly sampling the given one N times with replacement. The second randomization is the random sub-space method: at each split, only a random subset of the features is considered. This decorrelates the individual trees, so averaging them reduces the variance of the ensemble more than bagging alone does, and it prevents a few strong predictors from dominating every tree. That is the main gain random forests have over bagged decision trees.
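As an illustrative sketch (assuming scikit-learn), the only thing that separates plain bagged trees from a random forest below is max_features: considering every feature at each split gives bagged trees, while a random subset per split adds the random sub-space step. The dataset and parameter values are assumptions for the example.

```python
# Sketch: bagged trees vs. a random forest differ only in max_features here.
# Library, dataset, and parameter values are illustrative assumptions.
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score

X, y = load_breast_cancer(return_X_y=True)

# All features considered at every split -> effectively bagged decision trees.
bagged_trees = RandomForestClassifier(n_estimators=100, max_features=None,
                                      random_state=0)
# A random subset of features at each split -> the random sub-space method.
random_forest = RandomForestClassifier(n_estimators=100, max_features="sqrt",
                                       random_state=0)

print("bagged trees :", cross_val_score(bagged_trees, X, y, cv=5).mean())
print("random forest:", cross_val_score(random_forest, X, y, cv=5).mean())
```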
(e) Bagging was applied to classification trees using the following data sets: waveform (simulated), heart, breast cancer (Wisconsin), ionosphere, diabetes, glass, and soybean. All of these except the heart data are in the UCI repository.
Testing was done using random divisions of each data set into a learning set and a test set, constructing the usual tree classifier on the learning set, and bagging this tree using 50 bootstrap replicates. This was repeated 100 times for each data set. The average test-set misclassification rate using a single tree is denoted by $e_S$ and the bagging rate by $e_B$.
How bagging works: let each $(y, x)$ case in $\mathcal{L}$ be independently drawn from the probability distribution $P$. Suppose $y$ is numerical and $\varphi(x, \mathcal{L})$ is the predictor. Then the aggregated predictor is
$$\varphi_A(x, P) = E_{\mathcal{L}}\,\varphi(x, \mathcal{L}).$$
Take $Y, X$ to be random variables having the distribution $P$ and independent of $\mathcal{L}$. The average prediction error of $\varphi(x, \mathcal{L})$ is
$$e = E_{\mathcal{L}} E_{Y,X}\bigl(Y - \varphi(X, \mathcal{L})\bigr)^2.$$
Define the error in the aggregated predictor $\varphi_A$ to be
$$e_A = E_{Y,X}\bigl(Y - \varphi_A(X, P)\bigr)^2.$$
Using the inequality $(EZ)^2 \le EZ^2$ gives
$$e = E Y^2 - 2\,E[Y \varphi_A] + E_{Y,X} E_{\mathcal{L}}\,\varphi^2(X, \mathcal{L}) \ge E\bigl(Y - \varphi_A\bigr)^2 = e_A.$$
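The inequality $e \ge e_A$ can also be checked numerically. The sketch below is not Breiman's experiment; it assumes scikit-learn and a synthetic regression problem, draws many independent learning sets, and compares the average single-tree error with the error of the averaged (aggregated) predictor.

```python
# Sketch: numerically checking e >= e_A on a synthetic regression problem.
# This setup is an illustrative assumption, not the experiment from the answer.
import numpy as np
from sklearn.tree import DecisionTreeRegressor

rng = np.random.default_rng(0)

def make_data(n=200):
    """Draw an i.i.d. learning set from the distribution P (sine curve + noise)."""
    X = rng.uniform(-3.0, 3.0, size=(n, 1))
    y = np.sin(X[:, 0]) + rng.normal(scale=0.3, size=n)
    return X, y

X_test, y_test = make_data(2000)          # stands in for (Y, X) ~ P

M = 100
predictions, errors = [], []
for _ in range(M):
    X_train, y_train = make_data()        # a fresh learning set L
    pred = DecisionTreeRegressor().fit(X_train, y_train).predict(X_test)
    predictions.append(pred)
    errors.append(np.mean((y_test - pred) ** 2))   # error of phi(x, L)

e = np.mean(errors)                                          # average single-predictor error
e_A = np.mean((y_test - np.mean(predictions, axis=0)) ** 2)  # aggregated predictor error

print(f"e   ~ {e:.3f}")
print(f"e_A ~ {e_A:.3f}  (e_A <= e, as the inequality predicts)")
```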