Question
1) Specify what cross-validation is used for and with the help of an example explain how it works.
2) For many machine learning problems, multiple valid hypotheses are possible. Specify the factors that generally determine which hypothesis is best out of the set of valid hypotheses.
3) Describe three methods that can be used to prevent decision trees from over-fitting the training data.
4) Explain the concept of ensemble learning, specify the two key properties that the learners need to have in order for ensemble learning to be effective, and explain why the two key properties are necessary.
5) Incremental learning uses three types of memory models to store knowledge and information observed in the training data. Describe two memory models along with their advantages and disadvantages and explain which is best suited for real world applications.
Explanation / Answer
1) Cross-Validation
Cross-validation is a model evaluation method that gives better results than residuals. The main drawback of residual evaluation is that it gives no indication of how well the learner will perform when asked to make predictions for data it has not already seen; it only measures how well the model fits the data it was trained on. The basic idea of holding some data out of training and using it to test the model gives rise to a whole class of evaluation methods called cross-validation.
Types:
The holdout method is the simplest kind of cross-validation. The data is separated into two sets: a training set and a testing set. The function approximator fits a model using the training set only, and is then asked to predict the output values for the data in the testing set, which it has never seen before. This method is generally preferable to the residual method and takes no longer to compute. However, its evaluation can have high variance: it depends heavily on which data points end up in the training set and which in the test set, so the result may differ significantly depending on how the division is made.
K-fold cross-validation is an improvement on the holdout method. The data set is divided into k subsets, and the holdout method is repeated k times. Each time, one of the k subsets is used as the test set and the remaining k-1 subsets are put together to form the training set; the errors from the k trials are then averaged. The main advantage is that it matters less how the data gets divided: every data point appears in a test set exactly once and in a training set k-1 times. The main drawback is that the training algorithm has to be rerun k times, which multiplies the computation cost by k.
Leave-one-out cross-validation is k-fold cross-validation with K equal to N, the number of data points. The learner is trained N separate times, each time on all of the data except one point, for which a prediction is then made. The evaluation given by the leave-one-out cross-validation error (LOO-XVE) is very good, but at first pass it seems expensive to compute. For some learners, however (locally weighted learners, for example), LOO predictions can be made as easily as regular predictions, so computing LOO-XVE takes no longer than computing the residual error.
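To make the procedure concrete, here is a minimal k-fold sketch in Python (assuming NumPy and scikit-learn are available; the LinearRegression model and the noisy-line data are toy placeholders invented for illustration). Each fold serves once as the test set while the remaining k-1 folds form the training set, and setting k = N reduces to leave-one-out:

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error

def k_fold_cv(model, X, y, k=5, seed=0):
    """Estimate test error by k-fold cross-validation."""
    rng = np.random.default_rng(seed)
    indices = rng.permutation(len(X))        # shuffle the data once
    folds = np.array_split(indices, k)       # k roughly equal subsets
    errors = []
    for i in range(k):
        test_idx = folds[i]                  # fold i is the test set
        train_idx = np.concatenate(folds[:i] + folds[i + 1:])  # other k-1 folds
        model.fit(X[train_idx], y[train_idx])
        preds = model.predict(X[test_idx])
        errors.append(mean_squared_error(y[test_idx], preds))
    return np.mean(errors)                   # average error over the k trials

# Toy data: y = 3x + noise (invented for the example)
X = np.linspace(0, 10, 100).reshape(-1, 1)
y = 3 * X.ravel() + np.random.default_rng(1).normal(0, 1, 100)

print(k_fold_cv(LinearRegression(), X, y, k=5))    # 5-fold CV error
print(k_fold_cv(LinearRegression(), X, y, k=100))  # k = N -> leave-one-out
```

Note how every data point lands in a test set exactly once and in a training set k-1 times, which is what makes the averaged error less sensitive to any single split.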
2) Factors that determine the best hypothesis:
Several factors affect which hypothesis is best. They are:
Sample Size
The larger the sample size, the higher the power. Sample size is typically under the experimenter's control, so increasing it is a direct way to increase power and obtain better results. However, collecting a large sample can be expensive.
Standard Deviation
Power is inversely related to the standard deviation: when the SD is large, power is small.
One- versus Two-Tailed Tests
A one-tailed test has higher power than a two-tailed test at the same significance level. For example, a one-tailed test at the 0.05 level has the same power as a two-tailed test at the 0.10 level.
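A short sketch illustrating all three factors, using the standard normal approximation to the power of a z-test (assuming SciPy is available; the effect size, SD, and sample sizes are invented numbers for illustration):

```python
from scipy.stats import norm

def z_test_power(effect, sd, n, alpha=0.05, two_tailed=True):
    """Approximate power of a z-test detecting a mean shift of `effect`."""
    crit = norm.ppf(1 - alpha / 2) if two_tailed else norm.ppf(1 - alpha)
    shift = effect * n**0.5 / sd         # effect size in standard-error units
    return 1 - norm.cdf(crit - shift)    # ignores the tiny opposite-tail term

# Larger sample size -> higher power
print(z_test_power(effect=0.5, sd=2.0, n=25))    # ~0.24
print(z_test_power(effect=0.5, sd=2.0, n=100))   # ~0.71

# Larger standard deviation -> lower power
print(z_test_power(effect=0.5, sd=3.0, n=100))   # ~0.38

# One-tailed at 0.05 has the same power as two-tailed at 0.10
print(z_test_power(0.5, 2.0, 25, alpha=0.05, two_tailed=False))  # ~0.35
print(z_test_power(0.5, 2.0, 25, alpha=0.10, two_tailed=True))   # ~0.35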
3) Overfitting is a problem for decision tree models. It happens when the learning algorithm continues to develop hypotheses that reduce training set error at the cost of an increased test set error. There are several approaches to avoiding this overfitting problem.
Pre-pruning stops growing the tree earlier, before it perfectly classifies the training set.
Post-pruning is the reverse of pre-pruning: it allows the tree to classify the training set perfectly, and then prunes the tree back afterwards. Both approaches are illustrated in the sketch below.
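A minimal sketch of both pruning styles using scikit-learn's DecisionTreeClassifier (assuming scikit-learn is available; the synthetic data set and the hyperparameter values such as max_depth=4 and ccp_alpha=0.01 are arbitrary illustrations, and in practice they would be tuned with the cross-validation described in part 1):

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=500, n_features=10, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# Unconstrained tree: grows until it fits the training set (nearly) perfectly
full = DecisionTreeClassifier(random_state=0).fit(X_train, y_train)

# Pre-pruning: stop growth early via depth and leaf-size limits
pre = DecisionTreeClassifier(max_depth=4, min_samples_leaf=10,
                             random_state=0).fit(X_train, y_train)

# Post-pruning: grow fully, then prune back with cost-complexity pruning
post = DecisionTreeClassifier(ccp_alpha=0.01,
                              random_state=0).fit(X_train, y_train)

for name, clf in [("full", full), ("pre-pruned", pre), ("post-pruned", post)]:
    print(f"{name:11s} train={clf.score(X_train, y_train):.2f} "
          f"test={clf.score(X_test, y_test):.2f}")
```

The unconstrained tree typically scores perfectly on the training set but worse on the test set than either pruned tree, which is exactly the overfitting behavior described above.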