Question
Analyze datasets using tools like Weka.
Summary
Using two public-domain datasets, compare the performance of four or more classifiers (any classifiers are acceptable) and carry out a full analysis of each dataset. The purpose of this task is not merely to present the program's results but to determine the nature of the dataset. Each student should present as much data analysis as possible.
Datasets
Select two datasets from the UCI Machine Learning Repository (http://archive.ics.uci.edu/ml/)
The numbers of attributes and instances should not be too small
At least one dataset should be multivariate
Tools
Various tools are available:
Weka (the data must first be converted to a format Weka reads, such as ARFF)
Other data analysis or machine learning tools
Or your own data analysis code
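Weka's native input format is ARFF. In the runs below the attribute names are listed as 5.1, 3.5, 1.4, 0.2 and Iris-setosa, which are the values of the first record of the UCI iris data, and only 149 of the 150 iris instances are reported; that pattern suggests a headerless file was loaded as if its first line were a header. A minimal sketch of a conversion that supplies the header explicitly is shown here. The attribute names (sepallength and so on) and the function name `csv_to_arff` are assumptions chosen for illustration; the UCI file itself has no header row.

```python
import csv
import io

# Hypothetical attribute names; the UCI iris data file has no header row.
ATTRIBUTES = ["sepallength", "sepalwidth", "petallength", "petalwidth"]
CLASS_NAME = "class"


def csv_to_arff(csv_text, relation="iris"):
    """Convert headerless iris-style CSV text into an ARFF string.

    Every non-empty line is treated as data, so the first record is
    not swallowed as a header.
    """
    rows = [r for r in csv.reader(io.StringIO(csv_text)) if r]
    # Nominal class values, in order of first appearance.
    classes = list(dict.fromkeys(r[-1] for r in rows))
    lines = [f"@relation {relation}", ""]
    for name in ATTRIBUTES:
        lines.append(f"@attribute {name} numeric")
    lines.append(f"@attribute {CLASS_NAME} {{{','.join(classes)}}}")
    lines += ["", "@data"]
    lines += [",".join(r) for r in rows]
    return "\n".join(lines) + "\n"


sample = "5.1,3.5,1.4,0.2,Iris-setosa\n7.0,3.2,4.7,1.4,Iris-versicolor\n"
print(csv_to_arff(sample))
```

Declaring the header in the ARFF file keeps every record as data, so the first setosa instance is no longer lost.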
Analysis and evaluation
A comparative analysis of the results of at least four different classifiers
Use the analysis results to characterize the dataset itself.
The final evaluation should use cross-validation.
The evaluation should be analyzed from an overfitting perspective.
Use the results of ZeroR and OneR as baselines.
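A minimal sketch of the evaluation protocol described above: stratified k-fold cross-validation with ZeroR (always predict the majority class of the training folds) as the baseline. It uses a simple round-robin assignment of each class's instances to folds; Weka's own stratification and randomization differ in detail, so fold contents will not match Weka's. The function names are hypothetical.

```python
from collections import Counter, defaultdict


def stratified_folds(labels, k):
    """Assign instance indices to k folds, spreading each class evenly."""
    by_class = defaultdict(list)
    for i, y in enumerate(labels):
        by_class[y].append(i)
    folds = [[] for _ in range(k)]
    for indices in by_class.values():
        # Round-robin: the j-th member of a class goes to fold j mod k.
        for j, i in enumerate(indices):
            folds[j % k].append(i)
    return folds


def zero_r(train_labels):
    """ZeroR: return the most frequent class in the training labels."""
    return Counter(train_labels).most_common(1)[0][0]


def cross_validate_zero_r(labels, k=10):
    """Pooled accuracy of ZeroR under stratified k-fold cross-validation."""
    correct = 0
    for test_idx in stratified_folds(labels, k):
        test_set = set(test_idx)
        train = [y for i, y in enumerate(labels) if i not in test_set]
        prediction = zero_r(train)
        correct += sum(labels[i] == prediction for i in test_idx)
    # Hits are pooled over all folds, as in Weka's summary.
    return correct / len(labels)


# A balanced three-class label vector shaped like iris (50 per class).
labels = ["setosa"] * 50 + ["versicolor"] * 50 + ["virginica"] * 50
print(cross_validate_zero_r(labels))  # about 1/3
```

On a perfectly balanced three-class dataset ZeroR scores about 33 %, in line with the 32.72 % reported in the ZeroR run; a useful classifier has to clear this baseline by a clear margin.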
Report
One-page experiment summary
Description of the data
Why the datasets were chosen
Experimental design and method, with specific details of the procedure
Results of the comparative analysis (an in-depth comparison of four or more classifiers on the two datasets)
Conclusion
Explanation / Answer
ZeroR Classifier
=== Run information ===
Scheme: weka.classifiers.rules.ZeroR
Relation: iris.arff-weka.filters.AllFilter-weka.filters.MultiFilter-Fweka.filters.AllFilter-weka.filters.AllFilter-weka.filters.MultiFilter-Fweka.filters.AllFilter-weka.filters.AllFilter-weka.filters.MultiFilter-Fweka.filters.AllFilter-weka.filters.supervised.attribute.AddClassification-Wweka.classifiers.rules.ZeroR-weka.filters.supervised.attribute.MergeNominalValues-L0.05-Rfirst-last-weka.filters.supervised.instance.ClassBalancer-num-intervals10-weka.filters.supervised.attribute.AddClassification-Wweka.classifiers.rules.ZeroR
Instances: 149
Attributes: 5
              5.1
              3.5
              1.4
              0.2
              Iris-setosa
Test mode: 10-fold cross-validation
=== Classifier model (full training set) ===
ZeroR predicts class value: Iris-versicolor
Time taken to build model: 0 seconds
=== Stratified cross-validation ===
=== Summary ===
Correctly Classified Instances 48.7544 32.7211 %
Incorrectly Classified Instances 100.2456 67.2789 %
Kappa statistic -0.0092
Mean absolute error 0.4445
Root mean squared error 0.4714
Relative absolute error 100 %
Root relative squared error 100 %
Total Number of Instances 149
=== Detailed Accuracy By Class ===
TP Rate FP Rate Precision Recall F-Measure MCC ROC Area PRC Area Class
0.082 0.100 0.290 0.082 0.127 -0.030 0.491 0.327 Iris-setosa
0.900 0.909 0.331 0.900 0.484 -0.015 0.495 0.331 Iris-versicolor
0.000 0.000 0.000 0.000 0.000 0.000 0.495 0.331 Iris-virginica
Weighted Avg. 0.327 0.336 0.207 0.327 0.204 -0.015 0.494 0.330
=== Confusion Matrix ===
a b c <-- classified as
4.05 45.61 0 | a = Iris-setosa
4.97 44.7 0 | b = Iris-versicolor
4.97 44.7 0 | c = Iris-virginica
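The fractional counts in this ZeroR run (48.7544 correctly classified instances, confusion-matrix rows such as 4.05 and 45.61) are consistent with the weka.filters.supervised.instance.ClassBalancer step listed in the Relation line. That filter reweights instances so that every class has the same total weight while the sum of all weights stays unchanged. A minimal sketch of the reweighting follows. The class counts 49/50/50 are an assumption: they are inferred from the 149 instances and from the attribute names being the values of the first iris record, not read from the output.

```python
from collections import Counter


def class_balancer_weights(labels):
    """Per-instance weights that give every class the same total weight.

    The sum of all weights stays equal to the number of instances.
    """
    counts = Counter(labels)
    n, num_classes = len(labels), len(counts)
    # Each class ends up with total weight n / num_classes.
    return [n / (num_classes * counts[y]) for y in labels]


# Assumed class counts for the 149-instance run (first setosa record lost).
labels = (["Iris-setosa"] * 49 + ["Iris-versicolor"] * 50
          + ["Iris-virginica"] * 50)
weights = class_balancer_weights(labels)

per_class = Counter()
for y, w in zip(labels, weights):
    per_class[y] += w
print(dict(per_class))  # each class totals about 49.67
print(sum(weights))     # about 149
```

Under those assumed counts each class totals 149/3, about 49.67, which agrees with the row sums of the confusion matrix above (4.05 + 45.61 and 4.97 + 44.7).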
Decision Table Classifier
=== Run information ===
Scheme: weka.classifiers.rules.DecisionTable -X 1 -S "weka.attributeSelection.BestFirst -D 1 -N 5"
Relation: iris.arff-weka.filters.AllFilter-weka.filters.MultiFilter-Fweka.filters.AllFilter-weka.filters.AllFilter-weka.filters.MultiFilter-Fweka.filters.AllFilter-weka.filters.AllFilter-weka.filters.MultiFilter-Fweka.filters.AllFilter-weka.filters.supervised.attribute.AddClassification-Wweka.classifiers.rules.ZeroR-weka.filters.supervised.attribute.MergeNominalValues-L0.05-Rfirst-last-weka.filters.supervised.instance.ClassBalancer-num-intervals10-weka.filters.supervised.attribute.AddClassification-Wweka.classifiers.rules.ZeroR
Instances: 149
Attributes: 5
              5.1
              3.5
              1.4
              0.2
              Iris-setosa
Test mode: 10-fold cross-validation
=== Classifier model (full training set) ===
Decision Table:
Number of training instances: 149
Number of Rules : 3
Non matches covered by Majority class.
Best first.
Start set: no attributes
Search direction: forward
Stale search after 5 node expansions
Total number of subsets evaluated: 12
Merit of best subset found: 96
Evaluation (for feature selection): CV (leave one out)
Feature set: 4,5
Time taken to build model: 0.02 seconds
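A decision table stores, for each combination of values of the selected attributes, the majority class of the matching training instances; "Non matches covered by Majority class" means an unseen combination falls back to the overall majority class. Here the feature set 4,5 is attribute 4 (named 0.2, i.e. petal width) plus the class. The sketch below assumes the numeric attribute has already been discretized into bins; the bin labels, toy data and function names are hypothetical, and Weka's own discretization and best-first feature search are not reproduced.

```python
from collections import Counter, defaultdict


def build_decision_table(rows, labels, features):
    """Map each combination of selected feature values to its majority class."""
    cells = defaultdict(Counter)
    for row, y in zip(rows, labels):
        key = tuple(row[f] for f in features)
        cells[key][y] += 1
    table = {key: c.most_common(1)[0][0] for key, c in cells.items()}
    # Fallback for combinations never seen in training.
    default = Counter(labels).most_common(1)[0][0]
    return table, default


def lookup(table, default, row, features):
    """Classify a row; unseen keys fall back to the majority class."""
    return table.get(tuple(row[f] for f in features), default)


# Toy training data with a hypothetical pre-binned petal width.
training = ([("low", "setosa")] * 2 + [("mid", "versicolor")] * 4
            + [("mid", "virginica")] + [("high", "virginica")] * 2)
rows = [{"petalwidth_bin": b} for b, _ in training]
labels = [y for _, y in training]
features = ["petalwidth_bin"]

table, default = build_decision_table(rows, labels, features)
print(table)
print(lookup(table, default, {"petalwidth_bin": "unseen"}, features))
```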
=== Stratified cross-validation ===
=== Summary ===
Correctly Classified Instances 137.08 92 %
Incorrectly Classified Instances 11.92 8 %
Kappa statistic 0.88
Mean absolute error 0.0972
Root mean squared error 0.2153
Relative absolute error 21.8803 %
Root relative squared error 45.6775 %
Total Number of Instances 149
=== Detailed Accuracy By Class ===
TP Rate FP Rate Precision Recall F-Measure MCC ROC Area PRC Area Class
1.000 0.000 1.000 1.000 1.000 1.000 1.000 1.000 Iris-setosa
0.900 0.070 0.865 0.900 0.882 0.822 0.947 0.816 Iris-versicolor
0.860 0.050 0.896 0.860 0.878 0.819 0.947 0.921 Iris-virginica
Weighted Avg. 0.920 0.040 0.920 0.920 0.920 0.880 0.965 0.912
=== Confusion Matrix ===
a b c <-- classified as
50 0 0 | a = Iris-setosa
0 45 5 | b = Iris-versicolor
0 7 43 | c = Iris-virginica
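The summary statistics Weka reports can be recomputed from the confusion matrix, which is a useful cross-check when comparing classifiers. A minimal sketch using the Decision Table matrix above (Weka's displayed values); it reproduces the 92 % accuracy, the kappa of 0.88 and the per-class precision and recall, for example 0.865 and 0.900 for Iris-versicolor. The function name is hypothetical.

```python
def summarize(matrix, class_names):
    """Accuracy, Cohen's kappa and per-class precision/recall/F1.

    matrix[i][j] counts instances of actual class i predicted as class j,
    the same layout as Weka's "<-- classified as" table.
    """
    n = len(matrix)
    total = sum(sum(row) for row in matrix)
    correct = sum(matrix[i][i] for i in range(n))
    row_sums = [sum(row) for row in matrix]
    col_sums = [sum(matrix[i][j] for i in range(n)) for j in range(n)]
    p_observed = correct / total
    # Agreement expected by chance, from the marginal totals.
    p_chance = sum(r * c for r, c in zip(row_sums, col_sums)) / total ** 2
    kappa = (p_observed - p_chance) / (1 - p_chance)
    per_class = {}
    for k, name in enumerate(class_names):
        tp = matrix[k][k]
        precision = tp / col_sums[k] if col_sums[k] else 0.0
        recall = tp / row_sums[k] if row_sums[k] else 0.0
        f1 = (2 * precision * recall / (precision + recall)
              if precision + recall else 0.0)
        per_class[name] = (precision, recall, f1)
    return p_observed, kappa, per_class


decision_table = [[50, 0, 0], [0, 45, 5], [0, 7, 43]]
names = ["Iris-setosa", "Iris-versicolor", "Iris-virginica"]
acc, kappa, per_class = summarize(decision_table, names)
print(round(acc, 3), round(kappa, 3))  # 0.92 0.88
for name, (p, r, f) in per_class.items():
    print(name, round(p, 3), round(r, 3), round(f, 3))
```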
JRip Classifier
=== Run information ===
Scheme: weka.classifiers.rules.JRip -F 3 -N 2.0 -O 2 -S 1
Relation: iris.arff-weka.filters.AllFilter-weka.filters.MultiFilter-Fweka.filters.AllFilter-weka.filters.AllFilter-weka.filters.MultiFilter-Fweka.filters.AllFilter-weka.filters.AllFilter-weka.filters.MultiFilter-Fweka.filters.AllFilter-weka.filters.supervised.attribute.AddClassification-Wweka.classifiers.rules.ZeroR-weka.filters.supervised.attribute.MergeNominalValues-L0.05-Rfirst-last-weka.filters.supervised.instance.ClassBalancer-num-intervals10-weka.filters.supervised.attribute.AddClassification-Wweka.classifiers.rules.ZeroR
Instances: 149
Attributes: 5
              5.1
              3.5
              1.4
              0.2
              Iris-setosa
Test mode: 10-fold cross-validation
=== Classifier model (full training set) ===
JRIP rules:
===========
(1.4 <= 1.9) => Iris-setosa=Iris-setosa (49.66666666666662/0.0)
(0.2 >= 1.8) => Iris-setosa=Iris-virginica (45.69333333333331/0.9933333333333333)
(1.4 >= 5) => Iris-setosa=Iris-virginica (5.96/1.9866666666666666)
=> Iris-setosa=Iris-versicolor (47.67999999999997/0.9933333333333333)
Number of Rules : 4
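JRip (Weka's implementation of RIPPER) produces an ordered rule list: rules are tried from top to bottom, the first one that fires decides the class, and the last rule is the default. In this run the attribute named 1.4 is the third iris column (petal length) and 0.2 is the fourth (petal width), because the attribute names are the values of the first record. A minimal sketch that applies the four learned rules; the function and parameter names are hypothetical.

```python
def jrip_iris(petal_length, petal_width):
    """Apply the JRip rule list above; the first matching rule wins."""
    if petal_length <= 1.9:       # rule 1: (1.4 <= 1.9)
        return "Iris-setosa"
    if petal_width >= 1.8:        # rule 2: (0.2 >= 1.8)
        return "Iris-virginica"
    if petal_length >= 5:         # rule 3: (1.4 >= 5)
        return "Iris-virginica"
    return "Iris-versicolor"      # default rule


for pl, pw in [(1.4, 0.2), (4.7, 1.4), (6.0, 2.5), (5.0, 1.5)]:
    print(pl, pw, jrip_iris(pl, pw))
```

Only petal length and petal width appear in the rules, which agrees with the attribute-selection result at the end of this answer.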
Time taken to build model: 0.01 seconds
=== Stratified cross-validation ===
=== Summary ===
Correctly Classified Instances 137.08 92 %
Incorrectly Classified Instances 11.92 8 %
Kappa statistic 0.88
Mean absolute error 0.0704
Root mean squared error 0.2173
Relative absolute error 15.8436 %
Root relative squared error 46.0854 %
Total Number of Instances 149
=== Detailed Accuracy By Class ===
TP Rate FP Rate Precision Recall F-Measure MCC ROC Area PRC Area Class
1.000 0.020 0.962 1.000 0.980 0.971 0.999 0.997 Iris-setosa
0.880 0.050 0.898 0.880 0.889 0.834 0.964 0.896 Iris-versicolor
0.880 0.050 0.898 0.880 0.889 0.834 0.957 0.924 Iris-virginica
Weighted Avg. 0.920 0.040 0.919 0.920 0.919 0.880 0.974 0.939
=== Confusion Matrix ===
a b c <-- classified as
50 0 0 | a = Iris-setosa
1 44 5 | b = Iris-versicolor
1 5 44 | c = Iris-virginica
OneR Classifier
=== Run information ===
Scheme: weka.classifiers.rules.OneR -B 6
Relation: iris.arff-weka.filters.AllFilter-weka.filters.MultiFilter-Fweka.filters.AllFilter-weka.filters.AllFilter-weka.filters.MultiFilter-Fweka.filters.AllFilter-weka.filters.AllFilter-weka.filters.MultiFilter-Fweka.filters.AllFilter-weka.filters.supervised.attribute.AddClassification-Wweka.classifiers.rules.ZeroR-weka.filters.supervised.attribute.MergeNominalValues-L0.05-Rfirst-last-weka.filters.supervised.instance.ClassBalancer-num-intervals10-weka.filters.supervised.attribute.AddClassification-Wweka.classifiers.rules.ZeroR
Instances: 149
Attributes: 5
              5.1
              3.5
              1.4
              0.2
              Iris-setosa
Test mode: 10-fold cross-validation
=== Classifier model (full training set) ===
0.2:
< 0.8 -> Iris-setosa
< 1.75 -> Iris-versicolor
>= 1.75 -> Iris-virginica
(143/149 instances correct)
Time taken to build model: 0 seconds
=== Stratified cross-validation ===
=== Summary ===
Correctly Classified Instances 137.08 92 %
Incorrectly Classified Instances 11.92 8 %
Kappa statistic 0.88
Mean absolute error 0.0533
Root mean squared error 0.2309
Relative absolute error 11.9995 %
Root relative squared error 48.9875 %
Total Number of Instances 149
=== Detailed Accuracy By Class ===
TP Rate FP Rate Precision Recall F-Measure MCC ROC Area PRC Area Class
1.000 0.000 1.000 1.000 1.000 1.000 1.000 1.000 Iris-setosa
0.900 0.070 0.865 0.900 0.882 0.822 0.915 0.812 Iris-versicolor
0.860 0.050 0.896 0.860 0.878 0.819 0.905 0.817 Iris-virginica
Weighted Avg. 0.920 0.040 0.920 0.920 0.920 0.880 0.940 0.876
=== Confusion Matrix ===
a b c <-- classified as
50 0 0 | a = Iris-setosa
0 45 5 | b = Iris-versicolor
0 7 43 | c = Iris-virginica
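OneR builds one rule per attribute and keeps the attribute whose rule makes the fewest training errors. The model above uses only attribute 0.2 (petal width) and is correct on 143 of 149 training instances (about 96.0 %), against 92 % under 10-fold cross-validation; that gap of roughly four points is the kind of optimism the overfitting analysis should quantify. The sketch below is a simplified OneR for attributes that are already nominal; Weka's OneR also discretizes numeric attributes subject to a minimum bucket size (the -B 6 option in the Scheme line), which is not reproduced here. The function name and toy data are hypothetical.

```python
from collections import Counter, defaultdict


def one_r(rows, labels):
    """Simplified 1R for nominal attributes.

    For every attribute, map each value to its majority class, count the
    training errors of that mapping, and keep the attribute with the fewest.
    Returns (attribute index, value -> class mapping, training errors).
    """
    best = None
    for a in range(len(rows[0])):
        counts = defaultdict(Counter)
        for row, y in zip(rows, labels):
            counts[row[a]][y] += 1
        rule = {v: c.most_common(1)[0][0] for v, c in counts.items()}
        errors = sum(sum(c.values()) - c[rule[v]] for v, c in counts.items())
        if best is None or errors < best[2]:
            best = (a, rule, errors)
    return best


# Toy data: attribute 0 is noise, attribute 1 is a binned petal width.
rows = [("x", "low"), ("y", "low"), ("x", "mid"), ("y", "mid"),
        ("x", "high"), ("y", "high"), ("x", "mid")]
labels = ["setosa", "setosa", "versicolor", "versicolor",
          "virginica", "virginica", "virginica"]
attr, rule, errors = one_r(rows, labels)
print(attr, rule, errors)
```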
Attribute Evaluator
=== Run information ===
Evaluator: weka.attributeSelection.CfsSubsetEval -P 1 -E 1
Search: weka.attributeSelection.BestFirst -D 1 -N 5
Relation: iris.arff-weka.filters.AllFilter-weka.filters.MultiFilter-Fweka.filters.AllFilter-weka.filters.AllFilter-weka.filters.MultiFilter-Fweka.filters.AllFilter-weka.filters.AllFilter-weka.filters.MultiFilter-Fweka.filters.AllFilter-weka.filters.supervised.attribute.AddClassification-Wweka.classifiers.rules.ZeroR-weka.filters.supervised.attribute.MergeNominalValues-L0.05-Rfirst-last-weka.filters.supervised.instance.ClassBalancer-num-intervals10-weka.filters.supervised.attribute.AddClassification-Wweka.classifiers.rules.ZeroR
Instances: 149
Attributes: 5
              5.1
              3.5
              1.4
              0.2
              Iris-setosa
Evaluation mode: evaluate on all training data
=== Attribute Selection on all input data ===
Search Method:
Best first.
Start set: no attributes
Search direction: forward
Stale search after 5 node expansions
Total number of subsets evaluated: 12
Merit of best subset found: 0.887
Attribute Subset Evaluator (supervised, Class (nominal): 5 Iris-setosa):
CFS Subset Evaluator
Including locally predictive attributes
Selected attributes: 3,4 : 2
1.4
0.2
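CfsSubsetEval scores a subset of k attributes with Hall's correlation-based merit, which rewards attributes that correlate with the class and penalizes attributes that correlate with each other: Merit = k * r_cf / sqrt(k + k(k - 1) * r_ff), where r_cf is the mean attribute-class correlation and r_ff the mean attribute-attribute correlation. In this run the best subset (attributes 3 and 4, petal length and petal width) reached a merit of 0.887. A minimal sketch of the formula only; the correlation values in the example are made up for illustration, and Weka estimates the correlations itself, which is not reproduced here.

```python
import math


def cfs_merit(k, mean_class_corr, mean_inter_corr):
    """Correlation-based feature-subset merit (Hall's CFS heuristic)."""
    return k * mean_class_corr / math.sqrt(k + k * (k - 1) * mean_inter_corr)


# Illustrative numbers only, not taken from the Weka run.
print(cfs_merit(1, 0.9, 0.0))    # a single attribute: merit equals r_cf
print(cfs_merit(2, 0.9, 0.3))    # same class correlation, low redundancy
print(cfs_merit(2, 0.9, 0.95))   # same class correlation, high redundancy
```

Adding a second attribute helps much less when it is highly correlated with the first, which is how CFS avoids selecting redundant measurements.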