
1. A. Given the data set below, apply the k-Nearest Neighbor algorithm to classi


Question

1. A. Given the data set below, apply the k-Nearest Neighbor algorithm to classify the test data for k=1 and k=3. Use the Euclidean distance metric.

Training Set

#     x1           x2           true label
1     0.453705     -0.0106      1
2     3.258589     0.169734     1
3     3.184656     -0.83691     0
4     -0.42561     1.385033     0
5     0.658765     -1.87715     0
6     -0.40507     -1.9574      0
7     -4.52775     4.123102     1
8     2.538689     -1.5386      1
9     -1.04649     -3.59664     1
10    2.967113     0.505111     0

Testing Set

#     x1           x2           true label    predicted label
11    -4.69237     -4.77898     1
12    -2.1147      -1.81277     0
13    4.277164     -4.83136     1
14    -1.33862     -0.93995     0
15    -4.02728     -4.96129     1
16    4.968125     3.757161     1
17    -2.19987     -3.48712     0
18    2.849136     -3.33965     0
19    -4.30273     2.530094     1
20    4.690116     -0.36379     1

B. Compute the confusion matrix, accuracy, precision, recall, and F1 measures given your answers to problem 1.

C. Assume you have the data set given below, which provides hypothetical examples of instances when people did or did not get hired for a job. It consists of three categorical attributes and a label that indicates "hired" or "not hired". Using this data, induce a decision tree using information gain for splitting the nodes, showing the calculations at each step.

Training Set

#     Experience (EXP)    Sufficient Qualifications? (QUAL)    Opinions of References (REFOP)    true label
1     good                Yes                                  favorable                         1
2     excellent           Yes                                  favorable                         1
3     none                No                                   favorable                         0
4     good                No                                   not favorable                     0
5     good                Yes                                  not favorable                     0
6     excellent           Yes                                  not favorable                     0
7     excellent           Yes                                  favorable                         1
8     good                Yes                                  favorable                         1
9     none                Yes                                  favorable                         1
10    none                Yes                                  not favorable                     0
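The information-gain calculation that part C asks for can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not a definitive implementation: the names `ROWS`, `ATTRS`, `entropy`, and `info_gain` are hypothetical, entropy is measured in bits (log base 2), and only the root split and the split of the "favorable" branch are evaluated.

```python
import math
from collections import Counter

# Rows of the hiring table: (EXP, QUAL, REFOP, true label)
ROWS = [
    ("good", "Yes", "favorable", 1),
    ("excellent", "Yes", "favorable", 1),
    ("none", "No", "favorable", 0),
    ("good", "No", "not favorable", 0),
    ("good", "Yes", "not favorable", 0),
    ("excellent", "Yes", "not favorable", 0),
    ("excellent", "Yes", "favorable", 1),
    ("good", "Yes", "favorable", 1),
    ("none", "Yes", "favorable", 1),
    ("none", "Yes", "not favorable", 0),
]
ATTRS = {"EXP": 0, "QUAL": 1, "REFOP": 2}  # attribute name -> column index


def entropy(rows):
    """Shannon entropy (in bits) of the label column."""
    counts = Counter(row[-1] for row in rows)
    total = len(rows)
    return sum((c / total) * math.log2(total / c) for c in counts.values())


def info_gain(rows, attr):
    """Entropy before the split minus the weighted entropy after it."""
    col = ATTRS[attr]
    groups = {}
    for row in rows:
        groups.setdefault(row[col], []).append(row)
    remainder = sum(len(g) / len(rows) * entropy(g) for g in groups.values())
    return entropy(rows) - remainder


# Root node: compare the gain of every attribute.
root_gains = {a: info_gain(ROWS, a) for a in ATTRS}
root = max(root_gains, key=root_gains.get)

# Second level: only the "favorable" branch is still mixed.
favorable = [r for r in ROWS if r[2] == "favorable"]
child_gains = {a: info_gain(favorable, a) for a in ATTRS if a != root}
child = max(child_gains, key=child_gains.get)

print(root, {a: round(g, 3) for a, g in root_gains.items()})
print(child, {a: round(g, 3) for a, g in child_gains.items()})
```

With this data the root gains are roughly 0.049 (EXP), 0.236 (QUAL), and 0.610 (REFOP), so REFOP is chosen first. Every "not favorable" row has label 0, and within the "favorable" rows QUAL separates the labels perfectly, so the tree needs only those two splits.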


Explanation / Answer

Solution:

In MATLAB, the general syntax for predicting labels with a trained classifier Mdl on data X is:

label = predict(Mdl,X)

[label,score,cost] = predict(Mdl,X)
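The prediction call hides the nearest-neighbor computation itself, so a from-scratch version may help. The following is a minimal Python sketch, assuming a plain majority vote and the Euclidean distance the question specifies; `TRAIN`, `TEST`, and `knn_predict` are hypothetical names introduced here, and the data are copied from the question's tables.

```python
import math
from collections import Counter

# Training set from the question: (x1, x2, true label)
TRAIN = [
    (0.453705, -0.0106, 1),
    (3.258589, 0.169734, 1),
    (3.184656, -0.83691, 0),
    (-0.42561, 1.385033, 0),
    (0.658765, -1.87715, 0),
    (-0.40507, -1.9574, 0),
    (-4.52775, 4.123102, 1),
    (2.538689, -1.5386, 1),
    (-1.04649, -3.59664, 1),
    (2.967113, 0.505111, 0),
]

# Test points 11 through 20 from the question: (x1, x2)
TEST = [
    (-4.69237, -4.77898),
    (-2.1147, -1.81277),
    (4.277164, -4.83136),
    (-1.33862, -0.93995),
    (-4.02728, -4.96129),
    (4.968125, 3.757161),
    (-2.19987, -3.48712),
    (2.849136, -3.33965),
    (-4.30273, 2.530094),
    (4.690116, -0.36379),
]


def knn_predict(point, train, k):
    """Majority vote among the k training rows closest to `point`."""
    # Sort training rows by Euclidean distance to the query point.
    by_distance = sorted(train, key=lambda row: math.dist(point, row[:2]))
    votes = Counter(label for _, _, label in by_distance[:k])
    return votes.most_common(1)[0][0]


pred_k1 = [knn_predict(p, TRAIN, 1) for p in TEST]
pred_k3 = [knn_predict(p, TRAIN, 3) for p in TEST]
print("k=1:", pred_k1)
print("k=3:", pred_k3)
```

Because the labels are binary and k is odd, the vote can never tie, so no tie-breaking rule is needed in this sketch.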

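Applying the k-Nearest Neighbor rule with Euclidean distance to the training set gives the predicted labels below for k = 1 and k = 3. For example, test point 12 is nearest to training point 6 (label 0), followed by points 9 (label 1) and 5 (label 0), so it is classified as 0 for both k = 1 and k = 3.

Testing Set

#     x1           x2           true label    predicted label (k=1)    predicted label (k=3)
11    -4.69237     -4.77898     1             1                        0
12    -2.1147      -1.81277     0             0                        0
13    4.277164     -4.83136     1             1                        0
14    -1.33862     -0.93995     0             0                        0
15    -4.02728     -4.96129     1             1                        0
16    4.968125     3.757161     1             0                        0
17    -2.19987     -3.48712     0             1                        0
18    2.849136     -3.33965     0             1                        0
19    -4.30273     2.530094     1             1                        1
20    4.690116     -0.36379     1             1                        0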
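For part B, the confusion matrix and the derived scores come from comparing true and predicted labels. The sketch below is an illustration rather than a definitive implementation: `confusion_counts` and `scores` are hypothetical names, label 1 is assumed to be the positive class, and the predicted lists are the 1-NN and 3-NN labels obtained with Euclidean distance on the training set.

```python
def confusion_counts(y_true, y_pred, positive=1):
    """Return (TP, FP, FN, TN) with `positive` as the positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    tn = sum(t != positive and p != positive for t, p in pairs)
    return tp, fp, fn, tn


def scores(y_true, y_pred):
    """Accuracy, precision, recall, and F1 from the confusion counts."""
    tp, fp, fn, tn = confusion_counts(y_true, y_pred)
    accuracy = (tp + tn) / len(y_true)
    # Guard against empty denominators (assumption: report 0.0 in that case).
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return {"accuracy": accuracy, "precision": precision, "recall": recall, "f1": f1}


# True labels of test points 11 through 20, as given in the question.
y_true = [1, 0, 1, 0, 1, 1, 0, 0, 1, 1]
# Predicted labels from Euclidean k-NN on the training set.
y_pred_k1 = [1, 0, 1, 0, 1, 0, 1, 1, 1, 1]
y_pred_k3 = [0, 0, 0, 0, 0, 0, 0, 0, 1, 0]

for name, y_pred in (("k=1", y_pred_k1), ("k=3", y_pred_k3)):
    print(name, confusion_counts(y_true, y_pred), scores(y_true, y_pred))
```

For k = 1 this gives TP = 5, FP = 2, FN = 1, TN = 2, so accuracy is 0.70, precision 5/7 (about 0.714), recall 5/6 (about 0.833), and F1 10/13 (about 0.769). For k = 3 it gives TP = 1, FP = 0, FN = 5, TN = 4, so accuracy is 0.50, precision 1.0, recall 1/6 (about 0.167), and F1 2/7 (about 0.286).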