In this question, we will use the Titanic dataset from the Kaggle competition, Titanic: Machine Learning from Disaster
Question
In this question, we will use the Titanic dataset from the Kaggle competition, Titanic: Machine Learning from Disaster. The dataset includes information about passenger characteristics as well as whether they survived the disaster.
Import the Titanic data using the following R code:
• (a) Calculate P(Survived) and P(Survived|Pclass = 1) using R. A value of 1 for the “Survived” variable means the passenger survived; 0 means the passenger did not survive. (1 mark)
• (b) Calculate the entropies H(Embarked) and H(Pclass), using log2(). Which entropy is higher? Why? Do not use an entropy function. (1 mark)
• (c) In this competition, you must predict the fate of the passengers aboard the Titanic. Caroline used two methods to predict the survival of passengers. She saved the prediction results as the variables “Survived_guess1” and “Survived_guess2”. Calculate H(Survived_guess1|Pclass) and H(Survived_guess2|Pclass). Which entropy is higher? (1 mark)
• (d) Can you guess which algorithm Caroline used to obtain the prediction Survived_guess2? Hint: she used two variables, Pclass and Embarked, in the prediction. (Optional, 1 bonus mark)
Explanation / Answer
Look, this is a machine learning question, and many approaches can give you an answer. First, you should understand the data and analyse it. Then clean the data, find the relationships between its different features, apply suitable training models, and check their correctness.
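For parts (a) and (b), the quantities can be read off frequency tables. The following is only a minimal R sketch, not a reference solution: the import code is not shown in the question, so `toy` is a small made-up data frame standing in for the real data (its values are invented for illustration), and `entropy_bits` is a hypothetical helper name. It applies the definition H(X) = -sum over x of p(x) * log2(p(x)) directly instead of calling a packaged entropy function.

```r
# Toy stand-in for the Titanic data frame; all values are invented for illustration.
toy <- data.frame(
  Survived = c(1, 0, 1, 1, 0, 0, 0, 0),
  Pclass   = c(1, 3, 1, 2, 3, 3, 1, 2),
  Embarked = c("S", "S", "C", "S", "Q", "S", "C", "S")
)

# P(Survived): the share of rows where Survived equals 1.
p_survived <- mean(toy$Survived == 1)

# P(Survived | Pclass = 1): keep only first-class rows, then take the same share.
p_survived_c1 <- mean(toy$Survived[toy$Pclass == 1] == 1)

# Entropy in bits from the definition H(X) = -sum_x p(x) * log2(p(x)).
# Hypothetical helper name; table() ignores NA values by default.
entropy_bits <- function(x) {
  p <- prop.table(table(x))  # empirical probabilities of each category
  p <- p[p > 0]              # drop empty categories (0 * log2(0) is treated as 0)
  -sum(p * log2(p))
}

h_embarked <- entropy_bits(toy$Embarked)
h_pclass   <- entropy_bits(toy$Pclass)
```

On real data, the same calls would be applied to the imported data frame's columns. As a general rule, a variable whose values are spread more evenly across its categories has higher entropy, while a variable dominated by a single category has lower entropy.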
If you still need help, you can check the solutions on Kaggle; they are available there.
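For part (c), conditional entropy can be built from the same entropy definition, using H(Y|X) = sum over x of P(X = x) * H(Y | X = x). The sketch below assumes this standard definition and data without missing values; `entropy_bits` and `cond_entropy_bits` are hypothetical helper names, and the `pclass` and `guess` vectors are invented toy values that only mimic the shape of Pclass and Survived_guess1.

```r
# Entropy in bits, H(X) = -sum_x p(x) * log2(p(x)); hypothetical helper name.
entropy_bits <- function(x) {
  p <- prop.table(table(x))
  p <- p[p > 0]
  -sum(p * log2(p))
}

# Conditional entropy H(Y | X) = sum_x P(X = x) * H(Y | X = x).
# Hypothetical helper name; assumes x and y contain no missing values.
cond_entropy_bits <- function(y, x) {
  groups  <- split(y, x)                  # values of y within each level of x
  weights <- lengths(groups) / length(y)  # P(X = x) for each level
  sum(weights * sapply(groups, entropy_bits))
}

# Invented toy vectors, not real predictions.
pclass <- c(1, 1, 1, 1, 2, 2, 3, 3)
guess  <- c(1, 1, 0, 0, 1, 1, 0, 0)

h_cond <- cond_entropy_bits(guess, pclass)
```

If a prediction is completely determined by Pclass, its conditional entropy given Pclass is 0. The further the value is from 0, the more the prediction varies within each passenger class.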