
1. Investigate the following website: Gapminder, http://www.gapminder.org/data/


Question

1.     Investigate the following website—Gapminder: http://www.gapminder.org/data/

2.     This website lists around 516 datasets. Pick one dataset for your project (specifically from this website), then download it by clicking on the Excel icon.

3.     Open this dataset in Excel and define the following:

1)     What does this dataset represent?

2)     How many variables and cases are there in this dataset?

4.     After understanding this dataset, define your main goals, such as:

1)     What are you planning to analyze?

2)     What are your hypotheses?

3)     Is there a lot of missing data in this dataset? Which variables have missing data?

5.     Start observing this dataset, and then answer the following questions (Hint: revisit Appendix 1.1 in the textbook before answering these questions):

1)     What types of variables are listed in this dataset? Which ones are quantitative variables and which ones are qualitative variables, if any?

2)     Plot time series charts.

6.     Start analyzing this dataset, and then answer the following questions (Hint: revisit Appendix 2.1 in the textbook before answering these questions):

1)     Summarize this dataset by using descriptive statistics such as tabular and graphical methods (i.e., frequency distributions, bar charts, pie charts, Pareto charts (if applicable), histograms, frequency polygons, ogives, contingency tables, and scatter plots).

7.     Report your research findings by answering the following questions (Hint: revisit Appendix 3.1 in the textbook before answering these questions):

1)     Describe this dataset by using central tendency.

2)     Compute and interpret the range, variance, and standard deviation.

3)     Use the empirical rule to describe variation in this dataset.

4)     Compute and interpret percentiles, quartiles, and box-and-whiskers displays.

5)     Compute and interpret covariance, correlation, and the least squares line.

8. Define the response variable and the factor variables in this dataset.

9. Build a decision tree.

10. Interpret the results (in detail).

Explanation / Answer

Answer to questions #1 and #2)

I downloaded the file "age at first marriage (women)".


Answer to question #3)

1. This dataset gives the average age at first marriage for women in different countries, observed in different years.

The average age is calculated per country for each year in which it was observed.

2. Average age at first marriage data is present for 186 countries.
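As a rough illustration of how the cases and variables could be counted, here is a minimal Python sketch. It assumes the Excel sheet has been exported to CSV in a wide layout (one row per country, one column per year); that layout, the country names, and all the numbers in `SAMPLE` are made up for illustration and are not taken from the Gapminder file.

```python
import csv
import io

# Hypothetical excerpt (NOT real Gapminder values): one row per country,
# one column per year, with empty cells where no value was recorded.
SAMPLE = """country,1970,1980,1990
Country A,21.5,,23.0
Country B,,22.1,
Country C,19.8,20.4,21.1
"""

def count_cases_and_variables(text):
    """Return (number of data rows, number of columns) of a CSV table."""
    rows = list(csv.reader(io.StringIO(text)))
    header, body = rows[0], rows[1:]
    return len(body), len(header)

cases, variables = count_cases_and_variables(SAMPLE)
print(f"{cases} cases (countries), {variables} variables (columns)")
# prints: 3 cases (countries), 4 variables (columns)
```

On the real file, the same count would give the number of countries as the cases and the country column plus one column per year as the variables.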


Answer to question #4)

1) We can analyze whether all 186 countries have the same age at first marriage.

2) The null hypothesis would be: the mean age at first marriage is the same for all countries.

The alternative hypothesis would be: the mean age at first marriage is different for at least one country.

We can use a significance level of 0.05.

The test to be used is "one-factor ANOVA".
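The one-factor ANOVA named above compares the variation between group means with the variation inside the groups. A minimal Python sketch of the F statistic follows. It assumes each country is one group and its recorded yearly averages are the observations, which is one reading of the answer rather than something it states; the three groups are made-up numbers, not Gapminder data.

```python
from statistics import mean

def one_way_anova_f(groups):
    """Return (F, df_between, df_within) for a list of numeric groups."""
    k = len(groups)                      # number of groups (countries)
    n = sum(len(g) for g in groups)      # total number of observations
    grand_mean = sum(sum(g) for g in groups) / n
    means = [mean(g) for g in groups]
    # Variation of the group means around the grand mean.
    ss_between = sum(len(g) * (m - grand_mean) ** 2
                     for g, m in zip(groups, means))
    # Variation of the observations around their own group mean.
    ss_within = sum((x - m) ** 2
                    for g, m in zip(groups, means) for x in g)
    df_between, df_within = k - 1, n - k
    f_stat = (ss_between / df_between) / (ss_within / df_within)
    return f_stat, df_between, df_within

# Hypothetical yearly averages for three countries (illustration only).
groups = [
    [20.0, 21.0, 22.0],
    [24.0, 25.0, 26.0],
    [22.0, 23.0, 24.0],
]

f_stat, df_between, df_within = one_way_anova_f(groups)
print(f"F = {f_stat:.2f} with ({df_between}, {df_within}) degrees of freedom")
# prints: F = 12.00 with (2, 6) degrees of freedom
```

The null hypothesis is rejected at the 0.05 level when F exceeds the critical value of the F distribution for those degrees of freedom (about 5.14 for 2 and 6). If SciPy is available, `scipy.stats.f_oneway` returns the same statistic together with a p-value.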

3) The data is collected so that all past records for each country are presented. If a country's average was calculated in 1970, it is recorded under that year; if another country's average was calculated in a different year, it is recorded under that year instead. As a result, the data is scattered over a long time period, and not every country has a value for every year. All we have are the averages.
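To see which variables have missing data, the empty cells in each column can be counted. The sketch below is one possible way to do that in Python. It assumes a CSV export in a wide layout with one column per year; `SAMPLE` is a made-up excerpt for illustration, not real Gapminder values.

```python
import csv
import io

# Hypothetical excerpt (NOT real Gapminder values): empty cells mark
# years in which no average was recorded for that country.
SAMPLE = """country,1970,1980,1990
Country A,21.5,,23.0
Country B,,22.1,
Country C,19.8,20.4,21.1
"""

def missing_by_column(text):
    """Map each column name to the number of empty cells in that column."""
    reader = csv.DictReader(io.StringIO(text))
    counts = {name: 0 for name in reader.fieldnames}
    for row in reader:
        for name in counts:
            # Treat absent or whitespace-only cells as missing.
            if (row.get(name) or "").strip() == "":
                counts[name] += 1
    return counts

counts = missing_by_column(SAMPLE)
for name, n_missing in counts.items():
    print(f"{name}: {n_missing} missing")
```

Year columns with a high count are the variables with the most missing data, and a column with zero missing cells is complete for every country.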