
Question

Social Statistics

This week, you have been introduced to the idea of inferential statistics. Of key importance to this lesson is the relationship between the population and the sample and their connection through the sampling distribution. For this problem set (this is the only one that you do not need a data file to complete), you will need to do the following:

1-Define and describe what inferential statistics are, including a discussion of why estimation is the heart of inferential statistics.

2-Define and discuss what the population and samples are; as part of this discussion tell me about parameters and statistics and where these come from.

3-Discuss the role of the sampling distribution of sample means/proportions. How does the sampling distribution connect the sample to the population and why is this important?

4-Define what the central limit theorem is, what assumptions it allows us to make, and why these assumptions are important.

Explanation / Answer

Q1) Define and describe what inferential statistics are, including a discussion of why estimation is the heart of inferential statistics.

A1) Statistical inference is the process of deducing properties of an underlying distribution by analyzing data. Inferential statistics infer properties of a population: this includes testing hypotheses and deriving estimates. The population is assumed to be larger than the observed data set; in other words, the observed data are assumed to be sampled from that larger population.

Inferential Statistics:

There are two main methods used in inferential statistics: estimation and hypothesis testing. In estimation, the sample is used to estimate a parameter and a confidence interval about the estimate is constructed. It is important to realize the order here. The sample statistic is calculated from the sample data and the population parameter is inferred (or estimated) from this sample statistic. Let me say that again: Statistics are calculated, parameters are estimated.
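The order described above can be sketched in a few lines of Python. This is a minimal illustration, not a specific procedure from the lesson: the sample values are simulated, and the normal critical value 1.96 is used as an approximation for a 95% confidence interval.

```python
import math
import random

random.seed(0)

# Hypothetical sample of 40 exam marks (simulated for illustration).
sample = [random.gauss(70, 10) for _ in range(40)]
n = len(sample)

# The statistic is *calculated* from the sample data...
sample_mean = sum(sample) / n
sample_sd = math.sqrt(sum((x - sample_mean) ** 2 for x in sample) / (n - 1))

# ...and the population parameter is *estimated* from that statistic:
# a point estimate plus a 95% confidence interval around it.
se = sample_sd / math.sqrt(n)
lower, upper = sample_mean - 1.96 * se, sample_mean + 1.96 * se
print(f"point estimate: {sample_mean:.1f}, 95% CI: ({lower:.1f}, {upper:.1f})")
```

The point estimate is the calculated statistic; the interval expresses the uncertainty in using it to estimate the parameter.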

Q2) Define and discuss what the population and samples are; as part of this discussion tell me about parameters and statistics and where these come from.

A2) Suppose 100 students sit an exam. We could calculate the mean and standard deviation of the exam marks for these 100 students, and this could provide valuable information about the group. Any group of data like this, which includes all the data you are interested in, is called a population. A population can be small or large, as long as it includes all the data you are interested in. For example, if you were only interested in the exam marks of those 100 students, the 100 students would represent your population. Descriptive statistics are applied to populations, and the properties of populations, like the mean or standard deviation, are called parameters, as they represent the whole population (i.e., everybody you are interested in).

Often, however, you do not have access to the whole population you are interested in investigating, but only a limited number of data instead. For example, you might be interested in the exam marks of all students in the UK. It is not feasible to measure all exam marks of all students in the whole of the UK so you have to measure a smaller sample of students (e.g., 100 students), which are used to represent the larger population of all UK students. Properties of samples, such as the mean or standard deviation, are not called parameters, but statistics. Inferential statistics are techniques that allow us to use these samples to make generalizations about the populations from which the samples were drawn. It is, therefore, important that the sample accurately represents the population. The process of achieving this is called sampling.
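The distinction between a parameter and a statistic can be made concrete with a short simulation. The marks below are invented for illustration: the full population of 100 marks yields a parameter, while a random sample of 20 yields a statistic that estimates it.

```python
import random

random.seed(1)

# Hypothetical population: exam marks of all 100 students we care about.
population = [random.randint(40, 100) for _ in range(100)]

# Parameter: a property of the whole population (descriptive statistics).
population_mean = sum(population) / len(population)

# Statistic: the same property computed on a random sample of 20 students,
# used to *estimate* the parameter (inferential statistics).
sample = random.sample(population, 20)
sample_mean = sum(sample) / len(sample)

print(f"parameter (population mean): {population_mean:.1f}")
print(f"statistic (sample mean):     {sample_mean:.1f}")
```

The sample mean will generally be close to, but not exactly equal to, the population mean; how close depends on the sample size and how the sample was drawn.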

The sample statistic is calculated from the sample data and the population parameter is inferred (or estimated) from this sample statistic. Let me say that again: Statistics are calculated, parameters are estimated.

There are two types of estimates: point estimates and interval estimates. A point estimate uses a single value as the best guess of the parameter; an interval estimate gives a range of values within which the parameter is likely to lie.

A good estimator must satisfy three conditions: it should be unbiased (its expected value equals the parameter being estimated), consistent (it converges to the parameter as the sample size grows), and efficient (it has the smallest standard error among comparable estimators).
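Unbiasedness can be checked empirically with a sketch like the following (all numbers are invented): draw many samples from a known population and confirm that the average of the sample means sits very close to the population mean.

```python
import random

random.seed(2)

# A known population with computable mean.
population = [random.gauss(50, 12) for _ in range(10_000)]
mu = sum(population) / len(population)

# Draw many samples of size 30 and collect the sample means.
# If the sample mean is unbiased, the average of these means
# should land very close to mu.
means = []
for _ in range(5_000):
    s = random.sample(population, 30)
    means.append(sum(s) / len(s))

grand_mean = sum(means) / len(means)
print(f"population mean: {mu:.2f}, average of sample means: {grand_mean:.2f}")
```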

Q3) Discuss the role of the sampling distribution of sample means/proportions. How does the sampling distribution connect the sample to the population and why is this important?

A3)

Properties of statistics

Statistics have different properties as estimators of population parameters, and the sampling distribution of a statistic provides a window into some of these properties. For example, if the expected value of a statistic is equal to the corresponding population parameter, the statistic is said to be unbiased. The sample mean, for instance, is an unbiased estimator of the population mean.

Efficiency is another valuable property: everything else being equal, the statistic with the smallest standard error is preferred as an estimator (a statistic used to estimate a model parameter) of the corresponding population parameter. Statisticians have shown that, for normally distributed data, the standard error of the mean is smaller than the standard error of the median. Because of this property, the mean is generally preferred over the median as an estimator.
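The claim about the mean versus the median can be demonstrated by simulation. The sketch below (with invented sample sizes) draws many samples from a normal population and compares the spread of the sample means with the spread of the sample medians; the smaller spread is the smaller standard error.

```python
import random
import statistics

random.seed(3)

# Draw many samples of size 25 from a standard normal population and
# record both the sample mean and the sample median of each.
means, medians = [], []
for _ in range(4_000):
    s = [random.gauss(0, 1) for _ in range(25)]
    means.append(statistics.mean(s))
    medians.append(statistics.median(s))

# The standard deviation of each collection approximates the standard
# error of that statistic's sampling distribution.
se_mean = statistics.stdev(means)
se_median = statistics.stdev(medians)
print(f"SE of mean: {se_mean:.3f}, SE of median: {se_median:.3f}")
```

For normal data the standard error of the median is roughly 1.25 times that of the mean, which is why the simulated values should consistently favor the mean.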

Selection of distribution type to model scores

The sampling distribution provides the theoretical foundation for selecting a distribution to model many useful measures. For example, the central limit theorem explains why a measure such as intelligence, which may be considered a summation of a number of independent quantities, would tend to be distributed as a normal (Gaussian) curve.

Hypothesis testing

The sampling distribution is integral to the hypothesis-testing procedure. It is used to build a model of what the world would look like if the null hypothesis were true and the statistic were collected an infinite number of times. A single sample is taken, the sample statistic is calculated, and it is compared to the sampling distribution of that statistic under the null hypothesis. If the sample statistic is unlikely given that model, the null model is rejected and a model with real effects becomes more plausible. For instance, if the sample {3, 1, 4} were drawn from some hypothetical population, the sample mean (2.67), median (3), or mid-mean (2.5) could each be compared to the corresponding sampling distribution of that statistic. Suppose the probability of finding a sample statistic of that size or smaller worked out to p < .033 for the mean, p < .18 for the median, and p < .025 for the mid-mean. With alpha (α) set to .05, the observed sample would be judged unlikely on the basis of the mean and mid-mean, but not the median.
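The logic above can be sketched by simulating a sampling distribution under a null hypothesis. The null population below (fair six-sided die outcomes) is an assumption chosen for illustration, not taken from the lesson; the sample {3, 1, 4} and its mean are from the example.

```python
import random
import statistics

random.seed(4)

# Hypothetical null population: outcomes of a fair six-sided die.
null_population = [1, 2, 3, 4, 5, 6]

# Build the sampling distribution of the mean under the null hypothesis
# by drawing many samples of size 3 (with replacement).
null_means = []
for _ in range(20_000):
    s = [random.choice(null_population) for _ in range(3)]
    null_means.append(statistics.mean(s))

# Observed sample and its statistic.
observed = statistics.mean([3, 1, 4])   # 2.67

# One-sided p-value: how often does the null model produce a mean
# this small or smaller?
p = sum(m <= observed for m in null_means) / len(null_means)
print(f"observed mean: {observed:.2f}, simulated p-value: {p:.3f}")
```

Under this particular null population the p-value comes out around .26, so at alpha = .05 the null would not be rejected; a different null population would of course give different numbers.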

Q4) Define what the central limit theorem is, what assumptions it allows us to make, and why these assumptions are important.

A4) The Central Limit Theorem (CLT) states that, under certain conditions, the arithmetic mean of a sufficiently large number of independent random variables, each with a well-defined expected value and well-defined variance, will be approximately normally distributed, regardless of the underlying distribution. To illustrate what this means, suppose that a sample is obtained containing a large number of observations, each observation generated randomly in a way that does not depend on the values of the other observations, and that the arithmetic average of the observed values is computed. If this procedure is performed many times, the central limit theorem says that the computed averages will be distributed approximately according to the normal distribution (commonly known as a "bell curve"). A simple example: if one flips a fair coin many times, the probability of getting a given number of heads approximately follows a normal curve, with mean equal to half the total number of flips.
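The theorem can be seen in action with a short simulation. The underlying distribution below is deliberately skewed (exponential with mean 1), yet the distribution of sample means still pulls toward a symmetric bell shape; the sample size of 50 and the number of repetitions are arbitrary choices for illustration.

```python
import random
import statistics

random.seed(5)

def sample_mean(n):
    """Mean of n draws from a strongly skewed Exp(1) distribution."""
    return statistics.mean(random.expovariate(1.0) for _ in range(n))

# Repeat the averaging procedure many times.
means = [sample_mean(50) for _ in range(5_000)]

# For Exp(1): mean 1 and standard deviation 1, so the CLT predicts the
# sample means are roughly Normal(1, 1/sqrt(50) ≈ 0.141).
print(f"mean of means: {statistics.mean(means):.3f}")
print(f"sd of means:   {statistics.stdev(means):.3f}")
```

Even though individual exponential draws are far from bell-shaped, the averages cluster symmetrically around 1 with the spread the CLT predicts.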

The Central Limit Theorem assumes the following:

Randomization Condition: The data must be sampled randomly. Is one of the good sampling methodologies discussed in the chapter “Sampling and Data” being used?

Independence Assumption: The sample values must be independent of each other. This means that the occurrence of one event has no influence on the next event. Usually, if we know that people or items were selected randomly we can assume that the independence assumption is met.

10% Condition: When the sample is drawn without replacement (usually the case), the sample size, n, should be no more than 10% of the population.

Sample Size Assumption: The sample size must be sufficiently large. Although the Central Limit Theorem tells us that we can use a Normal model to think about the behavior of sample means when the sample size is large enough, it does not tell us how large that is. If the population is very skewed, you will need a fairly large sample size to use the CLT; however, if the population is unimodal and symmetric, even small samples are acceptable. So think about your sample size in terms of what you know about the population, and decide whether the sample is large enough. In general, a sample size of 30 is considered sufficient if the population is unimodal (and the sample meets the 10% condition).
