Question

In Unit 3 we learned about discrete probability distributions: Hypergeometric, Binomial, and Poisson. In Unit 4, we added the Normal distribution to our list as the first of many continuous distributions. In this unit, we are adding those other continuous probability distributions. We are seeing that the Exponential, Gamma, Weibull, Lognormal, and Beta distributions are appropriate to certain types of engineering problems. Although our readings have focused most of their attention on the Normal distribution, we need to recognize that any of these distributions might be needed to solve a particular engineering challenge. We need to be able to determine which, if any, of these distributions fits any situation in which we might be doing analysis.

We use a probability plot as a tool to determine if a set of data we are analyzing can reasonably be described by one of our probability distributions. If so, then the standard probabilities associated with the distribution can be used to make predictions about the process or system represented by our data. If not, we have to do some extra math to determine our own probabilities by fitting the data we have to an algebraic function (something we'll do when we get to linear and nonlinear regression) and then integrating that function over our range of interest to determine probabilities. The math isn't that difficult once we know the function, but it is certainly faster and easier to do our work if we can quickly show that one of the distributions we already understand fits our data well enough to use it.

Discuss how a probability plot works, and why we can draw conclusions based on the level of fit we see. If the resulting "fit" isn't perfect (which it very rarely is), what factors do you need to consider in making a decision about whether to use a particular distribution to solve your challenge? Describe how you would go about determining the best distribution for a set of data (if there actually is one).

Explanation / Answer

Many statistical studies assume that the available data are a sample drawn from a larger population that follows a defined distribution. A probability plot is one way to check how the data are distributed. By definition, a probability plot (Chambers et al., 1983) is a graphical technique for assessing whether or not a data set follows a given distribution. The data are plotted against the assumed distribution, and the pattern in the probability plot reveals whether the actual distribution matches the assumed one.

How does it work -

The ordered data values are plotted against the corresponding quantiles of the assumed distribution. If the data follow that distribution, the points fall close to a straight line; in other words, the data fit the assumed distribution. The correlation coefficient of a linear fit to the points in the probability plot is a measure of the goodness of fit with the assumed distribution. The amount of deviation from the line shows how differently the data are distributed compared with the assumed distribution; on a normal probability plot, for example, the degree of nonnormality shows up as the amount of curvature in the plot.
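As a rough illustration of these steps, the sketch below sorts the data, pairs each value with a theoretical quantile, and computes the correlation coefficient of those pairs. It assumes Hazen plotting positions p_i = (i - 0.5)/n (one common choice among several) and uses Python's `statistics.NormalDist` for normal quantiles. The helper names (`probability_plot`, `exponential_quantile`, etc.) are hypothetical, and the sample data are constructed, not measured.

```python
import math
from statistics import NormalDist


def plotting_positions(n):
    # Hazen plotting positions p_i = (i - 0.5) / n; an assumed, common choice
    return [(i - 0.5) / n for i in range(1, n + 1)]


def pearson_r(xs, ys):
    # Ordinary Pearson correlation coefficient of paired values
    mx = sum(xs) / len(xs)
    my = sum(ys) / len(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


def probability_plot(data, quantile):
    """Return (theoretical quantiles, ordered data, r) for an assumed distribution.

    `quantile` is the inverse CDF of the assumed distribution in standard form.
    """
    ordered = sorted(data)
    theoretical = [quantile(p) for p in plotting_positions(len(ordered))]
    return theoretical, ordered, pearson_r(theoretical, ordered)


def exponential_quantile(p):
    # Inverse CDF of the unit exponential distribution
    return -math.log(1.0 - p)


normal_quantile = NormalDist().inv_cdf

# Constructed (not measured) data: exact exponential quantiles with mean 10
data = [-10.0 * math.log(1.0 - p) for p in plotting_positions(20)]

_, _, r_exp = probability_plot(data, exponential_quantile)
_, _, r_norm = probability_plot(data, normal_quantile)
print(f"r vs exponential: {r_exp:.4f}")
print(f"r vs normal:      {r_norm:.4f}")
```

Because these data were built from exponential quantiles, the exponential plot is essentially a perfect line (r very close to 1), while the normal plot bends and its r drops noticeably. Since r does not change under a positive linear rescaling of the data, plotting against the standard (unit) form of each location-scale distribution is enough for this check.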

Factors and points to consider when the probability plot is not a straight line --

Several data-related issues create anomalies in a probability plot. If there are only a few outliers, they can be treated as corrupted or erroneous data points. If there are many outliers, however, the statistical analysis may fail. Departures in the tails of the plot can also be visual evidence that the data are not distributed as assumed.
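To see how strongly a single outlier can distort the fit, the sketch below computes the normal probability-plot correlation coefficient with and without one suspicious value. It assumes Hazen plotting positions (i - 0.5)/n; `normal_ppcc` is a hypothetical helper name, the data are constructed for illustration, and the outlier value of 120 is made up. Dropping a point to improve the fit is only justified when there is a real reason to believe it is a corrupted measurement.

```python
import math
from statistics import NormalDist


def normal_ppcc(data):
    """Correlation of ordered data with standard normal quantiles (assumed Hazen positions)."""
    ordered = sorted(data)
    n = len(ordered)
    z = [NormalDist().inv_cdf((i - 0.5) / n) for i in range(1, n + 1)]
    mz = sum(z) / n
    my = sum(ordered) / n
    szy = sum((a - mz) * (b - my) for a, b in zip(z, ordered))
    szz = sum((a - mz) ** 2 for a in z)
    syy = sum((b - my) ** 2 for b in ordered)
    return szy / math.sqrt(szz * syy)


# Constructed data: 19 values lying exactly on a normal plot (mean 50, sd 2)
clean = [50 + 2 * NormalDist().inv_cdf((i - 0.5) / 19) for i in range(1, 20)]
with_outlier = clean + [120.0]  # one hypothetical corrupted reading

r_clean = normal_ppcc(clean)
r_with = normal_ppcc(with_outlier)
print(f"r without outlier: {r_clean:.4f}")
print(f"r with outlier:    {r_with:.4f}")
```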

If the plot is bowed (concave or convex) rather than straight, skewness in the data may be the reason, since the data are weighted toward one side. A power transformation (for example, a logarithm or a square root) can often fix this problem, so that the probability plot of the transformed data comes out as expected.
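A minimal sketch of the power-transformation idea, assuming right-skewed data: the values below are constructed as exact lognormal quantiles, so a log transform (the lambda = 0 member of the Box-Cox power family) should straighten the normal probability plot. The `normal_ppcc` helper is hypothetical; it correlates the ordered data with standard normal quantiles at assumed Hazen positions (i - 0.5)/n.

```python
import math
from statistics import NormalDist


def normal_ppcc(data):
    # Correlation of ordered data with standard normal quantiles (assumed Hazen positions)
    ordered = sorted(data)
    n = len(ordered)
    z = [NormalDist().inv_cdf((i - 0.5) / n) for i in range(1, n + 1)]
    mz = sum(z) / n
    my = sum(ordered) / n
    szy = sum((a - mz) * (b - my) for a, b in zip(z, ordered))
    szz = sum((a - mz) ** 2 for a in z)
    syy = sum((b - my) ** 2 for b in ordered)
    return szy / math.sqrt(szz * syy)


# Constructed right-skewed data: exact lognormal quantiles (mu = 0, sigma = 1)
skewed = [math.exp(NormalDist().inv_cdf((i - 0.5) / 20)) for i in range(1, 21)]
logged = [math.log(x) for x in skewed]  # log transform, Box-Cox with lambda = 0

r_raw = normal_ppcc(skewed)
r_log = normal_ppcc(logged)
print(f"r before transform: {r_raw:.4f}")
print(f"r after log:        {r_log:.4f}")
```

On real data the transformed plot will not be perfectly straight, and the right power still has to be chosen; trying a few candidates (square root, log, reciprocal) and comparing the resulting plots is one simple approach.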

For me, the easiest and most direct visual way of determining the distribution of data is a frequency distribution curve. I would plot a frequency distribution (histogram) of the data and check its shape. Whichever distribution most closely resembles the frequency plot of the data would be assumed to be the true distribution.
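The frequency-distribution approach can be sketched with a simple text histogram. This sketch assumes equal-width bins and, when no bin count is given, Sturges' rule (ceil(log2 n) + 1 bins), which is only one of several common rules. `frequency_table` and `print_histogram` are hypothetical helper names, and the data are constructed exponential quantiles, so the histogram should come out right-skewed.

```python
import math


def frequency_table(data, bins=None):
    """Equal-width frequency table; defaults to Sturges' rule (an assumption)."""
    n = len(data)
    if bins is None:
        bins = math.ceil(math.log2(n)) + 1
    lo, hi = min(data), max(data)
    width = (hi - lo) / bins
    if width == 0:
        width = 1.0  # guard against all-equal data
    counts = [0] * bins
    for x in data:
        idx = min(int((x - lo) / width), bins - 1)  # the maximum goes in the last bin
        counts[idx] += 1
    edges = [lo + j * width for j in range(bins + 1)]
    return edges, counts


def print_histogram(edges, counts):
    # One row per bin: bin range, then one '*' per observation
    for j, c in enumerate(counts):
        print(f"{edges[j]:8.2f} - {edges[j + 1]:8.2f} | {'*' * c}")


# Constructed (not measured) data: 40 exact exponential quantiles with mean 10
data = [-10.0 * math.log(1.0 - (i - 0.5) / 40) for i in range(1, 41)]
edges, counts = frequency_table(data)
print_histogram(edges, counts)
```

Because a histogram's apparent shape can change with the number and placement of bins, it may be worth trying more than one bin count before settling on a candidate distribution.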
