
Case Study: The problem is to reduce churn ratio by 5% quarterly by analyzing CD

ID: 3224796 • Letter: C

Question

Case Study:

The problem is to reduce the churn ratio by 5% quarterly by analyzing the CDR, credit report and billing data of telecom operators to mine churn trends for a specific region, a specific person or an age group.

Questions:

1. Describe 2-3 classical statistical concepts/measures that will help you understand your data

2. Which types of your data would you expect to be normally distributed? Which would you expect to be non-normal?

3. Why do you have to be careful about just looking for correlations? How does this relate to unstable, scalable systems?

4. Identify two statistical tests from slide 11 that are appropriate to your problem. Explain why: (Slides attached)

5. Identify two basic data visualizations from slide 11 that are appropriate to your problem. Explain why:  

Descriptive Statistical Methods (Slide 11)

Histograms: Provide a visual representation of a measured distribution.

Census: Carefully measuring information about a population.

T-Test, Z-Test, F-Test: Formal "test statistics" used in hypothesis testing. Tests of significance on whether distributions are the same or different based on their means, compared against pure chance.

χ²-Test: Like T-Test, Z-Test, F-Test, except evaluating the shape of distributions (goodness of fit).

ANOVA: Like a teenager, it's mostly normal. ANOVA is a t-test generalized to more than two groups.

P-Value: The measure of significance of a test statistic. If differences are likely not due to chance (low P-values), then differences are statistically significant!

Confidence interval: Another measure of significance, but providing a range of significance values.

Clusters: Measurements sometimes fall into "clusters" or groups, potentially suggesting multiple populations.

Regression: Fitting curves to measured distributions to model the true distribution from which they are derived.

Explanation / Answer

There are three data sources to investigate: CDR (call detail records), credit reports and billing data of telecom operators.

1. Describe 2-3 classical statistical concepts/measures that will help you understand your data.

Mean of customer’s monthly bill (from billing data)

Range/standard deviation of credit ratings (from the credit report), to understand the variability of the customers' credit status and financial standing

Average Monthly Recurring Charge
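The measures listed above can be computed in a few lines. The following is a minimal sketch using only the Python standard library; the bill amounts are made-up illustrative values, not real operator data, and the helper name `describe` is hypothetical.

```python
import statistics

# Hypothetical monthly bills (currency units) for a handful of customers.
monthly_bills = [42.0, 55.5, 38.0, 61.0, 47.5, 120.0, 52.0, 44.0]

def describe(values):
    """Return mean, median, sample standard deviation, and range."""
    return {
        "mean": statistics.mean(values),
        "median": statistics.median(values),
        "stdev": statistics.stdev(values),
        "range": max(values) - min(values),
    }

summary = describe(monthly_bills)
for name, value in summary.items():
    print(f"{name}: {value:.2f}")
```

In this toy sample the mean is higher than the median because one bill is much larger than the rest, which is the kind of pattern these measures are meant to reveal.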

2. Which types of your data would you expect to be normally distributed? Which would you expect to be non-normal?

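Credit ratings and customers' monthly bills are expected to be approximately normally distributed. A large sample size does not by itself make the data normal; it only makes sample means approximately normal (central limit theorem). Skewed quantities such as call counts or call durations in the CDR may therefore still be non-normal and should be checked.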
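Rather than assuming normality, one quick check is the sample skewness, which is close to zero for symmetric data and positive for a long right tail. The sketch below is illustrative only: the data are invented and `skewness` is a hypothetical helper implementing the usual moment-based formula (third central moment divided by the second central moment raised to the power 1.5).

```python
def skewness(values):
    """Moment-based sample skewness: m3 / m2**1.5."""
    n = len(values)
    mean = sum(values) / n
    m2 = sum((v - mean) ** 2 for v in values) / n
    m3 = sum((v - mean) ** 3 for v in values) / n
    return m3 / m2 ** 1.5

# Hypothetical data: a symmetric sample and bills with one very large value.
symmetric = [1, 2, 3, 4, 5]
monthly_bills = [42.0, 55.5, 38.0, 61.0, 47.5, 120.0, 52.0, 44.0]

print("symmetric skewness:", skewness(symmetric))
print("bill skewness:", round(skewness(monthly_bills), 2))
```

A clearly positive value for a field suggests it should not be treated as normal without a transformation or a closer look.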

3. Why do you have to be careful about just looking for correlations? How does this relate to unstable, scalable systems?

If the independent variables are correlated with one another, the model suffers from multicollinearity: the estimate of one variable's impact on the dependent variable (churn rate), while controlling for the others, becomes less precise. In a regression model this shows up as large standard errors on the correlated independent variables and imprecise coefficient estimates, so the resulting out-of-sample predictions can also be imprecise.
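To make the multicollinearity concern concrete, the sketch below computes the Pearson correlation between two predictors and the corresponding variance inflation factor (VIF). With exactly two predictors, VIF reduces to 1 / (1 - r²). The data and the function names are assumptions made for illustration.

```python
import math

def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length lists."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)

def vif_two_predictors(xs, ys):
    """VIF for the two-predictor case: 1 / (1 - r^2)."""
    r = pearson_r(xs, ys)
    return 1.0 / (1.0 - r * r)

# Hypothetical predictors: monthly bill and call minutes move almost together.
monthly_bill = [30, 45, 50, 62, 80, 95]
call_minutes = [200, 310, 330, 420, 560, 640]

r = pearson_r(monthly_bill, call_minutes)
vif = vif_two_predictors(monthly_bill, call_minutes)
print(f"r = {r:.3f}, VIF = {vif:.1f}")
```

A VIF near 1 means the predictors carry separate information; a large VIF means the coefficient standard errors are inflated, which is the imprecision described above.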

4. Identify two statistical tests from slide 11 that are appropriate to your problem. Explain why

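T-test and Z-test, because we expect the credit-rating and billing data to be approximately normal, without strong skew or outliers. Both tests compare means, for example the mean monthly bill of churned versus retained customers.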
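As an illustration of the t-test in this setting, the sketch below computes Welch's two-sample t statistic and its approximate degrees of freedom for two groups of customers. Welch's variant is an assumption chosen here because it does not require equal variances; the samples are invented. Turning the statistic into a p-value needs the t distribution, for which a library routine such as `scipy.stats.ttest_ind` with `equal_var=False` could be used.

```python
import math
import statistics

def welch_t(sample_a, sample_b):
    """Return (t statistic, Welch-Satterthwaite degrees of freedom)."""
    n_a, n_b = len(sample_a), len(sample_b)
    var_a = statistics.variance(sample_a)  # sample variance (n - 1)
    var_b = statistics.variance(sample_b)
    se2 = var_a / n_a + var_b / n_b        # squared standard error
    t = (statistics.mean(sample_a) - statistics.mean(sample_b)) / math.sqrt(se2)
    df = se2 ** 2 / (
        (var_a / n_a) ** 2 / (n_a - 1) + (var_b / n_b) ** 2 / (n_b - 1)
    )
    return t, df

# Hypothetical monthly bills for churned and retained customers.
churned = [70, 82, 65, 90, 78]
retained = [50, 55, 48, 60, 52]

t_stat, dof = welch_t(churned, retained)
print(f"t = {t_stat:.2f}, df = {dof:.2f}")
```

A large absolute t value relative to its degrees of freedom indicates that the difference in mean bills is unlikely to be due to chance alone.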

5. Identify two basic data visualizations from slide 11 that are appropriate to your problem. Explain why:

Clusters - To group the customers by their credit ratings, financial status or billing data, and then analyze the churn rate of each cluster.

Histogram - To display how customers are distributed across billing amounts or credit ratings, and to compare the churn rate across those bins.
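A histogram is just a count of observations per bin, so it can be sketched without a plotting library. The example below bins made-up monthly bills into fixed-width bins and prints a text histogram; the `histogram` helper and the bin width of 20 are illustrative assumptions.

```python
from collections import Counter

def histogram(values, bin_width):
    """Count values per bin; each key is the lower edge of its bin."""
    counts = Counter((v // bin_width) * bin_width for v in values)
    return dict(sorted(counts.items()))

# Hypothetical monthly bills, rounded to whole currency units.
bills = [42, 55, 38, 61, 47, 120, 52, 44]

bins = histogram(bills, 20)
for lower_edge, count in bins.items():
    print(f"{lower_edge:>4}-{lower_edge + 19:<4} {'#' * count}")
```

Running the same binning separately for churned and retained customers gives two histograms that can be compared side by side.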

6. Select two basic statistical tools and compare. Explain why you will pick one of them for your problem

Regression - To model relationship between churn rate (dependent variable) and credit ratings, billing data and CDR (Independent variables) to predict the churn rate for the population based on the credit ratings, billing data and CDR.

ANOVA - To test whether the churn rate differs across groups defined by credit ratings, billing data or CDR.

I would pick regression, as we have already identified the independent variables - credit ratings, billing data and CDR. We just need to build the regression model between the churn rate (dependent variable) and credit ratings, billing data and CDR (Independent variables).

If we have to determine how significant an independent variable is in determining the churn rate, we may use ANOVA.
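The two tools compared above can be sketched side by side. The code below fits a simple least-squares line of churn rate on average monthly bill and computes a one-way ANOVA F statistic for churn rates grouped by credit band. All numbers are invented for illustration, the function names are hypothetical, and a single-predictor line is a simplification of the multi-variable model described in the answer.

```python
def fit_line(xs, ys):
    """Ordinary least squares for y = intercept + slope * x."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x

def anova_f(groups):
    """One-way ANOVA F statistic for a list of groups of observations."""
    all_values = [v for g in groups for v in g]
    grand_mean = sum(all_values) / len(all_values)
    ss_between = sum(len(g) * (sum(g) / len(g) - grand_mean) ** 2 for g in groups)
    ss_within = sum((v - sum(g) / len(g)) ** 2 for g in groups for v in g)
    df_between = len(groups) - 1
    df_within = len(all_values) - len(groups)
    return (ss_between / df_between) / (ss_within / df_within)

# Hypothetical segment-level data: average bill vs. observed churn rate.
avg_bill = [30, 40, 50, 60, 70]
churn_rate = [0.05, 0.07, 0.09, 0.11, 0.13]
slope, intercept = fit_line(avg_bill, churn_rate)
print(f"churn_rate ~ {intercept:.3f} + {slope:.4f} * avg_bill")

# Hypothetical churn rates per region, grouped by credit band.
low_credit = [0.12, 0.14, 0.13]
mid_credit = [0.09, 0.08, 0.10]
high_credit = [0.05, 0.06, 0.04]
f_stat = anova_f([low_credit, mid_credit, high_credit])
print(f"F = {f_stat:.1f}")
```

The regression yields a predictive equation, while the F statistic only indicates whether the group means differ, which mirrors the reasoning for preferring regression when prediction is the goal.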
