


Question

Please answer these questions.

Thank you in advance.

1. Describe one advantage and one disadvantage of a stem and leaf plot with respect to a standard histogram.

2. How might you address the problem that a histogram depends on the number and location of the bins?

3. Describe the types of situations that produce sparse or dense data cubes. Illustrate with examples other than those used in the book.

4. Discuss the differences between dimensionality reduction based on aggregation and dimensionality reduction based on techniques such as PCA and SVD.

Explanation / Answer

Answers:

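1) One advantage and one disadvantage:

The advantage is that a stem and leaf plot shows the actual data values, so the distribution can be read without losing the individual observations to binning.

The disadvantage is that it becomes unwieldy for a large number of values.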
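To make the trade-off concrete, here is a minimal sketch in Python. The sample values and the helper names `stem_and_leaf` and `histogram_counts` are made up for illustration, and the stem is assumed to be the tens digit of a non-negative integer.

```python
from collections import defaultdict

def stem_and_leaf(values):
    """Group non-negative integers by stem (tens) and list their leaves (units)."""
    rows = defaultdict(list)
    for v in sorted(values):
        rows[v // 10].append(v % 10)
    return dict(sorted(rows.items()))

def histogram_counts(values, width=10):
    """Count values per bin, keyed by the lower edge of each bin."""
    counts = defaultdict(int)
    for v in values:
        counts[(v // width) * width] += 1
    return dict(sorted(counts.items()))

scores = [12, 15, 21, 21, 27, 33, 34, 38, 38, 45]  # made-up sample values

plot = stem_and_leaf(scores)
for stem, leaves in plot.items():
    print(f"{stem} | {' '.join(str(leaf) for leaf in leaves)}")
# 1 | 2 5
# 2 | 1 1 7
# 3 | 3 4 8 8
# 4 | 5

hist = histogram_counts(scores)
print(hist)  # {10: 2, 20: 3, 30: 4, 40: 1}

# The plot keeps every value; the histogram keeps only the counts.
recovered = [stem * 10 + leaf for stem, leaves in plot.items() for leaf in leaves]
```

Both views show the same shape, but only the stem and leaf plot lets you recover the original values. With thousands of values each row would run off the page, which is the disadvantage.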

2) Problem of bins in histogram:

The best approach is to estimate what the actual distribution function of the data looks like using kernel density estimation. This branch of data analysis is well developed, and it is the more appropriate choice when the widely available but simplistic histogram is not sufficient.
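The sketch below illustrates the idea in Python, using only the standard library. The sample values, the bandwidth of 0.5, and the helper names are assumptions chosen for illustration. It first shows the same data producing two different histogram shapes when only the bin origin moves, then builds a Gaussian kernel density estimate, which has no bins at all.

```python
import math

def histogram(sample, origin, width):
    """Counts per occupied bin, in bin order (empty bins are skipped)."""
    counts = {}
    for x in sample:
        k = math.floor((x - origin) / width)
        counts[k] = counts.get(k, 0) + 1
    return [counts[k] for k in sorted(counts)]

def gaussian_kde(sample, bandwidth):
    """Return a function that estimates the density with Gaussian kernels."""
    norm = 1.0 / (len(sample) * bandwidth * math.sqrt(2.0 * math.pi))
    def density(x):
        return norm * sum(
            math.exp(-0.5 * ((x - xi) / bandwidth) ** 2) for xi in sample
        )
    return density

sample = [1.9, 2.1, 2.9, 3.1, 3.9, 4.1]  # made-up sample values

# Same data, same bin width, different bin origin: the shape changes.
bins_at_0 = histogram(sample, origin=0.0, width=1.0)
bins_at_half = histogram(sample, origin=0.5, width=1.0)
print(bins_at_0)     # [1, 2, 2, 1]
print(bins_at_half)  # [2, 2, 2]

# The kernel estimate is a smooth curve with no bin edges to place.
density = gaussian_kde(sample, bandwidth=0.5)
step = 0.01
area = sum(density(-5.0 + i * step) * step for i in range(1600))
print(round(area, 3))  # close to 1, as a density should be
```

Note that kernel density estimation still has a tuning parameter, the bandwidth, which plays a role similar to the bin width. What it removes is the dependence on where the bins are placed.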

3) Sparse and dense data cubes:

Any data set in which most combinations of attribute values are unlikely to occur produces a sparse data cube. This includes sets of continuous attributes where the objects described by the attributes don't occupy the entire data space.

A dense data cube tends to arise when either almost all combinations of the categories of the underlying attributes occur, or the level of aggregation is high enough that all combinations are likely to have values. For example, consider a data set that records the type of each traffic accident along with its date and location. The original data cube would be very sparse, but if it is aggregated into categories consisting of single or multiple car accidents, the state where the accident happened, and the month in which it occurred, then we would obtain a dense data cube.
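A small Python sketch of the traffic accident example follows. The eight records, the city and accident-type names, and the helper `cube_density` are all hypothetical. Density is measured here as occupied cells divided by the product of the number of observed values per dimension, which is only one reasonable way to define it.

```python
from math import prod

# Hypothetical records: (date, city, state, accident type)
records = [
    ("2023-01-03", "Austin",  "TX", "rollover"),
    ("2023-01-09", "Dallas",  "TX", "rear-end"),
    ("2023-01-14", "Fresno",  "CA", "fixed-object"),
    ("2023-01-20", "Oakland", "CA", "side-impact"),
    ("2023-02-02", "Dallas",  "TX", "fixed-object"),
    ("2023-02-11", "Austin",  "TX", "side-impact"),
    ("2023-02-17", "Oakland", "CA", "rollover"),
    ("2023-02-25", "Fresno",  "CA", "rear-end"),
]

# Assumed mapping from detailed accident type to single or multiple car.
SINGLE_CAR = {"rollover", "fixed-object"}

def cube_density(cells):
    """Fraction of cube cells that hold at least one record."""
    occupied = set(cells)
    sizes = [len({cell[i] for cell in occupied}) for i in range(len(cells[0]))]
    return len(occupied) / prod(sizes)

# Fine-grained cube: date x city x accident type.
fine = [(date, city, kind) for date, city, _, kind in records]

# Aggregated cube: month x state x single/multiple.
coarse = [
    (date[:7], state, "single" if kind in SINGLE_CAR else "multiple")
    for date, _, state, kind in records
]

fine_density = cube_density(fine)
coarse_density = cube_density(coarse)
print(fine_density)    # 0.0625 (8 of 8 * 4 * 4 = 128 cells)
print(coarse_density)  # 1.0    (8 of 2 * 2 * 2 = 8 cells)
```

The fine-grained figure is optimistic, because it only counts dates that actually appear in the data. Counting every calendar day would make the original cube sparser still.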

4) Differences:

Dimensionality reduction with PCA or SVD can be viewed as a projection of the data onto a reduced set of dimensions, whereas in aggregation, groups of dimensions are combined. In some cases, such as when days are aggregated into months or the sales of a product are aggregated by store location, the aggregation can be viewed as a change of scale. The dimensionality reduction provided by PCA or SVD has no such interpretation.
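The contrast can be sketched in Python with two attributes reduced to one. The sales figures and the function name `first_principal_component` are made up for illustration, and the closed-form angle used for the principal axis only works for two attributes. A real analysis would use a linear algebra library.

```python
import math

# Hypothetical weekly sales of one product at two stores in the same city.
sales = [(2, 1), (4, 3), (6, 3), (8, 5), (10, 8)]

# Aggregation: the two store attributes are combined into one city total.
# The weights (1, 1) are fixed in advance and the result is still "sales".
city_total = [a + b for a, b in sales]

def first_principal_component(rows):
    """Project centered 2-D data onto its direction of greatest variance."""
    n = len(rows)
    mean_x = sum(x for x, _ in rows) / n
    mean_y = sum(y for _, y in rows) / n
    centered = [(x - mean_x, y - mean_y) for x, y in rows]
    var_x = sum(x * x for x, _ in centered) / (n - 1)
    var_y = sum(y * y for _, y in centered) / (n - 1)
    cov_xy = sum(x * y for x, y in centered) / (n - 1)
    # Principal axis angle of a symmetric 2x2 covariance matrix.
    theta = 0.5 * math.atan2(2.0 * cov_xy, var_x - var_y)
    direction = (math.cos(theta), math.sin(theta))
    scores = [x * direction[0] + y * direction[1] for x, y in centered]
    return direction, scores

# PCA: the weights are learned from the data, and the new attribute is a
# mix of both stores with no natural unit or scale interpretation.
direction, scores = first_principal_component(sales)

print("city totals:", city_total)  # [3, 7, 9, 13, 18]
print("PCA weights:", direction)
print("PCA scores:", scores)
```

The city total can be read as sales at a coarser scale. The PCA scores are centered, data-dependent combinations of the two stores, so they capture the most variance but don't correspond to any coarser level of the original attributes.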