Question
From your everyday life, select an interface that you would argue uses a good representation of its underlying content. Describe the connections between the representation and the underlying content; in what ways does the representation exemplify at least two criteria of a good representation? (150 minimum)

Then, select an interface that you would argue does not use a good representation of its underlying content. Describe the mismatch between the representation and the underlying content; in what ways does the representation violate at least three criteria of a good representation? (150 minimum)
Explanation / Answer
Learning data representations makes it easier to extract the information needed to build classifiers and other predictors. In probabilistic terms, a good representation captures the posterior distribution of the underlying explanatory factors for the observed input. A good representation is also one that is useful as input to a supervised predictor.
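As a concrete illustration of that last point, the following sketch learns an unsupervised representation of the input and then trains a supervised predictor on top of it. The choice of scikit-learn, the digits dataset, and PCA as the representation learner are assumptions made only for this example.

```python
# Sketch: an unsupervised representation used as input to a supervised predictor.
# (scikit-learn, the digits dataset, and PCA are illustrative choices, not
# something the answer above prescribes.)
from sklearn.datasets import load_digits
from sklearn.decomposition import PCA
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline

X, y = load_digits(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# Unsupervised step: PCA learns a lower-dimensional representation of X.
# Supervised step: a linear classifier predicts Y from that representation.
model = make_pipeline(PCA(n_components=32), LogisticRegression(max_iter=1000))
model.fit(X_train, y_train)
print("test accuracy:", model.score(X_test, y_test))
```

Swapping PCA for a richer learned representation changes only the first pipeline step; the supervised predictor stays the same, which is the point of treating the representation as its input.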
The rapid increase in scientific activity on representation learning has been accompanied and nourished by a remarkable string of empirical successes both in academia and in industry.
The criteria of a good representation are as follows:
Smoothness: assumes the function to be learned, f, is such that x ≈ y generally implies f(x) ≈ f(y).
Multiple explanatory factors: the data generating distribution is generated by different underlying factors, and for the most part what one learns about one factor generalizes in many configurations of the other factors.
A hierarchical organization of explanatory factors: the concepts that are useful for describing the world around us can be defined in terms of other concepts, in a hierarchy, with more abstract concepts higher in the hierarchy, defined in terms of less abstract ones.
Semi-supervised learning: with inputs X and target Y to predict, a subset of the factors explaining X’s distribution explain much of Y, given X. Hence representations that are useful for P(X) tend to be useful when learning P(Y | X), allowing sharing of statistical strength between the unsupervised and supervised learning tasks.
Shared factors across tasks: with many Y ’s of interest or many learning tasks in general, tasks are explained by factors that are shared with other tasks, allowing sharing of statistical strengths across tasks.
Manifolds: probability mass concentrates near regions that have a much smaller dimensionality than the original space where the data lives. This is explicitly exploited by auto-encoder algorithms and other manifold-inspired methods; a minimal autoencoder sketch follows this list.
Natural clustering: different values of categorical variables such as object classes are associated with separate manifolds.
Temporal and spatial coherence: consecutive or spatially nearby observations tend to be associated with the same value of relevant categorical concepts, or result in a small move on the surface of the high-density manifold.
Sparsity: for any given observation x, only a small fraction of the possible factors are relevant. In terms of representation, this could be represented by features that are often zero (as initially proposed by Olshausen and Field (1996)), or by the fact that most of the extracted features are insensitive to small variations of x.
Simplicity of Factor Dependencies: in good high-level representations, the factors are related to each other through simple, typically linear dependencies. This can be seen in many laws of physics, and is assumed when plugging a linear predictor on top of a learned representation; a sparse-coding sketch after this list illustrates this together with the sparsity criterion.
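To make the manifold criterion more concrete, here is a minimal autoencoder sketch. The answer only notes that auto-encoder algorithms exploit the manifold idea; the use of PyTorch, the digits data, and a 2-dimensional code are assumptions made for illustration.

```python
# Sketch of the "Manifolds" criterion: an under-complete autoencoder is forced
# to describe 64-dimensional inputs through a 2-dimensional code, i.e. to find
# coordinates on a low-dimensional manifold near which the data concentrates.
# (PyTorch, the digits data, and the code size are illustrative assumptions.)
import torch
from torch import nn
from sklearn.datasets import load_digits

X = torch.tensor(load_digits().data, dtype=torch.float32) / 16.0  # scale to [0, 1]

encoder = nn.Sequential(nn.Linear(64, 32), nn.ReLU(), nn.Linear(32, 2))
decoder = nn.Sequential(nn.Linear(2, 32), nn.ReLU(), nn.Linear(32, 64))
opt = torch.optim.Adam(list(encoder.parameters()) + list(decoder.parameters()), lr=1e-3)

for step in range(1000):
    opt.zero_grad()
    code = encoder(X)       # 2-D representation (manifold coordinates)
    recon = decoder(code)   # reconstruction from the code
    loss = nn.functional.mse_loss(recon, X)
    loss.backward()
    opt.step()

print("final reconstruction error:", loss.item())
```

If the data really concentrated near a 2-dimensional surface, the reconstruction error would stay low even with such a narrow bottleneck; widening the code trades compression for fidelity.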
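Finally, a sketch tying together the sparsity criterion and the simplicity-of-factor-dependencies criterion: dictionary learning in the spirit of Olshausen and Field (1996) produces codes that are mostly zero, and a plain linear classifier trained on those codes plays the role of the "linear predictor on top of a learned representation." The libraries and dataset are again my own assumptions.

```python
# Sketch of two criteria at once (library and data choices are assumptions):
# "Sparsity" -- most learned features are exactly zero for any given input --
# and "Simplicity of Factor Dependencies" -- a plain linear predictor works
# directly on top of the learned representation.
import numpy as np
from sklearn.datasets import load_digits
from sklearn.decomposition import MiniBatchDictionaryLearning
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

X, y = load_digits(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# Learn an overcomplete dictionary; the sparse coder keeps only a few active
# atoms per input, in the spirit of Olshausen and Field (1996).
dico = MiniBatchDictionaryLearning(n_components=128, alpha=1.0, random_state=0)
Z_train = dico.fit_transform(X_train)
Z_test = dico.transform(X_test)

print("fraction of zero features:", np.mean(Z_train == 0))  # typically large

# Linear predictor plugged directly on top of the sparse codes.
clf = LogisticRegression(max_iter=1000).fit(Z_train, y_train)
print("test accuracy:", clf.score(Z_test, y_test))
```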