When finding a maximum likelihood value, we often use the log-likelihood instead of the likelihood itself. Why?
Question
When finding a maximum likelihood value, we often use the log-likelihood instead of the likelihood itself. Why? The probability density for any given Y value, say y_1, would be given by f(y_1; theta). If my y's are all independent, the joint probability of observing all of them is the product

∏_{i=1}^{N} f(y_i; theta)

Consider two things: (1) if the f(y_i; theta) values tend to be smaller than 1, what would happen if I multiply all N of them together? (2) what happens when I take the natural logarithm of a product? How will that help to fix the problem in (1)?

Explanation / Answer
Because the logarithm is a monotonically increasing function, the logarithm of a function achieves its maximum at the same points as the function itself; hence the log-likelihood can be used in place of the likelihood in maximum likelihood estimation and related techniques. Finding the maximum of a function often involves taking its derivative and solving for the parameter being estimated, and this is often easier when the function being maximized is a log-likelihood rather than the original likelihood.
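To make the underflow point in the question concrete, here is a minimal Python sketch. The standard normal density and the particular sample values are illustrative assumptions, not part of the original question:

```python
import math

# Standard normal density; each value is well below 1 for typical y.
def normal_pdf(y, mu=0.0, sigma=1.0):
    z = (y - mu) / sigma
    return math.exp(-0.5 * z * z) / (sigma * math.sqrt(2.0 * math.pi))

# N = 2000 independent observations (illustrative values between 0.5 and 2.5).
ys = [0.5 + 0.001 * i for i in range(2000)]

# (1) Direct product of N densities: every factor is < 1, so the product
# shrinks geometrically and underflows to 0.0 in double precision.
likelihood = 1.0
for y in ys:
    likelihood *= normal_pdf(y)
print(likelihood)       # 0.0 -- underflow

# (2) Sum of log-densities: the log turns the product into a sum, which
# stays in a representable range and has its maximum at the same theta.
log_likelihood = sum(math.log(normal_pdf(y)) for y in ys)
print(log_likelihood)   # a finite negative number, roughly -4.4e3 here
```

The true likelihood (about e^-4400 in this sketch) is far below the smallest representable double, so the direct product is numerically useless, while the log-likelihood remains perfectly well behaved.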
For example, many likelihood functions involve parameters that explain a collection of statistically independent observations. In such a situation, the likelihood function factors into a product of individual likelihood functions. The logarithm of this product is a sum of individual logarithms, and the derivative of a sum of terms is often easier to compute than the derivative of a product. In addition, several common distributions have likelihood functions that are products of factors involving exponentials. The logarithm of such a function is a sum of the exponents, again easier to differentiate than the original function, as the worked example below shows.
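As a concrete instance of the product-of-exponentials case, the following worked derivation (assuming N i.i.d. normal observations with known variance sigma^2, an illustrative choice) shows the product collapsing into a sum whose derivative is simple:

```latex
L(\mu) = \prod_{i=1}^{N} \frac{1}{\sqrt{2\pi}\,\sigma}
         \exp\!\left( -\frac{(y_i - \mu)^2}{2\sigma^2} \right)

\log L(\mu) = -\frac{N}{2}\log\!\left(2\pi\sigma^2\right)
              - \frac{1}{2\sigma^2} \sum_{i=1}^{N} (y_i - \mu)^2

\frac{d}{d\mu} \log L(\mu)
  = \frac{1}{\sigma^2} \sum_{i=1}^{N} (y_i - \mu) = 0
  \quad\Longrightarrow\quad
  \hat{\mu} = \frac{1}{N} \sum_{i=1}^{N} y_i
```

Differentiating the original product directly would require the product rule across all N factors; taking the log first reduces the problem to term-by-term differentiation of a sum.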