


Question

I'm learning about HMMs and their applications and trying to understand how they are used. My knowledge is a bit spotty, so please correct any incorrect assumptions I'm making. The specific example I'm wondering about is using HMMs for speech recognition, which is a common example in the literature.

The basic method seems to be to treat the incoming sounds (after processing) as observations, while the actual words being spoken are the hidden states of the process. It seems obvious that the hidden variables here are not independent, but I do not understand how they satisfy the Markov property. I would imagine that the probability of the Nth word depends not just on the (N-1)th word, but on many preceding words before that.

Is this simply accepted as a simplifying assumption because HMMs happen to model speech recognition problems well, or am I misunderstanding what the states and hidden variables in the process are? The same problem would appear to apply to a great many applications in which HMMs are quite popular, such as POS tagging.

Explanation / Answer

Using Markov models to represent language knowledge was controversial. Linguists knew that no natural language could be represented even by a context-free grammar, much less by a finite-state grammar. Similarly, artificial intelligence experts were even more doubtful that a model as simple as a Markov process would be useful for representing the higher-level knowledge sources recommended in the Newell report.

However, there is a fundamental difference between assuming that language itself is a Markov process and modeling language as a probabilistic function of a hidden Markov process. The latter model is an approximation method that does not make an assumption about language, but rather gives the designer a prescription for choosing what to represent in the hidden process. The defining property of a Markov process is that, given the current state, the probabilities of future events are independent of any additional information about the past history of the process. This property means that if there is any information in the past history of the observed process (such as the observed words and sub-word units), then the designer should encode that information with distinct states in the hidden process. It turned out that each of the levels of the Newell hierarchy could be represented as a probabilistic function of a hidden Markov process to a reasonable level of approximation.
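To make that last point concrete, here is a minimal sketch in Python of the standard state-expansion trick (the vocabulary and probabilities are hypothetical toy values, not taken from any real model): if the next word depends on the last two words rather than just one, the designer can make each hidden state a *pair* of words, and the expanded chain satisfies the Markov property by construction.

import itertools

words = ["the", "dog", "barks"]

# Hypothetical second-order model: P(next | prev2, prev1).
second_order = {
    ("the", "dog"): {"barks": 0.9, "the": 0.1},
    # ... (other contexts would be filled in the same way)
}

# Expanded state space: every ordered pair of words is one hidden state.
states = list(itertools.product(words, repeat=2))

def first_order_transition(state, next_state):
    """P(next_state | state) for the expanded chain.

    A transition (w1, w2) -> (w2, w3) inherits the second-order
    probability P(w3 | w1, w2); any pair of states that does not
    "shift" correctly has probability zero.
    """
    (w1, w2), (w2b, w3) = state, next_state
    if w2 != w2b:
        return 0.0
    return second_order.get((w1, w2), {}).get(w3, 0.0)

# The expanded chain needs only its current state to predict the next,
# yet it captures a two-word history.
print(first_order_transition(("the", "dog"), ("dog", "barks")))  # 0.9

The same idea extends to longer histories at the cost of an exponentially larger state space, which is exactly the trade-off behind the "prescription to the designer" described above: any past information that matters gets folded into the definition of the hidden state itself.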
