4. (6 marks) Tagging (a) (2 marks) Part of speech tagging is a sequence labellin
ID: 3906189 • Letter: 4
Question
4. (6 marks) Tagging (a) (2 marks) Part of speech tagging is a sequence labelling task. Explain what the key differences are between sequence labelling and classification (b) (2 marks) One can use standard classification in order to perform sequence labelling such as part of speech tagging by defining feature extractors that consider the context of the words. Explain what sort of context could be useful for part of speech tagging. Your explanation does not need to include Python code. (c) (2 marks) Explain the concept of backoff for the task of part of speech tagging and why it could be useful. Your explanation must cover the idea of data sparsenessExplanation / Answer
what is sequence labeling? It's the meta problem that we face all the time in NLP tasks where we want to assign a single label to each element in a sequence. For us, a sequence is usually a sentence and a word is an element. The elements we are trying to assign are usually things like parts of speech, syntactic chunk labels (is this part of a noun phrase, verb phrase, etc.), named entity labels (is this a person?) and so on. Information extraction systems (i.e., extracting meeting times and locations from emails) can also be treated as sequence labeling problems.
There are roughly two varieties of sequence labeling: (1) raw labeling and (2) joint segmentation and labeling. The latter is (IMO) more common. Raw labeling is something like POS tagging where each element gets a single tag. Joint segmentation and labeling is where whole segments get the same label. For instance, in named entity recognition, a sentence like "Yesterday , George Bush gave a speech ." contains example one named entity ("George Bush"). Here, we want to assign the label "PERSON" to the entire phrase "George Bush", not to individual words (for instance, if two named abut, we need to know where they separate).
The easiest way to deal with segmentation problems is to transform them into raw labeling problems. The standard way to do this is the "BIO" encoding, where we label each word by "B-X", "I-X" or "O". Here, "B-X" means "begin a phrase of type X", "I-X" means "continue a phrase of type X" and "O" means "not in a phrase." For instance, the Bush sentence would be labeled as: "Yesterday/O ,/O George/B-PER Bush/I-PER gave/O a/O speech/O ./O" Once can now treat this as a raw labeling problem, perhaps being careful to avoid producing impossible sequences at test time (eg., an "I-X" can only follow a "B-X" or another "I-X"). Other encodings are also possible. So for now, I'll concentrate on raw labeling and revisit the true joint segmentation problem at the end.
In raw labeling, we're faced with the problem of assigning a single label to each word in a sequence. Perhaps the easiest approach is to predict each label independently. Then, we just have a collection of multiclass classification tasks (each label is a different class). We can use whatever classifier we want for this. Despite it's simplicity, this approach is actually quite effective for many problems.
Related Questions
drjack9650@gmail.com
Navigate
Integrity-first tutoring: explanations and feedback only — we do not complete graded work. Learn more.