Basic tasks
Analysis
-
Task: convert input in the form of a sentence or an entire discourse into an internal representation that facilitates further processing
-
What sort of output representation
-
Verbatim input
-
Bag of words
-
Bag of lexical representations
-
Morphosyntactic + lexical representation (tree, DAG/feature structure, dependency graph)
-
Non-linguistic semantic/pragmatic representation (predicate calculus, Discourse Representation Theory)
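To make the difference between the first two representations concrete, here is a minimal sketch in Python. The function name `bag_of_words` and the simple regular-expression tokenizer are assumptions made for illustration; real systems use more careful tokenization.

```python
import re
from collections import Counter


def bag_of_words(text: str) -> Counter:
    """Lowercase the text, split it into word tokens, and count each token."""
    tokens = re.findall(r"[a-z']+", text.lower())
    return Counter(tokens)


sentence = "The dog chased the cat"
verbatim = sentence            # verbatim input: the string itself, order intact
bag = bag_of_words(sentence)   # bag of words: counts only, word order discarded

print(bag["the"])              # "the" occurs twice
# The reversed sentence gets the same bag, although it means something different.
print(bag == bag_of_words("The cat chased the dog"))
```

Because word order is thrown away, the bag cannot distinguish who chased whom. The structured representations further down the list are meant to recover exactly that.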
-
How is it done?
-
Parsers
-
Grammars and automata: finite-state automata (regular expressions), context-free grammars, dependency grammars
-
Co-occurrence statistics
-
Neural networks (simple recurrent networks, self-organizing maps, holographic reduced representations)
-
What's hard
-
Ambiguity
-
Multiple knowledge sources, including statistical and (apparently) linguistic/symbolic
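Structural ambiguity can be illustrated with a small chart parser. The sketch below is a CKY-style recognizer that counts the distinct parse trees of a sentence. The grammar, the lexicon, and the name `count_parses` are toy assumptions invented for this example, not taken from any particular system.

```python
from collections import defaultdict

# Toy grammar in Chomsky normal form: (left child, right child) -> parents.
BINARY = {
    ("NP", "VP"): ["S"],
    ("V", "NP"): ["VP"],
    ("VP", "PP"): ["VP"],   # PP attaches to the verb phrase
    ("NP", "PP"): ["NP"],   # PP attaches to the noun phrase
    ("Det", "N"): ["NP"],
    ("P", "NP"): ["PP"],
}
LEXICON = {
    "I": ["NP"], "saw": ["V"], "the": ["Det"],
    "man": ["N"], "telescope": ["N"], "with": ["P"],
}


def count_parses(words, goal="S"):
    """Return the number of distinct parse trees rooted in `goal`."""
    n = len(words)
    # chart[(i, j)][label] = number of trees with that label over words[i:j]
    chart = defaultdict(lambda: defaultdict(int))
    for i, word in enumerate(words):
        for label in LEXICON.get(word, []):
            chart[(i, i + 1)][label] += 1
    for width in range(2, n + 1):
        for i in range(n - width + 1):
            j = i + width
            for k in range(i + 1, j):          # split point
                for left, lcount in list(chart[(i, k)].items()):
                    for right, rcount in list(chart[(k, j)].items()):
                        for parent in BINARY.get((left, right), []):
                            chart[(i, j)][parent] += lcount * rcount
    return chart[(0, n)][goal]


print(count_parses("I saw the man with the telescope".split()))  # 2
print(count_parses("I saw the man".split()))                      # 1
```

The two parses of the first sentence correspond to the two attachment sites of "with the telescope": the seeing was done with the telescope, or the man has the telescope. The grammar alone cannot choose between them, which is one reason other knowledge sources (statistical, lexical, contextual) are needed.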
Generation
-
Task: convert a non-linguistic representation into a sentence or an entire discourse, or summarize a text
-
How is it done?
-
Incremental or pre-specified
-
Grammar-driven (top-down) or lexically-driven (bottom-up)
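As a rough illustration of the grammar-driven (top-down) option, the sketch below realizes a small predicate-argument frame by expanding S -> NP VP, VP -> V NP, and NP -> Det N from the top. The frame format, the lexicon, and the `realize_*` function names are hypothetical and chosen only for this example.

```python
# Hypothetical non-linguistic input: a predicate with two arguments.
frame = {"predicate": "chase", "agent": "dog", "patient": "cat"}

# Hypothetical lexicon mapping concepts to English word forms.
LEXICON = {"chase": "chases", "dog": "dog", "cat": "cat"}


def realize_np(concept):
    """NP -> Det N, with a fixed definite determiner."""
    return ["the", LEXICON[concept]]


def realize_vp(frame):
    """VP -> V NP: the verb comes from the predicate, the object from the patient."""
    return [LEXICON[frame["predicate"]]] + realize_np(frame["patient"])


def realize_s(frame):
    """S -> NP VP: the subject comes from the agent."""
    return realize_np(frame["agent"]) + realize_vp(frame)


print(" ".join(realize_s(frame)))  # the dog chases the cat
```

Here the grammar rules decide the order of expansion and the lexicon is consulted last. A lexically-driven (bottom-up) generator would instead start from the words chosen for each concept and let them determine which structures are possible.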
-
What's hard
-
Planning on the basis of a pragmatic goal and what the hearer/reader knows
-
Sometimes there is no obvious correspondence (or only a one-to-many correspondence) between an input concept and linguistic constructs.
Translation
-
Task: convert a sentence or an entire discourse in a source language into a corresponding sentence (or discourse) in a target language
-
How is it done?
-
Symbolic translation with varying depths of analysis
-
Statistical translation: given the sequence of words in a source sentence, generate the most likely sequence of words in the target language, based on estimates of two probabilities: the probability of the source sentence given the target sentence, and the probability of the target sentence
-
Word-based and phrase-based approaches
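The statistical approach described above amounts to choosing the target sentence t that maximizes P(s | t) · P(t) for a source sentence s. The sketch below applies that decision rule to a handful of candidates. All probabilities, the example sentences, and the names `TRANSLATION_MODEL`, `LANGUAGE_MODEL`, and `decode` are made-up assumptions for illustration. Real systems estimate these quantities from corpora and search a far larger candidate space.

```python
# Hypothetical toy estimates, for illustration only.
# P(source | target): how well the target sentence explains the source words.
TRANSLATION_MODEL = {
    ("la maison bleue", "the blue house"): 0.4,
    ("la maison bleue", "the house blue"): 0.5,
}
# P(target): how plausible the target sentence is as English.
LANGUAGE_MODEL = {
    "the blue house": 0.02,
    "the house blue": 0.001,
}


def score(source, target):
    """Product of the two estimates: P(source | target) * P(target)."""
    return (TRANSLATION_MODEL.get((source, target), 0.0)
            * LANGUAGE_MODEL.get(target, 0.0))


def decode(source, candidates):
    """Pick the candidate target sentence with the highest score."""
    return max(candidates, key=lambda target: score(source, target))


candidates = ["the blue house", "the house blue"]
print(decode("la maison bleue", candidates))  # the blue house
```

In this toy setup the word-for-word candidate has the higher translation-model score, but the language model strongly prefers the fluent word order, so the product favors "the blue house".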
-
What's hard
-
Ambiguity
-
Lexical/grammatical gaps