B651: Readings

'*' means that the paper, or some portion of it, is appropriate for a class project.

If links through the IU Library to journals don't work, try this link to the generic online journals page.

Speech

[PS] Dutoit, T. (1997). High-quality text-to-speech synthesis: an overview. Journal of Electrical and Electronics Engineering, Australia 17, 25-37.

Morphology

[PDF] Clark, A. (2002). Memory-based learning of morphology with stochastic transducers. Annual Meeting of the Association for Computational Linguistics, 40, 513-520.

[PDF] Cohen-Sygal, Y., Wintner, S. (2006). Finite-state registered automata for non-concatenative morphology. Computational Linguistics, 32, 49-82.

[PDF] *Goldsmith, J. (2001). Unsupervised learning of the morphology of a natural language. Computational Linguistics, 27, 153-198.

[PDF] Mohri, M. (2005). Statistical natural language processing. In M. Lothaire (Ed.), Applied combinatorics on words. Cambridge University Press.

[PDF] Mohri, M., Pereira, F. C. N., and Riley, M. (1996). Weighted automata in text and speech processing. Biennial European Conference on Artificial Intelligence, 12.

[PDF] Amtrup, J. W. (2003). Morphology in machine translation systems: Efficient integration of finite state transducers and feature structure descriptions. Machine Translation, 18, 213-235.

Words in context

[PDF] *McCarthy, D., Koeling, R., Weeds, J. and Carroll, J. (2007). Unsupervised acquisition of predominant word senses. Computational Linguistics, 33, 553-590.

[PDF] *Padó, S. and Lapata, M. (2007). Dependency-based construction of semantic space models. Computational Linguistics, 33, 161-199.

[PDF] Schütze, H. (1998). Automatic word sense discrimination. Computational Linguistics, 24, 97-123.

[PDF] Stevenson, M. and Wilks, Y. (2001). The interaction of knowledge sources in word sense disambiguation. Computational Linguistics, 27, 321-349.

[PDF] *Tsang, V. and Stevenson, S. (2010). A graph-theoretic framework for semantic distance. Computational Linguistics, 36, 31-69.

[PDF] *Turney, P. D. (2006). Similarity of semantic relations. Computational Linguistics, 32, 379-416.

[PDF] *Yuret, D. and Yatbaz, M. A. (2010). The noisy channel model for unsupervised word sense disambiguation. Computational Linguistics, 36, 111-127.

Syntax and semantics

[PDF] *Erk, K., Padó, S., and Padó, U. (2010). A flexible, corpus-driven model of regular and inverse selectional preferences. Computational Linguistics, 36, 723-763.

[PDF] Gamallo, P., Agustini, A., and Lopes, G. P. (2005). Clustering syntactic positions with similar semantic requirements. Computational Linguistics, 31, 107-146.

[PDF] *Gildea, D. and Jurafsky, D. (2002). Automatic labeling of semantic roles. Computational Linguistics, 28, 245-288.

[PDF] *Lapata, M. and Brew, C. (2004). Verb class disambiguation using informative priors. Computational Linguistics, 30, 45-73.

[PDF] Nivre, J., Hall, J., Nilsson, J., Chanev, A., Eryigit, G., Kübler, S., Marinov, S. and Marsi, E. (2007). MaltParser: A language-independent system for data-driven dependency parsing. Natural Language Engineering, 13(2), 95-135.

[PDF] Schuler, W., AbdelRahman, S., Miller, T., and Schwartz, L. (2010). Broad-coverage parsing using human-like memory constraints. Computational Linguistics, 36, 1-30.

[PDF] *Zhang, Y. and Clark, S. (2011). Syntactic processing using the generalized perceptron and beam search. Computational Linguistics, 37, 105-151.

Generation

[PDF] *Clarke, J. and Lapata, M. (2010). Discourse constraints for document compression. Computational Linguistics, 36, 411-441.

[PDF] Dale, R. and Viethen, J. (2009). Referring expression generation through attribute-based heuristics. Proceedings of the 12th European Workshop on Natural Language Generation (ENLG 2009).

[PDF] Reiter, E., Turner, R., Alm, N., Black, R., Dempster, M., and Waller, A. (2009). Using NLG to help language-impaired users tell stories and participate in social dialogues. Proceedings of the 12th European Workshop on Natural Language Generation (ENLG 2009).

[PDF] *Hallett, C., Scott, D., and Power, R. (2007). Composing questions through conceptual authoring. Computational Linguistics, 33, 105-133.

[PDF] *Kazantseva, A. and Szpakowicz, S. (2010). Summarizing short stories. Computational Linguistics, 36, 71-109.

[PDF] Madnani, N. and Dorr, B. J. (2010). Generating phrasal and sentential paraphrases: a survey of data-driven methods. Computational Linguistics, 36, 341-387.

[PDF] Teufel, S. and Moens, M. (2002). Summarizing scientific articles: experiments with relevance and rhetorical status. Computational Linguistics, 28, 409-445.

Machine translation

[PDF] Abdul-Rauf, S. and Schwenk, H. (2009). On the use of comparable corpora to improve SMT performance. EACL 12, 16-23.

[PDF] Carpuat, M. and Wu, D. (2007). Improving statistical machine translation using word sense disambiguation. EMNLP-CoNLL 2007.

[PDF] Cheung, P. and Fung, P. (2005). Translation disambiguation in mixed language queries. Machine Translation, 18, 251-273.

[PDF] Chiang, D. (2007). Hierarchical phrase-based translation. Computational Linguistics, 33, 201-228.

[PDF] Etzioni, O., Reiter, K., Soderland, S., and Sammer, M. (2007). Lexical translation with application to image search on the Web. MT Summit XI.

[PDF] Forcada, M. L., Ginestí-Rosell, M., Nordfalk, J., O'Regan, J., Ortiz-Rojas, S., Pérez-Ortiz, J. A., Sánchez-Martínez, F., Ramírez-Sánchez, G., and Tyers, F. M. (2011). Apertium: a free/open-source platform for rule-based machine translation. Machine Translation, 25, 127-144.

[PDF] Gamallo, P. (2007). Learning bilingual lexicons from comparable English and Spanish corpora. Machine Translation Summit XI.

[RTF] Knight, K. (1999). A statistical MT tutorial workbook. Summer 1999 Workshop on Statistical Machine Translation. Center for Language and Speech Processing, Johns Hopkins University.

[PDF] Mariño, J. B., Banchs, R. E., Crego, J. M., de Gispert, A., Lambert, P., Fonollosa, J. A. R., and Costa-jussá, M. R. (2006). N-gram based machine translation. Computational Linguistics, 32, 527-549.

[PDF] Mitkov, R., Pekar, V., Blagoev, D., Mulloni, A. (2007). Methods for extracting and classifying pairs of cognates and false friends. Machine Translation, 21, 29-53.

[PDF] Munteanu, D. S. and Marcu, D. (2005). Improving machine translation performance by exploiting non-parallel corpora. Computational Linguistics, 31, 477-504.

[PDF] Nießen, S. and Ney, H. (2004). Statistical machine translation with scarce resources using morpho-syntactic information. Computational Linguistics, 30, 181-204.

[PDF] Och, F. J. and Ney, H. (2004). The alignment template approach to statistical machine translation. Computational Linguistics, 30, 417-449.

[PDF] Pekar, V., Mitkov, R., Blagoev, D., Mulloni, A. (2006). Finding translations for low-frequency words in comparable corpora. Machine Translation, 20, 247-266.

[PDF] Sammer, M. and Soderland, S. (2007). Building a sense-distinguished multilingual lexicon from monolingual corpora and bilingual lexicons. MT Summit XI.

[Word] Somers, H. (2003). An overview of EBMT. In M. Carl and A. Way (Eds.), Recent advances in example-based machine translation. Dordrecht: Kluwer, pp. 3-57.

[PDF] Utsuro, T., Hino, K., Kida, M., Nakagawa, S. and Sato, S. (2004). Integrating cross-lingually relevant news articles and monolingual web documents in bilingual lexicon acquisition. COLING 2004.

Anaphora

[PDF] Markert, K. and Nissim, M. (2006). Comparing knowledge sources for nominal anaphora resolution. Computational Linguistics, 32, 367-402.

[PDF] Soon, W. M., Ng, H. T. and Lim, D. C. Y. (2001). A machine learning approach to coreference resolution of noun phrases. Computational Linguistics, 27, 521-544.

Sentiment analysis

[PDF] Liu, B. (2010). Sentiment analysis and subjectivity. In N. Indurkhya and F. J. Damerau (Eds.), Handbook of Natural Language Processing (2nd ed.), Boca Raton, FL, USA: CRC Press.

[PDF] *Thomas, M., Pang, B., and Lee, L. (2006). Get out the vote: determining support or opposition from Congressional floor-debate transcripts. Proceedings of EMNLP 2006, 30, 327-335.

[PDF] *Wiebe, J., Wilson, T., Bruce, R., Bell, M. and Martin, M. (2004). Learning subjective language. Computational Linguistics, 30, 277-308.

Dialog

[PDF] Elsner, M. and Charniak, E. (2010). Disentangling chat. Computational Linguistics, 36, 389-409.

[PDF] Levin, E., Pieraccini, R., and Eckert, W. (2000). A stochastic model of human-machine interaction for learning dialog strategies. IEEE Transactions on Speech and Audio Processing, 8, 11-23.

[PDF] Passonneau, R. J., Epstein, S., Ligorio, T., Gordon, J., and Bhutada, P. (2010). Learning about voice search for spoken dialogue systems. 11th Annual Conference of the North American Chapter of the Association for Computational Linguistics. Los Angeles.
