Finite State Morphology for Ethiopian Semitic Languages

Michael Gasser


Workshop on Computational Approaches to Semitic Languages
EACL 2009
Athens
31 March, 2009

Overview

Motivation

Ethiopian Semitic: distribution

Ethiopian Semitic: languages

        Language                         Speakers
North   Tigrinya                         6,000,000
        Tigre                            1,000,000
        †Geez
South   Amharic                          23,000,000
        Silt'e-Ulbareg-Enneqor-Welane    800,000
        Inor-Enner-Endegegn-Gyeto        360,000
        Chaha-Ezha-Gumer-Gura            270,000
        Kistane-Gogot-Galila             250,000
        Muher                            90,000
        Argobba                          50,000
        Harari                           25,000
        Mesqan                           25,000
        Zway                             5,000
        †Gafat

Morphology of Ethiopian Semitic verbs: stems

Root-template morphology (interdigitation, intercalation)
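As a minimal sketch of what interdigitation means, the function below fills the consonant slots of a template with the consonants of a root, in order. The function name `interdigitate`, the use of `C` as the slot symbol, and the template `CaCaC` are illustrative assumptions rather than the notation used in these slides; the root `rkb` is borrowed from the verb example later in the deck, and the resulting string is not claimed to be an attested stem.

```python
def interdigitate(root: str, template: str, slot: str = "C") -> str:
    """Fill each slot symbol in `template` with the next consonant of `root`."""
    consonants = iter(root)
    out = []
    for symbol in template:
        if symbol == slot:
            consonant = next(consonants, None)
            if consonant is None:
                raise ValueError("template has more slots than the root has consonants")
            out.append(consonant)
        else:
            # Anything that is not a slot (here, the template vowels) is copied as-is.
            out.append(symbol)
    if next(consonants, None) is not None:
        raise ValueError("root has more consonants than the template has slots")
    return "".join(out)


print(interdigitate("rkb", "CaCaC"))  # rakab
```

The sketch treats the root and the template as two separate sequences that are merged into one surface string, which is the property that makes this kind of morphology awkward for purely concatenative models.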

Notation

Morphology of Ethiopian Semitic verbs: stems: roots

Morphology of Ethiopian Semitic verbs: stems: templates

Morphology of Ethiopian Semitic verbs: stems: templates:

Morphology of Ethiopian Semitic verbs: affixes

Morphology of Ethiopian Semitic verbs: derivational idiomaticity

Morphology of Ethiopian Semitic verbs: long-distance dependencies and ambiguity

Morphology of Ethiopian Semitic verbs: a maximal example

[Figure: Tigrinya verb]

Representing the structure of a verb

'abzəytɨr_axəbalun ↔ `[root='rkb', sbj=[-p1,-p2,+plr,+fem], obj=[-p1,-p2,-plr,-fem,+prep], +neg, +rel, prep='ab'-, suf_conj=-'n']`
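One way to hold such an analysis in code is as a nested mapping. The sketch below encodes the feature structure above as a Python dict, with `+`/`-` features as booleans, and adds a hypothetical `unify` helper that merges two partial structures and returns `None` when their values conflict. The dict layout and the helper are assumptions made for illustration; they are not taken from the system described in these slides.

```python
from typing import Any, Dict, Optional

FS = Dict[str, Any]  # a feature structure: feature name -> value or nested FS

# The analysis of the verb above, with +feature as True and -feature as False.
analysis: FS = {
    "root": "rkb",
    "sbj": {"p1": False, "p2": False, "plr": True, "fem": True},
    "obj": {"p1": False, "p2": False, "plr": False, "fem": False, "prep": True},
    "neg": True,
    "rel": True,
    "prep": "'ab'-",
    "suf_conj": "-'n'",
}


def unify(a: FS, b: FS) -> Optional[FS]:
    """Merge two feature structures; return None if any feature conflicts."""
    result = dict(a)
    for key, b_val in b.items():
        if key not in result:
            result[key] = b_val
            continue
        a_val = result[key]
        if isinstance(a_val, dict) and isinstance(b_val, dict):
            sub = unify(a_val, b_val)  # recurse into nested structures
            if sub is None:
                return None
            result[key] = sub
        elif a_val != b_val:
            return None  # same feature, incompatible values
    return result


print(unify({"neg": True}, {"sbj": {"fem": True}}))  # {'neg': True, 'sbj': {'fem': True}}
print(unify({"neg": True}, {"neg": False}))          # None
```

Unifying a partial description with `analysis` succeeds only if the two agree on every shared feature, so `unify(analysis, {"sbj": {"plr": False}})` returns `None`.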

Goals

Finite state morphology

Finite state transducers

The components of a complete system

Non-concatenative morphology

Examples to consider

Three types of finite state approaches to Semitic morphology

Finite state approaches to Semitic morphology: simple FSTs

[Figure: naive FST for Amharic CCC roots]

Finite state approaches to Semitic morphology: multiple tapes (Kiraz)

Finite state approaches to Semitic morphology: additional memory

Weighted finite state automata (Mohri)

FSTs weighted with feature structures (Amtrup)

Weighted FSTs for ES verb morphology: affix dependencies

Weighted FSTs for ES verb morphology: stem

Ethiopian Semitic verb morphology: architecture

[Figures: architecture (guesser); architecture (lexical)]

How big are they?

Coverage

Idiomaticity

An application: building a root lexicon

Conclusions