Course syllabus
Språkteknologi
Language Technology
EDAN20, 7.5 credits, A (Second Cycle)
Valid for: 2012/13
Decided by: Education Board 1
Date of Decision: 2012-03-19
General Information
Elective for: C4, D4, D4-pv
Language of instruction: The course will be given in English
Aim
Over the past 15 years, language technology has matured
considerably, driven by the massive increase in textual and spoken
data and the need to process it automatically. Although few systems
are entirely dedicated to language processing, scores of applications
are now to some extent "language-enabled" and embed language
processing techniques such as spelling and grammar checkers,
information retrieval and extraction, or spoken dialogue systems.
This has made the field a new requirement for computer science
engineers.
The course introduces the theories used in language technology. It
attempts to cover the whole field, from character encoding and
statistical language models, through syntax and parsing, to semantics
and conversational agents. It focuses on proven techniques as well as
significant industrial and laboratory applications.
Learning outcomes
Knowledge and understanding
For a passing grade the student must
- Understand the field of language technology and its major
applications
- Know the most important techniques, fundamental algorithms, and
most common architectures used in applications
- Create and implement language processing algorithms. Write,
interpret, evaluate, and improve them during the programming
laboratories.
Competences and skills
For a passing grade the student must
- Understand and develop annotation schemes, create and process
structured documents
- Understand and write regular expressions and use them in
languages such as Perl or Java
- Use logic and a logic programming language such as Prolog
- Understand and use machine-learning algorithms and statistical
techniques
- Develop and evaluate algorithms on real data in major fields of
language technology: language models, partial parsing, dependency
parsing, and semantic parsing.
Judgement and approach
For a passing grade the student must
- Show curiosity, creativity, and problem-solving aptitude
- Show an understanding of industrial and research issues in
language technology
Contents
- An overview of language technology: disciplines, applications,
and examples
- Corpus and word processing: regular expressions, automata, an
introduction to Perl, concordances, tokenization, counting words,
collocations
- Morphology and part-of-speech tagging: word morphology,
transducers, part-of-speech tagging
- Phrase-structure grammars: constituents, trees, DCG rules,
unification.
- Partial parsing: multiword detection, noun group and verb group
extraction, information extraction, evaluation
- Syntax: formalisms, constituency and dependency, functions,
parsing, statistical parsing, dependency parsing.
- Semantics: formal semantics, lambda calculus, lexical
semantics, predicate-argument structures, frame semantics,
semantic parsing.
- Discourse and dialogue: reference and coreference, discourse
and rhetoric, discourse relations, parsing discourse relations,
dialogue automata, speech acts, multimodality.
Examination details
Grading scale: TH (Fail, 3, 4, 5)
Assessment: Compulsory course items: assignments and an optional examination. The coursework assignments are carried out in teams of two or three students, but can also be carried out individually. The first laboratory session is a hands-on introduction to the programming tools used in the course. After that, the assignments consist of five programming problems. To pass the course with a grade of 3, students must pass all the assignments. Students may optionally sit an examination to raise their grade to 4 or 5.
Admission
Admission requirements:
- EDAA01 Programming - Second Course or EDA027 Algorithms and Data Structures
Limited number of participants: No
The course overlaps the following course(s): EDA171
Reading list
- Pierre Nugues, An Introduction to Language Processing with Perl and Prolog: An Outline of Theories, Implementation, and Application with Special Consideration of English, French, and German. Cognitive Technologies series, Springer Verlag, 2006. ISBN 3-540-25031-X.
Contact and other information
Course coordinator: Professor Pierre Nugues, Pierre.Nugues@cs.lth.se
Course homepage: http://cs.lth.se/edan20