Syllabus academic year 2009/2010
(Created 2009-08-11.)
LANGUAGE PROCESSING AND COMPUTATIONAL LINGUISTICS | EDA171
Higher education credits: 7.5.
Grading scale: UG.
Level: A (Second level).
Language of instruction: The course might be given in English.
Optional for: C4, D4, E4, F4, Pi4.
Course coordinator: Pierre Nugues, Pierre.Nugues@cs.lth.se, Inst f datavetenskap.
Prerequisites: EDAA01 Programming - Second Course or EDA027 Algorithms and Data Structures.
The number of participants is limited to 58.
Selection criteria: 6 places (maximum) will be allotted for exchange students. The rest of the places (52) will be allotted to students from the Faculty of Engineering and the Faculty of Science in proportion to the number of applicants. Selection Criteria for the Faculty of Engineering students: 1. Credits awarded or credited within the study programme. 2. Credits awarded within courses at the Department of Computer Science.
Assessment: Compulsory course items: Assignments, dissertation and a project. The coursework assignments are carried out in teams of two or three students, but can also be carried out individually. The assignments consist of six programming problems and a project. The project result is presented at the end of the course. If it is unsatisfactory, the students may improve their results and present them a second time within one month.
Further information: Limited number of participants.
Home page: http://www.cs.lth.se/EDA171.
Aim
In the past 15 years, language technologies have matured considerably, driven by the massive increase of textual and spoken data and the need to process them automatically. Although there are few systems entirely dedicated to language processing, there are now scores of applications that are to some extent "language-enabled" and embed language processing techniques such as spelling and grammar checkers, information retrieval and extraction, or spoken dialogue systems. This makes knowledge of the field a new requirement for computer science engineers.
The course introduces theories used in natural language processing and computational linguistics. It attempts to cover the whole field from character encoding and statistical language models to semantics and conversational agents, going through syntax and parsing. It focuses on proven techniques as well as significant industrial or laboratory applications.
Knowledge and understanding
For a passing grade the student must
- Understand the field of language technologies and major applications using them
- Know the most important techniques, fundamental algorithms, and most common architectures used in applications
- Create and implement language processing algorithms. Write, interpret, evaluate, and improve them.
- Design, develop, evaluate, and describe a complete language processing prototype
Skills and abilities
For a passing grade the student must
- Understand and develop annotation schemes, create and process structured documents using XML
- Understand and write regular expressions and use them in languages like Perl or Java
- Use logic and a logic programming language like Prolog or description logic
- Understand and use machine learning algorithms and statistical techniques
- Develop and evaluate algorithms in the major fields of language technologies, language models, partial parsing, dependency parsing, semantic networks, and dialogue, using real data
- Develop a complete language processing prototype in a team project involving two or three students (possibly one)
- Search and review literature and research articles in the field of language technologies
- Evaluate and formulate an opinion on systems and current research
- Design and implement the prototype
- Make a spoken presentation of the results and write a description of it in a paper
Judgement and approach
For a passing grade the student must
- Show curiosity, creativity, and problem solving aptitudes
- Show an understanding of industrial and research issues in language technologies
Contents
- An overview of language processing: presentation of language processing, applications, disciplines of linguistics, examples
- Corpus and word processing: regular expressions, automata, an introduction to Perl, concordances, tokenization, counting words, collocations
- Morphology and part-of-speech tagging: morphology, transducers, part-of-speech tagging
- Prolog to write phrase-structure grammars: constituents, trees, DCG rules, variables, getting the syntactic structure, compositional analysis to build a semantic structure
- Syntax: formalisms, constituency and dependency, functions, parsing, statistical parsing, dependency parsing
- Semantics: formal semantics, lambda-calculus, compositionality: nouns, verbs, determiners, words and meaning, lexical semantics, case grammars, semantic grammars
- Discourse and dialogue: discourse and rhetoric, anaphora, structure, RST, dialogue: automata, pairs, speech acts, multimodality
- Overview of speech synthesis and speech recognition
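To give a flavour of the corpus and word processing topic listed above (regular expressions, tokenization, counting words), here is a minimal sketch. It is written in Python purely for illustration; the course itself refers to Perl, Prolog, and Java, and the function names and sample sentence below are hypothetical, not taken from the course material. Adjacent word pairs (bigrams) are counted as a crude first step towards finding collocations.

```python
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and return its word tokens.

    A token is a run of ASCII letters, optionally with internal
    apostrophes (e.g. "don't"). This is a simplifying assumption;
    real tokenizers handle digits, accents, and punctuation too.
    """
    return re.findall(r"[a-z]+(?:'[a-z]+)*", text.lower())


def count_words(tokens):
    """Return a Counter mapping each word to its frequency."""
    return Counter(tokens)


def count_bigrams(tokens):
    """Return a Counter mapping each pair of adjacent words to its frequency."""
    return Counter(zip(tokens, tokens[1:]))


# Hypothetical sample text, used only to exercise the functions.
sample = "The cat sat on the mat. The cat slept."
tokens = tokenize(sample)
word_counts = count_words(tokens)
bigram_counts = count_bigrams(tokens)

print(word_counts.most_common(2))
print(bigram_counts.most_common(1))
```

On a real corpus, raw bigram counts are usually refined with an association measure before calling a pair a collocation; the counts here are only the starting point.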
Literature
Nugues Pierre, An Introduction to Language Processing with Perl and Prolog. An Outline of Theories, Implementation, and Application with Special Consideration of English, French, and German. Series: Cognitive Technologies, Springer Verlag, 2006, ISBN: 3-540-25031-X.