N-DL questions Digital Linguistics
Taking the courses listed in parentheses is not required to cover a topic area; they are simply relevant courses that deal with the topic at least in part.
- Applications of natural language processing: automatic morphological and syntactic analysis. Semantic analysis of sentences. Speech processing, dialogue systems. Text classification, information extraction. Sentiment analysis, named entity recognition, question answering, machine translation. (PA153, IV161, PA156, IV029, PLIN082)
- Machine learning for natural language processing: corpora, language models. Text classification (Naive Bayes, neural network based approaches). Vector representations of words, phrases and documents. Convolutional networks for text processing. Recurrent neural networks for language modeling, sequence processing, transformers, large language models, generative models. Use of application programming interfaces. (PA153, IV161, PA154, PLIN037, PLIN064)
- Linguistic analysis. Basic stylometric methods. Problems of authorship attribution. Discourse analysis, anaphora resolution, pragmatics. (PLIN041, PLIN082, PLIN037, PLIN077)
- Linguistics in theory: Parts of speech - criteria of classification (morphological, syntactic, semantic). Sentence constituents - subject, object, adverbial, predicate, complement, attribute (how they can be recognized and what their properties are). The development of linguistic thinking from antiquity to the 19th century. The main trends in 20th- and 21st-century linguistics and their theoretical and methodological foundations. Infrastructures, data and metadata. (PLIN065, PLIN082, CJJ60)
- Lexicography: vocabulary - its structure; developmental tendencies, neologisms. Lexicography - subject of interest. Computational lexicography - dictionary editing systems, dictionary entry tagging; dictionary construction, macrostructure and microstructure illustrated on a selected dictionary. (PLIN035, CJJ14)
- Corpus Linguistics: history of corpus linguistics - early corpus linguistics, Chomsky's critique of corpus linguistics, building the first corpora. The development of corpus linguistics. Types of corpora. Representativeness and balance of linguistic corpora - critical evaluation. Automated tools for studying grammar built on language corpora - specific applications, use of complex queries in CQL for studying the grammatical system of a language, regular expressions, syntactic annotation of corpora in CQL. Selecting a suitable corpus for solving a linguistic problem - freely available corpora and their characteristics, DIY corpora. (CJBB105, IB047, PLIN082)
- Statistics: methods of data analysis. Parametric models - parameter estimation, hypothesis testing, ANOVA, independence testing, non-parametric tests. Linear regression models. (MV013)
- Mathematical induction. Binary relations, closures, transitivity. Equivalence relations and ordered sets. Composition of relations and functions. The concept of a graph, isomorphism, connectivity, trees, spanning trees. (IB000)