PV030 -- Textual Information Systems (spring 2011)
- 16. 5. 2011
LZ77 (Dvorak),
LZ78 (Dvorak),
PPM (Dvorak)
Dictionary Implementation.
Syntactical methods of compression.
Models of TIS: Boolean, vector and probabilistic.
Document similarity.
Automatic structuring of texts.
Signature methods.
Compression with neural networks.
slides
Question and answer session.
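As an illustration of the adaptive dictionary methods listed above, here is a minimal LZ78 sketch. It is not taken from the slides: the function names and the (index, character) pair format are assumptions made for this example, and a real coder would also pack the pairs into bits.

```python
def lz78_encode(text):
    """Encode text as a list of (dictionary index, next character) pairs."""
    dictionary = {"": 0}          # phrase -> index; index 0 is the empty phrase
    output = []
    phrase = ""
    for ch in text:
        if phrase + ch in dictionary:
            phrase += ch          # keep extending the longest known phrase
        else:
            output.append((dictionary[phrase], ch))
            dictionary[phrase + ch] = len(dictionary)
            phrase = ""
    if phrase:                    # flush an unfinished phrase at end of input
        output.append((dictionary[phrase], ""))
    return output


def lz78_decode(pairs):
    """Rebuild the text by replaying the dictionary construction."""
    phrases = [""]
    out = []
    for index, ch in pairs:
        phrase = phrases[index] + ch
        phrases.append(phrase)
        out.append(phrase)
    return "".join(out)


pairs = lz78_encode("abababa")
print(pairs)
print(lz78_decode(pairs))
```

The decoder needs no transmitted dictionary: it rebuilds the same phrase table from the pairs, which is the defining property of the LZ78 family.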
- 9. 5. 2011
Entropy, coding theory.
Universal encoding of natural numbers (cont.).
Introduction to compression.
Statistical methods of compression.
Shannon-Fano, Huffman and
arithmetic coding.
Dictionary compression methods.
Adaptive dictionary compression methods.
FGK (Dvorak)
Methods with dictionary restructuring.
slides
Exercises in B411.
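To connect entropy with statistical coding, the following sketch computes the zeroth-order entropy of a string and builds a static Huffman code for it. It is an illustrative example, not the lecture's implementation; the helper names are made up here, and the text is assumed to be non-empty.

```python
import heapq
import math
from collections import Counter


def entropy(text):
    """Zeroth-order entropy in bits per symbol."""
    counts = Counter(text)
    n = len(text)
    return -sum(c / n * math.log2(c / n) for c in counts.values())


def huffman_code(text):
    """Return a dict symbol -> codeword (string of '0'/'1'); text must be non-empty."""
    counts = Counter(text)
    if len(counts) == 1:                      # degenerate one-symbol alphabet
        return {next(iter(counts)): "0"}
    # Heap items: (weight, unique tie-breaker, partial code table of the subtree).
    heap = [(c, i, {sym: ""}) for i, (sym, c) in enumerate(counts.items())]
    heapq.heapify(heap)
    tie = len(heap)
    while len(heap) > 1:
        w1, _, code1 = heapq.heappop(heap)    # the two lightest subtrees
        w2, _, code2 = heapq.heappop(heap)
        merged = {s: "0" + w for s, w in code1.items()}
        merged.update({s: "1" + w for s, w in code2.items()})
        heapq.heappush(heap, (w1 + w2, tie, merged))
        tie += 1
    return heap[0][2]


text = "aaaabbc"
code = huffman_code(text)
bits = sum(len(code[ch]) for ch in text)
print(code, bits, entropy(text) * len(text))
```

For any input, the total Huffman length lies between n·H and n·(H + 1) bits, where H is the entropy printed above; arithmetic coding can get closer to n·H.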
- 2. 5. 2011
Google brainstorming.
Homework: Entropy, redundancy.
Coding theory: basic notions.
Universal encoding of natural numbers: self-study from the
slides.
Brainstorming on the Anatomy of Google: Google paper at the
WWW7 conference
Exercises in B116:
Touchgraph
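For the self-study on universal encoding of natural numbers, one standard example is the Elias gamma code. The sketch below assumes that code purely as an illustration; the slides may present other codes, and the function names are made up for this example.

```python
def elias_gamma_encode(n):
    """Elias gamma codeword of a positive integer, as a string of bits."""
    if n < 1:
        raise ValueError("Elias gamma is defined for n >= 1")
    binary = bin(n)[2:]                       # binary form, leading bit is 1
    return "0" * (len(binary) - 1) + binary   # unary length prefix + binary


def elias_gamma_decode(bits):
    """Decode a concatenation of well-formed gamma codewords."""
    numbers = []
    i = 0
    while i < len(bits):
        zeros = 0
        while bits[i] == "0":                 # count the unary prefix
            zeros += 1
            i += 1
        numbers.append(int(bits[i:i + zeros + 1], 2))
        i += zeros + 1
    return numbers


stream = "".join(elias_gamma_encode(n) for n in (1, 5, 9))
print(stream, elias_gamma_decode(stream))
```

The code is prefix-free, so codewords can be concatenated without separators, and it needs no upper bound on n, which is what "universal" refers to here.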
- 25. 4. 2011
Easter, no lectures. Readings/links to try:
Anatomy of Google: Google paper at the
WWW7 conference,
Jeff Dean's video lecture,
About Google in Czech,
Google File System,
Google executive,
PageRank Calculator
- 18. 4. 2011
Basics of corpus linguistics as an example of a textual information system.
Indexing with natural language processing and its implementation.
slides
Exercises will take place (again) in B116.
- 11. 4. 2011
Introduction to Information
Retrieval (indexing and query evaluation - cont.):
Part 2 (Term vocabulary).
Exercises will take place (again) in B116.
- 4. 4. 2011 Midterm exam
Introduction to Information
Retrieval (indexing and query evaluation).
Slides (Manning)
Part 1 (Boolean retrieval).
Midterm exam (approx. 1 hour, from 12 noon, B411).
Exercises (2 to 3 p.m.) in B116:
Sketch Engine,
BNC at MU
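To make the Boolean retrieval topic concrete, here is a small sketch of an inverted index with an AND query evaluated by merging sorted postings lists. The toy documents and function names are assumptions for this example; tokenization is reduced to lowercasing and whitespace splitting.

```python
from collections import defaultdict


def build_index(docs):
    """Map each term to a sorted list of the document ids that contain it."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for term in text.lower().split():
            index[term].add(doc_id)
    return {term: sorted(ids) for term, ids in index.items()}


def intersect(p1, p2):
    """Merge two sorted postings lists in O(len(p1) + len(p2))."""
    result = []
    i = j = 0
    while i < len(p1) and j < len(p2):
        if p1[i] == p2[j]:
            result.append(p1[i])
            i += 1
            j += 1
        elif p1[i] < p2[j]:
            i += 1
        else:
            j += 1
    return result


docs = {
    1: "Brutus killed Caesar",
    2: "Caesar praised Brutus",
    3: "Calpurnia warned Caesar",
}
index = build_index(docs)
print(intersect(index["brutus"], index["caesar"]))   # query: brutus AND caesar
```

Keeping postings sorted is what allows the linear merge; OR and AND NOT queries can be handled by analogous merges.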
- 28. 3. 2011
Search methods from right to left (cont., Buczilowski).
Two-way automata with jumps:
generalization of exact search algorithms.
Hierarchy of search engines. Proximity search.
Search classification: six-dimensional space of search
problems.
slides
Index methods: preliminaries. Implementation of indexes.
Automatic indexing, thesaurus construction.
slides
Exercises are in B116 today!
Midterm exam will take place next week, prepare questions!
- 21. 3. 2011 Regular expressions, search for infinitely many
patterns. Search methods from right to left (variants of Boyer-Moore,
Commentz-Walter, Buczilowski).
slides
animations of algorithms Boyer-Moore, KMP (Buehler)
taxonomy
of search automaton constructions
reformulation of the CW algorithm by L. Riedel
(in Czech)
Exercises in B116 _next_ week!
- 14. 3. 2011 No lecture and no seminar, but homework:
Let the set of patterns be P = {tis, ti, iti}.
1) Create NFA for searching P without epsilon transitions.
2) Create DFA equivalent to the NFA from 1)
3) Minimize DFA created in 2)
4) Compare searching with the DFA from 3) with searching by AC
5) You may experiment with finite automata and JFLAP
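One possible way to check steps 1) and 2) by machine is sketched below. It is only an illustration under stated assumptions: each pattern gets its own branch in the NFA (no prefix sharing), the alphabet is restricted to the letters occurring in P, and all names are made up for this example. Minimization (step 3) and the comparison with AC are left out.

```python
PATTERNS = ["tis", "ti", "iti"]
ALPHABET = sorted(set("".join(PATTERNS)))      # ['i', 's', 't']


def build_nfa(patterns, alphabet):
    """NFA without epsilon moves: state 0 loops on every symbol."""
    delta = {(0, a): {0} for a in alphabet}
    accepting = {}                              # final state -> matched pattern
    next_state = 1
    for p in patterns:
        state = 0
        for ch in p:
            delta.setdefault((state, ch), set()).add(next_state)
            state = next_state
            next_state += 1
        accepting[state] = p
    return delta, accepting


def determinize(delta, alphabet):
    """Subset construction; DFA states are frozensets of NFA states."""
    start = frozenset({0})
    dfa, seen, todo = {}, {start}, [start]
    while todo:
        current = todo.pop()
        for a in alphabet:
            target = frozenset(s for q in current for s in delta.get((q, a), ()))
            dfa[(current, a)] = target
            if target not in seen:
                seen.add(target)
                todo.append(target)
    return start, dfa, seen


def search(text, start, dfa, accepting):
    """Return (end index, pattern) pairs; text must use ALPHABET symbols only."""
    hits = []
    state = start
    for i, ch in enumerate(text):
        state = dfa[(state, ch)]
        hits.extend((i, accepting[q]) for q in sorted(state) if q in accepting)
    return hits


delta, accepting = build_nfa(PATTERNS, ALPHABET)
start, dfa, states = determinize(delta, ALPHABET)
print(len(states), search("titis", start, dfa, accepting))
```

Because state 0 is in every reachable subset, the DFA never gets stuck, and it reads each text symbol exactly once.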
- 9. 3. 2011 Lecture about Watson at PV173 seminar, 10:45AM,
NLPlab (B204)
- 7. 3. 2011 Exact search of several patterns (AC),
regular expressions, exact search of infinitely many patterns.
slides
animation of the Aho-Corasick algorithm, and
implementation in C#.
- 28. 2. 2011 (B411) Exact search of one pattern
(Shift-Or, Karp-Rabin, MP, KMP) and more patterns (AC).
slides
Animations:
String
matching algorithms (with animations, Lecroq),
Interactive Pattern Matching Animation (Goodrich),
animation
of the KMP algorithm (Buehler)
Exercises in B411.
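As a companion to the animations, here is a minimal sketch of single-pattern search with a failure function. Strictly speaking, the border table below is the Morris-Pratt one; KMP refines it to skip some redundant comparisons. The function names are assumptions for this example.

```python
def failure_function(pattern):
    """fail[i] = length of the longest proper border of pattern[:i + 1]."""
    fail = [0] * len(pattern)
    k = 0
    for i in range(1, len(pattern)):
        while k > 0 and pattern[i] != pattern[k]:
            k = fail[k - 1]
        if pattern[i] == pattern[k]:
            k += 1
        fail[i] = k
    return fail


def mp_search(text, pattern):
    """Return the start indices of all occurrences of pattern in text."""
    if not pattern:
        return []
    fail = failure_function(pattern)
    hits = []
    k = 0                                   # number of pattern symbols matched
    for i, ch in enumerate(text):
        while k > 0 and ch != pattern[k]:
            k = fail[k - 1]                 # fall back to the longest border
        if ch == pattern[k]:
            k += 1
        if k == len(pattern):
            hits.append(i - k + 1)
            k = fail[k - 1]                 # continue, allowing overlaps
    return hits


print(failure_function("abab"))
print(mp_search("abababa", "aba"))
```

The text pointer never moves backwards, so the search takes time linear in the text length plus the pattern length.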
- 21. 2. 2011 (B411) Introduction, basic notions,
classification of search problems.
slides
Watson,
paper
about Watson
To a disciple who dreaded making mistakes, the Master said: "Those who make no mistakes
make the biggest mistake of all: they attempt nothing new." Anthony de
Mello: O cestě.

sojka at fi dot muni dot cz --