Annotated literature for PhD study
Content-Based Information Retrieval in general
Image Retrieval
- Rui, Huang, Chang: Image Retrieval: Current Techniques, Promising Directions, and Open Issues. Journal of Visual Communication and Image Representation, Volume 10, Issue 1, 1999.
- Veltkamp, Tanase: Content-Based Image Retrieval Systems: A Survey. 2000
- Smeulders, Worring, Santini, Gupta, Jain: Content-Based Image Retrieval at the End of the Early Years. 2000.
- Datta, Li, Wang: Content-Based Image Retrieval - Approaches and Trends of the New Age. MIR’05
- Datta, Joshi, Li, Wang: Image retrieval: Ideas, influences, and trends of the new age. ACM Computing Surveys, 2008.
- Kherfi, Ziou, Bernardi: Image Retrieval From the World Wide Web: Issues, Techniques, and Systems. ACM Computing Surveys, 2004.
- Mueller, Deselaers: Image Retrieval. Tutorial TrebleCLEF Summer School 2009.
- Alemu, Koh, Ikram, Kim: Image Retrieval in Multimedia Databases: A Survey. Int. Conf. on Intelligent Information Hiding and Multimedia Signal Processing (IIH-MSP), 2009.
- Enser: The evolution of visual information retrieval. Journal of Information Science, 2008. (full text of the article)
- Online book: Image Search Engines: An Overview
Semantic gap
- Hare, Lewis, Enser, Sandom: Mind the Gap: Another look at the problem of the semantic gap in image retrieval. 2006
- Guan, Antani, Long, Thoma: Bridging the semantic gap using ranking svm for image retrieval. 2009
- Liu, Zhang, Lu, Ma: A survey of content-based image retrieval with high-level semantics. 2006.
- Jain, Sinha: Content Without Context is Meaningless. MM 2010.
- Eickhoff, Li, de Vries: Exploiting User Comments for Audio-Visual Content Indexing and Retrieval. ECIR 2013
- Enser, Sandom: Towards a Comprehensive Survey of the Semantic Gap in Visual Image Retrieval. CIVR 2003: 291-299 (full text of the article)
- Enser, Sandom, Lewis: Surveying the Reality of Semantic Image Retrieval. VISUAL 2005: 177-188 (full text of the article)
User needs
- Armitage, Enser: Analysis of User Need in Image Archives. Journal of Information Science, 1997. Paywalled.
Existing image search systems
- IBM QBIC: Ashley, Flickner, Hafner, Lee, Niblack, Petkovic: The Query By Image Content (QBIC) System. SIGMOD 1995
- SIMPLIcity: Wang, Li, Wiederhold: SIMPLIcity: Semantics-Sensitive Integrated Matching for Picture LIbraries. IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 23, no. 9, pp. 947-963, 2001.
- Web-WISE: Wei, Li, Sethi: Web-WISE: compressed image retrieval over the web. Multimedia Information Analysis and Retrieval, 1998.
- Cortina: Quack, Monich, Thiele, Manjunath: Cortina: A System for Large-scale, Content-based Web Image Retrieval. ACM Multimedia 2004.
- MUFIN: Batko, Falchi, Lucchese, Novak, Perego, Rabitti, Sedmidubsky, Zezula: Building a web-scale image similarity search system. Multimedia Tools and Applications 2010.
- www.MMRetrieval.net: Zagoris, Arampatzis, Chatzichristofis: www.MMRetrieval.net: A Multimodal Search Engine. SISAP 2010.
Descriptors
Modalities, similarity measures
Indexing
Hashing, LSH
LSH: Theoretical papers
LSH: Theory + experiments
- Andoni, Indyk: Near-Optimal Hashing Algorithms for Approximate Nearest Neighbor in High Dimensions. Communications of the ACM, January 2008, Vol. 51, No. 1.
- Gionis, Indyk, Motwani: Similarity Search In High Dimensions via Hashing. VLDB'99.
- Lv, Josephson, Wang, Charikar, Li: Multi-Probe LSH: Efficient Indexing for High-Dimensional Similarity Search. VLDB'07.
- Dong, Wang, Josephson, Charikar, Li: Modeling LSH for Performance Tuning. CIKM'08.
"LPT" (Locality Preserving Transformations) - LSH principles for general metric spaces
Query expansion
- Natsev, Haubold, Tesic, Xie, Yan: Semantic concept-based query expansion and re-ranking for multimedia retrieval. ACM Multimedia 2007
- Pinto, Pérez-Sanjulián: Automatic query expansion and word sense disambiguation with long and short queries using WordNet under vector model. 2008. (full text of the article)
- Gong, Cheang, Hou U: Multi-term Web Query Expansion Using WordNet. 2006. (full text of the article)
- Gong, Cheang, Hou U: Web Query Expansion by WordNet. 2005. (full text of the article)
- Deng, Dong, Socher, Li, Li, Fei-Fei: ImageNet: A Large-Scale Hierarchical Image Database. 2009. (full text of the article)
- Harris, Srinivasan: Comparing Crowd-Based, Game-Based, and Machine-Based Approaches in Initial Query and Query Refinement Tasks. 495-506
Ranking, combining metric spaces
Information fusion
- Kludas, Bruno, Marchand-Maillet: Information Fusion in Multimedia Information Retrieval. 2008
- Zhou, Depeursinge, Muller: Information Fusion for Combining Visual and Textual Image Retrieval. Pattern Recognition 2010
- Moulin, Largeron, Gery: Impact of Visual Information on Text and Content Based Image Retrieval. 2010
- Clinchant, Ah-Pine, Csurka: Semantic Combination of Textual and Visual Information in Multimedia Retrieval. ICMR 2011
- Park, Nang: Content Based Web Image Retrieval System Using Both MPEG-7 Visual Descriptors and Textual Information. MMM 2007
- He, Xiong, Yang, Park: Using Multi-Modal Semantic Association Rules to fuse keywords and visual features automatically for Web image retrieval. Information Fusion 2011
- Depeursinge, Müller: Fusion Techniques for Combining Textual and Visual Information Retrieval. Chapter 6 of ImageCLEF Experimental Evaluation in Visual Information Retrieval, Springer 2010
- Arampatzis, Zagoris, Chatzichristofis: Fusion vs. Two-Stage for Multimodal Retrieval. ECIR 2011
- Kokar, Tomasik, Weyman: Formalizing classes of information fusion systems. Information Fusion 2004
- Multimedia Information Retrieval - keynote by Stephane Marchand-Maillet, 2012
Combining metric spaces
Some kind of dynamic aggregation
- Tung, Zhang, Koudas, Ooi: Similarity Search: A Matching Based Approach. VLDB 06
- Bustos, Skopal: Dynamic Similarity Search in MultiMetric Spaces. MIR 2006
- Bustos, Keim, Saupe, Schreck, Vranic: Using Entropy Impurity for Improved 3D Object Similarity Search. ICME 2004
- Bustos, Keim, Saupe, Schreck, Vranic: Automatic Selection and Combination of Descriptors for Effective 3D Similarity Search. ISMSE 04
Query results clustering
- Ji, Yao, Liu, Wang, Xu. A Novel Retrieval Refinement and Interaction Pattern by Exploring Result Correlations for Image Retrieval. 2008
- Chen, Wang, Krovetz. CLUE: Cluster-Based Retrieval of Images by Unsupervised Learning. IEEE Transactions on Image Processing, vol. 14, no. 8, August 2005
- Jia, Wang, Zhang, Hua. Finding Image Exemplars Using Fast Sparse Affinity Propagation. Proceeding of the 16th ACM international conference on Multimedia, 2008.
- Li, Dai, Xu, Er. Multilabel Neighborhood Propagation for Region-Based Image Retrieval. IEEE Transactions on Image Processing, 2007.
Query results reordering/ranking
- Arampatzis, Zagoris, Chatzichristofis. Dynamic Two-Stage Image Retrieval from Large Multimodal Databases. ECIR 2011
- Ambai, Yoshida. Multiclass VisualRank: Image Ranking Method in Clustered Subsets Based on Visual Features. SIGIR 09
- Jing, Baluja. VisualRank: Applying PageRank to Large-Scale Image Search. IEEE Transactions on Pattern Analysis and Machine Intelligence, November 2008
- Jing, Baluja. PageRank for Product Image Search. WWW 2008, Beijing, China.
- Yiu, Mamoulis: Multi-dimensional top-k dominating queries. VLDB Journal 2009.
- Park, Baek, Lee: A Ranking Algorithm Using Dynamic Clustering for Content-Based Image Retrieval, 2002.
- Park, Baek, Lee: Majority Based Ranking Approach in Web Image Retrieval, CIVR 2003.
- Joyce M. dos Santos, Joao M. B. Cavalcanti, Patricia Correia Saraiva, Edleno Silva de Moura: Multimodal Re-ranking of Product Image Search Results. 62-73
- Thollard, Quénot: Content-Based Re-ranking of Text-Based Image Search Results. ECIR 2013, 618-629
Other
Relevance feedback
Understanding relevance
Basic approaches in general
Relevance feedback in image retrieval
- Ishikawa, Subramanya, Faloutsos: MindReader: Querying databases through multiple examples. VLDB 98.
- Rui, Huang, Ortega, Mehrotra: Relevance Feedback: A Power Tool in Interactive Content-Based Image Retrieval. IEEE Trans. Circuits and Systems for Video Technology, 8(5):644–655, 1998.
- Zhou, Huang: Relevance feedback in image retrieval: A comprehensive review. 2002.
- Shen, Jiang, Tan, Huang, Zhou: Speed up interactive image retrieval. 2008.
- Huiskes, Lew: Performance Evaluation of Relevance Feedback Methods. CIVR’08.
- Wu, Faloutsos, Sycara, Payne: FALCON: Feedback Adaptive Loop for Content-Based Retrieval. VLDB 2000.
- Razente, Barioni, Traina, Traina: Aggregate Similarity Queries in Relevance Feedback Methods for Content-based Image Retrieval. SAC’08.
- Hua, Yu, Liu: Query Decomposition: A Multiple Neighborhood Approach to Relevance Feedback Processing in Content-based Image Retrieval. ICDE’06
- Vu, Cheng, Hua: Image Retrieval in Multipoint Queries. 2008.
Image annotation
Surveys
Model-based approaches
Search-based approaches
Text-based approaches
- Eickhoff, Li, de Vries: Exploiting User Comments for Audio-Visual Content Indexing and Retrieval. ECIR 2013
- Noel, Peterson: Context-Driven Image Annotation Using ImageNet. 26th International Florida Artificial Intelligence Research Society Conference, 2013
- Du, Rau, Huang, Chen: Improving the quality of tags using state transition on progressive image search and recommendation system. SMC 2012
- Sun, Bhowmick, Chong: Social image tag recommendation by concept matching. ACM Multimedia 2011
- Lux, Pitman, Marques: Can Global Visual Features Improve Tag Recommendation for Image Annotation? Future Internet 2010
Annotation refinement techniques
- Belém, Martins, Almeida, Gonçalves: Exploiting Novelty and Diversity in Tag Recommendation. ECIR 2013
- McParlane, Jose: Exploiting Time in Automatic Image Tagging. ECIR 2013, 520-531
- Kern, R.; Granitzer, M.; Pammer, V. Extending Folksonomies for Image Tagging. In Proceedings of 9th International Workshop on Image Analysis for Multimedia Interactive Services, Klagenfurt, Austria, 7–9 May 2008.
- Graham, R.; Caverlee, J. Exploring Feedback Models in Interactive Tagging. In Proceedings of the 2008 IEEE/WIC/ACM International Conference on Web Intelligence and Intelligent Agent Technology, Sydney, Australia, 9–12 December 2008; IEEE Computer Society: Washington, DC, USA, 2008; pp. 141–147.
Domain specific solutions
Manual annotation, ground truth acquisition
- Ames, Naaman: Why we tag: motivations for annotation in mobile and online media. SIGCHI 2007
Resources
Other
- Deselaers, Ferrari: Visual and semantic similarity in ImageNet. Computer Vision and Pattern Recognition (CVPR), 2011.
- Heesch, Yavlinsky, Rüger: NNk Networks and Automated Annotation for Browsing Large Image Collections from the World Wide Web. Multimedia 2006.
- Noah, Ali, Alhadi, Kassim: Going Beyond the Surrounding Text to Semantically Annotate and Search Digital Images. ACIIDS 2010.
- Nowak, Huiskes: New Strategies for Image Annotation: Overview of the Photo Annotation Task at ImageCLEF 2010. ImageCLEF 2011.
- Barnard, Duygulu, Forsyth, Freitas, Blei, Jordan: Matching Words and Pictures. 2003. (full text of the article)
- Wang, Zhang, Jing, Ma: Annosearch: Image auto-annotation by search. CVPR 2006. (full text of the article)
- Enser, Sandom, Lewis: Automatic Annotation of Images from the Practitioner Perspective. CIVR 2005: 497-506 (full text of the article)
- Sinha, Jain: Semantics in Digital Photos: A Contextual Analysis. In Proceedings of ICSC '08: The 2008 IEEE International Conference on Semantic Computing, Santa Clara, CA, USA, 4–7 August 2008; IEEE Computer Society: Washington, DC, USA, pp. 58–65.
Summarization of image collections
- Rudinac, Larson, Hanjalic: Learning Crowdsourced User Preferences for Visual Summarization of Image Collections. IEEE Transactions on Multimedia, 2013
- Metze, Ding, Younessian, Hauptmann: Beyond audio and video retrieval: topic-oriented multimedia summarization. International Journal of Multimedia Information Retrieval, 2013
- Tan, Song, Liu, Xie: ImageHive: Interactive Content-Aware Image Summarization. IEEE Computer Graphics and Applications, 2012
- Zheng, Herranz, Jiang: Flexible navigation in smartphones and tablets using scalable storyboards. ICMR 2013
- Liu, Wang, Sun, Zheng, Tang, Shum: Picture Collage. IEEE Transactions on Multimedia, 2009
- Ekhtiyar, Sheida, Amintoosi: Picture Collage with Genetic Algorithm and Stereo vision. CoRR, 2012
- Zhang, Huang: Hierarchical Narrative Collage For Digital Photo Album. Computer Graphics Forum, 2012
- Nenkova, McKeown: A Survey of Text Summarization Techniques. Mining Text Data, 2012
Ontologies
Evaluation
Building ground truth
- Yao, Yang, Zhu: Introduction to a Large-Scale General Purpose Ground Truth Database: Methodology, Annotation Tool and Benchmarks. 2007. (full text of the article)
- Muller, Geissbuhler, Marchand-Maillet, Clough: Benchmarking Image Retrieval Applications. 2004. (full text of the article)
- Alonso, Mizzaro: Can we get rid of TREC assessors? Using Mechanical Turk for relevance assessment. In: SIGIR 2009 Workshop on the Future of IR Evaluation (2009)
Performance metrics
Query Language
- Amato, Mainetto, Savino: A Query Language for Similarity-based Retrieval of Multimedia Data. Advances in Databases and Information Systems 1997.
- Adali, Bonatti, Sapino, Subrahmanian: A multi-similarity algebra. SIGMOD 98
- Town, Sinclair: Ontological Query Language for Content Based Image Retrieval. ANSS’03.
- Döller, Kosch, Wolf, Gruhne: Towards an MPEG-7 Query Language. SITIS 2006
- Mamou, Mass, Shmueli-Scheuer, Sznajder: A Query Language for Multimedia Content. SIGIR’07.
- Gruhne, Tous, Delgado, Doeller, Kosch: MP7QF: An MPEG-7 Query Format. 2007.
- Guliato, de Melo, Rangayyan, Soares: POSTGRESQL-IE: An Image-handling Extension for PostgreSQL. Journal of Digital Imaging, Vol 22, No 2, 2009
- Pein, Lu, Renz: An Extensible Query Language for Content Based Image Retrieval based on Lucene. 2008
- Döller, Tous, Gruhne, Yoon, Sano, Burnett: The MPEG Query Format: Unifying Access to Multimedia Retrieval Systems. IEEE MultiMedia, 2008.
- Barioni, Razente, Traina, Traina: Seamlessly integrating similarity queries in SQL. Software - Practice and Experience, 2009
- Silva, Aref, Larson, Pearson, Ali: Similarity queries: their conceptual evaluation, transformations, and processing. VLDB Journal, June 2013
Other interesting papers and ideas
Datta, Li, Wang: Content-Based Image Retrieval - Approaches and Trends of the New Age. MIR’05

full text of the article, bibtex
@inproceedings{Datta:2005:CIR:1101826.1101866,
author = {Datta, Ritendra and Li, Jia and Wang, James Z.},
title = {Content-based image retrieval: approaches and trends of the new age},
booktitle = {Proceedings of the 7th ACM SIGMM international workshop on Multimedia information retrieval},
series = {MIR '05},
year = {2005},
isbn = {1-59593-244-5},
location = {Hilton, Singapore},
pages = {253--262},
numpages = {10},
url = {http://doi.acm.org/10.1145/1101826.1101866},
doi = {10.1145/1101826.1101866},
acmid = {1101866},
publisher = {ACM},
address = {New York, NY, USA},
keywords = {annotation, content-based image retrieval},
}
A nice introduction to CBIR (what makes it harder compared to text retrieval) and why it is important
to study (documented by the growth of publications in this area over recent years; 120 useful citations!).
Contains a brief discussion of the descriptors in use and a summary of the basic approaches to CBIR
(the semantics-sensitive approach; use of hierarchical perceptual grouping of primitive image features
and their inter-relationships to characterize structure; clustering applied to image retrieval to improve
interface design, visualization, and result pre-processing). Annotation and concept detection
- very valuable when the annotation succeeds, but that is very difficult; so far it only works for
coarse categorization, and learning from previous results can help. A very good mapping of the basic
approaches to relevance feedback. A discussion of hardware acceleration and suitable user interfaces.
- The trends in these graphs indicate a roughly exponential growth in interest in image
retrieval and closely related topics. We also note that growth in the field has been particularly
strong over the last five years, spanning new techniques, new support systems, and diverse
application domains. Yet, a brief scanning of about 300 relevant papers published in the
last five years revealed that less than 20% were concerned with applications or real-world
systems. This may not be a cause for concern, since the theoretical foundation (if any
such exists) behind how we humans interpret images is still an open problem. But then, with
hundreds of different approaches proposed so far, and no consensus reached on any, it is rather
optimistic to believe that we will chance upon a reliable one in the near future. Instead, it may make
more sense to build systems that are useful, even if their use is limited to specific domains.
A way to see this is that natural language interpretation is an unsolved problem, yet text-based
search engines have proved very useful.
- Relevance feedback (RF) is a query modification technique, originating in information retrieval, that
attempts to capture the user’s precise needs through iterative feedback and query refinement....
One problem with RF is that after every round of user interaction, usually the top results with respect to the
query have to be recomputed using a modified similarity measure. A way to speed up this nearest-neighbor search has
been proposed in [109]. Another issue is the user’s patience in supporting multi-round feedbacks. A way to reduce the
user’s interaction is to incorporate logged feedback history into the current query [41]. History of usage
can also help in capturing the relationship between high level semantics and low level features [36].
We can also view RF as an active learning process, where the learner chooses an appropriate
subset for feedback from the user in each round based on her previous rounds of feedback, instead
of choosing a random subset. Active learning using SVMs was introduced into the field of image
retrieval in [95]. Extensions to the active learning process have also been proposed [31, 37]. ...
With the increase in popularity of region-based image retrieval [12, 104], attempts have been
made to incorporate the region factor into RF using query point movement and support vector
machines [49, 50]. A tree-structured self-organizing map has been used as an underlying technique
for RF [58] in a content-based image retrieval system [57]. Probabilistic approaches have been
taken in [21, 91, 100]. A clustering based approach to RF incorporating the user’s perception
in case of complex queries has been studied in [53]. In [39], manifold learning on the user’s feedback
based on geometric intuitions about the underlying feature space is proposed. While most RF algorithms
proposed deal with a two-class problem, i.e., relevant or irrelevant images, another way of looking
at RF is to consider multiple relevant and irrelevant groups of images using an appropriate user
interface [40, 79, 118]. For example, if the user is looking for cars, then she can highlight
groups of blue cars and red cars as relevant examples, since it may not be possible to
represent the concept car uniformly in any low level feature space. Yet another deviation from
norm is the use of multilevel relevance scores to incorporate the relative degrees of relevance
of certain images to the user’s query [108].
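The query-point-movement flavor of relevance feedback described above is essentially the classic Rocchio update; a generic sketch (the default weights are illustrative choices, not values from the surveyed papers):

```python
def rocchio(query, relevant, irrelevant, alpha=1.0, beta=0.75, gamma=0.15):
    """Move the query vector toward the centroid of relevant feedback
    examples and away from the centroid of irrelevant ones."""
    def centroid(vecs):
        if not vecs:
            return [0.0] * len(query)
        return [sum(col) / len(vecs) for col in zip(*vecs)]

    rel, irr = centroid(relevant), centroid(irrelevant)
    return [alpha * q + beta * r - gamma * i
            for q, r, i in zip(query, rel, irr)]

q0 = [0.5, 0.5]
q1 = rocchio(q0, relevant=[[1.0, 0.0], [0.8, 0.2]], irrelevant=[[0.0, 1.0]])
# q1 == [1.175, 0.425]: pulled toward the relevant region
```

After each round the top results must be re-ranked against the updated query point, which is exactly the recomputation cost the speed-up work cited above targets.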
Datta, Joshi, Li, Wang: Image retrieval: Ideas, influences, and trends of the new age. ACM Computing Surveys, 2008.

full text of the article, bibtex
@article{Datta:2008:IRI:1348246.1348248,
author = {Datta, Ritendra and Joshi, Dhiraj and Li, Jia and Wang, James Z.},
title = {Image retrieval: Ideas, influences, and trends of the new age},
journal = {ACM Comput. Surv.},
volume = {40},
issue = {2},
month = {May},
year = {2008},
issn = {0360-0300},
pages = {5:1--5:60},
articleno = {5},
numpages = {60},
url = {http://doi.acm.org/10.1145/1348246.1348248},
doi = {10.1145/1348246.1348248},
acmid = {1348248},
publisher = {ACM},
address = {New York, NY, USA},
keywords = {Content-based image retrieval, annotation, learning, modeling, tagging},
}
An extended version of the previous paper. The introduction gives a brief summary of the Smeulders00 survey.
Kherfi, Ziou, Bernardi: Image Retrieval From the World Wide Web: Issues, Techniques, and Systems. ACM Computing Surveys, 2004.

full text of the article, bibtex
@article{Kherfi:2004:IRW:1013208.1013210,
author = {Kherfi, M. L. and Ziou, D. and Bernardi, A.},
title = {Image Retrieval from the World Wide Web: Issues, Techniques, and Systems},
journal = {ACM Comput. Surv.},
volume = {36},
issue = {1},
month = {March},
year = {2004},
issn = {0360-0300},
pages = {35--67},
numpages = {33},
url = {http://doi.acm.org/10.1145/1013208.1013210},
doi = {10.1145/1013208.1013210},
acmid = {1013210},
publisher = {ACM},
address = {New York, NY, USA},
keywords = {Image-retrieval, World Wide Web, crawling, feature extraction and selection, indexing, relevance feedback, search, similarity},
}
Why web image search is needed. Applications - various categories, nicely analyzed. What services people
would expect from a web search engine: query-based retrieval, browsing, summary of a set of images.
Then a description of several existing image search systems, most of which combine text and visual search.
A good overview of sources and functionality (characteristics, which descriptors, which queries each system supports).
The next section covers the design of a web image search engine, specifically these parts:
data gathering, the identification and the estimation of image descriptors, similarity and matching,
indexing, query specification, retrieval and refinement, Web coverage, and performance evaluation.
Regarding text search, a nice overview of which parts of a page each system uses. A reference to an article
on how to identify the relevant words in a page. Also a good overview of query formulation options.
A recommendation to use both text and images. The next part defines an evaluation test-bed and metrics for
comparing systems, including non-traditional ones such as ease of use and number of iterations. Finally,
various open issues - e.g., understanding user needs; there are some citations, though to older articles.
- It is very useful for a Web image search system to offer an image taxonomy based on subjects such
as art and culture, sports, world regions, technology etc. ... WebSeek offers an online catalog.
- page zero problem: finding a good query image to initiate the retrieval. To avoid falling into
this problem, the system must provide the user with a set of candidate images that are representative of the entire visual
content of the Web, for example, by choosing an image from each category. If none of these images is appropriate, the user
should be allowed to choose to view more candidate images. Another solution to the page zero problem, used in the PicToSeek
system, involves asking the user to provide an initial image; however, it is not always easy for users to find such an image.
ImageRover and WebSeek ask the user to introduce a textual query at the beginning, after which both textual and visual
refinement are possible. In Kherfi et al. [2003b], we showed that the use of a positive example together with a negative example
can also help to mitigate the page zero problem.
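The survey's recommendation to combine textual and visual evidence amounts to some form of score fusion; a minimal late-fusion sketch (the weight and the min-max normalization are my illustrative assumptions, not a method from the survey):

```python
def fuse_scores(text_scores, visual_scores, w_text=0.6):
    """Min-max normalize each modality's scores, then combine them with a
    weighted sum; a higher fused score means a better match."""
    def normalize(scores):
        lo, hi = min(scores.values()), max(scores.values())
        span = (hi - lo) or 1.0
        return {doc: (s - lo) / span for doc, s in scores.items()}

    t, v = normalize(text_scores), normalize(visual_scores)
    docs = t.keys() & v.keys()
    fused = {d: w_text * t[d] + (1 - w_text) * v[d] for d in docs}
    return sorted(fused, key=fused.get, reverse=True)

ranking = fuse_scores({"img1": 12.0, "img2": 3.0, "img3": 7.0},
                      {"img1": 0.2, "img2": 0.9, "img3": 0.8})
```

The fusion literature listed in the "Information fusion" section above compares many alternatives to this linear combination, including two-stage retrieve-then-rerank schemes.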
Alemu, Koh, Ikram, Kim: Image Retrieval in Multimedia Databases: A Survey. Int. Conf. on Intelligent Information Hiding and Multimedia Signal Processing (IIH-MSP), 2009

full text of the article, bibtex
@inproceedings{DBLP:conf/iih-msp/AlemuKIK09,
author = {Yihun Alemu and
Jong-bin Koh and
Muhammad Ikram and
Dong-Kyoo Kim},
title = {Image Retrieval in Multimedia Databases: A Survey},
booktitle = {IIH-MSP},
year = {2009},
pages = {681-689},
ee = {http://doi.ieeecomputersociety.org/10.1109/IIH-MSP.2009.159},
crossref = {DBLP:conf/iih-msp/2009},
bibsource = {DBLP, http://dblp.uni-trier.de}
}
@proceedings{DBLP:conf/iih-msp/2009,
editor = {Jeng-Shyang Pan and
Yen-Wei Chen and
Lakhmi C. Jain},
title = {Fifth International Conference on Intelligent Information
Hiding and Multimedia Signal Processing (IIH-MSP 2009),
Kyoto, Japan, 12-14 September, 2009, Proceedings},
booktitle = {IIH-MSP},
publisher = {IEEE Computer Society},
year = {2009},
isbn = {978-1-4244-4717-6},
ee = {http://ieeexplore.ieee.org/xpl/mostRecentIssue.jsp?punumber=5337074},
bibsource = {DBLP, http://dblp.uni-trier.de}
}
An overview of approaches - text search, CBIR, ontology-based image retrieval. It does not go into much depth,
but it is good for a quick overview and can be cited as an introduction to text-based and content-based image search.
Mueller, Deselaers: Image Retrieval. Tutorial TrebleCLEF Summer School 2009.

website with the tutorial slides
Hare, Lewis, Enser, Sandom: Mind the Gap: Another look at the problem of the semantic gap in image retrieval. 2006

full text of the article, bibtex
@article{Hare_2006,
title={Mind the gap: another look at the problem of the semantic gap in image retrieval},
volume={6073},
url={http://link.aip.org/link/PSISDG/v6073/i1/p607309/s1&Agg=doi},
number={1},
journal={Proceedings of SPIE},
publisher={Spie},
author={Hare, J S},
year={2006},
pages={607309--607309-12}
}
Considers semantic queries such as "Find a picture of a wolf in the snow on a road in Atlanta from the year 2000".
The semantic gap is split into several parts: image => representation => objects => keywords => semantics.
Describes existing approaches to automatic keyword annotation - which they call the bottom-up approach.
The top-down approach goes through ontologies.
Guan, Antani, Long, Thoma: Bridging the semantic gap using ranking svm for image retrieval. 2009

full text of the article, bibtex
@INPROCEEDINGS{5193057,
author={Haiying Guan and Antani, S. and Long, L.R. and Thoma, G.R.},
booktitle={Biomedical Imaging: From Nano to Macro, 2009. ISBI '09. IEEE International Symposium on},
title={Bridging the semantic gap using Ranking SVM for image retrieval},
year={2009},
month={28 2009-july 1},
volume={},
number={},
pages={354 -357},
keywords={X-ray imaging;content-based image retrieval;image features;ranking SVM;ranking support vector machine algorithm;semantic concepts;spine;supervised learning algorithm;vertebra shape retrieval;bone;diagnostic radiography;image retrieval;learning (artificial intelligence);medical image processing;support vector machines;},
doi={10.1109/ISBI.2009.5193057},
ISSN={1945-7928},
}
For a collection of X-ray images of the spine, they apply an SVM-based learning algorithm;
the innovation is that instead of categories, an optimal ranking is learned. A brief, clear
and to-the-point article; for specialized collections it is probably a good solution.
Liu, Zhang, Lu, Ma: A survey of content-based image retrieval with high-level semantics. 2006.

full text of the article, bibtex
@article{Liu2007262,
title = "A survey of content-based image retrieval with high-level semantics",
journal = "Pattern Recognition",
volume = "40",
number = "1",
pages = "262 - 282",
year = "2007",
note = "",
issn = "0031-3203",
doi = "DOI: 10.1016/j.patcog.2006.04.045",
url = "http://www.sciencedirect.com/science/article/pii/S0031320306002184",
author = "Ying Liu and Dengsheng Zhang and Guojun Lu and Wei-Ying Ma",
keywords = "Content-based image retrieval",
keywords = "Semantic gap",
keywords = "High-level semantics",
keywords = "Survey"
}
Survey přístupů k řešení Semantic gap. Identifikují pět hlavních přístupů: (1) using object ontology to define high-level concepts;
(2) using machine learning methods to associate low-level features with query concepts;
(3) using relevance feedback to learn users’ intention;
(4) generating semantic template to support high-level image retrieval;
(5) fusing the evidences from HTML text and the visual content of images for WWW image retrieval.
The introductory part describes segmentation and descriptor-extraction methods very clearly.
Object ontology: segmented regions are described by descriptor properties, e.g. light blue,
uniform, located at the top. At the next level, these properties are assigned the concept sky. Ways of naming
colors; this is probably especially suitable for searching art collections.
Machine learning: supervised - the goal is to teach the system to categorize or label correctly; unsupervised
- tries to capture how the data is organized/clustered. Supervised approaches: SVM, Bayesian classification,
neural networks, decision trees, bootstrapping. Decision trees are used by some for relevance feedback
(deciding relevant/irrelevant). A typical example of unsupervised learning is image clustering (k-means
clustering, Normalized cut, CLUE; locality preserving clustering). Learning can also be used for object recognition.
Relevance feedback: an online method; the main approaches are weight adjustment and query-point movement (often using
the Rocchio formula). Both of these methods use nearest-neighbor sampling. Machine learning techniques are often
used to determine the modified query.
Semantic templates: a mapping between semantic concepts and low-level descriptors. A semantic template is usually
defined as a representative feature for a given concept derived from example objects. It can be built during relevance feedback.
Web image retrieval: has specific properties; it can exploit additional information extracted from the web page
(URL, title, alt, description, hyperlinks).
The article further contains an extensive discussion of test data, in particular the advantages and disadvantages
of the most widely used collection, Corel. Suitable vocabularies for semantic concepts are also discussed. Finally,
it covers evaluation measures: precision, recall, their ratios, the ratio to scope, and rank.
The next section mentions the need for a query language and describes several existing proposals, all of which,
however, are semantic. Indexing structures are also briefly covered, with the caveat that they are often not
specialized for the properties of images (though some that try to be do exist).
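The evaluation measures the survey covers (precision, recall, and their relation to scope) reduce to simple counting over the top of the ranked list; a small sketch:

```python
def precision_recall_at_scope(retrieved, relevant, scope):
    """Precision and recall over the top-`scope` results, as used in the
    precision-vs-scope curves common in CBIR evaluation."""
    top = retrieved[:scope]
    hits = sum(1 for doc in top if doc in relevant)
    return hits / scope, hits / len(relevant)

retrieved = ["a", "x", "b", "y", "c"]   # ranked result list
relevant = {"a", "b", "c", "d"}         # ground-truth relevant set
p, r = precision_recall_at_scope(retrieved, relevant, scope=5)
# p = 3/5, r = 3/4
```

Plotting precision at increasing scope values gives the precision-vs-scope curves the survey refers to.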
Jain, Sinha: Content Without Context is Meaningless. MM 2010.

full text of the article, bibtex
@inproceedings{Jain:2010:CWC:1873951.1874199,
author = {Jain, Ramesh and Sinha, Pinaki},
title = {Content without context is meaningless},
booktitle = {Proceedings of the international conference on Multimedia},
series = {MM '10},
year = {2010},
isbn = {978-1-60558-933-6},
location = {Firenze, Italy},
pages = {1259--1268},
numpages = {10},
url = {http://doi.acm.org/10.1145/1873951.1874199},
doi = {10.1145/1873951.1874199},
acmid = {1874199},
publisher = {ACM},
address = {New York, NY, USA},
keywords = {content, context, exif, image, mobile, perception, search},
}
A critique of the current research trend - we do content-based and ignore context, while in
practice it is precisely context that gets used, and our work does not. We try to solve too many problems
with machine learning ("the Machine Learning Hammer"). They raise the question of whether current research
is even solving the right problem (what each medium is used for, which information from it
actually matters...). The multimedia content problem should be a "perception problem" in which
the environment, the medium and the recipient must be taken into account. A nice summary of image understanding,
why it is hard and why the current approach works only for limited domains.
Human perception is largely based on context - our knowledge of the environment and our experience.
In computer processing, context can help narrow the search space, which is essential
because otherwise the space is too large and heterogeneous. The problem is perhaps
not even performance, but above all too much noise.
Tips for concrete uses of context: EXIF metadata - e.g., exposure time can indicate
whether a photo is day/night or indoor/outdoor, and focal length can indicate size.
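The EXIF tip can be turned into a toy heuristic; the shutter-speed threshold below is an arbitrary illustrative choice, not a value from the paper:

```python
def guess_lighting(exposure_seconds, threshold=1 / 60):
    """Toy heuristic in the spirit of the paper's EXIF tip: a long exposure
    suggests a dark (night/indoor) scene, a short one a bright outdoor scene."""
    return "dark scene" if exposure_seconds > threshold else "bright scene"

print(guess_lighting(1 / 500))  # fast shutter, typical of daylight
print(guess_lighting(1 / 4))    # slow shutter, typical of night/indoor
```

In a real system the exposure time would be read from the image's EXIF metadata and combined with other cues (ISO, flash, focal length) rather than used alone.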
Eickhoff, Li, de Vries: Exploiting User Comments for Audio-Visual Content Indexing and Retrieval. ECIR 2013

full text of the article, bibtex
@inproceedings{DBLP:conf/ecir/EickhoffLV13,
author = {Carsten Eickhoff and
Wen Li and
Arjen P. de Vries},
title = {Exploiting User Comments for Audio-Visual Content Indexing
and Retrieval},
booktitle = {ECIR},
year = {2013},
pages = {38-49},
ee = {http://dx.doi.org/10.1007/978-3-642-36973-5_4},
crossref = {DBLP:conf/ecir/2013},
bibsource = {DBLP, http://dblp.uni-trier.de}
}
@proceedings{DBLP:conf/ecir/2013,
editor = {Pavel Serdyukov and
Pavel Braslavski and
Sergei O. Kuznetsov and
Jaap Kamps and
Stefan M. R{\"u}ger and
Eugene Agichtein and
Ilya Segalovich and
Emine Yilmaz},
title = {Advances in Information Retrieval - 35th European Conference
on IR Research, ECIR 2013, Moscow, Russia, March 24-27,
2013. Proceedings},
booktitle = {ECIR},
publisher = {Springer},
series = {Lecture Notes in Computer Science},
volume = {7814},
year = {2013},
isbn = {978-3-642-36972-8},
ee = {http://dx.doi.org/10.1007/978-3-642-36973-5},
bibsource = {DBLP, http://dblp.uni-trier.de}
}
Didn't read in detail, but the main idea is pretty simple and interesting at the same time: use a new modality
for video search - tags extracted from YouTube comments. Some time series analysis and filtering by Wikipedia are used.
Noel, Peterson: Context-Driven Image Annotation Using ImageNet. 26th International Florida Artificial Intelligence Research Society Conference, 2013

full text of the article,
bibtex
@inproceedings{DBLP:conf/flairs/NoelP13,
author = {George E. Noel and
Gilbert L. Peterson},
title = {Context-Driven Image Annotation Using ImageNet},
booktitle = {FLAIRS Conference},
year = {2013},
ee = {http://www.aaai.org/ocs/index.php/FLAIRS/FLAIRS13/paper/view/5892},
crossref = {DBLP:conf/flairs/2013},
bibsource = {DBLP, http://dblp.uni-trier.de}
}
@proceedings{DBLP:conf/flairs/2013,
editor = {Chutima Boonthum-Denecke and
G. Michael Youngblood},
title = {Proceedings of the Twenty-Sixth International Florida Artificial
Intelligence Research Society Conference, FLAIRS 2013, St.
Pete Beach, Florida. May 22-24, 2013},
booktitle = {FLAIRS Conference},
publisher = {AAAI Press},
year = {2013},
ee = {http://www.aaai.org/Library/FLAIRS/flairs13contents.php},
bibsource = {DBLP, http://dblp.uni-trier.de}
}
Annotate images using visual content and context = keywords from title/surrounding page/whatever. Use the
keywords to determine synsets from WordNet, then use appropriate ImageNet data to compare with the image
to be annotated, select the most probable synsets. Face classifier used to verify people-related concepts.
Lew, Sebe, Djeraba, Jain: Content-Based Multimedia Information Retrieval: State of the Art and Challenges. ACM TOMCCAP, 2006.

full text of the article,
bibtex
@article{Lew:2006:CMI:1126004.1126005,
author = {Lew, Michael S. and Sebe, Nicu and Djeraba, Chabane and Jain, Ramesh},
title = {Content-based multimedia information retrieval: State of the art and challenges},
journal = {ACM Trans. Multimedia Comput. Commun. Appl.},
volume = {2},
issue = {1},
month = {February},
year = {2006},
issn = {1551-6857},
pages = {1--19},
numpages = {19},
url = {http://doi.acm.org/10.1145/1126004.1126005},
doi = {http://doi.acm.org/10.1145/1126004.1126005},
acmid = {1126005},
publisher = {ACM},
address = {New York, NY, USA},
keywords = {Multimedia information retrieval, audio retrieval, human-computer interaction, image databases, image search, multimedia indexing, video retrieval},
}
One of the main challenges is the semantic gap — search by low-level descriptors
suits only some applications and expert users; for an ordinary user the descriptors
must be "translated" into understandable concepts. Again mentions the problem of
objective benchmarks for comparison. References to various studies of what users
need.
- 2.2.2 Relevance Feedback. Beyond the one-shot queries in the early similarity-based search systems,
the next generation of systems attempted to integrate continuous feedback from the user in order
to learn more about the user query. The interactive process of asking the user a sequential set of
questions after each round of results was called relevance feedback because of its similarity to older
pure text approaches. Relevance feedback can be considered a special case of emergent semantics. Other
names have included query refinement, interactive search, and active learning from the computer vision
literature. The fundamental idea behind relevance feedback is to show the user a list of candidate images,
ask the user to decide whether each image is relevant or irrelevant, and modify the parameter space,
semantic space, feature space, or classification space to reflect the relevant and irrelevant examples.
In the simplest relevance feedback method from Rocchio [1971], the idea is to move the query point
toward the relevant examples and away from the irrelevant examples. In principle, one general view
is to view relevance feedback as a particular type of pattern classification in which the positive and
negative examples are found from the relevant and irrelevant labels, respectively.
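The Rocchio update mentioned above can be sketched in a few lines. The alpha/beta/gamma values below are conventional defaults, not from the survey:

```python
# A minimal sketch of Rocchio's relevance-feedback update: the new query
# point moves toward the mean of the relevant examples and away from the
# mean of the irrelevant ones.

def rocchio(query, relevant, irrelevant, alpha=1.0, beta=0.75, gamma=0.15):
    def mean(vectors):
        return [sum(xs) / len(vectors) for xs in zip(*vectors)]
    new_q = [alpha * x for x in query]
    if relevant:
        new_q = [x + beta * m for x, m in zip(new_q, mean(relevant))]
    if irrelevant:
        new_q = [x - gamma * m for x, m in zip(new_q, mean(irrelevant))]
    return new_q

# The query is drawn toward the relevant cluster on axis 0 and pushed
# slightly off axis 1, where the irrelevant example lies:
print(rocchio([1.0, 0.0], [[2.0, 0.0], [4.0, 0.0]], [[0.0, 2.0]]))
```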
Bozzon, Fraternali: Multimedia and Multimodal Information Retrieval. Search Computing, Chapter 8, Springer 2010.

full text of the article,
bibtex
@incollection {springerlink:10.1007/978-3-642-12310-8_8,
author = {Bozzon, Alessandro and Fraternali, Piero},
affiliation = {Politecnico di Milano Dipartimento di Elettronica e Informazione Piazza Leonardo da Vinci 32 20133 Milano Italy},
title = {Chapter 8: Multimedia and Multimodal Information Retrieval},
booktitle = {Search Computing},
series = {Lecture Notes in Computer Science},
editor = {Ceri, Stefano and Brambilla, Marco},
publisher = {Springer Berlin / Heidelberg},
isbn = {},
pages = {135-155},
volume = {5950},
url = {http://dx.doi.org/10.1007/978-3-642-12310-8_8},
note = {10.1007/978-3-642-12310-8_8},
year = {2010}
}
A survey of multimodal information retrieval; contains nice formulations of the basic MIR problems and the individual
stages of the process from data acquisition to presentation. There is also a short section on query languages.
Rui, Huang, Chang: Image Retrieval: Current Techniques, Promising Directions, and Open Issues. Journal of Visual Communication and Image Representation, Volume 10, Issue 1, 1999.

full text of the article,
bibtex
@ARTICLE{Rui99imageretrieval:,
author = {Yong Rui and Thomas S. Huang and Shih-fu Chang},
title = {Image Retrieval: Current Techniques, Promising Directions And Open Issues},
journal = {Journal of Visual Communication and Image Representation},
year = {1999},
volume = {10},
pages = {39--62}
}
An introduction to content-based image retrieval — history and main problems. The main
problems are discussed in detail:
- visual features and their representation (nice descriptions of various
approaches to describing color, ..., and computing similarity)
- indexing: as basic approaches they list dimension reduction and various tree indexes (R-tree, k-d tree).
As newer approaches they mention clustering and neural nets.
- state-of-the-art commercial and academic systems: most support some of: random browsing, search by example,
search by sketch, search by text (including keyword or speech), navigation with customized image categories.
Descriptions of several systems (QBIC, MARS, WebSeek, ...)
They also propose directions for future research: human in the loop, relevance feedback, web-oriented systems,
high-dimensional indexing, performance evaluation criteria and testbeds.
Veltkamp, Tanase: Content-Based Image Retrieval Systems: A Survey. 2000.

full text of the article
bibtex
@techreport{citeulike:3430154,
abstract = {In many areas of commerce, government, academia, and hospitals, large collections of digital images are being created. Many of these collections are the product of digitizing existing collections of},
author = {Veltkamp, Remco C. and Tanase, Mirela},
citeulike-article-id = {3430154},
citeulike-linkout-0 = {http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.31.7344},
institution = {Department of Computing Science, Utrecht University},
keywords = {retrieval, survey},
posted-at = {2008-10-20 09:24:50},
priority = {2},
title = {Content-Based Image Retrieval Systems: {A} Survey},
url = {http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.31.7344},
year = {2002}
}
Describes several dozen CBIR systems: how objects are described, how a query is specified and
evaluated, and how results are presented. For some systems, relevance feedback is mentioned and briefly described.
Smeulders, Worring, Santini, Gupta, Jain: Content-Based Image Retrieval at the End of the Early Years. 2000.

full text of the article,
bibtex
@ARTICLE{895972,
author={Smeulders, A.W.M. and Worring, M. and Santini, S. and Gupta, A. and Jain, R.},
journal={Pattern Analysis and Machine Intelligence, IEEE Transactions on}, title={Content-based image retrieval at the end of the early years},
year={2000},
month={dec},
volume={22},
number={12},
pages={1349 -1380},
keywords={accumulative features;content-based image retrieval;databases;global features;local geometry;pictures;salient points;semantic gap;semantics;sensory gap;similarity;system architecture;system engineering;use patterns;computer vision;content-based retrieval;image colour analysis;image retrieval;image segmentation;image texture;reviews;visual databases;},
doi={10.1109/34.895972},
ISSN={0162-8828},
}
Ashley, Flickner, Hafner, Lee, Niblack, Petkovic: The Query By Image Content (QBIC) System. SIGMOD 1995.

full text of the article,
bibtex
@inproceedings{Ashley:1995:QIC:223784.223888,
author = {Ashley, Jonathan and Flickner, Myron and Hafner, James and Lee, Denis and Niblack, Wayne and Petkovic, Dragutin},
title = {The query by image content (QBIC) system},
booktitle = {Proceedings of the 1995 ACM SIGMOD international conference on Management of data},
series = {SIGMOD '95},
year = {1995},
isbn = {0-89791-731-6},
location = {San Jose, California, United States},
pages = {475--},
url = {http://doi.acm.org/10.1145/223784.223888},
doi = {http://doi.acm.org/10.1145/223784.223888},
acmid = {223888},
publisher = {ACM},
address = {New York, NY, USA},
}
Wang, Li, Wiederhold: SIMPLIcity: Semantics-Sensitive Integrated Matching for Picture LIbraries. IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 23, no. 9, pp. 947-963, 2001.

full text of the article,
bibtex
@ARTICLE{Wang01simplicity:semantics-sensitive,
author = {James Z. Wang and Jia Li and Gio Wiederhold},
title = {SIMPLIcity: Semantics-Sensitive Integrated Matching for Picture LIbraries},
journal = {IEEE Transactions on Pattern Analysis and Machine Intelligence},
year = {2001},
volume = {23},
pages = {947--963}
}
They have a database sorted into several categories — textured/nontextured, graph/photograph.
Several descriptors of their own — region-based, but then merged into a single global one.
Tested on 200K images.
Wei, Li, Sethi: Web-WISE: compressed image retrieval over the web . Multimedia Information Analysis and Retrieval, 1998.

full text of the article,
bibtex
@incollection{springerlink:10.1007/BFb0016487,
author = {Wei, Gang and Li, Dongge and Sethi, I.},
affiliation = {Wayne State University Vision and Neural Network Lab. Dept. of Computer Science 48202 Detroit MI 48202 Detroit MI},
title = {Web-WISE: compressed image retrieval over the web},
booktitle = {Multimedia Information Analysis and Retrieval},
series = {Lecture Notes in Computer Science},
editor = {Ip, Horace and Smeulders, Arnold},
publisher = {Springer Berlin / Heidelberg},
isbn = {},
pages = {33-46},
volume = {1464},
url = {http://dx.doi.org/10.1007/BFb0016487},
note = {10.1007/BFb0016487},
year = {1998}
}
The system has three parts: crawling images from the web, descriptor extraction, and query processing. Color and texture
global descriptors (their own).
Quack, Monich, Thiele, Manjunath: Cortina: A System for Large-scale, Content-based Web Image Retrieval. ACM Multimedia 2004.

full text of the article,
bibtex
@inproceedings{Quack:2004:CSL:1027527.1027650,
author = {Quack, Till and M\"{o}nich, Ullrich and Thiele, Lars and Manjunath, B. S.},
title = {Cortina: a system for large-scale, content-based web image retrieval},
booktitle = {Proceedings of the 12th annual ACM international conference on Multimedia},
series = {MULTIMEDIA '04},
year = {2004},
isbn = {1-58113-893-8},
location = {New York, NY, USA},
pages = {508--511},
numpages = {4},
url = {http://doi.acm.org/10.1145/1027527.1027650},
doi = {http://doi.acm.org/10.1145/1027527.1027650},
acmid = {1027650},
publisher = {ACM},
address = {New York, NY, USA},
keywords = {MPEG-7, WWW, association rules, clustering, large-scale, online, relevance feedback, semantics, web image retrieval},
}
3 million images. Visual features and collateral text. The search process consists of an
initial query-by-keyword or query-by-image, followed by relevance feedback on the visual appearance of the results.
Semantic relationships in the data are explored and exploited by data mining, and multiple feature
spaces are included in the search process. Uses four global MPEG7 feature descriptors for color and texture.
Batko, Falchi, Lucchese, Novak, Perego, Rabitti, Sedmidubsky, Zezula: Building a web-scale image similarity search system. Multimedia Tools and Applications 2010.

full text of the article,
bibtex
@article {springerlink:10.1007/s11042-009-0339-z,
author = {Batko, Michal and Falchi, Fabrizio and Lucchese, Claudio and Novak, David and Perego, Raffaele and Rabitti, Fausto and Sedmidubsky, Jan and Zezula, Pavel},
affiliation = {Masaryk University Brno Czech Republic},
title = {Building a web-scale image similarity search system},
journal = {Multimedia Tools and Applications},
publisher = {Springer Netherlands},
issn = {1380-7501},
keyword = {Computer Science},
pages = {599-629},
volume = {47},
issue = {3},
url = {http://dx.doi.org/10.1007/s11042-009-0339-z},
note = {10.1007/s11042-009-0339-z},
year = {2010}
}
A big, meaty article about MUFIN. It opens with a good account of the data explosion. The article has two main topics —
building the CoPhIR collection and creating a scalable search system. Comparison of centralized and distributed
implementations, and the use of various indexes.
Zagoris, Arampatzis, Chatzichristofis: www.MMRetrieval.net: A Multimodal Search Engine. SISAP 2010.

full text of the article,
bibtex
@inproceedings{Zagoris:2010:WMS:1862344.1862363,
author = {Zagoris, Konstantinos and Arampatzis, Avi and Chatzichristofis, Savvas A.},
title = {www.MMRetrieval.net: a multimodal search engine},
booktitle = {Proceedings of the Third International Conference on SImilarity Search and APplications},
series = {SISAP '10},
year = {2010},
isbn = {978-1-4503-0420-7},
location = {Istanbul, Turkey},
pages = {117--118},
numpages = {2},
url = {http://doi.acm.org/10.1145/1862344.1862363},
doi = {http://doi.acm.org/10.1145/1862344.1862363},
acmid = {1862363},
publisher = {ACM},
address = {New York, NY, USA},
}
A two-page paper about the system; it uses distributed indexes and parallel search (details unknown),
combines modalities by late fusion, and the user chooses the combination method and weights.
Kludas, Bruno, Marchand-Maillet: Information Fusion in Multimedia Information Retrieval. 2008

full text of the article,
bibtex
@incollection {springerlink:10.1007/978-3-540-79860-6_12,
author = {Kludas, Jana and Bruno, Eric and Marchand-Maillet, Stéphane},
affiliation = {University of Geneva Switzerland},
title = {Information Fusion in Multimedia Information Retrieval},
booktitle = {Adaptive Multimedial Retrieval: Retrieval, User, and Semantics},
series = {Lecture Notes in Computer Science},
editor = {Boujemaa, Nozha and Detyniecki, Marcin and Nürnberger, Andreas},
publisher = {Springer Berlin / Heidelberg},
isbn = {},
pages = {147-159},
volume = {4918},
url = {http://dx.doi.org/10.1007/978-3-540-79860-6_12},
note = {10.1007/978-3-540-79860-6_12},
year = {2008}
}
A nice overview of approaches to information fusion, a categorization, references. The practical part compares several
approaches — they use SVM classifiers and small test datasets. Categorization by the number of modalities and sources.
Multi-modal fusion can be serial, parallel, or hierarchical. Another division — complementary, cooperative, and
competitive fusion (according to whether the information sources complement each other or refine the selection).
Fusion can take place at different levels — data/feature, classifier/score, decision level. Performance improvement boundaries.
- The JDL working group defined information fusion as "an information process
that associates, correlates and combines data and information from single or
multiple sensors or sources to achieve refined estimates of parameters, characteristics, events and behaviors"
- Drawbacks of data and feature fusion are the 'curse of dimensionality',
computational expense, and the need for a lot of training data.
- Decision fusion can be said to be faster throughout, because each modality is
processed independently, which leads to dimensionality reduction. Decision fusion
is, however, seen as a very rigid solution, because at this level of processing
only limited information is left.
- For information retrieval systems the situation is not as trivial. For
example, three different effects in rank aggregation tasks can be exploited with
fusion [4]:
(1) Skimming effect: the lists include diverse and relevant items
(2) Chorus effect: the lists contain similar and relevant items
(3) Dark Horse effect: unusually accurate result of one source
According to the theory presented in the last section, it is impossible to exploit
all effects within one approach, because the required complementary (1) and
cooperative (2) strategies are contradictory.
Zhou, Depeursinge, Muller: Information Fusion for Combining Visual and Textual Image Retrieval. Pattern Recognition 2010

full text of the article,
bibtex
@INPROCEEDINGS{5595739,
author={Xin Zhou and Depeursinge, A. and Muller, H.},
booktitle={Pattern Recognition (ICPR), 2010 20th International Conference on}, title={Information Fusion for Combining Visual and Textual Image Retrieval},
year={2010},
month={aug.},
volume={},
number={},
pages={1590 -1593},
keywords={ImageCLEF medical image retrieval;dark horse effect;information fusion algorithm;logarithmic rank penalization;maximum combination;multi-modality fusion;single modality fusion;stable normalization;textual image retrieval;visual image retrieval;image fusion;image retrieval;},
doi={10.1109/ICPR.2010.393},
ISSN={1051-4651},
}
They try various methods of combining results with a late fusion strategy and compare the impact on the fusion
effects — chorus, dark horse: maximum combinations, sum combinations, product of maximum and a non-zero number.
Various normalization strategies are tried.
Moulin, Largeron, Gery: Impact of Visual Information on Text and Content Based Image Retrieval. 2010

full text of the article,
bibtex
@incollection {springerlink:10.1007/978-3-642-14980-1_15,
author = {Moulin, Christophe and Largeron, Christine and Géry, Mathias},
affiliation = {Université de Lyon, F-42023 Saint-Étienne, France},
title = {Impact of Visual Information on Text and Content Based Image Retrieval},
booktitle = {Structural, Syntactic, and Statistical Pattern Recognition},
series = {Lecture Notes in Computer Science},
editor = {Hancock, Edwin and Wilson, Richard and Windeatt, Terry and Ulusoy, Ilkay and Escolano, Francisco},
publisher = {Springer Berlin / Heidelberg},
isbn = {},
pages = {159-169},
volume = {6218},
url = {http://dx.doi.org/10.1007/978-3-642-14980-1_15},
note = {10.1007/978-3-642-14980-1_15},
year = {2010}
}
Late fusion of visual and text via a weighted sum; they study the influence of the weights. Part of ImageCLEF
multimedia. SIFT, bag of words. From the query object as well as the database objects they extract text and
visual descriptors, both weighted by tf-idf.
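The weighted-sum late fusion studied here reduces to a one-liner per document. The alpha value and the scores below are illustrative, not taken from the paper:

```python
# Late fusion by weighted sum: each modality yields a normalized score per
# document; alpha trades text evidence against visual evidence.

def late_fusion(text_scores, visual_scores, alpha=0.7):
    """Both arguments: dict doc_id -> score in [0, 1]; a missing doc scores 0."""
    docs = set(text_scores) | set(visual_scores)
    fused = {d: alpha * text_scores.get(d, 0.0)
                + (1 - alpha) * visual_scores.get(d, 0.0)
             for d in docs}
    return sorted(docs, key=fused.get, reverse=True)

# "a" ranks first: its strong text score outweighs "b"'s strong visual score.
print(late_fusion({"a": 0.9, "b": 0.4}, {"b": 0.95, "c": 0.8}))
```

Studying the influence of the weights, as the paper does, then amounts to sweeping alpha over [0, 1] and measuring retrieval quality at each setting.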
Clinchant, Ah-Pine, Csurka: Semantic Combination of Textual and Visual Information in Multimedia Retrieval. ICMR 2011

full text of the article,
bibtex
@inproceedings{Clinchant:2011:SCT:1991996.1992040,
author = {Clinchant, St\'{e}phane and Ah-Pine, Julien and Csurka, Gabriela},
title = {Semantic combination of textual and visual information in multimedia retrieval},
booktitle = {Proceedings of the 1st ACM International Conference on Multimedia Retrieval},
series = {ICMR '11},
year = {2011},
isbn = {978-1-4503-0336-1},
location = {Trento, Italy},
pages = {44:1--44:8},
articleno = {44},
numpages = {8},
url = {http://doi.acm.org/10.1145/1991996.1992040},
doi = {http://doi.acm.org/10.1145/1991996.1992040},
acmid = {1992040},
publisher = {ACM},
address = {New York, NY, USA},
keywords = {information fusion, multimedia retrieval},
}
"Image and textual queries are expressed at different semantic levels" - one should not combine them independently
as most information fusion techniques do. It is known that text search alone works better than visual alone, and
also that the combination helps. So, how to combine them best — manage the complementarities between image and text search?
They explain the difference between early and late fusion nicely. They also introduce transmedia fusion: first use one
of the modalities to gather relevant documents and then switch to the other modality. In many experiments reported in the
literature, it has been shown that either late fusion or transmedia fusion approaches perform
better than early fusion techniques. On the other hand, according to some ImageCLEF paper, image reranking is worse
than pure text search. Could that be because the ranking ignores the text distance? That is why they
propose to combine image reranking with late fusion: first select candidates by text search,
then rank by text+visual. For the preselection they use K=1000.
- There is a semantic mismatch between visual and textual
queries: image queries are ambiguous for humans from a
semantic viewpoint. Fusion techniques should be asymmetric since text and image are expressed at different semantic
levels.
- Visual techniques do work and are effective at retrieving visually similar images, but they
usually fail in multimedia retrieval due to the semantic mismatch
Park, Nang: Content Based Web Image Retrieval System Using Both MPEG-7 Visual Descriptors and Textual Information. MMM 2007

full text of the article,
bibtex
@incollection {springerlink:10.1007/978-3-540-69423-6_64,
author = {Park, Joohyoun and Nang, Jongho},
affiliation = {Dept. of Computer Science and Engineering, Sogang University, 1, ShinsuDong, MapoGu, Seoul, 121-742 Korea},
title = {Content Based Web Image Retrieval System Using Both MPEG-7 Visual Descriptors and Textual Information},
booktitle = {Advances in Multimedia Modeling},
series = {Lecture Notes in Computer Science},
editor = {Cham, Tat-Jen and Cai, Jianfei and Dorai, Chitra and Rajan, Deepu and Chua, Tat-Seng and Chia, Liang-Tien},
publisher = {Springer Berlin / Heidelberg},
isbn = {},
pages = {659-669},
volume = {4351},
url = {http://dx.doi.org/10.1007/978-3-540-69423-6_64},
note = {10.1007/978-3-540-69423-6_64},
year = {2006}
}
A complete web search system — image crawling, indexing, search. Searches by text and visual, early fusion.
Text descriptors are obtained from the page, then filtered using learned associations between visual concepts
and keywords (linked via WordNet). The text & visual index is a hierarchical bitmap index.
He, Xiong, Yang, Park: Using Multi-Modal Semantic Association Rules to fuse keywords and visual features automatically for Web image retrieval. Information Fusion 2011

full text of the article,
bibtex
@article{He2011223,
title = "Using Multi-Modal Semantic Association Rules to fuse keywords and visual features automatically for Web image retrieval",
journal = "Information Fusion",
volume = "12",
number = "3",
pages = "223 - 230",
year = "2011",
note = "Special Issue on Information Fusion in Future Generation Communication Environments",
issn = "1566-2535",
doi = "DOI: 10.1016/j.inffus.2010.02.001",
url = "http://www.sciencedirect.com/science/article/pii/S1566253510000230",
author = "Ruhan He and Naixue Xiong and Laurence T. Yang and Jong Hyuk Park",
keywords = "Web image retrieval",
keywords = "Multi-Modal Semantic Association Rule (MMSAR)",
keywords = "Association rule mining",
keywords = "Inverted file",
keywords = "Relevance Feedback (RF)"
}
They develop the VAST web image retrieval system and look for associations between visual features and text.
They divide information fusion into automatic and non-automatic, with these categories within automatic:
pseudo-relevance feedback, online clustering, long-term RF learning. They reference some analyses of user
behavior — the lazy user. Their technique — automatically fuse image and text using Multi-Modal Semantic Association
Rules — connects a single keyword with several visual features. Quite a detailed description of how the associations
are built. The obtained associations are then used to rerank the results of an initial text-based search.
Depeursinge, Müller: Fusion Techniques for Combining Textual and Visual Information Retrieval. Chapter 6 of ImageCLEF Experimental Evaluation in Visual Information Retrieval, Springer 2010

full text of the article,
bibtex
@incollection {springerlink:10.1007/978-3-642-15181-1_6,
author = {Depeursinge, Adrien and Müller, Henning},
affiliation = {University and University Hospitals of Geneva (HUG), Rue Gabrielle–Perret–Gentil 4, 1211 Geneva 14, Switzerland},
title = {Fusion Techniques for Combining Textual and Visual Information Retrieval},
booktitle = {ImageCLEF},
series = {The Kluwer International Series on Information Retrieval},
editor = {Croft, W. Bruce and Müller, Henning and Clough, Paul and Deselaers, Thomas and Caputo, Barbara},
publisher = {Springer Berlin Heidelberg},
isbn = {978-3-642-15181-1},
keyword = {Computer Science},
pages = {95-114},
volume = {32},
url = {http://dx.doi.org/10.1007/978-3-642-15181-1_6},
note = {10.1007/978-3-642-15181-1_6},
year = {2010}
}
A categorization of data fusion approaches from ImageCLEF papers over the last 7 years. They identify three basic
approaches — inter-media query expansion, early fusion, and late fusion (by far the most widely used). For early fusion
they simply consider concatenation, listing the curse of dimensionality as a problem. Within late fusion they distinguish
rank-based and score-based fusion; the latter is more common and requires normalization. One possible implementation of
late fusion is intersection — several variants of its definition are given. Another variant is reordering; references to
publications on text search + visual reordering and vice versa. Several combination methods are described in detail —
SUM and the like. They also describe text and visual query expansion.
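The score-based variant with SUM combination can be sketched as follows; the min-max normalization step is what makes scores from different engines comparable, and the run contents below are made up for illustration:

```python
# Score-based late fusion: min-max normalize each run, then combine by SUM
# (the CombSUM scheme).

def minmax(scores):
    lo, hi = min(scores.values()), max(scores.values())
    if hi == lo:
        return {d: 0.0 for d in scores}
    return {d: (s - lo) / (hi - lo) for d, s in scores.items()}

def comb_sum(*runs):
    normed = [minmax(r) for r in runs]
    docs = set().union(*normed)
    return {d: sum(r.get(d, 0.0) for r in normed) for d in docs}

text_run = {"a": 12.0, "b": 7.0, "c": 2.0}    # e.g. raw text-engine scores
visual_run = {"b": 0.9, "c": 0.5, "d": 0.1}   # e.g. visual similarities
fused = comb_sum(text_run, visual_run)
print(max(fused, key=fused.get))              # 'b' scores well in both runs
```

Rank-based fusion would replace `minmax` with a transform of the rank positions instead of the raw scores.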
Arampatzis, Zagoris, Chatzichristofis: Fusion vs. Two-Stage for Multimodal Retrieval. ECIR 2011

full text of the article,
bibtex
@incollection {springerlink:10.1007/978-3-642-20161-5_88,
author = {Arampatzis, Avi and Zagoris, Konstantinos and Chatzichristofis, Savvas},
affiliation = {Department of Electrical and Computer Engineering, Democritus University of Thrace, Xanthi, 67100 Greece},
title = {Fusion vs. Two-Stage for Multimodal Retrieval},
booktitle = {Advances in Information Retrieval},
series = {Lecture Notes in Computer Science},
editor = {Clough, Paul and Foley, Colum and Gurrin, Cathal and Jones, Gareth and Kraaij, Wessel and Lee, Hyowon and Mudoch, Vanessa},
publisher = {Springer Berlin / Heidelberg},
isbn = {},
pages = {759-762},
volume = {6611},
url = {http://dx.doi.org/10.1007/978-3-642-20161-5_88},
note = {10.1007/978-3-642-20161-5_88},
year = {2011}
}
They compare late fusion of text and visual search results with visual reordering in terms of quality; the results come
out roughly the same. They did not measure time, but in the discussion they state that reordering is faster and therefore
better. The comparison was done on roughly 200,000 objects from Wikipedia.
Kokar, Tomasik, Weyman: Formalizing classes of information fusion systems. Information Fusion 2004

full text of the article,
bibtex
@article{DBLP:journals/inffus/KokarTW04,
author = {Mieczyslaw M. Kokar and
Jerzy A. Tomasik and
Jerzy Weyman},
title = {Formalizing classes of information fusion systems},
journal = {Information Fusion},
volume = {5},
number = {3},
year = {2004},
pages = {189-202},
ee = {http://dx.doi.org/10.1016/j.inffus.2003.11.001},
bibsource = {DBLP, http://dblp.uni-trier.de}
}
An attempt at a formal treatment of information fusion. They cover early and late fusion and show that decision fusion is
a subclass of data fusion. The formalism is interesting and they use nice diagrams; probably worth a closer look sometime.
Fagin: Combining Fuzzy Information: an Overview. SIGMOD 2002

full text of the article,
bibtex
@article{Fagin:2002:CFI:565117.565143,
author = {Fagin, Ronald},
title = {Combining fuzzy information: an overview},
journal = {SIGMOD Rec.},
volume = {31},
issue = {2},
month = {June},
year = {2002},
issn = {0163-5808},
pages = {109--118},
numpages = {10},
url = {http://doi.acm.org/10.1145/565117.565143},
doi = {http://doi.acm.org/10.1145/565117.565143},
acmid = {565143},
publisher = {ACM},
address = {New York, NY, USA},
}
The naive algorithm, Fagin's algorithm, and the Threshold Algorithm for evaluating combined
queries over a system that can return lists ordered by the individual descriptors (two kinds of data
access — sorted and random, with random the more expensive). It is proved
that the Threshold Algorithm is an optimal exact algorithm for this problem. An approximation
algorithm is also discussed that stops early once a given accuracy level is reached.
- There is an obvious naive algorithm for obtaining the top k answers. Under sorted access, it looks at
every entry in each of the m sorted lists, computes (using t) the overall grade of every object, and returns
the top k answers. The naive algorithm has linear middleware cost (linear in the database size), and
thus is not efficient for a large database.
- Fagin algorithm:
- Do sorted access in parallel to each of the m sorted lists Li. (By “in parallel”, we mean that we
access the top member of each of the lists under sorted access, then we access the second member
of each of the lists, and so on.) Wait until there are at least k “matches”, that is, wait until there
is a set H of at least k objects such that each of these objects has been seen in each of the m lists.
- For each object R that has been seen, do random access as needed to each of the lists Li to find
the ith field xi of R.
- Compute the grade t(R) = t(x1, . . . , xm) for each object R that has been seen. Let Y be a set
containing the k objects that have been seen with the highest grades (ties are broken arbitrarily).
The output is then the graded set {(R, t(R)) |R ∈ Y }.
- Threshold algorithm:
- Do sorted access in parallel to each of the m sorted lists Li. As an object R is seen under sorted
access in some list, do random access to the other lists to find the grade xi of object R in every
list Li. Then compute the grade t(R) = t(x1, . . . , xm) of object R. If this grade is one of the k
highest we have seen, then remember object R and its grade t(R) (ties are broken arbitrarily, so
that only k objects and their grades need to be remembered at any time).
- For each list Li, let xi be the grade of the last object seen under sorted access. Define the threshold
value τ to be t(x1, . . . , xm). As soon as at least k objects have been seen whose grade is at least
equal to τ, then halt.
- Let Y be a set containing the k objects that have been seen with the highest grades. The output
is then the graded set {(R, t(R)) |R ∈ Y }.
- Aggregation function: FA and TA are correct for monotone aggregation functions t
(that is, the algorithm successfully finds the top k answers). An aggregation function t is strict
if t(x1, . . . , xm) = 1 holds precisely when xi = 1 for
every i. Thus, an aggregation function is strict if it takes on the maximal value of 1 precisely when each
argument takes on this maximal value. We would certainly expect an aggregation function representing
the conjunction to be strict. In fact, it is reasonable to think of strictness
as being a key characterizing feature of the conjunction.
Fagin: Combining Fuzzy Information from Multiple Systems. 1999
(full text of the article)
How to combine multiple descriptors. The required properties of an aggregation function are defined,
and Fagin's algorithm is proposed.
- Let us define an m-ary scoring function to be a function from [0, 1]^m to [0, 1]. For the sake of
generality, we will consider m-ary scoring functions for evaluating conjunctions of m atomic queries, although
in practice an m-ary conjunction is almost always evaluated by using an associative 2-ary function that
is iterated. Analogously to the binary case, we say that an m-ary scoring function t is monotone if
t(x1, ..., xm) ≤ t(x1', ..., xm') when xi ≤ xi' for every i. As discussed before, monotonicity is a reasonable
property to expect a scoring function to obey. Another such property is strictness: an m-ary scoring function t
is strict if t(x1, ..., xm) = 1 iff xi = 1 for every i. Thus, a scoring function is strict if it takes on the
maximal value of 1 precisely if each argument takes on this maximal value. Scoring functions considered
in the literature seem to be monotone and strict. In particular, scoring functions derived from "triangular
norms" are monotone and strict. Another important class of scoring functions includes various
weighted and unweighted arithmetic and geometric means, which Thole, Zimmerman, and Zysno found to
perform empirically quite well. These are also monotone and strict.
Tung, Zhang, Koudas, Ooi: Similarity Search: A Matching Based Approach. VLDB 06.
(full text of the article)
Combined queries: while the similarity of objects is usually determined from a combination of a fixed set of descriptors, the paper proposes using a variable subset of descriptors, namely the ones that match best (the reason: especially in higher dimensions there is a large chance that some descriptor "misfires" even though the object is otherwise a very good match).
- NN problem: distance between query object Q and data object over a FIXED set of features. Problems of this
approach - partial similarities undiscovered, distance affected by a few dimensions with high dissimilarity
(this phenomenon is more obvious for high-dimensional data; in real applications - bad pixels, noise in signal, ...)
- Model: object = multidimensional point; attributes sorted in each dimension,
cost is measured by the number of attributes retrieved; attributes sorted
with respect to their values, not distances to Q!
- k-n-match problem: matching between Q and data objects in n dimensions, n < dimensionality d; these n dimensions
are determined dynamically so that Q and the returned data objects match best
Definition: N-match difference: Given two d-dimensional points P (p1, p2, ..., pd) and
Q (q1, q2, ..., qd), let δi = |pi - qi|, i = 1, ..., d. Sort the array
δ1, ..., δd in increasing order and let the sorted array be
δ1' ,..., δd'. Then δn' is the n-match difference of point P
with regard to Q.
N-match difference is not a metric!
Definition: The n-match problem: Given a d-dimensional database DB, a query point Q and
an integer n (1 ≤ n ≤ d), find the point P ∈ DB that has
the smallest n-match difference with regard to Q. P is called
the n-match point of Q.
Definition: The k-n-match problem: Given a d-dimensional database DB of cardinality c, a query
point Q, an integer n (1 ≤ n ≤ d), and an integer k ≤ c,
find a set S which consists of k points from DB so that for
any point P1 ∈ S and any point P2 ∈ DB - S, the n-match
difference between P1 and Q is less than or equal to the n-
match difference between P2 and Q. The S is the k-n-match
set of Q.
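The definitions above can be transcribed directly (brute force, for illustration only; the paper's AD algorithm avoids scanning everything):

```python
def n_match_difference(p, q, n):
    """The n-th smallest of the per-dimension differences |p_i - q_i|."""
    deltas = sorted(abs(pi - qi) for pi, qi in zip(p, q))
    return deltas[n - 1]

def k_n_match(db, q, n, k):
    """Brute-force k-n-match: the k points of db with the smallest
    n-match difference with regard to q (ties broken arbitrarily)."""
    return sorted(db, key=lambda p: n_match_difference(p, q, n))[:k]
```

E.g. for q = (0,0,0), the point (1,1,100) is a better 2-match than (2,2,2), even though its L1/L2 distance is far larger - the single "misfiring" dimension is ignored.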
- frequent k-n-match: finds a set of objects that appears in the k-n-match answers
most frequently for a range of values of n
The frequent k-n-match problem:
Given a d-dimensional database DB of cardinality c, a query
point Q, an integer k ≤ c, and an integer range [n0; n1]
within [1; d], let S0, ..., Si be the answer sets of k-n0-match,
..., k-n1-match, respectively. Find a set T of k points, so
that for any point P1 ∈ T and any point P2 ∈ DB - T, P1’s
number of appearances in S0, ..., Si is larger than or equal
to P2’s number of appearances in S0, ..., Si.
- Algorithm for (frequent) k-n-match problem: FA doesn't work
as the aggregation function used in k-n-match is not monotone.
AD algorithm for k-n-match Search: We first locate each dimension of the query Q in
the d sorted lists. Then we retrieve the individual attributes
in ascending order of their differences to the corresponding
attributes of Q. When a point ID is first seen n times, this
point is the first n-match. We keep retrieving the attributes
until k point ID’s have been seen at least n times. Then we
can stop. We call this strategy of accessing the attributes
in Ascending order of their Differences to the query point’s
attributes as the AD algorithm.
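A minimal sketch of the AD strategy (for brevity the d per-dimension sorted lists are merged into one stream of attribute differences, which yields the same access order up to ties; names are illustrative):

```python
from collections import defaultdict

def ad_k_n_match(db, q, n, k):
    """AD algorithm sketch: retrieve individual attributes in ascending
    order of their difference to the corresponding attribute of q; a point
    becomes an answer once its id has been seen n times; stop after k answers."""
    attrs = sorted((abs(p[i] - q[i]), pid)
                   for pid, p in enumerate(db) for i in range(len(q)))
    seen = defaultdict(int)
    answers = []
    for _, pid in attrs:                 # ascending order of attribute differences
        seen[pid] += 1
        if seen[pid] == n:               # pid seen n times -> next n-match answer
            answers.append(pid)
            if len(answers) == k:
                break
    return answers
```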
- Effectiveness evaluation:
- To validate the statement that the traditional kNN query
leaves many partial similarities undiscovered, we first use an
image database to visually show this fact.
- In order to evaluate effectiveness from a (statistically) quantitative view, we use the class stripping technique, which is described as follows. We use five real data sets with dimensionalities varying from 4 to 34 ... Each record has an additional variable indicating which class it belongs to. By the class stripping technique, we strip this class tag from each point and use different techniques to find the objects similar to the query objects. If the answer and the query belong to the same class, then the answer is correct. The more correct ones in the returned answers, statistically, the better the quality of the similarity searching method.
Ideas: this could be used for re-ranking the results of a Combined Query, but there are problems:
the aggregation function is ignored - for a sum this could perhaps be solved by normalizing the individual summands?
Manning, Raghavan, Schutze: Introduction to Information Retrieval. Cambridge University Press. 2008.

HTML version,
bibtex
@book{IRbook08,
author = {Christopher D. Manning and
Prabhakar Raghavan and
Hinrich Sch{\"u}tze},
title = {Introduction to information retrieval},
publisher = {Cambridge University Press},
year = {2008},
isbn = {978-0-521-86571-5},
pages = {I-XXI, 1-482},
bibsource = {DBLP, http://dblp.uni-trier.de}
}
Contains a good chapter on evaluation methods - what is suitable for ranked results
and for incomplete ground truth.
Järvelin, Kekäläinen: Cumulated gain-based evaluation of IR techniques. ACM Trans. Inf. Syst., 2002

full text of the article,
bibtex
@article{DBLP:journals/tois/JarvelinK02,
author = {Kalervo J{\"a}rvelin and
Jaana Kek{\"a}l{\"a}inen},
title = {Cumulated gain-based evaluation of IR techniques},
journal = {ACM Trans. Inf. Syst.},
volume = {20},
number = {4},
year = {2002},
pages = {422-446},
ee = {http://doi.acm.org/10.1145/582415.582418},
bibsource = {DBLP, http://dblp.uni-trier.de}
}
Describes the metrics Cumulated Gain, Discounted Cumulated Gain, and Normalized DCG. Suitable for ranked
results; copes with incomplete GT as well.
Nowak, Lukashevich, Dunker, Rueger: Performance Measures for Multilabel Evaluation. MIR 2010

full text of the article
Description and comparison of various performance measures - Precision, Recall, Mean
Average Precision, and others. Concept-based comparison, example-based, hierarchical, ontology-based.
They compare them on ImageCLEF data, with recommendations on what to use (now used for ImageCLEF) - MAP,
F-measure.
Fagin, Kumar, Sivakumar: Comparing Top k Lists. 2003
(full text of the article)
Reviews the standard options for comparing ordered lists and proposes new ones for
the needs of comparing top-k results (here one must take into account that the ordering of the whole domain is unknown). Very thorough
theoretical treatment: grouping of the distance measures into classes, investigation of metric properties.
- There are several standard methods for comparing two permutations, such as Kendall’s tau
and Spearman’s footrule. We cannot simply apply these known methods, since they deal only with
comparing one permutation against another over the same elements.
- A fairly straightforward attempt at defining a distance measure is to compute the intersection
of the two top k lists (viewing them as sets). In order to obtain a metric, we consider the notion
of the symmetric difference (union minus the intersection), appropriately scaled. This,
unfortunately, is not adequate for the top k distance problem, since two top 10 lists that
are reverses of each other would be declared to be “very close.” We propose natural extensions
of this idea that lead to a metric for top k lists. Briefly, the idea is to truncate the top
k lists at various points i ≤ k, compute the symmetric difference metric between the resulting
top i lists, and take a suitable combination of them.
- Kendall’s tau metric between permutations is defined as follows. For each pair {i, j} of
distinct members of D, if i and j are in the same order in σ1 and σ2, then let
Ki,j(σ1,σ2)=0; if i and j are in the opposite order, then let Ki,j(σ1,σ2) = 1.
Kendall’s tau is given by sum of Ki,j (σ1,σ2). The maximum value of
K(σ1,σ2) is n(n-1)/2, which occurs when σ1 is the reverse of σ2
(that is, when σ1(i)+ σ2(i)= n + 1 for each i).
- Spearman’s footrule metric is the L1 distance between two permutations. Formally, it is
defined by F(σ1,σ2) = Σi |σ1(i) - σ2(i)|. The maximum value of
F(σ1,σ2) is n²/2 when n is even, and (n + 1)(n - 1)/2 when n is odd.
Again, the maximum occurs when σ1 is the reverse of σ2.
- Spearman’s rho is the L2 distance between two permutations. Formally,
ρ(σ1,σ2) = (Σi |σ1(i) - σ2(i)|²)^(1/2),
and it can be shown that ρ(·, ·) is a metric. The maximum value of ρ(σ1,σ2) is
(n(n+1)(n-1)/3)^(1/2), which occurs when σ1 is the reverse of σ2.
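The three classical measures over full permutations can be sketched as (permutations represented as dicts mapping item to rank; an illustrative transcription, not from the paper):

```python
from itertools import combinations

def kendall_tau(s1, s2):
    """Kendall's tau: the number of pairs ordered differently by s1 and s2."""
    return sum(1 for i, j in combinations(sorted(s1), 2)
               if (s1[i] < s1[j]) != (s2[i] < s2[j]))

def footrule(s1, s2):
    """Spearman's footrule: L1 distance between the rank vectors."""
    return sum(abs(s1[i] - s2[i]) for i in s1)

def rho(s1, s2):
    """Spearman's rho: L2 distance between the rank vectors."""
    return sum((s1[i] - s2[i]) ** 2 for i in s1) ** 0.5
```

For n = 3 and a permutation versus its reverse this reproduces the maxima quoted above: 3 = n(n-1)/2, 4 = (n+1)(n-1)/2, and sqrt(8) = (n(n+1)(n-1)/3)^(1/2).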
- Let p be a fixed parameter with 0 ≤ p ≤ 1 (penalty). We define the Kendall distance with
penalty parameter p as follows:
- Case 1 (i and j appear in both top k lists): If i and j are in the same order then let
K(p)i,j (τ1,τ2) = 0. If i and j are in the opposite order (such as i being ahead of j in τ1
and j being ahead of i in τ2), then let the penalty K(p) i,j (τ1,τ2)=1.
- Case 2 (i and j both appear in one top k list (say τ1), and exactly one of i or j,
say i, appears in the other top k list (τ2)): If i is ahead of j in τ1,then let the penalty
K(p)i,j (τ1,τ2) = 0, and otherwise let K(p)i,j (τ1,τ2) = 1. Intuitively, we know that i is
ahead of j as far as τ2 is concerned, since i appears in τ2 but j does not.
- Case 3 (i, but not j, appears in one top k list (say τ1), and j, but not i, appears in
the other top k list (τ2)): Then let the penalty K(p) i,j (τ1,τ2) = 1. Intuitively, we
know that i is ahead of j as far as τ1 is concerned and j is ahead of i as far as τ2 is concerned.
- Case 4 (i and j both appear in one top k list (say τ1), but neither i nor j appears
in the other top k list (τ2)): This is the interesting case (the only case where there is
really an option as to what the penalty should be). In this case, we let the penalty K(p)
i,j (τ1,τ2)= p.
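The four cases can be sketched as follows (top-k lists given as plain lists, best first; an illustrative transcription of the definition, not code from the paper):

```python
from itertools import combinations

def kendall_top_k(t1, t2, p=0.5):
    """Kendall distance with penalty parameter p between two top-k lists
    (the lists may contain different items)."""
    r1 = {x: i for i, x in enumerate(t1)}      # item -> rank in t1
    r2 = {x: i for i, x in enumerate(t2)}      # item -> rank in t2
    total = 0.0
    for i, j in combinations(sorted(set(t1) | set(t2)), 2):
        in1 = (i in r1, j in r1)
        in2 = (i in r2, j in r2)
        if all(in1) and all(in2):              # case 1: pair appears in both lists
            total += (r1[i] < r1[j]) != (r2[i] < r2[j])
        elif all(in1) and any(in2):            # case 2: both in t1, exactly one in t2
            present, absent = (i, j) if i in r2 else (j, i)
            total += r1[present] > r1[absent]  # penalty 1 unless present is ahead
        elif all(in2) and any(in1):            # case 2, symmetric
            present, absent = (i, j) if i in r1 else (j, i)
            total += r2[present] > r2[absent]
        elif any(in1) and any(in2):            # case 3: one item in each list only
            total += 1
        else:                                  # case 4: both in one list, neither in the other
            total += p
    return total
```

Two reversed top-2 lists give distance 1 (one discordant pair); two disjoint top-2 lists give four case-3 pairs plus two case-4 pairs, i.e. 4 + 2p.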
- Let l be a real number greater than k. The footrule distance with location parameter l,
denoted F(l), is obtained by placing all missing elements in each
of the lists at position l and computing the usual footrule distance between them.
A natural choice for l is k + 1. Another possibility is l = (3k - z + 1)/2 (where z is the number
of elements common to both lists), which forms a "dynamic" version of F(l), since l then depends on τ1 and τ2.
Note that l = (3k - z + 1)/2 is the average of k + 1 and 2k - z, where
the latter number is the size of D = Dτ1 ∪ Dτ2. Taking l = (3k - z + 1)/2
corresponds intuitively to “placing the missing elements at an average location”.
- It is convenient to deal with normalized versions of distance measures by dividing
each measure by its maximum value. The normalized version is then a distance measure that lies in the interval [0, 1].
- Algorithmic application: Given r ranked lists τ1,...,τr
(either full lists or top k lists) of “candidates,” the rank aggregation problem with respect
to a distance measure d is to compute a list τ (again, either a full list or
another top k list) such that the sum of distances d(τ,τ1) + ... + d(τ,τr) is minimized.
It is argued that Kendall’s tau and its variants are good measures to use, both in the
context of full lists and top k lists. Unfortunately, computing an optimal aggregation of
several full or top k lists is NP-hard for each of the Kendall measures. In this context, our
notion of an equivalence class of distance measures comes in handy.
Bustos, Skopal: Dynamic Similarity Search in MultiMetric Spaces. MIR 2006.
(full text of the article)
The paper describes the M3-tree, a structure designed to support queries
over multiple metrics with dynamic determination of the coefficients of the weighted sum of sub-distances.
It builds on the fact that the coefficients come from the interval [0,1], so an upper
bound on the sum exists. The tree stores partial distances and dynamically estimates an upper bound on the covering radius of a node.
The dynamic coefficients are determined from the similarity of the object to a small set of pre-classified objects.
The paper focuses on the speed of the computation, not on the quality of the results.
- Usually, a single metric function is used to compute the similarity between two objects in the metric space.
However, a recent trend to improve the effectiveness of the similarity search resorts to use several metric functions. The
(dis)similarity function is computed as a linear combination of some selected metrics. It follows (from metric spaces
theory) that the combined distance function is also a metric.
Definition 1. (linear multi-metric): Let M = <δ_i> be a vector of metric functions, and let W = <w_i>
be a vector of weights, with |M| = |W| = m and ∀i w_i ∈ [0, 1].
The linear multi-metric (or linear combined metric function) is defined as
D_W(O1,O2) = ∑i=1 to m w_i · δ_i(O1,O2).
- The problem with a static combination of metrics (i.e., where the weights of the linear combination are fixed) is that
usually not all metrics are well-suited for performing similarity search with all query objects. Moreover, a bad-suited
metric may “spoil” the final result of the query. Thus, to further improve the effectiveness of the search system, methods
for dynamic combinations of metrics have been proposed, where the query processor weighs the contribution of each
metric depending on the query object (i.e., big weights are assigned to the “good” metrics for that query object, and
low weights are assigned to the “bad metrics”, according to some quality criteria). This means that, instead of a single
metric, the system uses a dynamic metric function (multimetric), where a different metric is computed to perform
each similarity query.
- We implemented a query processor based on the entropy impurity method to compute the dynamic weights for
each 3D feature vector. This method uses a reference dataset that is classified in object classes (in our case, we used the
classified subset of the 3D models database). For each feature vector, a similarity query is performed on the reference
dataset. Then, the entropy impurity is computed looking at the model classes of the first t retrieved objects: It is equal
to zero if all the first t retrieved objects belong to the same model class, and it has a maximum value if each of the t
objects belongs to a different model class.
Bustos, Keim, Saupe, Schreck, Vranic: Using Entropy Impurity for Improved 3D Object Similarity Search.
ICME 2004
(full text of the article)
Tries to make similarity search over 3D objects more effective. Several descriptors are used;
for each query object it is determined which of them are most important, using the entropy impurity measure.
The experiments use a training set in which the objects are pre-classified into categories. The categories are given in advance, and for each of them several representatives (training objects) are known. For a given query object and descriptor, the objects of the training set are sorted by the distance given by that descriptor. The more of the top-ranked objects belong to the same category, the better suited the descriptor is for describing the given query object. The principle of the Entropy Impurity measure may be interesting for distinct kNN: if all nearby objects come from the same category, EI = 0; conversely, if the nearby objects come evenly from all categories, EI = 1.
- As we learned from our experiments, the effectiveness of a given
FV (feature vector) cannot be assessed for the general case, as it depends on the specific type of 3D object that one wants to search.
For example, we have observed that the best effectiveness for “car models” is achieved using the depth buffer FV, but the
best effectiveness for “sea animal models” is achieved using the silhouette FV. In this work, we attempt to improve the
effectiveness of a similarity search system for a general 3D object database, where no restrictions are imposed on the 3D
models, as they are allowed to represent any type of object.
- We introduce a heuristic for the a priori estimation of individual FV performance. We then define two methods that
use this estimator to improve the effectiveness of the retrieval system: The first one makes use of the heuristic to select a
good FV given a query object from the pool of available FVs. The other one uses it to combine FVs with weights based
on the estimator value.
- We use an estimator based on the entropy impurity for determining the best FVs to use. The entropy impurity is a well
known measure used in the context of decision tree induction, where it measures the “impurity” of a node N of the tree
w.r.t. the elements assigned to N. If these elements all have the same class label, then the impurity is 0, otherwise it is a
positive value that increases up to a maximum when all classes are equally represented.
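The entropy impurity itself is a one-liner (a sketch; deriving FV weights from it, as the papers do, is not shown):

```python
import math
from collections import Counter

def entropy_impurity(labels):
    """Entropy impurity of the class labels of the first t retrieved objects:
    0 when all belong to one class, maximal when all classes are equally
    represented."""
    counts = Counter(labels)
    t = len(labels)
    return -sum((c / t) * math.log2(c / t) for c in counts.values())
```

E.g. five retrieved objects all labeled "car" give impurity 0 (the descriptor looks well suited to the query), while an even mix of two classes gives 1 bit.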
- An open problem is the automatic selection of a FV or combination of FVs for searching for a given query object in
a database consisting entirely of unclassified objects. In this case, the proposed methods cannot be directly applied.
Bustos, Keim, Saupe, Schreck, Vranic: Automatic Selection and Combination of Descriptors
for Effective 3D Similarity Search. ISMSE 04
(full text of the article)
Another paper on the same topic as the previous one. Using a training set of objects
partitioned into categories, it is determined which descriptors are essential for a given object. The search then uses
the most important descriptor or, better, a linear combination whose weights are determined by
the importance of each descriptor for the given object.
Amato, Falchi, Gennaro, Rabitti, Savino, Stanchev: Improving Image Similarity Search Effectiveness
in a Multimedia Content Management System. MIS 2004
(full text of the article)
For a given image collection and a set of six MPEG-7 descriptors, the authors investigate what
weights to assign to the descriptors to obtain the best possible result. The weights are meant to be universal for the whole
collection. They investigate this by running experiments with users, who rate the best results
(numerically, from 0 to 1); they then compare for which descriptors the computer produced a similar ranking
and for which it did not. The coefficients of the aggregation function are then derived from the degree of agreement.
Falchi, Lucchese, Perego, Rabitti: CoPhIR: COntent-based Photo Image Retrieval. 2008.
(full text of the article)
Describes why and how the CoPhIR collection was built (choice of the source, choice of the collected metadata, the
crawling and indexing process - very demanding, grids are used).
Skala: Measuring the Difficulty of Distance-Based Indexing. SPIRE 2005.
(full text of the article)
The intrinsic dimensionality of a given metric space and chosen
distance function is computed. This quantity is determined from the mean and variance of the distribution of distances over a set of random
pairs of objects, and it should make it possible to compare how demanding it is to index the given space
compared to an n-dimensional vector space. Based on the value of the intrinsic dimensionality one can decide
which distance function is better suited for indexing the given space.
On intrinsic dimensionality see also Chavez, Navarro: Measuring the dimensionality of general metric spaces.
Using this quantity, the paper carries out computations for several chosen sets of objects and several variants
of distance functions, evaluating the dimensionality and the most suitable function.
Quite theoretical: proofs, probabilities.
Chavez, Navarro: Measuring the dimensionality of general metric spaces.
Technical Report TR/DCC-00-1, 2000.
(full text of the article)
The complexity of search depends on the dimensionality of the searched space.
The authors argue that from the search perspective the dimensionality of a vector space is often
different from the actual number of dimensions. The search dimensionality should also be measurable
for general metric spaces, which have no ordinary dimensions. The authors therefore define the so-called
intrinsic dimensionality, computed from statistical properties of the histogram of distances.
They further show that if the intrinsic dimensionality is known, bounds on the complexity of
search can be derived.
A big advantage of the proposed intrinsic-dimension measure (others exist as well) is its simplicity
and cheap computation for an arbitrary metric space.
- The search problem is known to be difficult as the dimension l grows. However, in practice the "intrinsic"
dimension of a real-world vector space is less than l because the elements are not uniformly distributed.
The behavior of all search algorithms is known to be related to the intrinsic dimension, but defining
it is difficult.
- Despite the absence of coordinates in general metric spaces, many authors speak in terms of
"intrinsic dimensionality" of metric spaces as a measure of the difficulty of searching
in them. Several authors have pointed at the histogram of distances as a tool to measure
the dimensionality, noting that in vector spaces the histogram of distances shrinks and
shifts to the right as l grows.
- We introduce a new definition of intrinsic dimensionality that is simple and efficient to estimate
and which is shown analytically and experimentally to capture the essential features of metric spaces that
determine the behavior of all the search algorithms. We prove analytically that our definition naturally extends
that of vector spaces and that lower bounds on the behavior of a large class of proximity search algorithms
can be stated as a function of our intrinsic dimensionality measure.
- We define the intrinsic dimensionality of a set of points in a metric space in terms
of statistical properties of the histogram of distances. The idea is that, as the space has
higher intrinsic dimension, the mean of the histogram grows and/or its variance is reduced
(at least this is the case on random vector spaces).
- Intuitive explanation of the concept: if we consider a random query q and an indexing scheme
based on random pivots, then the possible distances between q and a pivot p are distributed according
to the histogram of the figure. The elimination rule says that we can discard any point u such
that d(p,u) ∉ [d(p,q) - r, d(p,q) + r]. The grayed areas in the figure show the points that
cannot be discarded. As the histogram is more and more concentrated around its mean, fewer and fewer
points can be discarded using the information given by d(p,q). Moreover, in many cases the
search radius r must grow as the mean distance grows, which makes the problem even harder.
- Definition (intrinsic dimensionality): The intrinsic dimensionality of a metric space
is defined as ρ = µ² / (2σ²), where µ and σ²
are the mean and variance of its histogram of distances.
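The estimator is cheap to compute by sampling random pairs, as the annotation above notes (a sketch; sample size and distance function are illustrative):

```python
import random

def intrinsic_dimensionality(objects, dist, samples=1000, seed=0):
    """Estimate rho = mu^2 / (2 sigma^2) from the distances of random pairs."""
    rng = random.Random(seed)
    ds = [dist(rng.choice(objects), rng.choice(objects)) for _ in range(samples)]
    mu = sum(ds) / len(ds)
    var = sum((d - mu) ** 2 for d in ds) / len(ds)
    return mu * mu / (2 * var)
```

On uniform random vectors with the L2 distance the estimate grows with the embedding dimension, matching the intuition that the histogram concentrates as dimensionality rises.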
Skala: Counting Distance Permutations. SISAP 2008.
(full text of the article), (extended version of the article)
Various approaches exist for indexing metric data, differing in how many
and which values (distances) are stored in the index. Here an index is considered that stores, for each
data object, a list of k reference points ordered by their distance to that object.
It is discussed that the number of such lists, i.e., permutations of the reference points, can be substantially
smaller than k!, which affects the index size and the search speed.
Indyk, Motwani: Approximate Nearest Neighbors: Towards Removing the Curse of Dimensionality. Proceedings of
the Symposium on Theory of Computing, 1998.
(full text of the article)
The foundational paper on LSH, which is presented as one of the methods for solving the
PLEB problem; PLEB is shown to be "equivalent" to (reducible in both directions to/from) the NN search problem.
The focus is mainly on principles and complexity.
- Overview of existing approaches to NN queries (structures, time complexity)
- Reducing approximate NNS to a new problem - point location in equal balls. Two solutions to the point location problem. One is based on a method akin to the Elias bucketing algorithm - we decompose each ball into a bounded number of cells and store them in a dictionary. This allows us to achieve O(d) query time, while the preprocessing is exponential in d. For the second solution, we introduce the technique of locality-sensitive hashing. The key idea is to use hash functions such that the probability of collision is much higher for objects that are close to each other than for those that are far apart. We prove that the existence of such functions for any domain (not necessarily a metric space) implies the existence of fast approximate e-NNS algorithms for that domain, with preprocessing cost only linear in d and sublinear in n (for e > 1).
- Definition: Point Location in Equal Balls (PLEB):
Given n radius-r balls centered at C = (c1, ..., cn) in M = (X,d), devise a data structure which for any query point q ∈ X does the following: if there exists ci ∈ C such that q ∈ B(ci, r) then return ci, else return NO.
Observe that PLEB (e-PLEB) can be reduced to NNS (e-NNS), with the same preprocessing and query costs, as follows: it suffices to find an exact (e-approximate) nearest neighbor and then compare its distance from q with r. There is a reduction in the reverse direction from e-NNS to e-PLEB, with only a small overhead
in preprocessing and query costs. This reduction relies on a data structure called a ring-cover tree.
- We introduce the notion of locality-sensitive hashing and apply it to sublinear-time similarity searching. The definition makes no assumptions about the object similarity measure. In fact, it is applicable to both similarity and dissimilarity measures.
- Definition: Locality-sensitive hashing: A family H = {h : S → U} is called (r1, r2, P1, P2)-sensitive if for any q, p, p' ∈ S
- if p ∈ B(q, r1) then PrH[h(q) = h(p)] ≥ P1
- if p' ∉ B(q, r2) then PrH[h(q) = h(p')] ≤ P2.
In order for a locality-sensitive family to be useful, it has to satisfy the inequalities P1 > P2 and r1 < r2 when D is a dissimilarity measure, or P1 > P2 and r1 > r2 when D is a similarity measure.
- LSH Algorithm for PLEB: For k specified later, define a function family G = {g :
S → U^k} such that g(p) = (h1(p), ..., hk(p)), where hi ∈ H.
The algorithm is as follows. For an integer l we choose l
functions g1, ..., gl from G independently and uniformly at
random. During preprocessing, we store each p ∈ P in the
bucket gj(p), for j = 1, ..., l. Since the total number of
buckets may be large, we retain only the non-empty buckets
by resorting to hashing. If any bucket contains more
than one element, we retain an arbitrary one. To process a
query q, we search all buckets g1(q), ..., gl(q). Let p1, ..., pt
be the points encountered therein. For each pj, if pj ∈
B(q, r2) then we return YES and pj, else we return NO.
Let Wr2(q) = P - B(q, r2), and let p* be the point in P closest
to q. The parameters k and l are chosen so as to ensure
that with a constant probability there exists gj such that
the following properties hold:
- gj(p') ≠ gj(q) for all p' ∈ Wr2(q), and
- if p* ∈ B(q, r1) then gj(p*) = gj(q).
- Theorem: Suppose there is an (r1, r2, p1, p2)-sensitive family H for D. Then there exists an algorithm for (r1, r2)-PLEB under measure D which uses O(dn + n^(1+ρ)) space and O(n^ρ) evaluations of the hash function for each query, where ρ = ln(1/p1) / ln(1/p2).
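A minimal sketch of the bucket scheme above, instantiated with the bit-sampling family for the Hamming distance over {0,1}^d (parameters are illustrative; unlike the paper, this sketch keeps all points of a bucket rather than one arbitrary representative):

```python
import random

def build_lsh(points, k, l, seed=0):
    """LSH for Hamming distance via bit sampling: each g_j samples k coordinates."""
    rng = random.Random(seed)
    d = len(points[0])
    gs = [tuple(rng.randrange(d) for _ in range(k)) for _ in range(l)]
    tables = [{} for _ in range(l)]
    for p in points:
        for g, table in zip(gs, tables):
            table.setdefault(tuple(p[i] for i in g), []).append(p)
    return gs, tables

def query(gs, tables, q, r2):
    """Scan the l buckets of q; return a point within Hamming distance r2, else None."""
    for g, table in zip(gs, tables):
        for p in table.get(tuple(q[i] for i in g), []):
            if sum(a != b for a, b in zip(p, q)) <= r2:
                return p
    return None
```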
Andoni, Indyk: Near-Optimal Hashing Algorithms for Approximate Nearest Neighbor in High Dimensions. COMMUNICATIONS OF THE ACM January 2008/Vol. 51, No. 1.
(full text of the article, slides)
A clear recapitulation of the principles and definitions of LSH (k-NN, curse of dimensionality, approximate k-NN, probabilistic approach). Algorithm - a combination of several LSH families, either directly during hashing or by using multiple hash tables - experimental comparison of various settings. Several basic functions - for the Hamming distance, l1, l2, vectors, sets. A new function for the Euclidean distance - a mapping into a higher-dimensional space.
- Problem definition (Randomized R-near neighbor reporting): Given a set P of points in a d-dimensional space Rd, and parameters R > 0, δ >0, construct a data structure that, given any query point q, reports each R-near neighbor of q in P with probability 1 – δ.
- Locality-sensitive hashing: A family H is called (R,
cR, P1, P2)-sensitive if for any two points p, q ∈ Rd
- if ||p – q|| ≤ R then PrH [h(q) = h(p)] ≥ P1,
- if ||p – q|| ≥ cR then PrH [h(q) = h(p)] ≤ P2.
In order for a locality-sensitive hash (LSH) family to be useful, it has
to satisfy P1 > P2.
- Algorithm: One typically cannot use
H as is since the gap between the probabilities P1 and P2 could be
quite small. Instead, an amplification process is needed in order to
achieve the desired probabilities of collision. Given a family H of hash functions with parameters (R, cR, P1, P2), we amplify the gap between the high probability P1 and low probability P2 by concatenating several functions. In particular, for parameters k and L (specified later), we choose L functions
gj(q)=(h1,j(q),…,hk,j(q)), where ht,j
(1 ≤ t ≤ k, 1 ≤ j ≤ L) are chosen independently and uniformly at random from H. These are the actual functions that we use to hash the data points. The data structure is constructed by placing each point p from the
input set into a bucket gj (p), for j = 1,…,L. Since the total number of
buckets may be large, we retain only the nonempty buckets. In this way, the data
structure uses only O(nL) memory cells; note that it suffices that the
buckets store the pointers to data points, not the points themselves.
To process a query q, we scan through the buckets g1(q),…, gL(q), and
retrieve the points stored in them.
How should the parameter k be chosen? Intuitively, larger values of
k lead to a larger gap between the probabilities of collision for close
points and far points. The benefit of this amplification is that the
hash functions are more selective. At the same time, if k is large then
P1 is small, which means that L must be sufficiently large to ensure
that an R-near neighbor collides with the query point at least once.

- Near-Optimal LSH Functions for Euclidean Distance: The starting point of our construction is the line partitioning method, where point p was mapped into R1 using a random projection and this line was partitioned into intervals of length w, where w is a parameter.
We define a “multi-dimensional version” of this approach. Specifically, we first perform random projection into Rt, where t is super-constant, but relatively small (i.e., t = o(log n)). Then we partition the space Rt
into cells. The hash function returns the index of the cell which contains the projected point p. Ball partitioning is used on Rt: We create a sequence of balls B1, B2, …, each of radius w, with centers chosen independently at random.
Charikar: Similarity Estimation Techniques from Rounding Algorithms. STOC'02.
(full text of the article)
Defines LSH via a distribution over hash functions and proves some properties relating a similarity function and the corresponding LSH family. Proposes LSH families for vectors (cosine measure) and for the Earth Mover Distance.
- Lemma 1: For any similarity function sim(x, y) that admits a locality sensitive hash function family, the distance function 1-sim(x, y) satisfies triangle inequality.
- Lemma 2: Given a locality sensitive hash function family
F corresponding to a similarity function sim(x, y), we can obtain a locality sensitive hash function family F' that maps objects to {0, 1} and corresponds to the similarity function (1+sim(x,y))/2.
- Lemma 3: For any similarity function sim(x, y) that admits a locality sensitive hash function family, the distance function 1-sim(x, y) is isometrically embeddable in the Hamming cube.
- The LSH family for vectors based on random hyperplane cutting corresponds to the similarity measure 1 - θ(u,v)/π, which is closely related to cos(θ). If we represent sets by their characteristic vectors, we can use the random hyperplane based hash function to obtain a hash function family for set similarity, where the similarity measure of sets is 1 - θ/π, with θ = cos⁻¹(|A ∩ B| / (|A|·|B|)^(1/2)). This hash function family facilitates easy incorporation of element weights in the similarity calculation, since the values of the coordinates of the characteristic vectors could be real-valued element weights.
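The random hyperplane family can be sketched as follows: each hash bit is the sign of a dot product with a random Gaussian vector, and two vectors collide on one bit with probability 1 - θ/π (names and the empirical check are illustrative):

```python
import random

def random_hyperplane_hash(dim, k, seed=0):
    """k random hyperplanes; the hash of v is the k-bit sign pattern of <r_i, v>."""
    rng = random.Random(seed)
    planes = [[rng.gauss(0, 1) for _ in range(dim)] for _ in range(k)]
    def h(v):
        return tuple(sum(ri * vi for ri, vi in zip(r, v)) >= 0 for r in planes)
    return h

def collision_estimate(u, v, trials=2000):
    """Empirical Pr[h(u) = h(v)] over 1-bit hashes; approaches 1 - theta/pi."""
    hits = sum(random_hyperplane_hash(len(u), 1, seed=s)(u) ==
               random_hyperplane_hash(len(u), 1, seed=s)(v)
               for s in range(trials))
    return hits / trials
```

For orthogonal vectors (θ = π/2) the estimate is close to 1/2, matching 1 - θ/π.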
- Earth Mover Distance: ...
- Generalization of min-wise independent permutations to the weighted set case: we pick an infinite sequence {(i_t, α_t)}, t = 1, ..., ∞, where for each t, i_t is picked uniformly at random in {1, ..., n} and α_t is picked uniformly at random in [0, 1]. Given a distribution P = (p1, ..., pn), the assignment of labels is done in phases. In the t-th phase, we check whether α_t ≤ p_{i_t}. If this is the case and P has not been assigned a label yet, it is assigned the label i_t.
Gionis, Indyk, Motwani: Similarity Search In High Dimensions via Hashing. VLDB'99.
(full text of the article)
Lv, Josephson, Wang, Charikar, Li: Multi-Probe LSH: Efficient Indexing for High-Dimensional Similarity Search. VLDB'07.
(full text of the article)
The weakness of classic LSH approaches is that, to achieve reasonable accuracy on large data volumes, they need many hash tables. Multi-Probe LSH proposes to use the tables better (probing more than one bucket per table) and thereby reduce the number of tables needed. For a concrete metric (d-dimensional vectors, projection onto a line), a bucket probing sequence is proposed.
- Properties of ideal indexing scheme for similarity search:
- Accurate: A query operation should return desired results that are very close to those of the brute-force, linear-scan approach.
- Time efficient: A query operation should take O(1) or O(log N) time, where N is the number of data objects in the dataset.
- Space efficient: An index should require a very small amount of space, ideally linear in the dataset size, not much larger than the raw data representation. For reasonably large datasets, the index data structure may even fit into main memory.
- High-dimensional: The indexing scheme should work well for datasets with very high intrinsic dimensionalities (e.g. on the order of hundreds).
- We propose a new method called multi-probe LSH, which uses a more systematic approach to explore hash buckets. Ideally, we would like to examine the buckets with the highest success probabilities. We develop a simple approximation for these success probabilities and use it to order the hash buckets for exploration. Moreover, our ordering of hash buckets does not depend on the nearest neighbor distance as in the entropy-based approach.
- The key idea: use a carefully derived probing sequence to check multiple buckets that are likely to contain the nearest neighbors of a query object. Given the property of locality sensitive hashing, we know that if an object is close to a query object q but not hashed to the same bucket as q, it is likely to be in a bucket that is “close by” (i.e., the hash values of the two buckets only differ slightly). Our goal is to locate these buckets.
- Two approaches: step-wise probing (try all buckets which differ in one of the M component hash values of g(q) = (h1(q), …, hM(q)), then buckets which differ in two values, …) and query-directed probing (compute the probability that good points fall in a given interval and order buckets according to these probabilities).
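Step-wise probing can be sketched as follows (an illustrative enumeration of perturbed buckets, not the paper's implementation; the ±1 perturbations assume each hash value indexes an interval on a projection line, so neighbouring values mean neighbouring intervals):

```python
from itertools import combinations, product

def step_wise_probes(g, n_steps):
    """Enumerate buckets whose composite hash differs from g = (h_1..h_M)
    in exactly 1, 2, ..., n_steps coordinates, perturbing each chosen
    coordinate by -1 or +1 (neighbouring intervals on the projection line)."""
    M = len(g)
    for n in range(1, n_steps + 1):
        for coords in combinations(range(M), n):        # which coordinates differ
            for deltas in product((-1, +1), repeat=n):  # direction of each change
                probe = list(g)
                for c, d in zip(coords, deltas):
                    probe[c] += d
                yield tuple(probe)

probes = list(step_wise_probes((3, 7, 1), n_steps=1))
# all 1-step probes: 3 coordinates x 2 directions = 6 buckets
assert len(probes) == 6 and (4, 7, 1) in probes
```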
Novak, Batko, Zezula: Web-Scale System for Image Similarity Search: When The Dreams Are Coming True. 2008.
(full text of the article)
A scalable system for web-scale image search based on a general metric; approximation is used to achieve usable response times. Introduces M-Chord and examines the approximation parameters.
- System is based purely on the metric space model. Using this model is a two-edged sword: (1) It provides us with a unique generality and almost absolute freedom in defining and combining the similarity measures. (2) On the other hand, indexing techniques have to abandon classic schemas and the metric searching can be less efficient, especially on simple data domains. For complex data, like the features extracted from digital images, the classic techniques may lose their performance because of high dimensionality and may have problems with efficient indexing of very large collections. The relative efficiency of the metric-based approach grows with the volume and complexity of data.
- M-Chord: scalable distributed data structure for indexing and similarity searching in metric data, based on the concept of structured P2P networks. The system has been inspired by a centralized vector-based method called iDistance – they both map the data space into a one-dimensional domain in a similar way. The M-Chord then applies one-dimensional P2P protocol, for instance Chord or Skip Graphs, in order to divide the data among the peers and to provide navigation within the system. The M-Chord search algorithms navigate the similarity queries to the relevant peers.

- M-Chord mapping: Several reference points (pivots, p_i) are selected from a sample dataset. The data space is partitioned in a Voronoi-like manner into clusters (C_i). The one-dimensional mapping of the data objects is defined according to their distances from the cluster's pivot. Given a separation constant c, the M-Chord key for an object x ∈ C_i is calculated as mchord(x) = d(p_i, x) + i · c. To evaluate a range query R(q, r), the data space to be searched is specified by mchord-domain intervals in the clusters which intersect the query region. With the data space mapped into the mchord domain, every active node of the system takes over responsibility for an interval of keys. Range queries are routed to the peers overlapping with the relevant intervals.
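The mapping can be sketched as a toy example of the stated formula (my own illustration; the 1-D metric and pivot values below are made up):

```python
def mchord_key(x, pivots, dist, c):
    """Map object x to the one-dimensional M-Chord domain: assign x to the
    cluster of its nearest pivot p_i (Voronoi-like partitioning), then
    key = d(p_i, x) + i * c, where c separates the clusters' intervals."""
    i, p_i = min(enumerate(pivots), key=lambda ip: dist(ip[1], x))
    return dist(p_i, x) + i * c

# 1-D toy example with |.| as the metric; c must exceed any in-cluster distance
dist = lambda a, b: abs(a - b)
pivots = [0.0, 10.0]
assert mchord_key(1.5, pivots, dist, c=100) == 1.5    # cluster 0: key = d + 0
assert mchord_key(9.0, pivots, dist, c=100) == 101.0  # cluster 1: key = 1.0 + 100
```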
- kNN(q, k) processing:
- The query is evaluated locally on “the most promising peer(s)”; in this way, we obtain an upper bound rk on the distance to the kth nearest object from q.
- Range query R(q, rk) is executed and the k nearest objects from the result are returned.
There is a tradeoff between the costs of the first phase and the precision of the rk estimation and, thus, cost of the second phase.
- Precise similarity searching is inherently expensive. Approximate strategy - explore only the “most promising” parts of the “most promising” clusters. The distance between the query object q and the cluster pivot d(q, pi) is taken as a heuristic of the quality of the cluster, so the algorithm visits only a specified number of clusters in the order of this distance. Within a cluster Ci, the query is first navigated to a peer responsible for key d(pi, q) + i · c, which is the most promising spot on the mchord domain for cluster Ci. This peer searches its local data, returns the partial answer to the originator, and the query is sent to some number of its adjacent peers within the cluster.
Parameters: number of clusters to visit, number of nodes to access within each cluster.
- Local data organization: Pivoting M-Tree (PM-Tree) - M-Tree with additional filtering and pruning by means of precomputed distances between the indexed objects and a fixed set of pivots. We use the same set of pivots as the M-Chord
and thus the object-pivot distances are computed only once during the insert operation.
Dong, Wang, Josephson, Charikar, Li: Modeling LSH for Performance Tuning. CIKM'08.
(full text of the article)
A statistical model for Multi-Probe LSH, intended to be adaptable to an arbitrary dataset and to help with tuning the LSH parameters. Per-query tuning of the LSH is also considered. However, only the Euclidean distance is covered, for which the authors start from the (empirical) assumption that the distribution of objects can be described by a gamma distribution.
The model also assumes the basic LSH function: multiplication by a random vector, addition of a random scalar, and division of the line into segments.
Panigrahy: Entropy based Nearest Neighbor Search in High Dimensions. 2005.
(full text of the article)
Comments by Charikar et al. (in Multi-Probe LSH: Efficient Indexing for High-Dimensional Similarity Search):
Recent theoretical work by Panigrahy [22] proposed an entropy-based LSH scheme, which constructs its indices in a similar manner as the basic scheme, but uses a different query procedure. This scheme works as follows. Assume we know the distance Rp from the nearest neighbor p to the query q. In principle, for every hash bucket, we can compute the probability that p lies in that hash bucket (call this the success probability of the hash bucket). Note that this distribution depends only on the distance Rp. Given this information, it would make sense to query the hash buckets which have the highest success probabilities. However, performing this calculation is cumbersome. Instead, Panigrahy proposes a clever way to sample buckets from the distribution given by these probabilities. Each time, a random point p0 at distance Rp from q is generated and the bucket that p0 is hashed to is checked. This ensures that buckets are sampled with exactly the right probabilities. Performing this sampling multiple times will ensure that all the buckets with high success probabilities are probed. However, this approach has some drawbacks: the sampling process is inefficient because perturbing points and computing their hash values are slow, and it will inevitably generate duplicate buckets. In particular, buckets with high success probability will be generated multiple times and much of the computation is wasteful. Although it is possible to remember all buckets that have been checked previously, the overhead is high when there are many concurrent queries. Further, buckets with small success probabilities will also be generated, which is undesirable. Another drawback is that the sampling process requires knowledge of the nearest neighbor distance Rp, which is difficult to choose in a data-dependent way. If Rp is too small, perturbed queries may not produce the desired number of objects in the candidate set. If Rp is too large, it would require many perturbed queries to achieve good search quality.
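The sampling procedure described above can be sketched like this (a simplified illustration; the one-line E2LSH-style hash and all names are my own, not from either paper):

```python
import numpy as np

def sample_probe_buckets(q, Rp, hash_fn, n_samples, rng):
    """Panigrahy-style entropy-based probing (sketch): repeatedly perturb the
    query to a random point at distance Rp and collect the buckets those
    perturbed points hash to. Buckets with high success probability get
    sampled often; the duplicates illustrate the wasted work the Multi-Probe
    authors criticise."""
    buckets = []
    for _ in range(n_samples):
        direction = rng.standard_normal(q.shape)
        direction /= np.linalg.norm(direction)  # random unit vector
        p0 = q + Rp * direction                 # point at distance Rp from q
        buckets.append(hash_fn(p0))
    return buckets

# toy E2LSH-style hash: project onto a random line, cut into segments of width w
rng = np.random.default_rng(0)
a, b, w = rng.standard_normal(4), rng.random(), 4.0
hash_fn = lambda v: int(np.floor((a @ v + b) / w))
q = np.zeros(4)
buckets = sample_probe_buckets(q, Rp=0.5, hash_fn=hash_fn, n_samples=200, rng=rng)
assert len(buckets) == 200
```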
Moshfeghi, Pinto, Pollick, Jose: Understanding Relevance: An fMRI Study. ECIR 2013.

full text of the article,
bibtex
@inproceedings{DBLP:conf/ecir/MoshfeghiPPJ13,
author = {Yashar Moshfeghi and
Luisa R. Pinto and
Frank E. Pollick and
Joemon M. Jose},
title = {Understanding Relevance: An fMRI Study},
booktitle = {ECIR},
year = {2013},
pages = {14-25},
ee = {http://dx.doi.org/10.1007/978-3-642-36973-5_2},
crossref = {DBLP:conf/ecir/2013},
bibsource = {DBLP, http://dblp.uni-trier.de}
}
@proceedings{DBLP:conf/ecir/2013,
editor = {Pavel Serdyukov and
Pavel Braslavski and
Sergei O. Kuznetsov and
Jaap Kamps and
Stefan M. R{\"u}ger and
Eugene Agichtein and
Ilya Segalovich and
Emine Yilmaz},
title = {Advances in Information Retrieval - 35th European Conference
on IR Research, ECIR 2013, Moscow, Russia, March 24-27,
2013. Proceedings},
booktitle = {ECIR},
publisher = {Springer},
series = {Lecture Notes in Computer Science},
volume = {7814},
year = {2013},
isbn = {978-3-642-36972-8},
ee = {http://dx.doi.org/10.1007/978-3-642-36973-5},
bibsource = {DBLP, http://dblp.uni-trier.de}
}
Brief but wide survey of approaches to relevance feedback. They are studying the "nature of relevance" - what
happens in the brain when looking at relevant and irrelevant results.
Drucker, Shahrary, Gibbon: Support vector machines: relevance feedback and information retrieval. 2001.
(full text of the article)
An explanation of the use of relevance feedback (iterative refinement of the set of returned documents), and a description and comparison of the basic vector-based algorithms (other approaches exist too, e.g. the probabilistic model). The user always marks some good and bad documents in the result (rating the results shown on the first page, or on the first few pages if the first page contains only bad ones), and the query vectors are recomputed accordingly.
Basic algorithms:
- Rocchio: a document is represented by a vector D whose size corresponds to the number of all words in all documents after cleaning (removal of some words); the values are 0/1/N depending on whether the word is absent from the given document, present, or whether occurrences are counted. Using chosen coefficients alpha, beta, gamma, the query for the next iterations is computed.
The query is multiplied with the not-yet-processed (unrated) documents in the database, and those with the highest value are returned. Negative elements of the query are set to zero. If the feedback contained many bad examples, many entries will be zero, which does not help the next iteration much.
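The Rocchio update can be sketched as follows (a generic textbook version under the notation above, not the paper's exact implementation; zeroing of negative query entries follows the note above):

```python
import numpy as np

def rocchio(query, relevant, nonrelevant, alpha, beta, gamma):
    """Classic Rocchio update: move the query vector towards the centroid of
    documents marked relevant and away from the centroid of those marked
    non-relevant; negative components are then clipped to zero."""
    q = alpha * query
    if len(relevant):
        q = q + beta * np.mean(relevant, axis=0)
    if len(nonrelevant):
        q = q - gamma * np.mean(nonrelevant, axis=0)
    return np.maximum(q, 0.0)  # negative query entries are set to zero

q0 = np.array([1.0, 0.0, 1.0])
rel = np.array([[1.0, 1.0, 0.0]])
non = np.array([[0.0, 4.0, 0.0]])
q1 = rocchio(q0, rel, non, alpha=1.0, beta=0.75, gamma=0.25)
# component 1: 0 + 0.75*1 - 0.25*4 = -0.25 -> clipped to 0
assert np.allclose(q1, [1.75, 0.0, 1.0])
```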
- Ide regular:
- Ide dec-hi: similar, but the last summand considers only the highest-ranked non-relevant document.
SVM = Support Vector Machines: a hyperplane separating the positive and negative cases is sought. The training examples are called support vectors; they define two hyperplanes, one for the positive and one for the negative cases. The separating hyperplane is obtained by maximizing a rather complex function. Linearly separable and non-separable cases are distinguished. A linear SVM can be evaluated very quickly (so the authors claim, though they then report times of up to three seconds...). After a relevance feedback iteration, the documents returned as best are those that lie on the correct side of the separating hyperplane and are farthest from it.
They further discuss ways of evaluating quality. Precision and recall are rejected, because the user cannot determine them during the search and they would not tell him much. They therefore propose the "coverage ratio", the ratio between the number of relevant documents retrieved so far and the number of relevant documents that would have been retrieved by ideal retrieval. The best value of this metric is 1 in every iteration.
Ishikawa, Subramanya, Faloutsos: MindReader: Querying databases through multiple examples. VLDB 98.
(full text of the article)
Strictly speaking this is not about relevance feedback but about multi-object queries: from several examples, a single one is determined ("the one the user wanted") together with the right distance function, and the search then proceeds accordingly. An often-cited paper, but it only considers vectors. The ideal query object is the weighted average of the example objects, and the distance function is computed according to which descriptors all the examples are close to the ideal in (those are important, so they get a higher weight) and which not. The authors stress that their approach can also capture correlations between descriptors.
Rui, Huang, Ortega, Mehrotra: Relevance Feedback: A Power Tool in Interactive Content-Based Image Retrieval. IEEE Trans. Circuits and Systems for Video Technology, 8(5):644–655, 1998.
(full text of the article)
Computer-centric systems, the semantic gap, the subjectivity of human perception. The first proposal to introduce relevance feedback into similarity search. An object is modeled as a triple (data, features, descriptors); a given feature can be described by several descriptors (e.g. histogram and color layout for color), and a descriptor can be a vector. Relevance feedback makes it possible to assign weights to individual features/descriptors/descriptor components according to the user's needs (the so-called computer-centric approach, in contrast, works with fixed weights). The user marks each object in the answer as very good/good/neutral/bad/very bad, and the weights are tuned based on this. At the start the weights must be normalized. New weights are then recomputed according to how a top-k query by the given feature alone would turn out, how it would overlap with the result of the combined query, and which objects the user marked. It is also considered whether the given descriptor behaves consistently on the marked objects; if it does, it is discriminative and gets a higher weight. Apparently each iteration starts again from scratch: what was learned in the previous iteration is not taken into account.
Experiments: efficiency (speed of convergence of the algorithm) and effectiveness (how satisfied the users are). An assumption of the experiments is that the user marks objects consistently (which would explain why the weight computation ignores the previous steps; but can this really be assumed?). For efficiency they use a ground truth and examine how quickly the result approaches it over the iterations (finding: the most in the first one). They try two groups of objects: for one, the ideal weight setting is closer to the initial values; for the other, farther away. They also look at how convergence depends on the number of returned objects; naturally, the more the better (more objects rated by the user). Effectiveness: user satisfaction, but with no numeric evaluation; they merely state that users from various fields were very satisfied.
Proposed future work: incorporating text, improving the computation of the new weights with more advanced techniques.
Zhou, Huang: Relevance feedback in image retrieval: A comprehensive review. Multimedia Systems, 2003.
(full text of the article)
A survey of the various feedback mechanisms used in image retrieval. Contains general observations on what needs to be addressed and a description of the existing approaches.
- We need a user-in-the-loop, because images reside in a continuous representation space, in which semantic concepts are best described in discriminative subspaces – “cars” are of certain shape while “sunset” is more describable by color. In other words, only a small subset of features (or a subspace of the original space) is active for describing any given concept. More importantly, different users at different times may have different interpretations or intended usages for the same image, which makes off-line, user-independent learning undesirable in general.
- Since the general assumption is that every user’s need is different and time varying, a database cannot adopt a fixed clustering structure; and the total number of classes and the class membership are not available beforehand, since these are assumed to be user-dependent, and time varying as well.
- If each image/region is represented by a point in a feature space, relevance feedback with only positive examples can be cast as a density estimation or novelty detection problem. While with both positive and negative training examples it becomes a classification problem, or an online learning problem in a batch mode, but with the following characteristics:
- Small sample issue. The number of training examples is small relative to the dimension of the feature space while the number of classes is large.
- Asymmetry in training sample. The small number of negative examples is unlikely to be representative for all the irrelevant classes; thus, an asymmetric treatment may be necessary.
- Real time requirement. The algorithm should be sufficiently fast, and if possible avoid heavy computations over the whole dataset.
- Note that in reality, user consistency is hard to achieve, i.e., it is often difficult for a user to tell between two images which is “closer” to a third one consistently in accordance with the underlying feature representations adopted by the machine. In the light of this difficulty, the user modeling has to be “soft”, or probabilistic in nature.
- If we assume that the user is greedy and impatient, and thus expects the best possible retrieval results after each feedback round, the machine should always display the most-positive images based on previous training. In this case, the user can terminate the query process at any point in time, and will always get the best results so far. Additional user feedback or “training”, if any, will be performed on these most-positive images. This is the strategy adopted by most, if not all, early relevance feedback schemes.
However, for applications where the user is co-operative and willing to provide more than one screen of training samples before seeing the results, a new question arises: “After getting feedback for one or more screens of training images, which of the remaining images shall the machine select to ask the user to label in order to achieve the highest information gain?”
The key to understanding this problem is to realize that, from the machine’s point of view, “selecting 40 examples in one batch for user labeling and training” is not as good as “selecting 20 first, training on them and then selecting another 20 based on what has been learned.” In general, the most-informative images will not coincide with the most-positive images, since some of the latter might already be labeled, or tend to be very correlated with images with known labels, thus providing less new information. Intuitively, the most-informative images should be those whose labels the learner is most uncertain about.
- Objective functions and learning machines: some methods propose to learn a new query and the relative importance of different features or feature components, while others learn a linear transformation in the feature space, taking into account correlations among feature components.
- Kohonen’s learning vector quantization (LVQ) algorithm and tree-structured self-organizing map (TS-SOM). Positive and negative examples were mapped to positive and negative impulses on the maps, and a low-pass operation on the maps was argued to implicitly reveal the relative importance of different features, because a “good” map will keep positive examples clustered while negative examples scatter away.
- In Ishikawa et al. and Rui and Huang, based on the minimization of total distances of positive examples from the new query, the optimal solutions turned out to be the weighted average as the new query and a whitening transform (or Mahalanobis distance metric) in the feature space. Additionally, Rui and Huang adopted a two-level weighting scheme to better cope with singularity issue due to the small number of training samples. To take into account negative examples, Schettini et al. updated feature weights along each feature axis by comparing the variance of positive examples to the variance of the union of positive and negative examples.
- MacArthur et al. cast relevance feedback as a two-class learning problem, and used a decision tree algorithm to sequentially “cut” the feature space until all points within a partition are of the same class. The database was classified by the resulting decision tree: images that fall into a relevant leaf were collected and the nearest neighbors of the query were returned.
- Negative examples: Vasconcelos and Lippman assumed that a negative example for class Si shall be a positive example for the complement of class Si, and quantified in terms of likelihoods as follows: P(y'|Si =1)= P(y|Si =0) where y' means that y is treated as a negative example. Special steps are needed to avoid over-emphasizing the importance of negative examples.
Nastar et al. proposed empirical formulae to take into account negative examples while estimating the distribution of positive examples along each feature component. Another ad hoc formulation was proposed by Brunelli and Mich.
- Many relevance feedback algorithms make use of the sample covariance matrix and its inverse. When the number of training examples is smaller than the dimensionality of the feature space, the singularity issue arises. The substitution of the Moore–Penrose inverse or pseudo-inverse matrix for the regular inverse proposed in Ishikawa et al. is not only mathematically unfounded, but also counter-intuitive. Another proposal was to adopt a hierarchical weighting scheme (assuming a block diagonal matrix instead of the full covariance matrix) so that fewer parameters need to be estimated, and in the extreme case, just use a diagonal matrix. This implies a forced independence assumption.
- Feature normalization: Different image features need to be normalized to have comparable statistics, say normal distribution. Not surprisingly, this
normalization can also be extended to a transformation of the feature space. From a discrimination point of view, the optimal normalization shall be the transform that separates all the semantically meaningful classes in the dataset from clusters within each class. Since the class membership is not known a priori, one possible solution is to use the accumulated feedback from all the users as the training set to be fed into a multiple discriminant analysis framework to yield a transform that is optimal for the training data available so far. This implies more computation, but can give better performance than the straightforward normal distribution assumption. A simplified example is that if all users emphasize color feature more than texture in all cases, then there is no reason for maintaining equal variances along these two axes – stretching the color axis can give a better initial retrieval result.
- Incorporating textual annotations: The so-called “semantic gap” between high-level concepts in the user’s mind and low-level features extracted by the machine is so wide in many cases that the use of keywords or annotations (where available) is key to meaningful retrieval. The research efforts in combining low-level features with text include joint use of textual and visual features for querying and relevance feedback learning; learning of visual models
for concept classes annotated by keywords; and the learning of keyword relations from relevance feedback.
Shen, Jiang, Tan, Huang, Zhou: Speed up interactive image retrieval. 2008.
(full text of the article)
The authors try to speed up the feedback loop as much as possible. While others care mainly about result accuracy, they also care about efficiency. Two kinds of speed-up are considered: speeding up the individual iterations and reducing the total number of iterations needed. They keep information about the iterations performed so far, use it to predict the course of the following iterations, and once the predictions match reality closely enough, they use them to predict the final iteration. Individual iterations are sped up by computing the overlap of the new iteration with the earlier ones and not needlessly re-accessing objects that were already accessed.
- Feedback query refinement:
- Query point movement: The query point’s coordinate values are modified iteratively based on the feedback. The intuition of this method is to move the query point towards the good results and away from the bad ones. Ishikawa et al. showed that the “optimal” query point is a weighted average of the good results.
- Similarity metric modification: The weighted similarity metric is updated iteratively in the feedback loop. The intuition of this method is to assign higher weights to the feature components that are more important in deciding the good results. Ishikawa et al. [16] proved that the “optimal” assignment of the weight for the i-th dimension at the (t+1)-th iteration is w_{t+1}[i] = 1 / σ_t[i]², where σ_t[i] is the standard deviation of the coordinate values along the i-th dimension over all the good results at the t-th iteration. Rui and Huang have extended the above technique by considering multiple features.
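The 1/σ² reweighting can be sketched as follows (my own minimal illustration of the formula; a real implementation would also guard against zero variance):

```python
import numpy as np

def reweight(good_results):
    """'Optimal' diagonal reweighting per Ishikawa et al.: w[i] = 1 / sigma[i]^2,
    where sigma[i] is the standard deviation of the good results along
    dimension i. Dimensions on which the good results agree get a high weight."""
    sigma = np.std(good_results, axis=0)
    return 1.0 / sigma ** 2

good = np.array([[1.0, 0.0],
                 [1.2, 4.0],
                 [0.8, 8.0]])  # dimension 0 consistent, dimension 1 scattered
w = reweight(good)
assert w[0] > w[1]  # the consistent dimension dominates the metric
```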
- The key is to maintain information across the iterations and exploit their correlations. The search space of the refined query Q_{t+1} is highly likely to overlap with that of Q_t. Relevance feedback mechanisms assume that the feedback query moves closer to the “optimal” query as more iterations are processed. Moreover, queries executed in the early iterations may suggest their changing trend.
- If we know which candidates in the i-th iteration will be re-accessed in the (i+1)-th iteration, we could store them for an efficient scan to avoid the expensive random accesses. Generally the number of candidates is very small compared with the data size, and the overlap is expected to be smaller. Maintaining such a small set of candidates for an efficient scan can speed up the search. Furthermore, a pre-scan on the overlap produces a potentially smaller pruning radius, which tightens the search space for the refined query.
- Overlap prediction:
- linear regression: problem – it expects the amount of change to be fixed at every step. However, the changes may vary over the iterations. Furthermore, linear regression gives all past queries equal importance when predicting the next query.
- exponential smoothing: assigns unequal weight on different data, the largest weight on the most recent data.
The exponential smoothing prediction is the weighted sum of all the past queries with the weight increasing as the iteration goes on.
One major drawback of exponential smoothing is that there is no intrinsic best value for α. To determine α, generally a set of
values are tested, and the value which best fits the queries is selected. We use the set of [0.05, 0.1, ... , 0.9, 0.95] to
choose α. Exponential smoothing tends to lag behind a trend, which causes less accurate prediction.
Exponential smoothing is appropriate when the underlying feedback queries behaves like a constant, i.e., when the mean
of queries is moving very slowly.
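The smoothing and the grid search for α can be sketched as follows (scalar queries for simplicity; the candidate grid follows the set quoted above, while the squared-error fitting criterion is my assumption):

```python
def exp_smoothing_predict(queries, alpha):
    """Single exponential smoothing over the sequence of past feedback queries
    (scalars here for simplicity): the prediction is a weighted sum of all past
    queries, with the weights increasing towards the most recent one."""
    s = queries[0]
    for q in queries[1:]:
        s = alpha * q + (1 - alpha) * s  # heavier weight on newer queries
    return s

def pick_alpha(queries, candidates):
    """Grid-search alpha as described above: pick the candidate whose one-step
    predictions best fit the observed queries (squared prediction error)."""
    def error(a):
        return sum((exp_smoothing_predict(queries[:t], a) - queries[t]) ** 2
                   for t in range(1, len(queries)))
    return min(candidates, key=error)

history = [10.0, 8.0, 7.0, 6.5, 6.2]  # queries drifting towards an optimum
a = pick_alpha(history, [x / 100 for x in range(5, 100, 5)])  # 0.05 .. 0.95
pred = exp_smoothing_predict(history, a)
assert 6.0 <= pred <= 10.0
```

The lag behind a trend mentioned above is visible here: on a steadily moving series the smoothed value always trails the latest observation.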
- linear exponential smoothing: though queries tend to get closer to the “optimal” query, the trend does not necessarily remain constant, i.e., the trend may vary slowly over time. This becomes more convincing when more data are included. To capture the time-varying/local trends of feedback queries, one method is to use linear (i.e., double) exponential smoothing (LES).
- adaptive linear exponential smoothing: monitors the prediction error and judges its changing trend. From the changing trends, ALES identifies and smooths the noisy queries. A query is identified as noise if its prediction error exhibits a sharp up or down along the changing trend. Instead of removing the noisy queries (there are only a few, so nothing is removed), they are smoothed by automatically replacing them with their predictions.
- The KNN search in relevance feedback: Before the first step, a sequential scan is performed on the predicted overlap returned from
OptRFS. Interestingly, after sequentially scanning the predicted overlap, an initial set of results can also be computed.
And the tighter pruning radius is then passed to the first step. In the second step, random accesses are performed on the
candidates excluding those in the predicted overlap.
- Advantages: First and most interestingly, repeated random accesses to the same candidates in two consecutive iterations are avoided. Instead, an efficient scan on the overlap leads to a much faster response. Generally, the overlap size is much smaller than the candidate size. Manipulating the overlap instead of the whole candidate set of the last iteration saves both storage and scan overhead. Second, an initial set of results is computed when the sequential scan is performed on the predicted overlap. The initial set may contain some real results, which gives phase II a chance to stop earlier. Consider the best case: when the initial set contains all the real results and the lower bound of the first candidate in phase II is already greater than the K-th actual largest distance, phase II stops immediately without consuming a single random access. Third, a tighter rt is computed, which results in a smaller number of candidates. Fourth, OptRFS is able to predict the “optimal” query, so that the feedback loop can be terminated earlier. Note that OptRFS is more effective when there is indeed considerable overlap between two iterations. While skipping many feedback iterations improves the performance of feedback search, it
Huiskes, Lew: Performance Evaluation of Relevance Feedback Methods. CIVR’08.
(full text of the article)
A discussion of how to compare RF methods objectively: which mistakes are commonly made, and what is advisable instead.
Wu, Faloutsos, Sycara, Payne: FALCON: Feedback Adaptive Loop for Content-Based Retrieval. VLDB 2000.

(full text of the article)
Relevance feedback for metric spaces. Several methods of implementing RF are mentioned, but always only for vector spaces. The authors propose an approach for general metric spaces in which a query can be given by several examples. A suitable function is sought that returns, for each object, its distance from the set of examples – the "Aggregate Dissimilarity Function":

A parameter α, experimentally tuned for the given dataset, is used; the function sums the weighted α-th powers of the individual distances, takes their average, and then the α-th root.
The implementation uses several range queries for the individual selected objects, with the basic distance. It is shown that no object is missed. The problem is choosing the range correctly. The authors mention the possibility of letting the user give examples of the worst objects he would still tolerate, and deriving the range from those.
In experiments the authors show that their approach returns results with high precision and recall. They stress that it also works for results where we want disjoint or concave sets of objects, which other approaches cannot handle (disjoint: several different sets of objects that should all be in the result; e.g. photographs of US presidents).
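The aggregate dissimilarity function described above can be sketched as follows (an unweighted simplification, my own sketch rather than FALCON's exact definition; the sign convention for α and the toy metric are illustrative):

```python
def aggregate_dissimilarity(x, good_objects, dist, alpha):
    """Sketch of an aggregate dissimilarity: the alpha-th root of the average
    of the alpha-th powers of the distances from x to each good object.
    A negative alpha behaves like a fuzzy OR (being close to any single
    example is enough); a positive alpha behaves like a fuzzy AND."""
    k = len(good_objects)
    s = sum(dist(x, g) ** alpha for g in good_objects) / k
    return s ** (1.0 / alpha)

# 1-D toy example with two query centers and |.| as the metric;
# with a negative alpha, points near EITHER center score as close:
dist = lambda a, b: abs(a - b)
good = [0.0, 10.0]
near_center = aggregate_dissimilarity(0.1, good, dist, alpha=-5)
midpoint = aggregate_dissimilarity(5.0, good, dist, alpha=-5)
assert near_center < midpoint
```

This is exactly the behaviour the disjoint-result example relies on: objects near any one of the query centers get a small aggregate dissimilarity.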
Razente, Barioni, Traina, Traina: Aggregate Similarity Queries in Relevance Feedback Methods for Content-based Image Retrieval. SAC’08.

(full text of the article)
In Related Work, the common approaches to relevance feedback are well described; in particular, the Rocchio algorithm is explained in accessible terms. They distinguish the single point movement approach and multiple point movement.
The authors propose a new query type, the Aggregate Similarity Query, which can have more than one query object. The second parameter is either a range or k. Both queries are formally defined using predicates and similarity selection (they also use relational algebra notation). The similarity aggregation function is taken from Wu, Faloutsos, Sycara, Payne: FALCON: Feedback Adaptive Loop for Content-Based Retrieval.
The experiments consider both positive and negative feedback. Implementation is not addressed at all.
- To deal with the semantic gap,
relevance feedback techniques have been developed. In these
techniques, positive and/or negative examples are informed
by the user to allow the system to derive a more precise representation of the user intent. The new representation
of the user intent can be achieved through query point movement or multiple point movement techniques. Furthermore,
implicit feedback techniques have also been developed for
web search engines. In these techniques, the system learns
from search results provided by the user and takes advantage
of this information to adapt ranking functions. One way to
tell to the system what is the user’s intention is specifying,
in the same query, other elements besides the query center,
which are positive or negative examples of the intended answer.
This representation is based on multiple query centers.
- The query point movement techniques consider that a
query is represented by a single query center. Therefore
at each user interaction cycle, the strategy estimates an
ideal query center in the query space, moving the center
towards the relevant examples and away from the irrelevant
ones, as in Figure 2. When dealing with multidimensional
data, these techniques usually perform a refinement of the
dissimilarity function using the information obtained from
the user to decide which dimensions should be emphasized
and which should have their influence diminished. The refinement is done by assigning weights to all dimensions of the
feature vector. ... It is important to note that
the query point movement creates a new feature vector that
may not correspond to an image from the dataset.
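The single-point (query point movement) update can be sketched in the Rocchio style mentioned above (the weights alpha, beta, gamma are illustrative defaults, not values from the paper):

```python
def rocchio_update(query, relevant, irrelevant, alpha=1.0, beta=0.75, gamma=0.15):
    """Single-point query movement: shift the query vector towards the
    mean of the relevant examples and away from the mean of the
    irrelevant ones. The resulting vector need not correspond to any
    image in the dataset."""
    dim = len(query)

    def mean(vectors):
        if not vectors:
            return [0.0] * dim
        return [sum(v[i] for v in vectors) / len(vectors) for i in range(dim)]

    rel, irr = mean(relevant), mean(irrelevant)
    return [alpha * query[i] + beta * rel[i] - gamma * irr[i] for i in range(dim)]

# Move a query towards one relevant example; no irrelevant feedback given.
print(rocchio_update([1.0, 0.0], [[3.0, 2.0]], []))  # [3.25, 1.5]
```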
- On the other hand, multiple point movement techniques
utilize multiple query centers to represent a query, as shown
in Figure 3. The strategy of these techniques consists in assigning the elements informed as relevant in clusters, where each cluster is represented by the closest element to the cluster center. The new centers are called representatives, and
are used to execute a multiple center query that calculates
a single contour to cover all the representative elements. In
this type of query, it is common to use the number of elements associated to each center as its weight. ...
There are also strategies based on multiple point movement, where the elements
labeled as relevant form disjunctive clusters, thus allowing
the execution of disjunctive queries.
- We propose the generalization of the two most common similarity queries in metric spaces – the range
and the k-nearest neighbor query – defining the Aggregate Similarity Query, which retrieves elements based on their
composite similarity regarding multiple centers. An aggregate similarity query can be limited either by an aggregate
threshold t – resulting in the Aggregate Range Query – or by a quantity k of elements – resulting in the Aggregate
k-Nearest Neighbor Query
Hua, Yu, Liu: Query Decomposition: A Multiple Neighborhood Approach to Relevance Feedback Processing in Content-based Image Retrieval. ICDE’06

(full text of the article)
The Query Point Movement and Multipoint Query strategies for RF are described quite clearly. Fagin's algorithm for combined queries is mentioned as one way to improve the k-NN query.
Their improvement consists in building an R*-tree in which objects are grouped into clusters by similarity. The tree is built bottom-up, and an efficient k-NN with multiple query examples is then run over it. I think, however, that such clustering can only work for vector spaces.
The initial query is formed by showing the user randomly chosen objects; the user selects the relevant ones, the corresponding clusters are found accordingly, and the search proceeds within them.
Vu, Cheng, Hua: Image Retrieval in Multipoint Queries. 2008.

(full text of the article)
Describes other approaches to multipoint queries quite well, but the authors work exclusively with vector spaces. They propose a system of equations describing the properties that objects in the result must satisfy (based on the query objects and the values of the individual components of their vectors).
Yiu, Mamoulis: Multi-dimensional top-k dominating queries. VLDB Journal 2009.
(full text of the article)
Combining two existing approaches, kNN and the skyline query, yields the top-k dominating query. It is a kNN query, but the ordering is not given by a distance, which the user would have to specify somehow and which depends on the values occurring in the dataset. Instead, each object is scored by how many other objects it "dominates", i.e. it is better in at least one attribute and at least as good in all the others.
- The top-k dominating query returns k data objects which dominate the highest number of objects in a dataset.
It combines the advantages of top-k and skyline queries without sharing their disadvantages: (i) the
output size can be controlled, (ii) no ranking functions need to be specified by users, and (iii) the result is independent of
the scales at different dimensions.
- To summarize, top-k queries do not provide an objective order of importance for the points, because their results are
sensitive to the preference function used. Skyline queries, on the other hand, only provide a subset of important points,
which may have arbitrary size. To facilitate analysts, who may be interested in a natural order of importance,
according to dominance, we propose the following intuitive score function:
τ(p) = |{ p' ∈ D | p ≻ p' }| (1)
In words, the score τ(p) is the number of points dominated by point p. The following monotone property holds for τ:
∀ p, p' ∈ D, p ≻ p': τ(p) > τ(p') (2)
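As a naive baseline, the dominance score and the top-k dominating query can be sketched as follows (an O(n²) illustration using the smaller-is-better convention; the paper's contribution is computing this efficiently with indexes):

```python
def dominates(p, q):
    """p dominates q: no worse in every dimension and strictly better in
    at least one (here, smaller values are better)."""
    return all(a <= b for a, b in zip(p, q)) and any(a < b for a, b in zip(p, q))

def top_k_dominating(points, k):
    """Brute-force top-k dominating query: score tau(p) = number of
    points dominated by p, return the k highest-scoring points."""
    scored = [(sum(dominates(p, q) for q in points if q is not p), p)
              for p in points]
    scored.sort(key=lambda t: -t[0])
    return [p for _, p in scored[:k]]

pts = [(1, 1), (2, 2), (3, 1), (2, 3)]
print(top_k_dominating(pts, 1))  # [(1, 1)] - it dominates the three others
```

Note that the result depends only on the dominance relation, so it is independent of the scales of the individual dimensions, as the paper emphasizes.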
Olivares, Ciaramita, Zwol. Boosting Image Retrieval through Aggregating
Search Results based on Visual Annotations. ACM Multimedia 2008.

(full text of the article)
The introduction discusses Flickr textual descriptions quite extensively and cites further research. For similarity search they use a "bag-of-words" approach: the vectors of the individual descriptors are converted into words, over which the search then proceeds as if it were textual. The main idea: Flickr users can attach tags to specific parts of an image, which the authors call visual annotations; for a given word/tag one can thus obtain several (probably quite relevant) examples, which are used for content-based search. It is really content-based search rather than a combination of text and visual, since text is used only to obtain the visual descriptors.
Ji, Yao, Liu, Wang, Xu. A Novel Retrieval Refinement and Interaction Pattern by
Exploring Result Correlations for Image Retrieval. 2008.

(full text of the article)
Post-processing of results: grouping by categories. Uses pre-categorized images, so it is applicable to databases where a fixed categorization is possible, e.g. medical images. The categories of the images in the kNN result are determined accordingly (so-called Pairwise-Coupling); the user receives the top-k images and the top-m categories.
Pairwise-Coupling: for each category there are n classifiers voting (0,1) on whether an object belongs to the category (presumably based on whether the distance is low enough; this is not specified precisely). An object is assigned to the category where it received the most votes.
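The voting rule, as far as it is specified, can be sketched as follows (since how each individual vote is produced is left open in the paper, this starts from the 0/1 votes themselves; the category names are made up):

```python
def pairwise_coupling_category(votes):
    """Assign an object to the category with the most positive votes.
    `votes` maps category name -> list of 0/1 votes from that category's
    binary classifiers."""
    return max(votes, key=lambda c: sum(votes[c]))

print(pairwise_coupling_category({"lung": [1, 1, 0], "bone": [0, 1, 0]}))  # lung
```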
Chen, Wang, Krovetz. CLUE: Cluster-Based Retrieval of Images by Unsupervised Learning.
IEEE Transactions on Image Processing, vol. 14, no. 8, August 2005.

(full text of the article)
Post-processing of results: grouping similar images into clusters. It works by taking a set of candidates (either the objects within a certain radius of the query, or the top-k objects for the query plus the top-r for each of those; in a metric space the latter is unnecessary, but they also handle non-metrics). This set is represented as a graph: objects are nodes, and the edges between nodes are weighted according to distance (not the distance itself but a function of it; the specific function, given in the paper, is not reproduced here).
A graph algorithm (the normalized cut method, a variation on the minimum cut) is then used to partition the set into clusters. The algorithm splits the graph recursively into two parts until either the maximum allowed number of clusters is reached or some threshold on the cut size is exceeded.
- The query image and neighboring target images, which are selected according to a similarity measure,
are clustered by an unsupervised learning method and returned to the user. In this way, relations among retrieved images are
taken into consideration through clustering and may provide for the users semantic relevant clues as to where to navigate.
- Uses a graph-theoretic algorithm to generate clusters. In particular, a set of images is represented as a weighted
undirected graph: nodes correspond to images; an edge connects two nodes; and the weight on an edge is related
to the similarity between the two nodes (or images). Graph-based representation and clustering sidestep the
restriction of a metric space.
- Graph partitioning method attempts to organize nodes into groups so that the within-group similarity is
high, and/or the between-groups similarity is low. Representative image of a cluster is the image that is most similar to
all images in the cluster.
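One bipartition step of such a clustering can be sketched with the standard spectral relaxation of the normalized cut (the sign of the second smallest generalized eigenvector of (D − W)x = λDx); this assumes numpy and a connected graph, and is only an illustration of the principle, not the paper's full recursive algorithm:

```python
import numpy as np

def normalized_cut_bipartition(W):
    """Split a weighted similarity graph into two groups. W is a
    symmetric nonnegative affinity matrix with positive node degrees.
    Returns a boolean group-membership array."""
    d = W.sum(axis=1)
    L = np.diag(d) - W                       # unnormalized graph Laplacian
    D_inv_sqrt = np.diag(1.0 / np.sqrt(d))
    # Generalized eigenproblem via the symmetric normalized Laplacian.
    vals, vecs = np.linalg.eigh(D_inv_sqrt @ L @ D_inv_sqrt)
    fiedler = D_inv_sqrt @ vecs[:, 1]        # second smallest eigenvalue
    return fiedler >= 0

# Two obvious clusters: {0, 1} strongly connected, {2, 3} strongly connected.
W = np.array([[0, 9, 1, 0],
              [9, 0, 0, 1],
              [1, 0, 0, 9],
              [0, 1, 9, 0]], dtype=float)
groups = normalized_cut_bipartition(W)
print(groups[0] == groups[1], groups[2] == groups[3], groups[0] != groups[2])
```

Applied recursively to each half, with a stopping rule on the number of clusters or the cut value, this yields the clustering described above.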
Jia, Wang, Zhang, Hua. Finding Image Exemplars Using Fast Sparse Affinity Propagation.
Proceeding of the 16th ACM international conference on Multimedia, 2008.

(full text of the article)
Post-processing of results: the aim is to find representative images and group the remaining ones into clusters characterized by those representatives. They use Sparse Affinity Propagation: again a graph with weighted edges and maximization of within-cluster similarity. Interestingly, for speed they do not work with all edges but omit some (hence Sparse). An edge between x and y is added to the graph only if y belongs to the K nearest neighbors of x. (K is chosen much smaller than the number of nodes.)
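The graph-sparsification step just described can be sketched as follows (a sketch of the edge selection only, not of affinity propagation itself; the full distance matrix is assumed to be given):

```python
def knn_sparse_edges(dist, K):
    """Keep a directed edge (x, y) only if y is among the K nearest
    neighbours of x. `dist` is a full symmetric distance matrix
    (list of lists); returns the sparse edge set."""
    n = len(dist)
    edges = set()
    for x in range(n):
        neighbours = sorted((y for y in range(n) if y != x),
                            key=lambda y: dist[x][y])[:K]
        edges.update((x, y) for y in neighbours)
    return edges

edges = knn_sparse_edges([[0, 1, 5], [1, 0, 2], [5, 2, 0]], K=1)
print(sorted(edges))  # [(0, 1), (1, 0), (2, 1)]
```

Note that the resulting edge set is not symmetric: y can be among the K nearest neighbours of x without the converse holding.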
Li, Dai, Xu, Er. Multilabel Neighborhood Propagation for Region-Based Image Retrieval.
IEEE Transactions on Image Processing, 2007.

(full text of the article)
The paper models a set of images with a graph. Edge weights are based on local similarities, using the Earth mover's distance and linear programming. Each image may carry several labels; the paper deals with propagating this information through the graph.
Arampatzis, Zagoris, Chatzichristofos. Dynamic Two-Stage Image Retrieval from Large Multimodal Databases. ECIR 2011

(full text of the article)
An overview of various ranking approaches; the vast majority use a fixed initial result size. The authors propose a different approach: dynamically changing the number of objects according to certain criteria, particularly the probability that an object will be relevant. How exactly this is determined is not very clear; there are references to other papers. Experiments on text search and visual ranking showed that dynamic thresholding of the initial result helps; the authors report the optimal parameter settings.
Ambai, Yoshida. Multiclass VisualRank: Image Ranking Method in Clustered Subsets Based on Visual Features. SIGIR 09

(full text of the article)
An extension of VisualRank, which often returns overly similar results. Multiclass VR divides the query results into several categories according to patterns in the visual descriptors; the ranking is then computed within each category. It is designed as a post-filter for existing image search engines. Again, SIFT features and PageRank are used; the added step is the division into clusters. The similarity function is taken from VR but adjusted so that excessively high values are damped (otherwise they would spoil the clustering). Clusters are determined from the similarity graph using normalized cuts; when computing VR, only the edges inside clusters are considered.
Jing, Baluja. VisualRank: Applying PageRank to Large-Scale Image Search. IEEE TRANSACTIONS ON PATTERN ANALYSIS AND MACHINE INTELLIGENCE, November 2008

(full text of the article)
A detailed justification of why Google proceeds by running text search first and then similarity ranking on its results: text search is widely used and works well, sometimes there is no other way to formulate the query, and it yields a diverse set of candidate results that can serve as input to a purely CBIR algorithm. A good comparison with other approaches. An explanation of the eigenvector ("characteristic vector") principle used to compute VisualRank. Similarity is determined using SIFT features; the authors generally favor local descriptors.
The authors further show how VisualRank improves results for a) "queries with homogeneous visual concepts" and b) "queries with heterogeneous visual concepts". They also briefly mention the problem of possible manipulation of results (duplicating images across pages).
An efficient implementation for web-scale search: text search, then VisualRank over the result (the first 1000 images). Such an approach is query dependent: the same image can receive a different score in a different result set. Approaches to fast matching of local descriptors are mentioned (with references to the literature); they use LSH (an approximation suffices for them), with the basic formula for vectors (a.V + b)/W; for greater precision the results of several hash functions are concatenated and several tables are used. Two descriptors are considered identical if they share the same hash in at least C tables (they use 40 tables, C = 3). The linked descriptors are then sorted by the images they belong to, some further adjustments are applied, and the final image similarity is obtained. Computing the similarities is expensive, so it is not done online; results for frequent queries are precomputed! VisualRank is also used only when the similarity graph is sufficiently dense.
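The descriptor-matching step can be sketched as follows (a simplified illustration of the (a·V + b)/W hash family and the at-least-C-tables rule; for brevity each table here uses a single projection, whereas the paper concatenates several hash functions per table, and all parameter values below are illustrative):

```python
import random

def make_hash(dim, w, seed):
    """One LSH bucket function of the (a.V + b)/W family: project onto a
    random Gaussian direction, shift by b, quantize into buckets of
    width w."""
    rng = random.Random(seed)
    a = [rng.gauss(0, 1) for _ in range(dim)]
    b = rng.uniform(0, w)
    def h(v):
        return int((sum(ai * vi for ai, vi in zip(a, v)) + b) // w)
    return h

def lsh_match(u, v, tables, C):
    """Treat two descriptors as identical if they fall into the same
    bucket in at least C of the hash tables (the paper uses 40 tables,
    C = 3)."""
    return sum(h(u) == h(v) for h in tables) >= C

dim = 4
tables = [make_hash(dim, w=4.0, seed=s) for s in range(40)]
d = [0.5, 1.0, -0.3, 2.0]
print(lsh_match(d, d, tables, C=3))  # identical descriptors always match: True
```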
Experiments: a) 10 VisualRank results + 10 Google search results, with duplicates removed; users were asked to mark the bad ones. No ranking was shown, so the user did not take it into account. The number of irrelevant images among the top-10, top-5 and top-3 returned by each engine was then evaluated.
b) "click measurement": it is assumed that users click on images that interest them. This can only be evaluated because Google has click statistics.
Jing, Baluja. PageRank for Product Image Search. WWW 2008, Beijing, China.

(full text of the article)
Starts from Google Image search, which returns the initial result set. Image Rank is then computed over it: virtual "links" are created between images based on their similarity, which is determined using SIFT features. Image Rank is computed like PageRank (via the eigenvector), i.e. an image's score grows when other images, preferably also well-scored ones, link to it (i.e. are similar to it). Clustering the results is not considered in this paper, but it is mentioned as needed future work. I have found nothing newer on Image Rank.
For testing they use an interesting method: marking the results that are not relevant.
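The ranking step can be sketched with plain power iteration over a similarity matrix (a toy illustration of the PageRank-style principle, not the paper's web-scale implementation; the damping factor and iteration count are illustrative, and every image is assumed to have nonzero similarity to at least one other image):

```python
def visual_rank(S, damping=0.85, iters=100):
    """PageRank-style ranking where the "links" are visual similarities
    instead of hyperlinks. S is a symmetric nonnegative similarity
    matrix given as a list of lists; each column is normalized so that
    rank mass is split among an image's visual neighbours."""
    n = len(S)
    col = [sum(S[i][j] for i in range(n)) for j in range(n)]
    r = [1.0 / n] * n
    for _ in range(iters):
        r = [damping * sum(S[i][j] * r[j] / col[j] for j in range(n))
             + (1 - damping) / n
             for i in range(n)]
    return r

# Image 0 is similar to both others, so it collects the most rank mass.
S = [[0, 1, 1],
     [1, 0, 0],
     [1, 0, 0]]
ranks = visual_rank(S)
print(ranks[0] > ranks[1])  # the hub image ranks highest: True
```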
- By treating images as web documents and their similarities as probabilistic visual
hyperlinks, we estimate the likelihood of images visited by a user traversing through these visual-hyperlinks. Those with
more estimated “visits” will be ranked higher than others.
- Unlike the web, where related documents are connected by manually defined hyperlinks, we compute visual-hyperlinks
explicitly as a function of the visual similarities among images. Since the graph structure will uniquely determine the
ranking of the images, our approach offers a layer of abstraction from the set of features used to compute the similarity
of the image. Similarity can be customized for the types and distributions of images expected.
- It is computationally infeasible to generate the similarity graph S for the billions of images that are indexed by
commercial search engines. One method to reduce the computational cost is to precluster web images based on metadata
such as text, anchor text, similarity or connectivity of the web pages on which they were found, etc. For example,
images associated with “Paris”, “Eiffel Tower”, “Arc de Triomphe” are more likely to share similar visual features than
random images. To make the similarity computations more tractable, a different rank can be computed for each group
of such images.
A practical method to obtain the initial set of candidates is to rely on the existing commercial search engine for the
initial grouping of semantically similar images. For example, given the query “Eiffel Tower” we can extract the top-N
results returned, create the graph of visual similarity on the N images, and compute the image rank only on this subset.
In this instantiation, the approach is query dependent.
- Minimizing Irrelevant Images:
This study is designed to study a conservative version of “relevancy” of our ranking results. We asked the user: “Which
of the image(s) are the least relevant to the query?” For this experiment, more than 150 volunteer participants were
chosen, and were asked this question on a set of randomly chosen 50 queries selected from the top-query set. There was
no requirement on the number of images that they marked. There are several interesting points to note about this
study. First, it does not ask the user to simply mark relevant images; the reason for this is that we wanted to avoid
a heavy bias to a user’s own personal expectation (i.e. when querying “Apple” did they want the fruit or the computer?).
Second, we did not ask the users to compare two sets; since, as mentioned earlier, this is an arduous task. Instead, the
user was asked to examine each image individually. Third, the user was given no indication of ranking; thereby
alleviating the burden of analyzing image ordering.
Park, Baek, Lee: A Ranking Algorithm Using Dynamic Clustering for Content-Based Image Retrieval, 2002.

(full text of the article)
The search results are grouped into clusters, the similarity between each cluster and the query object is analyzed, and the ordering of the results is revised accordingly. A vector model is assumed. The paper further examines how the values of several clustering parameters affect the quality of the result.
For evaluating the experiments they use ANMRR.
- In conventional CBIR systems, it is often observed that visually dissimilar images
to the query image are located at high ranking. To remedy this problem, we utilize similarity
relationship of retrieved results via dynamic clustering. In the first step of our method, images
are retrieved using visual feature such as color histogram, etc. Next, the retrieved images are
analyzed using a HACM (Hierarchical Agglomerative Clustering Method) and the ranking of results
is adjusted according to distance from a cluster representative to a query.
- We use Ward’s method for constructing dendrogram based on stored similarity matrix
and Lance-Williams update formula. In Ward’s method, images join the cluster that minimizes
the increase in the total within-group error sum of squares.
It relatively makes an un-biased clustering tree.
After dynamic clustering using retrieved results, we modify the distance value
of the results according to the equation:
D'(I,I') = αD(I,I') + βDc(I,I')
where Dc(I,I') is distance from the query image to the cluster I'.
According to clustering hypothesis, more similar images are divided into same
cluster and irrelevant images are divided into different cluster. Distance value is
adjusted, as images in same cluster with query image have small distance value,
otherwise images have large distance value.
- We use the centroid of a cluster as the cluster representative for calculating the distance
from a cluster to a query.
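The re-ranking rule D' = αD + βDc can be sketched as follows (the weights and the helper structures `cluster_of` and `centroid_dist` are illustrative, not from the paper):

```python
def rerank(results, cluster_of, centroid_dist, alpha=0.7, beta=0.3):
    """Re-rank retrieved images with D'(I) = alpha*D(I) + beta*Dc(I),
    where D(I) is the original distance of image I from the query and
    Dc(I) is the distance from the query to the centroid of I's
    cluster. `results` is a list of (image, distance) pairs."""
    def adjusted(item):
        image, d = item
        return alpha * d + beta * centroid_dist[cluster_of[image]]
    return sorted(results, key=adjusted)

# Image "b" is slightly farther than "a", but lies in the cluster whose
# centroid is much closer to the query, so it moves ahead after adjustment.
results = [("a", 1.0), ("b", 1.2)]
cluster_of = {"a": 0, "b": 1}
centroid_dist = {0: 2.0, 1: 0.1}
print(rerank(results, cluster_of, centroid_dist))  # [('b', 1.2), ('a', 1.0)]
```

This directly implements the clustering hypothesis stated above: images that share a cluster with the query-like images get their distance values reduced, the rest get them increased.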
Park, Baek, Lee: Majority Based Ranking Approach in Web Image Retrieval, CIVR 2003.

(full text of the article)
Improving the results obtained by text-based image search. The argument is that web image search differs from ordinary CBIR because web images come with textual annotations. The result is therefore obtained by text search and then re-ranked by visual similarity. Four ranking methods are proposed, all based on clustering the first N text-search results (the clustering is, of course, based on visual similarity):
1) majority-first method: the clusters are ordered by size (largest first); within a cluster, objects are ordered by distance from the centroid (mean)
2) centroid-of-all method: the centroid of all objects is computed and objects are ordered by their distance to it
3) centroid-of-top-k method: the centroid is computed only from the first k elements and is then used as the query object for ranking
4) centroid-of-largest-cluster method: in this case, unlike 2) and 3), the original ordering is ignored.
In experiments with the Google and Naver web search engines, the authors find that methods 1) and 4) work best.
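Method 1) can be sketched as follows (the input structure, a map from cluster id to (image, distance-to-centroid) pairs, is hypothetical; the clustering itself is assumed to have been done already):

```python
def majority_first(clusters):
    """Majority-first re-ranking: order clusters by size, largest first,
    and order the images inside each cluster by their distance to the
    cluster centroid."""
    ranked = []
    for members in sorted(clusters.values(), key=len, reverse=True):
        ranked.extend(img for img, _ in sorted(members, key=lambda m: m[1]))
    return ranked

clusters = {0: [("a", 0.2), ("b", 0.1)],
            1: [("c", 0.3), ("d", 0.1), ("e", 0.2)]}
print(majority_first(clusters))  # ['d', 'e', 'c', 'b', 'a']
```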
Brisaboa, Pedreira, Seco, Solar, Uribe: Clustering-based similarity search in metric spaces with sparse spatial centers. SOFSEM 2008.

(full text of the article)
A paper on selecting cluster centers when indexing metric spaces. Contains a clear summary of metric spaces and of approaches to indexing.
Deselaers, Keysers, Ney: Features for image retrieval: an experimental comparison. Information Retrieval, 2008.

full text of the article,
bibtex
@article {springerlink:10.1007/s10791-007-9039-3,
author = {Deselaers, Thomas and Keysers, Daniel and Ney, Hermann},
affiliation = {RWTH Aachen University Human Language Technology and Pattern Recognition, Computer Science Department Aachen Germany},
title = {Features for image retrieval: an experimental comparison},
journal = {Information Retrieval},
publisher = {Springer Netherlands},
issn = {1386-4564},
keyword = {Computer Science},
pages = {77-107},
volume = {11},
issue = {2},
url = {http://dx.doi.org/10.1007/s10791-007-9039-3},
note = {10.1007/s10791-007-9039-3},
year = {2008}
}
Kampel, Zaharieva: Recognizing Ancient Coins Based on Local Features. ISVC 2008.

(full text of the article)
Use of local descriptors for identifying coins. An overview of the basic local descriptors with references to the literature, and a comparison of how the different descriptors perform on coins (SIFT wins; speed is the problem).
Wu, Ke, Isard, Sun: Bundling Features for Large Scale Partial-Duplicate Web Image Search. 2009.

(full text of the article)
Search using local descriptors; several SIFT features are grouped into so-called "bundles", which are more discriminative and preserve some of the geometric relations among the SIFTs. The paper describes how bundles are compared, using both the number of matching SIFTs and their positions (roughly, based on their ordering).
Amato, Mainetto, Savino: A Query Language for Similarity-based Retrieval of Multimedia Data. Advances in Databases and Information Systems 1997.

full text of the article,
bibtex
@inproceedings{Amato97,
author = {Giuseppe Amato and
Gianni Mainetto and
Pasquale Savino},
title = {A Query Language for Similarity-Based Retrieval of Multimedia
Data},
booktitle = {ADBIS},
year = {1997},
pages = {196-203},
ee = {db/conf/adbis/AmatoMS97.html},
crossref = {DBLP:conf/adbis/97},
bibsource = {DBLP, http://dblp.uni-trier.de}
}
@proceedings{DBLP:conf/adbis/97,
title = {Proceedings of the First East-European Symposium on Advances
in Databases and Information Systems (ADBIS'97), St.-Petersburg,
September 2-5, 1997. Volume 1: Regular Papers},
booktitle = {ADBIS},
publisher = {Nevsky Dialect},
year = {1997},
bibsource = {DBLP, http://dblp.uni-trier.de}
}
They consider both the raw data content and the concept level - objects in the image that are recognized somehow (a realistic example: the concept person, present or absent in the image, as decided by some classifier). They propose a fairly formal, general query structure based on select-from-where. There is a sim operator for similarity, but the similarity function cannot be specified in the query. There are constants for patterns (red, circle).
Adali, Bonatti, Sapino, Subrahmanian: A multi-similarity algebra. SIGMOD 98.

full text of the article,
bibtex
@article{Subrahmanian98,
author = {Adali, S. and Bonatti, P. and Sapino, M. L. and Subrahmanian, V. S.},
title = {A multi-similarity algebra},
journal = {SIGMOD Rec.},
volume = {27},
number = {2},
year = {1998},
issn = {0163-5808},
pages = {402--413},
doi = {http://doi.acm.org/10.1145/276305.276340},
publisher = {ACM},
address = {New York, NY, USA},
}
Universal multimedia database - various index structures and algorithms. The authors propose a universal abstraction that makes it possible to model any search method (or any combination of search methods). Multi-similarity algebra MSA, relational multi-similarity algebra rMSA. The first defines operations over similarity measures; rMSA encapsulates this into relational algebra (new operators RSO = Relational Similarity Operator, and Best). Very formal definitions. Furthermore, equivalences in MSA and rMSA are studied to allow query reformulation and optimization. A cost model is included.
Town, Sinclair: Ontological Query Language for Content Based Image Retrieval. ANSS’03.

(full text of the article)
Their goal is an easy-to-use search query formulated in natural language. Searching in textual descriptions plus a bit of CBIR (colors, shapes in given regions). A formally described context-free grammar specifies what the result should look like (words, image content, conditions). For each image, the probability that it is relevant is computed from the query.
some sky which is close to trees in upper corner, size at least 20%
[indoors] or [outdoors] & [people]
[some green or vividly coloured vegetation in the centre] which is of similar size as [clouds or blue sky at the top]
Döller, Kosch, Wolf, Gruhne: Towards an MPEG-7 Query Language. SITIS 2006

(full text of the article)
A clear description of the requirements on a QL - requirements on the system and on the language.
- Requirements on the system (framework)
- the ability to combine queries over multiple databases
- obtaining information about a database (which descriptors, ...)
- Requirements on the language
- query type (Query by description, Query by example, Multimodal query, Spatial-temporal query)
- specification of the result (structure, content)
- querying using user preferences and history
- specification of the size and representation of the result
A good, easily understandable description of the basic XML query languages and of why they are not suitable.
Their language proposal and implementation: the implementation must support sessions that keep information about the user and the results of past queries, enabling relevance feedback and searching within previous results. A description of several parts of the system and of how they should communicate, something like use cases. The language itself is not described here.
Döller, Tous, Gruhne, Yoon, Sano, Burnett: The MPEG Query Format: Unifying Access to Multimedia Retrieval Systems. IEEE MultiMedia, 2008.

full text of the article,
bibtex
@article{DollerMPQF08,
author = {Mario D{\"o}ller and
Rub{\'e}n Tous and
Matthias Gruhne and
Kyoungro Yoon and
Masanori Sano and
Ian S. Burnett},
title = {The MPEG Query Format: Unifying Access to Multimedia Retrieval
Systems},
journal = {IEEE MultiMedia},
volume = {15},
number = {4},
year = {2008},
pages = {82-95},
ee = {http://doi.ieeecomputersociety.org/10.1109/MMUL.2008.96},
bibsource = {DBLP, http://dblp.uni-trier.de}
}
A well-written summary article on the MPEG-7 Query Format.
Barioni, Razente, Traina, Traina: Seamlessly integrating similarity queries in SQL. Software - Practice and Experience, 2009

full text of the article,
bibtex
@article{DBLP:journals/spe/BarioniRTJ09,
author = {Maria Camila Nardini Barioni and
Humberto Luiz Razente and
Agma Juci Machado Traina and
Caetano Traina Jr.},
title = {Seamlessly integrating similarity queries in SQL},
journal = {Software - Practice and Experience},
volume = {39},
number = {4},
year = {2009},
pages = {355-384},
ee = {http://dx.doi.org/10.1002/spe.898},
bibsource = {DBLP, http://dblp.uni-trier.de}
}
Discussion of the need for integrating similarity queries into SQL and of why user-defined functions are not sufficient (optimization issues). Fundamental concepts: two types of objects - "particulate" (GPS coordinates, a color descriptor, etc.; can be compared by similarity using a distance function defined over their constituent attributes) and "monolithic" (e.g. a binary image; requires extracting predefined features that are used in place of each object to define the distance function). They consider unary predicates (range query, kNN query, multi-object queries) and binary predicates (range join, k-nearest-neighbors join, k-closest-neighbors join). Their similarity retrieval engine is called SIREN. It contains a command interpreter that identifies similarity-based constructs and forwards them to special processing; the other parts of the query are evaluated by a standard RDBMS. Extensions of the SELECT command: NEAR, STOP AFTER [k], RANGE [r]. Their syntax is tailored to the query types they consider. Further on, they describe how a new metric is defined and attached to an attribute. User-defined search parameters (weights, approximation) are not supported; late fusion is not supported.
SELECT * FROM Employee
WHERE HomeCoordinate NEAR (
SELECT HomeLat AS Latitude, HomeLongit AS Longitude
FROM Employee WHERE name='John Doe') STOP AFTER 5;
SELECT Name, FrontalMugShot FROM Employee
WHERE HomeCoordinate NEAR ALL (
SELECT HomeCoordinate FROM Employee
WHERE name like '% Doe') STOP AFTER 3;
CREATE METRIC CostBenefit USING LP2 FOR PARTICULATE
(hp FLOAT 5.0, mpg FLOAT, sec FLOAT 10.0);
CREATE TABLE Cars (
CarName CHAR(35),
Horsepower FLOAT,
Consumption FLOAT,
Acceleration FLOAT,
Origin CHAR(8),
Car PARTICULATE,
METRIC REFERENCES (Horsepower AS hp,
Consumption AS mpg,
Acceleration AS sec)
USING (CostBenefit DEFAULT));
Mamou, Mass, Shmueli-Scheuer, Sznajder: A Query Language for Multimedia Content. Sigir’07.

(full text of the article)
The related work section describes existing QL attempts; MMDOC-QL and VexQuery might be of interest. The authors aim to create a QL for the MPEG-7 standard supporting query-by-example. Since MPEG-7 is represented as XML, their language uses XML as well. It builds on and extends the XML Fragments language. They add the ability to query individual descriptors and to specify their parameters and weights. The language supports k-NN and range queries.
Essentially it just captures everything a user might want to ask about. Multi-point queries are not considered; combined queries are. Combination with textual descriptions and conditions on place and time are supported. No efficient implementation is proposed.
<Title>Twilight</Title>
<Mpeg7Query weight="0.5">
<VisualDescriptor type="ScalableColorType" numOfBitplanesDiscarded="0" numOfCoeff="64">
<Coeff>11 92 -3 87 … -3 -6</Coeff>
</VisualDescriptor>
</Mpeg7Query>
<Mpeg7Query weight="2">
<GeographicPosition>
<Point longitude="-34.7" latitude="19.75"/>
</GeographicPosition>
</Mpeg7Query>
Gruhne, Tous, Delgado, Doeller, Kosch: MP7QF: An MPEG-7 Query Format. 2007.

(full text of the article)
The state of a QL for MPEG-7. A description of the existing efforts. The authors present the committee draft for the QL, i.e. essentially the agreed-upon core of the language. Three parts: input (the type of search, what the output should look like, conditions), output (system messages, the result itself), and conditions.
<Mpeg7Query>
<Input>
<OutputDescription outputNameSpace="urn:mpeg:mpeg7:schema:2004">
<Field typeName="CreationType">/Title</Field>
</OutputDescription>
<QueryCondition>
<Condition xsi:type="QueryByFreeText">
<FreeText>Ry Cooder</FreeText>
</Condition>
</QueryCondition>
</Input>
</Mpeg7Query>
Guliato, de Melo, Rangayyan, Soares: POSTGRESQL-IE: An Image-handling Extension for PostgreSQL. Journal of Digital Imaging, Vol 22, No 2, 2009.

(full text of the article)
Some SQL DBMSs try to support content-based image search; each has its own set of supported descriptors and corresponding extractors. POSTGRESQL-IE offers the ability to define one's own descriptors and extractors, custom combinations, and so on. The paper is very practical, showing their implementation (the specific tables used and how the necessary information is stored in them). Essentially they present a very general data model into which all of this can be stored. They work only with descriptors, functions for comparing them, and the RANGE and KNN commands.
Pein, Lu, Renz: An Extensible Query Language for Content Based Image Retrieval based on Lucene. 2008.

(full text of the article)
A hybrid query language for text and images. The authors consider XML unnecessarily complex and favor textual queries based on the Lucene text query language. Nothing very interesting, except perhaps the two-level structure in which the second level contains "add-on" descriptors that are passed to the appropriate plugin for evaluation.
Natsev, Haubold, Tesic, Xie, Yan: Semantic concept-based query expansion and re-ranking for multimedia retrieval. ACM Multimedia 2007.

full text of the article,
bibtex
@inproceedings{Natsev07,
author = {Natsev, Apostol (Paul) and Haubold, Alexander and Te\v{s}i\'{c}, Jelena and Xie, Lexing and Yan, Rong},
title = {Semantic concept-based query expansion and re-ranking for multimedia retrieval},
booktitle = {Proceedings of the 15th international conference on Multimedia},
series = {MULTIMEDIA '07},
year = {2007},
isbn = {978-1-59593-702-5},
location = {Augsburg, Germany},
pages = {991--1000},
numpages = {10},
url = {http://doi.acm.org/10.1145/1291233.1291448},
doi = {http://doi.acm.org/10.1145/1291233.1291448},
acmid = {1291448},
publisher = {ACM},
address = {New York, NY, USA},
}
Related work on query expansion is the most valuable part - good categorization,
general explanation of principles.
In their application, the authors aim at concept-based query expansion for video retrieval. They have a set of concepts from the LSCOM-lite lexicon with
manually added annotations and links to WordNet, and try to find the concepts relevant to a query. They then get relevant
images for each concept, join them, and rerank. Fusion of results from different search systems is weighted per concept -
e.g. for some concepts, text search is known to perform better.
They use a fixed text ontology and a fixed visual ontology, with manual rules to map one to the other.
- Most large-scale multimedia search systems typically rely on text-based search over media metadata such as surrounding html text, anchor text, titles and abstracts. This approach, however, fails when there is no such metadata (e.g., home photos and videos), when the rich link structure of the Web cannot be exploited (e.g., enterprise content and archives), or when the metadata cannot precisely capture the true multimedia content.
- On the other extreme, there has been a substantial body of work in the research community on content-based retrieval methods by leveraging the query-by-example and relevance feedback paradigms. These methods do not require any additional metadata, but rely on users to express their queries in terms of query examples with low-level features such as colors, textures and shapes. Finding appropriate examples, however, is not easy, and it is still quite challenging to capture the user intent with just a few examples.
- Lexical approaches leverage global language properties, such as synonyms and other linguistic word relationships (e.g., hypernyms). These approaches are typically based on dictionaries or other similar knowledge representation sources such as WordNet [38]. Lexical query expansion approaches can be effective in improving recall but word sense ambiguity can frequently lead to topic drift, where the semantics of the query changes as additional terms are added.
- They use the LSCOM-lite lexicon and prepare SVM models for each of the concepts in the lexicon. Concepts are manually annotated and linked to WordNet synsets. For a query image, they evaluate each of the models to get weights for the concepts
Eickhoff, Li, de Vries: Exploiting User Comments for Audio-Visual Content Indexing and Retrieval. ECIR 2013

full text of the article,
bibtex
@inproceedings{DBLP:conf/ecir/EickhoffLV13,
author = {Carsten Eickhoff and
Wen Li and
Arjen P. de Vries},
title = {Exploiting User Comments for Audio-Visual Content Indexing
and Retrieval},
booktitle = {ECIR},
year = {2013},
pages = {38-49},
ee = {http://dx.doi.org/10.1007/978-3-642-36973-5_4},
crossref = {DBLP:conf/ecir/2013},
bibsource = {DBLP, http://dblp.uni-trier.de}
}
@proceedings{DBLP:conf/ecir/2013,
editor = {Pavel Serdyukov and
Pavel Braslavski and
Sergei O. Kuznetsov and
Jaap Kamps and
Stefan M. R{\"u}ger and
Eugene Agichtein and
Ilya Segalovich and
Emine Yilmaz},
title = {Advances in Information Retrieval - 35th European Conference
on IR Research, ECIR 2013, Moscow, Russia, March 24-27,
2013. Proceedings},
booktitle = {ECIR},
publisher = {Springer},
series = {Lecture Notes in Computer Science},
volume = {7814},
year = {2013},
isbn = {978-3-642-36972-8},
ee = {http://dx.doi.org/10.1007/978-3-642-36973-5},
bibsource = {DBLP, http://dblp.uni-trier.de}
}
They analyze user comments on YouTube videos to retrieve relevant tags. A high level of noise is
typical for this type of data, and they suggest how to deal with it using the time dimension.
Related work on keyword mining techniques. Their main idea is to identify "bursts" of informative
comments (assumption: many comments in a short time span carry more information than other,
random comments). A few comments before each burst are also deemed interesting.
The burst strategy aims at eliminating unrelated and uninformative comments. The second
source of noise consists of misspellings, abbreviations, chatspeak and foreign-language utterances.
They use Wikipedia to deal with these: terms that do not have a dedicated article in the
English version of Wikipedia are assumed to be noise. Large-scale experiments; relevance
judgements via crowdsourcing (Amazon Mechanical Turk), majority vote.
- ... The promising results achieved by previous work
support the feasibility of our goal: Describing content exclusively based on user
comments. We will employ statistical language models aided by time series analyses
and external web resources such as Wikipedia, to find potential index terms
and evaluate their quality in a series of TREC-style experiments.
- The previous sections detailed concrete, task-driven performance evaluations of
our method. In this section, we will dedicate some room to lessons learned and
will discuss several observations that could not be confirmed to be statistically
significant but yet deserve attention as they may become more salient in related
applications or domains.
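The burst intuition can be sketched with a simple sliding window over comment timestamps. This is a toy illustration of the idea only, not the paper's method (which uses statistical language models and time-series analysis); the function name, window, and threshold are assumptions:

```python
def find_bursts(timestamps, window, min_count):
    """Flag spans where at least min_count comments fall within
    `window` seconds of each other (a toy burst detector).

    timestamps: sorted list of comment times in seconds.
    Returns a list of (start, end) time spans.
    """
    bursts = []
    i = 0  # left edge of the sliding window
    for j in range(len(timestamps)):
        # shrink the window until it spans at most `window` seconds
        while timestamps[j] - timestamps[i] > window:
            i += 1
        if j - i + 1 >= min_count:
            span = (timestamps[i], timestamps[j])
            if bursts and bursts[-1][1] >= span[0]:
                # merge with the previous overlapping span
                bursts[-1] = (bursts[-1][0], span[1])
            else:
                bursts.append(span)
    return bursts
```

Comments shortly before each detected span could then also be included, mirroring the observation that a few pre-burst comments are deemed interesting.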
Jegou, Schmid, Harzallah, Verbeek: Accurate image search using the contextual dissimilarity measure. Pattern Analysis and Machine Intelligence, 2010.

(full text of the article)
The paper deals with local descriptors, but defines an interesting dissimilarity measure
based on the idea of reverse kNN: a good result is one that also has the query object among
its own near neighbors. They compute the mutual similarity of two points using the average
distance to the k-th nearest neighbor, and refine it further in an iterative fashion.
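The reverse-kNN idea can be illustrated as rescaling a distance matrix by per-point neighborhood statistics (a simplified sketch of the intuition only, not the authors' exact contextual dissimilarity update; the function name and the geometric-mean normalization are assumptions):

```python
import numpy as np

def contextual_dissimilarity(D, k, n_iter=3):
    """Rescale a symmetric (n, n) distance matrix so that points
    sitting in dense neighborhoods are not unfairly favored.

    Each pairwise distance is divided by a term derived from the
    mean distance of both endpoints to their k nearest neighbors.
    """
    D = D.astype(float).copy()
    for _ in range(n_iter):
        # mean distance from each point to its k nearest neighbors
        # (column 0 after sorting is the zero self-distance)
        r = np.sort(D, axis=1)[:, 1:k + 1].mean(axis=1)
        r_bar = np.exp(np.log(r).mean())  # geometric mean as reference scale
        D = D * r_bar / np.sqrt(np.outer(r, r))
        np.fill_diagonal(D, 0.0)
    return D
```

The update keeps the matrix symmetric, so it can be iterated, loosely mirroring the iterative refinement mentioned above.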
Hanbury: A survey of methods for image annotation. Journal of Visual Languages & Computing, 2008.

(full text of the article),
bibtex
@article{DBLP:journals/vlc/Hanbury08,
author = {Allan Hanbury},
title = {A survey of methods for image annotation},
journal = {Journal of Visual Languages \& Computing},
volume = {19},
number = {5},
year = {2008},
pages = {617-627},
ee = {http://dx.doi.org/10.1016/j.jvlc.2008.01.002},
bibsource = {DBLP, http://dblp.uni-trier.de}
}
Types of information that can be associated with images: free-text descriptions, keywords from controlled vocabularies,
classifications based on ontologies. Levels of specificity: whole image vs. segments. Annotation, categorization, recognition.
Some discussion on the use of ontologies, the difficulties of creating a vocabulary for image annotation, discussion
of some existing sources (incl. WordNet). Ground truth annotation collection methods. Review of annotated image datasets
for computer vision research.
Tousch, Herbin, Audibert: Semantic hierarchies for image annotation: A survey. Pattern Recognition, 2012.

full text of the article,
bibtex
@article{DBLP:journals/pr/TouschHA12,
author = {Anne-Marie Tousch and
St{\'e}phane Herbin and
Jean-Yves Audibert},
title = {Semantic hierarchies for image annotation: A survey},
journal = {Pattern Recognition},
volume = {45},
number = {1},
year = {2012},
pages = {333-345},
ee = {http://dx.doi.org/10.1016/j.patcog.2011.05.017},
bibsource = {DBLP, http://dblp.uni-trier.de}
}
Very good survey, wide range of approaches, well structured. Nature and use of semantics in image description - semantic levels,
user and his context. Object recognition and controlled vocabularies - insights into psychological studies, links. Semantic
image analysis using unstructured vocabularies: classification of approaches (direct, linguistic, compositional, structural,
hierarchical compositional, communicating, hierarchical, multilabel). Semantic
image analysis using structured vocabularies: classification of approaches (linguistic, compositional, communicating, hierarchical).
Quite a lot of info about WordNet use. Brief section on Evaluation. Discussion of strong and weak sides of existing techniques,
future research directions: full exploitation of semantic hierarchies, multi-level and multi-faceted image interpretation, evaluation
protocols adapted to semantic hierarchy handling.
- Unstructured vocabulary
- Pros
- Generic design: all labels at the same level
- Clear but limited evaluation metrics
- Availability of evaluation databases
- Rather efficient with a limited vocabulary
- Large volume of studies in computer vision, pattern recognition and statistical learning
- Cons
- Scalability in vocabulary size not demonstrated
- No flexibility in semantic level of description
- Algorithms are rather ‘‘Black box’’ (i.e. decisions are hard to justify)
- Noise or uncertainty in annotation not easily handled
- Mainly general category level vocabulary, not fine grained
- Structured vocabulary
- Pros
- Handle large vocabularies
- Can manage various levels of description or categorization, contextual phenomena, user point of view
- Can exploit knowledge based representations
- Can produce explanation of decision
- Large volume of studies in artificial intelligence, data mining and computational linguistics
- Cons
- No satisfying standard semantic hierarchy
- Evaluation metrics not finalized
- Few learning or evaluation databases available
Zhang, Islam, Lu: A review on automatic image annotation techniques. Pattern Recognition 2012

full text of the article,
bibtex
@article{ZhangIL12,
author = {Dengsheng Zhang and
Md. Monirul Islam and
Guojun Lu},
title = {A review on automatic image annotation techniques},
journal = {Pattern Recognition},
volume = {45},
number = {1},
year = {2012},
pages = {346-362},
ee = {http://dx.doi.org/10.1016/j.patcog.2011.05.013},
bibsource = {DBLP, http://dblp.uni-trier.de}
}
Motivation for automatic image annotation (AIA): searching for images. The authors focus on two aspects of AIA
- feature extraction and semantic learning/annotation. Feature extraction section: image segmentation (brief
survey of methods), survey of features in use. AIA techniques: single labelling annotation using conventional classification
methods (SVM, artificial neural network, decision tree); multi-labelling annotation, uses Bayesian methods; web-based image annotation which uses metadata to annotate images.
Rich information about the variants of machine-learning techniques and their use for annotation. Good summary of principles
and challenges of search-based annotation (here denoted as 'image annotation incorporating metadata'). Challenges
identified: 1) high dimensional feature analysis, 2) building an effective annotation model (integrate low level visual
information and high level textual information), 3) annotation and ranking problem - what to do online, what offline (with focus on keyword-based searching),
4) ranking of images within categories, 5) lack of standard vocabulary and taxonomy for annotation ("A hierarchical
modelling of image semantics is needed to categorise images properly."), 6) training and evaluation.
- "Nowadays, more and more images are available. However, to find a required image for an ordinary user
is a challenging task."
- "As annotations from text description can be noisy, these annotations need refinement. This is especially needed for
web based image annotation, because each image is usually annotated with multiple words which may not be related
to each other. In a refinement stage, it preserves the annotations which are strongly related to each other."
- "The disadvantage of Jin et al. [64] is that it depends on WordNet and thus, this approach cannot refine the annotations which do not
appear in WordNet. Wang et al. [29,32] improve the refinement process by removing the dependency on WordNet. Instead of using
WordNet, Wang et al. [29] calculate the similarity between two words as the normalised frequency of images annotated by both
words. In [32], they calculate this similarity as the normalised sum of content-similarities between the candidate image and the
images annotated by both words. The similarity values are used to find strongly correlated annotation words."
- "Another important issue is to define appropriate level of semantics. This is especially important in web image applications because
the noise in web texts is common and it is difficult to understand which annotations should be used. For example, for an image with
kangaroos, the concept ‘Australia’ is more abstract than the concepts ‘Kangaroo’. Therefore, it is necessary to identify the concepts which
have smaller semantic gap (like the concept, ‘Kangaroo’)."
- "In terms of semantic learning, there are several approaches including the traditional binary classification methods, multiple
labelling methods and metadata methods. Traditional binary classification may sound attractive and straightforward but it
overlooks the fact that an image can usually be included into more than one category. Furthermore, it lacks a mechanism of
ranking images according to their similarity to the classified categories. Multiple labelling is a more reasonable approach,
because it assigns an image to several categories and assigns an image to a category with a confidence value which assists image
ranking. All the above approaches attempt to learn higher level semantics from low level features or visual features alone. It
appears natural that metadata should be used to overcome the limitation of visual features if it is available. However, analysing
metadata is a complex matter and becomes another issue in image annotation."
Wang: A survey on automatic image annotation and trends of the new age. Procedia Engineering 2011

full text of the article
A strange short survey; it reads like an unfinished paper and cannot be found in DBLP. However,
there are some useful links to papers. Short discussion of three approaches to annotation: generative
models, discriminative models, graph models.
Makadia, Pavlovic, Kumar: A New Baseline for Image Annotation. 10th European Conference on Computer Vision, 2008

full text of the article,
bibtex
@inproceedings{DBLP:conf/eccv/MakadiaPK08,
author = {Ameesh Makadia and
Vladimir Pavlovic and
Sanjiv Kumar},
title = {A New Baseline for Image Annotation},
booktitle = {ECCV (3)},
year = {2008},
pages = {316-329},
ee = {http://dx.doi.org/10.1007/978-3-540-88690-7_24},
crossref = {DBLP:conf/eccv/2008-3},
bibsource = {DBLP, http://dblp.uni-trier.de}
}
@proceedings{DBLP:conf/eccv/2008-3,
editor = {David A. Forsyth and
Philip H. S. Torr and
Andrew Zisserman},
title = {Computer Vision - ECCV 2008, 10th European Conference on
Computer Vision, Marseille, France, October 12-18, 2008,
Proceedings, Part III},
booktitle = {ECCV (3)},
publisher = {Springer},
series = {Lecture Notes in Computer Science},
volume = {5304},
year = {2008},
isbn = {978-3-540-88689-1},
bibsource = {DBLP, http://dblp.uni-trier.de}
}
Automatic image annotation is a well-known and interesting problem; it has been researched for more than
a decade and interesting solutions exist. What is lacking is a comparison of the sophisticated techniques
against a very simple solution. The goal of this work is to create such a baseline.
Visual similarity is evaluated by global color and texture features, and two types of aggregation functions are tried.
Given the nearest visual neighbors, they apply a greedy label transfer algorithm.
Datasets for comparison: Corel, IAPR TC-12, and a web dataset with 20K images from the ESP game.
- "In this work, we introduce
a new baseline technique for image annotation that treats annotation as a retrieval
problem. The proposed technique utilizes low-level image features and a simple
combination of basic distances to find nearest neighbors of a given image. The
keywords are then assigned using a greedy label transfer mechanism. The proposed
baseline outperforms the current state-of-the-art methods on two standard
and one large Web dataset. We believe that such a baseline measure will provide
a strong platform to compare and better understand future annotation techniques."
- "Image annotation is a difficult task for two main reasons: First is the well-known
pixel-to-predicate or semantic gap problem, which points to the fact that it is hard to
extract semantically meaningful entities using just low level image features, e.g. color
and texture. Doing explicit recognition of thousands of objects or classes reliably is
currently an unsolved problem. The second difficulty arises due to the lack of correspondence
between the keywords and image regions in the training data. For each image, one
has access to keywords assigned to the entire image and it is not known which regions
of the image correspond to these keywords. This makes difficult the direct learning of
classifiers by assuming each keyword to be a separate class. Recently, techniques have
emerged to circumvent the correspondence problem under a discriminative multiple
instance learning paradigm [1] or a generative paradigm [2]."
- "We treat image annotation as a process of transferring keywords from nearest neighbors."
- "We propose a simple method to transfer n keywords to a query image Iq from the query’s
K nearest neighbors in the training set. Let Ii,i =1,...,K be these K nearest neighbors,
ordered by increasing distance (i.e., I1 is the most similar image). The number of
keywords associated with Ii is denoted by |Ii|. Following are the steps of our greedy
label transfer algorithm.
- 1. Rank the keywords of I1 according to their frequency in the training set.
- 2. Of the |I1| keywords of I1, transfer the n highest ranking keywords to query Iq. If
|I1| <n, proceed to step 3.
- 3. Rank the keywords of neighbors I2 through IK according to two factors: 1) co-
occurrence in the training set with the keywords transferred in step 2, and 2) local
frequency (i.e. how often they appear as keywords of images I2 through IK). Select
the highest ranking n -|I1| keywords to transfer to Iq."
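The quoted steps can be sketched in Python. A minimal sketch, not the authors' code: in particular, step 3's co-occurrence statistic is approximated here over the retrieved neighbors rather than the full training set, which is a simplification.

```python
from collections import Counter

def greedy_label_transfer(neighbors, train_freq, n):
    """Transfer n keywords to a query from its K nearest neighbors.

    neighbors: list of keyword lists ordered by increasing distance
               (neighbors[0] belongs to the most similar image I1).
    train_freq: Counter of keyword frequencies over the training set.
    """
    # Step 1: rank the keywords of I1 by training-set frequency.
    first = sorted(set(neighbors[0]), key=lambda w: -train_freq[w])
    # Step 2: transfer the n highest-ranking keywords of I1.
    transferred = first[:n]
    if len(transferred) >= n:
        return transferred
    # Step 3: rank keywords of I2..IK by (a) co-occurrence with the
    # already transferred keywords and (b) local frequency among the
    # neighbors, then transfer the best n - |I1| of them.
    local = Counter(w for kws in neighbors[1:] for w in kws)
    def score(w):
        cooc = sum(1 for kws in neighbors[1:]
                   if w in kws and set(transferred) & set(kws))
        return (cooc, local[w])
    rest = sorted((w for w in local if w not in transferred),
                  key=score, reverse=True)
    return transferred + rest[:n - len(transferred)]
```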
Belém, Martins, Almeida, Gonçalves: Exploiting Novelty and Diversity in Tag Recommendation. ECIR 2013

full text of the article,
bibtex
@inproceedings{DBLP:conf/ecir/BelemMAG13,
author = {Fabiano Bel{\'e}m and
Eder Ferreira Martins and
Jussara M. Almeida and
Marcos Andr{\'e} Gon\c{c}alves},
title = {Exploiting Novelty and Diversity in Tag Recommendation},
booktitle = {ECIR},
year = {2013},
pages = {380-391},
ee = {http://dx.doi.org/10.1007/978-3-642-36973-5_32},
crossref = {DBLP:conf/ecir/2013},
bibsource = {DBLP, http://dblp.uni-trier.de}
}
@proceedings{DBLP:conf/ecir/2013,
editor = {Pavel Serdyukov and
Pavel Braslavski and
Sergei O. Kuznetsov and
Jaap Kamps and
Stefan M. R{\"u}ger and
Eugene Agichtein and
Ilya Segalovich and
Emine Yilmaz},
title = {Advances in Information Retrieval - 35th European Conference
on IR Research, ECIR 2013, Moscow, Russia, March 24-27,
2013. Proceedings},
booktitle = {ECIR},
publisher = {Springer},
series = {Lecture Notes in Computer Science},
volume = {7814},
year = {2013},
isbn = {978-3-642-36972-8},
ee = {http://dx.doi.org/10.1007/978-3-642-36973-5},
bibsource = {DBLP, http://dblp.uni-trier.de}
}
They focus on tag recommendation from texts. The interesting ideas are the quality measures:
apart from relevance, which is typically understood as the most desirable quality,
they also consider novelty and diversity (related terms; diversity = the novelty of each
new term). They implement a ranking function (the "learning to rank" research field)
using genetic programming; some related work on "learning to rank". They
exploit co-occurrence patterns by mining relations among tags assigned to the same object in an object collection.
Metrics for novelty and diversity: novelty is assessed as the inverted frequency of a given keyword within the collection;
the diversity of a candidate term c with respect to a list C of candidates is estimated as the
average semantic distance between c and each other term in C. The dissimilarity between
terms t1 and t2 is measured by the relative difference between the sets of objects O1 and O2
in which they appear as a tag, i.e., dist(t1,t2) = |O1-O2|/|O1∪O2|.
- Tag recommendation strategies aim at suggesting relevant and
useful keywords to users, helping them in the task of assigning tags to content.
Ultimately, these mechanisms aim at improving the quality not only of the generated
tags, by making them more complete and accurate and reducing their
noise (e.g., misspellings, unrelated words) but also of the information services
that rely on tags as data source.
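The novelty and diversity measures described above can be sketched directly (a sketch under assumptions: the IDF-style log normalization of novelty is a guess, and `dist` follows the |O1-O2|/|O1∪O2| formula literally as written in the notes):

```python
import math

def novelty(term, collection):
    """Inverted frequency of a tag over the collection (IDF-style)."""
    df = sum(1 for tags in collection.values() if term in tags)
    return math.log(len(collection) / df) if df else 0.0

def dist(t1, t2, collection):
    """Relative difference of the object sets tagged with t1 and t2,
    following the formula above: |O1 - O2| / |O1 ∪ O2|."""
    o1 = {o for o, tags in collection.items() if t1 in tags}
    o2 = {o for o, tags in collection.items() if t2 in tags}
    union = o1 | o2
    return len(o1 - o2) / len(union) if union else 0.0

def diversity(c, candidates, collection):
    """Average semantic distance between c and each other candidate."""
    others = [t for t in candidates if t != c]
    return (sum(dist(c, t, collection) for t in others) / len(others)
            if others else 0.0)
```

Here `collection` maps object ids to their tag sets; a candidate scoring both high novelty and high diversity is rare in the collection and semantically far from the other suggestions.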
Xie, He: Picture tags and world knowledge: learning tag relations from visual semantic sources. ACM Multimedia 2013:967-976

full text of the article, slides,
bibtex
@inproceedings{DBLP:conf/mm/XieH13,
author = {Lexing Xie and
Xuming He},
title = {Picture tags and world knowledge: learning tag relations
from visual semantic sources},
booktitle = {ACM Multimedia},
year = {2013},
pages = {967-976},
ee = {http://doi.acm.org/10.1145/2502081.2502113},
crossref = {DBLP:conf/mm/2013},
bibsource = {DBLP, http://dblp.uni-trier.de}
}
@proceedings{DBLP:conf/mm/2013,
editor = {Alejandro Jaimes and
Nicu Sebe and
Nozha Boujemaa and
Daniel Gatica-Perez and
David A. Shamma and
Marcel Worring and
Roger Zimmermann},
title = {ACM Multimedia Conference, MM '13, Barcelona, Spain, October
21-25, 2013},
booktitle = {ACM Multimedia},
publisher = {ACM},
year = {2013},
isbn = {978-1-4503-2404-5},
ee = {http://dl.acm.org/citation.cfm?id=2502081},
bibsource = {DBLP, http://dblp.uni-trier.de}
}
Interconnecting information from Flickr (images, user tags), ImageNet and ConceptNet. They build an ImageNet/Flickr collection
by filtering ImageNet records so that only Flickr images with tags remain. On top of this collection, they compute statistics of tag use and study relationships
between tags, and in particular the relationship between tag co-occurrence and the semantic distance of concepts as measured by ConceptNet. They
propose some new measures of semantic relatedness and show how their resource can be used to recommend relevant tags.
Deng, Dong, Socher, Li, Li, Fei-Fei: ImageNet: A large-scale hierarchical image database. IEEE Conference on Computer Vision and Pattern Recognition, 2009

full text of the article,
bibtex
@inproceedings{DBLP:conf/cvpr/DengDSLL009,
author = {Jia Deng and
Wei Dong and
Richard Socher and
Li-Jia Li and
Kai Li and
Fei-Fei Li},
title = {ImageNet: A large-scale hierarchical image database},
booktitle = {CVPR},
year = {2009},
pages = {248-255},
ee = {http://doi.ieeecomputersociety.org/10.1109/CVPRW.2009.5206848},
crossref = {DBLP:conf/cvpr/2009},
bibsource = {DBLP, http://dblp.uni-trier.de}
}
@proceedings{DBLP:conf/cvpr/2009,
title = {2009 IEEE Computer Society Conference on Computer Vision
and Pattern Recognition (CVPR 2009), 20-25 June 2009, Miami,
Florida, USA},
booktitle = {CVPR},
publisher = {IEEE},
year = {2009},
isbn = {978-1-4244-3992-8},
bibsource = {DBLP, http://dblp.uni-trier.de}
}
The concept of ImageNet and how it was built (text search using words from the synset, with some extension by words from parent synsets; human verification via Mechanical Turk), and possible uses.
Comparison with other existing image databases - good for an overview of what exists (Caltech, MSRC, PASCAL, TinyImage, ESP dataset, LabelMe, Lotus Hill).
Du, Rau, Huang, Chen: Improving the quality of tags using state transition on progressive image search and recommendation system. SMC 2012

full text of the article,
bibtex
@inproceedings{DBLP:conf/smc/DuRHC12,
author = {Wen-Hau Du and
Jer-Wei Rau and
Jen-Wei Huang and
Yung-Sheng Chen},
title = {Improving the quality of tags using state transition on
progressive image search and recommendation system},
booktitle = {SMC},
year = {2012},
pages = {3233-3238},
ee = {http://dx.doi.org/10.1109/ICSMC.2012.6378289},
crossref = {DBLP:conf/smc/2012},
bibsource = {DBLP, http://dblp.uni-trier.de}
}
@proceedings{DBLP:conf/smc/2012,
title = {Proceedings of the IEEE International Conference on Systems,
Man, and Cybernetics, SMC 2012, Seoul, Korea (South), October
14-17, 2012},
booktitle = {SMC},
publisher = {IEEE},
year = {2012},
isbn = {978-1-4673-1713-9},
ee = {http://ieeexplore.ieee.org/xpl/mostRecentIssue.jsp?punumber=6362338},
bibsource = {DBLP, http://dblp.uni-trier.de}
}
Selecting relevant tags from Flickr-like galleries with user tagging. A "Waking And Sleeping" algorithm that uses the number of clicks.
Short survey of other tag cleaning methods.
Sun, Bhowmick, Chong: Social image tag recommendation by concept matching. ACM Multimedia 2011

full text of the article,
bibtex
@inproceedings{DBLP:conf/mm/SunBC11,
author = {Aixin Sun and
Sourav S. Bhowmick and
Jun-An Chong},
title = {Social image tag recommendation by concept matching},
booktitle = {ACM Multimedia},
year = {2011},
pages = {1181-1184},
ee = {http://doi.acm.org/10.1145/2072298.2071969},
crossref = {DBLP:conf/mm/2011},
bibsource = {DBLP, http://dblp.uni-trier.de}
}
@proceedings{DBLP:conf/mm/2011,
editor = {K. Sel\c{c}uk Candan and
Sethuraman Panchanathan and
Balakrishnan Prabhakaran and
Hari Sundaram and
Wu-chi Feng and
Nicu Sebe},
title = {Proceedings of the 19th International Conference on Multimedia
2011, Scottsdale, AZ, USA, November 28 - December 1, 2011},
booktitle = {ACM Multimedia},
publisher = {ACM},
year = {2011},
isbn = {978-1-4503-0616-4},
bibsource = {DBLP, http://dblp.uni-trier.de}
}
Abstract: Tags associated with social images are valuable information source for
superior image search and retrieval experiences. In this paper, we propose a novel tag recommendation
technique that exploits the user-given tags associated with images. Each candidate tag to be recommended
is described by a few tag concepts derived from the collective knowledge embedded in the tag co-occurrence
pairs. Each concept, represented by a few tags with high co-occurrences among themselves, is indexed
as a textual document. Then user-given tags of an image is represented as a text query and the
matching concepts are retrieved from the index. The candidate tags associated with the matching
concepts are then recommended. Leverages on the well studied Information Retrieval (IR) techniques,
the proposed approach leads to superior tag recommendation accuracy and lower execution time compared to the state-of-the-art.
Lux, Pitman, Marques: Can Global Visual Features Improve Tag Recommendation for Image Annotation? Future Internet 2010

full text of the article,
bibtex
@article{DBLP:journals/fi/LuxPM10,
author = {Mathias Lux and
Arthur Pitman and
Oge Marques},
title = {Can Global Visual Features Improve Tag Recommendation for
Image Annotation?},
journal = {Future Internet},
volume = {2},
number = {3},
year = {2010},
pages = {341-362},
ee = {http://dx.doi.org/10.3390/fi2030341},
bibsource = {DBLP, http://dblp.uni-trier.de}
}
Content-aware tag recommendation, with rich related work on tag recommendation and some considerations
about the tagging habits of users. Their own idea is rather simple:
start with some user-provided tags, evaluate a text search,
reorder the result by visual similarity, take the n best images, combine their tags, and select the most
relevant ones. For visual similarity,
they use the Color and Edge Directivity Descriptor, which is claimed to be better than the MPEG-7 ones.
- The goal of the research efforts described in this paper is to improve annotation quality and quantity
(i.e., increase the number of meaningful tags assigned to an image) by tag recommendation. This was
accomplished by developing a tag recommendation system which suggests tags based both on the
context of the image and its visual contents. Rather than utilizing synthetic data or tagging
vocabularies, it attempts to leverage the “wisdom of the crowds” [8,9] by utilizing existing images and
metadata made available by online photo systems, e.g., Flickr.
- Tags may be classified into the following four broad classes:
- Tags relating to direct subject matter of the photo, i.e. describing the objects visible in the photo.
For example old town or architecture.
- Tags that provide technical information about the photo itself or the camera, such as whether a
flash was used or the particular camera model. For example nikon or i500.
- Tags describing the circumstances surrounding the photo or the emotions invoked by it. For
example awesome or speechless.
- Additional organizational tags, often pertaining to the perceived quality of the image or used to
identify individual users or groups. For example diamondclassphotographer or flickrelite.
Deselaers, Ferrari: Visual and semantic similarity in ImageNet. Computer Vision and Pattern Recognition (CVPR), 2011.

full text of the article,
bibtex
@inproceedings{DBLP:conf/cvpr/DeselaersF11,
author = {Thomas Deselaers and
Vittorio Ferrari},
title = {Visual and semantic similarity in ImageNet},
booktitle = {CVPR},
year = {2011},
pages = {1777-1784},
ee = {http://dx.doi.org/10.1109/CVPR.2011.5995474},
crossref = {DBLP:conf/cvpr/2011},
bibsource = {DBLP, http://dblp.uni-trier.de}
}
@proceedings{DBLP:conf/cvpr/2011,
title = {The 24th IEEE Conference on Computer Vision and Pattern
Recognition, CVPR 2011, Colorado Springs, CO, USA, 20-25
June 2011},
booktitle = {CVPR},
publisher = {IEEE},
year = {2011},
bibsource = {DBLP, http://dblp.uni-trier.de}
}
Using the images of ImageNet, they examine how closely semantic and visual similarity are connected: the depth of a synset vs. the visual variability of its objects.
Some questions from cognitive science are addressed (basic level categories). They propose a new formula for computing image similarity that combines
visual similarity and WordNet distance - for a given image, they find the nearest ImageNet images and compute the semantic distance from those. Tested
on the Caltech 101 dataset.
Heesch, Yavlinsky, Rüger: NNk Networks and Automated Annotation for Browsing Large Image Collections from the World Wide Web. Multimedia 2006.

(full text of the article)
Short paper. The images are organized in a kind of social network; each object knows the objects
that are close to it in some way. They perform image annotation using a learned
probability that a given visual appearance is related to a given concept, trained on a
set of annotated Corel images.
Noah, Ali, Alhadi, Kassim: Going Beyond the Surrounding Text to Semantically Annotate and Search Digital Images. ACIIDS 2010.

(full text of the article)
They work with images from newspapers. They try to find named entities (presumably within
the page containing the image), apply WordNet and ConceptNet to the rest,
and search for semantic concepts.
Nowak, Huiskes: New Strategies for Image Annotation: Overview of the Photo Annotation Task at ImageCLEF 2010. ImageCLEF 2011

full text of the article,
bibtex
@inproceedings{DBLP:conf/clef/NowakH10,
author = {Stefanie Nowak and
Mark J. Huiskes},
title = {New Strategies for Image Annotation: Overview of the Photo
Annotation Task at ImageCLEF 2010},
booktitle = {CLEF (Notebook Papers/LABs/Workshops)},
year = {2010},
ee = {http://clef2010.org/resources/proceedings/clef2010labs_submission_127.pdf},
crossref = {DBLP:conf/clef/2010w},
bibsource = {DBLP, http://dblp.uni-trier.de}
}
@proceedings{DBLP:conf/clef/2010w,
editor = {Martin Braschler and
Donna Harman and
Emanuele Pianta},
title = {CLEF 2010 LABs and Workshops, Notebook Papers, 22-23 September
2010, Padua, Italy},
booktitle = {CLEF (Notebook Papers/LABs/Workshops)},
year = {2010},
isbn = {978-88-904810-0-0},
ee = {http://clef2010.org/index.php?page=pages/proceedings.php},
bibsource = {DBLP, http://dblp.uni-trier.de}
}
The task was to assign tags to images from a given selection of 93 concepts.
The ground truth was obtained half manually (from the previous year's edition) and half via Amazon Mechanical Turk.
Three types of approaches: visual, text, multimodal. However, apart from a very brief description of the individual
approaches (essentially who uses which descriptors), the paper contains only numeric statistics -
no reasonable categorization of the approaches and no comparison of their suitability.
Rudinac, Larson, Hanjalic: Learning Crowdsourced User Preferences for Visual Summarization of Image Collections. IEEE Transactions on Multimedia, 2013

full text of the article,
bibtex
@article{DBLP:journals/tmm/RudinacLH13,
author = {Stevan Rudinac and
Martha Larson and
Alan Hanjalic},
title = {Learning Crowdsourced User Preferences for Visual Summarization
of Image Collections},
journal = {IEEE Transactions on Multimedia},
volume = {15},
number = {6},
year = {2013},
pages = {1231-1243},
ee = {http://dx.doi.org/10.1109/TMM.2013.2261481},
bibsource = {DBLP, http://dblp.uni-trier.de}
}
- Summarization techniques, in particular, aim at providing a compact representation of a single multimedia data document or data collection.
Depending on the type of data and the application domain, summaries may consist of text, images, video segments or a combination of these.
- Paper contributions:
- We show how to deploy crowdsourcing to acquire implicit and explicit criteria humans find important when performing visual summarization
- We propose a novel approach for embedding the derived criteria into descriptive features and learning to distinguish
between images based on the likelihood of their appearance in the human-created visual summaries.
- In order to match the criteria inferred from human-created summaries, we expand the scope of features used to
represent the image collection beyond those that are typically deployed for visual summarization. This expansion
encompasses, in particular, features related to the context, aesthetic appeal, sentiment and popularity of an image.
- We provide new insights regarding the applicability of some standard image aesthetic appeal features in a general
summarization scenario.
- We demonstrate that the existence of multiple “optimal” visual summaries leads to a low inter-user agreement that
makes image set evaluation difficult. We therefore propose an automatic evaluation protocol based on the pyramid
approach and motivated by the experience from the text domain that has been documented in the literature by the text
summarization and machine translation communities.
Introduction: what visual summaries are, and that their quality depends on the user; to produce
a successful summarization, one therefore needs to know the user and its purpose. The parameters
usually considered are relevance, representativeness, and diversity. According to the article these
notions are too general: we do not know how users actually reason. The article investigates this
via crowdsourcing; from the summaries created by users, it infers what matters to them. The application
considered in the article is summarization of geographic areas, so they use geo-visual clustering.
Their experiments with people showed that users often disagree on photos, yet on some the agreement
is unexpectedly high. This appears to depend on aesthetic appeal, affect and sentiment, which are
studied in other contexts as well (computer vision, social network analytics, natural language processing).
They therefore select a feature set that also covers these aspects, then take the user-provided
summarizations and train an SVM to learn to rank images within clusters correctly.
The clusters are created using geo-visual information only; sentiment, attractiveness and the relative
quality of an image with respect to the other images are taken into account only when selecting the
cluster representative. For clustering they use affinity propagation, clustering first by geo-location
and secondarily via SIFTs (bag-of-words approach).
Image aspect ratio and colorfulness emerge as the most discriminative features.
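The two-stage selection described in the notes above can be sketched in a few lines. This is only a hedged stand-in: the paper uses affinity propagation on geo-location plus SIFT bag-of-words features, whereas this sketch does a naive greedy distance-threshold clustering on coordinates and picks a geographic medoid as the representative; the `radius` parameter and the photo data are invented for illustration.

```python
# Minimal sketch of "cluster by geo-location, then pick a representative".
# Greedy thresholding is a simplification; the paper uses affinity propagation.
from math import hypot

def geo_clusters(photos, radius):
    """Each photo joins the first cluster whose seed photo lies
    within `radius` (in raw coordinate units); otherwise it seeds
    a new cluster."""
    clusters = []
    for pid, (lat, lon) in photos.items():
        for c in clusters:
            slat, slon = photos[c[0]]
            if hypot(lat - slat, lon - slon) <= radius:
                c.append(pid)
                break
        else:
            clusters.append([pid])
    return clusters

def medoid(cluster, photos):
    """Representative = photo minimizing summed distance to the rest."""
    def cost(p):
        la, lo = photos[p]
        return sum(hypot(la - photos[q][0], lo - photos[q][1]) for q in cluster)
    return min(cluster, key=cost)

# Hypothetical photos: two near Paris, one near Rome.
photos = {"a": (48.85, 2.35), "b": (48.86, 2.34), "c": (41.90, 12.50)}
cl = geo_clusters(photos, radius=0.5)
reps = [medoid(c, photos) for c in cl]
```

In the paper, the representative is chosen by a learned ranker over sentiment, aesthetic and quality features rather than by geographic centrality.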
Metze, Ding, Younessian, Hauptmann: Beyond audio and video retrieval: topic-oriented multimedia summarization. International Journal of Multimedia Information Retrieval, 2013

full text of the article,
bibtex
@article{DBLP:journals/ijmir/MetzeDYH13,
author = {Florian Metze and
Duo Ding and
Ehsan Younessian and
Alexander G. Hauptmann},
title = {Beyond audio and video retrieval: topic-oriented multimedia
summarization},
journal = {IJMIR},
volume = {2},
number = {2},
year = {2013},
pages = {131-144},
ee = {http://dx.doi.org/10.1007/s13735-012-0028-y},
bibsource = {DBLP, http://dblp.uni-trier.de}
}
Transformation of videos into text summaries. Visual and audio features analyzed and classified, then templates
used to construct sentences. Currently, they focus on 10 event concepts used in TRECVID 2011 Multimedia Event Detection (MED),
for which training and test data are available. Detailed description of visual concept ranking (discard too frequent
concepts, set probability thresholds). Example output: In this video, we detected 3 or more people meeting indoors.
We probably heard the words "house", "half", "let", "happen", and "earn" in the video. We saw people sit down and we saw
body parts in the video. We probably saw food, indoor, room and adult in the video. We possibly saw 3 or more
people, joy and meeting in the video. But we also detected hand. They call the approach TOMS (topic-oriented multimedia summarization).
Possible use: efficient browsing of videos (no need to watch to decide whether relevant), explanation why a given
video was selected for a query. Evaluation: experiments with users, who performed various tasks with
videos described either by TOMS or manually. Manual descriptions were better, but TOMS results promising. Another
set of experiments tried to determine which features were most important for users.
Related work contains links to video and audio summarization papers.
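The thresholded, template-based sentence construction described above can be illustrated with a toy sketch. The concept names, detector scores and threshold values below are invented for illustration; they are not the ones used by TOMS.

```python
# Hedged sketch of template-based summary generation in the spirit of TOMS:
# detector scores are bucketed by confidence and rendered into canned
# sentence templates ("we saw" / "we probably saw" / "we possibly saw").
def toms_sentence(concept_scores, high=0.8, mid=0.5, low=0.3):
    saw, probably, possibly = [], [], []
    for concept, score in sorted(concept_scores.items(), key=lambda kv: -kv[1]):
        if score >= high:
            saw.append(concept)
        elif score >= mid:
            probably.append(concept)
        elif score >= low:
            possibly.append(concept)      # scores below `low` are discarded
    parts = []
    if saw:
        parts.append("We saw %s in the video." % ", ".join(saw))
    if probably:
        parts.append("We probably saw %s in the video." % ", ".join(probably))
    if possibly:
        parts.append("We possibly saw %s in the video." % ", ".join(possibly))
    return " ".join(parts)

# Hypothetical detector output for one video.
summary = toms_sentence({"indoor": 0.9, "food": 0.6, "joy": 0.35, "hand": 0.1})
print(summary)
```

The real system additionally discards overly frequent concepts before filling the templates, which this sketch omits.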
Tan, Song, Liu, Xie: ImageHive: Interactive Content-Aware Image Summarization. IEEE Computer Graphics and Applications, 2012

full text of the article,
bibtex
@article{DBLP:journals/cga/TanSLX12,
author = {Li Tan and
Yangqiu Song and
Shixia Liu and
Lexing Xie},
title = {ImageHive: Interactive Content-Aware Image Summarization},
journal = {IEEE Computer Graphics and Applications},
volume = {32},
number = {1},
year = {2012},
pages = {46-55},
ee = {http://doi.ieeecomputersociety.org/10.1109/MCG.2011.89},
bibsource = {DBLP, http://dblp.uni-trier.de}
}
They focus mainly on the visual rendering of the collage. The introduction lays out the goals quite nicely: 1) The image summary should be compact and make
the most salient regions in the collection visible. 2) The image layout should reflect pairwise content
relationships. 3) The summarization tool should let users examine
the image collection from multiple perspectives. The related work concerns only the rendering of collages. Their approach: first select representatives somehow, e.g., with k-means,
then solve the layout in two stages. The goals are to represent the relationships (similarity) between images and to arrange
the images so that their important parts are not occluded. To determine similarity they use whatever happens to be available,
e.g., visual descriptors, but also a WordNet-based distance of tags and descriptions. Several representatives are
selected from each cluster. For the initial layout, the selected images are represented as nodes in a graph (edges
represent similarity) and a suitable algorithm for laying out a graph in 2D is applied. The article further discusses the technically
harder and probably interesting problem of defining hard and soft constraints that ensure the important parts of the images
stay visible. Evaluation: running times and resulting collages for ImageHive and three other systems, tested on images from
IAPR TC-12.
- Nice intro: "Finding interesting content in a large image collection can be difficult, whether it’s photos of a friend’s
international trip or a presidential archive. One typical solution is to select a smaller set of representative images and compile
them into a compact image summary for subsequent exploration, understanding, communication, and storytelling. Organizing images
by content similarity can help users quickly locate information of interest. However, users are distracted when the informative
regions of images overlap, so summaries should avoid this. Furthermore, to deeply explore an image collection and exchange ideas,
users must be able to interact with the summary in real time."
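The initial layout step in the notes above (images as graph nodes, similarity as edges, embedded in 2D) can be approximated by a generic spring embedder. This is only a hedged sketch: it ignores the hard/soft visibility constraints that are the paper's actual contribution, and the node and edge data are invented.

```python
# Generic force-directed (spring) layout sketch standing in for the
# "suitable algorithm for laying out a graph in 2D" mentioned above:
# similar images (heavy edges) are pulled together, all pairs repel.
import random

def spring_layout(nodes, edges, iters=200, step=0.05):
    rng = random.Random(0)                     # deterministic initialization
    pos = {n: [rng.random(), rng.random()] for n in nodes}
    for _ in range(iters):
        force = {n: [0.0, 0.0] for n in nodes}
        for a in nodes:                        # pairwise repulsion ~ 1/d
            for b in nodes:
                if a == b:
                    continue
                dx = pos[a][0] - pos[b][0]
                dy = pos[a][1] - pos[b][1]
                d2 = dx * dx + dy * dy + 1e-9
                force[a][0] += dx / d2
                force[a][1] += dy / d2
        for (a, b), w in edges.items():        # attraction ~ similarity weight
            dx = pos[b][0] - pos[a][0]
            dy = pos[b][1] - pos[a][1]
            force[a][0] += w * dx
            force[a][1] += w * dy
            force[b][0] -= w * dx
            force[b][1] -= w * dy
        for n in nodes:                        # Euler step on all positions
            pos[n][0] += step * force[n][0]
            pos[n][1] += step * force[n][1]
    return pos

# Hypothetical similarity graph: "a" and "b" very similar, "a" and "c" barely.
pos = spring_layout(["a", "b", "c"], {("a", "b"): 5.0, ("a", "c"): 0.1})
```

After the embedding, ImageHive still has to resolve overlaps so that salient regions stay visible, which is where its constraint formulation comes in.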
Zheng, Herranz, Jiang: Flexible navigation in smartphones and tablets using scalable storyboards. ICMR 2013

full text of the article,
bibtex
@inproceedings{DBLP:conf/mir/ZhengHJ13,
author = {Shuai Zheng and
Luis Herranz and
Shuqiang Jiang},
title = {Flexible navigation in smartphones and tablets using scalable
storyboards},
booktitle = {ICMR},
year = {2013},
pages = {319-320},
ee = {http://doi.acm.org/10.1145/2461466.2461525},
crossref = {DBLP:conf/mir/2013},
bibsource = {DBLP, http://dblp.uni-trier.de}
}
@proceedings{DBLP:conf/mir/2013,
editor = {Ramesh Jain and
Balakrishnan Prabhakaran and
Marcel Worring and
John R. Smith and
Tat-Seng Chua},
title = {International Conference on Multimedia Retrieval, ICMR'13,
Dallas, TX, USA, April 16-19, 2013},
booktitle = {ICMR},
publisher = {ACM},
year = {2013},
isbn = {978-1-4503-2033-7},
ee = {http://dl.acm.org/citation.cfm?id=2461466},
bibsource = {DBLP, http://dblp.uni-trier.de}
}
Storyboards, summaries of a video via a few selected frames, are often used for convenient browsing
of video collections. The authors of the demo present dynamic storyboards, which let the user dynamically
change the length and focus of the storyboard without much computational cost. A well-written demo paper,
but a link to the demo itself is missing.
Liu, Wang, Sun, Zheng, Tang, Shum: Picture Collage. IEEE Transactions on Multimedia, 2009

full text of the article,
bibtex
@article{DBLP:journals/tmm/LiuWSZTS09,
author = {Tie Liu and
Jingdong Wang and
Jian Sun and
Nanning Zheng and
Xiaoou Tang and
Heung-Yeung Shum},
title = {Picture Collage},
journal = {IEEE Transactions on Multimedia},
volume = {11},
number = {7},
year = {2009},
pages = {1225-1239},
ee = {http://dx.doi.org/10.1109/TMM.2009.2030741},
bibsource = {DBLP, http://dblp.uni-trier.de}
}
Probably a good and thorough publication on creating image collages. The goal: "Picture
collage is a kind of visual image summary—to arrange all input
images on a given canvas, allowing overlay, to maximize visible
visual information." The paper thus addresses how best to pack the images onto a given canvas.
Ekhtiyar, Sheida, Amintoosi: Picture Collage with Genetic Algorithm and Stereo vision. CoRR, 2012

full text of the article,
bibtex
@article{DBLP:journals/corr/abs-1201-1417,
author = {Hesam Ekhtiyar and
Mahdi Sheida and
Mahmood Amintoosi},
title = {Picture Collage with Genetic Algorithm and Stereo vision},
journal = {CoRR},
volume = {abs/1201.1417},
year = {2012},
ee = {http://arxiv.org/abs/1201.1417},
bibsource = {DBLP, http://dblp.uni-trier.de}
}
Short paper on creating collages for stereo images. Nothing groundbreaking; only the screenshots
are interesting, as they clarify the differences between "grid", "mosaic" and "collage".
Zhang, Huang: Hierarchical Narrative Collage For Digital Photo Album. Computer Graphics Forum, 2012

full text of the article,
bibtex
@article{DBLP:journals/cgf/ZhangH12,
author = {Lei Zhang and
Hua Huang},
title = {Hierarchical Narrative Collage For Digital Photo Album},
journal = {Comput. Graph. Forum},
volume = {31},
number = {7-2},
year = {2012},
pages = {2173-2181},
ee = {http://dx.doi.org/10.1111/j.1467-8659.2012.03210.x},
bibsource = {DBLP, http://dblp.uni-trier.de}
}
They cluster images by place (estimated from visual similarity) and time, and identify people. The collection
is then presented in a "character - setting - plot" structure, with the user able to choose the level of
detail. A nice article; I did not study the technical details. At first glance, though, the results do not
strike me as great (the chosen structure does not seem very comprehensible to me).
Laparra, Rigau, Vossen: Mapping WordNet to the Kyoto ontology. LREC 2012

full text of the article,
bibtex
@inproceedings{DBLP:conf/lrec/LaparraRV12,
author = {Egoitz Laparra and
German Rigau and
Piek Vossen},
title = {Mapping WordNet to the Kyoto ontology},
booktitle = {LREC},
year = {2012},
pages = {2584-2589},
ee = {http://www.lrec-conf.org/proceedings/lrec2012/summaries/908.html},
crossref = {DBLP:conf/lrec/2012},
bibsource = {DBLP, http://dblp.uni-trier.de}
}
@proceedings{DBLP:conf/lrec/2012,
editor = {Nicoletta Calzolari and
Khalid Choukri and
Thierry Declerck and
Mehmet Ugur Dogan and
Bente Maegaard and
Joseph Mariani and
Jan Odijk and
Stelios Piperidis},
title = {Proceedings of the Eighth International Conference on Language
Resources and Evaluation (LREC-2012), Istanbul, Turkey,
May 23-25, 2012},
booktitle = {LREC},
publisher = {European Language Resources Association (ELRA)},
year = {2012},
isbn = {978-2-9517408-7-7},
bibsource = {DBLP, http://dblp.uni-trier.de}
}
Kyoto is an event-mining system that utilizes a rich knowledge model composed of wordnets and a generic ontology.
This paper describes a method of obtaining a full mapping between the English WordNet and the ontology.
The mapping was done semi-automatically.
Intro to Kyoto ontology, WordNet-to-ontology relations, mapping methodology. The result "provides a powerful
basis for semantic processing of full text in any domain".
Suchanek, Kasneci, Weikum: YAGO: A Large Ontology from Wikipedia and WordNet. Journal of Web Semantics, 2008

full text of the article,
bibtex
@article{DBLP:journals/ws/SuchanekKW08,
author = {Fabian M. Suchanek and
Gjergji Kasneci and
Gerhard Weikum},
title = {YAGO: A Large Ontology from Wikipedia and WordNet},
journal = {J. Web Sem.},
volume = {6},
number = {3},
year = {2008},
pages = {203-217},
ee = {http://dx.doi.org/10.1016/j.websem.2008.06.001},
bibsource = {DBLP, http://dblp.uni-trier.de}
}
The article presents YAGO - a large ontology with high coverage and precision that has been automatically derived from Wikipedia
and WordNet.
Griffiths, Steyvers, Firl: Google and the Mind: Predicting Fluency With PageRank. Psychological Science, 2007

full text of the article
A psychology paper; they analyze whether the human mind associates concepts in a manner similar to the Google PageRank idea.
The paper nicely explains the PageRank idea in simple but precise terms.
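Since the notes single out the paper's explanation of PageRank, a minimal power-iteration sketch of the algorithm may be a useful companion. The tiny link graph below is invented for illustration, and the damping factor 0.85 is the conventional choice, not necessarily the one used in the paper.

```python
# Power-iteration PageRank: repeatedly redistribute rank mass along
# outgoing links, mixing in a uniform "teleport" component (1 - damping).
def pagerank(links, damping=0.85, iters=100):
    nodes = sorted(links)
    n = len(nodes)
    rank = {v: 1.0 / n for v in nodes}
    for _ in range(iters):
        new = {v: (1.0 - damping) / n for v in nodes}
        for v in nodes:
            out = links[v]
            if not out:                      # dangling node: spread evenly
                for u in nodes:
                    new[u] += damping * rank[v] / n
            else:
                for u in out:
                    new[u] += damping * rank[v] / len(out)
        rank = new
    return rank

# Hypothetical association graph: a -> b, c;  b -> c;  c -> a.
links = {"a": ["b", "c"], "b": ["c"], "c": ["a"]}
r = pagerank(links)
```

Node "c" ends up ranked highest here because it receives links from both "a" and "b"; this is the same centrality notion the paper tests against human word-fluency data.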