Wednesday, February 19, 2014

Week 7: Reading Notes

IIR Chapter 9
    In this chapter the author mainly talks about relevance feedback and query expansion. The same concept may be referred to using different words. This issue, known as synonymy, has an impact on the recall of most
information retrieval systems.in this chapter we discuss ways in which a system can help with query refinement, either fully automatically or with the user in the loop.
    The methods for tackling this problem split into two major classes: global methods and local methods. Global methods are techniques for expanding or reformulating query terms independent of the query and results returned from it, so that changes in the query wording will cause the new query to match other semantically similar terms.
    The idea of relevance feedback (RF) is to involve the user in the retrieval process so as to improve the final result set. The Rocchio Algorithm is the classic algorithm for implementing relevance feedback. It models a way of incorporating relevance feedback information into the vector space model.
    In the next section we more briefly discuss three global methods for expanding a query: by simply aiding the user in doing so, by using a manual thesaurus, and through building a thesaurus automatically.
Improving the effectiveness of information retrieval with local context analysis
    Techniques for automatic query expansion have been extensively studied in information research as a means of addressing the word mismatch between queries and documents. These techniques can be categorized as either global or local. While global techniques rely on analysis of a whole collection to discover word relationships, local techniques emphasize analysis of the top-ranked documents retrieved for a query. While local techniques have shown to be more effective that global techniques in general, existing local techniques are not robust and can seriously hurt retrieved when few of the retrieval documents are relevant. We propose a new technique, called local context analysis, which selects expansion terms based on co-occurrence with the query terms within the top-ranked documents. Experiments on a number of collections, both English and non-English, show that local context analysis offers more effective and consistent retrieval results.
A study of methods for negative relevance feedback.
    Negative relevance feedback is a special case of relevance feedback where we do not have any positive example; this often happens when the topic is difficult and the search results are poor. Although in principle any standard relevance feedback technique can be applied to negative relevance feedback, it may not perform well due to the lack of positive examples. In this paper, we conduct a systematic study of methods for negative relevance feedback. We compare a set of representative negative feedback methods, covering vector-space models and language models, as well as several special heuristics for negative feedback. Evaluating negative feedback methods requires a test set with sufficient difficult topics, but there are not many naturally difficult topics in the existing test collections. We use two sampling strategies to adapt a test collection with easy topics to evaluate negative feedback. Experiment results on several TREC collections show that language model based negative feedback methods are generally more effective than those based on vector-space models, and using multiple negative models is an effective heuristic for negative feedback. Our results also show that it is feasible to adapt test collections with easy topics for evaluating negative feedback methods through sampling.
Relevance feedback revisited  
    Researchers have found relevance feedback to be effective in interactive information retrieval, although few formal user experiments have been made. In order to run a user experiment on a large document collection, experiments were performed at NIST to complete some of the missing links found in using the probabilistic retrieval model. These experiments, using the Cranfield 1400 collection, showed the importance of query expansion in addition to query reweighting, and showed that adding as few as 20 well-selected terms could result in performance improvements of over 100%. Additionally it was shown that performing multiple iterations of feedback is highly effective.

No comments:

Post a Comment