Thursday, January 30, 2014

Week 4: Reading Notes

IIR Sections 1.3 and 1.4:
    In these sections, the authors show how to process queries with an inverted index under the basic Boolean retrieval model. They explain how to intersect the postings lists of two terms to answer an AND query.
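    To make the intersection idea concrete for myself, here is a minimal sketch of merging two sorted postings lists. The helper names (build_index, intersect) and the toy documents are my own assumptions for illustration, not code from the book.

```python
def build_index(docs):
    """Build a toy inverted index: term -> sorted list of document IDs.

    `docs` is a hypothetical dict mapping doc ID -> text.
    """
    index = {}
    for doc_id in sorted(docs):
        # set() so each document appears at most once per term
        for term in set(docs[doc_id].lower().split()):
            index.setdefault(term, []).append(doc_id)
    return index


def intersect(p1, p2):
    """Intersect two sorted postings lists by walking both in step."""
    answer = []
    i, j = 0, 0
    while i < len(p1) and j < len(p2):
        if p1[i] == p2[j]:
            answer.append(p1[i])
            i += 1
            j += 1
        elif p1[i] < p2[j]:
            i += 1  # advance the pointer at the smaller doc ID
        else:
            j += 1
    return answer


docs = {
    1: "brutus killed caesar",
    2: "caesar praised calpurnia",
    3: "brutus and calpurnia spoke",
}
index = build_index(docs)
# Documents matching the query: brutus AND calpurnia
result = intersect(index["brutus"], index["calpurnia"])
```

    Because both lists are sorted by document ID, each pointer only moves forward, so the work is proportional to the combined length of the two lists.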
    Strict Boolean operators alone often cannot satisfy an information need, so a proximity operator is introduced. It is a way of specifying that two terms in a query must occur close to each other in a document. The authors then use Westlaw as an example to show that Boolean searching is still a good fit for a commercial service. Boolean queries are precise: a document either matches the query or it does not, which gives the searcher more control over what is retrieved.
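    A rough sketch of what a proximity check could look like, assuming whitespace tokenization and a hypothetical function within_k that tests whether two terms occur within k words of each other. A real system would use a positional index rather than rescanning the text.

```python
def within_k(text, term_a, term_b, k):
    """Return True if term_a and term_b occur within k words of each other.

    Hypothetical helper for illustration; tokenization is a plain
    lowercase whitespace split.
    """
    tokens = text.lower().split()
    pos_a = [i for i, t in enumerate(tokens) if t == term_a]
    pos_b = [i for i, t in enumerate(tokens) if t == term_b]
    # Compare every pair of positions; fine for short documents
    return any(abs(a - b) <= k for a in pos_a for b in pos_b)


doc = "the employee was denied disability benefits after the hearing"
adjacent = within_k(doc, "disability", "benefits", 1)
too_far = within_k(doc, "employee", "benefits", 3)
close_enough = within_k(doc, "employee", "benefits", 5)
```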
IIR Chapter 6:
    This chapter is about scoring, term weighting, and the vector space model. The authors first discuss parametric and zone indexes. They then develop the idea of weighting the importance of a term in a document. They also cover vector space scoring, including how to view a query as a vector in the same space as the documents. Finally, they develop several variants of term weighting.
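    As a small illustration of vector space scoring, the sketch below treats a query and a document as term-count vectors and scores them by cosine similarity. Using raw term counts as weights and the function name cosine_similarity are my own simplifying assumptions; the chapter discusses better weighting schemes.

```python
import math
from collections import Counter


def cosine_similarity(text_a, text_b):
    """Cosine of the angle between two term-count vectors."""
    va = Counter(text_a.lower().split())
    vb = Counter(text_b.lower().split())
    # Counter returns 0 for missing terms, so unshared terms add nothing
    dot = sum(va[t] * vb[t] for t in va)
    norm_a = math.sqrt(sum(c * c for c in va.values()))
    norm_b = math.sqrt(sum(c * c for c in vb.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # an empty text has no direction to compare
    return dot / (norm_a * norm_b)


query = "inverted index"
docs = [
    "the inverted index maps terms to documents",
    "boolean retrieval uses an inverted index",
    "weather report for thursday",
]
# Rank documents by similarity to the query, highest first
ranked = sorted(docs, key=lambda d: cosine_similarity(query, d), reverse=True)
```

    Dividing by the vector lengths means a long document is not favored just for containing more words.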
    Digital documents generally encode, in machine-recognizable form, certain metadata associated with each document. Zones are similar to fields, except the contents of a zone can be arbitrary free text. The authors also introduce a concept named weighted zone scoring, and they describe several ways of weighting terms, for example tf-idf weighting. It is quite complex.
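    To untangle the two weighting ideas, here is a minimal sketch under my own assumptions. weighted_zone_score adds up the weights of the zones that contain the query term, and tf_idf multiplies a term's count in a document by log10(N / df), where N is the number of documents and df is the number of documents containing the term. The function names, zone weights, and toy texts are hypothetical.

```python
import math
from collections import Counter


def weighted_zone_score(query_term, zones, zone_weights):
    """Sum the weights of the zones whose text contains query_term.

    `zones` maps zone name -> text; `zone_weights` maps zone name -> weight
    (assumed to sum to 1).
    """
    return sum(
        weight
        for zone, weight in zone_weights.items()
        if query_term in zones.get(zone, "").lower().split()
    )


def tf_idf(docs):
    """Return one dict per document mapping term -> tf * log10(N / df)."""
    n = len(docs)
    tokenized = [d.lower().split() for d in docs]
    df = Counter()
    for tokens in tokenized:
        df.update(set(tokens))  # count each term once per document
    weights = []
    for tokens in tokenized:
        tf = Counter(tokens)
        weights.append({t: tf[t] * math.log10(n / df[t]) for t in tf})
    return weights


zones = {"title": "boolean retrieval", "body": "an inverted index supports boolean queries"}
zone_weights = {"title": 0.3, "body": 0.7}
score_both = weighted_zone_score("boolean", zones, zone_weights)
score_body = weighted_zone_score("index", zones, zone_weights)

weights = tf_idf(["car insurance auto insurance", "car repair", "best auto"])
```

    A term that appears in every document gets idf log10(1) = 0, so it contributes nothing to the score, while a term that is rare in the collection but frequent in one document gets a high weight.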
