This chapter is mainly about evaluation in information retrieval. Information retrieval has developed as a highly empirical discipline, requiring careful and thorough evaluation to demonstrate the superior performance of novel techniques on representative document collections.
The author begins with a discussion of measuring the effectiveness of IR systems and the test collections that are most often used for this purpose. He then presents the straightforward notion of relevant and nonrelevant documents and the formal evaluation methodology that has been developed for evaluating unranked retrieval results. He then extends these notions and develops further measures for evaluating ranked retrieval results. Next he steps back to introduce the notion of user utility and how it is approximated by the use of document relevance. The author also points out a common misunderstanding: user perceptions do not always coincide with system designers’ notions of quality.
First, the author introduces the concept of a test collection, which consists of three parts: a document collection, a set of information needs expressible as queries, and relevance judgments for each query-document pair. He also stresses that relevance is assessed relative to an information need, not a query. In the next section, the author lists some standard test collections, including the Cranfield collection, Text Retrieval Conference (TREC), NII Test Collections for IR Systems (NTCIR), GOV2, CLEF, etc.
In the next section, the author introduces the contingency table and draws one to illustrate it, classifying each document as a true positive (tp), false positive (fp), false negative (fn), or true negative (tn).
He then gives the equation: accuracy = (tp + tn) / (tp + fp + fn + tn). However, the author says this measure is not very useful for IR. Because relevant documents are usually a tiny fraction of the collection, a system can score a high accuracy simply by judging every document nonrelevant. He instead recommends using both precision and recall; the advantage of having two numbers is that one is often more important than the other, depending on the circumstances. In the final analysis, the success of an IR system depends on how good it is at satisfying the needs of these idiosyncratic humans.
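To make the point about accuracy concrete, here is a minimal Python sketch that computes accuracy, precision, and recall from the four contingency-table counts. The function name `contingency_metrics` and the collection sizes are hypothetical, chosen only for illustration; they do not come from the chapter.

```python
def contingency_metrics(tp, fp, fn, tn):
    """Compute accuracy, precision and recall from contingency-table counts."""
    total = tp + fp + fn + tn
    accuracy = (tp + tn) / total if total else 0.0
    # Precision: fraction of retrieved documents that are relevant.
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    # Recall: fraction of relevant documents that are retrieved.
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    return {"accuracy": accuracy, "precision": precision, "recall": recall}


# Hypothetical collection: 10,000 documents, only 10 of them relevant.
# A "lazy" system retrieves nothing at all.
lazy = contingency_metrics(tp=0, fp=0, fn=10, tn=9990)

# A more useful system retrieves 20 documents, 8 of them relevant.
useful = contingency_metrics(tp=8, fp=12, fn=2, tn=9978)

print(lazy)
print(useful)
```

Under these assumed counts the lazy system gets the higher accuracy even though it returns nothing, while precision and recall expose the difference between the two systems.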
In the last few sections, the author takes a broader perspective on evaluation, covering user utility, refining a deployed system, and system issues.
What's the value of TREC: is there a gap to jump or a chasm to bridge?
The TREC Programme has been very successful at generalising. It has shown that essentially simple methods of retrieving documents can give decent baseline performance across a range of datasets. The TREC Programme has also sought to address variation, but it has done this in a largely ad hoc and unsystematic way.
The author's case is based on the notion of micro variation and on the distinction between system environment and task context. The author uses evaluation framework ideas to analyse the TREC experimental programme and to support the argument for a new direction for TREC.
A convenient way of summarising the Cranfield evaluation paradigm is in terms of environment variables and system parameters.
In general, TREC participants have sought to adapt, or extend, their existing system apparatus to the new environment variable values. The foregoing is only an informal discussion: a more thorough analysis of retrieval contexts is needed for a factor characterisation to be taken seriously as a basis for system development.
Cumulated gain-based evaluation of IR techniques (ACM Transactions on Information Systems)
Modern large retrieval environments tend to overwhelm their users with their large output. Since not all documents are equally relevant, highly relevant documents should be identified and ranked first. In order to develop IR techniques in this direction, it is necessary to develop evaluation approaches and methods that credit IR methods for their ability to retrieve highly relevant documents.
Graded relevance judgments can be used for IR evaluation by extending traditional evaluation measures, such as recall, precision, and P–R curves, to take the grades into account.
The author demonstrates the use of the proposed measures in a case study testing runs from the TREC-7 ad hoc track with binary and nonbinary relevance judgments.
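As a rough illustration of the cumulated gain idea named in the title, the sketch below computes cumulated gain (CG) and discounted cumulated gain (DCG) over a ranked list of graded relevance scores. It assumes the commonly cited formulation in which the gain at rank i is divided by log_b(i) once i reaches the base b. The function names, the base b = 2, and the gain vector (a 0 to 3 relevance scale) are illustrative assumptions, not data from the TREC-7 case study.

```python
import math


def cumulated_gain(gains):
    """Running sum of graded relevance scores down the ranked list."""
    cg, total = [], 0
    for g in gains:
        total += g
        cg.append(total)
    return cg


def discounted_cumulated_gain(gains, b=2):
    """Like cumulated gain, but gains at rank >= b are divided by log_b(rank)."""
    dcg, total = [], 0.0
    for rank, g in enumerate(gains, start=1):
        # Ranks below the log base are not discounted (assumed formulation).
        total += g if rank < b else g / math.log(rank, b)
        dcg.append(total)
    return dcg


# Illustrative graded judgments for the top 10 results (0 = nonrelevant, 3 = highly relevant).
gains = [3, 2, 3, 0, 0, 1, 2, 2, 3, 0]

cg = cumulated_gain(gains)
dcg = discounted_cumulated_gain(gains, b=2)

print(cg)
print([round(x, 2) for x in dcg])
```

The discount means a highly relevant document contributes less the further down the ranking it appears, so a run that places highly relevant documents early scores better than one that returns the same documents late.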
In modern large database environments, the development and evaluation of IR methods should be based on their ability to retrieve highly relevant documents. This is often desirable from the user viewpoint and presents a not too liberal test for IR techniques.