Zheng Gao's Wonderland: Week 1: Reading Notes

FOA 1.1:

FOA is a cognitive activity. It is concerned with meaning rather than the exact words. In the article, the author divides the process of FOA into three steps and uses some pictures to introduce the meaning of each step vividly.

Specifically, the three phases are: 1. asking a question; 2. constructing an answer; 3. assessing the answer. First, people raise a question, which is turned into a search query and passed to the search engine. Then the search engine looks for related answers in a huge document corpus (on the web, one gathered by a web crawler). In the last step, people give relevance feedback to show which parts are useful, which are irrelevant and which are neutral.
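To make the three phases concrete for myself, here is a minimal Python sketch. It is only my own illustration, not anything taken from the FOA text: the toy corpus, the word-overlap matching, and the function names `retrieve` and `assess` are all assumptions.

```python
def retrieve(corpus, query):
    """Phase 2 (constructing an answer): ids of documents that share
    at least one word with the query. Word overlap is an assumed,
    very crude stand-in for a real search engine."""
    query_terms = set(query.lower().split())
    return [doc_id for doc_id, text in corpus.items()
            if query_terms & set(text.lower().split())]


def assess(retrieved, judgments):
    """Phase 3 (assessing the answer): group retrieved documents by the
    user's feedback label. Documents without a judgment count as neutral."""
    groups = {"useful": [], "irrelevant": [], "neutral": []}
    for doc_id in retrieved:
        groups[judgments.get(doc_id, "neutral")].append(doc_id)
    return groups


# Hypothetical toy corpus, for illustration only.
corpus = {
    "d1": "search engines rank documents",
    "d2": "steam engines and railways",
    "d3": "gardening tips",
}

query = "search engines"                      # Phase 1: asking a question
retrieved = retrieve(corpus, query)           # Phase 2
feedback = assess(retrieved, {"d1": "useful", "d2": "irrelevant"})  # Phase 3
print(retrieved)
print(feedback)
```

In a real system the feedback from phase 3 would presumably be used to refine the next query, which this sketch does not attempt.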

IES 1.1-1.2:

The article gives a brief introduction to information retrieval. It introduces different kinds of search engines (web search, desktop and file system search, enterprise-level IR systems, digital libraries and other specialized IR systems) and also shows us the components of an IR system.

Later in the chapter, the author discusses ranking algorithms and names two important principles for measuring an IR system: efficiency and effectiveness. Only when the search is both effective and efficient can we get what we want in the shortest time. The author then introduces a principle called the PRP (Probability Ranking Principle): if the search results are ranked in order of decreasing probability of relevance, effectiveness is maximized. Using some examples and lists of related words, the author shows us how to search in an efficient way. Moreover, the author uses a short paragraph to explain the concept of a "document" and how documents are updated. I think this concept will be more useful in later chapters.
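The PRP is easy to sketch in Python if we assume each document already comes with an estimated probability of relevance. The function name `rank_by_relevance` and the probabilities below are made up for illustration. How to estimate those probabilities is the hard part, and it is not shown here.

```python
def rank_by_relevance(scored_docs):
    """Order (doc_id, probability) pairs by decreasing estimated
    probability of relevance, as the PRP suggests."""
    return sorted(scored_docs, key=lambda pair: pair[1], reverse=True)


# Hypothetical probability estimates, for illustration only.
candidates = [("doc_a", 0.20), ("doc_b", 0.90), ("doc_c", 0.55)]

ranking = rank_by_relevance(candidates)
print([doc_id for doc_id, _ in ranking])  # ['doc_b', 'doc_c', 'doc_a']
```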

As for me, the part of this chapter I am most interested in is "web search". I think it is quite amazing how web search works. The author says that search engines store a "snapshot" of the web in order to produce accurate results and minimize response time. To keep these snapshots up to date, they use a web crawler to download the updates periodically.

MIR 1.1-1.4:

In the first chapter of this book, the author mostly introduces some basic information about information retrieval. He first explains the concept of information retrieval, which is a way to help people get easy access to information that interests them. He then describes the early development of IR. At first, IR technology was only used in libraries. With the introduction of the World Wide Web, IR finally took a place at the center of the stage.

The author also lists some problems of IR. If people type too many words into a search, it may confuse the search engine. The search engine then cannot pick out the relevant keywords, so people do not get the answers they want. Moreover, the author distinguishes information from data. Information allows small differences, but data needs to be totally accurate.

In the section on the IR system, the author describes the inner construction of an IR system, that is, its software architecture. The author uses two pictures to illustrate the different levels of an IR system and how the index is generated. After the query passes through the different layers of processing, we get the top-ranked retrieval answers.
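To see what "generating the index" could look like, I wrote a minimal inverted index in Python. This is my own sketch and not the architecture from the book: the tokenization is just lowercasing and whitespace splitting, the names `build_inverted_index` and `search` are assumptions, and the search only does Boolean AND matching with no ranking step.

```python
from collections import defaultdict


def build_inverted_index(docs):
    """Map each term to the set of ids of documents that contain it."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for term in text.lower().split():
            index[term].add(doc_id)
    return index


def search(index, query):
    """Return the sorted ids of documents containing every query term."""
    terms = query.lower().split()
    if not terms:
        return []
    # Copy the first posting set so the index itself is never modified.
    result = set(index.get(terms[0], set()))
    for term in terms[1:]:
        result &= index.get(term, set())
    return sorted(result)


# Hypothetical toy collection, for illustration only.
docs = {
    1: "web search engines",
    2: "desktop search",
    3: "digital libraries",
}

index = build_inverted_index(docs)
print(search(index, "search"))      # [1, 2]
print(search(index, "web search"))  # [1]
```

A real IR system would add a ranking layer on top of the matching documents to produce the top-ranked answers.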

Finally, the author introduces the concept of the "web". He uses the example of Jane Austen to illustrate the importance of the web, which lets people freely publish their ideas and works. At the end, the author lists five impacts of the web that change search. These impacts are two sides of the same coin: they give search engines the chance to develop, but they also bring some negative results and disturb people's normal lives. On the whole, however, they help search to prosper.

Wednesday, January 8, 2014

Zheng Gao