Tuesday, March 25, 2014

Week 11: Reading Notes

Cross-Language Information Retrieval. Annual Review of Information Science and Technology

    This chapter reviews research and practice in cross-language information retrieval (CLIR), which seeks to support the process of finding documents written in one natural language (e.g., English or Portuguese) with automated systems that can accept queries expressed in other languages. With the globalization of the economy and the continued internationalization of the Internet, CLIR is becoming an increasingly important capability that facilitates the effective exchange of information. For retrospective retrieval, CLIR allows users to state questions in their native language and then retrieve documents in any supported language. This can simplify searching by multilingual users and, if translation resources are limited, can allow searchers to allocate those resources to the most promising documents. In selective dissemination applications, CLIR allows monolingual users to specify a profile using words from one language and then use that profile to identify promising documents in many languages. Adaptive filtering systems that seek to learn profiles automatically can use CLIR to process training documents that may not be in the same language as the documents that later must be selected.

Cross-Language Information Retrieval. In Ayse Goker, John Davies, Margaret Graham (eds)

    CLIR is available in most major search engines, and it makes it much more convenient to retrieve documents that contain multiple languages. The most common approach is to translate the query into a single language when it contains terms from several languages. I am wondering how the priority language is chosen. For example, if a query contains Japanese and English, should the system translate the Japanese into English, or the English into Japanese?
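One common technique behind the query translation described above is dictionary-based translation: each query term is looked up in a bilingual lexicon and replaced by a target-language equivalent, while untranslatable terms (names, codes) pass through unchanged. This is a minimal sketch under that assumption; the tiny dictionary and romanized Japanese entries are invented for illustration, and real systems use large lexicons or machine translation.

```python
# A toy bilingual dictionary (romanized Japanese -> English).
# These entries are hypothetical, purely for illustration.
BILINGUAL_DICT = {
    "inu": "dog",
    "neko": "cat",
}

def translate_query(terms, lexicon):
    """Translate each term into the target language when a dictionary
    entry exists; leave untranslatable terms unchanged."""
    return [lexicon.get(term, term) for term in terms]

print(translate_query(["inu", "search"], BILINGUAL_DICT))  # ['dog', 'search']
```

The pass-through for unknown terms matters in practice: proper nouns and technical terms often match across languages without translation.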
    Moreover, in non-CLIR retrieval we use Boolean queries: we index the terms and find the matching postings. Why can we not simply build a separate index for each language and, to fulfill a query, retrieve the documents that contain all (or some) of the query terms in each language? Finally, I am wondering about the market share of CLIR within the US IR market. I think most cross-language queries combine English with one other language, so it may be enough to focus on combining English with other languages.
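The per-language indexing idea in the paragraph above can in fact be sketched directly: keep one inverted index per language and answer a mixed-language Boolean AND query by intersecting posting lists drawn from each language's index. The documents and language tags below are invented for illustration.

```python
from collections import defaultdict

# Toy collection: each document holds terms tagged by language.
docs = {
    1: {"en": ["search", "engine"], "ja": ["kensaku"]},
    2: {"en": ["dog"], "ja": ["kensaku", "inu"]},
}

# Build one inverted index per language: lang -> term -> set of doc ids.
indexes = defaultdict(lambda: defaultdict(set))
for doc_id, fields in docs.items():
    for lang, terms in fields.items():
        for term in terms:
            indexes[lang][term].add(doc_id)

def boolean_and(query):
    """Answer a conjunctive query of (language, term) pairs by
    intersecting the posting lists from each language's index."""
    postings = [indexes[lang][term] for lang, term in query]
    return set.intersection(*postings) if postings else set()

print(boolean_and([("en", "search"), ("ja", "kensaku")]))  # {1}
```

This avoids translation entirely, which is the appeal of the question; the cost is that the user (or the system) must supply query terms in every language of interest, which is exactly what query translation automates.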
    Since the beginning of this semester, I have been wondering about the techniques for retrieving multimedia materials. After reading the materials, I have some basic ideas. Metadata plays a large role in multimedia search, since compared to the multimedia objects themselves, text is simple to handle. At the same time, I believe image recognition is very helpful for matching query text against the materials in the retrieval library. But since indexing multimedia is not as simple as indexing text, I am wondering how the index is ordered if it contains pictures or videos.
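The metadata-based approach mentioned above can be sketched simply: non-text items such as images are indexed by their attached text (titles, tags), so an ordinary text query can match them without any image analysis. The filenames and metadata below are invented for illustration.

```python
# Toy image collection indexed only by textual metadata.
images = {
    "img1.jpg": {"title": "sunset over lake", "tags": ["sunset", "lake"]},
    "img2.jpg": {"title": "city skyline", "tags": ["city", "night"]},
}

def metadata_search(query_terms):
    """Return image names whose title or tags contain any query term."""
    results = []
    for name, meta in images.items():
        text = set(meta["title"].split()) | set(meta["tags"])
        if any(term in text for term in query_terms):
            results.append(name)
    return results

print(metadata_search(["sunset"]))  # ['img1.jpg']
```

Because matching happens entirely in the text domain, the index can be ordered like any text index; content-based features (from image recognition) would be an additional layer on top of this.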

IES Chapter 14: Parallel Information Retrieval
   
    Information retrieval systems often have to deal with very large amounts of data. They must be able to process many gigabytes or even terabytes of text, and to build and maintain an index for millions of documents. To some extent the techniques discussed in Chapters 5–8 can help us satisfy these requirements, but it is clear that, at some point, sophisticated data structures and clever optimizations alone are not sufficient anymore. A single computer simply does not have the computational power or the storage capabilities required for indexing even a small fraction of the World Wide Web.
    In this chapter we examine various ways of making information retrieval systems scale to very large text collections such as the Web. The first part (Section 14.1) is concerned with parallel query processing, where the search engine’s service rate is increased by having multiple index servers process incoming queries in parallel. It also discusses redundancy and fault tolerance issues in distributed search engines. In the second part (Section 14.2), we shift our attention to the parallel execution of off-line tasks, such as index construction and statistical analysis of a corpus of text. We explain the basics of MapReduce, a framework designed for massively parallel computations carried out on large amounts of data.
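The MapReduce pattern mentioned above can be illustrated with its classic use in index construction: counting term occurrences across documents. This is a single-process sketch of the map–shuffle–reduce structure only; a real framework such as Hadoop would distribute the map and reduce tasks across many machines.

```python
from itertools import groupby
from operator import itemgetter

def map_phase(doc_id, text):
    """Map: emit a (term, 1) pair for every term occurrence."""
    for term in text.lower().split():
        yield term, 1

def reduce_phase(term, counts):
    """Reduce: sum the counts emitted for one term."""
    return term, sum(counts)

docs = {1: "to be or not to be", 2: "to search is to find"}

# Map every document, then shuffle (sort/group by term), then reduce.
pairs = [kv for doc_id, text in docs.items() for kv in map_phase(doc_id, text)]
pairs.sort(key=itemgetter(0))
counts = dict(reduce_phase(term, (c for _, c in group))
              for term, group in groupby(pairs, key=itemgetter(0)))
print(counts["to"])  # 4
```

The sort-and-group step stands in for the framework's shuffle phase, which routes all pairs with the same key to the same reduce task; that routing is what lets the reduce work proceed in parallel per term.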
