CLIR (cross-language information retrieval) is available in most major search engines, and it makes it much more convenient for people to retrieve documents written in multiple languages. The most common approach is to translate the query into a single language when it contains more than one. I am wondering how to choose the priority language. For example, if a query contains Japanese and English, should the system translate the Japanese into English, or the English into Japanese?
Moreover, in non-CLIR retrieval we use Boolean queries: we index the terms and look up the matching index entries. Why can't we simply index each language in a separate index and, when answering a query, retrieve the documents that contain all or some of the query terms in either language? Finally, I am wondering about the market share of CLIR within the IR market in the US. I think most cross-language queries combine English with another language, so we would only need to focus on combining English with other languages.
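To make the per-language index idea concrete for myself, here is a minimal sketch. It assumes a toy corpus, whitespace tokenization, and hypothetical names (`DOCS`, `build_indexes`, `boolean_search`) that I made up for illustration; it is not how any real search engine does CLIR.

```python
from collections import defaultdict

# Hypothetical toy corpus: doc id -> (language tag, text). Invented for illustration.
DOCS = {
    1: ("en", "tokyo travel guide"),
    2: ("ja", "東京 旅行 ガイド"),
    3: ("en", "kyoto travel tips"),
}

def build_indexes(docs):
    """Build one inverted index (term -> set of doc ids) per language."""
    indexes = defaultdict(lambda: defaultdict(set))
    for doc_id, (lang, text) in docs.items():
        # Whitespace tokenization is a simplification; real Japanese text
        # would need a proper word segmenter.
        for term in text.lower().split():
            indexes[lang][term].add(doc_id)
    return indexes

def boolean_search(indexes, terms, mode="or"):
    """Look up every query term in every language index, then combine.

    mode="or" returns docs matching any term; mode="and" requires all terms.
    """
    postings = []
    for term in terms:
        hits = set()
        for index in indexes.values():
            hits |= index.get(term.lower(), set())
        postings.append(hits)
    if not postings:
        return set()
    if mode == "and":
        return set.intersection(*postings)
    return set.union(*postings)

indexes = build_indexes(DOCS)
any_match = boolean_search(indexes, ["travel", "旅行"], mode="or")
all_match = boolean_search(indexes, ["travel", "旅行"], mode="and")
```

In this toy corpus, the OR query over `travel` and `旅行` finds all three documents, while the AND query finds none, because no single document contains both terms. That suggests plain term matching across separate indexes only helps when the user already supplies the term in each language, which may be one reason translation is still used.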
Since the beginning of this semester, I have been wondering about the technology for retrieving multimedia materials. After reading the materials, I got some basic ideas. Metadata plays a major role in multimedia search, since compared to multimedia, text is very simple and easy to handle. At the same time, I believe image recognition is very helpful for matching query text against the materials in the retrieval library. But an index of multimedia is not as simple as a text index, so I am wondering how the index is ordered if it contains pictures or videos.
Information retrieval systems often have to deal with very large amounts of data. They must be able to process many gigabytes or even terabytes of text, and to build and maintain an index for millions of documents. To some extent the techniques discussed in Chapters 5–8 can help us satisfy these requirements, but it is clear that, at some point, sophisticated data structures and clever optimizations alone are not sufficient anymore. A single computer simply does not have the computational power or the storage capabilities required for indexing even a small fraction of the World Wide Web.
In this chapter we examine various ways of making information retrieval systems scale to very large text collections such as the Web. The first part (Section 14.1) is concerned with parallel query processing, where the search engine’s service rate is increased by having multiple index servers process incoming queries in parallel. It also discusses redundancy and fault tolerance issues in distributed search engines. In the second part (Section 14.2), we shift our attention to the parallel execution of off-line tasks, such as index construction and statistical analysis of a corpus of text. We explain the basics of MapReduce, a framework designed for massively parallel computations carried out on large amounts of data.
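To check my own understanding of the MapReduce pattern mentioned above, here is a minimal single-process sketch that counts term occurrences in a corpus. The function names (`map_phase`, `shuffle`, `reduce_phase`, `term_counts`) and the sample documents are my own assumptions; a real framework would run the map and reduce steps on many machines and handle the grouping itself.

```python
from collections import defaultdict

def map_phase(doc_id, text):
    """Map: emit a (term, 1) pair for every token in one document."""
    for term in text.lower().split():
        yield term, 1

def shuffle(pairs):
    """Shuffle: group emitted values by key, as the framework would do
    between the map and reduce steps."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups

def reduce_phase(term, values):
    """Reduce: combine all values for one term into a single count."""
    return term, sum(values)

def term_counts(docs):
    """Run map, shuffle, and reduce sequentially over a dict of doc id -> text."""
    pairs = [
        pair
        for doc_id, text in docs.items()
        for pair in map_phase(doc_id, text)
    ]
    grouped = shuffle(pairs)
    return dict(reduce_phase(term, values) for term, values in grouped.items())

# Hypothetical sample documents, invented for illustration.
counts = term_counts({1: "web search index", 2: "web index"})
```

Because each call to `map_phase` looks at only one document and each call to `reduce_phase` looks at only one term, both steps could in principle be spread across many machines, which seems to be the point of the framework.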