Monday, November 9, 2015

Lucene tips

1. how to extract the unique terms from a query
e.g. "new new new york" --> "new" and "york" (each term is kept once; the exact term text depends on the analyzer)

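// parser and searcher are set up as in tip 5
String queryString = "New New New York";
Query query = parser.parse(queryString);
// LinkedHashSet drops duplicate terms and keeps them in first-seen order
Set<Term> queryTerms = new LinkedHashSet<Term>();
// needsScores = false: only the terms are needed here, not scoring (Lucene 5.x API)
searcher.createNormalizedWeight(query, false).extractTerms(queryTerms);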
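
To see why the repeated "new" collapses into a single entry, here is a small plain-Java sketch with no Lucene dependency. It assumes a very simplified analysis step (lowercase, then split on whitespace), which is only a stand-in for a real analyzer. The names UniqueTermsSketch and uniqueTerms are made up for this illustration.

```java
import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Locale;
import java.util.Set;

public class UniqueTermsSketch {

    // Hypothetical helper: lowercases the query, splits it on whitespace,
    // and keeps each token once in first-seen order.
    // This is NOT Lucene's analysis chain, only an approximation of its effect.
    static List<String> uniqueTerms(String queryString) {
        Set<String> terms = new LinkedHashSet<>();
        for (String token : queryString.trim().toLowerCase(Locale.ROOT).split("\\s+")) {
            if (!token.isEmpty()) {
                terms.add(token); // a Set silently ignores duplicates
            }
        }
        return new ArrayList<>(terms);
    }

    public static void main(String[] args) {
        System.out.println(uniqueTerms("New New New York")); // prints [new, york]
    }
}
```

The LinkedHashSet does the same job as in the Lucene snippet: it removes duplicates while keeping the order in which the terms first appeared.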

2. scan input
Scanner input = new Scanner(System.in);
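
A possible way to use the Scanner for an interactive search prompt is sketched below. It reads one query per line and stops at the first empty line or at end of input. The class QueryPrompt and the method readQueries are hypothetical names, and the "empty line stops" rule is just an assumption for this example.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Scanner;

public class QueryPrompt {

    // Hypothetical helper: collects non-empty lines until an empty line
    // or the end of the input is reached.
    static List<String> readQueries(Scanner input) {
        List<String> queries = new ArrayList<>();
        while (input.hasNextLine()) {
            String line = input.nextLine().trim();
            if (line.isEmpty()) {
                break; // assumed stop condition for this sketch
            }
            queries.add(line);
        }
        return queries;
    }

    public static void main(String[] args) {
        // try-with-resources closes the Scanner (and System.in) at the end
        try (Scanner input = new Scanner(System.in)) {
            System.out.println("Enter one query per line (empty line to stop):");
            for (String q : readQueries(input)) {
                System.out.println("Query: " + q);
            }
        }
    }
}
```

Passing the Scanner in as a parameter keeps the reading logic easy to try out with a Scanner built from a String instead of System.in.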

3. MultiFields
MultiFields exposes the flex API, merged from the flex APIs of the sub-segments. This is useful when you're interacting with an IndexReader implementation that consists of sequential sub-readers (e.g. DirectoryReader or MultiReader).


4. get the current working directory
String curDir = System.getProperty("user.dir");
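
One thing the working directory is handy for is locating an index folder without hard-coding an absolute path. The sketch below assumes the index lives in a subfolder called "index" under the working directory; that folder name and the names IndexPathSketch and resolveIndexDir are made up for illustration.

```java
import java.nio.file.Path;
import java.nio.file.Paths;

public class IndexPathSketch {

    // Hypothetical helper: joins a base directory and a folder name
    // and cleans up any "." or ".." segments.
    static Path resolveIndexDir(String baseDir, String indexDirName) {
        return Paths.get(baseDir).resolve(indexDirName).normalize();
    }

    public static void main(String[] args) {
        String curDir = System.getProperty("user.dir");
        // "index" is an assumed folder name, not something Lucene requires
        Path indexDir = resolveIndexDir(curDir, "index");
        System.out.println("Index directory: " + indexDir);
    }
}
```

The resulting Path can be handed to FSDirectory.open(...) in the same way as the Paths.get(index) call in tip 5.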

5. set similarity function:


String queryString = "police";
String index = "/Users/chunguo/Downloads/index";
IndexReader reader = DirectoryReader.open(FSDirectory.open(Paths.get(index)));
IndexSearcher searcher = new IndexSearcher(reader);
// Set the ranking function before searching.
// BM25Similarity is only an example choice; any Similarity implementation can be passed.
searcher.setSimilarity(new BM25Similarity());
// Assumption: StandardAnalyzer; use the same analyzer that was used to build the index.
Analyzer analyzer = new StandardAnalyzer();
QueryParser parser = new QueryParser("TEXT", analyzer);
Query query = parser.parse(queryString);
System.out.println("Searching for: " + query.toString("TEXT"));
TopDocs results = searcher.search(query, 1000);
// Print number of hits
int numTotalHits = results.totalHits;
System.out.println(numTotalHits + " total matching documents");
// Print retrieved results
ScoreDoc[] hits = results.scoreDocs;
for (int i = 0; i < hits.length; i++) {
    Document doc = searcher.doc(hits[i].doc);
    System.out.println("DOCNO: " + doc.get("DOCNO"));
}





6. escape special characters
You can make Lucene treat the special query-syntax characters as plain text by sanitizing the query string before parsing it, with something like
queryString = QueryParser.escape(queryString);

If you never want your users to use the advanced query syntax, you can always apply this.
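
To get a feel for what escaping does, here is a plain-Java sketch that puts a backslash in front of each query-syntax character. It is only an approximation for illustration: the character list below is an assumption based on the documented query parser syntax, and the class EscapeSketch is a made-up name. In real code, call QueryParser.escape itself.

```java
public class EscapeSketch {

    // Assumed set of query-syntax characters; check the documentation of
    // your Lucene version for the authoritative list.
    private static final String SPECIAL = "\\+-!():^[]\"{}~*?|&/";

    // Prefixes every special character with a backslash so a query parser
    // would read it as literal text rather than as an operator.
    static String escape(String s) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < s.length(); i++) {
            char c = s.charAt(i);
            if (SPECIAL.indexOf(c) >= 0) {
                sb.append('\\');
            }
            sb.append(c);
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.println(escape("C++ (new york)")); // prints C\+\+ \(new york\)
    }
}
```

Plain words such as "police" pass through unchanged, so applying the escape to every user query is harmless when the advanced syntax is not wanted.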



When you are finished searching, close the reader:
reader.close();
