
sarowe at syr
Feb 21, 2012, 6:37 AM
Post #2 of 3
RE: Can I just add ShingleFilter to my analyzer used for indexing and searching
Hi Paul,

Lucene's QueryParser splits on whitespace and then sends individual words one by one to be analyzed. All analysis components that do their work based on more than one word, including ShingleFilter and SynonymFilter, are borked by this. (There is a JIRA issue open for the QueryParser problem: <https://issues.apache.org/jira/browse/LUCENE-2605>.)

There is a workaround involving PositionFilter described on the Solr wiki: <http://wiki.apache.org/solr/AnalyzersTokenizersTokenFilters#solr.PositionFilterFactory>. Essentially, include PositionFilter after ShingleFilter in your analyzer, then wrap queries in quotes before sending them to QueryParser.

CommonGramsFilter does the emit-only-shingles-containing-stopwords thing, but in Lucene/Solr 3.x it's in Solr (solr-core-3.X.jar, to be exact), not Lucene; you can use it in your application by including the solr-core jar as a dependency. In trunk, which will be released as Lucene/Solr 4.0, CommonGramsFilter has been moved to the analyzers-common module.

Steve

> -----Original Message-----
> From: Paul Taylor [mailto:paul_t100 [at] fastmail]
> Sent: Tuesday, February 21, 2012 8:07 AM
> To: java-user [at] lucene
> Subject: Can I just add ShingleFilter to my analyzer used for indexing and searching
>
> I'm trying out ShingleFilter, and the way it is documented implies that you
> can just add it to your analyzer and that's it, with no side effects
> except a larger index, but I read others implying you have to modify the
> way you parse user queries. Could anyone confirm or deny?
>
> Also, is there an easy way to use a ShingleFilter only for common stop
> words, or is that pointless?
>
> Paul
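To make the QueryParser problem and the PositionFilter workaround concrete, here is a plain-Java simulation with no Lucene dependency. It is not Lucene code: `ShingleDemo`, `shingle`, `positionFilter`, `analyzePerWord` and `analyzeWhole` are hypothetical names, and the shingling assumes bigrams plus unigrams (maxShingleSize 2, unigrams on) joined by a single space. Analyzing word by word, as QueryParser does, can never produce a shingle; analyzing the whole quoted string can, and the PositionFilter step then stacks every token on a single position.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Plain-Java simulation, NOT the Lucene API. All names here are hypothetical;
// they mimic ShingleFilter (maxShingleSize=2, unigrams on), PositionFilter,
// and QueryParser's split-on-whitespace behaviour.
public class ShingleDemo {

    /** A term plus its position increment, as a TokenStream would report it. */
    static final class Token {
        final String term;
        final int posInc;
        Token(String term, int posInc) { this.term = term; this.posInc = posInc; }
    }

    /** Emits each unigram (posInc 1) followed by the bigram starting there (posInc 0). */
    static List<Token> shingle(List<String> words) {
        List<Token> out = new ArrayList<>();
        for (int i = 0; i < words.size(); i++) {
            out.add(new Token(words.get(i), 1));
            if (i + 1 < words.size()) {
                out.add(new Token(words.get(i) + " " + words.get(i + 1), 0));
            }
        }
        return out;
    }

    /** Like PositionFilter: every token after the first gets position increment 0. */
    static List<Token> positionFilter(List<Token> in) {
        List<Token> out = new ArrayList<>();
        for (int i = 0; i < in.size(); i++) {
            out.add(new Token(in.get(i).term, i == 0 ? in.get(i).posInc : 0));
        }
        return out;
    }

    /** QueryParser-style: split on whitespace, then analyze each word on its own. */
    static List<Token> analyzePerWord(String query) {
        List<Token> out = new ArrayList<>();
        for (String word : query.trim().split("\\s+")) {
            out.addAll(shingle(List.of(word))); // one word in, so no bigram can form
        }
        return out;
    }

    /** Quoted-query case: the whole string reaches the analyzer in one piece. */
    static List<Token> analyzeWhole(String query) {
        return shingle(Arrays.asList(query.trim().split("\\s+")));
    }

    static List<String> terms(List<Token> tokens) {
        List<String> out = new ArrayList<>();
        for (Token t : tokens) out.add(t.term);
        return out;
    }

    /** Number of distinct positions = sum of the position increments. */
    static int positionCount(List<Token> tokens) {
        int n = 0;
        for (Token t : tokens) n += t.posInc;
        return n;
    }

    public static void main(String[] args) {
        String q = "new york pizza";
        System.out.println(terms(analyzePerWord(q)));             // [new, york, pizza]
        List<Token> whole = analyzeWhole(q);
        System.out.println(terms(whole));      // [new, new york, york, york pizza, pizza]
        System.out.println(positionCount(whole));                 // 3
        System.out.println(positionCount(positionFilter(whole))); // 1
    }
}
```

In real Lucene 3.x code the corresponding chain would be ShingleFilter followed by PositionFilter in the analyzer handed to QueryParser. With every token at one position, QueryParser should build a boolean OR of the unigrams and shingles instead of a phrase query, which is the special case the Solr wiki entry linked above relies on.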
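For Paul's second question (shingles only around common stop words), this is a plain-Java sketch of the behaviour described for CommonGramsFilter: every word is kept as a unigram, and a two-word gram is added only where at least one of the pair is in the common-word set. It is not the Solr/Lucene API; `CommonGramsDemo` and `commonGrams` are hypothetical names, and the `_` separator and the stop-word list are assumptions of this sketch.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Set;

// Plain-Java simulation, NOT the Solr/Lucene API. Names are hypothetical;
// the "_" separator and the common-word set are assumptions.
public class CommonGramsDemo {

    /** Emits every unigram, plus a bigram wherever either word of the pair is common. */
    static List<String> commonGrams(List<String> words, Set<String> common) {
        List<String> out = new ArrayList<>();
        for (int i = 0; i < words.size(); i++) {
            String w = words.get(i);
            out.add(w);
            if (i + 1 < words.size()) {
                String next = words.get(i + 1);
                if (common.contains(w) || common.contains(next)) {
                    out.add(w + "_" + next); // gram only around a common word
                }
            }
        }
        return out;
    }

    public static void main(String[] args) {
        Set<String> common = Set.of("the", "of", "a");
        List<String> words = Arrays.asList("the", "quick", "brown", "fox");
        System.out.println(commonGrams(words, common)); // [the, the_quick, quick, brown, fox]
    }
}
```

Compared with plain shingling, "quick brown" and "brown fox" produce no grams here, so the index grows only for pairs involving common words.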