
Mailing List Archive: Lucene: Java-User

Can I just add ShingleFilter to my analyzer used for indexing and searching

 

 



paul_t100 at fastmail

Feb 21, 2012, 5:06 AM

Post #1 of 3
Can I just add ShingleFilter to my analyzer used for indexing and searching

I'm trying out ShingleFilter, and the way it is documented implies that
you can just add it to your analyzer and that's it, with no side effects
except a larger index. But I have read other posts implying that you have
to modify the way you parse user queries. Could anyone confirm or deny?

Also, is there an easy way to use a ShingleFilter only for common stop
words, or is that pointless?

Paul

---------------------------------------------------------------------
To unsubscribe, e-mail: java-user-unsubscribe [at] lucene
For additional commands, e-mail: java-user-help [at] lucene


sarowe at syr

Feb 21, 2012, 6:37 AM

Post #2 of 3
RE: Can I just add ShingleFilter to my analyzer used for indexing and searching

Hi Paul,

Lucene QueryParser splits on whitespace and then sends individual words one-by-one to be analyzed. All analysis components that do their work based on more than one word, including ShingleFilter and SynonymFilter, are borked by this. (There is a JIRA issue open for the QueryParser problem: <https://issues.apache.org/jira/browse/LUCENE-2605>).
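
To make the problem concrete, here is a minimal sketch in plain Java. It is not Lucene code: `ShingleSketch` and its methods are hypothetical names, and the bigram logic is only a simplified stand-in for what a shingle filter with a maximum shingle size of 2 emits. It compares analyzing the whole text as one stream with analyzing each whitespace-separated word on its own, which is what the behavior described above amounts to.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical illustration only; this does not use or reproduce Lucene classes.
final class ShingleSketch {

    // Emit each word followed by the two-word shingle that starts at it, if any.
    static List<String> shingles(List<String> words) {
        List<String> out = new ArrayList<>();
        for (int i = 0; i < words.size(); i++) {
            out.add(words.get(i));
            if (i + 1 < words.size()) {
                out.add(words.get(i) + " " + words.get(i + 1));
            }
        }
        return out;
    }

    // The whole text goes through the shingler as a single token stream.
    static List<String> analyzeWhole(String text) {
        return shingles(Arrays.asList(text.trim().split("\\s+")));
    }

    // The text is split on whitespace first and each word is analyzed alone,
    // so the shingler never sees two adjacent words.
    static List<String> analyzePerWord(String text) {
        List<String> out = new ArrayList<>();
        for (String word : text.trim().split("\\s+")) {
            out.addAll(shingles(Arrays.asList(word)));
        }
        return out;
    }

    public static void main(String[] args) {
        System.out.println(analyzeWhole("please divide this"));
        System.out.println(analyzePerWord("please divide this"));
    }
}
```

With per-word analysis no two-word shingle is ever produced, so the shingles written at indexing time have nothing to match at query time.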

There is a workaround involving PositionFilter described on the Solr wiki: <http://wiki.apache.org/solr/AnalyzersTokenizersTokenFilters#solr.PositionFilterFactory>. Essentially, include PositionFilter after ShingleFilter in your analyzer, then wrap queries in quotes before sending them to QueryParser.
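
The quoting half of that workaround might look like the following sketch. `PhraseWrapSketch.asPhrase` is a hypothetical helper, not part of Lucene, and it assumes the query parser honors backslash escapes inside a quoted phrase. Note that it turns the entire user input into a single phrase, so query syntax typed by the user, such as field names or boolean operators, would no longer be interpreted.

```java
// Hypothetical helper; not a Lucene API.
final class PhraseWrapSketch {

    // Wrap raw user input in double quotes so that the query parser hands
    // the whole string to the analyzer at once instead of word by word.
    // Assumption: backslash escapes are honored inside the quoted phrase.
    static String asPhrase(String userInput) {
        String escaped = userInput
                .replace("\\", "\\\\")   // escape backslashes first
                .replace("\"", "\\\"");  // then escape embedded quotes
        return "\"" + escaped + "\"";
    }

    public static void main(String[] args) {
        System.out.println(asPhrase("please divide this"));
    }
}
```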

CommonGramsFilter does the emit-only-shingles-containing-stopwords thing, but in Lucene/Solr 3.x, it's in Solr (solr-core-3.X.jar, to be exact), not Lucene; you can use it in your application by including the solr-core jar as a dependency. In trunk, which will be released as Lucene/Solr 4.0, CommonGramsFilter has been moved to the analyzers-common module.
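
As a rough illustration of the "only shingles containing stopwords" idea, here is a plain-Java sketch. It is not the CommonGramsFilter implementation: `CommonGramsSketch` is a hypothetical name, and the underscore separator and the output order are assumptions made for the example.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical illustration only; this does not use or reproduce Lucene/Solr classes.
final class CommonGramsSketch {

    // Emit every word, plus a bigram for each adjacent pair in which
    // at least one of the two words is in the common-word set.
    static List<String> commonGrams(List<String> words, Set<String> commonWords) {
        List<String> out = new ArrayList<>();
        for (int i = 0; i < words.size(); i++) {
            String current = words.get(i);
            out.add(current);
            if (i + 1 < words.size()) {
                String next = words.get(i + 1);
                if (commonWords.contains(current) || commonWords.contains(next)) {
                    out.add(current + "_" + next); // separator is an assumption
                }
            }
        }
        return out;
    }

    public static void main(String[] args) {
        Set<String> common = new HashSet<>(Arrays.asList("of", "the"));
        System.out.println(commonGrams(Arrays.asList("the", "quick", "brown", "fox"), common));
    }
}
```

Compared with shingling every adjacent pair, only pairs that involve a common word add extra terms, which keeps the index growth smaller.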

Steve



paul_t100 at fastmail

Feb 21, 2012, 7:12 AM

Post #3 of 3
Re: Can I just add ShingleFilter to my analyzer used for indexing and searching

Thanks Steve. As our user interface allows access to the full Lucene
query syntax, I'll hold off on this for now.

Paul
