
Mailing List Archive: Lucene: Java-Dev

[jira] Closed: (LUCENE-1320) ShingleMatrixFilter, a three dimensional permutating shingle filter

 

 



jira at apache

Jul 2, 2008, 4:56 PM


[ https://issues.apache.org/jira/browse/LUCENE-1320?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Karl Wettin closed LUCENE-1320.
-------------------------------

Resolution: Fixed
Fix Version/s: 2.4
Lucene Fields: [New, Patch Available] (was: [Patch Available, New])

Committed

Thanks again for the input, Steve!

> ShingleMatrixFilter, a three dimensional permutating shingle filter
> -------------------------------------------------------------------
>
> Key: LUCENE-1320
> URL: https://issues.apache.org/jira/browse/LUCENE-1320
> Project: Lucene - Java
> Issue Type: New Feature
> Components: contrib/analyzers
> Affects Versions: 2.3.2
> Reporter: Karl Wettin
> Assignee: Karl Wettin
> Fix For: 2.4
>
> Attachments: LUCENE-1320.txt, LUCENE-1320.txt, LUCENE-1320.txt
>
>
> Backed by a column-focused matrix that creates all permutations of shingle tokens in three dimensions, i.e. it handles multi-token synonyms.
> In some cases it could, for instance, be used to replace 0-slop phrase queries with something speedier.
> {code:java}
> Token[][][]{
> {{hello}, {greetings, and, salutations}},
> {{world}, {earth}, {tellus}}
> }
> {code}
> With shingle sizes of 2 to 3 tokens, this matrix passes the following test:
> {code:java}
> assertNext(ts, "hello_world");
> assertNext(ts, "greetings_and");
> assertNext(ts, "greetings_and_salutations");
> assertNext(ts, "and_salutations");
> assertNext(ts, "and_salutations_world");
> assertNext(ts, "salutations_world");
> assertNext(ts, "hello_earth");
> assertNext(ts, "and_salutations_earth");
> assertNext(ts, "salutations_earth");
> assertNext(ts, "hello_tellus");
> assertNext(ts, "and_salutations_tellus");
> assertNext(ts, "salutations_tellus");
> {code}
> Contains tests of varying complexity that demonstrate offsets, position increments, payload boost calculation and construction of a matrix from a token stream.
> The matrix attempts to use as little memory as possible by seeking no more than maximumShingleSize columns forward in the stream and by releasing unused resources (columns and unique token sets). It can still be optimized quite a bit, though.
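
The permutation order asserted in the quoted test can be reproduced with a small standalone sketch. This is not the committed ShingleMatrixFilter and does not use Lucene's Token or TokenStream classes; the class name `ShingleMatrixSketch`, the method `shingles`, the plain `String[][][]` matrix and the rule of skipping shingles that were already emitted are all assumptions made for illustration. It picks one row per column, flattens the chosen rows into one token sequence, and emits every 2-3 gram of that sequence, with the first column advancing fastest.

```java
import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

// Illustrative sketch only: hypothetical names, plain strings instead of Lucene tokens.
final class ShingleMatrixSketch {

    /**
     * matrix[column][row][token]: every column holds alternative rows (synonyms),
     * and every row holds one or more tokens.
     * Returns the distinct shingles of minSize..maxSize tokens in first-seen order.
     */
    static List<String> shingles(String[][][] matrix, int minSize, int maxSize, String sep) {
        Set<String> seen = new LinkedHashSet<>(); // keeps order, drops repeats (assumed behavior)
        for (String[][] column : matrix) {
            if (column.length == 0) {
                return new ArrayList<>(); // a column without rows allows no permutation
            }
        }
        int[] rowIndex = new int[matrix.length]; // odometer: column 0 advances fastest
        while (true) {
            // Flatten the currently selected row of each column into one token sequence.
            List<String> tokens = new ArrayList<>();
            for (int c = 0; c < matrix.length; c++) {
                for (String token : matrix[c][rowIndex[c]]) {
                    tokens.add(token);
                }
            }
            // Emit n-grams by start position, shortest first.
            for (int start = 0; start < tokens.size(); start++) {
                for (int n = minSize; n <= maxSize && start + n <= tokens.size(); n++) {
                    seen.add(String.join(sep, tokens.subList(start, start + n)));
                }
            }
            // Advance to the next permutation, carrying into the next column on overflow.
            int c = 0;
            while (c < matrix.length && ++rowIndex[c] == matrix[c].length) {
                rowIndex[c] = 0;
                c++;
            }
            if (c == matrix.length) {
                break; // every permutation has been visited
            }
        }
        return new ArrayList<>(seen);
    }

    public static void main(String[] args) {
        String[][][] matrix = {
            {{"hello"}, {"greetings", "and", "salutations"}},
            {{"world"}, {"earth"}, {"tellus"}}
        };
        for (String shingle : shingles(matrix, 2, 3, "_")) {
            System.out.println(shingle);
        }
    }
}
```

Unlike the filter described in the issue, this sketch materializes the whole matrix up front and keeps every emitted shingle in memory, so it shows only the enumeration order, not the bounded look-ahead of maximumShingleSize columns.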

--
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


