r.ventaglio at gmail
Jul 3, 2012, 12:22 PM
How to extract only highlight spans?
Is it possible to use the Lucene Highlighter classes to extract highlight
spans instead of getting the "highlighted" string?
I am using Lucene 3.0.3 (and I cannot upgrade for now).
I have the following snippet of code:
QueryScorer scorer = new QueryScorer(highlightQuery); // already rewritten
Highlighter highlighter = new Highlighter(formatter, scorer);
highlighter.setTextFragmenter(fragmenter); // a NullFragmenter
String bestFragments = highlighter.getBestFragments(tokenStream,
textToHighlight, maxNumFragments, fragmentsSeparator);
This returns the highlighted text (with HTML spans in it).
Instead, I would like to be able to get only a list of "spans" (e.g. <4,10>
<15,27> ...) that correspond to text positions (same positions read by
tokenStream) to highlight.
I need them because I have to merge the Lucene query highlights with some custom
highlight info (already expressed as start/end spans), and it is very
difficult to merge the two if Lucene gives me only the highlighted string.
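To make the merge step concrete, here is a rough sketch of what I would do once I have both sets of spans. SpanMerger, Span and merge are hypothetical names of my own, not Lucene classes, and I am assuming half-open [start, end) character offsets and that overlapping or touching spans should be combined into one.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.Comparator;
import java.util.List;

/** Hypothetical helper (not part of Lucene): merges character-offset spans. */
final class SpanMerger {

    /** Assumed half-open character range [start, end) in the original text. */
    static final class Span {
        final int start;
        final int end;

        Span(int start, int end) {
            this.start = start;
            this.end = end;
        }

        @Override
        public String toString() {
            return "<" + start + "," + end + ">";
        }
    }

    /** Returns the union of both lists, sorted by start, with overlapping or touching spans combined. */
    static List<Span> merge(List<Span> queryHits, List<Span> customHits) {
        List<Span> all = new ArrayList<Span>(queryHits);
        all.addAll(customHits);

        // Sort by start offset, then by end offset.
        Collections.sort(all, new Comparator<Span>() {
            public int compare(Span a, Span b) {
                return a.start != b.start
                        ? Integer.compare(a.start, b.start)
                        : Integer.compare(a.end, b.end);
            }
        });

        List<Span> merged = new ArrayList<Span>();
        for (Span s : all) {
            int lastIndex = merged.size() - 1;
            if (lastIndex >= 0 && s.start <= merged.get(lastIndex).end) {
                // Overlaps or touches the previous span: extend it.
                Span last = merged.remove(lastIndex);
                merged.add(new Span(last.start, Math.max(last.end, s.end)));
            } else {
                merged.add(s);
            }
        }
        return merged;
    }
}
```

For example, merging the Lucene spans <4,10> <15,27> with the custom spans <8,12> <30,35> would give <4,12> <15,27> <30,35>. What I am missing is the first list.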
Is there a way to extract this information using only the user query, the
text to highlight and the token stream of the search field?
Thank you in advance.