Login | Register For Free | Help
Search for: (Advanced)

Mailing List Archive: Lucene: Java-User

Search result snippets?

 

 

Lucene java-user RSS feed   Index | Next | Previous | View Threaded


dfrankow at cs

Jan 4, 2006, 3:04 PM

Post #1 of 8 (9366 views)
Permalink
Search result snippets?

Folks,

I'm a Lucene newbie, and I've been searching awhile today to answer this
question. Googled, read Lucene FAQ, looked at Javadoc for Document and
Hits, etc.

How would you implement "snippets" with Lucene? A snippet is some text
around parts of the text that caused a match, with the search matches in
bold. For example, the supporting text in Google searches. A snippet is
NOT a document summary, which has a FAQ entry and someone answered about
on this email list.

I can store the full text in Lucene, but then when I get the match back,
it seems as if I'd have to reimplement the query matcher if I wanted to
know which things matched.

Thanks in advance for any help.

Dan


---------------------------------------------------------------------
To unsubscribe, e-mail: java-user-unsubscribe [at] lucene
For additional commands, e-mail: java-user-help [at] lucene


otis_gospodnetic at yahoo

Jan 4, 2006, 4:41 PM

Post #2 of 8 (9180 views)
Permalink
Re: Search result snippets? [In reply to]

Look for the Highlighter in contrib/ to get this effect:

http://www.lucenebook.com/search?query=highlighter+fragment

Otis

----- Original Message ----
From: Dan Frankowski <dfrankow [at] cs>
To: java-user [at] lucene
Sent: Wed Jan 4 18:04:33 2006
Subject: Search result snippets?

Folks,

I'm a Lucene newbie, and I've been searching awhile today to answer this
question. Googled, read Lucene FAQ, looked at Javadoc for Document and
Hits, etc.

How would you implement "snippets" with Lucene? A snippet is some text
around parts of the text that caused a match, with the search matches in
bold. For example, the supporting text in Google searches. A snippet is
NOT a document summary, which has a FAQ entry and someone answered about
on this email list.

I can store the full text in Lucene, but then when I get the match back,
it seems as if I'd have to reimplement the query matcher if I wanted to
know which things matched.

Thanks in advance for any help.

Dan


---------------------------------------------------------------------
To unsubscribe, e-mail: java-user-unsubscribe [at] lucene
For additional commands, e-mail: java-user-help [at] lucene





---------------------------------------------------------------------
To unsubscribe, e-mail: java-user-unsubscribe [at] lucene
For additional commands, e-mail: java-user-help [at] lucene


dfrankow at cs

Jan 4, 2006, 9:13 PM

Post #3 of 8 (9178 views)
Permalink
Re: Search result snippets? [In reply to]

Thanks for this tip! This should be in the FAQ. Is there a way to get it
in there? I can't edit the wiki, I think.

Dan

Otis Gospodnetic wrote:

>Look for the Highlighter in contrib/ to get this effect:
>
>http://www.lucenebook.com/search?query=highlighter+fragment
>
>Otis
>
>----- Original Message ----
>From: Dan Frankowski <dfrankow [at] cs>
>To: java-user [at] lucene
>Sent: Wed Jan 4 18:04:33 2006
>Subject: Search result snippets?
>
>Folks,
>
>I'm a Lucene newbie, and I've been searching awhile today to answer this
>question. Googled, read Lucene FAQ, looked at Javadoc for Document and
>Hits, etc.
>
>How would you implement "snippets" with Lucene? A snippet is some text
>around parts of the text that caused a match, with the search matches in
>bold. For example, the supporting text in Google searches. A snippet is
>NOT a document summary, which has a FAQ entry and someone answered about
>on this email list.
>
>I can store the full text in Lucene, but then when I get the match back,
>it seems as if I'd have to reimplement the query matcher if I wanted to
>know which things matched.
>
>Thanks in advance for any help.
>
>Dan
>
>
>---------------------------------------------------------------------
>To unsubscribe, e-mail: java-user-unsubscribe [at] lucene
>For additional commands, e-mail: java-user-help [at] lucene
>
>
>
>
>
>---------------------------------------------------------------------
>To unsubscribe, e-mail: java-user-unsubscribe [at] lucene
>For additional commands, e-mail: java-user-help [at] lucene
>
>


---------------------------------------------------------------------
To unsubscribe, e-mail: java-user-unsubscribe [at] lucene
For additional commands, e-mail: java-user-help [at] lucene


hossman_lucene at fucit

Jan 4, 2006, 9:29 PM

Post #4 of 8 (9189 views)
Permalink
Re: Search result snippets? [In reply to]

: Thanks for this tip! This should be in the FAQ. Is there a way to get it
: in there? I can't edit the wiki, I think.

1) People who create a Wiki account can edit the FAQ.

2) the FAQ already includes a refrence to Highlighter, I think it's the
FAQ entry you mentioned...

http://wiki.apache.org/jakarta-lucene/LuceneFAQ#head-0c15966e2ee7755f4ae639ee8eaf7de04f5c27f5

Is there a way to get a text summary of an indexed document with Lucene?

You could store the documents summary in the index and then use the
Highlighter from the sandbox.

...feel free to add to the FAQ whatever information you think would have
made the answer more useful to you when you first found it.


: >How would you implement "snippets" with Lucene? A snippet is some text
: >around parts of the text that caused a match, with the search matches in
: >bold. For example, the supporting text in Google searches. A snippet is
: >NOT a document summary, which has a FAQ entry and someone answered about
: >on this email list.


-Hoss


---------------------------------------------------------------------
To unsubscribe, e-mail: java-user-unsubscribe [at] lucene
For additional commands, e-mail: java-user-help [at] lucene


dfrankow at cs

Jan 6, 2006, 2:59 PM

Post #5 of 8 (9179 views)
Permalink
Re: Search result snippets? [In reply to]

Chris,

Thanks, I have signed up for the wiki and will put in the answer when I
fully understand it.

Otis,

So, I downloaded the Highlighter code with

svn checkout
http://svn.apache.org/repos/asf/lucene/java/trunk/contrib/highlighter/src/java/org/apache/lucene/search/highlight/

Then I tried to compile it with lucene.jar, version 1.4.3. I get
compile-time errors (attached at the bottom of this message) for missing
classes. One of them is org.apache.lucene.index.TermVectorOffsetInfo.
For example,
% jar tvf lib/lucene.jar | grep TermVectorOffset
<nothing ...>
% jar xvf ../work/jspwiki/JSPWiki/lib/lucene.jar META-INF/MANIFEST.MF
inflated: META-INF/MANIFEST.MF
dfrankow [at] gibso (~/windows/tmp) % cat META-INF/MANIFEST.MF
Manifest-Version: 1.0
Ant-Version: Apache Ant 1.6.1
Created-By: Apache Jakarta

Name: org/apache/lucene
Specification-Title: Lucene Search Engine
Specification-Version: 1.4.3
Specification-Vendor: Lucene
Implementation-Title: org.apache.lucene
Implementation-Version: build 2004-11-29 15:13:05
Implementation-Vemdpr: Lucene

I looked for TermVectorOffsetInfo in jar version 1.4.3, 1.4, 1.3, and
1.2, and could not find it.

Someone asked about this on March 11
(http://mail-archives.apache.org/mod_mbox/lucene-java-user/200503.mbox/%3C20050311062700.46656.qmail [at] web31110%3E).
You said to check out the trunk of Lucene, but I have to work with
Lucene 1.4.3 in an existing project. I can't work with a version from SVN.

Suggestions? Is there a version of Highlighter that works with 1.4.3? If
it compiles for you, where do you get TermVectorOffsetInfo, and is that
in 1.4.3?

Thanks in advance.

Dan

Chris Hostetter wrote:

>: Thanks for this tip! This should be in the FAQ. Is there a way to get it
>: in there? I can't edit the wiki, I think.
>
>1) People who create a Wiki account can edit the FAQ.
>
>2) the FAQ already includes a refrence to Highlighter, I think it's the
>FAQ entry you mentioned...
>
>http://wiki.apache.org/jakarta-lucene/LuceneFAQ#head-0c15966e2ee7755f4ae639ee8eaf7de04f5c27f5
>
> Is there a way to get a text summary of an indexed document with Lucene?
>
> You could store the documents summary in the index and then use the
> Highlighter from the sandbox.
>
>...feel free to add to the FAQ whatever information you think would have
>made the answer more useful to you when you first found it.
>
>
>: >How would you implement "snippets" with Lucene? A snippet is some text
>: >around parts of the text that caused a match, with the search matches in
>: >bold. For example, the supporting text in Google searches. A snippet is
>: >NOT a document summary, which has a FAQ entry and someone answered about
>: >on this email list.
>
>
>-Hoss
>
>
>---------------------------------------------------------------------
>To unsubscribe, e-mail: java-user-unsubscribe [at] lucene
>For additional commands, e-mail: java-user-help [at] lucene
>
>

====

[javac]
.../JSPWiki/src/org/apache/lucene/search/highlight/TokenSources.java:19:
cannot find symbol
[javac] symbol : class TermVectorOffsetInfo
[javac] location: package org.apache.lucene.index
[javac] import org.apache.lucene.index.TermVectorOffsetInfo;
[javac] ^
[javac]
.../JSPWiki/src/org/apache/lucene/search/highlight/TokenSources.java:124:
cannot find symbol
[javac] symbol : class TermVectorOffsetInfo
[javac] location: class org.apache.lucene.search.highlight.TokenSources
[javac] TermVectorOffsetInfo[] offsets=tpv.getOffsets(t);
[javac] ^
[javac]
.../JSPWiki/src/org/apache/lucene/search/highlight/TokenSources.java:124:
cannot find symbol
[javac] symbol : method getOffsets(int)
[javac] location: interface org.apache.lucene.index.TermPositionVector
[javac] TermVectorOffsetInfo[] offsets=tpv.getOffsets(t);
[javac] ^
[javac]
.../JSPWiki/src/org/apache/lucene/search/highlight/TokenSources.java:145:
internal error; cannot instantiate Token(java.lang.String,int,int) at
org.apache.lucene.analysis.Token to ()
[javac] unsortedTokens.add(new Token(terms[t],
[javac] ^
[javac]
.../JSPWiki/src/org/apache/lucene/search/highlight/TokenSources.java:160:
internal error; cannot instantiate Token(java.lang.String,int,int) at
org.apache.lucene.analysis.Token to ()
[javac] tokensInOriginalOrder[pos[tp]]=new
Token(terms[t],
[javac] ^
[javac] 5 errors

BUILD FAILED
.../JSPWiki/build.xml:216: Compile failed; see the compiler error output
for details.


---------------------------------------------------------------------
To unsubscribe, e-mail: java-user-unsubscribe [at] lucene
For additional commands, e-mail: java-user-help [at] lucene


koji.sekiguchi at m4

Jan 6, 2006, 3:24 PM

Post #6 of 8 (9185 views)
Permalink
RE: Search result snippets? [In reply to]

Hi Dan,

I've experienced same error you are facing. Check out:

http://www.gossamer-threads.com/lists/lucene/java-user/28554?search_string=T
ermVectorOffsetInfo;#28554

Hope this helps.

Koji


> -----Original Message-----
> From: Dan Frankowski [mailto:dfrankow [at] cs]
> Sent: Saturday, January 07, 2006 8:00 AM
> To: java-user [at] lucene
> Subject: Re: Search result snippets?
>
>
> Chris,
>
> Thanks, I have signed up for the wiki and will put in the answer when I
> fully understand it.
>
> Otis,
>
> So, I downloaded the Highlighter code with
>
> svn checkout
> http://svn.apache.org/repos/asf/lucene/java/trunk/contrib/highligh
> ter/src/java/org/apache/lucene/search/highlight/
>
> Then I tried to compile it with lucene.jar, version 1.4.3. I get
> compile-time errors (attached at the bottom of this message) for missing
> classes. One of them is org.apache.lucene.index.TermVectorOffsetInfo.
> For example,
> % jar tvf lib/lucene.jar | grep TermVectorOffset
> <nothing ...>
> % jar xvf ../work/jspwiki/JSPWiki/lib/lucene.jar META-INF/MANIFEST.MF
> inflated: META-INF/MANIFEST.MF
> dfrankow [at] gibso (~/windows/tmp) % cat META-INF/MANIFEST.MF
> Manifest-Version: 1.0
> Ant-Version: Apache Ant 1.6.1
> Created-By: Apache Jakarta
>
> Name: org/apache/lucene
> Specification-Title: Lucene Search Engine
> Specification-Version: 1.4.3
> Specification-Vendor: Lucene
> Implementation-Title: org.apache.lucene
> Implementation-Version: build 2004-11-29 15:13:05
> Implementation-Vemdpr: Lucene
>
> I looked for TermVectorOffsetInfo in jar version 1.4.3, 1.4, 1.3, and
> 1.2, and could not find it.
>
> Someone asked about this on March 11
> (http://mail-archives.apache.org/mod_mbox/lucene-java-user/200503.
> mbox/%3C20050311062700.46656.qmail [at] web31110%3E).
> You said to check out the trunk of Lucene, but I have to work with
> Lucene 1.4.3 in an existing project. I can't work with a version from SVN.
>
> Suggestions? Is there a version of Highlighter that works with 1.4.3? If
> it compiles for you, where do you get TermVectorOffsetInfo, and is that
> in 1.4.3?
>
> Thanks in advance.
>
> Dan
>
> Chris Hostetter wrote:
>
> >: Thanks for this tip! This should be in the FAQ. Is there a way
> to get it
> >: in there? I can't edit the wiki, I think.
> >
> >1) People who create a Wiki account can edit the FAQ.
> >
> >2) the FAQ already includes a refrence to Highlighter, I think it's the
> >FAQ entry you mentioned...
> >
> >http://wiki.apache.org/jakarta-lucene/LuceneFAQ#head-0c15966e2ee7
755f4ae639ee8eaf7de04f5c27f5
> >
> > Is there a way to get a text summary of an indexed document
> with Lucene?
> >
> > You could store the documents summary in the index and then use the
> > Highlighter from the sandbox.
> >
> >...feel free to add to the FAQ whatever information you think would have
> >made the answer more useful to you when you first found it.
> >
> >
> >: >How would you implement "snippets" with Lucene? A snippet is some text
> >: >around parts of the text that caused a match, with the search
> matches in
> >: >bold. For example, the supporting text in Google searches. A
> snippet is
> >: >NOT a document summary, which has a FAQ entry and someone
> answered about
> >: >on this email list.
> >
> >
> >-Hoss
> >
> >
> >---------------------------------------------------------------------
> >To unsubscribe, e-mail: java-user-unsubscribe [at] lucene
> >For additional commands, e-mail: java-user-help [at] lucene
> >
> >
>
> ====
>
> [javac]
> .../JSPWiki/src/org/apache/lucene/search/highlight/TokenSources.java:19:
> cannot find symbol
> [javac] symbol : class TermVectorOffsetInfo
> [javac] location: package org.apache.lucene.index
> [javac] import org.apache.lucene.index.TermVectorOffsetInfo;
> [javac] ^
> [javac]
> .../JSPWiki/src/org/apache/lucene/search/highlight/TokenSources.java:124:
> cannot find symbol
> [javac] symbol : class TermVectorOffsetInfo
> [javac] location: class
> org.apache.lucene.search.highlight.TokenSources
> [javac] TermVectorOffsetInfo[] offsets=tpv.getOffsets(t);
> [javac] ^
> [javac]
> .../JSPWiki/src/org/apache/lucene/search/highlight/TokenSources.java:124:
> cannot find symbol
> [javac] symbol : method getOffsets(int)
> [javac] location: interface org.apache.lucene.index.TermPositionVector
> [javac] TermVectorOffsetInfo[] offsets=tpv.getOffsets(t);
> [javac] ^
> [javac]
> .../JSPWiki/src/org/apache/lucene/search/highlight/TokenSources.java:145:
> internal error; cannot instantiate Token(java.lang.String,int,int) at
> org.apache.lucene.analysis.Token to ()
> [javac] unsortedTokens.add(new Token(terms[t],
> [javac] ^
> [javac]
> .../JSPWiki/src/org/apache/lucene/search/highlight/TokenSources.java:160:
> internal error; cannot instantiate Token(java.lang.String,int,int) at
> org.apache.lucene.analysis.Token to ()
> [javac] tokensInOriginalOrder[pos[tp]]=new
> Token(terms[t],
> [javac] ^
> [javac] 5 errors
>
> BUILD FAILED
> .../JSPWiki/build.xml:216: Compile failed; see the compiler error output
> for details.
>
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: java-user-unsubscribe [at] lucene
> For additional commands, e-mail: java-user-help [at] lucene
>
>



---------------------------------------------------------------------
To unsubscribe, e-mail: java-user-unsubscribe [at] lucene
For additional commands, e-mail: java-user-help [at] lucene


otis_gospodnetic at yahoo

Jan 6, 2006, 10:40 PM

Post #7 of 8 (9191 views)
Permalink
Re: Search result snippets? [In reply to]

Dan,

Grab the highlighter Jar that comes with Lucene in Action code -
http://www.lucenebook.com/ . That will work with Lucene 1.4.3.

Otis

--- Dan Frankowski <dfrankow [at] cs> wrote:

> Chris,
>
> Thanks, I have signed up for the wiki and will put in the answer when
> I
> fully understand it.
>
> Otis,
>
> So, I downloaded the Highlighter code with
>
> svn checkout
>
http://svn.apache.org/repos/asf/lucene/java/trunk/contrib/highlighter/src/java/org/apache/lucene/search/highlight/
>
> Then I tried to compile it with lucene.jar, version 1.4.3. I get
> compile-time errors (attached at the bottom of this message) for
> missing
> classes. One of them is org.apache.lucene.index.TermVectorOffsetInfo.
>
> For example,
> % jar tvf lib/lucene.jar | grep TermVectorOffset
> <nothing ...>
> % jar xvf ../work/jspwiki/JSPWiki/lib/lucene.jar
> META-INF/MANIFEST.MF
> inflated: META-INF/MANIFEST.MF
> dfrankow [at] gibso (~/windows/tmp) % cat META-INF/MANIFEST.MF
> Manifest-Version: 1.0
> Ant-Version: Apache Ant 1.6.1
> Created-By: Apache Jakarta
>
> Name: org/apache/lucene
> Specification-Title: Lucene Search Engine
> Specification-Version: 1.4.3
> Specification-Vendor: Lucene
> Implementation-Title: org.apache.lucene
> Implementation-Version: build 2004-11-29 15:13:05
> Implementation-Vemdpr: Lucene
>
> I looked for TermVectorOffsetInfo in jar version 1.4.3, 1.4, 1.3, and
>
> 1.2, and could not find it.
>
> Someone asked about this on March 11
>
(http://mail-archives.apache.org/mod_mbox/lucene-java-user/200503.mbox/%3C20050311062700.46656.qmail [at] web31110%3E).
>
> You said to check out the trunk of Lucene, but I have to work with
> Lucene 1.4.3 in an existing project. I can't work with a version from
> SVN.
>
> Suggestions? Is there a version of Highlighter that works with 1.4.3?
> If
> it compiles for you, where do you get TermVectorOffsetInfo, and is
> that
> in 1.4.3?
>
> Thanks in advance.
>
> Dan
>
> Chris Hostetter wrote:
>
> >: Thanks for this tip! This should be in the FAQ. Is there a way to
> get it
> >: in there? I can't edit the wiki, I think.
> >
> >1) People who create a Wiki account can edit the FAQ.
> >
> >2) the FAQ already includes a refrence to Highlighter, I think it's
> the
> >FAQ entry you mentioned...
> >
>
>http://wiki.apache.org/jakarta-lucene/LuceneFAQ#head-0c15966e2ee7755f4ae639ee8eaf7de04f5c27f5
> >
> > Is there a way to get a text summary of an indexed document with
> Lucene?
> >
> > You could store the documents summary in the index and then use
> the
> > Highlighter from the sandbox.
> >
> >...feel free to add to the FAQ whatever information you think would
> have
> >made the answer more useful to you when you first found it.
> >
> >
> >: >How would you implement "snippets" with Lucene? A snippet is some
> text
> >: >around parts of the text that caused a match, with the search
> matches in
> >: >bold. For example, the supporting text in Google searches. A
> snippet is
> >: >NOT a document summary, which has a FAQ entry and someone
> answered about
> >: >on this email list.
> >
> >
> >-Hoss
> >
> >
>
>---------------------------------------------------------------------
> >To unsubscribe, e-mail: java-user-unsubscribe [at] lucene
> >For additional commands, e-mail: java-user-help [at] lucene
> >
> >
>
> ====
>
> [javac]
>
.../JSPWiki/src/org/apache/lucene/search/highlight/TokenSources.java:19:
>
> cannot find symbol
> [javac] symbol : class TermVectorOffsetInfo
> [javac] location: package org.apache.lucene.index
> [javac] import org.apache.lucene.index.TermVectorOffsetInfo;
> [javac] ^
> [javac]
>
.../JSPWiki/src/org/apache/lucene/search/highlight/TokenSources.java:124:
>
> cannot find symbol
> [javac] symbol : class TermVectorOffsetInfo
> [javac] location: class
> org.apache.lucene.search.highlight.TokenSources
> [javac] TermVectorOffsetInfo[]
> offsets=tpv.getOffsets(t);
> [javac] ^
> [javac]
>
.../JSPWiki/src/org/apache/lucene/search/highlight/TokenSources.java:124:
>
> cannot find symbol
> [javac] symbol : method getOffsets(int)
> [javac] location: interface
> org.apache.lucene.index.TermPositionVector
> [javac] TermVectorOffsetInfo[]
> offsets=tpv.getOffsets(t);
> [javac] ^
> [javac]
>
.../JSPWiki/src/org/apache/lucene/search/highlight/TokenSources.java:145:
>
> internal error; cannot instantiate Token(java.lang.String,int,int) at
>
> org.apache.lucene.analysis.Token to ()
> [javac] unsortedTokens.add(new
> Token(terms[t],
> [javac] ^
> [javac]
>
.../JSPWiki/src/org/apache/lucene/search/highlight/TokenSources.java:160:
>
> internal error; cannot instantiate Token(java.lang.String,int,int) at
>
> org.apache.lucene.analysis.Token to ()
> [javac] tokensInOriginalOrder[pos[tp]]=new
> Token(terms[t],
> [javac] ^
> [javac] 5 errors
>
> BUILD FAILED
> .../JSPWiki/build.xml:216: Compile failed; see the compiler error
> output
> for details.
>
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: java-user-unsubscribe [at] lucene
> For additional commands, e-mail: java-user-help [at] lucene
>
>


---------------------------------------------------------------------
To unsubscribe, e-mail: java-user-unsubscribe [at] lucene
For additional commands, e-mail: java-user-help [at] lucene


dfrankow at cs

Jan 8, 2006, 9:36 AM

Post #8 of 8 (9175 views)
Permalink
Re: Search result snippets? [In reply to]

Koji Sekiguchi wrote:

>Hi Dan,
>
>I've experienced same error you are facing. Check out:
>
>http://www.gossamer-threads.com/lists/lucene/java-user/28554?search_string=T
>ermVectorOffsetInfo;#28554
>
>Hope this helps.
>
>Koji
>
>


Thanks! That helped! I put all this in the FAQ entry:
http://wiki.apache.org/jakarta-lucene/LuceneFAQ#head-75566820ee94a425c7e2950ac61d24e405fbd914

Dan


---------------------------------------------------------------------
To unsubscribe, e-mail: java-user-unsubscribe [at] lucene
For additional commands, e-mail: java-user-help [at] lucene

Lucene java-user RSS feed   Index | Next | Previous | View Threaded
 
 


Interested in having your list archived? Contact Gossamer Threads
 
  Web Applications & Managed Hosting Powered by Gossamer Threads Inc.