
Mailing List Archive: Lucene: Java-User

Return the sentence number in the indexed files

 

 



farag_ahmed at yahoo

Jul 19, 2008, 3:00 AM

Post #1 of 3 (225 views)
Return the sentence number in the indexed files

Hi All,

I have text files that each contain several sentences, with a space between
the sentences.
When searching the index, I get the path of each document that matches the
query:

String path = doc.get("path");

Is it possible to get the number of the sentence that matches the query
inside the matched documents?

Thanks in advance
--
View this message in context: http://www.nabble.com/Return-the-sentence-number-in-the-indexed-files-tp18543061p18543061.html
Sent from the Lucene - Java Users mailing list archive at Nabble.com.


---------------------------------------------------------------------
To unsubscribe, e-mail: java-user-unsubscribe[at]lucene.apache.org
For additional commands, e-mail: java-user-help[at]lucene.apache.org


gsingers at apache

Jul 19, 2008, 2:38 PM

Post #2 of 3 (187 views)
Re: Return the sentence number in the indexed files

On Jul 19, 2008, at 6:00 AM, starz10de wrote:

> Hi All,
>
> I have a text files that contain several sentences, there is space between
> each sentence.
> When searching the index , i get the path for the documents that match the
> query
>
> String path = doc.get("path");
>
> Is it possible to get the number of the sentence that match the query
> inside the matched documents?

Not without some extra work. This kind of thing requires post- (or
pre-) processing. You can use a SpanQuery to find out where in a document
the match occurred, and then do the sentence calculations yourself. Another
option is to index each sentence as a separate document and then post-process
to combine the results.
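
To illustrate the "sentence calculations" step, here is a minimal sketch that
maps a match position to a sentence number. It assumes you already have the
character offset of the match in the original text (for example from stored
term offsets, or by re-scanning the file); a SpanQuery by itself reports token
positions, so that conversion is left out. The class SentenceLocator and the
method sentenceNumberAt are hypothetical names, not Lucene API, and
java.text.BreakIterator is only one possible way to detect sentences.

```java
import java.text.BreakIterator;
import java.util.Locale;

/** Hypothetical helper, not part of Lucene. */
final class SentenceLocator {

    /**
     * Returns the 1-based number of the sentence that contains charOffset,
     * or -1 if the offset lies outside the text.
     * Assumption: sentence boundaries are whatever BreakIterator reports.
     */
    static int sentenceNumberAt(String text, int charOffset) {
        if (charOffset < 0 || charOffset >= text.length()) {
            return -1;
        }
        BreakIterator sentences = BreakIterator.getSentenceInstance(Locale.ENGLISH);
        sentences.setText(text);
        sentences.first();
        int number = 0;
        // Walk the sentence end boundaries until one lies past the offset.
        for (int end = sentences.next(); end != BreakIterator.DONE; end = sentences.next()) {
            number++;
            if (charOffset < end) {
                return number;
            }
        }
        return -1;
    }

    public static void main(String[] args) {
        String text = "Lucene is a search library. It builds an inverted index. "
                + "Queries run against that index.";
        // Assumed input: the character offset where the query term matched.
        int offset = text.indexOf("inverted");
        System.out.println("Match is in sentence " + sentenceNumberAt(text, offset));
    }
}
```

If the files use a different sentence separator (for example a blank line),
only the boundary detection would need to change.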

If you search the archives on this list and java-dev you'll see
several discussions on the topic. See:
http://lucene.markmail.org/message/we25gm32p6qot32c?q=sentence+detection
and
http://lucene.markmail.org/message/uq6ffx3oqsulgxys?q=sentence

HTH,
Grant


--------------------------
Grant Ingersoll
http://www.lucidimagination.com

Lucene Helpful Hints:
http://wiki.apache.org/lucene-java/BasicsOfPerformance
http://wiki.apache.org/lucene-java/LuceneFAQ










farag_ahmed at yahoo

Jul 20, 2008, 4:53 AM

Post #3 of 3 (178 views)
Re: Return the sentence number in the indexed files

Thanks, Grant, for the answer.

I have already tried indexing each sentence as a separate document, and it
works fine: I indexed more than 93,000 sentences (documents) in approximately
11 minutes. I thought the other option might be more efficient.

Farag
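
For readers who want to try the sentence-per-document approach, the sketch
below shows only the pre-processing step: splitting a file's text into
numbered sentences. Each resulting record would then be added to the index as
its own document, with the sentence number kept in a stored field so it can be
read back next to "path" at search time. The class SentenceSplitter, the
record type, and the idea of a "sentence" field are illustrative assumptions,
not the code used in this thread, and the Lucene indexing calls are left out.

```java
import java.text.BreakIterator;
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

/** Hypothetical pre-processing helper, not part of Lucene. */
final class SentenceSplitter {

    /** One unit to index: source path, 1-based sentence number, sentence text. */
    static final class SentenceRecord {
        final String path;
        final int number;
        final String text;

        SentenceRecord(String path, int number, String text) {
            this.path = path;
            this.number = number;
            this.text = text;
        }
    }

    /** Splits content into sentences and numbers them in order of appearance. */
    static List<SentenceRecord> split(String path, String content) {
        List<SentenceRecord> records = new ArrayList<>();
        BreakIterator it = BreakIterator.getSentenceInstance(Locale.ENGLISH);
        it.setText(content);
        int start = it.first();
        int number = 0;
        for (int end = it.next(); end != BreakIterator.DONE; start = end, end = it.next()) {
            String sentence = content.substring(start, end).trim();
            if (!sentence.isEmpty()) {
                // Each record would become one document in the index.
                records.add(new SentenceRecord(path, ++number, sentence));
            }
        }
        return records;
    }

    public static void main(String[] args) {
        String content = "Lucene indexes text. Each sentence is one document. "
                + "Search returns the sentence.";
        for (SentenceRecord r : split("a.txt", content)) {
            System.out.println(r.path + " #" + r.number + ": " + r.text);
        }
    }
}
```

With this layout a hit already identifies its sentence, so no position
arithmetic is needed at search time; the cost is a larger number of small
documents.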


