
Mailing List Archive: Lucene: Java-User

Find documents contained in search term


davidbrai at gmail

Aug 16, 2012, 10:38 AM

Post #1 of 4 (438 views)

Hi,

I have many short documents (30-400 chars) indexed.
Given a phrase, my goal is to find an indexed document that is a prefix of
that phrase.
Is there a way to achieve this in Lucene with a single query?

Thanks,
David.



--
View this message in context: http://lucene.472066.n3.nabble.com/Find-documents-contained-in-search-term-tp4001663.html
Sent from the Lucene - Java Users mailing list archive at Nabble.com.

---------------------------------------------------------------------
To unsubscribe, e-mail: java-user-unsubscribe [at] lucene
For additional commands, e-mail: java-user-help [at] lucene


ian.lea at gmail

Aug 17, 2012, 5:55 AM

Post #2 of 4 (426 views)
Re: Find documents contained in search term

I can't see how you could do it with standard queries, but you could
reverse the process and use a MemoryIndex.

Add the single target phrase to the memory index, then loop over all the
docs, executing a search for each one. Maybe use PrefixQuery, although
I'd worry about performance. Try it and see.

But if you're just doing string comparison

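    // docTexts: the stored text of every document
    for (String docText : docTexts) {
        if (target.startsWith(docText)) {
            // match: docText is a prefix of the target phrase
        }
    }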

might be easier.
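
A self-contained sketch of that loop, in case it helps to time the plain scan against real data. The class and method names (PrefixScan, findPrefixDocs) are made up for illustration, and it assumes the document texts are already loaded into a list in memory:

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical helper: a linear scan over document texts held in memory.
class PrefixScan {
    // Returns every document text that the target phrase starts with.
    static List<String> findPrefixDocs(List<String> docTexts, String target) {
        List<String> matches = new ArrayList<>();
        for (String docText : docTexts) {
            if (target.startsWith(docText)) {
                matches.add(docText);
            }
        }
        return matches;
    }

    public static void main(String[] args) {
        List<String> docs = Arrays.asList("Hello", "Hello World", "Goodbye");
        // prints [Hello, Hello World]
        System.out.println(findPrefixDocs(docs, "Hello World again"));
    }
}
```

Every search touches every document, so the cost grows with the size of the collection rather than with the length of the phrase.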


--
Ian.



davidbrai at gmail

Aug 17, 2012, 8:55 AM

Post #3 of 4 (421 views)
Re: Find documents contained in search term

I was hoping I wouldn't have to iterate through the short documents.
I currently have about 1M of them, and this process needs to be very fast.
So I understand there is no such functionality available in Lucene.



--
View this message in context: http://lucene.472066.n3.nabble.com/Find-documents-contained-in-search-term-tp4001663p4001867.html


findbestopensource at gmail

Aug 20, 2012, 1:38 AM

Post #4 of 4 (410 views)
Re: Find documents contained in search term

Hi

You need to use a prefix query for your requirement. Below are my thoughts;
I hope they help.

Say "Hello World" is your phrase.

1. Do a phrase query with your phrase ("Hello World").
2. If nothing is found, strip the last character and do a prefix query
("Hello Worl").
3. Repeat step 2 until you get a result or the phrase is empty.
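
One possible reading of these steps, sketched without Lucene so it runs on its own: keep each document's full text in a hash set and test each shortened phrase for an exact match. Note that this swaps in exact lookups for the prefix query, because a Lucene PrefixQuery matches terms that begin with the given string, which is the opposite direction from finding a document that is a prefix of the phrase. The class and method names (PrefixLookup, longestPrefixDoc) are hypothetical; in Lucene the equivalent would presumably be an exact-match query against an untokenized field, which is an assumption and not something confirmed in this thread.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Optional;
import java.util.Set;

// Hypothetical helper: exact-match lookups on successively shorter prefixes.
class PrefixLookup {
    private final Set<String> docTexts = new HashSet<>();

    PrefixLookup(Iterable<String> docs) {
        for (String doc : docs) {
            docTexts.add(doc);
        }
    }

    // Tries the full phrase first, then strips one trailing character at a
    // time, returning the longest document text that is a prefix of the phrase.
    Optional<String> longestPrefixDoc(String phrase) {
        for (int end = phrase.length(); end > 0; end--) {
            String candidate = phrase.substring(0, end);
            if (docTexts.contains(candidate)) {
                return Optional.of(candidate);
            }
        }
        return Optional.empty();
    }

    public static void main(String[] args) {
        PrefixLookup lookup =
            new PrefixLookup(Arrays.asList("Hello", "Hello Wor", "Goodbye"));
        // prints Hello Wor
        System.out.println(lookup.longestPrefixDoc("Hello World").orElse("no match"));
    }
}
```

The number of lookups is bounded by the length of the phrase, not by the number of documents, so roughly 1M documents would not need to be iterated per search.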

If you give some sample documents from the index and an example search
phrase, it will help others give a better response.

Regards
Aditya
www.findbestopensource.com - Search from more than 200,000 open source
projects.


