Mailing List Archive: Lucene: Java-User

could I implement this scenario?

 

 



wysunxiaohua at yahoo

Sep 19, 2008, 1:00 AM

Post #1 of 10
could I implement this scenario?

Hi all,

How can I implement this scenario in Lucene?

Suppose every document has three fields: docid, doctext and simdocid. docid is the ID of the document, doctext is the content of the document, and simdocid is the docid of a document similar to this one.
Example:
docid  doctext                    simdocid
doc01  ************************   doc04
doc02  ************************   doc03
doc03  ************************   doc02
doc04  ************************   doc03
doc05  ************************   doc04
doc06  ************************   doc02

At query time, the index is searched on the doctext field. If the hits are the four documents doc01, doc03, doc04 and doc05, I want to display only the corresponding similar documents, that is, the three documents doc04, doc02 and doc03.

I would appreciate your help very much.

BR,
Shawn
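
The mapping asked for above can be sketched in plain Java as a second pass over the hits. This is only an illustration of the expected result, not Lucene code: the class `SimilarDocs`, the method `similarTo`, and the in-memory map standing in for the stored simdocid field are all hypothetical names. It assumes the docids of the hits are already available from a search on doctext.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical helper, not part of Lucene: maps search hits to their similar documents.
class SimilarDocs {

    // Stand-in for the simdocid value stored with each document (docid -> simdocid),
    // filled with the example table from the post.
    static Map<String, String> exampleIndex() {
        Map<String, String> simDocIdByDocId = new LinkedHashMap<>();
        simDocIdByDocId.put("doc01", "doc04");
        simDocIdByDocId.put("doc02", "doc03");
        simDocIdByDocId.put("doc03", "doc02");
        simDocIdByDocId.put("doc04", "doc03");
        simDocIdByDocId.put("doc05", "doc04");
        simDocIdByDocId.put("doc06", "doc02");
        return simDocIdByDocId;
    }

    // Replaces each hit by its simdocid, dropping duplicates and keeping first-seen order.
    static List<String> similarTo(List<String> hitDocIds, Map<String, String> simDocIdByDocId) {
        Set<String> similar = new LinkedHashSet<>();
        for (String docId : hitDocIds) {
            String simDocId = simDocIdByDocId.get(docId);
            if (simDocId != null) {
                similar.add(simDocId);
            }
        }
        return new ArrayList<>(similar);
    }

    public static void main(String[] args) {
        List<String> hits = Arrays.asList("doc01", "doc03", "doc04", "doc05");
        System.out.println(similarTo(hits, exampleIndex())); // prints [doc04, doc02, doc03]
    }
}
```

Note that doc01 and doc05 both point at doc04, which is why four hits yield only three similar documents.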


mathieu at garambrogne

Sep 19, 2008, 1:14 AM

Post #2 of 10
Re: could I implement this scenario?

Yes. You can store data in a Lucene index without searching on it: your
simdocid.

M.



---------------------------------------------------------------------
To unsubscribe, e-mail: java-user-unsubscribe [at] lucene
For additional commands, e-mail: java-user-help [at] lucene


jimi.hullegard at mogul

Sep 19, 2008, 1:20 AM

Post #3 of 10
RE: could I implement this scenario?

I think he meant that he wants the search to automatically retrieve the related documents, instead of the ones that were matched by the query.

mogul | jimi hullegård | system developer | hudiksvallsgatan 4, 113 30 stockholm sweden | +46 8 506 66 172 | +46 765 27 19 55 | jimi.hullegard [at] mogul | www.mogul.com




wysunxiaohua at yahoo

Sep 19, 2008, 1:25 AM

Post #4 of 10
Re: could I implement this scenario?

Thank you, Mathieu.

But the hits don't include document doc02 in my example, so how do I display doc02? I don't want to search by docid. Thanks.





wysunxiaohua at yahoo

Sep 19, 2008, 1:27 AM

Post #5 of 10
Re: could I implement this scenario?

You are right. Thank you.





mathieu at garambrogne

Sep 19, 2008, 1:34 AM

Post #6 of 10
Re: could I implement this scenario?

Lucene is just an index. Where do you want to store your data: in a DB,
flat files, documents at a URL, or in Lucene?

M.



wysunxiaohua at yahoo

Sep 19, 2008, 1:43 AM

Post #7 of 10
Re: could I implement this scenario?

I store the data in flat files and a DB. I want to implement this using Lucene only, but if that fails, maybe I will create a temporary table for each query.





Dragan.Jotanovic at DIOSPHERE

Sep 19, 2008, 1:58 AM

Post #8 of 10
RE: could I implement this scenario?

I think it is not a good idea to use Lucene as storage; it is just an index.
You could probably implement this using flat files and Lucene.
Your simDocId would be a stored field, which you can retrieve from the index after the search, and it could also contain information about where on disk the document is located.
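
A minimal sketch of that flat-file approach, in plain Java and under stated assumptions: each hit is assumed to hand back, from its stored fields, the simdocid and the on-disk location of the similar document. `FlatFileLookup`, `StoredHit` and `loadSimilar` are hypothetical names for illustration and are not Lucene APIs; reading the stored fields from the index is left out.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical types for illustration; none of these names come from Lucene.
class FlatFileLookup {

    // What one hit is assumed to return from its stored fields:
    // the similar document's id and where that document lives on disk.
    static final class StoredHit {
        final String simDocId;
        final Path simDocPath;

        StoredHit(String simDocId, Path simDocPath) {
            this.simDocId = simDocId;
            this.simDocPath = simDocPath;
        }
    }

    // Reads each distinct similar document from disk once,
    // keyed by simdocid in first-seen order.
    static Map<String, String> loadSimilar(List<StoredHit> hits) throws IOException {
        Map<String, String> contentBySimDocId = new LinkedHashMap<>();
        for (StoredHit hit : hits) {
            if (!contentBySimDocId.containsKey(hit.simDocId)) {
                byte[] bytes = Files.readAllBytes(hit.simDocPath);
                contentBySimDocId.put(hit.simDocId, new String(bytes, StandardCharsets.UTF_8));
            }
        }
        return contentBySimDocId;
    }
}
```

With this layout the index answers "which documents are similar and where are they", while the flat files remain the place the text is read from, so the similar document never has to match the query itself.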




wysunxiaohua at yahoo

Sep 19, 2008, 2:10 AM

Post #9 of 10
Re: could I implement this scenario?

Thank you for your suggestion.

 


glen.newton at gmail

Sep 19, 2008, 6:34 AM

Post #10 of 10
Re: could I implement this scenario?

> I think it is not good idea to use lucene as storage, it is just index.

I strongly disagree with this position.

To qualify my disagreement: yes, you should not use Lucene as your
primary storage for your data in your organization.

But for a particular application, taking content from your primary
storage system (SQL database, filesystem files, etc.) and - in the
context of an end-user application - both indexing and storing the
content is a good solution. The stored content in Lucene is
effectively a cache.

Advantages:
- faster (no additional queries needed to find content in the
primary storage system)
- fewer system dependencies (if the primary system is down...)
- no longer hitting the primary storage system (such systems are usually
already busy doing other things and also tend to be expensive)
- simpler

Disadvantages:
- larger index
- might be slower, if the index is significantly larger
- updating issues

thanks,

Glen
http://zzzoot.blogspot.com/

