
Mailing List Archive: Lucene: Java-User

Searching across 2 fields

 

 



mohitanchlia at gmail

May 21, 2012, 11:36 AM

Post #1 of 4 (427 views)
Searching across 2 fields

I am new to search and have just gone through some of the concepts in
"Lucene in Action". I have a few questions:

Here is the problem I am having. Say I have the two JSON docs below. I want
to query forms.id = 40 and fields.id = L31A and fields.value = 3000, and I
expect it to return only doc 1. But with a regular search I'll also get doc 2.
What's the best way of designing search for such queries?

Json doc 1
{
   "fileName":"filename",
   "createdDate":"05/20/12 16:21:56",
   "setModel":[
      {
         "id":"1",
         "compliance":false,
         "forms":[
            {
               "id":"40",
               "copy":null,
               "tpsId":null,
               "forms":[
                  {
                     "id":"F40_SW_2",
                     "copy":null,
                     "tpsId":"1[]/F40[]",
                     "forms":[],
                     "tables":[],
                     "fields":[
                        {
                           "id":"L31A",
                           "security":null,
                           "value":"3000."
                        },
                        {
                           "id":"MRSSN1",
                           "security":null,
                           "value":"656465464"
                        }
                     ]
                  }
               ]
            }
         ]
      }
   ]
}


Json doc 2
{
   "fileName":"filename",
   "createdDate":"05/20/12 16:21:56",
   "setModel":[
      {
         "id":"1",
         "compliance":false,
         "forms":[
            {
               "id":"50",
               "copy":null,
               "tpsId":null,
               "forms":[
                  {
                     "id":"F50_SW_2",
                     "copy":null,
                     "tpsId":"1[]/F50[]",
                     "forms":[],
                     "tables":[],
                     "fields":[
                        {
                           "id":"L31A",
                           "security":null,
                           "value":"3000."
                        },
                        {
                           "id":"MRSSN1",
                           "security":null,
                           "value":"656465464"
                        }
                     ]
                  }
               ]
            }
         ]
      }
   ]
}


markharw00d at yahoo

May 21, 2012, 3:24 PM

Post #2 of 4 (413 views)
Re: Searching across 2 fields [In reply to]

You're describing what I call the "cross-matching" problem, which appears when you flatten nested, repeating structures with multiple fields into a single flat Lucene document model.
The approach for handling the more complex mappings is to use nested child docs in Lucene; for that, look at BlockJoinQuery.

However, in this particular case it just might be possible to safely collapse your JSON doc into a single Lucene doc if the value of "fields.id" (e.g. L31A) were used as the Lucene field name on a single document and the related "value" were the Lucene field's contents.
Of course, you can only go so far with this sort of flattening approach before cross-matching becomes an issue.


Cheers
Mark
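
To make the flattening trade-off concrete, here is a minimal sketch in plain Python. It is only a toy model of an AND query over multi-valued fields, not Lucene's API; the helper names (flatten_naive, flatten_by_field_id, matches) and the simplified sample document are assumptions made for illustration. It shows how pooling "fields.id" and "fields.value" loses the pairing between them, and how using the "fields.id" value as the field name keeps it.

```python
from collections import defaultdict

def flatten_naive(doc):
    """Collapse every form and field into shared multi-valued fields."""
    flat = defaultdict(set)
    for form in doc["forms"]:
        flat["forms.id"].add(form["id"])
        for field in form["fields"]:
            flat["fields.id"].add(field["id"])
            flat["fields.value"].add(field["value"])
    return flat

def flatten_by_field_id(doc):
    """Use each fields.id as the field name and its value as the content."""
    flat = defaultdict(set)
    for form in doc["forms"]:
        flat["forms.id"].add(form["id"])
        for field in form["fields"]:
            flat[field["id"]].add(field["value"])
    return flat

def matches(flat, criteria):
    """AND query: every (field name, term) pair must be present."""
    return all(term in flat.get(name, ()) for name, term in criteria.items())

# Hypothetical, simplified document: L31A holds 3000, MRSSN1 holds another value.
doc = {"forms": [{"id": "40", "fields": [
    {"id": "L31A", "value": "3000"},
    {"id": "MRSSN1", "value": "656465464"},
]}]}

# "fields.id = MRSSN1 AND fields.value = 3000" should not match this document,
# but the naive flattening pairs MRSSN1 with L31A's value (a cross-match).
naive_hit = matches(flatten_naive(doc),
                    {"fields.id": "MRSSN1", "fields.value": "3000"})

# With the field id as the field name, the same question no longer matches,
# while the intended query still does.
named_hit = matches(flatten_by_field_id(doc), {"MRSSN1": "3000"})
good_hit = matches(flatten_by_field_id(doc), {"forms.id": "40", "L31A": "3000"})
```

Note that this only protects the id/value pairing. If one source doc held several forms that each carried an L31A field, "forms.id" and "L31A" would still be pooled across forms, which is the limit of the flattening approach mentioned above.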



---------------------------------------------------------------------
To unsubscribe, e-mail: java-user-unsubscribe [at] lucene
For additional commands, e-mail: java-user-help [at] lucene


mohitanchlia at gmail

May 21, 2012, 3:32 PM

Post #3 of 4 (417 views)
Re: Searching across 2 fields [In reply to]

Thanks! Are there any good examples I can look at?

In some cases the data is in a nested document; in other cases it's within
the same document. Something like:

In the example below I want to search for form.id = 1040 and name = age and
value = 20, and return only doc1. Does this also fall under the "cross matching"
solution that you described?

doc1:

{
   form: { id: 1040 }
   attrib: {
      name: age
      value: 20
   }
}

doc2:

{
   form: { id: 1040 }
   attrib: {
      name: age
      value: 22
   }
}


markharw00d at yahoo

May 22, 2012, 2:24 AM

Post #4 of 4 (409 views)
Re: Searching across 2 fields [In reply to]

>>Thanks! Are there any good examples I can look at?

Mike did a good write-up here: http://blog.mikemccandless.com/2012/01/searching-relational-content-with.html

>>In below example I want to search for form.id......Does this also fall under "cross matching" solution that you described?

"Cross matching" is a problem caused by flattening - not the solution.
If you only ever have one "form" in a doc, as in your example, then a single Lucene document could happily represent the data shown, because "age" looks like a field name and "20" its value: that would just be a standard Lucene field called "age".

However, if you have an indeterminate number of forms in each source doc, each with many attribs (e.g. "age" and "height"), then using a single Lucene doc will create issues, because flattening this structure will muddle the data and prevent you from knowing which age value is related to which height value, i.e. the "cross-matching" problem (see here for an overview: http://www.slideshare.net/MarkHarwood/proposal-for-nested-document-support-in-lucene ).

Cheers,
Mark
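
As a rough illustration of the age/height case, the sketch below contrasts the two models in plain Python. It is a conceptual model only and does not use Lucene's BlockJoinQuery API; the function names, the "parent" key, and the sample data are all hypothetical. Each form becomes its own child record tied to a parent doc, and a parent is returned only when a single child satisfies every criterion.

```python
def index_children(source_docs):
    """Emit one child record per form, tagged with its parent doc id."""
    children = []
    for parent_id, forms in source_docs.items():
        for form in forms:
            children.append({"parent": parent_id, **form})
    return children

def search_parents(children, criteria):
    """Return parent ids that have at least one child meeting ALL criteria."""
    hits = []
    for child in children:
        if all(child.get(k) == v for k, v in criteria.items()):
            if child["parent"] not in hits:
                hits.append(child["parent"])
    return hits

def search_flattened(source_docs, criteria):
    """Same query against one flat doc per source doc (values pooled per field)."""
    hits = []
    for parent_id, forms in source_docs.items():
        pooled = {}
        for form in forms:
            for k, v in form.items():
                pooled.setdefault(k, set()).add(v)
        if all(v in pooled.get(k, ()) for k, v in criteria.items()):
            hits.append(parent_id)
    return hits

# Hypothetical data: doc1 has two forms, doc2 has one.
source_docs = {
    "doc1": [{"age": "20", "height": "180"}, {"age": "22", "height": "170"}],
    "doc2": [{"age": "20", "height": "170"}],
}
query = {"age": "20", "height": "170"}

# Flattened: doc1 matches because age 20 and height 170 both occur somewhere in it.
flat_hits = search_flattened(source_docs, query)

# Parent/child: only doc2 has a single form with both age 20 and height 170.
nested_hits = search_parents(index_children(source_docs), query)
```

Here flat_hits contains both doc1 and doc2, while nested_hits contains only doc2, which is the behaviour the nested-document approach is meant to give.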
