
Mailing List Archive: Lucene: Java-Dev

document search returning no results

 

 



LangtonR at digitalev

May 8, 2012, 9:49 AM

Post #1 of 3 (94 views)
document search returning no results

I have a search that is coming up empty despite a document existing with the search text. Is the / an illegal character?

Here's the field when I'm creating the document:

[5] = {indexed,tokenized<AssignedAreasWithId:3-Genetics,404-AnnalsofFamilyMedicine-July/August2009,60-Obesity/WeightManagement>}

Here's my lucene search query:

{+(AssignedAreasWithId:*404-annalsoffamilymedicine-july/august2009*)}

Thanks,

Ryan Langton
Engineer
Digital Evolution Group
913.951.3175 x155 (office)
913.498.9985 (fax)
langtonr [at] digitalev
www.digitalev.com


jack at basetechnology

May 8, 2012, 1:40 PM

Post #2 of 3 (86 views)
Re: document search returning no results [In reply to]

Even with “multi-term aware” analysis (in 3.6 and trunk), you can’t have a single query term that analyzes (tokenizes) into multiple index terms AND has wildcards. In other words, if you want to use a wildcard, the query term has to analyze (tokenize) into a single term.
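To see why the query in the first post finds nothing, here is a toy model in plain Java (no Lucene dependency). It assumes a hypothetical analyzer that lowercases the value and splits on every character that is not a letter or digit; that is roughly, but not exactly, what a standard Lucene analyzer would do to this value. A wildcard is then matched against each indexed term separately, the way a wildcard query is evaluated against individual terms in the index. The class and method names are illustrative only.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;
import java.util.regex.Pattern;

public class WildcardTermDemo {
    // Hypothetical simplified analyzer (NOT Lucene's): lowercase, then split on
    // runs of characters that are neither letters nor digits.
    static List<String> analyze(String text) {
        List<String> tokens = new ArrayList<>();
        for (String t : text.toLowerCase(Locale.ROOT).split("[^\\p{L}\\p{N}]+")) {
            if (!t.isEmpty()) {
                tokens.add(t);
            }
        }
        return tokens;
    }

    // Translate a wildcard pattern (* = any run of chars, ? = one char) into a regex.
    static Pattern wildcardToRegex(String wildcard) {
        StringBuilder sb = new StringBuilder();
        for (char c : wildcard.toCharArray()) {
            if (c == '*') {
                sb.append(".*");
            } else if (c == '?') {
                sb.append('.');
            } else {
                sb.append(Pattern.quote(String.valueOf(c)));
            }
        }
        return Pattern.compile(sb.toString());
    }

    // The pattern must match ONE whole term; it never spans several terms.
    static boolean anyTermMatches(List<String> terms, String wildcard) {
        Pattern p = wildcardToRegex(wildcard);
        for (String term : terms) {
            if (p.matcher(term).matches()) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        String field = "3-Genetics,404-AnnalsofFamilyMedicine-July/August2009,60-Obesity/WeightManagement";
        List<String> terms = analyze(field);
        // prints [3, genetics, 404, annalsoffamilymedicine, july, august2009, 60, obesity, weightmanagement]
        System.out.println(terms);
        // prints false: no single term contains the '-' and '/' characters
        System.out.println(anyTermMatches(terms, "*404-annalsoffamilymedicine-july/august2009*"));
        // prints true: the pattern fits inside one term
        System.out.println(anyTermMatches(terms, "annalsof*"));
    }
}
```

In this model the slash is not the problem in itself: the hyphens and slashes are removed by analysis, so a wildcard pattern containing them can never match a single indexed term.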

Three strategies:

1. Split the query into multiple terms that are ANDed together, and then use wildcards on the individual terms (words or tokens).

2. Consider whether the field should be tokenized at all. Maybe it should be a string or keyword field, and you would always use wildcards to reference values.

3. Have two fields: one that is tokenized, which lets you query by individual words embedded in the field values, and a second that is a string or keyword field, not tokenized, on which you use wildcards against the full field value. Use a copyField to populate one field from the stored value of the other.
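The strategies can be compared in the same kind of toy model (plain Java, same hypothetical analyzer; this is not Lucene's API). Strategy 1 ANDs one wildcard per token against the tokenized terms. Strategies 2 and 3 rely on an untokenized field whose whole value is a single term, so one wildcard can span the hyphens and slashes. Lowercasing the untokenized value is an assumption made here only because the query in the thread is lowercase.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Locale;
import java.util.regex.Pattern;

public class WildcardStrategiesDemo {
    // Hypothetical simplified analyzer (NOT Lucene's): lowercase, then split on
    // runs of characters that are neither letters nor digits.
    static List<String> analyze(String text) {
        List<String> tokens = new ArrayList<>();
        for (String t : text.toLowerCase(Locale.ROOT).split("[^\\p{L}\\p{N}]+")) {
            if (!t.isEmpty()) {
                tokens.add(t);
            }
        }
        return tokens;
    }

    // Match a wildcard pattern (* = any run, ? = one char) against a single term.
    static boolean wildcardMatches(String term, String wildcard) {
        StringBuilder regex = new StringBuilder();
        for (char c : wildcard.toCharArray()) {
            if (c == '*') {
                regex.append(".*");
            } else if (c == '?') {
                regex.append('.');
            } else {
                regex.append(Pattern.quote(String.valueOf(c)));
            }
        }
        return Pattern.matches(regex.toString(), term);
    }

    static boolean anyTermMatches(List<String> terms, String wildcard) {
        for (String term : terms) {
            if (wildcardMatches(term, wildcard)) {
                return true;
            }
        }
        return false;
    }

    // Strategy 1: every query token (each may carry its own wildcard) must
    // match some indexed term; the tokens are ANDed together.
    static boolean allTokensMatch(List<String> terms, List<String> queryTokens) {
        for (String q : queryTokens) {
            if (!anyTermMatches(terms, q)) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        String value = "3-Genetics,404-AnnalsofFamilyMedicine-July/August2009,60-Obesity/WeightManagement";

        // Tokenized field: the analyzer splits the value into many terms.
        List<String> tokenizedTerms = analyze(value);
        // prints true
        System.out.println(allTokensMatch(tokenizedTerms,
                Arrays.asList("404", "annalsof*", "july", "august2009")));

        // Untokenized (keyword/string) field: the whole value is ONE term.
        List<String> keywordTerms = Arrays.asList(value.toLowerCase(Locale.ROOT));
        // prints true
        System.out.println(anyTermMatches(keywordTerms,
                "*404-annalsoffamilymedicine-july/august2009*"));
    }
}
```

One trade-off of strategy 1 on a field that holds a whole collection: the ANDed tokens can be satisfied by different entries. In this model `allTokensMatch(tokenizedTerms, Arrays.asList("3", "weightmanagement"))` also returns true, even though the 3 belongs to Genetics.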

-- Jack Krupansky


LangtonR at digitalev

May 9, 2012, 6:34 AM

Post #3 of 3 (84 views)
RE: document search returning no results [In reply to]

I wonder if I should change how I am storing these collections within my indexes.
For example, AssignedAreasWithId is a collection that I’m storing in the index as {id}-{name},{id2}-{name2}.
This forces me to use wildcards to search these fields. Maybe I should be using spaces instead. I was thinking that without wildcards the entire field would have to match, but it now seems that the field just has to contain the search text as a complete word. Is that correct?
So do you think I should store my collections as below?
{id}{name} {id2}{name2} {id3}{name3} {etc}

I’m not clear on the differences between tokenized, string, and keyword fields, or whether changes are needed here. This is a search-only field (not stored), since it contains IDs for uniqueness.
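A sketch of the difference asked about here, again as a plain-Java toy model with the same hypothetical analyzer (not Lucene code). A tokenized field is run through an analyzer and indexed as many terms, and a plain term query matches one whole term. An untokenized field (Field.Index.NOT_ANALYZED in Lucene 3.x, or Solr's string type; a KeywordAnalyzer has a similar effect) indexes the entire value as a single term. The model compares the proposed space-separated layout with one untokenized value per collection entry; Lucene accepts the same field name several times in one document, which is how such a per-entry layout could be expressed. Lowercasing the keyword values is an assumption of this sketch.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Locale;
import java.util.Set;

public class KeywordVsTokenizedDemo {
    // Hypothetical simplified analyzer (NOT Lucene's): lowercase, then split on
    // runs of characters that are neither letters nor digits.
    static List<String> analyze(String text) {
        List<String> tokens = new ArrayList<>();
        for (String t : text.toLowerCase(Locale.ROOT).split("[^\\p{L}\\p{N}]+")) {
            if (!t.isEmpty()) {
                tokens.add(t);
            }
        }
        return tokens;
    }

    // Keyword-style indexing: each collection entry becomes exactly one term.
    // Lowercasing is an assumption of this sketch; untokenized fields are
    // otherwise indexed exactly as given.
    static Set<String> keywordTerms(List<String> entries) {
        Set<String> terms = new HashSet<>();
        for (String entry : entries) {
            terms.add(entry.toLowerCase(Locale.ROOT));
        }
        return terms;
    }

    public static void main(String[] args) {
        // Proposed layout: {id}{name} entries separated by spaces, one tokenized field.
        List<String> tokenized = analyze(
                "3Genetics 404AnnalsofFamilyMedicine-July/August2009 60Obesity/WeightManagement");
        // prints [3genetics, 404annalsoffamilymedicine, july, august2009, 60obesity, weightmanagement]
        System.out.println(tokenized);
        // prints false: punctuation inside the names still splits an entry into several terms
        System.out.println(tokenized.contains("404annalsoffamilymedicine-july/august2009"));

        // Alternative: one untokenized value per collection entry.
        List<String> entries = Arrays.asList(
                "3-Genetics", "404-AnnalsofFamilyMedicine-July/August2009", "60-Obesity/WeightManagement");
        Set<String> keyword = keywordTerms(entries);
        // prints true: the whole {id}-{name} entry is a single term, no wildcard needed
        System.out.println(keyword.contains("404-annalsoffamilymedicine-july/august2009"));
    }
}
```

In this model, spaces alone only help if the names themselves contain no characters the analyzer splits on; indexing each entry as its own untokenized value avoids that dependency.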

Thanks,

