Login | Register For Free | Help
Search for: (Advanced)

Mailing List Archive: Lucene: Java-User

Indexing Problem

 

 

Lucene java-user RSS feed   Index | Next | Previous | View Threaded


vsirishreddy at yahoo

Nov 15, 2007, 10:42 AM

Post #1 of 2 (278 views)
Permalink
Indexing Problem

The following is my code snippet for indexing the text:

document.add(Field.Text(IFIELD_TEXT, billMeasureDoc.getText()));

When ever the text is less or short, it works perfectly. But in few of the
cases if the text is too lengthy; i.e. around 1000 lines or more then it
causes a problem.

The problem being when the text is lengthy, after indexing, the document is
getting searched only upto a certain extent of text. For eg: Lets say we
have a text:

Test Test Test Test Test Test Test Test Test
.......... .......... .......... .......... .......... .......... .....
Test1 Test1 Test1 Test1 Test1 Test1 Test1
.......... .......... .......... .......... .......... .......... .....
Test2 Test2 Test2 Test2 Test2 Test2 Test2

Now while searching, it returns the document only if I search for either
Test or Test1 and it ignores any text that is trailing after Test1.

Can someone let me know if there is any text or character size limitation
for Field.Text in my above code. Also I would like to store this text as I
need to implement highlighting for the search text.


--
View this message in context: http://www.nabble.com/Indexing-Problem-tf4816336.html#a13778913
Sent from the Lucene - Java Users mailing list archive at Nabble.com.


---------------------------------------------------------------------
To unsubscribe, e-mail: java-user-unsubscribe[at]lucene.apache.org
For additional commands, e-mail: java-user-help[at]lucene.apache.org


erickerickson at gmail

Nov 15, 2007, 11:09 AM

Post #2 of 2 (256 views)
Permalink
Re: Indexing Problem [In reply to]

Your problem is probably, that by default, Lucene stops after
10,000 terms. See IndexWriter.SetMaxFieldLength

Best
Erick

On Nov 15, 2007 1:42 PM, Sirish <vsirishreddy[at]yahoo.co.in> wrote:

>
> The following is my code snippet for indexing the text:
>
> document.add(Field.Text(IFIELD_TEXT, billMeasureDoc.getText()));
>
> When ever the text is less or short, it works perfectly. But in few of the
> cases if the text is too lengthy; i.e. around 1000 lines or more then it
> causes a problem.
>
> The problem being when the text is lengthy, after indexing, the document
> is
> getting searched only upto a certain extent of text. For eg: Lets say we
> have a text:
>
> Test Test Test Test Test Test Test Test Test
> .......... .......... .......... .......... .......... .......... .....
> Test1 Test1 Test1 Test1 Test1 Test1 Test1
> .......... .......... .......... .......... .......... .......... .....
> Test2 Test2 Test2 Test2 Test2 Test2 Test2
>
> Now while searching, it returns the document only if I search for either
> Test or Test1 and it ignores any text that is trailing after Test1.
>
> Can someone let me know if there is any text or character size limitation
> for Field.Text in my above code. Also I would like to store this text as I
> need to implement highlighting for the search text.
>
>
> --
> View this message in context:
> http://www.nabble.com/Indexing-Problem-tf4816336.html#a13778913
> Sent from the Lucene - Java Users mailing list archive at Nabble.com.
>
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: java-user-unsubscribe[at]lucene.apache.org
> For additional commands, e-mail: java-user-help[at]lucene.apache.org
>
>

Lucene java-user RSS feed   Index | Next | Previous | View Threaded
 
 


Interested in having your list archived? Contact lists@gossamer-threads.com
 
  Web Applications & Managed Hosting Powered by Gossamer Threads Inc.