
Mailing List Archive: Lucene: Java-User

Searching compressed text using CompressionTools

 

 



parida.suraj at gmail

Feb 1, 2010, 3:43 AM

Post #1 of 9 (1521 views)
Permalink
Searching compressed text using CompressionTools

Hi,

I want to compress a text field during indexing, because it is large and
contains a lot of whitespace.

I have not been able to get this working, and I also want to be able to
search the field.


My code for compressing is as follows:
String value = "Some large text ...... ";
byte[] valuesbyte = CompressionTools.compress(value.getBytes());
final Field f = new Field(key, valuesbyte, Field.Store.YES);
f.setOmitTermFreqAndPositions(true);
f.setOmitNorms(true);
document.add(f);

Please tell me how to search and display this value.

Regards
Suraj
--
View this message in context: http://old.nabble.com/Searching-compressed-text-using-CompressionTools-tp27402945p27402945.html
Sent from the Lucene - Java Users mailing list archive at Nabble.com.


---------------------------------------------------------------------
To unsubscribe, e-mail: java-user-unsubscribe [at] lucene
For additional commands, e-mail: java-user-help [at] lucene


uwe at thetaphi

Feb 1, 2010, 3:46 AM

Post #2 of 9 (1463 views)
Permalink
RE: Searching compressed text using CompressionTools [In reply to]

Compression is only used for *stored* fields. There is no compression available for indexing (how would that work?). You must clearly differentiate between stored and indexed fields!

-----
Uwe Schindler
H.-H.-Meier-Allee 63, D-28213 Bremen
http://www.thetaphi.de
eMail: uwe [at] thetaphi
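
The difference between stored and indexed data can be pictured with a toy model. This is a sketch only, not Lucene's data structures or API: the class `StoredVsIndexedToy` and its methods are made up for illustration. Searching consults only the term-to-document map built from the indexed text. The stored bytes are handed back untouched on retrieval, which is why compressing them has no effect on search, and why a field that is only stored cannot be found by a query.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.TreeSet;

/** Toy model only: NOT Lucene's data structures or API. */
final class StoredVsIndexedToy {
    // "Indexed" side: term -> ids of the documents containing it (used by search).
    private final Map<String, TreeSet<Integer>> postings = new HashMap<>();
    // "Stored" side: document id -> opaque bytes, returned only on retrieval.
    private final List<byte[]> stored = new ArrayList<>();

    /** Adds a document. indexText (may be null) feeds search; storedBytes is kept verbatim. */
    int add(String indexText, byte[] storedBytes) {
        int docId = stored.size();
        stored.add(storedBytes);
        if (indexText != null) {
            for (String term : indexText.toLowerCase(Locale.ROOT).split("\\W+")) {
                if (!term.isEmpty()) {
                    postings.computeIfAbsent(term, t -> new TreeSet<>()).add(docId);
                }
            }
        }
        return docId;
    }

    /** Looks the term up in the postings; the stored bytes are never read. */
    List<Integer> search(String term) {
        TreeSet<Integer> hits = postings.get(term.toLowerCase(Locale.ROOT));
        return hits == null ? new ArrayList<>() : new ArrayList<>(hits);
    }

    /** Returns the stored bytes exactly as they were added. */
    byte[] storedValue(int docId) {
        return stored.get(docId);
    }

    public static void main(String[] args) {
        StoredVsIndexedToy toy = new StoredVsIndexedToy();
        byte[] opaque = {1, 2, 3};           // stands in for compressed bytes
        toy.add(null, opaque);               // stored only: cannot be found
        toy.add("Some large text", opaque);  // stored and indexed: can be found
        System.out.println("hits for 'large': " + toy.search("large"));
    }
}
```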



uwe at thetaphi

Feb 1, 2010, 3:50 AM

Post #3 of 9 (1471 views)
Permalink
RE: Searching compressed text using CompressionTools [In reply to]

I forgot:

To also index the field, add it a second time under the same name, with only indexing enabled:

String value = "Some large text ...... ";
byte[] valuesbyte = CompressionTools.compress(value.getBytes());
Field f = new Field(key, valuesbyte, Field.Store.YES);
document.add(f); // the stored one, so no need to suppress norms/TF
f = new Field(key, value, Field.Store.NO, Field.Index.ANALYZED);
f.setOmitTermFreqAndPositions(true);
f.setOmitNorms(true);
document.add(f); // the indexed one, used for searching

-----
Uwe Schindler
H.-H.-Meier-Allee 63, D-28213 Bremen
http://www.thetaphi.de
eMail: uwe [at] thetaphi
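
To display the value, the stored bytes have to be decompressed again after the document is loaded. In Lucene 3.0 that should be possible with `CompressionTools.decompressString(doc.getBinaryValue(key))`, where `doc` stands for the `Document` loaded for a hit. `CompressionTools.compressString(value)` is the matching call at index time, and it avoids the platform-default charset that `value.getBytes()` uses. The sketch below shows the same round trip with only `java.util.zip`, which, as far as I know, is what `CompressionTools` wraps. The class and method names are hypothetical.

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;
import java.util.zip.DataFormatException;
import java.util.zip.Deflater;
import java.util.zip.Inflater;

/** Hypothetical helper: a plain java.util.zip round trip, no Lucene classes. */
final class StoredValueRoundTrip {

    /** Compresses the UTF-8 bytes of text; the charset is named explicitly. */
    static byte[] compress(String text) {
        byte[] input = text.getBytes(StandardCharsets.UTF_8);
        Deflater deflater = new Deflater(Deflater.BEST_COMPRESSION);
        try {
            deflater.setInput(input);
            deflater.finish();
            ByteArrayOutputStream out = new ByteArrayOutputStream();
            byte[] buffer = new byte[1024];
            while (!deflater.finished()) {
                int n = deflater.deflate(buffer);
                out.write(buffer, 0, n);
            }
            return out.toByteArray();
        } finally {
            deflater.end(); // release native memory
        }
    }

    /** Reverses compress(); rejects input that is not a complete zlib stream. */
    static String decompress(byte[] compressed) {
        Inflater inflater = new Inflater();
        try {
            inflater.setInput(compressed);
            ByteArrayOutputStream out = new ByteArrayOutputStream();
            byte[] buffer = new byte[1024];
            while (!inflater.finished()) {
                int n = inflater.inflate(buffer);
                if (n == 0 && (inflater.needsInput() || inflater.needsDictionary())) {
                    throw new IllegalArgumentException("truncated or unsupported input");
                }
                out.write(buffer, 0, n);
            }
            return new String(out.toByteArray(), StandardCharsets.UTF_8);
        } catch (DataFormatException e) {
            throw new IllegalArgumentException("not valid compressed data", e);
        } finally {
            inflater.end();
        }
    }

    public static void main(String[] args) {
        String value = "Some large text ...... ";
        byte[] storedBytes = compress(value);       // what would go into the stored field
        String displayed = decompress(storedBytes); // what would be shown to the user
        System.out.println("round trip ok: " + displayed.equals(value));
    }
}
```

The compressed bytes are only useful for retrieval. Searching still needs the second, analyzed field built from the uncompressed string.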




parida.suraj at gmail

Feb 1, 2010, 4:04 AM

Post #4 of 9 (1469 views)
Permalink
RE: Searching compressed text using CompressionTools [In reply to]

Uwe,

Thanks for the reply.

I am confused about
document.add(new Field(key, value, Field.Store.COMPRESS,
Field.Index.ANALYZED));

My requirement is the same, but how can I do it in 3.0?
I thought CompressionTools was meant to be used for compression.

Basically, I need to compress the text that makes up the content of each
document. There are around 50,000 documents, and the number is still
increasing. If possible, please send some code or a hint as an example.

Regards,
Suraj















ian.lea at gmail

Feb 1, 2010, 4:17 AM

Post #5 of 9 (1469 views)
Permalink
Re: Searching compressed text using CompressionTools [In reply to]

Please read Uwe's answers again. I think he has already answered all
your questions.

The javadocs for 2.9.1 are very useful when upgrading to 3.0.0. Read
the entry for Field.Store.COMPRESS.


--
Ian.





parida.suraj at gmail

Feb 3, 2010, 2:59 AM

Post #6 of 9 (1364 views)
Permalink
Re: Searching compressed text using CompressionTools [In reply to]

Ian,

Thanks for solving my previous problems.


I have now tested the compression with 100 docs and found:
1. With compression, size of FS directory (on disk) = 10.8 KB
2. Without compression, size of FS directory (on disk) = 12.0 MB

and with 500 docs:
1. With compression, size of FS directory (on disk) = 45.9 KB
2. Without compression, size of FS directory (on disk) = 56.8 MB

Does this mean compression will increase my disk usage? If so, will 50,000
docs take around 6000 MB?
Or please tell me if I am doing something wrong, because I thought
compression would reduce space usage.

Regards,
Suraj







parida.suraj at gmail

Feb 3, 2010, 3:01 AM

Post #7 of 9 (1372 views)
Permalink
Re: Searching compressed text using CompressionTools [In reply to]

Ian,

A small correction to my previous message (the figures were swapped):

Thanks for solving my previous problems.


I have now tested the compression with 100 docs and found:
1. Without compression, size of FS directory (on disk) = 10.8 KB
2. With compression, size of FS directory (on disk) = 12.0 MB

and with 500 docs:
1. Without compression, size of FS directory (on disk) = 45.9 KB
2. With compression, size of FS directory (on disk) = 56.8 MB

Does this mean compression will increase my disk usage? If so, will 50,000
docs take around 6000 MB?
Or please tell me if I am doing something wrong, because I thought
compression would reduce space usage.

Regards,
Suraj


ian.lea at gmail

Feb 3, 2010, 4:11 AM

Post #8 of 9 (1369 views)
Permalink
Re: Searching compressed text using CompressionTools [In reply to]

Are you saying that by using compression your index size goes up by a
factor of more than 1024? From about 10 kilobytes to 12 megabytes?

Compressing small fields can cause the index to get bigger rather than
smaller, but obviously not by that much.

--
Ian.
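
The overhead on small fields is easy to see with `java.util.zip.Deflater` directly, which, to my knowledge, is the class `CompressionTools` relies on. A minimal sketch, with hypothetical class and method names: a four-character value grows when compressed, because the zlib header and checksum alone take six bytes, while a long repetitive text shrinks considerably.

```java
import java.nio.charset.StandardCharsets;
import java.util.zip.Deflater;

/** Hypothetical helper comparing raw and compressed sizes of a field value. */
final class SmallFieldOverhead {

    /** Size of the value as UTF-8 bytes, before compression. */
    static int rawSize(String text) {
        return text.getBytes(StandardCharsets.UTF_8).length;
    }

    /** Number of bytes Deflater (zlib format, best compression) produces for text. */
    static int compressedSize(String text) {
        Deflater deflater = new Deflater(Deflater.BEST_COMPRESSION);
        try {
            deflater.setInput(text.getBytes(StandardCharsets.UTF_8));
            deflater.finish();
            byte[] buffer = new byte[1024];
            int total = 0;
            while (!deflater.finished()) {
                total += deflater.deflate(buffer);
            }
            return total;
        } finally {
            deflater.end(); // release native memory
        }
    }

    public static void main(String[] args) {
        String tiny = "id42";
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < 500; i++) {
            sb.append("Some large text with many repeated words and spaces.    ");
        }
        String large = sb.toString();

        System.out.println("tiny:  " + rawSize(tiny) + " -> " + compressedSize(tiny) + " bytes");
        System.out.println("large: " + rawSize(large) + " -> " + compressedSize(large) + " bytes");
    }
}
```

An effect of this kind adds a few bytes per field. It cannot account for a jump from kilobytes to megabytes.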




erickerickson at gmail

Feb 6, 2010, 7:24 AM

Post #9 of 9 (1200 views)
Permalink
Re: Searching compressed text using CompressionTools [In reply to]

*Assuming* that you've made a typo and aren't
really seeing a 1000X difference in size: did
you create your index fresh each time (or delete
the old FSDir) before recording your sizes? If
you're appending to the existing index, your
numbers are misleading.

Erick
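
One way to rule out leftovers from earlier runs is to empty the index directory before each test and to measure it only after the writer has been closed. Below is a sketch using plain `java.nio.file`, with no Lucene calls. The class name, the method names, and the temporary directory are assumptions for illustration. Within Lucene 3.0, opening the `IndexWriter` with `create` set to `true` should have a similar effect.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Comparator;
import java.util.stream.Stream;

/** Hypothetical helper for measuring an index directory from a clean state. */
final class IndexDirSize {

    /** Sums the sizes of all regular files under dir, in bytes. */
    static long sizeOnDisk(Path dir) {
        try (Stream<Path> paths = Files.walk(dir)) {
            return paths.filter(Files::isRegularFile)
                        .mapToLong(p -> p.toFile().length())
                        .sum();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    /** Deletes dir and everything in it (if present), then recreates it empty. */
    static void recreateEmpty(Path dir) {
        try {
            if (Files.exists(dir)) {
                try (Stream<Path> paths = Files.walk(dir)) {
                    // Reverse order so that children are deleted before their parents.
                    paths.sorted(Comparator.reverseOrder())
                         .forEach(p -> p.toFile().delete());
                }
            }
            Files.createDirectories(dir);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) throws IOException {
        Path dir = Files.createTempDirectory("index-size-demo");
        // Simulate files left behind by an earlier indexing run.
        Files.write(dir.resolve("old-segment"),
                "left over from an earlier run".getBytes(StandardCharsets.UTF_8));
        System.out.println("before cleanup: " + sizeOnDisk(dir) + " bytes");

        recreateEmpty(dir);
        System.out.println("after cleanup:  " + sizeOnDisk(dir) + " bytes");
    }
}
```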
