
Mailing List Archive: Lucene: Java-Dev

readVInt, what is it for?

 

 



blazingwolf7 at gmail

Jul 2, 2008, 2:46 AM

Post #1 of 9
readVInt, what is it for?

Hi,

I am fairly new to Lucene and am currently going through its source
code. I am trying to determine how Lucene calculates the frequency
of a term in each document it finds.

I encountered a method named readVInt() in the IndexInput class. It seems
that every time this method is called, it produces the document
number and the frequency of the term in that document.

I am wondering how it works and have failed to find any information on it on the
Internet. Could anyone explain it to me? Thanks
--
View this message in context: http://www.nabble.com/readVInt%2C-what-is-it-for--tp18233802p18233802.html
Sent from the Lucene - Java Developer mailing list archive at Nabble.com.


---------------------------------------------------------------------
To unsubscribe, e-mail: java-dev-unsubscribe[at]lucene.apache.org
For additional commands, e-mail: java-dev-help[at]lucene.apache.org


uwe at thetaphi

Jul 2, 2008, 2:59 AM

Post #2 of 9
RE: readVInt, what is it for?

A VInt is how integers are stored in the index files: in a compressed,
variable-length format.

Read here: http://lucene.apache.org/java/2_3_2/fileformats.html#VInt
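
To make the format concrete, here is a minimal sketch of the encoding described on that page: each byte carries seven bits of the value, low-order bits first, and the high bit signals that another byte follows. The class and method names below (VIntSketch, writeVInt, readVInt over a byte array) are made up for illustration and are not Lucene's IndexInput/IndexOutput code.

```java
import java.io.ByteArrayOutputStream;

// Hypothetical stand-alone sketch; not Lucene's IndexInput/IndexOutput classes.
class VIntSketch {

    // Encode a non-negative int: seven data bits per byte, low-order bits first.
    // The high bit of a byte is set when more bytes follow.
    static byte[] writeVInt(int value) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        while ((value & ~0x7F) != 0) {
            out.write((value & 0x7F) | 0x80);
            value >>>= 7;
        }
        out.write(value);
        return out.toByteArray();
    }

    // Decode one VInt starting at bytes[pos[0]]; pos[0] is advanced past it.
    static int readVInt(byte[] bytes, int[] pos) {
        byte b = bytes[pos[0]++];
        int value = b & 0x7F;
        for (int shift = 7; (b & 0x80) != 0; shift += 7) {
            b = bytes[pos[0]++];
            value |= (b & 0x7F) << shift;
        }
        return value;
    }

    public static void main(String[] args) {
        int[] samples = {0, 127, 128, 16383, 16384};
        for (int v : samples) {
            byte[] encoded = writeVInt(v);
            int decoded = readVInt(encoded, new int[] {0});
            System.out.println(v + " -> " + encoded.length + " byte(s) -> " + decoded);
        }
    }
}
```

With this scheme 127 fits in one byte, 128 takes two (0x80 0x01), and 16384 takes three, so small numbers such as document gaps and frequencies stay cheap to store.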

-----
Uwe Schindler
H.-H.-Meier-Allee 63, D-28213 Bremen
http://www.thetaphi.de
eMail: uwe[at]thetaphi.de



blazingwolf7 at gmail

Jul 2, 2008, 5:49 PM

Post #3 of 9
RE: readVInt, what is it for?

Thanks, I am clear on that now. But does anyone know where the frequency of
the term in each document is calculated? I mean, which class and
which method?
Thanks





yonik at apache

Jul 2, 2008, 6:29 PM

Post #4 of 9
Re: readVInt, what is it for?

The frequency is tracked at index time. It's simply a read at query
time. See TermDocs.
If you really want to understand more about the code internals of
Lucene, I'd suggest stepping through more example queries with a
debugger.
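
A toy model may help separate the two phases. The sketch below is not Lucene code; ToyInvertedIndex and its methods are hypothetical, and it keeps everything in memory rather than in index files. The point is only that counting happens once, when a document is added, and a query just looks the number up.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.TreeMap;

// Toy model only: a hypothetical class, not Lucene's indexing code.
class ToyInvertedIndex {
    // term -> (document number -> occurrences of the term in that document)
    private final Map<String, TreeMap<Integer, Integer>> postings = new HashMap<>();

    // Index time: the counting happens here, once per document.
    void addDocument(int docId, String text) {
        for (String token : text.toLowerCase().split("\\s+")) {
            if (token.isEmpty()) {
                continue;
            }
            postings.computeIfAbsent(token, t -> new TreeMap<>())
                    .merge(docId, 1, Integer::sum);
        }
    }

    // Query time: no counting, only a lookup of the stored number.
    int freq(String term, int docId) {
        TreeMap<Integer, Integer> docs = postings.get(term);
        if (docs == null) {
            return 0;
        }
        return docs.getOrDefault(docId, 0);
    }

    public static void main(String[] args) {
        ToyInvertedIndex index = new ToyInvertedIndex();
        index.addDocument(0, "to be or not to be");
        index.addDocument(1, "to index is to count");
        System.out.println("freq(to, doc 0) = " + index.freq("to", 0));
        System.out.println("freq(count, doc 1) = " + index.freq("count", 1));
    }
}
```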

-Yonik



blazingwolf7 at gmail

Jul 2, 2008, 7:04 PM

Post #5 of 9
Re: readVInt, what is it for?

Hmmm, I don't think I get it. How is it tracked at index time? I indexed my
files earlier. Later I open the index and perform a search. Shouldn't
the frequency of each term in each matching document be calculated during
the search?




yonik at apache

Jul 2, 2008, 7:30 PM

Post #6 of 9
Re: readVInt, what is it for?

Lucene creates an inverted index and uses it to search.
Frequency is encoded in the .frq files:
http://lucene.apache.org/java/docs/fileformats.html
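
As a rough illustration of what the file-format page describes for the .frq file (leaving out skip data): each posting starts with a DocDelta VInt. DocDelta/2 is the gap to the previous document number, and if DocDelta is odd the frequency is 1, otherwise the frequency follows as another VInt. This is also why successive readVInt() calls appear to yield document numbers and frequencies. The sketch uses hypothetical names (FrqSketch, decodeTermFreqs) and takes the number of postings as a parameter; in a real index that count comes from the term dictionary.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch of decoding one term's postings; not Lucene's reader code.
class FrqSketch {

    // Decode one VInt starting at bytes[pos[0]]; pos[0] is advanced past it.
    static int readVInt(byte[] bytes, int[] pos) {
        byte b = bytes[pos[0]++];
        int value = b & 0x7F;
        for (int shift = 7; (b & 0x80) != 0; shift += 7) {
            b = bytes[pos[0]++];
            value |= (b & 0x7F) << shift;
        }
        return value;
    }

    // Returns one {documentNumber, frequency} pair per posting.
    // docFreq (number of documents containing the term) is assumed to be known.
    static List<int[]> decodeTermFreqs(byte[] bytes, int docFreq) {
        List<int[]> result = new ArrayList<>();
        int[] pos = {0};
        int doc = 0;
        for (int i = 0; i < docFreq; i++) {
            int docDelta = readVInt(bytes, pos);
            doc += docDelta >>> 1;                 // upper bits: gap to previous doc
            int freq = (docDelta & 1) != 0
                    ? 1                            // odd: frequency is exactly one
                    : readVInt(bytes, pos);        // even: frequency stored separately
            result.add(new int[] {doc, freq});
        }
        return result;
    }

    public static void main(String[] args) {
        byte[] termFreqs = {15, 8, 3};
        for (int[] posting : decodeTermFreqs(termFreqs, 2)) {
            System.out.println("doc " + posting[0] + ", freq " + posting[1]);
        }
    }
}
```

The bytes 15, 8, 3 are the example from the file-format page: a term that occurs once in document 7 and three times in document 11.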

-Yonik



gsingers at apache

Jul 2, 2008, 7:37 PM

Post #7 of 9
Re: readVInt, what is it for?

I'd suggest starting with a couple of places:
http://lucene.apache.org/java/2_3_2/fileformats.html

and

http://lucene.apache.org/java/2_3_2/scoring.html

and then do as Yonik said and step through the internals, starting
with a simple TermQuery which leads to the TermScorer.
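
For orientation before stepping through TermScorer: the stored frequency is not used raw. Assuming the default similarity described on the scoring page, the term-frequency factor is the square root of the frequency read from the index. A minimal sketch with a hypothetical class name, not the scorer itself:

```java
// Hypothetical illustration; the real computation lives in Lucene's Similarity classes.
class TfSketch {

    // Assumption: square-root damping, as in the default similarity's tf factor.
    static float tf(int freq) {
        return (float) Math.sqrt(freq);
    }

    public static void main(String[] args) {
        for (int freq : new int[] {1, 4, 9}) {
            System.out.println("freq " + freq + " -> tf " + tf(freq));
        }
    }
}
```

This is only one factor of the score; the scoring page lists the others (idf, norms, boosts).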

-Grant



--------------------------
Grant Ingersoll
http://www.lucidimagination.com

Lucene Helpful Hints:
http://wiki.apache.org/lucene-java/BasicsOfPerformance
http://wiki.apache.org/lucene-java/LuceneFAQ










P.Mukherjee at corp

Jul 2, 2008, 9:13 PM

Post #8 of 9
RE: readVInt, what is it for?

Slide 16 in the following presentation might be of some help. Let me know if
it helps.

http://docs.google.com/Presentation?docid=dmsxgtg_98dbh529dn

-Prasen



blazingwolf7 at gmail

Jul 3, 2008, 1:04 AM

Post #9 of 9
RE: readVInt, what is it for?

Thanks for all the help. I understand how it works now. Next I need
to know how to modify the .frq file. Can anyone help me with this?


