
Mailing List Archive: Lucene: Java-User

pruning package- pruneAllPositions

 

 



zpvie at yahoo

May 2, 2012, 3:11 AM

Post #1 of 5 (464 views)
pruning package- pruneAllPositions

Hi,

In the pruning package, pruneAllPositions throws an exception. In the code
it is commented that it should not happen.

// should not happen!
throw new IOException("termPositions.doc > docs[docsPos].doc");

Can you please explain why it happens and what I should do to fix it?

Thanks in advance,
Best regards
ZP

--
View this message in context: http://lucene.472066.n3.nabble.com/pruning-package-pruneAllPositions-tp3954762.html
Sent from the Lucene - Java Users mailing list archive at Nabble.com.

---------------------------------------------------------------------
To unsubscribe, e-mail: java-user-unsubscribe [at] lucene
For additional commands, e-mail: java-user-help [at] lucene


erickerickson at gmail

May 2, 2012, 4:49 AM

Post #2 of 5 (460 views)
Re: pruning package- pruneAllPositions [In reply to]

Not unless you provide a lot more context, there's
nothing to go on here!

You might review:
http://wiki.apache.org/solr/UsingMailingLists

Best
Erick




zpvie at yahoo

May 7, 2012, 7:52 AM

Post #3 of 5 (445 views)
Re: pruning package- pruneAllPositions [In reply to]

Thanks for the link. I reviewed it.
Here are more details about the exception:

I used contrib/benchmark/conf/wikipedia.alg to index a Wikipedia dump with
MAddDocs: 200000. I wanted to index only a specific period of time, so I
added an if statement in doLogic of the AddDocTask class.
I then tried to prune the index using the pruning package (CarmelTopKPruning)
and got the exception.

I added System.out.println(term); as the first line of
initPositionsTerm and System.out.println("***" + term); as the last line of
it. The Carmel top-k exception comes from pruneAllPositions (throw new
IOException("termPositions.doc > docs[docsPos].doc"); ).

For example, for token body:freely I had the output as follows:

body:freely
***body:freely
body:freely
***body:freely
body:freely
***body:freely
Carmel topk in exception (docs[docsPos].doc = 4414, termPositions.doc() = 4995)
Carmel topk in exception (docs[docsPos].doc = 4414, termPositions.doc() = 4996)
Carmel topk in exception (docs[docsPos].doc = 4414, termPositions.doc() = 4997) ..
Carmel topk in exception
Carmel topk in exception
Carmel topk in exception
Carmel topk in exception
Carmel topk in exception
Carmel topk in exception
Carmel topk in exception
Carmel topk in exception
Carmel topk in exception
body:freely
***body:freely
Carmel topk in exception
Carmel topk in exception
body:freely
***body:freely
body:freely
***body:freely

I hope that my problem is clearer now.

Thanks in advance,
Best Regards
ZP



jakedsouza88 at gmail

May 7, 2012, 8:46 AM

Post #4 of 5 (443 views)
Re: pruning package- pruneAllPositions [In reply to]

Hi Zeynep,

I was facing the same issue in CarmelUniformTermPruningPolicy in
package org.apache.lucene.index.pruning.

I think the issue is in the while loop condition in the following piece of code:

while ((docsPos < (docs.length - 1))
    && termPositions.doc() > docs[docsPos].doc) {
  docsPos++;
}
if (termPositions.doc() == docs[docsPos].doc) {
  // pass
  docsPos++; // move to next doc id
  return false;
} else if (termPositions.doc() < docs[docsPos].doc) {
  return true; // skip this one - it's less important
}
// should not happen!
throw new IOException("termPositions.doc > docs[docsPos].doc");

In the while loop, docsPos keeps getting incremented until the loop
condition fails, which happens as soon as either of these becomes false:
1. docsPos < (docs.length - 1), or
2. termPositions.doc() > docs[docsPos].doc

The error occurs when docsPos < docs.length - 1 is false but
termPositions.doc() > docs[docsPos].doc is still true.

In that case neither branch of the if () { } else if () { } block applies,
and the exception is thrown.
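
To make the fall-through concrete, here is a minimal sketch that models the quoted logic with a plain int array instead of the Lucene classes. The class name PruneLoopSketch, the keptDocs array (standing in for docs[i].doc, assumed sorted ascending), the trace helper, and the guard at the top of the method are all assumptions made for illustration; none of them is taken from the pruning package itself.

```java
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;

public class PruneLoopSketch {
    // Hypothetical stand-in for docs[i].doc, assumed sorted in ascending order.
    private final int[] keptDocs;
    private int docsPos = 0;

    PruneLoopSketch(int[] keptDocs) {
        this.keptDocs = keptDocs;
    }

    // currentDoc plays the role of termPositions.doc().
    boolean pruneAllPositions(int currentDoc) throws IOException {
        if (docsPos >= keptDocs.length) {
            // Assumption: guard added so the sketch never indexes past the array.
            return true;
        }
        while (docsPos < keptDocs.length - 1 && currentDoc > keptDocs[docsPos]) {
            docsPos++;
        }
        if (currentDoc == keptDocs[docsPos]) {
            docsPos++; // move to the next kept doc id
            return false; // keep this posting
        } else if (currentDoc < keptDocs[docsPos]) {
            return true; // prune this posting
        }
        // Reached only when docsPos is the last index and currentDoc is still larger.
        throw new IOException("termPositions.doc > docs[docsPos].doc");
    }

    // Feeds posting doc ids through the method and records each outcome.
    static List<String> trace(int[] keptDocs, int[] postingDocs) {
        PruneLoopSketch sketch = new PruneLoopSketch(keptDocs);
        List<String> outcomes = new ArrayList<>();
        for (int doc : postingDocs) {
            try {
                outcomes.add(sketch.pruneAllPositions(doc) ? "prune" : "keep");
            } catch (IOException e) {
                outcomes.add("exception");
            }
        }
        return outcomes;
    }

    public static void main(String[] args) {
        // Doc ids taken from the log output earlier in the thread.
        System.out.println(trace(new int[] {4414}, new int[] {4995, 4996, 4997}));
    }
}
```

With the doc ids from the log (last kept doc 4414, postings 4995 to 4997) every call ends in the exception branch, which matches the repeated "Carmel topk in exception" lines. The sketch only shows how the branch is reached; it does not explain why the kept list ends before the postings do.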

Fix: I added another condition, just above the statement that throws the
exception, which returns true if (docsPos == docs.length - 1).

I'm not sure whether my fix is correct, but it seems to be working. I will
update if I become certain.
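
As a rough illustration of where that extra condition sits, here is a sketch of the same logic over a plain int array with the check placed just above the throw. The class name PruneLoopFixSketch, the keptDocs array (standing in for docs[i].doc, assumed sorted ascending), the trace helper, and the guard at the top of the method are assumptions for illustration only, not code from the pruning package.

```java
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;

public class PruneLoopFixSketch {
    // Hypothetical stand-in for docs[i].doc, assumed sorted in ascending order.
    private final int[] keptDocs;
    private int docsPos = 0;

    PruneLoopFixSketch(int[] keptDocs) {
        this.keptDocs = keptDocs;
    }

    // currentDoc plays the role of termPositions.doc().
    boolean pruneAllPositions(int currentDoc) throws IOException {
        if (docsPos >= keptDocs.length) {
            // Assumption: guard added so the sketch never indexes past the array.
            return true;
        }
        while (docsPos < keptDocs.length - 1 && currentDoc > keptDocs[docsPos]) {
            docsPos++;
        }
        if (currentDoc == keptDocs[docsPos]) {
            docsPos++; // move to the next kept doc id
            return false; // keep this posting
        } else if (currentDoc < keptDocs[docsPos]) {
            return true; // prune this posting
        }
        // The added condition: at the last kept doc, prune instead of throwing.
        if (docsPos == keptDocs.length - 1) {
            return true;
        }
        throw new IOException("termPositions.doc > docs[docsPos].doc");
    }

    // Feeds posting doc ids through the method and records each outcome.
    static List<String> trace(int[] keptDocs, int[] postingDocs) {
        PruneLoopFixSketch sketch = new PruneLoopFixSketch(keptDocs);
        List<String> outcomes = new ArrayList<>();
        for (int doc : postingDocs) {
            try {
                outcomes.add(sketch.pruneAllPositions(doc) ? "prune" : "keep");
            } catch (IOException e) {
                outcomes.add("exception");
            }
        }
        return outcomes;
    }

    public static void main(String[] args) {
        // Same doc ids as in the log output earlier in the thread.
        System.out.println(trace(new int[] {4414}, new int[] {4995, 4996, 4997}));
    }
}
```

In this model every posting past the last kept doc id is now pruned rather than reported. That silences the exception, but it also hides whatever mismatch between the kept list and the postings caused it, so it may be worth checking how much of the index ends up pruned after applying it.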

Regards
Jake




zpvie at yahoo

Jun 4, 2012, 3:42 AM

Post #5 of 5 (391 views)
Re: pruning package- pruneAllPositions [In reply to]

Hi,

Thanks for your fix. I used it, but I think there is something wrong with
it: I am using the LATimes collection, and with epsilon = 0.1 and k = 10 I
got a 97% pruned index. That means only 3% of the index is left after
pruning. In the original paper, "Static index pruning for IR systems", for
the same data set with the same parameters, they have 36.4%. Has anyone used
this package with the LATimes dataset?

Thanks in advance,
Best regards

ZP

