
Mailing List Archive: Lucene: Java-Dev

Spans questions


gsingers at apache

Aug 15, 2007, 8:57 AM

Post #1 of 5
Spans questions

Couple of Spans questions for people:

1. Would the docs be clearer for Spans.end() if it said that the
span is not inclusive of the end position? From what I can tell, it
is not inclusive, correct?

2. I have added the following test to TestSpans.java:

  public void testSpanNearUnOrdered() throws Exception {
    // Inner query: "u1" and "u2" adjacent, in either order (slop 0, unordered).
    SpanNearQuery u1u2 = new SpanNearQuery(new SpanQuery[] {
        makeSpanTermQuery("u1"),
        makeSpanTermQuery("u2") }, 0, false);
    // Outer query: u1u2 within slop 1 of another "u2", in either order.
    SpanNearQuery snq = new SpanNearQuery(new SpanQuery[] {
        u1u2,
        makeSpanTermQuery("u2") }, 1, false);
    Spans spans = snq.getSpans(searcher.getIndexReader());
    assertTrue("Does not have next and it should", spans.next());
    assertEquals("doc", 4, spans.doc());
    assertEquals("start", 0, spans.start());
    assertEquals("end", 3, spans.end());

    // Why does this match?
    assertTrue("Does not have next and it should", spans.next());
    assertEquals("doc", 4, spans.doc());
    assertEquals("start", 1, spans.start());
    assertEquals("end", 3, spans.end());

    ...
  }

My question is: why does the second span match? Doc 4 is
"u2 u2 u1" (see the docFields array in TestSpans.java). The second
span seems incorrect because it lies completely inside the first one,
but maybe I am just not understanding the slop factor or something
about unordered spans. I would expect only one match for this
document, since u1u2 has a slop of 0 and snq has a slop of 1 (which
shouldn't matter, since there are no other permutations).

In my mind, the correct test should be something like:

  public void testSpanNearUnOrdered() throws Exception {
    SpanNearQuery snq = new SpanNearQuery(new SpanQuery[] {
        makeSpanTermQuery("u1"),
        makeSpanTermQuery("u2") }, 0, false);
    Spans spans = snq.getSpans(searcher.getIndexReader());
    assertTrue("Does not have next and it should", spans.next());
    assertEquals("doc", 4, spans.doc());
    assertEquals("start", 1, spans.start());
    assertEquals("end", 3, spans.end());

    assertTrue("Does not have next and it should", spans.next());
    assertEquals("doc", 5, spans.doc());
    assertEquals("start", 2, spans.start());
    assertEquals("end", 4, spans.end());

    assertTrue("Does not have next and it should", spans.next());
    assertEquals("doc", 8, spans.doc());
    assertEquals("start", 2, spans.start());
    assertEquals("end", 4, spans.end());

    assertTrue("Does not have next and it should", spans.next());
    assertEquals("doc", 9, spans.doc());
    assertEquals("start", 0, spans.start());
    assertEquals("end", 2, spans.end());

    assertTrue("Does not have next and it should", spans.next());
    assertEquals("doc", 10, spans.doc());
    assertEquals("start", 0, spans.start());
    assertEquals("end", 2, spans.end());
    assertFalse("Has next and it shouldn't: " + spans.doc(), spans.next());

    SpanNearQuery u1u2 = new SpanNearQuery(new SpanQuery[] {
        makeSpanTermQuery("u1"),
        makeSpanTermQuery("u2") }, 0, false);
    snq = new SpanNearQuery(new SpanQuery[] {
        u1u2,
        makeSpanTermQuery("u2") }, 1, false);
    spans = snq.getSpans(searcher.getIndexReader());
    assertTrue("Does not have next and it should", spans.next());
    assertEquals("doc", 4, spans.doc());
    assertEquals("start", 0, spans.start());
    assertEquals("end", 3, spans.end());

    assertTrue("Does not have next and it should", spans.next());
    assertEquals("doc", 5, spans.doc());
    assertEquals("start", 0, spans.start());
    assertEquals("end", 4, spans.end());

    assertTrue("Does not have next and it should", spans.next());
    assertEquals("doc", 8, spans.doc());
    assertEquals("start", 0, spans.start());
    assertEquals("end", 5, spans.end());

    assertTrue("Does not have next and it should", spans.next());
    assertEquals("doc", 9, spans.doc());
    assertEquals("start", 0, spans.start());
    assertEquals("end", 5, spans.end());

    assertTrue("Does not have next and it should", spans.next());
    assertEquals("doc", 10, spans.doc());
    assertEquals("start", 0, spans.start());
    assertEquals("end", 5, spans.end());
    assertFalse("Has next and it shouldn't", spans.next());
  }



Thanks,
Grant


---------------------------------------------------------------------
To unsubscribe, e-mail: java-dev-unsubscribe [at] lucene
For additional commands, e-mail: java-dev-help [at] lucene


paul.elschot at xs4all

Aug 30, 2007, 8:21 AM

Post #2 of 5
Re: Spans questions

Grant,

On Wednesday 15 August 2007 17:57, Grant Ingersoll wrote:
> Couple of Spans questions for people:
>
> 1. Would the docs be clearer for Spans.end() if it said that the
> span is not inclusive of the end position? From what I can tell, it
> is not inclusive, correct?

Yes. The easiest place to see that is in TermSpans.end(),
which is the term position plus 1, see TermSpans.java line 89.
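To make the exclusive end concrete, here is a minimal sketch of a half-open span [start, end). The `SimpleSpan` class and its methods are hypothetical illustrations, not Lucene's `Spans` API; the only part taken from the reply above is that a single term at position p has end = p + 1. With an exclusive end, a span's length is just end - start, and two spans are adjacent when one's end equals the other's start.

```java
/**
 * Minimal model of a half-open span [start, end) over token positions.
 * Hypothetical helper for illustration only; not Lucene's Spans API.
 */
public class SimpleSpan {
    final int start; // first position covered (inclusive)
    final int end;   // one past the last position covered (exclusive)

    SimpleSpan(int start, int end) {
        if (end < start) throw new IllegalArgumentException("end < start");
        this.start = start;
        this.end = end;
    }

    /** A single term at position p covers [p, p + 1), mirroring end = position + 1. */
    static SimpleSpan forTerm(int position) {
        return new SimpleSpan(position, position + 1);
    }

    /** Number of positions covered; no "+ 1" is needed because end is exclusive. */
    int length() {
        return end - start;
    }

    /** True if other starts exactly where this span ends (no gap, no overlap). */
    boolean isImmediatelyFollowedBy(SimpleSpan other) {
        return this.end == other.start;
    }

    @Override
    public String toString() {
        return start + "-" + end;
    }

    public static void main(String[] args) {
        // Doc 4 in this thread is "u2 u2 u1": u1 is at position 2, the second u2 at position 1.
        SimpleSpan u1 = forTerm(2);
        SimpleSpan u2 = forTerm(1);
        System.out.println(u1 + " length " + u1.length()); // prints "2-3 length 1"
        System.out.println(u2.isImmediatelyFollowedBy(u1)); // prints "true"
    }
}
```

Read this way, the 1-3 span in doc 4 covers positions 1 and 2 only, i.e. the second "u2" and the "u1".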


> 2. I have added the following test to TestSpans.java
> [test code snipped]
>
> My question is why does the second span match? Doc 4 looks like:
> "u2 u2 u1" (see the docFields array in TestSpans.java) It seems
> incorrect because it is completely inside of the other Span, but
> maybe I am just not understanding the slop factor or something about
> unordered spans. I would think there would only be one match for
> this document since the u1u2 has a slop of 0 and the snq has a slop
> of 1 (which shouldn't matter, since there are no other permutations).

I split the original NearSpans into an ordered and an unordered
version because there was a bug LUCENE-569 for the ordered
case that was difficult to fix while keeping these two cases
in the same class.

I documented the ordered case in the javadoc of the
NearSpansOrdered class. I also specialized the original
NearSpans class to implement only the unordered case,
and did not add javadoc comments there.

In the current version of NearSpansOrdered the subspans may
not overlap in a match. I did that to prevent the
ordered spans query "t1 t1" from matching every single occurrence of t1.
Btw, similar considerations apply to terms indexed at the same
position. However, iirc there is no test case for a span near query
with the same terms (subspans).
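The "t1 t1" problem can be illustrated with a toy model. This is a simplified sketch, not Lucene's NearSpansOrdered: each clause is a list of term positions, a pair matches when the second clause does not start before the first and the gap between them fits in the slop, and the hypothetical `allowOverlap` flag decides whether both clauses may use the same position. With overlap permitted, a document containing a single t1 already matches "t1 t1".

```java
import java.util.ArrayList;
import java.util.List;

public class OrderedPairDemo {
    /**
     * Toy ordered two-term near query over single-term spans [p, p + 1).
     * Returns "start-end" for every (first, second) pair that matches.
     * allowOverlap=false approximates the non-overlap rule described above.
     */
    static List<String> orderedMatches(List<Integer> first, List<Integer> second,
                                       int slop, boolean allowOverlap) {
        List<String> out = new ArrayList<>();
        for (int a : first) {
            for (int b : second) {
                int aEnd = a + 1;
                if (b < a) continue;                     // clauses must appear in order
                if (!allowOverlap && b < aEnd) continue; // second clause may not reuse a's position
                int gap = b - aEnd;                      // positions skipped between the clauses
                if (gap <= slop) out.add(a + "-" + (b + 1));
            }
        }
        return out;
    }

    public static void main(String[] args) {
        // A document with exactly one occurrence of t1, at position 0.
        List<Integer> t1 = List.of(0);
        System.out.println(orderedMatches(t1, t1, 0, true));  // prints "[0-1]"
        System.out.println(orderedMatches(t1, t1, 0, false)); // prints "[]"
    }
}
```

With overlap allowed, the single t1 is paired with itself (gap of -1), which is exactly the spurious match the non-overlap rule rules out.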

At the time of LUCENE-569 I considered writing separate versions
for ordered/unordered and overlapping/non-overlapping, but that
would have resulted in four different cases, and the split into
ordered/unordered was enough to fix the bug, so I left it at that.
In effect, the split into ordered and unordered was a split
into (ordered + non-overlapping) and (unordered + overlapping),
and this is what you see in your test cases for unordered spans.

To fully clarify the semantics of NearSpans, it is probably a good
idea to make all four cases for the subspans separately available.
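The four cases can be compared side by side with a toy two-clause matcher. This is a hypothetical sketch, not Lucene's NearSpansOrdered/NearSpansUnordered code: the `Span` type, the `near` method, and the slop formula (match length minus the summed subspan lengths) are assumptions for illustration. It is applied to doc 4 ("u2 u2 u1"), with the u1u2 clause matching at 1-3 and the solo u2 at 0-1 and 1-2. In this model the 1-3 match from the original question appears only when subspans may overlap.

```java
import java.util.ArrayList;
import java.util.List;

public class NearCasesDemo {
    /** Half-open span [start, end) over token positions (toy type, not Lucene's Spans). */
    static final class Span {
        final int start, end;
        Span(int start, int end) { this.start = start; this.end = end; }
        int length() { return end - start; }
    }

    /**
     * Toy two-clause near matcher. Assumed slop measure:
     * (matchEnd - matchStart) - (lengthA + lengthB), which goes negative for overlaps.
     */
    static List<String> near(List<Span> clauseA, List<Span> clauseB, int slop,
                             boolean ordered, boolean allowOverlap) {
        List<String> out = new ArrayList<>();
        for (Span a : clauseA) {
            for (Span b : clauseB) {
                if (ordered && a.start > b.start) continue;    // clause A must not start after B
                boolean disjoint = a.end <= b.start || b.end <= a.start;
                if (!allowOverlap && !disjoint) continue;      // subspans may not share positions
                int start = Math.min(a.start, b.start);
                int end = Math.max(a.end, b.end);
                int matchSlop = (end - start) - (a.length() + b.length());
                if (matchSlop <= slop) out.add(start + "-" + end);
            }
        }
        return out;
    }

    public static void main(String[] args) {
        // Doc 4 is "u2 u2 u1" (positions 0, 1, 2).
        List<Span> u1u2 = List.of(new Span(1, 3));              // inner query, slop 0, unordered
        List<Span> u2 = List.of(new Span(0, 1), new Span(1, 2)); // the solo "u2" clause
        for (boolean ordered : new boolean[] {false, true}) {
            for (boolean overlap : new boolean[] {true, false}) {
                System.out.println("ordered=" + ordered + " overlap=" + overlap
                        + " -> " + near(u1u2, u2, 1, ordered, overlap));
            }
        }
    }
}
```

Running it prints `[0-3, 1-3]` for unordered with overlap (the behaviour observed in the test), `[0-3]` for unordered without overlap, `[1-3]` for ordered with overlap, and `[]` for ordered without overlap, since doc 4 has no "u2" after the u1u2 span.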


Regards,
Paul Elschot


P.S. I also remember hesitating between the class names
NearSpansUnordered and NearSpansUnOrdered. In case
you want to change the class name in the trunk to
NearSpansUnOrdered, please do so.


grant.ingersoll at gmail

Aug 30, 2007, 11:42 AM

Post #3 of 5
Re: Spans questions

On Aug 30, 2007, at 11:21 AM, Paul Elschot wrote:

> Grant,
>
> On Wednesday 15 August 2007 17:57, Grant Ingersoll wrote:
>> Couple of Spans questions for people:
>>
>> 1. Would the docs be clearer for Spans.end() if it said that the
>> span is not inclusive of the end position? From what I can tell, it
>> is not inclusive, correct?
>
> Yes. The easiest place to see that is in TermSpans.end(),
> which is the term position plus 1, see TermSpans.java line 89.
>

I will update the docs to make it explicit.

>
> [...]
>
> To totally clear the semantics of NearSpans, it is probably a good
> idea to make all four cases for the subspans separately available.
>

Thanks for the info, Paul. This makes sense. I am not sure how I
feel about spans within spans. In my test case I don't think they are
merely overlapping: one is a subset of the other, which doesn't
seem correct, but maybe I am wrong. I think you are right that we
should make the four cases explicit.


> P.S. I also remember hesitating between the class names
> NearSpansUnordered and NearSpansUnOrdered. In case
> you want to change the class name in the trunk to
> NearSpansUnOrdered, please do so.
>

I won't change them. I am never sure how to name those edge cases,
either.

Cheers,
Grant


------------------------------------------------------
Grant Ingersoll
http://www.grantingersoll.com/
http://lucene.grantingersoll.com
http://www.paperoftheweek.com/





grant.ingersoll at gmail

Sep 15, 2007, 7:43 PM

Post #4 of 5
Re: Spans questions

On Aug 30, 2007, at 2:42 PM, Grant Ingersoll wrote:

> [...]
>
> Thanks for the info, Paul. This makes sense. I am not sure how I
> feel about spans within spans. I think in my test case it isn't
> that they are overlapping, the one is a subset of the other, which
> doesn't seem correct, but maybe I am wrong. I think you are right,
> that we should make the 4 cases explicit.

In thinking about this some more, I think it is actually doing a
reasonable thing, even if it is still a subset of the other, thus I
am going to leave it as is (and update my test). The results that
are returned are "narrower" and I can thus see a case being made for
returning them.

Still, given a doc:
u2 u2 u1

and
  SpanNearQuery u1u2 = new SpanNearQuery(new SpanQuery[] {
      makeSpanTermQuery("u1"),
      makeSpanTermQuery("u2") }, 0, false);
  SpanNearQuery snq = new SpanNearQuery(new SpanQuery[] {
      u1u2,
      makeSpanTermQuery("u2") }, 1, false);


I am not totally sure it makes sense to return both 0-3 and 1-3 as
spans, because in the 1-3 span the second "u2" is being used to satisfy
both the u1u2 clause AND the solo "u2" clause of the snq query above.
However, since this behavior has been around for a while, no one has
really complained, and I can understand wanting to satisfy the clauses
this way, I can be convinced to leave it alone.

Anyone have opinions otherwise?




paul.elschot at xs4all

Sep 16, 2007, 1:34 AM

Post #5 of 5
Re: Spans questions

Meanwhile it occurred to me that your situation is about containment of spans,
whereas what is currently implemented is about overlap and order. Containment
is actually a special case of overlap, but with containment there is less
need to talk about order. Perhaps span containment could even be
treated as a case closely related to SpanNotQuery.
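The distinction between containment and overlap can be written as two predicates on half-open spans. This is a minimal sketch with hypothetical helper names (not part of Lucene), using the doc 4 subspans from this thread. For non-empty spans, containment implies overlap, which is the sense in which containment is a special case.

```java
public class SpanRelations {
    /** True if half-open spans [aStart, aEnd) and [bStart, bEnd) share at least one position. */
    static boolean overlaps(int aStart, int aEnd, int bStart, int bEnd) {
        return aStart < bEnd && bStart < aEnd;
    }

    /** True if [innerStart, innerEnd) lies entirely within [outerStart, outerEnd). */
    static boolean contains(int outerStart, int outerEnd, int innerStart, int innerEnd) {
        return outerStart <= innerStart && innerEnd <= outerEnd;
    }

    public static void main(String[] args) {
        // Doc 4, "u2 u2 u1": the u1u2 subspan is 1-3; the solo u2 occurs at 0-1 and 1-2.
        System.out.println(overlaps(1, 3, 1, 2) + " " + contains(1, 3, 1, 2)); // prints "true true"
        System.out.println(overlaps(1, 3, 0, 1) + " " + contains(1, 3, 0, 1)); // prints "false false"
        // Overlap without containment: 0-2 and 1-3 share only position 1.
        System.out.println(overlaps(0, 2, 1, 3) + " " + contains(0, 2, 1, 3)); // prints "true false"
    }
}
```

In these terms, the 1-3 match in doc 4 is one where the solo "u2" subspan (1-2) is contained in the u1u2 subspan (1-3), while the 0-3 match uses a "u2" that neither overlaps nor is contained in it.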

Regards,
Paul Elschot



On Sunday 16 September 2007 04:43, Grant Ingersoll wrote:
>
> On Aug 30, 2007, at 2:42 PM, Grant Ingersoll wrote:
>
> >
> > On Aug 30, 2007, at 11:21 AM, Paul Elschot wrote:
> >
> >> Grant,
> >>
> >> On Wednesday 15 August 2007 17:57, Grant Ingersoll wrote:
> >>> Couple of Spans questions for people:
> >>>
> >>> 1. Would the docs be clearer for Spans.end() if it said that the
> >>> span is not inclusive of the end position? From what I can tell, it
> >>> is not inclusive, correct?
> >>
> >> Yes. The easiest place to see that is in TermSpans.end(),
> >> which is the term position plus 1, see TermSpans.java line 89.
> >>
> >
> > I will update the docs to make it explicit.
> >
> >>
> >>> 2. I have added the following test to TestSpans.java
> >>> public void testSpanNearUnOrdered() throws Exception {
> >>>
> >>> SpanNearQuery snq;
> >>> SpanNearQuery u1u2 = new SpanNearQuery(new SpanQuery[]
> >>> {makeSpanTermQuery("u1"),
> >>> makeSpanTermQuery("u2")}, 0,
> >>> false);
> >>> snq = new SpanNearQuery(
> >>> new SpanQuery[] {
> >>> u1u2,
> >>> makeSpanTermQuery("u2")
> >>> },
> >>> 1,
> >>> false);
> >>> spans = snq.getSpans(searcher.getIndexReader());
> >>> assertTrue("Does not have next and it should", spans.next());
> >>> assertEquals("doc", 4, spans.doc());
> >>> assertEquals("start", 0, spans.start());
> >>> assertEquals("end", 3, spans.end());
> >>>
> >>> //Why does this match?
> >>> assertTrue("Does not have next and it should", spans.next());
> >>> assertEquals("doc", 4, spans.doc());
> >>> assertEquals("start", 1, spans.start());
> >>> assertEquals("end", 3, spans.end());
> >>>
> >>> ...
> >>> }
> >>>
> >>> My question is why does the second span match? Doc 4 looks like:
> >>> "u2 u2 u1" (see the docFields array in TestSpans.java) It seems
> >>> incorrect because it is completely inside of the other Span, but
> >>> maybe I am just not understanding the slop factor or something about
> >>> unordered spans. I would think there would only be one match for
> >>> this document since the u1u2 has a slop of 0 and the snq has a slop
> >>> of 1 (which shouldn't matter, since there are no other
> >>> permutations).
> >>
> >> I split the original NearSpans into an ordered and an unordered
> >> version because there was a bug LUCENE-569 for the ordered
> >> case that was difficult to fix while keeping these two cases
> >> in the same class.
> >>
> >> I documented the ordered case in the javadoc of the
> >> NearSpansOrdered class. I also specialized the original
> >> NearSpans class to implement only the unordered case,
> >> and did not add javadoc comments there.
> >>
> >> In the current version of NearSpansOrdered the subspans should
> >> not overlap to form a match. I did that to prevent the
> >> ordered spans query "t1 t1" to match all single occurrences of t1.
> >> Btw. similar considerations apply for terms indexed at the same
> >> position. However, iirc there is no test case for a span near query
> >> with the same terms (subspans).
> >>
> >
> >> At the time of LUCENE-569 I considered writing separate versions
> >> of ordered/unordered and overlapping/non overlapping, but that
> >> would have resulted in four different cases, and the split into
> >> ordered/
> >> unordered was enough to fix the bug, so I left it at that.
> >> The split into ordered and unordered was a split
> >> into (ordered + non overlapping) and (unordered + overlapping),
> >> and this is what you see in your test cases for unordered spans.
> >>
> >> To totally clear the semantics of NearSpans, it is probably a good
> >> idea to make all four cases for the subspans separately available.
> >>
> >>
> >
> > Thanks for the info, Paul. This makes sense. I am not sure how I
> > feel about spans within spans. I think in my test case it isn't
> > that they are overlapping, the one is a subset of the other, which
> > doesn't seem correct, but maybe I am wrong. I think you are right,
> > that we should make the 4 cases explicit.
>
> In thinking about this some more, I think it is actually doing a
> reasonable thing, even though one span is still a subset of the
> other, so I am going to leave it as is (and update my test). The
> returned results are "narrower", and I can see a case being made
> for returning them.
>
> Still, given a doc:
> u2 u2 u1
>
> and
> SpanNearQuery u1u2 = new SpanNearQuery(new SpanQuery[] {
>     makeSpanTermQuery("u1"),
>     makeSpanTermQuery("u2")}, 0, false);
> SpanNearQuery snq = new SpanNearQuery(
>     new SpanQuery[] {
>       u1u2,
>       makeSpanTermQuery("u2")
>     },
>     1,
>     false);
>
>
> I am not totally sure it makes sense to return 0-3 as a span AND 1-3
> as a span, because the second "u2" is being used to satisfy both the
> u1u2 clause AND the solo "u2" clause in the snq query above. However,
> since this behavior has been around for a while, no one has really
> complained, and I can understand wanting to satisfy the clauses this
> way, I can be convinced to leave it alone.
>
> Anyone have opinions otherwise?
>
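Grant's 0-3 and 1-3 results can be reproduced by hand. The brute-force sketch below assumes that an unordered near match with overlap allowed accepts a pairing of subspans when (maxEnd - minStart - totalLength) <= slop, which appears to be the rule used by the unordered NearSpans class; treat that as an assumption. The class and method names are hypothetical, and unlike Lucene it enumerates every pairing instead of iterating incrementally. Because the pairing test never checks for overlap, the u2 at position 1 is counted both inside u1u2 and as the solo u2 clause.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical sketch (not Lucene code) of an unordered, overlap-allowing near match. */
class UnorderedNearSketch {

  /** Half-open span [start, end); printed as "start-end" like in the thread. */
  static final class Span {
    final int start, end;
    Span(int start, int end) { this.start = start; this.end = end; }
    int length() { return end - start; }
    @Override public String toString() { return start + "-" + end; }
  }

  /** One single-position span per occurrence of term in doc. */
  static List<Span> termSpans(String[] doc, String term) {
    List<Span> out = new ArrayList<>();
    for (int i = 0; i < doc.length; i++) {
      if (doc[i].equals(term)) {
        out.add(new Span(i, i + 1));
      }
    }
    return out;
  }

  /** Two-clause unordered near: overlapping and nested pairings are not rejected. */
  static List<Span> unorderedNear(List<Span> left, List<Span> right, int slop) {
    List<Span> out = new ArrayList<>();
    for (Span a : left) {
      for (Span b : right) {
        int minStart = Math.min(a.start, b.start);
        int maxEnd = Math.max(a.end, b.end);
        if (maxEnd - minStart - (a.length() + b.length()) <= slop) {
          out.add(new Span(minStart, maxEnd));
        }
      }
    }
    return out;
  }

  public static void main(String[] args) {
    String[] doc = {"u2", "u2", "u1"};
    List<Span> u2 = termSpans(doc, "u2");
    List<Span> u1u2 = unorderedNear(termSpans(doc, "u1"), u2, 0);
    System.out.println(u1u2);                       // [1-3]
    System.out.println(unorderedNear(u1u2, u2, 1)); // [0-3, 1-3]
  }
}
```

For the 1-3 pairing the "slop" is 3 - 1 - (2 + 1) = -1, so the nested match is accepted even though no extra position is covered.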
>
>
> >
> >
> >> Regards,
> >> Paul Elschot
> >>
> >>
> >> P.S. I also remember hesitating between the class names
> >> NearSpansUnordered and NearSpansUnOrdered. In case
> >> you want to change the class name in the trunk to
> >> NearSpansUnOrdered, please do so.
> >>
> >
> > I won't change them. I am never sure how to name those edge cases,
> > either.
> >
> > Cheers,
> > Grant
> >
> >
> >>> In my mind, the correct test should be something like:
> >>> public void testSpanNearUnOrdered() throws Exception {
> >>>
> >>>   SpanNearQuery snq;
> >>>   snq = new SpanNearQuery(
> >>>       new SpanQuery[] {
> >>>         makeSpanTermQuery("u1"),
> >>>         makeSpanTermQuery("u2") },
> >>>       0,
> >>>       false);
> >>>   Spans spans = snq.getSpans(searcher.getIndexReader());
> >>>   assertTrue("Does not have next and it should", spans.next());
> >>>   assertEquals("doc", 4, spans.doc());
> >>>   assertEquals("start", 1, spans.start());
> >>>   assertEquals("end", 3, spans.end());
> >>>
> >>>   assertTrue("Does not have next and it should", spans.next());
> >>>   assertEquals("doc", 5, spans.doc());
> >>>   assertEquals("start", 2, spans.start());
> >>>   assertEquals("end", 4, spans.end());
> >>>
> >>>   assertTrue("Does not have next and it should", spans.next());
> >>>   assertEquals("doc", 8, spans.doc());
> >>>   assertEquals("start", 2, spans.start());
> >>>   assertEquals("end", 4, spans.end());
> >>>
> >>>   assertTrue("Does not have next and it should", spans.next());
> >>>   assertEquals("doc", 9, spans.doc());
> >>>   assertEquals("start", 0, spans.start());
> >>>   assertEquals("end", 2, spans.end());
> >>>
> >>>   assertTrue("Does not have next and it should", spans.next());
> >>>   assertEquals("doc", 10, spans.doc());
> >>>   assertEquals("start", 0, spans.start());
> >>>   assertEquals("end", 2, spans.end());
> >>>   assertTrue("Has next and it shouldn't: " + spans.doc(),
> >>>       spans.next() == false);
> >>>
> >>>   SpanNearQuery u1u2 = new SpanNearQuery(new SpanQuery[] {
> >>>       makeSpanTermQuery("u1"),
> >>>       makeSpanTermQuery("u2")}, 0,
> >>>       false);
> >>>   snq = new SpanNearQuery(
> >>>       new SpanQuery[] {
> >>>         u1u2,
> >>>         makeSpanTermQuery("u2")
> >>>       },
> >>>       1,
> >>>       false);
> >>>   spans = snq.getSpans(searcher.getIndexReader());
> >>>   assertTrue("Does not have next and it should", spans.next());
> >>>   assertEquals("doc", 4, spans.doc());
> >>>   assertEquals("start", 0, spans.start());
> >>>   assertEquals("end", 3, spans.end());
> >>>
> >>>   assertTrue("Does not have next and it should", spans.next());
> >>>   assertEquals("doc", 5, spans.doc());
> >>>   assertEquals("start", 0, spans.start());
> >>>   assertEquals("end", 4, spans.end());
> >>>
> >>>   assertTrue("Does not have next and it should", spans.next());
> >>>   assertEquals("doc", 8, spans.doc());
> >>>   assertEquals("start", 0, spans.start());
> >>>   assertEquals("end", 5, spans.end());
> >>>
> >>>   assertTrue("Does not have next and it should", spans.next());
> >>>   assertEquals("doc", 9, spans.doc());
> >>>   assertEquals("start", 0, spans.start());
> >>>   assertEquals("end", 5, spans.end());
> >>>
> >>>   assertTrue("Does not have next and it should", spans.next());
> >>>   assertEquals("doc", 10, spans.doc());
> >>>   assertEquals("start", 0, spans.start());
> >>>   assertEquals("end", 5, spans.end());
> >>>   assertTrue("Has next and it shouldn't", spans.next() == false);
> >>> }
> >>>
> >>>
> >>>
> >>> Thanks,
> >>> Grant
> >>>
> >>>
> >>> ---------------------------------------------------------------------
> >>> To unsubscribe, e-mail: java-dev-unsubscribe [at] lucene
> >>> For additional commands, e-mail: java-dev-help [at] lucene
> >>>
> >>>
> >>>
> >>
> >
> > ------------------------------------------------------
> > Grant Ingersoll
> > http://www.grantingersoll.com/
> > http://lucene.grantingersoll.com
> > http://www.paperoftheweek.com/
> >
>
