
Mailing List Archive: Lucene: Java-User

SpanQuery for Terms at same position

 

 



ctignor at thinkmap

Nov 19, 2009, 2:28 PM

Post #1 of 17
SpanQuery for Terms at same position

Hello,

I would like to search for all documents that contain both "plan" and "_v"
(my part-of-speech token for verb) at the same position.
I have tokenized the documents accordingly, so these tokens exist at the
same location.

I can achieve this programmatically using PhraseQueries by adding the Terms
explicitly at the same position, but I need to be able to recover the Payload
data for each term found within the matched instance of my query.

Unfortunately, PayloadSpanUtil doesn't seem to return the same results as
the PhraseQuery, possibly because it converts the query into Spans first,
which do not support searching for Terms at the same document position?
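
To make "same position" concrete, here is a minimal sketch in plain Java (no Lucene classes) of how stacked tokens end up sharing a position. It assumes the part-of-speech tag is emitted with a position increment of 0 right after its word; the `Token` class, the `sample()` stream and the sentence it encodes are hypothetical and only serve as illustration.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

public class SamePositionModel {

    /** A term plus its position increment; an increment of 0 stacks it on the previous token. */
    static final class Token {
        final String term;
        final int positionIncrement;

        Token(String term, int positionIncrement) {
            this.term = term;
            this.positionIncrement = positionIncrement;
        }
    }

    /** Hypothetical token stream for "we plan a plan", with a tag stacked on each "plan". */
    static List<Token> sample() {
        return Arrays.asList(
            new Token("we", 1),
            new Token("plan", 1), new Token("_v", 0),
            new Token("a", 1),
            new Token("plan", 1), new Token("_n", 0));
    }

    /** Returns the absolute positions at which both terms occur together. */
    static Set<Integer> sharedPositions(List<Token> tokens, String a, String b) {
        Map<String, Set<Integer>> byTerm = new HashMap<>();
        int position = -1; // the first increment of 1 then lands on position 0
        for (Token token : tokens) {
            position += token.positionIncrement;
            byTerm.computeIfAbsent(token.term, t -> new TreeSet<>()).add(position);
        }
        Set<Integer> shared =
            new TreeSet<>(byTerm.getOrDefault(a, Collections.<Integer>emptySet()));
        shared.retainAll(byTerm.getOrDefault(b, Collections.<Integer>emptySet()));
        return shared;
    }

    public static void main(String[] args) {
        System.out.println(sharedPositions(sample(), "plan", "_v")); // [1]
        System.out.println(sharedPositions(sample(), "plan", "_n")); // [3]
    }
}
```

The query being asked for is the search-time equivalent of `sharedPositions`: a match only where both terms sit on one position.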

Any help appreciated.

thanks,

C>T>

--
TH!NKMAP

Christopher Tignor | Senior Software Architect
155 Spring Street NY, NY 10012
p.212-285-8600 x385 f.212-285-8999


adrianocrestani at gmail

Nov 21, 2009, 7:47 PM

Post #2 of 17
Re: SpanQuery for Terms at same position [In reply to]

Hi,

I haven't tested this, but you might want to try a SpanNearQuery with the
slop set to zero. Give it a try and let me know if it works.

Regards,
Adriano Crestani



paul.elschot at xs4all

Nov 22, 2009, 4:50 AM

Post #3 of 17
Re: SpanQuery for Terms at same position [In reply to]

On Sunday 22 November 2009 04:47:50, Adriano Crestani wrote:
> Hi,
>
> I didn't test, but you might want to try SpanNearQuery and set slop to zero.
> Give it a try and let me know if it worked.

The slop is the number of positions "in between", so zero would still be too
much to only match at the same position.

SpanNearQuery may or may not work for a slop of -1, but one could try
that for both the ordered and unordered cases.
One way to do that is to start from the existing test cases.
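
The slop arithmetic can be sketched in plain Java. This is a simplified model, not Lucene's source: it assumes an unordered near match compares the width covered by the spans, minus their combined length, against the slop, and that a single-term span ends one position past its start. The class and method names are hypothetical.

```java
public class UnorderedSlopModel {

    /** Width covered by two single-term spans minus their combined length. */
    static int slack(int positionA, int positionB) {
        int minStart = Math.min(positionA, positionB);
        int maxEnd = Math.max(positionA, positionB) + 1; // a term span ends one past its start
        int totalLength = 2; // two term spans, each of length 1
        return (maxEnd - minStart) - totalLength;
    }

    /** A pair matches when its slack does not exceed the allowed slop. */
    static boolean matches(int positionA, int positionB, int slop) {
        return slack(positionA, positionB) <= slop;
    }

    public static void main(String[] args) {
        System.out.println(slack(5, 5)); // -1: same position
        System.out.println(slack(4, 5)); //  0: adjacent positions
        System.out.println(slack(3, 5)); //  1: one position in between
    }
}
```

Under this model a slop of 0 admits adjacent terms as well as co-located ones, while a slop of -1 admits only terms at the same position.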

Regards,
Paul Elschot


---------------------------------------------------------------------
To unsubscribe, e-mail: java-user-unsubscribe [at] lucene
For additional commands, e-mail: java-user-help [at] lucene


adrianocrestani at gmail

Nov 22, 2009, 11:11 PM

Post #4 of 17
Re: SpanQuery for Terms at same position [In reply to]

You are right, Paul: 0 would not work. Something less than zero probably
would, as you suggested. Give it a try and tell us if it worked ; )



ctignor at thinkmap

Nov 23, 2009, 6:20 AM

Post #5 of 17
Re: SpanQuery for Terms at same position [In reply to]

I tested it and it doesn't work. A slop of zero means no words between
the provided terms, so my query for "plan" and "_n" returns entries like
"contingency plan".

My workaround for this problem is to use a PhraseQuery, where you can
explicitly set Terms to occur at the same location, to recover the desired
document ids. Then, because I need the payload data for each match, I create
a SpanTermQuery for each individual term, use a modified version of
PayloadSpanUtil to recover only the PayloadSpans for each query from the
document ids collected above, and then find the intersection of all these
sets, making sure to factor in where each span starts within each document
(the end will just be one ordinal value after the start) to ensure they're
at the same position.
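
The intersection step of this workaround might look roughly like the following plain-Java sketch. It assumes the span start positions have already been collected per term into a map from document id to start positions; the Lucene calls that would produce those maps are left out, and every name and value here is hypothetical.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;
import java.util.TreeSet;

public class SpanStartIntersection {

    /** Keeps, per document, only the start positions that every term shares. */
    static Map<Integer, Set<Integer>> intersect(List<Map<Integer, Set<Integer>>> startsPerTerm) {
        Map<Integer, Set<Integer>> result = new TreeMap<>();
        if (startsPerTerm.isEmpty()) {
            return result;
        }
        for (Map.Entry<Integer, Set<Integer>> entry : startsPerTerm.get(0).entrySet()) {
            Set<Integer> shared = new TreeSet<>(entry.getValue());
            for (Map<Integer, Set<Integer>> other : startsPerTerm.subList(1, startsPerTerm.size())) {
                shared.retainAll(other.getOrDefault(entry.getKey(), Collections.<Integer>emptySet()));
            }
            if (!shared.isEmpty()) {
                result.put(entry.getKey(), shared); // this document has co-located matches
            }
        }
        return result;
    }

    /** Hypothetical span starts for "plan" and "_n" across three documents. */
    static Map<Integer, Set<Integer>> demo() {
        Map<Integer, Set<Integer>> plan = new TreeMap<>();
        plan.put(1, new TreeSet<>(Arrays.asList(3, 9)));
        plan.put(2, new TreeSet<>(Arrays.asList(4)));
        Map<Integer, Set<Integer>> noun = new TreeMap<>();
        noun.put(1, new TreeSet<>(Arrays.asList(3)));
        noun.put(2, new TreeSet<>(Arrays.asList(5)));
        noun.put(3, new TreeSet<>(Arrays.asList(0)));
        return intersect(Arrays.asList(plan, noun));
    }

    public static void main(String[] args) {
        System.out.println(demo()); // {1=[3]}
    }
}
```

Comparing start positions is enough because each single-term span ends exactly one position after it starts.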

This is definitely more work than it should be, I think. I'm still looking
for another way.





ctignor at thinkmap

Nov 23, 2009, 8:27 AM

Post #6 of 17
Re: SpanQuery for Terms at same position [In reply to]

A slop of -1 doesn't work either; no results are returned.

This would be a *really* helpful feature for me if someone could suggest an
implementation. I would like to be able to do arbitrary span searches where
tokens may be at the same position as well as at other positions, with the
ordering of subsequent terms restricted as per the normal span API.

thanks,




paul.elschot at xs4all

Nov 23, 2009, 8:56 AM

Post #7 of 17
Re: SpanQuery for Terms at same position [In reply to]

On Monday 23 November 2009 17:27:56, Christopher Tignor wrote:
> A slop of -1 doesn't work either. I get no results returned.

I think the problem is in the NearSpansOrdered.docSpansOrdered methods.
Could you replace the < by <= in there (4 times) and try again?
That will allow spans at the same position to be considered ordered.
From a quick reading of the code both the unordered and ordered cases
might work for a slop of -1 with that modification.
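
The effect of swapping < for <= can be sketched in plain Java. This is a simplified stand-in for the ordering check, not a copy of NearSpansOrdered: it assumes the check compares start positions first and falls back to end positions when the starts are equal. The class and method names are hypothetical.

```java
public class OrderedCheckModel {

    /** Strict check: span 1 must start earlier, or start together and end earlier. */
    static boolean orderedStrict(int start1, int end1, int start2, int end2) {
        return (start1 == start2) ? (end1 < end2) : (start1 < start2);
    }

    /** Relaxed check with <=: two identical spans also count as ordered. */
    static boolean orderedRelaxed(int start1, int end1, int start2, int end2) {
        return (start1 == start2) ? (end1 <= end2) : (start1 <= start2);
    }

    public static void main(String[] args) {
        // Two single-term spans stacked on position 7, each ending at 8.
        System.out.println(orderedStrict(7, 8, 7, 8));  // false
        System.out.println(orderedRelaxed(7, 8, 7, 8)); // true
        // Spans on consecutive positions are ordered under both checks.
        System.out.println(orderedStrict(6, 7, 7, 8));  // true
        System.out.println(orderedRelaxed(6, 7, 7, 8)); // true
    }
}
```

A side effect worth checking before relying on the relaxed form: an ordered "A followed by B" query would then also accept A and B stacked on one position.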

>
> this would be a *really* helpful feature for me if someone might suggest an
> implementation as I would really like to be able to do arbitrary span
> searches where tokens may be at the same position and also in other
> positions where the ordering of subsequent terms may be restricted as per
> the normal span API.

My pleasure,
Paul Elschot



markrmiller at gmail

Nov 23, 2009, 8:59 AM

Post #8 of 17
Re: SpanQuery for Terms at same position [In reply to]

You're trying -1 with an ordered query, right? Try it with an unordered one.



ctignor at thinkmap

Nov 23, 2009, 9:26 AM

Post #9 of 17
Re: SpanQuery for Terms at same position [In reply to]

Thanks so much for this.

Using an un-ordered query, the -1 slop indeed returns the correct results,
matching tokens at the same position.

I tried the same query as an ordered query, both before and after rebuilding
the source with Paul's changes to NearSpansOrdered, but it still failed,
returning no results.




ctignor at thinkmap

Nov 23, 2009, 11:07 AM

Post #10 of 17
Re: SpanQuery for Terms at same position [In reply to]

Also, I noticed that with the above edit to NearSpansOrdered I am getting
erroneous results for normal ordered searches such as:

"_n" followed by "work"

Because "_n" and "work" are at the same position, the changed code accepts
their pairing as a valid in-order result now that the equal-to clause has
been added to the inequality.




paul.elschot at xs4all

Nov 23, 2009, 2:50 PM

Post #11 of 17
Re: SpanQuery for Terms at same position [In reply to]

On Monday 23 November 2009 20:07:58, Christopher Tignor wrote:
> Also, I noticed that with the above edit to NearSpansOrdered I am getting
> erroneous results for normal ordered searches such as:
>
> "_n" followed by "work"
>
> Because "_n" and "work" are at the same position, the changed code accepts
> their pairing as a valid in-order result now that the equal-to clause has
> been added to the inequality.

Thanks for trying this. Indeed the "followed by" semantics is broken for
the ordered case when spans at the same positions are considered
ordered.

Did I understand correctly that the unordered case with a slop of -1
and without the edit works to match terms at the same position?
In that case it may be worthwhile to add that to the javadocs,
and also add a few testcases.

Regards,
Paul Elschot



ctignor at thinkmap

Nov 24, 2009, 6:17 AM

Post #12 of 17
Re: SpanQuery for Terms at same position [In reply to]

Yes, that indeed works for me.

thanks,


On Mon, Nov 23, 2009 at 5:50 PM, Paul Elschot <paul.elschot [at] xs4all> wrote:

> On Monday 23 November 2009 20:07:58, Christopher Tignor wrote:
> > Also, I noticed that with the above edit to NearSpansOrdered I am getting
> > erroneous results for normal ordered searches such as:
> >
> > "_n" followed by "work"
> >
> > Because "_n" and "work" are at the same position, the code change
> > accepts their pairing as a valid in-order result now that the equal-to
> > clause has been added to the inequality.
>
> Thanks for trying this. Indeed, the "followed by" semantics are broken for
> the ordered case when spans at the same position are considered
> ordered.
>
> Did I understand correctly that the unordered case with a slop of -1
> and without the edit works to match terms at the same position?
> In that case it may be worthwhile to add that to the javadocs,
> and also to add a few test cases.
>
> Regards,
> Paul Elschot
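
For readers following along, here is a minimal sketch of the arithmetic behind the two observations quoted above. It is a toy model in plain Java, not Lucene code: the class SamePositionSpans and its methods are hypothetical names, a single term at position p is assumed to be the span [p, p + 1), and relaxedOrdered only assumes that the NearSpansOrdered edit under discussion amounts to turning a strict "<" into "<=". In Lucene itself the unordered query would be built with the SpanNearQuery(SpanQuery[] clauses, int slop, boolean inOrder) constructor, passing -1 and false.

```java
// Toy model of span proximity checks; NOT Lucene code.
// A span covers token positions [start, end), so one term at p is [p, p + 1).
final class SamePositionSpans {

    /** Unordered slop of two spans: enclosing window width minus combined span length. */
    static int unorderedSlop(int start1, int end1, int start2, int end2) {
        int windowStart = Math.min(start1, start2);
        int windowEnd = Math.max(end1, end2);
        int totalLength = (end1 - start1) + (end2 - start2);
        return (windowEnd - windowStart) - totalLength;
    }

    /** Unordered match: the measured slop must not exceed the allowed slop. */
    static boolean matchesUnordered(int s1, int e1, int s2, int e2, int allowedSlop) {
        return unorderedSlop(s1, e1, s2, e2) <= allowedSlop;
    }

    /** Strict "followed by": the first span must start before the second. */
    static boolean strictlyOrdered(int start1, int start2) {
        return start1 < start2;
    }

    /** Assumed shape of the discussed edit: equal starts also count as ordered. */
    static boolean relaxedOrdered(int start1, int start2) {
        return start1 <= start2;
    }

    public static void main(String[] args) {
        // "plan" and "_v" both at position 4: spans [4,5) and [4,5).
        System.out.println(unorderedSlop(4, 5, 4, 5));        // -1
        // Adjacent terms at positions 4 and 5.
        System.out.println(unorderedSlop(4, 5, 5, 6));        // 0
        // A slop of -1 rejects adjacent terms.
        System.out.println(matchesUnordered(4, 5, 5, 6, -1)); // false
        // "_n" followed by "work", both at position 4.
        System.out.println(strictlyOrdered(4, 4));            // false
        System.out.println(relaxedOrdered(4, 4));             // true
    }
}
```

In this model two terms at the same position measure a slop of -1 and adjacent terms measure 0, which is why a slop of 0 is still too permissive. The relaxed ordering check accepts "_n" followed by "work" at one position, which is the erroneous match reported above.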


ctignor at thinkmap

Nov 25, 2009, 12:20 PM

Post #13 of 17 (1585 views)
Re: SpanQuery for Terms at same position

It's worth noting, however, that this -1 slop doesn't seem to work for cases
where you want to discover instances of more than two terms at the same
position. It would be nice to be able to set this explicitly when
constructing the query.

thanks,



paul.elschot at xs4all

Nov 25, 2009, 1:25 PM

Post #14 of 17 (1590 views)
Re: SpanQuery for Terms at same position

On Wednesday 25 November 2009 21:20:33, Christopher Tignor wrote:
> It's worth noting, however, that this -1 slop doesn't seem to work for cases
> where you want to discover instances of more than two terms at the same
> position. It would be nice to be able to set this explicitly when
> constructing the query.

I think requiring n terms at the same position would need a slop of 1-n,
and I'd like to have some test cases added for that.
Now if I only had some time...

Regards,
Paul Elschot
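
The 1-n rule suggested above can be checked with a little arithmetic. The sketch below is a plain-Java model, not Lucene code: CoincidentTermSlop, measuredSlop and slopForSamePosition are made-up names, and it assumes that every term occupies exactly one position and that the unordered slop is the width of the window enclosing all terms minus the number of terms.

```java
import java.util.Arrays;

// Toy model of the "1 - n" slop rule; NOT Lucene code.
final class CoincidentTermSlop {

    /** Measured unordered slop for single-term spans; each span is [p, p + 1). */
    static int measuredSlop(int... positions) {
        if (positions.length == 0) {
            throw new IllegalArgumentException("need at least one position");
        }
        int min = Integer.MAX_VALUE;
        int max = Integer.MIN_VALUE;
        for (int p : positions) {
            min = Math.min(min, p);
            max = Math.max(max, p);
        }
        int windowLength = (max + 1) - min;  // enclosing window [min, max + 1)
        int totalLength = positions.length;  // every term span has length 1
        return windowLength - totalLength;
    }

    /** Largest allowed slop that still forces n single-term spans onto one position. */
    static int slopForSamePosition(int n) {
        return 1 - n;
    }

    public static void main(String[] args) {
        for (int n = 2; n <= 4; n++) {
            int[] same = new int[n];
            Arrays.fill(same, 7); // n terms stacked at position 7
            System.out.println(n + " terms: measured slop " + measuredSlop(same)
                    + ", rule gives " + slopForSamePosition(n));
        }
        // Three terms spread over two positions measure -1, which is above 1 - 3 = -2.
        System.out.println(measuredSlop(7, 7, 8));
    }
}
```

Under these assumptions, n stacked terms fill a window of width 1 with total length n, so the measured slop is 1 - n. Any larger allowed slop would also admit terms spread over two or more positions.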



ctignor at thinkmap

Nov 25, 2009, 2:38 PM

Post #15 of 17 (1577 views)
Re: SpanQuery for Terms at same position

My own tests with my own data show that you are correct: a slop of 1-n works
for matching n terms at the same ordinal position.

thanks!
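
As a small illustration of what "terms at the same ordinal position" means, here is a toy in-memory index in plain Java. It is not a Lucene test: ToyPositionIndex is a hypothetical class, the sample sentence is invented, and it assumes the part-of-speech tokens were stacked on their words by giving them a position increment of 0, which is the usual way to place two tokens at one position.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

// Toy position index; NOT Lucene code.
final class ToyPositionIndex {
    private final Map<String, Set<Integer>> positionsByTerm = new HashMap<>();
    private int position = -1;

    /** Adds a token; an increment of 0 stacks it on the previous token's position. */
    void add(String term, int positionIncrement) {
        position += positionIncrement;
        positionsByTerm.computeIfAbsent(term, t -> new TreeSet<>()).add(position);
    }

    private Set<Integer> positionsOf(String term) {
        Set<Integer> positions = positionsByTerm.get(term);
        return positions == null ? Collections.<Integer>emptySet() : positions;
    }

    /** Positions at which every given term occurs. */
    List<Integer> samePosition(String... terms) {
        if (terms.length == 0) {
            throw new IllegalArgumentException("need at least one term");
        }
        Set<Integer> common = new TreeSet<>(positionsOf(terms[0]));
        for (String term : terms) {
            common.retainAll(positionsOf(term));
        }
        return new ArrayList<>(common);
    }

    public static void main(String[] args) {
        ToyPositionIndex index = new ToyPositionIndex();
        // "we plan the plan": first "plan" tagged as a verb, second as a noun.
        index.add("we", 1);
        index.add("plan", 1);
        index.add("_v", 0);
        index.add("the", 1);
        index.add("plan", 1);
        index.add("_n", 0);
        System.out.println(index.samePosition("plan", "_v")); // [1]
        System.out.println(index.samePosition("plan", "_n")); // [3]
        System.out.println(index.samePosition("the", "_v"));  // []
    }
}
```

The positions returned by samePosition are the matches one would expect an unordered span query with a slop of 1-n to select for n terms, which makes a toy like this a convenient reference when writing real test cases.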




erickerickson at gmail

Nov 25, 2009, 4:49 PM

Post #16 of 17 (1577 views)
Re: SpanQuery for Terms at same position

Hmmm, are they unit tests? Or would you be willing to create stand-alone
unit tests demonstrating this and submit them as a patch?

Best
Erick [at] AlwaysTrollingForWorkFromOthers

> > document
> > > >> > >> position?
> > > >> > >> >>>>
> > > >> > >> >>>> Any help appreciated.
> > > >> > >> >>>>
> > > >> > >> >>>> thanks,
> > > >> > >> >>>>
> > > >> > >> >>>> C>T>
> > > >> > >> >>>>
> > > >> > >> >>>> --
> > > >> > >> >>>> TH!NKMAP
> > > >> > >> >>>>
> > > >> > >> >>>> Christopher Tignor | Senior Software Architect
> > > >> > >> >>>> 155 Spring Street NY, NY 10012
> > > >> > >> >>>> p.212-285-8600 x385 f.212-285-8999
> > > >> > >> >>>>
> > > >> > >> >>>>
> > > >> > >> >>
> > > >>
> ---------------------------------------------------------------------
> > > >> > >> >> To unsubscribe, e-mail:
> > java-user-unsubscribe [at] lucene
> > > >> > >> >> For additional commands, e-mail:
> > java-user-help [at] lucene
> > > >> > >> >>
> > > >> > >> >>
> > > >> > >> >>
> > > >> > >> >
> > > >> > >> >
> > > >> > >> >
> > > >> > >>
> > > >> > >>
> > > >> > >>
> > ---------------------------------------------------------------------
> > > >> > >> To unsubscribe, e-mail:
> java-user-unsubscribe [at] lucene
> > > >> > >> For additional commands, e-mail:
> > java-user-help [at] lucene
> > > >> > >>
> > > >> > >>
> > > >> > >
> > > >> > >
> > > >> > > --
> > > >> > > TH!NKMAP
> > > >> > >
> > > >> > > Christopher Tignor | Senior Software Architect
> > > >> > > 155 Spring Street NY, NY 10012
> > > >> > > p.212-285-8600 x385 f.212-285-8999
> > > >> > >
> > > >> >
> > > >> >
> > > >> >
> > > >> > --
> > > >> > TH!NKMAP
> > > >> >
> > > >> > Christopher Tignor | Senior Software Architect
> > > >> > 155 Spring Street NY, NY 10012
> > > >> > p.212-285-8600 x385 f.212-285-8999
> > > >> >
> > > >>
> > > >>
> ---------------------------------------------------------------------
> > > >> To unsubscribe, e-mail: java-user-unsubscribe [at] lucene
> > > >> For additional commands, e-mail: java-user-help [at] lucene
> > > >>
> > > >>
> > > >
> > > >
> > > > --
> > > > TH!NKMAP
> > > >
> > > > Christopher Tignor | Senior Software Architect
> > > > 155 Spring Street NY, NY 10012
> > > > p.212-285-8600 x385 f.212-285-8999
> > > >
> > >
> > >
> > >
> > > --
> > > TH!NKMAP
> > >
> > > Christopher Tignor | Senior Software Architect
> > > 155 Spring Street NY, NY 10012
> > > p.212-285-8600 x385 f.212-285-8999
> > >
> >
> > ---------------------------------------------------------------------
> > To unsubscribe, e-mail: java-user-unsubscribe [at] lucene
> > For additional commands, e-mail: java-user-help [at] lucene
> >
> >
>
>
> --
> TH!NKMAP
>
> Christopher Tignor | Senior Software Architect
> 155 Spring Street NY, NY 10012
> p.212-285-8600 x385 f.212-285-8999
>


ctignor at thinkmap

Nov 30, 2009, 6:19 AM

Post #17 of 17 (1335 views)
Permalink
Re: SpanQuery for Terms at same position [In reply to]

It would take a bit of work / learning (I haven't used a RAMDirectory
yet) to make them into test cases usable by others, and I am deep into this
project and under the gun right now. But if some time surfaces I will for
sure...

thanks -


On Wed, Nov 25, 2009 at 7:49 PM, Erick Erickson <erickerickson [at] gmail> wrote:

> Hmmm, are they unit tests? Or would you be willing to create stand-alone
> unit tests demonstrating this and submit them as a patch?
>
> Best
> Erick [at] AlwaysTrollingForWorkFromOthers
>
> On Wed, Nov 25, 2009 at 5:38 PM, Christopher Tignor <ctignor [at] thinkmap> wrote:
>
> > My own tests with my own data show you are correct and the 1-n slop works
> > for matching terms at the same ordinal position.
> >
> > thanks!
> >
> > C>T>
> >
> > On Wed, Nov 25, 2009 at 4:25 PM, Paul Elschot <paul.elschot [at] xs4all> wrote:
> >
> > > On Wednesday, 25 November 2009 at 21:20:33, Christopher Tignor wrote:
> > > > It's worth noting, however, that this -1 slop doesn't seem to work for
> > > > cases where you want to discover instances of more than two terms at
> > > > the same position. It would be nice to be able to set this explicitly
> > > > in the query construction.
> > >
> > > I think requiring n terms at the same position would need a slop of 1-n,
> > > and I'd like to have some test cases added for that.
> > > Now if I only had some time...
> > >
> > > Regards,
> > > Paul Elschot
> > >
> > > >
> > > > thanks,
> > > >
> > > > C>T>
> > > > On Tue, Nov 24, 2009 at 9:17 AM, Christopher Tignor <
> > > ctignor [at] thinkmap>wrote:
> > > >
> > > > > yes that indeed works for me.
> > > > >
> > > > > thanks,
> > > > >
> > > > > C>T>
> > > > >
> > > > >
> > > > > On Mon, Nov 23, 2009 at 5:50 PM, Paul Elschot <
> > paul.elschot [at] xs4all
> > > >wrote:
> > > > >
> > > > >> Op maandag 23 november 2009 20:07:58 schreef Christopher Tignor:
> > > > >> > Also, I noticed that with the above edit to NearSpansOrdered I
> am
> > > > >> getting
> > > > >> > erroneous results fo normal ordered searches using searches
> like:
> > > > >> >
> > > > >> > "_n" followed by "work"
> > > > >> >
> > > > >> > where because "_n" and "work" are at the same position the code
> > > changes
> > > > >> > accept their pairing as a valid in-order result now that the
> eqaul
> > > to
> > > > >> clause
> > > > >> > has been added to the inequality.
> > > > >>
> > > > >> Thanks for trying this. Indeed the "followed by" semantics is
> broken
> > > for
> > > > >> the ordered case when spans at the same positions are considered
> > > > >> ordered.
> > > > >>
> > > > >> Did I understand correctly that the unordered case with a slop of
> -1
> > > > >> and without the edit works to match terms at the same position?
> > > > >> In that case it may be worthwhile to add that to the javadocs,
> > > > >> and also add a few testcases.
> > > > >>
> > > > >> Regards,
> > > > >> Paul Elschot
> > > > >>
> > > > >> >
> > > > >> > C>T>
> > > > >> >
> > > > >> > On Mon, Nov 23, 2009 at 12:26 PM, Christopher Tignor
> > > > >> > <ctignor [at] thinkmap>wrote:
> > > > >> >
> > > > >> > > Thanks so much for this.
> > > > >> > >
> > > > >> > > Using an un-ordered query, the -1 slop indeed returns the
> > correct
> > > > >> results,
> > > > >> > > matching tokens at the same position.
> > > > >> > >
> > > > >> > > I tried the same query but ordered both after and before
> > > rebuilding
> > > > >> the
> > > > >> > > source with Paul's changes to NearSpansOrdered but the query
> was
> > > still
> > > > >> > > failing, returning no results.
> > > > >> > >
> > > > >> > > C>T>
> > > > >> > >
> > > > >> > >
> > > > >> > > On Mon, Nov 23, 2009 at 11:59 AM, Mark Miller <
> > > markrmiller [at] gmail
> > > > >> >wrote:
> > > > >> > >
> > > > >> > >> Your trying -1 with ordered right? Try it with non ordered.
> > > > >> > >>
> > > > >> > >> Christopher Tignor wrote:
> > > > >> > >> > A slop of -1 doesn't work either. I get no results
> returned.
> > > > >> > >> >
> > > > >> > >> > this would be a *really* helpful feature for me if someone
> > > might
> > > > >> suggest
> > > > >> > >> an
> > > > >> > >> > implementation as I would really like to be able to do
> > > arbitrary
> > > > >> span
> > > > >> > >> > searches where tokens may be at the same position and also
> in
> > > other
> > > > >> > >> > positions where the ordering of subsequent terms may be
> > > restricted
> > > > >> as
> > > > >> > >> per
> > > > >> > >> > the normal span API.
> > > > >> > >> >
> > > > >> > >> > thanks,
> > > > >> > >> >
> > > > >> > >> > C>T>
> > > > >> > >> >
> > > > >> > >> > On Sun, Nov 22, 2009 at 7:50 AM, Paul Elschot <
> > > > >> paul.elschot [at] xs4all
> > > > >> > >> >wrote:
> > > > >> > >> >
> > > > >> > >> >
> > > > >> > >> >> Op zondag 22 november 2009 04:47:50 schreef Adriano
> > Crestani:
> > > > >> > >> >>
> > > > >> > >> >>> Hi,
> > > > >> > >> >>>
> > > > >> > >> >>> I didn't test, but you might want to try SpanNearQuery
> and
> > > set
> > > > >> slop to
> > > > >> > >> >>>
> > > > >> > >> >> zero.
> > > > >> > >> >>
> > > > >> > >> >>> Give it a try and let me know if it worked.
> > > > >> > >> >>>
> > > > >> > >> >> The slop is the number of positions "in between", so zero
> > > would
> > > > >> still
> > > > >> > >> be
> > > > >> > >> >> too
> > > > >> > >> >> much to only match at the same position.
> > > > >> > >> >>
> > > > >> > >> >> SpanNearQuery may or may not work for a slop of -1, but
> one
> > > could
> > > > >> try
> > > > >> > >> >> that for both the ordered and unordered cases.
> > > > >> > >> >> One way to do that is to start from the existing test
> cases.
> > > > >> > >> >>
> > > > >> > >> >> Regards,
> > > > >> > >> >> Paul Elschot
> > > > >> > >> >>
> > > > >> > >> >>
> > > > >> > >> >>> Regards,
> > > > >> > >> >>> Adriano Crestani
> > > > >> > >> >>>
> > > > >> > >> >>> On Thu, Nov 19, 2009 at 7:28 PM, Christopher Tignor <
> > > > >> > >> >>>
> > > > >> > >> >> ctignor [at] thinkmap>wrote:
> > > > >> > >> >>
> > > > >> > >> >>>> Hello,
> > > > >> > >> >>>>
> > > > >> > >> >>>> I would like to search for all documents that contain
> both
> > > > >> "plan" and
> > > > >> > >> >>>>
> > > > >> > >> >> "_v"
> > > > >> > >> >>
> > > > >> > >> >>>> (my part of speech token for verb) at the same position.
> > > > >> > >> >>>> I have tokenized the documents accordingly so these
> tokens
> > > > >> exists at
> > > > >> > >> >>>>
> > > > >> > >> >> the
> > > > >> > >> >>
> > > > >> > >> >>>> same location.
> > > > >> > >> >>>>
> > > > >> > >> >>>> I can achieve programaticaly using PhraseQueries by
> adding
> > > the
> > > > >> Terms
> > > > >> > >> >>>> explicitly at the same position but I need to be able to
> > > recover
> > > > >> the
> > > > >> > >> >>>> Payload
> > > > >> > >> >>>> data for each
> > > > >> > >> >>>> term found within the matched instance of my query.
> > > > >> > >> >>>>
> > > > >> > >> >>>> Unfortunately the PayloadSpanUtil doesn't seem to return
> > the
> > > > >> same
> > > > >> > >> >>>>
> > > > >> > >> >> results
> > > > >> > >> >>
> > > > >> > >> >>>> as
> > > > >> > >> >>>> the PhraseQuery, possibly becuase it is converting it
> > inoto
> > > > >> Spans
> > > > >> > >> first
> > > > >> > >> >>>> which do not support searching for Terms at the same
> > > document
> > > > >> > >> position?
> > > > >> > >> >>>>
> > > > >> > >> >>>> Any help appreciated.
> > > > >> > >> >>>>
> > > > >> > >> >>>> thanks,
> > > > >> > >> >>>>
> > > > >> > >> >>>> C>T>
> > > > >> > >> >>>>
> > > > >> > >> >>>> --
> > > > >> > >> >>>> TH!NKMAP
> > > > >> > >> >>>>
> > > > >> > >> >>>> Christopher Tignor | Senior Software Architect
> > > > >> > >> >>>> 155 Spring Street NY, NY 10012
> > > > >> > >> >>>> p.212-285-8600 x385 f.212-285-8999
> > > > >> > >> >>>>
> > > > >> > >> >>>>
> > > > >> > >> >>
> > > > >>
> > ---------------------------------------------------------------------
> > > > >> > >> >> To unsubscribe, e-mail:
> > > java-user-unsubscribe [at] lucene
> > > > >> > >> >> For additional commands, e-mail:
> > > java-user-help [at] lucene
> > > > >> > >> >>
> > > > >> > >> >>
> > > > >> > >> >>
> > > > >> > >> >
> > > > >> > >> >
> > > > >> > >> >
> > > > >> > >>
> > > > >> > >>
> > > > >> > >>
> > > ---------------------------------------------------------------------
> > > > >> > >> To unsubscribe, e-mail:
> > java-user-unsubscribe [at] lucene
> > > > >> > >> For additional commands, e-mail:
> > > java-user-help [at] lucene
> > > > >> > >>
> > > > >> > >>
> > > > >> > >
> > > > >> > >
> > > > >> > > --
> > > > >> > > TH!NKMAP
> > > > >> > >
> > > > >> > > Christopher Tignor | Senior Software Architect
> > > > >> > > 155 Spring Street NY, NY 10012
> > > > >> > > p.212-285-8600 x385 f.212-285-8999
> > > > >> > >
> > > > >> >
> > > > >> >
> > > > >> >
> > > > >> > --
> > > > >> > TH!NKMAP
> > > > >> >
> > > > >> > Christopher Tignor | Senior Software Architect
> > > > >> > 155 Spring Street NY, NY 10012
> > > > >> > p.212-285-8600 x385 f.212-285-8999
> > > > >> >
> > > > >>
> > > > >>
> > ---------------------------------------------------------------------
> > > > >> To unsubscribe, e-mail: java-user-unsubscribe [at] lucene
> > > > >> For additional commands, e-mail: java-user-help [at] lucene
> > > > >>
> > > > >>
> > > > >
> > > > >
> > > > > --
> > > > > TH!NKMAP
> > > > >
> > > > > Christopher Tignor | Senior Software Architect
> > > > > 155 Spring Street NY, NY 10012
> > > > > p.212-285-8600 x385 f.212-285-8999
> > > > >
> > > >
> > > >
> > > >
> > > > --
> > > > TH!NKMAP
> > > >
> > > > Christopher Tignor | Senior Software Architect
> > > > 155 Spring Street NY, NY 10012
> > > > p.212-285-8600 x385 f.212-285-8999
> > > >
> > >
> > > ---------------------------------------------------------------------
> > > To unsubscribe, e-mail: java-user-unsubscribe [at] lucene
> > > For additional commands, e-mail: java-user-help [at] lucene
> > >
> > >
> >
> >
> > --
> > TH!NKMAP
> >
> > Christopher Tignor | Senior Software Architect
> > 155 Spring Street NY, NY 10012
> > p.212-285-8600 x385 f.212-285-8999
> >
>



--
TH!NKMAP

Christopher Tignor | Senior Software Architect
155 Spring Street NY, NY 10012
p.212-285-8600 x385 f.212-285-8999
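
For readers following the slop arithmetic in this thread, here is a minimal
sketch of why n terms at the same position would need a slop of 1-n. It is a
simplified stand-alone model, not Lucene code: it assumes every term is a span
of length 1 covering [position, position + 1), and that the unordered check
compares (max end - min start - total span length) against the allowed slop.
The class and method names are hypothetical. Under those assumptions, a slop
of -1 with three terms also accepts two stacked terms plus an adjacent one,
which may be why -1 appeared not to work for more than two terms.

```java
import java.util.Arrays;

/**
 * Simplified model of an unordered "near" check (an assumption, not Lucene's
 * actual implementation). Each term is a span [position, position + 1).
 */
class SamePositionSlopSketch {

    /** Slop consumed by single-term spans found at the given positions. */
    static int consumedSlop(int[] positions) {
        int minStart = Arrays.stream(positions).min().getAsInt();
        int maxEnd = Arrays.stream(positions).max().getAsInt() + 1;
        int totalLength = positions.length; // each term span has length 1
        return (maxEnd - minStart) - totalLength;
    }

    /** Unordered match: consumed slop must not exceed the allowed slop. */
    static boolean matchesUnordered(int[] positions, int allowedSlop) {
        return consumedSlop(positions) <= allowedSlop;
    }

    public static void main(String[] args) {
        // Two terms: slop -1 accepts only the same-position case.
        System.out.println(matchesUnordered(new int[] {4, 4}, -1)); // true
        System.out.println(matchesUnordered(new int[] {4, 5}, -1)); // false

        // Three terms: slop -1 is too loose, slop 1 - 3 = -2 is exact.
        System.out.println(matchesUnordered(new int[] {4, 4, 5}, -1)); // true
        System.out.println(matchesUnordered(new int[] {4, 4, 5}, -2)); // false
        System.out.println(matchesUnordered(new int[] {4, 4, 4}, -2)); // true
    }
}
```

In Lucene itself this would presumably correspond to building the query with
new SpanNearQuery(clauses, 1 - clauses.length, false), but that is not
exercised by the sketch above and should be checked against real test cases.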
