
Mailing List Archive: Lucene: Java-User

customized SpanQuery Payload usage


ctignor at thinkmap

Nov 24, 2009, 6:56 AM

Post #1 of 3
customized SpanQuery Payload usage

Hello,

For certain span queries that I construct programmatically by piecing together
my own SpanTermQueries, I would like to ensure that Payload data is not
returned for matches on the specific terms used by the constituent
SpanTermQueries.

For example, if I search for a positional match with a SpanQuery referencing
the tokens "_n" and "work", and there is Payload data for each (there needs
to be, for other types of queries), I would like to be able to screen out the
payload data originating from any matched "_n" tokens.
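
To make the requirement concrete, here is a toy model in plain Java with no Lucene classes. `ScreenedNearPayloads`, `Hit` and `nearPayloads` are hypothetical names, and the proximity rule (positions differ by at most `slop`) is a simplification, not Lucene's exact slop semantics. Both terms must still occur for a match, but payloads from screened terms are left out. The catch, as the later posts bring out, is that the payloads of a near match do not arrive tagged with the term they came from, so a filter like this cannot simply be applied while iterating the spans.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Toy model only: no Lucene classes are used, and every name here is hypothetical.
public class ScreenedNearPayloads {

    /** One occurrence of a term in a single document, with the payload stored there. */
    public static final class Hit {
        final String term;
        final int position;
        final String payload;

        public Hit(String term, int position, String payload) {
            this.term = term;
            this.position = position;
            this.payload = payload;
        }
    }

    /**
     * Pairs each hit in left with each hit in right whose positions differ by at most
     * slop (a simplified proximity rule, not Lucene's exact slop semantics) and collects
     * the payloads of every matching pair, skipping payloads of terms in screened.
     * Screened terms must still be present for a pair to match.
     */
    public static List<String> nearPayloads(List<Hit> left, List<Hit> right,
                                            int slop, Set<String> screened) {
        List<String> payloads = new ArrayList<String>();
        for (Hit a : left) {
            for (Hit b : right) {
                if (Math.abs(a.position - b.position) <= slop) {
                    if (!screened.contains(a.term)) {
                        payloads.add(a.payload);
                    }
                    if (!screened.contains(b.term)) {
                        payloads.add(b.payload);
                    }
                }
            }
        }
        return payloads;
    }

    public static void main(String[] args) {
        List<Hit> underscoreN = Collections.singletonList(new Hit("_n", 3, "p_n@3"));
        List<Hit> work = Arrays.asList(new Hit("work", 4, "p_work@4"),
                                       new Hit("work", 20, "p_work@20"));
        Set<String> screened = new HashSet<String>(Collections.singletonList("_n"));
        // "_n"@3 pairs with "work"@4 only; the "_n" payload itself is screened out.
        System.out.println(nearPayloads(underscoreN, work, 1, screened)); // prints [p_work@4]
    }
}
```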

For the tokens whose payload data I am not interested in, I thought I might
simply create an anonymous subclass of SpanTermQuery that overrides getSpans
and returns an anonymous subclass of TermSpans that simply overrides
isPayloadAvailable to return false:

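new SpanTermQuery(new Term(myField, myTokenString)) {
    @Override
    public Spans getSpans(IndexReader reader) throws IOException {
        // 'term' is the Term this SpanTermQuery was built with
        return new TermSpans(reader.termPositions(term), term) {
            @Override
            public boolean isPayloadAvailable() {
                return false; // claim no payloads for this term's spans
            }
        };
    }
});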

However, this seems to eliminate payload data for all matches. I'm not sure
why, and I am tracing through the code, looking at NearSpansUnordered.

Any thoughts?

thanks so much,

C>T>


--
TH!NKMAP

Christopher Tignor | Senior Software Architect
155 Spring Street NY, NY 10012
p.212-285-8600 x385 f.212-285-8999


gsingers at apache

Nov 25, 2009, 5:10 AM

Post #2 of 3
Re: customized SpanQuery Payload usage

On Nov 24, 2009, at 9:56 AM, Christopher Tignor wrote:

> Hello,
>
> For certain span queries I construct programmatically by piecing together my
> own SpanTermQueries, I would like to ensure that Payload data is not
> returned for matches on those specific terms used by the constituent
> SpanTermQueries.

I'm not sure I follow. For the terms whose payloads you don't want, why can't you just avoid retrieving them? Span queries themselves do not require payloads for execution. Can you share your code for iterating over the spans?


--------------------------
Grant Ingersoll
http://www.lucidimagination.com/

Search the Lucene ecosystem (Lucene/Solr/Nutch/Mahout/Tika/Droids) using Solr/Lucene:
http://www.lucidimagination.com/search


---------------------------------------------------------------------
To unsubscribe, e-mail: java-user-unsubscribe [at] lucene
For additional commands, e-mail: java-user-help [at] lucene


ctignor at thinkmap

Nov 25, 2009, 6:26 AM

Post #3 of 3
Re: customized SpanQuery Payload usage

The problem is that I need to be able to match spans resulting from a
SpanNearQuery with the Term they came from, so that I can suppress Payloads
from certain Terms on a query-by-query basis.

I still need such a term to affect the results of a SpanNearQuery as usual;
I just need to know, when iterating over the resulting Spans, that a span
originating from a certain Term should not have its payload data loaded.

I recently solved the problem fairly simply after much research into the
source. When I am building the query and encounter a term I don't want to
recover payload data from, I add my own anonymous subclass of SpanTermQuery
to the SpanNearQuery I am building. That subclass creates an anonymous
subclass of TermSpans which simply returns an empty Collection for its
payload data.

new SpanTermQuery(new Term(QueryVocabTracker.CONTENT_FIELD, tagToken)) {
    @Override
    public Spans getSpans(IndexReader reader) throws IOException {
        return new TermSpans(reader.termPositions(term), term) {
            @Override
            public Collection getPayload() throws IOException {
                // no payload data for this TermSpans; its positions still count
                // toward the enclosing SpanNearQuery match
                return Collections.emptyList();
            }
        };
    }
}
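
The effect of this override can be sketched with a toy model in plain Java. `PayloadSpan`, `span`, `withoutPayloads`, `isNear` and `matchPayloads` are hypothetical stand-ins, not Lucene's `Spans` API, and the sketch assumes (as the result reported here suggests) that the enclosing near match gathers payloads by asking each sub-span for its `getPayload()`. Because the wrapper still reports the real position, the "_n" clause keeps counting toward the match while contributing nothing to the payload list.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collection;
import java.util.Collections;
import java.util.List;

// Toy model only: these types are hypothetical, not Lucene classes.
public class EmptyPayloadWrapper {

    /** Cut-down stand-in for one positioned sub-span. */
    public interface PayloadSpan {
        int start();
        Collection<String> getPayload();
    }

    /** A sub-span sitting at one position and carrying one payload. */
    public static PayloadSpan span(final int start, final String payload) {
        return new PayloadSpan() {
            public int start() { return start; }
            public Collection<String> getPayload() { return Collections.singletonList(payload); }
        };
    }

    /** Same idea as the anonymous TermSpans above: keep the position, hide the payload. */
    public static PayloadSpan withoutPayloads(final PayloadSpan delegate) {
        return new PayloadSpan() {
            public int start() { return delegate.start(); }
            public Collection<String> getPayload() { return Collections.emptyList(); }
        };
    }

    /** Simplified proximity test: every clause starts within slop positions of the first. */
    public static boolean isNear(List<PayloadSpan> clauses, int slop) {
        int first = clauses.get(0).start();
        for (PayloadSpan c : clauses) {
            if (Math.abs(c.start() - first) > slop) {
                return false;
            }
        }
        return true;
    }

    /** The toy near match concatenates whatever each clause reports as its payload. */
    public static List<String> matchPayloads(List<PayloadSpan> clauses) {
        List<String> all = new ArrayList<String>();
        for (PayloadSpan c : clauses) {
            all.addAll(c.getPayload());
        }
        return all;
    }

    public static void main(String[] args) {
        List<PayloadSpan> clauses = Arrays.asList(
                withoutPayloads(span(3, "p_n")), // "_n": still positioned, no payload
                span(4, "p_work"));              // "work": payload kept
        System.out.println(isNear(clauses, 1));      // prints true
        System.out.println(matchPayloads(clauses)); // prints [p_work]
    }
}
```

Making the choice per clause when the query is built means the code iterating the spans never needs to know which term a payload came from.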

thanks,



