
Mailing List Archive: Lucene: Java-User

synonym payload boosting

 

 



davidginzburg at gmail

Nov 8, 2009, 5:23 AM

Post #1 of 3 (479 views)
synonym payload boosting

Hi,
I have a field and a weighted synonym map.
I have indexed the synonyms with the weight as a payload.
Here is the relevant snippet from my filter:

public Token next(final Token reusableToken) throws IOException
    .
    .
    .
    Payload boostPayload;

    for (Synonym synonym : syns) {

        Token newTok = new Token(nToken.startOffset(), nToken.endOffset(), "SYNONYM");
        newTok.setTermBuffer(synonym.getToken().toCharArray(), 0, synonym.getToken().length());
        // set the position increment to zero
        // this tells lucene the synonym is
        // in the exact same location as the originating word
        newTok.setPositionIncrement(0);
        boostPayload = new Payload(PayloadHelper.encodeFloat(synonym.getWieght()));
        newTok.setPayload(boostPayload);

I have put it in the index-time analyzer. This is my field definition:

<fieldType name="PersonName" class="solr.TextField" positionIncrementGap="100">
  <analyzer type="index">
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords.txt"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="com.digitaltrowel.solr.DTSynonymFactory"
            FreskoFunction="names_with_scoresPipe23Columns.txt" ignoreCase="true" expand="false"/>

    <!--<filter class="solr.EnglishPorterFilterFactory" protected="protwords.txt"/>-->
    <!--<filter class="solr.RemoveDuplicatesTokenFilterFactory"/>-->
  </analyzer>
  <analyzer type="query">
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <!--<filter class="com.digitaltrowel.solr.DTSynonymFactory"
            synonyms="synonyms.txt" ignoreCase="true" expand="false"/>-->
    <filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords.txt"/>
    <!--<filter class="solr.EnglishPorterFilterFactory" protected="protwords.txt"/>-->
    <!--<filter class="solr.RemoveDuplicatesTokenFilterFactory"/>-->
  </analyzer>
</fieldType>


My similarity class is:

public class BoostingSymilarity extends DefaultSimilarity {

    public BoostingSymilarity() {
        super();
    }

    @Override
    public float scorePayload(String field, byte[] payload, int offset, int length) {
        // Decode from the offset passed in; the payload may not start at index 0.
        return PayloadHelper.decodeFloat(payload, offset);
    }

    // The remaining factors are neutralized so that only the payload affects the score.
    @Override
    public float coord(int overlap, int maxoverlap) {
        return 1.0f;
    }

    @Override
    public float idf(int docFreq, int numDocs) {
        return 1.0f;
    }

    @Override
    public float lengthNorm(String fieldName, int numTerms) {
        return 1.0f;
    }

    @Override
    public float tf(float freq) {
        return 1.0f;
    }
}

My problem is that the scorePayload method does not get called at search time,
unlike the other methods in my similarity class.
I tested and verified this with breakpoints.
What am I doing wrong?
I am using Solr 1.3 and am considering the payload boost support in Solr 1.4.
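
[Editor's sketch] The float payload round trip used by the filter and by scorePayload can be illustrated without Lucene on the classpath. The class below is a stand-in, not Lucene's PayloadHelper: the names PayloadFloatSketch, encodeFloat and decodeFloat are illustrative, and the layout (the float's bits written as four big-endian bytes) is an assumption about how such a helper would typically store the value. It also shows why decoding should start at the offset handed to scorePayload when the payload sits inside a larger buffer.

```java
final class PayloadFloatSketch {

    // Write the float's IEEE 754 bits as four big-endian bytes (assumed layout).
    static byte[] encodeFloat(float value) {
        int bits = Float.floatToIntBits(value);
        return new byte[] {
            (byte) (bits >> 24), (byte) (bits >> 16), (byte) (bits >> 8), (byte) bits
        };
    }

    // Read four big-endian bytes starting at offset and rebuild the float.
    static float decodeFloat(byte[] bytes, int offset) {
        int bits = ((bytes[offset] & 0xFF) << 24)
                 | ((bytes[offset + 1] & 0xFF) << 16)
                 | ((bytes[offset + 2] & 0xFF) << 8)
                 |  (bytes[offset + 3] & 0xFF);
        return Float.intBitsToFloat(bits);
    }

    public static void main(String[] args) {
        byte[] payload = encodeFloat(0.75f);

        // Simulate a shared buffer in which the payload does not start at index 0.
        byte[] buffer = new byte[8];
        System.arraycopy(payload, 0, buffer, 3, payload.length);

        System.out.println(decodeFloat(payload, 0)); // payload in its own array
        System.out.println(decodeFloat(buffer, 3));  // payload inside a larger buffer
    }
}
```

Decoding the shared buffer at index 0 instead of 3 would read the wrong bytes, which is the case the offset parameter exists to handle.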




simon.willnauer at googlemail

Nov 8, 2009, 5:46 AM

Post #2 of 3 (442 views)
Re: synonym payload boosting

You might get an answer on the Solr list. This is the Lucene users list.

Simon



iorixxx at yahoo

Nov 8, 2009, 5:52 AM

Post #3 of 3 (429 views)
Re: synonym payload boosting

Additionally, you need to modify your query parser to return payload-aware queries such as BoostingTermQuery, PayloadTermQuery, or PayloadNearQuery.

The scorePayload method is invoked when these types of queries are used.

Hope this helps.
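
[Editor's sketch] The point above can be modelled in a few lines of plain Java. This is a toy model, not Lucene code: Posting, CountingSimilarity, plainTermScore and payloadTermScore are hypothetical names invented for illustration, and averaging the payload scores is a simplifying assumption. It only shows the shape of the problem: a scoring path that never reads payloads has no reason to call scorePayload, whatever the similarity class overrides.

```java
final class PayloadScoringSketch {

    /** One indexed term occurrence with its payload bytes (hypothetical type). */
    static final class Posting {
        final String term;
        final byte[] payload;

        Posting(String term, float weight) {
            this.term = term;
            int bits = Float.floatToIntBits(weight);
            this.payload = new byte[] {
                (byte) (bits >> 24), (byte) (bits >> 16), (byte) (bits >> 8), (byte) bits
            };
        }
    }

    /** Stand-in for a similarity; counts how often the payload hook runs. */
    static final class CountingSimilarity {
        int scorePayloadCalls;

        float tf(float freq) {
            return 1.0f;
        }

        float scorePayload(byte[] payload, int offset, int length) {
            scorePayloadCalls++;
            int bits = ((payload[offset] & 0xFF) << 24)
                     | ((payload[offset + 1] & 0xFF) << 16)
                     | ((payload[offset + 2] & 0xFF) << 8)
                     |  (payload[offset + 3] & 0xFF);
            return Float.intBitsToFloat(bits);
        }
    }

    /** Plain term scoring: looks only at term frequency, never at payloads. */
    static float plainTermScore(Posting[] doc, String term, CountingSimilarity sim) {
        int freq = 0;
        for (Posting p : doc) {
            if (p.term.equals(term)) {
                freq++;
            }
        }
        return freq == 0 ? 0.0f : sim.tf(freq);
    }

    /** Payload-aware scoring: multiplies tf by the average payload score. */
    static float payloadTermScore(Posting[] doc, String term, CountingSimilarity sim) {
        int freq = 0;
        float payloadSum = 0.0f;
        for (Posting p : doc) {
            if (p.term.equals(term)) {
                freq++;
                payloadSum += sim.scorePayload(p.payload, 0, p.payload.length);
            }
        }
        return freq == 0 ? 0.0f : sim.tf(freq) * (payloadSum / freq);
    }

    public static void main(String[] args) {
        Posting[] doc = { new Posting("david", 1.0f), new Posting("dave", 0.5f) };
        CountingSimilarity sim = new CountingSimilarity();

        System.out.println("plain score:   " + plainTermScore(doc, "dave", sim));
        System.out.println("payload calls: " + sim.scorePayloadCalls);
        System.out.println("payload score: " + payloadTermScore(doc, "dave", sim));
        System.out.println("payload calls: " + sim.scorePayloadCalls);
    }
}
```

In this model the call counter stays at zero after the plain path and only increases once the payload-aware path runs, which matches the symptom described in the original question.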




