Mailing List Archive: Lucene: Java-User

Indexing TREC GOV2 data in Lucene

jakedsouza88 at gmail

Apr 11, 2012, 10:45 PM

Post #1 of 2
Indexing TREC GOV2 data in Lucene

Hi all,

I am working on a project on static index pruning, and I am using the TREC
GOV2 collection. I have seen that TREC data can be parsed and that the
necessary Java files are present in the contrib package, but has anyone
used Lucene to index the GOV2 dataset, or is there source code available
for it?

Regards
Jake Dsouza


hany at eecs

Apr 12, 2012, 2:07 AM

Post #2 of 2
Re: Indexing TREC GOV2 data in Lucene

Hi,

I am not sure whether there is something in contrib for GOV2 specifically,
but it really depends on what you want to parse. If you are only interested
in full-text search, it should be similar to parsing a regular document
while being conscious of the TREC-specific delimiters: each record is
wrapped in <DOC> ... </DOC> and identified by a <DOCNO> tag.

However, if you want to perform structured search and maintain indexes over
different fields, such as titles, this will require some customization.
Likewise, if you want to store the anchor text separately and perform some
sort of link resolution and PageRank-style ranking, you will again need to
customize your parsing.
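As a rough illustration of the parsing described above, here is a minimal, dependency-free Java sketch that splits TREC-format GOV2 text into records. It assumes the bundle files have already been decompressed and that each record follows the layout <DOC>, <DOCNO>, an HTTP header block in <DOCHDR> ... </DOCHDR>, then the raw page HTML, then </DOC>. The names TrecGov2Splitter, TrecDoc and split are hypothetical (not part of Lucene or contrib), and the regular expressions are only a stand-in for a real HTML parser, so script/style content and malformed markup are not handled.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper (not part of Lucene): splits decompressed TREC/GOV2-style
// text into records and pulls out a docno, <title>, anchor texts and plain body text.
public class TrecGov2Splitter {

    /** One record; field names are illustrative, not a Lucene or TREC API. */
    public static final class TrecDoc {
        public final String docNo;
        public final String title;
        public final List<String> anchors;
        public final String text;

        TrecDoc(String docNo, String title, List<String> anchors, String text) {
            this.docNo = docNo;
            this.title = title;
            this.anchors = anchors;
            this.text = text;
        }
    }

    // DOTALL lets ".*?" span line breaks; the lazy quantifier stops at the first closing tag.
    private static final Pattern DOC = Pattern.compile("<DOC>(.*?)</DOC>", Pattern.DOTALL);
    private static final Pattern DOCNO = Pattern.compile("<DOCNO>\\s*(.*?)\\s*</DOCNO>", Pattern.DOTALL);
    private static final Pattern DOCHDR = Pattern.compile("<DOCHDR>.*?</DOCHDR>", Pattern.DOTALL);
    private static final Pattern TITLE =
            Pattern.compile("<title[^>]*>(.*?)</title>", Pattern.DOTALL | Pattern.CASE_INSENSITIVE);
    private static final Pattern ANCHOR =
            Pattern.compile("<a\\s[^>]*>(.*?)</a>", Pattern.DOTALL | Pattern.CASE_INSENSITIVE);
    private static final Pattern TAG = Pattern.compile("<[^>]+>");

    public static List<TrecDoc> split(String trecText) {
        List<TrecDoc> docs = new ArrayList<>();
        Matcher doc = DOC.matcher(trecText);
        while (doc.find()) {
            String record = doc.group(1);
            Matcher no = DOCNO.matcher(record);
            String docNo = no.find() ? no.group(1) : "";
            // Drop the DOCNO tag and the HTTP header block; what is left is the page HTML.
            String html = DOCHDR.matcher(DOCNO.matcher(record).replaceFirst("")).replaceFirst("");
            Matcher t = TITLE.matcher(html);
            String title = t.find() ? clean(t.group(1)) : "";
            // Outgoing anchor texts of this page (href targets are not resolved here).
            List<String> anchors = new ArrayList<>();
            Matcher a = ANCHOR.matcher(html);
            while (a.find()) {
                String anchorText = clean(a.group(1));
                if (!anchorText.isEmpty()) {
                    anchors.add(anchorText);
                }
            }
            docs.add(new TrecDoc(docNo, title, anchors, clean(html)));
        }
        return docs;
    }

    // Replaces every tag with a space and collapses runs of whitespace.
    private static String clean(String s) {
        return TAG.matcher(s).replaceAll(" ").replaceAll("\\s+", " ").trim();
    }

    public static void main(String[] args) {
        // Illustrative input only; not taken from the real collection.
        String sample = "<DOC>\n<DOCNO>GX000-00-0000000</DOCNO>\n"
                + "<DOCHDR>\nhttp://www.example.gov/\nHTTP/1.1 200 OK\n</DOCHDR>\n"
                + "<html><head><title>Example Page</title></head>"
                + "<body>Hello <a href=\"/next\">next page</a></body></html>\n</DOC>\n";
        for (TrecDoc d : split(sample)) {
            // prints: GX000-00-0000000 | Example Page | [next page] | Example Page Hello next page
            System.out.println(d.docNo + " | " + d.title + " | " + d.anchors + " | " + d.text);
        }
    }
}
```

For plain full-text search, indexing docNo and text would be enough; for structured search, each TrecDoc could be mapped onto separate Lucene fields (for example a stored, untokenized docno field plus title, body and anchor fields). Note that the anchors collected here are the outgoing links of each page; attaching anchor text to the pages it points to would also require capturing the href targets and resolving them across the collection.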

h.
