
Mailing List Archive: Lucene: Java-User

Wikipedia revision history dump + lucene benchmark

 

 



zpvie at yahoo

Apr 10, 2012, 10:33 AM


The wikipedia.alg algorithm in the benchmark module can only extract and
index current-pages dumps; it does not take revisions into account. Do you
know of any way to index the revision history? Or should I change
EnwikiContentSource to handle revisions?
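
To make the question concrete, here is a minimal sketch of what reading a
revision history dump could look like, assuming the usual MediaWiki export
layout in which each <page> holds one <title> and several <revision>
elements, each with a <timestamp> and a <text>. It uses only the JDK StAX
parser. RevisionDumpReader and Revision are hypothetical names, not part of
Lucene, and this is not how EnwikiContentSource is implemented.

```java
import java.io.Reader;
import java.util.function.Consumer;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamReader;

/** Hypothetical helper: emits one record per revision instead of one per page. */
public class RevisionDumpReader {

    /** One revision of one page (hypothetical holder, not a Lucene class). */
    public static final class Revision {
        public final String title;
        public final String timestamp;
        public final String text;

        Revision(String title, String timestamp, String text) {
            this.title = title;
            this.timestamp = timestamp;
            this.text = text;
        }
    }

    /** Streams the dump and hands every revision to the sink as soon as it is complete. */
    public static void read(Reader in, Consumer<Revision> sink) throws XMLStreamException {
        XMLInputFactory factory = XMLInputFactory.newInstance();
        // Merge adjacent character events so element text arrives in one piece.
        factory.setProperty(XMLInputFactory.IS_COALESCING, Boolean.TRUE);
        XMLStreamReader xml = factory.createXMLStreamReader(in);

        String title = null;      // title of the page currently being read
        String timestamp = null;  // fields of the revision currently being read
        String text = null;
        boolean inRevision = false;

        while (xml.hasNext()) {
            int event = xml.next();
            if (event == XMLStreamConstants.START_ELEMENT) {
                String name = xml.getLocalName();
                if (name.equals("revision")) {
                    inRevision = true;
                    timestamp = null;
                    text = null;
                } else if (name.equals("title")) {
                    title = xml.getElementText();
                } else if (inRevision && name.equals("timestamp")) {
                    timestamp = xml.getElementText();
                } else if (inRevision && name.equals("text")) {
                    text = xml.getElementText();
                }
            } else if (event == XMLStreamConstants.END_ELEMENT
                    && xml.getLocalName().equals("revision")) {
                // A revision is finished: emit it under the enclosing page title.
                sink.accept(new Revision(title, timestamp, text));
                inRevision = false;
            }
        }
        xml.close();
    }
}
```

A revision-aware content source could presumably turn each emitted record
into one benchmark document, with the revision timestamp as the document
date. That mapping is an assumption on my side and is not shown here.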

Although Wikipedia dumps are widely used, especially for research purposes,
as far as I know there are no topics/qrels for them, except the set at
http://www.mpi-inf.mpg.de/~kberberi/ecir2010/ for the 2001 - 2005 revision
history dump, which is annotated based on temporal expressions. Do you know
of any others?

By the way, I think that in wikipedia.alg the setting
query.maker=org.apache.lucene.benchmark.byTask.feeds.ReutersQueryMaker
should be replaced by EnwikiQueryMaker.

Thanks in advance,
Best regards
--
ZP

--
View this message in context: http://lucene.472066.n3.nabble.com/Wikipedia-revision-history-dump-lucene-benchmark-tp3900346p3900346.html
Sent from the Lucene - Java Users mailing list archive at Nabble.com.
