
zpvie at yahoo
Apr 10, 2012, 10:33 AM
Wikipedia revision history dump + lucene benchmark
The wikipedia.alg algorithm in the benchmark module can only extract and index current-pages dumps; it does not take revisions into account. Do you know of any way to do this, or should I change EnwikiContentSource to handle revisions?

Although Wikipedia dumps are widely used, especially for research purposes, as far as I know there are no topics/qrels for them, except the set at http://www.mpi-inf.mpg.de/~kberberi/ecir2010/ for the 2001 - 2005 revision history dump, which is annotated based on temporal expressions. Do you know of any others?

By the way, I think that in wikipedia.alg, query.maker=org.apache.lucene.benchmark.byTask.feeds.ReutersQueryMaker should be replaced by EnwikiQueryMaker.

Thanks in advance,
Best regards
--
ZP

View this message in context: http://lucene.472066.n3.nabble.com/Wikipedia-revision-history-dump-lucene-benchmark-tp3900346p3900346.html
Sent from the Lucene - Java Users mailing list archive at Nabble.com.
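
To make the revision question concrete, here is a minimal sketch of how a full-history dump could be read so that every revision, rather than every page, becomes one indexable unit. It is not the code of EnwikiContentSource and does not use any Lucene API; it relies only on the JDK's StAX parser. The names RevisionDumpSketch, Revision, and parse are hypothetical, and the element names (page, title, revision, timestamp, text) are assumed to follow the MediaWiki XML export layout. A modified content source would presumably do something similar and turn each emitted revision into a benchmark document.

```java
import java.io.Reader;
import java.io.StringReader;
import java.util.function.Consumer;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamReader;

// Hypothetical helper, not part of Lucene: streams a MediaWiki history dump
// and emits one Revision per <revision> element.
public class RevisionDumpSketch {

    /** One indexable unit: a single revision of a page (hypothetical holder type). */
    public static final class Revision {
        public final String title;
        public final String timestamp;
        public final String text;

        Revision(String title, String timestamp, String text) {
            this.title = title;
            this.timestamp = timestamp;
            this.text = text;
        }
    }

    /** Reads the dump from 'in' and passes each revision to 'sink' as soon as it is complete. */
    public static void parse(Reader in, Consumer<Revision> sink) throws XMLStreamException {
        XMLStreamReader xml = XMLInputFactory.newInstance().createXMLStreamReader(in);
        try {
            String title = null;      // title of the page currently being read
            String timestamp = null;  // timestamp of the current revision
            String text = null;       // wiki text of the current revision
            boolean inRevision = false;
            while (xml.hasNext()) {
                int event = xml.next();
                if (event == XMLStreamConstants.START_ELEMENT) {
                    String name = xml.getLocalName();
                    if (name.equals("title")) {
                        title = xml.getElementText();
                    } else if (name.equals("revision")) {
                        inRevision = true;
                        timestamp = null;
                        text = null;
                    } else if (inRevision && name.equals("timestamp")) {
                        timestamp = xml.getElementText();
                    } else if (inRevision && name.equals("text")) {
                        text = xml.getElementText();
                    }
                } else if (event == XMLStreamConstants.END_ELEMENT
                        && xml.getLocalName().equals("revision")) {
                    // A page may contain many revisions; each one is emitted separately.
                    sink.accept(new Revision(title, timestamp, text));
                    inRevision = false;
                }
            }
        } finally {
            xml.close();
        }
    }

    public static void main(String[] args) throws XMLStreamException {
        // Tiny made-up sample in the assumed dump layout: one page, two revisions.
        String sample =
            "<mediawiki><page><title>Example</title>"
          + "<revision><id>1</id><timestamp>2001-05-01T00:00:00Z</timestamp><text>first</text></revision>"
          + "<revision><id>2</id><timestamp>2003-07-15T12:00:00Z</timestamp><text>second</text></revision>"
          + "</page></mediawiki>";
        parse(new StringReader(sample), r -> System.out.println(r.title + " @ " + r.timestamp));
    }
}
```

The callback style is a deliberate choice in this sketch: full-history dumps are far larger than current-pages dumps, so collecting all revisions in memory before indexing is unlikely to be practical.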