
Mailing List Archive: Lucene: Java-User

Re: Indexing Wikipedia with Solr/Lucene

 

 



orenbochman at gmail

May 13, 2012, 2:59 PM


In wikitext, external links look like [url description].
A category declaration takes the form [[Category:*Category name*]] or
[[Category:*Category name*|*Sortkey*]].
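
As a rough sketch of how those two patterns could be pulled out of the article text before indexing, plain java.util.regex may be enough to start with. The class name WikiMarkupExtractor, its method names and the sample text are made up for illustration. The patterns assume simple, non-nested markup, http/https/ftp URLs and an exactly spelled "Category:" prefix; real dumps have more cases (templates, localized namespaces, nowiki sections).

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper, not part of Lucene or Solr.
final class WikiMarkupExtractor {

    // Matches [[Category:Name]] and [[Category:Name|Sortkey]]; group 1 is the name.
    private static final Pattern CATEGORY =
            Pattern.compile("\\[\\[Category:([^\\]|]+)(?:\\|[^\\]]*)?\\]\\]");

    // Matches [url] and [url description] with a single opening bracket;
    // group 1 is the URL. The lookbehind skips [[internal links]].
    private static final Pattern EXTERNAL_LINK =
            Pattern.compile("(?<!\\[)\\[((?:https?|ftp)://[^\\s\\]]+)(?:\\s+[^\\]]*)?\\]");

    // Returns the category names in order of appearance, without sort keys.
    static List<String> categories(String wikiText) {
        return firstGroups(CATEGORY, wikiText);
    }

    // Returns the target URLs of bracketed external links, without descriptions.
    static List<String> externalLinks(String wikiText) {
        return firstGroups(EXTERNAL_LINK, wikiText);
    }

    // Collects the trimmed first capture group of every match.
    private static List<String> firstGroups(Pattern pattern, String text) {
        List<String> result = new ArrayList<>();
        Matcher m = pattern.matcher(text);
        while (m.find()) {
            result.add(m.group(1).trim());
        }
        return result;
    }

    public static void main(String[] args) {
        String sample = "Lucene is a search library. "
                + "[http://lucene.apache.org Apache Lucene] "
                + "[[Category:Search engines|Lucene]] [[Category:Java]]";
        System.out.println(categories(sample));
        System.out.println(externalLinks(sample));
    }
}
```

The two lists could then be added to each document as separate multi-valued fields, either in your own indexing code or in a custom analyzer/transformer.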

Writing an analyzer is pretty simple!

Good luck!

On Sun, May 13, 2012 at 8:55 PM, vineet yadav
<vineet.yadav.iiit [at] gmail> wrote:

> Hi all,
> I want to create a Lucene/Solr index of the Wikipedia XML dump. I used the
> Solr example
> (http://wiki.apache.org/solr/DataImportHandler#Example:_Indexing_wikipedia)
> to index the dump. Since categories and external links are part of the
> Wikipedia article text, I am not able to index them separately. I want to
> index categories, external links, etc. separately and store them in
> separate fields.
> Would anyone please be kind enough to give me a bit of advice?
> Thanks
> Vineet Yadav


--

Oren Bochman

Office tel. 061 4921492
Mobile +36 30 866 6706
skype id: orenbochman
e-mail: oren [at] romai-horizon
site http://www.riverport.hu
