
Mailing List Archive: Wikipedia: Wikitech

Convert XML to HTML?


chengbinzheng at gmail

Jul 18, 2009, 8:23 PM

Post #1 of 3
Convert XML to HTML?

Since the static HTML Wikipedia is not being updated (please update it),
while the XML dumps are updated almost every day, the logical choice is to go
with XML. Is there any way to convert the XML to HTML, like the static HTML
version? I need it in HTML, and I don't want a year-old version of Wikipedia,
or all the useless information on user talk pages, discussions, etc.
Thank you.
_______________________________________________
Wikitech-l mailing list
Wikitech-l[at]lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikitech-l


a at foo

Jul 19, 2009, 2:30 AM

Post #2 of 3
Re: Convert XML to HTML?

On Sun, Jul 19, 2009 at 5:23 AM, Chengbin Zheng<chengbinzheng[at]gmail.com> wrote:

There are plenty of options for converting the XML (or just the MediaWiki
markup) to HTML, for example:

- http://sourceforge.net/apps/mediawiki/wikiprep/index.php?title=Main_Page
(the parser is decent, but there is currently no real full-featured HTML
export)

- http://wiki.laptop.org/go/Wiki_Slice (it does not use the XML as its
source; it strips down the output of ?action=raw instead)

- https://projects.fslab.de/projects/wpofflineclient/wiki/Specifications
(this one also uses the raw action)
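Both of those tools fetch a page's wikitext through MediaWiki's action=raw instead of reading the XML dump. As a rough sketch, assuming English Wikipedia's index.php endpoint (swap in your own wiki's URL), the request URL for a page can be built like this in Python:

```python
from urllib.parse import urlencode

# Assumed endpoint: English Wikipedia's index.php; change it for other wikis.
INDEX_PHP = "https://en.wikipedia.org/w/index.php"

def raw_url(title: str) -> str:
    """Return the URL that serves a page's raw wikitext via action=raw."""
    # MediaWiki page titles use underscores in place of spaces;
    # urlencode percent-escapes anything else that needs it (e.g. "&").
    params = {"title": title.replace(" ", "_"), "action": "raw"}
    return f"{INDEX_PHP}?{urlencode(params)}"

print(raw_url("Main Page"))
# prints: https://en.wikipedia.org/w/index.php?title=Main_Page&action=raw
```

Actually downloading the text (for example with `urllib.request.urlopen`) needs network access, so it is left out of this sketch.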

(There is a nice article on how to build a static version of Wikipedia:
http://users.softlab.ece.ntua.gr/~ttsiod/buildWikipediaOffline.html)

There is also a nice list of all the available parsers (usually converting
MediaWiki markup to something else):

http://www.mediawiki.org/wiki/Alternative_parsers
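To give a rough idea of what those parsers do, here is a toy Python converter that handles only a tiny subset of MediaWiki markup: headings, bold, italic and internal links. It is not one of the parsers on that list, and it ignores templates, tables, references, lists and much more; the one-file-per-article `Target_Name.html` link scheme is a hypothetical choice for a static copy.

```python
import html
import re

def _link(match):
    # [[Target|label]] or [[Target]]; the label defaults to the target.
    target = match.group(1)
    label = match.group(2) or target
    # Hypothetical naming scheme: one "Target_Name.html" file per article.
    href = target.replace(" ", "_").replace('"', "&quot;") + ".html"
    return f'<a href="{href}">{label}</a>'

def wikitext_to_html(text: str) -> str:
    """Convert a tiny subset of MediaWiki markup to HTML (toy example)."""
    out = []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        # Escape &, < and > first; quote=False leaves apostrophes alone
        # so the ''italic'' and '''bold''' markers still match below.
        line = html.escape(line, quote=False)
        heading = re.fullmatch(r"(={2,6})\s*(.+?)\s*\1", line)
        if heading:
            level = len(heading.group(1))
            out.append(f"<h{level}>{heading.group(2)}</h{level}>")
            continue
        # Bold (three apostrophes) must be replaced before italic (two).
        line = re.sub(r"'''(.+?)'''", r"<b>\1</b>", line)
        line = re.sub(r"''(.+?)''", r"<i>\1</i>", line)
        line = re.sub(r"\[\[([^|\]]+)(?:\|([^\]]+))?\]\]", _link, line)
        # Simplification: every non-blank line becomes its own paragraph.
        out.append(f"<p>{line}</p>")
    return "\n".join(out)

print(wikitext_to_html("== History ==\n'''Paris''' is in [[France]]."))
```

A real converter has to deal with nested templates and many edge cases, which is why reusing one of the listed parsers is usually the better route.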

Regarding the XML format, you usually want to scan through the XML, look for
each <page> start tag and its matching </page> end tag to get the page, and
then look for the <text> element containing the raw page in MediaWiki markup.
Once you have extracted the latest revision of a page in MediaWiki format, you
can use any of the existing MediaWiki markup parsers.
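The extraction step described above can be sketched in Python with the standard library's streaming `xml.etree.ElementTree.iterparse`, so the whole dump never has to fit in memory. This is an illustration, not one of the tools listed; the `SKIP_PREFIXES` tuple is a hypothetical way to drop talk and user pages by title, and it assumes that when a page has several revisions the newest one comes last.

```python
import io
import xml.etree.ElementTree as ET

# Hypothetical list of title prefixes to skip (talk and user pages);
# adjust it for the wiki and language you are working with.
SKIP_PREFIXES = ("Talk:", "User:", "User talk:", "Wikipedia talk:")

def local_name(tag):
    # Dump elements are namespaced, e.g. "{http://...}page";
    # keep only the part after the closing brace.
    return tag.rsplit("}", 1)[-1]

def iter_pages(source):
    """Yield (title, wikitext) for each <page> while streaming the dump."""
    title, text = None, None
    for _event, elem in ET.iterparse(source, events=("end",)):
        name = local_name(elem.tag)
        if name == "title":
            title = elem.text
        elif name == "text":
            # Each <revision> overwrites the previous text; this assumes
            # the newest revision is the last one inside the <page>.
            text = elem.text or ""
        elif name == "page":
            if title and not title.startswith(SKIP_PREFIXES):
                yield title, text
            title, text = None, None
            elem.clear()  # drop the finished page's children to save memory

# Tiny illustrative sample; the namespace URI is only an example.
SAMPLE = b"""<mediawiki xmlns="http://www.mediawiki.org/xml/export-0.3/">
  <page><title>Paris</title>
    <revision><text>'''Paris''' is a city.</text></revision>
  </page>
  <page><title>User talk:Example</title>
    <revision><text>Hello!</text></revision>
  </page>
</mediawiki>"""

for title, wikitext in iter_pages(io.BytesIO(SAMPLE)):
    print(title, "->", wikitext)
# prints: Paris -> '''Paris''' is a city.
```

For a real dump, `bz2.open(path)` from the standard library can replace `io.BytesIO`, where `path` is whichever `.xml.bz2` file you downloaded; each yielded wikitext string can then be handed to a markup parser.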

Hope this helps,

adulau

--
-- Alexandre Dulaunoy (adulau) -- http://www.foo.be/
-- http://www.foo.be/cgi-bin/wiki.pl/Diary
-- "Knowledge can create problems, it is not through ignorance
-- that we can solve them" Isaac Asimov



chengbinzheng at gmail

Jul 19, 2009, 6:25 AM

Post #3 of 3
Re: Convert XML to HTML?

On Sun, Jul 19, 2009 at 5:30 AM, Alexandre Dulaunoy <a[at]foo.be> wrote:


Hi Alexandre

Thank you so much for your response! Do you have a method (or preferably a
GUI) for converting XML to HTML that doesn't take insane computer skills? I
have no idea how to do this. I'm simply a 15-year-old student who wants a
copy of Wikipedia on my Archos 5. I don't have the time to learn it.
Thanks.