
Mailing List Archive: Wikipedia: Wikitech

MediaWiki to Latex Converter

 

 



hugo at bluewatersys

Dec 12, 2004, 1:18 PM

Post #1 of 15 (391 views)
Permalink
MediaWiki to Latex Converter

Hi everyone,

I recently set up a MediaWiki (http://server.bluewatersys.com/w90n740/)
and I need to extract the content from it and convert it into LaTeX
syntax for printed documentation. I have googled for a suitable OSS
solution but nothing was apparent.

I would prefer a script written in Python, but any recommendations
would be very welcome.

Do you know of anything suitable?

Kind Regards,
Hugo Vincent,
Bluewater Systems.


magnus.manske at web

Dec 12, 2004, 3:13 PM

Post #2 of 15 (372 views)
Permalink
Re: MediaWiki to Latex Converter [In reply to]

Hugo Vincent wrote:

> Hi everyone,
>
> I recently set up a MediaWiki
> (http://server.bluewatersys.com/w90n740/) and I need to extract the
> content from it and convert it into LaTeX syntax for printed
> documentation. I have googled for a suitable OSS solution but nothing
> was apparent.
>
> I would prefer a script written in Python, but any recommendations
> would be very welcome.
>
> Do you know of anything suitable?


I don't know an existing solution, *but* you could help with the
wiki2xml parser (bison format) which was started by Timwi.

It should be (relatively) easy to convert from XML to LaTeX within the
MediaWiki software. I have already started a demo XML-to-XHTML parser
(in CVS HEAD). The output could be adjusted to generate LaTeX, PDF, RTF,
or even wiki code (wikitext beautifier!).

That would be a long-term investment, so to speak, but I'm certain it
will pay off.

Magnus


elian at djini

Dec 12, 2004, 5:35 PM

Post #3 of 15 (383 views)
Permalink
Re: MediaWiki to Latex Converter [In reply to]

Hiho,

Hugo Vincent wrote:

> I recently set up a MediaWiki (http://server.bluewatersys.com/w90n740/)
> and I need to extract the content from it and convert it into LaTeX
> syntax for printed documentation. I have googled for a suitable OSS
> solution but nothing was apparent.

http://sourceforge.net/projects/wikipdf/
http://de.wikipedia.org/wiki/Wikipedia:PDF-Generator
it generates latex files and runs it through pdflatex.

I don't know if it is usable in the current state, though (last time I
checked it missed table and image support)

greetings,
elian


hugo at bluewatersys

Dec 13, 2004, 1:06 PM

Post #4 of 15 (375 views)
Permalink
Re: MediaWiki to Latex Converter [In reply to]

Thanks everyone,

I decided to write my own, using reg-ex substitutions, done in Python.
It's about 90% there - I will post it online somewhere when I am done.
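
Hugo's script is not shown in this thread. As a rough illustration of what conversion by regex substitution can look like, here is a minimal sketch in Python. The rule set, the heading mapping (`==` to `\section`, `===` to `\subsection`) and the function name `wikitext_to_latex` are assumptions made for this example, and it covers only headings, bold, italics, internal links and LaTeX special characters.

```python
import re

# Characters that are special in LaTeX and need a backslash escape.
_LATEX_SPECIALS = re.compile(r"([&%$#_{}])")

# Ordered (pattern, replacement) pairs. Order matters: '===' must be tried
# before '==', and bold (''') before italic ('').
_RULES = [
    (re.compile(r"^===[ \t]*(.+?)[ \t]*===[ \t]*$", re.M), r"\\subsection{\1}"),
    (re.compile(r"^==[ \t]*(.+?)[ \t]*==[ \t]*$", re.M), r"\\section{\1}"),
    (re.compile(r"'''(.+?)'''"), r"\\textbf{\1}"),
    (re.compile(r"''(.+?)''"), r"\\textit{\1}"),
    # [[Target|label]] -> label, [[Target]] -> Target
    (re.compile(r"\[\[(?:[^\]|]*\|)?([^\]]+)\]\]"), r"\1"),
]


def wikitext_to_latex(text):
    """Convert a small subset of wikitext to LaTeX by regex substitution."""
    # Escape first, so the braces inserted by the rules below stay untouched.
    text = _LATEX_SPECIALS.sub(r"\\\1", text)
    for pattern, replacement in _RULES:
        text = pattern.sub(replacement, text)
    return text


if __name__ == "__main__":
    sample = "== Intro ==\nThis is '''bold''' and ''italic'' at 50%."
    print(wikitext_to_latex(sample))
```

An approach like this handles flat markup well enough, but it has no notion of nesting, which is where tables, templates and overlapping tags become difficult.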

Kind Regards,
Hugo Vincent.


hunniger at cip

Jun 16, 2012, 1:51 AM

Post #5 of 15 (350 views)
Permalink
Re: MediaWiki to Latex Converter [In reply to]

Hugo Vincent <hugo <at> bluewatersys.com> writes:

>
> Hi everyone,
>
> I recently set up a MediaWiki (http://server.bluewatersys.com/w90n740/)
> and I need to extract the content from it and convert it into LaTeX
> syntax for printed documentation. I have googled for a suitable OSS
> solution but nothing was apparent.
>
> I would prefer a script written in Python, but any recommendations
> would be very welcome.
>
> Do you know of anything suitable?
>
> Kind Regards,
> Hugo Vincent,
> Bluewater Systems.
>

This problem is actually solved: there is an easy way to export MediaWiki
articles to LaTeX and PDF.

see http://de.wikibooks.org/wiki/Benutzer:Dirk_Huenniger/wb2pdf

Yours Dirk Hünniger




_______________________________________________
Wikitech-l mailing list
Wikitech-l [at] lists
https://lists.wikimedia.org/mailman/listinfo/wikitech-l


svippy at gmail

Jun 16, 2012, 3:03 AM

Post #6 of 15 (349 views)
Permalink
Re: MediaWiki to Latex Converter [In reply to]

On 16 June 2012 10:51, Dirk Hünniger <hunniger [at] cip> wrote:

> This problem is actually solved: there is an easy way to export MediaWiki
> articles to LaTeX and PDF.
>
> see http://de.wikibooks.org/wiki/Benutzer:Dirk_Huenniger/wb2pdf

Interesting, but why is it so large? Is the source code available?



hunniger at cip

Jun 16, 2012, 3:25 AM

Post #7 of 15 (347 views)
Permalink
Re: MediaWiki to Latex Converter [In reply to]

On 06/16/2012 12:03 PM, Svip wrote:
> Interesting, but why is it so large? Is the source code available?
The source code is available here

http://wb2pdf.svn.sourceforge.net/viewvc/wb2pdf/

The binary is large because it contains everything necessary to compile
the generated LaTeX code, which is basically a full installation of MiKTeX.
Yours Dirk Hünniger



Platonides at gmail

Jun 16, 2012, 8:53 AM

Post #8 of 15 (338 views)
Permalink
Re: MediaWiki to Latex Converter [In reply to]

On 16/06/12 10:51, Dirk Hünniger wrote:
> This problem is actually solved: there is an easy way to export MediaWiki
> articles to LaTeX and PDF.
>
> see http://de.wikibooks.org/wiki/Benutzer:Dirk_Huenniger/wb2pdf
>
> Yours Dirk Hünniger

How does it compare with
http://www.mediawiki.org/wiki/Extension:Wiki2LaTeX ?

Also, are you aware you're replying to an 8-year-old thread?





hunniger at cip

Jun 16, 2012, 9:31 AM

Post #9 of 15 (338 views)
Permalink
Re: MediaWiki to Latex Converter [In reply to]

On 06/16/2012 05:53 PM, Platonides wrote:
> On 16/06/12 10:51, Dirk Hünniger wrote:
>> This problem is actually solved: there is an easy way to export MediaWiki
>> articles to LaTeX and PDF.
>>
>> see http://de.wikibooks.org/wiki/Benutzer:Dirk_Huenniger/wb2pdf
>>
>> Yours Dirk Hünniger
>
> How does it compare with
> http://www.mediawiki.org/wiki/Extension:Wiki2LaTeX ?
>
I invested much more time in the development, so it is probably more
complete. If you really want to know, I can make a feature-by-feature
list. But it's going to be very long.

Just to give you an idea of how deeply I went into detail, here is a
question I had to think about. If a table is very wide, it has to be
landscape, but if it is a nested one it must not. And if it is very long
it has to span several pages. And if it begins with a set of consecutive
rows each containing at least one header cell, those rows have to
be repeated at the top of each new page of the table. And by the way, what
happens if these cells contain footnotes?

Sounds like fun?
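
The rules above can be written down as a small decision function. This is only a sketch of the stated rules, not wb2pdf's code: the `Table` and `Layout` types, the width estimate and both thresholds are hypothetical, and the footnote question is left open here as well.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Table:
    header_flags: List[bool]  # one entry per row: True if the row has a header cell
    est_width_cm: float       # estimated natural width of the table (assumed input)
    nested: bool = False      # True if the table sits inside another table's cell


@dataclass
class Layout:
    landscape: bool
    multipage: bool
    repeated_header_rows: int


def leading_header_rows(header_flags: List[bool]) -> int:
    """Count the unbroken run of header-bearing rows at the top of the table."""
    count = 0
    for has_header in header_flags:
        if not has_header:
            break
        count += 1
    return count


def plan_layout(table: Table, page_width_cm: float = 16.0,
                rows_per_page: int = 40) -> Layout:
    """Apply the rules from the post; both thresholds are illustrative guesses."""
    # Very wide tables go landscape, but nested tables never do.
    landscape = table.est_width_cm > page_width_cm and not table.nested
    # Very long tables span several pages.
    multipage = len(table.header_flags) > rows_per_page
    # Leading header rows are repeated on each new page of a multi-page table.
    repeated = leading_header_rows(table.header_flags) if multipage else 0
    return Layout(landscape, multipage, repeated)
```

For example, a 52-row table that is 25 cm wide and starts with two header rows would come out as landscape, multi-page, with two repeated rows; the same table nested inside another one would not be set in landscape.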

An important advantage for the user is that you can immediately use it
with Wikipedia, Wikibooks, etc.
This is because it runs on the client side.

On the other hand Wiki2LaTeX runs on the server side. That means it
needs to be installed by the administrator of the Wiki.

I will also provide a server side version of my software if requested to
do so.

Yours Dirk



Platonides at gmail

Jun 16, 2012, 9:49 AM

Post #10 of 15 (340 views)
Permalink
Re: MediaWiki to Latex Converter [In reply to]

On 16/06/12 12:25, Dirk Hünniger wrote:
> On 06/16/2012 12:03 PM, Svip wrote:
>> Interesting, but why is it so large? Is the source code available?
> The source code is available here
>
> http://wb2pdf.svn.sourceforge.net/viewvc/wb2pdf/
>
> The binary is large because it contains everything necessary to compile
> the generated LaTeX code, which is basically a full installation of MiKTeX.
> Yours Dirk Hünniger

Have you heard of dependencies?
You have to download a 364M file, which extracts to 898M.
Of those, 94M are Linux-specific. The rest includes miktex files, object
files, dlls, exes, imagemagick, tcl/tk, Olson db...
The real code seems to lie at trunk/wb2pdf/trunk/src, being just 4MB.

And if we look at the Linux version, it isn't better. Not only does it
place everything into a /usr/bin subfolder, it also copies everything (90M)
to /tmp on each run. Completely oblivious of security.
Running this program on a shared system is a vulnerability in itself.

Why don't you make a package with just the wb2pdf specific files?
Also, temporary build files are not needed on a release.
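
The thread does not show how wb2pdf stages its files, so the following is only a sketch of the usual way to avoid the /tmp concern raised above, assuming the risk is a predictable, shared location. Python's `tempfile.TemporaryDirectory` creates a directory with an unpredictable name that only the creating user can access, and removes it afterwards. The function name `run_in_private_workdir` and the `wiki2tex-` prefix are made up for this example.

```python
import os
import shutil
import tempfile


def run_in_private_workdir(resource_dir, job):
    """Stage resource_dir in a fresh private directory, run job on it, clean up.

    `job` is any callable taking the path of the staged copy.
    """
    # The directory gets a random name; it is deleted when the block exits,
    # even if job() raises.
    with tempfile.TemporaryDirectory(prefix="wiki2tex-") as workdir:
        staged = os.path.join(workdir, "resources")
        shutil.copytree(resource_dir, staged)
        return job(staged)
```

Installing read-only resources once under /usr/lib or /usr/share would avoid the copy entirely; a private temporary directory is then only needed for files that change during a run.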




hunniger at cip

Jun 16, 2012, 10:14 AM

Post #11 of 15 (335 views)
Permalink
Re: MediaWiki to Latex Converter [In reply to]

On 06/16/2012 06:49 PM, Platonides wrote:
> Have you heard of dependencies?
> You have to download a 364M file, which extracts to 898M.
> Of those, 94M are Linux-specific. The rest includes miktex files, object
> files, dlls, exes, imagemagick, tcl/tk, Olson db...
> The real code seems to lie at trunk/wb2pdf/trunk/src, being just 4MB.
>
> And if we look at the Linux version, it isn't better. Not only does it
> place everything into a /usr/bin subfolder, it also copies everything (90M)
> to /tmp on each run. Completely oblivious of security.
> Running this program on a shared system is a vulnerability in itself.
>
> Why don't you make a package with just the wb2pdf specific files?
> Also, temporary build files are not needed on a release.


I provide one download that is easy to use for any user of both Linux and
Windows. Thus it obviously contains files unnecessary for each of the
two operating systems. I have heard of dependencies, and the .deb
contains a lot of them, and they are downloaded when it is installed. I
can produce a higher quality .deb file. It will still be 90MByte because
I need a full Unicode font. To be precise, I need twelve variants of it,
and that's the 90MByte. I essentially did the tmp trick in order to get
around the work of researching where to install each file, properly
fixing the path names in the code, and testing that. So for now you
can run the software and test every feature you want, and if you or
somebody else decides s/he wants to use it, I will make a .deb file that
fits your needs. This will probably take two weeks, with most of the
time being spent on choosing proper directories.

Yours Dirk



Platonides at gmail

Jun 16, 2012, 4:50 PM

Post #12 of 15 (332 views)
Permalink
Re: MediaWiki to Latex Converter [In reply to]

On 16/06/12 19:14, Dirk Hünniger wrote:
> On 06/16/2012 06:49 PM, Platonides wrote:
>> Have you heard of dependencies? You have to download a 364M file, which
>> extracts to 898M. Of those, 94M are Linux-specific. The rest includes
>> miktex files, object files, dlls, exes, imagemagick, tcl/tk, Olson
>> db... The real code seems to lie at trunk/wb2pdf/trunk/src, being just
>> 4MB.
>> And if we look at the Linux version, it isn't better. Not only does it
>> place everything into a /usr/bin subfolder, it also copies everything
>> (90M) to /tmp on each run. Completely oblivious of security. Running
>> this program on a shared system is a vulnerability in itself.
>> Why don't you make a package with just the wb2pdf specific files? Also,
>> temporary build files are not needed on a release.
>
>
> I provide one download that is easy to use for any user of both Linux and
> Windows. Thus it obviously contains files unnecessary for each of the
> two operating systems.
If it was just a few extra MB, I could agree. But 94M / 800M IMHO are
past the point where you should split per OS.

> I have heard of dependencies and the .deb
> contains a lot of them, and they are downloaded when it is installed. I
> can produce a higher quality .deb file.

> It will still be 90MByte because
> I need a full Unicode font. To be precise, I need twelve variants of it,
> and that's the 90MByte.
You mean the mega font? That's actually 207M uncompressed :)
That should probably go into a different package (which this one depends
on). I don't see why it couldn't fall back to another available font if
it's not available, though.
Many wikis are written in just a tiny subset of Unicode.
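
One way to act on that observation is to check which scripts a page actually uses before deciding whether a large merged font is needed. The sketch below is a heuristic, not anything wb2pdf does: it takes the first word of each letter's Unicode character name as a stand-in for its script (the standard library does not expose the real Script property), and the function names and the default `covered` set are assumptions.

```python
import unicodedata


def scripts_used(text):
    """Return a rough set of scripts in text, e.g. {'LATIN', 'CJK'}."""
    scripts = set()
    for ch in text:
        if ch.isalpha():
            name = unicodedata.name(ch, "")  # "" for unnamed code points
            if name:
                # e.g. "LATIN SMALL LETTER A" -> "LATIN"
                scripts.add(name.split()[0])
    return scripts


def needs_fallback_font(text, covered=frozenset({"LATIN"})):
    """True if text uses a script outside those the default font covers."""
    return not scripts_used(text) <= covered
```

A converter could run a check like this once per document and only pull in the large font when it returns True.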

It seems you're creating it from wqyzenhei + unifont + freeserif fonts.
Why do you need to merge them?


> I essentially did the tmp trick in order to get
> around the work of researching where to install each file and to
> properly fix the path names in the code and to test that.

In case of doubt, you should have placed the folder in /usr/lib.
A number of the files would be better placed in /usr/share, though.
But I'm not sure what many of the files are.
For instance, what's the purpose of the geturl and pa programs?

And why do you have copies at bin/ and dist/build? Furthermore, why are
they different?
Build artifacts are also common there.

> So for now you
> can run the software and test every feature you want, and if you or
> somebody else decides s/he wants to use it, I will make a .deb file that
> fits your needs. This will probably take two weeks, with most of the
> time being spent on choosing proper directories.

I feel a bit wary of running that :S




hunniger at cip

Jun 17, 2012, 12:21 AM

Post #13 of 15 (330 views)
Permalink
Re: MediaWiki to Latex Converter [In reply to]

> You mean the mega font? That's actually 207M uncompressed :)
> That should probably go to a different package (and depend on it). I
> don't see why it couldn't fallback to another available font if it's not
> available, though.
The point is that the change of font has to happen inside a run of
the LaTeX compiler. I tried that, and it sometimes works, but often the
compiler does not produce any output if I do that. So the best approach is
to give the compiler one font for the whole document and let it run with that.
>
> It seems you're creating it from wqyzenhei + unifont + freeserif fonts.
> Why do you need to merge them?
I merged them because changing the font in LaTeX does not always work,
especially inside headings which become part of the table of contents.
>> I essentially did the tmp trick in order to get
>> around the work of researching where to install each file and to
>> properly fix the path names in the code and to test that.
> In case of doubt, you should have placed the folder in /usr/lib
> A number of the files would be better placed in /usr/share, though.
> But I'm not sure what many of the files are.
> For instance, what's the purpose of geturl and pa programs?
The main part of the program is written in Haskell, a wonderful and
easy-to-learn purely functional programming language. Some minor parts
are written in Python 3, and the two parts need to communicate. Currently
pa and geturl are binaries created by the Haskell compiler GHC. pa is
essentially a compiler for the MediaWiki language: it parses the input to
a tree and writes it out as LaTeX. The problem with the MediaWiki language
is that it allows improper bracketing of tags, so it is not context-free,
so there is no BNF for it, so all normal parsers are ruled out, and you
need to use a more obscure technology like monadic parser combinators in
Haskell.
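
To make "improper bracketing" concrete: input such as `<b>bold <i>both</b> italic</i>` closes `b` while `i` is still open. The sketch below is not how pa works (pa is written in Haskell with parser combinators, and its code is not shown here). It is a small Python illustration, under the assumption that only simple attribute-free tags occur, of one way to turn such input into properly nested markup that a tree builder could accept. The function name `renest` is made up.

```python
import re

# Matches simple tags without attributes, e.g. <b> or </i> (an assumption).
TAG = re.compile(r"<(/?)([a-zA-Z]+)>")


def renest(markup):
    """Rewrite mis-nested tags so that every tag closes in stack order."""
    out = []
    stack = []  # names of currently open tags, innermost last
    pos = 0
    for m in TAG.finditer(markup):
        out.append(markup[pos:m.start()])
        pos = m.end()
        closing, name = m.group(1) == "/", m.group(2).lower()
        if not closing:
            stack.append(name)
            out.append(f"<{name}>")
        elif name in stack:
            # Close everything opened after `name`, close `name` itself,
            # then reopen the interrupted tags in their original order.
            reopened = []
            while stack[-1] != name:
                inner = stack.pop()
                out.append(f"</{inner}>")
                reopened.append(inner)
            stack.pop()
            out.append(f"</{name}>")
            for inner in reversed(reopened):
                stack.append(inner)
                out.append(f"<{inner}>")
        # A closing tag with no matching opener is silently dropped.
    out.append(markup[pos:])
    while stack:  # close whatever is still open at the end
        out.append(f"</{stack.pop()}>")
    return "".join(out)


if __name__ == "__main__":
    print(renest("<b>bold <i>both</b> italic</i>"))
    # prints: <b>bold <i>both</i></b><i> italic</i>
```

A repair pass like this only covers tag nesting; wiki-specific constructs such as templates, tables and quote-based bold/italic markup need their own handling.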

But since you seem to have a good idea of where to put which file, maybe
you could give me some hints on that, since that would make my work
much easier.
> And why do you have copies at bin/ and dist/build? Furthermore, why are
> they different?
> Build artifacts are also common there.
I will remember this for future versions of the deb file. Essentially I
only need the stuff in the bin directory. The stuff in the build
directory is just created by the ghc build tools.

Yours Dirk



hunniger at cip

Jun 18, 2012, 4:31 AM

Post #14 of 15 (322 views)
Permalink
Re: MediaWiki to Latex Converter [In reply to]

> You mean the mega font? That's actually 207M uncompressed :)
> That should probably go to a different package (and depend on it). I
> don't see why it couldn't fallback to another available font if it's not
> available, though.
I could indeed work without that font. But in this case I will have to
create font switching commands in the LaTeX file. This means that it won't
compile with pdflatex, since that does not allow font switching inside
headings. Furthermore the LaTeX file will become significantly less
readable. I also cannot put the fonts into another package, since the
Debian project is not going to accept that package, as I just
investigated. So essentially it is not possible to create a
significantly better deb file from my point of view.
Yours Dirk



dirk.hunniger at googlemail

Aug 4, 2013, 1:05 AM

Post #15 of 15 (46 views)
Permalink
Re: MediaWiki to Latex Converter [In reply to]

Hello,
I made a new Debian package, which resolves the security issues you
mentioned.
It is available here:
http://sourceforge.net/projects/wb2pdf/files/mediawiki2latex/6.5/
Yours Dirk

