
ph10 at cus
Apr 1, 2005, 8:41 AM
Status of documentation conversion
Well, it's a couple of weeks since my last report, and I'm going to Africa at the end of next week, so here's where I am:

After many days' work, I have got the Exim manual into a sort-of usable form as an AsciiDoc document that can be converted to DocBook XML and then processed from there. However, I still need to read it again thoroughly, to look for glaring errors. There will probably be plenty. Then I'll ask for volunteers to read it as well.

Here are some comments:

AsciiDoc
--------

I'm pushing the limitations of AsciiDoc, and indeed, have found a few things that it cannot do satisfactorily. Or at least, I haven't found how to do them. It can't, for example, correctly nest one list inside another list item and then revert to the outer list item. You can't end the inner list without ending the outer list item. (There is a fudge for this, but it puts vertical white space in the output.)

Also, AsciiDoc markup is no less bizarre than the original SGCAL markup, which isn't really surprising, given that it's supposed to do the same job. I suppose the advantage over SGCAL is that the result is DocBook XML, which is "standard". However, if you get the AsciiDoc markup wrong, it can generate invalid XML.

AsciiDoc is written in Python, which makes it quite slow when processing the 400 pages of the Exim manual.

Processing the DocBook
----------------------

Using xmlto plus fop to produce PostScript works (slowly), with a lot of "not implemented yet" messages (fop is really still alpha software), but it is typographically quite unsatisfactory at times. Such as when it puts a section heading as the last line of a page. Or when the first line of a page is the end of a paragraph (one example contained just the word "set"). Sigh.

I have not found a way of preserving typographic markup in the index entries (it even ignores <quote>xxx</quote>), and it doesn't merge identical page numbers in the index. It also insists on indexing secondary terms as primaries, which is a nonsense in many cases. Who wants an index entry for "specifying"?

The HTML output also has its problems. The index seems to point only to section headings rather than into the text, which is pretty useless for Exim's command line options (but I haven't fully investigated this yet).

There will have to be pre-processors for the DocBook, to cope with characters not available in various output formats, and probably a post-processor for text output to tidy it up.

Status bottom line
------------------

There is still a lot to do on this. Despite my grumbles, there's probably no better option. We can hope that better free XML processors come along.

Philip

-- 
Philip Hazel    University of Cambridge Computing Service,
ph10 [at] cus   Cambridge, England. Phone: +44 1223 334714.