
Mailing List Archive: Forrest: Dev

[RT] On xml infrastructure work

 

 



stevenn at outerthought

Feb 14, 2002, 1:20 PM

[RT] On xml infrastructure work

Looks like I have found a nice new project to grok my teeth upon :-)

Going through the list archives so far, I saw some discussions about whether
to use old-fashioned DTDs vs RelaxNG or other schema languages. Being a
primarily document-oriented and old-school SGML user, I believe this all
boils down to tool support.

Sure, RelaxNG somehow offers the best of both worlds between DTDs and
XML Schemas, i.e. primarily document-oriented structures with the strong
datatyping inherited from the XML Schema world, but I still have pretty
strong feelings when it comes to deciding what schema language to
enforce upon users of your grammars.

I believe the primary kind of 'users' for Forrest will be document
creators and editors, which means we should opt for a schema language
that is supported by tools for these particular tasks. In my daily work
creating XML documents for further processing by the
Java/XML tools of the Apache project, I find myself in a multi-tool
environment, switching between Softquad XMetal, XMLSpy and Excelon
Stylus for creating XML documents and XSLT stylesheets, using MSXML and
Xerces/Xalan for command-line validation and operations on my XML stuff,
and finally doing production runs on my XML collection using
Cocoon(/CLI), the Ant Style task, etc.

I'm pretty sure not everybody wants to load their laptop/workstation with
a myriad of commercial/opensource tools just for playing around with XML
documents, but I believe we should think about the tools in the document
creation process when deciding on the infrastructure that Forrest will
provide (enforce?) upon its users. Some of them will stick to trusty old
Textpad/Emacs/nameyourbrand, others like to work with more WYSIWYG XML
editors such as XMetal, and yet others perhaps would like Forrest to
come with an XML editing environment preconfigured for the grammars and
tasks at hand. The decision to use catalogs (and their XML variants)
clearly comes from a validation and transportability perspective, and
catalogs are supported by a lot of commercial XML tools, too.

As I stated in my previous mail, I'd be pretty happy to do some work on
providing XMetal-specific configuration files, but I was thinking also
of preparing a non-commercial XML editor for inclusion or out-of-the-box
support.

More precisely, I was thinking about Pollo
(http://pollo.sourceforge.net/), which has its own little schema
language, but I know that its author (an ex-colleague) has some
tools to translate between DTDs and the Pollo schema language. I've
been using Pollo quite often when teaching XML & Cocoon courses; it is
pretty configurable and really well built, and it comes with an MIT
license. By doing the Pollo configuration work, we can offer our users (the
various XML Apache project teams) a 'website' and documentation
editing tool if they have no access to other (commercial) tools and want
to work with an XML-aware environment. By doing the XMetal configuration
job, we do the same for .... euh ... people like me who like a spell
checker and stuff when writing (XML) documents.

Besides all these vague arguments, I just wanted to say that I plan to
do some work on Forrest since the idea behind it is quite appealing for
a dochead like me :-)

Regards,

</Steven>


crossley at indexgeo

Feb 15, 2002, 12:19 AM

Re: [RT] On xml infrastructure work

Steven Noels wrote:
> Looks like I have found a nice new project to grok my teeth upon :-)
>
> Going through the list archives so far, I saw some discussions about whether
> to use old-fashioned DTDs vs RelaxNG or other schema languages. Being a
> primarily document-oriented and old-school SGML user, I believe this all
> boils down to tool support.
>
> Sure, RelaxNG somehow offers the best of both worlds between DTDs and
> XML Schemas, i.e. primarily document-oriented structures with the strong
> datatyping inherited from the XML Schema world, but I still have pretty
> strong feelings when it comes to deciding what schema language to
> enforce upon users of your grammars.

Looking at RELAX NG some more, i see that one of its primary
purposes is structural and content validation. I gather that we
would still need the Apache DTDs for other reasons. The
previous discussion was not really looking for a replacement
to DTDs, rather looking for a reliable way to do XML validation.

My aim is to have a validation ability in both Cocoon and
Forrest that ensures that the XML documentation and the
configuration files are reliable.

I feel that your discussion here is about a separate issue,
which is another important part of an "xml infrastructure".
It may be that we need different schema for different purposes.
As they can be completely external to the document instances,
it is OK to have multiple schema. So i do not think that we
need to "enforce" any particular schema language.

> I believe the primary kind of 'users' for Forrest will be document
> creators and editors, which means we should opt for a schema language
> that is supported by tools for these particular tasks. In my daily work
> creating XML documents for further processing by the
> Java/XML tools of the Apache project, I find myself in a multi-tool
> environment, switching between Softquad XMetal, XMLSpy and Excelon
> Stylus for creating XML documents and XSLT stylesheets, using MSXML and
> Xerces/Xalan for command-line validation and operations on my XML stuff,
> and finally doing production runs on my XML collection using
> Cocoon(/CLI), the Ant Style task, etc.
>
> I'm pretty sure not everybody wants to load their laptop/workstation with
> a myriad of commercial/opensource tools just for playing around with XML
> documents, but I believe we should think about the tools in the document
> creation process when deciding on the infrastructure that Forrest will
> provide (enforce?) upon its users. Some of them will stick to trusty old
> Textpad/Emacs/nameyourbrand, others like to work with more WYSIWYG XML
> editors such as XMetal, and yet others perhaps would like Forrest to
> come with an XML editing environment preconfigured for the grammars and
> tasks at hand. The decision to use catalogs (and their XML variants)
> clearly comes from a validation and transportability perspective, and
> catalogs are supported by a lot of commercial XML tools, too.

Yes, i agree with all that you say ... and yes, "provide"
rather than "enforce".

> As I stated in my previous mail, I'd be pretty happy to do some work on
> providing XMetal-specific configuration files, but I was thinking also
> of preparing a non-commercial XML editor for inclusion or out-of-the-box
> support.
>
> More precisely, I was thinking about Pollo
> (http://pollo.sourceforge.net/), which has its own little schema
> language, but I know that its author (an ex-colleague) has some
> tools to translate between DTDs and the Pollo schema language. I've
> been using Pollo quite often when teaching XML & Cocoon courses; it is
> pretty configurable and really well built, and it comes with an MIT
> license. By doing the Pollo configuration work, we can offer our users (the
> various XML Apache project teams) a 'website' and documentation
> editing tool if they have no access to other (commercial) tools and want
> to work with an XML-aware environment. By doing the XMetal configuration
> job, we do the same for .... euh ... people like me who like a spell
> checker and stuff when writing (XML) documents.

It would be excellent to see support for various tools.
This would further encourage reliable XML documents.

I will investigate Pollo further - i tried a while ago but turned
busy with other stuff.

> Besides all these vague arguments, I just wanted to say that I plan to
> do some work on Forrest since the idea behind it is quite appealing for
> a dochead like me :-)

I suspected that we have a lot in common, and now i see
that we are both docheads :-)
--David

> Regards,
>
> </Steven>


stefano at apache

Feb 15, 2002, 10:05 AM

Re: [RT] On xml infrastructure work

David Crossley wrote:

> > Besides all these vague arguments, I just wanted to say that I plan to
> > do some work on Forrest since the idea behind it is quite appealing for
> > a dochead like me :-)
>
> I suspected that we have a lot in common, and now i see
> that we are both docheads :-)

Cool, the more picky docheads we wrap around this (count me in), the
better.

Now my comments:

I see two different concerns here about schemas:

1) authoring (here DTDs still are the way to go)

2) validation (here DTDs are showing their age, mostly due to lack of
namespace support and RelaxNG is shining here, mostly due to its
infoset-neutral and namespace-aware behavior)

My idea is to use DTDs to instruct authoring tools and to use RelaxNG to
test that they did the right job.

So you can think of DTDs as schemas for the client side and RelaxNG as
schemas for the server side.

Does it make sense?

--
Stefano Mazzocchi <stefano [at] apache>
One must still have chaos in oneself to be able to give birth to a
dancing star. (Friedrich Nietzsche)


stevenn at outerthought

Feb 16, 2002, 10:04 PM

RE: [RT] On xml infrastructure work

Stefano Mazzocchi wrote:

> Now my comments:
>
> I see two different concerns here about schemas:
>
> 1) authoring (here DTDs still are the way to go)
>
> 2) validation (here DTDs are showing their age, mostly due to lack of
> namespace support and RelaxNG is shining here, mostly due to its
> infoset-neutral and namespace-aware behavior)
>
> My idea is to use DTDs to instruct authoring tools and to use RelaxNG to
> test that they did the right job.
>
> So you can think of DTDs as schemas for the client side and RelaxNG as
> schemas for the server side.
>

I for one don't see a real need to validate yet again on the server,
especially if that means maintaining two separate sets of validation
rules in a different schema language. How we will maintain synchronicity
between both sets, except by using some automated tools, seemed like a
major problem to me.

So I went off testing grammar translators, and I've been playing around
with http://www.thaiopensource.com/dtdinst/; the result of converting my
local copy of document-v11.dtd to its RelaxNG equivalent is attached.

DTDInst is of 'James Clark' quality, which means that what is not
supported is documented, and we can be pretty sure that the rest will be
up-to-spec. Validating my sample instances against the DTD and the generated
RelaxNG version of the document grammar using Jing
(http://www.thaiopensource.com/relaxng/jing.html) worked well.

So although I'm not too keen on the usefulness of 'double-validating'
our documents, at least technically, it will work with minimal hassle.

Bye for now,

</Steven>
Attachments: document-v11.rng.xml (28.7 KB)


crossley at indexgeo

Feb 16, 2002, 10:57 PM

Re: [RT] On xml infrastructure work

Steven Noels wrote:
> Stefano Mazzocchi wrote:
>
> > Now my comments:
> >
> > I see two different concerns here about schemas:
> >
> > 1) authoring (here DTDs still are the way to go)
> >
> > 2) validation (here DTDs are showing their age, mostly due to lack of
> > namespace support and RelaxNG is shining here, mostly due to its
> > infoset-neutral and namespace-aware behavior)
> >
> > My idea is to use DTDs to instruct authoring tools and to use
> > RelaxNG to test that they did the right job.
> >
> > So you can think of DTDs as schemas for the client side
> > and RelaxNG as schemas for the server side.
>
> I for one don't see a real need to validate yet again on the server,
> especially if that means maintaining two separate sets of validation
> rules in a different schema language.

My reasoning for wanting Forrest (and Cocoon for that matter)
to be capable of performing validation, is to guarantee that all
XML instance documents are reliable. Thus stylesheets can be
assured about what they are dealing with.

These documents will come from a variety of input sources.
Sure, those sources might say that they have valid documents.
However, perhaps they have used sub-standard tools, perhaps
they are not configured properly, perhaps they did not even
bother with the validation step. It has been my experience that
most document sets have problems.

Your discussion below supports my own investigations in
another thread "experiment with RELAX NG". I think that the
reliable DTDs of Forrest will be able to be converted into
RELAX NG and, with minor tweaks, used to assist validation.
--David

> How we will maintain synchronicity
> between both sets, except by using some automated tools, seemed like a
> major problem to me.
>
> So I went off testing grammar translators, and I've been playing around
> with http://www.thaiopensource.com/dtdinst/; the result of converting my
> local copy of document-v11.dtd to its RelaxNG equivalent is attached.
>
> DTDInst is of 'James Clark' quality, which means that what is not
> supported is documented, and we can be pretty sure that the rest will be
> up-to-spec. Validating my sample instances against the DTD and the generated
> RelaxNG version of the document grammar using Jing
> (http://www.thaiopensource.com/relaxng/jing.html) worked well.
>
> So although I'm not too keen on the usefulness of 'double-validating'
> our documents, at least technically, it will work with minimal hassle.
>
> Bye for now,
>
> </Steven>


stevenn at outerthought

Feb 17, 2002, 12:00 AM

RE: [RT] On xml infrastructure work

David Crossley wrote:

> These documents will come from a variety of input sources.
> Sure, those sources might say that they have valid documents.
> However, perhaps they have used sub-standard tools, perhaps
> they are not configured properly, perhaps they did not even
> bother with the validation step. It has been my experience that
> most document sets have problems.
>
> Your discussion below supports my own investigations in
> another thread "experiment with RELAX NG". I think that the
> reliable DTDs of Forrest will be able to be converted into
> RELAX NG and, with minor tweaks, used to assist validation.

Yep.

So can we agree upon using DTDInst for doing the automagical translation
of DTDs to RelaxNG grammars for the time being, rather than keeping both
versions in sync manually? DTDInst's license is at
http://thaiopensource.com/dtdinst/copying.txt and seems OK for inclusion in
CVS. An Ant target that calls dtdinst.jar to do the translation should be
no problem.

I'd prefer the 'tweaking' to happen using XSLT on the generated RNG
grammar, so that it is easy to track DTD changes: run DTDInst and
re-apply the tweaks with a stylesheet. Does anyone already have an idea
about the kind of tweaks which would be needed?
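
To make the 'regenerate, then re-apply tweaks' loop a bit more concrete,
here is a minimal sketch. It uses Python's standard library instead of
XSLT purely so that it runs without extra tooling; the grammar fragment is
a hand-written stand-in (not real DTDInst output), and the single tweak
shown (putting the element patterns into a namespace via the RELAX NG
'ns' attribute) is only a hypothetical example of what a tweak could be.

```python
import xml.etree.ElementTree as ET

RNG_NS = "http://relaxng.org/ns/structure/1.0"
# Namespace URI as proposed for the Forrest document DTD in this mail.
DOC_NS = "http://apache.org/forrest/ns/document/1.0"

# Hand-written stand-in for a generated grammar (assumption: NOT DTDInst output).
GENERATED_RNG = """\
<grammar xmlns="http://relaxng.org/ns/structure/1.0">
  <start><ref name="document"/></start>
  <define name="document">
    <element name="document"><text/></element>
  </define>
</grammar>"""


def apply_tweaks(rng_source: str) -> str:
    """Re-apply the (hypothetical) tweak set to a freshly generated grammar."""
    # Serialize RELAX NG elements without a prefix.
    ET.register_namespace("", RNG_NS)
    grammar = ET.fromstring(rng_source)
    # Hypothetical tweak: 'ns' is inherited in RELAX NG, so setting it on the
    # root places the unprefixed element names in the document namespace.
    grammar.set("ns", DOC_NS)
    return ET.tostring(grammar, encoding="unicode")


tweaked = apply_tweaks(GENERATED_RNG)
print(tweaked)
```

Because the tweak step only ever reads the generated grammar, it can be
re-run unchanged every time the DTD changes and the grammar is regenerated.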

W.r.t. namespaces: if we want our docs to be declared in a certain
namespace, we can also patch the DTD itself in order to support this:

--------------------
@@ -481,6 +480,7 @@

<!ELEMENT document (header?, body, footer?)>
<!ATTLIST document %common.att;>
+<!ATTLIST document xmlns CDATA #FIXED "http://apache.org/forrest/ns/document/1.0">

<!-- ==================================================== -->
<!-- Header -->
--------------------

Personally, I'm +0 on whether to use namespaces for the docs or not. While
it adds to the transportability of the documents, it also adds a burden
for the stylesheet authors, who must make sure to match source elements
in the correct namespace, i.e. 'document:p' instead of just 'p'.
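
The matching burden can be illustrated with a small sketch. Python's
ElementTree path lookups stand in for XSLT match patterns here (an
assumption made for the sake of a runnable example), and the two sample
instances are toy documents that have not been validated against
document-v11.dtd. The helper name count_paragraphs is made up.

```python
import xml.etree.ElementTree as ET

# Namespace URI from the DTD patch above.
DOC_NS = "http://apache.org/forrest/ns/document/1.0"

# Toy instances: one without a namespace, one in the document namespace.
PLAIN = "<document><body><p>Hello</p></body></document>"
NAMESPACED = (
    '<document xmlns="%s"><body><p>Hello</p></body></document>' % DOC_NS
)


def count_paragraphs(xml_text: str, qualified: bool) -> int:
    """Count 'p' elements, matching either the bare or the qualified name."""
    root = ET.fromstring(xml_text)
    if qualified:
        # Equivalent in spirit to matching 'document:p' in a stylesheet.
        return len(root.findall(".//d:p", {"d": DOC_NS}))
    # Equivalent in spirit to matching plain 'p'.
    return len(root.findall(".//p"))


print(count_paragraphs(PLAIN, qualified=False))
print(count_paragraphs(NAMESPACED, qualified=False))
print(count_paragraphs(NAMESPACED, qualified=True))
```

The unqualified lookup no longer finds anything once the document declares
the namespace, which is exactly the kind of silent mismatch stylesheet
authors would have to watch out for.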

Anyway, let's just finalize the schema/grammar issue so that we can
continue to work on something less debatable ;-)

Regards,

</Steven>
