
djay at avaya
Oct 31, 2001, 3:39 PM
Post #4 of 7
Why not use ID attributes if they are defined in the DTD? That way, if someone wants a URL that stays stable every time they parse the document, it's their responsibility to put in the IDs. Just use your other schemes for things that don't have ID attributes.

> -----Original Message-----
> From: Martijn Faassen [mailto:faassen [at] vet]
> Sent: Thursday, 1 November 2001 2:52 AM
> To: Rik Hoekstra
> Cc: zope-xml [at] zope
> Subject: Re: [Zope-xml] Status report
>
> Rik Hoekstra wrote:
> > <snip>
> >
> > >What I'm currently working on (not yet merged with the trunk in CVS)
> > >is unique element ids. This will enable a much more stable way to
> > >access nodes through URLs than before; a simple insert or append into the
> > >DOM tree now won't change the entire node URL so easily anymore. This
> > >is still work in progress but I'd like to know what people think. Currently
> > >URLs to nodes look like this:
> > >
> > >http://foobar.com/path/to/doc/0/5/3
> > >
> > >where the 0/5/3 part means: go to the 0th node of the document, take its
> > >5th child node, take its 3rd childnode.
> > >
> > >This is very unstable to changes in the DOM structure. If I insert a node,
> > >I might make node 5 node 6, breaking the URL right away.
> >
> > I see your point, but this is not undesirable behaviour per definition. It
> > is if you want your url to be persistent, but not if you actually want to
> > resolve http://foobar.com/path/to/doc/0/5/3 to "the 0th node of the
> > document, take its 5th child node, take its 3rd childnode"
>
> That's true; as long as the document doesn't change this is actually
> *more* stable in some circumstances, for instance when there's a reparse.
> I'd like a way to get stable references into a document somehow for all
> kinds of purposes, such as annotation and 'hey, we saw this node before'.
>
> > >What I'm playing with is a way to add unique ids to element nodes, so
> > >that at least inserting an element won't break everything anymore.
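
For illustration only, the fragility of index-based paths such as 0/5/3 can be reproduced with Python's standard xml.dom.minidom rather than the Zope DOM under discussion. This is a minimal sketch: `resolve_path` is a hypothetical helper and the sample document is made up.

```python
from xml.dom import minidom


def resolve_path(doc, path):
    """Follow a slash-separated list of child indexes down from the document node.

    Hypothetical helper: it mimics the 0/5/3 style of URL suffix described
    in the thread, not any actual Zope API.
    """
    node = doc
    for step in path.split("/"):
        node = node.childNodes[int(step)]
    return node


doc = minidom.parseString("<doc><a/><b/><c/></doc>")
root = doc.documentElement

# "0/1": 0th child of the document, then its child at index 1.
before = resolve_path(doc, "0/1").tagName

# A single insert at the front shifts every later sibling by one position.
root.insertBefore(doc.createElement("new"), root.firstChild)
after = resolve_path(doc, "0/1").tagName
```

The same path string names a different element after the insert, which is the breakage described above.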
> >
> > taking that http://foobar.com/path/to/doc/0/5/3 would still resolve in the
> > way expressed above, so that it will still be possible to retrieve an
> > element of which we just know that it is the first child node of a given DOM
> > structure
>
> You're correct. Perhaps I need another way to arrive at semi-stable
> references to nodes, possibly based on XPointer (unfortunately those
> are just XPath expressions, and they may be as unstable as anything).
> The requirement then would be to be stable through minor edits both
> through reparse and DOM manipulation. Perhaps this isn't easily
> possible, though. :)
>
> > I take it that the id would be something random, and not meaningful? A url
> > like http://foobar.com/path/to/doc/0/5/3, or worse something like
> > http://foobar.com/path/to/doc/9198274/2394837/9192877 does not sound very
> > attractive to me.
>
> In the test implementation I'm simply using a document global counter that
> gives each element a unique id. So the first element encountered during
> document construction will be e0, the second e1, then e2, etc.
>
> > And even then: at what point is a node with a given identity 919287576 still
> > the same, and at what point will it be decided that a change in the
> > underlying XML document (and the DOM) that a node will no longer remain the
> > same? After a change in its content? After a change in its element name
> > (even if its content is still the same, just as its place in the DOM tree)?
> > etc
>
> Yup, this is a tricky problem. Good points. :)
>
> > >Perhaps it is a good idea to introduce unique ids to other nodes
> > >as well, instead of only to elements.. for some reason I didn't
> > >do that but I forget now why not.
> >
> > >Having such a unique id per node does cost a bit of extra memory per
> > >node to store the ids.
> > >
> > >So, your feedback and opinions, please.
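
The counter scheme mentioned above (e0, e1, e2, ...) might be sketched roughly as follows. This assumes Python's standard xml.dom.minidom and numbers the elements in document order after parsing, rather than during construction as in the test implementation; `assign_ids` is a hypothetical name, and the ids are kept in a side table so the document itself is left alone.

```python
from xml.dom import minidom


def assign_ids(doc):
    """Return a dict mapping 'e0', 'e1', ... to elements in document order.

    Hypothetical helper: one counter per document, elements only.
    """
    ids = {}
    counter = 0
    stack = [doc.documentElement]
    while stack:
        node = stack.pop()
        ids["e%d" % counter] = node
        counter += 1
        children = [c for c in node.childNodes
                    if c.nodeType == c.ELEMENT_NODE]
        # Push in reverse so the first child is popped (and numbered) first.
        stack.extend(reversed(children))
    return ids


doc = minidom.parseString("<doc><a><x/></a><b/></doc>")
ids = assign_ids(doc)
order = [ids["e%d" % i].tagName for i in range(len(ids))]
```

An id handed out this way keeps pointing at the same element object after inserts elsewhere in the tree, but it says nothing about whether the element is still "the same" after its name or content changes, which is the open question raised above.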
> >
> > my 0.02 EUR (euro that is, in case it gets scrambled along the way)
>
> I just see EUR here. :)
>
> I suppose the only way to get a semi-stable link into a document is to
> use some heuristics to construct an XPath expression. Of course the
> *certain* way is to use actual id attributes embedded in the document, but
> I'd really like to leave the document alone if at all possible.
>
> So, what kind of heuristics do we need? Let's restrict the problem to
> references to element nodes for now; the problem is hard enough and
> that would tackle most of the requirements people have in my opinion.
>
> Element name seems a good idea to start with. Then name of the parent
> element is probably a good idea, perhaps a few levels up. This remains
> stable under fairly many document edits and changes.
>
> Attributes used and value is also helpful, I think, and again remains
> relatively stable.
>
> Next we can move on to determine text node contents. If this is non-whitespace
> content we could extract a bit of the text, say the first n characters,
> to get even more of a match. This isn't always possible; perhaps the
> first element child node can also give us more of a contextual match,
> though this I think is less stable under changes. Sibling nodes is
> something else to consider.
>
> Even if we have a heuristic which works well, we have some other
> difficulties:
>
> * our XPointer/XPath expression probably becomes horribly long and complicated
>
> * it is relatively slow to construct and resolve these things
>
> A completely different approach that I experimented a bit with is
> some kind of bookmark ability. This approach would allow one to use
> an API to 'bookmark' a reference to a node. You get some kind of number
> back, but you just treat it as a token. Put the number back in, and
> you'll get a reference to a node again. Internally you'd have a
> dictionary mapping bookmark tokens to nodes.
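
The bookmark idea described above could look something like this. It is only a sketch under assumptions: the `Bookmarks` class and its method names are hypothetical, and Python's standard xml.dom.minidom stands in for the Zope DOM.

```python
import itertools
from xml.dom import minidom


class Bookmarks:
    """Hypothetical bookmark table: opaque integer tokens mapped to nodes."""

    def __init__(self):
        self._tokens = itertools.count()
        self._nodes = {}

    def add(self, node):
        """Remember a node and hand back a token for it."""
        token = next(self._tokens)
        self._nodes[token] = node
        return token

    def resolve(self, token):
        """Return the node for a token; raises KeyError if unknown."""
        return self._nodes[token]


doc = minidom.parseString("<doc><a/><b/></doc>")
root = doc.documentElement
b = root.childNodes[1]

marks = Bookmarks()
token = marks.add(b)

# Unlike an index path, the token survives an insert before the node.
root.insertBefore(doc.createElement("new"), root.firstChild)
still_b = marks.resolve(token) is b
```

Because the table holds ordinary (strong) references, a bookmarked node stays alive for as long as its entry does, even after it has been removed from the tree.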
>
> A couple of problems with this though; once a node is bookmarked it
> won't be garbage collected even if not in the tree anymore, as it's
> always referenced from the dictionary. The other problem would be
> that this isn't stable over reparses. One could of course store the
> bookmarks as URLs to nodes (either using the simple 0/1/2 approach or
> using the complicated heuristic approach) just before any reparse, and
> try to reestablish the bookmark dictionary afterwards. Even the same
> tokens would be preserved.
>
> The garbage collecting problem seems like the hardest to crack, though
> perhaps we can arrive at a fairly simple solution. I tried using a
> weak dictionary but that didn't seem to want to work with
> Zope's extensionclass mechanism. I could use some form of manual
> refcount approach, but how? Another way could be to regularly purge
> the dictionary of any nodes that aren't connected to the tree.
>
> Hmm... I need to think more about this, and more feedback and ideas
> here would certainly help!
>
> Regards,
>
> Martijn
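
The "regularly purge" option from the message above might be sketched as follows, again assuming Python's standard xml.dom.minidom and a plain dict of bookmarks for element nodes. `is_attached` and `purge` are hypothetical helpers, and the sketch relies on minidom clearing `parentNode` when a child is removed.

```python
from xml.dom import minidom


def is_attached(node, doc):
    """Walk up the parentNode chain; True if it reaches the document node."""
    while node is not None:
        if node is doc:
            return True
        node = node.parentNode
    return False


def purge(bookmarks, doc):
    """Drop bookmarks whose nodes are no longer in the tree; return their tokens."""
    dead = [token for token, node in bookmarks.items()
            if not is_attached(node, doc)]
    for token in dead:
        del bookmarks[token]
    return dead


doc = minidom.parseString("<doc><a/><b/></doc>")
root = doc.documentElement
a, b = root.childNodes

bookmarks = {0: a, 1: b}
root.removeChild(a)          # <a/> is now detached from the document
removed = purge(bookmarks, doc)
```

Each purge walks from every bookmarked node up to the root, so the cost grows with the number of bookmarks times the tree depth; how often to run it is left open here.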