[WebODF] Questions about core.PositionIterator

Philip Peitsch P.Peitsch at qsrinternational.com
Mon Sep 16 04:16:41 CEST 2013


On 14/09/2013, at 12:14 AM, Friedrich W. H. Kossebau wrote:

> […]
> 
> So these properties:
> * a filter
>  (at least one for empty textnodes, but can be extended by one passed as arg)
> * a root node from pure, unfiltered DOM, its subtree is where the iterator 
> moves inside
> * a position, defined by a container and the current position in it
> Any other things queried are only derived from these core properties.
> Q: Correct?

Yep. Your summary looks correct.

> Q: What is the actual semantic of the container and the positions in it, to 
> what possible points in the DOM can/should container and position point to?

Given the following fragment:
<p>
	a
	<span>simple</span>
	fragment
</p>

For the aforementioned fragment, a list of all exhaustive positions is:
<p>, 0
	"a"#text, 0
	"a"#text, 1
<p>, 1
	<span>, 0
		"simple"#text, 0
		…
		"simple"#text, 6
	<span>, 1
<p>, 2
	"fragment"#text, 0
	…
	"fragment"#text, 8
<p>, 3

The iteration logic in nextPosition and previousPosition will report the following
subset of these positions (calling container() and unfilteredDomOffset() ):
	"a"#text, 0
<p>, 1
		"simple"#text, 0
		…
		"simple"#text, 5
	<span>, 1
	"fragment"#text, 0
	…
	"fragment"#text, 7
<p>, 3

Note, it doesn't return a mapping that needs to be translated, but simply a subset
of the actual positions.

The reasoning behind the subset I'm going to guess is because the eliminated
positions are semantically identical to one or more of the included positions.

E.g.,
(<p>, 0) means "before the first child", while ("a"#text, 0) means before the first
character. These two positions would be visually indistinguishable in a selection,
so one of them can safely be ignored.

In the same way, ("a"#text, 1) means after the last character, which is basically
the same as (<p>, 1) which is after the first child, but before the second child of <p>.

I cannot really explain why this filtering has been done this way, but it appears
that various places up the stack rely on some of these things. E.g., 
TextPositionFilter.acceptPosition will have a runtime assert if the 
unfilteredDomOffset === text.length (OdtDocument, line 204)


> Q: Most API ignores the concept of container and offset and instead talks 
> about nodes/siblings. But both use the same value of the "currentPos" var. 
> Confuses me completely. Why is the container concept irrelevant in those 
> calls? Or do these two different views on the iterator perfectly go along? 

The views go together. nodes/siblings are used for navigation and searching.
Once the desired position is found, container & unfilteredDomOffset return the
location the iterator has ended up at.

currentPos is a really confusingly named variable, as it's behaviour follows one of
two patterns:

For a text node, currentPos = index in text string (0 => text.length - 1)

For an element node, currentPos === 0 until all children within the node are
iterated. In this sense, it is almost the direction of travel. I.e., 
- When currentPos === 0: report current node, then progress to child or sibling
(travel direction is down the tree towards children)
- When currentPos === 1: report current node, then progress back to the parent 
(travel is up the tree towards parents)

> So the states:
> The filter is at least the one for empty textnodes, so always exists and 
> works.
> The root node is required in the API dox, so always exists (but could be moved 
> out of the DOM during the lifetime of the iterator, API dox needs to require 
> that this is not happening perhaps)

Handling of this is already defined in the standard for the TreeWalker. Potentially
the API dox just need to link to this for the technical detail.

> The position is stored via the "walker" object and the "currentPos" variable.
> The passed root node could have a subtree with no positions (after filtering). 
> Q: What values would the properties have in that state?
> Q: Or do we have to require that the root node allows at least one position?

(I want to prefix these answers with the quick disclaimer that PositionIterator lacks
almost any protection from abuse. It works generally in practice at the moment because
there are not that many consumers of the iterator. It is relatively straightforward to cause
the behaviour to run off the rails with the scenarios you've listed.)

There is some constructor checks (as per one of the later questions) that happen
in init(). This basically means the first call to previousPosition/nextPosition will
return false, and the TreeWalker will not change position.

If nextPosition is called and there are no walkable nodes, the function call will
return false, but calls to container() will return rootNode, and unfilteredDomOffset
will return container().childNodes.length. So it will return valid nodes and positions…
just potentially ones that do not pass the supplied node filter (not likely desirable
behaviour).

If setUnfilteredPosition is called when there are no walkable nodes, an assertion
will be thrown (line 295 & 326).

> Q: Should the iterator always be in positions which are allowed by the filter? 
> "moveToEndOfNode()" could resolve in a position that is not valid, because of 
> "walker.currentNode = node;" which will not check against the root node or the 
> filter (/me to blame for that code, but noone cleaned it as well ;) ).
> "setUnfilteredPosition()" is also not checking if the container is inside the 
> root.
> Seems to me the states of the position are not properly defined, or at least 
> slightly fragile with some methods.

Yep. +9000 votes from me on this. In my opinion, PositionIterator should be
further stripped back to essentially be a TreeWalker, with the unfilteredDomOffset
behaviour. Basically, the behaviour of root node and first container after
setUnfilteredOffset should follow the much clearer TreeWalker standard behaviour
in that they may not pass the supplied node filter.

The current holes (thus far) have not resulted in crashes or incorrect behaviour *yet*,
hence why none of them have been fixed. I think the correct approach to this is actually
more around encouraging consumers of the position iterator to have less strict expectations
about the positions returned, as this leads to the consumers being far more robust, and able
to be re-used with arbitrary positions. 

At the moment, there are a bunch of places where a PositionIterator is used purely to try and
find a "valid" position that the TextPositionFilter will understand (e.g., SessionController, line
209). I suspect there are undiscovered bugs that will be caused by "raw" positions being
passed into the TextPositionFilter.

> Questions for the implementation:
> Q: Why does container() return the parentNode in case of currentPos === 0 && t 
> !== Node.TEXT_NODE, and not only if t !== Node.TEXT_NODE?

See previous answer for my best guess. I think this is eliminating some semantically
equivalent positions.

> Q: So what is the semantics of "container", where is the difference to 
> "currentNode"?

container() will report an element's position within it's parent container for
the first position. E.g., in the demo fragment above, it reports (<p>, 1) when
the iterator is actually at (<span>, 0). These are almost identical points.

For any position other than the first, container() === currentNode

I can't see a clear reason as to why this is chosen, other than it makes the implementation
of the unfilteredDomOffset() relatively straight forward.

> Q: In init() why is the currentPos set to 1 if walker.firstChild() === null? 

As noted in one of the previous examples, currentPos is the direction of travel. Setting
it to 1 means "this node is completely iterated and there are no further positions available".
This would actually catch your example of a root node with no walkable children.

> Q: Why seems currentPos === 0 only valid with existing visible child nodes, 
> what?

Previously answered. This is the direction of travel behaviour again :-)

> And than these naming issues continue to confuse me:
> * getter methods named "get*" vs "*"
> * getter and setter methods too similar, e.g. "nextPosition" vs. "rightNode"
> * mixing of "right"/"left" and "next"/previous"

+1 agree with this. There is a difference in query vs move, but I don't think
this is consistently applied (or even consistently used…).

> * "dom" in the "*Offset" methods is redundant, no other offset method

There is another offset method: unfilteredDomOffset

I actually want domOffset eliminated :-). At the moment it is only used in 
OdfDocument.getPositionInTextNode. Usage of this particular method
is fraught with danger, as it is very difficult to translate this back into
*real* DOM positions, because the returned offset is the number of
filtered siblings. As such, one can't just do:

container().childNodes[domOffset()] 

As this is will be incorrect if container() contains a filtered out node
such as an empty text block or a cursor.

> * "setUnfilteredPosition" sounds to me as if there is an unfiltered position 
> possible and that gets set

This naming difference used to make sense when there was a method called
setPosition which could only handle "filtered" container + offset as input pairs. 
It's name (similar to unfilteredDomOffset) is a remnant of a time bygone when
PositionIterator exposed 3 different coordinate systems.

I'd feel comfortable removing the "unfiltered" part out of setUnfilteredPosition
and unfilteredDomOffset.

> 
> Q: What do you think about these renamings:
> // modifying
> moveToNextPosition() <- nextPosition()
> moveToPreviousPosition() <- previousPosition()
> moveToEnd()
> moveToEndOfNode(node)

+1 for these

> setPositionByUnfilteredPosition(container, offset) <- setUnfilteredPosition(*)
> (hm, ideas for better name? moveCloseToUnfilteredPosition?)

setPosition seems clearest to me. The API dox detail the behaviour when this is 
called with a position that would not pass the node filter. There is no longer the
confusion of multiple ways to set the position on a PositionIterator either which
was the whole reason for the "unfiltered" part in the name.

> // querying
> getContainer() <- container()

+1

> getOffset() <- domOffset()

-1. As listed above, this is most likely NEVER the function a consumer wants to
use. I think unfilteredDomOffset should become getOffset :-). And domOffset one
should be renamed to "notTheDomOffsetYoureLookingFor" =)

> getCurrentNode()

+1

> getUnfilteredOffset() <- unfilteredDomOffset()

-1. See previous explanation

> getRightNode() <- rightNode()
> getLeftNode() <- leftNode()
> getPreviousSibling()
> getNextSibling()
> 
> // querying, just used from tests
> getNodeFilter()

+1 for all these.

> Sorry, I too long postponed getting a clue of all this (again), but better now 
> than never.

As you said… better late than never :-). A different pair of eyes is always
good. I fear I have become accustomed to the sharp edges and unthinkingly
avoid them :-/. It would definitely help my sanity if we could
simplify this and reduce the number of ways it can break :-)



More information about the WebODF mailing list