[Officeshots] Pixelwise comparison of pdf files

Milos Sramek sramek.milos at gmail.com
Fri Jun 24 18:09:42 CEST 2011


Hi,

thank you for your support!

I'll try to respond by this summary mail, by picking from all responses

> In practice this would mean that the tool has to have a mode where it's much 
> less sensitive to one or two pixels of difference, so that subtle differences 
> in the way that the text is rendered are ignored.
> On the other hand it could be much more sensitive to changes that indicate 
> whole characters being offset.   Essentially: the more connected pixels are 
> different, the stronger the indication of a structural problem.
> And such a problem would be very valuable to find since the human eye quickly 
> misses those details.
Yes, tiny differences in glyph rendering should be ignored. Typically,
such differences make the match drop to 80-90%, which can hide an error
in, say, a single character. Maybe some morphological operations can
help here; ImageMagick, which I used, supports them. So I am optimistic
about this feature.
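
A minimal sketch of that idea in Python (assuming scipy and Pillow are
available; the threshold and erosion depth are guesses to be tuned, and
the inputs are pre-rendered, equally sized page images): erode the
binary difference mask so one-to-two-pixel glyph jitter disappears,
then label what survives as connected blobs, which are the candidate
structural differences.

    import numpy as np
    from PIL import Image
    from scipy import ndimage

    def structural_diff(path_a, path_b, threshold=32, erosion_iters=1):
        # Load both page renderings as grayscale arrays (must be equal size).
        a = np.asarray(Image.open(path_a).convert("L"), dtype=np.int16)
        b = np.asarray(Image.open(path_b).convert("L"), dtype=np.int16)
        diff = np.abs(a - b) > threshold   # binary difference mask
        # Erosion removes thin fringes caused by slightly different glyph
        # rasterization; only blobs thicker than ~2*erosion_iters px survive.
        core = ndimage.binary_erosion(diff, iterations=erosion_iters)
        labels, n_blobs = ndimage.label(core)
        return n_blobs, core.mean()   # blob count, fraction of pixels differing

Counting blobs rather than raw pixels would match the "more connected
pixels, stronger indication" heuristic from the quote above.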

> You show results from two families of office suites: MSO and OOo.
> My interpretation of these data:
>  - Lotus Symphony 3 beta is really beta, it's off the scale
>  - MSO is pretty consistent internally except for a larger deviation
>    for the ODF add-in
>  - OOo is also pretty consistent internally, but a bit less than MSO
>  - differences between MSO and OOo are large


> The ODF standard does not require this consistency, but many expect it
> anyway. While 100% is not required by the spec, it would be nice to
> have anyway. If these numbers are way off, they indicate real problems
> such as incorrect positioning or incorrect margins. The matrix is very
> useful. It could be extended with more soft ODF measures, such as the
> number of pages in a rendered document.

Yes, many of the tested suites are really Beta, with a capital B. My
primary concern now is the difference between the OOo and MSO families,
since they seem to be the most important ones. These comparisons can,
perhaps, be of interest to the developers.
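
Counting the pages each suite produces, one of the soft measures
suggested above, is easy to script. A minimal sketch, assuming the
pypdf library and hypothetical file names:

    from pypdf import PdfReader

    def page_count_delta(pdf_a, pdf_b):
        # A differing page count is a cheap signal of layout drift,
        # independent of any pixel comparison.
        return len(PdfReader(pdf_a).pages) - len(PdfReader(pdf_b).pages)

    print(page_count_delta("mso.pdf", "oo.pdf"))   # hypothetical names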

Regarding the consistency:
> The ODF standard does not require this consistency, but many expect it
> anyway,
I do not understand. Does this mean that there is some level of
variability allowed, in the sense of HTML, which allows content to be
rendered differently in different environments?

> One concern with interpreting the data is that a single layout error can 
> cause all of the document after that error to be incorrect.  So it is hard 
> to distinguish between a single error and many independent errors.


> I wonder whether something similar 
> could be done in the document comparisons, e.g., resync to the next 
> paragraph.

There is an easy solution: each tested feature should start on a new
page. I tried that with the same document as before. The results are
attached; the PDFs are available here:
http://pirin.viskom.oeaw.ac.at/~milos/compare-2.tgz
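
The per-page comparison itself can then be scripted. A minimal sketch,
assuming poppler's pdftoppm is installed, reusing the structural_diff
sketch from above, and using hypothetical file names:

    import glob
    import subprocess

    def render_pages(pdf_path, prefix, dpi=150):
        # Rasterize each PDF page to <prefix>-NN.png with poppler's pdftoppm.
        subprocess.run(["pdftoppm", "-png", "-r", str(dpi), pdf_path, prefix],
                       check=True)
        return sorted(glob.glob(prefix + "-*.png"))

    # With one feature per page, a mismatch on page N points at feature N
    # and cannot shift the comparison of later features.
    for n, (pa, pb) in enumerate(zip(render_pages("mso.pdf", "mso"),
                                     render_pages("oo.pdf", "oo")), start=1):
        blobs, frac = structural_diff(pa, pb)
        print("page %d: %d diff blobs, %.1f%% of pixels" % (n, blobs, 100 * frac))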

One can see there that, for example, OOo/LO differs from MSO in only one
test: a centered image with text wrapping. It is, however, not clear
which one renders it correctly: MSO, OOo/LO, or neither of them?

The raw numbers in the table are really hard to analyze: the values for
slightly different pages and for significantly different ones are very
similar. I have played with the morphological operations a little, but
with no success so far.


Milos


-------------- next part --------------
A non-text attachment was scrubbed...
Name: rsltp.ods
Type: application/vnd.oasis.opendocument.spreadsheet
Size: 18298 bytes
Desc: not available
URL: <http://lists.opendocsociety.org/pipermail/officeshots/attachments/20110624/3ea95459/attachment-0001.ods>

