[Officeshots] Pixelwise comparison of pdf files

Jos van den Oever jos at vandenoever.info
Thu Jun 16 10:32:32 CEST 2011


On Thursday, June 16, 2011 07:08:19 AM Milos Sramek wrote:
> I am new on this list, so let me introduce myself first: my name is
> Milos Sramek. I found the officeshots page when I was looking for tools
> to compare the quality of ODF renderings created by different
> applications. I have been talking to Michiel for some time now, and I
> have even set up my own factories (the OOO3.4 and LO3.4 ones; others
> may follow).

Welcome to the list Milos!
You did some very nice work here.

> Officeshots can create pdfs with many tools so that they can be
> inspected visually. However, only significant differences are visible.
> Thus I wrote a script which takes pdfs and compares them - simple
> statistics are computed and difference images are created.
> 
> As an example I attach the results for the shah/2 Inserting An Image
> Try.odt file, which shows numeric differences between all pairs of
> available pdfs. The results are on a per-cent-like scale:
> 100 means a perfect fit of both images, 0 means that no two black pixels
> in the pdf files overlap.
Does a score of 50% mean that half of the black pixels overlap?
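
If I read the scale right, the score measures the shared black ink
relative to all black ink. Here is a minimal sketch of how such a score
could be computed (my guesses, not necessarily what your attached script
does: pdftoppm from poppler-utils for rendering, Pillow for pixel access,
and a Jaccard-style normalization):

# Sketch of a pixelwise overlap score between two PDF renderings.
# Assumptions (mine, not from this thread): pdftoppm and Pillow are
# available, both pages render at the same size, and the score is
# Jaccard-style: 100 * |A & B| / |A | B|.
import os
import subprocess
import tempfile
from PIL import Image

def render_page(pdf_path, page=1, dpi=100):
    """Render one PDF page to a grayscale image via pdftoppm."""
    with tempfile.TemporaryDirectory() as tmp:
        prefix = os.path.join(tmp, "page")
        subprocess.run(["pdftoppm", "-gray", "-r", str(dpi),
                        "-f", str(page), "-l", str(page),
                        pdf_path, prefix], check=True)
        # pdftoppm writes e.g. page-1.pgm into the temp directory
        name = next(f for f in os.listdir(tmp) if f.endswith(".pgm"))
        return Image.open(os.path.join(tmp, name)).copy()

def black_pixels(img, threshold=128):
    """Set of (x, y) coordinates whose gray value counts as black."""
    w = img.size[0]
    return {(i % w, i // w)
            for i, v in enumerate(img.getdata()) if v < threshold}

def overlap_score(pdf_a, pdf_b, page=1):
    a = black_pixels(render_page(pdf_a, page))
    b = black_pixels(render_page(pdf_b, page))
    if not (a or b):
        return 100.0  # two blank pages match perfectly
    return 100.0 * len(a & b) / len(a | b)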


> You can visually check what these values mean in the difference images,
> which are available in the file
> http://www.viskom.oeaw.ac.at/~milos/shah2-LO3.4.pdf
> Here, the LO3.4 result is compared to other applications (marked by the
> blue labels). Pixels set by just one application are red or cyan, those
> set by both are black. If the numerical result is 100, only black pixels
> are shown. These images are related to line 2 of the table.
> 
> This way, one can see exactly where the differences are. One can also
> see that some applications render the text really poorly.
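
The red/cyan overlay is a nice touch. As I read the description, it
amounts to something like this (again only a sketch with Pillow, assuming
both renderings have the same size; I have not checked it against your
script):

from PIL import Image

def diff_image(img_a, img_b, threshold=128):
    """Overlay two same-size grayscale renderings as an RGB image:
    pixels set only by A become red, only by B cyan, by both black."""
    w, h = img_a.size
    out = Image.new("RGB", (w, h), "white")
    pa, pb, po = img_a.load(), img_b.load(), out.load()
    for y in range(h):
        for x in range(w):
            a = pa[x, y] < threshold
            b = pb[x, y] < threshold
            if a and b:
                po[x, y] = (0, 0, 0)      # both applications: black
            elif a:
                po[x, y] = (255, 0, 0)    # only A: red
            elif b:
                po[x, y] = (0, 255, 255)  # only B: cyan
    return out
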
You show results from two families of office suites: MSO and OOo.
My interpretation of these data:
 - Lotus Symphony 3 beta is really beta; it's off the scale
 - MSO is pretty consistent internally except for a larger deviation for the 
ODF add-in
 - OOo is also pretty consistent internally, but a bit less than MSO
 - differences between MSO and OOo are large

The ODF standard does not require this consistency, but many expect it
anyway. While 100% is not required by the spec, it would be nice to have.
If these numbers are way off, they indicate real problems such as
incorrect positioning or incorrect margins. The matrix is very useful. It
could be extended with more soft ODF measures, such as the number of
pages in a rendered document.
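
Page counts, for instance, are easy to pull from the rendered PDFs. A
sketch using pdfinfo from poppler-utils (any PDF library would do):

import re
import subprocess

def page_count(pdf_path):
    """Number of pages in a PDF, parsed from pdfinfo output."""
    info = subprocess.run(["pdfinfo", pdf_path], capture_output=True,
                          text=True, check=True).stdout
    return int(re.search(r"^Pages:\s+(\d+)", info, re.M).group(1))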

An interesting number for each file would be the number of black pixels
and how that differs between files. It will be rare that you will get a
score of zero between two files. If a file A has a percentage Ap of
random black pixels and a file B has a percentage Bp, the expected score
will be Ap*Bp. At 20% black in both files, that is 0.2 * 0.2 = 0.04, i.e.
4 on the 0-100 scale. I think this is close enough to zero that there is
no need for a random baseline.
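
A quick sanity check of that arithmetic, taking the score as the raw
fraction of pixels black in both files (hypothetical 20% ink coverage):

import random

def random_overlap(ap, bp, n=1_000_000):
    """Fraction of pixels black in both of two independent random
    bitmaps with black probabilities ap and bp; tends to ap * bp."""
    hits = sum(1 for _ in range(n)
               if random.random() < ap and random.random() < bp)
    return hits / n

print(100 * random_overlap(0.2, 0.2))  # ~4 on the 0-100 scale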

> Currently I am using this tool to prepare a study which I would like to
> send to the Slovak government as an argument that ODF is the right
> standard to use. I have, however, some problems with the officeshots
> service:
> 
> - the server with MSO software is nearly always down. Is there any idea
> when it will be up? I would like to send a couple of files for testing.
> I think that I can even install and run such factory, but I do not own
> licenses for Windows and MSO.
> - In spite of the fact that my factory with LO3.4 and OOO3.4 is running
> (when I send a file for testing, it is correctly processed), most files
> of the available galleries are not processed by LO3.4 and OOO3.4 (e.g.
> see
> http://www.officeshots.org/galleries/view/opendocument-fellowship-test-suite).
> Is it somehow possible to force the system to rerun all tests on a
> certain factory?
> 
> If you want to test the script, it is attached.

Cheers,
Jos
