[Officeshots] Pixelwise comparison of pdf files
Jos van den Oever
jos at vandenoever.info
Thu Jun 16 10:32:32 CEST 2011
On Thursday, June 16, 2011 07:08:19 AM Milos Sramek wrote:
> I am new on this list, so let me introduce myself first: my name is
> Milos Sramek. I found the officeshots page when I was looking for tools
> to compare quality of ODF renderings created by different applications.
> I have been talking to Michiel for some time now, and I even have set
> up my own factory (the OOO3.4 and LO3.4 ones; others may follow).
Welcome to the list Milos!
You did some very nice work here.
> Officeshots can create pdfs by many tools so that they can be inspected
> visually. However, only significant differences are visible. Thus I
> wrote a script which takes PDFs and compares them: simple statistics
> are computed and difference images are created.
>
> As an example I attach the results for the shah/2 Inserting An Image
> Try.odt file, which shows numeric differences between all pairs of
> available pdfs. The results are on a percent-like scale:
> 100 means a perfect fit of both images, 0 means that no two black
> pixels in the pdf files overlap.
Does a score of 50% mean that half of the black pixels overlap?
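For reference, a minimal sketch of such a pixelwise score in Python with NumPy. The mail does not show the script's exact formula, so intersection-over-union is assumed here as the normalization:

```python
import numpy as np

def overlap_score(a, b):
    """Percent-like overlap of two binary page images (True = black pixel).

    100 means the black pixels coincide exactly; 0 means no black pixel
    is shared. The actual script's metric is not shown in the mail, so
    intersection-over-union is assumed.
    """
    inter = np.logical_and(a, b).sum()
    union = np.logical_or(a, b).sum()
    if union == 0:            # both pages blank: treat as a perfect match
        return 100.0
    return 100.0 * inter / union

# Two toy 4x4 "pages": identical pages score 100, disjoint pages score 0.
a = np.array([[1, 1, 0, 0]] * 4, dtype=bool)
b = np.array([[0, 0, 1, 1]] * 4, dtype=bool)
print(overlap_score(a, a))  # 100.0
print(overlap_score(a, b))  # 0.0
```

Under this normalization, 50% would mean the shared black area is half the combined black area of the two pages.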
> You can visually check what do these values mean in difference images,
> which are available in the file
> http://www.viskom.oeaw.ac.at/~milos/shah2-LO3.4.pdf
> Here, the LO3.4 result is compared to other applications (marked by the
> blue labels). Pixels set by just one application are red or cyan, those
> set by both are black. If the numerical result is 100, only black pixels
> are shown. These images are related to line 2 of the table.
>
> This way, one can see exactly where the differences are. One can also
> see that some applications render the text really poorly.
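The red/cyan/black colouring described above can be sketched as follows; the colour assignment mirrors the mail's description, though the attached script may differ in detail:

```python
import numpy as np

def difference_image(a, b):
    """Build an RGB difference image from two binary pages.

    a, b: boolean arrays, True = black pixel. Pixels set only in `a`
    come out red, only in `b` cyan, in both black, in neither white.
    """
    h, w = a.shape
    img = np.full((h, w, 3), 255, dtype=np.uint8)  # start all white
    img[a & ~b] = (255, 0, 0)                      # only in A: red
    img[~a & b] = (0, 255, 255)                    # only in B: cyan
    img[a & b] = (0, 0, 0)                         # in both: black
    return img

a = np.array([[1, 1, 0], [0, 1, 0]], dtype=bool)
b = np.array([[0, 1, 1], [0, 1, 0]], dtype=bool)
img = difference_image(a, b)
print(img[0, 0], img[0, 2], img[0, 1])  # red, cyan, black
```

If the two renderings agree perfectly, only black (and white) pixels remain, matching the "only black pixels are shown" case in the mail.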
You show results from two families of office suites: MSO and OOo.
My interpretation of these data:
- Lotus Symphony 3 beta is really beta; it's off the scale
- MSO is pretty consistent internally except for a larger deviation for the
ODF add-in
- OOo is also pretty consistent internally, but a bit less than MSO
- differences between MSO and OOo are large
The ODF standard does not require this pixel-level consistency, but many people
expect it anyway. While 100% is not required by the spec, it would be nice to
have. If these numbers are way off, they indicate real problems such as
incorrect positioning or incorrect margins. The matrix is very useful. It could
be extended with more soft ODF measures, such as the number of pages in a
rendered document.
An interesting number for each file would be the number of black pixels and how
that differs between files. It will be rare to get a score of zero between two
files: if file A has a fraction Ap of random black pixels and file B has a
fraction Bp, the expected score is Ap*Bp. At 20% black, the score would be 4. I
think this is close enough to zero that there is no need for a random baseline.
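That baseline arithmetic, spelled out (this just reproduces the estimate above; the exact number depends on which normalization the script uses):

```python
# Chance that a given pixel is black in both of two unrelated pages,
# each with a fraction p of randomly placed black pixels, is p * p.
p_a = 0.20          # 20% black coverage in file A
p_b = 0.20          # 20% black coverage in file B
baseline = 100 * p_a * p_b   # on the 0-100 score scale
print(baseline)     # 4.0
```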
> Currently I use this tool to prepare a study, which I would like to send
> to the Slovak government as an argument that ODF is the right standard to
> use. I have, however, some problems with the officeshots service:
>
> - the server with MSO software is nearly always down. Is there any idea
> when it will be up? I would like to send a couple of files for testing.
> I think that I can even install and run such factory, but I do not own
> licenses for Windows and MSO.
> - In spite of the fact that my factory with LO3.4 and OOO3.4 is running
> (when I send a file for testing, it is correctly processed), most files
> of the available galleries are not processed by LO3.4 and OOO3.4. (e.g.
> see
> http://www.officeshots.org/galleries/view/opendocument-fellowship-test-suite)
> Is it somehow possible to force the system to rerun all tests on a
> certain factory?
>
> If you want to test the script, it is attached.
Cheers,
Jos