[Officeshots] Interoperability testing - first results

Milos Sramek sramek.milos at gmail.com
Sat Apr 20 12:55:10 CEST 2013


Thank you, Rob, for the good idea.
It took me some time to find the historic OOO3.3, but I found it and
have produced some results.
You can download them from http://ubuntuone.com/0UxhVxU5E1Fr7uztOwgX0i

I do these tests because I take part (as a volunteer) in
interoperability testing initiated by the Slovak government; public
pressure led it to start considering open source software.
What we want to do first is to map the current state of interoperability
between office applications and formats. At the same time, we want to
prepare a list of document elements that can be used safely and those
that should be avoided. According to the attached file, the set of safe
elements is currently very narrow. I hope that, by pointing to this
situation, we can persuade developers on all sides to improve it.

Technically, the testing strategy is as follows:
1. For each program (AO33, AO34, LO40 and MS13) there is an original
file in odt (AO33, AO34, LO40) or docx (MS13).
example file: formatting-AO34.odt
2. The originals are converted to rtf and docx (AO33, AO34, LO40), or to
odt and rtf (MS13).
example file: formatting-AO34.rtf

In both 1 and 2, AO34 means that the file was produced by AO34.

3. Each of these files is printed to PDF by all tested programs. This
produces files such as formatting-AO34.rtf.MS13.pdf, which means:
the file was created by AO34 and printed by MS13.

4. From these PDFs, two types of renderings are created:
- side by side, with the original (formatting-AO34.odt.AO34.pdf) on the
left and the file being compared on the right
- overlays (-ov in file name), with the same combination of files
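The naming scheme in steps 1-4 can be sketched as a small Python script. This is a hypothetical illustration, not the script used for these tests: the program codes and formats come from the list above, the helper names (source_files, printouts, comparison_pairs) are made up, and the reference page is assumed to be the original printed by the program that created it. Pairs are formed only within one creator, since the manually created originals may differ.

```python
"""Hypothetical sketch of the test-matrix naming scheme (steps 1-4)."""
from itertools import product

# Program codes used in the tests.
PROGRAMS = ["AO33", "AO34", "LO40", "MS13"]

# Native format of each program's original, and the formats it converts to.
NATIVE = {"AO33": "odt", "AO34": "odt", "LO40": "odt", "MS13": "docx"}
CONVERTED = {"AO33": ["rtf", "docx"], "AO34": ["rtf", "docx"],
             "LO40": ["rtf", "docx"], "MS13": ["odt", "rtf"]}


def source_files(test: str, creator: str) -> list[str]:
    """Steps 1 and 2: the original plus its conversions, all made by `creator`."""
    formats = [NATIVE[creator]] + CONVERTED[creator]
    return [f"{test}-{creator}.{fmt}" for fmt in formats]


def printouts(test: str, creator: str) -> list[str]:
    """Step 3: every source file printed to PDF by every tested program."""
    return [f"{src}.{printer}.pdf"
            for src, printer in product(source_files(test, creator), PROGRAMS)]


def comparison_pairs(test: str, creator: str) -> list[tuple[str, str]]:
    """Step 4: pair each printout with the original printed by its creator.

    Assumption: the reference is <test>-<creator>.<native>.<creator>.pdf.
    Printouts from different creators are never paired with each other.
    """
    reference = f"{test}-{creator}.{NATIVE[creator]}.{creator}.pdf"
    return [(reference, pdf) for pdf in printouts(test, creator)
            if pdf != reference]
```

For example, comparison_pairs("formatting", "AO34") yields 11 pairs: 3 source files times 4 printing programs, minus the reference itself.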

The side-by-side renderings are more useful for spotting major
differences; the overlays are better for details that might otherwise
go unnoticed.
Each 'original' file was created manually, so the originals may differ
from one another. Therefore I never compare, say,
formatting-AO34.rtf.MS13.pdf
to formatting-LO40.rtf.MS13.pdf.
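A red/cyan overlay of the kind described above can be sketched as follows, assuming both PDFs have already been rasterized at the same resolution into grayscale pages (0 = black ink, 255 = white paper) by some PDF renderer, which is not shown. The channel assignment is an assumption: ink present only on the reference page shows up red, ink only on the compared page shows up cyan, and ink on both stays black. The mismatch() score is a hypothetical helper added for illustration, not part of the original tooling.

```python
"""Hypothetical red/cyan overlay of two rendered pages."""

Gray = list[list[int]]                     # rows of 0..255, 0 = black ink
RGB = list[list[tuple[int, int, int]]]     # rows of (R, G, B) pixels


def overlay(reference: Gray, compared: Gray) -> RGB:
    """Reference drives green and blue, compared drives red.

    Ink on both -> black, only on reference -> red, only on compared -> cyan.
    """
    if len(reference) != len(compared) or any(
            len(a) != len(b) for a, b in zip(reference, compared)):
        raise ValueError("pages must be rendered at the same size")
    return [[(c, r, r) for r, c in zip(ref_row, cmp_row)]
            for ref_row, cmp_row in zip(reference, compared)]


def mismatch(reference: Gray, compared: Gray, threshold: int = 128) -> float:
    """Fraction of inked pixels that are inked on only one of the two pages."""
    only_one = inked = 0
    for ref_row, cmp_row in zip(reference, compared):
        for r, c in zip(ref_row, cmp_row):
            ink_r, ink_c = r < threshold, c < threshold
            if ink_r or ink_c:
                inked += 1
                if ink_r != ink_c:
                    only_one += 1
    return only_one / inked if inked else 0.0
```

For two tiny pages whose single text line is shifted down by one row, the line appears as a red row followed by a cyan row, and mismatch() returns 1.0; identical pages give 0.0.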

For example, from the file spaces-ov-odt.pdf we can see:
In numbered lists (numbering):
- MS13 produces slightly larger interline spacing in files originating
from AO33, AO34 and LO40
- MS13 (probably) saves ODF incorrectly, since all programs (including
MS13 itself) display its files equally badly

In chapter titles (chapter):
- files created by MS13 and AO34 look significantly different when
opened in other programs
- files created by AO33 and LO40 look the same when opened in other programs

In bullet lists (bullets):
- MS13 produces slightly larger interline spacing in files originating
from AO33, AO34 and LO40
- MS13 (probably) saves ODF incorrectly, since all programs (including
MS13 itself) display its files equally badly

So we can see that different programs interpret paragraph spacing
differently. For rtf and odf files the situation is similar.
Do we know which rendering is correct and which is not? Do any 'absolute'
measures exist, say, by measuring distances on a printed page?
Take the example of lists above:
Was the implementation in AO33 correct, with a bug introduced in AO34?
Or was it incorrect (and as such carried over to LO40), and AO34 is
correct now? Or perhaps the MS implementation is the right one? Who is
the authority to decide?
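One possible 'absolute' measure, offered only as a sketch, is to measure line pitch directly on a page rasterized at a known resolution and convert pixels to points (1 pt = 1/72 inch, so pt = px * 72 / dpi). The helpers below are hypothetical; they find horizontal bands of inked rows and use the bottom of each band as a rough stand-in for the baseline. Descenders disturb this, so the numbers are approximate.

```python
"""Hypothetical sketch: measure line pitch (in points) on a rendered page."""

Gray = list[list[int]]  # rows of 0..255 values, 0 = black ink


def text_line_bands(page: Gray, threshold: int = 128) -> list[tuple[int, int]]:
    """Return (first_row, last_row) for each horizontal band of inked rows."""
    bands: list[tuple[int, int]] = []
    start = None
    for y, row in enumerate(page):
        inked = any(v < threshold for v in row)
        if inked and start is None:
            start = y                      # a new text line begins
        elif not inked and start is not None:
            bands.append((start, y - 1))   # the text line ended on the previous row
            start = None
    if start is not None:                  # page ends inside a text line
        bands.append((start, len(page) - 1))
    return bands


def line_pitch_pt(page: Gray, dpi: float) -> list[float]:
    """Distances between bottoms of consecutive bands, converted to points."""
    bottoms = [last for _, last in text_line_bands(page)]
    return [(b2 - b1) * 72.0 / dpi for b1, b2 in zip(bottoms, bottoms[1:])]
```

For instance, a page with inked rows 0-1 and 4-5 gives a pitch of 4.0 pt at 72 dpi. Comparing such values for the same paragraph across printouts would show which program deviates from the spacing set in the document.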

With best regards
Milos

On 18.04.2013 22:03, robert_weir at us.ibm.com wrote:
> Hello Milos,
>
> This is very interesting.  Thanks for doing it.  As you know, AOO and 
> LO have a common ancestry.  So to track down the source of the 
> variation one approach would be to do a comparison of the rendering of 
> the common ancestor, OpenOffice.org 3.3.0, to the current AOO and LO. 
>  That would give a very good clue to what is happening here.
>
> Regards,
>
> -Rob
>
>
> officeshots-bounces at nlnet.nl wrote on 04/16/2013 02:36:38 AM:
>
> > From: Milos Sramek <sramek.milos at gmail.com>
> > To: Officeshots support and development <officeshots at nlnet.nl>
> > Date: 04/16/2013 02:38 AM
> > Subject: [Officeshots] Interoperability testing - first results
> > Sent by: officeshots-bounces at nlnet.nl
> >
> > Hi,
> >
> > I have implemented the interoperability testing tools that I
> > mentioned in my previous mails. They currently run locally, without
> > network access; network access would be nice in the future. I convert
> > to other office formats and to pdf using command-line tools:
> > LibreOffice - calling LO directly from the command line
> > Apache OO - using its "server mode" and the DocumentConverter.py script
> > MSO - using the OfficeConvert tool (MSO 2013; I used the version sent
> > here by Pasqual - thank you, Pasqual, for that)
> >
> > In the attached files you can see how ODF documents are displayed by
> > LO40, AO34 and MSO13. To emphasize the differences, I overlaid the
> > documents: black means a perfect fit, red/cyan means a poor fit
> > (odt-ov-srt.pdf).
> >
> > One can see there that the main interoperability problem between LO
> > and AOO on one side and MSO on the other is line and paragraph
> > spacing. There are also other differences, but these are mostly
> > 'shadowed' by the shifted lines. Do you have an idea why this is so?
> > Is it just a buggy implementation, or is it caused by a possible
> > ambiguity in the interpretation of the ODF standard?
> >
> > Such differences in formatting are clearly visible and, I think, are a
> > hurdle to the acceptance of ODF as a platform-independent standard. Do
> > you have an idea what to do about that? Talk to developers at AOO,
> > LO, Microsoft and elsewhere and persuade them to implement ODF in a
> > consistent way?
> >
> > best regards
> > Milos
> >
> >
> > On 10.03.2013 16:12, Milos Sramek wrote:
> > Thank you Rob, for pointing me to this tool.
> >
> > It looks like its purpose is to analyze problems with opening a
> > file in the binary format. I am more concerned with rendering. It is
> > common for different tools to render the same document differently,
> > and I would like to know how severe this is.
> >
> > Milos
> >
> > On 08.03.2013 14:41, robert_weir at us.ibm.com wrote:
> > officeshots-bounces at nlnet.nl wrote on 03/07/2013 04:50:46 PM:
> >
> > > From: Michiel Leenaars <michiel.ml at nlnet.nl>
> > > To: Officeshots support and development <officeshots at nlnet.nl>,
> > > Date: 03/07/2013 04:51 PM
> > > Subject: Re: [Officeshots] interoperability testing
> > > Sent by: officeshots-bounces at nlnet.nl
> > >
> > > Hi Milos,
> > >
> > > > Perhaps there is a chance in Bratislava to set up a Windows machine
> > > > running MSO 2007/10/13 for our testing purposes. If you find that
> > > > interesting, I can ask Microsoft representatives here in Bratislava
> > > > whether they would be willing to donate the necessary licenses.
> > >
> > > I have some licences for testing machines for MSO 2007 for you. John
> > > Haug or Jim Thatcher might be able to provide copies for 2010 and 2013
> > > (as he and Doug have done in prior instances). I will send their
> > > contact details to you off list.
> > >
> > > There is no reason other document conversions shouldn't work with minor
> > > modifications; I think we designed it generically. Of course there are,
> > > for instance, no validators for the binary formats, but it would be
> > > useful even without them, as Jos remarked.
> > >
> >
> > Actually, have you seen this:
> >
> > http://msdn.microsoft.com/en-us/library/office/gg649868%28v=office.14%29.aspx
> >
> > This was in beta with Office 2010.  I don't know if Microsoft took this
> > further, but it could be useful for testing binary-format documents.
> >
> > -Rob
> >
> >
> > > OfficeConvert can already handle all conversions that MS Office
> > > performs, and the Officeshots python factory is trivial to change. I
> > > think the biggest part is for someone to go through the Officeshots
> > > server code and dig into the definition of file types. And of course
> > > we need to fit it into the UI.
> > >
> > > Anyone on this list fluent enough in PHP to go there?
> > >
> > > Best,
> > > Michiel
> > >
> > > > Regarding the officeshots API: would it be possible to use it also to
> > > > do other types of conversions, not only odf->odf->pdf? Currently,
> > > > within a larger project, I am working on a user study that would enable
> > > > a quantitative evaluation of the "level of interoperability" between
> > > > office applications on the basis of various document standards. The
> > > > study is based on automated conversion of documents (similar to the
> > > > officeshots framework, only running locally). Conversion by AOO, LO
> > > > and Google Docs is already implemented this way (not a big deal, GD
> > > > using their API). I would, however, like to convert the documents
> > > > with MSO as well. I hope that this would be possible using the
> > > > officeshots API and the above-mentioned machine.
> > >
> > > _______________________________________________
> > > Officeshots mailing list
> > > Officeshots at nlnet.nl
> > > https://open.nlnet.nl/mailman/listinfo/officeshots
> > >
> >
>
> > --
> > email & jabber: sramek.milos at gmail.com
> > [attachment "odt-srt.pdf" deleted by Robert Weir/Cambridge/IBM]
> > [attachment "odt-ov-srt.pdf" deleted by Robert Weir/Cambridge/IBM]


-- 
email & jabber: sramek.milos at gmail.com
