[Officeshots] Interoperability testing - first results

robert_weir at us.ibm.com robert_weir at us.ibm.com
Mon Apr 22 20:36:33 CEST 2013


Milos Sramek <sramek.milos at gmail.com> wrote on 04/20/2013 06:55:10 AM:

> 
> Thank you, Rob, for the good idea.
> It took me some time to find the historic OOO3.3, but I have found it
> and have produced some results.
> You can download them from http://ubuntuone.com/0UxhVxU5E1Fr7uztOwgX0i
> 

Great.  I'm looking through the results now.   Very interesting.   I think 
this could be a useful test for a project to run before each release, to 
compare results with their previous release.


> I do these tests because I take part (as a volunteer) in 
> interoperability testing initiated by the Slovak government - they 
> were pressed by the public to start to consider using open source 
> software. What we want to do first is to map the situation in 
> interoperability of office applications and formats. Simultaneously,
> we want to prepare a list of document elements which can be safely 
> used and which should be avoided. According to the attached file the
> set of safe elements is currently very narrow. I hope that, pointing
> to this situation, we can persuade the developers on all sides to 
> improve it.  
> 

OK.  I have heard others discuss in the past whether it would be possible 
to create a document template that would contain predefined styles that 
had better interoperability than the default styles defined in an editor. 
By using the special template the documents might be more portable.  I 
don't know if anyone has done this, but it sounds like a possible 
approach.


> Technically, the testing strategy is as follows:
> 1. for each program (AO33, AO34, LO40 and MS13) there is an original
> file in odt (AO33, AO34, LO40) or docx (MS13). 
> example file: formatting-AO34.odt
> 2. Originals are converted to rtf and docx (AO33, AO34, LO40) and 
> odt and rtf (MS13)
> example file: formatting-AO34.rtf
> 


Why RTF format?  Is that something the Slovak government uses frequently? 
I can understand ODT and DOCX, but I thought that DOC would be the other 
popular choice, not RTF.

> In both 1 and 2 AO34 means that the file was produced by AO34
> 
> 3. each of these files is printed by all tested programs. Files as 
> formatting-AO34.rtf.MS13.pdf are created which means:
> File was created by AO34 and printed by MS13
> 
> 4. From these pdfs two types of renderings were created
> - side by side, where on the left is the original
> (formatting-AO34.odt.AO34.rtf) and on the right the file being compared
> - overlays (-ov in file name), with the same combination of files
> 
> The side-by-side renderings are more useful to see major 
> differences, overlays are better for details, which may 
> otherwise go unnoticed.
> The 'original' file was always created manually and therefore 
> originals may mutually differ. Therefore I never compare, say, 
> formatting-AO34.rtf.MS13.pdf
> to formatting-LO40.rtf.MS13.pdf
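
The naming scheme described in steps 1-4 can be sketched as a small parser. This is only an illustration under stated assumptions: the `Rendering` type, the `parse_rendering` and `comparable` helpers, and the regular expression are hypothetical names and are not part of the tools described in this thread. The sketch also encodes the rule that two renderings are compared only when they derive from the same manually created original.

```python
import re
from typing import NamedTuple


class Rendering(NamedTuple):
    test: str     # test document name, e.g. "formatting"
    creator: str  # program that produced the file, e.g. "AO34"
    fmt: str      # format the file was saved in, e.g. "rtf"
    printer: str  # program that printed the file to PDF, e.g. "MS13"


# Assumed pattern: <test>-<creator>.<format>.<printer>.pdf
_PATTERN = re.compile(
    r"^(?P<test>.+)-(?P<creator>[A-Z]+\d+)"
    r"\.(?P<fmt>odt|docx|rtf)"
    r"\.(?P<printer>[A-Z]+\d+)\.pdf$"
)


def parse_rendering(name: str) -> Rendering:
    """Split a rendering file name into its four components."""
    match = _PATTERN.match(name)
    if match is None:
        raise ValueError(f"unexpected file name: {name!r}")
    return Rendering(**match.groupdict())


def comparable(a: Rendering, b: Rendering) -> bool:
    """Originals were created by hand, so only renderings that share
    the same test document and the same creator can be compared."""
    return a.test == b.test and a.creator == b.creator
```

With this, formatting-AO34.rtf.MS13.pdf and formatting-LO40.rtf.MS13.pdf are rejected as a pair because their originals differ.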
> 
> For example, from the file spaces-ov-odt.pdf we can see:
> in numbered lists (numbering)
> - MS13 makes slightly larger interline spaces in files originating 
> from AO33, AO34, LO40
> - MS13 (probably) saves odf incorrectly, since all programs 
> (including itself) display them equally badly
> 
> In chapter titles (chapter)
> - files created by MS13 and AO34 are significantly different if 
> opened in other programs
> - files created by AO33 and LO40 are the same if opened in other 
> programs
> 
> In bullet lists (bullets)
> - MS13 makes slightly larger interline spaces in files originating 
> from AO33, AO34, LO40
> - MS13 (probably) saves odf incorrectly, since all programs 
> (including itself) display them equally badly
> 
> So, we can see that paragraph spacing is interpreted in a different 
> way by different programs. For rtf and odf files the situation is 
> similar. 
> Do we know which rendering is correct and which not? Do some 
> 'absolute' measures exist, say, by measuring distances after 
> printing on paper?

Two different perspectives:

1) From the user's perspective, whatever tool they used to author the document 
is "correct".  If it looks different in another editor, it is "wrong" to the 
user.

2) From the perspective of the ODF standard, this would depend on the details 
of what markup was generated by the authoring editor.  But generally ODF 
does not put tight constraints on document layout.  It is not a fixed-page 
format like PDF.  So there may be differences among applications.  The 
same is true of RTF and DOCX. 

> The example of lists (see above):
> Was the implementation in AO33 correct and a bug was introduced in 
> AO34? Or, was it incorrect (and as such taken to LO40) and AO34 is 
> correct now? Or, perhaps the MS implementation is the right one? Who
> is the authority to decide?
> 

In this specific case I don't know.  What we typically would do is discuss 
examples like this among the projects, look carefully at the ODF markup 
and the standard and come to a conclusion.  If you are able to post the 
source ODT documents I can investigate this more.
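
As a starting point for looking at the markup, here is a minimal sketch that lists the spacing attributes of the paragraph styles in an ODT file. It relies only on the facts that an ODT file is a ZIP package containing `styles.xml` and `content.xml`, and that paragraph spacing is expressed through `fo:margin-top`, `fo:margin-bottom` and `fo:line-height` on `style:paragraph-properties`. The function name `paragraph_spacing` is hypothetical, and the sketch ignores `style:default-style` and style inheritance, so it is a first look rather than a complete analysis.

```python
import zipfile
import xml.etree.ElementTree as ET

# ODF namespaces used for styles and XSL-FO compatible properties.
NS = {
    "style": "urn:oasis:names:tc:opendocument:xmlns:style:1.0",
    "fo": "urn:oasis:names:tc:opendocument:xmlns:xsl-fo-compatible:1.0",
}
STYLE = "{%s}" % NS["style"]
FO = "{%s}" % NS["fo"]
SPACING_ATTRS = ("margin-top", "margin-bottom", "line-height")


def paragraph_spacing(odt_path):
    """Return {style name: {attribute: value}} for every paragraph
    style that sets at least one spacing attribute explicitly."""
    result = {}
    with zipfile.ZipFile(odt_path) as odt:
        names = odt.namelist()
        for part in ("styles.xml", "content.xml"):
            if part not in names:
                continue
            root = ET.fromstring(odt.read(part))
            for style in root.iter(STYLE + "style"):
                if style.get(STYLE + "family") != "paragraph":
                    continue
                props = style.find("style:paragraph-properties", NS)
                if props is None:
                    continue
                found = {
                    attr: props.get(FO + attr)
                    for attr in SPACING_ATTRS
                    if props.get(FO + attr) is not None
                }
                if found:
                    result[style.get(STYLE + "name")] = found
    return result
```

Running this on the same test document as saved by each editor would show whether the editors write different spacing values or render the same values differently.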

Regards,

-Rob


> With best regards
> Milos
> 
> On 18.04.2013 22:03, robert_weir at us.ibm.com wrote:
> Hello Milos, 
> 
> This is very interesting.  Thanks for doing it.  As you know, AOO 
> and LO have a common ancestry.  So to track down the source of the 
> variation one approach would be to do a comparison of the rendering 
> of the common ancestor, OpenOffice.org 3.3.0, to the current AOO and
> LO.  That would give a very good clue to what is happening here. 
> 
> Regards, 
> 
> -Rob 
> 
> 
> officeshots-bounces at nlnet.nl wrote on 04/16/2013 02:36:38 AM:
> 
> > From: Milos Sramek <sramek.milos at gmail.com> 
> > To: Officeshots support and development <officeshots at nlnet.nl> 
> > Date: 04/16/2013 02:38 AM 
> > Subject: [Officeshots] Interoperability testing - first results 
> > Sent by: officeshots-bounces at nlnet.nl 
> > 
> > Hi,
> > 
> > I have implemented the interoperability testing tools, which I 
> > mentioned in my previous mails. Currently without network access - 
> > that would be nice in the future. I convert to other office formats 
> > and to pdf by command line tools:
> > LibreOffice - directly calling LO from command line
> > Apache OO - using its "server mode" and the DocumentConverter.py script
> > MSO: using the OfficeConvert tool (MSO 2013, I used the version sent
> > here by Pasqual - thank you, Pasqual for that)
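
For the LibreOffice case, a command-line conversion could look roughly like the sketch below. The thread does not show the actual command used, so this is an assumption: it uses the `--headless`, `--convert-to` and `--outdir` options of the `soffice` binary, and it assumes `soffice` is on the PATH. The helper names `convert_command` and `convert` are hypothetical.

```python
import subprocess
from pathlib import Path


def convert_command(source, target_format, outdir, soffice="soffice"):
    """Build the argument list for a headless LibreOffice conversion."""
    return [
        soffice,
        "--headless",
        "--convert-to", target_format,
        "--outdir", str(outdir),
        str(source),
    ]


def convert(source, target_format, outdir):
    """Run the conversion and return the expected output path.

    LibreOffice writes the result into outdir under the source's base
    name with the new extension.
    """
    subprocess.run(convert_command(source, target_format, outdir), check=True)
    return Path(outdir) / (Path(source).stem + "." + target_format)
```

For example, `convert("formatting-LO40.odt", "pdf", "out")` would be expected to produce `out/formatting-LO40.pdf`.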
> > 
> > In the attached files you can see how odf documents are displayed by
> > LO40, AO34 and MSO13. To enhance the differences, I overlaid the 
> > documents - black means perfect fit, red/cyan means poor fit 
> > (odt-ov-srt.pdf).
> > 
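One way to obtain such a black/red/cyan overlay is to put one rendering into the red channel and the other into the green and blue channels. The sketch below is an assumption about how this could be done, not a description of the tool used here: pages are taken as equally sized grids of grayscale values (0 = ink, 255 = paper), and the choice of which page goes into which channel is arbitrary. The function name `overlay` is hypothetical.

```python
def overlay(reference, candidate):
    """Combine two grayscale pages into one RGB page.

    reference fills the red channel, candidate fills green and blue:
      ink in both pages      -> (0, 0, 0)       black, perfect fit
      ink only in reference  -> (0, 255, 255)   cyan
      ink only in candidate  -> (255, 0, 0)     red
      paper in both pages    -> (255, 255, 255) white
    """
    if len(reference) != len(candidate) or any(
        len(r) != len(c) for r, c in zip(reference, candidate)
    ):
        raise ValueError("pages must have the same dimensions")
    return [
        [(r, c, c) for r, c in zip(ref_row, cand_row)]
        for ref_row, cand_row in zip(reference, candidate)
    ]
```

A line that is merely shifted then shows up as a cyan copy next to a red copy, which matches the "shifted lines" effect described in the next paragraph.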
> > One can see there that the main interoperability problem of LO and 
> > AOO on one side and MSO on the other side is line and paragraph 
> > spacing. There are also other differences, but these are mostly 
> > 'shadowed' by the shifted lines. Do you have an idea why this is so? 
> > Is it just a buggy implementation, or is it caused by a possible 
> > ambiguity in interpretation of the ODF standard? 
> > 
> > Such differences in formatting are clearly visible and, I think, are a 
> > hurdle in acceptance of ODF as a platform independent standard. Do 
> > you have an idea what to do about that? Talk to developers at AOO, 
> > LO, Microsoft and elsewhere and persuade them to implement ODF in a 
> > consistent way?
> > 
> > best regards
> > Milos
> > 
> > 
> > On 10.03.2013 16:12, Milos Sramek wrote: 
> > Thank you Rob, for pointing me to this tool. 
> > 
> > It looks like its purpose is to analyze problems with opening a
> > file in the binary format. I am more concerned with rendering. It is 
> > a common situation that different tools render in a different way. I 
> > would like to know how severe this is.
> > 
> > Milos
> > 
> > On 08.03.2013 14:41, robert_weir at us.ibm.com wrote: 
> > officeshots-bounces at nlnet.nl wrote on 03/07/2013 04:50:46 PM:
> > 
> > > From: Michiel Leenaars <michiel.ml at nlnet.nl> 
> > > To: Officeshots support and development <officeshots at nlnet.nl>, 
> > > Date: 03/07/2013 04:51 PM 
> > > Subject: Re: [Officeshots] interoperability testing 
> > > Sent by: officeshots-bounces at nlnet.nl 
> > > 
> > > Hi Milos,
> > > 
> > > > Perhaps there is a chance in Bratislava to set up a Windows machine
> > > > running MSO 2007/10/13 for our testing purposes. If you find that
> > > > interesting, I can ask Microsoft representatives here in Bratislava
> > > > if they would be willing to donate the necessary licenses.
> > > 
> > > I have some licences for testing machines for MSO 2007 for you. John 
> > > Haug or Jim Thatcher might be able to provide copies for 2010 and 
> > > 2013 - as he and Doug have done in prior instances - I will send their 
> > > contact details to you off list.
> > > 
> > > There is no reason other document conversions shouldn't work with 
> > > minor modifications, I think we designed it generically. Of course 
> > > there are no validators for the binary formats for instance, but it 
> > > would already be useful without them, as Jos remarked.
> > > 
> > 
> > Actually, have you seen this: 
> > 
> > http://msdn.microsoft.com/en-us/library/office/gg649868%28v=office.14%29.aspx
> > 
> > This was beta with Office 2010.  I don't know if Microsoft took this
> > further, but it could be useful for testing binary format documents. 
> > 
> > -Rob 
> > 
> > 
> > > OfficeConvert is already able to handle all conversions that MS Office
> > > performs, and the Officeshots python factory is trivial to change. I 
> > > think the biggest part is for someone to go through the Officeshots 
> > > server code, and dig into the definition of file types. And of course 
> > > we need to fit it into the UI.
> > > 
> > > Anyone on this list fluent enough in PHP to go there?
> > > 
> > > Best,
> > > Michiel
> > > 
> > > > Regarding the officeshots API: would it be possible to use it also
> > > > to do other types of conversions, not only odf->odf->pdf? Currently,
> > > > within a larger project, I work on a user study, which would enable
> > > > quantitative evaluation of a "level of interoperability" between
> > > > office applications on the basis of various document standards. The
> > > > study is based on automated conversion of documents (similar to the
> > > > officeshots framework, only running locally). Conversion by AOO, LO
> > > > and Google Docs is already implemented this way (not a big deal, GD
> > > > using their API). I would, however, like to convert the documents
> > > > also by MSO. I hope that using the officeshots API and the above
> > > > mentioned machine it would be possible.
> > > 
> > > _______________________________________________
> > > Officeshots mailing list
> > > Officeshots at nlnet.nl
> > > https://open.nlnet.nl/mailman/listinfo/officeshots
> > > 
> > 
> > -- 
> > email & jabber: sramek.milos at gmail.com
> > [attachment "odt-srt.pdf" deleted by Robert Weir/Cambridge/IBM] 
> > [attachment "odt-ov-srt.pdf" deleted by Robert Weir/Cambridge/IBM] 
> 

> -- 
> email & jabber: sramek.milos at gmail.com