[Officeshots] Interoperability testing - first results

Milos Sramek sramek.milos at gmail.com
Tue Apr 23 22:10:36 CEST 2013


Dear Rob,

Thank you for the explanations; my replies are below.

On 22.04.2013 20:36, robert_weir at us.ibm.com wrote:
> Milos Sramek <sramek.milos at gmail.com> wrote on 04/20/2013 06:55:10 AM:
>
> >
> > Thank you, Rob, for the good idea.
> > It took me some time to find the historic OOO3.3, but I found it
> > and have produced some results.
> > You can download them from http://ubuntuone.com/0UxhVxU5E1Fr7uztOwgX0i
> >
>
> Great.  I'm looking through the results now.   Very interesting.   I 
> think this could be a useful test for a project to run before each 
> release, to compare results with their previous release.
This is exactly the idea I wanted to express when I presented this 
tool at the Plugfest in Berlin. Meanwhile, I have extended it to rtf and 
docx and made it standalone, independent of the Officeshots 
infrastructure. Once I have written some documentation, I will post it 
somewhere so that others can use it. I believe it will contribute to better 
interoperability.
>
>
> > I do these tests because I take part (as a volunteer) in
> > interoperability testing initiated by the Slovak government, which
> > came under public pressure to start considering open source
> > software. What we want to do first is to map the current state of
> > interoperability between office applications and formats. At the same
> > time, we want to prepare a list of document elements that can be safely
> > used and those that should be avoided. According to the attached file, the
> > set of safe elements is currently very narrow. I hope that by pointing
> > to this situation we can persuade the developers on all sides to
> > improve it.
> >
>
> OK.  I have heard others discuss in the past whether it would be 
> possible to create a document template that would contain predefined 
> styles that had better interoperability than the default styles 
> defined in an editor.  By using the special template the documents 
> might be more portable.  I don't know if anyone has done this, but it 
> sounds like a possible approach.
That is a good idea.
>
>
> > Technically, the testing strategy is as follows:
> > 1. For each program (AO33, AO34, LO40 and MS13) there is an original
> > file in odt (AO33, AO34, LO40) or docx (MS13).
> > Example file: formatting-AO34.odt
> > 2. The originals are converted to rtf and docx (AO33, AO34, LO40) or
> > to odt and rtf (MS13).
> > Example file: formatting-AO34.rtf
> >
>
>
> Why RTF format?  Is that something the Slovak government uses 
> frequently?  I can understand ODT and DOCX, but I thought that DOC 
> would be the other popular choice, not RTF.
Unfortunately, RTF is the main format in the Slovak public administration. 
We have a law stating that open standards should be used in 
communication with citizens. This law has existed since 2006, and at that 
time somebody decided that rtf, odt, pdf and html should be used for text 
documents. Since the Slovak public administration uses only MSO (it is free 
for them - the taxpayers pay for it), they use rtf and pdf, but not odt.
There is a chance that this rule will soon be modified so that either 
pdf or html is used (alone, if editing is not required) or odt 
and docx (together, if editing is required). Therefore, the question of 
interoperability on the basis of odt and docx is important for us. We 
included rtf mainly to show how bad the situation is in a multiplatform 
environment :D.
>
> > In both 1 and 2, AO34 means that the file was produced by AO34.
> >
> > 3. Each of these files is printed (to pdf) by all tested programs. This
> > produces files such as formatting-AO34.rtf.MS13.pdf, which means:
> > the file was created by AO34 and printed by MS13.
> >
> > 4. From these pdfs, two types of renderings were created:
> > - side by side, with the original (formatting-AO34.odt.AO34.rtf) on
> > the left and the file being compared on the right
> > - overlays (-ov in the file name), with the same combination of files
> > The side-by-side renderings are more useful for seeing major
> > differences; overlays are better for details, which might otherwise
> > go unnoticed.
> > The 'original' file was always created manually, so the originals
> > may differ from one another. Therefore I never compare, say,
> > formatting-AO34.rtf.MS13.pdf
> > to formatting-LO40.rtf.MS13.pdf
> >
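The rule of never comparing renderings of different originals can be made explicit by parsing the file names and pairing each rendering only with a reference from the same creator and format. A minimal sketch, assuming that the reference is the pdf printed by the program that created the file (for example formatting-AO34.rtf.AO34.pdf); the regular expression, `Rendering`, `parse_name` and `comparison_pairs` are all hypothetical.

```python
import re
from typing import NamedTuple


class Rendering(NamedTuple):
    doc: str       # e.g. "formatting"
    creator: str   # program that wrote the file, e.g. "AO34"
    fmt: str       # file format, e.g. "rtf"
    printer: str   # program that printed it to pdf, e.g. "MS13"


# Hypothetical pattern for names such as 'formatting-AO34.rtf.MS13.pdf'.
NAME_RE = re.compile(
    r"^(?P<doc>.+)-(?P<creator>[A-Z]{2}\d{2})\.(?P<fmt>\w+)"
    r"\.(?P<printer>[A-Z]{2}\d{2})\.pdf$"
)


def parse_name(name: str) -> Rendering:
    match = NAME_RE.match(name)
    if match is None:
        raise ValueError(f"unexpected file name: {name}")
    return Rendering(**match.groupdict())


def comparison_pairs(names):
    """Pair each rendering with the reference made from the same original.

    Assumption: the reference is the rendering printed by its own creator.
    Renderings of different originals are never paired.
    """
    parsed = [parse_name(n) for n in names]
    refs = {(r.doc, r.creator, r.fmt): r for r in parsed if r.printer == r.creator}
    pairs = []
    for r in parsed:
        ref = refs.get((r.doc, r.creator, r.fmt))
        if ref is not None and r != ref:
            pairs.append((ref, r))
    return pairs
```

Keying the references on (document, creator, format) is what guarantees that formatting-AO34.rtf.MS13.pdf can never be paired with formatting-LO40.rtf.MS13.pdf.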
> > For example, from the file spaces-ov-odt.pdf we can see:
> > in numbered lists (numbering)
> > - MS13 produces slightly larger interline spacing in files originating
> > from AO33, AO34, LO40
> > - MS13 (probably) saves odf incorrectly, since all programs
> > (including itself) display them equally badly
> > In chapter titles (chapter)
> > - files created by MS13 and AO34 look significantly different when
> > opened in other programs
> > - files created by AO33 and LO40 look the same when opened in other
> > programs
> >
> > In bullet lists (bullets)
> > - MS13 produces slightly larger interline spacing in files originating
> > from AO33, AO34, LO40
> > - MS13 (probably) saves odf incorrectly, since all programs
> > (including itself) display them equally badly
> >
> > So we can see that paragraph spacing is interpreted differently
> > by different programs. For rtf and odf files the situation is similar.
> > Do we know which rendering is correct and which is not? Do any
> > 'absolute' measures exist, say, measuring distances after
> > printing on paper?
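One way to put numbers on the spacing differences, short of measuring printed paper, is to rasterize the pdf pages at a known resolution and locate text lines with a horizontal projection profile (rows that contain ink). A minimal sketch, assuming binarized pages rasterized at the same resolution (rasterization is not shown); the function names are hypothetical, and the result only measures how two renderings differ, not which one is correct.

```python
# Hypothetical sketch: compare interline spacing of two rasterized pages.
# Pages are lists of pixel rows, binarized as 1 = ink, 0 = paper.


def line_tops(page: list[list[int]]) -> list[int]:
    """Row index where each text line starts (horizontal projection profile)."""
    tops = []
    in_line = False
    for y, row in enumerate(page):
        has_ink = any(row)
        if has_ink and not in_line:
            tops.append(y)  # first inked row after a blank gap
        in_line = has_ink
    return tops


def line_pitches(page: list[list[int]]) -> list[int]:
    """Distances in pixels between consecutive line tops."""
    tops = line_tops(page)
    return [b - a for a, b in zip(tops, tops[1:])]


def mean_pitch_difference(page_a, page_b) -> float:
    """Average extra spacing per line of page_b relative to page_a, in pixels."""
    pa, pb = line_pitches(page_a), line_pitches(page_b)
    n = min(len(pa), len(pb))
    if n == 0:
        raise ValueError("each page needs at least two text lines")
    return sum(b - a for a, b in zip(pa[:n], pb[:n])) / n


def px_to_mm(px: float, dpi: float) -> float:
    """Convert a pixel distance to millimetres (1 inch = 25.4 mm)."""
    return px * 25.4 / dpi
```

At 300 dpi, for instance, a difference of one pixel per line is about 0.085 mm, and it accumulates down a page.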
>
> Two different perspectives:
>
> 1) From the user's perspective, whatever tool they used to author the 
> document is "correct".  If it looks different in another editor, it is 
> "wrong" to the user.
Exactly.
>
> 2) From the perspective of the ODF standard, this would depend on the 
> details of the markup generated by the authoring editor.  But 
> generally ODF does not put tight constraints on document layout.  It 
> is not a fixed-page format like PDF.  So there may be differences 
> among applications.  The same is true of RTF and DOCX.
>
> > Take the example of lists (see above):
> > Was the implementation in AO33 correct and a bug introduced in
> > AO34? Or was it incorrect (and as such carried over into LO40), and
> > is AO34 correct now? Or, perhaps, is the MS implementation the right
> > one? Who is the authority to decide?
> >
>
> In this specific case I don't know.  What we typically would do is 
> discuss examples like this among the projects, look carefully at the 
> ODF markup and the standard and come to a conclusion.  If you are able 
> to post the source ODT documents I can investigate this more.
All files are here: http://ubuntuone.com/3uw68meC0diViYuGYYFWKh (the name is 
the same as last time, sorry).

Are there people on this list who can influence the development of the 
various office tools if we find a bug here?

Best
Milos

>
> Regards,
>
> -Rob
>
>
> > With best regards
> > Milos
> >
> > On 18.04.2013 22:03, robert_weir at us.ibm.com wrote:
> > Hello Milos,
> >
> > This is very interesting.  Thanks for doing it.  As you know, AOO
> > and LO have a common ancestry.  So to track down the source of the
> > variation one approach would be to do a comparison of the rendering
> > of the common ancestor, OpenOffice.org 3.3.0, to the current AOO and
> > LO.  That would give a very good clue to what is happening here.
> >
> > Regards,
> >
> > -Rob
> >
> >
> > officeshots-bounces at nlnet.nl wrote on 04/16/2013 02:36:38 AM:
> >
> > > From: Milos Sramek <sramek.milos at gmail.com>
> > > To: Officeshots support and development <officeshots at nlnet.nl>
> > > Date: 04/16/2013 02:38 AM
> > > Subject: [Officeshots] Interoperability testing - first results
> > > Sent by: officeshots-bounces at nlnet.nl
> > >
> > > Hi,
> > >
> > > I have implemented the interoperability testing tools that I
> > > mentioned in my previous mails. They currently run without network
> > > access; that would be nice to have in the future. I convert to other
> > > office formats and to pdf with command line tools:
> > > LibreOffice - calling LO directly from the command line
> > > Apache OO - using its server mode and the DocumentConverter.py script
> > > MSO - using the OfficeConvert tool (MSO 2013; I used the version sent
> > > here by Pasqual - thank you, Pasqual, for that)
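For the LibreOffice step, a headless command-line conversion is one common approach. A minimal sketch, assuming the executable is on the PATH as `soffice` (some installations name it `libreoffice`) and that outputs are renamed to the naming scheme used in the results; the helper names are hypothetical. The Apache OpenOffice (DocumentConverter.py) and OfficeConvert steps are not shown, since their exact arguments are not given here.

```python
# Hypothetical sketch of the LibreOffice step; helper names are illustrative.
import subprocess
from pathlib import Path


def lo_convert_cmd(src: str, target: str, outdir: str, soffice: str = "soffice") -> list[str]:
    """Build a headless LibreOffice conversion command (target e.g. 'pdf', 'rtf')."""
    return [soffice, "--headless", "--convert-to", target, "--outdir", outdir, src]


def lo_output_path(src: str, target: str, outdir: str) -> Path:
    """LibreOffice names its output '<stem of src>.<target>' inside outdir."""
    return Path(outdir) / f"{Path(src).stem}.{target}"


def print_with_lo(src: str, outdir: str, printer_code: str = "LO40") -> Path:
    """Print src to pdf, then rename it to '<src name>.<printer>.pdf'."""
    subprocess.run(lo_convert_cmd(src, "pdf", outdir), check=True)
    produced = lo_output_path(src, "pdf", outdir)
    final = Path(outdir) / f"{Path(src).name}.{printer_code}.pdf"
    produced.rename(final)  # e.g. formatting-AO34.rtf.LO40.pdf
    return final
```

The rename matters because LibreOffice drops the source extension: formatting-AO34.rtf and formatting-AO34.odt would both come out as formatting-AO34.pdf.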
> > >
> > > In the attached files you can see how odf documents are displayed by
> > > LO40, AO34 and MSO13. To highlight the differences, I overlaid the
> > > documents: black means a perfect fit, red/cyan means a poor fit
> > > (odt-ov-srt.pdf).
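An overlay with these colours can be obtained by taking the red channel from one page and the green and blue channels from the other. The sketch below assumes both pages were already rasterized to equally sized grayscale grids (0 = ink, 255 = paper); it is not necessarily how odt-ov-srt.pdf was made, and the `ink_agreement` score (shared ink pixels over all ink pixels) is an added idea for summarizing the fit in one number.

```python
# Hypothetical sketch of a red/cyan overlay of two grayscale pages
# (0 = black ink, 255 = white paper) of identical size.


def overlay(original, compared):
    """Red channel from the original page, green and blue from the compared one.

    ink in both pages        -> black (0, 0, 0)
    ink only in the original -> cyan  (0, 255, 255)
    ink only in the compared -> red   (255, 0, 0)
    """
    if len(original) != len(compared) or any(
        len(ra) != len(rb) for ra, rb in zip(original, compared)
    ):
        raise ValueError("pages must have the same size")
    return [
        [(a, b, b) for a, b in zip(ra, rb)]
        for ra, rb in zip(original, compared)
    ]


def ink_agreement(original, compared, threshold: int = 128) -> float:
    """Shared ink pixels divided by all ink pixels (1.0 = perfect fit)."""
    shared = total = 0
    for ra, rb in zip(original, compared):
        for a, b in zip(ra, rb):
            ink_a, ink_b = a < threshold, b < threshold
            shared += ink_a and ink_b
            total += ink_a or ink_b
    return 1.0 if total == 0 else shared / total
```

Because only pixels inked in exactly one page become coloured, a uniform shift in line spacing shows up as red and cyan fringes that grow towards the bottom of the page.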
> > >
> > > One can see there that the main interoperability problem between LO
> > > and AOO on one side and MSO on the other is line and paragraph
> > > spacing. There are also other differences, but these are mostly
> > > 'shadowed' by the shifted lines. Do you have an idea why this is so?
> > > Is it just a buggy implementation, or is it caused by a possible
> > > ambiguity in the interpretation of the ODF standard?
> > >
> > > Such differences in formatting are clearly visible and, I think, are a
> > > hurdle to the acceptance of ODF as a platform-independent standard. Do
> > > you have an idea what to do about that? Talk to developers at AOO,
> > > LO, Microsoft and elsewhere and persuade them to implement ODF in a
> > > consistent way?
> > >
> > > best regards
> > > Milos
> > >
> > >
> > > On 10.03.2013 16:12, Milos Sramek wrote:
> > > Thank you, Rob, for pointing me to this tool.
> > >
> > > It looks like its purpose is to analyze problems with opening
> > > files in binary formats. I am more concerned with rendering. It is
> > > common for different tools to render documents differently, and I
> > > would like to know how severe this is.
> > >
> > > Milos
> > >
> > > On 08.03.2013 14:41, robert_weir at us.ibm.com wrote:
> > > officeshots-bounces at nlnet.nl wrote on 03/07/2013 04:50:46 PM:
> > >
> > > > From: Michiel Leenaars <michiel.ml at nlnet.nl>
> > > > To: Officeshots support and development <officeshots at nlnet.nl>,
> > > > Date: 03/07/2013 04:51 PM
> > > > Subject: Re: [Officeshots] interoperability testing
> > > > Sent by: officeshots-bounces at nlnet.nl
> > > >
> > > > Hi Milos,
> > > >
> > > > > Perhaps there is a chance in Bratislava to set up a Windows machine
> > > > > running MSO 2007/10/13 for our testing purposes. If you find that
> > > > > interesting, I can ask Microsoft representatives here in Bratislava
> > > > > whether they would be willing to donate the necessary licenses.
> > > >
> > > > I have some licences for testing machines for MSO 2007 for you. John
> > > > Haug or Jim Thatcher might be able to provide copies for 2010 and 2013,
> > > > as he and Doug have done in prior instances. I will send their
> > > > contact details to you off list.
> > > >
> > > > There is no reason other document conversions shouldn't work with minor
> > > > modifications; I think we designed it generically. Of course there are
> > > > no validators for the binary formats, for instance, but it would
> > > > already be useful without them, as Jos remarked.
> > > >
> > >
> > > Actually, have you seen this:
> > >
> > > http://msdn.microsoft.com/en-us/library/office/gg649868%28v=office.14%29.aspx
> > >
> > > This was in beta with Office 2010.  I don't know if Microsoft took this
> > > further, but it could be useful for testing binary format documents.
> > >
> > > -Rob
> > >
> > >
> > > > OfficeConvert can already handle all conversions that MS Office
> > > > performs, and the Officeshots python factory is trivial to change. I
> > > > think the biggest part is for someone to go through the Officeshots
> > > > server code and dig into the definition of file types. And of course
> > > > we need to fit it into the UI.
> > > >
> > > > Anyone on this list fluent enough in PHP to go there?
> > > >
> > > > Best,
> > > > Michiel
> > > >
> > > > > Regarding the officeshots API: would it be possible to use it also to do
> > > > > other types of conversions, not only odf->odf->pdf? Currently, within a
> > > > > larger project, I am working on a user study that would enable a
> > > > > quantitative evaluation of the "level of interoperability" between office
> > > > > applications on the basis of various document standards. The study is
> > > > > based on automated conversion of documents (similar to the officeshots
> > > > > framework, only running locally). Conversion by AOO, LO and Google Docs
> > > > > is already implemented this way (not a big deal; GD via its API). I
> > > > > would, however, like to convert the documents with MSO as well. I hope
> > > > > that using the officeshots API and the above-mentioned machine this
> > > > > would be possible.
> > > >
> > > > _______________________________________________
> > > > Officeshots mailing list
> > > > Officeshots at nlnet.nl
> > > > https://open.nlnet.nl/mailman/listinfo/officeshots
> > > >
> > >
> > > --
> > > email & jabber: sramek.milos at gmail.com
> > > [attachment "odt-srt.pdf" deleted by Robert Weir/Cambridge/IBM]
> > > [attachment "odt-ov-srt.pdf" deleted by Robert Weir/Cambridge/IBM]
> >
>


