[TYPO3-doc] Converting manual.sxw to reST using OpenOffice daemon

Thu Jun 14 11:05:30 CEST 2012

[Fabien Udriot] wrote:

>Hi Martin,
>
>Do you have any snippet that I can use to convert a manual.sxw to reST taking advantage of the
>OpenOffice daemon?

Yes. I'll try to explain below.

>
>I noticed a cron job [1] on the server that is used for the TER manuals. Within the script, I
>spotted the function _convertsxw2html_ involved in the OOo manual conversion... However, the whole
>script goes over my scope for my project [2].
>
>I would like to be able to transform only one manual.sxw by the mean of a command that takes as
>input a manual.sxw and gives as output the reST files? Do we have already something like that?
>
>If not, an idea would be to re-factor the script a little bit to split up the functionalities?

It is already split to some extent. I don't think I can split it any
further.

>
>Regards,
>
>Fabien
>
>[1] /home/mbless/HTDOCS/render-ter-extensions/010_cronjob_get_new_from_ter.py
>[2] http://preview.docs.typo3.org/getthedocs/

Rough Layout for a better README file or chapter in the Wiki:
=============================================================

Everything is already in our RestTools.git repository. You can also
browse it here:
http://srv123.typo3.org/~mbless/git.typo3.org/Documentation/RestTools.git/RenderOfficialDocsFirsttime/

It works like this:

1. Start with 'manual.sxw'

2. Save as 'manual.html'

3. Copy 'manual.html' to 'manual-cleaned.html', removing the
<sdfield>...</sdfield> tags in the process

4. Use 'tidy(.exe)' to generate 'manual-from-tidy.html' from 
'manual-cleaned.html'

5. Parse 'manual-from-tidy.html' to 'manual.rst'

6. You may beautify 'manual.rst' a bit and make sure that there are
never more than two empty lines in a row

7. Generate 'index.html' from 'manual.rst'

8. Split 'index.html' into a temp-structure suitable for Sphinx

9. Copy the temp-Sphinx-structure and thereby rename files and folders
and create or copy the extra files we need (.gitignore, Make, conf.py,
...)

10. Done.

The script '1_do_the_work.py' performs this whole sequence of steps 1
to 10 in a single run.
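
To illustrate step 3, here is a minimal sketch of how the
<sdfield>...</sdfield> tags could be removed. This is NOT the code of
'copyclean.py'. The function name 'strip_sdfields' and the 'keep_text'
switch are made up for this example, and I assume that sdfield elements
are never nested:

```python
import re

# Matches one <sdfield ...>...</sdfield> element. The match is
# case-insensitive because the exported HTML may use upper-case tags.
# Assumption: sdfield elements are not nested.
SDFIELD_RE = re.compile(
    r"<sdfield\b[^>]*>(.*?)</sdfield\s*>",
    re.IGNORECASE | re.DOTALL,
)


def strip_sdfields(html, keep_text=True):
    """Return html without <sdfield> tags (hypothetical helper).

    keep_text=True removes only the tags and keeps the text between
    them; keep_text=False drops the enclosed text as well.
    """
    replacement = r"\1" if keep_text else ""
    return SDFIELD_RE.sub(replacement, html)
```

Whether the enclosed text should stay or go depends on the manual, so
the sketch leaves that choice to the caller.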

Commands on Linux (I think ...):
--------------------------------

1. start with manual.sxw

2. python documentconverter.py <infile>   <outfile>
   For example:
   python documentconverter.py manual.sxw manual.html

3. Do that manually or run:
   python copyclean.py manual.html manual-cleaned.html
   If the module 'argparse' is missing, get it from a Python 2.7.x
   installation and place it in the same directory.

4. tidy -asxhtml -utf8 -f errorfile.txt -o manual-from-tidy.html \
        manual-cleaned.html
   Step 4 creates a valid XHTML file.

5. python ooxhtml2rst.py  manual-from-tidy.html  manual.rst

6. optional:
   python normalize_empty_lines.py  manual.rst  temp.rst
   cp temp.rst manual.rst
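
The idea behind step 6 can be sketched like this. It is not the code of
'normalize_empty_lines.py', only a small illustration of the rule "no
more than two empty lines in a row". The function name and the
'max_empty' parameter are assumptions of mine:

```python
def normalize_empty_lines(text, max_empty=2):
    """Collapse runs of empty lines to at most max_empty lines.

    Lines that contain only whitespace count as empty and are written
    back as truly empty lines. Other lines are left unchanged.
    """
    out = []
    empty_run = 0
    for line in text.splitlines():
        if line.strip() == "":
            empty_run += 1
            if empty_run > max_empty:
                continue  # skip the surplus empty line
            out.append("")
        else:
            empty_run = 0
            out.append(line)
    # Always end the result with exactly one newline character.
    return "\n".join(out) + "\n"
```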

The remaining steps go roughly like this (see '1_do_the_work.py'):

  slice_to_numbered_files.main(srcfile, tempdir)
  write_sphinx_structure.main(tempdir, finalsourcedir, srcdirimages, verbose=0)
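
If you only want to go from a single 'manual.sxw' to 'manual.rst',
steps 2 to 5 from the command list above could be chained with a small
wrapper like the one below. It is only a sketch: the function names
'build_commands' and 'run_pipeline' are mine, and I assume that the
scripts sit in the current directory and that 'python' and 'tidy' are
on the PATH.

```python
import os
import subprocess


def build_commands(sxw_path):
    """Return the command lines for steps 2 to 5 as argument lists."""
    base = os.path.splitext(sxw_path)[0]
    html = base + ".html"
    cleaned = base + "-cleaned.html"
    tidied = base + "-from-tidy.html"
    rst = base + ".rst"
    return [
        ["python", "documentconverter.py", sxw_path, html],
        ["python", "copyclean.py", html, cleaned],
        ["tidy", "-asxhtml", "-utf8", "-f", "errorfile.txt",
         "-o", tidied, cleaned],
        ["python", "ooxhtml2rst.py", tidied, rst],
    ]


def run_pipeline(sxw_path):
    """Run the commands in order and stop at the first failure."""
    for cmd in build_commands(sxw_path):
        status = subprocess.call(cmd)
        # tidy exits with status 1 when it only reported warnings,
        # so that is not treated as a failure here.
        allowed = (0, 1) if cmd[0] == "tidy" else (0,)
        if status not in allowed:
            raise RuntimeError("command failed: %s" % " ".join(cmd))
```

Usage would be 'run_pipeline("manual.sxw")', started from the
directory that contains the scripts.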

Windows
-------

2. To write the manual.html I could use:
   soffice --headless -convert-to html -outdir . manual.sxw

Everything else works like on Linux.

More info about the OpenOffice headless for conversion:
-------------------------------------------------------

See the OpenMeetings wiki on Google Code:
http://code.google.com/p/openmeetings/wiki/OpenOfficeConverter#Install_Open_Office_Service_on_Debian/%28K%29Ubuntu_%28versions_%3E_2:

pyodconverter:
A script that talks to a headless OpenOffice:
https://github.com/mirkonasato/pyodconverter
It expects OpenOffice on port 8100 on localhost. The port may have to
be adjusted to match your setup.
Example:
  python DocumentConverter.py /tmp/doc/manual.sxw /tmp/doc/manual.html
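
Before starting a conversion it can help to check that something is
actually listening on that port. The helper below is just a sketch
using the standard 'socket' module. The name 'is_listening' is made
up, and host and port default to the values mentioned above. It only
tells you that the port accepts connections, not that it is really
OpenOffice answering.

```python
import socket


def is_listening(host="localhost", port=8100, timeout=2.0):
    """Return True if a TCP connection to host:port can be opened."""
    try:
        sock = socket.create_connection((host, port), timeout)
    except socket.error:
        # Covers refused connections and timeouts alike.
        return False
    sock.close()
    return True
```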

HTH, Martin

-- 
Certified TYPO3 Integrator | TYPO3 Documentation Team Member

http://mbless.de