In this article I'll show you how to convert easily between many different document formats, including:
- in: html, odt, doc, docx, rtf, sdw, txt...
- out: html, odt, docx, rtf, pdf, txt, tex, xhtml...
There are many more; check out this table. We'll use LibreOffice as the conversion engine.
LibreOffice Installation
On Windows, go to the download page. LibreOffice installs like pretty much any other Windows application and takes about 200 MB.
On Linux, either use the package manager (recommended) or go to the download page:
- Ubuntu: apt-get install libreoffice-core libreoffice-writer
- RHEL: yum install libreoffice-core libreoffice-writer
Note that on Linux you'll need at least version 4 (the current version is 5.2; if you're on an old distro, refer to this guide).
Post installation
After installation is done, make sure that Caché can access the "soffice" application:
- On Windows, add "C:\Program Files (x86)\LibreOffice 5\program" (the path may differ depending on the installed version) to the system PATH if you're running Caché with default settings, or to the PATH of the user under which Caché runs if you changed that setting. Guide.
- On Linux, if you installed via the package manager, "soffice" should already be in PATH; otherwise add it to PATH manually.
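On Linux, a quick way to confirm the PATH setup is to check from a shell running as the same user as Caché. This is only a minimal sketch; the helper name `have_cmd` is made up for illustration and is not part of the project.

```shell
#!/bin/sh
# Hypothetical helper: succeeds if the named command is found on PATH.
have_cmd() {
    command -v "$1" >/dev/null 2>&1
}

# Report whether soffice is visible to the current user.
if have_cmd soffice; then
    echo "soffice found at: $(command -v soffice)"
else
    echo "soffice not found on PATH"
fi
```

If the second message appears, add the directory that contains soffice to PATH for that user and run the check again.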
Use
Import the code into Caché and you're ready to convert.
Call from the terminal:
set sc = ##class(Converter.LibreOffice).convert(source, target, format)
write $System.Status.GetErrorText(sc)
Where:
- source - file to convert
- target - result file
- format - specification for target file. Possible values: docx,html,mediawiki,csv,pptx,ppt,wmf,emf,svg,xlsx,xls. More possible values here.
Example
To convert doc into docx call:
set sc = ##class(Converter.LibreOffice).convert("C:\temp\1.doc", "C:\temp\1.docx", "docx")
write sc
>1
Code
Here are some interesting snippets:
/// Get path to libreoffice/soffice
ClassMethod getSO()
{
    if $$$isWINDOWS {
        set path = "soffice"
    } else {
        set path = "export HOME=/tmp && unset LD_LIBRARY_PATH && soffice"
    }
    return path
}
Note that on Linux we additionally set the HOME variable (it should point to a directory writable by the Caché user) and unset the LD_LIBRARY_PATH variable. This is important for LibreOffice, particularly in a web context. If you need the LD_LIBRARY_PATH variable, add an additional call to restore it later.
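The same environment changes can be tried out directly in a shell. The sketch below wraps them in a subshell, so HOME and LD_LIBRARY_PATH are changed only for the wrapped command and the caller's values stay as they were. The function name `run_clean` is hypothetical, and /tmp is used as HOME only because the snippet above does so.

```shell
#!/bin/sh
# Hypothetical wrapper: run a command with HOME=/tmp and without
# LD_LIBRARY_PATH. The parentheses start a subshell, so the changes
# do not leak into the calling shell.
run_clean() {
    (
        export HOME=/tmp
        unset LD_LIBRARY_PATH
        "$@"
    )
}

# Example (assumes soffice is on PATH):
#   run_clean soffice --headless --convert-to pdf --outdir /tmp/out /tmp/in.rtf
```

Scoping the change this way is one option when the calling process still needs its own LD_LIBRARY_PATH after the conversion.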
/// Convert a file %1 into %2 format and place it into %3 directory (name equal to %1 name, extension = %2) using %4 - LibreOffice
Parameter COMMAND = "%4 --headless --writer --convert-to %2 --outdir %3 %1";

/// Convert source into format and place it into targetDir
ClassMethod executeConvert(source, targetDir, format) As %Status
{
    // Libreoffice needs targetDir without last slash
    set:$e(targetDir,*)=..#SLASH targetDir = $e(targetDir, 1, *-1)
    set timeout = 100
    set cmd = $$$FormatText(..#COMMAND, source, format, targetDir, ..getSO())
    return ..execute(cmd, timeout)
}
$$$FormatText macro can be used to make OS commands more readable (use it instead of concatenation!).
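For reference, the command line that the COMMAND template expands to can be sketched in shell. The function below only builds and prints the command; it does not run LibreOffice. `build_convert_cmd` is a hypothetical name, and its argument order (source, target directory, format) follows executeConvert above.

```shell
#!/bin/sh
# Hypothetical helper: print the soffice command line for a conversion.
# Arguments: source file, target directory, target format.
build_convert_cmd() {
    src=$1
    outdir=$2
    fmt=$3
    # LibreOffice needs the target directory without a trailing slash.
    case $outdir in
        ?*/) outdir=${outdir%/} ;;
    esac
    printf 'soffice --headless --writer --convert-to %s --outdir %s %s\n' \
        "$fmt" "$outdir" "$src"
}

build_convert_cmd /tmp/1.doc /tmp/out/ docx
# prints: soffice --headless --writer --convert-to docx --outdir /tmp/out /tmp/1.doc
```

The sketch does not quote its arguments, so paths containing spaces would need extra handling before the printed command is passed to a shell.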
Conclusions
If you need to convert between a large number of document formats, this project can be useful.
Great post! Very useful, thanks
This is a nice tool. The link under Use chapter is not working. Please correct that.
Thank you! Fixed the link.
Hi,
instead of: path = "export HOME=/tmp && unset LD_LIBRARY_PATH && soffice"
you can use: path = "export HOME=$(eval echo ~$(id -u -n)) && unset LD_LIBRARY_PATH && soffice"
That way soffice will use the home directory of the Caché user running the job. This can be useful, especially if you need to add or modify parameters in the registrymodifications.xcu file.
Hi. I have tried this but am getting this error, even after granting all rights to the directory:
set sc = ##class(Converter.LibreOffice).convert("C:\Temp\a.csv", "C:\TempC:\InterSystems\Ensemble2018\mgr\Temp\b.xlsx", ".xlxs")
ERROR #5001: Error moving 'C:\InterSystems\Ensemble2018\mgr\Temp\145158\a..xlxs1C:\TempC:\InterSystems\Ensemble2018\mgr\Temp\b.xlsx' with code: -2
Replace with: C:\InterSystems\Ensemble2018\mgr\Temp\b.xlsx
The article is considered as InterSystems Data Platform Best Practice.
Has anyone had success using this or a similar solution running Health Connect/HealthShare on an AIX 7.2 platform? LibreOffice does not provide AIX-specific packages.
No. But what I did was make an HTML table and save it in an xls file.
That works. It's not the most elegant solution, but it does work.
Appreciate the response but what I'm looking to do is convert RTF to PDF and then base64 encode it.
I think virtual PDF printer can help in this situation.
From a cursory search LibreOffice does not seem to be ported to AIX. You can try to compile it from source with AIX Toolbox for Linux Applications.
Hi, does anyone have an example to convert a file using LibreOffice, html file upload and JavaScript? Thanks.
Convert a file using LibreOffice code is available in the article above, file upload example is available here.
Thanks, that works great.
Has anyone ever converted an HL7-embedded Base64-encoded PDF TO (HL7-embedded) RTF?
This looks great, but I noticed the linked LibreOffice conversion table seems to indicate PDF can only be exported (conversion target), not imported (conversion source).
Honestly interested in PDF of any kind; "converting" B64 PDF to a flatfile *.pdf is trivial, but it looks like using *.pdf as a source is not supported for LibreOffice method?
FWIW: Ensemble v2016.2 on Windows
I'm not really sure about the use case.
PDF is a publish format (used to present documents and make sure that they look the same everywhere).
RTF is a simple editing format.
Generally, you can easily convert edit formats into publish formats, but reverse action is impossible.
Furthermore, PDF is far more feature rich than RTF so not all PDF features could be converted into corresponding RTF features.
What are you trying to do?
Hi Edward,
Thank you for posting this article. We run Ensemble on AIX. It appears this library runs on Windows and Linux, but it doesn't look like it will run on AIX. Do you have any suggestions for us?
Thanks,
Richard Gibbs 847-208-2099
You can try to compile from source.
If you get an access error on Linux:
javaldx failed! Warning: failed to read path from javaldx LibreOffice 7.3 - Fatal Error: The application cannot be started. User installation could not be completed. LibreOffice user installation could not be processed due to missing access rights. Please make sure that you have sufficient access rights for the following location and restart LibreOffice.
Add this to LibreOffice parameters:
set args($i(args)) = "-env:UserInstallation=file:///tmp/libreofficehome/"
where /tmp/libreofficehome is any empty folder InterSystems IRIS has write access to.
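On the command line, the same switch would be passed to soffice directly. A minimal sketch, assuming the profile directory mentioned above; `user_installation_arg` is a hypothetical helper that only builds the argument string.

```shell
#!/bin/sh
# Hypothetical helper: build the -env:UserInstallation argument for an
# absolute directory path (the result is a file:// URL ending with a slash).
user_installation_arg() {
    dir=${1%/}
    printf '%s\n' "-env:UserInstallation=file://$dir/"
}

user_installation_arg /tmp/libreofficehome
# prints: -env:UserInstallation=file:///tmp/libreofficehome/

# Example (assumes soffice is on PATH and the directory is writable):
#   soffice "$(user_installation_arg /tmp/libreofficehome)" --headless \
#       --convert-to pdf --outdir /tmp/out /tmp/in.rtf
```

The helper does not URL-encode the path, so a directory name with spaces or special characters would need additional handling.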