LibreOffice goes BCP 47

This week I reached an important milestone in the major rewrite that I have been working on for about nine months, apart from the daily work such as fixing bugs, coding small enhancements and reviewing patches. In current master, LibreOffice is finally able to transparently handle arbitrary (if valid) BCP 47 language tags and fully supports the fo:script and *:rfc-language-tag attributes defined in ODF 1.2.

So what does this mean? It means that you'll be able to get your language in.

It means that already supported languages or writing scripts, which so far relied on a kludge to squeeze them into ISO 639 language codes and ISO 3166 country codes only, are finally supported using the proper language tags registered with IANA. For example:

ca-ES-valencia Catalan Valencian
The Valencian variant of Catalan previously used the ca-XV kludge, where XV is an ISO 3166 code reserved for private use, which meant it could be used for UI translation purposes but not for document content. This is now stored in ODF as style:rfc-language-tag='ca-ES-valencia' attributes.
sr-Latn Serbian Latin
Previously the deprecated sh kludge was used to differentiate between Serbian Latin and sr Serbian Cyrillic. Serbian Latin in Serbia sr-Latn-RS is now stored in ODF as fo:language='sr' fo:script='Latn' fo:country='RS' attributes.
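
The split between the two storage forms in these examples can be sketched in Python. This is a simplified illustration and not LibreOffice's actual implementation: the helper name odf_attributes is hypothetical, and it only distinguishes tags of the shape language[-Script][-REGION] from everything else.

```python
import re

# Matches tags of the shape language[-Script][-REGION], e.g. "sr-Latn-RS".
# Anything else (variants, extensions, private use) does not match.
_SIMPLE_TAG = re.compile(
    r"^(?P<language>[A-Za-z]{2,3})"
    r"(?:-(?P<script>[A-Za-z]{4}))?"
    r"(?:-(?P<country>[A-Za-z]{2}))?$"
)


def odf_attributes(tag):
    """Return the ODF attributes a tag might be stored as (hypothetical helper)."""
    match = _SIMPLE_TAG.match(tag)
    if match is None:
        # Not expressible as language/script/country: keep the whole tag.
        return {"style:rfc-language-tag": tag}
    parts = match.groupdict()
    attrs = {"fo:language": parts["language"].lower()}
    if parts["script"]:
        attrs["fo:script"] = parts["script"].title()
    if parts["country"]:
        attrs["fo:country"] = parts["country"].upper()
    return attrs


print(odf_attributes("sr-Latn-RS"))
print(odf_attributes("ca-ES-valencia"))
```

Under these assumptions sr-Latn-RS maps to the three fo:* attributes, while ca-ES-valencia falls through to the single style:rfc-language-tag attribute because of its variant subtag.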

It also means that a tag such as en-GB-oed can be supported and in fact already is, with the corresponding entry already added to the language list. This is English, Oxford English Dictionary spelling, which is mandatory for UN documents and, as it seems, also used for EU documents. LibreOffice will be the first free office suite to support spell-checkers with Oxford English Dictionary spelling along with en-GB and en-US spelling at the same time.

Transparently handling arbitrary tags means that when a document is read that contains a language attribution not specifically known to LibreOffice (i.e. one that does not have an entry in the language list), positioning the cursor on such text or selecting it shows the language tag in the status bar and in the language list of the character attribution, so you will not see Unknown or, even worse, nothing or the system locale's language. If a dictionary that handles such a tag is installed, it can be used for spell-checking. Transparently of course also means that the tag is written back to ODF when the document is saved, so the attribution is not lost.

The following screenshot shows an example of a document that uses the tag de-DE-1901 to designate German, German variant, traditional orthography:

Screenshot of LibreOffice displaying a BCP 47 language tag.

I'm extremely glad to have this step ready just in time, and of course I'll talk about it at the LibreOffice Conference 2013 at Milano. To get all the details, please join me and attend Getting you language in on Thursday, 26 September at 15:30 in Sala Alfa.

LibreOffice Milano Conference 2013 logo

If you are interested in the technical details of BCP 47 language tags I recommend my bookmarks as a starting point.

characterize.vim - Unicode character metadata

Just came across and installed the characterize.vim plugin, which modernizes Vim's ga command for revealing a character's representation. It now displays

  • Unicode character names: U+00A9 COPYRIGHT SIGN
  • Vim digraphs (type after <C-K> to insert the character): Co, cO
  • Emoji codes: :copyright:
  • HTML entities: &copy;

So for example, with the cursor on a letter ö, hitting ga in normal mode displays in the status line

<ö> 246, \366, U+00F6 LATIN SMALL LETTER O WITH DIAERESIS, ^Ko:, &ouml;

Neat.
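
Most of that status line can be reproduced outside Vim with Python's standard library. The following is only a sketch to show where the individual fields come from, not how the plugin works internally: the function name describe is made up, and the Vim digraph field is left out because digraphs are Vim-specific.

```python
import unicodedata
from html.entities import codepoint2name


def describe(char):
    """Build a ga-like description of a single character (hypothetical helper)."""
    code = ord(char)
    fields = [
        f"<{char}> {code}",                       # the character and its decimal value
        f"\\{code:o}",                            # octal value
        f"U+{code:04X} {unicodedata.name(char, '<unnamed>')}",
    ]
    entity = codepoint2name.get(code)             # HTML entity name, if one exists
    if entity:
        fields.append(f"&{entity};")
    return ", ".join(fields)


print(describe("ö"))
```

The decimal and octal values are just two renderings of the code point; the name comes from the Unicode character database and the entity from the HTML entity table.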

Upgrading Fedora from F16 to F17 with /usr being a separate Logical Volume

Due to legacy reasons I have a separate volume mounted under /usr. Neither preupgrade nor anaconda complained about it, and all necessary upgrades were installed seamlessly. However, after booting the F17 kernel I was unpleasantly surprised.

After "Loading initial ramdisk ..." I got "Dropping to debug shell." and was dumped into a dracut prompt. The last dmesg line read "dracut Warning:" and nothing more. The previous lines indicated that the root volume was mounted successfully. Digging around revealed that one of the next steps should have been the "/usr merge", but not without the /usr volume ...

cat /etc/fstab showed

/dev/mapper/VG-usr /sysroot/usr ext4 defaults 1 2
so it seems the volume had been detected, but there was no /dev/mapper/VG-usr, only /dev/mapper/VG-root (plus control, VG-swap and luks-..., because the partition where that volume group resides is encrypted).

At least the page Upgrading Fedora using yum hinted at what needed to be changed:

If you have /usr on LVM, MD raid or DM raid, make sure the kernel command line has either all settings like "rd.lvm.lv=..." to ensure the /usr device is accessible in dracut or just remove all restrictions like "rd.lvm...", "rd.md...", "rd.dm...".
For further dracut kernel command line options see the man page.
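
The advice quoted above can be turned into a quick check of a kernel command line. This is a minimal sketch under stated assumptions: the function name missing_lvm_volumes is made up, only rd.lvm.lv= parameters are looked at (other restrictions such as rd.md... or rd.dm... are ignored), and a command line without any rd.lvm.lv= parameter is treated as unrestricted.

```python
def missing_lvm_volumes(cmdline, required):
    """Return the required VG/LV names not covered by rd.lvm.lv= parameters."""
    listed = [
        param.split("=", 1)[1]
        for param in cmdline.split()
        if param.startswith("rd.lvm.lv=")
    ]
    if not listed:
        # No rd.lvm.lv= restriction at all, so nothing is excluded by it.
        return []
    return [volume for volume in required if volume not in listed]


# Example command line; VG/root and VG/usr are placeholder names.
cmdline = "ro root=/dev/mapper/VG-root rd.lvm.lv=VG/root rd.lvm.lv=VG/swap"
print(missing_lvm_volumes(cmdline, ["VG/root", "VG/usr"]))
```

On a running system the string could be read from /proc/cmdline instead of being typed in by hand.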

So, reboot and edit the grub entry for the F17 kernel (press e in grub; you may need to enter the superuser username and password if you specified one for grub). In the kernel parameters (the linux /vmlinuz-... line) there is something like

rd.lvm.lv=VG/root
or whatever you named the volume group and logical volume for your root filesystem. There, add another parameter
rd.lvm.lv=VG/usr
with usr being the name of the logical volume that is mounted under /usr, and press Ctrl+X to boot. At least for me booting was successful then: all /bin, /sbin, /lib and /lib64 content was moved into /usr/bin/, /usr/sbin/, /usr/lib/ and /usr/lib64/, and the symbolic links were set up in the / directory.

Now, to make that change permanent it has to go into the grub configuration. So, as root, edit /etc/default/grub, add rd.lvm.lv=VG/usr to the GRUB_CMDLINE_LINUX line and execute the command grub2-mkconfig -o /boot/grub2/grub.cfg
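
For those who prefer to script the edit, here is a minimal sketch that works on the text of /etc/default/grub rather than on the file itself. The function name add_kernel_parameter is made up, and the sketch assumes the value is written as a single double-quoted string on one line; running grub2-mkconfig afterwards is still needed.

```python
import re


def add_kernel_parameter(config_text, parameter):
    """Append parameter to GRUB_CMDLINE_LINUX="..." unless it is already there."""

    def _append(match):
        values = match.group(2).split()
        if parameter not in values:
            values.append(parameter)
        return match.group(1) + '"' + " ".join(values) + '"'

    # Only the GRUB_CMDLINE_LINUX line is touched, not GRUB_CMDLINE_LINUX_DEFAULT.
    return re.sub(
        r'^(GRUB_CMDLINE_LINUX=)"([^"]*)"',
        _append,
        config_text,
        flags=re.MULTILINE,
    )


# Example input; the existing parameters are placeholders.
config = 'GRUB_TIMEOUT=5\nGRUB_CMDLINE_LINUX="rd.lvm.lv=VG/root rhgb quiet"\n'
print(add_kernel_parameter(config, "rd.lvm.lv=VG/usr"))
```

Applying it twice leaves the text unchanged, so the parameter is not duplicated.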

Additionally, when executing grub2-mkconfig I got lots of

/usr/sbin/grub2-probe: warning: the device.map entry `hd0,1' is invalid. Ignoring it. Please correct or delete your device.map.
In /boot/grub2/device.map there was (hd0,1) /dev/sda1, apparently a leftover from older times. Removing that line and keeping only (hd0) /dev/sda solved it.
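
The cleanup amounts to dropping every entry whose key names a partition rather than a whole disk. A minimal sketch, again working on the file's text only; the function name drop_partition_entries is made up, and the pattern merely assumes keys of the form (name,number).

```python
import re


def drop_partition_entries(device_map_text):
    """Remove lines such as "(hd0,1) /dev/sda1" and keep whole-disk entries."""
    kept = [
        line
        for line in device_map_text.splitlines()
        if not re.match(r"\s*\(\w+,\d+\)", line)
    ]
    return "\n".join(kept) + "\n"


print(drop_partition_entries("(hd0) /dev/sda\n(hd0,1) /dev/sda1\n"), end="")
```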

Somehow the updated configuration gave me a different menu than the initial one, and I also got three "error: file not found" messages, apparently one for each insmod line of the chosen menu entry, followed by "press any key to continue", but it booted. After a grub2-install /dev/sda the grub menu came up in a different font and I had to enter the boot superuser username and password, but the errors were gone. WTF? It seems that during the upgrade the /boot/grub2/grub.cfg along with the new grub2-2.0-0.39 was not properly installed. So for the default Fedora entry I added an --unrestricted option to /etc/grub.d/10_linux, in the line that contains menuentry ... gnulinux-simple-$boot_device_id, before the \$menuentry_id_option, and again ran grub2-mkconfig -o /boot/grub2/grub.cfg ... uffz.. finally.

There are known issues with grub2 passwords, see Setting a password for interactive edit mode, and it appears to me that I was lucky to have set an edit password before the upgrade and before installing the new grub2 ...

Still there is some "error: file ..." displayed before the boot menu pops up, but it is displayed for such a very short time that I couldn't read it and have no idea so far what it is about. But, things work..

What's next?

Going to FOSDEM

As usual I'll attend this year's FOSDEM to spread and gather knowledge, meet people and have fun.

I'll definitely stick around the LibreOffice Devroom on Sunday if you want to meet me, and I would be pleased if you came to my talk Language Tags - or, what is BCP 47 and why would we want it, which is scheduled to start at 14:20.

See you at Bruxelles!

Filho's Infographic of Debian with license exception does not allow use of TDF or LibreOffice

When I first saw Claudio Filho's new Infographic of Debian, Understanding Debian, I thought "great work", but then I spotted the license section on the bottom left of the graphic that puts the work under CC BY-NC-SA, but with a restriction

with the exception clause (*): Is forbidden to use, to reference or to use of any material of TDF or LibreOffice in this material or derivatives.
Trying to parse that it seems he wants to say that TDF or LibreOffice or any of its materials may not be used or mentioned in derivatives of his work, i.e. the LibreOffice logo in the Sources section. I'd call that an unfree license. (Yes, the NC in CC is that anyway, but IMHO acceptable in art works).

However, knowing that Claudio Filho is an active supporter of Apache OpenOffice, I wonder what good that restriction is actually supposed to do. This is not lowering barriers between both projects. And it certainly is not the right way to promote the spirit of Free Software.

I'm embedding the graphic here, convinced that this article as a whole does not form a derivative work referencing LibreOffice for which embedding would be forbidden.

Infographic of Debian