Auto detection of UTF-8

When a user saves a CSV or tab-separated file from Microsoft Excel or Apple Numbers, the encoding identifier is not always included in the file. UTF-8 has the benefit of covering not just the Latin alphabet but also Eastern European, Arabic, Chinese, Japanese and many other scripts, all in one encoding!

A common convention is to place three bytes at the start of the file, so that readers like Cacidi Extreme and Cacidi LiveMerge can recognise that it is a UTF-8 file. These three bytes (EF BB BF in hexadecimal) are known as the byte order mark or, for short, the BOM.
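As a rough illustration of what checking for the BOM looks like, here is a minimal Python sketch. The function names `has_utf8_bom` and `strip_utf8_bom` are made up for this example; this is not how Extreme or LiveMerge is implemented, only a way to show the idea.

```python
import codecs


def has_utf8_bom(data: bytes) -> bool:
    """Return True if the data starts with the UTF-8 BOM (bytes EF BB BF)."""
    # codecs.BOM_UTF8 is the standard-library constant b"\xef\xbb\xbf".
    return data.startswith(codecs.BOM_UTF8)


def strip_utf8_bom(data: bytes) -> bytes:
    """Return the data without a leading UTF-8 BOM, if one is present."""
    if has_utf8_bom(data):
        return data[len(codecs.BOM_UTF8):]
    return data
```

A reader would typically run a check like this on the first bytes of the file before decoding the rest.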

A rising number of programs skip the BOM when saving the file, for unknown reasons. Extreme and LiveMerge can now auto-detect UTF-8 even when the BOM is missing from the file.
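One common way to detect UTF-8 without a BOM is to check whether the bytes form a valid UTF-8 sequence. The Python sketch below shows that approach; it is an assumption about how such detection can work, not a description of the actual method used in Extreme or LiveMerge. The names `looks_like_utf8` and `decode_text`, and the Windows-1252 fallback, are hypothetical choices for the example.

```python
import codecs


def looks_like_utf8(data: bytes) -> bool:
    """Guess whether the data is UTF-8, with or without a BOM."""
    if data.startswith(codecs.BOM_UTF8):
        return True
    try:
        # Strict decoding fails on any byte sequence that is not valid UTF-8.
        data.decode("utf-8")
    except UnicodeDecodeError:
        return False
    return True


def decode_text(data: bytes, fallback: str = "cp1252") -> str:
    """Decode as UTF-8 when it looks like UTF-8, else use a fallback encoding.

    The fallback (Windows-1252) is an assumption made for this sketch.
    """
    if looks_like_utf8(data):
        # "utf-8-sig" drops a leading BOM if present and is harmless otherwise.
        return data.decode("utf-8-sig")
    return data.decode(fallback, errors="replace")
```

Note that a file containing only plain ASCII is also valid UTF-8, so this check reports it as UTF-8; that is harmless, because the decoded text is the same either way.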
