Unicode and Collation

When Unicode support was added in DataFlex 20.0, the way DataFlex handles collation changed. Instead of using the familiar df_collate.cfg file installed with each language and the character codes of the equally familiar ASCII table, the DataFlex runtime now uses the ICU library for string comparisons and sorting.

ICU (International Components for Unicode) is a library that compares strings according to the conventions and standards of a particular language, region, or country. ICU's collation is based on the Unicode Collation Algorithm, plus locale-specific comparison rules from the Common Locale Data Repository (CLDR), a comprehensive source for this type of data.

DataFlex uses the ICU library together with the locale set in the DF_LOCALE_CODE attribute, which defaults to the language of the operating system, to determine the collation applied in sorting and string operations.

Collations may vary even within the same language, including English. A collation is not determined by the code that identifies each character or symbol in the Unicode table, but by an algorithm that takes into account all of the components that make up a collation, as described in the ICU paragraph above.
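To illustrate the difference between ordering by character code and ordering by a collation, here is a minimal Python sketch. It does not use ICU or DataFlex. The function sketch_collation_key is a hypothetical helper that only imitates the idea of comparing base letters first, then accents, then case. Real ICU collation applies the Unicode Collation Algorithm and locale rules, and its results can differ from this simplification.

```python
import unicodedata

def sketch_collation_key(text):
    """Hypothetical, simplified multi-level sort key (not ICU)."""
    # Decompose accented letters into base letter + combining mark.
    decomposed = unicodedata.normalize("NFD", text)
    # Level 1: base letters only, ignoring accents and case.
    base = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    primary = base.casefold()
    # Level 2: accents break ties between otherwise equal base letters.
    secondary = decomposed.casefold()
    # Level 3: case breaks remaining ties; swapcase puts lowercase first.
    tertiary = text.swapcase()
    return (primary, secondary, tertiary)

words = ["zebra", "Zebra", "\u00e9clair", "apple", "Eclair"]

# Plain sorting compares character codes: all uppercase letters come
# before lowercase ones, and the accented word lands at the very end.
by_code_point = sorted(words)

# Sorting with the sketch key keeps words with the same base letters together.
by_sketch_key = sorted(words, key=sketch_collation_key)

print(by_code_point)
print(by_sketch_key)
```

In the first list the accented word is pushed past "zebra" because its first character has a higher code. In the second list it sits next to "Eclair", which is closer to what a reader of a dictionary would expect.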

Remember: All languages use a single Unicode table. The range of characters/symbols used in a language varies. The Basic Latin Unicode Block includes all characters used in the English language, but other languages may use characters from various Unicode Blocks. For more information about Unicode, visit Unicode’s FAQ page.
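As a small illustration of the point about Unicode Blocks, the following Python sketch checks whether every character of a word falls inside the Basic Latin block (U+0000 through U+007F). The helper name in_basic_latin and the sample words are chosen for illustration only.

```python
import unicodedata

BASIC_LATIN_LAST = 0x7F  # Basic Latin covers U+0000 through U+007F

def in_basic_latin(text):
    """Return True if every character of text lies in the Basic Latin block."""
    return all(ord(ch) <= BASIC_LATIN_LAST for ch in text)

# One English word, one German word, one Greek word.
samples = ["collation", "Stra\u00dfe", "\u03bb\u03cc\u03b3\u03bf\u03c2"]

for word in samples:
    # Collect the characters that fall outside Basic Latin, with their codes.
    outside = [
        f"U+{ord(ch):04X} {unicodedata.name(ch)}"
        for ch in word
        if ord(ch) > BASIC_LATIN_LAST
    ]
    print(word, in_basic_latin(word), outside)
```

The English word stays entirely within Basic Latin, while the other two words use characters from other blocks of the same Unicode table.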

To better understand Unicode and collating in DataFlex, visit Unicode 101 and access the Migrating to DataFlex 2021 course on the DataFlex Learning Center. Part 2 of that course is dedicated to the subject.