Skip to content

Making Existing SQL Databases Unicode-capable

DataFlex supports using Unicode data in your applications. For developers running prior revisions of DataFlex (19.1 and earlier) on SQL databases who want to make their application Unicode-ready, some steps may need to be taken to make those databases able to appropriately handle Unicode data.

Below are the basic actions to prepare a DataFlex database environment to work with Unicode. Note that once databases are set to be Unicode-capable and Unicode data is stored in them, these changes cannot be reversed without the risk of losing information.

Preparing for a Unicode-capable Environment

To migrate an existing SQL database to become Unicode-capable, take the following high-level steps:

  • Analyze and migrate OEM tables to ANSI.
  • Convert databases to be Unicode-capable:
  • For SQL Server databases, convert all char/varchar columns to nchar/nvarchar.
  • For other databases (MySQL/PostgreSQL/DB2), migrate the database to a Unicode-capable character set/collation (for example, utf8mb4 or utf8).

Note: The DataFlex SQL drivers fully support both non-Unicode and Unicode databases. Migration is not required. Leaving databases as non-Unicode-capable simply means they cannot store Unicode data.

Analyze and Migrate OEM Tables

Existing SQL databases created by DataFlex versions prior to DataFlex 2021 may be in either ANSI or OEM format. ANSI tables support Unicode data; OEM tables do not. OEM tables are only supported in DataFlex 2021 and higher for backward compatibility.

To make your database Unicode-capable, migrate any OEM tables to ANSI. See the documentation on handling OEM data so you can adjust your applications accordingly:

  • OEM data in DataFlex (DevelopmentGuide/OEM_Data_in_DataFlex.md)

See also:

  • Table Character Format in DataFlex (DevelopmentGuide/Table_Character_Format_in_DataFlex_20.md)

Convert Databases to Unicode-capable

The exact process of converting a database to Unicode-capable depends on the SQL backend. Review what different databases require to properly handle Unicode data before conversion.

SQL Databases and Unicode

SQL database systems handle Unicode in two general ways:

  1. Special Unicode data types — e.g., SQL Server: non-Unicode types like char and varchar cannot store Unicode; the n types (nchar, nvarchar) store Unicode.
  2. Database or column character set/collation settings — e.g., MySQL: a database created with character set latin1 stores char/varchar columns as latin1 (ANSI) and cannot store Unicode characters; a database created with utf8mb4 stores char/varchar columns as UTF-8 and can store Unicode data.

Unicode Support in SQL Database Systems

How popular database systems enable Unicode:

SQL Server

  • Uses special Unicode data types (nchar, nvarchar, etc.).
  • Since SQL Server 2019, it is also possible to store UTF-8 data in char/varchar columns by using a UTF-8 collation for the column.

MySQL

  • Uses character set and collation settings (e.g., latin1, utf8, utf8mb4).

PostgreSQL

  • Uses database encoding settings (e.g., UTF8).

DB2

  • Supports both special Unicode types (Graphic types) and codeset settings.

See setup and testing guides for specific database systems:

  • How to set up and test using Aurora (DevelopmentGuide/How_to_set_up_and_test_using_Aurora.md)
  • How to set up and test using IBM DB2 (DevelopmentGuide/How_to_set_up_and_test_using_IBM_DB2.md)
  • How to set up and test using MariaDB (DevelopmentGuide/How_to_set_up_and_test_using_MariaDB.md)
  • How to set up and test using MySQL (DevelopmentGuide/How_to_set_up_and_test_using_MySQL.md)
  • How to set up and test using Oracle (DevelopmentGuide/How_to_set_up_and_test_using_Oracle.md)
  • How to set up and test using PostgreSQL (DevelopmentGuide/How_to_set_up_and_test_using_PostgreSQL.md)

Making SQL Server Databases Unicode-capable

To make SQL Server databases Unicode-capable, convert char/varchar columns to nchar/nvarchar.

Columns of the text data type should be changed to nvarchar(max) since Microsoft has indicated that text/ntext data types will be removed in a future version of SQL Server: - https://docs.microsoft.com/en-us/sql/t-sql/data-types/ntext-text-and-image-transact-sql?view=sql-server-ver15

Before changing data types, ensure all OEM tables have been migrated to ANSI: - OEM to ANSI Conversion Wizard (Tools/OEM_to_ANSI_Conversion_Wizard.md)

Options to change char/varchar columns to nchar/nvarchar:

  • Use the Table Editor in DataFlex Studio to manually change types per column.
  • Run SQL statements such as ALTER TABLE ... ALTER COLUMN ....
  • Use DataFlex SQL driver restructure commands to programmatically change column types.

The following program illustrates how to convert all char/varchar columns to nchar/nvarchar for all tables in a workspace (DataFlex example):

Procedure ConvertAllTablesToUnicodeNTypes
    Handle hTable
    Repeat
        Get_Attribute DF_FILE_NEXT_USED of hTable to hTable
        If (hTable > 0 and (hTable <> 50)) Begin
            Send ConvertTableToUnicodeNTypes hTable
        End
    Until (hTable = 0)
End_Procedure

Procedure ConvertTableToUnicodeNTypes Handle hTable
    Integer iNumColumns iColumn iDFType iNativeType
    String sTableName sLogicalName sDriverName sColumnName sNativeTypeName
    Boolean bTableOpenError

    // trap open table error so it's not fatal
    Move False to Err
    Send Ignore_All to Error_Object_Id
    Open hTable
    Move Err to bTableOpenError
    Send Trap_All of Error_Object_Id

    If (bTableOpenError = False) Begin
        Get_Attribute DF_FILE_ROOT_NAME of hTable to sTableName
        Get_Attribute DF_FILE_LOGICAL_NAME of hTable to sLogicalName
        Get_Attribute DF_FILE_DRIVER of hTable to sDriverName

        If (sDriverName = "MSSQLDRV") Begin
            Structure_Start hTable
            Get_Attribute DF_FILE_NUMBER_FIELDS of hTable to iNumColumns
            For iColumn from 1 to iNumColumns
                Get_Attribute DF_FIELD_NAME of hTable iColumn to sColumnName
                Get_Attribute DF_FIELD_TYPE of hTable iColumn to iDFType
                Get_Attribute DF_FIELD_NATIVE_TYPE of hTable iColumn to iNativeType
                Get_Attribute DF_FIELD_NATIVE_TYPE_Name of hTable iColumn to sNativeTypeName

                If (iDFType = DF_ASCII) Begin
                    If (iNativeType = SQL_CHAR) Begin
                        // Convert char to nchar
                        Set_Attribute DF_FIELD_NATIVE_TYPE of hTable iColumn to SQL_WCHAR
                    End
                    If (iNativeType = SQL_VARCHAR) Begin
                        // Convert varchar to nvarchar
                        Set_Attribute DF_FIELD_NATIVE_TYPE of hTable iColumn to SQL_WVARCHAR
                    End
                End

                If (iDFType = DF_TEXT) Begin
                    If (iNativeType = SQL_LONGVARCHAR) Begin
                        // Convert text to ntext
                        Set_Attribute DF_FIELD_NATIVE_TYPE of hTable iColumn to SQL_WLONGVARCHAR
                    End
                    If (iNativeType = SQL_VARCHARMAX) Begin
                        // Convert varcharmax to nvarcharmax
                        Set_Attribute DF_FIELD_NATIVE_TYPE of hTable iColumn to SQL_WVARCHARMAX
                    End
                End
            Loop
            Structure_End hTable
        End
    End
End_Procedure

Making Other Databases Unicode-capable

Database systems without special Unicode data types use character set/collation settings to define a database as Unicode or non-Unicode. For example, in MySQL:

  • In a latin1 database, character data will be stored as latin1 (an ANSI code page) and the database will be non-Unicode.
  • In a utf8mb4 database, character data will be stored as UTF-8 and the database will be Unicode-capable.

Making a non-Unicode database Unicode-capable typically means changing the character set or collation of the database, table, and/or columns.

Important: Changing the collation of a database does NOT convert the existing stored data. Existing data must be converted — for example, export the data from the old collation and import it into the new collation, or use a conversion technique provided by the DBMS (for example, ALTER TABLE [table] CONVERT TO CHARACTER SET [charset] COLLATE [collate] in MySQL): - https://dev.mysql.com/doc/refman/8.0/en/alter-table.html#alter-table-character-set

Other approaches:

  • Use ALTER TABLE ... ALTER COLUMN ... statements to change column-level collations or types.
  • Create a new Unicode database with the correct character set/collation and convert all tables from the non-Unicode database to the Unicode one. The DataFlex Studio SQL Conversion Wizard can help with this:
  • SQL Conversion Wizard (Tools/SQL_Conversion_Wizard.md)

Note: DataFlex SQL drivers work with both Unicode and non-Unicode databases and will perform character set conversions as needed.

Further Information on Other Databases and Unicode

Below are links and resources for handling Unicode in various database systems.

SQL Server 2019 UTF-8 Columns

SQL Server 2019 supports storing UTF-8 in char/varchar columns by using a UTF-8 collation on the column. See:

  • Introducing UTF-8 support for SQL Server: https://techcommunity.microsoft.com/t5/sql-server/introducing-utf-8-support-for-sql-server/ba-p/734928
  • Set or change the database collation: https://docs.microsoft.com/en-us/sql/relational-databases/collations/set-or-change-the-database-collation?view=sql-server-ver15

Further information on data types and character sets in SQL Server:

  • nvarchar vs varchar discussion: https://docs.microsoft.com/en-us/answers/questions/79294/nvarchar-vs-varchar.html
  • Differences of char, nchar, varchar, and nvarchar: https://www.mssqltips.com/sqlservertip/4322/sql-server-differences-of-char-nchar-varchar-and-nvarchar-data-types/
  • Native UTF-8 support discussion: https://sqlquantumleap.com/2018/09/28/native-utf-8-support-in-sql-server-2019-savior-false-prophet-or-both/#guidance
  • Convert between text and varchar(max): https://stackoverflow.com/questions/10596871/convert-between-text-and-varcharmax-in-sql-server
  • Deprecation of text/ntext types: https://docs.microsoft.com/en-us/sql/t-sql/data-types/ntext-text-and-image-transact-sql?view=sql-server-ver15

MySQL

Converting a MySQL ANSI (latin1) database to Unicode (utf8mb4) resources:

  • Convert MySQL database from latin1 to utf8mb4 and handle special characters: https://dba.stackexchange.com/questions/123572/convert-mysql-database-from-latin1-to-utf8mb4-and-take-care-of-german-umlauts
  • MariaDB and MySQL character set conversion: https://fromdual.com/mariadb-and-mysql-character-set-conversion

Character set and collation documentation:

  • Unicode support in MySQL: https://dev.mysql.com/doc/refman/8.0/en/charset-unicode.html
  • Character sets for databases: https://dev.mysql.com/doc/refman/8.0/en/charset-database.html
  • ALTER TABLE character set conversion: https://dev.mysql.com/doc/refman/8.0/en/alter-table.html#alter-table-character-set
  • Change collation of database/table/column (StackOverflow discussion): https://stackoverflow.com/questions/1294117/how-to-change-collation-of-database-table-column

MariaDB

Character set and collation documentation:

  • ALTER DATABASE (MariaDB): https://mariadb.com/kb/en/alter-database/
  • Setting character sets and collations: https://mariadb.com/kb/en/setting-character-sets-and-collations/

Oracle

Character set and migration documentation:

  • Character set migration (Oracle): https://docs.oracle.com/en/database/oracle/oracle-database/18/nlspg/character-set-migration.html#GUID-53A6A5B1-6F79-44AE-9205-C32D9BA6B69A
  • Choosing a character set (Oracle): https://docs.oracle.com/en/database/oracle/oracle-database/18/nlspg/choosing-character-set.html#GUID-BF26E01D-AB92-48FC-855A-69A5B3AF9A92
  • Older Oracle server docs on character sets: https://docs.oracle.com/cd/B10501_01/server.920/a96529/ch10.htm#1009580

PostgreSQL

Character set and encoding resources:

  • How to change the character encoding of a Postgres database (StackOverflow): https://stackoverflow.com/questions/5090858/how-do-you-change-the-character-encoding-of-a-postgres-database
  • Change database encoding using SQL or client tools: https://stackoverflow.com/questions/380924/how-can-i-change-database-encoding-for-a-postgresql-database-using-sql-or-phppga
  • PostgreSQL multibyte encoding information: https://www.postgresql.org/docs/9.3/multibyte.html
  • ALTER DATABASE (PostgreSQL): https://www.postgresql.org/docs/9.1/sql-alterdatabase.html
  • SET and client encoding: https://www.postgresql.org/docs/9.1/sql-set.html
  • Runtime client formatting configuration: https://www.postgresql.org/docs/9.1/runtime-config-client.html#RUNTIME-CONFIG-CLIENT-FORMAT

See Also

  • Working with DataFlex 2021 (Working_with_DataFlex_2021.md)