Making Existing SQL Databases Unicode-capable
DataFlex supports using Unicode data in your applications. For developers running prior revisions of DataFlex (19.1 and earlier) on SQL databases who want to make their application Unicode-ready, some steps may need to be taken to make those databases able to appropriately handle Unicode data.
Below are the basic actions to prepare a DataFlex database environment to work with Unicode. Note that once databases are set to be Unicode-capable and Unicode data is stored in them, these changes cannot be reversed without the risk of losing information.
Preparing for a Unicode-capable Environment
To migrate an existing SQL database to become Unicode-capable, take the following high-level steps:
- Analyze and migrate OEM tables to ANSI.
- Convert databases to be Unicode-capable:
- For SQL Server databases, convert all
char/varcharcolumns tonchar/nvarchar. - For other databases (MySQL/PostgreSQL/DB2), migrate the database to a Unicode-capable character set/collation (for example,
utf8mb4orutf8).
Note: The DataFlex SQL drivers fully support both non-Unicode and Unicode databases. Migration is not required. Leaving databases as non-Unicode-capable simply means they cannot store Unicode data.
Analyze and Migrate OEM Tables
Existing SQL databases created by DataFlex versions prior to DataFlex 2021 may be in either ANSI or OEM format. ANSI tables support Unicode data; OEM tables do not. OEM tables are only supported in DataFlex 2021 and higher for backward compatibility.
To make your database Unicode-capable, migrate any OEM tables to ANSI. See the documentation on handling OEM data so you can adjust your applications accordingly:
- OEM data in DataFlex (DevelopmentGuide/OEM_Data_in_DataFlex.md)
See also:
- Table Character Format in DataFlex (DevelopmentGuide/Table_Character_Format_in_DataFlex_20.md)
Convert Databases to Unicode-capable
The exact process of converting a database to Unicode-capable depends on the SQL backend. Review what different databases require to properly handle Unicode data before conversion.
SQL Databases and Unicode
SQL database systems handle Unicode in two general ways:
- Special Unicode data types — e.g., SQL Server: non-Unicode types like
charandvarcharcannot store Unicode; thentypes (nchar,nvarchar) store Unicode. - Database or column character set/collation settings — e.g., MySQL: a database created with character set
latin1storeschar/varcharcolumns aslatin1(ANSI) and cannot store Unicode characters; a database created withutf8mb4storeschar/varcharcolumns as UTF-8 and can store Unicode data.
Unicode Support in SQL Database Systems
How popular database systems enable Unicode:
SQL Server
- Uses special Unicode data types (
nchar,nvarchar, etc.). - Since SQL Server 2019, it is also possible to store UTF-8 data in
char/varcharcolumns by using a UTF-8 collation for the column.
MySQL
- Uses character set and collation settings (e.g.,
latin1,utf8,utf8mb4).
PostgreSQL
- Uses database encoding settings (e.g.,
UTF8).
DB2
- Supports both special Unicode types (Graphic types) and codeset settings.
See setup and testing guides for specific database systems:
- How to set up and test using Aurora (DevelopmentGuide/How_to_set_up_and_test_using_Aurora.md)
- How to set up and test using IBM DB2 (DevelopmentGuide/How_to_set_up_and_test_using_IBM_DB2.md)
- How to set up and test using MariaDB (DevelopmentGuide/How_to_set_up_and_test_using_MariaDB.md)
- How to set up and test using MySQL (DevelopmentGuide/How_to_set_up_and_test_using_MySQL.md)
- How to set up and test using Oracle (DevelopmentGuide/How_to_set_up_and_test_using_Oracle.md)
- How to set up and test using PostgreSQL (DevelopmentGuide/How_to_set_up_and_test_using_PostgreSQL.md)
Making SQL Server Databases Unicode-capable
To make SQL Server databases Unicode-capable, convert char/varchar columns to nchar/nvarchar.
Columns of the text data type should be changed to nvarchar(max) since Microsoft has indicated that text/ntext data types will be removed in a future version of SQL Server:
- https://docs.microsoft.com/en-us/sql/t-sql/data-types/ntext-text-and-image-transact-sql?view=sql-server-ver15
Before changing data types, ensure all OEM tables have been migrated to ANSI: - OEM to ANSI Conversion Wizard (Tools/OEM_to_ANSI_Conversion_Wizard.md)
Options to change char/varchar columns to nchar/nvarchar:
- Use the Table Editor in DataFlex Studio to manually change types per column.
- Run SQL statements such as
ALTER TABLE ... ALTER COLUMN .... - Use DataFlex SQL driver restructure commands to programmatically change column types.
The following program illustrates how to convert all char/varchar columns to nchar/nvarchar for all tables in a workspace (DataFlex example):
Procedure ConvertAllTablesToUnicodeNTypes
Handle hTable
Repeat
Get_Attribute DF_FILE_NEXT_USED of hTable to hTable
If (hTable > 0 and (hTable <> 50)) Begin
Send ConvertTableToUnicodeNTypes hTable
End
Until (hTable = 0)
End_Procedure
Procedure ConvertTableToUnicodeNTypes Handle hTable
Integer iNumColumns iColumn iDFType iNativeType
String sTableName sLogicalName sDriverName sColumnName sNativeTypeName
Boolean bTableOpenError
// trap open table error so it's not fatal
Move False to Err
Send Ignore_All to Error_Object_Id
Open hTable
Move Err to bTableOpenError
Send Trap_All of Error_Object_Id
If (bTableOpenError = False) Begin
Get_Attribute DF_FILE_ROOT_NAME of hTable to sTableName
Get_Attribute DF_FILE_LOGICAL_NAME of hTable to sLogicalName
Get_Attribute DF_FILE_DRIVER of hTable to sDriverName
If (sDriverName = "MSSQLDRV") Begin
Structure_Start hTable
Get_Attribute DF_FILE_NUMBER_FIELDS of hTable to iNumColumns
For iColumn from 1 to iNumColumns
Get_Attribute DF_FIELD_NAME of hTable iColumn to sColumnName
Get_Attribute DF_FIELD_TYPE of hTable iColumn to iDFType
Get_Attribute DF_FIELD_NATIVE_TYPE of hTable iColumn to iNativeType
Get_Attribute DF_FIELD_NATIVE_TYPE_Name of hTable iColumn to sNativeTypeName
If (iDFType = DF_ASCII) Begin
If (iNativeType = SQL_CHAR) Begin
// Convert char to nchar
Set_Attribute DF_FIELD_NATIVE_TYPE of hTable iColumn to SQL_WCHAR
End
If (iNativeType = SQL_VARCHAR) Begin
// Convert varchar to nvarchar
Set_Attribute DF_FIELD_NATIVE_TYPE of hTable iColumn to SQL_WVARCHAR
End
End
If (iDFType = DF_TEXT) Begin
If (iNativeType = SQL_LONGVARCHAR) Begin
// Convert text to ntext
Set_Attribute DF_FIELD_NATIVE_TYPE of hTable iColumn to SQL_WLONGVARCHAR
End
If (iNativeType = SQL_VARCHARMAX) Begin
// Convert varcharmax to nvarcharmax
Set_Attribute DF_FIELD_NATIVE_TYPE of hTable iColumn to SQL_WVARCHARMAX
End
End
Loop
Structure_End hTable
End
End
End_Procedure
Making Other Databases Unicode-capable
Database systems without special Unicode data types use character set/collation settings to define a database as Unicode or non-Unicode. For example, in MySQL:
- In a
latin1database, character data will be stored aslatin1(an ANSI code page) and the database will be non-Unicode. - In a
utf8mb4database, character data will be stored as UTF-8 and the database will be Unicode-capable.
Making a non-Unicode database Unicode-capable typically means changing the character set or collation of the database, table, and/or columns.
Important: Changing the collation of a database does NOT convert the existing stored data. Existing data must be converted — for example, export the data from the old collation and import it into the new collation, or use a conversion technique provided by the DBMS (for example, ALTER TABLE [table] CONVERT TO CHARACTER SET [charset] COLLATE [collate] in MySQL):
- https://dev.mysql.com/doc/refman/8.0/en/alter-table.html#alter-table-character-set
Other approaches:
- Use
ALTER TABLE ... ALTER COLUMN ...statements to change column-level collations or types. - Create a new Unicode database with the correct character set/collation and convert all tables from the non-Unicode database to the Unicode one. The DataFlex Studio SQL Conversion Wizard can help with this:
- SQL Conversion Wizard (Tools/SQL_Conversion_Wizard.md)
Note: DataFlex SQL drivers work with both Unicode and non-Unicode databases and will perform character set conversions as needed.
Further Information on Other Databases and Unicode
Below are links and resources for handling Unicode in various database systems.
SQL Server 2019 UTF-8 Columns
SQL Server 2019 supports storing UTF-8 in char/varchar columns by using a UTF-8 collation on the column. See:
- Introducing UTF-8 support for SQL Server: https://techcommunity.microsoft.com/t5/sql-server/introducing-utf-8-support-for-sql-server/ba-p/734928
- Set or change the database collation: https://docs.microsoft.com/en-us/sql/relational-databases/collations/set-or-change-the-database-collation?view=sql-server-ver15
Further information on data types and character sets in SQL Server:
- nvarchar vs varchar discussion: https://docs.microsoft.com/en-us/answers/questions/79294/nvarchar-vs-varchar.html
- Differences of
char,nchar,varchar, andnvarchar: https://www.mssqltips.com/sqlservertip/4322/sql-server-differences-of-char-nchar-varchar-and-nvarchar-data-types/ - Native UTF-8 support discussion: https://sqlquantumleap.com/2018/09/28/native-utf-8-support-in-sql-server-2019-savior-false-prophet-or-both/#guidance
- Convert between
textandvarchar(max): https://stackoverflow.com/questions/10596871/convert-between-text-and-varcharmax-in-sql-server - Deprecation of
text/ntexttypes: https://docs.microsoft.com/en-us/sql/t-sql/data-types/ntext-text-and-image-transact-sql?view=sql-server-ver15
MySQL
Converting a MySQL ANSI (latin1) database to Unicode (utf8mb4) resources:
- Convert MySQL database from
latin1toutf8mb4and handle special characters: https://dba.stackexchange.com/questions/123572/convert-mysql-database-from-latin1-to-utf8mb4-and-take-care-of-german-umlauts - MariaDB and MySQL character set conversion: https://fromdual.com/mariadb-and-mysql-character-set-conversion
Character set and collation documentation:
- Unicode support in MySQL: https://dev.mysql.com/doc/refman/8.0/en/charset-unicode.html
- Character sets for databases: https://dev.mysql.com/doc/refman/8.0/en/charset-database.html
- ALTER TABLE character set conversion: https://dev.mysql.com/doc/refman/8.0/en/alter-table.html#alter-table-character-set
- Change collation of database/table/column (StackOverflow discussion): https://stackoverflow.com/questions/1294117/how-to-change-collation-of-database-table-column
MariaDB
Character set and collation documentation:
- ALTER DATABASE (MariaDB): https://mariadb.com/kb/en/alter-database/
- Setting character sets and collations: https://mariadb.com/kb/en/setting-character-sets-and-collations/
Oracle
Character set and migration documentation:
- Character set migration (Oracle): https://docs.oracle.com/en/database/oracle/oracle-database/18/nlspg/character-set-migration.html#GUID-53A6A5B1-6F79-44AE-9205-C32D9BA6B69A
- Choosing a character set (Oracle): https://docs.oracle.com/en/database/oracle/oracle-database/18/nlspg/choosing-character-set.html#GUID-BF26E01D-AB92-48FC-855A-69A5B3AF9A92
- Older Oracle server docs on character sets: https://docs.oracle.com/cd/B10501_01/server.920/a96529/ch10.htm#1009580
PostgreSQL
Character set and encoding resources:
- How to change the character encoding of a Postgres database (StackOverflow): https://stackoverflow.com/questions/5090858/how-do-you-change-the-character-encoding-of-a-postgres-database
- Change database encoding using SQL or client tools: https://stackoverflow.com/questions/380924/how-can-i-change-database-encoding-for-a-postgresql-database-using-sql-or-phppga
- PostgreSQL multibyte encoding information: https://www.postgresql.org/docs/9.3/multibyte.html
- ALTER DATABASE (PostgreSQL): https://www.postgresql.org/docs/9.1/sql-alterdatabase.html
- SET and client encoding: https://www.postgresql.org/docs/9.1/sql-set.html
- Runtime client formatting configuration: https://www.postgresql.org/docs/9.1/runtime-config-client.html#RUNTIME-CONFIG-CLIENT-FORMAT
See Also
- Working with DataFlex 2021 (Working_with_DataFlex_2021.md)