Skip to content

How to Set Up and Test Using IBM DB2

DB2 Unicode and Non-Unicode Databases

In DB2, a database is defined as either a Unicode database or a non-Unicode database. This is determined with the CODESET option in the CREATE DATABASE command. Once a database is created, the CODESET (and thus Unicode/non-Unicode database) cannot be changed.

Example

  1. Create DATABASE ORDER199ANSI USING CODESET ISO8859-1 TERRITORY US
  2. Create DATABASE ORDER199 USING CODESET UTF-8 TERRITORY US

  3. ORDER199ANSI is a non-Unicode database. You cannot store Unicode data in such a database. Data is stored as ANSI or, if you choose an OEM charset, as OEM.

  4. ORDER199 is a Unicode database.

  5. CODESET UTF-8 is the default.

  6. The DB2 documentation is unclear about other codesets defining a Unicode database. The page DB2 Codesets Documentation mentions code page 1200/16-bit Unicode, but code page 1200 is not used anywhere.

Unicode Data in a DB2 Unicode Database

In a DB2 Unicode database, there are two approaches to store Unicode data:

  • Use special Unicode types (graphic types).
  • Use char/varchar types to store Unicode data.

Both methods exist in parallel. The graphic types are an older way to support Unicode, while using the char/varchar types is a more modern approach.

  • Most non-Unicode databases cannot store Unicode data. The graphic types are not supported in most non-Unicode databases.
  • There are some country-specific code pages for China, Japan, etc., that do support Unicode characters in a non-Unicode database.
  • (See the Group columns in: DB2 Codesets Documentation)

Graphic Types

In a DB2 Unicode database, the following special Unicode types are available:

  • Graphic
  • Vargraphic
  • Long Vargraphic

These types store data as UCS-2. They are comparable to the SQL Server NCHAR, NVARCHAR, and NVARCHAR(max) types, respectively.

These types are only available in DB2 Unicode databases and cannot be created in non-Unicode databases.

Char/Varchar Types in DB2 Unicode Database (UTF-8)

In a DB2 Unicode database, character data types (like char, varchar, long varchar) can store Unicode characters. Data is stored as UTF-8.

Char/varchar types in a DB2 Unicode database have, besides a length, a string unit specification.

Example

CREATE TABLE "DB2ADMIN"."TestUnicodeTable" (
    "ID" INTEGER NOT NULL DEFAULT 0,
    "Varchar10Octets" VARCHAR(10 OCTETS) NOT NULL DEFAULT '',
    "Varchar10Units32" VARCHAR(10 CODEUNITS32) NOT NULL DEFAULT '',
    "VarGraphic10Units16" VARGRAPHIC(10 CODEUNITS16) NOT NULL DEFAULT ''
);

Note:

  • OCTETS == Bytes

This string unit specification (OCTETS, CODEUNITS16, CODEUNITS32) can only be done in Unicode databases. In non-Unicode databases, the string unit specification is not possible.

String Units

  • VARCHAR type can have string units OCTETS (== bytes) (default) or CODEUNITS32.
  • VARGRAPHIC type can have string units CODEUNITS16 (default) or CODEUNITS32.
  • VARCHAR(10 OCTETS) can contain 10 single-byte UTF-8 characters. For example, 'Müller7890' will not fit because as a UTF-8 string, it is 11 bytes.
  • VARCHAR(10 CODEUNITS32) can contain 10 4-byte characters. For example, 10 smiley characters of 4 bytes each.
  • VARGRAPHIC(10 CODEUNITS16) can contain 10 2-byte characters; 10 2-byte Unicode chars, but only 5 4-byte chars (smileys).
  • VARGRAPHIC(10 CODEUNITS32) can contain 10 4-byte characters. For example, 10 smiley characters of 4 bytes each.

DataFlex DB2 Driver (DB2_DRV) 20.0

Non-Unicode Database

  • DB2_DRV supports non-Unicode databases, i.e., databases with an ANSI or OEM codeset.

Unicode Database

  • DB2_DRV can read/write Unicode data to the graphic type columns, including composite characters (smileys).
  • DB2_DRV can also read/write Unicode to the char/varchar columns.

The driver will handle the various string unit sizes (Octets, Codeunits32) when reading/writing data to char/varchar columns.

Example

A varchar(10 octets) column can contain:

  • 10 ASCII characters: 'abcdefghij'
  • 5 accented characters: 'üüüüü'
  • 5 Unicode characters: 'фальл'
  • 2 smileys: '��'

A varchar(10 codeunits32) column can contain:

  • 10 ASCII characters: 'abcdefghij'
  • 10 accented characters: 'üüüüüüüüüü'
  • 10 Unicode characters: 'фальлфальл'
  • 10 smileys: '��������'

The db2_drv will detect the string unit size and read/write data correctly to such columns. Driver buffers will be allocated according to string unit size.

When creating new char/varchar columns, the driver will take the default for the string unit size for the database. When not explicitly set, the default string unit size will be OCTETS.

The default can be changed with the DB2 string_units configuration parameter. DB2 String Units Documentation

Example

db2 => update db cfg for ORDER199 using STRING_UNITS CODEUNITS32

This will set the default code units to CODEUNITS32.

With db2_drv 20.0, both methods to use Unicode data in a DB2 Unicode database are supported:

  • Use special Unicode types (graphic types).
  • Use char/varchar types to store Unicode data.

Which method to use is the developer’s choice. Considerations:

  • Graphic types fit better with the way DataFlex works.
  • Graphic types take more space - data is stored as UCS-2.
  • Graphic types may be considered ‘old’ types and not recommended.
  • Char/varchar with string units in octets has length limitations.
  • Char/varchar with string units in codeunits32 requires larger buffers.

Type Mappings

The types to be used can be configured with db2_drv type mapping settings. These type mapping settings can be configured in the db2_drv.int configuration file or at runtime with a driver attribute. Note that mapping settings are only used when creating new columns in a table. Also note that mapping settings do not apply when adding columns in DataFlex Studio, since in Table Editor the DB2 types can be directly chosen.

In db2_drv 20.0 or later, the driver will detect if it is connected to a Unicode database or a non-Unicode database, and default type mappings will be set accordingly.

When connected to a Unicode database, the graphic types will be used for new columns. This can be changed with the MAP_DF_ASCII_TO_SQLTYPE and MAP_DF_TEXT_TO_SQLTYPE settings in .int, or the corresponding attributes at runtime.

; MAP_DFASCII_TO_SQLTYPE: The DB2 type when creating new DF_ASCII columns
; Allowed values:
; char
; varchar
; graphic
; vargraphic
; Default:
; When connected to a DB2 Unicode database: vargraphic
; When connected to a DB2 Non-Unicode database: varchar
;
; MAP_DFASCII_TO_SQLTYPE vargraphic

; MAP_DFTEXT_TO_SQLTYPE: The DB2 type when creating new DF_TEXT columns
; Allowed values:
; varchar
; long varchar
; clob
; vargraphic
; long vargraphic
; DBCLOB
; Default:
; When connected to a DB2 Unicode database: long vargraphic
; When connected to a DB2 Non-Unicode database: long varchar
; MAP_DFTEXT_TO_SQLTYPE long vargraphic

See Also