NormalizeString

See Also: String Functions

Purpose

Returns a normalized copy of the passed String.

Return Type

String

Syntax

NormalizeString( {StringVal}[, {NormalizationForm} ])

Parameters

  • {StringVal}: The string value that needs to be normalized.
  • {NormalizationForm}: Optional. Specifies the Unicode normalization form to use (normNFC / normNFD / normNFKC / normNFKD). If omitted, normNFC is used.

What it Does

Unicode identifies each character by a number, referred to as a code point. Some characters are built from multiple code points that are displayed as a single character. In many cases, such a character is also available as a single code point. Some characters can even be represented by more than two code points, and those code points can appear in more than one order.

Characters that can be represented in different ways are problematic when searching within strings (using Pos, for example) or when using them as unique keys in your database, because two strings that look identical may contain different code points. Normalizing the strings resolves this by converting every character to one consistent representation.
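As an illustration of code points that can appear in more than one order, the sketch below builds the same visible character twice with its two combining marks swapped. It assumes NormalizeString applies the standard Unicode canonical ordering, in which case both strings should normalize to the same sequence and Pos should report a match at position 1. The variable names are chosen for this example only.

```dataflex
String sOrderA sOrderB
Integer iPos

// q + 'Combining Dot Below' (803) + 'Combining Dot Above' (775)
Move (Character(113) + Character(803) + Character(775)) to sOrderA
// The same letter and marks, with the two marks in the opposite order
Move (Character(113) + Character(775) + Character(803)) to sOrderB

// Search without normalizing: the code point sequences differ
Move (Pos(sOrderA, sOrderB)) to iPos

// Search after normalizing both strings to the same form
Move (Pos(NormalizeString(sOrderA, normNFD), NormalizeString(sOrderB, normNFD))) to iPos
```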

Unicode specifies four different normalization forms:

  • normNFC: Canonical Decomposition, followed by Canonical Composition (the default if not specified).
  • normNFD: Canonical Decomposition.
  • normNFKC: Compatibility Decomposition, followed by Canonical Composition.
  • normNFKD: Compatibility Decomposition.
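The difference between the canonical and compatibility forms can be sketched with a ligature. This example assumes NormalizeString follows the Unicode standard, under which 'Latin Small Ligature Fi' (code point 64257) is left unchanged by normNFC but is replaced by the two letters "fi" under normNFKC. If that holds, the first search should find nothing and the second should find "fi" at position 1. The variable names are illustrative.

```dataflex
String sLigature sPlain
Integer iPos

// 'Latin Small Ligature Fi', a compatibility character
Move (Character(64257)) to sLigature
Move "fi" to sPlain

// Canonical normalization keeps the ligature as one code point
Move (Pos(sPlain, NormalizeString(sLigature, normNFC))) to iPos

// Compatibility normalization replaces the ligature with f and i
Move (Pos(sPlain, NormalizeString(sLigature, normNFKC))) to iPos
```

Because compatibility normalization discards such distinctions, it is usually better suited to searching and matching than to storing text whose exact form matters.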

Examples

The following example demonstrates normalization using two strings. The sNorm string contains the single code point version of ‘Latin Small Letter N with Tilde’, while sComposite composes the same character from the code points ‘Latin Small Letter N’ and ‘Combining Tilde’. It shows that Pos does not find one notation when searching for the other, but does find the character once NormalizeString has been applied. To be safe, ensure that both strings are normalized to the same form.

String sNorm sComposite
Integer iPos

// ñ (‘Latin Small Letter N with Tilde’)
Move (Character(241)) to sNorm
// ñ (‘Latin Small Letter N’ + ‘Combining Tilde’)
Move (Character(110) + Character(771)) to sComposite

Move (Pos(sNorm, sComposite)) to iPos
// Results in 0 (not found)

Move (Pos(sNorm, NormalizeString(sComposite))) to iPos
// Results in 1

Move (Pos(NormalizeString(sNorm, normNFD), sComposite)) to iPos
// Results in 1

Move (Pos(NormalizeString(sNorm), NormalizeString(sComposite))) to iPos
// Results in 1
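
The advice to normalize both strings to the same form can be wrapped in a small helper. The following is a minimal sketch, not part of the product: the function name ContainsNormalized is hypothetical, and normNFC is chosen only because it is the default form.

```dataflex
// Hypothetical helper: reports whether sSearch occurs in sHost
// after both strings have been normalized to the same form.
Function ContainsNormalized Global String sSearch String sHost Returns Boolean
    Integer iPos
    Move (Pos(NormalizeString(sSearch, normNFC), NormalizeString(sHost, normNFC))) to iPos
    Function_Return (iPos > 0)
End_Function

// Usage with the two strings from the example above
Boolean bFound
Move (ContainsNormalized(sNorm, sComposite)) to bFound
```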