nlib
nn::nlib::unicode::UnicodeNormalizer Class Reference final

Class used to normalize a Unicode string.

#include "nn/nlib/unicode/UnicodeNormalizer.h"

Public Types

enum  NormalizationForm {
  NFC,
  NFD,
  NFKC,
  NFKD
}
 Specifies the Unicode normalization form.
 

Static Public Member Functions

static errno_t Normalize (InputStream *istr, OutputStream *ostr, NormalizationForm form)
 Normalizes Unicode (UTF-8) strings.
 

Detailed Description

Class used to normalize a Unicode string.

Description
For more information on the normalization of Unicode strings, see Unicode Standard Annex #15 (UAX #15) at http://www.unicode.org/reports/tr15/.

Definition at line 16 of file UnicodeNormalizer.h.

Member Enumeration Documentation

Specifies the Unicode normalization form.

Description
Used as the form argument of the UnicodeNormalizer::Normalize function.
Enumerator
NFC 

Converts using NFC.
Decomposes the Unicode string using canonical decomposition, reorders using canonical ordering, and then canonically composes. In most cases, the resulting string is the same as the source string.

NFD 

Converts using NFD.
Decomposes the Unicode string using canonical decomposition, and reorders using canonical ordering. Characters that carry marks such as voicing marks or acute accents are decomposed into a base character followed by combining marks.

NFKC 

Converts using NFKC.
Decomposes the Unicode string using compatibility decomposition, reorders using canonical ordering, and then canonically composes. Full-width alphanumeric characters are converted to half-width (standard) alphanumeric characters, and diacritics are composed with their base characters where a composed form exists.

NFKD 

Converts using NFKD.
Decomposes the Unicode string using compatibility decomposition, and reorders using canonical ordering. In addition to decomposing characters that carry diacritics such as acute accents, this form also performs other conversions, such as changing full-width alphanumerics to half-width (standard) alphanumerics and decomposing enclosed alphanumerics.

Definition at line 18 of file UnicodeNormalizer.h.
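
Example
The differences between the four forms can be illustrated with hard-coded UTF-8 byte sequences. The following is a minimal sketch that does not call nlib; the constant and function names (kComposed, kDecomposed, kFullWidthA, CountCodePoints, CheckExamples) are hypothetical and exist only for this illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// "é" precomposed, as NFC and NFKC would leave it: U+00E9.
const std::string kComposed = "\xC3\xA9";
// "é" decomposed, as NFD and NFKD would leave it: U+0065 + U+0301.
const std::string kDecomposed = "e\xCC\x81";
// Full-width "Ａ" (U+FF21). NFKC and NFKD map it to ASCII "A" (U+0041),
// while NFC and NFD leave it unchanged.
const std::string kFullWidthA = "\xEF\xBC\xA1";

// Counts code points by skipping UTF-8 continuation bytes (10xxxxxx).
// Assumes the input is well-formed UTF-8.
std::size_t CountCodePoints(const std::string& utf8) {
    std::size_t count = 0;
    for (unsigned char byte : utf8) {
        if ((byte & 0xC0) != 0x80) ++count;
    }
    return count;
}

// Records the facts above as assertions.
void CheckExamples() {
    assert(kComposed != kDecomposed);           // same text, different bytes
    assert(CountCodePoints(kComposed) == 1);    // one precomposed code point
    assert(CountCodePoints(kDecomposed) == 2);  // base letter + combining mark
    assert(CountCodePoints(kFullWidthA) == 1);  // one full-width code point
    assert(kFullWidthA.size() == 3);            // encoded in three bytes
}
```

Because kComposed and kDecomposed compare unequal byte for byte, strings are typically normalized to one common form before they are compared or used as keys.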

Member Function Documentation

errno_t nn::nlib::unicode::UnicodeNormalizer::Normalize ( InputStream *istr,
OutputStream *ostr,
NormalizationForm  form
)
static

Normalizes Unicode (UTF-8) strings.

Parameters
[in] istr   Input stream from which the UTF-8 string to be normalized is read.
[in] ostr   Output stream to which the normalized UTF-8 string is written.
[in] form   Specifies the normalization form.
Return values
0        No error occurred.
EINVAL   istr or ostr is NULL.
EILSEQ   A sequence of combining characters, such as umlauts, is too long. (See UAX #15, Section 13.)
An error has occurred in another stream.
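
Example
The return values listed above can be turned into diagnostic messages with a small helper. This is a sketch only: DescribeNormalizeResult is a hypothetical function that is not part of nlib, it takes a plain int on the assumption that errno_t is an integer type, and it treats every value other than 0, EINVAL, and EILSEQ as a stream error.

```cpp
#include <cerrno>
#include <string>

// Hypothetical helper (not part of nlib): maps a value returned by
// UnicodeNormalizer::Normalize to a short message, following the
// "Return values" list.
std::string DescribeNormalizeResult(int err) {
    switch (err) {
    case 0:
        return "ok";
    case EINVAL:
        return "istr or ostr is NULL";
    case EILSEQ:
        return "combining character sequence too long (UAX #15, Section 13)";
    default:
        // Assumption: any other value is an error reported by a stream.
        return "stream error";
    }
}

// Intended use, with `in` and `out` standing for caller-provided streams:
//   errno_t e = UnicodeNormalizer::Normalize(&in, &out, UnicodeNormalizer::NFC);
//   if (e != 0) { /* log DescribeNormalizeResult(e) */ }
```

Checking for 0 first keeps the success path short; the message text is needed only when normalization fails.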

The documentation for this class was generated from the following files: