nlib
nn::nlib::unicode::UnicodeNormalizer Class Reference [final]

Class used to normalize a Unicode string.

#include "nn/nlib/unicode/UnicodeNormalizer.h"

Public Types

enum  NormalizationForm {
  kNfc,
  kNfd,
  kNfkc,
  kNfkd
}
 Specifies the Unicode normalization form.
 

Static Public Member Functions

static errno_t Normalize (InputStream *istr, OutputStream *ostr, NormalizationForm form)
 Normalizes Unicode (UTF-8) strings.
 

Detailed Description

Class used to normalize a Unicode string.

Description
For more information on normalization of Unicode strings, see Unicode Standard Annex 15 (UAX #15).
See also
http://www.unicode.org/reports/tr15/ (UAX #15)

Definition at line 29 of file UnicodeNormalizer.h.

Member Enumeration Documentation

◆ NormalizationForm

Specifies the Unicode normalization form.

Description
Passed as the form argument of the UnicodeNormalizer::Normalize function.
Enumerator
kNfc 

Converts using kNfc.
Decomposes the Unicode string using canonical decomposition, reorders using canonical ordering, and then canonically composes. In most cases, the resulting string is the same as the source string.

kNfd 

Converts using kNfd.
Decomposes the Unicode string using canonical decomposition, and reorders using canonical ordering. Characters that carry diacritics, such as voicing marks or acute accents, are split into a base character and combining marks.

kNfkc 

Converts using kNfkc.
Decomposes the Unicode string using compatibility decomposition, reorders using canonical ordering, and then canonically composes. Full-width alphanumeric characters are converted to half-width (standard) alphanumeric characters, and base characters followed by diacritics are recomposed where a precomposed character exists.

kNfkd 

Converts using kNfkd.
Decomposes the Unicode string using compatibility decomposition, and reorders using canonical ordering. In addition to decomposing characters with diacritics such as acute accents, this form also performs other conversions, such as changing full-width alphanumerics to half-width (standard) ones and decomposing enclosed alphanumerics.

Definition at line 31 of file UnicodeNormalizer.h.
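
The four forms differ along two axes: whether compatibility decompositions are applied (kNfkc, kNfkd) and whether the result is recomposed (kNfc, kNfkc). The following is a minimal sketch of that distinction, not the nlib implementation. ToyNormalize, its tables, and the local copy of the enum are hypothetical stand-ins; it knows only three sample characters (U+00E9, U+FF21, U+2460) and leaves out canonical reordering, which none of the samples need.

```cpp
#include <map>
#include <string>
#include <utility>

// Local stand-in mirroring the enumerators documented above (assumption:
// declared here only so the sketch compiles without the nlib headers).
enum NormalizationForm { kNfc, kNfd, kNfkc, kNfkd };

// Toy normalizer over UTF-32 code points. Hypothetical helper, not part of nlib.
// Canonical reordering is omitted because no sample carries more than one mark.
std::u32string ToyNormalize(const std::u32string& in, NormalizationForm form) {
    const bool compat = (form == kNfkc || form == kNfkd);   // K forms
    const bool compose = (form == kNfc || form == kNfkc);   // C forms

    // Canonical decomposition: e-acute -> 'e' + combining acute accent.
    static const std::map<char32_t, std::u32string> canonical = {
        {U'\u00E9', U"e\u0301"},
    };
    // Compatibility-only decompositions: full-width A, circled digit one.
    static const std::map<char32_t, std::u32string> compatibility = {
        {U'\uFF21', U"A"},
        {U'\u2460', U"1"},
    };
    // Canonical composition pairs (inverse of the canonical table).
    static const std::map<std::pair<char32_t, char32_t>, char32_t> pairs = {
        {{U'e', U'\u0301'}, U'\u00E9'},
    };

    // Step 1: decompose every code point that has a mapping.
    std::u32string decomposed;
    for (char32_t c : in) {
        auto canon = canonical.find(c);
        if (canon != canonical.end()) {
            decomposed += canon->second;
            continue;
        }
        if (compat) {
            auto comp = compatibility.find(c);
            if (comp != compatibility.end()) {
                decomposed += comp->second;
                continue;
            }
        }
        decomposed += c;
    }
    if (!compose) return decomposed;

    // Step 2 (C forms only): merge a base character with a following mark.
    std::u32string composed;
    for (char32_t c : decomposed) {
        if (!composed.empty()) {
            auto hit = pairs.find(std::make_pair(composed.back(), c));
            if (hit != pairs.end()) {
                composed.back() = hit->second;
                continue;
            }
        }
        composed += c;
    }
    return composed;
}
```

With these tables, the canonical forms leave the full-width A untouched, while the compatibility forms map it to a plain A. The e-acute ends up precomposed under the C forms and split into two code points under the D forms.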

Member Function Documentation

◆ Normalize()

errno_t nn::nlib::unicode::UnicodeNormalizer::Normalize ( InputStream * istr,
OutputStream * ostr,
NormalizationForm form
)
static

Normalizes Unicode (UTF-8) strings.

Parameters
[in]  istr  Input stream from which the UTF-8 string to be normalized is read.
[in]  ostr  Output stream to which the normalized UTF-8 string is written.
[in]  form  Specifies the normalization form.
Return values
0  No error occurred.
EINVAL  Indicates that istr or ostr is NULL.
EILSEQ  A sequence of combining characters, such as umlauts, is too long. (See UAX #15, Section 13.)
Other  An error has occurred in a stream.
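
A caller will usually branch on the return values listed above. The helper below is a minimal sketch of that dispatch. DescribeNormalizeResult is a hypothetical name that does not exist in nlib, and int stands in for errno_t so that the block compiles with only the standard library.

```cpp
#include <cerrno>
#include <string>

// Maps a value returned by Normalize() to a short diagnostic message.
// Hypothetical helper; `int` stands in for errno_t, which standard C++ lacks.
std::string DescribeNormalizeResult(int err) {
    switch (err) {
    case 0:
        return "ok";
    case EINVAL:
        return "istr or ostr was NULL";
    case EILSEQ:
        return "combining sequence too long (UAX #15, Section 13)";
    default:
        // Any other value is reported by one of the streams.
        return "stream error (" + std::to_string(err) + ")";
    }
}
```

Keeping the stream errors in the default branch means the caller does not need to know which error codes a particular stream implementation can produce.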
