nlib
nn::nlib::exi Namespace Reference

Implements binary XML parsers (and text-based XML parsers). More...

Classes

class  ExiAllocator
 Allocator used by the XML parser. The user can also use it. More...
 
class  ExiAllocatorEx
 Allocator that can be set for each instance of XmlStreamReader and XmlStreamWriter. More...
 
class  ExiErrorStatus
 Sets and stores the error status of the XML parser. More...
 
class  Preserve
 Set of options that specify what is preserved in the binary XML to read or write. More...
 
class  XmlStreamReader
 Abstract class that reads from an XML stream. More...
 
class  XmlStreamReaderSettings
 Structure used as the initialization options for XmlStreamReader. More...
 
class  XmlStreamWriter
 Abstract class that writes to an XML stream. More...
 
class  XmlStreamWriterSettings
 Structure used as the initialization options for XmlStreamWriter. More...
 

Typedefs

typedef wchar_t ExiChar
 A string-type typedef used internally by the XML parser. More...
 

Enumerations

enum  Alignment {
  ALIGNMENT_BIT_PACKED = 0,
  ALIGNMENT_BYTE_ALIGNMENT = 1
}
 Specifies the alignment of the binary XML to read or write. More...
 
enum  XmlProcessor {
  XML_PROCESSOR_EXI = 0,
  XML_PROCESSOR_TEXT
}
 Option that specifies which XML processor to use. More...
 

Functions

bool TransformXml (XmlStreamReader *r, XmlStreamWriter *w) noexcept
 Reads from XmlStreamReader and writes to XmlStreamWriter. More...
 

Detailed Description

Implements binary XML parsers (and text-based XML parsers).

About the exi Library

The exi library provides an efficient means of reading and writing XML files in both binary and text format. Using a binary format alleviates problems that can occur when adopting XML, such as verbosity (size) and processing overhead. It enables you to reap the full benefits of using XML while greatly reducing the risks. It also enables you to reduce the use of network bandwidth.
The binary format that is used to read and write the data follows the W3C EXI standard, and the exi library is a partial implementation of W3C EXI. For more information about W3C EXI, see About the W3C EXI (Efficient XML Interchange).

Features of the exi Library

The exi library has the following features. It is well suited to handling XML where network bandwidth or CPU time is severely constrained.
Compact Representation of XML
One of the often-cited shortcomings of XML is its verbosity, but using the binary representation provided by the exi library can reduce the size of a typical XML file to anywhere from one-third to one-quarter of the original size. One technique is to assign indexes to the strings that appear within the XML document to reduce the amount of data.
An example is shown below. It has been simplified and is therefore not completely accurate; the actual format also reduces the data size by various other methods.
<books>
<book>
<title>...</title>
<author>...</author>
<isbn>...</isbn>
</book>
<book>
<title>...</title>
<author>...</author>
<isbn>...</isbn>
</book>
....
</books>
The XML above is written with IDs assigned to its strings, as follows:
<$1=books>
<$2=book>
<$3=title>...</>
<$4=author>...</>
<$5=isbn>...</>
</>
<$2>
<$3>...</>
<$4>...</>
<$5>...</>
</>
....
</>
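The ID assignment in the listing above can be sketched in a few lines of C++. This is only an illustration of the idea of indexing repeated strings. It is not the exi library's implementation, and StringTable and EncodeNames are hypothetical names introduced for this example.

```cpp
#include <cstddef>
#include <string>
#include <unordered_map>
#include <vector>

// Hypothetical helper (not part of the exi library): assigns a small integer
// ID to each distinct string the first time it is seen.
class StringTable {
 public:
  // Returns the 1-based ID for `name`. `*is_new` is set to true when the
  // string is seen for the first time and must be written out in full.
  std::size_t Intern(const std::string& name, bool* is_new) {
    auto it = ids_.find(name);
    if (it != ids_.end()) {
      *is_new = false;
      return it->second;
    }
    std::size_t id = ids_.size() + 1;
    ids_.emplace(name, id);
    *is_new = true;
    return id;
  }

 private:
  std::unordered_map<std::string, std::size_t> ids_;
};

// Renders element names the way the listing above does:
// "$N=name" on the first occurrence and "$N" afterwards.
std::vector<std::string> EncodeNames(const std::vector<std::string>& names) {
  StringTable table;
  std::vector<std::string> out;
  for (const std::string& name : names) {
    bool is_new = false;
    std::size_t id = table.Intern(name, &is_new);
    std::string token = "$" + std::to_string(id);
    if (is_new) token += "=" + name;
    out.push_back(token);
  }
  return out;
}
```

Every repeated name costs only its short ID, which is why documents with many similar records shrink the most.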
Fast Operations
Because the XML is represented with a smaller data size, network transfer speeds and storage read speed can be increased when handling the same XML. Even when the exi library is used to parse XML from memory, it can achieve operational speeds equal to or faster than XML parsers that are said to be fast, such as expat and XmlLite.
Namespace Support
High-speed XML parser implementations sometimes have no support for namespaces, but the exi library supports namespaces. Generally, authoring tools output XML that includes namespaces. Namespace support is required in order to handle this type of XML file.
Safer Than Handling Text-Based XML
The binary XML format itself is inherently more difficult to tamper with or sniff than text-based XML. This also makes it more difficult to inject invalid data to defeat the XML parser or paralyze a system.
  • Tampering with data is more difficult than with text-based XML. With binary XML, the parsing rules (grammars) for data that arrives later on depend on data that has already been read. This makes it more difficult to tamper with only a part of the XML document. The fact that the format uses a bit stream and not a byte stream also makes tampering more difficult.
  • The memory used by the XML parser is allocated in advance, before parsing begins. This prevents invalid XML (for example, XML containing extremely long element names) from consuming all the system memory.
  • Parsing of XML attributes runs in \(O(n)\) time. Many old XML parsers run in \(O(n^2)\) time, which sometimes made them targets of distributed denial-of-service (DDoS) attacks.
  • The exi library does not support DTDs. As a result, it is not possible to create XML bombs using DTDs.
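The difference between \(O(n)\) and \(O(n^2)\) attribute handling can be illustrated with the duplicate-name check, since XML forbids repeating an attribute name within one tag. The sketch below assumes that this check is where the quadratic cost arises; the source does not say so, and the code does not describe how the exi library is implemented. Both function names are hypothetical.

```cpp
#include <cstddef>
#include <string>
#include <unordered_set>
#include <vector>

// Returns true if any attribute name occurs more than once.
// Expected O(n) in the number of attributes: each name is hashed once.
bool HasDuplicateAttribute(const std::vector<std::string>& names) {
  std::unordered_set<std::string> seen;
  seen.reserve(names.size());
  for (const std::string& name : names) {
    // insert() reports false in .second when the name is already present.
    if (!seen.insert(name).second) return true;
  }
  return false;
}

// Naive pairwise comparison, shown for contrast: O(n^2) comparisons.
bool HasDuplicateAttributeQuadratic(const std::vector<std::string>& names) {
  for (std::size_t i = 0; i < names.size(); ++i) {
    for (std::size_t j = i + 1; j < names.size(); ++j) {
      if (names[i] == names[j]) return true;
    }
  }
  return false;
}
```

With the quadratic version, a single tag carrying tens of thousands of attributes forces hundreds of millions of string comparisons, which is the kind of input an attacker can send cheaply.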

Using the exi Library

Writing Binary XML
Write binary XML by creating an XmlStreamWriter instance.
Reading Binary XML
Read binary XML by creating an XmlStreamReader instance. Pull parsing is implemented in the same vein as Java's StAX (Streaming API for XML) and the .NET Framework's XmlReader, which makes development easier than with the DOM or SAX.
The following are the salient features of pull parsing.
  • You do not need to create handler classes as you do with SAX. Extract information by simply using while and switch statements.
  • Faster than DOM, and does not need to hold the whole document in memory.
  • Can implement recursive descent parsers. This is particularly useful if you want to implement a simplified language using XML.
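The while-and-switch style described above can be sketched with a toy reader. The types below are stand-ins written for this example; they do not reproduce the XmlStreamReader interface, whose actual member functions are documented on its own reference page.

```cpp
#include <cstddef>
#include <string>
#include <utility>
#include <vector>

// Stand-in types for illustration only; NOT the XmlStreamReader API.
enum class Event { kStartElement, kText, kEndElement };

struct Node {
  Event type;
  std::string value;  // element name, or text content for kText
};

class ToyPullReader {
 public:
  explicit ToyPullReader(std::vector<Node> nodes) : nodes_(std::move(nodes)) {}

  // Advances to the next event. Returns false at the end of the stream.
  bool Read() {
    if (next_ >= nodes_.size()) return false;
    ++next_;
    return true;
  }
  // Both accessors are valid only after Read() has returned true.
  Event type() const { return nodes_[next_ - 1].type; }
  const std::string& value() const { return nodes_[next_ - 1].value; }

 private:
  std::vector<Node> nodes_;
  std::size_t next_ = 0;
};

// Collects the text of every <title> element using only while and switch.
std::vector<std::string> CollectTitles(ToyPullReader* reader) {
  std::vector<std::string> titles;
  bool in_title = false;
  while (reader->Read()) {
    switch (reader->type()) {
      case Event::kStartElement:
        in_title = (reader->value() == "title");
        break;
      case Event::kText:
        if (in_title) titles.push_back(reader->value());
        break;
      case Event::kEndElement:
        in_title = false;
        break;
    }
  }
  return titles;
}
```

The caller drives the loop and keeps its state in local variables, so no callback classes are needed and only the current event is held at any time.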
Conversion Between XML and Binary XML Using Command-Line Programs
Use nexiconv.exe to convert XML to binary XML, and vice versa. Run the following command to convert an XML file into a binary XML file (with an .exi extension).
nexiconv.exe <file with an .xml extension>
Run the following command to convert a binary XML file into an XML file.
nexiconv.exe <file with an .exi extension>
This program uses xmllite.dll. Normally this file is pre-installed on your workstation.

Development Roadmap

The features of XML have a very broad range of uses. You can decide later which of these features to implement. The following examples provide some ideas.
  • Support XML schemas. Using the information in an XML schema enables you to decrease the binary XML size even further. This could be done in a way that meets the W3C EXI specifications.
  • Implement a full-fledged XML serializer that supports pointers and inheritance.
  • Support XML-RPC. Using binary XML enables you to support highly efficient XML-RPC, which in turn makes it practical to invoke features of different processes, applications, devices, and servers through the same interface.
  • It is well suited for representing unstructured data such as rich text and screen layouts.
  • It is well suited for use as a data format shared between applications, for the following reasons.
    • It is easy to create extensions to the format, and also easy to create programs that can support extensions to the format.
    • There is no need to share the programs for reading and writing data (headers for data structures).
    • Being XML, the specifications of the format are easy to write and easy to understand.

About the W3C EXI (Efficient XML Interchange)

XML is a powerful data representation format, but it has a drawback: its verbosity leads to bloated data sizes and can result in processing overhead. Consequently, its adoption has been delayed in environments with little tolerance for overhead.
The following workarounds have been used to address the problem.
  • Compress the XML using gzip and exchange the compressed version.
  • Use various independent binary XML representations.
    • WBXML (Wireless Binary XML). Supported by the WAP Forum.
    • AMF3 (Action Message Format 3). By Adobe.
    • Fast Infoset (Java)
    • Support for various database products (Oracle, Microsoft)
However, there was no standard binary XML format.
Against this background, a binary XML format called EXI (Efficient XML Interchange Format 1.0) officially became a W3C Recommendation in March 2011. The W3C (World Wide Web Consortium) is a standards organization for web technology. EXI is reportedly pronounced "ek'si."
The format's specification is hosted at http://www.w3.org/TR/exi/.
Unfortunately, a Japanese translation does not appear to exist at the moment. A Japanese-language introductory article can be found at http://www.publickey1.jp/blog/11/xmlexiw3c_1.html.
The following example applications of W3C EXI are listed at http://www.w3.org/2011/03/exi-pr.html.en.
  • Communication between networks of "smart grid" meters and electric vehicles.
  • Acceleration of data transactions in financial trading systems.
  • Defense applications.

About Text-Based XML Parsers

We have replaced the front end of the binary XML parser to implement a text-based XML parser. This approach allows the exi library to support both binary XML parsers and text-based XML parsers without increasing the code size much.
The text-based XML parser is a non-validating XML parser that is largely compliant with XML 1.0. The following list shows the known points on which it does not currently comply with the XML 1.0 standard.
  • Reading of UTF-16-encoded data is not supported. It only works with UTF-8.
  • Encoding declarations within XML declarations are ignored.
  • Document type definitions (DTDs) are skipped when reading, but error checking within the DTD is sometimes omitted.
The results of parsing the test data that can be downloaded from http://www.w3.org/XML/Test/ can be viewed through the following link.
A value of 0 indicates successful parsing (no syntax errors were reported). A value of 1 indicates a parsing failure (a syntax error was reported).

Typedef Documentation

typedef wchar_t nn::nlib::exi::ExiChar

A string-type typedef used internally by the XML parser.

Description
When ExiChar is a typedef of char, strings are encoded in UTF-8. When it is a typedef of wchar_t, strings are encoded in either UTF-16 or UTF-32.
Examples:
exi/script/script.cpp, exi/serializer/serializer.cpp, exi/simple1/simple1.cpp, and exi/simple2/simple2.cpp.

Definition at line 23 of file Types.h.

Enumeration Type Documentation

enum nn::nlib::exi::Alignment

Specifies the alignment of the binary XML to read or write.

Description
For more information, see http://www.w3.org/TR/exi/#options.
Enumerator
ALIGNMENT_BIT_PACKED 

Reads from and writes to a bit-packed EXI stream (default).

ALIGNMENT_BYTE_ALIGNMENT 

Reads from and writes to a byte-aligned EXI stream.

Definition at line 32 of file Types.h.
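The size trade-off between the two alignments can be sketched with a small bit writer. This is a simplified illustration that handles only 3-bit fields and assumes a byte-aligned stream stores each such field in one whole byte. The W3C EXI specification defines the actual encoding, and BitWriter and Encode3BitFields are hypothetical names, not part of the exi library.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Minimal bit writer (illustrative; not the library's implementation).
class BitWriter {
 public:
  // Appends the low `nbits` bits of `value`, most significant bit first.
  void WriteBits(std::uint32_t value, unsigned nbits) {
    for (unsigned i = nbits; i-- > 0;) {
      if (bit_pos_ == 0) bytes_.push_back(0);  // start a new output byte
      if ((value >> i) & 1u) {
        bytes_.back() |= static_cast<std::uint8_t>(0x80u >> bit_pos_);
      }
      bit_pos_ = (bit_pos_ + 1) % 8;
    }
  }
  const std::vector<std::uint8_t>& bytes() const { return bytes_; }

 private:
  std::vector<std::uint8_t> bytes_;
  unsigned bit_pos_ = 0;  // next free bit within the last byte (0 = MSB)
};

// Encodes each value as a 3-bit field. Bit-packed output places the fields
// back to back; byte-aligned output spends one whole byte per field.
std::vector<std::uint8_t> Encode3BitFields(
    const std::vector<std::uint32_t>& values, bool byte_aligned) {
  BitWriter writer;
  for (std::uint32_t v : values) {
    writer.WriteBits(v & 7u, byte_aligned ? 8u : 3u);
  }
  return writer.bytes();
}
```

Five 3-bit fields occupy 15 bits (two bytes) when bit-packed and five bytes when byte-aligned. Byte alignment gives up some compactness in exchange for data that is easier to inspect and to access on byte boundaries.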

enum nn::nlib::exi::XmlProcessor

Option that specifies which XML processor to use.

Enumerator
XML_PROCESSOR_EXI 

Use binary XML.

XML_PROCESSOR_TEXT 

Use text-based XML.

Definition at line 49 of file Types.h.

Function Documentation

bool nn::nlib::exi::TransformXml ( XmlStreamReader * r,
XmlStreamWriter * w 
)
noexcept

Reads from XmlStreamReader and writes to XmlStreamWriter.

Parameters
[in] r The stream from which the XML is read.
[in] w The stream to which the XML is written.
Returns
Returns true when successful.
Description
Use this function to read and convert between binary XML and text XML.
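The general shape of such a transform (pull each event from a reader, push it to a writer, report success as a bool) can be sketched with stand-in types. Everything below is hypothetical and written for this example: ToyReader, ToyTextWriter, and ToyTransform do not reproduce the XmlStreamReader, XmlStreamWriter, or TransformXml implementations, and the null-pointer and balance checks are assumptions.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Stand-in types for illustration; NOT the exi library's classes.
enum class ToyEvent { kStart, kText, kEnd };

struct ToyNode {
  ToyEvent type;
  std::string value;  // element name, or text content for kText
};

struct ToyReader {
  std::vector<ToyNode> nodes;
  std::size_t next = 0;
  // Copies the next event into *out. Returns false at the end of input.
  bool Read(ToyNode* out) {
    if (next >= nodes.size()) return false;
    *out = nodes[next++];
    return true;
  }
};

struct ToyTextWriter {
  std::string out;
  int depth = 0;
  // Writes one event as text (no escaping). Returns false when an end tag
  // arrives with no element open.
  bool Write(const ToyNode& n) {
    switch (n.type) {
      case ToyEvent::kStart:
        out += "<" + n.value + ">";
        ++depth;
        return true;
      case ToyEvent::kText:
        out += n.value;
        return true;
      case ToyEvent::kEnd:
        if (depth == 0) return false;
        out += "</" + n.value + ">";
        --depth;
        return true;
    }
    return false;
  }
};

// Forwards every event from r to w. Returns true only if all events were
// written and every opened element was closed.
bool ToyTransform(ToyReader* r, ToyTextWriter* w) {
  if (r == nullptr || w == nullptr) return false;
  ToyNode n{ToyEvent::kText, ""};
  while (r->Read(&n)) {
    if (!w->Write(n)) return false;
  }
  return w->depth == 0;
}
```

Because the reader and writer are independent, pairing a binary reader with a text writer (or the reverse) converts between the two formats without an intermediate document tree.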