nlib
nn::nlib::TextReader Class Reference

The class for reading text from streams.

#include "nn/nlib/TextReader.h"

Public Member Functions

bool Init (InputStream *stream) noexcept
 Initializes a text reader for the specified stream.
 
int Read () noexcept
 Reads one character from the stream and returns UTF-32 data.
 
int Peek () noexcept
 Returns one character from the start of the stream in UTF-32.
 
int SkipWs () noexcept
 Skips white-space characters (space, newline, tab, and return) in the stream and returns the number that were skipped.
 
bool ReadUntil (size_t *len, char *buf, size_t n, char delim) noexcept
 Reads as many as n bytes of UTF-8 characters until delim and stores them in buf.
 
template<size_t N>
bool ReadUntil (size_t *len, char(&buf)[N], char delim) noexcept
 Calls ReadUntil(len, buf, N, delim).
 
template<class T >
bool ReadUntil (size_t *len, char *buf, size_t n, T pred) noexcept
 Reads as many as n bytes of UTF-8 characters and stores them in buf.
 
template<class T , size_t N>
bool ReadUntil (size_t *len, char(&buf)[N], T pred) noexcept
 Calls ReadUntil(len, buf, N, pred).
 
size_t ReadDecimalString (char *buf, size_t n) noexcept
 Reads as many as n of the characters 0 through 9 and stores them in buf.
 
template<size_t N>
size_t ReadDecimalString (char(&buf)[N]) noexcept
 Calls ReadDecimalString(buf, N).
 
bool Proceed (const char *str, size_t n) noexcept
 Advances the stream by the amount of the text string str.
 
bool Proceed (char c) noexcept
 Advances the stream by the amount of the character specified by c.
 
bool ProceedEx (const char *str) noexcept
 Advances the stream by the amount of the text string str. There is no limit on the length of the text string, and the position of the stream might be changed even if it does not match.
 
int ReadAsUtf8 (char *b0, char *b1, char *b2, char *b3) noexcept
 Reads one code point from the stream and stores it as UTF-8 in b0, b1, b2, and b3.
 
int ReadAsUtf16 (nlib_utf16_t *upper, nlib_utf16_t *lower) noexcept
 Reads one code point from the stream and stores it as UTF-16 in upper and lower.
 
int PeekAsUtf16 (nlib_utf16_t *upper, nlib_utf16_t *lower) noexcept
 Stores the one code point from the start of the stream as UTF-16 in upper and lower.
 
bool Close () noexcept
 Closes the text reader.
 
void SetError (errno_t e) const noexcept
 Sets an error value.
 
errno_t GetErrorValue () const noexcept
 This function can get the cause of the error when reading has failed.
 
InputStream * GetStream () noexcept
 Gets the stream for the text reader to read.
 
int GetLine () const noexcept
 Gets the current line number.
 
int GetColumn () const noexcept
 Gets the current column.
 
 operator bool () const
 Returns true if no internal error has occurred.
 
Basic Member Functions
 TextReader () noexcept
 Instantiates the object with default parameters (default constructor).
 
virtual ~TextReader () noexcept
 Destructor. The stream is not closed.
 

Detailed Description

The class for reading text from streams.

Description
Reads a UTF-8 text string from a stream and gets one character at a time (UTF-32 or UTF-16).
Newline strings are processed as follows.
  • CRLF is passed as LF.
  • CR is passed as LF.
  • LF is passed as LF.
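The three newline rules above can be illustrated with a small standalone function. This is only a sketch of the mapping, not the TextReader implementation; NormalizeNewlines is a hypothetical name that does not exist in nlib, and it works on an in-memory string rather than a stream.

```cpp
#include <cstddef>
#include <string>

// Hypothetical helper (not part of nlib): maps CRLF, a lone CR, and LF to LF.
std::string NormalizeNewlines(const std::string& in) {
    std::string out;
    out.reserve(in.size());
    for (std::size_t i = 0; i < in.size(); ++i) {
        if (in[i] == '\r') {
            // CRLF collapses to a single LF; a lone CR also becomes LF.
            if (i + 1 < in.size() && in[i + 1] == '\n') ++i;
            out.push_back('\n');
        } else {
            // LF and every other byte pass through unchanged.
            out.push_back(in[i]);
        }
    }
    return out;
}
```

For example, NormalizeNewlines("a\r\nb\rc\nd") yields "a\nb\nc\nd".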
If overlong (non-shortest-form) UTF-8 is detected, an error is generated (EILSEQ). An error (EILSEQ) is also generated if UTF-8 corresponding to the surrogate range U+D800-U+DFFF is detected.
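The two EILSEQ conditions above correspond to standard UTF-8 validity checks. The following sketch shows one way such checks can be written; DecodeUtf8Strict is a hypothetical helper, not an nlib function, and it returns 0 where TextReader would report EILSEQ.

```cpp
#include <cstddef>
#include <cstdint>

// Hypothetical helper (not part of nlib). Decodes one UTF-8 sequence from
// p[0..n) into *cp and returns its length in bytes. Returns 0 if the sequence
// is truncated, malformed, overlong, a surrogate, or above U+10FFFF.
std::size_t DecodeUtf8Strict(const unsigned char* p, std::size_t n,
                             std::uint32_t* cp) {
    if (n == 0) return 0;
    if (p[0] < 0x80) { *cp = p[0]; return 1; }
    std::size_t len;
    std::uint32_t c;
    std::uint32_t min_cp;  // smallest code point that needs 'len' bytes
    if ((p[0] & 0xE0) == 0xC0)      { len = 2; c = p[0] & 0x1F; min_cp = 0x80; }
    else if ((p[0] & 0xF0) == 0xE0) { len = 3; c = p[0] & 0x0F; min_cp = 0x800; }
    else if ((p[0] & 0xF8) == 0xF0) { len = 4; c = p[0] & 0x07; min_cp = 0x10000; }
    else return 0;  // stray continuation byte or invalid lead byte
    if (n < len) return 0;
    for (std::size_t i = 1; i < len; ++i) {
        if ((p[i] & 0xC0) != 0x80) return 0;  // not a continuation byte
        c = (c << 6) | (p[i] & 0x3F);
    }
    if (c < min_cp) return 0;                  // overlong encoding
    if (c >= 0xD800 && c <= 0xDFFF) return 0;  // surrogate range
    if (c > 0x10FFFF) return 0;                // outside Unicode
    *cp = c;
    return len;
}
```

With this sketch, the bytes C0 80 (an overlong encoding of U+0000) and ED A0 80 (U+D800) are both rejected, while E3 81 82 decodes to U+3042.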
const char str[] = "multibyte \r\nstring";
MemoryInputStream istr;
istr.Init(str);
TextReader r;
if (!r.Init(&istr)) { ERROR; }
int c;
while ((c = r.Read()) != -1) {
    // c is a UTF-32 value, so it can be processed as a Unicode code point.
    // To convert a whole string rather than process one character at a time,
    // it is better to use a function such as unicode::Utf8ToUtf16.
    // L"multibyte \nstring" is read one character at a time, in order.
    // Newlines are normalized.
}
if (!r) { ERROR; }
if (!r.Close()) { ERROR; }
You can add your own checks on the UTF-8 text by deriving from TextReader and overriding the FillBuffer_ member function. The TextReader class itself checks that the UTF-8 is valid and processes newline codes.
The following code is a rough sketch of the implementation in the derived class.
void DerivedClass::FillBuffer_() NLIB_NOEXCEPT {
    TextReader::FillBuffer_();
    // The text string is now buffered in the range from GetCur to GetBufEnd.
    // Check it here, then either process it or generate an error.
    // To decrease the number of characters, set the new end with the SetBufEnd member function.
    // The number of characters cannot be increased.
}
Examples:
misc/readfile/readfile.cpp, and misc/writefile/writefile.cpp.

Definition at line 20 of file TextReader.h.

Member Function Documentation

bool nn::nlib::TextReader::Close ( )
noexcept

Closes the text reader.

Returns
Returns true if successful.
Description
Closes the text reader and detaches it from the base stream by clearing its reference to the stream. The base stream itself is not closed by this operation.

int nn::nlib::TextReader::GetColumn ( ) const
inline noexcept

Gets the current column.

Returns
The current column number, starting from 1.
Description
The function returns 0 and sets the error EBADF if the stream is not open.

Definition at line 152 of file TextReader.h.

errno_t nn::nlib::TextReader::GetErrorValue ( ) const
inline noexcept

This function can get the cause of the error when reading has failed.

Return values
0       No error occurred.
EINVAL  Invalid argument.
EEXIST  Initialized redundantly.
EBADF   No stream to read.
EIO     Failed to read from the stream for some reason.
EILSEQ  Invalid character found.

Definition at line 149 of file TextReader.h.

int nn::nlib::TextReader::GetLine ( ) const
inline noexcept

Gets the current line number.

Returns
The current line number, starting from 1.

Definition at line 151 of file TextReader.h.

InputStream * nn::nlib::TextReader::GetStream ( )
inline noexcept

Gets the stream for the text reader to read.

Returns
The pointer to the stream.

Definition at line 150 of file TextReader.h.

bool nn::nlib::TextReader::Init ( InputStream *  stream)
noexcept

Initializes a text reader for the specified stream.

Parameters
[in]  stream  A stream.
Returns
Returns true when successful.
Description
The function returns false if the text reader is already initialized or if stream is NULL. If the text is UTF-8 with a BOM, the function tries to read the BOM, and it also returns false if it fails to read the BOM.

int nn::nlib::TextReader::Peek ( )
inline noexcept

Returns one character from the start of the stream in UTF-32.

Returns
The character that was read (in UTF-32). The function returns -1 if the end of the stream was reached or there was an error.

Definition at line 45 of file TextReader.h.

int nn::nlib::TextReader::PeekAsUtf16 ( nlib_utf16_t *  upper,
nlib_utf16_t *  lower 
)
inline noexcept

Stores the one code point from the start of the stream as UTF-16 in upper and lower.

Parameters
[out]  upper  Stores a UTF-16 character, or the high surrogate.
[out]  lower  If there is a surrogate pair, stores the low surrogate.
Return values
1  Data has been stored in just upper.
2  Data has been stored in both upper and lower.
0  Indicates that the end of the stream was reached or there was an error.

Definition at line 141 of file TextReader.h.

bool nn::nlib::TextReader::Proceed ( const char *  str,
size_t  n 
)
noexcept

Advances the stream by the amount of the text string str.

Parameters
[in]  str  A pointer to a UTF-8 text string.
[in]  n    The length of the string, in bytes.
Return values
true   Indicates that the start of the stream matched str.
false  Returned in all other cases.
Description
If the start of the stream matches str, the reading of the stream is advanced by that amount. If it does not match, the stream remains at the current position.
The text string specified for str must be no longer than 200 characters and it must end with a UTF-8 delimiter not including a newline character. Behavior is undefined if str does not conform to these limitations.

bool nn::nlib::TextReader::Proceed ( char  c)
inline noexcept

Advances the stream by the amount of the character specified by c.

Parameters
[in]  c  The character to skip over.
Return values
true   Indicates that the start of the stream matched c.
false  Returned in all other cases.
Description
If the start of the stream matches c, the reading of the stream is advanced by that amount. If it does not match, the stream remains at the current position.
The character specified for c must be an ASCII character and not a newline character.

Definition at line 89 of file TextReader.h.
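The match-then-advance behavior described for both Proceed overloads can be sketched on an in-memory buffer. Cursor and ProceedSketch are hypothetical names used only for illustration; the real functions operate on the stream held by TextReader.

```cpp
#include <cstddef>
#include <cstring>

// Hypothetical cursor over an in-memory buffer, standing in for the stream.
struct Cursor {
    const char* cur;  // current read position
    const char* end;  // one past the last readable byte
};

// Advances past str (n bytes) only if the data at the cursor matches it.
// On a mismatch the cursor is left where it was.
bool ProceedSketch(Cursor* c, const char* str, std::size_t n) {
    if (static_cast<std::size_t>(c->end - c->cur) < n) return false;
    if (std::memcmp(c->cur, str, n) != 0) return false;
    c->cur += n;
    return true;
}

// Single-character form, expressed through the string form.
bool ProceedSketch(Cursor* c, char ch) {
    return ProceedSketch(c, &ch, 1);
}
```

A typical use is keyword matching in a hand-written parser: try one candidate after another, and rely on the cursor staying put after each failed attempt.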

bool nn::nlib::TextReader::ProceedEx ( const char *  str)
noexcept

Advances the stream by the amount of the text string str. There is no limit on the length of the text string, and the position of the stream might be changed even if it does not match.

Parameters
[in]  str  A pointer to a UTF-8 text string.
Return values
true   Indicates that the start of the stream matched str.
false  Returned in all other cases.
Description
The text string specified for str must end with a UTF-8 delimiter not including a newline character.

int nn::nlib::TextReader::Read ( )
inline noexcept

Reads one character from the stream and returns UTF-32 data.

Returns
The character that was read (in UTF-32). The function returns -1 if the end of the stream was reached or there was an error.

Definition at line 25 of file TextReader.h.

int nn::nlib::TextReader::ReadAsUtf16 ( nlib_utf16_t *  upper,
nlib_utf16_t *  lower 
)
inline noexcept

Reads one code point from the stream and stores it as UTF-16 in upper and lower.

Parameters
[out]  upper  Stores a UTF-16 character, or the high surrogate.
[out]  lower  If there is a surrogate pair, stores the low surrogate.
Return values
1  Data has been stored in just upper.
2  Data has been stored in both upper and lower.
0  Indicates that the end of the stream was reached or there was an error.

Definition at line 137 of file TextReader.h.
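The 1/2/0 return convention mirrors how a code point maps to UTF-16 code units. The sketch below shows that mapping in isolation; EncodeUtf16 is a hypothetical helper, and char16_t is used as an assumed stand-in for nlib_utf16_t.

```cpp
#include <cstdint>

// Hypothetical helper (not part of nlib). Writes cp as UTF-16 and returns the
// number of code units stored: 1 (only *upper), 2 (*upper and *lower), or
// 0 if cp is not a valid Unicode scalar value.
int EncodeUtf16(std::uint32_t cp, char16_t* upper, char16_t* lower) {
    if (cp > 0x10FFFF || (cp >= 0xD800 && cp <= 0xDFFF)) return 0;
    if (cp < 0x10000) {
        // Basic Multilingual Plane: a single code unit.
        *upper = static_cast<char16_t>(cp);
        return 1;
    }
    // Supplementary planes: split the 20-bit offset into a surrogate pair.
    cp -= 0x10000;
    *upper = static_cast<char16_t>(0xD800 + (cp >> 10));    // high surrogate
    *lower = static_cast<char16_t>(0xDC00 + (cp & 0x3FF));  // low surrogate
    return 2;
}
```

As with the documented return values, a caller should read lower only when the result is 2.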

int nn::nlib::TextReader::ReadAsUtf8 ( char *  b0,
char *  b1,
char *  b2,
char *  b3 
)
inline noexcept

Reads one code point from the stream and stores it as UTF-8 in b0, b1, b2, and b3.

Parameters
[out]  b0  Stores the first byte of a UTF-8 character.
[out]  b1  Stores the second byte of a UTF-8 character.
[out]  b2  Stores the third byte of a UTF-8 character.
[out]  b3  Stores the fourth byte of a UTF-8 character.
Return values
1  Data has been stored in just b0.
2  Data has been stored in b0 and b1.
3  Data has been stored in b0, b1 and b2.
4  Data has been stored in b0, b1, b2 and b3.
0  Indicates that the end of the stream was reached or there was an error.

Definition at line 101 of file TextReader.h.
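The return values 1 through 4 correspond to the byte length of the UTF-8 encoding of the code point. The following sketch shows the standard encoding with the same four-output-pointer shape; EncodeUtf8 is a hypothetical helper, not the nlib implementation.

```cpp
#include <cstdint>

// Hypothetical helper (not part of nlib). Stores cp as UTF-8 in *b0..*b3 and
// returns the number of bytes written (1 to 4), or 0 if cp is not a valid
// Unicode scalar value. Outputs beyond the returned count are left untouched.
int EncodeUtf8(std::uint32_t cp, char* b0, char* b1, char* b2, char* b3) {
    if (cp > 0x10FFFF || (cp >= 0xD800 && cp <= 0xDFFF)) return 0;
    if (cp < 0x80) {
        *b0 = static_cast<char>(cp);
        return 1;
    }
    if (cp < 0x800) {
        *b0 = static_cast<char>(0xC0 | (cp >> 6));
        *b1 = static_cast<char>(0x80 | (cp & 0x3F));
        return 2;
    }
    if (cp < 0x10000) {
        *b0 = static_cast<char>(0xE0 | (cp >> 12));
        *b1 = static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
        *b2 = static_cast<char>(0x80 | (cp & 0x3F));
        return 3;
    }
    *b0 = static_cast<char>(0xF0 | (cp >> 18));
    *b1 = static_cast<char>(0x80 | ((cp >> 12) & 0x3F));
    *b2 = static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
    *b3 = static_cast<char>(0x80 | (cp & 0x3F));
    return 4;
}
```

A caller can append the first k bytes to an output buffer, where k is the returned count.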

size_t nn::nlib::TextReader::ReadDecimalString ( char *  buf,
size_t  n 
)
noexcept

Reads as many as n of the characters 0 through 9 and stores them in buf.

Parameters
[out]  buf  The buffer to which the string is stored.
[in]   n    The size of the buffer.
Returns
The number of characters that were read.
Description
buf is not terminated with the null character.
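Because buf is not null-terminated, the caller has to use the returned count to delimit the digits. The sketch below imitates the documented behavior on a NUL-terminated in-memory string; ReadDecimalStringSketch is a hypothetical name, and reading from a `const char**` cursor instead of a stream is an assumption made for illustration.

```cpp
#include <cstddef>

// Hypothetical stand-in for ReadDecimalString. Copies up to n characters in
// the range '0'..'9' from *cur into buf, advances *cur past them, and returns
// how many were copied. buf is NOT null-terminated.
std::size_t ReadDecimalStringSketch(const char** cur, char* buf, std::size_t n) {
    std::size_t i = 0;
    while (i < n && **cur >= '0' && **cur <= '9') {
        buf[i++] = **cur;
        ++*cur;
    }
    return i;
}
```

A caller would typically build a bounded string from the result, for example `std::string(buf, count)`, rather than pass buf to a function that expects a terminator.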

bool nn::nlib::TextReader::ReadUntil ( size_t *  len,
char *  buf,
size_t  n,
char  delim 
)
noexcept

Reads as many as n bytes of UTF-8 characters until delim and stores them in buf.

Parameters
[out]  len    The number of bytes stored in buf.
[out]  buf    The buffer where the text string is stored.
[in]   n      The size of the buffer.
[in]   delim  The delimiter.
Returns
Returns true if a delimiter was found somewhere within the n bytes. If not, returns false.
Description
delim is not read, and buf is not terminated with the null character. The data is always read in terms of UTF-8 code points.

template<class T>
bool nn::nlib::TextReader::ReadUntil ( size_t *  len,
char *  buf,
size_t  n,
T  pred 
)
noexcept

Reads as many as n bytes of UTF-8 characters and stores them in buf.

Template Parameters
T  The type for function objects for making determinations.
Parameters
[out]  len   The number of bytes stored in buf.
[out]  buf   The buffer where the text string is stored.
[in]   n     The size of the buffer.
[in]   pred  A function object.
Returns
Returns true if a delimiter was found somewhere within the n bytes. If not, returns false.
Description
This function calls pred(const char* ptr) to determine whether a delimiter has been reached. ptr, the argument to pred, points to a UTF-8 character, and one code point of data can be accessed through it.
Write the function object like the following. It is called only at the beginning of each code point.
struct SearchE38182 {
    bool operator()(const char* ptr) {
        const unsigned char* p = reinterpret_cast<const unsigned char*>(ptr);
        return p[0] == 0xE3 && p[1] == 0x81 && p[2] == 0x82;
    }
};
buf is not terminated with the null character. The data is always read in terms of UTF-8 code points.

Definition at line 187 of file TextReader.h.
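To show how a predicate is consulted once per code point and how code points are never split across the buffer limit, here is a standalone sketch over a NUL-terminated UTF-8 string. ReadUntilSketch and Utf8SeqLen are hypothetical names, the input is assumed to be valid UTF-8, and the real member function reads from the stream rather than from a `const char**` cursor.

```cpp
#include <cstddef>
#include <cstring>

// Byte length implied by a UTF-8 lead byte (1 for ASCII or anything else).
inline std::size_t Utf8SeqLen(unsigned char b) {
    if (b >= 0xF0) return 4;
    if (b >= 0xE0) return 3;
    if (b >= 0xC0) return 2;
    return 1;
}

// Hypothetical stand-in for ReadUntil(len, buf, n, pred). Copies whole code
// points from *cur into buf until pred(ptr) is true at the start of a code
// point, the input ends, or the next code point would not fit in n bytes.
// The delimiter is not consumed and buf is not null-terminated.
template <class T>
bool ReadUntilSketch(const char** cur, std::size_t* len, char* buf,
                     std::size_t n, T pred) {
    const char* p = *cur;
    std::size_t used = 0;
    bool found = false;
    while (*p != '\0') {
        if (pred(p)) { found = true; break; }
        std::size_t cplen = Utf8SeqLen(static_cast<unsigned char>(*p));
        std::size_t k = 1;
        while (k < cplen && p[k] != '\0') ++k;  // never run past the terminator
        cplen = k;
        if (used + cplen > n) break;            // do not split a code point
        std::memcpy(buf + used, p, cplen);
        used += cplen;
        p += cplen;
    }
    *cur = p;
    *len = used;
    return found;
}
```

A function object such as SearchE38182 can be passed as pred unchanged; a lambda taking `const char*` works as well.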

void nn::nlib::TextReader::SetError ( errno_t  e) const
inline noexcept

Sets an error value.

Parameters
[in]  e  An error value.
Description
If an error value has not been set, the one specified by e is set.

Definition at line 146 of file TextReader.h.

int nn::nlib::TextReader::SkipWs ( )
inline noexcept

Skips white-space characters (space, newline, tab, and return) in the stream and returns the number that were skipped.

Returns
The number of skipped white-space characters.

Definition at line 55 of file TextReader.h.
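The skipping rule can be sketched on an in-memory string as follows. SkipWsSketch is a hypothetical name, and the sketch assumes the text is NUL-terminated and simply counts each skipped character; how the real function counts a CRLF pair is not stated here.

```cpp
// Hypothetical stand-in for SkipWs. Advances *cur past spaces, newlines,
// tabs, and carriage returns, and returns how many characters were skipped.
int SkipWsSketch(const char** cur) {
    int count = 0;
    while (**cur == ' ' || **cur == '\n' || **cur == '\t' || **cur == '\r') {
        ++*cur;
        ++count;
    }
    return count;
}
```

A return value of 0 means the cursor was already at a non-white-space character or at the end of the text.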

