nlib
nn::nlib::succinct::Trie Class Referencefinal

Implements Trie using LOUDS. More...

#include "nn/nlib/succinct/Trie.h"

Public Types

typedef bool(* MatchCallback) (const char *first, const char *last, uint32_t nodeid, void *user_obj)
 User-defined callback function. More...
 

Public Member Functions

void Match (const char *cstr, MatchCallback callback, void *user_obj) noexcept
 Inspects the string to detect the target string registered in Trie. More...
 
void Match (const char *cstr, MatchCallback callback) noexcept
 A parameter omitted version of the above function which passes nullptr as a parameter.
 
void Match (const void *data, size_t n, MatchCallback callback, void *user_obj) noexcept
 Inspects the data to detect the target pattern. More...
 
void Match (const void *data, size_t n, MatchCallback callback) noexcept
 A parameter omitted version of the above function which passes nullptr as a parameter.
 
void MatchBackward (const void *data, size_t n, MatchCallback callback, void *user_obj) noexcept
 Inspects the data backwards to detect the target pattern. More...
 
void MatchBackward (const void *data, size_t n, MatchCallback callback) noexcept
 A parameter omitted version of the above function which passes nullptr as a parameter.
 
errno_t GetCommonPrefixWords (const void *prefix, size_t n, ReallocCstringVec *vec) noexcept
 Lists the data with a common prefix in dictionary order. More...
 
std::pair< errno_t, Nlist< uint32_t > > GetCommonPrefixWords (const void *prefix, size_t n) noexcept
 Returns a list of nodeid of data that have common prefix. More...
 
size_t MemSize () const noexcept
 Returns the amount of memory explicitly allocated by the class. More...
 
Constructor, Destructor, and Initialization
constexpr Trie () noexcept
 Instantiates the object with default parameters (default constructor). Requires initialization with Init() after execution.
 
 ~Trie () noexcept
 Destructor.
 
Trieassign (Trie &rhs, move_tag)
 Corresponds to a move assignment operator.
 
 Trie (Trie &rhs, move_tag)
 Corresponds to a move constructor.
 
 Trie (Trie &&rhs)
 Instantiates the object (move constructor).
 
Trieoperator= (Trie &&rhs)
 Move assignment operator.
 
bool Init () noexcept
 Initializes an object. Returns true if successful.
 
void Reset () noexcept
 Resets this object to the state immediately after the default constructor was executed.
 
Importing and Exporting
bool Export (BinaryWriter *w) const noexcept
 Writes the object to the file. More...
 
bool Import (BinaryReader *r) noexcept
 Reads the written object. More...
 

Detailed Description

Implements Trie using LOUDS.

Description
Trie can be used with a compact data size. Used by creating using TrieBuilder, or importing it using the Import method.
In the following code, although strings in 26^4 pattern are registered with Trie from aaaa to zzzz, only about 770 KBytes is consumed. Note that if the strings are arranged as they are, about 1.8 MBytes will be consumed.
TrieBuilder builder;
builder.Init();
auto loop = [&](std::function<bool(const char*)> func) {
char buf[5];
buf[4] = '\0';
for (char ch0 = 'a'; ch0 <= 'z'; ++ch0) {
buf[0] = ch0;
for (char ch1 = 'a'; ch1 <= 'z'; ++ch1) {
buf[1] = ch1;
for (char ch2 = 'a'; ch2 <= 'z'; ++ch2) {
buf[2] = ch2;
for (char ch3 = 'a'; ch3 <= 'z'; ++ch3) {
buf[3] = ch3;
if (!func(buf)) return false;
}
}
}
}
return true;
};
bool result = loop([&](const char* str) { return builder.AddWord(str); });
SUCCEED_IF(result);
auto trie = builder.Build2();
nlib_printf("The size of the words([a-z][a-z][a-z][a-z]): %d bytes\n", 4 * 26 * 26 * 26 * 26);
nlib_printf("The size of the trie for them: %" PRIuS " bytes\n", trie->MemSize());
result = loop([&](const char* str) {
int val = 0;
trie->Match(str, [](const char* first, const char* last, uint32_t nodeid, void* user_obj) {
NLIB_UNUSED(nodeid);
NLIB_UNUSED(first);
NLIB_UNUSED(last);
*reinterpret_cast<int*>(user_obj) = 1;
// nlib_printf("%s\n", std::string(first, last).c_str());
return true;
},
&val);
return val == 1;
});
SUCCEED_IF(result);
/*
Output:
The size of the words([a-z][a-z][a-z][a-z]): 1827904 bytes
The size of the trie for them: 785007 bytes
aaaa
....
zzzz
*/

Definition at line 36 of file Trie.h.

Member Typedef Documentation

◆ MatchCallback

nn::nlib::succinct::Trie::MatchCallback

User-defined callback function.

Parameters
[in]firstPointer to the head of the target term (tail when a reverse search).
[in]lastThe end of the target term (immediately before the term when a reverse search).
[in]nodeidThe ID of the detected term (not a continuous value).
[in,out]user_objPointer to the user-specified object.
Returns
Continues analysis if true. Ends the process when false.
Description
Called when the string (pattern) is detected. Developers will write a process to handle each.

Definition at line 42 of file Trie.h.

Member Function Documentation

◆ Export()

nn::nlib::succinct::Trie::Export ( BinaryWriter w) const
noexcept

Writes the object to the file.

Parameters
[in]wWrite out object.
Returns
Returns true when successful.
Description
The written data can be read and restored using the Import function. The data is always written little endian.

◆ GetCommonPrefixWords() [1/2]

nn::nlib::succinct::Trie::GetCommonPrefixWords ( const void *  prefix,
size_t  n,
ReallocCstringVec vec 
)
noexcept

Lists the data with a common prefix in dictionary order.

Parameters
[in]prefixPointer to the prefix.
[in]nLength of the prefix.
[out]vecVector where the string that has prefix as a prefix is stored.
Returns
Returns 0 on success.

◆ GetCommonPrefixWords() [2/2]

nn::nlib::succinct::Trie::GetCommonPrefixWords ( const void *  prefix,
size_t  n 
)
noexcept

Returns a list of nodeid of data that have common prefix.

Parameters
[in]prefixPointer to the prefix.
[in]nLength of the prefix.
Returns
A pair of the error value and a list of nodeid.
Description
nodeid is an integer passed to the parameter of MatchCallback.
An application of creating and using suffix trie with the following code exists.
// text from https://en.wikipedia.org/wiki/Trie
const char* str = R"(In computer science, a trie, also called digital tree and sometimes radix tree or prefix tree (as they can be searched by prefixes), is a kind of search tree - an ordered tree data structure that is used to store a dynamic set or associative array where the keys are usually strings. Unlike a binary search tree, no node in the tree stores the key associated with that node; instead, its position in the tree defines the key with which it is associated. All the descendants of a node have a common prefix of the string associated with that node, and the root is associated with the empty string. Values are not necessarily associated with every node. Rather, values tend only to be associated with leaves, and with some inner nodes that correspond to keys of interest. For the space-optimized presentation of prefix tree, see compact prefix tree.)";
const size_t len = nlib_strlen(str);
TrieBuilder builder;
SUCCEED_IF(builder.Init());
// construct the suffix trie for 'str'
for (size_t i = 0; i < len - 1; ++i) {
SUCCEED_IF(builder.AddWord(str + i));
}
auto trie = builder.Build2();
// map nodeid -> pos in 'str'
std::map<uint32_t, size_t> nodeid_to_pos;
struct UserObj {
std::map<uint32_t, size_t>* nodeid_to_pos;
size_t i;
} uobj;
uobj.nodeid_to_pos = &nodeid_to_pos;
for (size_t i = 0; i < len - 1; ++i) {
uobj.i = i;
trie->Match(str + i, [](const char*, const char*, uint32_t nodeid, void* user_obj) {
UserObj* u = reinterpret_cast<UserObj*>(user_obj);
(*u->nodeid_to_pos)[nodeid] = u->i;
return true;
}, &uobj);
}
// list the occurences of the specified pattern
auto list_up = [&](const char* pat) {
nlib_printf("---- %s ----\n", pat);
size_t l = nlib_strlen(pat);
auto result = trie->GetCommonPrefixWords(pat, l);
if (result.first != 0) return;
for (auto& nodeid : result.second) {
size_t pos = nodeid_to_pos[nodeid];
size_t first = pos >= 10 ? pos - 10 : 0;
size_t last = pos + 10 + l < len ? pos + 10 + l : len;
nlib_printf("%s\n", std::string(str + first, str + last).c_str());
}
};
list_up("tree");
list_up("of");
/*
Output:
---- tree ----
or prefix tree (as they
of search tree - an orde
d digital tree and somet
n ordered tree data stru
on in the tree defines t
mes radix tree or prefix
de in the tree stores th
ry search tree, no node
of prefix tree, see comp
ct prefix tree.
---- of ----
scendants of a node ha
d to keys of interest.
sentation of prefix tr
is a kind of search tr
on prefix of the strin
*/

◆ Import()

nn::nlib::succinct::Trie::Import ( BinaryReader r)
noexcept

Reads the written object.

Parameters
[in]rRead out object.
Returns
Returns true when import is successful.

◆ Match() [1/2]

nn::nlib::succinct::Trie::Match ( const char *  cstr,
MatchCallback  callback,
void *  user_obj 
)
inlinenoexcept

Inspects the string to detect the target string registered in Trie.

Parameters
[in]cstrString that terminates at null.
[in]callbackCallback function called when the term is detected.
[in,out]user_objPointer to the user-specified object.

Definition at line 44 of file Trie.h.

◆ Match() [2/2]

nn::nlib::succinct::Trie::Match ( const void *  data,
size_t  n,
MatchCallback  callback,
void *  user_obj 
)
noexcept

Inspects the data to detect the target pattern.

Parameters
[in]dataPointer to the head of the data to be inspected.
[in]nData size.
[in]callbackCallback function called when the pattern is detected.
[in,out]user_objPointer to the user-specified object.

◆ MatchBackward()

nn::nlib::succinct::Trie::MatchBackward ( const void *  data,
size_t  n,
MatchCallback  callback,
void *  user_obj 
)
noexcept

Inspects the data backwards to detect the target pattern.

Parameters
[in]dataPointer to the tail of the data to be inspected.
[in]nData size.
[in]callbackCallback function called when the pattern is detected.
[in,out]user_objPointer to the user-specified object.

◆ MemSize()

nn::nlib::succinct::Trie::MemSize ( ) const
noexcept

Returns the amount of memory explicitly allocated by the class.

Returns
Returns the number of bytes.
Description
The actual amount of memory allocated by the system may be larger than the value returned by this function because of alignment and space for heap management.

The documentation for this class was generated from the following files: