1Hyena/nt4c

NT4C Readme

NT4C stands for "NestedText for C" and that is exactly what this project is about.

What is NestedText

In short, NestedText is a file format for holding structured data.

The following resources can explain more if you are unfamiliar with it:

What is NT4C

NT4C is a NestedText parser implementation written in accordance with the C23 standard of the C programming language. It includes the following features:

  • Compliance: NT4C aims to comply with the latest version of the NestedText specification. However, it is currently only compliant with the Minimal NestedText specification.

  • Performance: NT4C is fast because it performs no heap memory allocations. It also avoids unnecessary memory copying: the nodes of the resulting graph reference the input text directly.

  • Compactness: The NT4C parser is implemented in a single header file with no dependencies other than the standard C library.

  • Embedding: The NT4C parser is easily reusable in other projects with a simple API that includes a few key functions, primarily nt_parse().

  • Callbacks: NT4C parses the entire document and calls a callback function provided by the application to inform it about each NestedText unit.

  • Tree model: If sufficient memory is provided to the NT4C parser, it constructs a graph where each node directly references a segment from the input text.

  • Portability: NT4C builds and functions on Linux. It should be relatively simple to make it run on most other platforms as long as the platform provides the C standard library.

  • Encoding: NT4C expects UTF-8 encoding of the input text and does not attempt to detect Unicode encoding errors.

  • Permissive license: NT4C is available under the MIT license.
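The zero-copy design mentioned under Performance and Tree model (nodes point into the input text instead of holding copies) can be illustrated with a small self-contained sketch. The `text_slice` type and its helpers are hypothetical and not part of nt4c.h; they only show the general idea of a pointer-plus-size view, which is also why the examples below print node text with printf's `%.*s` conversion.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical slice type (not the real NT_NODE): a view into the caller's
   input buffer. Nothing is copied, so a slice stays valid only as long as
   the input text it points into. */
typedef struct {
    const char *data; /* points into the original input text */
    size_t size;      /* number of bytes in the segment */
} text_slice;

/* Creates a view of input[offset .. offset + size) without copying. */
static text_slice make_slice(const char *input, size_t offset, size_t size) {
    text_slice s = { input + offset, size };
    return s;
}

/* Compares a slice against a null-terminated string. */
static bool slice_equals(text_slice s, const char *str) {
    return strlen(str) == s.size && memcmp(s.data, str, s.size) == 0;
}
```

Because a slice is just a pointer and a length, the application must keep the input buffer alive for as long as it uses the parsed graph.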

Using NT4C

Parsing NestedText

To parse NestedText documents, include the nt4c.h header file directly in your codebase. The entire parser is implemented in that single C header file for easy integration.

The main functions to use are nt_parse() and nt_parser_parse(). The former is a convenience function for simple callback-based parsing, whereas the latter takes a pointer to an NT_PARSER structure as its first argument and is meant for customized parsing.

The NT_PARSER structure stores parsing configuration and the parsing process state. By default, it can handle up to NT_PARSER_NCOUNT nodes in its internal memory. However, you can use the nt_parser_set_memory function to work with a custom array of NT_NODE structures.

When you call nt_parser_parse(), the parser populates the document graph with nodes. It keeps processing the input even after the output buffer has reached its capacity, so nodes that no longer fit are still counted.

After a successful parsing operation, both nt_parse() and nt_parser_parse() return the number of nodes in the input text. This number tells you how much memory is required to store the full graph of the document. If parsing fails, the functions return a negative value.

The graph of the document is fully stored only when the value returned by nt_parser_parse() is non-negative and does not exceed the capacity of the output buffer.
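The "store what fits, count everything" behaviour and the completeness rule above can be sketched without the library. The names `bounded_store`, `bounded_store_push` and `graph_fully_stored` are hypothetical; the sketch assumes only what the text states: nodes are written while there is room, counting continues past the capacity, and a negative result signals an error.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for the parser's output buffer. */
typedef struct {
    int *items;      /* caller-provided storage */
    size_t capacity; /* number of slots in items */
    size_t count;    /* nodes seen so far; may exceed capacity */
} bounded_store;

/* Stores the value only if there is room, but always counts it. */
static void bounded_store_push(bounded_store *s, int value) {
    if (s->count < s->capacity) {
        s->items[s->count] = value;
    }
    s->count++;
}

/* The graph is complete only if parsing succeeded (result >= 0) and every
   counted node fit into the buffer (result <= capacity). */
static bool graph_fully_stored(int result, size_t capacity) {
    return result >= 0 && (size_t) result <= capacity;
}
```

When the check fails with a non-negative result, that result is the buffer size a second parsing pass would need, which is the idea behind the ex_echo example.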

Examples

ex_hello

The ex_hello example demonstrates a minimal use of the NT4C parser: it parses the text "hello world" and prints the text segments reported to its callback on the screen.

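#include <stdio.h>
#include <stdlib.h>
#include "nt4c.h"

// Prints each text segment reported by the parser on its own line.
static int callback(NT_TYPE, const char *text, size_t size, void *, size_t) {
    printf("%.*s\n", (int) size, text);
    return -1;
}

int main(int, char **) {
    nt_parse("hello world", 0, callback, nullptr);
    return EXIT_SUCCESS;
}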


ex_callback

The ex_callback example demonstrates how to make the NT4C parser call a user-specified function each time it parses the next logical portion of the input document.

#include <stdio.h>
#include <stdlib.h>
#include "nt4c.h"

// read_file_to_memory() and free_and_return() are helpers from the example
// sources; they are not part of the nt4c.h API.

static int callback(NT_TYPE t, const char *str, size_t size, void *, size_t d) {
    // Indent the output according to the nesting depth of the node.
    for (size_t i=1; i<d; ++i) {
        printf("%s", " ");
    }
    t = nt_type_type(t); // Let's only print the type codes of node groups.
    if (t == NT_NEWLINE) {
        printf("%s (depth %zu)\n", nt_type_code(t), d);
    }
    else {
        printf(
            "%s => [%.*s] (depth %zu)\n", nt_type_code(t), (int) size, str, d
        );
    }
    return 0;
}

int main(int, char **) {
    size_t input_size;
    char *input_data = read_file_to_memory("../minimal.nt", &input_size);
    if (!input_data) {
        fprintf(stderr, "%s\n", "failed to read the input file");
        return EXIT_FAILURE;
    }
    nt_parse(input_data, input_size, callback, nullptr);
    return free_and_return(input_data, EXIT_SUCCESS);
}
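The callback-plus-userdata pattern that nt_parse() uses can be sketched independently of the library. In this hypothetical sketch, `for_each_line` plays the role of the parser: it splits the input into lines and invokes a callback for each one, passing the userdata pointer through untouched. The simplified callback signature is not NT_CALLBACK, and the rule that a negative return value stops the walk is an assumption made only for this sketch.

```c
#include <stddef.h>
#include <string.h>

/* Simplified, hypothetical callback: receives one segment and the userdata
   pointer. In this sketch, returning a negative value stops the walk. */
typedef int (*line_callback)(const char *text, size_t size, void *userdata);

/* Calls cb once per line of text (excluding the '\n').
   Returns the number of lines visited, or -1 if the callback aborted. */
static int for_each_line(const char *text, line_callback cb, void *userdata) {
    int visited = 0;
    while (*text) {
        const char *nl = strchr(text, '\n');
        size_t size = nl ? (size_t) (nl - text) : strlen(text);
        if (cb(text, size, userdata) < 0) {
            return -1;
        }
        ++visited;
        text += size;
        if (*text == '\n') {
            ++text; /* skip the line terminator */
        }
    }
    return visited;
}

/* Example callback: accumulates the total byte count through userdata. */
static int sum_sizes(const char *text, size_t size, void *userdata) {
    (void) text;
    *(size_t *) userdata += size;
    return 0;
}
```

The userdata pointer lets the callback keep state (here a running total) without global variables, which is the role the userdata argument of nt_parse() plays.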


ex_echo

This example demonstrates how to use the NT4C parser to parse a NestedText document and display it on the screen. The input document is parsed twice: the first pass determines the length of the document in nodes, and a variable-length array of that size is then set up to store the Document Object Model (DOM) during the second pass.

// node_count holds the node count obtained from the first parsing pass.
NT_NODE nodes[node_count];
NT_PARSER parser = nt_make_parser();
nt_parser_set_memory(&parser, nodes, sizeof(nodes)/sizeof(nodes[0]));
if (nt_parser_parse(&parser, input_data, input_size) > (int) node_count) {
    fprintf(stderr, "not enough memory for %zu nodes\n", parser.doc.length);
    return free_and_return(input_data, EXIT_FAILURE);
}
// Print the input segment referenced by every stored node, in order.
for (NT_NODE *it = parser.doc.begin; it < parser.doc.end; ++it) {
    printf("%.*s", (int) it->size, it->data);
}
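The two-pass strategy of ex_echo (count first, then allocate exactly that much storage and parse again) is a general pattern. In this self-contained sketch, the hypothetical `split_words` function stands in for the parser and follows the same contract described above: it stores as many results as fit and returns the total count. Unlike ex_echo, the sketch allocates with malloc so that a zero count cannot produce a zero-length VLA; nt4c itself performs no heap allocation, so this is purely an application-side choice.

```c
#include <ctype.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical word span: offset and length into the input text. */
typedef struct { size_t offset, size; } span;

/* Records up to capacity word spans into out (out may be NULL when
   capacity is 0) and returns the total number of words in text. */
static size_t split_words(const char *text, span *out, size_t capacity) {
    size_t count = 0, i = 0;
    while (text[i]) {
        while (text[i] && isspace((unsigned char) text[i])) ++i;
        if (!text[i]) break;
        size_t start = i;
        while (text[i] && !isspace((unsigned char) text[i])) ++i;
        if (count < capacity) {
            out[count].offset = start;
            out[count].size = i - start;
        }
        ++count;
    }
    return count;
}

/* Pass 1 counts, pass 2 fills a buffer of exactly that size.
   Returns the spans (the caller frees them) and stores the count in *n. */
static span *split_words_alloc(const char *text, size_t *n) {
    *n = split_words(text, NULL, 0);
    span *spans = malloc((*n ? *n : 1) * sizeof *spans);
    if (spans) {
        split_words(text, spans, *n);
    }
    return spans;
}
```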


ex_pretty

This example shows how to use the NT4C parser to pretty-print a NestedText document. It reformats the input text and adds syntax highlighting.

NT_PARSER parser = nt_make_parser();
// Ignore indentation and line breaks; the output is reformatted anyway.
nt_parser_set_blacklist(&parser, NT_SPACE|NT_NEWLINE);
if (nt_parser_parse(&parser, input_data, 0) > (int) parser.mem.capacity) {
    fprintf(stderr, "not enough memory for %zu nodes\n", parser.doc.length);
    return free_and_return(input_data, EXIT_FAILURE);
}
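nt_parser_set_blacklist() takes a bitwise OR of NT_TYPE flags, which is why ex_pretty can pass NT_SPACE|NT_NEWLINE. This README does not spell out how the whitelist and blacklist interact, so the sketch below assumes that a node is kept when its type matches the whitelist (a whitelist of 0 meaning "accept everything") and does not match the blacklist. The flag values are copied from the NT_TYPE enum; the DEMO_ names and `type_accepted` are hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>

/* Flag values copied from the NT_TYPE enum in nt4c.h. */
#define DEMO_STR_ROL (UINT32_C(1) << 21)
#define DEMO_NEWLINE (UINT32_C(1) << 24)
#define DEMO_SPACE   (UINT32_C(1) << 25)

/* Assumed filtering rule: a whitelist of 0 accepts all types; otherwise the
   type must match it. Any type matching the blacklist is rejected. */
static bool type_accepted(uint32_t type, uint32_t whitelist, uint32_t blacklist) {
    if (whitelist != 0 && (type & whitelist) == 0) {
        return false;
    }
    return (type & blacklist) == 0;
}
```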

Here is the input NestedText document before pretty-printing, nt4c/examples/ugly.nt (lines 1 to 29 at commit 490be86):

this :
is : an ugly 𝓪𝓼𝓼
# NestedText
#document
#and we are going
to parse it :
- for the purpose
# of
making :
it :
appear :
- more
- easily
- readable
by :
> indenting
> the document
> properly
and :
we:
also :
added :
-color
# ... to
highlight :
- syntax
#
# errors

[Screenshot: the same document after pretty-printing]

ex_tree

This example shows how to use the NT4C parser to print the structure of a NestedText document on the screen.

constexpr size_t node_count = 200;
NT_NODE nodes[node_count];
NT_PARSER parser = nt_make_parser();
nt_parser_set_memory(&parser, nodes, node_count);
int result = nt_parser_parse(&parser, input_data, input_size);
if (result > (int) node_count) {
    fprintf(stderr, "not enough memory for %zu nodes\n", parser.doc.length);
    return free_and_return(input_data, EXIT_FAILURE);
}
// print_tree() is a helper from the example source, not part of nt4c.h.
print_tree(parser.doc.root, 0);
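ex_tree delegates the printing to a print_tree() helper whose source is not shown here, and this README does not document the layout of NT_NODE beyond the data and size fields used in ex_echo. As a hedged illustration of the general technique only, the sketch below walks a hypothetical first-child/next-sibling tree depth-first and indents each node by its depth. It writes into a caller-supplied buffer so that the result can be inspected; `tree_node` and `print_tree_to` are not part of nt4c.h.

```c
#include <stddef.h>
#include <stdio.h>

/* Hypothetical node layout (NOT the real NT_NODE): a label plus
   first-child and next-sibling links. */
typedef struct tree_node {
    const char *label;
    const struct tree_node *child; /* first child, or NULL */
    const struct tree_node *next;  /* next sibling, or NULL */
} tree_node;

/* Writes one line per node into buf (capacity cap), starting at offset pos,
   indenting two spaces per depth level. Returns the offset after the last
   line (snprintf-style: it keeps counting if buf is too small). */
static size_t print_tree_to(const tree_node *node, size_t depth,
                            char *buf, size_t cap, size_t pos) {
    for (; node; node = node->next) {
        int n = snprintf(pos < cap ? buf + pos : NULL,
                         pos < cap ? cap - pos : 0,
                         "%*s%s\n", (int) (depth * 2), "", node->label);
        if (n > 0) {
            pos += (size_t) n;
        }
        pos = print_tree_to(node->child, depth + 1, buf, cap, pos);
    }
    return pos;
}
```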

[Screenshot: the structure of the parsed NestedText document]

API

From nt4c/nt4c.h, lines 48 to 88 at commit 490be86:

typedef enum : uint32_t {
NT_NONE = 0,
////////////////////////////////////////////////////////////////////////////
NT_TOP_DCT = 1 << 0, // root node contains a dictionary
NT_TOP_LST = 1 << 1, // root node contains a list
NT_TOP_MLS = 1 << 2, // root node contains a multiline string
NT_TOP_NIL = 1 << 3, // root node does not hold any meaningful data
NT_KEY_ROL = 1 << 4, // name of the key for a rest-of-line string
NT_KEY_MLS = 1 << 5, // name of the key for a multiline string
NT_KEY_LST = 1 << 6, // name of the key for the following list
NT_KEY_DCT = 1 << 7, // name of the key for the following dictionary
NT_KEY_NIL = 1 << 8, // name of the key for missing value
NT_SET_ROL = 1 << 9, // node references a rest-of-line assignment
NT_SET_MLS = 1 << 10, // node references a multiline assignment
NT_SET_LST = 1 << 11, // node references a list assignment
NT_SET_DCT = 1 << 12, // node references a dictionary assignment
NT_SET_NIL = 1 << 13, // node references a nil assignment
NT_TAG_MLS = 1 << 14, // node references the tag of a multiline string
NT_TAG_COM = 1 << 15, // node references the tag of a comment line
NT_TAG_LST_ROL = 1 << 16, // tag of the enlisted rest-of-line string
NT_TAG_LST_MLS = 1 << 17, // tag of the enlisted multiline string
NT_TAG_LST_LST = 1 << 18, // tag of the enlisted sublist
NT_TAG_LST_DCT = 1 << 19, // tag of the enlisted dictionary
NT_TAG_LST_NIL = 1 << 20, // tag of the enlisted nil value
NT_STR_ROL = 1 << 21, // node references a rest-of-line string
NT_STR_MLN = 1 << 22, // node references a multiline string
NT_STR_COM = 1 << 23, // node references a comment string
NT_NEWLINE = 1 << 24, // node references the new line data
NT_SPACE = 1 << 25, // node references the (indentation) spaces
NT_INVALID = 1 << 26, // node references a segment of invalid input
NT_DEEP = 1 << 27, // node that exceeds the maximum nesting depth
NT_TOP = NT_TOP_NIL|NT_TOP_LST|NT_TOP_MLS|NT_TOP_DCT,
NT_TAG_LST = (
NT_TAG_LST_ROL|NT_TAG_LST_MLS|NT_TAG_LST_LST|
NT_TAG_LST_DCT|NT_TAG_LST_NIL
),
NT_KEY = NT_KEY_ROL|NT_KEY_MLS|NT_KEY_LST|NT_KEY_DCT|NT_KEY_NIL,
NT_SET = NT_SET_MLS|NT_SET_DCT|NT_SET_LST|NT_SET_ROL|NT_SET_NIL,
NT_STR = NT_STR_MLN|NT_STR_ROL
} NT_TYPE;
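The composite members at the end of NT_TYPE (NT_TOP, NT_TAG_LST, NT_KEY, NT_SET and NT_STR) are bitwise ORs of individual flags, so checking whether a node type belongs to a group takes a single AND. The self-contained sketch below copies a few flag values from the enum above under a hypothetical DEMO_ prefix to avoid clashing with nt4c.h; the library's own nt_type_type() may be implemented differently.

```c
#include <stdbool.h>
#include <stdint.h>

/* A subset of NT_TYPE values, copied from the enum above. */
#define DEMO_KEY_ROL (UINT32_C(1) << 4)
#define DEMO_KEY_MLS (UINT32_C(1) << 5)
#define DEMO_KEY_LST (UINT32_C(1) << 6)
#define DEMO_KEY_DCT (UINT32_C(1) << 7)
#define DEMO_KEY_NIL (UINT32_C(1) << 8)
#define DEMO_STR_ROL (UINT32_C(1) << 21)
#define DEMO_STR_MLN (UINT32_C(1) << 22)

/* Group masks built the same way as NT_KEY and NT_STR. */
#define DEMO_KEY (DEMO_KEY_ROL|DEMO_KEY_MLS|DEMO_KEY_LST|DEMO_KEY_DCT|DEMO_KEY_NIL)
#define DEMO_STR (DEMO_STR_MLN|DEMO_STR_ROL)

/* True if the single-flag type t belongs to the given group mask. */
static bool in_group(uint32_t t, uint32_t group) {
    return (t & group) != 0;
}
```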

To set the size of the memory buffer integrated into the NT_PARSER structure, define the NT_PARSER_NCOUNT macro before including the nt4c.h header. The integrated memory makes the API more convenient to use when the input document is always known to be small (see ex_pretty).

From nt4c/nt4c.h, lines 41 to 43 at commit 490be86:

#ifndef NT_PARSER_NCOUNT
#define NT_PARSER_NCOUNT 1
#endif
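Because of the #ifndef guard, a definition made before the header is included takes precedence over the default of 1. The sketch below reproduces the same pattern with a hypothetical DEMO_NCOUNT macro and a hypothetical demo_parser struct so that it compiles on its own; with nt4c, the equivalent is writing #define NT_PARSER_NCOUNT 64 immediately before #include "nt4c.h".

```c
#include <stddef.h>

/* Application side: choose the capacity before the "header" part. */
#define DEMO_NCOUNT 64

/* "Header" side: the same guard that nt4c.h uses for NT_PARSER_NCOUNT. */
#ifndef DEMO_NCOUNT
#define DEMO_NCOUNT 1
#endif

/* Hypothetical parser struct with integrated node storage. */
typedef struct {
    int nodes[DEMO_NCOUNT]; /* stand-in for the built-in NT_NODE array */
    size_t capacity;
} demo_parser;

/* Returns a parser whose capacity matches the integrated array. */
static demo_parser demo_make_parser(void) {
    demo_parser p = { .capacity = DEMO_NCOUNT };
    return p;
}
```

Note that the integrated array is part of the struct itself, so a large value increases the size of every NT_PARSER object, including ones created on the stack.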

Initialization

nt_make_parser

From nt4c/nt4c.h, lines 114 to 116 at commit 490be86:

static NT_PARSER nt_make_parser(
// Returns a parser structure in its default state.
);

Examples: ex_echo

nt_parser_init

From nt4c/nt4c.h, lines 118 to 122 at commit 490be86:

static void nt_parser_init(
NT_PARSER * parser
// Resets the provided parser to its default state.
);

Parsing

nt_parse

From nt4c/nt4c.h, lines 99 to 112 at commit 490be86:

static int nt_parse(
const char * text,
size_t text_size,
NT_CALLBACK on_text,
void * userdata
// Parses the given text as a NestedText document of the given text size.
// If the callback function pointer argument is not a null pointer, then the
// callback function is called on each text segment of the input text and
// the userdata argument is simply passed on to the callback function.
//
// Returns the number of nodes in the input text or a negative value on
// error.
);

Examples: ex_hello

nt_parser_parse

From nt4c/nt4c.h, lines 124 to 134 at commit 490be86:

static int nt_parser_parse(
NT_PARSER * parser,
const char * text,
size_t text_size
// Parses the given text as a NestedText document of the given text size
// using the parsing configuration stored in the provided parser structure.
//
// Returns the number of nodes in the input text or a negative value on
// error.
);

Examples: ex_echo

Configuration

nt_parser_set_memory

From nt4c/nt4c.h, lines 136 to 143 at commit 490be86:

static void nt_parser_set_memory(
NT_PARSER * parser,
NT_NODE * nodes,
size_t node_count
// Sets the memory region used for the storage of the document object model
// by the given parser.
);

Examples: ex_echo

nt_parser_set_recursion

From nt4c/nt4c.h, lines 145 to 150 at commit 490be86:

static void nt_parser_set_recursion(
NT_PARSER * parser,
size_t depth
// Sets the maximum allowed recursion depth of the given parser.
);

nt_parser_set_blacklist

From nt4c/nt4c.h, lines 152 to 158 at commit 490be86:

static void nt_parser_set_blacklist(
NT_PARSER * parser,
NT_TYPE blacklist
// Sets the node types to be ignored by the given parser when parsing
// NestedText documents.
);

Examples: ex_pretty

nt_parser_set_whitelist

From nt4c/nt4c.h, lines 160 to 166 at commit 490be86:

static void nt_parser_set_whitelist(
NT_PARSER * parser,
NT_TYPE whitelist
// Sets the node types accepted by the given parser when parsing NestedText
// documents.
);

nt_parser_set_userdata

From nt4c/nt4c.h, lines 168 to 174 at commit 490be86:

static void nt_parser_set_userdata(
NT_PARSER * parser,
void * userdata
// Sets the userdata pointer to be passed on to the callback function by the
// given parser when parsing NestedText documents.
);

nt_parser_set_callback

From nt4c/nt4c.h, lines 176 to 182 at commit 490be86:

static void nt_parser_set_callback(
NT_PARSER * parser,
NT_CALLBACK callback
// Sets the callback function to be called by the given parser when parsing
// NestedText documents.
);

Miscellaneous

nt_type_code

From nt4c/nt4c.h, lines 184 to 189 at commit 490be86:

static const char * nt_type_code(
NT_TYPE node_type
// Returns the null-terminated string representation of the given node type
// enumeration.
);

Examples: ex_callback

nt_type_type

From nt4c/nt4c.h, lines 191 to 195 at commit 490be86:

static NT_TYPE nt_type_type(
NT_TYPE node_type
// Returns the group type of the given node type or a group of node types.
);

Examples: ex_callback

License

NT4C has been authored by Erich Erstu and is released under the MIT license.
