A TypeScript parser combinator library for building type-safe parsers with declarative grammar definitions.
```
npm install @timkendrick/syntax
```

```typescript
import { syntax } from '@timkendrick/syntax';

// Define a simple arithmetic grammar
const parser = syntax(`
  NUMBER ::= /\\d+/
  PLUS ::= "+"
  MULTIPLY ::= "*"
  OPEN_PAREN ::= "("
  CLOSE_PAREN ::= ")"

  <Formula> ::= {
    expression: <expression>
  }

  <expression> ::= <Addition> | <term>

  <Addition> ::= {
    left: <term>,
    : PLUS,
    right: <expression>
  }

  <term> ::= <Multiplication> | <Number>

  <Multiplication> ::= {
    left: <Number>,
    : MULTIPLY,
    right: <term>
  }

  <Number> ::= {
    value: <- NUMBER
  }
`);

// Parse input
const result = parser.parse("2+3*4");
console.log(result);
```

The high-level DSL for defining grammars uses BNF-style grammar syntax as a starting point, and extends this with syntax for automatic construction of AST nodes and lists.
Define tokens using the `::=` operator:

```
TOKEN_NAME ::= "string_literal"  // Exact string match
TOKEN_NAME ::= /regex_pattern/   // Regular expression
```

Examples:

```
SEMICOLON ::= ";"
QUOTE ::= "\""
IDENTIFIER ::= /[a-zA-Z_][a-zA-Z0-9_]*/
NUMBER ::= /\d+(\.\d+)?/
WHITESPACE ::= /\s+/
```
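As an aside, here is a minimal standalone sketch (not the library's actual implementation) of how exact-string and regex token definitions can be matched at the current input position, using JavaScript's sticky (`y`) regex flag:

```typescript
// Standalone sketch: match one token definition at a given input position.
// String patterns are exact matches; regex patterns are anchored at the
// current position via the sticky flag (other regex flags are ignored here).
type TokenDef = { type: string; pattern: string | RegExp };

function nextToken(
  defs: TokenDef[],
  input: string,
  index: number,
): { type: string; text: string; end: number } | null {
  for (const { type, pattern } of defs) {
    if (typeof pattern === 'string') {
      // Exact string match at the current position
      if (input.startsWith(pattern, index)) {
        return { type, text: pattern, end: index + pattern.length };
      }
    } else {
      // Regex match anchored at the current position via the sticky flag
      const sticky = new RegExp(pattern.source, 'y');
      sticky.lastIndex = index;
      const match = sticky.exec(input);
      if (match !== null) {
        return { type, text: match[0], end: sticky.lastIndex };
      }
    }
  }
  return null; // No token matched at this position
}

const defs: TokenDef[] = [
  { type: 'NUMBER', pattern: /\d+(\.\d+)?/ },
  { type: 'PLUS', pattern: '+' },
];

console.log(nextToken(defs, '42+1', 0)); // { type: 'NUMBER', text: '42', end: 2 }
console.log(nextToken(defs, '42+1', 2)); // { type: 'PLUS', text: '+', end: 3 }
```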
Non-terminal helper abstractions can be defined using angle brackets:

```
<foo> ::= FOO
```

Note: alias names must start with a lowercase letter to avoid collisions with AST node names.

Within non-terminal rules, patterns can be chained by separating them with spaces:

```
<parenthesized_expression> ::= OPEN_PAREN <expression> CLOSE_PAREN
```

Use `|` for alternatives:

```
<expression> ::= <number> | <string> | <identifier>
```

Use `""` for empty productions (useful for defining optional rules):

```
<optional_type> ::= <type> | ""
```

Use `[item, separator]` for delimited lists:

```
<argument_list> ::= [<expression>, COMMA]
<statement_list> ::= [<statement>, SEMICOLON]
```

The matched patterns will be parsed as arrays, omitting the separators from the parsed value.
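This behaviour can be sketched with a minimal standalone rule (independent of the library's real `ParserRule` type): items are collected into an array while separators are matched and then discarded:

```typescript
// Standalone sketch of separator-omitting list parsing. A "rule" consumes
// from a token array at an index and returns the parsed value plus the next
// index, or null on failure.
type Rule<T> = (tokens: string[], index: number) => { value: T; next: number } | null;

// Match one exact token
const tok = (expected: string): Rule<string> => (tokens, index) =>
  tokens[index] === expected ? { value: expected, next: index + 1 } : null;

// Match a numeric token
const item: Rule<string> = (tokens, index) =>
  /^\d+$/.test(tokens[index] ?? '') ? { value: tokens[index], next: index + 1 } : null;

// list(item, separator): parse `item (separator item)*`, keeping only the items
const list = <T>(itemRule: Rule<T>, separatorRule: Rule<unknown>): Rule<T[]> =>
  (tokens, index) => {
    const first = itemRule(tokens, index);
    if (first === null) return { value: [], next: index }; // Empty list
    const values = [first.value];
    let next = first.next;
    for (;;) {
      const sep = separatorRule(tokens, next);
      if (sep === null) break;
      const rest = itemRule(tokens, sep.next);
      if (rest === null) break; // Don't consume a trailing separator
      values.push(rest.value);
      next = rest.next;
    }
    return { value: values, next };
  };

const numberList = list(item, tok(','));
console.log(numberList(['1', ',', '2', ',', '3'], 0)); // { value: ['1', '2', '3'], next: 5 }
```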
Define AST nodes using angle brackets and struct syntax:

```
<NodeName> ::= {
  property_name: <- TOKEN_NAME,  // String property
  other_property: <OtherNode>,   // Nested node
  : IGNORED_TOKEN                // Consumed but not stored
}
```

Note: AST node names must start with a capital letter.
Within a struct, each field takes one of the following forms:

- `property: TOKEN` - Match a token and store the raw token object in `property`
- `property: <- TOKEN` - Match a token and store its source text content in `property`
- `: TOKEN` - Match a token without storing it
- `property: <Node>` - Store a nested node in `property`
All grammars must define at least one AST node type. The first AST node declaration will be used as the root node of the overall AST.
The `syntax` function creates a parser from a BNF-style grammar definition string.

Returns: a `SyntaxParser` object with:

- `parse(input: string)` - Parse an input string into an AST
- `tokens` - Token factory functions
- `nodes` - Node factory functions
- `grammar` - Underlying grammar object
Use the syntax-codegen CLI tool to generate TypeScript declarations for your grammar files:
```
npx @timkendrick/syntax-codegen ./src/**/*.grammar.ts
```

This will locate all syntax definitions within the provided source files and generate `*.generated.d.ts` files containing complete type definitions for the custom grammar.
```typescript
import type {
  InferSyntaxNodeType,
  InferSyntaxTokenType,
  InferSyntaxRootNode,
  InferSyntaxNodeTypeLookup,
  InferSyntaxTokenTypeLookup
} from '@timkendrick/syntax';

// Generate a parser via the high-level DSL
const parser = syntax("[...language grammar]");

// Extract AST types from the parser (relies on codegen)
type MyNode = InferSyntaxNodeType<typeof parser>;     // Union of all node types
type MyToken = InferSyntaxTokenType<typeof parser>;   // Union of all token types
type MyRootNode = InferSyntaxRootNode<typeof parser>; // AST root node type

// Look up individual node types by name
type AstNodes = InferSyntaxNodeTypeLookup<typeof parser>;
type ExpressionNode = AstNodes["Expression"];
```

For advanced use cases, create grammars programmatically using parser combinators directly instead of the BNF-style DSL. This provides full control over parser construction.
```typescript
import { createGrammar, choice, struct, field, token, text } from '@timkendrick/syntax';

const grammar = createGrammar({
  tokens: {
    NUMBER: /\d+/,
    PLUS: '+',
    MULTIPLY: '*'
  },
  rules: (tokens) => ({
    Expression: (rules) => choice(
      rules.Addition,
      rules.Number
    ),
    Addition: (rules) => struct(
      field('left', rules.Number),
      field(null, token(tokens.PLUS)),
      field('right', rules.Expression)
    ),
    Number: (rules) => struct(
      field('value', text(token(tokens.NUMBER)))
    ),
  })
});
```

The library provides low-level combinators for building grammars programmatically:
- `token<T>(type: T): ParserRule<Token<T>>` - Match a specific token type
- `empty(): ParserRule<null>` - Always succeeds, consumes nothing
- `sequence<T>([...rules]): ParserRule<T>` - All rules must succeed in order
- `choice<T>(...rules): ParserRule<T>` - First successful rule wins
- `optional<T>(rule): ParserRule<T | null>` - Zero or one matches (never fails)
- `zeroOrMore<T>(rule): ParserRule<T[]>` - Zero or more matches
- `oneOrMore<T>(rule): ParserRule<T[]>` - One or more matches
- `list<T, S>(itemRule, separatorRule): ParserRule<T[]>` - Separated list parsing
- `map<T, U>(rule, fn): ParserRule<U>` - Transform a parse result
- `text(rule): ParserRule<string>` - Extract source text from tokens
- `struct(...fields): ParserRule<object>` - Build structured objects
- `node<T>(type, rule): ParserRule<AstNode<T>>` - Create an AST node
- `field<T>(key, rule): FieldDescriptor<T>` - Named field for struct
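To make the semantics of `choice` (first successful alternative wins) and `sequence` (every rule must succeed, in order) concrete, here is a minimal standalone sketch that is not the library's actual implementation, using a simplified rule representation over plain strings:

```typescript
// Standalone sketch: a "rule" consumes from an input string at an index and
// returns the parsed value plus the next index, or null on failure.
type Rule<T> = (input: string, index: number) => { value: T; next: number } | null;

// Match an exact string at the current position
const literal = (expected: string): Rule<string> => (input, index) =>
  input.startsWith(expected, index)
    ? { value: expected, next: index + expected.length }
    : null;

// First alternative that succeeds wins; later alternatives are never tried
const choice = <T>(...rules: Rule<T>[]): Rule<T> => (input, index) => {
  for (const rule of rules) {
    const result = rule(input, index);
    if (result !== null) return result;
  }
  return null;
};

// All rules must succeed, each picking up where the previous one left off
const sequence = (...rules: Rule<string>[]): Rule<string[]> => (input, index) => {
  const values: string[] = [];
  let next = index;
  for (const rule of rules) {
    const result = rule(input, next);
    if (result === null) return null; // Whole sequence fails
    values.push(result.value);
    next = result.next;
  }
  return { value: values, next };
};

// Order matters: "==" must come before "=", or the longer operator never matches
const operator = choice(literal('=='), literal('='));
console.log(operator('==', 0)); // { value: '==', next: 2 }

const assignment = sequence(literal('x'), literal('='), literal('1'));
console.log(assignment('x=1', 0)); // { value: ['x', '=', '1'], next: 3 }
```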
For complex patterns that cannot be expressed using the existing parser combinators, you can create a custom rule parser.
The rule factory will be passed an object containing all the rules in the grammar, allowing this rule to invoke others (or itself, recursively).
The rule factory returns a ParserRule function, which accepts the parser state and a set of helpers for reading input tokens.
```typescript
const expressionPair = (rules) => (state, helpers) => {
  const { readToken, eof } = helpers;
  // Keep track of the parser state and any tokens consumed by this rule
  let currentState = state;
  const tokens = [];

  // Parse the left expression
  const leftResult = rules.expression(currentState, helpers);
  if (leftResult.type === ResultType.Error) return leftResult;
  const { value: leftValue, tokens: leftTokens, ...leftState } = leftResult.value;
  tokens.push(...leftTokens);
  currentState = { ...leftState, tokens: [...currentState.tokens, ...leftTokens] };

  // Parse the separator token
  const separatorToken = readToken(currentState.currentIndex);
  if (separatorToken === null || separatorToken.type !== 'COMMA') {
    return {
      type: ResultType.Error,
      // `input` refers to the source string being parsed (available from the surrounding scope)
      error: new ParseError('Expected comma separator', input, separatorToken?.location ?? eof)
    };
  }
  currentState = { ...currentState, currentIndex: currentState.currentIndex + 1 };
  tokens.push(separatorToken);

  // Parse the right expression
  const rightResult = rules.expression(currentState, helpers);
  if (rightResult.type === ResultType.Error) return rightResult;
  const { value: rightValue, tokens: rightTokens, ...rightState } = rightResult.value;
  tokens.push(...rightTokens);
  currentState = { ...rightState, tokens: [...currentState.tokens, ...rightTokens] };

  // Return the collected value and the latest parser state
  return {
    type: ResultType.Success,
    value: {
      ...currentState,
      value: { left: leftValue, right: rightValue },
      tokens
    }
  };
};
```

See ARCHITECTURE.md for more details on how parser rules are implemented.
For tokens that cannot be expressed as strings or regular expressions, you can create a custom token parser:

```typescript
import { createTokenParser } from '@timkendrick/syntax';

const customToken = createTokenParser((state) => {
  // Custom parsing logic
  // Return the new state, or null if no token matched
});
```

Existing grammars can be extended with additional rules using `extendGrammar`:

```typescript
import { extendGrammar } from '@timkendrick/syntax';

const extendedGrammar = extendGrammar(baseGrammar, {
  NewNodeType: (rules) => struct(
    field('property', token('TOKEN'))
  )
});
```

The parser provides detailed error information:
```typescript
try {
  const result = parser.parse(input);
  console.log(result);
} catch (error) {
  if (error instanceof ParserRuleError) {
    console.log(`Parse error at position ${error.location.start}: ${error.message}`);
  }
}
```

Performance notes:

- The parser uses recursive descent with backtracking
- Choice combinators try alternatives in order
- Consider left-recursion elimination for better performance (and to avoid infinite loops)
- Use specific token types early in choice alternatives
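To illustrate left-recursion elimination using the quick-start grammar above: a version of the addition rule whose first pattern is `<expression>` would re-enter `<expression>` (and hence itself) before consuming any input, looping forever under recursive descent:

```
<Addition> ::= {
  left: <expression>,
  : PLUS,
  right: <term>
}
```

Rewriting it to consume a `<term>` first and recurse only on the right-hand side (as the quick-start grammar does) terminates:

```
<Addition> ::= {
  left: <term>,
  : PLUS,
  right: <expression>
}
```

Note that this rewrite has a semantic consequence: `+` chains parse right-associatively.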
Project structure:

- `src/grammar.ts`: Core types and grammar creation
- `src/parser.ts`: Parsing engine and result handling
- `src/combinators/`: Individual combinator implementations
- `src/syntax.ts`: Public high-level API and DSL
For detailed information about the library's internal design and implementation, see ARCHITECTURE.md.