[Refactor] Add new AST node types and resolve AST TODOs#99
[Refactor] Add new AST node types and resolve AST TODOs#99colinthebomb1 merged 9 commits intomainfrom
Conversation
There was a problem hiding this comment.
Pull request overview
This PR refactors the internal AST representation to better model several SQL constructs (types/keywords, value lists, intervals, CASE expressions, and DISTINCT variants) and updates the expected AST fixtures accordingly.
Changes:
- Introduces new AST node types:
TypeNode,ListNode,IntervalNode, andCaseNode, plus newNodeTypeenum values. - Extends
SelectNodeto representDISTINCTandDISTINCT ON. - Updates
data/asts.pyexpected ASTs (and tweaks formatter tests) to use the new node types.
Reviewed changes
Copilot reviewed 4 out of 4 changed files in this pull request and generated 7 comments.
| File | Description |
|---|---|
core/ast/node.py |
Adds new node classes and extends SelectNode to model DISTINCT/DISTINCT ON. |
core/ast/enums.py |
Adds NodeType enum values for the new node types. |
data/asts.py |
Updates expected AST fixtures to use TypeNode, ListNode, IntervalNode, CaseNode, and DISTINCT metadata. |
tests/test_query_formatter.py |
Adjusts/relaxes formatter tests by commenting out some formatter calls and minor assertion key change. |
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
HazelYuAhiru
left a comment
There was a problem hiding this comment.
Overall looks good to me! Just to confirm, are we now using a more specific approach (creating specific new node types when they are encountered, such as IntervalNode and WhenThenNode) rather than a more general approach for handling new queries? Also, should we add code in ast_util to visualize the new nodes as well?
baiqiushi
left a comment
There was a problem hiding this comment.
Good refactoring overall — replacing FunctionNode workarounds with proper typed AST nodes is a solid improvement. CI is green. A few issues to address:
Bug: SelectNode._distinct_on is not included in children, which breaks generic AST traversals.
Suggestion: The hardcoded allowlists in DataTypeNode and TimeUnitNode are quite restrictive and will throw ValueError on valid SQL types/units not in the set.
Co-authored-by: QIUSHI BAI <baiqiushi@gmail.com>
baiqiushi
left a comment
There was a problem hiding this comment.
Thanks a lot @colinthebomb1 for addressing the comments!
Overview
This PR adds new node types and adds functionality to existing ones to improve the constructed ASTs for test files, resolving most of the TODOs identified in PR #97.
Code Changes
DataTypeNodefor SQL data types used in CAST expressions (TEXT,DATE,INTEGER, etc.)TimeUnitNodefor SQL time units used in INTERVAL and temporal functions (DAY,SECOND,MONTH, etc.)ListNodefor value lists (e.g. the RHS ofINexpressions) — replaces raw Python listsIntervalNodeforINTERVALexpressions — replacesFunctionNode("INTERVAL", ...)WhenThenNodefor individual WHEN/THEN branches within a CASE expressionCaseNodeforCASE WHEN ... THEN ... ELSE ... END— usesWhenThenNodeinstead of raw Python tuples_distinctand_distinct_onparameters toSelectNodeforSELECT DISTINCT/DISTINCT ONNULLrepresented asLiteralNode(None)rather than a separate node typeNodeType.DATA_TYPE,TIME_UNIT,LIST,INTERVAL,CASE,WHEN_THENto enumsdata/asts.pyto use the new node types