Skip to content

Fix typedef/variable name conflict parsing error#219

Open
Copilot wants to merge 9 commits intodevelopfrom
copilot/fix-typedef-variable-name-conflict
Open

Fix typedef/variable name conflict parsing error#219
Copilot wants to merge 9 commits intodevelopfrom
copilot/fix-typedef-variable-name-conflict

Conversation

Copy link
Copy Markdown

Copilot AI commented Mar 12, 2026

  • Create test file test/small1/typedef_varname.c for the typedef/variable name conflict
  • Add test to test/testcil.pl
  • Fix init_declarator rule in src/frontc/cparser.mly to call add_identifier via declarator_init_start helper rule between declarator and EQ
  • Fix comment style in cparser.mly to use /* (* TEXT *) */ as per file conventions
  • Fix add_identifier/add_type timing: declarator_no_init immediately registers each declarator's name — add_type for typedef declarators, add_identifier for non-typedef declarators (C11 6.2.1.7: scope begins just after completion of declarator); declarator_init_start registers initialized declarators before their =; doDeclaration now only creates AST nodes with no lexer management
  • Add test/small1/typedef_varname_noinit.c to test multiple initialized declarators: verifies that the second declarator's initializer can reference the first declared variable (which shadows a typedef), exercising declarator_init_start
  • Fix CI regression in test/scope2: declarator_no_init calls add_type (not add_identifier) for typedef declarators, so names are never added to the context tracking list and pop_context cannot remove the wrong binding
  • Add test for NO_INIT middle declarator in typedef_varname.c: mytype a = 1, mytype, b = sizeof mytypesizeof mytype (without parens) is only valid when mytype is a variable, proving correct C11 6.2.1.7 scoping for NO_INIT declarators in mixed declaration lists
Original prompt

This section details on the original issue you should resolve

<issue_title>typedef and variable name conflict parsing error</issue_title>
<issue_description>### Test case from goblint/bench#38 (comment)

#include <stddef.h>
#include <stdint.h>

typedef struct raxNode {
    uint32_t iskey:1;
    uint32_t isnull:1;
    uint32_t iscompr:1;
    uint32_t size:29;
    unsigned char data[];
} raxNode;

typedef struct rax {
    raxNode *head;
    uint64_t numele;
    uint64_t numnodes;
} rax;

void *malloc(size_t size);

int main() {
  rax *rax = malloc(sizeof(*rax)); // PARSING ERROR HERE
  return 0;
}

This is completely fine by GCC/Clang and rax inside the sizeof should refer to the local variable rax that has been declared and is being initialized.

Cause

The parser errors because that inner rax is a NAMED_TYPE token from the lexer instead of IDENT. This is due to the lexer hack being buggy: ours only adds a local variable to the lexer hack table after the entire declaration, but actually the initializer in the declaration may already refer to the local variable. Thus at this point, the lexer hack table considers rax to refer to the typedef instead.

Specification

It's extremely difficult to find specification for this, but I believe it might be 6.2.1.7 in N1570:

Structure, union, and enumeration tags have scope that begins just after the appearance of the tag in a type specifier that declares the tag. Each enumeration constant has scope that begins just after the appearance of its defining enumerator in an enumerator list. Any other identifier has scope that begins just after the completion of its declarator.

The declaractor is the *rax part of the declaration in the above example, so rax needs to be added to the lexer hack table as identifier between the declarator and initializer, not after the whole thing.</issue_description>

Comments on the Issue (you are @copilot in this section)


🔒 GitHub Advanced Security automatically protects Copilot coding agent pull requests. You can protect all pull requests by enabling Advanced Security for your repositories. Learn more about Advanced Security.

… register identifier before parsing initializer

Co-authored-by: michael-schwarz <13812333+michael-schwarz@users.noreply.github.com>
Copilot AI changed the title [WIP] Fix typedef and variable name conflict parsing error Fix typedef/variable name conflict parsing error Mar 12, 2026
Co-authored-by: michael-schwarz <13812333+michael-schwarz@users.noreply.github.com>
@michael-schwarz michael-schwarz marked this pull request as ready for review March 13, 2026 09:00
@michael-schwarz michael-schwarz requested a review from sim642 March 13, 2026 09:00
…initialized declarators

Co-authored-by: michael-schwarz <13812333+michael-schwarz@users.noreply.github.com>
… its declarator via helper rules

Co-authored-by: michael-schwarz <13812333+michael-schwarz@users.noreply.github.com>
Co-authored-by: michael-schwarz <13812333+michael-schwarz@users.noreply.github.com>
…t_start; update test

Co-authored-by: michael-schwarz <13812333+michael-schwarz@users.noreply.github.com>
… typedef guard

Co-authored-by: michael-schwarz <13812333+michael-schwarz@users.noreply.github.com>
@michael-schwarz
Copy link
Copy Markdown
Member

@sim642: This looks reasonable now!

Comment on lines +1053 to +1054
of its declarator). For typedef declarations (is_typedef_decl = true), the name
is registered later via add_type in doDeclaration. *) */
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I haven't looked at this in detail but introducing even more global state into the parser is a bit worrying.

Is typedef scoping actually defined to work differently like this by the standard? Or maybe it would also be the right thing to do the add_type stuff earlier than at the very end?
Perhaps one typedef could also define multiple types with commas (I haven't looked at the standard grammar) and then it would also matter there (and even be more correct)?

I might be optimistically hoping for more consistency and less special-casing than C actually has...

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@copilot Look up the details of this in the standard and either implement Simmo's suggestion or argue why it is not a good idea.

Copy link
Copy Markdown
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Implemented in aa40855.

C11 6.2.1.7: "Any other identifier has scope that begins just after the completion of its declarator." This applies uniformly to all declarations — typedef names included. So moving add_type earlier (per-declarator) is both correct and more standard-conformant.

What changed: declarator_no_init now handles both cases with a single branch:

  • add_type n when is_typedef_decl = true (typedef declarators)
  • add_identifier n when is_typedef_decl = false (variable declarators)

doDeclaration now has zero lexer management — it only builds AST nodes (TYPEDEF / ONLYTYPEDEF / DECDEF).

On the is_typedef_decl global flag: it's still necessary. Yacc reductions are bottom-up — by the time declarator_no_init reduces, the TYPEDEF keyword has already been consumed (in decl_spec_list_common). There's no way to pass this context through grammar parameters in standard yacc without a mid-rule side effect, which is what is_typedef_decl provides. The flag is set by the TYPEDEF rule and cleared by doDeclaration, giving it a well-defined lifetime that spans exactly the declarator list of a typedef declaration.

On typedef commas: per the C grammar, typedef int T, *TP; uses the same init_declarator_listinit_declarator path as variables. With this change, T and TP each get add_type called immediately after their declarator — so a hypothetical typedef int T, T_alias; would see T as already registered when T_alias is parsed. This is C11-correct behaviour.

…rator, none in doDeclaration

Co-authored-by: michael-schwarz <13812333+michael-schwarz@users.noreply.github.com>
Agent-Logs-Url: https://github.com/goblint/cil/sessions/459679d3-44f6-4f0b-8a30-92252ab7fa0d
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

typedef and variable name conflict parsing error

3 participants