Add support for C-style block comments (/* */) in tokenizer #149098

Closed
VashuTheGreat wants to merge 1 commit into python:main from VashuTheGreat:feature/block-comments

VashuTheGreat commented Apr 28, 2026

Title

Add optional support for C-style block comments (/* ... */)

Summary

This PR introduces support for C-style block comments (/* ... */) at the tokenizer level.

The implementation:

  • Treats block comments as whitespace
  • Is fully backward-compatible
  • Does not affect existing Python syntax or semantics
  • Works seamlessly with strings and operators

Motivation

Many developers coming from C/C++/Java backgrounds expect block comments.
This change improves accessibility without impacting existing code.

Implementation Details

  • Modified tokenizer to detect and skip /* ... */ sequences
  • Proper handling of:
    • Newlines
    • EOF errors (unterminated comments)
    • Operator fallback (/)

Compatibility

  • No breaking changes
  • Existing Python code runs unchanged
  • Feature is purely additive

Tests

Manually tested with:

  • Strings containing /* */
  • Multiline comments
  • Edge cases with operators and indentation

Notes

This is a minimal tokenizer-level enhancement and does not modify parser or runtime behavior.

Implementation Details

  • Implemented in the tokenizer (lexer) layer
  • Detects and skips /* ... */ sequences
  • Treats block comments as whitespace
  • Handles:
    • Multi-line comments
    • Newline tracking (lineno updates)
    • Unterminated comments (raises syntax error)
    • Safe fallback for / operator using tokenizer backup
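
The PR's diff is not shown in this thread, and the real change would live in CPython's C tokenizer. As a rough illustration of the behavior listed above, here is a minimal Python sketch that treats `/* ... */` as whitespace by pre-processing source text. The names `strip_block_comments` and `UnterminatedBlockComment` are hypothetical, and the sketch assumes only single-line `'...'` and `"..."` strings; triple-quoted strings, string prefixes, and `#` comments are not handled.

```python
class UnterminatedBlockComment(SyntaxError):
    """Raised when a /* has no matching */ before end of input."""


def strip_block_comments(source: str) -> str:
    """Replace each /* ... */ outside string literals with a space.

    Newlines inside a comment are re-emitted so line numbers stay stable.
    """
    out = []
    i, n = 0, len(source)
    while i < n:
        ch = source[i]
        if ch in "'\"":
            # Copy a single-line string literal verbatim, honoring backslash
            # escapes, so comment-like text inside it is left alone.
            j = i + 1
            while j < n and source[j] not in (ch, "\n"):
                j += 2 if source[j] == "\\" else 1
            if j < n and source[j] == ch:
                j += 1  # include the closing quote
            j = min(j, n)
            out.append(source[i:j])
            i = j
        elif source.startswith("/*", i):
            end = source.find("*/", i + 2)
            if end == -1:
                line = source.count("\n", 0, i) + 1
                raise UnterminatedBlockComment(
                    f"unterminated block comment starting on line {line}"
                )
            # One space stands in for the comment; keep its newlines.
            out.append(" " + "\n" * source.count("\n", i + 2, end))
            i = end + 2
        else:
            # Anything else, including a lone '/', passes through unchanged.
            out.append(ch)
            i += 1
    return "".join(out)
```

For instance, `strip_block_comments("a = 10/*comment*/+20")` yields source that ordinary Python can execute, while a plain `/` that is not followed by `*` is left for the normal division operator.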

No changes were made to:

  • Parser
  • AST generation
  • Bytecode compilation
  • Runtime execution

Compatibility

  • Fully backward-compatible
  • No impact on existing Python programs
  • No changes required for user code
  • Standard library remains unaffected

Additional Testing

The implementation has been validated against multiple edge cases, including:

  • Block comments between expressions:

    a = 10/*comment*/+20
  • Comments inside indented blocks:

    if True:
        /* comment */
        x = 10
  • Comments inside a list:

    a=[1,2,/* hello */ 3,4]
  • Multiple consecutive comments:

    /* one */
    /* two */
  • Comments combined with division operator:

    x = 100 / /* comment */ 5
  • Strings containing comment-like patterns (ensuring no false positives):

    print("this is not a comment: /* hello */")
  • Unterminated comment detection:

    /* missing end

All tested scenarios behaved as expected without breaking existing syntax.
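
For comparison, a small baseline check can show how a stock interpreter (one without this patch) treats the same inputs. The sketch below assumes an unpatched CPython and uses only `ast.parse` from the standard library; the helper name `parses` is illustrative.

```python
import ast

# Edge cases taken from the list above, as plain source strings.
EDGE_CASES = [
    "a = 10/*comment*/+20",
    "if True:\n    /* comment */\n    x = 10",
    "a=[1,2,/* hello */ 3,4]",
    "/* one */\n/* two */",
    "x = 100 / /* comment */ 5",
    "/* missing end",
]


def parses(source: str) -> bool:
    """Return True if the running interpreter accepts `source`."""
    try:
        ast.parse(source)
    except SyntaxError:
        return False
    return True


# On an unpatched interpreter every block-comment case is rejected ...
results = {case: parses(case) for case in EDGE_CASES}
# ... while comment-like text inside a string literal is ordinary Python.
string_case_ok = parses('print("this is not a comment: /* hello */")')
```

On an unpatched build each entry in `results` should be `False` and `string_case_ok` should be `True`. This only shows that these particular inputs are syntax errors today; it does not by itself establish that the change is compatible with all existing code.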


bedevere-app Bot commented Apr 28, 2026

Most changes to Python require a NEWS entry. Add one using the blurb_it web app or the blurb command-line tool.

If this change has little impact on Python users, wait for a maintainer to apply the skip news label instead.

ZeroIntensity (Member) commented Apr 28, 2026

VashuTheGreat (Author) commented

Thank you for the feedback and for sharing the prior discussions.

I understand that similar proposals have been considered before and rejected primarily on design grounds rather than technical limitations.

In this PR, my intention is not to challenge those decisions, but to demonstrate a minimal, backward-compatible tokenizer-level implementation and gather concrete feedback on its behavior.

The current implementation:

  • Does not interfere with existing syntax or semantics
  • Treats block comments strictly as whitespace
  • Works correctly across multiple contexts, including expressions, indentation blocks, and string literals

I have tested several edge cases (including usage inside expressions and nested code structures), and the behavior appears consistent with expectations.

That said, I fully understand that acceptance depends on language design philosophy, not just technical correctness. I would appreciate any specific concerns regarding:

  • potential ecosystem impact
  • readability or style implications
  • conflicts with existing tooling or future language evolution

If this feature is fundamentally out of scope for Python, I am happy to close the PR. Otherwise, I would be glad to refine the implementation or formalize it further (e.g., via a PEP) if there is any interest.

Thanks again for your time and review.

ZeroIntensity (Member) commented

Yes, please don't do this. Getting new features, especially changes to syntax, in a major programming language is very difficult. For Python, our process is as follows:

  1. Start with a thread on our ideas forum.
  2. If your idea gains enough traction, a core developer may offer to sponsor a PEP that adds the feature.
  3. Once you draft up a PEP, you then submit it as a PR to the python/peps repository, where the PEP editors will review it.
  4. After the PEP is published, you'll go through several rounds of discussion on the PEPs forum.
  5. Finally, usually after several months or even years of discussion, you'll submit the PEP to the SC, who will make the final decision on whether to accept/reject the PEP.

This process is all documented in PEP 1.

As a side note, please review our AI policy. Responses that are fully generated by AI, such as yours, are often factually incorrect. It's much nicer to use your own words :)
