Teach erl_syntax:is_literal/1 to recognize utf8 binaries #10962

Open
bjorng wants to merge 1 commit into erlang:master from bjorng:bjorn/syntax_tools/fix-binary-literals

@bjorng bjorng commented Apr 2, 2026

A literal binary encoded as UTF-8 was not recognized as a literal by erl_syntax:is_literal/1:

1> Tree = fun(S) -> {ok,Toks,_} = erl_scan:string(S),
   {ok,[Tree]} = erl_parse:parse_exprs(Toks),
   Tree end.
#Fun<erl_eval.42.113135111>
2> erl_syntax:is_literal(Tree(~s'<<"abc">>.')).
true
3> erl_syntax:is_literal(Tree(~s'<<"abc"/utf8>>.')).
false
4> erl_syntax:is_literal(Tree(~s'~"abc".')).
false
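
The difference between the first and the other two calls lies in the type specifier list that the parser attaches to the binary field. The following is a minimal sketch for inspecting that list; the module name `binary_field_inspect` and its functions are hypothetical helpers introduced only for illustration, while the `erl_scan`, `erl_parse`, and `erl_syntax` calls are existing library functions.

```erlang
-module(binary_field_inspect).
-export([parse/1, field_types/1]).

%% Parse the text of a single expression (terminated by a dot)
%% into an erl_parse abstract syntax tree.
parse(Source) ->
    {ok, Tokens, _} = erl_scan:string(Source),
    {ok, [Tree]} = erl_parse:parse_exprs(Tokens),
    Tree.

%% For a binary expression, return one list per field containing
%% the atom type specifiers of that field (for example [utf8]).
%% Specifiers that are not plain atoms, such as unit:8, are skipped.
field_types(Tree) ->
    binary = erl_syntax:type(Tree),
    [[erl_syntax:atom_value(T)
      || T <- erl_syntax:binary_field_types(Field),
         erl_syntax:type(T) =:= atom]
     || Field <- erl_syntax:binary_fields(Tree)].
```

Calling `field_types/1` on the tree for `<<"abc">>` gives a field with no specifiers, while the tree for `<<"abc"/utf8>>` carries the `utf8` specifier on its single field.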

This had consequences for merl. Consider the following module:

-module(merl_example).
-export([f/0]).
-include_lib("syntax_tools/include/merl.hrl").

f() ->
    Mod = some_module,
    Tree = ?Q([~"""
               -module('@Mod@').
               """]),
    merl:print(Tree).

The triple-quoted binary is encoded in UTF-8, so erl_syntax:is_literal/1 does not recognize it as a literal. As a result, the merl parse transform does not expand the quotation at compile time and does not perform the expected substitution of @Mod@:

1> c(merl_example).
merl_example.erl:6:5: Warning: variable 'Mod' is unused
%    6|     Mod = some_module,
%     |     ^

{ok,merl_example}
2> merl_example:f().
-module('@Mod@').
ok

After updating erl_syntax:is_literal/1 to recognize a UTF-8-encoded binary, the substitution works as expected:

1> c(merl_example).
{ok,merl_example}
2> merl_example:f().
-module(some_module).
ok
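
The patch itself is not shown in this conversation. As a rough illustration only, a check that accepts UTF-8 string fields could look like the sketch below. The module `utf8_literal_sketch` and its functions are hypothetical and are not the code in this pull request; the sketch assumes that a field counts as literal when it has no type specifiers and a literal body, or when its only specifier is `utf8` and its body is a string.

```erlang
-module(utf8_literal_sketch).
-export([parse/1, is_literal_binary/1]).

%% Parse the text of a single expression (terminated by a dot).
parse(Source) ->
    {ok, Tokens, _} = erl_scan:string(Source),
    {ok, [Tree]} = erl_parse:parse_exprs(Tokens),
    Tree.

%% True if Tree is a binary expression whose fields are all
%% literal according to the assumptions stated above.
is_literal_binary(Tree) ->
    erl_syntax:type(Tree) =:= binary andalso
        lists:all(fun is_literal_field/1, erl_syntax:binary_fields(Tree)).

is_literal_field(Field) ->
    Body = erl_syntax:binary_field_body(Field),
    case erl_syntax:binary_field_types(Field) of
        [] ->
            %% No type specifiers: defer to the existing check.
            erl_syntax:is_literal(Body);
        [Type] ->
            %% Exactly one specifier: accept only utf8 on a string body.
            erl_syntax:type(Type) =:= atom andalso
                erl_syntax:atom_value(Type) =:= utf8 andalso
                erl_syntax:type(Body) =:= string;
        _ ->
            false
    end.
```

With this sketch, `is_literal_binary(parse("<<\"abc\"/utf8>>."))` returns `true`, whereas a field with a variable body such as `<<X/utf8>>` is rejected.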

@bjorng bjorng requested a review from lucioleKi April 2, 2026 05:44
@bjorng bjorng self-assigned this Apr 2, 2026
@bjorng bjorng added the team:VM (Assigned to OTP team VM) and testing (currently being tested, tag is used by OTP internal CI) labels Apr 2, 2026
github-actions bot commented Apr 2, 2026

CT Test Results

  2 files   13 suites   3m 21s ⏱️
119 tests 115 ✅ 4 💤 0 ❌
135 runs  131 ✅ 4 💤 0 ❌

Results for commit 4a39245.

♻️ This comment has been updated with latest results.

To speed up review, make sure that you have read Contributing to Erlang/OTP and that all checks pass.

See the TESTING and DEVELOPMENT HowTo guides for details about how to run tests locally.

Artifacts

// Erlang/OTP Github Action Bot

@bjorng bjorng force-pushed the bjorn/syntax_tools/fix-binary-literals branch from 333e2d2 to 4a39245 Compare April 2, 2026 06:08