Thank you @bjorng and @garazdawi! The proposal is well written and building on top of records does a beautiful job of keeping the language changes minimal. Removing the undefined as default for fields is another great change, as well as having the option of "required fields" (which must be given on creation).
My only criticism of the proposal is that it doesn't discuss maps at all. For the last ~10 years, we have been using maps as replacements for records, and while native records aim to ease the migration between "records -> native records", there are no paths between "maps -> native records". Perhaps this is intentional, you are not expecting anyone to migrate between maps and native records, but then we have to deal with the side-effect that, as native records improve, a lot of code (especially in Elixir) will be forever suboptimal.
If I am allowed to spitball a bit, I'd love if "native records" were implemented as "named maps" or "record maps" behind the scenes. The proposal would stay roughly the same, the only differences are that:
- Accessing a field in any record could use the existing map syntax and just work:
  Expr#Field
- Updating any record could use the existing map syntax and just work:
  Expr#{Field1=Expr1, ..., FieldN=ExprN}
You could also support the proposed #_{...} syntax and it would additionally check it is a record and not a "plain map". is_map/1 would obviously return true for records but you could do a more precise check with is_record.
Regarding the key-destructive map operations, such as maps:put/3 or maps:remove/2, I'd make them fail unless you "untag" the map (or you could allow them to succeed but the tag would be removed at the end, which I find too subtle).
Overall, the goal would be to effectively unify records and maps, removing the decade-old question "records or maps". This would also provide a path for Erlang and Elixir to unify behind the same constructs, so I'd love to hear your opinions.
eeps/eep-0079.md
Outdated
| If no value is provided for a field, and there is a default field
| value in the native record definition, the default value is used. If
| no value is provided for a field and there is no default field value
| then a native record creation fails with a `badrecord` error.
Perhaps worth adding more context to the errors, such as {badrecord, {default_missing, key}}. I know format_error can be used (and have additional side-band information attached), but there may be benefits in being upfront about it too?
This would also allow distinguishing from other errors below, such as {badrecord, not_found}, etc.
More context definitely better, even if it gives only one missing key out of several.
Yes please! I think more fine-grained errors would be greatly beneficial for newcomers especially.
I've changed the error reason to {novalue,Field} (where novalue is a new error reason).
Accessing a field in any record could use the existing map syntax and just work:
Expr#Field
In erlang there is no existing map syntax for accessing a single field today,
that is still missing, was never implemented I guess because maps can have arbitrary terms as keys.
I guess it could be added for "literal" fields?
And it is really annoying to have to add the record_name when accessing fields in records today, e.g. Rec#record_name.field
@dgud yes! I keep forgetting that it was part of EEP but not implemented.
And it is really annoying to have to add the record_name when accessing fields in records today, e.g. Rec#record_name.field
I am assuming that, as long as you pattern match on the named record when the variable is defined, the compiler would be able to optimize "unnamed" field access and updates?
I am assuming that, as long as you pattern match on the named record when the variable is defined, the compiler would be able to optimize "unnamed" field access and updates?
Yes, that should be possible.
See the discussion in erlang/otp#9174 on a possible x[y] notation - @michalmuskala suggested M#[Field].
essen left a comment
Great work!
Would it make sense to have finer grained control on who can do what? For example restrict creation to the defining module while still providing access to fields; or read-only fields outside the defining module. Probably doesn't matter for 95% of use cases I reckon.
a4bdeec to f5b4834
We didn't really think about migrations from maps. Your suggestion seems reasonable. We will discuss this in the OTP team.
What a cool EEP! Thank you! I've a handful of questions
Language interop
One aspect of this which the document does not touch on that I think could be highly impactful for the BEAM ecosystem is language interop. Today each major language has a different preference for fixed key-value data structures:
- Erlang: maps and records
- Elixir: maps with a special field containing a module atom
- Gleam: records
This creates some degree of friction when calling code from other BEAM languages. If they all were to largely use native records then this friction would go away, making interop between languages a much better experience.
I'm not immediately seeing any problems for Gleam as we use records there, but it seems like it would be more challenging for Elixir, where maps are used.
Adoption within existing OTP modules
It seems that in an ideal world that native records would be the ubiquitous data structure, once they are available. Would existing Erlang/OTP modules be updated to work with them?
Functions that expect classic records, tagged tuples, and maps could have new function clauses added to handle native records in a backwards compatible way, unless I am mistaken. It seems that due to not being compatible with maps or tuples there would be very little ability to update existing functions to return native records.
Is there something we could do here? Or is the expectation that Erlang/OTP code will use different data structures depending on how old it is?
Construction syntax
Is the only difference between the native and classic record syntaxes the # character in the name of the definition? This seems like it will be very error prone, and also hard for less familiar people to debug as the definition will be accepted by the Erlang compiler, but their attempts to construct the record will fail.
Thank you all!
| ### Anonymous access of native records
|
| The following syntax allows accessing field `Field` in any record:
Is there a performance difference when using this syntax compared to using the non-anonymous syntax?
Are there situations in which one cannot use the anonymous syntax?
Is there a performance difference when using this syntax compared to using the non-anonymous syntax?
Yes, certain optimizations we are thinking about implementing will not be applied when using anonymous syntax.
Are there situations in which one cannot use the anonymous syntax?
No.
@bjorng I assume the optimizations could still be applied for the anonymous syntax if you matched on the full syntax elsewhere before (basically when beam_types will have it narrowed down to a given record type)?
Yes, the compiler can still do optimizations if previous matching has narrowed down the type. (That is, once we actually implement type analysis for native records, which we haven't done yet.)
The optimizations I was thinking of were the optimizations in the runtime system planned for OTP 30, which I've outlined under "Performance characteristics". Anonymous records will be handled similarly to small maps, which will still be quite efficient.
eeps/eep-0079.md
Outdated
| 1. Creation of native records cannot be done in guards
|
| 2. `element/2` will not accept native records.
Tuple records can be constructed with field names (using the #blah{a=1,b=2} syntax) or without field names (using the {blah, 1, 2} syntax). Do native records have a field-nameless #blah{1, 2} construction syntax?
No.
The nameless syntax for tuple records only works because traditional records are implemented using tuples.
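A minimal sketch of why the positional form works for tuple records, using only classic record syntax from current Erlang/OTP. The module name `tuple_record_demo` and the fields `a` and `b` are hypothetical; `blah` is borrowed from the question above.

```erlang
-module(tuple_record_demo).
-export([demo/0]).

%% Classic (tuple) record definition; names are illustrative only.
-record(blah, {a, b}).

demo() ->
    R = #blah{a = 1, b = 2},
    %% At runtime a classic record is a tagged tuple, so the
    %% positional tuple below is the very same term.
    true = (R =:= {blah, 1, 2}),
    %% The record name is stored in the first element.
    blah = element(1, R),
    %% Named field access compiles down to a tuple element lookup.
    1 = R#blah.a,
    ok.
```

Since native records are not tuples, neither the positional tuple form nor `element/2` carries over to them, as stated in the replies above.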
eeps/eep-0079.md
Outdated
| (tuples). So, their performance characteristics should align with maps
| (with insignificant overhead for runtime validation). Additionally,
| given that native-records are more specialized versions of maps (with
| all keys being atoms), there is potential for optimizations.
Is the expectation that native records will have inferior performance to tuple records?
We don't know yet, and it also depends on exactly which operations we are talking about. For example, updating multiple elements in a small map (no more than 32 elements) is quite efficient because it can be done in one pass, and similarly matching out multiple elements is also efficient because it can be done in one pass. Accessing one element at a time is more expensive for a map than for a tuple record.
In the first implementation to be released, native records will be implemented similarly to how small maps are implemented, with similar performance. We have an idea for an optimization that would make it faster than maps.
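To illustrate the map operations mentioned above, here is a small sketch using ordinary maps in current Erlang/OTP (not native records). The module name `map_ops_demo` and the keys are hypothetical. It only shows the source-level form of a multi-key update and a multi-key match; whether each runs in one pass is a property of the runtime, as described above.

```erlang
-module(map_ops_demo).
-export([demo/0]).

demo() ->
    M0 = #{count => 0, name => "default", version => 1},
    %% Update two existing keys in a single update expression
    %% (:= requires that the key already exists).
    M1 = M0#{count := 5, name := "server1"},
    %% Match out two keys in a single pattern.
    #{count := Count, name := Name} = M1,
    %% A single key is read through a function call (or a match).
    1 = maps:get(version, M1),
    {Count, Name}.
```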
Yes. If a function returns a tuple record, all we can do is to create a new function that returns a native record.
To some extent, yes. I think that is already the case.
Agreed.
Thank you @bjorng. To confirm: all fields must have names? There are no positional fields?
Can one define a record which does not have any fields?
-record #none{}.
-record #some{value = term()}.
-type option() :: #none{} | #some{}
Yes.
Yes.
We are trying to limit the scope here to try to get it into 29. Hmm, this would require some additional syntax, 'private' | 'protected' | 'public' (borrowing from ets access types), personally
@lpil wrote:
It is a different format, not only the # character, it is like record creation:
-record(foo, {a, b, c}).
%% vs.
-record #foo{a, b, c}.
So there is also a comma and parentheses that are different.
I don't see the concern raised that we'll now have records, native records and maps. I know they all serve different purposes, but the differences are slight and new users definitely get confused about which to use. I don't think it should be a blocker to adding a new data structure that can help improve devex or performance when choosing the right one but it does worry me that this can't replace records. Related to not replacing records, I think
Right it's something that likely doesn't have to be done at runtime, but it is important to consider at least in the documentation part, as internal fields definitely shouldn't be documented. Read-only fields can be marked as such easily in the text. But this depends on what the documentation will look like I suppose.
997a60f to d2ab33d
Update: there can now be two distinct errors when
@tsloughter wrote:
"Not replacing records" is really phrased: not "Replacing all tuple record usage scenarios.", and "Replace most tuple-record usages without having to update anything but the declaration.". With that in mind I think overloading -record may be more good than bad.
@RaimoNiskanen unless it outright replaces, 100%, named tuple records I think using I take it there are a million reasons At first I wanted I can concede there is no good alternative to
I do share the concern that the similar syntax is confusing, and the difference being 2 characters of punctuation makes it challenging to differentiate between the two when reviewing code.
-struct #state{
    values = [] :: list(number()),
    avg = 0.0 :: float()
}.
Elixir does already have "defstruct", though that construct does seem very similar to this proposal in design and purpose.
Question: Behavior when adding a field across distributed nodes
Consider this scenario:
Node A (old code):
-record #state{count = 0, name = "default"}.
State = #state{count = 5, name = "server1"}
Node B (new code):
-record #state{count = 0, name = "default", version = 1}.
When Node A sends a
Issue: This seems to check against the definition on Node B, not the fields in the value from Node A. It's unclear whether the update would:
Question: Field renaming across nodes
Consider this scenario where a field is renamed:
Node A (old code):
-record #user{id, username, city}.
User = #user{id = 1, username = "alice", city = "Stockholm"}
Node B (new code - username renamed to name):
-record #user{id, name, city}.
When Node A sends
Issue: Field renames appear to be breaking changes in distributed systems. The EEP states:
This means reading works purely based on the value's fields, not the definition. So:
Question: Removing a field
Consider this scenario:
Node A (old code):
-record #config{host, port, legacy_timeout}.
Config = #config{host = "localhost", port = 8080, legacy_timeout = 5000}
Node B (new code - legacy_timeout removed):
-record #config{host, port}.
When Node A sends
Issue: The interaction between field access (which ignores definition) and pattern matching (which may or may not check definition) is unclear.
eeps/eep-0079.md
Outdated
| but guaranteed to always work and be more efficient.
|
| TODO: What should the name of the BIF be?
Something like is_subtype or is_compatible, maybe.
I think subtype and compatible would make me think the type of the fields are the same and are also checked. Perhaps has_record_fields but there are no guards starting with has_. I am not sure how to phrase it using is_.
| Note, however, that it is possible to match on the name of a
| non-exported record. Thus, if the `match_name/1` function in the
| following example is called with an instance of record `r` defined in
| `some_module`, it will succeed even if the record is not exported:
What is the motivation for this matching on non exported records by name?
To have the possibility to use them in an API, as a socket for example, where you want to be able to match on it but not on its content. Today, with opaque types, you have to wrap them in a two-tuple {socket, OpagueStaff}.
4024026 to dc31588
94aa076 to f73b48b
eeps/eep-0079.md
Outdated
| of the module will be used. There is no way to create a native record
| based of an old code generation.
Does this mean that this code:
#user{name = ~"John", city = ~"Stockholm"}
when run in old code, would create a record using the new definition?
Is that really a good idea? That would mean that a process running old code could create a record without a certain field and then crash when trying to match on that field. IMO local creation of native records should capture the definition of the running module and not the latest running module.
That would mean that a process running old code could create a record without a certain field and then crash when trying to match on that field
That would be pretty surprising behavior and make upgrades difficult. Old code shouldn't have to account for what the new code will look like. Only new code should handle what old code produces.
Is that really a good idea?
No, probably not.
That would mean that a process running old code could create a record without a certain field and then crash when trying to match on that field.
I'm not sure that deleting a field that is used in the previous version of the code is a good idea anyway but I agree the currently implemented behavior can complicate code update.
IMO local creation of native records should capture the definition of the running module and not the latest running module.
Yes, that seems to be safer. I think I could probably make that change in time for RC2.
| no value is provided for a field and there is no default field value,
| a native record creation fails with a `{novalue,FieldName}` error.
|
| In OTP 29, the default value can only be an expression that can be
Is the goal to have arbitrary expression later on? IMO it should be a goal.
We haven't decided yet. Supporting arbitrary expressions is tricky because record creation would have to execute expressions in another module.
I've always imagined record creation to be similar to a function call in how it works. That is #foo:bar{} calls the foo module to create a bar record. Using this mechanism you can kind of also build "opaque" fields as an exported record could have defaults that are non-exported records.
-export_record([bar]).
-record #bar{ private_data = #private{} }.
-record #private{ for_my_eyes_only = ~"hello" }.
One argument against allowing arbitrary expressions is that it would make it impossible to create a native record with such default values within a NIF.
| * If there is no corresponding native-record definition, creation
|   fails with a `{badrecord,Name}` exception.
|
| * If the native-record definition is not visible at the call site (it
I assume that this would include this example:
-module(foo).
-record #bar{baz = 1}.
foo() -> ?MODULE:bar{}.
That is a creation of a non-exported record using the global syntax?
In the current implementation, this creation succeeds. Currently, specifying the ?MODULE is exactly equivalent to omitting the module name. Should we treat them differently, and if so why?
I will add your example to the EEP so that we can discuss it at the next OTB meeting.
Should we treat them differently, and if so why?
Because it aligns with how function calls work, thus making it more idiomatic Erlang IMO.
Yes, I will change the EEP to make native-records align with function calls.
eeps/eep-0079.md
Outdated
| * If the native record create expression references the field FN that
|   is not defined (in the native-record definition), creation fails
|   with a `{badfield,FN}` exception.
Should the record name also be part of this exception?
IMO, we should keep the exceptions simple, but I'll add this as a question to the EEP.
eeps/eep-0079.md
Outdated
| * The native-record value's captured definition states it as
|   from a different module than the calling module, and it was
|   *not exported* a the time of capture.
Can you use ?MODULE:Name.Field on a non-exported record in the same module?
Hmm... That's a good question. You can't do that with a non-exported function. Why should they behave differently?
| }.
| ```
|
| `Name` and `FieldN` need to be atoms. `Name` is allowed to be used without quotes
Maybe list some examples of the things that you do need to quote, i.e. _, 1, = etc?
4184479 to 7df600b
I've pushed an updated version where I've adopted the view that native-record operations behave similarly to calls (as suggested by @garazdawi).
I've implemented most of these changes but will not make a PR until the EEP has been approved by the OTB.
| -record #bar{baz = 1}.
| -export([foo/0]).
| foo() -> #?MODULE:bar{}.
| ```
bar record is not exported, but bar is defined in its defining module,
https://github.com/erlang/eep/pull/81/changes#diff-6fbdcf9ebdcf73842e54c4a8c9e012724c2b3cb22b6816d07e1f65220501d078R139
and when the EEP says
foo/0 function will fail
is this a linter error, a runtime error if module qux calls foo:foo/0 or an error if module foo calls internally function foo/0?
I will clarify that it will fail at runtime.
eeps/eep-0079.md
Outdated
| External term format is extended to support serialization of
| native-record values.
Should this section be expanded a bit on what the scheme will look like?
In particular given efficiency is a big part of the feature consideration, IMO compactness/efficiency of the encoding on-the-wire for distributed programming scenarios should be considered
I will add some details about the encoding.
1caab1a to a13fcfc
I've pushed a version updated after the meeting of the OTP Technical Board. Most of the changes are clarifications of existing content, but OTB also decided that the I've also added a brief description of the external format for native records. The parts of the EEP on which OTB didn't reach a decision have been removed and saved in a branch (named in the EEP).
a13fcfc to 681e881
While implementing the things that had changed from the EEP, I realized that all of "Compile-time checking of records" is not implementable. Trying to use a tuple record lacking a definition has always been a compile error. If we don't have a record definition, we can't know whether a tuple record or native record was intended, so that must be an error. I've pushed a fixup commit, which I hope will be the last change to this EEP. I hope to merge this tomorrow.
Hi! I noticed the Backward Compatibility section doesn't mention the impact on parse transforms and tools using abstract forms. Native records will introduce new AST node types. Existing parse transforms that traverse the abstract forms may fail on code containing native record syntax (e.g., function clause errors on unknown nodes). Other EEPs that introduce new syntax typically address this — for example, EEP-0037 notes "Existing parse transforms might well fail on code containing the new form, but would work unchanged on code that does not" and EEP-0078 states "parse transforms or any tools using abstract forms, would need to be updated." Should a similar note be added to this EEP's Backward Compatibility section?
Thanks, I've added a similar note.
| the need to define all fields in a record, it is very hard to accidentally
| create a record with one million elements.
|
| ### `-import_record()`
One incompatibility of sorts that I've found is that -record(....). can be declared anywhere in the code, while -import_record(...) can only be declared before any function.
There exists code that for whatever reason includes file.hrl in the middle of the code instead of at the top, so if you want to replace file_info with a native-record then those files break.
Moving the include up is of course a trivial fix, but depending on how many such you have it could be a lot of work.
Yes, it should be allowed to be anywhere in the file. I've added a paragraph about that, and will post a message to the OTP team to see whether there are any objections.
ee96a7c to 47f3304