Create tests for internal entities & revise existing noResolve tests#334
Create tests for internal entities & revise existing noResolve tests#334boswelja wants to merge 6 commits intopdvrieze:masterfrom
Conversation
|
For the parser (not the deserializer) what happens is that if If we want to change that behaviour we need to name the property differently (as the behaviour would be different). However, if your focus is on the serializer, it is possible to handle this there, possibly even using a special custom serializer. To make this work correctly it would be needed to actually properly implement entity resolution for the parser (at least), that could then also hook in this "unresolved" behaviour for attributes. Separately, the tests should be disabled until implemented (so the code will continue to pass the test suite). It would probably also make a lot of sense to split the tests out into tests against the parser(s) and for the deserializer (so both behaviours can be tested - for all available parsers/XmlReader implementations). |
Gotcha, this is good info to have! I will update the documentation for
That makes sense, I can try to keep entity resolution at the serializer level. I haven't dug around the codebase too much yet, but I'm imagining the parser will provide some sort of tokenized entities for the serializer to reference
I see the problem, but that also means users would never be able to serialize an object that should reference entities right? Do we have a way of configuring HTML escaping policies today?
Yeah, it seems like a big undertaking 😅 I'm hoping I can split the work into parts & at least handle my original problem in #311 first.
I'll get the tests set up for serrializer & parser separately this weekend. My intention here is just to have an easy to read way of sharing my thoughts to get us on the same page. The actual work will be done in separate PRs, and will include the respective tests. |
Note that
Ideally the deserialization should work the same in either mode for the the reader. But the parser in
The default XML entities (per standard) are supported (they don't require a DTD), and character references (
Actually entity resolution is almost there for the KtXmlReader. There is an interface for a resolver, and it is called. There is just no way to specify one - there is only the one that resolves default entities. This parser also doesn't support parsing DTD's (even internal) and as such does not record entity declarations to resolve. The StaxReader has underlying support for entities (it is just the JDK default), and could use the non-resolving mode to support the same resolution API at wrapper level. As this API is effectively internal you are free to adjust it (and if possible use the same interface at the serialization policy level)
That is fine. Btw. I'll fix the handling of entities inside attributes in KtXmlReader when I have a bit of time. |
pdvrieze
left a comment
There was a problem hiding this comment.
Good updates to documentation/test. I've just suggested remarking that (as updated now) entities in attribute values are always expanded (as there would otherwise not be a way to detect this).
| * @param expandEntities Whether entity references should be emitted as distinct events, or | ||
| * treated as simple text. In both cases, entities are resolved and their value available via | ||
| * [XmlReader.text]. Accessing `text` for an unknown entity will throw an exception. If | ||
| * [expandEntities] is false, the name of the entity is available via [XmlReader.localName]. |
There was a problem hiding this comment.
Maybe mention that "entities in attribute values are always expanded. If the reference in an attribute is unrecognized this will always result in an exception" (new behaviour, old one was an empty string).
For XML wellformedness it is required that entity references reference declared entities. So throwing an exception would be valid (even though the parser is not quite meeting the requirements of even a non-validating parser.
There was a problem hiding this comment.
I pushed an update, what do you think?
There was a problem hiding this comment.
It looks good. Maybe the test could have a parameter to createReader to determine whether entities are expanded. Alternatively some of the per format subclasses could be duplicated with different factory functions.
There was a problem hiding this comment.
Sounds good - just to clarify, the parameter on createReader should behave the same as the one here right?
| ]> | ||
| <Test>&helloWorld;</Test> | ||
| """.trimIndent() | ||
| val reader = createReader(text) |
There was a problem hiding this comment.
It may be worthwhile to change this to have a parameter on whether entities are (auto)expanded.
|
Sorry for the delay, my primary dev machine is a Windows on ARM laptop, so I can't sync with native targets enabled. Something changed a while ago that prevented me from disabling them in this project too :( |
pdvrieze
left a comment
There was a problem hiding this comment.
Looks good. But can you rebase it on the dev branch. I did redo parts of the generic parser due to a significant performance/memory usage issue on JS (stringbuilders on the JS target just store an underlying string and then do string concatenation in really high-allocation ways). As part of that I have actually made the parser support parsing entity declarations in the doctype. It even supports parameter entities and the use of entities that provide tags in the entity replacement that needs to then be parsed.
As such you may find that once you have rebased the tests pass.
| * @param expandEntities Whether entity references should be emitted as distinct events, or | ||
| * treated as simple text. In both cases, entities are resolved and their value available via | ||
| * [XmlReader.text]. Accessing `text` for an unknown entity will throw an exception. If | ||
| * [expandEntities] is false, the name of the entity is available via [XmlReader.localName]. |
The purpose of this PR is to propose changes (or rather, the addition of) internal entity handling. Following TDD< I've created a couple of tests & updated some existing ones to get your thoughts on my intended solution first.
Essentially, I'd like to
expandEntitiesbehavior, such thatfalsewill never touch any entities