MLE-27078 Incremental write now uses an unsignedLong #1908
Copyright Validation Results
⏭️ Skipped (Excluded) Files
✅ Valid Files
✅ All files have valid copyright headers!
ec5eabd to d6519e4
Pull request overview
Updates incremental write hashing to use an unsigned long representation consistently across Optic, View (TDE), and Eval-based approaches, and improves the View-based filter performance by switching to a where + cts.documentQuery predicate.
Changes:
- Store and compare incremental write hashes as unsigned longs (serialized as base-10 strings in metadata).
- Update Optic/View/Eval filters and tests to parse/handle unsigned long hashes.
- Optimize View filter row selection by replacing `op.in` with `where(op.cts.documentQuery(...))`.
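As a rough illustration of the first change above, here is a minimal sketch of storing a 64-bit hash as an unsigned base-10 string and reading it back. The class and method names are hypothetical and are not taken from `IncrementalWriteFilter`; only the `java.lang.Long` unsigned helpers are standard JDK calls.

```java
// Hypothetical helper, not the PR's actual code: shows how a Java long can
// carry an unsigned 64-bit hash and be serialized as a base-10 string.
final class UnsignedHashCodec {

    // Serialize the hash as an unsigned decimal string, e.g. for document metadata.
    static String toMetadataValue(long hash) {
        return Long.toUnsignedString(hash);
    }

    // Parse an unsigned decimal string back into the same 64-bit pattern.
    static long fromMetadataValue(String value) {
        return Long.parseUnsignedLong(value);
    }

    // Compare a freshly computed hash against the stored string form.
    static boolean sameHash(long computed, String stored) {
        return Long.compareUnsigned(computed, fromMetadataValue(stored)) == 0;
    }

    public static void main(String[] args) {
        long hash = 0xFFFFFFFFFFFFFFFEL; // -2 when read as a signed long
        String stored = toMetadataValue(hash);
        System.out.println(stored);                  // 18446744073709551614
        System.out.println(sameHash(hash, stored));  // true
    }
}
```

A signed `toString` would have written `-2` here, which an unsignedLong index could not accept, so the unsigned helpers matter for any hash with the top bit set.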
Reviewed changes
Copilot reviewed 9 out of 9 changed files in this pull request and generated 1 comment.
| File | Description |
|---|---|
| test-app/src/main/ml-schemas/tde/incrementalWriteHash.json | Updates TDE view column name/type for the hash to unsignedLong. |
| test-app/src/main/ml-config/databases/content-database.json | Switches range index scalar types for hash fields to unsignedLong. |
| marklogic-client-api/src/main/java/com/marklogic/client/datamovement/filter/IncrementalWriteFilter.java | Changes hash computation/storage to unsigned-long semantics (base-10 string in metadata). |
| marklogic-client-api/src/main/java/com/marklogic/client/datamovement/filter/IncrementalWriteOpticFilter.java | Parses existing hashes as unsigned longs from lexicon results. |
| marklogic-client-api/src/main/java/com/marklogic/client/datamovement/filter/IncrementalWriteViewFilter.java | Uses cts.documentQuery in where and parses hashes as unsigned longs from the view. |
| marklogic-client-api/src/main/java/com/marklogic/client/datamovement/filter/IncrementalWriteEvalFilter.java | Adjusts eval script/response parsing for unsigned long handling. |
| marklogic-client-api/src/test/java/com/marklogic/client/datamovement/filter/IncrementalWriteTest.java | Adds cross-compat tests between eval/optic filters and updates hash validation parsing. |
| marklogic-client-api/src/test/java/com/marklogic/client/datamovement/filter/IncrementalWriteFilterTest.java | Updates unit test to reflect numeric hash inputs/storage. |
| // The hash value is cast to a String based on this analysis from Copilot: | ||
| // "Since the field index is unsignedLong, cts.valueTuples returns an | ||
| // xs:unsignedLong value, which JavaScript represents as an IEEE 754 | ||
| // double. Values above −1 (~9.2 quadrillion) would silently lose precision." | ||
| // It is then cast back to an unsignedLong when the value is retrieved | ||
| // from the JSON response object. | ||
| private static final String EVAL_SCRIPT = """ | ||
| const tuples = cts.valueTuples([cts.uriReference(), cts.fieldReference(hashKeyName)], null, cts.documentQuery(uris)); | ||
| const response = {}; |
The inline comment references "analysis from Copilot" and includes an incorrect/unclear numeric bound ("Values above −1"). Please rewrite this as a source-neutral explanation (e.g., JavaScript numeric precision above 2^53−1) and keep the rationale accurate and concise.
| // The hash field index is xs:unsignedLong, which JavaScript represents as an | |
| // IEEE 754 double. To avoid loss of precision for large integers (e.g., above | |
| // 2^53−1), the value is converted to a String in JavaScript and then parsed | |
| // back to an unsigned long when it is read from the JSON response. | |
| private static final String EVAL_SCRIPT = """ | |
| const tuples = cts.valueTuples([cts.uriReference(), cts.fieldReference(hashKeyName)], null, cts.documentQuery(uris)); | |
| const response = {}; | |
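To make the precision concern concrete, the following sketch simulates a value passing through an IEEE 754 double (which is what a JavaScript number is) and checks whether it comes back unchanged. It is an illustration only: the class and method names are hypothetical, and it does not call MarkLogic or run any server-side JavaScript.

```java
import java.math.BigDecimal;
import java.math.BigInteger;

// Hypothetical demo, not part of the PR: shows why large unsigned hashes
// should travel as strings rather than as double-backed numbers.
final class DoublePrecisionDemo {

    // Largest integer a double can represent without gaps: 2^53 - 1.
    static final long MAX_SAFE_INTEGER = (1L << 53) - 1;

    // Convert an unsigned decimal string to a double and back to an exact integer.
    static BigInteger roundTripThroughDouble(String unsignedDecimal) {
        double asDouble = new BigInteger(unsignedDecimal).doubleValue();
        return new BigDecimal(asDouble).toBigInteger();
    }

    // True when the value is unchanged after passing through a double.
    static boolean survivesDouble(String unsignedDecimal) {
        return roundTripThroughDouble(unsignedDecimal).equals(new BigInteger(unsignedDecimal));
    }

    public static void main(String[] args) {
        System.out.println(survivesDouble("9007199254740991"));     // true  (2^53 - 1)
        System.out.println(survivesDouble("9007199254740993"));     // false (2^53 + 1)
        System.out.println(survivesDouble("18446744073709551614")); // false (2^64 - 2)
        // Parsing the string form directly keeps every bit.
        System.out.println(Long.parseUnsignedLong("18446744073709551614") == -2L); // true
    }
}
```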
Not sure eval will be kept, but all 3 approaches are now properly using an unsignedLong. Also fixed a performance issue in the view filter where "op.in" was being used instead of a "where" + documentQuery.
d6519e4 to f8e8c87