
Add lineage v1beta1 schema #18

Open
jorgee wants to merge 3 commits into main from add-lineage-schema


@jorgee commented May 19, 2026

Summary

  • Adds the JSON Schema (Draft 2020-12) for the lineage data model wire format emitted by nf-lineage (LinEncoder) at lineage/v1beta1/schema.json.
  • Discriminated oneOf over the six LinSerializable subtypes (FileOutput, TaskOutput, TaskRun, Workflow, WorkflowOutput, WorkflowRun), each wrapped in the runtime envelope {version, kind, spec} with version and kind as const discriminators.
  • Shared types (Checksum, DataPath, Parameter, Workflow) deduplicated via $defs/$ref; OffsetDateTime/Path modelled as string (with date-time format where applicable); model withSerializeNulls(true) accommodated via ["X", "null"] typing.
  • Class titles and descriptions sourced from the corresponding GroovyDoc in the model classes.
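
To make the envelope rule concrete, here is a minimal hand-rolled sketch in Python of what the `version` and `kind` discriminators enforce. It is not the schema and not the project's validator. The six kind names come from the list above; the exact `version` string and the helper name `envelope_errors` are assumptions made for illustration.

```python
# Assumption: the const value used for "version" is not shown in this PR text.
SUPPORTED_VERSION = "lineage/v1beta1"

# The six LinSerializable subtypes named in the summary.
KINDS = {
    "FileOutput", "TaskOutput", "TaskRun",
    "Workflow", "WorkflowOutput", "WorkflowRun",
}


def envelope_errors(doc):
    """Return a list of envelope problems; an empty list means the envelope is well formed."""
    if not isinstance(doc, dict):
        return ["document root must be an object"]
    errors = []
    # All three envelope keys are required.
    for key in ("version", "kind", "spec"):
        if key not in doc:
            errors.append(f"missing required key: {key}")
    # version acts as a const discriminator.
    if "version" in doc and doc["version"] != SUPPORTED_VERSION:
        errors.append(f"unsupported version: {doc['version']!r}")
    # kind must be one of the known subtype names.
    if "kind" in doc:
        kind = doc["kind"]
        if not isinstance(kind, str) or kind not in KINDS:
            errors.append(f"unknown kind: {kind!r}")
    # spec carries the subtype body; its fields are not checked in this sketch.
    if "spec" in doc and not isinstance(doc["spec"], dict):
        errors.append("spec must be an object")
    return errors


ok = {"version": SUPPORTED_VERSION, "kind": "TaskRun", "spec": {}}
print(envelope_errors(ok))  # []
```

A bare spec body at the root, an unknown `kind`, or a different `version` each produce at least one error here, which mirrors the three invalid fixtures described under Tests. Checking the contents of `spec` per kind is what the real `oneOf` branches add on top.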

Tests

lineage/v1beta1/tests/:

  • Valid: valid_file_output.json, valid_task_output.json, valid_task_run.json, valid_workflow_run.json. The first three are real .data.json documents from an nf-core/rnaseq run; the workflow_run is trimmed for readability.
  • Invalid: invalid_missing_envelope.json (bare spec body at root), invalid_wrong_kind.json (unknown kind), invalid_wrong_version.json (unsupported version).

Wired lineage/v1beta1 into validate.sh; updated README.md to list the lineage and module schemas.

Test plan

  • ./validate.sh passes locally — confirmed: all 7 lineage cases behave as expected (4 valid pass, 3 invalid rejected); rest of repo unaffected.
  • Additionally validated against 1485 .data.json files from a full nf-core/rnaseq lineage run — all pass.
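
The PR does not say how the bulk check over the rnaseq run was performed. One possible way to do it is sketched below: walk a directory tree, parse every file whose name ends in `.data.json`, and collect the paths that fail a caller-supplied predicate. The function name `check_tree` and the `is_valid` callback are hypothetical; in practice the callback would wrap whatever JSON Schema validator is in use.

```python
import json
import os


def check_tree(root, is_valid):
    """Return the sorted paths of *.data.json files under root that fail to parse or fail is_valid."""
    failures = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            # Matches both "x.data.json" and a file named exactly ".data.json".
            if not name.endswith(".data.json"):
                continue
            path = os.path.join(dirpath, name)
            try:
                with open(path, encoding="utf-8") as handle:
                    doc = json.load(handle)
            except (OSError, json.JSONDecodeError):
                # Unreadable or malformed JSON counts as a failure.
                failures.append(path)
                continue
            if not is_valid(doc):
                failures.append(path)
    return sorted(failures)
```

An empty return value corresponds to the "all pass" result reported above. Returning the failing paths rather than a boolean makes it easier to inspect individual documents when something is rejected.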

Source of truth

The schema is generated by a Gradle task in nextflow-io/nextflow (./gradlew :nf-lineage:generateLineageSchema). When the lineage model changes, regenerate there and re-sync this repo.

Add the JSON Schema for the lineage data model wire format emitted by
nf-lineage (LinEncoder), plus tests covering valid documents for each
LinSerializable subtype and invalid cases (missing envelope, unknown
kind, unsupported version). Wire lineage/v1beta1 into validate.sh and
update README.md to list the lineage and module schemas.

Signed-off-by: jorgee <jorge.ejarque@seqera.io>
@pditommaso (Member) commented

Any chance to make the schema a bit more readable (maybe just asking Claude to reformat it)?

jorgee added 2 commits May 19, 2026 19:46
Signed-off-by: jorgee <jorge.ejarque@seqera.io>
Improves readability of the lineage schema by reordering each $defs
entry so title and description appear before structural keywords.

Signed-off-by: jorgee <jorge.ejarque@seqera.io>
@jorgee (Author) commented May 19, 2026

> Any chance to make the schema a bit more readable (maybe just asking Claude to reformat it)?

I have fixed some issues with arrays split into several lines and put the title and description at the beginning of the definitions.

I also tried to compress the anyOf entries that combine type: null with a $ref, but the enforced linting expands them again. If the applied fixes are not enough, an alternative would be to add extra definitions for these nullable anyOf entries (such as ChecksumNullable). That could improve readability, but it would also add types that are not part of the lineage model, so I am not sure which is better.
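
To illustrate the trade-off, the two shapes can be written side by side as Python dicts that mirror the JSON. This is a sketch only: the `Checksum` body and the `checksum` property are made-up placeholders rather than the real definitions, and `resolve` is a hypothetical helper that handles only local `#/$defs/...` references.

```python
# Placeholder for the shared Checksum definition; the real fields are not shown here.
CHECKSUM = {"type": "object", "properties": {"value": {"type": "string"}}}

# Option A: the nullable reference is spelled out at every use site.
inline = {
    "$defs": {"Checksum": CHECKSUM},
    "type": "object",
    "properties": {
        "checksum": {"anyOf": [{"$ref": "#/$defs/Checksum"}, {"type": "null"}]},
    },
}

# Option B: a helper definition keeps use sites short, but ChecksumNullable
# is a schema-only type with no counterpart in the lineage model.
with_helper = {
    "$defs": {
        "Checksum": CHECKSUM,
        "ChecksumNullable": {"anyOf": [{"$ref": "#/$defs/Checksum"}, {"type": "null"}]},
    },
    "type": "object",
    "properties": {"checksum": {"$ref": "#/$defs/ChecksumNullable"}},
}


def resolve(schema, node):
    """Follow local '#/$defs/<name>' references until a node without a top-level $ref is reached."""
    while "$ref" in node:
        name = node["$ref"].rsplit("/", 1)[-1]
        node = schema["$defs"][name]
    return node


a = resolve(inline, inline["properties"]["checksum"])
b = resolve(with_helper, with_helper["properties"]["checksum"])
print(a == b)  # True
```

Both forms describe the same constraint, so the choice is about readability versus keeping `$defs` limited to types that exist in the model.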
