Avoid changing timestamps in outputs #2065

@donkirkby

Description

Steps to reproduce:

  1. Write a container with an app that generates a PDF from the inputs, for example something similar to the basic examples for the Python libraries reportlab and matplotlib.
  2. Upload the container into Kive, and launch a run.
  3. When the run is finished, rerun it.

Expected behaviour: the outputs should match.

Actual behaviour: PDF outputs usually don't match.
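One way to check whether a rerun's outputs match is to compare checksums of the two files. This is a minimal sketch that assumes both outputs are available as local files. `md5_of` and `files_match` are hypothetical helpers, not part of Kive's API.

```python
import hashlib


def md5_of(path):
    """Return the hex MD5 digest of a file, reading it in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(65536), b""):
            digest.update(chunk)
    return digest.hexdigest()


def files_match(original, rerun):
    """True if the original output and the rerun output are byte-identical."""
    return md5_of(original) == md5_of(rerun)
```

A single differing byte, such as one digit of an embedded timestamp, is enough to make `files_match` return `False`.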

Analysis

Most libraries write a timestamp into the content of each PDF file they generate. That means the exact same input data won't produce byte-identical outputs if the two runs happen at different times.
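To illustrate the problem, here is a toy writer that embeds a `/CreationDate` entry the way real PDF writers do. It is a simplified stand-in (`render_report` is hypothetical and the output is not a valid PDF), but it shows how identical inputs yield different bytes when only the creation time changes.

```python
from datetime import datetime, timezone


def render_report(text, created):
    """Build a toy PDF-like byte string that embeds a creation timestamp
    alongside content derived from the inputs."""
    stamp = created.strftime("D:%Y%m%d%H%M%SZ")  # PDF-style date string
    return (
        b"%PDF-1.4\n"
        + f"1 0 obj << /CreationDate ({stamp}) >> endobj\n".encode("ascii")
        + f"% body: {text}\n".encode("utf-8")
    )


if __name__ == "__main__":
    same_input = "content derived from the inputs"
    first = render_report(same_input, datetime(2024, 1, 1, 12, 0, tzinfo=timezone.utc))
    rerun = render_report(same_input, datetime(2024, 1, 2, 9, 30, tzinfo=timezone.utc))
    print(first == rerun)  # False: only the timestamp differs
```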

Good news, though. It looks like both matplotlib and reportlab support the SOURCE_DATE_EPOCH environment variable that is intended to help make outputs reproducible. If we took the timestamp of the container and passed it to each run in the SOURCE_DATE_EPOCH environment variable, that would probably avoid the problems with PDFs not matching after reruns.
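A sketch of the launcher side, assuming that "the timestamp of the container" means the image file's modification time (the real choice might instead be an upload date stored with the container record). It relies on Singularity's documented behaviour of copying any host variable named `SINGULARITYENV_<NAME>` into the container as `<NAME>`. `container_epoch` and `build_run_env` are hypothetical helper names.

```python
import os


def container_epoch(container_path):
    """Return the container file's modification time as whole seconds
    since the Unix epoch, the format SOURCE_DATE_EPOCH expects."""
    return int(os.stat(container_path).st_mtime)


def build_run_env(container_path, base_env=None):
    """Copy the launcher's environment and add SOURCE_DATE_EPOCH for the run.

    Singularity sets SINGULARITYENV_SOURCE_DATE_EPOCH inside the container
    as SOURCE_DATE_EPOCH, even if the host environment is otherwise cleaned.
    """
    env = dict(os.environ if base_env is None else base_env)
    env["SINGULARITYENV_SOURCE_DATE_EPOCH"] = str(container_epoch(container_path))
    return env
```

The resulting dictionary would be passed as the `env=` argument of `subprocess.run` when launching `singularity run`. Because the value depends only on the container, a rerun with the same container gets the same date.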

Another option is to take the latest date from the container and all the input datasets. I think that would be easier to understand, but I suspect it would be unreliable. For example, suppose we rerun two chained runs, where the output of the first is the input of the second, and we have to regenerate that intermediate output. The regenerated output would have a newer timestamp than the original, so the second run would get a different date than it did the first time.
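The two rules can be compared with a small simulation of the chained-run scenario. The timestamps below are made up for illustration, and both function names are hypothetical.

```python
def latest_input_epoch(container_time, input_times):
    """Alternative rule: the newest of the container and all input datasets."""
    return max([container_time, *input_times])


def container_only_epoch(container_time, input_times):
    """Proposed rule: only the container's timestamp matters."""
    return container_time


if __name__ == "__main__":
    # Hypothetical timestamps, in seconds since the epoch.
    second_container = 1_000
    original_output = 5_000     # when the first run originally produced its output
    regenerated_output = 9_000  # when that output was recreated for the rerun

    for rule in (latest_input_epoch, container_only_epoch):
        before = rule(second_container, [original_output])
        after = rule(second_container, [regenerated_output])
        print(rule.__name__, "stable" if before == after else "changed")
    # latest_input_epoch changed
    # container_only_epoch stable
```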

If a pipeline wants to generate PDFs with the current date as the creation timestamp, it could unset the SOURCE_DATE_EPOCH environment variable before calling the library code.
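Inside a pipeline, opting out could look like the sketch below. Removing the variable before importing the PDF library is the cautious choice, in case a library reads its configuration at import time. `env_without_source_date_epoch` is a hypothetical helper for the case where the library runs in a child process.

```python
import os


def env_without_source_date_epoch(env=None):
    """Return a copy of the environment without SOURCE_DATE_EPOCH, so
    libraries that honour it fall back to the current time."""
    cleaned = dict(os.environ if env is None else env)
    cleaned.pop("SOURCE_DATE_EPOCH", None)
    return cleaned


if __name__ == "__main__":
    # In-process opt-out: drop the variable before importing the PDF library.
    os.environ.pop("SOURCE_DATE_EPOCH", None)
```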

  • Test these two libraries to make sure they support the environment variable.
  • Set the environment variable when launching Singularity.
  • Update the pipeline developer documentation to explain what the environment variable does, and how to disable it.
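For the first task, a small harness could run a generator twice under the same `SOURCE_DATE_EPOCH` and compare the bytes. This is a sketch: `is_reproducible` and the two stand-in generators are hypothetical, and the stand-ins only imitate a library that honours the variable and one that does not.

```python
import os
import tempfile
import uuid
from contextlib import contextmanager
from datetime import datetime, timezone


@contextmanager
def source_date_epoch(value):
    """Temporarily set SOURCE_DATE_EPOCH, restoring the old state afterwards."""
    old = os.environ.get("SOURCE_DATE_EPOCH")
    os.environ["SOURCE_DATE_EPOCH"] = str(value)
    try:
        yield
    finally:
        if old is None:
            os.environ.pop("SOURCE_DATE_EPOCH", None)
        else:
            os.environ["SOURCE_DATE_EPOCH"] = old


def is_reproducible(generate, epoch=1_600_000_000):
    """Call generate(path) twice under the same SOURCE_DATE_EPOCH and
    report whether the two files are byte-identical."""
    with tempfile.TemporaryDirectory() as tmp, source_date_epoch(epoch):
        paths = [os.path.join(tmp, f"out{i}.pdf") for i in range(2)]
        for path in paths:
            generate(path)
        with open(paths[0], "rb") as a, open(paths[1], "rb") as b:
            return a.read() == b.read()


def honours_epoch(path):
    """Stand-in for a library that uses SOURCE_DATE_EPOCH when it is set."""
    seconds = int(os.environ.get("SOURCE_DATE_EPOCH", datetime.now(timezone.utc).timestamp()))
    created = datetime.fromtimestamp(seconds, tz=timezone.utc)
    with open(path, "wb") as f:
        f.write(f"/CreationDate (D:{created:%Y%m%d%H%M%S}Z)".encode("ascii"))


def ignores_epoch(path):
    """Stand-in for a library that embeds a fresh unique ID on every save."""
    with open(path, "wb") as f:
        f.write(f"/ID ({uuid.uuid4().hex})".encode("ascii"))
```

To test the real libraries, `generate` could wrap `fig.savefig(path)` for a matplotlib figure, or create a `reportlab.pdfgen.canvas.Canvas(path)`, draw on it, and call its `save()` method.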
