DAG creation extremely slow with storage function targeting zip files #25

@FabianHofmann

The storage function can lead to very long DAG creation times when it points to zip files hosted online.

The following example shows it quite clearly.

Snakefile:

rule retrieve_eurostat_data:
    input:
        storage(
            "https://ec.europa.eu/eurostat/documents/38154/4956218/Balances-April2023.zip", 
        ),

When running snakemake -n, DAG creation takes longer than two minutes, whereas downloading the same file directly in a browser takes about 20 seconds.
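
To put a number on the dry-run time, a small timing helper along the following lines could be used. This is only a sketch: `time_command` is a hypothetical helper name, not part of Snakemake, and it simply measures the wall-clock time of whatever command it is given.

```python
import subprocess
import time


def time_command(cmd):
    """Run cmd (a list of arguments) and return (elapsed_seconds, return_code).

    Hypothetical helper for illustration; output of the command is discarded.
    """
    start = time.perf_counter()
    result = subprocess.run(
        cmd, stdout=subprocess.DEVNULL, stderr=subprocess.DEVNULL
    )
    return time.perf_counter() - start, result.returncode
```

Calling `time_command(["snakemake", "-n"])` from the directory containing the Snakefile returns the elapsed seconds for the dry run, which can then be compared against the time of a plain download of the same URL.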

I don't know whether this is related to the fact that Snakemake runs the download multiple times even though it is in dry-run mode.
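
One way to check how often the file is actually requested would be to serve a stand-in file from a local HTTP server that counts requests, and point the storage input at it instead of the Eurostat URL. The sketch below uses only the Python standard library; `CountingHandler`, `request_counts`, `start_server` and the 1 KiB `PAYLOAD` are made-up names and values for illustration, and whether the storage function treats a plain http://127.0.0.1 URL the same way as the remote one is an assumption.

```python
import threading
from collections import Counter
from http.server import BaseHTTPRequestHandler, ThreadingHTTPServer

PAYLOAD = b"x" * 1024  # stand-in for the zip file contents (assumption)
request_counts = Counter()  # HTTP method -> number of requests seen
_counts_lock = threading.Lock()


class CountingHandler(BaseHTTPRequestHandler):
    """Serve PAYLOAD for any path and count HEAD and GET requests."""

    def _count_and_send_headers(self, method):
        with _counts_lock:
            request_counts[method] += 1
        self.send_response(200)
        self.send_header("Content-Type", "application/zip")
        self.send_header("Content-Length", str(len(PAYLOAD)))
        self.end_headers()

    def do_HEAD(self):
        # Metadata-only request: headers, no body.
        self._count_and_send_headers("HEAD")

    def do_GET(self):
        # Full download: headers followed by the payload.
        self._count_and_send_headers("GET")
        self.wfile.write(PAYLOAD)

    def log_message(self, format, *args):
        pass  # silence the default per-request logging


def start_server(port=0):
    """Start the server in a background thread; port=0 picks a free port."""
    server = ThreadingHTTPServer(("127.0.0.1", port), CountingHandler)
    threading.Thread(target=server.serve_forever, daemon=True).start()
    return server
```

Started from an interactive Python session, `server.server_address[1]` gives the chosen port. After a dry run against that local URL, `request_counts` would show how many GET (full download) and HEAD (metadata only) requests were made.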

Let me know if there is a way I can help, or if you need more information or context.
