Metadata4ing reporter for Snakemake

This project is based on the Snakemake reporter plugin. It provides a custom reporter plugin for the Metadata4ing ontology, which can be used to extract and report metadata from Snakemake pipelines.

Installation

Install the plugin using pip:

python -m pip install git+https://github.com/izus-fokus/snakemake-report-plugin-metadata4ing

or from the source code:

poetry build
pip install --force-reinstall dist/snakemake_report_plugin_metadata4ing-1.0.0-py3-none-any.whl

Then, use it as the reporter in your Snakemake workflow:

snakemake --reporter metadata4ing ...

Output Format

The reporter creates a ZIP file containing an RO-Crate with the important files from the simulation, such as the input and output files for each rule. It also creates three files:

- provenance.jsonld: knowledge graph based on the Metadata4ing ontology
- provenance.ttl: the same graph as provenance.jsonld, in Turtle format
- ro-crate-metadata.json: Research Object Crate file describing the dataset
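To check what a report contains, the archive can be inspected with the Python standard library. The following is a minimal sketch, not part of the plugin: the helper names `find_metadata_files` and `load_json_member` are hypothetical, and it assumes the three metadata files are stored as regular members of the ZIP file (at any folder depth, but not inside a nested archive).

```python
import json
import zipfile
from pathlib import PurePosixPath

# File names taken from the list of generated files.
EXPECTED = ("provenance.jsonld", "provenance.ttl", "ro-crate-metadata.json")


def find_metadata_files(zip_path: str) -> dict:
    """Map each expected metadata file name to its path inside the archive."""
    found = {}
    with zipfile.ZipFile(zip_path) as archive:
        for member in archive.namelist():
            name = PurePosixPath(member).name
            # Keep the first match only, in case a name occurs more than once.
            if name in EXPECTED and name not in found:
                found[name] = member
    return found


def load_json_member(zip_path: str, member: str):
    """Parse a JSON or JSON-LD member of the archive."""
    with zipfile.ZipFile(zip_path) as archive:
        with archive.open(member) as handle:
            return json.load(handle)
```

For example, `load_json_member(path, find_metadata_files(path)["provenance.jsonld"])` would return the parsed knowledge graph, provided the file is present in the archive.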

Reporter Parameters

`paramscript`

It is possible to pass a script as a parameter extractor. You can write your own extractor in a separate Python script and pass it to the reporter using the paramscript argument:

snakemake --reporter metadata4ing --report-metadata4ing-paramscript /Path_to_Extractor/my_extractor.py ...

Note that your extractor should implement the ParameterExtractorInterface:

from abc import ABC, abstractmethod


class ParameterExtractorInterface(ABC):
    @abstractmethod
    def extract_params(self, rule_name: str, file_path: str) -> dict:
        ...

The extract_params method should return a dictionary where:

- Keys are the names of the corresponding processing steps (or the rule_name).
- Values are dictionaries with two keys, has parameter and investigates. These two keys represent the inputs and outputs of that processing step, respectively. Each of them maps variable names to dictionaries with the following fixed keys:
  - value: parameter value
  - unit: unit of the value (if applicable). It will be mapped to the nearest QUDT unit.
  - json-path: the path to this value in the output JSON
  - data-type: the data type of the value

For example, a simple dictionary could look like this:

{
    "run_simulation": {
        "has parameter": {
            "length": {
                "value": 15,
                "unit": "m",
                "json-path": "/parameters.json/inputs",
                "data-type": "float"
            }
        },
        "investigates": {
            "stress": {
                "value": 1.0,
                "unit": "MPa",
                "json-path": "summary.json",
                "data-type": "float"
            }
        }
    }
}
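An extractor that produces a dictionary of this shape could be sketched as follows. This is an illustration under stated assumptions, not the plugin's own code: the interface is re-declared locally so the example runs on its own, the class name `JsonSummaryExtractor` is hypothetical, and the input file is assumed to be a JSON document with top-level `inputs` and `outputs` objects whose entries carry a `value` and an optional `unit`.

```python
import json
from abc import ABC, abstractmethod


class ParameterExtractorInterface(ABC):
    """Local stand-in for the plugin's interface, so the sketch is self-contained."""

    @abstractmethod
    def extract_params(self, rule_name: str, file_path: str) -> dict:
        ...


class JsonSummaryExtractor(ParameterExtractorInterface):
    """Hypothetical extractor for files like {"inputs": {...}, "outputs": {...}}."""

    @staticmethod
    def _entry(item: dict, json_path: str) -> dict:
        # Build one variable entry with the four fixed keys.
        return {
            "value": item["value"],
            "unit": item.get("unit"),  # None when no unit is given
            "json-path": json_path,
            # Python type name of the value, e.g. "int" or "float".
            "data-type": type(item["value"]).__name__,
        }

    def extract_params(self, rule_name: str, file_path: str) -> dict:
        with open(file_path, encoding="utf-8") as handle:
            summary = json.load(handle)
        return {
            rule_name: {
                "has parameter": {
                    name: self._entry(item, "/inputs/" + name)
                    for name, item in summary.get("inputs", {}).items()
                },
                "investigates": {
                    name: self._entry(item, "/outputs/" + name)
                    for name, item in summary.get("outputs", {}).items()
                },
            }
        }
```

How the reporter locates the extractor class inside the script is not covered here; the sketch only shows the shape of the return value.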

Note that if you provide another name (or even multiple entries) in the output, the reporter adds new nodes (as processing steps) to the given rule. These new nodes are linked to the original processing step through a part-of relation (schema:isPartOf in the example graph below). This is helpful if you have a single summary file that collects all the simulation results (input and output parameters).

For example, if the method is called with a rule_name like run_simulation and the returned dictionary is:

{
    "run_simulation_1": {
        "has parameter": {
            "length": {
                "value": 15,
                "unit": "m",
                "json-path": "/parameters.json/inputs",
                "data-type": "float"
            }
        },
        "investigates": {
            "stress": {
                "value": 1.0,
                "unit": "MPa",
                "json-path": "summary.json",
                "data-type": "float"
            }
        }
    },
    "run_simulation_2": {
        "has parameter": {
            "length": {
                "value": 10,
                "unit": "m",
                "json-path": "/parameters.json/inputs",
                "data-type": "float"
            }
        },
        "investigates": {
            "stress": {
                "value": 2.0,
                "unit": "MPa",
                "json-path": "summary.json",
                "data-type": "float"
            }
        }
    }
}

or, with freely chosen step names such as first_run and second_run:

{
    "first_run": {
        "has parameter": {
            "length": {
                "value": 15,
                "unit": "m",
                "json-path": "/parameters.json/inputs",
                "data-type": "float"
            }
        },
        "investigates": {
            "stress": {
                "value": 1.0,
                "unit": "MPa",
                "json-path": "summary.json",
                "data-type": "float"
            }
        }
    },
    "second_run": {
        "has parameter": {
            "length": {
                "value": 10,
                "unit": "m",
                "json-path": "/parameters.json/inputs",
                "data-type": "float"
            }
        },
        "investigates": {
            "stress": {
                "value": 2.0,
                "unit": "MPa",
                "json-path": "summary.json",
                "data-type": "float"
            }
        }
    }
}

Then in the final graph we have:

local:processing_step_* a schema:Action ;
    rdfs:label "run_simulation" ;
    .....

local:processing_step_** a schema:Action ;
    rdfs:label "first_run" ;
    schema:isPartOf local:processing_step_* ;
    .....

local:processing_step_*** a schema:Action ;
    rdfs:label "second_run" ;
    schema:isPartOf local:processing_step_* ;
    .....

A sample extractor is provided in the sample_extractor directory of this repository.
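
Before passing a custom extractor to the reporter, it can help to check its return value against the structure described in this section. The following is a minimal sketch; the function `find_problems` is hypothetical and not part of the plugin. It assumes that value, json-path, and data-type are always required, while unit may be omitted because it only applies to some values.

```python
# Keys every processing step is expected to provide.
STEP_KEYS = ("has parameter", "investigates")
# Keys treated as required for each variable (assumption: "unit" is optional).
REQUIRED_ENTRY_KEYS = ("value", "json-path", "data-type")


def find_problems(params: dict) -> list:
    """Return readable problem descriptions; an empty list means the shape is fine."""
    problems = []
    for step, sections in params.items():
        for section in STEP_KEYS:
            if section not in sections:
                problems.append(f"{step}: missing '{section}'")
                continue
            for variable, entry in sections[section].items():
                for key in REQUIRED_ENTRY_KEYS:
                    if key not in entry:
                        problems.append(
                            f"{step}/{section}/{variable}: missing '{key}'"
                        )
    return problems
```

Calling `find_problems(extractor.extract_params(rule_name, file_path))` in a small test gives early feedback without running the whole workflow.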

`filename`

The name of the final ZIP file. If not provided, it defaults to ro-crate-metadata-{simulation_hash}.zip, where simulation_hash is a 16-character hash computed from the content of the graph.

snakemake --reporter metadata4ing --report-metadata4ing-filename MyFile ...
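
The default name depends only on the graph content, so identical graphs lead to identical file names. The sketch below illustrates that idea and nothing more: the plugin's actual hash function and graph serialization are not stated here, so the use of SHA-256 over a text serialization and the helper name `default_zip_name` are assumptions.

```python
import hashlib


def default_zip_name(graph_text: str) -> str:
    """Build a file name from the first 16 hex characters of a content hash."""
    # Assumption: SHA-256 over the UTF-8 encoded graph serialization.
    digest = hashlib.sha256(graph_text.encode("utf-8")).hexdigest()
    return f"ro-crate-metadata-{digest[:16]}.zip"
```

Under these assumptions, two runs that produce the same serialized graph map to the same archive name, and any change to the graph changes the name.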
