eggd_pandora is a variant sharing app that takes variant data from a source and submits it to a variant database. It can work in two ways:
- Take a case in OpenCGA and submit the variants that have been interpreted for the proband to the DECIPHER database
- Take a csv file of interpreted variants and submit these variants to the ClinVar database (typically this csv is made by the variant workbook parser script that gets interpreted variants from an Excel workbook; an example csv is here).
- --running_mode: (str) mode that eggd_pandora should run in:
  - "decipher" - pull from OpenCGA and push to DECIPHER
  - "clinvar" - take in a variant csv and submit all the variants in it to ClinVar
  - "get_clinvar_accession" - take a ClinVar submission ID and retrieve the accession ID
- --decipher_api_keys: (file) DNAnexus link to JSON file containing the client key and user key to access the API for the DECIPHER project to which the variants should be submitted
- --opencga_config: (file) DNAnexus link to JSON file containing the user and password for OpenCGA
- --opencga_case_id: (string) the case ID on OpenCGA for the case that is to be submitted to DECIPHER
- --opencga_study_name: (string) the name of the OpenCGA study containing the case that is to be submitted to DECIPHER
- --decipher_submitter_id: (int) the DECIPHER account ID of the submitter
- --variant_csv: (file) Variant .csv file with data that should be converted to a JSON
- --clinvar_api_key: (file) File containing ClinVar API key
- --clinvar_testing: (bool) whether to use the ClinVar test endpoint (True) or live endpoint (False)
- --clinvar_api_key: (file) File containing ClinVar API key
- --submission_ids_file: (file) File containing ClinVar submission IDs. Example format:
Local_ID ClinVar_Submission_ID
uid_1707758278575948778 SUB14474946
uid_1707745813424903959 SUB14471647
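
As an illustration of the format above, the following is a minimal sketch of how such a file could be read into a mapping from local ID to ClinVar submission ID. It assumes the columns are separated by whitespace (spaces or tabs) and that the header row is exactly as shown; `read_submission_ids` is a hypothetical helper, not a function taken from the eggd_pandora scripts.

```python
import io

# Header row as shown in the example format above.
EXPECTED_HEADER = ["Local_ID", "ClinVar_Submission_ID"]


def read_submission_ids(handle):
    """Return {local_id: submission_id} from a whitespace-delimited file.

    Hypothetical helper: assumes a header row followed by two-column rows.
    """
    rows = [line.split() for line in handle if line.strip()]
    if not rows or rows[0] != EXPECTED_HEADER:
        raise ValueError("missing or unexpected header row")

    mapping = {}
    for fields in rows[1:]:
        if len(fields) != 2:
            raise ValueError(f"expected 2 columns, got {len(fields)}: {fields}")
        local_id, submission_id = fields
        mapping[local_id] = submission_id
    return mapping


# The example rows from this README, read from an in-memory file.
example = io.StringIO(
    "Local_ID ClinVar_Submission_ID\n"
    "uid_1707758278575948778 SUB14474946\n"
    "uid_1707745813424903959 SUB14471647\n"
)
submission_ids = read_submission_ids(example)
```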
This app runs the script pandora.sh which can run both DECIPHER and ClinVar variant submissions.\
In "decipher" running mode, pandora.sh will run pull_from_opencga.py, which extracts the necessary information for the case to be submitted to DECIPHER and outputs it in a JSON called case_phenotype_and_variant_data.json. This JSON is then passed to push_to_decipher.py which reformats this information and submits it to DECIPHER.\
In "clinvar" running mode, pandora.sh will run pull_from_csv.py to extract the necessary information for submission to ClinVar from a csv of variant data and export a JSON for each variant. These variant JSONs are then passed to push_to_clinvar.py and the variant data is submitted to ClinVar. push_to_clinvar.py runs get_clinvar_accession.py, which queries the ClinVar API to retrieve the accession ID for that submission. The script queries the API every five minutes and quits as soon as it retrieves an accession ID. If ClinVar has not generated an accession ID within an hour, the script stops trying and quits.\
In "get_clinvar_accession" running mode, pandora.sh will run get_clinvar_accession.py, which queries the ClinVar API to retrieve the accession ID for the submission IDs in the input file. The script queries the API every five minutes and quits as soon as it retrieves an accession ID. If ClinVar has not generated an accession ID within an hour, the script stops trying and quits.
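
The polling behaviour described above (query every five minutes, give up after an hour) can be sketched roughly as follows. This is not the code in get_clinvar_accession.py: `poll_for_accession` and `fetch_accession` are hypothetical names, and the actual ClinVar API call is deliberately left to the caller. Only the five-minute interval and one-hour limit come from the description above.

```python
import time


def poll_for_accession(
    fetch_accession,
    interval_s=5 * 60,    # query every five minutes
    timeout_s=60 * 60,    # give up after one hour
    sleep=time.sleep,
    clock=time.monotonic,
):
    """Call fetch_accession() until it returns an ID or the time limit passes.

    fetch_accession is a caller-supplied function (hypothetical) that queries
    ClinVar once and returns the accession ID, or None if none exists yet.
    Returns the accession ID, or None if the time limit is reached.
    """
    deadline = clock() + timeout_s
    while True:
        accession = fetch_accession()
        if accession is not None:
            return accession
        # Stop if the next attempt would fall after the deadline.
        if clock() + interval_s > deadline:
            return None
        sleep(interval_s)
```

Passing `sleep` and `clock` as parameters is a design choice for this sketch: it lets the loop be exercised with a fake clock instead of waiting for real minutes to pass.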
In DECIPHER mode:
This app creates a DECIPHER patient record for a case in OpenCGA and adds HPO phenotype terms and interpreted variants. The eggd_pandora app will create a new proband patient record in DECIPHER if the patient does not already exist, or add the variants to an existing patient. The app outputs a link to the new or updated patient record.\
In ClinVar mode:
Adds the variants in the input csv to ClinVar. Outputs a tsv with the local ID and the ClinVar accession ID for each variant.\
In ClinVar accession mode:
Outputs a tsv with the local ID and the ClinVar accession ID for each variant.
- eggd_pandora for DECIPHER only works for SNVs, indels, insertions and deletions.
- eggd_pandora for DECIPHER assumes that patients that have their sex recorded in OpenCGA as "male" are 46XY and as "female" are 46XX.
- DECIPHER only accepts sequence variants less than 100 bp in length.
- DECIPHER only accepts variants on build GRCh38.
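
The limitations above can be expressed as simple pre-submission checks. The sketch below is illustrative only and is not taken from the eggd_pandora scripts: the function names are hypothetical, and it assumes that a variant's length is the longer of its reference and alternate alleles, which may differ from how DECIPHER measures length.

```python
# Karyotype assumed from the sex recorded in OpenCGA, as stated above.
SEX_TO_KARYOTYPE = {"male": "46XY", "female": "46XX"}

# DECIPHER only accepts sequence variants less than 100 bp in length.
MAX_VARIANT_LENGTH_BP = 100

REQUIRED_BUILD = "GRCh38"


def assumed_karyotype(sex):
    """Return the assumed karyotype, or None if sex is not 'male'/'female'."""
    return SEX_TO_KARYOTYPE.get(sex)


def decipher_problems(ref, alt, build):
    """List the reasons a variant would not be accepted (empty if none).

    Assumption: length is taken as the longer of the ref and alt alleles.
    """
    problems = []
    if build != REQUIRED_BUILD:
        problems.append(f"build is {build}, expected {REQUIRED_BUILD}")
    length = max(len(ref), len(alt))
    if length >= MAX_VARIANT_LENGTH_BP:
        problems.append(f"variant is {length} bp, must be under 100 bp")
    return problems
```

For example, `decipher_problems("A", "T", "GRCh38")` returns an empty list, while a 100 bp deletion on GRCh37 would be reported with two problems.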