Update MarkerScan for compatibility with v2 of the NCBI API#15
Update MarkerScan for compatibility with v2 of the NCBI API#15prototaxites wants to merge 14 commits intomainfrom
Conversation
- use lightweight custom class reimplementing required functionality with the NCBI API - tidy up some logic with counting loops by just returning counts from the API - use a bit of pathlib to do some path stuff - re-format modified scripts with Ruff
…ersion of snakemake
Update MarkerScan for compatibility with v2 of the NCBI API
|
Hi Jim. An issue I had with the pipeline is that it downloads everything in its own space, essentially duplicating downloads we already have in
|
|
I am sure that would be doable - but I don't know any Snakemake or really understand very well the codebase here, so it would take a bit of effort to resolve |
muffato
left a comment
There was a problem hiding this comment.
- Use
subprocess.check_callrather thanos.systemto check the return code - Commands that don't need wildcards, redirections, etc, should be defined as an array rather than a string
| def get_request(self, command: str) -> requests.Response: | ||
| """ | ||
| Returns a requests response object for a given NCBI api query (command). | ||
|
|
||
| args: | ||
| command -> str: request to append to NCBI API URI | ||
| """ | ||
| if self.ncbi_api_key != "": | ||
| headers = {"Accept": "application/json", "api-key": f"{self.ncbi_api_key}"} | ||
| else: | ||
| headers = {"Accept": "application/json"} | ||
|
|
||
| response = requests.get(self.ncbi_api_uri + command, headers=headers) | ||
|
|
||
| if response.status_code != 200: | ||
| raise Exception( | ||
| f"Cannot connect to NCBI (status code '{str(response.status_code)}')'" | ||
| ) | ||
|
|
||
| return response |
There was a problem hiding this comment.
I'd suggest you copy https://github.com/sanger-tol/blobtoolkit/blob/0.9.0/bin/generate_config.py#L34-L42 into here, so that you can recover from transient network errors. (maybe adapt it for the POST request in download_genomes)
The primary function of this PR is to update MarkerScan to be compatible with v2 of the NCBI API, replacing the dependency on
ncbi-datasets-pylibwhich is deprecated.To do this, I have added a small class in
scripts/NcbiApi.pywhich replicates the required functionality using therequestslibrary. In implementing this into the various scripts, I've also made a few changes to improve efficiency, including determining the number of assemblies directly from the API rather than counting them in a loop.In order to run, currently you must set the environment variable NCBI_API_KEY in order to access the API. I don't know enough Snakemake to modify the Snakefile, but I've also added command-line arguments (
-k) to the affected Python scripts to allow this to be specified manually.In trying to get the modified version of the pipeline to run, I have also made a number of small changes:
map-hifioption used in the pipeline (are there an alternate set of env files somewhere?)