Commit bd0a02e

Author: Jon Palmer
Add annorefine integration for RNA-seq based gene prediction and UTR refinement

Major Features:
- Integrate annorefine library for RNA-seq alignment processing
- Add --bam and --bam-library parameters to predict command
- Generate hints from BAM files for ab initio gene predictors
- Refine UTR models using RNA-seq coverage data

Predict Module Changes:
- Add BAM file validation and library type checking
- Generate BAM-derived hints using annorefine.bam2hints()
- Merge BAM hints with protein/transcript alignment hints
- Pass unified hints to Augustus and GeneMark predictors
- Apply UTR refinement to consensus gene models using annorefine.refine()
- Add contig name mapping for BAM processing (inv_contig_map)
- Add --min-intron parameter (default: 11bp)
- Save hints files to predict_misc for debugging

Utilities Changes:
- Add sort_features parameter to filter_and_write_gff3() for gfftk compatibility
- Implement sort_gene_features() to sort CDS/exon features by genomic position
- Change memory logging from info to debug level to reduce verbosity
- Improve subprocess logging for cleaner output

Testing:
- Add comprehensive database URL validation test suite
- Test all URLs in downloads.json without downloading large files
- Add weekly GitHub Actions workflow to monitor URL availability
- Auto-create issues when URLs become unreachable

Dependencies:
- Add annorefine>=2026.2.9 dependency
- Update gfftk to >=26.2.12 (includes CDS phase sorting fix)
- Update buscolite to >=26.1.26
- Update pyhmmer to >=0.12.0
- Update Python requirement to >=3.9.0,<3.14

Database URLs:
- Update dbCAN URLs to V14 (new server location)
- Fix dbCAN URL paths for V12 and V11 resources

Other Changes:
- Add annorefine version to system info logging
- Update pyhmmer API calls (remove .decode() for string fields)
- Add debug files to .gitignore
- Update version to 26.2.12

This release enables RNA-seq guided gene prediction and UTR refinement, significantly improving annotation quality when RNA-seq data is available.

1 parent 20b0286 commit bd0a02e

13 files changed

Lines changed: 748 additions & 41 deletions


Lines changed: 83 additions & 0 deletions
@@ -0,0 +1,83 @@
+name: Check Database URLs
+
+on:
+  schedule:
+    # Run weekly on Monday at 2 AM UTC
+    - cron: '0 2 * * 1'
+  workflow_dispatch:  # Allow manual triggering
+  push:
+    paths:
+      - 'funannotate2/downloads.json'  # Run when downloads.json is updated
+
+jobs:
+  check-urls:
+    runs-on: ubuntu-latest
+
+    steps:
+      - uses: actions/checkout@v4
+
+      - name: Set up Python
+        uses: actions/setup-python@v5
+        with:
+          python-version: '3.11'
+          cache: 'pip'
+
+      - name: Install dependencies
+        run: |
+          python -m pip install --upgrade pip
+          python -m pip install pytest requests
+
+      - name: Check database URLs
+        run: |
+          pytest tests/unit/test_database_urls.py -v --tb=short
+
+      - name: Create issue if URLs are broken
+        if: failure()
+        uses: actions/github-script@v7
+        with:
+          script: |
+            const title = '🔗 Database URLs are unreachable';
+            const body = `## Database URL Check Failed
+
+            The automated weekly check for database URLs has failed. One or more URLs in \`funannotate2/downloads.json\` are not reachable.
+
+            **Workflow run:** ${{ github.server_url }}/${{ github.repository }}/actions/runs/${{ github.run_id }}
+
+            Please check the workflow logs for details on which URLs are failing and update \`funannotate2/downloads.json\` accordingly.
+
+            ### Common causes:
+            - Database has moved to a new URL
+            - Database server is temporarily down
+            - Database version has been updated
+            - Network connectivity issues
+
+            ---
+            *This issue was automatically created by the [Check Database URLs workflow](${{ github.server_url }}/${{ github.repository }}/blob/main/.github/workflows/check-database-urls.yml)*`;
+
+            // Check if an issue already exists
+            const issues = await github.rest.issues.listForRepo({
+              owner: context.repo.owner,
+              repo: context.repo.repo,
+              state: 'open',
+              labels: 'database-urls'
+            });
+
+            if (issues.data.length === 0) {
+              // Create new issue
+              await github.rest.issues.create({
+                owner: context.repo.owner,
+                repo: context.repo.repo,
+                title: title,
+                body: body,
+                labels: ['database-urls', 'bug', 'automated']
+              });
+            } else {
+              // Add comment to existing issue
+              await github.rest.issues.createComment({
+                owner: context.repo.owner,
+                repo: context.repo.repo,
+                issue_number: issues.data[0].number,
+                body: `Database URLs are still failing as of ${new Date().toISOString()}\n\n**Workflow run:** ${{ github.server_url }}/${{ github.repository }}/actions/runs/${{ github.run_id }}`
+              });
+            }
+
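
The commit message says the new test suite checks every URL in downloads.json without downloading large files, but tests/unit/test_database_urls.py itself is not part of this excerpt. One common way to do that is an HTTP HEAD request, which asks for headers only. The sketch below is an illustration under that assumption, not the project's test code: `head_status`, `unreachable_urls`, and the injectable `opener` parameter are hypothetical names, and it uses only the standard library rather than the `requests` package the workflow installs.

```python
import urllib.request
from urllib.error import URLError


def head_status(url, timeout=30, opener=urllib.request.urlopen):
    """Return the HTTP status of a HEAD request, or None if the URL is unreachable.

    `opener` is a hypothetical injection point so the check can be exercised
    without network access; by default it is urllib's own urlopen.
    """
    request = urllib.request.Request(url, method="HEAD")
    try:
        with opener(request, timeout=timeout) as response:
            return response.status
    except (URLError, OSError):
        # HTTPError (4xx/5xx) is a subclass of URLError, so it lands here too.
        return None


def unreachable_urls(downloads, opener=urllib.request.urlopen):
    """Return the {name: url} entries whose HEAD request did not succeed."""
    failed = {}
    for name, url in downloads.items():
        status = head_status(url, opener=opener)
        if status is None or status >= 400:
            failed[name] = url
    return failed
```

Some servers reject HEAD requests, so a real check may need to fall back to a ranged or streamed GET; that fallback is left out here to keep the sketch short.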

.gitignore

Lines changed: 3 additions & 0 deletions
@@ -9,3 +9,6 @@ tests/integration/__pycache__*
 output.json
 .coverage
 annotations.txt
+debug_*.fa
+debug_*.gff3
+quick_genemark_prediction_test.sh

CITATION.cff

Lines changed: 3 additions & 3 deletions
@@ -1,4 +1,4 @@
-cff-version: version = "25.11.1"
+cff-version: version = "26.2.12"
 title: 'funannotate2: eukaryotic genome annotation'
 message: >-
   If you use this software, please cite it using the
@@ -17,5 +17,5 @@ keywords:
   - functional annotation
   - consensus gene models
 license: BSD-2-Clause
-version: version = "25.11.1"
-date-released: '2025-11-02'
+version: version = "26.2.12"
+date-released: '2026-02-13'

docs/installation.rst

Lines changed: 1 addition & 1 deletion
@@ -24,7 +24,7 @@ Funannotate2 has the following dependencies:
 * gfftk (>=24.10.29)
 * BUSCOlite (>=24.7.29)
 * gapmm2
-* pyhmmer (>=0.10.15)
+* pyhmmer (>=0.12.0)
 * pyfastx (>=2.0.0)
 * requests
 * gb-io (>=0.3.2)

funannotate2/__main__.py

Lines changed: 22 additions & 1 deletion
@@ -230,6 +230,20 @@ def predict_subparser(subparsers):
         nargs="+",
         metavar="",
     )
+    optional_args.add_argument(
+        "-b",
+        "--bam",
+        required=False,
+        help="RNA-seq alignment file in BAM format for generating hints and UTR refinement",
+        metavar="",
+    )
+    optional_args.add_argument(
+        "--bam-library",
+        required=False,
+        choices=["RF", "FR", "UU"],
+        help="RNA-seq library type (required if --bam is set): RF (dUTP/NSR), FR (Ligation), UU (Unstranded)",
+        metavar="",
+    )
     optional_args.add_argument(
         "-c",
         "--cpus",
@@ -239,7 +253,14 @@ def predict_subparser(subparsers):
         metavar="",
     )
     optional_args.add_argument(
-        "-mi",
+        "--min-intron",
+        dest="min_intron",
+        help="Minimum intron length",
+        type=int,
+        default=11,
+        metavar="",
+    )
+    optional_args.add_argument(
         "--max-intron",
         dest="max_intron",
         help="Maximum intron length",
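
The help text says --bam-library is required if --bam is set, yet both options are declared with required=False, so argparse alone will not enforce the dependency. The commit message mentions library type checking in the predict module, but that code is not in this excerpt. The sketch below shows one way such a rule could be checked right after parsing. `build_parser` and `parse_predict_args` are hypothetical names, the parser is a cut-down stand-in for the real predict subparser, and `metavar` is left at its default for brevity.

```python
import argparse


def build_parser():
    """Build a reduced stand-in for the predict subparser (illustration only)."""
    parser = argparse.ArgumentParser(prog="predict-sketch")
    parser.add_argument("-b", "--bam", required=False)
    parser.add_argument("--bam-library", required=False, choices=["RF", "FR", "UU"])
    parser.add_argument("--min-intron", dest="min_intron", type=int, default=11)
    return parser


def parse_predict_args(argv):
    """Parse argv and reject --bam given without --bam-library."""
    parser = build_parser()
    args = parser.parse_args(argv)
    if args.bam and not args.bam_library:
        # parser.error prints the usage line and exits with status 2.
        parser.error("--bam-library is required when --bam is set")
    return args
```

Checking after `parse_args` keeps both options optional on their own, so runs without RNA-seq data are unaffected.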

funannotate2/downloads.json

Lines changed: 3 additions & 3 deletions
@@ -3,9 +3,9 @@
     "uniprot": "https://ftp.ebi.ac.uk/pub/databases/uniprot/current_release/knowledgebase/complete/uniprot_sprot.fasta.gz",
     "uniprot-release": "https://ftp.ebi.ac.uk/pub/databases/uniprot/current_release/knowledgebase/complete/reldate.txt",
     "merops": "https://ftp.ebi.ac.uk/pub/databases/merops/current_release/meropsscan.lib",
-    "dbCAN": "https://bcb.unl.edu/dbCAN2/download/Databases/V13/dbCAN-HMMdb-V13.txt",
-    "dbCAN-tsv": "https://bcb.unl.edu/dbCAN2/download/Databases/V12/CAZyDB.08062022.fam-activities.txt",
-    "dbCAN-log": "https://bcb.unl.edu/dbCAN2/download/Databases/V11/readme.txt",
+    "dbCAN": "http://dbcan-hcc.unl.edu/download/dbCAN-HMMdb-V14.txt",
+    "dbCAN-tsv": "http://dbcan-hcc.unl.edu/download/Databases/V12/CAZyDB.08062022.fam-activities.txt",
+    "dbCAN-log": "http://dbcan-hcc.unl.edu/download/Databases/V11/readme.txt",
     "pfam": "https://ftp.ebi.ac.uk/pub/databases/Pfam/current_release/Pfam-A.hmm.gz",
     "pfam-tsv": "https://ftp.ebi.ac.uk/pub/databases/Pfam/current_release/Pfam-A.clans.tsv.gz",
     "pfam-log": "https://ftp.ebi.ac.uk/pub/databases/Pfam/current_release/Pfam.version.gz",

funannotate2/log.py

Lines changed: 8 additions & 2 deletions
@@ -5,6 +5,7 @@

 import buscolite
 import gfftk
+import annorefine

 from .__init__ import __version__
 from .utilities import human_readable_size
@@ -110,11 +111,12 @@ def system_info(log):
     - None
     """
     log(
-        "Python v{}; funannotate2 v{}; gfftk v{}; buscolite v{}".format(
+        "Python v{}; funannotate2 v{}; gfftk v{}; buscolite v{}; annorefine v{}".format(
             platform.python_version(),
             __version__,
             gfftk.__version__,
             buscolite.__version__,
+            annorefine.__version__,
         )
     )

@@ -136,7 +138,11 @@ def finishLogging(log, module):
     """
     current, peak = tracemalloc.get_traced_memory()
     tracemalloc.stop()
-    log("{} module finished: peak memory usage={}".format(module, human_readable_size(peak)))
+    log(
+        "{} module finished: peak memory usage={}".format(
+            module, human_readable_size(peak)
+        )
+    )


 def log_dependencies(script=False):
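
The finishLogging hunk only re-wraps the log call. Because `log` is passed in as a callable, the info-to-debug change named in the commit message presumably happens at the call sites, which are not in this excerpt. The sketch below shows the same tracemalloc pattern in isolation. `finish_logging` mirrors the function in the diff under a different name, and `human_readable_size` here is a hypothetical stand-in for the helper in funannotate2.utilities, whose real output format is not shown in the source.

```python
import tracemalloc


def human_readable_size(num_bytes):
    """Hypothetical stand-in: format a byte count using binary units."""
    size = float(num_bytes)
    for unit in ("B", "KiB", "MiB", "GiB"):
        if size < 1024 or unit == "GiB":
            return "{:.2f} {}".format(size, unit)
        size /= 1024


def finish_logging(log, module):
    """Report peak traced memory through the supplied log callable, then stop tracing."""
    current, peak = tracemalloc.get_traced_memory()
    tracemalloc.stop()
    log(
        "{} module finished: peak memory usage={}".format(
            module, human_readable_size(peak)
        )
    )
```

Passing `logger.debug` rather than `logger.info` as `log` would be enough to move the message to debug level without touching this function.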
