Commit 5ecb439

v0.9.0
1 parent 63bccdf commit 5ecb439

64 files changed

Lines changed: 2007 additions & 932 deletions

.github/workflows/docker-image.yml

Lines changed: 1 addition & 1 deletion
@@ -31,5 +31,5 @@ jobs:

 - name: Build the Docker image
 run: |
-docker build . --file docker/remeta.dockerfile --tag ghcr.io/${{ env.REPO_NAME }}/remeta:v${{ steps.version.outputs.version }}
+docker build . --file docker/remeta.ubuntu22.dockerfile --tag ghcr.io/${{ env.REPO_NAME }}/remeta:v${{ steps.version.outputs.version }}
 docker push ghcr.io/${{ env.REPO_NAME }}/remeta:v${{ steps.version.outputs.version }}

CMakeLists.txt

Lines changed: 3 additions & 1 deletion
@@ -20,6 +20,8 @@ set(CMAKE_CXX_FLAGS "-O3 -Wall -ffast-math -fvisibility=hidden -fopenmp")
 add_definitions(-DVERSION_NUMBER="v${RM_VERSION}-${GIT_COMMIT}")

 add_executable(remeta
+${CMAKE_SOURCE_DIR}/src/io/allele_freq_writer.cpp
+${CMAKE_SOURCE_DIR}/src/io/anno_reader.cpp
 ${CMAKE_SOURCE_DIR}/src/io/bgz_reader.cpp
 ${CMAKE_SOURCE_DIR}/src/io/bgz_writer.cpp
 ${CMAKE_SOURCE_DIR}/src/io/block_pgen_reader.cpp
@@ -34,6 +36,7 @@ add_executable(remeta
 ${CMAKE_SOURCE_DIR}/src/io/ref_ld_matrix_reader.cpp
 ${CMAKE_SOURCE_DIR}/src/io/ref_ld_matrix_writer.cpp
 ${CMAKE_SOURCE_DIR}/src/io/regenie_anno_reader.cpp
+${CMAKE_SOURCE_DIR}/src/io/tabixed_anno_reader.cpp
 ${CMAKE_SOURCE_DIR}/src/meta/es_meta_analyzer.cpp
 ${CMAKE_SOURCE_DIR}/src/meta/genep_meta_analyzer.cpp
 ${CMAKE_SOURCE_DIR}/src/meta/htp_meta_analyzer.cpp
@@ -49,7 +52,6 @@ add_executable(remeta
 ${CMAKE_SOURCE_DIR}/src/logging.cpp
 ${CMAKE_SOURCE_DIR}/src/parameter_checks.cpp
 ${CMAKE_SOURCE_DIR}/src/remeta.cpp
-${CMAKE_SOURCE_DIR}/src/run_acatv.cpp
 ${CMAKE_SOURCE_DIR}/src/run_compute_ref_ld.cpp
 ${CMAKE_SOURCE_DIR}/src/run_esma.cpp
 ${CMAKE_SOURCE_DIR}/src/run_genep.cpp

LICENSE

Lines changed: 24 additions & 0 deletions
@@ -12,6 +12,30 @@ furnished to do so, subject to the following conditions:
 The above copyright notice and this permission notice shall be included in all
 copies or substantial portions of the Software.

+THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
+IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
+FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
+AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
+LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
+OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
+SOFTWARE.
+
+-------------------------------------------------------------------------------
+
+This software uses code from REGENIE which is licensed under the MIT License:
+
+Copyright (c) 2020-2021 Joelle Mbatchou, Andrey Ziyatdinov & Jonathan Marchini
+
+Permission is hereby granted, free of charge, to any person obtaining a copy
+of this software and associated documentation files (the "Software"), to deal
+in the Software without restriction, including without limitation the rights
+to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
+copies of the Software, and to permit persons to whom the Software is
+furnished to do so, subject to the following conditions:
+
+The above copyright notice and this permission notice shall be included in all
+copies or substantial portions of the Software.
+
 THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
 IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
 FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE

VERSION

Lines changed: 1 addition & 1 deletion
@@ -1 +1 @@
-0.7.1
+0.9.0

docker/remeta.ubuntu20.dockerfile

Lines changed: 100 additions & 0 deletions
@@ -0,0 +1,100 @@
+FROM ubuntu:20.04 AS builder
+
+ENV CMAKE_VERSION 3.10
+ENV CMAKE_VERSION_PATCH 0
+ENV HTSLIB_VERSION 1.20
+ENV TMP_DIR /tmp
+
+COPY .git ${TMP_DIR}/remeta/.git
+COPY lib ${TMP_DIR}/remeta/lib
+COPY src ${TMP_DIR}/remeta/src
+COPY CMakeLists.txt ${TMP_DIR}/remeta/CMakeLists.txt
+COPY VERSION ${TMP_DIR}/remeta/VERSION
+
+ADD http://cmake.org/files/v${CMAKE_VERSION}/cmake-${CMAKE_VERSION}.${CMAKE_VERSION_PATCH}-Linux-x86_64.sh cmake_install.sh
+ADD https://apt.repos.intel.com/intel-gpg-keys/GPG-PUB-KEY-INTEL-SW-PRODUCTS.PUB intel_key.PUB
+
+ARG DEBIAN_FRONTEND=noninteractive
+ENV TZ=Etc/UTC
+
+RUN apt-get update \
+&& apt-get install -y --no-install-recommends \
+g++ \
+make \
+gnupg \
+gpg-agent \
+wget \
+bzip2 \
+apt-transport-https \
+ca-certificates \
+git-all \
+zlib1g-dev \
+libboost-all-dev \
+libz-dev \
+libbz2-dev \
+liblzma-dev \
+libcurl4-openssl-dev \
+libssl-dev \
+libgomp1 \
+libdeflate-dev \
+&& sh -c 'cat intel_key.PUB | gpg --dearmor | tee /usr/share/keyrings/oneapi-archive-keyring.gpg > /dev/null' \
+&& sh -c 'echo "deb [signed-by=/usr/share/keyrings/oneapi-archive-keyring.gpg] https://apt.repos.intel.com/oneapi all main" | tee /etc/apt/sources.list.d/oneAPI.list' \
+&& apt-get update \
+&& apt-get install -y --no-install-recommends intel-oneapi-mkl-devel \
+&& . /opt/intel/oneapi/setvars.sh \
+&& echo "MKL_THREADING_LAYER=GNU" >> /etc/environment \
+&& sh cmake_install.sh --prefix=/usr/local --skip-license --exclude-subdir \
+&& mkdir -p $TMP_DIR \
+&& wget -q --no-check-certificate https://gitlab.com/libeigen/eigen/-/archive/3.4.0/eigen-3.4.0.tar.bz2 \
+&& tar -xf eigen-3.4.0.tar.bz2 -C /usr/local/lib \
+&& rm eigen-3.4.0.tar.bz2 \
+&& wget -q --no-check-certificate https://github.com/samtools/htslib/releases/download/$HTSLIB_VERSION/htslib-$HTSLIB_VERSION.tar.bz2 \
+&& tar -xf htslib-$HTSLIB_VERSION.tar.bz2 -C $TMP_DIR \
+&& rm htslib-$HTSLIB_VERSION.tar.bz2 \
+&& cd $TMP_DIR/htslib-$HTSLIB_VERSION/ \
+&& ./configure \
+&& make \
+&& make install \
+&& cp tabix /usr/local/bin/tabix \
+&& cd $TMP_DIR/remeta/lib/pgenlib \
+&& make clean \
+&& cd $TMP_DIR/remeta/lib/faddeeva \
+&& make clean \
+&& cd $TMP_DIR/remeta/lib/qfc \
+&& make clean \
+&& cd $TMP_DIR/remeta \
+&& cmake -D EIGEN_PATH=/usr/local/lib/eigen-3.4.0 \
+-D CMAKE_CXX_COMPILER=g++ \
+-D MKLROOT=${MKLROOT} \
+. \
+&& make remeta \
+&& cp remeta /usr/local/bin/remeta \
+&& cd \
+&& rm -rf $TMP_DIR
+
+FROM ubuntu:20.04
+
+ARG DEBIAN_FRONTEND=noninteractive
+ENV TZ=Etc/UTC
+
+RUN apt-get update \
+&& apt-get install -y --no-install-recommends \
+g++ \
+make \
+gnupg \
+wget \
+apt-transport-https \
+ca-certificates \
+git-all \
+zlib1g-dev \
+libboost-all-dev \
+libz-dev \
+libbz2-dev \
+liblzma-dev \
+libcurl4-openssl-dev \
+libssl-dev \
+libgomp1 \
+libdeflate-dev
+
+COPY --from=builder /usr/local/bin/remeta /usr/local/bin/
+COPY --from=builder /usr/local/bin/tabix /usr/local/bin/
Lines changed: 12 additions & 3 deletions
@@ -1,11 +1,15 @@
-FROM ubuntu:22.04 as builder
+FROM ubuntu:22.04 AS builder

 ENV CMAKE_VERSION 3.10
 ENV CMAKE_VERSION_PATCH 0
 ENV HTSLIB_VERSION 1.20
 ENV TMP_DIR /tmp

-COPY . ${TMP_DIR}/remeta
+COPY .git ${TMP_DIR}/remeta/.git
+COPY lib ${TMP_DIR}/remeta/lib
+COPY src ${TMP_DIR}/remeta/src
+COPY CMakeLists.txt ${TMP_DIR}/remeta/CMakeLists.txt
+COPY VERSION ${TMP_DIR}/remeta/VERSION

 ADD http://cmake.org/files/v${CMAKE_VERSION}/cmake-${CMAKE_VERSION}.${CMAKE_VERSION_PATCH}-Linux-x86_64.sh cmake_install.sh
 ADD https://apt.repos.intel.com/intel-gpg-keys/GPG-PUB-KEY-INTEL-SW-PRODUCTS.PUB intel_key.PUB
@@ -52,8 +56,13 @@ RUN apt-get update \
 && make \
 && make install \
 && cp tabix /usr/local/bin/tabix \
+&& cd $TMP_DIR/remeta/lib/pgenlib \
+&& make clean \
+&& cd $TMP_DIR/remeta/lib/faddeeva \
+&& make clean \
+&& cd $TMP_DIR/remeta/lib/qfc \
+&& make clean \
 && cd $TMP_DIR/remeta \
-&& rm -f CMakeCache.txt \
 && cmake -D EIGEN_PATH=/usr/local/lib/eigen-3.4.0 \
 -D CMAKE_CXX_COMPILER=g++ \
 -D MKLROOT=${MKLROOT} \

docs/docs/documentation.md

Lines changed: 23 additions & 4 deletions
@@ -65,7 +65,7 @@ remeta gene \
 --condition-htp HTP1 HTP2 ...
 ```

-**Running without LD matrices**
+**Running without LD matrices (not recommended)**

 `remeta gene` can be run without the required LD matrices by specifying the `--ignore-mask-ld` and `--keep-variants-not-in-ld-mat` flags.
 Note that it is not possible to perform conditional analysis without LD matrices.
@@ -92,6 +92,16 @@ Alternatively **remeta** can use a maximum allele frequency observed across coho
 Lastly, **remeta** allele frequencies can be specifed in an allele frequency file using the `--aaf-file` argument.
 See [File Formats](file_formats.md) for a list of available formats.

+**Unbalanced binary traits**
+
+**remeta** uses two strategies to control type 1 error for unbalanced binary traits:
+a saddlepoint approximation (SPA) applied per mask and an SPA applied per variant.
+The default parameters apply a mask level or variant level SPA when the case-control ratio of the trait falls below a certain threshold.
+Simulations suggest that the threshold on case control apply an SPA depends on the test (e.g. burden vs. SKATO),
+so parameters can be adjusted per test using several command line parameters.
+Mask level parameters are available for burden tests and SKATO, and adjusted using the `--<burden,skato>-mask-spa-<pval,ccr>` arguments.
+Variant level parameters are avaiable for burden test, SKATO, and ACATV, and adjusting using the `--<burden,skato,acatv>-sv-spa-<pval,ccr>` arguments.
+
 ### Options

 | Option | Argument | Type | Description |
@@ -105,25 +115,33 @@ See [File Formats](file_formats.md) for a list of available formats.
 | `--trait-name` | STRING | Required | Name of trait. |
 | `--trait-type` | STRING | Required | One of BT or QT. |
 | `--out` | STRING | Required | Prefix for output files. |
-| `--burdern-aaf-bins` (=0.0001 0.001 0.005 0.01) | FLOAT1 FLOAT2 ... | Optional | Allele frequency cutoffs for building masks for burden testing. |
+| `--burden-aaf-bins` (=0.0001 0.001 0.005 0.01) | FLOAT1 FLOAT2 ... | Optional | Allele frequency cutoffs for building masks for burden testing. |
 | `--burden-singleton-def` (=within) | STRING | Optional | Define singletons for the singleton mask within cohorts or across cohorts. One of 'within', 'across' or 'omit'. |
 | `--burden-weight-strategy (=uniform)` | STRING | Optional | Strategy to compute variant weights for burden testing. One of `beta` or `uniform`. |
+| `--burden-mask-spa-pval (=0.05)` | FLOAT | Optional | Apply a mask level SPA to burden tests when p-value < spa pval (BTs only). |
+| `--burden-mask-spa-ccr (=0.01)` | FLOAT | Optional | Apply a mask level SPA to burden tests # cases / # controls < spa-ccr (BTs only). |
+| `--burden-sv-spa-pval (=0.05)` | FLOAT | Optional | Apply a per variant SPA to burden tests p-value < spa pval (BTs only). |
+| `--burden-sv-spa-ccr (=0.00)` | FLOAT | Optional | Apply a per variant SPA to burden tests when # cases / # controls < spa ccr (BTs only). |
 | `--skip-burden` | FLAG | Optional | Do not run burden testing. |
 | `--skato-max-aaf (=0.01)` | FLOAT | Optional | Maximum allele frequency for a variant to be included in mask for SKATO. |
 | `--skato-rho-values (=0 0.01 0.04 0.09 0.16 0.25 0.5 1)` | FLOAT1 FLOAT2 ... | Optional | Rho values for SKATO. |
 | `--skato-min-aac (=1)` | INT | Optional | Minimum AAC across cohorts for a variant to be included in a mask for SKATO. |
 | `--skato-weight-strategy` | STRING | Optional | Strategy to compute variant weights for SKATO. One of 'beta' or 'uniform'. |
+| `--skato-mask-spa-pval (=0.05)` | FLOAT | Optional | Apply a mask level SPA to SKATO when p-value < spa pval (BTs only). |
+| `--skato-mask-spa-ccr (=0.02)` | FLOAT | Optional | Apply a mask level SPA to SKATO when # cases / # controls < spa ccr (BTs only). |
+| `--skato-sv-spa-pval (=0.05)` | FLOAT | Optional | Apply a per variant SPA to SKATO when p-value < spa pval (BTs only). |
+| `--skato-sv-spa-ccr (=0.02)` | FLOAT | Optional | Apply a per variant SPA to SKATO when #cases / # controls < spa ccr (BTs only). |
 | `--skip-skato` | FLAG | Optional | Do not run SKATO. |
 | `--acatv-max-aaf (=0.01)` | FLOAT | Optional | Maximum allele frequency for a variant to be included in mask for ACATV. |
 | `--acatv-min-aac (=5)` | INT | Optional | Minimum AAC across cohorts for a variant to be included in a mask for ACATV. |
 | `--acatv-weight-strategy` | STRING | Optional | Strategy to compute variant weights for ACATV. One of 'beta' or 'uniform'. |
+| `--acatv-sv-spa-pval (=0.05)` | FLOAT | Optional | Apply a per variant SPA to ACATV when p-value < spa pval (BTs only). |
+| `--acatv-sv-spa-ccr (=0.02)` | FLOAT | Optional | Apply a per variant SPA to ACATV when #cases / # controls < spa- ccr (BTs only). |
 | `--skip-acatv` | STRING | Optional | Do not run ACATV. |
 | `--condition-list` | FILE | Optional | File with variants to condition on (one per line). |
 | `--condition-htp` | FILE1 FILE2 ... | Optional | List of HTP files with summary statistics of conditional variants per cohort. |
 | `--af-strategy (=overall)` | STRING | Optional | Strategy to compute variant allele frequences. One of 'overall' or 'max'. |
 | `--aaf-file` | FILE | Optional | Use precomputed alternate allele frequencies from an external file. |
-| `--spa-pval =(0.05)` | FLOAT | Optional | Apply SPA when the burden p-value is below spa-pval (BTs only, not applied to ACATV). |
-| `--spa-ccr =(0.01)` | FLOAT | Optional | Apply SPA when # cases / # controls < spa-ccr (BTs only, not applied to ACATV). |
 | `--chr` | STRING | Optional | Run only on specifed chromosome. |
 | `--gene` | STRING | Optional | Run only on specified gene. |
 | `--extract` | FILE | Optional | Include only the variants with IDs listed in this file (one per line). |
@@ -134,6 +152,7 @@ See [File Formats](file_formats.md) for a list of available formats.
 | `--recompute-score` | FLAG | Optional | Recompute score statistics from betas and standard errors when missing in input. |
 | `--keep-variants-not-in-ld-mat` | FLAG | Optional | Keep variants absent from the LD matrix instead of dropping them. |
 | `--ignore-mask-ld` | FLAG | Optional | Ignore LD between variants in a mask. |
+| `--write-variant-aaf` | FLAG | Optional | Output variant AAFs used to construct masks. |
 | `--threads (=1)` | INT | Optional | Number of threads to use. |


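
Note on the per-test SPA options in docs/docs/documentation.md: the options table lists a p-value threshold and a case-control-ratio threshold for each test and level, but it does not say how the two conditions combine. The sketch below is a hypothetical Python illustration, not remeta's implementation. It assumes both conditions must hold before an SPA is applied. The names `SpaParams`, `DEFAULTS`, and `should_apply_spa` are invented for this example; only the default values are taken from the table.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SpaParams:
    pval: float  # corresponds to --<test>-<level>-spa-pval
    ccr: float   # corresponds to --<test>-<level>-spa-ccr


# Default values copied from the options table in this diff.
# There is no mask-level entry for ACATV because the docs list none.
DEFAULTS = {
    ("burden", "mask"): SpaParams(pval=0.05, ccr=0.01),
    ("burden", "sv"): SpaParams(pval=0.05, ccr=0.00),
    ("skato", "mask"): SpaParams(pval=0.05, ccr=0.02),
    ("skato", "sv"): SpaParams(pval=0.05, ccr=0.02),
    ("acatv", "sv"): SpaParams(pval=0.05, ccr=0.02),
}


def should_apply_spa(test, level, pval, n_cases, n_controls, params=DEFAULTS):
    """Return True if an SPA would be applied under the stated assumption.

    ASSUMPTION: the p-value condition AND the case-control-ratio condition
    must both hold. The documentation does not state how they combine.
    """
    if n_controls <= 0:
        raise ValueError("n_controls must be positive")
    p = params[(test, level)]
    ccr = n_cases / n_controls
    return pval < p.pval and ccr < p.ccr


if __name__ == "__main__":
    # 100 cases vs 10,000 controls gives a case-control ratio of 0.01.
    print(should_apply_spa("skato", "mask", 0.01, 100, 10_000))
    print(should_apply_spa("burden", "mask", 0.01, 100, 10_000))
```

Under this assumption, the default `--burden-sv-spa-ccr` of 0.00 can never be satisfied, because a case-control ratio is never negative, so the per-variant SPA would stay off for burden tests unless that threshold is raised.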

docs/docs/file_formats.md

Lines changed: 7 additions & 0 deletions
@@ -13,6 +13,13 @@ Most annotation files compatible with **regenie** should also be compatible with
 A file defining variant annotations.
 Contains 3 whitespace delimited columns: variant id (in CPRA format), gene name, and variant annotation.

+**New in v0.9.0**: **remeta** now supports a tabixible 5-column annotation file where column 4 is the chromosome and column 5 is the position.
+```
+1:55039839:T:C PCSK9 LoF 1 55039839
+1:55039842:G:A PCSK9 missense 1 55039842
+.
+```
+
 ### `--set-list`
 ```
 A1BG 19 58346922 19:58346922:C:A,19:58346924:G:A,...
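
Note on the 5-column annotation format in docs/docs/file_formats.md: the two extra columns can be derived from an existing 3-column file, because a CPRA variant ID already carries the chromosome and position. The following is a minimal Python sketch, assuming IDs are formatted as `CHR:POS:REF:ALT` as in the example rows. The function name `to_five_columns` is hypothetical, and the tab-separated output is an assumption (the docs only say the file is whitespace delimited).

```python
def to_five_columns(line: str) -> str:
    """Append chromosome and position columns to a 3-column annotation row."""
    fields = line.split()
    if len(fields) != 3:
        raise ValueError(f"expected 3 columns, got {len(fields)}: {line!r}")
    variant_id, gene, annotation = fields

    # ASSUMPTION: CPRA IDs look like CHR:POS:REF:ALT.
    parts = variant_id.split(":")
    if len(parts) != 4:
        raise ValueError(f"variant id is not CHR:POS:REF:ALT: {variant_id!r}")
    chrom, pos = parts[0], parts[1]
    int(pos)  # raises ValueError if the position is not an integer

    return "\t".join([variant_id, gene, annotation, chrom, pos])


if __name__ == "__main__":
    print(to_five_columns("1:55039839:T:C PCSK9 LoF"))
```

To index such a file with tabix, it would typically be sorted by chromosome and position, compressed with `bgzip`, and indexed with `tabix -s 4 -b 5 -e 5`. Whether **remeta** requires exactly these index settings is not stated in this diff.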

docs/docs/img/remeta_workflow.png

277 KB

docs/docs/index.md

Lines changed: 1 addition & 1 deletion
@@ -17,7 +17,7 @@ The main features of **remeta** are:
 See the [**remeta** tutorial](tutorial.md) for a step-by-step example.

 ## Citation
-Joseph, T., Mbatchou, J., et al. Computationally efficient meta-analysis of gene-based tests using summary statistics in large-scale genetic studies. medRxiv (2024). [https://doi.org/10.1101/2024.12.06.24318617](https://doi.org/10.1101/2024.12.06.24318617).
+Joseph, T., Mbatchou, J., et al. Computationally efficient meta-analysis of gene-based tests using summary statistics in large-scale genetic studies. bioRxiv (2024).

 ## License
 **remeta** is distributed under an MIT license.
