-
Notifications
You must be signed in to change notification settings - Fork 83
Documentation for LD matrix methods #3416
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Changes from all commits
10912c0
a2c59d5
51ab754
fd6e6f7
bae4065
File filter
Filter by extension
Conversations
Jump to
Diff view
Diff view
There are no files selected for viewing
| Original file line number | Diff line number | Diff line change | ||||||||||||||||
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
|
|
@@ -684,14 +684,251 @@ and boolean expressions (e.g., {math}`(x > 0)`) are interpreted as 0/1. | |||||||||||||||||
| and {math}`v_j` is the covariance of the trait with the j-th covariate. | ||||||||||||||||||
|
|
||||||||||||||||||
|
|
||||||||||||||||||
| (sec_stats_multi_site)= | ||||||||||||||||||
|
|
||||||||||||||||||
| ## Multi site statistics | ||||||||||||||||||
|
|
||||||||||||||||||
| :::{todo} | ||||||||||||||||||
| Document statistics that use information about correlation between sites, such as | ||||||||||||||||||
| LdCalculator (and perhaps reference {ref}`sec_identity`). Note that if we have a general | ||||||||||||||||||
| framework which has the same calling conventions as the single site stats, | ||||||||||||||||||
| we can rework the sections above. | ||||||||||||||||||
| ::: | ||||||||||||||||||
| (sec_stats_two_locus)= | ||||||||||||||||||
|
|
||||||||||||||||||
| ### Two-locus statistics | ||||||||||||||||||
|
|
||||||||||||||||||
| The {meth}`~TreeSequence.ld_matrix` method provides an interface to | ||||||||||||||||||
| a collection of two-locus statistics with predefined summary functions (see | ||||||||||||||||||
| {ref}`sec_stats_two_locus_summary_functions`) and `site` and `branch` | ||||||||||||||||||
| {ref}`modes <sec_stats_mode>`. The LD matrix method differs from other | ||||||||||||||||||
| statistics methods in that it provides a unified API with an argument to | ||||||||||||||||||
| specify different two-locus summaries of the data. It otherwise behaves | ||||||||||||||||||
| similarly to most other functions with respect to `sample_sets` and `indexes`. | ||||||||||||||||||
|
|
||||||||||||||||||
| Two-locus statistics can be computed using two modes, either `site` or | ||||||||||||||||||
| `branch`, and these should be interpreted in the same way as these modes in the | ||||||||||||||||||
| single-site statistics. Within this framework, we also implement polarisation, | ||||||||||||||||||
| but do not expose it to the user, opting to provide statistics that are | ||||||||||||||||||
| polarised where appropriate. | ||||||||||||||||||
|
|
||||||||||||||||||
| (sec_stats_two_locus_site)= | ||||||||||||||||||
|
|
||||||||||||||||||
| #### Site | ||||||||||||||||||
|
|
||||||||||||||||||
| The `site` mode computes two-locus statistics summarized over alleles between | ||||||||||||||||||
| all pairs of specified sites. The default behavior, leaving `sites` | ||||||||||||||||||
| unspecified, this method will compute a matrix for all pairs of sites, with | ||||||||||||||||||
| rows and columns representing each site in the tree sequence (i.e., an n×n | ||||||||||||||||||
|
Contributor
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more.
Suggested change
|
||||||||||||||||||
| matrix where n is the number of sites in the tree sequence). We can also | ||||||||||||||||||
| restrict the output to a subset of sites, either by specifying a single vector | ||||||||||||||||||
| for both rows and columns or a pair of vectors for the row sites and column | ||||||||||||||||||
| sites separately. | ||||||||||||||||||
|
|
||||||||||||||||||
| The following computes a matrix of the {math}`r^2` measure of linkage | ||||||||||||||||||
| disequilibrium (LD) computed pairwise between the first 4 sites in the tree | ||||||||||||||||||
| sequence among all samples. In our computations, row sites are used as the row | ||||||||||||||||||
| ("left-hand") loci and column sites are used as the column ("right-hand") | ||||||||||||||||||
| locus, and with a single list of sites specified, we obtain a symmetric square | ||||||||||||||||||
| matrix. | ||||||||||||||||||
|
|
||||||||||||||||||
| ```{code-cell} ipython3 | ||||||||||||||||||
| ld = ts.ld_matrix(sites=[[0, 1, 2, 3]]) | ||||||||||||||||||
| print(ld) | ||||||||||||||||||
| ``` | ||||||||||||||||||
|
|
||||||||||||||||||
| The following demonstrates how we can specify the row and column sites | ||||||||||||||||||
| independently of each other. We're specifying 3 columns and 2 rows, which | ||||||||||||||||||
| computes a subset of the matrix shown above. | ||||||||||||||||||
|
|
||||||||||||||||||
| ```{code-cell} ipython3 | ||||||||||||||||||
| ld = ts.ld_matrix(sites=[[0, 1], [1, 2, 3]]) | ||||||||||||||||||
| print(ld) | ||||||||||||||||||
| ``` | ||||||||||||||||||
|
|
||||||||||||||||||
| Because we implement two-locus statistics for multi-allelic data, we require | ||||||||||||||||||
| a method for combining the statistical results from each pair of alleles into | ||||||||||||||||||
| one summary for a pair of sites. There two methods for combining results from | ||||||||||||||||||
| multiple alleles, `hap_weighted` and `total_weighted`, which are | ||||||||||||||||||
| statistic-specific and not chosen by the user: | ||||||||||||||||||
|
Contributor
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. missing description of hap_weighted and total_weighted here? |
||||||||||||||||||
|
|
||||||||||||||||||
| (sec_stats_two_locus_branch)= | ||||||||||||||||||
|
|
||||||||||||||||||
| #### Branch | ||||||||||||||||||
|
|
||||||||||||||||||
| The `branch` mode computes two-locus statistics between pairs of trees. The | ||||||||||||||||||
| resulting statistic is a summary depending on marginal tree topologies and the | ||||||||||||||||||
| branch lengths of the trees at a pair of positions. To perform this | ||||||||||||||||||
| computation, we consider the counts of all possible two-locus haplotypes that | ||||||||||||||||||
| may be generated by possible mutations on each pair of branches between the two | ||||||||||||||||||
|
Contributor
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more.
Suggested change
|
||||||||||||||||||
| trees, which is summarized using an LD statistic. We then sum each pairwise | ||||||||||||||||||
| comparison weighted by the product of the two branch lengths above the nodes on | ||||||||||||||||||
| each tree. | ||||||||||||||||||
|
|
||||||||||||||||||
| Similar to the single-site statistics computed in `branch` mode, this results | ||||||||||||||||||
| in a statistic that is proportional to the expected statistic under an infinite | ||||||||||||||||||
| sites model (with mutation rate 1), conditioned on the pair of trees. | ||||||||||||||||||
|
Comment on lines
+763
to
+764
Contributor
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. This isn't quite right. At first I was going to just suggest the following: However, quick experimenting with
Contributor
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. See more discussion on this below. |
||||||||||||||||||
|
|
||||||||||||||||||
| The time complexity of this method is quadratic, due to the pairwise | ||||||||||||||||||
| comparisons of branches from a pair of trees. By default, this method computes | ||||||||||||||||||
|
Comment on lines
+766
to
+767
Contributor
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more.
Suggested change
|
||||||||||||||||||
| a symmetric matrix for all pairs of trees, with rows and columns representing | ||||||||||||||||||
| each tree in the tree sequence. Similar to the site method, we can restrict the | ||||||||||||||||||
| output to a subset of trees, either by specifying a vector of positions or | ||||||||||||||||||
| a pair of vectors for row and column positions separately. To select a specific | ||||||||||||||||||
| tree, the specified positions must land in the tree span (`[start, end)`). | ||||||||||||||||||
|
|
||||||||||||||||||
| In the following, we compute a matrix of expected {math}`r^2` within and | ||||||||||||||||||
| between the first 4 trees in the tree sequence. The tree breakpoints are | ||||||||||||||||||
| a convenient way to specify those first four trees tree. | ||||||||||||||||||
|
Contributor
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more.
Suggested change
|
||||||||||||||||||
|
|
||||||||||||||||||
| ```{code-cell} ipython3 | ||||||||||||||||||
| ld = ts.ld_matrix(mode="branch", positions=[ts.breakpoints(as_array=True)[0:4]]) | ||||||||||||||||||
| print(ld) | ||||||||||||||||||
| ``` | ||||||||||||||||||
|
|
||||||||||||||||||
| Again, we can specify the row and column trees separately. | ||||||||||||||||||
|
|
||||||||||||||||||
| ```{code-cell} ipython3 | ||||||||||||||||||
| breakpoints = ts.breakpoints(as_array=True) | ||||||||||||||||||
| ld = ts.ld_matrix(mode="branch", positions=[breakpoints[[0]], breakpoints[0:4]]) | ||||||||||||||||||
| print(ld) | ||||||||||||||||||
| ``` | ||||||||||||||||||
|
|
||||||||||||||||||
| (sec_stats_two_locus_sample_sets)= | ||||||||||||||||||
|
|
||||||||||||||||||
| #### Sample Sets | ||||||||||||||||||
|
|
||||||||||||||||||
| The two-locus API provides a mechanism by which to subset the samples under | ||||||||||||||||||
| consideration, providing the ability to compute a separate LD matrix for each | ||||||||||||||||||
| sample set or an LD matrix between sample sets. Output dimensions are handled | ||||||||||||||||||
| in the same manner as the rest of the stats API (see | ||||||||||||||||||
| {ref}`sec_stats_output_dimensions`). The two-locus API differs from the rest of | ||||||||||||||||||
| the stats API in that we handle one-way and two-way statistics in the same | ||||||||||||||||||
| function call. | ||||||||||||||||||
|
|
||||||||||||||||||
| To compute a two-way two-locus statistic, the `index` argument must be | ||||||||||||||||||
| provided. The statistics are selected in the same way (with the `stat` | ||||||||||||||||||
| argument), but we provide a restricted set of two-way statistics (see | ||||||||||||||||||
| {ref}`sec_stats_two_locus_summary_functions_two_way`). The dimension-dropping | ||||||||||||||||||
| rules for the result follow the rest of the tskit stats API in that scalar | ||||||||||||||||||
| indexes will produce a single two-dimensional matrix, while list of indexes | ||||||||||||||||||
| will produce a three-dimensional array, with the outer dimension of length | ||||||||||||||||||
| equal to the length of the list of indexes. | ||||||||||||||||||
|
Comment on lines
+807
to
+810
Contributor
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more.
Suggested change
|
||||||||||||||||||
|
|
||||||||||||||||||
| For concreteness, we would expect the following dimensions with the specified | ||||||||||||||||||
| `sample_sets` and `indexes` arguments. | ||||||||||||||||||
|
|
||||||||||||||||||
| ``` | ||||||||||||||||||
| # one-way | ||||||||||||||||||
| ts.ld_matrix(sample_sets=None) -> 2 dimensions | ||||||||||||||||||
| ts.ld_matrix(sample_sets=[0, 1, 2, 3]) -> 2 dimensions | ||||||||||||||||||
| ts.ld_matrix(sample_sets=[[0, 1, 2, 3]]) -> 3 dimensions | ||||||||||||||||||
| # two-way | ||||||||||||||||||
| ts.ld_matrix(sample_sets=[[0, 1, 2, 3], [4, 5, 6, 7]], indexes=(0, 1)) -> 2 dimensions | ||||||||||||||||||
| ts.ld_matrix(sample_sets=[[0, 1, 2, 3], [4, 5, 6, 7]], indexes=[(0, 1)]) -> 3 dimensions | ||||||||||||||||||
| ``` | ||||||||||||||||||
|
|
||||||||||||||||||
| (sec_stats_two_locus_sample_one_way_stats)= | ||||||||||||||||||
|
|
||||||||||||||||||
| ##### One-way Statistics | ||||||||||||||||||
|
|
||||||||||||||||||
| One-way statistics are summaries of two loci in a single sample set, using a | ||||||||||||||||||
| triple of haplotype counts {math}`\{w_{Ab}, w_{aB}, w_{AB}\}` and the size of | ||||||||||||||||||
| the sample set {math}`n`, where the capitalized and lowercase letters in our notation | ||||||||||||||||||
| represent alternate alleles. | ||||||||||||||||||
|
|
||||||||||||||||||
| (sec_stats_two_locus_sample_two_way_stats)= | ||||||||||||||||||
|
|
||||||||||||||||||
| ##### Two-way Statistics | ||||||||||||||||||
|
|
||||||||||||||||||
| Two-way statistics are summaries of haplotype counts between two sample sets, | ||||||||||||||||||
| which operate on the three haplotype counts (as in one-way stats, above) | ||||||||||||||||||
| computed from each sample set, indexed by `(i, j)`. These statistics take on | ||||||||||||||||||
| a different meaning from their one-way counterparts. For instance {math}`D_i | ||||||||||||||||||
| D_j` can be thought of as the covariance between two loci when computed for | ||||||||||||||||||
| a haplotype in a single sample set. | ||||||||||||||||||
|
|
||||||||||||||||||
| Only a subset of our summary functions are two-way statistics (see | ||||||||||||||||||
| {ref}`sec_two_locus_summary_functions_two_way`). Note that the unbiased two-way | ||||||||||||||||||
| statistics expect non-overlapping sample sets, and we do not make any | ||||||||||||||||||
| assertions about the sample sets and assume that `i` and `j` represent disjoint | ||||||||||||||||||
| sets of samples (see also the note in {meth}`~TreeSequence.divergence`). | ||||||||||||||||||
|
|
||||||||||||||||||
| (sec_stats_two_locus_summary_functions)= | ||||||||||||||||||
|
|
||||||||||||||||||
| #### Summary Functions | ||||||||||||||||||
|
|
||||||||||||||||||
| (sec_stats_two_locus_summary_functions_one_way)= | ||||||||||||||||||
|
|
||||||||||||||||||
| ##### One-way | ||||||||||||||||||
|
|
||||||||||||||||||
| The two-locus summary functions all take haplotype counts and sample set size as | ||||||||||||||||||
| input. Each of our summary functions has the signature | ||||||||||||||||||
| {math}`f(w_{Ab}, w_{aB}, w_{AB}, n)`, converting to haplotype frequencies | ||||||||||||||||||
| {math}`\{p_{Ab}, p_{aB}, p_{AB}\}` where appropriate by dividing by {math}`n`. | ||||||||||||||||||
|
|
||||||||||||||||||
| `D` | ||||||||||||||||||
| : {math}`f(w_{Ab}, w_{aB}, w_{AB}, n) = p_{ab} - p_{a}p_{b}` | ||||||||||||||||||
|
|
||||||||||||||||||
| This statistic is polarised, as the unpolarised result, which averages over | ||||||||||||||||||
| allele labelings, is zero. Uses the `total` normalisation method. | ||||||||||||||||||
|
Contributor
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. below this is called "total weighted" not "normalised"; just checking? |
||||||||||||||||||
|
|
||||||||||||||||||
| `D_prime` | ||||||||||||||||||
| : {math}`f(w_{Ab}, w_{aB}, w_{AB}, n) = \frac{D}{D_{\max}}` | ||||||||||||||||||
|
|
||||||||||||||||||
| Where {math}``` | ||||||||||||||||||
| D_{\max} = \begin{cases} | ||||||||||||||||||
|
Comment on lines
+871
to
+874
Contributor
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more.
Suggested change
|
||||||||||||||||||
| \min\{p_A (1-p_B), p_B (1-p_B)\} & \textrm{if }D>=0 \\ | ||||||||||||||||||
| \min\{p_A p_B, (1-p_B) (1-p_B)\} & \textrm{otherwise} | ||||||||||||||||||
| \end{cases}``` | ||||||||||||||||||
|
|
||||||||||||||||||
| and {math}`D` is defined above. | ||||||||||||||||||
|
Contributor
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more.
Suggested change
|
||||||||||||||||||
|
|
||||||||||||||||||
| `D2` | ||||||||||||||||||
| : {math}`f(w_{Ab}, w_{aB}, w_{AB}, n) = (p_{ab} - p_{a} p_{b})^2` | ||||||||||||||||||
|
|
||||||||||||||||||
| Unpolarised, total weighted. | ||||||||||||||||||
|
Contributor
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more.
Suggested change
|
||||||||||||||||||
|
|
||||||||||||||||||
| `Dz` | ||||||||||||||||||
| : {math}`f(w_{Ab}, w_{aB}, w_{AB}, n) = D (1 - 2 p_{a})(1 - 2p_{b})` | ||||||||||||||||||
|
|
||||||||||||||||||
| Where {math}`D` is defined above. Unpolarised, total weighted. | ||||||||||||||||||
|
Comment on lines
+887
to
+889
Contributor
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more.
Suggested change
Contributor
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. (the "where" is part of the same sentence as the equation, not a new sentence, so it's not capitalized) |
||||||||||||||||||
|
|
||||||||||||||||||
| `pi2` | ||||||||||||||||||
| : {math}`f(w_{Ab}, w_{aB}, w_{AB}, n) = p_{a}p_{b}(1-p_{a})(1-p_{b})` | ||||||||||||||||||
|
|
||||||||||||||||||
| Unpolarised, total weighted. | ||||||||||||||||||
|
Contributor
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more.
Suggested change
|
||||||||||||||||||
|
|
||||||||||||||||||
| `r` | ||||||||||||||||||
| : {math}`f(w_{Ab}, w_{aB}, w_{AB}, n) = \frac{D}{\sqrt{p_{a}p_{b}(1-p_{a})(1-p_{b})}}` | ||||||||||||||||||
|
|
||||||||||||||||||
| Where {math}`D` is defined above. Polarised, total weighted. | ||||||||||||||||||
|
|
||||||||||||||||||
|
Comment on lines
+897
to
+900
Contributor
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more.
Suggested change
|
||||||||||||||||||
| `r2` | ||||||||||||||||||
| : {math}`f(w_{Ab}, w_{aB}, w_{AB}, n) = \frac{D^{2}}{p_{a}p_{b}(1-p_{a})(1-p_{b})}` | ||||||||||||||||||
|
|
||||||||||||||||||
| Where {math}`D` is defined above. Unpolarised, haplotype weighted. | ||||||||||||||||||
|
Comment on lines
+902
to
+904
Contributor
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more.
Suggested change
|
||||||||||||||||||
|
|
||||||||||||||||||
| Unbiased two-locus statistics from the Hill-Robertson (1968) system are | ||||||||||||||||||
| computed from haplotype counts. Derivations for these unbiased estimators can | ||||||||||||||||||
| be found in [Ragsdale and Gravel | ||||||||||||||||||
| (2020)](https://doi.org/10.1093/molbev/msz265). They require at least 4 samples | ||||||||||||||||||
|
Comment on lines
+907
to
+909
Contributor
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more.
Suggested change
Also derivations, but it's more relevant that the defintiions are there? |
||||||||||||||||||
| to be valid and are specified as `stat=D2_unbiased`, `Dz_unbiased`, or | ||||||||||||||||||
| `pi2_unbiased`. | ||||||||||||||||||
|
Comment on lines
+910
to
+911
Contributor
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more.
Suggested change
|
||||||||||||||||||
|
|
||||||||||||||||||
| (sec_two_locus_summary_functions_two_way)= | ||||||||||||||||||
|
|
||||||||||||||||||
| (sec_stats_two_locus_summary_functions_two_way)= | ||||||||||||||||||
|
|
||||||||||||||||||
| ##### Two-way | ||||||||||||||||||
|
|
||||||||||||||||||
| Two-way statistics are indexed by sample sets {math}`i, j` and compute values | ||||||||||||||||||
| using haplotype counts within pairs of sample sets. | ||||||||||||||||||
|
|
||||||||||||||||||
| `D2` | ||||||||||||||||||
| : {math}`f(w_{Ab}, w_{aB}, w_{AB}, n) = D_i * D_j` | ||||||||||||||||||
|
|
||||||||||||||||||
| Where {math}`D` is defined above. | ||||||||||||||||||
|
Comment on lines
+923
to
+925
Contributor
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more.
Suggested change
|
||||||||||||||||||
|
|
||||||||||||||||||
| `r2` | ||||||||||||||||||
| : {math}`f(w_{Ab}, w_{aB}, w_{AB}, n) = \frac{(p_{AB_i} - (p_{A_i} p_{B_i})) (p_{AB_j} - (p_{A_j} p_{B_j}))}{\sqrt{p_{A_i} (1 - p_{A_i}) p_{B_i} (1 - p_{B_i})}\sqrt{p_{A_j} (1 - p_{A_j}) p_{B_j} (1 - p_{B_j})}}` | ||||||||||||||||||
|
|
||||||||||||||||||
| And `D2_unbiased`, which can be found in [Ragsdale and Gravel | ||||||||||||||||||
| (2020)](https://doi.org/10.1093/molbev/msz265). | ||||||||||||||||||
|
|
||||||||||||||||||
apragsdale marked this conversation as resolved.
Show resolved
Hide resolved
|
||||||||||||||||||
|
|
||||||||||||||||||
| (sec_stats_notes)= | ||||||||||||||||||
|
|
||||||||||||||||||
| Original file line number | Diff line number | Diff line change | ||||||||||||||||||||||
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
|
|
@@ -10930,12 +10930,89 @@ def impute_unknown_mutations_time( | |||||||||||||||||||||||
| def ld_matrix( | ||||||||||||||||||||||||
| self, | ||||||||||||||||||||||||
| sample_sets=None, | ||||||||||||||||||||||||
| sites=None, | ||||||||||||||||||||||||
| positions=None, | ||||||||||||||||||||||||
| mode="site", | ||||||||||||||||||||||||
| stat="r2", | ||||||||||||||||||||||||
| sites=None, | ||||||||||||||||||||||||
| positions=None, | ||||||||||||||||||||||||
| indexes=None, | ||||||||||||||||||||||||
| ): | ||||||||||||||||||||||||
| r""" | ||||||||||||||||||||||||
|
|
||||||||||||||||||||||||
| Returns a matrix of the specified two-locus statistic (default | ||||||||||||||||||||||||
| :math:`r^2`) computed from sample allelic states or branch lengths. | ||||||||||||||||||||||||
| The resulting linkage disequilibrium (LD) matrix represents either the | ||||||||||||||||||||||||
| two-locus statistic as computed between all pairs of specified | ||||||||||||||||||||||||
| ``sites`` ("site" mode, producing a num_sites-by-num_sites sized | ||||||||||||||||||||||||
| matrix), or as computed from the branch structures at marginal trees | ||||||||||||||||||||||||
| between pairs of trees at all specified ``positions`` ("branch" mode, | ||||||||||||||||||||||||
| producing a num_positions-by-num_positions sized matrix). | ||||||||||||||||||||||||
|
|
||||||||||||||||||||||||
| In the site mode, the sites under consideration can be restricted using | ||||||||||||||||||||||||
| the ``sites`` argument. Sites can be passed as a list of lists, | ||||||||||||||||||||||||
| specifying the ``[[row_sites], [col_sites]]``, resulting in a | ||||||||||||||||||||||||
| rectangular matrix, or by specifying a single list of ``[sites]``, in | ||||||||||||||||||||||||
| which a square matrix will be produced (see | ||||||||||||||||||||||||
| :ref:`sec_stats_two_locus_site` for examples). | ||||||||||||||||||||||||
|
|
||||||||||||||||||||||||
| Similarly, in the branch mode, the ``positions`` argument specifies | ||||||||||||||||||||||||
| loci for which the expectation for the two-locus statistic is computed | ||||||||||||||||||||||||
| over pairs of trees at those positions. LD stats are computed between | ||||||||||||||||||||||||
| trees whose ``[start, end)`` contains the given position (such that | ||||||||||||||||||||||||
| repeats of trees are possible). Similar to the site mode, a nested list | ||||||||||||||||||||||||
|
Comment on lines
+10957
to
+10961
Contributor
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more.
Suggested change
Contributor
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. We'll need to adjust this given the "expectation" discussion above. |
||||||||||||||||||||||||
| of row and column positions can be specified separately (resulting in a | ||||||||||||||||||||||||
| rectangular matrix) or a single list of a specified positions results | ||||||||||||||||||||||||
| in a square matrix (see :ref:`sec_stats_two_locus_branch` for | ||||||||||||||||||||||||
| examples). | ||||||||||||||||||||||||
|
|
||||||||||||||||||||||||
| Some LD statistics are defined for two sample sets instead of within a | ||||||||||||||||||||||||
| single set of samples. If the ``indexes`` argument is specified, at | ||||||||||||||||||||||||
| least two sample sets must also be specified. ``indexes`` specifies the | ||||||||||||||||||||||||
| sample set indexes between which to compute LD. | ||||||||||||||||||||||||
|
Contributor
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more.
Suggested change
|
||||||||||||||||||||||||
|
|
||||||||||||||||||||||||
| For more on how the ``indexes`` and ``sample_sets`` interact with the | ||||||||||||||||||||||||
| output dimensions, see the :ref:`sec_stats_two_locus_sample_sets` | ||||||||||||||||||||||||
| section. | ||||||||||||||||||||||||
|
|
||||||||||||||||||||||||
| **Available Stats** (use ``Stat Name`` in the ``stat`` keyword | ||||||||||||||||||||||||
| argument). | ||||||||||||||||||||||||
|
|
||||||||||||||||||||||||
| ======================= ========== ================ ============== | ||||||||||||||||||||||||
| Stat Polarised Multi Sample Set Stat Name | ||||||||||||||||||||||||
| ======================= ========== ================ ============== | ||||||||||||||||||||||||
| :math:`r^2` n y "r2" | ||||||||||||||||||||||||
| :math:`r` y n "r" | ||||||||||||||||||||||||
| :math:`D^2` n y "D2" | ||||||||||||||||||||||||
| :math:`D` y n "D" | ||||||||||||||||||||||||
| :math:`D'` y n "D_prime" | ||||||||||||||||||||||||
| :math:`D_z` n n "Dz" | ||||||||||||||||||||||||
| :math:`\pi2` n n "pi2" | ||||||||||||||||||||||||
|
Contributor
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more.
Suggested change
|
||||||||||||||||||||||||
| :math:`\widehat{D^2}` n y "D2_unbiased" | ||||||||||||||||||||||||
| :math:`\widehat{D_z}` n n "Dz_unbiased" | ||||||||||||||||||||||||
| :math:`\widehat{\pi_2}` n n "pi2_unbiased" | ||||||||||||||||||||||||
| ======================= ========== ================ ============== | ||||||||||||||||||||||||
|
|
||||||||||||||||||||||||
| :param list sample_sets: A list of lists of sample Node IDs, specifying | ||||||||||||||||||||||||
| the groups of nodes to compute the statistic with. Defaults to all | ||||||||||||||||||||||||
| samples grouped by population. | ||||||||||||||||||||||||
| :param str mode: A string giving the "type" of the statistic to be | ||||||||||||||||||||||||
| computed. Defaults to "site", can be "site" or "branch". | ||||||||||||||||||||||||
| :param str stat: A string giving the selected two-locus statistic to | ||||||||||||||||||||||||
| compute. Defaults to "r2". | ||||||||||||||||||||||||
| :param list sites: A list of sites over which to compute LD. Can be | ||||||||||||||||||||||||
| specified as a list of lists to control the row and column sites. | ||||||||||||||||||||||||
| Only applicable in site mode. Specify as | ||||||||||||||||||||||||
| ``[[row_sites], [col_sites]]`` or ``[all_sites]``. | ||||||||||||||||||||||||
|
Contributor
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more.
Suggested change
|
||||||||||||||||||||||||
| :param list positions: A list of genomic positions where expected LD is | ||||||||||||||||||||||||
| computed. Only applicable in branch mode. Can be specified as a list | ||||||||||||||||||||||||
| of lists to control the row and column positions. Specify as | ||||||||||||||||||||||||
| ``[[row_positions], [col_positions]]`` or ``[all_positions]``. | ||||||||||||||||||||||||
|
Contributor
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more.
Suggested change
Contributor
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. (Is that right?) |
||||||||||||||||||||||||
| :param list indexes: A list of 2-tuples or a single 2-tuple, specifying | ||||||||||||||||||||||||
| the indexes of two sample sets over which to compute a two-way LD | ||||||||||||||||||||||||
| statistic. Only :math:`r^2`, :math:`D^2`, and :math:`\widehat{D^2}` | ||||||||||||||||||||||||
| are implemented for two-way statistics. | ||||||||||||||||||||||||
| :return: A 2D or 3D array of LD matrices. | ||||||||||||||||||||||||
| :rtype: numpy.ndarray | ||||||||||||||||||||||||
| """ | ||||||||||||||||||||||||
| one_way_stats = { | ||||||||||||||||||||||||
| "D": self._ll_tree_sequence.D_matrix, | ||||||||||||||||||||||||
| "D2": self._ll_tree_sequence.D2_matrix, | ||||||||||||||||||||||||
|
|
||||||||||||||||||||||||
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.