Skip to content
Open
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
249 changes: 243 additions & 6 deletions docs/stats.md
Original file line number Diff line number Diff line change
Expand Up @@ -684,14 +684,251 @@ and boolean expressions (e.g., {math}`(x > 0)`) are interpreted as 0/1.
and {math}`v_j` is the covariance of the trait with the j-th covariate.


(sec_stats_multi_site)=

## Multi site statistics

:::{todo}
Document statistics that use information about correlation between sites, such as
LdCalculator (and perhaps reference {ref}`sec_identity`). Note that if we have a general
framework which has the same calling conventions as the single site stats,
we can rework the sections above.
:::
(sec_stats_two_locus)=

### Two-locus statistics

The {meth}`~TreeSequence.ld_matrix` method provides an interface to
a collection of two-locus statistics with predefined summary functions (see
{ref}`sec_stats_two_locus_summary_functions`) and `site` and `branch`
{ref}`modes <sec_stats_mode>`. The LD matrix method differs from other
statistics methods in that it provides a unified API with an argument to
specify different two-locus summaries of the data. It otherwise behaves
similarly to most other functions with respect to `sample_sets` and `indexes`.

Two-locus statistics can be computed using two modes, either `site` or
`branch`, and these should be interpreted in the same way as these modes in the
single-site statistics. Within this framework, we also implement polarisation,
but do not expose it to the user, opting to provide statistics that are
polarised where appropriate.

(sec_stats_two_locus_site)=

#### Site

The `site` mode computes two-locus statistics summarized over alleles between
all pairs of specified sites. The default behavior, leaving `sites`
unspecified, this method will compute a matrix for all pairs of sites, with
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
unspecified, this method will compute a matrix for all pairs of sites, with
unspecified, will compute a matrix for all pairs of sites, with

rows and columns representing each site in the tree sequence (i.e., an n×n
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
rows and columns representing each site in the tree sequence (i.e., an n×n
one row and column for each site in the tree sequence (i.e., an n x n

matrix where n is the number of sites in the tree sequence). We can also
restrict the output to a subset of sites, either by specifying a single vector
for both rows and columns or a pair of vectors for the row sites and column
sites separately.

The following computes a matrix of the {math}`r^2` measure of linkage
disequilibrium (LD) computed pairwise between the first 4 sites in the tree
sequence among all samples. In our computations, row sites are used as the row
("left-hand") loci and column sites are used as the column ("right-hand")
locus, and with a single list of sites specified, we obtain a symmetric square
matrix.

```{code-cell} ipython3
ld = ts.ld_matrix(sites=[[0, 1, 2, 3]])
print(ld)
```

The following demonstrates how we can specify the row and column sites
independently of each other. We're specifying 3 columns and 2 rows, which
computes a subset of the matrix shown above.

```{code-cell} ipython3
ld = ts.ld_matrix(sites=[[0, 1], [1, 2, 3]])
print(ld)
```

Because we implement two-locus statistics for multi-allelic data, we require
a method for combining the statistical results from each pair of alleles into
one summary for a pair of sites. There two methods for combining results from
multiple alleles, `hap_weighted` and `total_weighted`, which are
statistic-specific and not chosen by the user:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

missing description of hap_weighted and total_weighted here?


(sec_stats_two_locus_branch)=

#### Branch

The `branch` mode computes two-locus statistics between pairs of trees. The
resulting statistic is a summary depending on marginal tree topologies and the
branch lengths of the trees at a pair of positions. To perform this
computation, we consider the counts of all possible two-locus haplotypes that
may be generated by possible mutations on each pair of branches between the two
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
may be generated by possible mutations on each pair of branches between the two
could be generated by mutations on each pair of branches between the two

trees, which is summarized using an LD statistic. We then sum each pairwise
comparison weighted by the product of the two branch lengths above the nodes on
each tree.

Similar to the single-site statistics computed in `branch` mode, this results
in a statistic that is proportional to the expected statistic under an infinite
sites model (with mutation rate 1), conditioned on the pair of trees.
Comment on lines +763 to +764
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This isn't quite right. At first I was going to just suggest the following:

in a statistic that is proportional to the expected statistic under an infinite
sites model. More precisely, suppose we place mutations on each tree as
a Poisson process of rate 1 per unit of branch length, compute
the statistic for each pair of mutations (one on each tree), and add these
up; the `branch` mode is the expected value of the resulting sum.
(Note that unlike the single-site
statistics, these mutations are placed on the trees at a given specific
pair of positions, and so the span of the trees do not affect the result.)

However, quick experimenting with ts.ld_matrix() to check if this was right pointed out that values can easily be nan and so an expected value across mutations is not well-defined. I think we described what this is in the original issue on this?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

See more discussion on this below.


The time complexity of this method is quadratic, due to the pairwise
comparisons of branches from a pair of trees. By default, this method computes
Comment on lines +766 to +767
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
The time complexity of this method is quadratic, due to the pairwise
comparisons of branches from a pair of trees. By default, this method computes
The time complexity of this method is quadratic in the number of samples,
due to the pairwise comparisons of branches from each pair of trees.
By default, this method computes

a symmetric matrix for all pairs of trees, with rows and columns representing
each tree in the tree sequence. Similar to the site method, we can restrict the
output to a subset of trees, either by specifying a vector of positions or
a pair of vectors for row and column positions separately. To select a specific
tree, the specified positions must land in the tree span (`[start, end)`).

In the following, we compute a matrix of expected {math}`r^2` within and
between the first 4 trees in the tree sequence. The tree breakpoints are
a convenient way to specify those first four trees tree.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
a convenient way to specify those first four trees tree.
a convenient way to specify those first four trees.


```{code-cell} ipython3
ld = ts.ld_matrix(mode="branch", positions=[ts.breakpoints(as_array=True)[0:4]])
print(ld)
```

Again, we can specify the row and column trees separately.

```{code-cell} ipython3
breakpoints = ts.breakpoints(as_array=True)
ld = ts.ld_matrix(mode="branch", positions=[breakpoints[[0]], breakpoints[0:4]])
print(ld)
```

(sec_stats_two_locus_sample_sets)=

#### Sample Sets

The two-locus API provides a mechanism by which to subset the samples under
consideration, providing the ability to compute a separate LD matrix for each
sample set or an LD matrix between sample sets. Output dimensions are handled
in the same manner as the rest of the stats API (see
{ref}`sec_stats_output_dimensions`). The two-locus API differs from the rest of
the stats API in that we handle one-way and two-way statistics in the same
function call.

To compute a two-way two-locus statistic, the `index` argument must be
provided. The statistics are selected in the same way (with the `stat`
argument), but we provide a restricted set of two-way statistics (see
{ref}`sec_stats_two_locus_summary_functions_two_way`). The dimension-dropping
rules for the result follow the rest of the tskit stats API in that scalar
indexes will produce a single two-dimensional matrix, while list of indexes
will produce a three-dimensional array, with the outer dimension of length
equal to the length of the list of indexes.
Comment on lines +807 to +810
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
rules for the result follow the rest of the tskit stats API in that scalar
indexes will produce a single two-dimensional matrix, while list of indexes
will produce a three-dimensional array, with the outer dimension of length
equal to the length of the list of indexes.
rules for the result follow the rest of the tskit stats API in that a single list
or tuple will produce a single two-dimensional matrix, while list of these
will produce a three-dimensional array, with the outer dimension of length
equal to the length of the list.


For concreteness, we would expect the following dimensions with the specified
`sample_sets` and `indexes` arguments.

```
# one-way
ts.ld_matrix(sample_sets=None) -> 2 dimensions
ts.ld_matrix(sample_sets=[0, 1, 2, 3]) -> 2 dimensions
ts.ld_matrix(sample_sets=[[0, 1, 2, 3]]) -> 3 dimensions
# two-way
ts.ld_matrix(sample_sets=[[0, 1, 2, 3], [4, 5, 6, 7]], indexes=(0, 1)) -> 2 dimensions
ts.ld_matrix(sample_sets=[[0, 1, 2, 3], [4, 5, 6, 7]], indexes=[(0, 1)]) -> 3 dimensions
```

(sec_stats_two_locus_sample_one_way_stats)=

##### One-way Statistics

One-way statistics are summaries of two loci in a single sample set, using a
triple of haplotype counts {math}`\{w_{Ab}, w_{aB}, w_{AB}\}` and the size of
the sample set {math}`n`, where the capitalized and lowercase letters in our notation
represent alternate alleles.

(sec_stats_two_locus_sample_two_way_stats)=

##### Two-way Statistics

Two-way statistics are summaries of haplotype counts between two sample sets,
which operate on the three haplotype counts (as in one-way stats, above)
computed from each sample set, indexed by `(i, j)`. These statistics take on
a different meaning from their one-way counterparts. For instance {math}`D_i
D_j` can be thought of as the covariance between two loci when computed for
a haplotype in a single sample set.

Only a subset of our summary functions are two-way statistics (see
{ref}`sec_two_locus_summary_functions_two_way`). Note that the unbiased two-way
statistics expect non-overlapping sample sets, and we do not make any
assertions about the sample sets and assume that `i` and `j` represent disjoint
sets of samples (see also the note in {meth}`~TreeSequence.divergence`).

(sec_stats_two_locus_summary_functions)=

#### Summary Functions

(sec_stats_two_locus_summary_functions_one_way)=

##### One-way

The two-locus summary functions all take haplotype counts and sample set size as
input. Each of our summary functions has the signature
{math}`f(w_{Ab}, w_{aB}, w_{AB}, n)`, converting to haplotype frequencies
{math}`\{p_{Ab}, p_{aB}, p_{AB}\}` where appropriate by dividing by {math}`n`.

`D`
: {math}`f(w_{Ab}, w_{aB}, w_{AB}, n) = p_{ab} - p_{a}p_{b}`

This statistic is polarised, as the unpolarised result, which averages over
allele labelings, is zero. Uses the `total` normalisation method.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

below this is called "total weighted" not "normalised"; just checking?


`D_prime`
: {math}`f(w_{Ab}, w_{aB}, w_{AB}, n) = \frac{D}{D_{\max}}`

Where {math}```
D_{\max} = \begin{cases}
Comment on lines +871 to +874
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
: {math}`f(w_{Ab}, w_{aB}, w_{AB}, n) = \frac{D}{D_{\max}}`
Where {math}```
D_{\max} = \begin{cases}
: {math}`f(w_{Ab}, w_{aB}, w_{AB}, n) = \frac{D}{D_{\max}}`,
where {math}```
D_{\max} = \begin{cases}

\min\{p_A (1-p_B), p_B (1-p_B)\} & \textrm{if }D>=0 \\
\min\{p_A p_B, (1-p_B) (1-p_B)\} & \textrm{otherwise}
\end{cases}```

and {math}`D` is defined above.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
and {math}`D` is defined above.
and {math}`D` is defined above. Polarised, `total` weighted.


`D2`
: {math}`f(w_{Ab}, w_{aB}, w_{AB}, n) = (p_{ab} - p_{a} p_{b})^2`

Unpolarised, total weighted.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
Unpolarised, total weighted.
Unpolarised, `total` weighted.


`Dz`
: {math}`f(w_{Ab}, w_{aB}, w_{AB}, n) = D (1 - 2 p_{a})(1 - 2p_{b})`

Where {math}`D` is defined above. Unpolarised, total weighted.
Comment on lines +887 to +889
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
: {math}`f(w_{Ab}, w_{aB}, w_{AB}, n) = D (1 - 2 p_{a})(1 - 2p_{b})`
Where {math}`D` is defined above. Unpolarised, total weighted.
: {math}`f(w_{Ab}, w_{aB}, w_{AB}, n) = D (1 - 2 p_{a})(1 - 2p_{b})`,
where {math}`D` is defined above. Unpolarised, `total` weighted.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

(the "where" is part of the same sentence as the equation, not a new sentence, so it's not capitalized)


`pi2`
: {math}`f(w_{Ab}, w_{aB}, w_{AB}, n) = p_{a}p_{b}(1-p_{a})(1-p_{b})`

Unpolarised, total weighted.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
Unpolarised, total weighted.
Unpolarised, `total` weighted.


`r`
: {math}`f(w_{Ab}, w_{aB}, w_{AB}, n) = \frac{D}{\sqrt{p_{a}p_{b}(1-p_{a})(1-p_{b})}}`

Where {math}`D` is defined above. Polarised, total weighted.

Comment on lines +897 to +900
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
: {math}`f(w_{Ab}, w_{aB}, w_{AB}, n) = \frac{D}{\sqrt{p_{a}p_{b}(1-p_{a})(1-p_{b})}}`
Where {math}`D` is defined above. Polarised, total weighted.
: {math}`f(w_{Ab}, w_{aB}, w_{AB}, n) = \frac{D}{\sqrt{p_{a}p_{b}(1-p_{a})(1-p_{b})}}`,
where {math}`D` is defined above. Polarised, `total` weighted.

`r2`
: {math}`f(w_{Ab}, w_{aB}, w_{AB}, n) = \frac{D^{2}}{p_{a}p_{b}(1-p_{a})(1-p_{b})}`

Where {math}`D` is defined above. Unpolarised, haplotype weighted.
Comment on lines +902 to +904
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
: {math}`f(w_{Ab}, w_{aB}, w_{AB}, n) = \frac{D^{2}}{p_{a}p_{b}(1-p_{a})(1-p_{b})}`
Where {math}`D` is defined above. Unpolarised, haplotype weighted.
: {math}`f(w_{Ab}, w_{aB}, w_{AB}, n) = \frac{D^{2}}{p_{a}p_{b}(1-p_{a})(1-p_{b})}`,
where {math}`D` is defined above. Unpolarised, `haplotype` weighted.


Unbiased two-locus statistics from the Hill-Robertson (1968) system are
computed from haplotype counts. Derivations for these unbiased estimators can
be found in [Ragsdale and Gravel
(2020)](https://doi.org/10.1093/molbev/msz265). They require at least 4 samples
Comment on lines +907 to +909
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
computed from haplotype counts. Derivations for these unbiased estimators can
be found in [Ragsdale and Gravel
(2020)](https://doi.org/10.1093/molbev/msz265). They require at least 4 samples
computed from haplotype counts. Definitions of these unbiased estimators can
be found in [Ragsdale and Gravel
(2020)](https://doi.org/10.1093/molbev/msz265). They require at least 4 samples

Also derivations, but it's more relevant that the defintiions are there?

to be valid and are specified as `stat=D2_unbiased`, `Dz_unbiased`, or
`pi2_unbiased`.
Comment on lines +910 to +911
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
to be valid and are specified as `stat=D2_unbiased`, `Dz_unbiased`, or
`pi2_unbiased`.
to be valid and are specified as `stat="D2_unbiased"`, `"Dz_unbiased"`, or
`"pi2_unbiased"`.


(sec_two_locus_summary_functions_two_way)=

(sec_stats_two_locus_summary_functions_two_way)=

##### Two-way

Two-way statistics are indexed by sample sets {math}`i, j` and compute values
using haplotype counts within pairs of sample sets.

`D2`
: {math}`f(w_{Ab}, w_{aB}, w_{AB}, n) = D_i * D_j`

Where {math}`D` is defined above.
Comment on lines +923 to +925
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
: {math}`f(w_{Ab}, w_{aB}, w_{AB}, n) = D_i * D_j`
Where {math}`D` is defined above.
: {math}`f(w_{Ab}, w_{aB}, w_{AB}, n) = D_i * D_j`,
where {math}`D` is defined above.


`r2`
: {math}`f(w_{Ab}, w_{aB}, w_{AB}, n) = \frac{(p_{AB_i} - (p_{A_i} p_{B_i})) (p_{AB_j} - (p_{A_j} p_{B_j}))}{\sqrt{p_{A_i} (1 - p_{A_i}) p_{B_i} (1 - p_{B_i})}\sqrt{p_{A_j} (1 - p_{A_j}) p_{B_j} (1 - p_{B_j})}}`

And `D2_unbiased`, which can be found in [Ragsdale and Gravel
(2020)](https://doi.org/10.1093/molbev/msz265).


(sec_stats_notes)=
Expand Down
6 changes: 6 additions & 0 deletions python/CHANGELOG.rst
Original file line number Diff line number Diff line change
Expand Up @@ -7,6 +7,12 @@ In development
- Add ``json+struct`` metadata codec that allows storing binary data using a struct
schema alongside JSON metadata. (:user:`benjeffery`, :pr:`3306`)

**Features**

- Add ``TreeSequence.ld_matrix`` stats method and documentation, for computing
two-locus statistics in site and branch mode.
(:user:`lkirk`, :user:`apragsdale`, :pr:`3416`)

--------------------
[1.0.2] - 2026-03-06
--------------------
Expand Down
81 changes: 79 additions & 2 deletions python/tskit/trees.py
Original file line number Diff line number Diff line change
Expand Up @@ -10930,12 +10930,89 @@ def impute_unknown_mutations_time(
def ld_matrix(
self,
sample_sets=None,
sites=None,
positions=None,
mode="site",
stat="r2",
sites=None,
positions=None,
indexes=None,
):
r"""

Returns a matrix of the specified two-locus statistic (default
:math:`r^2`) computed from sample allelic states or branch lengths.
The resulting linkage disequilibrium (LD) matrix represents either the
two-locus statistic as computed between all pairs of specified
``sites`` ("site" mode, producing a num_sites-by-num_sites sized
matrix), or as computed from the branch structures at marginal trees
between pairs of trees at all specified ``positions`` ("branch" mode,
producing a num_positions-by-num_positions sized matrix).

In the site mode, the sites under consideration can be restricted using
the ``sites`` argument. Sites can be passed as a list of lists,
specifying the ``[[row_sites], [col_sites]]``, resulting in a
rectangular matrix, or by specifying a single list of ``[sites]``, in
which a square matrix will be produced (see
:ref:`sec_stats_two_locus_site` for examples).

Similarly, in the branch mode, the ``positions`` argument specifies
loci for which the expectation for the two-locus statistic is computed
over pairs of trees at those positions. LD stats are computed between
trees whose ``[start, end)`` contains the given position (such that
repeats of trees are possible). Similar to the site mode, a nested list
Comment on lines +10957 to +10961
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
Similarly, in the branch mode, the ``positions`` argument specifies
loci for which the expectation for the two-locus statistic is computed
over pairs of trees at those positions. LD stats are computed between
trees whose ``[start, end)`` contains the given position (such that
repeats of trees are possible). Similar to the site mode, a nested list
Similarly, in the branch mode, the ``positions`` argument specifies
genomic coordinates at which the expectation for the two-locus statistic
is computed, given the local tree structure. This defaults to computing
the LD for each pair of distinct trees, which is equivalent to passing in
the leftmost coordinates of each tree's span (since intervals are closed on
the left and open on the right). Similar to the site mode, a nested list

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We'll need to adjust this given the "expectation" discussion above.

of row and column positions can be specified separately (resulting in a
rectangular matrix) or a single list of a specified positions results
in a square matrix (see :ref:`sec_stats_two_locus_branch` for
examples).

Some LD statistics are defined for two sample sets instead of within a
single set of samples. If the ``indexes`` argument is specified, at
least two sample sets must also be specified. ``indexes`` specifies the
sample set indexes between which to compute LD.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
sample set indexes between which to compute LD.
indexes of the sample sets in the ``sample_sets`` list
between which to compute LD.


For more on how the ``indexes`` and ``sample_sets`` interact with the
output dimensions, see the :ref:`sec_stats_two_locus_sample_sets`
section.

**Available Stats** (use ``Stat Name`` in the ``stat`` keyword
argument).

======================= ========== ================ ==============
Stat Polarised Multi Sample Set Stat Name
======================= ========== ================ ==============
:math:`r^2` n y "r2"
:math:`r` y n "r"
:math:`D^2` n y "D2"
:math:`D` y n "D"
:math:`D'` y n "D_prime"
:math:`D_z` n n "Dz"
:math:`\pi2` n n "pi2"
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
:math:`\pi2` n n "pi2"
:math:`\pi_2` n n "pi2"

:math:`\widehat{D^2}` n y "D2_unbiased"
:math:`\widehat{D_z}` n n "Dz_unbiased"
:math:`\widehat{\pi_2}` n n "pi2_unbiased"
======================= ========== ================ ==============

:param list sample_sets: A list of lists of sample Node IDs, specifying
the groups of nodes to compute the statistic with. Defaults to all
samples grouped by population.
:param str mode: A string giving the "type" of the statistic to be
computed. Defaults to "site", can be "site" or "branch".
:param str stat: A string giving the selected two-locus statistic to
compute. Defaults to "r2".
:param list sites: A list of sites over which to compute LD. Can be
specified as a list of lists to control the row and column sites.
Only applicable in site mode. Specify as
``[[row_sites], [col_sites]]`` or ``[all_sites]``.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
``[[row_sites], [col_sites]]`` or ``[all_sites]``.
``[[row_sites], [col_sites]]`` or ``[all_sites]``.
Defaults to all sites.

:param list positions: A list of genomic positions where expected LD is
computed. Only applicable in branch mode. Can be specified as a list
of lists to control the row and column positions. Specify as
``[[row_positions], [col_positions]]`` or ``[all_positions]``.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
``[[row_positions], [col_positions]]`` or ``[all_positions]``.
``[[row_positions], [col_positions]]`` or ``[all_positions]``.
Defaults to the leftmost coordinates of all trees.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

(Is that right?)

:param list indexes: A list of 2-tuples or a single 2-tuple, specifying
the indexes of two sample sets over which to compute a two-way LD
statistic. Only :math:`r^2`, :math:`D^2`, and :math:`\widehat{D^2}`
are implemented for two-way statistics.
:return: A 2D or 3D array of LD matrices.
:rtype: numpy.ndarray
"""
one_way_stats = {
"D": self._ll_tree_sequence.D_matrix,
"D2": self._ll_tree_sequence.D2_matrix,
Expand Down