A Python library for tree algorithms and data structures.
The package requires Python 3.10 or higher.
The tralda package is available on PyPI:
pip install traldaAlternatively, you can clone the repo, go to the root folder of package and install it using the command:
pip install .The package has several dependencies (which are installed automatically when using pip):
The class Tree implements the tree data structure which is essential for most of the modules in
the package and can be imported from the subpackage tralda.datastructures.
It provides methods for tree traversals and manipulation, output in Newick format, as well as the
linear-time computation of last common ancestors by Bender & Farach-Colton (class LCA which is
initialized with an instance of type Tree).
Tree instances can be serialized in pickle or json format.
Overview of the functions of the class TreeNode: (Click to expand)
| Function | Description |
|---|---|
attributes() |
generator for the node attributes |
add_child(child_node) |
add child_node as a child |
add_child_right_of(child_node, right_of) |
add child_node as a child to the right of right_of |
remove_child(child_node) |
remove the child child_node |
detach() |
remove the node from its parent's children |
is_leaf() |
check if the node is a leaf |
child_subsequence(left_node, right_node) |
list of children between left_node and right_node |
Overview of the functions of the class Tree: (Click to expand)
| Function | Description |
|---|---|
height() |
height of the tree (number of edges in the longest path from the root to a leaf) |
leaves() |
generator for the leaf nodes |
preorder() |
generator for preorder (=top-down) traversal |
postorder() |
generator for postorder (=bottom-up) traversal |
inner_vertices() |
generator for inner nodes |
edges() |
generator for the edges of the tree |
euler_generator() |
generator for an Euler tour |
leaf_dict() |
compute the list of leaf nodes in the subtree of each node, and return them as a dict |
contract(edges) |
contract all edges in the collection edges |
get_triples() |
return a list of all triples that are displayed by the tree |
delete_and_reconnect(node) |
delete node and reconnect its children to its parent |
copy() |
construct a copy of the tree (node attributes are only copied as references, but mutable data types should be avoided as node attribute values) |
to_newick() |
return a str representation of the tree in Newick format |
random_tree(N, binary=False) |
return a random tree with N leaves that is optionally forced to be binary; new children are stepwise attached to randomly selected nodes until N are reached |
to_nx() |
return a NetworkX DiGraph version of the tree (with the ids of the TreeNode instances as nodes) and its root (also represented by the id) |
parse_nx(G, root) |
convert a tree encoded as a NetworkX DiGraph (together with the root) back into a Tree |
serialize(filename, mode=None) |
serialize a tree in JSON or pickle format specified by mode; default is None, in which case the mode is inferred from the filename ending |
load(filename, mode=None) |
load a tree from file in JSON or pickle format specified by mode; default is None, in which case the mode is inferred from the filename ending |
is_binary() |
check if the tree is binary |
is_phylogenetic() |
check if the tree is phylogenetic (all inner nodes have at least one child) |
equal_topology(other) |
check whether this tree and other have the same topology based on the leaves' label attributes |
is_refinement |
check whether this tree refines other based on the leaves' label attributes |
Overview of the functions of the class LCA: (Click to expand)
| Function | Description |
|---|---|
get(a, b) |
get the last common ancestor of nodes a and b |
displays_triple(a, b, c) |
check whether the triple ab |
are_comparable(u, v) |
check whether u and v are comparable in terms of the ancestor relation |
ancestor_or_equal(u, v) |
check whether u is equal to or an ancestor of v |
ancestor_not_equal(u, v) |
check whether u is a strict ancestor of v |
descendant_or_equal(u, v) |
check whether u is equal to or a descendant of v |
descendant_not_equal(u, v) |
check whether u is a strict descendant of v |
consistent_triples(triples) |
list with the subset of triples that are displayed by the tree |
consistent_triple_generator |
generator for the items in triples that are displayed |
Example usage: (Click to expand)
from tralda.datastructures import Tree, LCA
# construct a random tree with 20 leaves
tree = Tree.random_tree(20)
# serialization, reload via Tree.load('path/to/file.json')
tree.serialize('path/to/file.json')
# linear-time processing of the tree for constant-time
# last common ancestor queries
lca_T = LCA(tree)
# l.c.a. queries via 'TreeNode' instances or labels (if the nodes
# in the tree have the label attribute set)
print( lca_T(4, 7) )
# triple queries (e.g. is 3 5 | 2 displayed?)
print( lca_T.displays_triple(3, 5, 2) )
The subpackage tralda.supertree implements a number of algorithms for the computation of
supertrees:
- BUILD (Aho et al. 1981), class
Buildor functionbuild_supertree - BuildST (Deng & Fernández-Baca 2016), class
BuildSTor functionbuild_st - Loose_Cons_Tree (Jansson et al. 2016), class
LooseConsensusTreeor functionloose_consensus_tree - LinCR (Schaller et al. 2021), class
LinCRor functionlinear_common_refinement
The latter two algorithms compute the loose consensus tree and the common refinement, resp., for a sequence of trees on the same set of leaves in linear time.
The subpackage tralda.cograph contains an efficient algorithm for cograph recognition and
heuristics for cograph editing:
- function
to_cotreerecognizes cographs and returns aTreerepresentation in the positive case (Corneil et al. 1985) - function
edit_to_cographedits an arbitrary graph to a cograph (algorithm from Crespelle 2019) and returns aTreerepresentation
The package contains an implementation of the poly-logarithmic dynamic graph structure described
by Holm, de Lichtenberg & Thorup (2001), called HDT data datastructure after the authors.
The datastructure uses Euler tour trees (Henzinger and King 1999) to determine in O(log n) time
whether two given nodes are connected.
The class HDTGraph can be imported from tralda.datastructures and provides functions for adding
and deleting nodes and edges, as well as querying node and graph connectivity.
The following auxiliary data structures can be imported from the subpackage tralda.datastructures:
- linked list: class
LinkedList - doubly-linked list: class
DoublyLinkedList - AVL trees: classes
TreeSetandTreeDictimplement data structures for sorted sets and dictionaries, respectively
If you want to contribute to tralda, please use the package and project manager
uv.
See this page for installation
instructions.
Moreover, please use pre-commit for automated code formatting and linting.
To install it and initialize it for your local tralda repository, follow these steps:
- Install
uv - Run the following command (after which you should be able to run
pre-commitfrom anywhere)uv tool install pre-commit --with pre-commit-uv
- Navigate to the root directory of your local
traldarepositorycd <MY_PATH_TO>/tralda
- Install
pre-commitas a git hook in thetraldarepositorypre-commit install
If you use tralda in your project or code from it, please consider citing:
Schaller, D., Hellmuth, M., Stadler, P.F. (2021) A Simple Linear-Time Algorithm for the Common Refinement of Rooted Phylogenetic Trees on a Common Leaf Set.
Additional references to algorithms that were implemented are given in the source code.
Please report any bugs and questions in the Issues section. Also, feel free to make suggestions for improvement and/or new functionalities.