Python Module
Once installed, the tool can be imported in any Python 3.6+ script:

import mp3treesim as mp3

Trees can be read from multiple sources with the Python module, provided they follow the format described in the Input format section.
Suppose we have a file tree.gv containing the following:
digraph Tree {
1 [label="A"];
2 [label="B,G"];
3 [label="C"];
4 [label="D"];
5 [label="E"];
6 [label="F"];
1 -> 2;
1 -> 3;
2 -> 4;
2 -> 5;
3 -> 6;
}
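To make the structure of this file explicit, here is a minimal standard-library sketch that extracts node labels and edges from a DOT file of this simple shape. The helper parse_simple_dot is hypothetical and not part of mp3treesim. It handles only the subset of DOT used above (one node or edge statement per line), and it assumes that a comma-separated label such as "B,G" denotes a single node carrying several mutations.

```python
import re

# DOT text of tree.gv from the example above
TREE_GV = """
digraph Tree {
1 [label="A"];
2 [label="B,G"];
3 [label="C"];
4 [label="D"];
5 [label="E"];
6 [label="F"];
1 -> 2;
1 -> 3;
2 -> 4;
2 -> 5;
3 -> 6;
}
"""

# Hypothetical helper: matches only `id [label="X,Y"]` and `a -> b` statements
NODE_RE = re.compile(r'^\s*(\w+)\s*\[label="([^"]*)"\]\s*;?\s*$')
EDGE_RE = re.compile(r'^\s*(\w+)\s*->\s*(\w+)\s*;?\s*$')


def parse_simple_dot(text):
    """Return ({node: [labels]}, [(parent, child)]) for a simple DOT digraph."""
    labels, edges = {}, []
    for line in text.splitlines():
        node = NODE_RE.match(line)
        if node:
            # Assumption: comma-separated labels are distinct mutations on one node
            labels[node.group(1)] = [lab.strip() for lab in node.group(2).split(",") if lab.strip()]
            continue
        edge = EDGE_RE.match(line)
        if edge:
            edges.append((edge.group(1), edge.group(2)))
    return labels, edges


labels, edges = parse_simple_dot(TREE_GV)
print(labels["2"])  # ['B', 'G']
print(len(edges))   # 5
```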
The tree can be imported directly with:
tree = mp3.read_dotfile('tree.gv')

If the tree is to be considered a partially labeled tree, pass the additional argument labeled_only to the function:

tree = mp3.read_dotfile('tree.gv', labeled_only=True)

To avoid rewriting the trees manually, a set of mutations can be excluded from the computation with the exclude argument. The argument must be a string of comma-separated labels.
These labels are ignored when constructing the tree and are therefore not considered when computing the similarity score.
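Conceptually, the effect of exclude can be pictured as removing the listed labels from every node before the score is computed. The sketch below illustrates that idea on the labels of tree.gv, written as a plain dictionary. parse_exclude and apply_exclusion are hypothetical helpers, not the library's implementation, and they assume labels are compared after stripping surrounding whitespace.

```python
def parse_exclude(exclude):
    """Turn a comma-separated string such as 'A,F' into a set of labels."""
    if not exclude:
        return set()
    return {label.strip() for label in exclude.split(",") if label.strip()}


def apply_exclusion(node_labels, exclude):
    """Return a copy of node_labels without the excluded labels (hypothetical helper)."""
    excluded = parse_exclude(exclude)
    return {node: [lab for lab in labs if lab not in excluded]
            for node, labs in node_labels.items()}


# Labels of tree.gv, one list of mutations per node
tree_labels = {"1": ["A"], "2": ["B", "G"], "3": ["C"],
               "4": ["D"], "5": ["E"], "6": ["F"]}

filtered = apply_exclusion(tree_labels, "A,F")
print(filtered["1"], filtered["6"], filtered["2"])  # [] [] ['B', 'G']
```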
labels_to_exclude = 'A,F'
tree = mp3.read_dotfile('tree.gv', labeled_only=True, exclude=labels_to_exclude)

It is also possible to build trees directly from a Python string:
gv_tree = '''
digraph G {
1 [label="A,B,C"];
3 [label="D"];
5 [label="E,F,G"]
0->1;
1->3;
3->4;
0->5;
}
'''
tree = mp3.read_dotstring(gv_tree)

The labeled_only and exclude arguments behave exactly as described for read_dotfile. For a partially labeled tree:

tree = mp3.read_dotstring(gv_tree, labeled_only=True)

To exclude a set of mutations from the computation, pass them as a string of comma-separated labels:
labels_to_exclude = 'A,F'
tree = mp3.read_dotstring(gv_tree, labeled_only=True, exclude=labels_to_exclude)

If a networkx.AGraph structure that conforms to the input format specification is already available, it can be used directly to build the tree structure for MP3.
To avoid possible problems, we advise using this function with care, or avoiding it altogether.
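Before handing an existing graph to MP3, it may help to check that it is a rooted tree and to see which of its nodes are unlabeled. The following is a minimal standard-library sketch using the edges and labeled nodes of gv_tree, written out by hand. check_tree is a hypothetical helper that covers only these two properties, not the full input-format specification.

```python
def check_tree(edges, labeled):
    """Return (root, unlabeled_nodes) if edges form a rooted tree, else raise ValueError.

    edges: list of (parent, child) pairs; labeled: set of nodes that carry a label.
    """
    nodes = {n for edge in edges for n in edge} | set(labeled)
    parents = {}
    for parent, child in edges:
        if child in parents:
            raise ValueError(f"node {child} has more than one parent")
        parents[child] = parent
    roots = nodes - parents.keys()
    if len(roots) != 1:
        raise ValueError(f"expected exactly one root, found {sorted(roots)}")
    root = roots.pop()
    # Every node must be reachable from the root
    children = {}
    for parent, child in edges:
        children.setdefault(parent, []).append(child)
    seen, stack = set(), [root]
    while stack:
        node = stack.pop()
        if node in seen:
            continue
        seen.add(node)
        stack.extend(children.get(node, []))
    if seen != nodes:
        raise ValueError("some nodes are not reachable from the root")
    return root, sorted(nodes - set(labeled))


# Edges and labeled nodes of gv_tree from the read_dotstring example
gv_edges = [("0", "1"), ("1", "3"), ("3", "4"), ("0", "5")]
gv_labeled = {"1", "3", "5"}
print(check_tree(gv_edges, gv_labeled))  # ('0', ['0', '4'])
```

Nodes 0 and 4 appear only in edges and carry no label, which is what makes gv_tree a partially labeled tree.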
# given T as a networkx.AGraph
tree = mp3.build_tree(T, labeled_only=False, exclude=None)

The previous considerations about partially labeled trees and the exclusion of mutations apply to this function as well.
Once two trees have been built with the functions described above, their similarity score can be computed:
# given two trees tree1 and tree2
score = mp3.similarity(tree1, tree2)

By default, mp3treesim uses the sigmoid function, which combines the similarities
computed on the intersection and on the union of the labels of the two input trees, as
described in the paper.
The mode argument changes this behaviour: it can return the value computed
on the intersection only or on the union only,
or combine the two values with a geometric mean.
We advise using the default sigmoid mode.
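The geometric mode combines the intersection and union values with a geometric mean. Assuming both values lie in [0, 1], that combination can be sketched as below. combine_geometric is a hypothetical name, and the default sigmoid combination is deliberately not reproduced here, since its exact form is given in the paper.

```python
import math


def combine_geometric(score_intersection, score_union):
    """Geometric mean of the two similarity values (both assumed in [0, 1])."""
    for value in (score_intersection, score_union):
        if not 0.0 <= value <= 1.0:
            raise ValueError("similarity values are expected in [0, 1]")
    return math.sqrt(score_intersection * score_union)


print(combine_geometric(0.25, 1.0))  # 0.5
print(combine_geometric(0.0, 0.9))   # 0.0
```

Because it is built on a product, the geometric mean is 0 whenever either value is 0, so it penalizes pairs of trees that agree on one set of labels but not on the other.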
# given two trees tree1 and tree2
score_intersection = mp3.similarity(tree1, tree2, mode='intersection')
score_union = mp3.similarity(tree1, tree2, mode='union')
score_geometric = mp3.similarity(tree1, tree2, mode='geometric')

The tool scales to hundreds of mutations quite fast (200 mutations in about 30 seconds).
To further decrease run times, the measure can be computed in parallel
with the cores argument. It defaults to 1; if a value of
0 or less is specified, the program uses all the cores available on the machine.
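The documented rule for cores (1 by default, all available cores for 0 or less) can be written as a small helper. resolve_cores is hypothetical and only mirrors the behaviour described above. It relies on os.cpu_count(), which may return None, in which case the sketch falls back to one worker. As a rough feel for why parallelism pays off, the sketch also counts unordered label triplets, under the assumption that the work grows with that number, which is cubic in the number of mutations.

```python
import os


def resolve_cores(cores=1):
    """Number of worker processes for a given `cores` value (hypothetical helper).

    Values of 0 or less mean "use every available core"; os.cpu_count() can
    return None, in which case a single worker is used.
    """
    if cores <= 0:
        return os.cpu_count() or 1
    return cores


def count_triplets(n):
    """Number of unordered triplets among n labels, C(n, 3)."""
    return n * (n - 1) * (n - 2) // 6


print(count_triplets(200))  # 1313400
print(resolve_cores(8))     # 8
```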
# given two trees tree1 and tree2
score = mp3.similarity(tree1, tree2, cores=8)

The trees built in MP3 can be drawn using networkx and pygraphviz:
# given a tree
mp3.draw_tree(tree)

NOTE 1: Networkx uses matplotlib to display the tree. In a notebook-like
environment (Jupyter, Colab) the tree is displayed automatically.
From the command line, call plt.show() (after import matplotlib.pyplot as plt) to
display it.
NOTE 2: Due to unreliable behaviour of networkx and pygraphviz, it is necessary to create a copy of the input tree and to loop over its nodes twice. Keep this in mind if you want to display very large trees.
MP3 tree similarity -- Version 1.0.6