Python Module

Simone Ciccolella edited this page Jun 25, 2020 · 4 revisions

Once installed, the tool can be imported in any Python 3.6+ script:

import mp3treesim as mp3

Loading input files

The Python module can read trees from multiple sources, provided the tree follows the format described in the Input format section.

Read from file

Suppose we have a file tree.gv containing the following:

digraph Tree {
    1 [label="A"];
    2 [label="B,G"];
    3 [label="C"];
    4 [label="D"];
    5 [label="E"];
    6 [label="F"];
    1 -> 2;
    1 -> 3;
    2 -> 4;
    2 -> 5;
    3 -> 6;
}

The tree can be imported directly with:

tree = mp3.read_dotfile('tree.gv')

Partially-labeled tree

If the tree should be treated as a partially-labeled tree, pass the additional argument labeled_only to the function:

tree = mp3.read_dotfile('tree.gv', labeled_only=True)

Exclude mutations

To exclude a set of mutations from the computation without rewriting the trees manually, use the exclude argument. The argument must be a string of comma-separated labels. These labels are ignored when constructing the tree and are therefore not considered when computing the similarity score.

labels_to_exclude = 'A,F'
tree = mp3.read_dotfile('tree.gv', labeled_only=True, exclude=labels_to_exclude)
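
As an illustration of what exclude does to the labels, the sketch below filters the labels of tree.gv by hand. It is not mp3treesim code: the node_labels dictionary and the filter_labels helper are hypothetical names introduced here, and how the library treats a node whose labels are all excluded is not covered.

```python
# Labels of tree.gv, keyed by node id (copied from the DOT file above)
node_labels = {
    "1": "A", "2": "B,G", "3": "C",
    "4": "D", "5": "E", "6": "F",
}


def filter_labels(node_labels, exclude):
    """Drop every label listed in the comma-separated `exclude` string.

    Hypothetical helper: it only mimics the documented effect of `exclude`.
    """
    excluded = {label.strip() for label in exclude.split(",") if label.strip()}
    return {
        node: [label.strip() for label in labels.split(",")
               if label.strip() not in excluded]
        for node, labels in node_labels.items()
    }


remaining = filter_labels(node_labels, "A,F")
print(remaining)
```

With exclude='A,F', nodes 1 and 6 are left without labels, while node 2 keeps both B and G.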

Read from string

It is possible to build trees directly from a Python string:

gv_tree = '''
    digraph G {
        1 [label="A,B,C"];
        3 [label="D"];
        5 [label="E,F,G"];
        0->1;
        1->3;
        3->4;
        0->5;
    }
'''

tree = mp3.read_dotstring(gv_tree)

Partially-labeled tree

If the tree should be treated as a partially-labeled tree, pass the additional argument labeled_only to the function:

tree = mp3.read_dotstring(gv_tree, labeled_only=True)
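
In gv_tree some nodes have no label attribute. The sketch below lists which nodes are labeled and which are not, using two regular expressions. The regular expressions are an assumption that holds for the simple DOT text of this example; they are not a general DOT parser and are not part of mp3treesim.

```python
import re

gv_tree = '''
    digraph G {
        1 [label="A,B,C"];
        3 [label="D"];
        5 [label="E,F,G"]
        0->1;
        1->3;
        3->4;
        0->5;
    }
'''

# Node statements with an explicit label, e.g. 1 [label="A,B,C"]
labeled = dict(re.findall(r'(\w+)\s*\[label="([^"]*)"\]', gv_tree))

# Every node id that appears as an endpoint of an edge, e.g. 0->1
edge_nodes = {
    node
    for edge in re.findall(r'(\w+)\s*->\s*(\w+)', gv_tree)
    for node in edge
}

# Nodes that take part in the tree but carry no label attribute
unlabeled = sorted(edge_nodes - set(labeled))

print(labeled)    # nodes 1, 3 and 5 carry labels
print(unlabeled)  # nodes 0 and 4 do not
```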

Exclude mutations

To exclude a set of mutations from the computation without rewriting the trees manually, use the exclude argument. The argument must be a string of comma-separated labels. These labels are ignored when constructing the tree and are therefore not considered when computing the similarity score.

labels_to_exclude = 'A,F'
tree = mp3.read_dotstring(gv_tree, labeled_only=True, exclude=labels_to_exclude)

Build from networkx AGraph

If a networkx.AGraph structure is already available and conforms to the input format specification, you can use it directly to build the tree structure for MP3. To avoid possible problems, we advise against using this function unless necessary; if you do use it, do so with care.

# given T as a networkx.AGraph
tree = mp3.build_tree(T, labeled_only=False, exclude=None)

The previous considerations about partially-labeled trees and the exclusion of mutations are applicable to this function as well.

Computing the similarity score

Once two trees are built using the previously described functions, they can be used to compute the similarity score:

# given two trees tree1 and tree2
score = mp3.similarity(tree1, tree2)
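
Putting the steps together, a minimal sketch of a complete comparison could look as follows. It uses only read_dotstring and similarity as described on this page; the two DOT strings and the compare_dot_strings helper are made up for the example, and the import is placed inside the function only so that the snippet can be loaded on a machine where mp3treesim is not installed.

```python
# Two small example trees in DOT format (illustrative data only)
TREE_A = '''
    digraph A {
        1 [label="A"];
        2 [label="B"];
        3 [label="C"];
        1 -> 2;
        1 -> 3;
    }
'''

TREE_B = '''
    digraph B {
        1 [label="A"];
        2 [label="B"];
        3 [label="C"];
        1 -> 2;
        2 -> 3;
    }
'''


def compare_dot_strings(dot_a, dot_b, **similarity_kwargs):
    """Build two trees from DOT strings and return their similarity score.

    Hypothetical convenience wrapper; extra keyword arguments are passed
    unchanged to mp3.similarity (for example mode or cores).
    """
    # Imported lazily: requires mp3treesim to be installed when called
    import mp3treesim as mp3

    tree_a = mp3.read_dotstring(dot_a)
    tree_b = mp3.read_dotstring(dot_b)
    return mp3.similarity(tree_a, tree_b, **similarity_kwargs)
```

Calling compare_dot_strings(TREE_A, TREE_B) then returns the score for the two example trees.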

Scoring mode

By default, mp3treesim uses the sigmoid function, which combines the similarities computed on both the intersection and the union of the labels of the two input trees, as described in the paper.

The mode argument changes this behaviour: it can return the value computed on the intersection only, the value computed on the union only, or the geometric mean of the two values. We advise using the default sigmoid mode.

# given two trees tree1 and tree2

score_intersection = mp3.similarity(tree1, tree2, mode='intersection')
score_union = mp3.similarity(tree1, tree2, mode='union')
score_geometric = mp3.similarity(tree1, tree2, mode='geometric')
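
To make the terms concrete, the sketch below shows what the intersection and union of two label sets are, and how a geometric mean combines two values. It is plain arithmetic, not the library's code: the label sets and the two scores are made-up placeholders, and geometric_mean is a hypothetical helper.

```python
import math

# Made-up label sets for two trees
labels_tree1 = {"A", "B", "C", "D"}
labels_tree2 = {"A", "B", "E"}

shared_labels = labels_tree1 & labels_tree2  # labels present in both trees
all_labels = labels_tree1 | labels_tree2     # labels present in either tree


def geometric_mean(score_intersection, score_union):
    """Geometric mean of two non-negative scores."""
    return math.sqrt(score_intersection * score_union)


# Placeholder values chosen for easy arithmetic, not real mp3treesim output
combined = geometric_mean(0.5, 0.125)
print(sorted(shared_labels))  # ['A', 'B']
print(sorted(all_labels))     # ['A', 'B', 'C', 'D', 'E']
print(combined)               # 0.25
```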

Parallel computation

The tool scales to hundreds of mutations quite quickly (200 mutations in ~30 seconds). To decrease run times further, the measure can be computed in parallel with the cores argument. It defaults to 1; if a value of 0 (or less) is specified, the program uses all the cores available on the machine.

# given two trees tree1 and tree2

score = mp3.similarity(tree1, tree2, cores=8)
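
The rule for cores can be summarised in a few lines. The resolve_cores helper below is hypothetical and only restates the rule given above; it is not how mp3treesim is necessarily implemented.

```python
import os


def resolve_cores(cores=1):
    """Return the number of worker processes implied by a `cores` value.

    Hypothetical helper: 0 or a negative value means "use every core",
    any positive value is used as given.
    """
    if cores <= 0:
        # os.cpu_count() may return None, so fall back to a single core
        return os.cpu_count() or 1
    return cores


print(resolve_cores())   # 1, the documented default
print(resolve_cores(8))  # 8
print(resolve_cores(0))  # number of cores on this machine
```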

Draw trees

It is possible to use networkx and pygraphviz to draw the trees built in MP3:

# given a tree
mp3.draw_tree(tree)

NOTE 1: Networkx uses matplotlib to display the tree. If you are using a notebook-like environment (Jupyter, Colab), the tree will be displayed automatically. If you are running from the command line, you need to call plt.show() to display it.
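
For the command-line case in NOTE 1, a minimal sketch could look like this. It assumes that plt refers to matplotlib.pyplot; the draw_and_show helper is a made-up name, and the imports are placed inside the function only so that the snippet can be loaded without matplotlib or mp3treesim installed.

```python
def draw_and_show(dot_path):
    """Read a tree from a DOT file, draw it, and open the plot window.

    Hypothetical helper for use outside notebook environments.
    """
    # Imported lazily: both packages must be installed when this is called
    import matplotlib.pyplot as plt
    import mp3treesim as mp3

    tree = mp3.read_dotfile(dot_path)
    mp3.draw_tree(tree)
    # Without this call nothing is displayed when run from the command line
    plt.show()
```

For example, draw_and_show('tree.gv') draws the tree from the Read from file section.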

NOTE 2: Due to unreliable behaviour of networkx and pygraphviz, it is necessary to create a copy of the input tree and loop over the nodes twice. Keep this in mind if you want to display very large trees.
