expam’s tree module

A programmatic API to interact with phylogenetic trees, particularly those used in reference databases.

expam.tree.location.Location

class expam.tree.location.Location(name='', type='', dist=0.0, coord=None, accession_id=None, taxid=None, **kwargs)

Represents a node in the phylogeny.

expam.tree.location.Location.__init__(self, name='', type='', dist=0.0, coord=None, accession_id=None, taxid=None, **kwargs)
Parameters
  • name (str, optional) – name of node, defaults to “”

  • type (str, optional) – Leaf or Branch, defaults to “”

  • dist (float, optional) – distance to parent node, defaults to 0.0

  • coord (list, optional) – binary coordinate from root to node, defaults to None

  • accession_id (str, optional) – NCBI accession id, defaults to None

  • taxid (int, optional) – NCBI taxonomy id, defaults to None

Variables
  • name – node name

  • type – “Leaf” or “Branch”

  • distance – distance to parent node

  • coordinate – list of binary numbers representing the path from root to node

  • nchildren – number of children below this node

  • accession_id – NCBI accession id (only valid for leaves)

  • taxid – NCBI taxonomy id

expam.tree.tree.Index

class expam.tree.tree.Index

Phylogeny index that can load, save and manipulate Newick trees.

expam.tree.tree.Index.load_newick(path, keep_names=False, verbose=True)

Load a Newick tree from file.

Parameters

path (str) – path to Newick file

Raises

OSError – file does not exist

Returns

names of leaves and the phylogeny Index object

Return type

List[str], expam.tree.Index

expam.tree.tree.Index.from_newick(newick_string, keep_names=False, verbose=True)

Parse a Newick string.

Parameters

newick_string (str) – Newick string encoding tree.

Returns

names of leaves and the phylogeny Index object

Return type

List[str], expam.tree.Index

Example loading an Index object from a Newick string.

>>> from expam.tree.tree import Index
>>> tree_string = "(B:6.0,(A:5.0,C:3.0,E:4.0):5.0,D:11.0);"
>>> leaves, index = Index.from_newick(tree_string)
* Initialising node pool...
* Checking for polytomies...
    Polytomy (degree=3) detected! Resolving...
    Polytomy (degree=3) detected! Resolving...
* Finalising index...
>>> leaves
['B', 'A', 'C', 'E', 'D']
>>> index
<Phylogeny Index, length=10>
>>> index['A']
<expam.tree.Location object at 0x109ac7970>
>>> index['A'].name
'A'
>>> index['A'].coordinate
[0, 0, 1, 0]

expam.tree.tree.Index.resolve_polytomies(pool)

If the phylogeny contains polytomies, repeatedly join the first two children under a new parent node with distance 0 until the polytomy is resolved.

Parameters

pool – List.

Returns

None

Return type

None
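
The joining rule described above can be sketched in isolation. This is a minimal illustration under stated assumptions, not expam's implementation: the `Node` class, the `binarise` function and the `joined(...)` naming scheme are hypothetical, and the sketch works on a plain object tree rather than on the node pool.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Node:
    """Hypothetical node type for this sketch; not expam's Location class."""
    name: str
    dist: float = 0.0
    children: List["Node"] = field(default_factory=list)


def binarise(node: Node) -> int:
    """Make the subtree at `node` binary; return the number of parents added."""
    added = 0
    while len(node.children) > 2:
        first, second = node.children[0], node.children[1]
        # The new parent has distance 0, so root-to-leaf distances are unchanged.
        joined = Node(
            name=f"joined({first.name},{second.name})",  # illustrative name only
            dist=0.0,
            children=[first, second],
        )
        node.children = [joined] + node.children[2:]
        added += 1
    for child in node.children:
        added += binarise(child)
    return added


# Same shape as "(B:6.0,(A:5.0,C:3.0,E:4.0):5.0,D:11.0);"
root = Node("root", 0.0, [
    Node("B", 6.0),
    Node("inner", 5.0, [Node("A", 5.0), Node("C", 3.0), Node("E", 4.0)]),
    Node("D", 11.0),
])
added = binarise(root)  # one join at the root, one at the inner node
```

Each degree-3 node needs exactly one join, which is why this tree gains two zero-length parents.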

expam.tree.tree.Index.coord(self, coordinate)

Return the Location (node) at coordinate.

Parameters

coordinate (list) – binary list representing path to node

Returns

node in tree

Return type

expam.tree.Location
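
To make the idea of a binary coordinate concrete, here is a small self-contained sketch that follows a coordinate through a binary tree. It does not use expam: `Node` and `walk` are hypothetical names, and the convention that 0 selects the first child and 1 the second is an assumption made for this illustration.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Node:
    """Hypothetical stand-in for a tree node; not expam's Location class."""
    name: str
    children: List["Node"] = field(default_factory=list)


def walk(root: Node, coordinate: List[int]) -> Node:
    """Follow a binary coordinate from the root, one child index per step."""
    node = root
    for step in coordinate:
        if step not in (0, 1):
            raise ValueError(f"coordinate entries must be 0 or 1, got {step!r}")
        if not node.children:
            raise KeyError(f"coordinate {coordinate} runs past leaf {node.name!r}")
        node = node.children[step]
    return node


# A small binary tree: root -> (X, inner -> (Y, Z)).
tree = Node("root", [Node("X"), Node("inner", [Node("Y"), Node("Z")])])

found_root = walk(tree, []).name      # the empty coordinate addresses the root
found_leaf = walk(tree, [1, 0]).name  # second child, then its first child
```

Because every node has at most two children once polytomies are resolved, a list of 0s and 1s is enough to address any node unambiguously.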

expam.tree.tree.Index.to_newick(self)

Output the tree in Newick format.

Returns

Newick format tree

Return type

str
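
For readers unfamiliar with the format, the sketch below serialises a small tree to a Newick string. It is an illustration only, with a hypothetical `Node` class and `newick` helper; it omits internal node names and does not claim to match how expam formats its output.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Node:
    """Hypothetical node type for this sketch; not expam's Location class."""
    name: str
    dist: float = 0.0
    children: List["Node"] = field(default_factory=list)


def newick(node: Node, is_root: bool = True) -> str:
    """Serialise a subtree; leaves are 'name:dist', branches '(...):dist'."""
    if node.children:
        inner = ",".join(newick(child, is_root=False) for child in node.children)
        label = f"({inner})"  # internal names are dropped in this sketch
    else:
        label = node.name
    if is_root:
        return label + ";"  # the root carries no branch length
    return f"{label}:{node.dist}"


root = Node("root", 0.0, [
    Node("B", 6.0),
    Node("inner", 5.0, [Node("A", 5.0), Node("C", 3.0), Node("E", 4.0)]),
    Node("D", 11.0),
])
text = newick(root)  # "(B:6.0,(A:5.0,C:3.0,E:4.0):5.0,D:11.0);"
```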

expam.tree.tree.Index.yield_child_nodes(self, node_name)

Yields the node and its child nodes (both branches and leaves).

Parameters

node_name (str) – name of node to start yielding from

Yields

node names at or below node_name

Return type

str

>>> for node in index.yield_child_nodes('p1'):  # p1 will always be the root
...    print(node)
...
1
D
2
B
3
E
4
A
C

Note

Internal node (branch) names may be written with a leading ‘p’ (for example ‘p1’), but the prefix can be omitted (‘1’).

expam.tree.tree.Index.yield_leaves(self, node_name)

Yield only the leaves at or below some node.

Parameters

node_name (str) – node to retrieve leaves from.

Yields

leaf names at or below node_name.

Return type

str
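
The relationship between yielding all nodes and yielding only leaves can be sketched with two generators. This is a minimal illustration, not expam's code: `CHILDREN`, `iter_nodes` and `iter_leaves` are hypothetical, and the adjacency map is an assumed shape chosen so that a pre-order walk reproduces the order shown in the yield_child_nodes example; the real index may store the tree differently.

```python
from typing import Dict, Iterator, List

# Hypothetical adjacency map: branch name -> child names (leaves have no entry).
CHILDREN: Dict[str, List[str]] = {
    "1": ["D", "2"],
    "2": ["B", "3"],
    "3": ["E", "4"],
    "4": ["A", "C"],
}


def iter_nodes(name: str) -> Iterator[str]:
    """Yield `name`, then everything below it, in pre-order."""
    yield name
    for child in CHILDREN.get(name, []):
        yield from iter_nodes(child)


def iter_leaves(name: str) -> Iterator[str]:
    """Yield only the leaves at or below `name`."""
    for node in iter_nodes(name):
        if node not in CHILDREN:  # no children recorded, so it is a leaf
            yield node


all_nodes = list(iter_nodes("1"))    # ['1', 'D', '2', 'B', '3', 'E', '4', 'A', 'C']
leaves_below = list(iter_leaves("3"))  # ['E', 'A', 'C']
```

Using generators means a caller can stop early without walking the whole subtree; wrapping the call in `list(...)` gives the eager form.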

expam.tree.tree.Index.get_child_nodes(self, node_name)

Return a list of nodes at or below node_name.

Parameters

node_name (str) – name of node

Returns

list of node names

Return type

List[str]

>>> index.get_child_nodes('1')
['1', 'D', '2', 'B', '3', 'E', '4', 'A', 'C']
>>> index.get_child_nodes('E')
['E']

expam.tree.tree.Index.get_child_leaves(self, node_name)

Get a list of leaves at or below node_name.

Parameters

node_name (str) – name of node

Returns

list of leaf names

Return type

List[str]