Tuesday, July 28 • 17:30 - 17:35
A rigorous approach to UPGMA phylogeny by multidimensional scaling of pairwise distances and bioinformatic outcomes for a commercially significant geneset


This study aimed to rigorously determine a branching order for a set of interrelated housekeeping genes. Following extensive genomic sequencing of the algal triterpenoid-biofuel producer Botryococcus braunii, we rapidly obtained the target set of genes related to squalene synthase by using selective BLAST searches and then iterative SOAP assembly of the overlapping BLAST hits. In this way we exhaustively ascertained that only four homologues were present. Introns were spanned by using 2-4 kb paired ends. Full-length genes were annotated, including potential alternative C-termini. We hypothesised that these four key biofuel genes had evolved from squalene synthase by two successive gene duplications, and that two of them code for proteins tethered to a membrane by their C-termini.

In a novel approach to phylogeny, we first used Matlab to obtain Needleman-Wunsch protein alignments that minimized the PAM-250 genetic distance between each pair of sequences. We stored each pairwise alignment in a matrix and selected the case with the minimal penalty when normalised. Multiple alignment was intentionally omitted: all genes were true homologues, so all intergene pairs were valid comparisons. After pairwise alignment, the distance between sequences was computed using the Poisson model. To visualize these distances, multidimensional scaling (MDS) was used to create an optimally distance-preserving projection onto two axes, allowing direct visualization of the relative genetic distances between sequences. This MDS approach critically informs the succeeding steps in tree generation, and differs from prior applications of MDS in tree comparison. The Poisson-corrected distances feed directly into the subsequent phylogenetic construction, which yields an ultrametric tree. Phylogeny was analysed by hierarchical clustering, using the Unweighted Pair Group Method with Arithmetic Mean (UPGMA).
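As a rough illustration of the Poisson distance and the two-axis MDS projection described above, the following Python sketch may help. It is not the authors' Matlab code: the function names are hypothetical, the Poisson correction is taken in its textbook form d = -ln(1 - p) with p the fraction of differing aligned sites, gapped columns are simply skipped (an assumption), and classical (Torgerson) MDS stands in for whichever MDS variant was actually used.

```python
import numpy as np


def poisson_distance(seq_a, seq_b):
    """Poisson-corrected distance between two already-aligned protein sequences.

    Assumption: columns with a gap ('-') in either sequence are ignored.
    """
    pairs = [(a, b) for a, b in zip(seq_a, seq_b) if a != "-" and b != "-"]
    if not pairs:
        raise ValueError("no ungapped columns to compare")
    # p = fraction of compared sites at which the residues differ
    p = sum(a != b for a, b in pairs) / len(pairs)
    if p >= 1.0:
        return float("inf")  # correction is undefined when every site differs
    return -np.log(1.0 - p)  # d = -ln(1 - p)


def classical_mds(dist, n_dims=2):
    """Classical (Torgerson) MDS of a symmetric distance matrix.

    Returns an (n, n_dims) array of coordinates whose Euclidean distances
    approximate the input distances.
    """
    d = np.asarray(dist, dtype=float)
    n = d.shape[0]
    centering = np.eye(n) - np.ones((n, n)) / n
    # Double-centre the squared distances to obtain a Gram matrix
    gram = -0.5 * centering @ (d ** 2) @ centering
    eigvals, eigvecs = np.linalg.eigh(gram)  # eigenvalues in ascending order
    order = np.argsort(eigvals)[::-1][:n_dims]  # keep the largest n_dims
    scale = np.sqrt(np.clip(eigvals[order], 0.0, None))
    return eigvecs[:, order] * scale
```

Filling a symmetric matrix with `poisson_distance` for every pair of aligned sequences and passing it to `classical_mds` gives one point per gene on two axes. The axes themselves carry no biological meaning; only the distances between points do.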
UPGMA merges the two nearest neighbor sequences into one cluster C, and determines the new distance d(C,K) between C and each remaining cluster K; in UPGMA this distance d is the average of the distances between every sequence in C and every sequence in K. The algorithm iterates, terminating when all sequences are merged into a single cluster. Each merge operation represents one branch point of the resulting phylogenetic tree, and the last merge made becomes the root. We suggest that the MDS method used here may detect bioinformatic richness present in the sequences that is missed by other phylogenetic methods, which tend to treat amino acid or nucleotide columns as if they did not form part of a whole gene. By considering the gene as the critical unit upfront, by defining each gene only by its relationship to each other gene, and by then allowing distance to be multidimensional, the algorithm presented may respond to uniquely conserved areas sampled across two genes at a time, which may represent potential ancestral richness predating the pair. The phylogenetic branching order obtained from the tree for this gene set agrees well with observed synapomorphies (here motifs and introns) present across the gene set, giving us high confidence in the order of duplication of the constituent genes, and allowing us to infer biochemical signatures in the active-site pockets of this wider set of triterpenoid biosynthesis proteins.
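The UPGMA merge loop described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the implementation used in the study: the function name `upgma`, the nested-tuple tree representation, and the convention of placing each merge at half the inter-cluster distance are choices made here for clarity.

```python
def upgma(dist, labels):
    """Cluster a symmetric distance matrix (list of lists) by UPGMA.

    Returns (tree, merges): tree is a nested tuple of labels, and merges
    lists each (subtree, node_height) in the order the merges were made.
    """
    n = len(labels)
    # Active clusters: id -> (subtree, number of sequences it contains)
    clusters = {i: (labels[i], 1) for i in range(n)}
    # Pairwise distances keyed by (smaller_id, larger_id)
    d = {(i, j): dist[i][j] for i in range(n) for j in range(i + 1, n)}
    next_id = n
    merges = []
    while len(clusters) > 1:
        # Find the two nearest clusters
        (a, b), d_ab = min(d.items(), key=lambda item: item[1])
        tree_a, size_a = clusters.pop(a)
        tree_b, size_b = clusters.pop(b)
        del d[(a, b)]
        # Distance from the merged cluster C to each remaining cluster K:
        # the size-weighted mean of d(A,K) and d(B,K), which equals the
        # plain average over all sequence pairs between C and K
        for k in clusters:
            d_ak = d.pop((min(a, k), max(a, k)))
            d_bk = d.pop((min(b, k), max(b, k)))
            d[(k, next_id)] = (size_a * d_ak + size_b * d_bk) / (size_a + size_b)
        merged = (tree_a, tree_b)
        clusters[next_id] = (merged, size_a + size_b)
        merges.append((merged, d_ab / 2.0))  # node sits halfway between the pair
        next_id += 1
    (tree, _size), = clusters.values()  # the single remaining cluster is the root
    return tree, merges
```

With a toy matrix (hypothetical values, not data from the study) such as `upgma([[0, 2, 4], [2, 0, 8], [4, 8, 0]], ["A", "B", "C"])`, A and B merge first, and the distance from the new cluster to C becomes the average of 4 and 8. Because every leaf ends up equidistant from the root, the resulting tree is ultrametric by construction.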


Robert Moore

Two speakers share this talk. Robert has experience in molecular microbiology, gene annotation, genome mining, phylogeny and taxonomy, and has worked in the plant science, microbiology, and genetics fields. Currently in the environmental microbiology industry...

Michael Barnathan

Temple University

Tuesday July 28, 2020 17:30 - 17:35 MSK
Zoom Conference https://zoom.us/j/94321101353?pwd=QlJBb09uM0NVVnVyK0FkbTJ3Nkcrdz09