May 14th, 2012
1:30pm - 4:00pm
OLB 105

Each student will have 30 minutes to present, including time for questions.

1:30 - 2:00 Alexa Dorsey

2:00 - 2:30 Dimitri Wijesinghe - Checkpointing MCMCTree

PAML (Phylogenetic Analysis by Maximum Likelihood) is a bioinformatics software package that includes MCMCTree. MCMCTree performs Bayesian inference (using Markov chain Monte Carlo methods) on phylogenetic trees and fossil data to estimate when each species in the evolutionary tree originated. One limitation of MCMCTree is that it cannot handle today's large datasets without crashing the system or taking several months to compute. To address this problem, we implemented a checkpointing system for MCMCTree. With this system, MCMCTree can begin a calculation and save its state as it goes along. If the run is interrupted or the system crashes, the calculation can be restored from the last checkpoint instead of starting from scratch. This greatly extends the range of datasets the program can process. This work required conducting computational experiments and comparing the results of checkpointed and non-checkpointed versions of MCMCTree. This presentation will discuss the challenges of experimental reproducibility when working with randomized tools like MCMCTree. With this system in place, biologists will be able to analyze datasets such as the evolutionary tree of all mammals and estimate the origin time of all mammalian species based on existing fossil data.
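
The checkpoint-and-restore idea, and why it matters for reproducibility, can be illustrated with a minimal Python sketch. This is not MCMCTree's code: it assumes a toy Metropolis sampler on a standard normal target, and every name in it (run_chain, log_density, save_checkpoint, the stop_after argument) is hypothetical. The point it tries to show is that the random number generator state has to be saved along with the chain state; otherwise a resumed run would not match an uninterrupted one.

```python
import math
import os
import pickle
import random


def log_density(x):
    # Toy target (standard normal); a stand-in for a real posterior.
    return -0.5 * x * x


def save_checkpoint(path, state, rng):
    # Write to a temporary file first, then rename, so an interruption
    # during the write cannot leave a half-written checkpoint behind.
    tmp_path = path + ".tmp"
    with open(tmp_path, "wb") as fh:
        pickle.dump({"state": state, "rng_state": rng.getstate()}, fh)
    os.replace(tmp_path, path)


def run_chain(n_steps, path, checkpoint_every=100, seed=1, stop_after=None):
    """Run the toy sampler, resuming from `path` if a checkpoint exists.

    `stop_after` is a hypothetical knob that simulates an interruption.
    """
    rng = random.Random(seed)
    state = {"step": 0, "x": 0.0, "samples": []}
    if os.path.exists(path):
        with open(path, "rb") as fh:
            saved = pickle.load(fh)
        state = saved["state"]
        rng.setstate(saved["rng_state"])  # restore the random stream too

    while state["step"] < n_steps:
        if stop_after is not None and state["step"] >= stop_after:
            break  # simulated crash: work since the last checkpoint is lost
        proposal = state["x"] + rng.uniform(-1.0, 1.0)
        log_ratio = log_density(proposal) - log_density(state["x"])
        # Metropolis acceptance rule for a symmetric proposal.
        if rng.random() < math.exp(min(0.0, log_ratio)):
            state["x"] = proposal
        state["samples"].append(state["x"])
        state["step"] += 1
        if state["step"] % checkpoint_every == 0:
            save_checkpoint(path, state, rng)
    return state["samples"]
```

Under these assumptions, a run interrupted at step 250 resumes from the checkpoint written at step 200 and then produces the same samples as a run that was never interrupted, which is one way to compare checkpointed and non-checkpointed versions of a randomized tool.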

2:30 - 3:00 Mark Adamo - TreeHouse: Tools for Visualizing and Analyzing Datasets from Large-Scale Phylogenetic Inference

Large-scale phylogenetic inference may return sets of thousands of trees, each possibly containing hundreds of taxa. TreeHouse is a phylogenetic tree-querying program that operates on large, highly compressed sets of trees. I have extended TreeHouse to support multiple modes of visualization and have incorporated support for biological classification data, enabling entirely new hypothesis-testing analyses. New filtering operations allow individual trees as well as tree sets to be edited, so that tree data can be tailored to specialized areas of biological inquiry. Memory optimizations make TreeHouse's memory usage scale much better as the size of trees and tree sets increases. Taken together, these features have also enhanced TreeHouse's viability as a platform for further development of phylogenetic computations. The most recent of these developments is an implementation of the Ancestral Distance Test for correlated trait evolution.

3:00 - 3:30 Tavish Pegram