SEA-Phage 2014 Brown University Poster

TETRANUCLEOTIDE USAGE IN MYCOBACTERIOPHAGE GENOMES: ALIGNMENT-FREE METHODS TO CLUSTER PHAGE AND INFER EVOLUTIONARY RELATIONSHIPS

Chen Ye | Benjamin Siranosian | Emma Herold | Minjae Kwon | Sudheesha Perera | Edward Williams | Sarah Taylor | Christopher de Graffenried

INTRODUCTION

Traditionally, phage genomes are compared using methods that require sequence alignment or gene annotation. These methods may be ineffective for populations with significant horizontal gene transfer and are computationally intensive for large datasets. Mycobacteriophages also lack a common genetic element, like ribosomal RNA in bacteria, from which to compute phylogenetic relationships. Alignment-free sequence analysis methods, such as measures that compute the usage of oligonucleotides in a genome, have the potential to infer relationships between significantly diverged sequences. We examined the usage of tetranucleotides in all 663 phage genomes available in the mycobacteriophage database as an alternative to alignment- and annotation-based methods.

We found tetranucleotide usage deviation (TUD), a normalized measure of tetranucleotide usage in a genome, to be comparable for members of the same phage subcluster and distinct between subclusters. We used TUD as a measure of distance between phage and were able to:

- Construct phylogenetic trees that place members of a subcluster in a monophyletic clade
- Accurately assign subclusters to phage with a nearest neighbor classifier
- Identify windows in a genome with significantly different tetranucleotide usage, possibly indicating horizontal gene transfer

METHODS

k-mer counting | tetranucleotide usage deviation | tetranucleotide difference index

[Figure: 4-mers are counted using a sliding window, illustrated on the example sequence GATGATGATCATG.]

To remove biases in tetranucleotide counts, we divided each observed count by the number of tetranucleotides expected under a model of random nucleotide distribution. This gives the TUD for a tetranucleotide w:

TUD(w) = observed(w) / expected(w)

Exp(w) = [(A^a * C^c * G^g * T^t) * N - 3]

A, C, G, T: genomic frequency of the respective nucleotides
a, c, g, t: number of times each nucleotide occurs in the tetranucleotide w
N: length of genome

The TUD signal for a genome is a vector of 4^4 = 256 values, one for each possible tetranucleotide.

GENOMIC SELF-SIMILARITY

Genomes are relatively self-similar in oligonucleotide usage. A region with a drastically different TUD signal can indicate horizontal transfer of genetic material. We computed the tetranucleotide difference index (TDI) in a sliding window to look for regions of interest in phage genomes.

Tetranucleotide differences are measured in each window s by the equation:

TD_s = sum_{i=1}^{256} |TUD_s(w_i) - TUD_G(w_i)|

TUD_s: the TUD value for word w_i in the sliding window
TUD_G: the TUD value for the entire genome

We compare the Z-score of tetranucleotide differences for each window to find regions of significant difference:

Z_s = (TD_s - mean(TD)) / stdev(TD)

[Figure: Tetranucleotide Difference Index in cluster L genomes (2500 bp window, 500 bp step size) for JoeDirt (L1), Archie (L2) and Whirlwind (L3). Axes: genomic position, TDI Z-score. Colored lines indicate significant clusters of repeats. Annotations: JoeDirt (L1) cluster of repeats at 70 kb; "Horizontal gene xfer?"; a gene product with some homology to Mycobacterium abscessus (E = 2e-48), Flavobacterium psychrophilum (E = 2e-28) and an Opitutaceae bacterium TAV1 ATPase (E = 1e-23).]

Cluster L genomes are very repetitive at the end. Repetitive regions have increased counts of specific 4-mers, contributing to the spike in TDI.

NEIGHBOR-JOINING TREE FROM TUD DISTANCE

[Figure: neighbor-joining tree built from TUD distance, with tips labeled by phage name and subcluster. Annotation: "These shouldn't go here - causes?"]

[Figure: comparison of the tree from Hatfull et al. (2010) with our alignment-free tree for phage in subclusters C1, C2, F1, F2, H1 and H2.]

EXCEPTIONAL K-MER MOTIFS

[Figure: GATC usage deviation and GGATCC usage deviation by subcluster. Axes: tetranucleotide usage deviation, hexanucleotide usage deviation. Annotations: CCATGG; "GGATCC is a BamHI restriction site!"]

CONCLUSIONS

Tetranucleotide usage deviation and other alignment-free methods can investigate relationships within the diverse mycobacteriophage population. TUD accurately reconstructs phylogenetic trees and can highlight regions of particular interest in a genome. These methods can be applied in a high-throughput manner, take very small amounts of computational time, and serve as an excellent first pass in the comparative analysis of a mycobacteriophage genome. With some further work we hope to see these methods applied to every new phage sequence.

FUTURE DIRECTIONS

host-parasite coevolution | horizontal gene transfer

Hosts and parasites have similar oligonucleotide usage profiles. We will use data available on phage host preference to investigate this point further.

A naïve Bayesian classifier can use oligonucleotide counts to calculate the probability of a subsequence originating in a given genome. This can be used to find the most likely genome of origin for a possible HGT event. We plan to implement a naïve Bayesian classifier and further investigate leads uncovered with TDI.

LITERATURE CITED

Betley, J. N., Frith, M. C., Graber, J. H., Choo, S. & Deshler, J. O. A ubiquitous and conserved signal for RNA localization in chordates. Curr. Biol. 12, 1756-1761 (2002).
Hall, M. et al. The WEKA Data Mining Software: An Update. SIGKDD Explor. Newsl. 11, 10-18 (2009).
Hatfull, G. F. et al. Comparative Genomic Analysis of 60 Mycobacteriophage Genomes: Genome Clustering, Gene Acquisition, and Gene Size. Journal of Molecular Biology 397, 119-143 (2010).
Pride, D. T., Wassenaar, T. M., Ghose, C. & Blaser, M. J. Evidence of host-virus co-evolution in tetranucleotide usage patterns of bacteriophages and eukaryotic viruses. BMC Genomics 7, 8 (2006).
Sandberg, R. et al. Capturing Whole-Genome Characteristics in Short Sequences Using a Naïve Bayesian Classifier. Genome Res. 11, 1404-1409 (2001).

ACKNOWLEDGEMENTS

We are grateful to Dr. Peter Shank, Dr. Sorin Istrail, Dr. Zhijin Wu, HHMI's SEA program and the University of Pittsburgh.

additional information

Source code and processed data are available at:
github.com/bsiranosian/tango
bsiranosian.com

A digital copy of this poster is available at yeesus.com/tangoposter
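
The TUD and TDI calculations described in the methods can be sketched in a few lines of Python. This is a minimal illustration and not the code from github.com/bsiranosian/tango; the function names `tud` and `tdi_zscores` are hypothetical. It makes two assumptions the poster does not settle: the expected count is taken as the product of nucleotide frequencies times the number of tetranucleotide positions (N - 3), and the Z-score uses the population standard deviation.

```python
from collections import Counter
from itertools import product
from statistics import mean, pstdev

BASES = "ACGT"
# All 4^4 = 256 tetranucleotides in a fixed order.
WORDS = ["".join(p) for p in product(BASES, repeat=4)]


def tud(seq):
    """Return the 256-value TUD vector (observed / expected) for a non-empty sequence."""
    seq = seq.upper()
    n = len(seq)
    # Genomic frequency of each nucleotide (A, C, G, T in the poster's notation).
    base_freq = {b: seq.count(b) / n for b in BASES}
    # Observed counts from a sliding window of width 4 and step 1.
    observed = Counter(seq[i:i + 4] for i in range(n - 3))
    vector = []
    for w in WORDS:
        # Assumption: expected count = frequency product * (N - 3) positions.
        expected = float(n - 3)
        for b in w:
            expected *= base_freq[b]
        vector.append(observed[w] / expected if expected > 0 else 0.0)
    return vector


def tdi_zscores(genome, window=2500, step=500):
    """Return (window_start, Z-score) pairs for the tetranucleotide difference index."""
    whole = tud(genome)
    starts = list(range(0, len(genome) - window + 1, step))
    # TD_s: summed absolute difference between window TUD and whole-genome TUD.
    td = [
        sum(abs(s - g) for s, g in zip(tud(genome[i:i + window]), whole))
        for i in starts
    ]
    mu = mean(td)
    sigma = pstdev(td)  # assumption: population standard deviation
    return [(i, (d - mu) / sigma if sigma > 0 else 0.0) for i, d in zip(starts, td)]
```

Calling `tdi_zscores(genome)` with the default arguments uses the 2500 bp window and 500 bp step quoted for the cluster L figure; windows with a large positive Z-score are the candidates for a closer look.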

shared by BlasterNT on Jun 18
This is the poster Brown University's 2014 Phage Hunters class displayed at HHMI's 2014 SEA-Phage Conference. For more info about our project, you can visit yeesus.com/tangosea.

Designer

Chen Ye

Tags

virus

Category

Science