Using a Hidden Markov Model to Predict Genes in Drosophila melanogaster
Bryce E. Frazier
Dr. Jon Beck and Dr. Anton Weisstein, Faculty Mentors
The Hidden Markov Model (HMM) is currently one of the most commonly used computational methods in gene prediction. An HMM is a probabilistic model in which a sequence of hidden states, here coding and noncoding regions, generates the observed nucleotides, and the probability of each state depends on the state before it. From a training sequence, it estimates the probability of each nucleotide occurring in coding and noncoding regions, then uses those probabilities to predict whether a stretch of a new sequence is a gene. It is extensively used in gene prediction because it can handle large data sets of DNA sequences and predict coding regions with considerable accuracy. In this study, the genome of the fruit fly, D. melanogaster, will be used to train the HMM. The trained model will then be used to predict coding regions in D. simulans, a species closely related to D. melanogaster. The goal of this study is to gain insight into the complexity of HMM-based gene prediction in an intensively studied genetic model organism.
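As a rough illustration of the prediction step, the sketch below labels each nucleotide as coding or noncoding with the Viterbi algorithm, a standard way to find the most probable hidden-state path in an HMM. It is not the model used in this study: the two-state layout, the names START, TRANS, EMIT, and viterbi, and every probability value are assumptions chosen for illustration. A trained model would estimate these values from the D. melanogaster genome.

```python
import math

STATES = ("coding", "noncoding")

# Hypothetical parameters for illustration only; not estimated from any genome.
START = {"coding": 0.5, "noncoding": 0.5}
TRANS = {
    "coding": {"coding": 0.9, "noncoding": 0.1},
    "noncoding": {"coding": 0.1, "noncoding": 0.9},
}
EMIT = {
    "coding": {"A": 0.2, "C": 0.3, "G": 0.3, "T": 0.2},
    "noncoding": {"A": 0.3, "C": 0.2, "G": 0.2, "T": 0.3},
}


def viterbi(seq):
    """Return the most probable hidden-state path for a nucleotide string."""
    if not seq:
        return []
    # scores[s] is the log probability of the best path that ends in state s.
    scores = {s: math.log(START[s]) + math.log(EMIT[s][seq[0]]) for s in STATES}
    backpointers = []
    for base in seq[1:]:
        new_scores, pointers = {}, {}
        for s in STATES:
            # Pick the previous state that makes arriving in s most probable.
            best_prev = max(STATES, key=lambda p: scores[p] + math.log(TRANS[p][s]))
            new_scores[s] = (
                scores[best_prev]
                + math.log(TRANS[best_prev][s])
                + math.log(EMIT[s][base])
            )
            pointers[s] = best_prev
        scores = new_scores
        backpointers.append(pointers)
    # Trace back from the best final state to recover the full path.
    state = max(STATES, key=lambda s: scores[s])
    path = [state]
    for pointers in reversed(backpointers):
        state = pointers[state]
        path.append(state)
    path.reverse()
    return path


if __name__ == "__main__":
    print(viterbi("GCGCGCGCATATATAT"))
```

Working in log space avoids numerical underflow on long sequences, which matters once the input is a whole chromosome rather than a short string.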
Keywords: computational approach to gene prediction, Hidden Markov Model
Topic(s): Computer Science, Biology, Mathematical Biology
Presentation Type: Poster
Session: 6-1
Location: GEO - SUB
Time: 3:30