Parallel Discovery of Transcription Factor Motifs Through the Use of Hash Tables
Justin C. Deters
Dr. Jon Beck, Faculty Mentor
Transcription factor binding sites, which occur in the base pairs just upstream of a gene's transcription start site, determine when and where the gene is expressed within the organism. Genes that share transcription factor binding sites are regulated by the same factors and generally act in biologically related processes. We seek to find similar motifs by comparing genes directly across dozens of genomes. We accomplish this goal in a multiphase computation. First, the regulatory regions containing the binding sites are extracted and hashed into hash tables. Second, each gene's regulatory region is compared to every other regulatory region by directly comparing their tables, with each pairwise comparison generating a measure of similarity. Finally, the genes are grouped into a weighted graph. These phases are parallelized to decrease computation time. We find that hash insertion takes much less time than our queries, and that the parallel version substantially reduces computation time compared to our serial implementation.
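The three phases described above can be illustrated with a small sketch. It is not the authors' implementation: it assumes that each regulatory region is hashed as a table of overlapping k-mer counts, that table similarity is a weighted Jaccard score, and that graph edges are kept above a hypothetical `threshold`. The names `kmer_table`, `similarity`, and `build_graph` are illustrative only. A thread pool stands in for the parallel workers here. For CPU-bound work in CPython, a `ProcessPoolExecutor` could be substituted, since the worker functions are top-level and picklable.

```python
from collections import Counter
from concurrent.futures import ThreadPoolExecutor
from functools import partial
from itertools import combinations


def kmer_table(seq: str, k: int = 6) -> Counter:
    """Phase 1 (assumed scheme): hash every overlapping length-k substring
    of a regulatory region into a count table."""
    seq = seq.upper()
    return Counter(seq[i:i + k] for i in range(len(seq) - k + 1))


def similarity(a: Counter, b: Counter) -> float:
    """Phase 2 (assumed metric): weighted Jaccard = sum(min) / sum(max)."""
    # Iterate over the smaller table and probe the larger one with O(1) lookups.
    if len(a) > len(b):
        a, b = b, a
    shared = sum(min(count, b[kmer]) for kmer, count in a.items() if kmer in b)
    # sum(max) over all k-mers equals |a| + |b| - sum(min).
    total = sum(a.values()) + sum(b.values()) - shared
    return shared / total if total else 0.0


def _compare(pair):
    """Compare one pair of (gene, table) entries; top-level so it can be pickled."""
    (g1, t1), (g2, t2) = pair
    return g1, g2, similarity(t1, t2)


def build_graph(regions: dict, k: int = 6, threshold: float = 0.0, workers: int = 4) -> dict:
    """Run all three phases and return an adjacency map {gene: {gene: weight}}."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # Phase 1: build one hash table per gene in parallel.
        tables = dict(zip(regions, pool.map(partial(kmer_table, k=k), regions.values())))
        # Phase 2: all-pairs comparison of the tables in parallel.
        results = pool.map(_compare, combinations(tables.items(), 2))
        # Phase 3: keep sufficiently similar pairs as weighted, undirected edges.
        graph = {}
        for g1, g2, weight in results:
            if weight > threshold:
                graph.setdefault(g1, {})[g2] = weight
                graph.setdefault(g2, {})[g1] = weight
    return graph


# Toy example with made-up sequences (not data from the study).
regions = {"geneA": "ACGTACGTTTGACA", "geneB": "ACGTACGTTTGACC", "geneC": "GGGGCCCCGGGG"}
graph = build_graph(regions, k=4, threshold=0.1)
print(graph)  # {'geneA': {'geneB': 0.8333333333333334}, 'geneB': {'geneA': 0.8333333333333334}}
```

In this toy run, geneA and geneB share 10 of their 11 four-mers and are linked with weight 10/12. geneC shares none with either and is left out of the graph. The all-pairs step grows quadratically with the number of genes, which is consistent with the abstract's observation that queries, rather than hash insertion, dominate the running time.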
Keywords: parallel computing, hash tables, gene transcription, gene regulation
Topic(s): Computer Science
Presentation Type: Oral Paper
Session: -1
Location: VH 1328
Time: 9:30