Question
Transcription factor binding sites are notoriously difficult to accurately predict using any one approach. Describe a multi-part strategy that uses multiple approaches to identify transcription factor binding sites in the human genome. In your multipronged approach, you should consider how you might use available comparative genomic data, together with various other experimental approaches that leverage next-generation sequencing to come up with good hypotheses for where these sites exist in the genome.
Explanation / Answer
Identifying transcription factor binding sites (TFBS) is an important first step in determining the DNA signals that regulate transcription of the genome. We tested three distinct computational methods for identifying TFBS in the human genome sequence, judging each by its ability to recover the locations of experimentally determined, uniquely mapped TFBS taken from the TRANSFAC database. All three methods start by aligning position weight matrices that describe the binding site to the sequence, and then reduce the number of predicted sites using one of the following filters: (i) a P-value threshold for accepting a site, (ii) a measure of over-representation of neighboring sites, or (iii) conservation with the mouse genome combined with P-value thresholds. The results show that the best recognition of TFBS is achieved by combining two steps: identifying TFBS in regions of human-mouse conservation, and applying a high-stringency P-value threshold to TFBS identified in non-coding regions that are not conserved. Additionally, we find that only half of the 481 experimentally mapped sites lie in sequence regions conserved with mouse, but the predictive power of the binding site identification method is up to threefold higher in the conserved regions.
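To make the combined strategy concrete, here is a minimal Python sketch of scanning a sequence with a position weight matrix and applying a relaxed cutoff inside conserved regions and a strict cutoff elsewhere. Everything in it is an illustrative assumption rather than the method used in the study: the count matrix, the example sequence, the conserved interval, and the function names are hypothetical, and plain log-odds score cutoffs stand in for the P-value thresholds described above. It also scans only the forward strand.

```python
import math

BASES = "ACGT"

def build_pwm(counts, pseudocount=1.0, background=0.25):
    """Convert per-position base counts into log2-odds scores."""
    pwm = []
    for column in counts:
        total = sum(column[b] for b in BASES) + pseudocount * len(BASES)
        pwm.append({
            b: math.log2(((column[b] + pseudocount) / total) / background)
            for b in BASES
        })
    return pwm

def score_window(pwm, window):
    """Sum the log-odds scores; return None if a base is not A, C, G, or T."""
    if any(base not in BASES for base in window):
        return None
    return sum(column[base] for column, base in zip(pwm, window))

def is_conserved(start, end, conserved_regions):
    """True if [start, end) lies entirely inside one half-open conserved interval."""
    return any(lo <= start and end <= hi for lo, hi in conserved_regions)

def scan(sequence, pwm, conserved_regions, relaxed_cutoff, strict_cutoff):
    """Return (start, site, score, conserved) for windows passing their cutoff.

    Windows inside conserved regions use the relaxed cutoff; all other
    windows must pass the strict cutoff.
    """
    sequence = sequence.upper()
    width = len(pwm)
    hits = []
    for start in range(len(sequence) - width + 1):
        window = sequence[start:start + width]
        score = score_window(pwm, window)
        if score is None:
            continue
        conserved = is_conserved(start, start + width, conserved_regions)
        cutoff = relaxed_cutoff if conserved else strict_cutoff
        if score >= cutoff:
            hits.append((start, window, round(score, 2), conserved))
    return hits

# Hypothetical counts for a 4-bp motif (consensus TGAC), not taken from TRANSFAC.
EXAMPLE_COUNTS = [
    {"A": 0, "C": 0, "G": 0, "T": 10},
    {"A": 0, "C": 0, "G": 10, "T": 0},
    {"A": 10, "C": 0, "G": 0, "T": 0},
    {"A": 0, "C": 10, "G": 0, "T": 0},
]
EXAMPLE_SEQUENCE = "AATGACGGTGATCC"   # made-up sequence
EXAMPLE_CONSERVED = [(6, 14)]         # made-up human-mouse conserved interval

if __name__ == "__main__":
    pwm = build_pwm(EXAMPLE_COUNTS)
    for hit in scan(EXAMPLE_SEQUENCE, pwm, EXAMPLE_CONSERVED,
                    relaxed_cutoff=3.0, strict_cutoff=6.0):
        print(hit)
```

In this toy input, the exact match outside the conserved interval passes the strict cutoff, while the one-mismatch site is reported only because it lies inside the conserved interval. In practice the conserved intervals would come from a whole-genome alignment, and the cutoffs would be calibrated against a background score distribution instead of being chosen by hand.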