Evolutionary divergence and limits of conserved non-coding sequence detection in plant genomes


Authors: Reineke, Anna R.
Bornberg-Bauer, Erich
Gu, Jenny
Department/Institution: FB 13: Biology
Document types: Article
Media types: Text
Publication date: 2011
Published in MIAMI: 19.02.2013
Last modified: 26.10.2023
Edition: [Electronic ed.]
Source: Nucleic Acids Research 39 (2011) 14, 6029-6043
Subject (DDC): 570: Life sciences; biology
License: CC BY-NC 2.5
Language: English
Notes: Funded by the Open Access Publication Fund 2011/2012 of the Deutsche Forschungsgemeinschaft (DFG) and the Westfälische Wilhelms-Universität Münster (WWU Münster).
Format: PDF document
URN: urn:nbn:de:hbz:6-97379407083
Other identifiers: DOI: 10.1093/nar/gkr179
Permalink: https://nbn-resolving.de/urn:nbn:de:hbz:6-97379407083
Online access: 6029.full.pdf

The discovery of regulatory motifs embedded in upstream regions of plants is a particularly challenging bioinformatics task. Previous studies have shown that motifs in plants are short compared with those found in vertebrates. Furthermore, plant genomes have undergone several diversification mechanisms, such as genome duplication events, which affect the evolution of regulatory motifs. In this article, a systematic phylogenomic comparison of upstream regions is conducted to further identify features of plant regulatory genomes (the component of the genome that regulates gene expression) and to enable future de novo discoveries. The findings highlight differences in upstream region properties between major plant groups and the effects of divergence times and duplication events. First, clear differences in upstream region evolution can be detected between monocots and dicots, suggesting that these groups should be treated separately when searching for novel regulatory motifs, particularly since universal motifs such as the TATA box are rare. Second, the decay rate of significantly aligned regions suggests that a divergence time of 100 million years ago (mya) sets a limit for reliable conserved non-coding sequence (CNS) detection. The insights presented here set a framework for identifying embedded motifs of functional relevance by clarifying the limits of bioinformatic CNS detection.