Random DNA Sequence Mimics #ENCODE !!


When we chose “Nothing is so Alien to the Human Mind as the Idea of Randomness” as the title of our commentary on Dan Graur’s talk, we had no idea about a PNAS paper that is being widely forwarded on Twitter today. The unofficial name of PNAS is ‘Papers Not Accepted in Science’. So it is a safe bet that this brilliant paper got rejected from Nature and Science, and we will tell you why in a minute, but first let us explain what it showed.

Massively parallel in vivo enhancer assay reveals that highly local features determine the cis-regulatory function of ChIP-seq peaks

Transcription factors (TFs) recognize short sequence motifs that are present in millions of copies in large eukaryotic genomes. TFs must distinguish their target binding sites from a vast genomic excess of spurious motif occurrences; however, it is unclear whether functional sites are distinguished from nonfunctional motifs by local primary sequence features or by the larger genomic context in which motifs reside. We used a massively parallel enhancer assay in living mouse retinas to compare 1,300 sequences bound in the genome by the photoreceptor transcription factor Cone-rod homeobox (Crx), to 3,000 control sequences. We found that very short sequences bound in the genome by Crx activated transcription at high levels, whereas unbound genomic regions with equal numbers of Crx motifs did not activate above background levels, even when liberated from their larger genomic context. High local GC content strongly distinguishes bound motifs from unbound motifs across the entire genome. Our results show that the cis-regulatory potential of TF-bound DNA is determined largely by highly local sequence features and not by genomic context.

If all that is Greek to you, Mike White, the first author of the paper, explains it in plain English on his blog.

Finding function in the genome with a null hypothesis

Last September, there was a wee bit of a media frenzy over the Phase 2 ENCODE publications. The big story was supposed to be that ‘junk DNA is debunked’ – ENCODE had allegedly shown that instead of being filled with genetic garbage, our genomes are stuffed to the rafters with functional DNA. In the backlash against this storyline, many of us pointed out that the problem with this claim is that it conflates biochemical and organismal definitions of function: ENCODE measured biochemical activities across the human genome, but those biochemical activities are not by themselves strong proof that any particular piece of DNA actually does something useful for us.

The claim that ENCODE results disprove junk DNA is wrong because, as I argued back in the fall, something crucial is missing: a null hypothesis. Without a null hypothesis, how do you know whether to be surprised that ENCODE found biochemical activities over most of the genome? What do you really expect non-functional DNA to look like?

In our paper in this week’s PNAS, we take a stab at answering this question with one of the largest sets of randomly generated DNA sequences ever included in an experimental test of function. We tested 1,300 randomly generated DNAs (more than 100 kb total) for regulatory activity. It turns out that most of those random DNA sequences are active. Conclusion: distinguishing function from non-function is very difficult.
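
To get a rough feel for what a null expectation looks like, here is a minimal Python sketch (ours, not from the paper) that generates random DNA and counts how often a short motif shows up purely by chance. The 6-mer `TAATCC`, the 100 kb length, the uniform base composition, and the function names are all assumptions chosen for illustration; the sketch only counts motif matches on one strand and says nothing about the enhancer assay itself.

```python
import random


def random_dna(length, rng, gc=0.5):
    """Return a random DNA string with the given expected GC fraction."""
    # Weights are listed in the order A, C, G, T.
    weights = [(1 - gc) / 2, gc / 2, gc / 2, (1 - gc) / 2]
    return "".join(rng.choices("ACGT", weights=weights, k=length))


def count_motif(seq, motif):
    """Count overlapping occurrences of motif on the given strand only."""
    last_start = len(seq) - len(motif)
    return sum(1 for i in range(last_start + 1) if seq.startswith(motif, i))


def expected_count(length, motif):
    """Expected hits in uniform random DNA: positions * (1/4)^len(motif)."""
    return (length - len(motif) + 1) * 0.25 ** len(motif)


if __name__ == "__main__":
    rng = random.Random(0)       # fixed seed so the run is repeatable
    motif = "TAATCC"             # hypothetical 6-mer, for illustration only
    seq = random_dna(100_000, rng)
    print("observed hits:", count_motif(seq, motif))
    print("expected hits:", round(expected_count(len(seq), motif), 1))
```

For a 6-mer, chance alone predicts a match roughly once every 4,096 bases, so a bare motif match is weak evidence of function on its own.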

Mike showed the most unexpected thing. Random DNA not only binds transcription factors, but also –

It turns out that most of the 1,300 random DNA sequences cause reproducible regulatory effects on the reporter gene. You can see this in these results from 620 random DNA sequences below, in what I call a Tie Fighter plot:

(check his blog for the plot)

Why is such an interesting finding not worthy of a Nature paper? Here is the answer.

Nature is too committed to junk scientists and ENCODE results to acknowledge that they blew it big time on real science.


Mike White’s tweet response shows exactly what we suspected.


6 comments to Random DNA Sequence Mimics #ENCODE !!

  • Just to clarify the conclusion of this piece of research: even randomly produced DNA sequences can show some activity. Therefore, potential activity does not translate to functional DNA. I would assume that even random DNA can be transcribed and translated into a protein molecule.

  • samanta

    Hello DougB,

    I asked Mike White, who is the first author of the paper, about your comment. This is what he has to say –

    Mike White ‏@genologos 2h

    @homolog_us Commenter is confused about the experiment – random DNA acted like enhancer – it was not transcribed and/or translated

    You can understand Mike’s work by first going through the main ENCODE paper (http://www.nature.com/nature/journal/v489/n7414/full/nature11247.html), where they came to the conclusion that 80% of the human genome is functional. The ENCODE project used transcription factor binding to come to that conclusion, but Mike White showed that there was nothing special about human DNA. Even randomly assembled sequences can be called functional by the same logic.

  • DougB – would you be so kind as to clarify the goal you define for the null hypothesis? For example, the intuitive goal to me would be to assume the incumbent view of non-coding DNA as junk. In which case, aren’t we expecting non-coding?

    As you set it up, the null hypothesis seems tacitly to be that DNA is information rich, hard-to-vary, and defined in terms of *sequence*.

    Therefore, your results are predicted by *their* hypothesis, in that it’s highly active and sequence encoded. So change the sequence (i.e. select randomly) and you get a completely different logic, and so a different expressed outcome.

    So, unless you are suggesting the randomly selected sequence is producing identical logic in expression, you have actually supported their hypothesis, and undermined your own.

  • Correction: in reality your null hypothesis demonstrates their findings are inconclusive, in that they are consistent with all or most possibilities, including junk DNA.

    But they are also consistent with the hypothesis of NOT-junk. You level the accusation that this is junk science or non-science, presumably on the grounds that they have seen the patterns they want to see in a huge random dataset.

    But this is flawed logically, because we aren’t talking about a huge random dataset but DNA, which is already known to play a fundamental part in life. Therefore their result is potentially very significant.

    It serves as a foundation for further research, from which hard evidence may emerge that rules out other possibilities, like junk DNA.

    This isn’t pseudo-science. It’s the way science has always worked. Absent a conclusive abstract theory, science follows all legitimate possibilities by empirical means. Darwin catalogued species and boated around islands. Which is empirical.

    And their results are significant in that sense, and should have been published so that as many like-minded researchers as possible got access to a peer-reviewed, corrected piece and began searching for some hard evidence to rule their hypothesis in or out.

    Can you please explain why you regard this procedure as unscientific?

  • The random DNA negative controls are there to define what ‘non-functional’ means in this assay of DNA function. By itself it doesn’t prove or disprove the notion of junk DNA, but it shows something that any competent biochemist knows already: DNA is rarely inert, whether it performs a specific function in the cell or not. In other words, merely observing a ChIP-seq peak or a transcript from some place in the genome is not enough to say that the DNA segment is serving a specific function.
