We have a number of interesting bioinformatics puzzles. They are more biologically oriented, but anyone with knowledge of de Bruijn graphs and computational analysis will find them exciting. These days, many smart computer scientists are into bioinformatics, and we are saddened to see so much talent being wasted on optimizing and benchmarking k-mer counters or aligners.
Please contact us by email if interested. The puzzles are just for fun, but most likely you will be able to write a cool paper after solving them.
We will be away from here for a week, busy building our RNAseq blog with cutting-edge information.
Best to all.
A new Hindawi paper on assembling the repeat regions was forwarded to us (h/t: @srbehera11), and we decided to check what else is available in the same genre. First, the Hindawi paper -
A de novo genome assembly algorithm for repeats and non-repeats
They claim to assemble the repeat regions from short reads and supposedly do a better job than all other assemblers. Sadly, they did not compare with SPAdes, the SOAPdenovo repeat-resolution module and Ray, three assemblers we expected to do well with repeats based on their algorithms. A comparison with SPAdes in particular would have been nice, given that Pevzner has been writing on repeat resolution for almost a decade now. The lack of comparison is not entirely their fault, because they used GAGE benchmarks and not GAGE-B.
Going through the algorithm, we do not understand what is innovative and would like our readers to comment. Here is a short snippet; the paper has a lot more details.
To this end, we proposed a new genome assembly algorithm aiming for assembling repeats and non-repeats, named SWA (Sliding Window Assembly), which can assemble repeats and non-repeats completely and accurately. In SWA, sliding window function is used to filter out the sequencing bias caused by sequencing process and improve the confidence of separating repeats and non-repeats.
The main contributions of our approach are as follows: 1) Assembling repeats and non-repeats completely and accurately rather than only detecting where repeats or non-repeats are. Complex repeats structures have very important biomedical functions. Consequently, the completeness and accuracy of assembling repeats are what SWA mainly concerned rather than the continuity of whole genome assembly. 2) Sliding window functions to filter out the sequencing bias are used in genome assembling process. Filtering noise by window function is very common in information processing but is rare in genome assembly process. SWA adopts sliding window to filter out NGS data bias and improve the statistical significance of read counts. What’s more, a compensational mechanism based on sliding window was embedded in SWA. This mechanism can improve the significance of read counts under the condition of low coverage.
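For readers who want to see what "filtering read counts with a sliding window" could look like in practice, here is a rough sketch. It is not SWA's implementation: the excerpt does not say which window function the authors use, so a plain moving average stands in for it. The function names, the window size and the 1.5x threshold are all our own assumptions for illustration.

```python
from statistics import median


def sliding_window_mean(counts, window=5):
    """Smooth per-position read counts with a centered moving average.

    A plain moving average is an assumption here; the paper excerpt does
    not specify the window function SWA actually uses.
    """
    if window < 1:
        raise ValueError("window must be at least 1")
    half = window // 2
    smoothed = []
    for i in range(len(counts)):
        lo = max(0, i - half)                # window is truncated at the ends
        hi = min(len(counts), i + half + 1)
        smoothed.append(sum(counts[lo:hi]) / (hi - lo))
    return smoothed


def label_repeats(counts, window=5, factor=1.5):
    """Flag positions whose smoothed count exceeds factor x the median.

    The threshold rule is hypothetical; it only illustrates how smoothing
    can make a repeat/non-repeat call less sensitive to single noisy counts.
    """
    smoothed = sliding_window_mean(counts, window)
    baseline = median(smoothed)
    return [value > factor * baseline for value in smoothed]


if __name__ == "__main__":
    # Toy coverage profile: roughly 10x everywhere, roughly 30x in a repeat.
    coverage = [10, 11, 9, 10, 30, 31, 29, 30, 10, 9, 11, 10]
    flags = label_repeats(coverage, window=3)
    print([i for i, is_repeat in enumerate(flags) if is_repeat])
```

The point of smoothing before thresholding is that a single position with an unusually high or low count no longer flips the call on its own, which is presumably what the authors mean by improving the statistical significance of read counts.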
Continue reading I Often Repeat Repeat Myself, I Often Repeat Repeat
ENCODE made the major discovery that 80% of the human genome is functional. Usually when Nobel (or IgNobel) prizes are awarded, the committees work hard to make sure other less well-known papers making similar discoveries get credited.
We looked extensively into the literature to find which group made this fundamental discovery prior to ENCODE and came across work from a humble researcher from Stanford, who never bothers to read or question papers submitted under his name (for more on that see below). Please check the line marked in bold in this paper published in 2006, which was one year before ENCODE published their first paper.
Continue reading Who Deserves the ENCODE Nobel Prize? Ans. Ron Davis
To refute Science Magazine’s claim that the classroom version of the history of DNA starts with Watson and Crick, while Miescher is all but forgotten, Dan Graur shared very informative slides from his classroom.
Ironically, the presentation ended with everything that Dan Graur finds wrong about big science and computer-automated analysis (aka ‘big data’). It is no fault of his, because the final slide is computer-generated by slideshare (presumably based on a match with the word ‘DNA’) and we display it below -
We have noticed that most of our readers moved to Feedly after Google Reader shut down.
Feedly offers nice stats on every feed, and we thought that could be a useful replacement for ‘journal impact factor’ in the internet age. Let us rather call it journal visibility factor, because if your paper is published in a more subscribed feed, it is more likely that others will notice it. Conceptually, the idea is not too different from the number of paper subscriptions of journals and magazines in the old days.
Here are the visibility stats for some journals and websites.
Science: Current Issue – 36K readers
Nature Communications – 3K readers
PNAS Current Issue – 7K readers
PLOS Genetics – 1.5K readers (added up multiple feeds)
Mike Eisen’s blog – 1K readers
PLOS Computational Biology – 1K readers
Moral of the story: consider publishing in Eisen’s blog by paying him a smaller fee than PLOS Comp Bio
1. Python code is available at this link, and a snippet of the code is posted below. You will find other examples in the comment section of the link.
# Quick and dirty demonstration of CVE-2014-0160 by Jared Stafford (firstname.lastname@example.org)
# The author disclaims copyright to this source code.
from optparse import OptionParser
Continue reading Hijacking Other Websites Using CVE-2014-0160 – Code and Demo
In our previous commentary on k-mer counting, one program was mentioned peripherally, because we knew little about it. On further reading, we find the paper worth mentioning in a separate commentary, because it brings a number of new concepts to bioinformatics.
In simple language, the approach can be described as ‘modified BFcounter’.
First we use a Bloom filter to identify the k-mers that were seen at least twice (with a small false positive rate). To count the frequency of these k-mers, we use an array of items containing a k-mer and its count. These are the two main components of our tool. Once the counts are computed, we can output the k-mers having frequency greater than the chosen cutoff.
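The two components described in the quote can be sketched in a few lines of Python. This is only our toy illustration of the idea, not scTurtle's code: the `BloomFilter` class, its size and hash choice, and the function names are assumptions, a Python dict stands in for the paper's array of (k-mer, count) items, and reverse complements are ignored.

```python
import hashlib


class BloomFilter:
    """Tiny Bloom filter; the size and hash scheme are illustrative choices."""

    def __init__(self, num_bits=1 << 20, num_hashes=3):
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = bytearray((num_bits + 7) // 8)

    def _positions(self, item):
        # Derive num_hashes bit positions by salting one hash function.
        for seed in range(self.num_hashes):
            digest = hashlib.blake2b(
                item.encode(), digest_size=8, salt=bytes([seed])
            ).digest()
            yield int.from_bytes(digest, "big") % self.num_bits

    def add(self, item):
        """Insert item; return True if it was (probably) present already."""
        already_present = True
        for pos in self._positions(item):
            byte, bit = divmod(pos, 8)
            if not self.bits[byte] & (1 << bit):
                already_present = False
                self.bits[byte] |= 1 << bit
        return already_present


def kmers(seq, k):
    """Yield every k-mer of seq (no canonicalization in this sketch)."""
    for i in range(len(seq) - k + 1):
        yield seq[i:i + k]


def count_frequent_kmers(reads, k, cutoff=1):
    """Count k-mers seen at least twice; return those with count > cutoff."""
    bloom = BloomFilter()
    counts = {}  # stand-in for the paper's array of (k-mer, count) items
    for read in reads:
        for kmer in kmers(read, k):
            if kmer in counts:
                counts[kmer] += 1
            elif bloom.add(kmer):
                # The filter says this k-mer was seen before, so this is
                # (barring a false positive) its second occurrence.
                counts[kmer] = 2
    return {kmer: n for kmer, n in counts.items() if n > cutoff}


if __name__ == "__main__":
    print(count_frequent_kmers(["ACGTACGT", "ACGTTT"], k=4))
```

The memory saving comes from the fact that k-mers seen only once, which are mostly sequencing errors, never get an entry in the counting structure. The price is that a Bloom filter false positive can let a singleton k-mer in with an inflated count of 2.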
Continue reading scTurtle Algorithm for Kmer Counting
Dan Graur’s ENCODE bombshell – “On the immortality of television sets: “function” in the human genome according to the evolution-free gospel of ENCODE” came out in February 2013. For several months, he was treated like a pariah, and well-connected researchers with vested interests (e.g. Daniel Macarthur and BioMickWatson) called him a troll in public. Others quickly distanced themselves from him, except for one fringe blog.
These days, we sense seeping anger against ENCODE among almost all biologists not connected with the project. They realize by now that Dan Graur stated the obvious – ENCODE was pulling the wool over everyone’s eyes. All those press releases and simultaneous publication of 30 or so papers had only one purpose – to hide the fact that the mega-project achieved nothing. The other part of the equation has also become clear. The $300 million ENCODE wasted to fool them was supposed to go for their research under normal circumstances.
Continue reading Is It Time to Get the ENCODE Paper Retracted?
We are planning to split our blog into multiple subsections. Each part (called ‘study section’ henceforth) is guaranteed to carry a topic that interests nobody. To make matters worse, most study sections will be underfunded, i.e. updated sparingly, often with no prior notice and at times in the middle of polar night.
Please let us know if you find any topic interesting so that we can change it before it is too late. Being useless and uninteresting are very important criteria for our choice of topics. For example, we make sure we do not talk about the human genome and human diseases, because everyone finds those topics useful.
It is the location of the current blog and the theme will continue to be the same – hardware, bioinformatics algorithms and code. To keep the majority of bioinformatics technicians away from our blog, we will reduce the discussions on how to run programs and how to set up standard pipelines to an absolute minimum.
Continue reading Splitting Our Blog into Different Subsections