Happy New Year! Here is a great way to bring some fun and challenge to your new year. We received
a note from Nikolay Vyahhi, who helped build Rosalind and Stepik, saying that their organization is hosting a bioinformatics competition. The details are posted below:
Yesterday, PacBio received its Christmas present for 2016. Roche abruptly terminated its
three-year-long alliance with the company. During this collaboration, Roche paid PacBio to develop the Sequel instrument and
reserved the exclusive right to sell it in the human clinical market.
Investor warning: The following post is for entertainment purposes only and should not be considered financial advice of any sort. Please consult your favorite government-certified investment adviser or central banker regarding decisions on investing your life savings.
Many exciting papers and preprints on RNAseq came out over the last few months. Among them, a recently posted preprint addresses an important problem: improving annotations based on new RNAseq data. There were other papers on quantification, compression and search, and we would like to cover them in the next few posts.
We present a new method, GRASS, for improving an initial annotation of de novo transcriptomes. GRASS makes the shared-sequence relationships between assembled contigs explicit in the form of a graph, and applies an algorithm that performs label propagation to transfer annotations between related contigs and modifies the graph topology iteratively. We demonstrate that GRASS increases the completeness and accuracy of the initial annotation, allows for improved differential analysis, and is very efficient, typically taking 10s of minutes.
Software Availability: GRASS is written in Python, and is freely-available under an open-source
(BSD) license at https://github.com/COMBINE-lab/GRASS.
A number of recent papers propose using multidimensional Bloom filters to identify genes from a
large collection of RNAseq libraries. This post provides a general perspective on these papers. In the following
post, we will go into depth and explain the algorithm of the recent preprint by working through an example.
Bloom filters in general have found many uses in bioinformatics, and we have covered them on our blog
in the past. We now have a new tutorial on Bloom filters for those unfamiliar with this probabilistic data structure and its applications
in bioinformatics. The concept of a multidimensional Bloom filter goes a level higher than ordinary Bloom
filters, because it is meant to store and query a large collection of Bloom filters. Each Bloom filter within
the set may store a representation (usually kmer-based) of a genome or RNAseq library.
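For readers who prefer code, here is a minimal Python sketch of the idea: one ordinary Bloom filter per library, each holding the kmers of that library, plus a naive query that scans every filter in the collection. It is an illustration only and is not the algorithm of any of the papers mentioned above. The class and function names (`BloomFilter`, `build_filter`, `query_collection`), the hashing scheme, the kmer size and the toy sequences are all our own assumptions.

```python
import hashlib


class BloomFilter:
    """A small Bloom filter. Uses one byte per bit for readability, not efficiency."""

    def __init__(self, num_bits=4096, num_hashes=3):
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = bytearray(num_bits)

    def _positions(self, item):
        # Derive num_hashes positions by salting SHA-256 with a seed (an assumed scheme).
        for seed in range(self.num_hashes):
            digest = hashlib.sha256(f"{seed}:{item}".encode()).digest()
            yield int.from_bytes(digest[:8], "big") % self.num_bits

    def add(self, item):
        for pos in self._positions(item):
            self.bits[pos] = 1

    def __contains__(self, item):
        # May return a false positive, but never a false negative.
        return all(self.bits[pos] for pos in self._positions(item))


def kmers(seq, k):
    """Yield every overlapping substring of length k."""
    return (seq[i:i + k] for i in range(len(seq) - k + 1))


def build_filter(seq, k=5, **kwargs):
    """Store all kmers of one sequence (standing in for a library) in a filter."""
    bf = BloomFilter(**kwargs)
    for kmer in kmers(seq, k):
        bf.add(kmer)
    return bf


def query_collection(filters, query, k=5, threshold=1.0):
    """Return names of filters that report at least `threshold` of the query kmers."""
    query_kmers = list(kmers(query, k))
    hits = []
    for name, bf in filters.items():
        found = sum(1 for kmer in query_kmers if kmer in bf)
        if query_kmers and found / len(query_kmers) >= threshold:
            hits.append(name)
    return hits


# Toy usage: two hypothetical "libraries" and a query taken from the first one.
collection = {
    "libA": build_filter("ACGTTGCAACGTGGCATTAGC"),
    "libB": build_filter("TTTTCCCCGGGGAAAATTTTC"),
}
print(query_collection(collection, "TGCAACGTGG"))
```

The linear scan over all filters is the simplest possible design. As we understand it, the point of the multidimensional structures is to avoid exactly this scan by organizing the filters so that many of them can be ruled out at once.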
Job Title: Postdoctoral Scholar Position in Comparative Plant Genomics and Bioinformatics
The Computational Plant Genomics Lab invites applications for a Postdoctoral position in the Department of Ecology and Evolutionary Biology at the University of Connecticut. We focus on developing computational approaches that integrate next generation sequence data to address questions in non-model plants, particularly forest trees. The lab has the following ongoing projects: 1) Understanding the evolution of alternative translation initiation using RNA-seq data 2) Integrating new and existing approaches to gene prediction to improve the annotation of complex genomes 3) Analysis of gene family evolution and related comparative genomics questions 4) Detecting variation in populations from GBS and related sequence data.
Abstract: Reconstructing transcript models from RNA-sequencing (RNA-seq) data and establishing these as independent transcriptional units can be a challenging task. The Zipper plot is an application that enables users to interrogate putative transcription start sites (TSSs) in relation to various features that are indicative for transcriptional activity. These features are obtained from publicly available datasets including CAGE-sequencing (CAGE-seq), ChIP-sequencing (ChIP-seq) for histone marks and DNase-sequencing (DNase-seq). The Zipper plot application requires three input fields (chromosome, genomic coordinate (hg19) of the TSS and strand) and generates a report that includes a detailed summary table, a Zipper plot and several statistics derived from this plot.
Abstract: Transcriptomes are tremendously diverse and highly dynamic; visualizing and analysing this complexity is a major challenge. Here we present superTranscript, a single linear representation for each gene. SuperTranscripts contain all unique exonic sequence, built from any combination of transcripts, including reference assemblies, de novo assemblies and long-read sequencing. Our approach enables visualization of transcript structure and provides increased power to detect differential isoform usage.
This is a fascinating talk that our readers from both the computational and life sciences sides
will enjoy. The author realized the shortcomings of common programming languages in solving
his domain-specific task and developed Clasp, starting from Common Lisp.
We are back after making extensive changes to the blog software being used here.
Most important among the changes, we got rid of WordPress and made a commitment to
never use WordPress again. WordPress is easy to install, but a nightmare to maintain
with its entire panoply of buggy plugins. Moreover, it sucks up time by failing at the
most unfortunate times.
Lynn Yi, Harold Pimentel and Lior Pachter published a new RNAseq paper that our
readers will definitely find interesting. In this paper, the authors showcase the
new RNAseq technologies the Pachter lab has been developing over the last few years. We
covered those components (e.g., Kallisto, Sleuth) in earlier posts, but here you can
see a biological application that draws new insights from already published data.
Readers may keep an eye on the #SMRTBFX hashtag on Twitter to follow an ongoing conference. This is the best place to learn about the latest bioinformatics algorithms for long reads.
Gene Myers is again the star of the show. He has been distributing a lot of goodies through his Dazzlerblog, such as -
A large number of NIH-funded parasites waste taxpayers’
money with the excuse that they are working
toward improving the health of Americans. Francis Collins, the head of NIH,
uses every opportunity to tell everyone how research funded by NIH helps in
improving the life expectancy of Americans (a flat-out lie). Yet, when
research by Deaton and Case uncovered that the life expectancy of Americans of
prime age (45-54) was falling, primarily due to rising suicides, Collins and
his minions went completely silent.
‘Ancient’ Bene Israel Jews and late-arrived Baghdadi Jews in India started the
Bollywood movie industry. Many famous early Indian actresses also came from
these communities. This is not common knowledge in India, because those
actresses took Muslim (Firoza Begum) or Hindu (Sulochana, Pramila) screen
In 2013, Dr. Elhaik complained about his home page at Johns Hopkins University
mysteriously disappearing from Google searches right after his first Jewish
genomics paper started to gain
attention. We reproduced his complaint here, and then his page came back on
top again after a few days.
Among all biomolecules within the cell, tRNAs get the least respect. Their
supposed importance ended right after the ‘adaptors’ related to entries in the
genetic code table were identified (mid-60s). Since then, attention has
shifted to more complex RNAs like the rRNAs.
Among the biotechnology inventions of the last few years with the potential to
revolutionize medicine, nothing excites us more than the growing of three-
dimensional human organoids on Matrigel. Therefore, we plan to devote a number
of posts to this topic to keep our readers aware of the practices, potentials