Report from Asia – Will Asia ‘Unfollow’ NIH’s Failed Research Model?

In Asia, top-level funding agencies looking for directions for their research programs are following the US (NIH) model, even though NIH is clearly failing in its mission compared to most other countries.

To find the evidence of failure, you can use Francis Collins’ own metrics. A few weeks back, he marketed his precision medicine scam to gullible Americans by writing –

As the world’s largest source of biomedical research funding, the US National Institutes of Health (NIH) has been advancing understanding of health and disease for more than a century. Scientific and technological breakthroughs that have arisen from NIH-supported research account for many of the gains that the United States has seen in health and longevity.

For example, an infant born today in the United States can look forward to an average lifespan of about 79 years—nearly 3 decades longer than one born in 1900.

We pointed out in ‘Francis Collins Admits NIH Under Him Has Been Failing’ that if improvement in life expectancy at birth is the best measure of NIH’s performance, NIH clearly failed during Collins’ time. The earlier post showed that US life expectancy at birth is lower than that of almost every advanced country. The extent of the failure can be judged by comparing the USA with a number of very poor countries around the world that cannot afford a guitar-playing NIH director.

1. Cuba

Cubans, who have been under a US blockade for over five decades, have a life expectancy at birth equal to that of the USA.


2. Vietnam

Vietnam’s life expectancy at birth is 75 years, only 4 years less than that of the USA. This is an amazing achievement given that the country was bombed to oblivion by the USA a few decades back and has had no expensive ‘National Institute of Health’ since then.


3. Sri Lanka

Sri Lanka, a country that came out of a civil war less than a decade back, has a life expectancy of 74 years.


4. Mexico

The life expectancy at birth in Mexico is 77 years, and at its current rate of growth it will cross that of the USA in another 5 years (see figure). They only need to keep these advocates of ‘genomic medicine in Mexico’ at bay.


5. Chile

Chile was run by a brutal dictator until 1990, when the human genome project started. Their life expectancy at birth is now higher than USA’s.


All of this evidence shows that, instead of following what NIH is doing, Asian countries need to figure out what NIH is doing wrong and save the cost of an ineffective bureaucracy. Dan Graur offers some help –

What Do You Know About The Delusional @NIHDirector?

Francis Collins is the Director of the NIH. During his tenure, Big Science flourished and Real Science wilted. Francis Collins doesn’t think junk DNA exists, which is quite reassuring given that he believes in virgin birth, the resurrection of corpses, and the second coming of Jesus Christ. He frequently falls on his knees and thanks JC for all kinds of things, such as the completion of the human genome project in 2001. His views on evolution are peculiar, to say the least.

Report from Asia – Multidrug Resistance and Phage Therapy

Ken Weiss posted an informative commentary on antibiotic resistance on his blog. While in India, I saw this looming crisis of multi-drug resistance up close through the experience of the father-in-law of a friend I was staying with.

Is Stalin’s forgotten cure a possible solution? I posted a number of papers on phage therapy that the readers may find informative.

Phage Therapy – Reviews

Experimental Phage Therapy on Multiple Drug Resistant Pseudomonas aeruginosa Infection in Mice

T4 phages against Escherichia coli diarrhea: Potential and problems

Experimental phage therapy of burn wound infection: difficult first steps

Quality-Controlled Small-Scale Production of a Well- Defined Bacteriophage Cocktail for Use in Human Clinical Trials

Immunoglobulin Classification Using the Colored Antibody Graph


The somatic recombination of V, D, and J gene-segments in B-cells, introduces a great deal of diversity, and divergence from reference segments. Many recent studies of antibodies focus on the population of antibody transcripts that show which V, D, and J gene-segments have been favored for a particular antigen, a repertoire. To properly describe the antibody repertoire, each antibody must be labeled by its constituting V, D, and J gene-segment, a task made difficult by somatic recombination and hypermutation events. While previous approaches to repertoire analysis were based on sequential alignments, we describe a new de Bruijn graph based algorithm to perform VDJ labeling, and benchmark its performance.
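
The idea of labeling a read by the gene-segments whose k-mers it shares can be illustrated with a much simpler stand-in for the paper's algorithm. The sketch below is not the authors' method: it only builds a colored k-mer index (the node set of a colored de Bruijn graph, without traversing edges) and labels a read by majority vote. The function names, the toy segments and the choice of k are all assumptions made for illustration.

```python
from collections import Counter, defaultdict


def kmers(seq, k):
    """Yield every overlapping k-mer of seq, left to right."""
    for i in range(len(seq) - k + 1):
        yield seq[i:i + k]


def build_colored_index(segments, k):
    """Map each k-mer to the set of segment names (its colors) that contain it."""
    index = defaultdict(set)
    for name, seq in segments.items():
        for kmer in kmers(seq, k):
            index[kmer].add(name)
    return index


def label_read(read, index, k):
    """Return the segment whose color appears on the most k-mers of the read.

    Returns None when no k-mer of the read occurs in any reference segment.
    Mutated k-mers simply contribute no votes, so a read with a few
    hypermutations can still be labeled by its unmutated k-mers.
    """
    votes = Counter()
    for kmer in kmers(read, k):
        for name in index.get(kmer, ()):
            votes[name] += 1
    if not votes:
        return None
    return votes.most_common(1)[0][0]
```

With two toy reference segments, a read carrying one mismatch is still assigned to the segment it shares most k-mers with. A real implementation would also need tie handling and separate treatment of V, D and J segments, which this sketch leaves out.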

Data Science MOOC from Johns Hopkins (starring Steven Salzberg)

The Simply Statistics blog posted a teaser trailer for a new MOOC that readers may find interesting.

We have been hard at work in the studio putting together our next specialization to launch on Coursera. It will be called the “Genomic Data Science Specialization” and includes a spectacular line up of instructors: Steven Salzberg, Ela Pertea, James Taylor, Liliana Florea, Kasper Hansen, and me. The specialization will cover command line tools, statistics, Galaxy, Bioconductor, and Python. There will be a capstone course at the end of the sequence featuring an in-depth genomic analysis. If you are a grad student, postdoc, or principal investigator in a group that does genomics this specialization is for you. If you are a person looking to transition into one of the hottest areas of research with the new precision medicine initiative this is for you. Get pumped and share the teaser-trailer with your friends!

A perceptual hash function to store and retrieve large scale DNA sequences


This paper proposes a novel approach for storing and retrieving massive DNA sequences. The method is based on a perceptual hash function, commonly used to determine the similarity between digital images, that we adapted for DNA sequences. Perceptual hash function presented here is based on a Discrete Cosine Transform Sign Only (DCT-SO). Each nucleotide is encoded as a fixed gray level intensity pixel and the hash is calculated from its significant frequency characteristics. This results to a drastic data reduction between the sequence and the perceptual hash. Unlike cryptographic hash functions, perceptual hashes are not affected by “avalanche effect” and thus can be compared. The similarity distance between two hashes is estimated with the Hamming Distance, which is used to retrieve DNA sequences. Experiments that we conducted show that our approach is relevant for storing massive DNA sequences, and retrieving them.
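
A rough sketch of the pipeline described in the abstract (gray-level encoding, DCT, keep only the signs, compare by Hamming distance) might look like the following. It is an illustration under assumptions, not the paper's implementation: the gray levels assigned to A, C, G and T, the use of a one-dimensional DCT-II, the decision to drop the DC coefficient and the default of 16 bits are all choices made here for the example.

```python
import math

# Assumed gray-level encoding; the abstract does not give the actual values.
GRAY_LEVEL = {"A": 0, "C": 85, "G": 170, "T": 255}


def dct_ii(signal, n_coeffs):
    """Return the first n_coeffs coefficients of the unnormalised 1-D DCT-II."""
    n = len(signal)
    return [
        sum(x * math.cos(math.pi * (i + 0.5) * k / n) for i, x in enumerate(signal))
        for k in range(n_coeffs)
    ]


def perceptual_hash(seq, n_bits=16):
    """Hash a DNA string to n_bits bits using only the signs of DCT coefficients.

    Raises KeyError for symbols other than A, C, G, T (for example N).
    """
    signal = [GRAY_LEVEL[base] for base in seq.upper()]
    # Coefficient 0 is the sum of the signal and is never negative here,
    # so it carries no sign information and is skipped.
    coeffs = dct_ii(signal, n_bits + 1)[1:]
    return [1 if c >= 0 else 0 for c in coeffs]


def hamming_distance(hash_a, hash_b):
    """Count the positions at which two equal-length hashes differ."""
    if len(hash_a) != len(hash_b):
        raise ValueError("hashes must have the same length")
    return sum(a != b for a, b in zip(hash_a, hash_b))
```

Because only low-frequency signs are kept, two sequences that differ in a few positions tend to produce hashes a small Hamming distance apart, which is the property that makes retrieval by hash comparison possible. How well that holds for real genomes is a question for the paper's experiments, not for this sketch.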

MEDUSA: a Multi-draft Based Scaffolder


Motivation: Completing the genome sequence of an organism is an important task in comparative, functional and structural genomics. However, this remains a challenging issue from both a computational and an experimental viewpoint. Genome scaffolding (i.e. the process of ordering and orientating contigs) of de novo assemblies usually represents the first step in most genome finishing pipelines.

Results: In this paper, we present MEDUSA (Multi-Draft based Scaffolder), an algorithm for genome scaffolding. MEDUSA exploits information obtained from a set of (draft or closed) genomes from related organisms to determine the correct order and orientation of the contigs. MEDUSA formalises the scaffolding problem by means of a combinatorial optimisation formulation on graphs and implements an efficient constant factor approximation algorithm to solve it. In contrast to currently used scaffolders, it does not require either prior knowledge on the microrganisms dataset under analysis (e.g. their phylogenetic relationships) or the availability of paired end read libraries. This makes usability and running time two additional important features of our method. Moreover, benchmarks and tests on real bacterial datasets showed that MEDUSA is highly accurate and, in most cases, outperforms traditional scaffolders. The possibility to use MEDUSA on eukaryotic datasets has also been evaluated, leading to interesting results.

Availability: MEDUSA web server: A stand-alone version of the software can be downloaded from All results presented in this work have been obtained with MEDUSA v. 1.3.

Contact: [email protected]

Reports from Asia – ‘Calcutta – a New Beginning?’


While in India, I visited the National Institute of Biomedical Genomics (NIBMG) in Kalyani near Calcutta (marked as Kolkata in the above map).

The National Institute of Biomedical Genomics (NIBMG) has been established as an autonomous institution by the Government of India, under the aegis of the Department of Biotechnology. This is the first institution in India explicitly devoted to research, training, translation & service and capacity-building in biomedical genomics.

It is located in Kalyani, West Bengal, India, about 50 km. from Kolkata. Connected to Kolkata by highways, expressways and railways, it takes about one and one-half hours to reach Kalyani from Kolkata.

The Institute is currently functioning from an interim facility of about 120,000 sq. ft. of floor space, constructed on the 2nd floor of a local hospital. Laboratories, equipment (including flow-cell sequencers, whole-genome genotyping and gene expression platforms, multiplex suspension array platform, high-end computing platforms, etc.), bio-banking facility, office space and class rooms have been established.



The institute was recently established by Professor Partha P. Majumder, a reputed scholar from the Indian Statistical Institute (ISI) in Calcutta. ISI is well-respected around the world. In the context of population genetics, this is where J. B. S. Haldane spent his later life.


In 1956, Haldane left his post at University College London, and moved to Calcutta, where he joined the Indian Statistical Institute (ISI).[40] Haldane’s move to India was influenced by a number of factors. Officially he stated that his chief political reason was in response to the Suez Crisis. He wrote: “Finally, I am going to India because I consider that recent acts of the British Government have been violations of international law.” His interest in India was also because of his interest in biological research as he believed that the warm climate would do him good, and that India offered him freedom and shared his socialist dreams.[41] One immediate factor was related to a police case involving his wife Helen, who was arrested on charges of misbehaviour due to excessive drinking and refusal to pay fine. The university sacked her and Haldane followed suit. On his lighter side his “reason for settling in India was to avoid wearing socks,” and he concluded, “Sixty years in socks is enough.”[42] This could be partly true because Haldane always dressed up in Indian clothes, and was often mistaken to be a Hindu priest or guru.[2]

At the ISI, he headed the biometry unit and spent time researching a range of topics and guiding other researchers around him. He was keenly interested in inexpensive research and he wrote to Julian Huxley about his observations on Vanellus malabaricus, the Yellow-wattled Lapwing, boasting that he observed them from the comfort of his backyard. Haldane took an interest in anthropology, human genetics and botany. He advocated the use of Vigna sinensis (cowpea) as a model for studying plant genetics. He took an interest in the pollination of the common weed Lantana camara. The quantitative study of biology was his focus and he lamented that Indian universities forced those who took up biology to give up on an education in mathematics.[43] Haldane took an interest in the study of floral symmetry. His wife, Helen Spurway, conducted studies on wild silk moths.[41] In January 1961 they befriended the young Canadian lepidopterist Gary Botting, who initially visited the Indian Statistical Institute to share the results of his experiments hybridising silk moths of the genus Antheraea. Uncomfortable with Haldane’s “communist” sympathies, the United States cultural attache, Duncan Emery, summarily cancelled Gary Botting’s attendance at a high-profile banquet to which the Haldanes had invited him to meet biologists from all over India. Haldane protested this “insult” by going on a much-publicized hunger strike.[44][45] When the director of the I.S.I., P. C. Mahalanobis, confronted Haldane about both the hunger strike and the unbudgeted banquet, Haldane resigned his post (in February 1961) and moved to a newly established biometry unit in Odisha.[41]

Impressed by his stay in Calcutta and Bengal, Haldane later accepted Indian citizenship. That was in the early 1960s, when Bengal had a very impressive record of scholarship to report. Modern epidemiology was born there.


Sir Ronald Ross, KCB, FRS[1][2] (13 May 1857 – 16 September 1932), was an Indian-born British medical doctor who received the Nobel Prize for Physiology or Medicine in 1902 for his work on malaria, becoming the first British Nobel laureate, and the first born outside of Europe. His discovery of the malarial parasite in the gastrointestinal tract of mosquito led to the realisation that malaria was transmitted by mosquitoes, and laid the foundation for combating the disease. He was quite a polymath, writing a number of poems, published several novels, and composed songs. He was also an amateur artist and natural mathematician. He worked in the Indian Medical Service for 25 years. It was during his service that he made the groundbreaking medical discovery. After resigning from his service in India, he joined the faculty of Liverpool School of Tropical Medicine, and continued as Professor and Chair of Tropical Medicine of the institute for 10 years.

In physics, S. N. Bose showed mathematically the existence of the boson.


Satyendra Nath Bose FRS[1] was an Indian physicist specialising in mathematical physics. He was born in Calcutta. He is best known for his work on quantum mechanics in the early 1920s, providing the foundation for Bose–Einstein statistics and the theory of the Bose–Einstein condensate.

Satyendra Nath Bose, along with Saha, presented several papers in theoretical physics and pure mathematics from 1918 onwards. In 1924, while working as a Reader (Professor without a chair) at the Physics Department of the University of Dhaka, Bose wrote a paper deriving Planck’s quantum radiation law without any reference to classical physics by using a novel way of counting states with identical particles. This paper was seminal in creating the very important field of quantum statistics. Though not accepted at once for publication, he sent the article directly to Albert Einstein in Germany. Einstein, recognising the importance of the paper, translated it into German himself and submitted it on Bose’s behalf to the prestigious Zeitschrift für Physik. As a result of this recognition, Bose was able to work for two years in European X-ray and crystallography laboratories, during which he worked with Louis de Broglie, Marie Curie, and Einstein.[4][12][13][14]

C. V. Raman worked in Calcutta and discovered the Raman effect.


In 1917, Raman resigned from his government service after he was appointed the first Palit Professor of Physics at the University of Calcutta. At the same time, he continued doing research at the Indian Association for the Cultivation of Science (IACS), Calcutta, where he became the Honorary Secretary. Raman used to refer to this period as the golden era of his career. Many students gathered around him at the IACS and the University of Calcutta.

On 28 February 1928, Raman led experiments at the IACS with collaborators, including K. S. Krishnan, on the scattering of light, when he discovered what now is called the Raman effect.[8] A detailed account of this period is reported in the biography by G. Venkatraman.[9] It was instantly clear that this discovery was of huge value. It gave further proof of the quantum nature of light.

All that changed after the partition of Bengal in 1947, and Calcutta has been in slow decline since the 1960s, when Bengalis followed the Soviet side in the Cold War. By now it has lost almost all of its cultural legacy, and modern Calcutta is unrecognizable to anyone who spent time there in the 1970s or 80s.

It is quite remarkable that the Statistical Institute maintained its quality during the period of Bengal’s decline. Therefore, I am hopeful that the new institute in Kalyani will combine its theoretical skills in population genetics with the availability of high-throughput sequencing instruments to contribute positively to science. You can find the list of faculty members and their research projects here. I am thankful to Dr. Priyadarshi Basu for inviting me there.



In the larger context, although I am very pessimistic about India, Calcutta may have hit bottom. Everyone appears to be negative about its future, and every article you read in the mainstream press describes how the place failed.

Kanika Datta: ‘West Bengal still on decline’

Decline of Bengal, death of the bhadralok

Falling total fertility rate in Kolkata sets alarm bells ringing

The Empire’s second city is now a second-grade city

The decline and fall of Kolkata

Needless to say, the consensus of intellectuals was unusually positive about Calcutta in the late 1950s, just before the decline started.

The Current Status of the Introductory Book on Genome Assembly


Dear readers,

In mid-February, I made a somewhat finished draft of the book available for purchase here, but did not announce it on the blog, because I was not happy with the quality. Since then, I have been working on improving a number of sections. Over the next few days, I will post the texts of those sections as separate blog commentaries and then publish an updated version of the book around April 15th, incorporating all those sections. I am quite satisfied with the changes and expect the finalized book to be useful for newcomers.

I am planning to set the price at $20 after the next release, but if you grab the book now at the leanpub site, you will be able to get it for $15. As I mentioned earlier, once you make the initial purchase from leanpub, you will always be able to access the latest version from their site at no extra cost. They give pdf, mobi and epub formats.

If you would like to see a copy of the released version, please feel free to email me at [email protected] and I will send you a pdf file.


Earlier posts –

An Easy-to-follow Introductory Book on NGS Assembly Algorithms
An Update on the Introductory Book on Genome Assembly

rnaQUAST: Quality Assessment Tool for Transcriptome Assemblies

The Algorithmic Biology Lab in St. Petersburg, of SPAdes fame, has developed a new tool for evaluating the quality of transcriptome assemblies using a reference genome and annotation (h/t: anton). Stay tuned for more information.

3 Options

3.1 Input data options

To run rnaQUAST one needs to provide either FASTA files with transcripts (recommended), or align transcripts to the reference genome manually and provide the resulting PSL files. rnaQUAST also requires a reference genome and optionally an annotation.

-r , --reference
Single file with reference genome containing all chromosomes/scaffolds in FASTA format.

-gtf , --annotation
File with annotation in GTF/GFF format.

-c , --transcripts
File(s) with transcripts in FASTA format separated by space.

-psl , --alignment
File(s) with transcripts alignments in PSL format separated by space.

3.2 Basic options

-o , --output_dir
Directory to store all results. Default is rnaQUAST_results/results_.

Run rnaQUAST on the test data from the test_data folder, output directory is rnaOUAST_test_output.

-d, --debug
Report detailed information, typically used only when detecting problems.

-h, --help
Show help message and exit.

3.3 Advanced options

-t , --threads
Maximum number of threads. Default is the number of CPU cores (detected automatically).

-l , --labels
Names of assemblies that will be used in the reports separated by space.

-ss, --strand_specific
Set if transcripts were assembled using strand specific RNA-Seq data in order to benefit from knowing whether the transcript originated from the + or – strand.

Minimal alignment size to be used, default value is 50.

Do not draw plots (makes rnaQUAST run a bit faster).
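
For readers who script their pipelines, the options listed above could be assembled into an argument list along these lines. This is only a sketch: the executable name `rnaQUAST.py`, the helper function and every file name are assumptions, and only long options that appear in the manual excerpt are used. Check the flags against the installed version before relying on it.

```python
def build_rnaquast_command(reference, transcripts, annotation=None,
                           output_dir=None, threads=None, labels=None,
                           strand_specific=False,
                           executable="rnaQUAST.py"):
    """Build an argument list for an rnaQUAST run (hypothetical helper).

    `executable` is an assumed script name; pass the real path if it differs.
    The returned list is suitable for subprocess.run(cmd, check=True).
    """
    cmd = [executable, "--reference", reference, "--transcripts", *transcripts]
    if annotation is not None:
        cmd += ["--annotation", annotation]
    if output_dir is not None:
        cmd += ["--output_dir", output_dir]
    if threads is not None:
        cmd += ["--threads", str(threads)]
    if labels:
        # One label per assembly, in the same order as the transcript files.
        cmd += ["--labels", *labels]
    if strand_specific:
        cmd.append("--strand_specific")
    return cmd
```

Building a list instead of a shell string avoids quoting problems with file names that contain spaces, and the list can be printed for inspection before anything is executed.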

FASTG Viewer Bandage

This FASTG viewer appears very impressive (h/t: Anton). Have you used it?

De novo assembly graphs contain assembled contigs (nodes) but also the connections between those contigs (edges), which are not easily accessible to users. Bandage is a program for visualising assembly graphs using graph layout algorithms. By displaying connections between contigs, Bandage opens up new possibilities for analysing de novo assemblies that are not possible by looking at contigs alone.


    A layout algorithm is used to automatically position graph nodes.
    Manually reposition and reshape nodes.
    Zoom, pan and rotate the view using either mouse or keyboard controls.
    View the entire assembly graph or only a region of interest.
    Copy node sequences to the clipboard or save them to file.
    Nodes can be coloured using built-in colour schemes or user-defined colours.
    Nodes can be labelled using node number, length, coverage or a user-defined label.
    Find nodes quickly in a large graph using node numbers.
    Specify the thickness of nodes and allow thickness to reflect the node’s coverage.
    Define the relationship between the length of a node and the length of its sequence.
    Two possible styles for handling reverse complement nodes:
        Single: nodes and their reverse complements are drawn as one object with no direction.
        Double: nodes are drawn using arrow heads to indicate direction and reverse complement nodes are drawn separately with arrow heads pointing the opposite direction.
    Integrated BLAST search allows for highlighting specific sequences in an assembly graph.
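
To make the node, edge and reverse-complement vocabulary concrete, here is a minimal sketch of an assembly graph held in memory in the "double" style, where each contig exists as a forward node and a reverse-complement node. It says nothing about how Bandage stores graphs internally: the class, the "1+" / "1-" naming scheme and the rule that every edge implies its mirrored edge are assumptions chosen for illustration.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the reverse complement of an upper-case DNA string."""
    return seq.translate(COMPLEMENT)[::-1]


class AssemblyGraph:
    """Toy assembly graph with separate forward (+) and reverse (-) nodes."""

    def __init__(self):
        self.sequences = {}  # node name such as "1+" -> contig sequence
        self.edges = {}      # node name -> set of successor node names

    def add_contig(self, name, seq):
        """Register a contig as a forward node and its reverse complement."""
        self.sequences[name + "+"] = seq
        self.sequences[name + "-"] = reverse_complement(seq)
        self.edges.setdefault(name + "+", set())
        self.edges.setdefault(name + "-", set())

    @staticmethod
    def flip(node):
        """Return the node for the opposite strand, e.g. "1+" -> "1-"."""
        return node[:-1] + ("-" if node.endswith("+") else "+")

    def add_edge(self, src, dst):
        """Connect src to dst and add the mirrored edge on the other strand."""
        self.edges[src].add(dst)
        self.edges[self.flip(dst)].add(self.flip(src))

    def successors(self, node):
        """Return the nodes reachable in one step, in sorted order."""
        return sorted(self.edges[node])
```

A viewer drawing in the "single" style would collapse each "+" / "-" pair into one undirected object, while the "double" style draws both nodes with arrow heads, which is the distinction the feature list describes.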