Reports from Asia – ‘Calcutta – a New Beginning?’


While in India, I visited the National Institute of Biomedical Genomics (NIBMG) in Kalyani near Calcutta (marked as Kolkata in the above map).

The National Institute of Biomedical Genomics (NIBMG) has been established as an autonomous institution by the Government of India, under the aegis of the Department of Biotechnology. This is the first institution in India explicitly devoted to research, training, translation & service and capacity-building in biomedical genomics.

It is located in Kalyani, West Bengal, India, about 50 km. from Kolkata. Connected to Kolkata by highways, expressways and railways, it takes about one and one-half hours to reach Kalyani from Kolkata.

The Institute is currently functioning from an interim facility of about 120,000 sq. ft. of floor space, constructed on the 2nd floor of a local hospital. Laboratories, equipment (including flow-cell sequencers, whole-genome genotyping and gene expression platforms, multiplex suspension array platform, high-end computing platforms, etc.), bio-banking facility, office space and class rooms have been established.



The institute was recently established by Professor Partha P. Majumder, a reputed scholar from the Indian Statistical Institute (ISI) in Calcutta. ISI is well respected around the world. In the context of population genetics, this is where J. B. S. Haldane spent his later life.


In 1956, Haldane left his post at University College London, and moved to Calcutta, where he joined the Indian Statistical Institute (ISI).[40] Haldane’s move to India was influenced by a number of factors. Officially he stated that his chief political reason was in response to the Suez Crisis. He wrote: “Finally, I am going to India because I consider that recent acts of the British Government have been violations of international law.” His interest in India was also because of his interest in biological research as he believed that the warm climate would do him good, and that India offered him freedom and shared his socialist dreams.[41] One immediate factor was related to a police case involving his wife Helen, who was arrested on charges of misbehaviour due to excessive drinking and refusal to pay fine. The university sacked her and Haldane followed suit. On his lighter side his “reason for settling in India was to avoid wearing socks,” and he concluded, “Sixty years in socks is enough.”[42] This could be partly true because Haldane always dressed up in Indian clothes, and was often mistaken to be a Hindu priest or guru.[2]

At the ISI, he headed the biometry unit and spent time researching a range of topics and guiding other researchers around him. He was keenly interested in inexpensive research and he wrote to Julian Huxley about his observations on Vanellus malabaricus, the Yellow-wattled Lapwing, boasting that he observed them from the comfort of his backyard. Haldane took an interest in anthropology, human genetics and botany. He advocated the use of Vigna sinensis (cowpea) as a model for studying plant genetics. He took an interest in the pollination of the common weed Lantana camara. The quantitative study of biology was his focus and he lamented that Indian universities forced those who took up biology to give up on an education in mathematics.[43] Haldane took an interest in the study of floral symmetry. His wife, Helen Spurway, conducted studies on wild silk moths.[41] In January 1961 they befriended the young Canadian lepidopterist Gary Botting, who initially visited the Indian Statistical Institute to share the results of his experiments hybridising silk moths of the genus Antheraea. Uncomfortable with Haldane’s “communist” sympathies, the United States cultural attache, Duncan Emery, summarily cancelled Gary Botting’s attendance at a high-profile banquet to which the Haldanes had invited him to meet biologists from all over India. Haldane protested this “insult” by going on a much-publicized hunger strike.[44][45] When the director of the I.S.I., P. C. Mahalanobis, confronted Haldane about both the hunger strike and the unbudgeted banquet, Haldane resigned his post (in February 1961) and moved to a newly established biometry unit in Odisha.[41]

Impressed by his stay in Calcutta and Bengal, Haldane later accepted Indian citizenship. That was in the early 1960s, when Bengal had a very impressive record of scholarship behind it. Modern epidemiology was born there.


Sir Ronald Ross, KCB, FRS[1][2] (13 May 1857 – 16 September 1932), was an Indian-born British medical doctor who received the Nobel Prize for Physiology or Medicine in 1902 for his work on malaria, becoming the first British Nobel laureate, and the first born outside of Europe. His discovery of the malarial parasite in the gastrointestinal tract of mosquito led to the realisation that malaria was transmitted by mosquitoes, and laid the foundation for combating the disease. He was quite a polymath, writing a number of poems, published several novels, and composed songs. He was also an amateur artist and natural mathematician. He worked in the Indian Medical Service for 25 years. It was during his service that he made the groundbreaking medical discovery. After resigning from his service in India, he joined the faculty of Liverpool School of Tropical Medicine, and continued as Professor and Chair of Tropical Medicine of the institute for 10 years.

In physics, S. N. Bose worked out the statistics obeyed by the particles now called bosons.


Satyendra Nath Bose FRS[1] was an Indian physicist specialising in mathematical physics. He was born in Calcutta. He is best known for his work on quantum mechanics in the early 1920s, providing the foundation for Bose–Einstein statistics and the theory of the Bose–Einstein condensate.

Satyendra Nath Bose, along with Saha, presented several papers in theoretical physics and pure mathematics from 1918 onwards. In 1924, while working as a Reader (Professor without a chair) at the Physics Department of the University of Dhaka, Bose wrote a paper deriving Planck’s quantum radiation law without any reference to classical physics by using a novel way of counting states with identical particles. This paper was seminal in creating the very important field of quantum statistics. Though not accepted at once for publication, he sent the article directly to Albert Einstein in Germany. Einstein, recognising the importance of the paper, translated it into German himself and submitted it on Bose’s behalf to the prestigious Zeitschrift für Physik. As a result of this recognition, Bose was able to work for two years in European X-ray and crystallography laboratories, during which he worked with Louis de Broglie, Marie Curie, and Einstein.[4][12][13][14]

C. V. Raman worked in Calcutta and discovered the Raman effect there.


In 1917, Raman resigned from his government service after he was appointed the first Palit Professor of Physics at the University of Calcutta. At the same time, he continued doing research at the Indian Association for the Cultivation of Science (IACS), Calcutta, where he became the Honorary Secretary. Raman used to refer to this period as the golden era of his career. Many students gathered around him at the IACS and the University of Calcutta.

On 28 February 1928, Raman led experiments at the IACS with collaborators, including K. S. Krishnan, on the scattering of light, when he discovered what now is called the Raman effect.[8] A detailed account of this period is reported in the biography by G. Venkatraman.[9] It was instantly clear that this discovery was of huge value. It gave further proof of the quantum nature of light.

All that changed after the partition of Bengal in 1947, and Calcutta has been in slow decline since the 1960s, when Bengalis followed the Soviet side in the Cold War. By now it has lost almost all of its cultural legacy, and modern Calcutta is unrecognizable to anyone who spent time there in the 1970s or 80s.

It is quite remarkable that the Statistical Institute maintained its quality during the period of Bengal’s decline. Therefore, I am hopeful that the new institute in Kalyani will combine those theoretical skills in population genetics with the availability of high-throughput sequencing instruments to contribute positively to science. You can find the list of faculty members and their research projects here. I am thankful to Dr. Priyadarshi Basu for inviting me there.



In the larger context, although I am very pessimistic about India, Calcutta may have hit the bottom. Everyone appears to be negative about its future, and every article you read in the mainstream press describes how the place failed.

Kanika Datta: ‘West Bengal still on decline’

Decline of Bengal, death of the bhadralok

Falling total fertility rate in Kolkata sets alarm bells ringing

The Empire’s second city is now a second-grade city

The decline and fall of Kolkata

Needless to say, the consensus of intellectuals was unusually positive about Calcutta in the late 1950s, just before the decline started.

Data Science MOOC from Johns Hopkins (starring Steven Salzberg)

The Simply Statistics blog posted a teaser trailer for a new MOOC that readers may find interesting.

We have been hard at work in the studio putting together our next specialization to launch on Coursera. It will be called the “Genomic Data Science Specialization” and includes a spectacular line up of instructors: Steven Salzberg, Ela Pertea, James Taylor, Liliana Florea, Kasper Hansen, and me. The specialization will cover command line tools, statistics, Galaxy, Bioconductor, and Python. There will be a capstone course at the end of the sequence featuring an in-depth genomic analysis. If you are a grad student, postdoc, or principal investigator in a group that does genomics this specialization is for you. If you are a person looking to transition into one of the hottest areas of research with the new precision medicine initiative this is for you. Get pumped and share the teaser-trailer with your friends!

A perceptual hash function to store and retrieve large scale DNA sequences


This paper proposes a novel approach for storing and retrieving massive DNA sequences. The method is based on a perceptual hash function, commonly used to determine the similarity between digital images, that we adapted for DNA sequences. Perceptual hash function presented here is based on a Discrete Cosine Transform Sign Only (DCT-SO). Each nucleotide is encoded as a fixed gray level intensity pixel and the hash is calculated from its significant frequency characteristics. This results to a drastic data reduction between the sequence and the perceptual hash. Unlike cryptographic hash functions, perceptual hashes are not affected by “avalanche effect” and thus can be compared. The similarity distance between two hashes is estimated with the Hamming Distance, which is used to retrieve DNA sequences. Experiments that we conducted show that our approach is relevant for storing massive DNA sequences, and retrieving them.
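To make the idea in the abstract more concrete, here is a minimal Python sketch of a sign-only DCT hash compared by Hamming distance. It is not the authors' implementation: the gray levels assigned to each nucleotide, the use of a one-dimensional DCT-II, the choice of 16 low-frequency coefficients and the function names are all assumptions made for illustration.

```python
import math

# Hypothetical gray levels per nucleotide; the paper's actual values are not given here.
GRAY = {"A": 0, "C": 85, "G": 170, "T": 255}


def dct2(signal):
    """Naive O(n^2) one-dimensional DCT-II of a list of numbers."""
    n = len(signal)
    return [
        sum(x * math.cos(math.pi * (i + 0.5) * k / n) for i, x in enumerate(signal))
        for k in range(n)
    ]


def perceptual_hash(seq, bits=16):
    """Keep only the sign of the first `bits` non-DC coefficients (assumed choice)."""
    pixels = [GRAY[base] for base in seq.upper()]
    coeffs = dct2(pixels)[1:bits + 1]  # skip the DC term at index 0
    return [1 if c >= 0 else 0 for c in coeffs]


def hamming(h1, h2):
    """Number of positions at which two equal-length hashes differ."""
    if len(h1) != len(h2):
        raise ValueError("hashes must have the same length")
    return sum(a != b for a, b in zip(h1, h2))


if __name__ == "__main__":
    original = "ACGTACGTTGCAACGTGGCATTAGCCGATA"
    mutated = "ACGTACGTTGCAACGTGGCATTAGCCGATT"  # last base changed
    print(hamming(perceptual_hash(original), perceptual_hash(original)))
    print(hamming(perceptual_hash(original), perceptual_hash(mutated)))
```

The hash has a fixed number of bits whatever the sequence length, which is where the data reduction comes from. Because similar inputs tend to give similar sign patterns, a small Hamming distance can be used as a retrieval criterion, unlike with a cryptographic hash.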

MEDUSA: a Multi-draft Based Scaffolder


Motivation: Completing the genome sequence of an organism is an important task in comparative, functional and structural genomics. However, this remains a challenging issue from both a computational and an experimental viewpoint. Genome scaffolding (i.e. the process of ordering and orientating contigs) of de novo assemblies usually represents the first step in most genome finishing pipelines.

Results: In this paper, we present MEDUSA (Multi-Draft based Scaffolder), an algorithm for genome scaffolding. MEDUSA exploits information obtained from a set of (draft or closed) genomes from related organisms to determine the correct order and orientation of the contigs. MEDUSA formalises the scaffolding problem by means of a combinatorial optimisation formulation on graphs and implements an efficient constant factor approximation algorithm to solve it. In contrast to currently used scaffolders, it does not require either prior knowledge on the microorganisms dataset under analysis (e.g. their phylogenetic relationships) or the availability of paired end read libraries. This makes usability and running time two additional important features of our method. Moreover, benchmarks and tests on real bacterial datasets showed that MEDUSA is highly accurate and, in most cases, outperforms traditional scaffolders. The possibility to use MEDUSA on eukaryotic datasets has also been evaluated, leading to interesting results.

Availability: MEDUSA web server: A stand-alone version of the software can be downloaded from All results presented in this work have been obtained with MEDUSA v. 1.3.


The Current Status of the Introductory Book on Genome Assembly


Dear readers,

In mid-February, I made a somewhat finished draft of the book available for purchase here, but did not announce it on the blog because I was not happy with the quality. Since then, I have been working on improving a number of sections. Over the next few days, I will post the texts of those sections as separate blog commentaries and then publish an updated version of the book around April 15th incorporating all those sections. I am quite satisfied with the changes and expect the finalized book to be useful for newcomers.

I am planning to set the price at $20 after the next release, but if you grab the book now at the leanpub site, you will be able to get it for $15. As I mentioned earlier, once you make the initial purchase from leanpub, you will always be able to access the latest version from their site at no extra cost. They give pdf, mobi and epub formats.

If you would like to see a copy of the released version, please feel free to email me at [email protected] and I will send you a pdf file.


Earlier posts –

An Easy-to-follow Introductory Book on NGS Assembly Algorithms
An Update on the Introductory Book on Genome Assembly

rnaQUAST: Quality Assessment Tool for Transcriptome Assemblies

The Algorithmic Biology Lab in St. Petersburg, of SPAdes fame, has developed a new tool for evaluating the quality of transcriptome assemblies using a reference genome and annotation (h/t: anton). Stay tuned for more information.

3 Options

3.1 Input data options

To run rnaQUAST one needs to provide either FASTA files with transcripts (recommended), or align transcripts to the reference genome manually and provide the resulting PSL files. rnaQUAST also requires a reference genome and, optionally, an annotation.

-r , --reference
Single file with reference genome containing all chromosomes/scaffolds in FASTA format.

-gtf , --annotation
File with annotation in GTF/GFF format.

-c , --transcripts
File(s) with transcripts in FASTA format separated by space.

-psl , --alignment
File(s) with transcripts alignments in PSL format separated by space.

3.2 Basic options

-o , --output_dir
Directory to store all results. Default is rnaQUAST_results/results_.

Run rnaQUAST on the test data from the test_data folder, output directory is rnaQUAST_test_output.

-d, --debug
Report detailed information, typically used only when detecting problems.

-h, --help
Show help message and exit.

3.3 Advanced options

-t , --threads
Maximum number of threads. Default is the number of CPU cores (detected automatically).

-l , --labels
Names of assemblies that will be used in the reports separated by space.

-ss, --strand_specific
Set if transcripts were assembled using strand specific RNA-Seq data in order to benefit from knowing whether the transcript originated from the + or – strand.

Minimal alignment size to be used, default value is 50.

Do not draw plots (makes rnaQUAST run a bit faster).
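As an illustration of how the options quoted above fit together, the following Python sketch assembles a command line from the short flags listed in the manual excerpt (-r, -c, -gtf, -o, -t, -ss). The executable name `rnaQUAST.py`, the helper function and all file names are assumptions made for this example. The sketch only builds and prints the command and does not run anything.

```python
import shlex


def build_rnaquast_command(reference, transcripts, annotation=None,
                           output_dir=None, threads=None,
                           strand_specific=False,
                           executable="rnaQUAST.py"):  # executable name is assumed
    """Assemble an argument list using only the short flags quoted above."""
    cmd = [executable, "-r", reference, "-c", *transcripts]
    if annotation is not None:
        cmd += ["-gtf", annotation]
    if output_dir is not None:
        cmd += ["-o", output_dir]
    if threads is not None:
        cmd += ["-t", str(threads)]
    if strand_specific:
        cmd.append("-ss")
    return cmd


if __name__ == "__main__":
    # All file names below are hypothetical placeholders.
    cmd = build_rnaquast_command(
        reference="genome.fa",
        transcripts=["assembly_a.fa", "assembly_b.fa"],
        annotation="genes.gtf",
        output_dir="rnaquast_out",
        threads=4,
        strand_specific=True,
    )
    print(shlex.join(cmd))
```

Keeping the command as a list rather than a single string means it could later be passed to `subprocess.run` without shell-quoting problems, once the actual executable name and paths have been checked against the tool's own documentation.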

FASTG Viewer Bandage

This FASTG viewer appears very impressive (h/t: Anton). Have you used it?

De novo assembly graphs contain assembled contigs (nodes) but also the connections between those contigs (edges), which are not easily accessible to users. Bandage is a program for visualising assembly graphs using graph layout algorithms. By displaying connections between contigs, Bandage opens up new possibilities for analysing de novo assemblies that are not possible by looking at contigs alone.


    A layout algorithm is used to automatically position graph nodes.
    Manually reposition and reshape nodes.
    Zoom, pan and rotate the view using either mouse or keyboard controls.
    View the entire assembly graph or only a region of interest.
    Copy node sequences to the clipboard or save them to file.
    Nodes can be coloured using built-in colour schemes or user-defined colours.
    Nodes can be labelled using node number, length, coverage or a user-defined label.
    Find nodes quickly in a large graph using node numbers.
    Specify the thickness of nodes and allow thickness to reflect the node’s coverage.
    Define the relationship between the length of a node and the length of its sequence.
    Two possible styles for handling reverse complement nodes:
        Single: nodes and their reverse complements are drawn as one object with no direction.
        Double: nodes are drawn using arrow heads to indicate direction and reverse complement nodes are drawn separately with arrow heads pointing the opposite direction.
    Integrated BLAST search allows for highlighting specific sequences in an assembly graph.

Ultrafast SNP Analysis using the Burrows-Wheeler Transform of Short-read Data

This recent paper appears quite interesting (h/t: Ruibang). It starts with the BWT of a short-read library (constructed, for example, with BCR) and skips the alignment step altogether, going straight to SNP determination.

Motivation: Sequence-variation analysis is conventionally performed on mapping results that are highly redundant and occasionally contain undesirable heuristic biases. A straightforward approach to SNP analysis, using the Burrows-Wheeler transform (BWT) of short-read data, is proposed.

Results: The BWT makes it possible to simultaneously process collections of read fragments of the same sequences; accordingly, SNPs were found from the BWT much faster than from the mapping results. It took only a few minutes to find SNPs from the BWT (with supplementary data, FDC) using a desktop workstation in the case of human exome or transcriptome sequencing data and twenty minutes using a dual-CPU server in the case of human genome sequencing data. The SNPs found with the proposed method almost agreed with those found by a time-consuming state-of-the-art tool, except for the cases in which the use of fragments of reads led to sensitivity loss or sequencing depth was not sufficient. These exceptions were predictable in advance on the basis of minimum length for uniqueness (MLU) and fragment depth of coverage (FDC) defined on the reference genome. Moreover, BWT and FDC were computed in less time than it took to get the mapping results, provided that the data was large enough.

Availability: A proof-of-concept binary code for a Linux platform is available on request to the corresponding author.
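The intuition behind working directly on the BWT of reads can be shown with a toy Python sketch. This is not the authors' algorithm (it ignores MLU, FDC, sequencing errors and reverse complements). It only illustrates that sorting all read suffixes brings fragments of the same sequence together, so the characters preceding a shared context can be inspected without any alignment. The function names and the context length `k` are assumptions made for the example.

```python
from collections import defaultdict


def bwt_of_reads(reads):
    """Naive BWT of a read collection: sort every suffix of every read and
    pair it with the character that precedes it ("$" before position 0)."""
    entries = []
    for read_index, read in enumerate(reads):
        text = read + "$"
        for i in range(len(text)):
            # text[i - 1] wraps around to the sentinel "$" when i == 0.
            entries.append((text[i:], read_index, text[i - 1]))
    entries.sort()
    return [(suffix, prev) for suffix, _, prev in entries]


def candidate_snps(reads, k=4):
    """Report k-mer contexts preceded by more than one distinct base."""
    preceding = defaultdict(set)
    for suffix, prev in bwt_of_reads(reads):
        context = suffix[:k]
        if len(context) == k and "$" not in context and prev != "$":
            preceding[context].add(prev)
    return {ctx: sorted(bases) for ctx, bases in preceding.items() if len(bases) > 1}


if __name__ == "__main__":
    # Three toy reads; the third differs from the others at one position.
    reads = ["ACGTTGCA", "ACGTTGCA", "ACGATGCA"]
    print(candidate_snps(reads))  # {'TGCA': ['A', 'T']}
```

In a real genome, repeated k-mers would also produce mixed preceding characters, which is presumably why the abstract relies on a minimum length for uniqueness. A practical implementation would also build the BWT with a dedicated algorithm rather than by sorting explicit suffixes.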


The authors are not new to using BWT and suffix arrays for analyzing genomic data. Here are a few of their previous papers –

2009 –

Computation of Rank and Select Functions on Hierarchical Binary String and Its Application to Genome Mapping Problems for Short-Read DNA Sequences

2009 –

Localized suffix array and its application to genome mapping problems for paired-end short reads

2011 –

Seed-set construction by equi-entropy partitioning for efficient and sensitive short-read mapping

Reports from Asia – ‘Bangkok – a Wild Ride’


Bangkok – a wild ride

The crazy rain had barely stopped when I got out of the airport express train at Phaya Thai station in Bangkok. This was the last day of my two-week trip to various Asian countries to meet with NGS researchers. The day was reserved for buying small gifts for my kids, but the rain held me inside the hotel until mid-day.

I was trying to find my way through the flooded sidewalks when a motorcycle taxi driver spotted me.

“Hey, where do you want to go?”
Me: “Any shopping mall nearby?”
“What do you like to buy?”
Me: “Clothes and small things for kids.”
“Pixie, you need to go to Pixie. 40 Bahts – I take you.”

For 40 baht (~$1), I got not only a ride but a truly adventurous wild ride that no theme park can match. Traffic moved slowly in all lanes of the highway, but the motorcycle taxi raced ahead. It made its way through by swerving in and out of lanes and taking advantage of everything from the sidewalk to the lanes for oncoming traffic. A big bus coming toward us nearly kissed my bag just as the taxi managed to get out of the oncoming lane.

“Hey, do you need a lady for tonight?” – he asked rather nonchalantly after this feat. “I am preparing my mind for 72 virgins” – I had to say. Thankfully, ‘Pixie’ mall (=Big C) arrived by then and I got out alive.

Although I did not plan to meet any researchers in Bangkok itself, I passed through the city at least three times during the trip. Bangkok is fast becoming a major international hub in Asia, playing the role Singapore played in older days. The airport is well connected to all of Asia through many discount airlines. As an added feature, you can venture into the city to buy cheap clothes and electronic goods. Moreover, a growing Thailand is pulling the hinterlands in Laos, Cambodia and Myanmar forward. That part of the world is ready for dramatic transformations in the years ahead.

[coming up]

Hong Kong – will it become the bioinformatics capital?


Singapore – fast forward of Lee Kuan Yew’s vision

[visits to A*Star Research Institute and National University of Singapore, and general comments about Singapore]

Calcutta – a new beginning?

While in India, I visited the National Institute of Biomedical Genomics (NIBMG) in Kalyani near Calcutta (marked as Kolkata in the above map).

Continue reading here.

Seoul – what will the unification bring?

[comments on transit through Seoul and upcoming Korean unification]

Back to the land of exceptionalism

[the day started with exceptionally arrogant immigration officers, as usual]

Outrageous Prediction 2015 – USA Will Start to See Exodus of Russian Scientists

Over the years, our blog has made a number of forecasts that seemed outrageous when we made them but turned out to be correct over time. Please ponder today’s and feel free to comment. I am leaving for the airport and will add the full post after finding good wifi somewhere.



Who is your ruler?

One of the first things newcomers to the USA notice is that the state capitals are often not located in the largest and most accessible cities within the states. The capital of New York state is Albany, not New York City. The capital of Illinois is little-known Springfield, not Chicago, and the capital of California is neither San Francisco nor Los Angeles, but Sacramento. Boston, Massachusetts is one of the few exceptions. The reason usually given is that the largest cities are already burdened with trade, commerce, manufacturing and transportation. Therefore, it is prudent to decentralize and avoid adding government activities on top of the existing load.

That explanation presents one with another dilemma – why did no one else think of it before? In the two rare examples from the last thousand years of Indian history, moving the government away from the largest city ended in disastrous failure. One of them got the ruler labeled as a crazy guy forever, and the other one marked the beginning of the end of the British Raj in India. Rulers and their associates love to be the center of attention and stay right in the middle of the largest cities.

It took me a decade to figure out the real answer. As it turns out, the rulers of US states and the nation are indeed located right at the centers of the biggest cities, but we do not (or rather did not) know them as such. If you go to the centers of the big cities in the USA, you see large banks. The USA is a country run by banks and bankers, and politics is subordinate to them. That is not a unique insight any more in 2015, and many have come to the same conclusion since the financial crisis of 2008.


Peculiarities of banking-led empires

Within the US, the northern states followed the above model since the early days of the country, and the South was fully brought in after the Civil War. Post-WWII, the US has been trying to take over most other countries by expanding this banking empire. The process involves –

(i) control of central bank through organizations like IMF

(ii) control of money and credit through central bank

(iii) control of media

(iv) support of ‘democracy’ without losing control of (i-iii).

Once the banks are controlled, the remaining industries can be easily taken over through a process of credit inflation, deflation and selective violence.

One can write an entire book on the above process, but I do not need to because many such books are already out there. John Perkins’ Confessions of an Economic Hit Man is fairly good.

According to his book, Perkins’ function was to convince the political and financial leadership of underdeveloped countries to accept enormous development loans from institutions like the World Bank and USAID. Saddled with debts they could not hope to pay, those countries were forced to acquiesce to political pressure from the United States on a variety of issues. Perkins argues in his book that developing nations were effectively neutralized politically, had their wealth gaps driven wider and economies crippled in the long run. In this capacity Perkins recounts his meetings with some prominent individuals, including Graham Greene and Omar Torrijos. Perkins describes the role of an economic hit man as follows:

Economic hit men (EHMs) are highly paid professionals who cheat countries around the globe out of trillions of dollars. They funnel money from the World Bank, the U.S. Agency for International Development (USAID), and other foreign “aid” organizations into the coffers of huge corporations and the pockets of a few wealthy families who control the planet’s natural resources. Their tools included fraudulent financial reports, rigged elections, payoffs, extortion, sex, and murder. They play a game as old as empire, but one that has taken on new and terrifying dimensions during this time of globalization.


Violence in a banking-led empire

Banking-led empires cyclically go through violent phases, coinciding with the terminal phases of debt deflation. They are essentially a part of the debt-collection process. The way it works is that the bankers, through their control of politicians, pass the debt on to people. The people eventually revolt and are brought under control by use of force. A country can also externalize violence and WW II is a classic example for the banks of New York.


Russia in the above context

The above introduction is necessary to provide context for the main topic of this blog post. The banking-led empire is going through another cycle of debt deflation, and everyone realizes that it will end in a war. The question is where.

The US learned from the experience of WW II that it became strong when the Europeans fought each other. So, the US has been trying hard to start a war between European countries and Russia. The Russian government, on the other hand, has figured out the game plan of the banking-led empire and is trying to neutralize it. One such action, which will surprise everyone when it is announced in 2-3 months, is the unification of Korea.

Those failures in starting a large-scale war in Europe are frustrating Americans immensely, and they are increasing the amount of propaganda. As an example of extreme propaganda, in the video attached at the top of the post, a Fox News analyst announced that the US government should start killing Russians. They realize that if they cannot externalize the deflationary war to Europe, it will come to the US. Either way, Russian scientists and engineers who emigrated to the US after 1990 will increasingly feel pressured to move to Russia.


Russia as a country

At present, Russia is one of the strongest countries culturally, socially and scientifically. Therefore, moving to Russia for Russians facing political humiliation in the USA will not be the same as Somali Muslims moving to Somalia after being harassed following 9/11. The population growth in Russia will soon exceed that of the US, and a growing country always offers more opportunities.


To summarize, the combination of the above dynamics – (i) deflation and declining quality of life in the US, (ii) hostility toward Russians due to US elites’ desire to externalize war and (iii) improving quality of life in Russia – will result in an exodus of Russians from the US.