According to NIH Director Francis Collins, NIH is so short of money that it diverted funds away from near-certain Ebola vaccine development to support other projects. Three months ago, he claimed that NIH could have had an Ebola vaccine by now if it had had a little more money.
“NIH has been working on Ebola vaccines since 2001. It’s not like we suddenly woke up and thought, ‘Oh my gosh, we should have something ready here,'” Collins told The Huffington Post on Friday. “Frankly, if we had not gone through our 10-year slide in research support, we probably would have had a vaccine in time for this that would’ve gone through clinical trials and would have been ready.”
It’s not just the production of a vaccine that has been hampered by money shortfalls. Collins also said that some therapeutics to fight Ebola “were on a slower track than would’ve been ideal, or that would have happened if we had been on a stable research support trajectory.”
This month, the same agency announced a new mega-project to sequence a million human genomes in the name of ‘precision medicine’, which Professor Ken Weiss argues is utterly wasteful in three informative blog posts. Dr. Weiss has been making similar arguments since the late 1990s (see “How many diseases does it take to map a gene with SNPs?”) and has been right so far. The promises Francis Collins made before the human genome sequencing project remain mostly unfulfilled, despite no shortage of money or technology. The only thing to come out of the personalized-genomics hype is the new name ‘precision medicine’ and more hype.
In light of all this, is it not worth having a serious discussion before throwing billions of dollars into another mega-boondoggle? Has anyone addressed the scientific objections raised by Dr. Weiss? Have we determined why the claims Francis Collins has been making since the mid-90s remain pipe dreams?
Parts of Dr. Weiss’s blog posts are reproduced below.
Your money at work…er, waste: the million genomes project
Bulletin from the Boondoggle Department
In desperate need for a huge new mega-project to lock up even more NIH funds before the Republicans (or other research projects that are actually focused on a real problem) take them away, or before individual investigators who actually have some scientific ideas to test, we read that Francis Collins has apparently persuaded someone who’s not paying attention to fund the genome sequencing of a million people! Well, why not? First we had the (one) human genome project. Then after a couple of iterations, the 1000 genomes project, then the hundred thousand genomes ‘project’. So, what next? Can’t just go up by dribs and drabs, can we? This is America, after all! So let’s open the bank for a cool million. Dr Collins has, apparently, never met a genome he didn’t like or want to peer into. It’s not lascivious exactly, but the emotion that is felt must be somewhat similar.
We now know enough to know just what we’re (not) getting from all of this sequencing, but what we are getting (or at least some people are getting) is a lot of funds sequestered for a few in-groups or, more dispassionately perhaps, for a belief system, the belief that constitutive genome sequence is the way to conquer every disease known to mankind. Why, this is better than what you get by going to communion every week, because it’ll make you immortal so you don’t have to worry that perhaps there isn’t any heaven to go to after all.
What’s ‘precise’ about ‘precision’ medicine (besides desperate spin)?
The million genomes project
In the same breath, we’re hearing that we’ll be funding a million genomes project. The implication is that if we have a million whole genome sequences, we will have ‘precision medicine’ (personalized, too!). But is that a serious claim or is it a laugh?
A million is a large number, but if most variation in gene-based risk is due, as mountains of evidence shows, to countless very rare variants, many of them essentially new, and hordes of them perhaps per person, then even a million genome sequences will not be nearly enough to yield much of what is being promised by the term ‘precision’! We’d need to sequence everybody (I’m sure Dr Collins has that in mind as the next Major Slogan, and I know other countries are talking that way).
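Dr. Weiss’s point about rare variants can be made concrete with a back-of-the-envelope calculation (a toy binomial sampling model with made-up numbers, not anything from his post):

```python
# Back-of-the-envelope: how often does a very rare variant even show up
# in a cohort, let alone appear often enough to estimate a risk effect?
from math import comb

def prob_seen_at_least(n_copies, cohort, freq):
    """P(at least n_copies of an allele with population frequency
    `freq` appear among 2 * cohort sampled chromosomes)."""
    chroms = 2 * cohort
    p_less = sum(comb(chroms, k) * freq**k * (1 - freq)**(chroms - k)
                 for k in range(n_copies))
    return 1.0 - p_less

cohort = 1_000_000        # the proposed million genomes
freq = 1e-5               # a variant carried by 1 in 100,000 chromosomes
expected_copies = 2 * cohort * freq
print(round(expected_copies))                 # -> 20
# Chance the variant is seen AT ALL in a 10,000-person study:
print(prob_seen_at_least(1, 10_000, freq))
```

Even with a million genomes, a variant at a frequency of 1 in 100,000 contributes only ~20 copies to the sample — nowhere near enough to estimate its effect on disease risk with anything deserving the name ‘precision’.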
Don’t be naive enough to take this for something other than what it really is: (1) a ploy to secure continued funding perpetrated on his Genome Dream, but in the absence of new ideas and the presence of promises any preacher would be proud of, and results that so far clearly belie it; and (2) a way to protect influential NIH clients with major projects that no longer really merit continued protection, but which will be included in this one (3) to guarantee congressional support from our representatives who really don’t know enough to see through it or who simply believe or just want cover for the idea that these sorts of thing (add Defense contracting and NASA mega-projects as other instances) are simply good for local business and sound good to campaign on.
Yes, Francis Collins is born-again with perhaps a simplistic one-cause worldview to go with that. He certainly knows what he’s doing when it comes to marketing based on genetic promises of salvation. This idea is going to be very good for a whole entrenched segment of the research business, because he’s clever enough to say that it will not just be one ‘project’ but is apparently going to have genome sequencing done on an olio of existing projects. Rationales for this sort of ‘project’ are that long-standing, or perhaps long-limping, projects will be salvaged because they can ‘inexpensively’ be added to this new effort. That’s justified because then we don’t have to collect all that valuable data over again.
But if you think about what we already know about genome sequences and their evolution, and about what’s been found with cruder data, from those very projects to be incorporated among others, a million genome sequences will not generate anything like what we usually understand the generic term ‘precision’ to mean. Cruder data? Yes, for example, the kinds of data we have on many of these ongoing studies, based on inheritance, on epidemiological risk assessment, or on other huge genomewide mapping has consistently shown that there is scant new serious information to be found by simply sequencing between mapping-marker sites. The argument that the significance level will rise when we test the actual site doesn’t mean the signal will be strong enough to change the general picture. That picture is that there simply are not major risk factors except, certainly, some rare strong ones hiding in the sequence leaf-litter of rare or functionless variants.
Of course, there will be exceptions, and they’ll be trumpeted to the news media from the mountain top. But they are exceptions, and finding them is not the same as a proper cost-benefit assessment of research priorities. If we have paid for so many mega-GWAS studies to learn something about genomic causation, then we should heed the lessons we ourselves have learned.
Secondly, the data collected or measures taken decades ago in these huge long-term studies are often no longer state of the art, and many people followed for decades are now pushing up daisies, and can’t be followed up.
Thirdly, is the fact that the epidemiological (e.g., lifestyle, environment…) data have clearly been shown largely to yield findings that get reversed by the next study down the pike. That’s the daily news that the latest study has now shown that all previous studies had it wrong: factor X isn’t a risk factor after all. Again, major single-factor causation is elusive already, so just pouring funds on detailed sequencing will mainly be finding reasons for existing programs to buy more gear to milk cows that are already drying up.
Fourth, many if not even most of the major traits whose importance has justified mega-epidemiological longterm follow up studies, have failed to find consistent risk factors to begin with. But for many of the traits, the risk (incidence) has risen faster than the typical response to artificial selection. In that case, if genomic causation were tractably simple, such strong ‘selection’ should reflect those few genes whose variants respond to the changed environmental circumstances. But these are the same traits (obesity, stature, diabetes, autism,…..) for which mapping shows that single, simple genetic causation does not obtain (and, again, that assumes that the environmental risk factors purportedly responsible are even identified, and the yes-no results just mentioned above shows otherwise).
Worse than this, what about the microbiome or the epigenome, that are supposedly so important? Genome sequencing, a convenient way to carry on just as before, simply cannot generally work miracles in those areas, because they require other kinds of data (not available from current sequencing samples nor, of course, from deceased subjects even if we had stored their blood samples).
Somatic mutation: does it cut both ways?
Beware, million genome project!
What has this got to do with the million genome project? An important fact is that SoMu’s are in body tissues but are not part of the constitutive (inherited) genome, as is routinely sampled from, say, a cheek swab or blood sample. The idea underlying the massive attempts at genomewide mapping of complex traits, and the new culpably wasteful ‘million genomes’ project by which NIH is about to fleece the public and ensure that even fewer researchers get grants because the money’s all been soaked up by DNA sequencing, Big Data induction labs, is that we’ll be able to predict disease precisely, from whole genome sequence, that is, from constitutive genome sequence of hordes of people. We discussed this yesterday, perhaps to excess. Increasing sample size, one might reason, will reduce measurement error and make estimates of causation and risk ‘precise’. That is in general a bogus self-promoting ploy, among other reasons because rare variants and measurement and sample errors or issues may not yield a cooperating signal-to-noise ratio.
So I think that the idea of wholesale, mindless genome sequencing will yield some results but far less than is promised and the main really predictable result, indeed precisely predictable result, is more waste thrown onto mega-labs, to keep them in business.
Anyway, we’re pretty consistent with our skepticism, nay, cynicism about such Big Data fads as mainly grabs in tight times for funding that’s too long-lasting or too big to kill, regardless of whether it’s generating anything really useful.
A few days back we mentioned the unfortunate experience of Adam Eyre-Walker, a well-respected evolutionary biologist, who was asked to provide bank statements to show that he was indeed poor enough to receive a PLoS fee waiver.
Poor People Need to Provide Bank Statements to Publish in PLoS
Thanks to a social media storm and Mike Eisen’s efforts, that PLoS policy seems to have been changed for good (or at least until the next social media storm, about Bill Gates trying to get a fee waiver at PLoS, reverses it). PLoS published the following letter on their webpage and explained that the request to Adam Eyre-Walker was an error.
PLOS Clarifies its Publication Fee Assistance Policy
PLOS would like to clarify the policy by which authors can apply for fee assistance in the form of a partial or full fee waiver. Authors who are unable to obtain financial support from institutional, library, government agencies or research funders to pay for publication are not expected to self-fund these costs. In short, PLOS does not expect authors to fund publication fees through their personal funds.
Based on a misinterpretation of the organization’s Publication Fee Assistance (PFA) policy, requests were made or implied for individual financial information from certain PFA applicants. This was done in error. We regret any confusion it may have caused for applicants and any other members of the community. The process for communicating with PFA applicants and the language used on relevant PLOS application forms have now been corrected.
PLOS is committed to ensuring that the availability of research funding is not a barrier to publishing scientific research. Our Global Participation Initiative covers the full cost of publishing for authors from low-income countries, and covers most of the cost for authors from middle-income countries. Our PFA program has always and continues to support those with demonstrated need who are unable to pay all or part of their publication fees.
We wrote about the Meraculous assembler over two years ago (see Genome Assembly – MERmaid and Meraculous); even then it was noteworthy for using a perfect hash data structure to store the graph. Reader J. Zola pointed out that the program has improved significantly since then.
You should also point out that in SC 2014 Aydin Buluc et al. published what is probably the most scalable parallel version of de Bruijn graph construction. The algorithm has been designed for and incorporated into Meraculous. More here: http://dl.acm.org/citation.cfm?id=2683642.
Readers can access the paper on scalable parallel dBG construction here. The performance improvement from days to seconds appears very impressive!
De novo whole genome assembly reconstructs genomic sequence from short, overlapping, and potentially erroneous fragments called reads. We study optimized parallelization of the most time-consuming phases of Meraculous, a state-of-the-art production assembler. First, we present a new parallel algorithm for k-mer analysis, characterized by intensive communication and I/O requirements, and reduce the memory requirements by 6.93×. Second, we efficiently parallelize de Bruijn graph construction and traversal, which necessitates a distributed hash table and is a key component of most de novo assemblers. We provide a novel algorithm that leverages one-sided communication capabilities of the Unified Parallel C (UPC) to facilitate the requisite fine-grained parallelism and avoidance of data hazards, while analytically proving its scalability properties. Overall results show unprecedented performance and efficient scaling on up to 15,360 cores of a Cray XC30, on human genome as well as the challenging wheat genome, with performance improvement from days to seconds.
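The core data structure in the abstract — a de Bruijn graph over k-mers backed by a hash table — can be sketched in a few lines. This is a toy serial version for illustration only; the paper’s contribution is precisely the distributed, one-sided-communication UPC implementation that this sketch does not attempt:

```python
from collections import defaultdict

def build_dbg(reads, k):
    """Toy de Bruijn graph: nodes are (k-1)-mers, k-mers define edges.
    Meraculous keeps this in a hash table (distributed across nodes in
    the SC14 paper); a plain dict-of-sets stands in here."""
    graph = defaultdict(set)
    for read in reads:
        for i in range(len(read) - k + 1):
            kmer = read[i:i + k]
            graph[kmer[:-1]].add(kmer[1:])
    return graph

def traverse(graph, start):
    """Walk unambiguous (single-successor) nodes to spell a contig."""
    contig, node = start, start
    while len(graph.get(node, ())) == 1:
        node = next(iter(graph[node]))
        contig += node[-1]
        if len(contig) > 10_000:   # cycle guard for this toy
            break
    return contig

reads = ["ACGTT", "CGTTG", "GTTGC", "TTGCA"]
g = build_dbg(reads, k=4)
print(traverse(g, "ACG"))   # -> ACGTTGCA
```

Graph traversal halts wherever a node has more than one successor — exactly the fine-grained, hazard-prone step that the paper parallelizes with UPC’s one-sided communication.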
We posted about a number of publications from David Tse’s group investigating fundamental limits for assembly algorithms (see here and here). In their latest paper, they look at how noise in the reads affects those fundamental limits.
While most current high-throughput DNA sequencing technologies generate short reads with low error rates, emerging sequencing technologies generate long reads with high error rates. A basic question of interest is the tradeoff between read length and error rate in terms of the information needed for the perfect assembly of the genome. Using an adversarial erasure error model, we make progress on this problem by establishing a critical read length, as a function of the genome and the error rate, above which perfect assembly is guaranteed. For several real genomes, including those from the GAGE dataset, we verify that this critical read length is not significantly greater than the read length required for perfect assembly from reads without errors.
It should be obvious that if the reads are 100% noisy, no assembly is possible no matter how long the reads are. What, then, is the ‘fundamental limit’ on the error rate of the reads?
The answer depends on the distribution of errors, as explained in the paper.
Our results show that for several actual genomes, if we are in a dense-read model with reads 20-40% longer than the noiseless requirement ℓ_crit(s), perfect assembly feasibility is robust to erasures at a rate of about 10%. While this is not as optimistic as the message from , we emphasize that we consider an adversarial error model. When errors instead occur at random locations, it is natural to expect less stringent requirements.
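For intuition on why a ‘critical read length’ exists at all in the noiseless case: exact repeats are what make assembly ambiguous, so reads must at minimum be long enough to bridge them. (The actual condition in this line of work involves interleaved and triple repeats, so the longest exact repeat computed below is only a crude proxy for the true bound.)

```python
def longest_repeat(genome):
    """Length of the longest substring occurring at least twice.
    Brute force with a binary search on the repeat length; fine for
    a toy genome, not for a real one."""
    n = len(genome)
    def has_repeat(L):
        seen = set()
        for i in range(n - L + 1):
            s = genome[i:i + L]
            if s in seen:
                return True
            seen.add(s)
        return False
    lo, hi = 0, n - 1
    while lo < hi:                 # monotone: shorter repeats exist too
        mid = (lo + hi + 1) // 2
        if has_repeat(mid):
            lo = mid
        else:
            hi = mid - 1
    return lo

# Reads no longer than the repeat cannot tell its two copies apart,
# even with zero sequencing error.
genome = "ATGCGCGTTAGCGCGA"       # contains "GCGCG" twice
print(longest_repeat(genome))     # -> 5
```

The quoted result then says, roughly, that paying a further 20-40% in read length over this noiseless requirement buys robustness to ~10% adversarial erasures.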
Tomasetti, Vogelstein and co-authors recently published another informative paper on somatic mutations and cancer, but it got lost amid the cacophony over their Science paper. This paper has many more details on their mathematical model.
Cancer arises through the sequential accumulation of mutations in oncogenes and tumor suppressor genes. However, how many such mutations are required for a normal human cell to progress to an advanced cancer? The best estimates for this number have been provided by mathematical models based on the relation between age and incidence. For example, the classic studies of Nordling [Nordling CO (1953) Br J Cancer 7(1):68–72] and Armitage and Doll [Armitage P, Doll R (1954) Br J Cancer 8(1):1–12] suggest that six or seven sequential mutations are required. Here, we describe a different approach to derive this estimate that combines conventional epidemiologic studies with genome-wide sequencing data: incidence data for different groups of patients with the same cancer type were compared with respect to their somatic mutation rates. In two well-documented cancer types (lung and colon adenocarcinomas), we find that only three sequential mutations are required to develop cancer. This conclusion deepens our understanding of the process of carcinogenesis and has important implications for the design of future cancer genome-sequencing efforts.
Speaking of criticism of the Science paper, the biggest pile of nonsense comes from Yaniv Erlich, a poorly trained technician at Broad Institute, who keeps harping on their use of the log-log distribution. Folks, log-log plots have been used in the cancer literature on somatic mutations since the 1950s; see these two classic papers – “A New Theory on the Cancer-inducing Mechanism” by C. O. Nordling and “The Age Distribution of Cancer and a Multi-stage Theory of Carcinogenesis” by P. Armitage and R. Doll. Everyone knows that it represents a y = x^a type of mathematical relationship, including Nordling, who mentioned it in his paper, and Tomasetti et al., who reproduced the math of Armitage and Doll in their PNAS paper.
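For readers unfamiliar with the point: the Armitage-Doll model predicts incidence rising as a power of age, incidence(t) ≈ c·t^(k−1) for k sequential rate-limiting mutations, so a log-log plot gives a straight line whose slope estimates k−1. A minimal sketch with simulated (not real) incidence data:

```python
from math import log

# Armitage-Doll-style power law: incidence(t) = c * t**(k-1) for k
# sequential rate-limiting mutations.  On log-log axes this is a
# straight line of slope k-1.
k = 3                      # toy value: three sequential mutations
c = 1e-9
ages = [40, 50, 60, 70, 80]
incidence = [c * t ** (k - 1) for t in ages]

# Least-squares slope of log(incidence) vs log(age):
xs = [log(t) for t in ages]
ys = [log(i) for i in incidence]
n = len(xs)
mx, my = sum(xs) / n, sum(ys) / n
slope = (sum((x - mx) * (y - my) for x, y in zip(xs, ys))
         / sum((x - mx) ** 2 for x in xs))
print(round(slope, 6))     # recovers k - 1 = 2.0
```

This is exactly the y = x^a relationship the log-log plot encodes, and it is why the slope of age-incidence curves has been read as a mutation count since Nordling and Armitage-Doll.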
We believe Yaniv Erlich should stick to displaying his ignorance of read coherence rather than diversifying into many other areas.
Readers may recall our post about Rayan Chikhi, Guillaume Rizk, Dominique Lavenier and their collaborators converting their efficient programs like Minia into an entire library with useful modules. Now others are building on top of GATB, and LoRDEC is one success story. You can access the paper at this link (h/t: E. Rivals).
Motivation: PacBio single molecule real-time sequencing is a third-generation sequencing technique producing long reads, with comparatively lower throughput and higher error rate. Errors include numerous indels and complicate downstream analysis like mapping or de novo assembly. A hybrid strategy that takes advantage of the high accuracy of second-generation short reads has been proposed for correcting long reads. Mapping of short reads on long reads provides sufficient coverage to eliminate up to 99% of errors, however, at the expense of prohibitive running times and considerable amounts of disk and memory space.
Results: We present LoRDEC, a hybrid error correction method that builds a succinct de Bruijn graph representing the short reads, and seeks a corrective sequence for each erroneous region in the long reads by traversing chosen paths in the graph. In comparison, LoRDEC is at least six times faster and requires at least 93% less memory or disk space than available tools, while achieving comparable accuracy.
Availability and implementation: LoRDEC is written in C++, tested on Linux platforms and freely available at http://atgc.lirmm.fr/lordec.
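The core idea of LoRDEC — bridge an erroneous stretch of a long read by walking the short-read de Bruijn graph between two flanking ‘solid’ k-mers — can be caricatured as follows. This is a toy sketch under simplifying assumptions (exact anchor k-mers, plain BFS, a tiny dict-based graph); the real tool uses a succinct GATB graph and bounded path exploration:

```python
from collections import defaultdict, deque

def kmer_graph(short_reads, k):
    """Toy de Bruijn graph over the short reads: k-mer -> successors."""
    g = defaultdict(set)
    for r in short_reads:
        for i in range(len(r) - k):
            g[r[i:i + k]].add(r[i + 1:i + k + 1])
    return g

def bridge(g, src, dst, max_len=50):
    """BFS from solid k-mer `src` to solid k-mer `dst`; the spelled
    path is the corrective sequence for the noisy region in between."""
    queue = deque([(src, src)])
    seen = {src}
    while queue:
        node, path = queue.popleft()
        if node == dst:
            return path
        if len(path) >= max_len:
            continue
        for nxt in g[node]:
            if nxt not in seen:
                seen.add(nxt)
                queue.append((nxt, path + nxt[-1]))
    return None               # no bridge: leave the region uncorrected

short_reads = ["ACGTTGCA", "GTTGCATT"]   # accurate second-gen reads
g = kmer_graph(short_reads, k=4)
# Noisy long read "ACGT??CATT": anchors ACGT and CATT flank the errors.
print(bridge(g, "ACGT", "CATT"))         # -> ACGTTGCATT
```

Bounding the search (`max_len` here) is what keeps the correction cheap; this is the sense in which accurate short reads pay for the indel-riddled middle of a PacBio read.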
Readers may also find the following posts relevant.
Very Efficient Hybrid Assembler for PacBio Data
Cerulean: A Hybrid Assembly using High Throughput Short and Long Reads
This paper was posted on arXiv in early 2014, but we forgot to mention it here. Ruibang Luo (one of the authors) mentioned that the paper has now been accepted.
Background: Short-read aligners have recently gained a lot of speed by exploiting the massive parallelism of GPU. An uprising alternative to GPU is Intel MIC; Tianhe-2, currently top of the TOP500, is built with 48,000 MIC boards to offer ~55 PFLOPS. The CPU-like architecture of MIC allows CPU-based software to be parallelized easily; however, the performance is often inferior to GPU counterparts as an MIC board contains only ~60 cores (while a GPU board typically has over a thousand cores). Results: To better utilize MIC-enabled computers for NGS data analysis, we developed a new short-read aligner MICA that is optimized in view of MIC’s limitations and the extra parallelism inside each MIC core. Experiments on aligning 150bp paired-end reads show that MICA using one MIC board is 4.9 times faster than BWA-MEM (using 6 cores of a top-end CPU), and slightly faster than SOAP3-dp (using a GPU). Furthermore, MICA’s simplicity allows very efficient scale-up when multiple MIC boards are used in a node (3 cards give a 14.1-fold speedup over BWA-MEM). Summary: MICA can be readily used by MIC-enabled supercomputers for production purposes. We have tested MICA on Tianhe-2 with 90 WGS samples (17.47 Tera-bases), which can be aligned in an hour using less than 400 nodes. MICA has impressive performance even though the current MIC is at its initial stage of development (the next generation of MIC has been announced for release in late 2014).
Readers may remember our commentary about the positivity lady.
Tragedy of the Day: PNAS Got Duped by Positivity Lady !!
In 2005, she wrote a paper linking human happiness to nonlinear dynamics and came up with a precise ratio of 2.9013 (the critical positivity ratio) to improve life!!
The critical positivity ratio (also known as the Losada ratio or the Losada line) is a largely discredited concept in positive psychology positing an exact ratio of positive to negative emotions which distinguishes “flourishing” people from “languishing” people. The ratio was proposed by Marcial Losada and psychologist Barbara Fredrickson, who identified a ratio of positive to negative affect of exactly 2.9013 as separating flourishing from languishing individuals in a 2005 paper in American Psychologist. The concept of a critical positivity ratio was widely embraced by both academic psychologists and the lay public; Fredrickson and Losada’s paper was cited nearly 1,000 times, and Fredrickson wrote a popular book expounding the concept of “the 3-to-1 ratio that will change your life”. Fredrickson wrote: “Just as zero degrees Celsius is a special number in thermodynamics, the 3-to-1 positivity ratio may well be a magic number in human psychology.”
In 2013, the critical positivity ratio aroused the skepticism of Nick Brown, a graduate student in applied positive psychology, who felt that the paper’s mathematical claims underlying the critical positivity ratio were fundamentally flawed. Brown collaborated with physicist Alan Sokal and psychologist Harris Friedman on a re-analysis of the paper’s data. They found that Fredrickson and Losada’s paper contained “numerous fundamental conceptual and mathematical errors”, as did Losada’s earlier work on positive psychology, which completely invalidated their claims. Losada declined to respond to the criticism, indicating that he was too busy running his consulting business. Fredrickson wrote a response in which she conceded that the mathematical aspects of the critical positivity ratio were “questionable” and that she had “neither the expertise nor the insight” to defend them, but she maintained that the empirical evidence was solid. Brown and colleagues, whose response was published the next year, maintain that there is no evidence for the critical positivity ratio whatsoever.
That paper got retracted eight years later, when a group of scientists, including the well-known physicist Alan Sokal, called out her BS. By then, the positivity lady had moved on to her new venture involving genomics and, believe it or not, bioinformatics to show the effect of positivity on life. Our earlier blog post linked above was about another nonsensical claim of hers, using gene expression analysis to show the impact of purpose in life. Last year, we published a paper in PNAS to show, once again, that her claims were all meaningless.
A critical reanalysis of the relationship between genomics and well-being
This article critically reanalyzes the work of Fredrickson et al. [Fredrickson BL, et al. (2013) Proc Natl Acad Sci USA 110(33):13684–13689], which claimed to show that distinct dimensions of psychological well-being are differentially correlated with levels of expression of a selection of genes associated with distinct forms of immune response. We show that not only is Fredrickson et al.’s article conceptually deficient, but more crucially, that their statistical analyses are fatally flawed, to the point that their claimed results are in fact essentially meaningless. We believe that our findings may have implications for the reevaluation of other published genomics research based on comparable statistical analyses and that a variant of our methodology might be useful for such a reevaluation.
Today we are shocked to learn that NIH is now funding her ‘proven-by-genetics’ method to cure human beings!! This grant seems to have been approved after the publication of our paper. Therefore, NIH is completely ignoring all criticism and allowing this disgusting junk science to proceed.
Here is part of the abstract of the grant application –
An innovative upward spiral theory of lifestyle change positions warm and empathic emotional states as key pathways to unlocking the body’s inherent plasticity to reverse entrenched biological risk factors. The PI’s team has identified an affective intervention – the ancient practice of loving-kindness meditation (LKM) – that produces salubrious biological effects in healthy midlife adults. The innovation of the present study lies in testing this affective intervention in a sample of midlife adults on poor health trajectories by virtue of having low childhood SES plus present-day pathogenic behavioral tendencies (i.e., impulsivity and mistrust). A dual-blind placebo-controlled randomized controlled trial (RCT) is designed to provide proof of principle that early-established biological risk factors are mutable, not permanent.
If you are doing real science and cannot get an NIH grant, please contact the grant manager Lisbeth Nielsen at the National Institute on Aging to stop this nonsense. You can find her email address at this link.
NIH director Francis Collins wrote a new article in JAMA selling the latest boondoggle of ‘precision medicine’.
Exceptional Opportunities in Medical Science – A View From the National Institutes of Health
Quite ironically, he made a strong case for his own resignation from NIH so he can focus on more productive activities (such as reading the informative JAMA articles he failed to read).
Collins wrote the following text in the introduction –
As the world’s largest source of biomedical research funding, the US National Institutes of Health (NIH) has been advancing understanding of health and disease for more than a century. Scientific and technological breakthroughs that have arisen from NIH-supported research account for many of the gains that the United States has seen in health and longevity.
For example, an infant born today in the United States can look forward to an average lifespan of about 79 years—nearly 3 decades longer than one born in 1900.
The sense you get from the above text is that NIH helped greatly in improving the life expectancy of Americans. The rest of the article suggests that NIH should be funded more so it can continue its great work.
But how is NIH doing under Collins? Let us see how much US life expectancy at birth has improved since Collins joined NIH in 1993 to lead the human genome project. The answer comes from a figure in another important JAMA article Collins failed to read (“The Anatomy of Health Care in the United States”). We cannot access the original article, but the relevant figure is available from The Incidental Economist blog.
As you can see, the gap between the life expectancy of Americans and that of other OECD residents has been widening steadily ever since Collins started to lead HGP/NHGRI/NIH. The USA is falling behind.
Since NIH claimed credit for the gain in life expectancy since 1900, by the same logic this is the clearest admission of NIH’s failure under Francis Collins for over two decades.
Readers may also enjoy a good article by Mike Eisen calling out Francis Collins’s bullshit –
NIH Director Francis Collins’ ridiculous “We would have had an Ebola vaccine if the NIH were fully funded”
The guy who turned Libya into a prosperous democracy (not), punished the evil Wall Street bankers (not), made the Middle East safe (not), gave free health insurance to everyone (not), and forced Afghan jihadis back into their caves (not) is now on to his new venture – ‘personalized medicine’. Thanks to the PR team, this also has a new name – ‘precision medicine’.
Tonight, I’m launching a new Precision Medicine Initiative to bring us closer to curing diseases like cancer and diabetes — and to give all of us access to the personalized information we need to keep ourselves and our families healthier.
Here is the touted success story –
I want the country that eliminated polio and mapped the human genome to lead a new era of medicine — one that delivers the right treatment at the right time. In some patients with cystic fibrosis, this approach has reversed a disease once thought unstoppable.
How good is that success story? Vox.com has some numbers.
All these treatments are still incredibly costly
There are big barriers here between the dream of personalized medicine and the reality. For the most part, the science of genetics just isn’t refined enough to help most patients, and developing targeted therapies is hugely expensive and time-consuming.
For example, Bill Gardner at the Incidental Economist ran some numbers on the promising cystic fibrosis therapy: “It took 24 years and tens of millions of dollars to get from the discovery of the CFTR [the particular genetic mutation that causes CF in some people] to the FDA approval of a drug. Moreover, this drug was designed for a mutation found in only a small fraction of the population of an already rare disease.”
Gardner noted that the drug costs about $300,000 per year, not only because of the manpower and years of research behind it, but because the market for the drug is small: “Precisely because the treatments are targeted at phenomena at the level of specific harmful mutations, they are not just personalized but practically bespoke, and correspondingly pricey.”
But do facts matter anymore, when a country believes it stands apart from all others?
[Figure from Calamities of Nature website and Vox.com].