Rushed Review by Nature Biotechnology on a Cancer Bioinformatics Paper?

One of our readers, a PhD student, has been struggling to figure out what is going on with the paper “Comprehensive analysis of cancer-associated somatic mutations in class I HLA genes”. The corresponding author is broadster Gad Getz, who did not respond to his emails.


In addition to his role at the Broad, Getz is a co-principal investigator in the Genome Data Analysis Center (GDAC) of the NCI/NHGRI TCGA (The Cancer Genome Atlas) project; a co-leader of the International Cancer Genome Consortium (ICGC) Pan-Cancer Analysis of Whole Genomes (PCAWG) project; a co-principal investigator of the Broad-led NCI Cloud Pilot; and a member of various NCI advisory committees. In addition, Getz directs the Bioinformatics Program at the Massachusetts General Hospital Cancer Center and Department of Pathology and serves as an associate professor of pathology at Harvard Medical School. Getz is also the inaugural incumbent of the Paul C. Zamecnik Chair in Oncology at the MGH Cancer Center. He has published numerous papers in recent years in prominent journals that describe new genes and pathways involved in different tumor types.

Maybe it is time for him to publish in real scientific journals instead of ‘prominent’ magazines.

Here are the specific questions –

The article compares a number of algorithms, but provides no details about software versions or parameter settings. This is in contrast to the journal’s stated editorial position, which says: “Since last October, all Nature journals have required that authors declare the location and accessibility of any custom code and software central to the main claims in a paper …” If the method has double the accuracy of some existing methods, I’d like to see what parameters and versions they used when running those existing methods. Remarkably, the authors didn’t even provide the parameters of their own software, making none of their analyses reproducible. With documentation as poor as the excerpt shown below, reproducible code is all the more necessary.

Input parameters:
-bam: path to the BAM file to be used for HLA typing
… …
-format: fastq format (STDFQ, ILMFQ, ILM1.8 or SLXFQ; see Novoalign documentation)
Source :

Why is the input a BAM file, but there’s a parameter to specify the FASTQ format? Which parameters are optional and which are mandatory? The documentation must have been an afterthought.
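To make the reproducibility complaint concrete, here is a minimal sketch (ours, in Python, not from the paper) of the kind of provenance record any such pipeline could emit: pinned tool versions, the exact parameters passed, and a checksum of the input BAM. The file names and the '--version' flag below are hypothetical placeholders, not documented behavior of the tools involved.

import hashlib
import json
import subprocess
import sys
from datetime import datetime, timezone


def sha256sum(path, chunk_size=1 << 20):
    # Checksum the input so readers can verify they start from the same data.
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for block in iter(lambda: handle.read(chunk_size), b""):
            digest.update(block)
    return digest.hexdigest()


def tool_version(command):
    # Capture whatever version string a tool reports when asked.
    # The '--version' flag used below is an assumption for illustration;
    # real tools in this pipeline may report their version differently.
    try:
        result = subprocess.run(command, capture_output=True, text=True, timeout=30)
        return (result.stdout or result.stderr).strip().splitlines()[0]
    except (OSError, subprocess.TimeoutExpired, IndexError):
        return "unknown"


def write_provenance(bam_path, params, tools, out_path="provenance.json"):
    record = {
        "timestamp": datetime.now(timezone.utc).isoformat(),
        "python": sys.version,
        "input_bam_sha256": sha256sum(bam_path),
        "parameters": params,  # every flag actually passed, not just the defaults
        "tool_versions": {name: tool_version(cmd) for name, cmd in tools.items()},
    }
    with open(out_path, "w") as handle:
        json.dump(record, handle, indent=2)
    return record


# Hypothetical usage; the parameter names mirror the documentation quoted above.
if __name__ == "__main__":
    write_provenance(
        bam_path="sample.bam",
        params={"-bam": "sample.bam", "-format": "STDFQ"},
        tools={"novoalign": ["novoalign", "--version"]},
    )

A record like this, published alongside the results, is the minimum that would let a reader rerun the comparison on the same inputs with the same versions.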


I’ve also noticed that a crucial supplementary table (Table 11) is missing from their supplementary materials; it is described as being available in dbGaP, but it is nowhere to be found.

Turkey and Thanksgiving

Prior to Thanksgiving, people are looking for –


On the first topic, the following story came out on top. It is noteworthy for what passes as ‘science’ these days (shown in emphasis below).

Brining turkey is the worst, according to science

As BuzzFeed showed in a 13-person taste test, a brined turkey may be moist, but it is also flavorless and boring. The skin doesn’t crisp. The taste doesn’t wow. But that leaves home chefs with a problem: How do you get a turkey that is both juicy and flavorful?


On the third topic –

Turkish Shootdown of Russian Jet: What You Need to Know


A U.S. official told Reuters that the Russian jet was inside of Syria when it was shot down:

The United States believes that the Russian jet shot down by Turkey on Tuesday was hit inside Syrian airspace after a brief incursion into Turkish airspace, a U.S. official told Reuters, speaking on condition of anonymity.

Russia denies that the Russian fighter jet – which was bombing ISIS – ever entered Turkish air space, and has put out its own map purporting to prove that claim.

The Russian jet pilots who parachuted free of their burning plane were then purportedly killed by Turkish rebels inside Syria. If true, this is a war crime.

Then – when a Russian helicopter tried to save the pilots – it was shot down by American-backed Syrian rebels – using weapons provided to them by the United States – and a Russian marine was killed.

Russia is deploying a warship off the Syrian coast to “destroy any threats to Russian planes”. Many believe this is the start of World War III.

While the U.S. and NATO tried to blame Russia, German Vice-Chancellor Sigmar Gabriel slammed Turkey:

“This incident shows for the first time that we are dealing with an actor who is unpredictable according to statements from various parts of the region – that is not Russia, that is Turkey,” Gabriel said, as cited by DPA news agency. He added that Turkey was playing “a complicated role” in the Syrian conflict.

Indeed, NATO-member Turkey is MASSIVELY supporting ISIS, provided chemical weapons used in the jihadis’ massacre of civilians, and has been bombing ISIS’ main on-the-ground enemy – Kurdish soldiers – using its air force. And some of the Turkish people are also unsympathetic to the victims of ISIS terrorism. Turkey was also instrumental in the creation of ISIS. As an internal Defense Intelligence Agency (DIA) document produced recently shows, the U.S. knew that the actions of “the West, Gulf countries and Turkey” in Syria might create a terrorist group like ISIS and an Islamic CALIPHATE.

As the former DIA head explained:

It was a willful decision [by Turkey, the West and Gulf countries] to … support an insurgency that had salafists, Al Qaeda and the Muslim Brotherhood ….


On the link between the Thanksgiving turkey and the country Turkey, here is an interesting factoid – the bird is called ‘hindi’ (meaning ‘Indian’) in Turkey.

The Mistake That Gave Turkey (the Bird) the Same Name as Turkey (the Nation)

The former center of the Ottoman Empire isn’t exactly a breeding ground for the bird that Americans associate with Thanksgiving. In fact, the turkey is native to North America, so why do they share the same name?

First, let’s get the facts on the two turkeys. The word turkey has been used to refer to “land occupied by the Turks” since the 1300s and was even used by Chaucer in The Book of the Duchess. The word Turk is of unknown origin, but it is used in such varying languages as Italian, Arabic, Persian, and many others to refer to people from this region. The land occupied by the Turks was known as the Ottoman Empire from the 1300s until 1922. Following World War I and the fall of the Ottomans, the republic of Turkey was declared, taking on the name that had long referred to that region. The bird is another story. Meleagris gallopavo is an odd-looking bird that is known for its bare head, wattle, and iridescent plumage.

How are they related? First, we have to get to know another bird: the guinea fowl. This bird bears some resemblance to the then-recently found American bird. Though it is native to eastern Africa, the guinea fowl was imported to Europe through the Ottoman Empire and came to be called the turkey-cock or turkey-hen. When settlers in the New World began to send similar-looking fowl back to Europe, they were mistakenly called turkeys.

Every language seems to have radically different names for this bird. The Turkish word is hindi, which literally means “Indian.” The original word in French, coq d’Inde, meant rooster of India, and has since shortened to dinde. These names likely derive from the common misconception that India and the New World were one and the same. In Portuguese, it’s literally a “Peru bird,” and in Malay, it’s called a “Dutch chicken.”

The turkey’s acceptance into the Old World happened quickly. By 1575, the English were enjoying the North American bird at Christmas dinner, and Shakespeare talked about it in Henry IV. Turkeys, as we know them, have fared better than their guinea fowl relatives on the international scene, perhaps explaining why you probably have never heard of guinea fowl until right now.


Peak Sequencing Consequence? – BGI Delays Launch of ‘Nation-scale’ Sequencer

In June, we reported on BGI’s announcement of a ‘nation-scale’ sequencer (Peak Sequencing? BGI Unveils ‘Nation-scale’ Sequencer) and commented –

it clearly looks like the society is reaching ‘peak sequencing’ and should invest in other components of medical science.

Not so surprisingly, BGI may have come to the same conclusion five months later and decided to delay the launch of its mega-sequencer to expedite its desktop version. A surprised Bio-IT World reports –

Complete Genomics Loses Staff, Delays Revolocity as BGI Seeks Support for BGISEQ-500

Complete Genomics of Mountain View, Calif., is undergoing a big shakeup under the direction of its Shenzhen-based owner BGI, Julia Karow reports at GenomeWeb. The company, once the largest sequencing-as-a-service provider in the U.S., will lay off what Karow reports is a “substantial” number of employees and refocus its R&D efforts in support of BGI’s new BGISEQ-500 sequencer, revealed last month at the International Conference on Genomics.

The reorganization comes at a surprising time, as Complete Genomics was in the midst of its own product launch, of the ultra-high-throughput Revolocity sequencing system. Three customers, pursuing large-scale whole genome sequencing for patient care, had announced their purchases of the system, at a price of $12 million each. Complete Genomics was preparing to install these Revolocity systems, each a miniature sequencing factory unto itself, in early 2016.

Now these installations will be delayed, and there is no indication of when or if the contracts can be honored.

As recently as one month ago, Complete Genomics CEO Cliff Reid spoke to Bio-IT World about his plans for Revolocity, and the efforts his company had gone through to reinvent itself as a commercial manufacturer. “What [BGI] needed from us was to be a packaged product supplier,” Reid said at the time, referring to his pivot away from sequencing-as-a-service after his company’s acquisition in 2013. “So it accelerated our move from a pure services business to being able to distribute our technology globally, and it caused us to go through some major packaging processes.”

Matthew Cunningham-Cook: How the TPP Will Create a Medical Privacy Hellscape

By Matthew Cunningham-Cook, who has written for the International Business Times, The New Republic, Jacobin, Aljazeera, and The Nation and has been a labor activist

On October 6, the European Court of Justice issued a sweeping ruling invalidating the existing cross-Atlantic data transfer agreement, putting the entire business model of companies like Facebook and Google at risk. The ruling gives data privacy regulators in individual EU states expansive powers to demand data localization from multinational tech firms. Observers noted that the Snowden revelations contributed to the decision, with EU judges looking unfavorably at the fact that the NSA had basically unfettered access to the data of EU citizens.

Lo and behold, just a month later comes a trade agreement that will make sure that Facebook and Google’s little legal problems in Europe won’t happen in, say, Australia, Japan, New Zealand or Canada.

To wit, from the TPP’s electronic commerce chapter:

Each Party shall allow the cross-border transfer of information by electronic means, including personal information, when this activity is for the conduct of the business of a covered person.

Public Citizen, as always, had a good rundown right after the TPP’s release. “The E-Commerce chapter has serious implications for online privacy,” said Peter Maybarduk, director of Public Citizen’s information society program. “The text reveals that policies protecting personal data when it crosses borders could be subject to challenge as a violation of the TPP.”

The Public Citizen press release also points out that “These TPP standards replicate language in World Trade Organization agreements under which tribunals have ruled against domestic policies in 43 of 44 challenges.”

But beyond the E-commerce chapter’s impact on Facebook and Google, which has been discussed, I’m interested in how there is no carveout for medical data. The TPP language means that insurers and other companies can take medical data across borders willy-nilly without any fear of running into pesky data privacy laws – like, say, HIPAA, which protects personal health information from misuse.

This is particularly interesting in the case of Vietnam. A memo from the international law firm Russin and Vecchi states that:

Notwithstanding the existence of some privacy regulations that relate to healthcare services, certain gaps remain. Is a healthcare entity liable for a breach of a patient’s privacy by a doctor or medical worker employed by that entity? If yes, to what extent is the healthcare entity liable? May private information about a patient be stored, used and transferred within a healthcare entity and, if so, to what extent? Who may have access to a patient or his private information during a medical examination and/or treatment?

Basically, Vietnam is a legal wild west for data, now inside the TPP zone and being advertised as a great place for more IT offshoring.

It’s unfortunately widely accepted–even in the EU–that companies like Facebook and Google consider consumer data a commodity to be bought and sold. There is little variance across the world as to this fact. But medical data is a whole other area entirely, with a range of laws protecting medical privacy across the TPP zone. But what happens when medical data is transferred to another country? The EU’s Directive on Data Protection explicitly prohibits the offshoring of EU citizen data to countries with lower security standards. But HIPAA has none of the same protections–an overhaul of HIPAA to make its protections stronger could be prevented by TPP rules.

The Inspector General of the Department of Health and Human Services already found data protections sorely lacking in 2014, when it wrote: “For example, Medicaid agencies or domestic contractors who send [personal health information] offshore may have limited means of enforcing provisions of BAAs [business associate agreements] that are intended to safeguard PHI. Although some countries may have privacy protections greater than those in the United States, other countries may have limited or no privacy protections to support HIPAA compliance.”

So the short of it is this: medical data protection in the US is already poor compared to the EU, and TPP could preempt any effort to strengthen protections–sending any changes directly to an investor-state tribunal, where it is more likely than not to be overturned.

Yet another reason to oppose this truly awful, anti-people deal.

Friday Humor

Bioinformatics software testing


Evolutionary biology


US intelligence


College education


Discovering Russia and Russians through the Eyes of an Ex-US Military Analyst

One of our readers criticized us for the grave crime of ‘supporting Russia’. We generally do not support any government organization, but we do try to discover and present the truth, and that truth may not happen to coincide with what the media sell in western countries. The process of discovering the truth involves reading many contrasting accounts and then coming to a conclusion about what may have happened.

A few of our readers are so brainwashed that reading anything contrary to their belief system troubles them. For the rest, we present an interesting book that just came out. We have not read the book yet, but we are quite familiar with the author’s blog. Here is his brief bio from the book.

The Saker was born into a military family of “White” Russian refugees in western Europe, where he lived most of his life. After completing two college degrees in the USA, he returned to Europe, where he worked as a military analyst until he lost his career due to his vocal opposition to the western-sponsored wars in Chechnya, Croatia, Bosnia and Kosovo. After re-training as a software engineer, he moved to Florida, where he now lives with his wife, a veterinarian, and their three children. When he does not blog or help his wife at work, he likes to explore the Florida wilderness on foot, mountain bike and kayak or play acoustic jazz guitar.

The following long blog post gives you a more colorful personal account of his background.

“Submarines in the desert” (as my deepest gratitude to you)

My life has been one of ups and downs. Early on, after a pretty nasty childhood, it went up, rather rapidly. Then came the “fall from (pseudo-) grace” and I lost my career. It is still too early to go into all the details, but let’s just say that I used to be associated with a “three letter outfit” whose existence was not well-known by the general public and which has since been disbanded. In my field, I got to the proverbial ‘top’ pretty early on, but soon the war in Bosnia began to open my eyes to many things I had never suspected before. Then I found out about two things which got me blacklisted in my own, putatively democratic, country: I found out that a group of people had uselessly been murdered as a result of the criminal incompetence of their superiors and I found out that one guy had taken a long jail sentence while all his superiors had managed to walk away from a crime they all had committed. And even though I never went public, or even told my closest friends about it (to protect them), I was blacklisted and prevented from ever working again.

In those dark days my wonderful wife was always trying to tell me that it was not my fault, that I had never done anything wrong, that I was paying the price for being a person of integrity and that I had proven many times over how good I was in my field. I always used to bitterly reply to her that I was like a “submarine in a desert”: maybe very good at “something somewhere”, but useless in my current environment (I always used to visualize an Akula-class SSN stranded smack in the middle of the Sahara desert – what a sight that would be! I wish somebody would use some Photoshop-like software to create that pic). What I have found out since is that our planet is covered with deserts and that there are many, many submarines in them, all yearning for the vastness of an ocean.

The book is available at the following link –

The Essential Saker: from the trenches of the emerging multipolar world

These are some of the most essential articles written by the Saker on his blog. Even though they cover topics ranging from history, to politics, to religion, to military affairs, to social issues, they are all linked by one common thread: the full-spectrum clash between the Western world and what the Saker calls the “Russian civilizational realm”. Most Russians, especially when addressing a western audience, feel compelled to use a diplomatic and non-confrontational language. In contrast, Saker’s style is informal, almost conversational, but also direct, even blunt. He is fully aware that his views might offend many of his readers, but he believes that there is also a bigger audience out there which will appreciate an honest and, above all, sincere criticism of what the Saker calls the “AngloZionist Empire”. The careful reader, however, will notice that the Saker’s criticisms are always aimed at a political system and its constituent institutions and supporting ideologies, but never at the people, nations or ethnicities. In fact, the Saker forcefully argues for a multi-cultural, multi-ethnic and multi-religious Russia which would be fully integrated in a multi-polar world inspired by the fraternal diversity of the BRICS countries. Underlying the Saker’s entire worldview is a categorical rejection of all ideologies and a profound belief that the root of all evil as well as the key to defeating it is always in the realm of spirituality.

A Different Vision from ‘Singularity’

Ray Kurzweil, one of Silicon Valley’s favorite nutcases, popularized the concept of technological singularity in his book “The Singularity Is Near: When Humans Transcend Biology”.

The book builds on the ideas introduced in Kurzweil’s previous books, The Age of Intelligent Machines (1990) and The Age of Spiritual Machines (1999). This time, however, Kurzweil embraces the term the Singularity, which was popularized by Vernor Vinge in his 1993 essay “The Coming Technological Singularity” more than a decade earlier. The first known use of the term in this context was made in 1958 by the Hungarian-born mathematician and physicist John von Neumann.

Kurzweil describes his law of accelerating returns which predicts an exponential increase in technologies like computers, genetics, nanotechnology, robotics and artificial intelligence. He says this will lead to a technological singularity in the year 2045, a point where progress is so rapid it outstrips humans’ ability to comprehend it.

With this book and previous ones, he has now become the spiritual (religious) leader of the singularity church. The followers have their yearly summit, named the ‘Singularity Summit’.

The Singularity Summit is the annual conference of the Machine Intelligence Research Institute. It was started in 2006 at Stanford University by Ray Kurzweil, Eliezer Yudkowsky, and Peter Thiel, and the subsequent summits in 2007, 2008, 2009, 2010, and 2011 have been held in San Francisco, San Jose, New York, San Francisco, and New York, respectively. Some speakers have included Sebastian Thrun, Sam Adams, Rodney Brooks, Barney Pell, Marshall Brain, Justin Rattner, Peter Diamandis, Stephen Wolfram, Gregory Benford, Robin Hanson, Anders Sandberg, Juergen Schmidhuber, Aubrey de Grey, Max Tegmark, and Michael Shermer.

There have also been spinoff conferences in Melbourne, Australia in 2010, 2011 and 2012. Previous speakers include David Chalmers, Lawrence Krauss, Gregory Benford, Ben Goertzel, Steve Omohundro, Hugo de Garis, Marcus Hutter, Mark Pesce, Stelarc and Randal A. Koene.

Kurzweil got hired by Google, a company now run by a former classmate of mine. Many of my other classmates got hired by Google, and the rest are mostly scattered around various tech-heavy companies in the valley. They may not enjoy the heretical message that their concept of singularity is a gigantic illusion just like all other religious concepts.

The biggest question about ‘technological singularity’ is who will pay to maintain the technology. The US tech boom over the last 20 years has been funded by the pensions of baby boomers, who sponsored the initial cost. The pension money kept on coming for demographic reasons, and this gave the companies building technology (primarily around Silicon Valley) the illusion of exponential growth. Given that both the buildup and maintenance of contemporary technologies are extremely capital-intensive, the infrastructure can be expected to collapse without continuous money pouring into the system. A good example is shown below –

Following the same argument, here is a ‘non-singularity’ vision of the future.

The Death of the Internet: A Pre-Mortem

All this has been on my mind of late as I’ve considered the future of the internet. The comparison may seem far-fetched, but then that’s what supporters of the SST would have said if anyone had compared the Boeing 2707 to, say, the zeppelin, another wave of the future that turned out to make too little economic sense to matter. Granted, the internet isn’t a subsidy dumpster, and it’s also much more complex than the SST; if anything, it might be compared to the entire system of commercial air travel, which we still have with us for the moment. Nonetheless, a strong case can be made that the internet, like the SST, doesn’t actually make economic sense; it’s being propped up by a set of financial gimmickry with a distinct resemblance to smoke and mirrors; and when those go away—and they will—much of what makes the internet so central a part of pop culture will go away as well.

It’s probably necessary to repeat here that the reasons for this are economic, not technical. Every time I’ve discussed the hard economic realities that make the internet’s lifespan in the deindustrial age roughly that of a snowball in Beelzebub’s back yard, I’ve gotten a flurry of responses fixating on purely technical issues. Those issues are beside the point. No doubt it would be possible to make something like the internet technically feasible in a society on the far side of the Long Descent, but that doesn’t matter; what matters is that the internet has to cover its operating costs, and it also has to compete with other ways of doing the things that the internet currently does.

It’s a source of wry amusement to me that so many people seem to have forgotten that the internet doesn’t actually do very much that’s new. Long before the internet, people were reading the news, publishing essays and stories, navigating through unfamiliar neighborhoods, sharing photos of kittens with their friends, ordering products from faraway stores for home delivery, looking at pictures of people with their clothes off, sending anonymous hate-filled messages to unsuspecting recipients, and doing pretty much everything else that they do on the internet today. For the moment, doing these things on the internet is cheaper and more convenient than the alternatives, and that’s what makes the internet so popular. If that changes—if the internet becomes more costly and less convenient than other options—its current popularity is unlikely to last.

Let’s start by looking at the costs. Every time I’ve mentioned the future of the internet on this blog, I’ve gotten comments and emails from readers who think that the price of their monthly internet service is a reasonable measure of the cost of the internet as a whole. For a useful corrective to this delusion, talk to people who work in data centers. You’ll hear about trucks pulling up to the loading dock every single day to offload pallet after pallet of brand new hard drives and other components, to replace those that will burn out that same day. You’ll hear about power bills that would easily cover the electricity costs of a small city. You’ll hear about many other costs as well. Data centers are not cheap to run, there are many thousands of them, and they’re only one part of the vast infrastructure we call the internet: by many measures, the most gargantuan technological project in the history of our species.

Your monthly fee for internet service covers only a small portion of what the internet costs. Where does the rest come from? That depends on which part of the net we’re discussing. The basic structure is paid for by internet service providers (ISPs), who recoup part of the costs from your monthly fee, part from the much larger fees paid by big users, and part by advertising. Content providers use some mix of advertising, pay-to-play service fees, sales of goods and services, packaging and selling your personal data to advertisers and government agencies, and new money from investors and loans to meet their costs. The ISPs routinely make a modest profit on the deal, but many of the content providers do not. Amazon may be the biggest retailer on the planet, for example, and its cash flow has soared in recent years, but its expenses have risen just as fast, and it rarely makes a profit. Many other content provider firms, including fish as big as Twitter, rack up big losses year after year.

How do they stay in business? A combination of vast amounts of investment money and ultracheap debt. That’s very common in the early decades of a new industry, though it’s been made a good deal easier by the Fed’s policy of next-to-zero interest rates. Investors who dream of buying stock in the next Microsoft provide venture capital for internet startups, banks provide lines of credit for existing firms, the stock and bond markets snap up paper of various kinds churned out by internet businesses, and all that money goes to pay the bills. It’s a reasonable gamble for the investors; they know perfectly well that a great many of the firms they’re funding will go belly up within a few years, but the few that don’t will either be bought up at inflated prices by one of the big dogs of the online world, or will figure out how to make money and then become big dogs themselves.

Are you preparing for the future?



One big factor in the maintenance cost of the internet that Mr. Greer often highlights is disk drives. If you have not worked on the hardware side of a data center, the following post will help you appreciate the real capital cost of running servers 24×7.

On Disk Failure

And we never failed to fail
It was the easiest thing to do

– Stephen Stills, Rick and Michael Curtis; “Southern Cross” (1981)

With Brian Beach’s article on disk drive failure continuing to stir up popular press and criticism, I’d like to discuss a much-overlooked facet of disk drive failure. Namely, the failure itself. Ignoring for a moment whether Beach’s analysis is any good or the sample populations even meaningful, the real highlight for me from the above back-and-forth was this comment from Brian Wilson, CTO of BackBlaze, in response to a comment on Mr. Beach’s article:

Replacing one drive takes about 15 minutes of work. If we have 30,000 drives and 2 percent fail, it takes 150 hours to replace those. In other words, one employee for one month of 8 hour days. Getting the failure rate down to 1 percent means you save 2 weeks of employee salary – maybe $5,000 total? The 30,000 drives costs you $4 million.

The $5k/$4million means the Hitachis are worth 1/10th of 1 percent higher cost to us. ACTUALLY we pay even more than that for them, but not more than a few dollars per drive (maybe 2 or 3 percent more).

Moral of the story: design for failure and buy the cheapest components you can. :-)

He later went on to disclaim in a followup comment, after being rightly taken to task by other commenters for, among other things, ignoring the effect of higher component failure rates on overall durability, that his observation applies only to his company’s application. That rang hollow for me. Here’s why.

The two modern papers on disk reliability that are more or less required reading for anyone in the field are the CMU paper by Schroeder and Gibson and the Google paper by Pinheiro, Weber, and Barroso. Both repeatedly emphasise the difficulty of assessing failure and the countless ways that devices can fail. Both settle on the same metric for failure: if the operator decided to replace the disk, it failed. If you’re looking for a stark damnation of a technology stack, you won’t find a much better example than that: the only really meaningful way we have to assess failure is the decision a human made after reviewing the data (often a polite way of saying “groggily reading a pager at 3am” or “receiving a call from a furious customer”). Everyone who has actually worked for any length of time for a manufacturer or large-scale consumer of disk-based storage systems knows all of this; it may not make for polite cocktail party conversation, but it’s no secret. And that, much more than any methodological issues with Mr. Beach’s work, casts doubt on Mr. Wilson’s approach. Even ignoring for a moment the overall reduction in durability that unreliable components creates in a system, some but not all of which can be mitigated by increasing the spare and parity device counts at increased cost, the assertion that the cost of dealing with a disk drive failure that does not induce permanent data loss is the cost of 15 minutes of one employee’s time is indefensible. True, it may take only 15 minutes for a data centre technician armed with a box of new disk drives and a list of locations of known-faulty components to wander the data centre verifying that each has its fault LED helpfully lit, replacing each one, and moving on to the next, but that’s hardly the whole story.

“That’s just what I wanted you to think, with your soft, human brain!”
Given the failure metric we’ve chosen out of necessity, it seems like we need to account for quite a bit of additional cost. After all, someone had to assemble the list of faulty devices and their exact locations (or cause their fault indicators to be activated, or both). Replacements had to be ordered, change control meetings held, inventories updated, and all the other bureaucratic accoutrements made up and filed in triplicate. The largest operators and their supply chain partners have usually automated some portion of this, but that’s really beside the point: however it’s done, it costs money that’s not accounted for in the delightfully naive “15-minute model” of data centre operations. Last, but certainly not least, we need to consider the implied cost of risk. But the most interesting part of the problem, at least to me personally, is the first step in the process: identifying failure. Just what does that entail, and what does it cost?
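For what it is worth, Wilson's back-of-envelope arithmetic quoted earlier is easy to re-run. The sketch below (in Python) uses only the figures from his comment, plus one explicitly assumed per-failure overhead, to show how quickly the "15-minute model" stops looking cheap.

# Figures quoted in Wilson's comment above.
DRIVES = 30_000
FLEET_COST_USD = 4_000_000
REPLACE_MINUTES = 15          # hands-on swap time per failed drive

def swap_hours(failure_rate):
    return DRIVES * failure_rate * REPLACE_MINUTES / 60

hours_at_2pct = swap_hours(0.02)   # 150 hours, matching the quote
hours_at_1pct = swap_hours(0.01)   # 75 hours
print(f"Swap labour: {hours_at_2pct:.0f} h at 2%, {hours_at_1pct:.0f} h at 1%")

# The quoted $5,000 saving is about 0.125% of the $4M fleet cost,
# i.e. roughly the "1/10th of 1 percent" premium mentioned.
print(f"$5,000 / ${FLEET_COST_USD:,} = {5000 / FLEET_COST_USD:.3%}")

# The post's objection is that 15 minutes is only the hands-on time.
# The 2-hour per-incident overhead below (diagnosis, tickets, change control,
# inventory, RMA paperwork) is an assumption, not data, but it shows how fast
# the "15-minute model" breaks down once the rest of the process is counted.
OVERHEAD_HOURS = 2.0
failed_drives = DRIVES * 0.02
total_hours = hours_at_2pct + failed_drives * OVERHEAD_HOURS
print(f"With {OVERHEAD_HOURS:.0f} h of overhead per failure: {total_hours:.0f} h at 2%")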

The State of Software in Evolutionary Biology


With Next Generation Sequencing (NGS) data coming of age and being routinely used, evolutionary biology is transforming into a data-driven science. As a consequence, researchers have to rely on a growing number of increasingly complex software tools. All widely used tools in our field have grown considerably, in terms of the number of features as well as lines of code. In addition, analysis pipelines now include substantially more components than 5-10 years ago. A topic that has received little attention in this context is the code quality of widely used codes. Unfortunately, the majority of users tend to blindly trust software and the results it produces. To this end, we assessed the code quality of 15 highly cited tools (e.g., MrBayes, MAFFT, SweepFinder etc.) from the broader area of evolutionary biology that are used in current data analysis pipelines. We also discuss widely unknown problems associated with floating-point arithmetic for representing real numbers on computer systems. Since the software quality of the tools we analyzed is rather mediocre, we provide a list of best practices for improving the quality of existing tools, but also list techniques that can be deployed for developing reliable, high-quality scientific software from scratch. Finally, we also discuss journal and science policy as well as funding issues that need to be addressed for improving software quality and ensuring support for developing new and maintaining existing software. Our intention is to raise the awareness of the community regarding software quality issues and to emphasize the substantial lack of funding for scientific software development.
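The floating-point problems the abstract alludes to are easy to demonstrate. The sketch below is ours, not the paper's; it shows three classic pitfalls that matter for likelihood computations in phylogenetics: inexact binary representation, order-dependent summation, and underflow.

import math

# 1. Real numbers are not represented exactly in binary floating point.
print(0.1 + 0.2 == 0.3)        # False
print(f"{0.1 + 0.2:.17f}")     # 0.30000000000000004

# 2. Summation order matters. Likelihood codes accumulate enormous numbers of
#    per-site terms, so the result depends on how the arithmetic is organised.
values = [1e16, 1.0, -1e16] * 1000
print(sum(values))             # 0.0 -- the naive left-to-right sum loses every 1.0
print(math.fsum(values))       # 1000.0 -- correctly rounded summation

# 3. Underflow. A product of many small per-site likelihoods reaches zero quickly,
#    which is why phylogenetics programs work with log-likelihoods instead.
per_site = 1e-20
print(per_site ** 20)          # 0.0 (underflow)
print(20 * math.log(per_site)) # about -921.0, a perfectly usable log-likelihood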

Does Gangolf Jobb’s Treefinder Program Have Any “Value”?

We read a blog post from Keith Robison describing his miserable day and thought – “shouldn’t Treefinder be solving his problem?” Quotes from Robison’s blog post are below –

Miserable day today – spent my entire day wrestling with bad formats and flaky tools and trying to bull my way past them, leading to many a mad expostulation. The whole day down in the pit, with the pendulum of multiple deadlines swinging just over my head.


Let’s go back to today’s nightmare. I’m in the middle of trying to generate some pretty phylogenetic trees marked up based on metadata for the sequences and with confidence information on the tree topology. Doing this well often involves a cycle of aligning the data, marking up the tree and then discovering some glitch in the input data or the metadata.

Since this is what a lot of folks do, there should be great tools out there, right? Perhaps lying around in plain sight? Perhaps, but that’s not my experience.

First, there’s a plethora of programs for each stage of the process. Multiple aligners for protein? Well, there’s Clustal Omega, MUSCLE, MAFFT and probably a few dozen more. Each offers a different array of possible alignment outputs. Then a wealth of tree generation programs, with again a raft of formats.

Phylogenetic formats — I feel immured by them. There’s Newick, named after a New Hampshire seafood restaurant (which is why it is sometimes called New Hampshire format). There’s an extended version of Newick. There’s Nexus format. Two different XML standards: PhyloXML and NeXML. The venerable PHYLIP format. And that’s the tip of the iceberg.

I’ve seen things you people would not believe. The first problem, beyond the sheer cacophony of different formats, is that different programs support different ones — and often badly. For example, the MrBayes software for estimating tree confidence (for large trees, in geologic time, unless it crashes), will write in Nexus format — and then refuse to read its own output! Perl’s Bio::TreeIO happily generates XML files that many other programs won’t read, complaining about tags that don’t belong — somebody is just plain wrong here! Ditto the various tree viewers / editors that refused to consume the XML generated by upstream programs. And at least one of these packages insists that everything after the angle bracket in a FASTA file is part of the unique identifier, which it then has the temerity to complain contains spaces!
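The format juggling Robison complains about is easy to illustrate with Biopython's Bio.Phylo module (assuming a reasonably recent Biopython is installed; this is a generic sketch, not a reproduction of his pipeline). The same toy tree is written out in three of the formats he lists, which is exactly the step where different programs start disagreeing.

from io import StringIO

from Bio import Phylo

# A toy tree in Newick ("New Hampshire") format, with branch lengths.
newick = "((A:0.1,B:0.2):0.05,(C:0.3,D:0.4):0.07);"
tree = Phylo.read(StringIO(newick), "newick")

# Round-trip the same tree through other formats. In practice this is where
# interoperability breaks down: downstream tools disagree about which optional
# fields (support values, comments, tags) they are willing to read back.
for fmt in ("newick", "nexus", "phyloxml"):
    buffer = StringIO()
    Phylo.write(tree, buffer, fmt)
    print(f"--- {fmt} ---")
    print(buffer.getvalue().strip())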

We bring up Treefinder here, because our reader Oli wrote an interesting comment in response to Gangolf Jobb’s interview –

It’s nice when people are able to do what Gangolf has done. It proves that we live in a democracy, which is ironic, considering why he’s doing it!

In an alternative world, Gangolf could be rich, if only he had grasped the concept of capitalism. He has created software which has a very narrow range of users, but those people must use it as part of their work. Using the same licencing concept that has made him a bad person in the eyes of academics, he could have charged a fee for every document that included records created with his software. In fact, he could have given them the software for free, and only charged when it was published. That fee could have been $100 per report, and the report funders would have paid it!

He could have been very rich by today, but was too stupid and blinkered to realise that the world is a complex flow of a billion different actions and variables, and nobody actually controls it, or influences it or anything as simplistic.

There are people who believe they can make things happen. When enough other people believe in them, things happen.

To which Jobb replied

Oli, democracy means that a state does what people want, not that a state ignores what people want but at least does not punish the critics. We definitely do not have genuine democracy in Germany, nor in the EU, nor in any western country. Except, maybe, in Switzerland.

Let us calculate how stupid it really was not to charge for Treefinder: 100 Euros (I am in Europe) per published report, multiplied by some 1000 publications so far, is some 100000 Euros. Divided by more than a decade of hard work, that is per year much less than social welfare … telling from the way you calculate you cannot be rich yourself, so better do not advise others how to become rich. The professors who refused to pay me decently for my work earn 100000 Euros in one and a half years. The immigrant programmers they hired in maybe two or three years.

The reason why scientific work is usually funded by tax money is that one normally cannot sell the results well.
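Jobb's arithmetic is easy to restate; the short sketch below only re-runs the numbers from his comment (the 12-year figure for "more than a decade" is our assumption, and the totals are his hypothetical fees, not actual income).

# Figures as stated in the reply above.
FEE_PER_PUBLICATION_EUR = 100
PUBLICATIONS = 1000                 # "some 1000 publications so far"
YEARS_OF_WORK = 12                  # "more than a decade"; the exact value is assumed

hypothetical_total = FEE_PER_PUBLICATION_EUR * PUBLICATIONS   # 100,000 EUR
per_year = hypothetical_total / YEARS_OF_WORK                 # ~8,300 EUR/year

# His comparison point: a professor earning 100,000 EUR in one and a half years.
professor_per_year = 100_000 / 1.5                            # ~66,700 EUR/year

print(f"Hypothetical Treefinder fees: {hypothetical_total:,} EUR total, "
      f"about {per_year:,.0f} EUR per year over {YEARS_OF_WORK} years")
print(f"Professor salary he cites: about {professor_per_year:,.0f} EUR per year")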

Based on Keith Robison’s post, there is clearly a need for a user-friendly phylogeny program. Is Treefinder a commercially viable solution, as Oli suggests? Are you willing to pay a hard-working independent bioinformatician who develops programs useful for the community, or will you only choose grant-funded free programs for your work?

Also, speaking of software related to evolutionary biology, the readers may find this new paper (“The State of Software in Evolutionary Biology”) informative.

Solute Carrier Proteins and Their Therapeutic Implication

These days, when we analyze transcriptome data sets from vertebrates, one or another solute carrier gene often comes out at the top. If you have the same experience and spend a lot of time searching through various databases to find out what they do, the following two reviews will come in handy. The first author of both papers is Matthias A. Hediger, who is currently at the Institute of Biochemistry and Molecular Medicine, University of Bern, Switzerland.

One part of the paper you may find very handy is the large table (Table 1) describing solute carrier nomenclature. Solute carrier proteins transport materials across cellular membranes, and each SLC family carries one type of material. For example, SLC48 is the heme transporter and the SLC14 family transports urea.

Enjoy!

The ABCs of membrane transporters in health and disease (SLC series): Introduction

The field of transport biology has steadily grown over the past decade and is now recognized as playing an important role in the manifestation and treatment of disease. The SLC (solute carrier) gene series has grown to now include 52 families and 395 transporter genes in the human genome. A list of these genes can be found at the HUGO Gene Nomenclature Committee (HGNC) website. This special issue features mini-reviews for each of these SLC families, written by the experts in each field. The existing online resource for solute carriers, the Bioparadigms SLC Tables, has been updated and significantly extended with additional information and cross-links to other relevant databases, and the nomenclature used in this database has been validated and approved by the HGNC. In addition, the Bioparadigms SLC Tables functionality has been improved to allow easier access by the scientific community. This introduction includes: an overview of all known SLC and “non-SLC” transporter genes; a list of transporters of water-soluble vitamins; a summary of recent progress in the structure determination of transporters (including GLUT1/SLC2A1); roles of transporters in human diseases; and roles in drug approval and pharmaceutical perspectives.

The ABCs of solute carriers: physiological, pathological and therapeutic implications of human membrane transport proteins

The Human Genome Organisation (HUGO) Nomenclature Committee Database provides a list of transporter families of the solute carrier (SLC) gene series. Currently, it includes 43 families and 298 transporter genes. This special issue features mini-reviews on each of these SLC families, written by the experts in each field. A web site has been established that gives the latest updates for the SLC families and their members, as well as relevant links to gene databases and reviews in the literature. A list of all currently known SLC families, a discussion of additional SLC families and family members, as well as a brief summary of non-SLC transporter genes are included in this introduction.
