Commercial Value of Efficient Metagenomics Pipeline



In his blog, Titus Brown asked for ideas to make his open-source algorithm discovery project more exciting. Here is one.

Assembling metagenomes from ocean water and soil samples may appear to have little commercial value, but a Boston-based startup is betting otherwise. Strictly speaking, we should not call it a startup, because Sanofi has already committed to buying the company, provided it meets its goals.

Warp Drive is being launched with $125 million in funding from Third Rock and French pharmaceutical giant Sanofi (NYSE: SNY). Greylock Partners also participated in the financing. Warp Drive was co-founded by Greg Verdine, a Harvard University chemical biologist and venture partner at Third Rock, along with Harvard University genomics expert George Church, and biochemist James Wells of the University of California at San Francisco.


Warp Drive refers to its core platform as a “genomic search engine.” The company’s ultimate goal is to develop the technology to the point where it will be able to comb through naturally derived substances—such as plants and soil—and sequence the genomes of the microbes hidden in them. Ideally, says Borisy, the platform will be able to use that information to help scientists uncover new molecules that have the highest probability of hitting disease targets.

Nature has been one of the drug industry’s richest sources of pharmaceutical success stories. The menu of products that originated in the wild includes diabetes drug exenatide (Byetta), derived from Gila monster saliva; heart failure treatment digoxin, which comes from the foxglove plant; and ziconotide (Prialt), a pain treatment from the cone snail. “Nature is an incredible medicinal chemist,” Borisy says. “Nature can drug targets in ways we’ve never been able to figure out how to do.”

Metagenome assembly is a big part of their pipeline, according to the job description they posted. Do not laugh at the ‘minimum requirements’. The list was likely written by a manager with little experience in bioinformatics, who threw in as many buzzwords as he could. Maybe Keith Robison of the Omics! Omics! blog, who is an employee there, and the author of Kevin’s GATTACA blog, who makes fun of such ridiculous job requirements, should have a chat.

Minimum Requirements:

Master’s Degree and 10+ years’ work experience or a Ph.D. and 6+ years relevant work
experience in the biological sciences, computer science or computational science

Experience with de novo assembly of microbial or metagenomic genomes from short
read or single molecule data using tools such as Ray, MIRA, Velvet, Celera, ALLPATHS
Experience using assembly refinement and scaffolding tools such as AMOS, SSPACE

Experience developing and maintaining tools in Perl utilizing BioPerl and other open
source frameworks. Python/BioPython will also be considered. Experience in
programming in R a plus.
Experience with short read data manipulation, analysis & trimming tools, e.g. samtools,

Experience developing SQL databases and executing complex queries (joins across many
tables, recursive joins) on such. Experience with NoSQL databases a plus.

Experience working with Linux clusters, particularly the configuring and execution of
jobs under Sun/Oracle Grid Engine.

Experience developing interactive web tools using HTML5.

General sequence analysis background with extensive use of BLAST, HMMER,
CLUSTAL, PHYLIP and similar tools.

Excellent communication skills, including the production of clear scientific updates using
PowerPoint and good scientific visualization skills

Able to work collaboratively in multi-disciplinary environment to accomplish program
and company goals.

How do you go from metagenome assembly to drug discovery? By looking for gene clusters that carry certain signatures. We do not have time to explain, but you may start with the following two papers (among many):

Automated genome mining for natural products

The Natural Product Domain Seeker NaPDoS: A Phylogeny Based Bioinformatic Tool to Classify Secondary Metabolite Gene Diversity
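To give a rough flavor of what "looking for gene clusters having certain signatures" can mean in practice, here is a minimal Python sketch. It is not Warp Drive's method, nor the algorithm of either paper above. It assumes the assembled contig has already been annotated, so that each gene carries a set of protein domain labels (for instance from an HMMER search). The `Gene` class, the `find_candidate_clusters` function, the `max_gap` and `min_signature_genes` parameters, and the toy coordinates are all hypothetical names and values chosen for illustration. The two signature labels stand for ketosynthase (KS) and condensation (C) domains, the domain types that NaPDoS classifies.

```python
from dataclasses import dataclass

# Illustrative signature domains (assumption): KS = ketosynthase (polyketide
# synthases), C = condensation (nonribosomal peptide synthetases).
SIGNATURE_DOMAINS = frozenset({"KS", "C"})


@dataclass(frozen=True)
class Gene:
    """A predicted gene on one contig, with its annotated domain labels."""
    name: str
    start: int
    end: int
    domains: frozenset


def find_candidate_clusters(genes, max_gap=10_000, min_signature_genes=2):
    """Group signature-bearing genes that lie within max_gap bp of each other.

    Returns a list of clusters, each a list of Gene objects ordered by start.
    Groups with fewer than min_signature_genes members are dropped.
    """
    hits = sorted(
        (g for g in genes if g.domains & SIGNATURE_DOMAINS),
        key=lambda g: g.start,
    )
    clusters, current = [], []
    for gene in hits:
        # A large gap since the previous hit closes the current group.
        if current and gene.start - current[-1].end > max_gap:
            if len(current) >= min_signature_genes:
                clusters.append(current)
            current = []
        current.append(gene)
    if len(current) >= min_signature_genes:
        clusters.append(current)
    return clusters


# Toy contig (made-up coordinates): g1 and g3 sit close together,
# g4 is an isolated hit far away, g2 has no signature domain.
genes = [
    Gene("g1", 100, 1500, frozenset({"KS"})),
    Gene("g2", 1600, 4000, frozenset({"AT"})),
    Gene("g3", 4200, 7000, frozenset({"C"})),
    Gene("g4", 60000, 62000, frozenset({"KS"})),
]
clusters = find_candidate_clusters(genes)
cluster_names = [[g.name for g in cluster] for cluster in clusters]
print(cluster_names)
```

On the toy contig, only g1 and g3 end up grouped together; the lone g4 hit is discarded. Real genome mining tools go much further (profile HMMs, phylogenetic placement of domains, cluster boundary rules), which is why the papers above are the better starting point.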


4 comments to Commercial Value of Efficient Metagenomics Pipeline

  • Titus Brown

    If someone wants to give me a $200k gift or a grant, I will solve this assembly problem for them ;).

  • admin

    Seems like they are spending $125 million to solve what you are doing. You need to raise your price-tag to get attention :)

  • This topic is interesting. It’s not an assembly problem at all (well, now in 2015 at least…). Nowadays it is relatively easy to assemble pretty much anything; you only need access to a compute cluster with between 2 and 10 TB of RAM (which is not uncommon now). The idea is to have an efficient and reliable pipeline that will routinely crunch whatever you feed it, be it soil, water or human samples, and rapidly give you clear answers: biosynthetic cluster x was overexpressed in samples a, b and c vs samples d, e and f… and here is the complete sequence of that cluster, along with its functional domains and other annotations. I’m not into intellectual property too much, but it seems to me the main problem is that pretty much all bioinformatics pipelines out there are actually just wrappers that execute open source packages in an appropriate order. So they are highly useful but don’t really have any commercial value per se because they are unpatentable (treat them as an industrial secret instead?). They have high value in the sense that you actually need them to analyze your sequencing data. Sequencing without a pipeline is not worth anything. One of the few examples of a relatively successful commercial model for a bioinformatics package (not pipeline…) is USEARCH, for which you have to pay for a license for the 64-bit version.

  • samanta

    Thanks Julien. The MUSCLE program of Edgar was innovative and is widely used. So, clearly he was far enough ahead of the field in algorithm development to make USEARCH commercially successful. Another bioinformatics effort that became commercially successful after starting out in academia is TRANSFAC/BIOBASE (
