Commercial Value of Efficient Metagenomics Pipeline

On his blog, Titus Brown asked for ideas to make his open-source algorithm discovery project more exciting. Here is one.

Assembling metagenomes from ocean water and soil samples may appear to have little commercial value, but a Boston-based startup disagrees. Strictly speaking, we should not call it a startup, because Sanofi has already committed to buying the company provided it meets its goals.

Warp Drive is being launched with $125 million in funding from Third Rock and French pharmaceutical giant Sanofi (NYSE: SNY). Greylock Partners also participated in the financing. Warp Drive was co-founded by Greg Verdine, a Harvard University chemical biologist and venture partner at Third Rock, along with Harvard University genomics expert George Church, and biochemist James Wells of the University of California at San Francisco.

….

Warp Drive refers to its core platform as a "genomic search engine." The company's ultimate goal is to develop the technology to the point where it will be able to comb through naturally derived substances, such as plants and soil, and sequence the genomes of the microbes hidden in them. Ideally, says Borisy, the platform will be able to use that information to help scientists uncover new molecules that have the highest probability of hitting disease targets.

Nature has been one of the drug industry's richest sources of pharmaceutical success stories. The menu of products that originated in the wild includes diabetes drug exenatide (Byetta), derived from Gila monster saliva; heart failure treatment digoxin, which comes from the foxglove plant; and ziconotide (Prialt), a pain treatment from the cone snail. "Nature is an incredible medicinal chemist," Borisy says. "Nature can drug targets in ways we've never been able to figure out how to do."

Metagenome assembly is a big part of their pipeline, according to the job description they posted. Do not laugh at the ‘Minimum Requirements’. The list was likely written by a manager with little experience in bioinformatics, who threw in as many buzzwords as he could. Maybe Keith Robison of the Omics! Omics! blog, who is an employee, and the author of Kevin’s GATTACA blog, who makes fun of such ridiculous job requirements, should chat.

Minimum Requirements:

Master’s Degree and 10+ years’ work experience or a Ph.D. and 6+ years’ relevant work experience in the biological sciences, computer science or computational science

Experience with de novo assembly of microbial or metagenomic genomes from short read or single molecule data using tools such as Ray, MIRA, Velvet, Celera, ALLPATHS

Experience using assembly refinement and scaffolding tools such as AMOS, SSPACE

Experience developing and maintaining tools in Perl utilizing BioPerl and other open source frameworks. Python/BioPython will also be considered. Experience in programming in R a plus.

Experience with short read data manipulation, analysis & trimming tools, e.g. samtools, BWA, FASTX, FLASH, Quake, KHMER, SMALT

Experience developing SQL databases and executing complex queries (joins across many tables, recursive joins) on such. Experience with NoSQL databases a plus.

Experience working with Linux clusters, particularly the configuring and execution of jobs under Sun/Oracle Grid Engine.

Experience developing interactive web tools using HTML5.

General sequence analysis background with extensive use of BLAST, HMMER, CLUSTAL, PHYLIP and similar tools.

Excellent communication skills, including the production of clear scientific updates using PowerPoint and good scientific visualization skills

Able to work collaboratively in multi-disciplinary environment to accomplish program and company goals.

How do you go from metagenome assembly to drug discovery? You search the assembled sequences for gene clusters that carry certain signatures, such as the genes involved in making secondary metabolites. We do not have time to explain the details here, but you may start with the following two papers (among many):

Automated genome mining for natural products

The Natural Product Domain Seeker NaPDoS: A Phylogeny Based Bioinformatic Tool to Classify Secondary Metabolite Gene Diversity
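To make the idea of "gene clusters having certain signatures" a little more concrete, here is a minimal Python sketch. It is not the method of Warp Drive or of either paper above. It assumes that genes on the assembled contigs have already been annotated with domain labels by some upstream step (for example an HMM search), and it simply groups nearby genes that carry a signature domain. The `Gene` record, the `find_candidate_clusters` function, the labels `KS` (ketosynthase) and `C` (condensation), and the `max_gap` and `min_hits` thresholds are all hypothetical choices made for illustration.

```python
from dataclasses import dataclass

# Assumed signature labels: KS (ketosynthase) and C (condensation) domains.
# Real tools use far richer rule sets; these two are illustrative only.
SIGNATURE_DOMAINS = frozenset({"KS", "C"})


@dataclass(frozen=True)
class Gene:
    """Hypothetical record for one annotated gene on an assembled contig."""
    contig: str
    start: int
    end: int
    domains: frozenset  # domain labels assigned by an upstream annotation step


def find_candidate_clusters(genes, max_gap=10_000, min_hits=2):
    """Group signature-bearing genes that sit close together on one contig.

    Two consecutive hits belong to the same candidate cluster when they are
    on the same contig and at most max_gap bases apart. Groups with fewer
    than min_hits genes are discarded.
    """
    hits = sorted(
        (g for g in genes if g.domains & SIGNATURE_DOMAINS),
        key=lambda g: (g.contig, g.start),
    )
    clusters, current = [], []
    for gene in hits:
        starts_new_group = current and (
            gene.contig != current[-1].contig
            or gene.start - current[-1].end > max_gap
        )
        if starts_new_group:
            if len(current) >= min_hits:
                clusters.append(current)
            current = []
        current.append(gene)
    if len(current) >= min_hits:
        clusters.append(current)
    return clusters


# Toy input: made-up coordinates, not real data.
genes = [
    Gene("contig1", 100, 2000, frozenset({"KS"})),
    Gene("contig1", 2500, 5000, frozenset({"C"})),
    Gene("contig1", 5200, 6000, frozenset()),            # no signature domain
    Gene("contig1", 50000, 52000, frozenset({"KS"})),    # isolated hit
    Gene("contig2", 100, 900, frozenset({"KS"})),
    Gene("contig2", 1500, 3000, frozenset({"KS", "AT"})),
]

for cluster in find_candidate_clusters(genes):
    print(cluster[0].contig, cluster[0].start, cluster[-1].end, len(cluster))
```

On the toy input this reports one candidate region on each contig and drops the isolated hit at position 50000. The sketch also shows why assembly quality matters commercially: genes can only be grouped when they land on the same contig, so a fragmented metagenome assembly splits clusters apart before any search can find them.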
