Assembling fish genomes is a complex task because the sequences contain large amounts of repeats and polymorphism. Lex Nederbragt from the Norwegian Sequencing Center has been working on two large fish genomes – those of Atlantic cod and Atlantic salmon. Readers may enjoy the slides he shared with us from one of his recent talks.
Lex’s slides are helpful for three reasons –
(i) They do not make any assumptions about what the reader knows about sequencing, and start from a very basic level. So even readers who are only beginning to dabble in sequencing projects can gain from his insights.
(ii) Lex is very knowledgeable about many different sequencing platforms, and recommends choosing the technology based on the problem at hand. Many bioinformaticians work in a mode where someone else (a non-bioinformatician) decides to sequence a number of libraries from a genome or transcriptome, then hands over the files and asks the bioinformatician to work some magic. It would be far more productive and efficient if the planning of a sequencing project took input from the bioinformaticians on which choices could improve their analysis. As an example, Lex showed how error-corrected PacBio reads could be used to improve the long-range quality of the assembly.
(iii) Lex was an ‘early adopter’ of PacBio reads in his assembly projects, and has successfully shown how to improve the quality of a de novo assembly by using those ‘noisy beasts’. When we ask others about PacBio, the opinions we hear are – (a) “we do not plan to use them, because they are too noisy”, (b) “we tried incorporating them into a project, but it did not go anywhere. In the meantime, we got too much unprocessed Illumina data to take care of.” Even the published success stories are mostly about bacterial genomes, or about finishing previously assembled large genomes that already had plenty of Sanger data (see Baylor paper here). Those starting large projects on previously unexplored genomes can learn a lot from Lex.