It is time to find the best bioinformatics contributions of 2013, just as we did in 2012 (Top Bioinformatics Contributions of 2012). The original idea came to us after noticing that the yearly reviews in Science and Nature celebrated the large experimental projects, whereas bioinformatics tools like BLAST, BWA or SOAPdenovo rarely got mentioned despite their immense contribution to biology. More importantly, papers describing elegant computational algorithms got recognized years after their publication (Pevzner's de Bruijn graph, Myers' string graph) or never got recognized at all (Ross Lippert's 2005 papers on using the Burrows-Wheeler transform in genomics). So, we wanted to give recognition to the major computational discoveries in biology and try to bring attention to under-appreciated contributions with potential long-term benefit.
For this year’s effort, we assembled an outstanding panel of judges.
(source of figure: AFP/File, Cesar Manso)
Another day, another picture of people digging inside a cave, another 'incredible breakthrough', another sex story!! After Denisovans and the extremely ancient Africans (discussed in Denisovans, Extremely Ancient Africans – the Role Cheap Sequencing Plays in Rewriting Human History) comes today's 'baffling finding' that pushes the ancient-DNA record back by a factor of four.
Using a thigh bone from the cave, Matthias Meyer from the Max Planck Institute for Evolutionary Anthropology has sequenced the almost complete mitochondrial genome of one of Sima de los Huesos’ inhabitants, who likely lived around 400,000 years ago. That is at least four times older than the previous record-holder—a small 100,000-year-old stretch of Neanderthal DNA.
Discovery Channel pushes the sex angle. Sex sells, even in next-gen sequencing.
Ancient Humans Had Sex with Mystery Relatives
They even go to the extent of concocting an amorous image.
Sci-news reports that the ‘genome is sequenced’, even though it is not.
Sima de los Huesos: Scientists Sequence Genome of Enigmatic Hominin
Truth: the nuclear genome is very unlikely to be sequenced, according to the authors. Even the mitochondrial sequence was full of contamination.
Nature sells you the paper for $32 (or $199, if you are wealthy).
A Mitochondrial Genome Sequence of a Hominin from Sima de los Huesos
Excavations of a complex of caves in the Sierra de Atapuerca in northern Spain have unearthed hominin fossils that range in age from the early Pleistocene to the Holocene [1]. One of these sites, the 'Sima de los Huesos' ('pit of bones'), has yielded the world's largest assemblage of Middle Pleistocene hominin fossils [2,3], consisting of at least 28 individuals [4] dated to over 300,000 years ago [5]. The skeletal remains share a number of morphological features with fossils classified as Homo heidelbergensis and also display distinct Neanderthal-derived traits [6,7,8]. Here we determine an almost complete mitochondrial genome sequence of a hominin from Sima de los Huesos and show that it is closely related to the lineage leading to mitochondrial genomes of Denisovans [9,10], an eastern Eurasian sister group to Neanderthals. Our results pave the way for DNA research on hominins from the Middle Pleistocene.
Only Dan Graur (@dangraur) gives you what matters -
i) the full paper;
ii) a healthy dose of skepticism.
Mike White (@genologos) is a very creative person. We already covered his paper refuting an ENCODE claim by showing that random DNA sequences mimic the same binding behavior, or rather that the ENCODE experiments did not have a proper control. Apart from science, he expresses his creativity through the beautiful 'The Finch and Pea' blog, and he also writes informative columns at the Pacific Standard site.
If we had a billion dollars to spare (say, from building and selling houses on bomb-testing sites), we would definitely make him the head of a research center. Unfortunately, we do not. The only other option is to give him a shoutout, so that many people can join together to pay for his research. That is where we face a problem under the existing model of centralized funding.
Both Rayan Chikhi (an author of the Minia assembler) and Anton Korobeynikov (an author of the SPAdes assembler) commented on our earlier post about a recent PLOS One paper that benchmarked a number of low-memory genome assembly programs. The comments are quite informative for those not familiar with the intricate details of such programs, so we decided to present them in a new commentary.
The results are definitely misleading. People should stop comparing to Velvet as the single "gold reference" assembler. GAGE-B clearly showed that state-of-the-art assemblers can easily beat Velvet by 20x in terms of N50. E.g., for the R. sphaeroides dataset from GAGE-B, Velvet achieved an N50 of 24 kbp (3 kbp in the paper, with plenty of misassemblies), MaSuRCA achieved 130 kbp, and both Ray and SPAdes were able to produce contigs with an N50 of more than 500 kbp (Ray was not included in GAGE-B; this is our internal run with parameter tuning as in GAGE-B).
So… really, the results of the paper need to be redone using both recent data and recent assemblers.
There is the usual time/memory/quality tradeoff. However, to me, the paper looks like an indirect way of promoting two of the authors' own approaches (DiMA and ZeMA), rather than a proper comparison of the assemblers.
For example, the authors claim that “… Our experiments prove that it is possible to generate draft assemblies of reasonable quality on conventional multi-purpose computers with very limited available memory by choosing suitable assembly methods”
This is not true. The quality of the results (look at the N50 and misassemblies in Tables 3-5) is below the level of acceptance these days. They effectively reported results 10 times worse than in the GAGE study and ignored all the GAGE results in order to reach their conclusions. Note also the differences in methodology – GAGE tried to show each assembler at its best by selecting the best parameters. The authors simply fixed the k-mer length at 31, and that was it.
Combining the authors' tables with the GAGE tables, we could instead easily say "... Our experiments prove that it is NOT possible to generate draft assemblies of reasonable quality on conventional multi-purpose computers with very limited available memory by choosing suitable assembly methods. In order to achieve reasonable quality, proper assembly methods requiring more memory are necessary".
So, the conclusions of the paper would be the opposite! However, judging from how they handled the GAGE results and methodology, I won't trust the authors anymore; instead, I demand that all the work be thoroughly redone and the true data be put in the tables.
Perhaps we can ask Rayan to run Minia on the GAGE/GAGE-B data and report the results? This way, we will at least be sure that the Minia results are of GAGE-level quality.
Rayan replied -
All right Anton, good idea! Here’s my informal Minia assembly of the GAGE chromosome 14 dataset.
Minia version: 1.5938
Reads taken: all the raw reads, i.e., fragment, short-jump and long-jump libraries.
Parameters k=53 and min_abundance=5 were given by Kmergenie (command line: ./kmergenie list_reads).
Command line: minia list_reads 53 5 100000000 chr14_k53_m5
Peak mem: 147 MB
Time: 73 mins
Excerpt from QUAST output:
# contigs (>= 0 bp) 82433
Total length (>= 0 bp) 87199867
Largest contig 27533
# misassemblies 18
# local misassemblies 17
All those metrics look much better than in Table 5.
Here is what differs from their assembly: 1) the parameters are optimized; 2) this new Minia version yields better genome coverage than older ones; 3) I used all the GAGE reads, while they used only the fragment library.
A few more notes on the paper in general:
1) Many of the low-memory assemblers tested in the PLOS One paper include neither a scaffolder nor a gap-filler. Those tested in GAGE and GAGE-B at least have a scaffolder. Thus it is not surprising that the GAGE/GAGE-B contiguity stats look much better.
2) GAGE used the best possible error-corrected read dataset for each organism. The PLOS One paper apparently used the raw data, which in my experience gives worse assemblies.
3) GAGE also picked reasonable k-mer sizes for each dataset separately.
In summary, the benchmarks in this paper are fair, in the sense that all the programs were run under similar conditions. But it's too bad that those conditions led to assemblies that were either very poor (e.g., for chr14) or that looked poor in comparison to GAGE because of the lack of tuning.
By the way, Anton, not sure if the following statement is an accurate comparison:
“E.g. for R. sphaeroides dataset from GAGE-B Velvet achieved N50 of 24kb (3 kbp in the paper with plenty of misassemblies), MaSuRCA achieved 130 kbp and both Ray and SPAdes were able to produce contigs of N50 with more than 500 kbp”
That 24 kbp N50 figure for Velvet/GAGE-B is for the MiSeq data; it drops to 13 kbp with the HiSeq data. But is the R. sphaeroides HiSeq dataset from GAGE-B the same as the one from GAGE? In my experience, the raw GAGE dataset is quite challenging to assemble. Also, for the MiSeq dataset, the SPAdes N50 is 118 kbp in the GAGE-B paper. Can a newer version do 500 kbp?
When reading such exchanges, readers should not assume that one person is right and therefore the other has to be wrong. The genome assembly problem is multi-faceted, and both can be right while covering different aspects. Given that the exchange hinges on a GAGE vs. PLOS One comparison, the same can be said about benchmarking papers in general. There are so many variables in comparing two assemblers that asking which assembler is the best is too simplistic, unless one qualifies the question with machine type, RAM size, library type and a number of other parameters. On the other hand, it is often impossible to test all those parameters and still come up with a useful metric. The 'machine type' variable is especially hard to quantify, because it includes disk speed and RAM size, which can change the speed of execution quite a bit. Throw in different read sizes and different types of libraries (PacBio?) and you are in a complete mess.
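Since N50 carries so much weight in the exchange above, here is a minimal sketch of how that metric is computed (the function and the example lengths below are ours, purely for illustration):

def n50(contig_lengths):
    # N50: the largest contig length L such that contigs of length >= L
    # together cover at least half of the total assembly.
    total = sum(contig_lengths)
    running = 0
    for length in sorted(contig_lengths, reverse=True):
        running += length
        if running * 2 >= total:
            return length

print(n50([100, 80, 70, 50, 30]))  # prints 80, since 100 + 80 >= 330/2

Note that N50 says nothing about correctness, which is why misassembly counts must always be read alongside it.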
The exchange should rather be taken as a list of the kinds of things a bioinformatician should keep in mind when working on such problems. Often conventional wisdom fails, as we describe below.
a) Hardware Limitations:
A few days back, we were trying to benchmark DSK, a k-mer counting program written by Rayan. We ran it with default options and the program finished in 3 hours. The machine happened to have 32 cores and fairly big RAM, so we thought of re-running it with the multi-threading option and a lot of RAM to finish in a fraction of that time. Lo and behold, the program continued to run for 8 hours with no result. What was going on?
With more resources, the processor cores, RAM and disk started to compete with each other to move data around, and spent less time on actual computation (which is minimal for k-mer counting). So, essentially we had Craig Barrett's breakfast problem described earlier. Too many cooks were opening and closing the fridge, and very few actually got time to fry omelettes.
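To see why the computation per byte is so small, consider a bare-bones k-mer counter (a sketch of the general technique in Python, not DSK's actual implementation):

from collections import Counter

def count_kmers(seq, k):
    # Each k-mer costs only a substring slice and one hash-table update,
    # so run time is dominated by moving data, not by arithmetic.
    counts = Counter()
    for i in range(len(seq) - k + 1):
        counts[seq[i:i + k]] += 1
    return counts

print(count_kmers("ACGTACGTAC", 4))

With so little CPU work per k-mer, adding threads mostly adds contention for memory and disk bandwidth.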
b) RAM:
We list RAM separately, because in many servers it is the biggest limiting factor. Amazon charges you an arm and a leg if you want to rent a computer with high RAM. What can you do? You can design your data structures more efficiently, but beyond that the only options are (i) crashing, or (ii) shifting data between RAM and the hard drive during assembly. The second option can be sped up quite a bit by replacing the hard disk with an SSD, and SSD storage does not cost that much these days. After all, you can build a server with a smaller SSD for computation and another, slower disk for permanent storage. That is a small increase in cost with a huge added benefit.
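Option (ii) is roughly what disk-based k-mer tools do: spread the data across partition files on disk, then process one partition at a time in RAM. Here is a toy sketch of the partitioning idea (our own simplification, not any particular tool's implementation):

import os
from collections import Counter

def count_kmers_external(reads, k, n_parts=16, tmpdir="parts"):
    # Pass 1: route each k-mer to one of n_parts files on disk, so only
    # about 1/n_parts of the k-mers need to fit in RAM at any one time.
    os.makedirs(tmpdir, exist_ok=True)
    parts = [open(os.path.join(tmpdir, "p%d" % i), "w") for i in range(n_parts)]
    for read in reads:
        for i in range(len(read) - k + 1):
            kmer = read[i:i + k]
            parts[hash(kmer) % n_parts].write(kmer + "\n")
    for f in parts:
        f.close()
    # Pass 2: count each partition independently, in memory.
    for i in range(n_parts):
        with open(os.path.join(tmpdir, "p%d" % i)) as f:
            yield Counter(line.strip() for line in f)

A fast SSD makes exactly this kind of two-pass workload much cheaper.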
c) Multiple Cores:
Many bioinformatics programs are written to use multiple cores, but their scaling is not always linear, as we explained earlier. How do you figure out what is going on? You may have to think about what the program is doing and break it down into smaller pieces to identify the bottleneck.
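A standard way to reason about such sub-linear scaling is Amdahl's law (a textbook result we add here for illustration): if only a fraction p of the work can run in parallel, n cores give a speedup of at most 1/((1 - p) + p/n).

def amdahl_speedup(p, n_cores):
    # Amdahl's law: the serial fraction (1 - p) caps the speedup,
    # no matter how many cores you throw at the problem.
    return 1.0 / ((1.0 - p) + p / n_cores)

print(amdahl_speedup(0.9, 32))  # ~7.8x on 32 cores when 90% parallelizes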
d) Cache Size – Problem with Small Pieces:
A few days back, we were also trying to benchmark BWA-MEM. To understand the performance of its various steps, we decided to replace the human genome with a tiny genome. At that point, everything started to behave abnormally, and we got results much faster than we expected. What happened?
In the usual model of computing, a processor gets its data from RAM. However, when the genome is tiny, the processor can keep the entire index in its cache (memory within the chip) and does not need to access RAM at all. Very few bioinformatics programs exploit this aspect of processors, even though considerable speed gains can be achieved by using the cache effectively.
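A crude way to see memory locality at work is to compare access patterns on a large array (the array size is arbitrary, and the exact ratio will vary by machine):

import time
import numpy as np

a = np.arange(20_000_000, dtype=np.int64)
idx = np.random.permutation(a.size)

t0 = time.perf_counter(); a.sum(); t1 = time.perf_counter()
a[idx].sum(); t2 = time.perf_counter()  # random access defeats the cache

print("sequential: %.2fs, random: %.2fs" % (t1 - t0, t2 - t1))

The sequential scan streams through the cache line by line, while the random gather forces a round trip to RAM for almost every element.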
Those are just four aspects producing machine-to-machine differences in the performance of bioinformatics programs, and we have not even gone into algorithmic differences, library differences or contig assembly vs. scaffold assembly.
Conventional wisdom says a larger sample size will make experiments more accurate, but that does not hold in many situations. More samples can result in more errors, as the following example explains.
Ask 1 million monkeys (~2^20) to predict the direction of the stock market for the year. At the end of the year, about 500K monkeys will be right and about 500K will be wrong. Remove the second group and redo the experiment for the second year. At the end of that year, you will be left with 250K monkeys who correctly called the market for two years in a row. Keep doing the same experiment for 20 years, and you will be left with one monkey who predicted the stock market correctly for 20 years in a row. Wow !
The above paragraph is taken from a YouTube talk by best-selling author Rolf Dobelli, whose best-selling book appears to be a plagiarized version of N. N. Taleb's writing. Taleb goes deeper into the implications of the monkey experiment. Suppose you increase the sample size by asking 30x more monkeys about the stock market. At the end of twenty years, you will have 30 'smart' monkeys instead of one. Now you have a large enough set to even search for the intelligence gene that helps monkeys call the stock market correctly for 20 years in a row. A bigger wow !!!
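The arithmetic is easy to check with a short simulation (a sketch using the same coin-flip model as the story):

import random

def surviving_monkeys(n_monkeys, n_years=20):
    # Each year, every surviving monkey guesses the market's direction;
    # a wrong guess (probability 1/2) eliminates it.
    survivors = n_monkeys
    for _ in range(n_years):
        survivors = sum(random.random() < 0.5 for _ in range(survivors))
    return survivors

print(surviving_monkeys(2**20))       # about 1 'genius' monkey
print(surviving_monkeys(30 * 2**20))  # about 30 of them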
Summarizing in Taleb's words -
The winner-take-all effects in information space correspond to more noise, less signal. In other words, the spurious dominates.
Information is convex to noise. The paradox is that an increase in sample size magnifies the role of noise (or luck); it makes tail values even more extreme. There are some problems associated with big data and the increase of variables available for epidemiological and other "empirical" research.
You can read the rest of his chapter with all mathematical details here.
In another commentary along similar lines, Lior Pachter wrote -
23andme Genotypes are all Wrong
The commentary is quite informative, but we will pick out one part that describes the 'winner-take-all' impact (or rather 'loser-take-all', in the case of sickness) on users.
But the way people use 23andme is not to look at a single SNP of interest, but rather to scan the results from all SNPs to find out whether there is some genetic variant with large (negative) effect.
Whereas a comprehensive exam at a doctor’s office might currently constitute a handful of tests– a dozen or a few dozen at most– a 23andme test assessing thousands of SNPs and hundreds of diseases/traits constitutes more diagnostic tests on an individual at one time than have previously been performed in a lifetime.
In plain English, suppose you walk into a doctor's office and ask for your brain, heart, lungs, kidneys, teeth, tongue, eyes, nose and one hundred other body parts to be tested. The doctor comes back and reports that 107 out of 108 tests were within limits, but your kidney test showed some problems. The 'winner-take-all' effect will make you remember only the result that reported a problem, even though the more tests the doctor conducts, the more likely he is to find a 'problem' by random chance. Next, you will go through more invasive tests of your kidney, and maybe a hospital stay, making your body vulnerable in other ways. Paraphrasing Taleb, the only way to legally murder a person is to assign him a personal doctor, who will keep monitoring ('testing') his health 24×7.
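The back-of-the-envelope arithmetic is worth spelling out; the 5% false-positive rate below is our illustrative assumption, not a number from Pachter's post:

# Chance that at least one of 108 independent tests flags a 'problem'
# purely by chance, assuming each test has a 5% false-positive rate.
alpha, n_tests = 0.05, 108
p_any_false_alarm = 1 - (1 - alpha) ** n_tests
print("%.3f" % p_any_false_alarm)  # ~0.996: a spurious 'finding' is nearly certain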
Presenting this well-known problem of multiple testing did not win Lior Pachter many friends. He was immediately called a ‘troll’ and other names by those with vested interests.
Calling people trolls when they present a different scientific argument has become the new fashion. We went through similar experiences when we wrote a set of commentaries questioning the effectiveness of genome-wide association studies.
Battle over #GWAS: Ken Weiss Edition
Study History and Read Papers Written by ‘Dinosaurs’ (#GWAS)
Genome Wide Association Studies (#GWAS) – Are They Replicable?
Mick Watson immediately called us trolls, and both he and Daniel MacArthur blocked our Twitter accounts from following them. Readers should note that this is an extra step of censoring, as explained below.
For those unfamiliar with Twitter, it is designed in such a way that you do not read things you are not interested in reading. For example, we avoid reading what Kim Kardashian is doing every day simply by choosing not to follow her account. So, why do these two gentlemen ('open science advocates') take the extra step of blocking us from following them? It is done to make sure that our comments do not reach their audience; in other words, it is a form of Twitter censoring. We wonder what they have to fear.
On the plus side, the above exchange got us familiar with the blog of Ken Weiss and co-authors (@ecodevoevo on Twitter), which is very thoughtfully written and has become a daily read for us. Readers may enjoy today's commentary of theirs on big data in medicine.
The ‘Oz’ of medicine: look behind the curtain or caveat emptor!
They highlight six problems with the ‘big data’ approach. The following list is only an abbreviated version of their very detailed commentary.
Problem 1: Risks are estimated retrospectively–from the past experience of sampled individuals, whether in a properly focused study or in a Big Data extravaganza. But risks are only useful prospectively: that is, about what will happen to you in your future, not about what already happened to somebody else (which, of course, we already know).
Problem 2: We are usually not actually applying any serious form of ‘theory’ to the model or to the results. We are just searching for non-random associations (correlations) that may be just chance, may be due to the measured factor, or may be due to some other confounding but unmeasured factors.
Problem 3: Statistical analysis is based on probability concepts, which in turn are (a) based on ideas of repeatability, like coin flipping, and (b) that the probabilities can be accurately estimated. But people, not to mention their environments, are not replicable entities (not even ‘identical’ twins).
Problem 4: Competing causes inevitably gum up the works. Your risk of a heart attack depends on your risk of completely unrelated causes, like car crashes, drug overdoses, gun violence, cancer or diabetes, etc.
Problem 5: Theory in physics is in many ways the historic precedent on which we base our thinking... But life is not replicable in that way.
Problem 6: Big Data is proposed as the appropriate approach, not a focused hypothesis test. Big Data are uncritical data–by policy! This raises all sorts of issues such as nature of sample and accuracy of measurements (of genotypes and of phenotypes).
Oh, this is hilarious !!
If you do not have time to watch the entire video, chemjobber blog covers the juiciest part.
It is not every day that science Nobelists get to take shots at the economics prizewinners, but it happened this week at the Swedish embassy compound, at the gathering of this year's Nobelists:
Then Martin Karplus, a Harvard University chemist, interjected, "What understanding of the stock market do you really have?"
Economics – “if one wants to call it a science” – seemed unable to explain the oscillations of the market, he said.
“I see these fluctuations and they make zero sense to me,” Professor Karplus declared. “Maybe they make sense to you.”
Professor Fama dismissed the question as unsophisticated, declaring its premise “factually incorrect.”
The hard scientists, more amused than chastened, turned to mocking the economists.
“You’re asking about a very fundamental question, on what the nature of life is,” James Rothman, a professor of cell biology at Yale University and one of the three newly minted laureates in medicine, told one questioner. “I don’t think there’s anyone here — even the economists – who would have an opinion on that for sure.”
We seem to have missed a nice PLOS One paper (h/t: @rayanchikhi). The authors compared many of the new approaches (e.g., Minia, SGA) that are not yet fully incorporated into mainstream programs but can significantly improve performance. We wish Heng Li's Fermi had been included.
Two take home messages from the abstract:
1. Our experiments prove that it is possible to generate draft assemblies of reasonable quality on conventional multi-purpose computers with very limited available memory by choosing suitable assembly methods.
2. By combining existing methodologies, we propose two general assembly strategies that can improve short-read assembly approaches and result in reduction of the memory footprint.
Evaluation details from the paper -
In this study, we quantify the memory requirements of modern assemblers for a variety of datasets. We compare the prevalent memory-efficient techniques against a typical traditional approach (i.e., Velvet). We compare the following programs: SparseAssembler, Gossamer, Minia and SGA. All of them are open-source and representative of the recent assembly trends, namely: the efficient construction of large assembly graphs with less memory and the utilization of compressed data structures. Our performance evaluation follows the gold standard of genome assembly evaluation and is applied to four well-studied datasets with diverse complexity and sizes, ranging from a few millions to hundreds of millions of reads. We performed the experiments on systems with 4 to 196 GB RAM, corresponding to a wide range of equipment, from laptops to desktops to large servers. We report the memory requirements for each program and provide directions to researchers for choosing a suitable execution environment for their assemblies. This is the first study that offers a practical comparison of memory-efficient assemblers with respect to the trade-offs between memory requirements, quality of assembly and execution time.
We also propose two new assembly strategies that combine existing memory-efficient approaches for each stage of the execution. The first strategy is Diginorm-MSP-Assembly (DiMA), which uses two pre-processing steps: Diginorm for data cleaning followed by MSP, which distributes the data on disk partitions. The final assembly step allows for lightweight processing and any well-known assembler can be used. Our results show that DiMA is a general strategy for reducing the memory requirements of traditional assemblers. The combination of DiMA with the Velvet assembler results in better memory utilization than that by the original Velvet program and is capable of assembling the B. impatiens genome using about 20 GB RAM, whereas the original Velvet program would crash because of insufficient memory on a 192 GB server. The second strategy is Zero-memory assembly (ZeMA), which has a data cleaning preprocessing phase that uses Diginorm. Afterwards, ZeMA builds a sparse de Bruijn graph using SparseAssembler. The ZeMA pipeline executed on a conventional laptop successfully assembles the B. impatiens genome using only 3.2 GB of RAM.
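Since both proposed pipelines lean on Diginorm, a minimal sketch of digital normalization's core idea may help (our own simplification; real implementations use approximate, memory-bounded k-mer counting):

from collections import Counter

def diginorm(reads, k=20, cutoff=20):
    # Keep a read only if the median abundance of its k-mers, among the
    # reads kept so far, is still below the coverage cutoff.
    counts = Counter()
    kept = []
    for read in reads:
        kmers = [read[i:i + k] for i in range(len(read) - k + 1)]
        if not kmers:
            continue
        median = sorted(counts[km] for km in kmers)[len(kmers) // 2]
        if median < cutoff:
            kept.append(read)
            for km in kmers:
                counts[km] += 1
    return kept

Redundant reads from high-coverage regions get discarded before assembly, which is where both the memory savings and the information loss that the authors mention come from.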
What did they find?
Our results show that Diginorm-Velvet, SparseAssembler, Minia and Diginorm appear to be among the most useful methods under limited memory resources.
Regarding the ranking of the performance of the assemblers, we are compelled to say that the selection of the metrics and the ranking criteria were somewhat subjective and far from perfect. Thus, the ranking results that we report should be considered with caution. Based on the selected ranking procedure, Diginorm-Velvet ranks first among the studied programs for two reasons: (i) Velvet is a very efficient assembler that produces high-quality results; (ii) when data size and complexity increase, Diginorm reduces the memory footprint without affecting the accuracy of the results. SparseAssembler ranks second. SparseAssembler has good trade-offs between accuracy, wrong assemblies, run-time and memory utilization. Minia ranks third in our comparison. The quality is slightly lower for smaller datasets and, surprisingly, the method is optimized for larger genomes like that of the bumblebee. Minia requires minimal memory and it can be used on conventional laptops and desktops. DiMA enhances the memory footprint of Velvet for larger datasets and ranks fourth. However, in-memory loading of a huge assembly graph remains a bottleneck and restricts the applicability of DiMA. ZeMA ranks fifth. The low quality that it achieves confirms our initial hypothesis that data cleaning and sparse creation of DBG lead to the loss of significant information. However, under limited memory, the strategy is able to process large datasets and produce draft assemblies on a conventional laptop. SGA and Gossamer work only for the smaller datasets and the quality of the assemblies is lower compared to those of other programs.
We skipped over the discussion of the cloud, because the number of variables was simply too large to make the comparison meaningful. Moreover, we do not buy retail or first-hand when we get a server, so the laptop and workstation quotes are not good comparisons for us. In any case, it is a fairly good paper with a lot to digest.
NBC News reports:
The Food and Drug Administration has ordered DNA testing company 23andMe to stop marketing its over-the-counter genetic test, saying it’s being sold illegally to diagnose diseases, and with no proof it actually works.
The last six words got us curious. Isn't the FDA, by its own logic, making the argument for getting the agency itself shut down? Apart from the countless Ben Goldacre videos we posted earlier on FDA-approved drugs (e.g., here), the irony of 'FDA approval' can be seen right there in the sidebar of the NY Times article on 23andMe.
Let us not even bring up the bigger FDA screw-up, where their goons went and shut down dairies so that people could not commit the 'horrendous crime of drinking raw milk' !!! Any agency that treats selling or drinking raw milk as a crime is either captured or run by incompetent people. Anyone who grew up in a third-world country can attest to that.
FDA should be shut down, because there is no proof it actually works.
Oh well !! Mike Eisen thinks the FDA is stupid, whereas a Forbes editor finds 23andMe stupid. It is hard to disagree with Eisen.
Human beings are very curious about their own evolutionary history. Apparently, that history has been rewritten repeatedly over the last few years. Lately, every time we check the webpage of Nature News or Scientific American, we find a report of another major breakthrough.
Such articles always contain pictures of people working hard inside dark caves or underground, as you can see here and here.
What do we know about Denisovans so far? The following points are taken from Wikipedia and Nature News:
1. “Mystery humans spiced up ancients’ sex lives (Genome analysis suggests there was interbreeding between modern humans, Neanderthals, Denisovans and an unknown archaic population)” (Source: Nature News)
2. “The analysis indicated that modern humans, Neanderthals, and the Denisova hominin last shared a common ancestor around 1 million years ago.”
3. “The mtDNA analysis further suggested this new hominin species was the result of an early migration out of Africa, distinct from the later out-of-Africa migrations associated with Neanderthals and modern humans, but also distinct from the earlier African exodus of Homo erectus.”
4. “A detailed comparison of the Denisovan, Neanderthal, and human genomes has revealed evidence for a complex web of interbreeding among the lineages.”
5. “Through such interbreeding, 17% of the Denisova genome represents DNA from the local Neanderthal population, while evidence was also found of a contribution to the nuclear genome from an ancient hominin lineage yet to be identified, perhaps the source of the anomalously ancient mtDNA.”
6. “Melanesians may not be the only modern-day descendants of Denisovans. David Reich of Harvard University, in collaboration with Mark Stoneking of the Planck Institute team, found genetic evidence that Denisovan ancestry is shared by Melanesians, Australian Aborigines, and smaller scattered groups of people in Southeast Asia, such as the Mamanwa, a Negrito people in the Philippines. However, not all Negritos were found to possess Denisovan genes; Onge Andaman Islanders and Malaysian Jehai, for example, were found to have no significant Denisovan inheritance. These data place the interbreeding event in mainland Southeast Asia, and suggest that Denisovans once ranged widely over eastern Asia. Based on the modern distribution of Denisova DNA, Denisovans may have crossed the Wallace Line, with Wallacea serving as their last refugium.”
7. “Indeed, half of the HLA alleles of modern Eurasians represent archaic HLA haplotypes, and have been inferred to be of Denisovan or Neanderthal origin. The apparent over-representation of these alleles suggests a positive selective pressure for their retention in the human population.”
Together, they constitute an incredible amount of new findings.
In another supposedly major advance, Michael F. Hammer's group reported evidence of an extremely ancient human lineage in Africa.
An African American Paternal Lineage Adds an Extremely Ancient Root to the Human Y Chromosome Phylogenetic Tree
Reported elsewhere on the same paper -
“Our analysis indicates this lineage diverged from previously known Y chromosomes about 300,000 years ago, a time when anatomically modern humans had not yet evolved,” said senior study author Prof Michael Hammer of the University of Arizona. “This pushes back the time the last common Y chromosome ancestor lived by almost 70 percent.”
“The most striking feature of this research is that a consumer genetic testing company identified a lineage that didn’t fit anywhere on the existing Y chromosome tree, even though the tree had been constructed based on perhaps a half-million individuals or more. Nobody expected to find anything like this.”
About 300,000 years ago falls around the time the Neanderthals are believed to have split from the ancestral human lineage. It was not until more than 100,000 years later that anatomically modern humans appear in the fossil record. They differ from the more archaic forms by a more lightly built skeleton, a smaller face tucked under a high forehead, the absence of a cranial ridge and smaller chins.
These are definitely the most exciting times to live in, or are they? Readers should keep in mind that the African study came from only one individual, whereas the Denisovan findings are based on one toe bone, one finger bone and two teeth, nothing more. That is all scientists have found from ancient Denisovans so far (h/t: @dangraur).
Denisova hominins /dəˈniːsəvə/, or Denisovans, are Paleolithic-era members of a species of Homo or subspecies of Homo sapiens. In March 2010, scientists announced the discovery of a finger bone fragment of a juvenile female who lived about 41,000 years ago, found in the remote Denisova Cave in the Altai Mountains in Siberia, a cave which has also been inhabited by Neanderthals and modern humans. Two teeth and a toe bone belonging to different members of the same population have since been reported.
One thing is for sure: humans have an insatiable hunger to know how ancient humans looked and lived, and scientists have an immense ability to feed it.