Homolog.us - Frontier in Bioinformatics


Training Approach in Evo and Evo2

In the earlier posts of this series (here, here, here and here), we covered the mathematical and biological aspects of Evo and Evo2. One important topic that we have not yet covered is how the models were trained.

Massively Parameterized Statistics

In this article, I will argue that Multi Parameter Statistics, or even better, Massively Parameterized Statistics (MPS), better describes the application of AI models in biology and medicine. I will also introduce you to a new preprint on DNA sequence modeling that claims to match Evo.

Biological Aspects of Evo and Evo2 - Semantic Mining

In the last three posts of this series (here, here and here), we covered the mathematical aspects of Evo and Evo2. Let us now discuss the biological findings from these models. It will take multiple posts to go over these topics.

StripedHyena in Evo and Evo2

In the first two posts of this series (here and here), we covered the AI-related mathematical concepts applied to Evo and Evo2. Before moving on to the biological side, here is one last post on the model.

Evo and Evo2 - Math and Algorithm

In the first post of this series, we covered the basic technical terms of the Evo and Evo2 papers. We also mentioned the key technological innovation that made their work possible. That led to the question - if they were using the fast Fourier transform (FFT), were they using a convolutional neural network (CNN)? The answer is no. The computer science work done by the Stanford group is quite groundbreaking. Let me go over that in detail.

Discussing the Evo and Evo2 Papers

Two recent papers applying AI-related large language models to DNA sequences are attracting a lot of attention and a bit of controversy. The first paper, titled Sequence Modeling and Design from Molecular to Genome Scale with Evo, wrote -

Trained on 2.7M prokaryotic and phage genomes, Evo can generalize across the three fundamental modalities of the central dogma of molecular biology to perform zero-shot function prediction that is competitive with, or outperforms, leading domain-specific language models. Evo also excels at multi-element generation tasks, which we demonstrate by generating synthetic CRISPR-Cas molecular complexes and entire transposable systems for the first time. Using information learned over whole genomes, Evo can also predict gene essentiality at nucleotide resolution and can generate coding-rich sequences up to 650 kb in length, orders of magnitude longer than previous methods.

Rules of the Genomes

What are the rules of the genomes? What patterns do the genome sequences follow? What biochemical and evolutionary mechanisms are behind these patterns? Are newly published genomes and pangenomes displaying many exceptions to the rules, or do they all confirm the expected patterns?

The Best of 2021

Now that we are on the very last day of 2021, it is not too late to review the positives of the year. I picked four categories (humor, science, society, technology) and shortlisted a tiny subset from many deserving candidates.

Devastating Impact of Climate Change Around the World

Climate change is taking a devastating toll on the lives of people around the world. This week, two students from Anderson High School died in their sleep. The school district canceled the final exams to help the grieving community. Separately, in Europe, three soccer players left games this week because of heart conditions. Also, 33-year-old Argentine striker Sergio Aguero, playing for Barcelona, announced his retirement from soccer due to a heart condition. In Silicon Valley, 43-year-old Tyson Clark from Google Ventures and 44-year-old Ryan Popple also died in their sleep.

Did They Fake Their Entire NGS Experiment?

In NGS experiments, when researchers encounter issues with genome assembly or analysis, they go back to the raw data composed of sequencing reads. In a recent preprint submitted to Zenodo, Steven C. Quay did exactly that for a seminal paper and concluded - “The alternative conclusion is that this sample was not a fecal specimen but was contrived. The data cannot, however, distinguish between a non-fecal specimen that came from true field work on the one hand and a specimen created de novo in the laboratory on the other hand.” This is no simple matter, because the entire world has been running around like a headless chicken for the last two years, relying on the genome assembly submitted in the paper.

Another Unusual Connection Between Covid and AIDS

In early 2020, Prashant Pradhan and collaborators posted a preprint titled “Uncanny similarity of unique inserts in the 2019-nCoV spike protein to HIV-1 gp120 and Gag” on bioRxiv. Based on the emails released by NIH under FOIA, we now know that this article and its coverage in ZeroHedge upset Fauci so much that he immediately convened an urgent meeting of virologists and several health bureaucrats from the US, the UK and Europe. All details of this meeting were redacted, but the virologists present in the meeting fast-tracked a Nature Medicine paper claiming the virus definitely came from animals, even though they described it as lab-engineered in their private emails. This paper was then used for over a year to censor all counter-arguments. In particular, bioRxiv retracted the preprint under intense pressure and thus destroyed its reputation as a preprint server for good.

A Disturbing Rise in Heart Attacks Explained

All over the world, people are noticing disturbing rises in heart-related problems among the young and healthy. This year, seventy-five prominent athletes died suddenly of heart attacks, and many others were sent to hospitals, often in the middle of matches. The problem has become so noticeable that this compiled video of athletes falling to the ground with heart problems went viral. In Australia, a top player with the Adelaide Crows was diagnosed with pericarditis. In India, a 29-year-old former Indian U-19 cricket team captain died after suffering a cardiac arrest, and so did 66-year-old former cricket player Yashpal Sharma.

Leaky Vaccines, Freaky Mutants and Viral Quasiparticle Swarms

NIH Director Francis Collins Plans to Change Name to Avoid Scandal

A newly leaked classified document revealed that scandal-ridden Francis Collins plans to change his name to continue running the NIH. He got the idea by observing Facebook CEO Mark Zuckerberg, who is rebranding himself to be a reptilian.

Oxford Nanopore IPO Takes Place on the Same Day UK Runs out of Petrol

Dishonest Trevor Bedford Wins Howard Hughes and MacArthur Awards

US establishment biologists are so tone-deaf that they gave Trevor Bedford both Howard Hughes and MacArthur awards. These same people also scream at the top of their lungs - “Trust the experts”. Here is what I got by trusting “experts” like Trevor Bedford.

Is It Time to Retract All Papers from Zhengli Shi and Peter Daszak?

This DEFUSE Grant Proposal is the Scariest Document I Have Ever Read

Yesterday, an explosive set of leaked documents on the origin of the SARS-CoV-2 virus was released by DRASTIC. People following the topic are describing them as “worse than the Chernobyl in the biology field”. In my opinion, this release changed the entire understanding of the origin of the pandemic and exposed a group of people as extremely wicked, shockingly evil and vile (sorry to borrow the movie name). Let me explain why.

Three Major Breakthroughs on the Origin of Covid

The scientists looking for the true origin of the Wuhan flu are excited about three major developments over the last few weeks. Let me list them here.

Pushback against Using PCA, tSNE and UMAP in Biology
