Illumina, Inc. (NASDAQ: ILMN) unveiled what it calls the world’s largest genome-wide genetic perturbation dataset, aiming to speed drug discovery through artificial intelligence.
The company said the Illumina Billion Cell Atlas marks the first release in a three-year effort to map five billion human cells. The dataset is meant to give researchers a clearer picture of how diseases operate inside human cells.
Illumina is building the Atlas with founding pharmaceutical partners AstraZeneca (NASDAQ: AZN), Merck (NYSE: MRK), and Eli Lilly and Company (NYSE: LLY). The alliance focuses on using large-scale biology to strengthen drug target validation and AI model training.
Illumina said the collaboration will also support research into disease mechanisms that have remained difficult to study. Merck plans to use the Atlas to advance precision medicine across its drug discovery programs and intends to train its internal AI and machine-learning foundation models on the data. Those models should help Merck develop virtual cell systems that better predict which diseases drugs may treat.
The Atlas will track how one billion individual cells respond to genetic changes. Researchers will use CRISPR tools to switch genes on and off across more than 200 disease-relevant cell lines. The selected cell lines reflect conditions ranging from cancer and immune disorders to neurological and rare genetic diseases.
This approach allows scientists to test the impact of nearly all 20,000 human genes. Consequently, researchers can observe how specific genetic changes alter cell behavior. The process helps link genetic signals to biological mechanisms that matter for medicine development.
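As a rough sense of scale, the round figures quoted above can be combined in a back-of-envelope calculation. The sketch below assumes, purely for illustration, that every gene is perturbed once in every cell line and that cells are spread evenly across those combinations; Illumina has not described the experimental design in these terms, and the variable names are hypothetical.

```python
# Back-of-envelope scale check using the round figures quoted in the article.
# Assumption (not from Illumina): one perturbation per gene per cell line,
# with cells spread evenly across all gene/cell-line combinations.
TOTAL_CELLS = 1_000_000_000   # "one billion individual cells"
GENES = 20_000                # "nearly all 20,000 human genes"
CELL_LINES = 200              # "more than 200 disease-relevant cell lines"

combinations = GENES * CELL_LINES
cells_per_combination = TOTAL_CELLS / combinations

print(f"gene/cell-line combinations: {combinations:,}")
print(f"cells per combination (even split): {cells_per_combination:,.0f}")
```

Under those assumptions the budget works out to a few hundred cells per gene and cell line. The real allocation is likely uneven, since the article says only "more than 200" cell lines and "nearly all" genes.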
Billion Cell Atlas represents first commercial dataset
Illumina said the dataset will let users study how drugs work inside cells and explore potential new disease indications for existing compounds. The Atlas also supports validation of drug targets discovered through human genetics.
Executives at partner companies see the project as a bridge between genetics and real-world treatments. AstraZeneca researchers said the data should help convert genetic clues into biology that scientists can directly test. Furthermore, they believe clearer cellular insight can sharpen decisions earlier in drug development.
Eli Lilly leaders described large, diverse biological datasets as essential for the next phase of AI-driven discovery. They argued that spanning many cell types improves the chances of generating insights that translate to patients. Meanwhile, the company expects the Atlas to support work in cardiometabolic disease and other complex areas.
The Billion Cell Atlas is the first commercial dataset from Illumina’s newly formed BioInsight business, which aims to provide foundational technologies and data for AI-enabled drug research. Illumina positioned the unit as a long-term platform rather than a single product launch.
The company said the Atlas relies on its Single Cell 3′ RNA prep platform. That technology allows millions of individual cells to be captured in a single experiment. Consequently, Illumina can generate data at a scale previously considered impractical. Illumina expects the Atlas to produce about 20 petabytes of single-cell transcriptomic data within a year. To manage that volume, the company processes data using its DRAGEN pipeline with hardware acceleration. Additionally, Illumina hosts the results on its Connected Analytics cloud platform for large-scale analysis.
Setup allows researchers to apply advanced AI
The company said this infrastructure supports rapid access for partners across the pharmaceutical ecosystem. Furthermore, the setup allows researchers to apply advanced AI tools without building their own data systems. Illumina believes this shared approach lowers barriers to complex biological analysis.
“We believe the cell atlas is a key development that will enable us to significantly scale AI for drug discovery,” said Jacob Thaysen, chief executive officer of Illumina.
“We are building an unparalleled resource for training the next generation of AI models for precision medicine and drug target identification, ultimately helping map the biological pathways behind some of the world’s most devastating diseases.”
Illumina plans to expand the Atlas steadily with additional partners. The Billion Cell Atlas builds on an initiative announced last February to reach five billion cells. Additionally, the company said future atlases will focus on specific diseases and biological systems.
Illumina CEO Jacob Thaysen is scheduled to present at the 44th Annual J.P. Morgan Healthcare Conference on January 13, 2026. The presentation will be available by webcast through the company’s investor website.
The push reflects a broader shift across health care toward data-driven biology and AI-enabled decision making. Pharmaceutical and diagnostics companies are increasingly pairing large datasets with machine learning to detect cancer earlier and refine treatments.
That trend extends well beyond Illumina, as other firms apply AI directly to cancer prediction and detection technologies.
Multiple companies incorporate AI in cancer fight
Axe Compute (formerly Predictive Oncology) (NASDAQ: POAI) uses artificial intelligence and machine learning to improve cancer drug discovery and treatment prediction. Its platform analyses biological and clinical data to forecast how tumor samples will respond to specific drug compounds with high accuracy. By providing early insight into drug effectiveness, the technology aims to streamline development and tailor therapies more precisely to individual patients.
Breath Diagnostics, Inc. is another player in the AI cancer space.
It’s a medical technology startup using machine learning and advanced chemistry to fight cancer by analysing exhaled breath for biomarkers linked to lung cancer. Its OneBreath platform combines proprietary microchips, high-performance liquid chromatography and AI algorithms to turn a single breath into diagnostic data.
Early clinical studies showed about 94 per cent sensitivity and 85 per cent specificity for detecting lung cancer, figures that suggest the test could identify many early-stage cases more reliably than some conventional imaging methods. By reducing false positives and offering a non-invasive screening tool, Breath Diagnostics hopes to enable earlier intervention and improve patient outcomes.
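To make the sensitivity and specificity figures concrete, the short sketch below applies them to a hypothetical screening group. The group size and the 1 per cent prevalence are illustrative assumptions, not figures from Breath Diagnostics or its studies, and the function name is made up for this example.

```python
def expected_counts(n, prevalence, sensitivity, specificity):
    """Expected confusion-matrix counts for a screened group of n people."""
    with_disease = n * prevalence
    without_disease = n - with_disease
    true_pos = with_disease * sensitivity        # cases the test catches
    false_neg = with_disease - true_pos          # cases the test misses
    true_neg = without_disease * specificity     # healthy people correctly cleared
    false_pos = without_disease - true_neg       # healthy people flagged
    return true_pos, false_pos, true_neg, false_neg

# Sensitivity and specificity are the figures reported in the article.
# The group size and prevalence are hypothetical, chosen only for illustration.
tp, fp, tn, fn = expected_counts(
    n=10_000, prevalence=0.01, sensitivity=0.94, specificity=0.85
)
ppv = tp / (tp + fp)  # share of positive results that are true cases

print(f"true positives:  {tp:.0f}")
print(f"false positives: {fp:.0f}")
print(f"missed cases:    {fn:.0f}")
print(f"positive predictive value: {ppv:.1%}")
```

Sensitivity describes how many true cases are caught. The share of positive results that turn out to be real cases depends strongly on the assumed prevalence, so changing that one input changes the picture considerably.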