Science Saturday: Using AI to reveal causes of complex diseases

By
Susan Buckles

April 24, 2021

Researchers within Mayo Clinic's Center for Individualized Medicine have developed an artificial intelligence (AI) platform that can uncover causal drivers and relationships embedded within complex biomedical data.

Nicholas Chia, Ph.D., John Kalantari, Ph.D., and Kia Khezeli, Ph.D., recently tested their machine-learning framework, called the Causal Relation and Inference Search Platform (CRISP), on multiomic colorectal cancer samples alongside data scientists and machine-learning engineers from the NASA Frontier Development Lab. The Mayo team presented and published its findings at the IEEE Global Conference on Life Sciences and Technologies.

"Identifying causal variables directly from observational data, and differentiating between causal relationships and misleading correlations, is a critical step toward understanding, diagnosing and treating rare and complex health conditions," says Dr. Kalantari, a machine-learning scientist within the center's Microbiome program. "No one, to our knowledge, has developed or applied such causal and invariant approaches for multiomic biomedical data before." Dr. Kalantari is the principal investigator of the study.

Dr. Kalantari says the novelty of such a platform comes from its ability to discover the underlying cause-and-effect relationships driving a patient's disease progression.

"By leveraging all available multiomic and clinical data types, the platform's algorithms can be used to reveal the hidden causes of a disease in order to identify new therapeutic targets and mechanisms for disease prevention," Dr. Kalantari explains.

Using innovative algorithm to root out cause of cancer

The team emphasizes that efforts to apply machine-learning methods to cancer research are ongoing, but most of these efforts are limited to identifying associations in the data.

"Knowing the cause and effect of a problem means knowing how to root out the problem," says Dr. Chia, the Bernard and Edith Waterman Co-Director for the Center's Microbiome program.

"It's like garden weeds," Dr. Chia explains. "The dandelion keeps coming back because you don't get rid of the root. Causal inference tells you how to get rid of the root; whereas, an association study just tells you that your poor lawn health is associated with dandelions. Association doesn't tell you how to solve the problem."

Identifying the causes of cancer development and progression is a challenging problem. Cancer develops through DNA mutations that can be caused by inherited gene alterations, exposure to toxic chemicals, poor diet or too much solar radiation, among other factors. These mutations collectively result in our cells multiplying uncontrollably, forming tumors, invading nearby tissue and eventually spreading to distant organs.

Connecting the dots with AI

To decipher the causal relationships underlying a complex disease, such as cancer, the researchers designed the CRISP platform to automatically process and analyze the complex, high-dimensional and diverse data that is often encountered by clinicians and scientists in the biomedical sciences.

"As a cloud-computing platform, CRISP can run thousands of reproducible causal inference experiments in parallel, on any combination of data," Dr. Kalantari explains.

This input data is analyzed by a set of causal inference and invariant prediction algorithms, which identify causal features responsible for a specific outcome, such as cancer subtype, patient survival or drug response, in a dataset.
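The idea behind invariant prediction can be illustrated with a toy example. The sketch below is not CRISP's algorithm; it is a simplified, single-feature invariance check on simulated data, and every name in it (`simulate_environment`, `invariant_features`, the `cause` and `effect` columns, the tolerance value) is a hypothetical chosen for illustration. It assumes that a feature that truly drives the outcome keeps the same relationship with the outcome in every environment, while a feature that is merely a downstream effect does not.

```python
import random
from statistics import mean


def slope(xs, ys):
    """Ordinary least-squares slope of y regressed on a single feature x."""
    mx, my = mean(xs), mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    var = sum((x - mx) ** 2 for x in xs)
    return cov / var


def simulate_environment(rng, n, effect_strength):
    """Toy data: `cause` drives `outcome`, and `effect` is driven BY `outcome`.

    `effect_strength` differs between environments, so the link between
    `effect` and `outcome` is not stable across them.
    """
    rows = []
    for _ in range(n):
        cause = rng.gauss(0, 1)
        outcome = 2.0 * cause + rng.gauss(0, 0.5)
        effect = effect_strength * outcome + rng.gauss(0, 0.5)
        rows.append({"cause": cause, "effect": effect, "outcome": outcome})
    return rows


def invariant_features(environments, features, target, tolerance=0.2):
    """Keep features whose regression slope on the target is stable
    (within `tolerance`) across all environments."""
    selected = []
    for feature in features:
        slopes = [
            slope([row[feature] for row in env], [row[target] for row in env])
            for env in environments
        ]
        if max(slopes) - min(slopes) <= tolerance:
            selected.append(feature)
    return selected


rng = random.Random(0)  # fixed seed so the simulation is reproducible
environments = [
    simulate_environment(rng, 2000, effect_strength=0.5),
    simulate_environment(rng, 2000, effect_strength=2.0),
]
selected = invariant_features(environments, ["cause", "effect"], "outcome")
print(selected)
```

Both features correlate strongly with the outcome in each environment, so an association-only analysis would flag both. With these settings, only `cause` is expected to pass the stability check, because its slope stays near 2.0 in both environments while the slope for `effect` shifts.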

A key component of CRISP is its ability to discover the causal drivers of any outcome variable within a dataset, or across many different datasets.

"The framework asks each of its individual causal learning algorithms to dig through the data, connect the dots and come up with an explanation for what is a cause versus an effect," Dr. Kalantari says. "The final output model combines the explanations from each algorithm to provide the clinician with a set of predicted causal features."
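The article does not say how the explanations from the individual algorithms are combined. One simple possibility, shown here purely as an assumption, is a majority vote over the features each algorithm selects. The algorithm names, feature names and the `combine_by_vote` helper are all hypothetical.

```python
from collections import Counter


def combine_by_vote(selections, min_votes):
    """Return features chosen by at least `min_votes` algorithms.

    `selections` maps an algorithm name to the list of features it flagged
    as putatively causal. Each algorithm gets one vote per feature.
    """
    votes = Counter(
        feature for chosen in selections.values() for feature in set(chosen)
    )
    return sorted(f for f, count in votes.items() if count >= min_votes)


# Hypothetical outputs from three hypothetical causal learning algorithms.
selections = {
    "algorithm_a": ["mutation_X", "microbe_Y", "age"],
    "algorithm_b": ["mutation_X", "microbe_Y"],
    "algorithm_c": ["mutation_X", "bmi"],
}

consensus = combine_by_vote(selections, min_votes=2)
print(consensus)  # ['microbe_Y', 'mutation_X']
```

Raising `min_votes` makes the consensus stricter: requiring all three votes here would keep only `mutation_X`.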

To make these cause-and-effect explanations easier to understand, the CRISP team also developed an intuitive, user-friendly visualization application that allows clinicians and scientists to explore the results with ease on a computer, tablet or smartphone.

Putting colorectal cancer samples to the test

After performing several preliminary validation experiments to evaluate the platform's efficacy, the researchers chose to test the framework on a classification task: Which subtype of colorectal cancer does each tumor resemble, and what are the most distinctive features of that subtype?

To perform this task, the team used a comprehensive multiomic cancer dataset of tumor samples from 109 patients with colorectal cancer.

"From each patient, there were somewhere between nine and 30 samples taken as swabs, scrapes and biopsies along different locations of the colon to gather broad datasets, including genomic, epigenetic and microbiome," says Dr. Chia.

The framework analyzed thousands of features that could cause each subtype of colorectal cancer, such as mutations, microbial species, age, body mass index and other comorbidities. The team was then able to demonstrate that the causal models inferred by CRISP performed as well as several noncausal machine-learning methods commonly used in practice, achieving near-perfect test-set accuracy.

"However, unlike noncausal approaches, CRISP selected putative features whose causal nature were found to be supported by empirical evidence in the clinical literature," Dr. Kalantari says.

Interpreting multiomic datasets to advance treatments

The team aims to demonstrate that the algorithm can feasibly be applied to a larger cohort of patients with colorectal cancer, and then to extend it to other cancers and chronic diseases.

Dr. Khezeli, a machine-learning scientist within the center, says that's why they designed the algorithm to be "data-agnostic," meaning that the algorithm can receive data in multiple formats or from multiple sources and still process that data effectively.

"In colorectal cancer, the microbiome plays a significant role, but in other cancers, features unique to proteomics or epigenetics may have higher importance," says Dr. Khezeli. "This ability for the platform to be able to interpret all data types opens the door to understanding these diseases better. Our ultimate goal is that this will lead to identifying and developing effective treatments."

Teaming up with NASA for cancer research, beyond

In addition to this work, the Mayo researchers teamed up with data scientists from the NASA Frontier Development Laboratory to optimize the causal inference framework for multiomics data integration and causal modeling.

Mayo Clinic and the Frontier Development Laboratory are working together to further develop causal inference approaches like CRISP to prevent and treat cancer and other complex health conditions in space and on Earth. The laboratory is studying the effect of space radiation on an astronaut's health.

"NASA wants to get astronauts to Mars in the next decade, and they want to get them there in a healthy state," says Dr. Chia. "They have an incentive to understand what causes cancer with more depth, and we're working with them to advance that knowledge."

The team aims to create an extended federated causal learning platform for multi-institutional collaborations. This will enable the team to build predictive models that are more robust and fairer because they will be optimized across institutions with diverse data populations.