Machine learning-based predictive models for systemic lupus erythematosus diagnosis and prevention
Mehmet Hocaoglu, MD
Student
University of Pittsburgh
Computational and Systems Biology
General Audience Summary
Lupus occurs when the body’s immune system, which normally protects us from infections, mistakenly attacks healthy tissues. Lupus develops over many years before symptoms appear, which creates a window of opportunity to prevent it. However, one of the biggest challenges is that it is difficult to predict who will develop lupus. Another challenge is that, even after symptoms begin, lupus often takes years to diagnose, delaying proper care and treatment.
Our project aims to change that by using cutting-edge technology and data science to improve how we predict and diagnose lupus. Specifically, we are using machine learning, the mathematical field behind the recent advances in artificial intelligence, to find patterns that can help us better understand the risk factors and biological signals associated with lupus. We will use data from large biobanks across the world, which are databases that store health, genetic, and other types of biological information from hundreds of thousands of people. These data include information about a person’s genes, proteins in their blood, and environmental exposures such as smoking, pollution, and sunlight. By combining all this data, we will build tools that can predict who is at high risk for lupus before symptoms appear. This will allow doctors and patients to monitor early warning signs and take preventive steps. We will also build tools that improve how we diagnose lupus by identifying complex biological “signatures”. These are patterns in someone’s genes or proteins that can help distinguish lupus from other diseases with similar symptoms. Earlier and more accurate diagnosis means earlier treatment, which can reduce damage and improve long-term health.
Our project will be conducted by a team of experts in genetics, machine learning, and lupus research. Together, we aim to develop models that will be practical and helpful for doctors and patients in real-world settings. Our ultimate goal is to move toward more personalized and timely care for people living with lupus. This work builds on our previous research, which has already uncovered important clues about how genetics and environmental factors interact in people with lupus and those at risk for lupus. With new advances in artificial intelligence and access to powerful data resources and computers, we are now in a better position than ever to turn these discoveries into tools that can make a difference in people’s lives.
Scientific Abstract
Systemic lupus erythematosus (SLE) is a complex autoimmune disease with significant clinical burden, yet its low prevalence and multifactorial etiology have hindered the development of effective preventive and diagnostic strategies. This project aims to develop robust machine learning (ML)-based predictive models to improve SLE risk prediction and diagnosis by integrating genetic, environmental, and multi-omics data. Although many genetic and environmental risk factors have been identified, current models fail to capture their complex, non-linear interactions, which limits predictive accuracy. Additionally, the lack of specific biomarkers leads to diagnostic delays and poor outcomes. In Aim 1, using data from biobanks across the world, we will build and validate population-specific genetic-environmental risk models for SLE. In Aim 2, we will develop integrated multi-omics models for the biomarker-based diagnosis of SLE through harmonization of proteomic, metabolomic, and genomic data. This project will result in novel, high-performing, ML-based models for SLE risk prediction and diagnosis. It will also provide the applicant with advanced training in computational biology and machine learning, establishing a foundation for a career focused on precision medicine in autoimmune diseases.