Kumar Thurimella, PhD

Final-Year MD Student · Physician-Scientist in Training

University of Colorado School of Medicine · University of Cambridge

About

I am a final-year MD student at the University of Colorado School of Medicine and recently completed my PhD at the University of Cambridge. Before medical school, I worked as a software engineer at Uber, where I first saw the potential of computation to find precise signals within vast, complex systems. The conviction that algorithms could help unravel the mechanisms behind autoimmune disease led me to pursue a career as a physician-scientist.

My work sits at the intersection of AI, computational biology, and clinical medicine. I am drawn to rheumatology, where immune dysregulation, complex patient data, and computational modeling converge with the opportunity to develop targeted therapies for patients who need them most.

Research

Current Work

My PhD in Biotechnology and Statistics/Deep Learning focused on using protein language models to discover new microbial enzymes linked to immune-mediated diseases. I was advised by Dr. Sergio Bacallado at Cambridge and Dr. Ramnik Xavier at the Broad Institute and Mass General Hospital.

My newest paper, "Identifying microbial protease allergens through protein language model-guided homology," was published in Cell Systems (2026). It introduces a deep learning framework that uses protein language models to uncover candidate allergenic serine proteases across gut and oral microbiome gene catalogs. The work was recently profiled by the Cambridge Department of Chemical Engineering and Biotechnology. Other recent work includes CAZyLingua, a protein language model-based tool for annotating carbohydrate-active enzymes in metagenomics (BMC Bioinformatics, 2025).

I am grateful to the Gates Cambridge Scholarship and Rotary International Scholarship for funding my PhD.

Clinical Training

Medical Education

6/6 Core Clerkship Honors · AOA (Alpha Omega Alpha) · Rheumatology Interest

At the University of Colorado School of Medicine, I earned Honors in all six core clinical clerkships — Internal Medicine, OB/GYN, Pediatrics, Family Medicine, Psychiatry, and Surgery — as well as in my Rheumatology and Medicine Acting Internship rotations. I was elected to Alpha Omega Alpha (AOA), the national medical honor society.

My clinical experiences have deepened my commitment to a career as a physician-scientist in rheumatology. The intersection of autoimmune disease mechanisms, computational biology, and patient care drives my goal of contributing to both the scientific understanding and the clinical management of rheumatic diseases through physician-scientist training programs.

Background

Education & Experience

University of Cambridge
PhD, Biotechnology & Mathematics/Statistics
Gates Cambridge Scholar

University of Colorado School of Medicine
MD, Expected May 2026
Honors in 6/6 core clerkships · AOA

Wellcome Sanger Institute / University of Cambridge
MPhil, Computational Biology

Uber
Software Engineer II
San Francisco, CA

University of Colorado Boulder
BS, Applied Mathematics

Publications

Selected Papers

Identifying microbial protease allergens through protein language model-guided homology

A deep learning framework using protein language models to identify candidate allergenic serine proteases across gut and oral microbiome gene catalogs.

Thurimella, K., Wu, E., Li, C., Graham, D. B., Owens, R. M., Plichta, D. R., Sokol, C. L., Xavier, R. J., & Bacallado, S. (2026). Identifying microbial protease allergens through protein language model-guided homology. Cell Systems, 0, 101510.

Protein language models uncover carbohydrate-active enzyme function in metagenomics

CAZyLingua is the first annotation tool to use protein language models for accurate classification of carbohydrate-active enzyme families and subfamilies in metagenomics.

Thurimella, K., Mohamed, A. M., Li, C., Vatanen, T., Graham, D. B., Owens, R. M., La Rosa, S. L., Plichta, D. R., Bacallado, S., & Xavier, R. J. (2025). Protein language models uncover carbohydrate-active enzyme function in metagenomics. BMC Bioinformatics, 26(285).

SCNIC: Sparse Correlation Network Investigation for Compositional Data

SCNIC is open-source software that can generate correlation networks and detect and summarize modules of highly correlated features.

Shaffer, M.†, Thurimella, K.†, Sterrett, J. D., & Lozupone, C. A. (2023). SCNIC: Sparse Correlation Network Investigation for Compositional Data. Molecular Ecology Resources, 23(1), 312–325. †Co-first authors

AMON: Annotation of metabolite origins via networks to integrate microbiome and metabolome data

AMON is an open-source bioinformatics application that annotates which compounds in the metabolome could have been produced by bacteria or the host, evaluates pathway enrichment, and visualizes metabolite origins in KEGG pathway maps.

Shaffer, M., Thurimella, K., Quinn, K., Doenges, K., Zhang, X., Bokatzian, S., Reisdorph, N., & Lozupone, C. A. (2019). AMON: Annotation of metabolite origins via networks to integrate microbiome and metabolome data. BMC Bioinformatics, 20(1), 1–11.