Kumar Thurimella, PhD

Final-Year MD Student · Physician-Scientist in Training

University of Colorado School of Medicine · University of Cambridge

About

I am a final-year MD student at the University of Colorado School of Medicine and recently completed my PhD at the University of Cambridge. Before medical school, I worked as a software engineer at Uber, where I first saw the potential of computation to find precise signals within vast, complex systems. That experience convinced me that algorithms could help unravel the mechanisms behind autoimmune disease, and this conviction led me to pursue a career as a physician-scientist.

My work sits at the intersection of AI, computational biology, and clinical medicine. I am drawn to rheumatology, where immune dysregulation, complex patient data, and computational modeling converge, creating the opportunity to develop targeted therapies for the patients who need them most.

Current Work

My PhD, spanning biotechnology, statistics, and deep learning, focused on using protein language models to discover new microbial enzymes linked to immune-mediated diseases. I was advised by Dr. Sergio Bacallado at Cambridge and Dr. Ramnik Xavier at the Broad Institute and Mass General Hospital.

My newest paper, Identifying microbial protease allergens through protein language model-guided homology, was just published in Cell Systems (2026). It introduces a deep learning framework using protein language models to uncover candidate allergenic serine proteases across gut and oral microbiome gene catalogs. Other recent work includes CAZyLingua, a protein language model-based tool for annotating carbohydrate-active enzymes in metagenomics (BMC Bioinformatics, 2025).

I am grateful to the Gates Cambridge Scholarship and Rotary International Scholarship for funding my PhD.

Medical Education

6/6 Core Clerkship Honors · AOA (Alpha Omega Alpha) · Rheumatology Interest

At the University of Colorado School of Medicine, I earned Honors in all six core clinical clerkships — Internal Medicine, OB/GYN, Pediatrics, Family Medicine, Psychiatry, and Surgery — as well as in my Rheumatology and Medicine Acting Internship rotations. I was elected to Alpha Omega Alpha (AOA), the national medical honor society.

My clinical experiences have deepened my commitment to a career as a physician-scientist in rheumatology. The intersection of autoimmune disease mechanisms, computational biology, and patient care motivates me to pursue physician-scientist training so that I can contribute to both the scientific understanding and the clinical management of rheumatic diseases.

Education & Experience

PhD, Biotechnology & Mathematics/Statistics

Oct 2020 – Apr 2024

University of Cambridge

Gates Cambridge Scholar

MD, Expected May 2026

2018–2026

University of Colorado School of Medicine

Honors in 6/6 core clerkships · AOA

MPhil, Computational Biology

2017–2018

Wellcome Sanger Institute / University of Cambridge

Software Engineer II

2015–2017

Uber

San Francisco, CA

BS, Applied Mathematics

2013

University of Colorado Boulder

Selected Papers

Cell Systems · 2026

Identifying microbial protease allergens through protein language model-guided homology

A deep learning framework using protein language models to identify candidate allergenic serine proteases across gut and oral microbiome gene catalogs.

Thurimella, K., Wu, E., Li, C., Graham, D. B., Owens, R. M., Plichta, D. R., Sokol, C. L., Xavier, R. J., & Bacallado, S. (2026). Identifying microbial protease allergens through protein language model-guided homology. Cell Systems, 101510.

BMC Bioinformatics · 2025

Protein language models uncover carbohydrate-active enzyme function in metagenomics

CAZyLingua is the first annotation tool to use protein language models for accurate classification of carbohydrate-active enzyme families and subfamilies in metagenomics.

Thurimella, K., Mohamed, A. M., Li, C., Vatanen, T., Graham, D. B., Owens, R. M., La Rosa, S. L., Plichta, D. R., Bacallado, S., & Xavier, R. J. (2025). Protein language models uncover carbohydrate-active enzyme function in metagenomics. BMC Bioinformatics, 26, 285.

Molecular Ecology Resources · 2023

SCNIC: Sparse Correlation Network Investigation for Compositional Data

SCNIC is open-source software that can generate correlation networks and detect and summarize modules of highly correlated features.

Shaffer, M.†, Thurimella, K.†, Sterrett, J. D., & Lozupone, C. A. (2023). SCNIC: Sparse Correlation Network Investigation for Compositional Data. Molecular Ecology Resources, 23(1), 312–325. †Co-first authors

BMC Bioinformatics · 2019

AMON: Annotation of metabolite origins via networks to integrate microbiome and metabolome data

AMON is an open-source bioinformatics application that annotates which compounds in the metabolome could have been produced by bacteria or the host, evaluates pathway enrichment, and visualizes metabolite origins in KEGG pathway maps.

Shaffer, M., Thurimella, K., Quinn, K., Doenges, K., Zhang, X., Bokatzian, S., Reisdorph, N., & Lozupone, C. A. (2019). AMON: Annotation of metabolite origins via networks to integrate microbiome and metabolome data. BMC Bioinformatics, 20(1), 1–11.