A deep learning framework using protein language models to identify candidate allergenic serine proteases across gut and oral microbiome gene catalogs.
Thurimella, K., Wu, E., Li, C., Graham, D. B., Owens, R. M., Plichta, D. R., Sokol, C. L., Xavier, R. J., & Bacallado, S. (2026). Identifying microbial protease allergens through protein language model-guided homology. Cell Systems, 101510.
CAZyLingua is the first annotation tool to use protein language models for accurate classification of carbohydrate-active enzyme families and subfamilies in metagenomics.
Thurimella, K., Mohamed, A. M., Li, C., Vatanen, T., Graham, D. B., Owens, R. M., La Rosa, S. L., Plichta, D. R., Bacallado, S., & Xavier, R. J. (2025). Protein language models uncover carbohydrate-active enzyme function in metagenomics. BMC Bioinformatics, 26, 285.
Pilot randomized controlled trial: Increased engagement in cardiovascular health through AI-enabled tailored messaging
A pilot randomized controlled trial showing that AI-enabled tailored messaging increases engagement in cardiovascular health interventions.
Xia, A.†, Thurimella, K.†, Bull, S., Waughtal, J., Chavez, C., Novins-Montague, S., Silvasstar, J., Salyers, A., Ho, M. P., & Lavieri, M. (2024). Pilot randomized controlled trial: Increased engagement in cardiovascular health through AI-enabled tailored messaging. JMIR Cardio (Under Review). †Co-first authors
SCNIC is open-source software that can generate correlation networks and detect and summarize modules of highly correlated features.
Shaffer, M.†, Thurimella, K.†, Sterrett, J. D., & Lozupone, C. A. (2023). SCNIC: Sparse Correlation Network Investigation for Compositional Data. Molecular Ecology Resources, 23(1), 312–325. †Co-first authors