This deep learning framework uses protein language models to identify candidate allergenic serine proteases across gut and oral microbiome gene catalogs.
Thurimella, K., Wu, E., Li, C., Graham, D. B., Owens, R. M., Plichta, D. R., Sokol, C. L., Xavier, R. J., & Bacallado, S. (2026). Identifying microbial protease allergens through protein language model-guided homology. Cell Systems, 0, 101510.
CAZyLingua is the first annotation tool to use protein language models for accurate classification of carbohydrate-active enzyme families and subfamilies in metagenomics.
Thurimella, K., Mohamed, A. M., Li, C., Vatanen, T., Graham, D. B., Owens, R. M., La Rosa, S. L., Plichta, D. R., Bacallado, S., & Xavier, R. J. (2025). Protein language models uncover carbohydrate-active enzyme function in metagenomics. BMC Bioinformatics, 26, 285.
SCNIC is open-source software that can generate correlation networks and detect and summarize modules of highly correlated features.
Shaffer, M.†, Thurimella, K.†, Sterrett, J. D., & Lozupone, C. A. (2023). SCNIC: Sparse Correlation Network Investigation for Compositional Data. Molecular Ecology Resources, 23(1), 312–325. †Co-first authors
AMON is an open-source bioinformatics application that annotates which compounds in the metabolome could have been produced by bacteria or the host, evaluates pathway enrichment, and visualizes metabolite origins in KEGG pathway maps.
Shaffer, M., Thurimella, K., Quinn, K., Doenges, K., Zhang, X., Bokatzian, S., Reisdorph, N., & Lozupone, C. A. (2019). AMON: Annotation of metabolite origins via networks to integrate microbiome and metabolome data. BMC Bioinformatics, 20(1), 1–11.