CasPro-ESM2

Accurate Identification of Cas Proteins Integrating Pre-trained Protein Language Model and Multi-scale Convolutional Neural Network

Upload FASTA File

If too much data is uploaded at once, the prediction may fail. In that case, please upload the sequences in batches, or download the model and run it locally (https://github.com/ChaoruiYan019/CasPro-ESM2).
Example FASTA Data:
>sequence1
MTEITAAMVKELRESTGAGMMDCKNALSETQHEHRSTVDTVDTKVLSS
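Large uploads may fail, so one option is to split a multi-sequence FASTA file into smaller chunks before uploading. The Python sketch below does this using only the standard library. The function names `parse_fasta` and `batch_fasta` and any batch size you pass are illustrative assumptions. This page does not state the server's actual upload limit.

```python
from typing import Iterator, List, Tuple


def parse_fasta(text: str) -> List[Tuple[str, str]]:
    """Parse FASTA text into (header, sequence) pairs.

    Multi-line sequences are joined; blank lines are ignored.
    """
    records: List[Tuple[str, str]] = []
    header = None
    chunks: List[str] = []
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue
        if line.startswith(">"):
            # A new record starts: flush the previous one, if any.
            if header is not None:
                records.append((header, "".join(chunks)))
            header = line[1:].strip()
            chunks = []
        else:
            if header is None:
                raise ValueError("sequence data found before the first '>' header")
            chunks.append(line)
    if header is not None:
        records.append((header, "".join(chunks)))
    return records


def batch_fasta(records: List[Tuple[str, str]], batch_size: int) -> Iterator[str]:
    """Yield FASTA-formatted strings holding at most batch_size records each.

    batch_size is a hypothetical choice; the server limit is not documented here.
    """
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    for start in range(0, len(records), batch_size):
        batch = records[start:start + batch_size]
        yield "\n".join(f">{h}\n{s}" for h, s in batch) + "\n"
```

Each yielded string can be written to its own file (for example `batch_1.fasta`, `batch_2.fasta`, ...) and uploaded separately.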

Download Datasets


Cas proteins are the core components of the CRISPR-Cas system and play critical roles in defending against invading foreign DNA and RNA. Identifying Cas proteins can therefore provide deeper insight into the immune mechanisms of the CRISPR-Cas system and help reveal how individual Cas proteins function.

In this study, we developed a computational tool named CasPro-ESM2, which combines the pre-trained protein language model ESM-2 with evolutionary information from protein sequences to identify unknown Cas proteins. Experimental results show that CasPro-ESM2 outperforms existing models in Cas protein identification, achieving the highest values on metrics including accuracy (ACC), specificity (SP), sensitivity (SN), and Matthews correlation coefficient (MCC) on two different datasets. We have also deployed the tool on a web server so that users can access it directly.
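ACC, SP, SN, and MCC are standard binary-classification metrics computed from a confusion matrix. The Python sketch below uses their textbook definitions and treats Cas proteins as the positive class. That is an assumption: this page does not describe the exact evaluation protocol. The function name `binary_metrics` is hypothetical.

```python
import math
from typing import Dict


def _ratio(num: float, den: float) -> float:
    """Safe division: returns 0.0 when the denominator is zero."""
    return num / den if den else 0.0


def binary_metrics(tp: int, tn: int, fp: int, fn: int) -> Dict[str, float]:
    """Compute ACC, SN, SP and MCC from confusion-matrix counts.

    tp/fn: positives (assumed Cas proteins) predicted correctly/incorrectly.
    tn/fp: negatives (assumed non-Cas proteins) predicted correctly/incorrectly.
    """
    acc = _ratio(tp + tn, tp + tn + fp + fn)  # overall accuracy
    sn = _ratio(tp, tp + fn)                  # sensitivity (recall on positives)
    sp = _ratio(tn, tn + fp)                  # specificity (recall on negatives)
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    mcc = _ratio(tp * tn - fp * fn, denom)    # Matthews correlation coefficient
    return {"ACC": acc, "SN": sn, "SP": sp, "MCC": mcc}
```

For instance, with illustrative counts (not results from this study) of tp=40, tn=45, fp=5, fn=10, the function gives ACC = 0.85, SN = 0.8, and SP = 0.9. Unlike ACC, MCC uses all four cells of the confusion matrix. This makes it more informative when positive and negative examples are imbalanced.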

Model Diagram