This system is a deep learning platform for predicting protein acidophilicity. Our approach is a three-stage pipeline. First, we use the ESMC (ESM-C) protein language model to generate protein embeddings. Second, to compensate for limited training data, we apply DCGAN-GP for data augmentation. Finally, we use a Lightweight Sparse Mixture of Experts (LSMoE) transformer architecture for feature optimization and classification.
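To make the "sparse mixture of experts" idea in the final stage concrete, here is a minimal pure-Python sketch of top-k expert routing. The description above does not say how LSMoE selects or weights its experts, so the top-k softmax gate, the function names (`top_k_route`, `sparse_moe`), and the toy scaling experts are all assumptions for illustration, not the platform's actual implementation.

```python
import math

def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]

def top_k_route(gate_logits, k=2):
    """Pick the k highest-scoring experts and renormalise their weights.

    Returns a dict {expert_index: weight}; the weights sum to 1.
    (Assumed gating scheme; the source does not specify one.)
    """
    order = sorted(range(len(gate_logits)),
                   key=lambda i: gate_logits[i], reverse=True)
    top = order[:k]
    weights = softmax([gate_logits[i] for i in top])
    return dict(zip(top, weights))

def sparse_moe(x, gate_w, experts, k=2):
    """Run only the k selected experts on x and mix their outputs.

    x       : feature vector (list of floats), e.g. one embedding
    gate_w  : one gating weight vector per expert (hypothetical values)
    experts : list of callables mapping a vector to a vector of equal length
    """
    logits = [sum(w_i * x_i for w_i, x_i in zip(w, x)) for w in gate_w]
    routing = top_k_route(logits, k)
    out = [0.0] * len(x)
    for idx, weight in routing.items():
        y = experts[idx](x)  # experts that were not selected are never evaluated
        out = [o + weight * y_i for o, y_i in zip(out, y)]
    return out, routing

# Toy setup: four "experts" that simply scale their input.
toy_experts = [lambda v, s=s: [s * t for t in v] for s in (1.0, 2.0, 3.0, 4.0)]
toy_gate = [[1.0, 0.0], [0.0, 1.0], [0.5, 0.5], [-1.0, -1.0]]
mixed, routing = sparse_moe([2.0, 1.0], toy_gate, toy_experts, k=2)
```

The point of the sparsity is that only `k` experts run per input, so the per-sample cost stays roughly constant as more experts are added.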
Valid residues: ACDEFGHIKLMNPQRSTVWY
FASTA format: a header line starting with >, followed by the sequence. Examples:
MKALIVLGLVLLSVTVQGKVFERCELARTLKRLGMDGYRGISLANWMCLAKWESGYNTRATE
MVLSEGEWQLVLHVWAKVEADVAGHGQDILIRLFKSHPETLEKFDRFKHLKTEAEMKASED
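As a rough sketch of how input could be checked before submission, the snippet below parses FASTA text (a `>` header line followed by sequence lines) and reports any characters outside the 20-letter alphabet listed above. The function names and the headers `example_1` and `example_2` are hypothetical; the platform's own validation logic is not described here and may differ.

```python
VALID_RESIDUES = set("ACDEFGHIKLMNPQRSTVWY")

def parse_fasta(text):
    """Parse FASTA text into a list of (header, sequence) pairs."""
    records = []
    header, chunks = None, []
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue  # skip blank lines
        if line.startswith(">"):
            if header is not None:
                records.append((header, "".join(chunks)))
            header, chunks = line[1:].strip(), []
        elif header is None:
            raise ValueError("sequence data found before any '>' header line")
        else:
            chunks.append(line.upper())  # sequences may wrap over several lines
    if header is not None:
        records.append((header, "".join(chunks)))
    return records

def invalid_residues(sequence):
    """Return the sorted characters in sequence that are not standard residues."""
    return sorted(set(sequence) - VALID_RESIDUES)

# The two example sequences above, with made-up headers.
fasta_text = """>example_1
MKALIVLGLVLLSVTVQGKVFERCELARTLKRLGMDGYRGISLANWMCLAKWESGYNTRATE
>example_2
MVLSEGEWQLVLHVWAKVEADVAGHGQDILIRLFKSHPETLEKFDRFKHLKTEAEMKASED
"""

for name, seq in parse_fasta(fasta_text):
    bad = invalid_residues(seq)
    status = "ok" if not bad else "invalid characters: " + "".join(bad)
    print(name, len(seq), status)
```

Ambiguity codes such as `X` or `B` are rejected by this check because they are not in the listed alphabet.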