Acidophilic Protein Prediction System

File upload supports .fasta, .fa, and .txt files up to 16 MB.
System Introduction

This system is a deep learning-based platform for predicting protein acidophilicity. It uses a three-stage pipeline. First, the ESMC (ESM-C) protein language model generates an embedding for each protein sequence. Second, because training data are limited, DCGAN-GP is used for data augmentation. Finally, a Lightweight Sparse Mixture of Experts (LSMoE) transformer architecture performs feature optimization and classification.
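To illustrate what "sparse" means in a mixture-of-experts layer, the sketch below shows generic top-k routing: only the k highest-scoring experts run for a given input, and their outputs are blended with softmax weights. This is not the LSMoE implementation, whose details are not described here. The names `softmax` and `sparse_moe`, the toy experts, and the choice of k = 2 are all assumptions made for illustration, and the gate scores are passed in directly, whereas a real layer would typically compute them from the input with a learned gating network.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def sparse_moe(x, gate_scores, experts, k=2):
    """Route one input vector through the k highest-scoring experts.

    Hypothetical helper: generic top-k gating, not the LSMoE layer itself.
    Returns the blended output and the indices of the selected experts.
    """
    # Pick the k experts with the largest gate scores.
    top = sorted(range(len(experts)), key=lambda i: gate_scores[i], reverse=True)[:k]
    # Renormalize the gate scores over the selected experts only.
    weights = softmax([gate_scores[i] for i in top])
    output = [0.0] * len(x)
    for weight, index in zip(weights, top):
        expert_output = experts[index](x)  # unselected experts are never evaluated
        output = [o + weight * v for o, v in zip(output, expert_output)]
    return output, top


# Toy experts standing in for small feed-forward networks (assumption).
toy_experts = [
    lambda v: [2.0 * t for t in v],
    lambda v: [t + 1.0 for t in v],
    lambda v: [0.0 for _ in v],
]

blended, chosen = sparse_moe([1.0, 2.0], [1.0, 1.0, -5.0], toy_experts, k=2)
print(chosen, blended)
```

Because only k experts run per input, compute cost grows with k and not with the total number of experts, which is the usual motivation for sparse routing.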

Usage
  • Single Sequence: Enter a single protein sequence for prediction
  • Batch Prediction: Predict multiple sequences in FASTA format, supplied via text input or file upload
Input Requirements
  • Only the 20 standard amino acids are supported: ACDEFGHIKLMNPQRSTVWY
  • Sequence length: 10-2000 amino acids
  • FASTA format: A header line starting with >, followed by the sequence
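The input requirements above can be checked locally before submitting. The following is a minimal sketch and not the server's actual validation code: `parse_fasta` and `validate_sequence` are hypothetical helper names, and converting sequences to upper case before checking is an assumption.

```python
VALID_AMINO_ACIDS = set("ACDEFGHIKLMNPQRSTVWY")
MIN_LENGTH = 10
MAX_LENGTH = 2000


def parse_fasta(text):
    """Parse FASTA text into a list of (header, sequence) pairs."""
    records = []
    header = None
    chunks = []
    for raw_line in text.splitlines():
        line = raw_line.strip()
        if not line:
            continue  # skip blank lines
        if line.startswith(">"):
            if header is not None:
                records.append((header, "".join(chunks)))
            header = line[1:].strip()
            chunks = []
        elif header is not None:
            # Assumption: lower-case residues are accepted and upper-cased.
            chunks.append(line.upper())
        else:
            raise ValueError("FASTA input must begin with a '>' header line")
    if header is not None:
        records.append((header, "".join(chunks)))
    return records


def validate_sequence(sequence):
    """Return a list of problems; an empty list means the sequence is acceptable."""
    problems = []
    if not MIN_LENGTH <= len(sequence) <= MAX_LENGTH:
        problems.append(
            f"length {len(sequence)} is outside {MIN_LENGTH}-{MAX_LENGTH}"
        )
    invalid = sorted(set(sequence) - VALID_AMINO_ACIDS)
    if invalid:
        problems.append("unsupported characters: " + "".join(invalid))
    return problems


example = ">seq1 demo\nMKALIVLGLV\nLLSVTVQGKV\n>seq2\nMVLSX\n"
for name, seq in parse_fasta(example):
    issues = validate_sequence(seq)
    print(name, "OK" if not issues else issues)
```

Returning a list of problems, instead of raising on the first one, lets a batch run report every rejected sequence in a single pass.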
Result Interpretation
  • Prediction: Acidophilic or Non-acidophilic
  • Confidence: Model confidence score (0-1)
  • Probability Distribution: Prediction probabilities for both categories
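The three result fields can be related to one another with a short sketch. How the confidence score is actually computed is not stated here; the code below assumes it is simply the probability of the predicted class, and `interpret` and `LABELS` are hypothetical names. Under that assumption, the confidence of a two-class model would always fall between 0.5 and 1.

```python
LABELS = ("Acidophilic", "Non-acidophilic")


def interpret(probabilities):
    """Turn a two-class probability distribution into a result record.

    Assumption: confidence equals the probability of the predicted class.
    """
    if len(probabilities) != len(LABELS):
        raise ValueError("expected one probability per category")
    if abs(sum(probabilities) - 1.0) > 1e-6:
        raise ValueError("probabilities must sum to 1")
    # The predicted category is the one with the highest probability.
    best = max(range(len(LABELS)), key=lambda i: probabilities[i])
    return {
        "prediction": LABELS[best],
        "confidence": probabilities[best],
        "probabilities": dict(zip(LABELS, probabilities)),
    }


result = interpret([0.82, 0.18])
print(result["prediction"], result["confidence"])
```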
Example Sequences
Acidophilic protein example:
MKALIVLGLVLLSVTVQGKVFERCELARTLKRLGMDGYRGISLANWMCLAKWESGYNTRATE

Non-acidophilic protein example:
MVLSEGEWQLVLHVWAKVEADVAGHGQDILIRLFKSHPETLEKFDRFKHLKTEAEMKASED