Acidophilic Protein Prediction System

Upload FASTA File

Supported file formats: .fasta, .fa, .txt (maximum size 16 MB)
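
To check a file before uploading it, the limits above can be expressed as a short pre-flight test. This is a minimal sketch, not the system's own code: the function name check_upload is hypothetical, and it assumes that "16MB" means 16 * 1024 * 1024 bytes and that extensions are matched case-insensitively.

```python
import os

# Extensions listed on the upload form.
ALLOWED_EXTENSIONS = {".fasta", ".fa", ".txt"}
# Assumption: "16MB" is interpreted as 16 MiB.
MAX_UPLOAD_BYTES = 16 * 1024 * 1024


def check_upload(filename, size_bytes):
    """Return (ok, message) for a candidate upload (hypothetical helper)."""
    ext = os.path.splitext(filename)[1].lower()
    if ext not in ALLOWED_EXTENSIONS:
        return False, f"unsupported extension: {ext or '(none)'}"
    if size_bytes > MAX_UPLOAD_BYTES:
        return False, "file exceeds the 16 MB limit"
    return True, "ok"
```

For a file on disk, the size can be obtained with os.path.getsize(path) and passed as size_bytes.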

System Introduction

This system is a deep learning platform for predicting whether a protein is acidophilic. It uses a three-stage pipeline. First, the ESMC (ESM-C) protein language model generates an embedding for each protein sequence. Second, because training data are limited, DCGAN-GP is used for data augmentation. Finally, a Lightweight Sparse Mixture of Experts (LSMoE) transformer optimizes the features and performs the classification.

Usage

Single Sequence: Enter one protein sequence for prediction
Batch Prediction: Submit multiple sequences in FASTA format, either as pasted text or as a file upload

Input Requirements

Only the 20 standard amino acids are supported: ACDEFGHIKLMNPQRSTVWY
Sequence length: 10-2000 amino acids
FASTA format: A header line starting with >, followed by the sequence
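
The requirements above can be checked locally before submitting. The following is a minimal sketch rather than the system's actual validator: the names parse_fasta and validate_sequence are hypothetical, and the sketch assumes that sequences may span several lines, that lower-case letters are converted to upper case, and that any text before the first header is ignored.

```python
STANDARD_AA = set("ACDEFGHIKLMNPQRSTVWY")
MIN_LEN, MAX_LEN = 10, 2000


def parse_fasta(text):
    """Return a list of (header, sequence) pairs from FASTA text."""
    records = []
    header, chunks = None, []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            # A new header closes the previous record, if any.
            if header is not None:
                records.append((header, "".join(chunks)))
            header, chunks = line[1:].strip(), []
        elif header is not None:
            # Assumption: lower-case residues are accepted and upper-cased.
            chunks.append(line.upper())
    if header is not None:
        records.append((header, "".join(chunks)))
    return records


def validate_sequence(seq):
    """Return a list of problems; an empty list means the sequence passes."""
    problems = []
    if not MIN_LEN <= len(seq) <= MAX_LEN:
        problems.append(f"length {len(seq)} outside {MIN_LEN}-{MAX_LEN}")
    invalid = sorted(set(seq) - STANDARD_AA)
    if invalid:
        problems.append("non-standard residues: " + "".join(invalid))
    return problems
```

Calling validate_sequence on each sequence returned by parse_fasta gives one list of problems per record, so a batch can be screened before it is sent for prediction.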

Result Interpretation

Prediction: Acidophilic or Non-acidophilic
Confidence: Model confidence score (0-1)
Probability Distribution: Prediction probabilities for both categories
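
How the three fields relate can be illustrated with a short sketch. The page does not state how the confidence score is derived, so the code below rests on assumptions: that the model outputs a single probability for the acidophilic class, that the predicted label is the class with the higher probability, and that confidence equals the probability of that predicted class. The function name interpret is hypothetical.

```python
def interpret(p_acidophilic):
    """Build a result record from one class probability (hypothetical helper)."""
    if not 0.0 <= p_acidophilic <= 1.0:
        raise ValueError("probability must lie in [0, 1]")
    probabilities = {
        "Acidophilic": p_acidophilic,
        "Non-acidophilic": 1.0 - p_acidophilic,
    }
    # Assumption: the label is the more probable class; an exact tie
    # resolves to "Acidophilic" because it is listed first.
    label = max(probabilities, key=probabilities.get)
    return {
        "prediction": label,
        "confidence": probabilities[label],  # assumption: max class probability
        "probabilities": probabilities,
    }
```

Under these assumptions the confidence for a two-class result always falls between 0.5 and 1, so a value near 0.5 indicates a borderline prediction.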

Example Sequences

Acidophilic protein example:
MKALIVLGLVLLSVTVQGKVFERCELARTLKRLGMDGYRGISLANWMCLAKWESGYNTRATE

Non-acidophilic protein example:
MVLSEGEWQLVLHVWAKVEADVAGHGQDILIRLFKSHPETLEKFDRFKHLKTEAEMKASED