Python port of TransDecoder - identify candidate coding regions within transcript sequences.
cd PyTransDecoder
pip install -e .
Use pyTransdecoder as the primary interface when you want the standard end-to-end pipeline:
pyTransdecoder -t transcripts.fasta
This runs LongOrfs followed by Predict and writes the usual *.transdecoder.{gff3,bed,pep,cds} outputs.
You can also have the pipeline run homology support searches directly:
pyTransdecoder -t transcripts.fasta \
--blast-search-pep uniprot_sprot.pep \
--pfam-search-db Pfam-A.hmm
If you want to run the phases separately, use the subcommand CLI:
pytransdecoder longorfs -t transcripts.fasta
This creates a directory transcripts.fasta.transdecoder_dir/ with:
- longest_orfs.pep - Protein sequences
- longest_orfs.cds - CDS sequences
- longest_orfs.gff3 - ORF annotations
- base_freqs.dat - Nucleotide frequencies
-t, --transcripts PATH        Input transcripts FASTA file [required]
-m, --min-protein-length INT  Minimum protein length (default: 100 aa)
-G, --genetic-code TEXT       Genetic code (default: Standard)
-S, --strand-specific         Only analyze top strand
-O, --output-dir PATH         Output directory
--gene-trans-map PATH         Gene-to-transcript mapping file
--complete-orfs-only          Only output complete ORFs
-v, --verbose                 Verbose output
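To make the effect of -m and -S concrete, here is a minimal sketch of an ORF scan. It is not PyTransDecoder's actual implementation: it assumes the standard genetic code's stop codons, only reports complete ATG-to-stop ORFs (partial ORFs are ignored), and the names find_orfs, min_protein_length and strand_specific are hypothetical, chosen to mirror the options above.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}  # assumption: standard genetic code only
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the reverse complement of an uppercase DNA string."""
    return seq.translate(COMPLEMENT)[::-1]


def find_orfs(seq, min_protein_length=100, strand_specific=False):
    """Return (strand, start, end, length_aa) for complete ATG..stop ORFs.

    Coordinates are 0-based and end-exclusive on the scanned strand;
    `end` includes the stop codon, `length_aa` does not count it.
    """
    seq = seq.upper()
    strands = [("+", seq)]
    if not strand_specific:
        # Without -S, the bottom strand is scanned as well.
        strands.append(("-", reverse_complement(seq)))

    orfs = []
    for strand, s in strands:
        for frame in range(3):
            start = None
            for i in range(frame, len(s) - 2, 3):
                codon = s[i:i + 3]
                if start is None and codon == "ATG":
                    start = i  # first start codon opens the ORF
                elif start is not None and codon in STOP_CODONS:
                    length_aa = (i - start) // 3
                    if length_aa >= min_protein_length:
                        orfs.append((strand, start, i + 3, length_aa))
                    start = None  # look for the next ORF in this frame
    return orfs
```

With strand_specific=True only the top strand is scanned, which is the behavior -S describes; raising min_protein_length filters out short ORFs in the same way -m does.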
PyTransDecoder's CLI accepts both underscore and dash formats for option names, ensuring compatibility with existing Perl TransDecoder workflows:
# Both of these work identically:
pytransdecoder predict -t transcripts.fasta --retain-pfam-hits pfam.domtblout
pytransdecoder predict -t transcripts.fasta --retain_pfam_hits pfam.domtblout
This means you can use existing scripts and Makefiles without modification.
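One way this kind of compatibility can be achieved is to normalize option names before argument parsing. The sketch below illustrates the idea only; normalize_option_names is a hypothetical helper and PyTransDecoder may handle this differently internally.

```python
def normalize_option_names(argv):
    """Rewrite underscores to dashes in long option names only.

    Option values (for example file paths containing underscores) are left
    untouched, as is everything after a bare "--".
    """
    normalized = []
    passthrough = False
    for arg in argv:
        if passthrough or not arg.startswith("--"):
            # Positional arguments, short options and option values.
            normalized.append(arg)
        elif arg == "--":
            # Conventional end-of-options marker.
            passthrough = True
            normalized.append(arg)
        else:
            # Handle both "--name value" and "--name=value" forms.
            name, sep, value = arg.partition("=")
            normalized.append(name.replace("_", "-") + sep + value)
    return normalized
```

Only the option name is rewritten, so a file such as my_hits.domtblout keeps its underscore.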
Legacy compatibility wrappers for TransDecoder.LongOrfs and TransDecoder.Predict now live under util/ instead of the repository root.
Supported genetic codes (set with -G, --genetic-code) include:
- universal/standard
- vertebrate_mitochondrial
- yeast_mitochondrial
- invertebrate_mitochondrial
- ciliate/tetrahymena/dasycladacean
- euplotid
- bacterial
- candida
- And 15+ more...
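The genetic codes listed above differ from the standard code in only a few codons, which is why the choice matters for ORF calling. The sketch below shows this with a small pure-Python translator. The codon reassignments follow the NCBI translation tables (table 2 for vertebrate mitochondrial, table 6 for ciliate); the dictionary keys and the translate function are illustrative and are not PyTransDecoder's own API.

```python
from itertools import product

BASES = "TCAG"
# Standard code amino acids in NCBI order (TTT, TTC, TTA, TTG, TCT, ...).
STANDARD_AAS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
STANDARD = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), STANDARD_AAS)
}

# Codons whose meaning differs from the standard code.
OVERRIDES = {
    "standard": {},
    "vertebrate_mitochondrial": {"AGA": "*", "AGG": "*", "ATA": "M", "TGA": "W"},
    "ciliate": {"TAA": "Q", "TAG": "Q"},
}


def translate(seq, code="standard"):
    """Translate a DNA string in frame 0; unknown codons become 'X'."""
    table = {**STANDARD, **OVERRIDES[code]}
    seq = seq.upper()
    return "".join(
        table.get(seq[i:i + 3], "X") for i in range(0, len(seq) - 2, 3)
    )
```

For example, the sequence ATGTGATAA contains two stops under the standard code, one under the vertebrate mitochondrial code (TGA reads as tryptophan), and one under the ciliate code (TAA reads as glutamine), so the same transcript can yield ORFs of different lengths depending on -G.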
pytransdecoder predict -t transcripts.fasta
This creates final output files in the same directory as your transcripts:
- transcripts.fasta.transdecoder.gff3 - Gene predictions
- transcripts.fasta.transdecoder.bed - BED format
- transcripts.fasta.transdecoder.pep - Protein sequences
- transcripts.fasta.transdecoder.cds - CDS sequences
# Run BLASTP against UniProt
blastp -query transcripts.fasta.transdecoder_dir/longest_orfs.pep \
-db uniprot_sprot.pep -max_target_seqs 1 \
-outfmt 6 -evalue 1e-5 -num_threads 4 > blastp.outfmt6
# Run Pfam domain search
hmmscan --cpu 4 --domtblout pfam.domtblout Pfam-A.hmm \
transcripts.fasta.transdecoder_dir/longest_orfs.pep
# Run predict with homology data
pytransdecoder predict -t transcripts.fasta \
--retain-blastp-hits blastp.outfmt6 \
--retain-pfam-hits pfam.domtblout
For the full pipeline entrypoint, you can skip the separate Pfam step and let
pyTransdecoder prepare and search the HMM database for you:
pyTransdecoder -t transcripts.fasta --pfam-search-db Pfam-A.hmm
-t, --transcripts PATH     Input transcripts FASTA file [required]
-O, --output-dir PATH      Output directory
-T, --top-orfs-train INT   Training ORFs (default: 500)
--retain-long-orfs-mode    'dynamic' or 'strict' (default: dynamic)
--retain-pfam-hits PATH    Pfam domain hits (domtblout format)
--retain-blastp-hits PATH  BLAST hits (outfmt 6 format)
--single-best-only         Only best ORF per transcript
--no-refine-starts         Skip start codon refinement
-G, --genetic-code TEXT    Genetic code (default: Standard)
-v, --verbose              Verbose output
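The file passed to --retain-blastp-hits is plain BLAST tabular output (outfmt 6), whose twelve default columns are qseqid, sseqid, pident, length, mismatch, gapopen, qstart, qend, sstart, send, evalue and bitscore. The sketch below shows how the set of ORFs with homology support could be collected from such a file. It is an illustration rather than PyTransDecoder's own parser; orfs_with_blast_hits is a hypothetical name and the example rows are made up.

```python
import csv
import io

# Made-up example rows in outfmt 6 layout (tab-separated, 12 columns).
EXAMPLE_OUTFMT6 = (
    "tx1.p1\tsubjectA\t98.0\t120\t2\t0\t1\t120\t5\t124\t1e-50\t230\n"
    "tx2.p1\tsubjectB\t30.0\t40\t28\t0\t1\t40\t10\t49\t0.5\t25\n"
)


def orfs_with_blast_hits(handle, max_evalue=1e-5):
    """Return the set of query IDs with at least one hit at or below max_evalue."""
    hits = set()
    for row in csv.reader(handle, delimiter="\t"):
        # Skip blank lines, comment lines and truncated rows.
        if not row or row[0].startswith("#") or len(row) < 12:
            continue
        if float(row[10]) <= max_evalue:  # column 11 is the e-value
            hits.add(row[0])  # column 1 is the query (ORF) ID
    return hits


if __name__ == "__main__":
    print(sorted(orfs_with_blast_hits(io.StringIO(EXAMPLE_OUTFMT6))))
```

To use it on a real file, pass an open file handle instead of the io.StringIO wrapper, for example open("blastp.outfmt6").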
- Training: Selects top longest unique ORFs to train hexamer scoring model
- Scoring: Scores all ORFs using Markov chain model (hexamer composition)
- Selection: Chooses best ORFs based on:
  - Homology support (BLAST/Pfam hits)
  - Coding potential score
  - ORF length
  - GC content-based thresholds
- Output: Generates final predictions in multiple formats
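The training and scoring steps above can be sketched as a log-likelihood ratio between a coding model and a background model. The code below is a simplified illustration under stated assumptions, not the scoring code used by PyTransDecoder: it trains a frame-aware fifth-order Markov model (each base conditioned on the previous five, so every observation is a hexamer) on a set of training CDS sequences, and compares it with single-base background frequencies. The function names, the pseudocount, and the exact form of the model are assumptions.

```python
import math
from collections import Counter


def train_model(training_cds, order=5, pseudocount=1.0):
    """Count frame-specific (context, next base) pairs and base frequencies."""
    kmer_counts = Counter()
    context_counts = Counter()
    base_counts = Counter()
    for cds in training_cds:
        cds = cds.upper()
        base_counts.update(b for b in cds if b in "ACGT")
        for i in range(order, len(cds)):
            context, base = cds[i - order:i], cds[i]
            frame = i % 3  # codon position of the predicted base
            kmer_counts[(frame, context, base)] += 1
            context_counts[(frame, context)] += 1
    total = sum(base_counts.values())
    background = {
        b: (base_counts[b] + pseudocount) / (total + 4 * pseudocount)
        for b in "ACGT"
    }
    return kmer_counts, context_counts, background, order, pseudocount


def score_orf(cds, model):
    """Sum of log(P(base | context, frame) / P_background(base)) over the ORF."""
    kmer_counts, context_counts, background, order, pseudocount = model
    cds = cds.upper()
    score = 0.0
    for i in range(order, len(cds)):
        context, base = cds[i - order:i], cds[i]
        if base not in background:
            continue  # skip ambiguous bases such as N
        frame = i % 3
        p_coding = (kmer_counts[(frame, context, base)] + pseudocount) / (
            context_counts[(frame, context)] + 4 * pseudocount
        )
        score += math.log(p_coding / background[base])
    return score
```

An ORF whose hexamer composition resembles the training set receives a higher score than one that does not, which is the signal the selection step combines with homology support and ORF length.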
- ✅ Phase 1 (LongOrfs): Implemented and validated
- ✅ Phase 2 (Predict): Implemented and tested
- ⏳ Performance benchmarking on large datasets
To validate that the output matches the Perl version:
# Run Perl version
cd ../TransDecoder
./TransDecoder.LongOrfs -t test.fasta
# Run Python version
cd ../PyTransDecoder
pytransdecoder longorfs -t test.fasta
# Compare outputs
diff ../TransDecoder/test.fasta.transdecoder_dir/longest_orfs.pep \
test.fasta.transdecoder_dir/longest_orfs.pep
- Python 3.8+
- BioPython >= 1.81
- tqdm >= 4.65
Note: Ported by Claude.io Sonnet 4.5 under guidance by bhaas. Jan 24, 2026.