MathFeature

Feature Extraction Package for Biological Sequences Based on Mathematical Descriptors

Other techniques

MathFeature also provides other techniques known in the literature: k-mer (for proteins), amino acid composition (AAC), dipeptide composition (DPC), and tripeptide composition (TPC).
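For orientation, AAC, DPC and TPC are usually described as the relative frequencies of words of length 1, 2 and 3 over the 20 standard amino acids. The sketch below illustrates that general idea only; the function name `composition` and the normalisation by the number of counted windows are assumptions, not MathFeature's own code, and the package may normalise or order its columns differently.

```python
from collections import Counter
from itertools import product

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard residues


def composition(sequence, k):
    """Relative frequency of every length-k word over the 20 amino acids.

    k=1 corresponds to AAC, k=2 to DPC, k=3 to TPC (illustrative only).
    """
    sequence = sequence.upper()
    # All possible words of length k: 20, 400 or 8000 entries.
    words = ["".join(p) for p in product(AMINO_ACIDS, repeat=k)]
    # Count overlapping windows of length k.
    counts = Counter(sequence[i:i + k] for i in range(len(sequence) - k + 1))
    # Windows containing non-standard symbols are ignored in the total.
    total = sum(counts[w] for w in words)
    return {w: (counts[w] / total if total else 0.0) for w in words}


aac = composition("MKVLAAGIVGL", 1)  # 20 features
dpc = composition("MKVLAAGIVGL", 2)  # 400 features
```

Each call returns a fixed-length feature vector (as a dictionary), so sequences of different lengths map to the same number of columns.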

Important: This package only accepts sequence files in FASTA format as input to the methods.
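As a reminder of what a FASTA file looks like, each record starts with a `>` header line followed by one or more sequence lines. The minimal reader below is a sketch for checking your own input before running MathFeature; `read_fasta` and the example records are hypothetical and are not part of the package.

```python
def read_fasta(lines):
    """Yield (header, sequence) pairs from an iterable of FASTA lines."""
    header, chunks = None, []
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        if line.startswith(">"):
            # A new header closes the previous record, if any.
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:], []
        elif header is not None:
            chunks.append(line)  # sequence may span several lines
    if header is not None:
        yield header, "".join(chunks)


# Hypothetical two-record example, given inline instead of read from disk.
example = """>seq1 hypothetical protein
MKVLAAGIVGL
AAGG
>seq2
MSTNPKPQR
""".splitlines()

records = list(read_fasta(example))
```

To check a real file, pass an open file handle instead: `list(read_fasta(open("test.fasta")))`.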

Customizable k-mer, AAC, DPC, TPC

To use these techniques, follow the example below:

To run the code (Example): $ python3.7 methods/ExtractionTechniques-Protein.py -i input -o output -l label -t technique


Where:

-h = help

-i = Input - Fasta format file, e.g., test.fasta

-o = Output - CSV format file, e.g., test.csv

-l = Label - Dataset Label, e.g., lncRNA, mRNA, sncRNA, DNA, 0, 1

-t = Type of feature extraction - e.g., AAC or DPC or TPC or kmer or kstep

Running:

$ python3.7 methods/ExtractionTechniques-Protein.py -i protein.fasta -o dataset.csv -l DNA -t AAC

Note: kstep = customizable k-mer. You can configure the sliding window, the step, and the k-mer size.
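To make the role of the step concrete, the sketch below counts k-mers read every `step` positions along a sequence. It is an illustration under assumptions: the function name `kmer_counts` is hypothetical, and the sliding-window parameter is left out because this page does not describe how MathFeature combines it with the step.

```python
from collections import Counter


def kmer_counts(sequence, k, step=1):
    """Count k-mers whose start positions are `step` residues apart."""
    sequence = sequence.upper()
    # Start positions 0, step, 2*step, ... that still leave room for k residues.
    starts = range(0, len(sequence) - k + 1, step)
    return Counter(sequence[i:i + k] for i in starts)


overlapping = kmer_counts("MKVLAAGIVGL", k=2, step=1)      # every position
non_overlapping = kmer_counts("MKVLAAGIVGL", k=2, step=2)  # every second position
```

With `step=1` consecutive k-mers overlap by `k - 1` residues; with `step=k` they do not overlap at all.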

Note: Input sequences for feature extraction must be in FASTA format.

Note: Running the script writes the extracted features to the CSV file given with -o.