MathFeature

Feature Extraction Package for Biological Sequences Based on Mathematical Descriptors

Numerical Mapping - Protein

This method generates a numerical mapping of every sequence. We provide 4 mappings for proteins. Note that the method generates a vector with the size of the largest sequence. We developed a script that applies everything automatically, so it is necessary to pass all the classes/labels that will form the dataset. To use this method, follow the example below:
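As a rough illustration of what a numerical mapping with a fixed output size looks like, the sketch below assigns each amino acid an integer and pads shorter sequences with zeros up to the length of the longest one. The alphabetical integer assignment, the zero fill value, and the names `integer_mapping` and `pad_to_longest` are assumptions made for this example; they are not taken from `Mappings-Protein.py`.

```python
# Hypothetical sketch: integer mapping of protein sequences, padded to the longest one.
# The integer assignment (alphabetical order, starting at 1) is an assumption.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
INTEGER_MAP = {aa: i + 1 for i, aa in enumerate(AMINO_ACIDS)}


def integer_mapping(seq):
    """Map each residue to an integer; unknown symbols map to 0."""
    return [INTEGER_MAP.get(aa, 0) for aa in seq.upper()]


def pad_to_longest(vectors, fill=0):
    """Right-pad every vector with `fill` so all match the longest vector."""
    longest = max(len(v) for v in vectors)
    return [v + [fill] * (longest - len(v)) for v in vectors]


sequences = ["ACD", "MKVLA"]
features = pad_to_longest([integer_mapping(s) for s in sequences])
print(features)
```

Padding on the right keeps residue positions aligned across sequences, which is one plausible way to obtain equal-length feature vectors for a CSV output.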

To run the code (example):

$ python3.7 methods/Mappings-Protein.py -n <number of datasets/labels> -o <output> -r <representation>


Where:

-h = help

-n = number of datasets/labels

-o = output - CSV format file, e.g., test.csv

-r = representation/mappings, e.g., 1 = Accumulated Amino Acid Frequency, 3 = Kmer Frequency Mapping, 5 = Integer Mapping, 7 = EIIP Mapping
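To give a sense of what one of these representations might compute, here is a minimal sketch of an accumulated amino acid frequency mapping, assuming it is defined as the running count of each residue divided by its 1-based position in the sequence. That definition and the name `accumulated_frequency` are assumptions for illustration; the script's exact formula may differ.

```python
from collections import Counter


def accumulated_frequency(seq):
    """For each 1-based position i, emit (count of seq[i] seen so far) / i."""
    counts = Counter()
    mapping = []
    for position, residue in enumerate(seq.upper(), start=1):
        counts[residue] += 1
        mapping.append(counts[residue] / position)
    return mapping


result = accumulated_frequency("AACA")
print(result)
```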

Running:

$ python3.7 methods/Mappings-Protein.py -n 2 -o dataset.csv -r 2

Note: Input sequences for feature extraction must be in FASTA format.

Note: This example will generate a CSV file with the extracted features.
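For reference on the input side, a FASTA file holds one or more records, each with a `>` header line followed by one or more sequence lines. The sketch below parses that layout in plain Python; `read_fasta` and the sample records are made up for illustration and are not part of MathFeature.

```python
def read_fasta(lines):
    """Yield (header, sequence) pairs from an iterable of FASTA lines."""
    header, chunks = None, []
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        if line.startswith(">"):
            # A new header closes the previous record, if there was one.
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:], []
        else:
            chunks.append(line)
    if header is not None:
        yield header, "".join(chunks)


# Hypothetical input: two protein records, the first split over two lines.
example = """>seq1 hypothetical protein
MKVLAAGI
VKL
>seq2
ACDEFG
"""
records = list(read_fasta(example.splitlines()))
print(records)
```

Checking an input file this way before running the script can catch records with missing headers or empty sequences.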