MathFeature

Feature Extraction Package for Biological Sequences Based on Mathematical Descriptors

Shannon Entropy

Fundamentally, we applied entropy because it yields a single value that quantifies the information contained in different observation periods (in our case, k-mers; see the pipeline in our article). To use this model, follow the example below:

To run the tool (Example): $ python3.7 methods/EntropyClass.py -i input -o output -l label -k kmer -e Entropy


Where:

-h = help

-i = Input - Fasta format file, e.g., test.fasta

-o = Output - CSV format file, e.g., test.csv

-l = Label - Dataset Label, e.g., lncRNA, mRNA, sncRNA

-k = kmer - Range of k-mer, e.g., 1 for 1-mer (1) or 2 for 1-mer and 2-mer (1, 2)

-e = Type of Entropy, e.g., Shannon

Running:

$ python3.7 methods/EntropyClass.py -i sequences.fasta -o sequences.csv -l mRNA -k 10 -e Shannon
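As a rough illustration of what the Shannon option computes, the sketch below counts overlapping k-mers with a sliding window, converts the counts to probabilities, and applies H = -sum(p * log2(p)) for every k from 1 up to the value passed as -k. The function names (kmer_probabilities, shannon_entropy, shannon_features) are hypothetical and are not taken from methods/EntropyClass.py; the base-2 logarithm and the overlapping window are assumptions.

```python
import math
from collections import Counter


def kmer_probabilities(sequence, k):
    """Relative frequency of each overlapping k-mer in the sequence."""
    kmers = [sequence[i:i + k] for i in range(len(sequence) - k + 1)]
    total = len(kmers)
    return [count / total for count in Counter(kmers).values()]


def shannon_entropy(sequence, k):
    """Shannon entropy (in bits) of the k-mer distribution; 0.0 if no k-mer fits."""
    probs = kmer_probabilities(sequence, k)
    if not probs:
        return 0.0
    # log2(1 / p) equals -log2(p); this form avoids returning -0.0.
    return sum(p * math.log2(1.0 / p) for p in probs)


def shannon_features(sequence, max_k):
    """One entropy value per k in 1..max_k (assumed meaning of the -k option)."""
    return [shannon_entropy(sequence, k) for k in range(1, max_k + 1)]


if __name__ == "__main__":
    # Four equally likely 1-mers give the maximum of 2 bits for k = 1.
    print(shannon_features("ACGT", 2))
```

Under this reading, -k 10 would yield ten values per sequence, one for each k-mer size.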

Tsallis Entropy

To run the feature extraction tool (Example): $ python3.7 methods/TsallisEntropy.py -i input -o output -l label -k kmer -q entropic_parameter

Where:

-h = help

-i = Input - Fasta format file, e.g., sequences.fasta

-o = Output - CSV format file, e.g., dataset.csv

-l = Label - Dataset Label, e.g., lncRNA, mRNA, sncRNA

-k = kmer - Range of k-mer, e.g., 1 for 1-mer (1) or 2 for 1-mer and 2-mer (1, 2) ...

-q = Tsallis - Entropic parameter q, e.g., 2.3

Running:

$ python3.7 methods/TsallisEntropy.py -i sequences.fasta -o sequences.csv -l mRNA -k 10 -q 2.3
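To show how the entropic parameter q enters the calculation, here is a minimal sketch using the standard Tsallis definition S_q = (1 - sum(p^q)) / (q - 1) over k-mer probabilities. The function names are hypothetical and the code is not taken from methods/TsallisEntropy.py; the fallback to natural-log Shannon entropy at q = 1 (the limit of the formula) is an assumption added to avoid division by zero.

```python
import math
from collections import Counter


def kmer_probabilities(sequence, k):
    """Relative frequency of each overlapping k-mer in the sequence."""
    kmers = [sequence[i:i + k] for i in range(len(sequence) - k + 1)]
    total = len(kmers)
    return [count / total for count in Counter(kmers).values()]


def tsallis_entropy(sequence, k, q):
    """Tsallis entropy of the k-mer distribution for entropic parameter q."""
    probs = kmer_probabilities(sequence, k)
    if not probs:
        return 0.0
    if math.isclose(q, 1.0):
        # Limit q -> 1 of the Tsallis formula: Shannon entropy with natural log.
        return sum(p * math.log(1.0 / p) for p in probs)
    return (1.0 - sum(p ** q for p in probs)) / (q - 1.0)


def tsallis_features(sequence, max_k, q):
    """One entropy value per k in 1..max_k (assumed meaning of the -k option)."""
    return [tsallis_entropy(sequence, k, q) for k in range(1, max_k + 1)]


if __name__ == "__main__":
    print(tsallis_features("ACGTACGTTTGA", 3, q=2.3))
```

Larger q values weight frequent k-mers more heavily, while values below 1 emphasize rare ones, so the choice of q changes which part of the distribution the feature reflects.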

Note: Input sequences for feature extraction must be in FASTA format.

Note: This example will generate a CSV file with the extracted features.
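The FASTA-in, CSV-out flow described in the notes can be sketched with the standard library alone. Everything here is illustrative: the helper names (read_fasta, shannon, write_features) and the column layout (nameseq, one column per k, label) are assumptions, not the actual output format of the MathFeature scripts.

```python
import csv
import io
import math
from collections import Counter


def read_fasta(handle):
    """Yield (name, sequence) pairs from a FASTA-formatted text handle."""
    name, chunks = None, []
    for line in handle:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if name is not None:
                yield name, "".join(chunks)
            name, chunks = line[1:], []
        else:
            chunks.append(line.upper())
    if name is not None:
        yield name, "".join(chunks)


def shannon(sequence, k):
    """Shannon entropy (bits) of overlapping k-mers; 0 if the sequence is too short."""
    kmers = [sequence[i:i + k] for i in range(len(sequence) - k + 1)]
    total = len(kmers)
    return sum((c / total) * math.log2(total / c) for c in Counter(kmers).values())


def write_features(fasta_handle, csv_handle, label, max_k):
    """Write one CSV row per sequence: name, entropy for k = 1..max_k, label."""
    writer = csv.writer(csv_handle)
    ks = range(1, max_k + 1)
    writer.writerow(["nameseq"] + [f"k{k}" for k in ks] + ["label"])  # hypothetical header
    for name, seq in read_fasta(fasta_handle):
        writer.writerow([name] + [shannon(seq, k) for k in ks] + [label])


if __name__ == "__main__":
    # In-memory stand-ins for sequences.fasta and sequences.csv.
    fasta = io.StringIO(">seq1\nACGT\n>seq2\nAACC\n")
    out = io.StringIO()
    write_features(fasta, out, label="mRNA", max_k=2)
    print(out.getvalue())
```

Swapping the io.StringIO objects for open("sequences.fasta") and open("sequences.csv", "w", newline="") would run the same sketch against real files.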