

MathFeature

Feature Extraction Package for Biological Sequences Based on Mathematical Descriptors


Preprocessing

Before running any method in this package, run the preprocessing script to remove noise from the sequences (e.g., other letters such as N, K …). To use this script, follow the example below:

Important: The methods in this package accept only sequence files in FASTA format as input.

To run the tool (example):

$ python3.7 preprocessing/preprocessing.py -i input -o output


Where:

-h = help

-i = Input - Fasta format file, e.g., test.fasta

-o = output - Fasta format file, e.g., output.fasta

Running:

$ python3.7 preprocessing/preprocessing.py -i dataset.fasta -o preprocessing.fasta 
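The kind of filtering described above can be sketched in a few lines of Python. This is only an illustration, not the code in preprocessing/preprocessing.py: it assumes that "noise" means any sequence containing a letter outside A, C, G, T, U and that such sequences are dropped whole. The names VALID, read_fasta and drop_noisy are hypothetical.

```python
from typing import Iterable, Iterator, List, Tuple

# Assumption: only these letters count as clean nucleotide symbols.
VALID = set("ACGTU")


def read_fasta(lines: Iterable[str]) -> Iterator[Tuple[str, str]]:
    """Yield (header, sequence) pairs from the lines of a FASTA file."""
    header, chunks = None, []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:], []
        elif header is not None:
            # Sequences may span several lines; upper-case them for comparison.
            chunks.append(line.upper())
    if header is not None:
        yield header, "".join(chunks)


def drop_noisy(records: Iterable[Tuple[str, str]]) -> List[Tuple[str, str]]:
    """Keep only non-empty records whose letters are all in VALID."""
    return [(h, s) for h, s in records if s and set(s) <= VALID]


example = """>seq1
ACGTACGT
>seq2
ACGNNKGT
>seq3
acgu
""".splitlines()

clean = drop_noisy(read_fasta(example))
for header, seq in clean:
    print(f">{header}\n{seq}")
```

In this toy input, seq2 is discarded because it contains N and K, while seq1 and seq3 are kept.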

Customizable k-mer, NAC, DNC, TNC

MathFeature also provides other techniques known in the literature: k-mer, nucleic acid composition (NAC), di-nucleotide composition (DNC), and tri-nucleotide composition (TNC). To use these techniques, follow the example below:

To run the code (example):

$ python3.7 methods/ExtractionTechniques.py -i input -o output -l label -t technique -seq DNA/RNA


Where:

-h = help

-i = Input - Fasta format file, e.g., test.fasta

-o = output - CSV format file, e.g., test.csv

-l = Label - Dataset Label, e.g., lncRNA, mRNA, sncRNA, DNA, 0, 1

-t = Type of feature extraction - e.g., NAC, DNC, TNC, kmer, or kstep

-seq = Type of sequence - 1 = DNA, 2 = RNA

Running:

$ python3.7 methods/ExtractionTechniques.py -i sequence.fasta -o dataset.csv -l DNA -t NAC -seq 1

Note: kstep = customizable k-mer. You can configure the sliding window, the step, and the k-mer size.

Note: Input sequences for feature extraction must be in FASTA format.

Note: This example will generate a CSV file with the extracted features.
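To make the descriptors concrete, here is a minimal sketch of how NAC, DNC, TNC and a stepped k-mer count relate to one another. It is an illustration, not the implementation in methods/ExtractionTechniques.py: the function name kmer_frequencies, its parameters, and the choice to report normalized frequencies are all assumptions.

```python
from collections import Counter
from itertools import product
from typing import Dict


def kmer_frequencies(seq: str, k: int, step: int = 1,
                     alphabet: str = "ACGT") -> Dict[str, float]:
    """Relative frequency of every possible k-mer over the given alphabet.

    A window of width k slides over the sequence, advancing `step`
    positions at a time (hypothetical parameters for illustration).
    """
    seq = seq.upper()
    windows = [seq[i:i + k] for i in range(0, len(seq) - k + 1, step)]
    counts = Counter(windows)
    total = len(windows)
    features = {}
    for letters in product(alphabet, repeat=k):
        kmer = "".join(letters)
        features[kmer] = counts[kmer] / total if total else 0.0
    return features


sequence = "ACGTAC"
nac = kmer_frequencies(sequence, 1)   # 4 features: A, C, G, T
dnc = kmer_frequencies(sequence, 2)   # 16 features: AA, AC, ..., TT
tnc = kmer_frequencies(sequence, 3)   # 64 features: AAA, ..., TTT
stepped = kmer_frequencies(sequence, 2, step=2)  # non-overlapping windows

print(nac)
print(dnc["AC"], tnc["ACG"], stepped["AC"])
```

Under these assumptions, NAC, DNC and TNC are simply the k = 1, 2 and 3 cases of the same sliding-window count. For an RNA input, the alphabet would be "ACGU" instead.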