MathFeature

Feature Extraction Package for Biological Sequences Based on Mathematical Descriptors


Preprocessing

Before running any method in this package, run the preprocessing script to remove noise from the sequences (e.g., unwanted letters such as N, K …). To use this script, follow the example below:

Important: the methods in this package accept only sequence files in FASTA format as input.

To run the tool (Example): $ python3.7 preprocessing/preprocessing.py -i input -o output

Where:

-h = help

-i = input, a FASTA file, e.g., test.fasta

-o = output, a FASTA file, e.g., output.fasta

Running:

$ python3.7 preprocessing/preprocessing.py -i dataset.fasta -o preprocessing.fasta 
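The cleaning step can be illustrated with a minimal Python sketch. It assumes that "noise" means any character other than A, C, G, T and U, and that such characters are deleted from each sequence; the actual preprocessing.py may instead discard whole sequences or use a different alphabet. The function names below (`parse_fasta`, `clean_sequence`, `preprocess`) are hypothetical and not part of MathFeature.

```python
import re


def parse_fasta(text):
    """Return a list of (header, sequence) pairs from FASTA-formatted text."""
    records = []
    header, chunks = None, []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            # A new header closes the previous record.
            if header is not None:
                records.append((header, "".join(chunks)))
            header, chunks = line[1:], []
        else:
            chunks.append(line)
    if header is not None:
        records.append((header, "".join(chunks)))
    return records


def clean_sequence(seq, alphabet="ACGTU"):
    """Upper-case the sequence and drop every character outside `alphabet`.

    Assumption: noise characters are removed, not replaced.
    """
    return re.sub(f"[^{alphabet}]", "", seq.upper())


def preprocess(text):
    """Clean every record and serialize the result back to FASTA text."""
    return "".join(
        f">{header}\n{clean_sequence(seq)}\n" for header, seq in parse_fasta(text)
    )


if __name__ == "__main__":
    print(preprocess(">seq1\nACGTNNacgtK\n>seq2\nAC GT\n"), end="")
```

Upper-casing before filtering keeps soft-masked (lower-case) bases instead of discarding them.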

Customizable k-mer, NAC, DNC, TNC

MathFeature also provides other well-known techniques from the literature: k-mer, nucleic acid composition (NAC), dinucleotide composition (DNC), and trinucleotide composition (TNC). To use these techniques, follow the example below:

To run the code (Example): $ python3.7 methods/ExtractionTechniques.py -i input -o output -l label -t technique -seq DNA/RNA

Where:

-h = help

-i = input, a FASTA file, e.g., test.fasta

-o = output, a CSV file, e.g., test.csv

-l = label, the class label of the dataset, e.g., lncRNA, mRNA, sncRNA, DNA, 0, 1

-t = type of feature extraction, e.g., NAC, DNC, TNC, kmer or kstep

-seq = type of sequence: 1 = DNA, 2 = RNA

Running:

$ python3.7 methods/ExtractionTechniques.py -i sequence.fasta -o dataset.csv -l DNA -t NAC -seq 1
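NAC, DNC and TNC are k-mer compositions with k = 1, 2 and 3. The Python sketch below assumes each feature is the relative frequency of one k-mer (its count over overlapping windows divided by the number of windows) and that each CSV row holds a sequence name, the 4^k frequencies in lexicographic order, and the label. The column names and helper functions are hypothetical, so check a generated dataset.csv for the real layout; `seq_type` mirrors the -seq option (1 = DNA, 2 = RNA).

```python
import csv
import io
from itertools import product

# Mirrors the -seq option: 1 = DNA, 2 = RNA.
ALPHABETS = {1: "ACGT", 2: "ACGU"}


def all_kmers(k, alphabet):
    """Every possible k-mer over `alphabet`, in lexicographic order."""
    return ["".join(p) for p in product(alphabet, repeat=k)]


def kmer_frequencies(seq, k, alphabet="ACGT"):
    """Relative frequency of each k-mer (k=1: NAC, k=2: DNC, k=3: TNC).

    Overlapping windows (step 1) are counted; each count is divided by the
    number of windows, len(seq) - k + 1.
    """
    kmers = all_kmers(k, alphabet)
    counts = dict.fromkeys(kmers, 0)
    total = len(seq) - k + 1
    if total <= 0:
        return {km: 0.0 for km in kmers}
    for i in range(total):
        window = seq[i:i + k]
        # Windows containing other characters are skipped but still count toward `total`.
        if window in counts:
            counts[window] += 1
    return {km: counts[km] / total for km in kmers}


def extract(records, k, label, seq_type=1):
    """Build CSV rows: a header, then one row per (name, sequence) record."""
    alphabet = ALPHABETS[seq_type]
    kmers = all_kmers(k, alphabet)
    rows = [["nameseq"] + kmers + ["label"]]  # hypothetical column layout
    for name, seq in records:
        freqs = kmer_frequencies(seq, k, alphabet)
        rows.append([name] + [freqs[km] for km in kmers] + [label])
    return rows


def to_csv_text(rows):
    """Serialize rows as CSV text (use newline="" when writing to a file)."""
    buf = io.StringIO()
    csv.writer(buf).writerows(rows)
    return buf.getvalue()


if __name__ == "__main__":
    rows = extract([("seq1", "ACGT")], k=1, label="DNA", seq_type=1)
    print(to_csv_text(rows), end="")
```

Enumerating all 4^k k-mers up front keeps the columns identical across sequences, even when a k-mer never occurs.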

Note: kstep is the customizable k-mer technique; you can configure the sliding window, the step, and the k-mer length.

Note: input sequences for feature extraction must be in FASTA format.

Note: the example command above generates a CSV file containing the extracted features.
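The kstep options can be read in more than one way. The Python sketch below assumes one reading: windows of length k are advanced `step` positions at a time, and the relative frequencies for every k from 1 up to a chosen maximum are concatenated into a single feature vector. Treat `stepped_kmers` and `kstep_features` as hypothetical helpers, not MathFeature's implementation.

```python
from collections import Counter
from itertools import product


def stepped_kmers(seq, k, step):
    """Yield windows of length k, moving the start `step` positions each time."""
    if k <= 0 or step <= 0:
        raise ValueError("k and step must be positive integers")
    for start in range(0, len(seq) - k + 1, step):
        yield seq[start:start + k]


def kstep_features(seq, max_k, step, alphabet="ACGT"):
    """Relative k-mer frequencies for every k in 1..max_k, concatenated.

    Assumption: for each k, counts are divided by the number of windows taken.
    """
    features = {}
    for k in range(1, max_k + 1):
        windows = list(stepped_kmers(seq, k, step))
        counts = Counter(windows)
        for kmer in ("".join(p) for p in product(alphabet, repeat=k)):
            features[kmer] = counts[kmer] / len(windows) if windows else 0.0
    return features


if __name__ == "__main__":
    feats = kstep_features("ACGTAC", max_k=2, step=2)
    print(feats["A"], feats["AC"])
```

With step = 1 and a single k, this reduces to ordinary k-mer frequencies; larger steps skip positions, so fewer windows are counted.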