Feature Extraction Package for Biological Sequences Based on Mathematical Descriptors
Preprocessing
Before running any method in this package, run the preprocessing script to remove noise from the sequences (e.g., other letters such as N, K, …). To use this script, follow the example below:
Important: The methods in this package accept only FASTA-format sequence files as input.
To run the tool (Example): $ python3.7 preprocessing/preprocessing.py -i input -o output
Where:
-h = help
-i = input - FASTA format file, e.g., test.fasta
-o = output - FASTA format file, e.g., output.fasta
Running:
$ python3.7 preprocessing/preprocessing.py -i dataset.fasta -o preprocessing.fasta
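To make the idea of "noise" concrete, here is a minimal sketch of one way such a filter could work. It is not the code in preprocessing/preprocessing.py. It assumes DNA input with the alphabet A, C, G, T, and it assumes that a sequence containing any other letter is dropped entirely; the real script may behave differently (for example, by handling RNA or proteins, or by treating ambiguous letters another way). The names `read_fasta`, `drop_noisy`, and `VALID` are hypothetical.

```python
import io

# Assumption: DNA alphabet only; the real script may accept other letters.
VALID = set("ACGT")


def read_fasta(handle):
    """Yield (header, sequence) pairs from a FASTA-formatted text handle."""
    header, chunks = None, []
    for line in handle:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            # A new header closes the previous record, if there was one.
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:], []
        else:
            chunks.append(line.upper())
    if header is not None:
        yield header, "".join(chunks)


def drop_noisy(records, alphabet=VALID):
    """Keep only non-empty records whose letters all belong to `alphabet`."""
    return [(h, s) for h, s in records if s and set(s) <= alphabet]


# Small in-memory FASTA file: seq2 contains N and seq3 contains K.
example = io.StringIO(">seq1\nACGT\nACGT\n>seq2\nACNNGT\n>seq3\nacgk\n")
clean = drop_noisy(read_fasta(example))
print(clean)  # → [('seq1', 'ACGTACGT')]
```

Only seq1 survives in this sketch, because the other two records contain letters outside the assumed alphabet.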
Accumulated Nucleotide Frequency
To use this method, follow the example below:
To run the code (Example): $ python3.7 methods/AccumulatedNucleotideFrequency.py -n number of datasets/labels -o output -r approach
Where:
-h = help
-n = number of datasets/labels
-o = output - CSV format file, e.g., test.csv
-r = approach, e.g., 1 = Accumulated Nucleotide Frequency, 2 = Accumulated Nucleotide Frequency with Fourier.
Running:
$ python3.7 methods/AccumulatedNucleotideFrequency.py -n 2 -o dataset.csv -r 1
Note: Input sequences for feature extraction must be in FASTA format.
Note: This example will generate a CSV file with the extracted features.
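For readers unfamiliar with the descriptor, the sketch below illustrates the mapping as it is commonly defined: the value at position i is the number of times the nucleotide at position i has occurred in the first i positions, divided by i. This is an illustration under that assumption, not the code in methods/AccumulatedNucleotideFrequency.py. The second function applies a plain discrete Fourier transform to the resulting signal and returns its power spectrum; which features the package derives from the spectrum when -r 2 is selected is not stated here, so that step is left out. The names `accumulated_nucleotide_frequency` and `power_spectrum` are hypothetical.

```python
import cmath


def accumulated_nucleotide_frequency(seq):
    """Map each position i (1-based) to count(seq[i] in seq[1..i]) / i."""
    counts = {}
    values = []
    for i, base in enumerate(seq.upper(), start=1):
        counts[base] = counts.get(base, 0) + 1
        values.append(counts[base] / i)
    return values


def power_spectrum(signal):
    """Naive O(n^2) discrete Fourier transform, returned as |X[k]|^2."""
    n = len(signal)
    spectrum = []
    for k in range(n):
        coeff = sum(
            x * cmath.exp(-2j * cmath.pi * k * t / n)
            for t, x in enumerate(signal)
        )
        spectrum.append(abs(coeff) ** 2)
    return spectrum


anf = accumulated_nucleotide_frequency("ACGTA")
print([round(v, 4) for v in anf])  # → [1.0, 0.5, 0.3333, 0.25, 0.4]

# Assumed shape of the Fourier variant: transform the ANF signal.
spectrum = power_spectrum(anf)
```

The pure-Python transform keeps the sketch free of dependencies; for real sequence lengths a fast Fourier transform routine from a numerical library would be the practical choice.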