MathFeature

Feature Extraction Package for Biological Sequences Based on Mathematical Descriptors

Preprocessing

Before running any method in this package, run the preprocessing script to remove noise from the sequences (e.g., non-standard letters such as N, K, ...). To use this script, follow the example below:

Important: This package only accepts sequence files in Fasta format as input to the methods.

To run the tool (Example): $ python3.7 preprocessing/preprocessing.py -i input -o output


Where:

-h = help

-i = Input - Fasta format file, e.g., test.fasta

-o = output - Fasta format file, e.g., output.fasta

Running:

$ python3.7 preprocessing/preprocessing.py -i dataset.fasta -o preprocessing.fasta 
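
As a rough illustration of what a cleaning step of this kind might do, the Python sketch below drops every FASTA record whose sequence contains a letter outside A, C, G, T, U. This is an assumption made for illustration, not the code of preprocessing.py: the names `read_fasta` and `filter_clean` are hypothetical, and the real script may treat noisy sequences differently (for example, by removing individual letters instead of whole records).

```python
from typing import Iterable, Iterator, List, Tuple

# Assumed "clean" alphabet; the real script may use a different one.
VALID = set("ACGTU")


def read_fasta(lines: Iterable[str]) -> Iterator[Tuple[str, str]]:
    """Yield (header, sequence) pairs from FASTA-formatted lines."""
    header = None
    chunks: List[str] = []
    for raw in lines:
        line = raw.strip()
        if not line:
            continue
        if line.startswith(">"):
            # A new header closes the previous record, if any.
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:], []
        elif header is not None:
            chunks.append(line.upper())
    if header is not None:
        yield header, "".join(chunks)


def filter_clean(records: Iterable[Tuple[str, str]]) -> List[Tuple[str, str]]:
    """Keep only non-empty records made up entirely of letters in VALID."""
    return [(h, s) for h, s in records if s and set(s) <= VALID]


if __name__ == "__main__":
    demo = [">seq1", "ACGT", "ACGT", ">seq2", "ACNNKT", ">seq3", "augc"]
    for header, seq in filter_clean(read_fasta(demo)):
        print(f">{header}\n{seq}")
```

In the demo, seq2 is discarded because it contains N and K, while seq1 and seq3 are kept.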

Pseudo k-tuple nucleotide composition - PseKNC

To use this model, follow the example below:

To run the code (Example): $ python3.7 methods/PseKNC.py -h


Where:

-i = input - Fasta format file, e.g., test.fasta

-o = output - CSV format file, e.g., test.csv

-l = label - e.g., lncRNA, circRNA...

-x = prop - name of the file containing the list of properties to be used in the calculations

-xp = prop_values - name of the file containing the values of the properties to be used in the calculations

-seq = type of sequence - 1 = DNA, 2 = RNA

-t = type - 1 = Type 1 PseKNC, 2 = Type 2 PseKNC

-k = kind of oligonucleotide - 2 = dinucleotide, 3 = trinucleotide

-j = lambda - value of the lambda parameter in the PseKNC algorithm; must be smaller than the length of any query sequence, e.g., 1

-w = weight - value of the weight parameter in the PseKNC algorithm; can be any value in (0, 1], e.g., 1.0

-s = frequency - calculate only the frequency of each oligonucleotide in the input sequence, e.g., 2
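
To make the -k, -j and -s options more concrete, here is a minimal Python sketch of the oligonucleotide frequency that -s refers to, plus a check of the constraint on lambda. It is not the PseKNC algorithm itself, which also uses the property files and the weight parameter. The function names `kmer_frequencies` and `lambda_is_valid`, and the use of overlapping windows, are assumptions for illustration and are not taken from methods/PseKNC.py.

```python
from itertools import product
from typing import Dict, Iterable


def kmer_frequencies(seq: str, k: int, alphabet: str = "ACGT") -> Dict[str, float]:
    """Relative frequency of every k-mer over overlapping windows of seq."""
    seq = seq.upper()
    total = len(seq) - k + 1
    if total <= 0:
        raise ValueError("sequence is shorter than k")
    # Start from all len(alphabet)**k possible k-mers so absent ones get 0.0.
    counts = {"".join(p): 0 for p in product(alphabet, repeat=k)}
    for i in range(total):
        kmer = seq[i:i + k]
        if kmer not in counts:
            raise ValueError(f"unexpected letters in {kmer!r}")
        counts[kmer] += 1
    return {kmer: n / total for kmer, n in counts.items()}


def lambda_is_valid(lam: int, sequences: Iterable[str]) -> bool:
    """True if lam is smaller than the length of every query sequence."""
    return lam < min(len(s) for s in sequences)


if __name__ == "__main__":
    freqs = kmer_frequencies("ACGTAC", k=2)
    print(len(freqs), freqs["AC"])
    print(lambda_is_valid(1, ["ACGTAC", "ACG"]))
```

With k = 2 the result has 16 entries (one per dinucleotide); for an RNA input, pass `alphabet="ACGU"`.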

Running:

Example 1 - When seq = DNA and k = 2

$ python3.7 methods/PseKNC.py -i sequence.fasta -o sequence.csv -l 1 -x files/propNames-DNA-k2.txt -xp files/propValues-DNA-k2.txt -seq 1 -t 2 -k 2 -j 1 -w 1.0 -s 2

Example 2 - When seq = DNA and k = 3

$ python3.7 methods/PseKNC.py -i sequence.fasta -o sequence.csv -l 1 -x files/propNames-DNA-k3.txt -xp files/propValues-DNA-k3.txt -seq 1 -t 2 -k 3 -j 1 -w 1.0 -s 2

Example 3 - When seq = RNA and k = 2

$ python3.7 methods/PseKNC.py -i sequence.fasta -o sequence.csv -l 1 -x files/propNames-RNA-k2.txt -xp files/propValues-RNA-k2.txt -seq 2 -t 2 -k 2 -j 1 -w 1.0 -s 2
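
The three examples differ only in the property files and in the -seq and -k values. When running them from a script, a small helper can assemble the argument list. The sketch below is hypothetical (`build_pseknc_command` is not part of MathFeature): it covers only the three combinations shown above and assumes the script and property files sit at the relative paths used in those examples.

```python
from typing import List

# Numeric codes for -seq, as listed in the parameter description.
SEQ_CODES = {"DNA": "1", "RNA": "2"}
# Only the (sequence type, k) combinations that appear in the examples above.
SHOWN_COMBINATIONS = {("DNA", 2), ("DNA", 3), ("RNA", 2)}


def build_pseknc_command(fasta: str, csv: str, seq_type: str, k: int,
                         label: str = "1", pse_type: int = 2, lam: int = 1,
                         weight: float = 1.0, freq: int = 2,
                         python: str = "python3.7") -> List[str]:
    """Return the argument list for one PseKNC run."""
    if (seq_type, k) not in SHOWN_COMBINATIONS:
        raise ValueError(f"no example property files for {seq_type}, k={k}")
    return [
        python, "methods/PseKNC.py",
        "-i", fasta, "-o", csv, "-l", label,
        "-x", f"files/propNames-{seq_type}-k{k}.txt",
        "-xp", f"files/propValues-{seq_type}-k{k}.txt",
        "-seq", SEQ_CODES[seq_type],
        "-t", str(pse_type), "-k", str(k),
        "-j", str(lam), "-w", str(weight), "-s", str(freq),
    ]


if __name__ == "__main__":
    cmd = build_pseknc_command("sequence.fasta", "sequence.csv", "RNA", 2)
    print(" ".join(cmd))
    # To actually run it from the package root:
    #   import subprocess; subprocess.run(cmd, check=True)
```

Passing the command as a list avoids shell quoting problems if a file name contains spaces.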

Note: Input sequences for feature extraction must be in Fasta format.

Note: This example will generate a CSV file with the extracted features.