MathFeature | Feature Extraction Package for Biological Sequences

Feature Extraction Package for Biological Sequences Based on Mathematical Descriptors

Home • Key Features • List of files • Dependencies • Installing • How To Use • Citation

Script: Concatenate datasets/descriptors

Concatenate datasets/descriptors. To use this script, follow the example below:

Important: CSV format as input to the method.

To run the tool (Example): $ python3.7 preprocessing/concatenate.py -n number of datasets/files -o output

Where:

-h = help

-n = Input - number of datasets/files, e.g., orf.csv, entropy.csv

-o = output - Fasta format file, e.g., output.csv

Running:

$ python3.7 preprocessing/concatenate.py -n 2 -o hybrid_dataset.csv 

Script: Preprocessing

To eliminate any noise from the sequences (e.g., other letters as: N, K …,). To use this script, follow the example below:

Important: This package only accepts sequence files in Fasta format as input to the methods.

To run the tool (Example): $ python3.7 preprocessing/preprocessing.py -i input -o output

Where:

-h = help

-i = Input - Fasta format file, e.g., test.fasta

-o = output - Fasta format file, e.g., output.fasta

Running:

$ python3.7 preprocessing/preprocessing.py -i sequences.fasta -o preprocessing.fasta 

Script: Sampling

Sample sequences, select a quantity. To use this script, follow the example below:

Important: This package only accepts sequence files in Fasta format as input to the methods.

To run the tool (Example): $ python3.7 preprocessing/sampling.py -i input -o output -p samples

Where:

-h = help

-i = Input - Fasta format file, e.g., test.fasta

-o = output - Fasta format file, e.g., output.fasta

-p = samples - Number of samples

Running:

$ python3.7 preprocessing/sampling.py -i sequences.fasta -o sampling.fasta -p 1000

Script: Sequences Count

Returns the number of sequences/samples. To use this script, follow the example below:

Important: This package only accepts sequence files in Fasta format as input to the methods.

To run the tool (Example): $ python3.7 preprocessing/count_sequences.py -i input

Where:

-h = help

-i = Input - Fasta format file, e.g., test.fasta

Running:

$ python3.7 preprocessing/count_sequences.py -i sequences.fasta

Script: Bases/Nucleotide Count by Sequences

Bases/Nucleotide Count by Sequence. To use this script, follow the example below:

Important: This package only accepts sequence files in Fasta format as input to the methods.

To run the tool (Example): $ python3.7 preprocessing/count_bases.py -i input

Where:

-h = help

-i = Input - Fasta format file, e.g., test.fasta

Running:

$ python3.7 preprocessing/count_bases.py -i sequences.fasta

Script: Redundancy 1

Removes repeated sequences in a file. To use this script, follow the example below:

Important: This package only accepts sequence files in Fasta format as input to the methods.

To run the tool (Example): $ python3.7 preprocessing/remove_redundancy.py -i input

Where:

-h = help

-i = Input - Fasta format file, e.g., test.fasta

Running:

$ python3.7 preprocessing/remove_redundancy.py -i sequences.fasta

Script: Redundancy 2

Removes repeated sequences between two files. To use this script, follow the example below:

Important: This package only accepts sequence files in Fasta format as input to the methods.

To run the tool (Example): $ python3.7 preprocessing/remove_equal_sequences.py -i input1 -t input2

Where:

-h = help

-i = Input 1 - Fasta format file, e.g., test.fasta

-t = Input 2 - Fasta format file, e.g., test.fasta

Running:

$ python3.7 preprocessing/remove_equal_sequences.py -i sequences1.fasta -t sequences2.fasta