Feature Extraction Package for Biological Sequences Based on Mathematical Descriptors
Other techniques
MathFeature also provides other techniques well known in the literature: k-mer (for proteins), amino acid composition (AAC), dipeptide composition (DPC), and tripeptide composition (TPC).
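To make the three composition descriptors concrete, here is a minimal sketch of how AAC, DPC, and TPC can be computed as relative frequencies of words of length 1, 2, and 3. It is an illustration only, not MathFeature's own code: the function name `composition`, the restriction to the 20 standard residues, and the normalization by the number of valid windows are all assumptions.

```python
from collections import Counter
from itertools import product

# Assumption: only the 20 standard amino acids are counted.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def composition(sequence, k):
    """Relative frequency of every length-k word over the 20 amino acids.

    k=1 gives an AAC-style vector (20 values), k=2 a DPC-style vector
    (400 values), and k=3 a TPC-style vector (8000 values).
    """
    sequence = sequence.upper()
    # Count every overlapping window of length k.
    counts = Counter(sequence[i:i + k] for i in range(len(sequence) - k + 1))
    words = ["".join(p) for p in product(AMINO_ACIDS, repeat=k)]
    # Windows containing non-standard letters are ignored in the total.
    total = sum(counts[w] for w in words)
    return {w: (counts[w] / total if total else 0.0) for w in words}


aac = composition("MKVLAAGIVK", 1)  # hypothetical toy sequence
dpc = composition("MKVLAAGIVK", 2)
```

Each descriptor has a fixed length regardless of sequence length, which is what makes the vectors usable as rows of a feature table.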
Important: This package only accepts sequence files in FASTA format as input to the methods.
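For readers unfamiliar with the format, a FASTA file is a series of records, each made of a header line starting with `>` followed by one or more sequence lines. The sketch below shows one way to read such records in plain Python. The helper `read_fasta` and the sample records are hypothetical and are not part of MathFeature.

```python
def read_fasta(lines):
    """Yield (header, sequence) pairs from an iterable of FASTA lines."""
    header, chunks = None, []
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        if line.startswith(">"):
            # A new header closes the previous record, if any.
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:], []
        elif header is not None:
            # Sequence data may span several lines; join them later.
            chunks.append(line)
    if header is not None:
        yield header, "".join(chunks)


example = """>seq1 example protein
MKVLAAGIVK
ALLPQR
>seq2
GGHW
""".splitlines()

records = list(read_fasta(example))
```

To read from disk, pass an open file object instead of the list of lines, e.g. `read_fasta(open("test.fasta"))`.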
Customizable k-mer, AAC, DPC, TPC
To use these techniques, follow the example below:
To run the code (Example): $ python3.7 methods/ExtractionTechniques-Protein.py -i input -o output -l label -t technique
Where:
-h = Help
-i = Input - FASTA format file, e.g., test.fasta
-o = Output - CSV format file, e.g., test.csv
-l = Label - Dataset label, e.g., lncRNA, mRNA, sncRNA, DNA, 0, 1
-t = Technique - Type of feature extraction, e.g., AAC, DPC, TPC, kmer, or kstep
Running:
$ python3.7 methods/ExtractionTechniques-Protein.py -i protein.fasta -o dataset.csv -l DNA -t AAC
Note: kstep = Customizable k-mer - You can configure the sliding window, the step, and the k-mer size.
Note: Input sequences for feature extraction must be in FASTA format.
Note: This example will generate a CSV file with the extracted features.
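The kstep note above mentions three settings: a sliding window, a step, and a k-mer size. The sketch below shows one plausible reading of how they could interact: a window of fixed length moves along the sequence in increments of `step`, and the k-mers inside each window are counted. This interpretation, the function name `windowed_kmers`, and its parameters are assumptions made for illustration; MathFeature's actual kstep behavior may differ.

```python
from collections import Counter


def windowed_kmers(sequence, window, step, k):
    """Count k-mers inside windows of length `window` taken every `step` positions.

    Assumed semantics, not taken from MathFeature: windows that would run
    past the end of the sequence are skipped.
    """
    counts = Counter()
    for start in range(0, len(sequence) - window + 1, step):
        fragment = sequence[start:start + window]
        # Count every overlapping k-mer inside the current window.
        for i in range(len(fragment) - k + 1):
            counts[fragment[i:i + k]] += 1
    return counts


# Hypothetical toy sequence and settings.
counts = windowed_kmers("MKVLAAGIVK", window=4, step=3, k=2)
```

Under this reading, a step smaller than `window - k + 1` makes consecutive windows share k-mers, so those k-mers are counted more than once. A larger step skips some k-mers entirely.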