BioAutoML

BioAutoML: Automated Feature Engineering and Metalearning

BioAutoML: Automated Feature Engineering and Metalearning for Classification of Biological Sequences

BioAutoML - Automated Feature Engineering and Metalearning - End-to-end Machine Learning Workflow

To use this model, follow the example below:

To list the available options, run: $ python BioAutoML-feature.py -h


Where:

-fasta_train: training files in FASTA format, e.g., fasta/lncRNA.fasta fasta/circRNA.fasta

-fasta_label_train: one label per training file, in the same order, e.g., lncRNA circRNA

-fasta_test: test files in FASTA format (optional), e.g., fasta/lncRNA.fasta fasta/circRNA.fasta

-fasta_label_test: one label per test file, in the same order, e.g., lncRNA circRNA

-estimations: number of estimations performed by BioAutoML (default: 50)

-n_cpu: number of CPUs to use (default: 1)

-output: results directory, e.g., result
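
When scripting several runs, the options above can be assembled programmatically. The following is a minimal sketch, not part of BioAutoML: `build_command` is a hypothetical helper that uses only the flags listed above, and it assumes one label per FASTA file, in the same order, as in the examples in this document. It also assumes the script is launched from the repository root.

```python
import shlex
from typing import List, Optional, Sequence


def build_command(
    fasta_train: Sequence[str],
    labels_train: Sequence[str],
    output: str,
    fasta_test: Optional[Sequence[str]] = None,
    labels_test: Optional[Sequence[str]] = None,
    estimations: int = 50,
    n_cpu: int = 1,
) -> List[str]:
    """Hypothetical helper: build the argument list for BioAutoML-feature.py."""
    # Assumption: each FASTA file is paired with exactly one label.
    if len(fasta_train) != len(labels_train):
        raise ValueError("expected one training label per training FASTA file")

    cmd = [
        "python", "BioAutoML-feature.py",
        "-fasta_train", *fasta_train,
        "-fasta_label_train", *labels_train,
    ]

    # The test dataset is optional, so these flags are added only when given.
    if fasta_test:
        if labels_test is None or len(fasta_test) != len(labels_test):
            raise ValueError("expected one test label per test FASTA file")
        cmd += ["-fasta_test", *fasta_test, "-fasta_label_test", *labels_test]

    cmd += [
        "-estimations", str(estimations),
        "-n_cpu", str(n_cpu),
        "-output", output,
    ]
    return cmd


if __name__ == "__main__":
    command = build_command(
        fasta_train=[
            "Case Studies/CS-I-A/E_coli/train/rRNA.fasta",
            "Case Studies/CS-I-A/E_coli/train/sRNA.fasta",
        ],
        labels_train=["rRNA", "sRNA"],
        output="test_directory",
    )
    # Print a shell-quoted version of the command for inspection.
    print(shlex.join(command))
    # To launch it: import subprocess; subprocess.run(command, check=True)
```

Passing the arguments as a list avoids having to escape the space in `Case Studies` by hand.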

Running:

$ python BioAutoML-feature.py -fasta_train Case\ Studies/CS-I-A/E_coli/train/rRNA.fasta Case\ Studies/CS-I-A/E_coli/train/sRNA.fasta -fasta_label_train rRNA sRNA -fasta_test Case\ Studies/CS-I-A/E_coli/test/rRNA.fasta Case\ Studies/CS-I-A/E_coli/test/sRNA.fasta -fasta_label_test rRNA sRNA -output test_directory

Note: The data for this example are in the Case Studies directory.

Note: Providing a test dataset is optional.

Running: On unknown sequences

$ python BioAutoML-feature.py -fasta_train Case\ Studies/CS-I-A/E_coli/train/rRNA.fasta Case\ Studies/CS-I-A/E_coli/train/sRNA.fasta -fasta_label_train rRNA sRNA -fasta_test new_sequences.fasta -fasta_label_test unknown -output test_directory
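
Before running on new sequences, it can help to confirm that the input file parses as FASTA. The following is a minimal sketch, not part of BioAutoML: `read_fasta` is a hypothetical helper written in plain Python, and it assumes the usual convention of a `>` header line followed by one or more sequence lines.

```python
from typing import Iterable, Iterator, List, Tuple


def read_fasta(lines: Iterable[str]) -> Iterator[Tuple[str, str]]:
    """Hypothetical helper: yield (header, sequence) pairs from FASTA lines."""
    header = None
    chunks: List[str] = []
    for raw in lines:
        line = raw.strip()
        if not line:
            continue  # skip blank lines
        if line.startswith(">"):
            # A new header closes the previous record, if there was one.
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:], []
        elif header is None:
            raise ValueError("sequence data found before the first '>' header")
        else:
            chunks.append(line)
    # Emit the final record.
    if header is not None:
        yield header, "".join(chunks)


if __name__ == "__main__":
    # Made-up records for illustration only.
    example = [">seq1", "ACGU", "GGCA", "", ">seq2", "UUAG"]
    for name, sequence in read_fasta(example):
        print(name, len(sequence))
```

To check a real file such as `new_sequences.fasta`, pass an open file handle instead of the list, e.g. `with open("new_sequences.fasta") as fh: records = list(read_fasta(fh))`.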