BioAutoML

BioAutoML: Automated Feature Engineering and Metalearning


BioAutoML: Automated Feature Engineering and Metalearning for Classification of Biological Sequences


BioAutoML - Automated Feature Engineering and Metalearning - End-to-end Machine Learning Workflow - Protein

To use this workflow, first display the available options:

$ python BioAutoML-feature-protein.py -h


Where:

-fasta_train: one or more training files in FASTA format, e.g., fasta/protein_train_pos.fasta fasta/protein_train_neg.fasta

-fasta_label_train: labels for the training files, one per file and in the same order, e.g., positive negative

-fasta_test: one or more test files in FASTA format, e.g., fasta/protein_test_pos.fasta fasta/protein_test_neg.fasta

-fasta_label_test: labels for the test files, one per file and in the same order, e.g., positive negative

-estimations: number of estimations performed by BioAutoML (default = 50)

-n_cpu: number of CPUs to use (default = 1)

-output: results directory, e.g., result
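When the script is launched from another Python program, the options above can be assembled into an argument list. The following is a minimal sketch, not part of BioAutoML: `build_command` is a hypothetical helper, and the check that each file has exactly one label is an assumption drawn from the examples on this page.

```python
import shlex


def build_command(train_files, train_labels, test_files, test_labels,
                  output, estimations=50, n_cpu=1):
    """Assemble the argument list for BioAutoML-feature-protein.py.

    Hypothetical helper; it uses only the flags documented above.
    """
    # Assumption: one label per FASTA file, in the same order.
    if len(train_files) != len(train_labels):
        raise ValueError("each training FASTA file needs exactly one label")
    if len(test_files) != len(test_labels):
        raise ValueError("each test FASTA file needs exactly one label")
    return [
        "python", "BioAutoML-feature-protein.py",
        "-fasta_train", *train_files,
        "-fasta_label_train", *train_labels,
        "-fasta_test", *test_files,
        "-fasta_label_test", *test_labels,
        "-estimations", str(estimations),
        "-n_cpu", str(n_cpu),
        "-output", output,
    ]


cmd = build_command(
    ["fasta/protein_train_pos.fasta", "fasta/protein_train_neg.fasta"],
    ["positive", "negative"],
    ["fasta/protein_test_pos.fasta", "fasta/protein_test_neg.fasta"],
    ["positive", "negative"],
    output="result",
)
# Print a shell-quoted version of the command for inspection.
print(shlex.join(cmd))
```

Passing the list to `subprocess.run(cmd, check=True)` avoids manual escaping of paths that contain spaces, such as `MathFeature/Case Studies/CS-I/train_P.fasta`.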

Running:

$ python BioAutoML-feature-protein.py -fasta_train MathFeature/Case\ Studies/CS-I/train_P.fasta MathFeature/Case\ Studies/CS-I/train_N.fasta -fasta_label_train positive negative -fasta_test MathFeature/Case\ Studies/CS-I/test_P.fasta MathFeature/Case\ Studies/CS-I/test_N.fasta -fasta_label_test positive negative -output experimental/protein

Note: This example uses files from the MathFeature directory.

Running on unknown sequences:

$ python BioAutoML-feature-protein.py -fasta_train MathFeature/Case\ Studies/CS-I/train_P.fasta MathFeature/Case\ Studies/CS-I/train_N.fasta -fasta_label_train positive negative -fasta_test new_sequences.fasta -fasta_label_test unknown -output experimental/protein
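Before submitting a file of unlabeled sequences, it can help to confirm that it parses as FASTA and contains the expected number of records. The sketch below is an illustration only and is not part of BioAutoML: `count_fasta_records` is a hypothetical helper, and the two sample sequences are invented placeholders. It relies only on the FASTA convention that each record begins with a header line starting with `>`.

```python
import io


def count_fasta_records(handle):
    """Count records in a FASTA stream by counting '>' header lines."""
    return sum(1 for line in handle if line.startswith(">"))


# Placeholder data standing in for a file such as new_sequences.fasta.
sample = io.StringIO(">seq1\nMKTAYIAKQR\n>seq2\nGSHMLEDPVA\nQQLK\n")
n_records = count_fasta_records(sample)
print(n_records)
```

For a real file, the same function can be called with `open("new_sequences.fasta")` as the handle.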