BioAutoML

BioAutoML: Automated Feature Engineering and Metalearning for Classification of Biological Sequences

BioAutoML - Metalearning - Binary Problems - Using features from other packages

To list the available command-line options, run:

$ python BioAutoML-binary.py -h


Where:

-train: csv format file, e.g., train.csv

-train_label: csv format file with labels, e.g., labels_train.csv

-test: csv format file, e.g., test.csv

-test_label: csv format file with labels, e.g., labels_test.csv

-test_nameseq: csv format file with the test sequence names

-nf: feature normalization, default = False

-n_cpu: number of CPUs, default = 1

-classifier: classifier - 0: CatBoost, 1: Random Forest, 2: LightGBM

-imbalance: handle an imbalanced dataset - True = Yes, False = No, default = False

-tuning: Tuning Classifier - True = Yes, False = No, default = False

-output: results directory, e.g., result
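
The option values listed above can be checked before launching a run. The following is a minimal sketch, not part of BioAutoML: the helper name check_options and the dictionary layout are assumptions made for illustration, and only the classifier codes and the True/False defaults come from the list above.

```python
# Hypothetical pre-flight check; BioAutoML does not ship this helper.
# Classifier codes as documented for the -classifier option.
CLASSIFIERS = {0: "CatBoost", 1: "Random Forest", 2: "LightGBM"}

# Options documented as True/False switches with default = False.
BOOLEAN_FLAGS = ("nf", "imbalance", "tuning")


def check_options(options):
    """Return a list of human-readable problems found in `options`."""
    problems = []

    classifier = options.get("classifier")
    if classifier not in CLASSIFIERS:
        problems.append(
            f"classifier must be one of {sorted(CLASSIFIERS)}, got {classifier!r}"
        )

    for flag in BOOLEAN_FLAGS:
        # A missing switch falls back to the documented default (False).
        value = options.get(flag, False)
        if not isinstance(value, bool):
            problems.append(f"{flag} must be True or False, got {value!r}")

    return problems
```

For example, check_options({"classifier": 2, "tuning": True}) returns an empty list, while a classifier code outside 0-2 yields one message.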

Running:

$ python BioAutoML-binary.py -train example_csv/lncRNA/train-human.csv -train_label example_csv/lncRNA/train-human-labels.csv -test example_csv/lncRNA/test-human.csv -test_label example_csv/lncRNA/test-human-labels.csv -test_nameseq example_csv/lncRNA/test-human-sequences.csv -classifier 2 -output example_results

or

$ python BioAutoML-binary.py -train example_csv/lncRNA/train-human.csv -train_label example_csv/lncRNA/train-human-labels.csv -test example_csv/lncRNA/test-human.csv -test_label example_csv/lncRNA/test-human-labels.csv -test_nameseq example_csv/lncRNA/test-human-sequences.csv -imbalance True -tuning True -classifier 2 -output example_results

Note: The example files are in the directory example_csv/lncRNA.

Note: Providing a test dataset is optional.
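
When the script is driven from Python rather than typed by hand, the command line can be assembled programmatically. The sketch below is an illustration under assumptions: build_command is a hypothetical helper, not a BioAutoML function, and it uses only the flags documented above. Because the test dataset is optional, the test-related flags are added only when values are supplied.

```python
# Hypothetical helper for assembling the documented command line.
def build_command(train, train_label, classifier, output,
                  test=None, test_label=None, test_nameseq=None,
                  imbalance=False, tuning=False):
    """Return the argument list for one BioAutoML-binary.py run."""
    cmd = ["python", "BioAutoML-binary.py",
           "-train", train, "-train_label", train_label]

    # The test dataset is optional, so each test flag is added only if given.
    optional = (("-test", test),
                ("-test_label", test_label),
                ("-test_nameseq", test_nameseq))
    for flag, value in optional:
        if value is not None:
            cmd += [flag, value]

    # Both switches default to False, so they are passed only when enabled.
    if imbalance:
        cmd += ["-imbalance", "True"]
    if tuning:
        cmd += ["-tuning", "True"]

    cmd += ["-classifier", str(classifier), "-output", output]
    return cmd


# Train-only variant of the lncRNA example (no test files supplied).
base = "example_csv/lncRNA/"
example = build_command(base + "train-human.csv",
                        base + "train-human-labels.csv",
                        classifier=2, output="example_results")
```

The resulting list can be passed to subprocess.run(example, check=True) from the directory that contains BioAutoML-binary.py.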