Welcome to BioAutoML

Democratizing Machine Learning in Life Sciences

About

Find Out More About Us

We develop solutions to democratize AI, specifically Machine Learning (ML), in biology. So far, our studies have produced results that can be applied to the study of biological sequences, with high potential to significantly reduce the experience required to use AI/ML pipelines. This helps researchers combat problems such as diseases that directly impact people's lives, mainly in low- and middle-income countries, and gives biologists and other stakeholders an opportunity for widespread use of these techniques.


BioAutoML

BioAutoML automatically runs an end-to-end ML pipeline that can be effectively employed by non-experts. BioAutoML has two components, divided into four modules: (1) automated feature engineering (feature extraction and selection modules) and (2) metalearning (algorithm recommendation and hyperparameter tuning modules). BioAutoML can extract features based on different aspects, and automate the feature selection, algorithm(s) recommendation, and algorithm(s) tuning steps for binary and multiclass biological data classification. BioAutoML does not require specialized human assistance, i.e., it only needs a training dataset of biological sequences (FASTA files) to perform an end-to-end ML experiment. BioAutoML lowers the barrier to applying feature engineering and metalearning in biological sequences for non-experts, democratizing ML in life sciences.
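To make the first component more concrete, here is a minimal sketch of what feature extraction and feature selection on FASTA input can look like. It is not BioAutoML's code or API: the function names (`read_fasta`, `kmer_frequencies`, `select_by_variance`), the choice of k-mer frequencies as the descriptor, and the variance filter are all assumptions made for illustration, using only the Python standard library.

```python
from itertools import product
from statistics import pvariance


def read_fasta(text):
    """Parse FASTA-formatted text into a list of (header, sequence) pairs."""
    records, header, chunks = [], None, []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            # A new header closes the previous record, if any.
            if header is not None:
                records.append((header, "".join(chunks)))
            header, chunks = line[1:], []
        else:
            chunks.append(line.upper())
    if header is not None:
        records.append((header, "".join(chunks)))
    return records


def kmer_frequencies(seq, k=2, alphabet="ACGT"):
    """Return relative k-mer frequencies in a fixed, alphabet-ordered layout."""
    kmers = ["".join(p) for p in product(alphabet, repeat=k)]
    counts = dict.fromkeys(kmers, 0)
    total = 0
    for i in range(len(seq) - k + 1):
        kmer = seq[i:i + k]
        if kmer in counts:  # skip k-mers with ambiguous symbols such as N
            counts[kmer] += 1
            total += 1
    return [counts[km] / total if total else 0.0 for km in kmers]


def select_by_variance(rows, threshold=0.0):
    """Keep only feature columns whose variance across samples exceeds threshold."""
    n_features = len(rows[0])
    keep = [j for j in range(n_features)
            if pvariance([r[j] for r in rows]) > threshold]
    return keep, [[r[j] for j in keep] for r in rows]


# Hypothetical toy input; real experiments would read labeled FASTA files.
EXAMPLE_FASTA = """>seq1
ACGTACGT
>seq2
AAAAAAAA
"""

records = read_fasta(EXAMPLE_FASTA)
features = [kmer_frequencies(seq) for _, seq in records]
kept, reduced = select_by_variance(features)
```

In this toy example each sequence becomes a 16-dimensional vector of dinucleotide frequencies, and the filter drops every column that is constant across the samples. The reduced matrix is the kind of input that the algorithm recommendation and hyperparameter tuning steps would then consume.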


Achievements

Our studies have gained recognition in the scientific community. In 2021, our BioAutoML project won a Google Latin America Research Award (LARA), promoted by Google, as one of the 24 most promising ideas in Latin America (24 awarded projects out of 700 submissions). In 2022, the same project was a finalist (Top 15 of 82) in the Falling Walls Lab Brazil 2022 ideas contest, promoted by the Falling Walls Foundation (DAAD - German Center for Science and Innovation).


Scholarships

Our research has also been recognized internationally through scholarships. We were honored to receive two scholarships to conduct research in Germany: (1) Research & Training Grant from the Federation of European Microbiological Societies (FEMS); and (2) Helmholtz Visiting Researcher Grant from the Helmholtz Information & Data Science Academy (HIDA).


Publications

Our contributions have resulted in published scientific articles, showcasing our commitment to advancing AI in biology (see the FAQ).

Solutions

Check our Solutions

BioAutoML

End-to-End Machine Learning Package for Life Sciences

BioDeepFuse

A Hybrid Deep Learning Approach with Integrated Feature Extraction Techniques for Enhanced Non-coding RNA Classification

MathFeature

Feature Extraction Package for Biological Sequences

Mathematical Features

Feature Extraction Approaches for Biological Sequences: A Comparative Study of Mathematical Features

Team

Our Hardworking Team

André de Carvalho

Robson Bonidia

Ulisses da Rocha

Peter Stadler

Danilo Sanches

Anderson Santos

Breno de Almeida

Bruno Florentino

Natan Sanches

F.A.Q

Frequently Asked Questions

  • Recent technological advances have allowed an exponential expansion of biological sequence data and the extraction of meaningful information through Machine Learning (ML) algorithms. This knowledge has improved the understanding of the mechanisms related to various fatal diseases (e.g., cancer and COVID-19) and supported environmental research (e.g., adapting legumes to climate change, soil remediation, and environmental microbial communities). Although ML creates new opportunities, its proper use requires advanced knowledge of computing, statistics, and mathematics, limiting its use by non-experts. To address this concern, we developed BioAutoML, which automatically runs an end-to-end ML pipeline that can be effectively employed by non-experts. BioAutoML has two components, divided into four modules: (1) automated feature engineering (feature extraction and selection modules) and (2) metalearning (algorithm recommendation and hyperparameter tuning modules). BioAutoML can extract features based on different aspects, and automate the feature selection, algorithm(s) recommendation, and algorithm(s) tuning steps for binary and multiclass biological data classification. BioAutoML does not require specialized human assistance, i.e., it only needs a training dataset of biological sequences (FASTA files) to perform an end-to-end ML experiment. To the best of our knowledge, our proposal automates the longest pipeline for biological sequence analysis, encompassing feature engineering, ML algorithm recommendation, and hyperparameter tuning. So far, we have achieved promising results on several problems, such as SARS-CoV-2, anticancer peptides, proinflammatory peptides, HIV-1 sequences, and phage virion proteins. BioAutoML lowers the barrier to applying feature engineering and metalearning in biological sequences for non-experts, democratizing ML in life sciences.

  • [IF 2021: 13.994] BONIDIA, ROBSON P; SANTOS, ANDERSON P AVILA; DE ALMEIDA, BRENO L S; STADLER, PETER F; DA ROCHA, ULISSES N; SANCHES, DANILO S; DE CARVALHO, ANDRÉ C P L F. BioAutoML: automated feature engineering and metalearning to predict noncoding RNAs in bacteria. Briefings in Bioinformatics, v. 1, p. 1-13, 2022.

    [IF 2021: 2.738] BONIDIA, ROBSON P; SANTOS, ANDERSON P AVILA; DE ALMEIDA, BRENO L S; STADLER, PETER F; DA ROCHA, ULISSES N; SANCHES, DANILO S; DE CARVALHO, ANDRÉ C P L F. Information Theory for Biological Sequence Classification: A Novel Feature Extraction Technique Based on Tsallis Entropy. Entropy, v. 24, p. 1398, 2022.

    [IF 2021: 13.994] BONIDIA, ROBSON P; DOMINGUES, DOUGLAS S; SANCHES, DANILO S; DE CARVALHO, ANDRÉ C P L F. MathFeature: feature extraction package for DNA, RNA and protein sequences based on mathematical descriptors. Briefings in Bioinformatics, v. 1, p. 1-10, 2022.

    [IF 2020: 11.622] BONIDIA, ROBSON P; SAMPAIO, LUCAS D H; DOMINGUES, DOUGLAS S; PASCHOAL, ALEXANDRE R; LOPES, FABRÍCIO M; DE CARVALHO, ANDRÉ C P L F; SANCHES, DANILO S. Feature extraction approaches for biological sequences: a comparative study of mathematical features. Briefings in Bioinformatics, v. 00, p. 1-20, 2021.

  • BioAutoML: Documentation
    MathFeature: Documentation
    Mathematical Features: Documentation
    BioDeepFuse: Documentation

Contact

Contact Us

Robson P Bonidia

rpbonidia@gmail.com

Ulisses N da Rocha

ulisses.rocha@ufz.de

Anderson Santos

anderson.avilasantos@gmail.com

André C P L F de Carvalho

andre@icmc.usp.br