About
Find Out More About Us
We develop solutions to democratize AI, specifically Machine Learning (ML), in biology. So far, our studies have produced results applicable to the study of biological sequences, with high potential to significantly reduce the expertise required to use AI/ML pipelines. This helps researchers tackle problems such as diseases that directly affect people's lives, mainly in low- and middle-income countries, and gives biologists and other stakeholders an opportunity for widespread use of these techniques.
BioAutoML
BioAutoML automatically runs an end-to-end ML pipeline that can be effectively employed by non-experts. BioAutoML has two components, divided into four modules: (1) automated feature engineering (feature extraction and selection modules) and (2) metalearning (algorithm recommendation and hyperparameter tuning modules). BioAutoML can extract features based on different aspects, and automate the feature selection, algorithm recommendation, and algorithm tuning steps for binary and multiclass classification of biological data. BioAutoML does not require specialized human assistance: it only needs a training dataset of biological sequences (FASTA files) to perform an end-to-end ML experiment. BioAutoML lowers the barrier to applying feature engineering and metalearning to biological sequences for non-experts, democratizing ML in the life sciences.
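To make the idea of extracting features from FASTA input concrete, here is a minimal sketch of one common numeric descriptor, k-mer frequency. It is not BioAutoML's code or API: the function names `parse_fasta` and `kmer_frequencies`, the choice of k = 2, and the toy sequences are all assumptions made for illustration.

```python
from collections import Counter
from itertools import product


def parse_fasta(text):
    """Parse FASTA-formatted text into a dict mapping header -> sequence.

    Hypothetical helper for illustration; not part of BioAutoML.
    """
    records = {}
    header = None
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            # A new record starts; sequence lines that follow belong to it.
            header = line[1:]
            records[header] = []
        elif header is not None:
            records[header].append(line.upper())
    return {h: "".join(parts) for h, parts in records.items()}


def kmer_frequencies(seq, k=2, alphabet="ACGT"):
    """Return the relative frequency of every k-mer, in a fixed order.

    The fixed ordering gives each sequence a feature vector of the same
    length (len(alphabet) ** k), which is what a classifier expects.
    """
    kmers = ["".join(p) for p in product(alphabet, repeat=k)]
    counts = Counter(seq[i:i + k] for i in range(len(seq) - k + 1))
    total = sum(counts[km] for km in kmers)
    return [counts[km] / total if total else 0.0 for km in kmers]


# Toy input (assumed data, two short DNA records).
EXAMPLE_FASTA = """>seq1
ACGTAC
GT
>seq2
AAAA
"""

records = parse_fasta(EXAMPLE_FASTA)
features = {name: kmer_frequencies(seq, k=2) for name, seq in records.items()}
```

Each sequence becomes a vector of 16 dinucleotide frequencies. In an automated pipeline of the kind described above, vectors like these would then feed the feature selection, algorithm recommendation, and tuning steps.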
Achievements
Our studies have gained recognition in the scientific community. In 2021, our BioAutoML project was selected among the 24 most promising ideas in Latin America (24 awarded projects out of 700 submissions), winning the Google Latin America Research Awards (LARA). In 2022, the same project was a finalist (Top 15 of 82) in the Falling Walls Lab Brazil 2022 ideas contest, promoted by the Falling Walls Foundation (DAAD - German Center for Science and Innovation).
Scholarships
Our research has also been recognized internationally through scholarships. We were honored to receive two scholarships to conduct research in Germany: (1) Research & Training Grant from the Federation of European Microbiological Societies (FEMS); and (2) Helmholtz Visiting Researcher Grant from the Helmholtz Information & Data Science Academy (HIDA).
Publications
Our contributions have resulted in published scientific articles, showcasing our commitment to advancing AI in biology. See the FAQ for the full list.
Solutions
Check our Solutions
BioDeepFuse
A Hybrid Deep Learning Approach with Integrated Feature Extraction Techniques for Enhanced Non-coding RNA Classification
Mathematical Features
Feature Extraction Approaches for Biological Sequences: A Comparative Study of Mathematical Features
Team
Our Hardworking Team
André de Carvalho
Robson Bonidia
Ulisses da Rocha
Peter Stadler
Danilo Sanches
Anderson Santos
Breno de Almeida
Bruno Florentino
Natan Sanches
F.A.Q
Frequently Asked Questions
Lay summary of the proposed package?
Recent technological advances have enabled an exponential expansion of biological sequence data and the extraction of meaningful information from it through Machine Learning (ML) algorithms. This knowledge has improved the understanding of the mechanisms behind various fatal diseases (e.g., cancer and COVID-19) and supported environmental research (e.g., adapting legumes to climate change, soil remediation, and environmental microbial communities). Although ML creates new opportunities, its proper use requires advanced knowledge of computing, statistics, and mathematics, limiting its use by non-experts. To address this concern, we developed BioAutoML, which automatically runs an end-to-end ML pipeline that can be effectively employed by non-experts. BioAutoML has two components, divided into four modules: (1) automated feature engineering (feature extraction and selection modules) and (2) metalearning (algorithm recommendation and hyperparameter tuning modules). BioAutoML can extract features based on different aspects, and automate the feature selection, algorithm recommendation, and algorithm tuning steps for binary and multiclass classification of biological data. BioAutoML does not require specialized human assistance: it only needs a training dataset of biological sequences (FASTA files) to perform an end-to-end ML experiment. To the best of our knowledge, our proposal automates the longest pipeline for biological sequence analysis, encompassing feature engineering, ML algorithm recommendation, and hyperparameter tuning. So far, we have achieved promising results on several problems, such as SARS-CoV-2, anticancer peptides, proinflammatory peptides, HIV-1 sequences, and phage virion proteins. BioAutoML lowers the barrier to applying feature engineering and metalearning to biological sequences for non-experts, democratizing ML in the life sciences.
Publications
[IF 2021: 13.994] BONIDIA, ROBSON P; SANTOS, ANDERSON P AVILA; DE ALMEIDA, BRENO L S; STADLER, PETER F; DA ROCHA, ULISSES N; SANCHES, DANILO S; DE CARVALHO, ANDRÉ C P L F. BioAutoML: automated feature engineering and metalearning to predict noncoding RNAs in bacteria. Briefings in Bioinformatics, v. 1, p. 1-13, 2022.
[IF 2021: 2.738] BONIDIA, ROBSON P; SANTOS, ANDERSON P AVILA; DE ALMEIDA, BRENO L S; STADLER, PETER F; DA ROCHA, ULISSES N; SANCHES, DANILO S; DE CARVALHO, ANDRÉ C P L F. Information Theory for Biological Sequence Classification: A Novel Feature Extraction Technique Based on Tsallis Entropy. Entropy, v. 24, p. 1398, 2022.
[IF 2021: 13.994] BONIDIA, ROBSON P; DOMINGUES, DOUGLAS S; SANCHES, DANILO S; DE CARVALHO, ANDRÉ C P L F. MathFeature: feature extraction package for DNA, RNA and protein sequences based on mathematical descriptors. Briefings in Bioinformatics, v. 1, p. 1-10, 2022.
[IF 2020: 11.622] BONIDIA, ROBSON P; SAMPAIO, LUCAS D H; DOMINGUES, DOUGLAS S; PASCHOAL, ALEXANDRE R; LOPES, FABRÍCIO M; DE CARVALHO, ANDRÉ C P L F; SANCHES, DANILO S. Feature extraction approaches for biological sequences: a comparative study of mathematical features. Briefings in Bioinformatics, v. 00, p. 1-20, 2021.
Documentation
BioAutoML: Documentation
MathFeature: Documentation
Mathematical Features: Documentation
BioDeepFuse: Documentation