
🤖 BioOntoBERT 🤖 Combining BERT with Biomedical Ontologies

📝 Read the Full Paper 📝

This repository provides the code and instructions for pre-training the BioOntoBERT model, which integrates BERT with knowledge from biomedical ontologies. The model is pre-trained using a corpus generated by Onto2Sen from biomedical ontologies and then fine-tuned on the MedMCQA dataset. BioOntoBERT demonstrates enhanced performance over baseline BERT models, including PubMedBERT, in biomedical multiple-choice question-answering tasks. Remarkably, it achieves this with only 0.7% of the pre-training data used for PubMedBERT, showcasing its efficiency and improved accuracy.

Contents

  • Introduction
  • Environment Setup
  • Pre-training
  • Fine-tuning
  • Results

Introduction

BioOntoBERT is a domain-specific language model tailored for the biomedical domain. It is pre-trained on a corpus generated from biomedical ontologies using the Onto2Sen methodology, which helps the model capture domain-specific context and semantics. The pre-trained model is then fine-tuned on the MedMCQA dataset, a benchmark for biomedical multiple-choice question answering, to improve its performance on that task.
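As a rough illustration of the Onto2Sen idea, the sketch below verbalizes ontology triples into template sentences and writes them to a text file that could serve as a pre-training corpus. The templates, example triples, and the onto2sen_corpus.txt file name are illustrative assumptions, not the actual Onto2Sen implementation.

```python
# Illustrative sketch only: turning ontology triples into plain sentences.
# The real Onto2Sen templates and input format may differ.
TEMPLATES = {
    "is_a": "{child} is a kind of {parent}.",
    "part_of": "{child} is part of {parent}.",
}

def verbalize(triples):
    """Yield one sentence per (child, relation, parent) ontology triple."""
    for child, relation, parent in triples:
        template = TEMPLATES.get(relation)
        if template:
            yield template.format(child=child, parent=parent)

# Example triples, e.g. drawn from a disease or anatomy ontology.
triples = [
    ("myocardial infarction", "is_a", "ischemic heart disease"),
    ("left ventricle", "part_of", "heart"),
]

# Write one generated sentence per line (file name is illustrative).
with open("onto2sen_corpus.txt", "w") as f:
    for sentence in verbalize(triples):
        f.write(sentence + "\n")
```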

Environment Setup

Install the required packages using:

pip install -r requirements.txt
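The exact contents of requirements.txt are not listed here; a BERT pre-training and fine-tuning setup typically needs PyTorch and Hugging Face Transformers, which the quick sanity check below assumes.

```python
# Hedged sanity check for the assumed PyTorch + Transformers environment.
import torch
import transformers

print("torch", torch.__version__, "| transformers", transformers.__version__)
print("CUDA available:", torch.cuda.is_available())
```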

Pre-training

  • Data Preparation: Prepare the Onto2Sen-generated biomedical corpus in text format for pre-training.

  • Model Configuration: Modify the pre-training configuration in pretrain_config.json to set hyperparameters, paths, and other settings.

  • Run Pre-training: Execute the pre-training script (a sketch of a typical masked-language-modeling run is shown below).
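A minimal sketch of the pre-training step, assuming the Hugging Face Transformers and Datasets stack: continued masked-language-model training of BERT on the Onto2Sen sentences. The base checkpoint, corpus path, and hyperparameters are illustrative placeholders; the actual settings live in pretrain_config.json.

```python
# Hedged sketch: MLM pre-training of BERT on the Onto2Sen corpus.
# Paths and hyperparameters are assumptions for illustration.
from datasets import load_dataset
from transformers import (
    BertForMaskedLM,
    BertTokenizerFast,
    DataCollatorForLanguageModeling,
    Trainer,
    TrainingArguments,
)

tokenizer = BertTokenizerFast.from_pretrained("bert-base-uncased")
model = BertForMaskedLM.from_pretrained("bert-base-uncased")

# Onto2Sen output: one generated sentence per line (illustrative path).
dataset = load_dataset("text", data_files={"train": "onto2sen_corpus.txt"})

def tokenize(batch):
    return tokenizer(batch["text"], truncation=True, max_length=128)

tokenized = dataset.map(tokenize, batched=True, remove_columns=["text"])

# Standard 15% token masking for the MLM objective.
collator = DataCollatorForLanguageModeling(tokenizer=tokenizer, mlm_probability=0.15)

args = TrainingArguments(
    output_dir="bioontobert-pretrain",
    per_device_train_batch_size=32,
    num_train_epochs=3,
    learning_rate=5e-5,
)

Trainer(
    model=model,
    args=args,
    train_dataset=tokenized["train"],
    data_collator=collator,
).train()
```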

Fine-tuning

  • Data Preparation: Obtain the MedMCQA dataset and preprocess it for fine-tuning.

  • Model Configuration: Adjust the fine-tuning configuration in finetune_config.json according to your hardware and preferences.

  • Run Fine-tuning: Start fine-tuning the pre-trained BioOntoBERT model (a sketch of a typical multiple-choice run is shown below).
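A minimal sketch of the fine-tuning step, assuming MedMCQA is loaded from the Hugging Face Hub and treated as 4-way multiple choice with BertForMultipleChoice. The checkpoint path, column names (question, opa–opd, cop), and hyperparameters are assumptions for illustration; the actual settings live in finetune_config.json.

```python
# Hedged sketch: fine-tuning the pre-trained checkpoint on MedMCQA.
# Dataset id, field names, and paths are assumptions for illustration.
from datasets import load_dataset
from transformers import (
    BertForMultipleChoice,
    BertTokenizerFast,
    Trainer,
    TrainingArguments,
)

tokenizer = BertTokenizerFast.from_pretrained("bioontobert-pretrain")
model = BertForMultipleChoice.from_pretrained("bioontobert-pretrain")

dataset = load_dataset("medmcqa")

def preprocess(example):
    # Pair the question with each of the four answer options.
    options = [example["opa"], example["opb"], example["opc"], example["opd"]]
    enc = tokenizer(
        [example["question"]] * 4,
        options,
        truncation=True,
        max_length=192,
        padding="max_length",
    )
    enc["label"] = example["cop"]  # index of the correct option
    return enc

encoded = dataset.map(preprocess, remove_columns=dataset["train"].column_names)

args = TrainingArguments(
    output_dir="bioontobert-medmcqa",
    per_device_train_batch_size=16,
    num_train_epochs=3,
    learning_rate=3e-5,
)

Trainer(
    model=model,
    args=args,
    train_dataset=encoded["train"],
    eval_dataset=encoded["validation"],
).train()
```

At prediction time, the model's answer is the argmax over the four option logits for each question.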

Results

[Table] Accuracy comparison of different BERT-based models with BioOntoBERT

The table above shows how efficiently BioOntoBERT outperforms other pre-trained BERT models with just 158 MB of pre-training data from biomedical ontologies.
