bsllmner-mk2
A tool for extracting biological named entities from BioSample records using Large Language Models (LLMs) and mapping them to ontology terms.
Key capabilities:
- Extract mode -- Performs Named Entity Recognition (NER) to extract terms such as cell line, tissue, and organism from BioSample metadata
- Select mode -- Extends extract mode by mapping extracted terms to ontology entries (Cellosaurus, UBERON, Cell Ontology, etc.)
bsllmner-mk2 uses Ollama as the LLM inference server.
Quick Start
docker compose up -d --build
docker compose exec app bsllmner2_extract \
--bs-entries tests/data/example_biosample.json \
--model llama3.1:70b --debug
For a complete walkthrough including ontology setup and Select mode, see Getting Started.
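The Quick Start command asks Ollama for the `llama3.1:70b` model, so it can be useful to confirm that the model has been pulled before starting a run. The following is a minimal shell sketch and is not part of bsllmner-mk2: `has_model` is a hypothetical helper that scans the JSON returned by Ollama's `/api/tags` endpoint (which lists locally available models) for a given model name. It runs against a canned response here so that it works without a server.

```shell
#!/bin/sh
# Hypothetical helper (not part of bsllmner-mk2): reads an Ollama /api/tags
# JSON response on stdin and succeeds if a model with the given name is listed.
has_model() {
  # Escape dots so the model name is matched literally by grep.
  name=$(printf '%s' "$1" | sed 's/\./\\./g')
  grep -q "\"name\":[[:space:]]*\"$name\""
}

# Demonstration with a canned response instead of a live server.
sample='{"models":[{"name":"llama3.1:70b"}]}'
if printf '%s\n' "$sample" | has_model "llama3.1:70b"; then
  echo "model available"
else
  echo "model missing"
fi
```

Against a live server, the output of `curl -s http://localhost:11434/api/tags` could be piped into `has_model "llama3.1:70b"` instead. The address `localhost:11434` is Ollama's default and is an assumption here, since the Docker Compose setup may expose Ollama at a different host or port. If the model is missing, `ollama pull llama3.1:70b` downloads it.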
Documentation
Full documentation is available at https://dbcls.github.io/bsllmner-mk2.
Basics
- Getting Started -- First-run walkthrough with ontology setup
- Installation -- Docker Compose, uv, and GPU configuration
Features
- Extract Mode -- NER extraction pipeline and CLI options
- Select Mode -- Ontology mapping pipeline and CLI options
- Data Formats -- Input/output data format specification
- Configuration -- Environment variables and settings
Operations
- ChIP-Atlas -- Processing ChIP-Atlas data (hg38/mm10)
- NIG Slurm -- Running on the NIG Slurm environment
Development
- Development -- Development environment setup
- Testing -- Unit tests, linting, mutation testing, model evaluation
Related Resources
- Original repository: sh-ikeda/bsllmner
- Related paper: https://doi.org/10.1101/2025.02.17.638570
Other Interfaces
bsllmner-mk2 also includes a FastAPI server (bsllmner2_api) and a React-based web UI, but these are not actively maintained and have not been verified to work.
License
This repository is released under the MIT License. For details, see the LICENSE file.