Getting Started
This guide walks you through running bsllmner-mk2 for the first time.
For installation details (Docker Compose, uv, GPU configuration), see Installation.
1. Start the Service
If this is your first run, complete the full setup in Installation before continuing.
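A minimal sketch of this step follows. It assumes that the compose.yml from Installation is in the current directory and that it defines the app service used by the later "docker compose exec app ..." commands. start_services is a hypothetical helper name and is not part of bsllmner-mk2.

```shell
# Hypothetical helper: start the Compose stack and show the app service status.
# Assumes compose.yml (from Installation) is in the current directory.
start_services() {
  # Start all services defined in compose.yml in detached (background) mode
  docker compose up -d || return 1
  # Print the status of the "app" service that later steps exec into
  docker compose ps app
}

# Example:
#   start_services
```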
2. Download Ontology Files
The ontology files are used by Select mode (Stage 2) for mapping extracted terms to ontology entries. The download script places files inside the container at /app/ontology/. Because compose.yml mounts ${PWD}:/app, they are also available on the host at ./ontology/.
2.1 Run Download Script
The download script fetches the following ontology files into ontology/:
- cellosaurus.obo - Cell line database
- cell_ontology.owl - Cell Ontology
- uberon.owl - UBERON (anatomy ontology)
- mondo.owl - MONDO (disease ontology)
- chebi.owl - ChEBI (chemical entities)
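Before moving on, you can confirm that every file in the list above arrived. The sketch below uses a hypothetical helper named missing_ontology_files. It treats an empty file as missing, on the assumption that an empty file means the download failed.

```shell
# Hypothetical helper: print each expected ontology file that is missing or
# empty in DIR (default: ./ontology). Prints nothing when all files are present.
missing_ontology_files() {
  dir=${1:-ontology}
  for f in cellosaurus.obo cell_ontology.owl uberon.owl mondo.owl chebi.owl; do
    # -s: true only if the file exists and is non-empty
    [ -s "$dir/$f" ] || echo "$f"
  done
}

# Example (from the repository root, after running the download script):
#   missing_ontology_files
```

cellosaurus.owl is not in this list because it is produced later, by the conversion in 2.2.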
2.2 Convert Cellosaurus OBO to OWL
Cellosaurus is downloaded in OBO format and needs to be converted to OWL:
cd ontology
docker run -v $PWD:/work -w /work --rm -it obolibrary/robot robot convert \
-i ./cellosaurus.obo \
-o ./cellosaurus.owl \
--format owl
cd ..
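If you download Cellosaurus again later, you need to regenerate the OWL file. The sketch below shows a hypothetical helper that decides whether to rerun the ROBOT command above. It compares modification times: conversion is needed when cellosaurus.owl is missing or is older than cellosaurus.obo.

```shell
# Hypothetical helper: succeed (exit 0) when DIR/cellosaurus.owl is missing
# or older than DIR/cellosaurus.obo, i.e. when the conversion should be rerun.
cellosaurus_needs_conversion() {
  dir=${1:-ontology}
  [ ! -e "$dir/cellosaurus.owl" ] || [ "$dir/cellosaurus.obo" -nt "$dir/cellosaurus.owl" ]
}

# Example:
#   cellosaurus_needs_conversion && echo "cellosaurus.owl is missing or stale"
```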
3. (Optional) Pre-pull LLM Model
The LLM model is automatically downloaded on first use via the Ollama API. No manual pull is required.
Pre-downloading the model before the first run is recommended for large models such as llama3.1:70b.
Downloaded models are stored in the ollama-data/ directory. llama3.1:70b alone takes about 40 GB.
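A 70b model is large, so it is worth checking free disk space before you pull it. The sketch below uses a hypothetical helper named has_free_space and assumes that ollama-data/ sits in the project directory. The pull command in the example comment uses the standard "ollama pull" CLI. The Compose service name "ollama" is an assumption, so substitute whatever name your compose.yml gives the Ollama container.

```shell
# Hypothetical helper: succeed when the filesystem holding DIR has at least
# NEEDED_GB GiB available, based on POSIX "df -P" output in 1024-byte blocks.
has_free_space() {
  dir=$1
  needed_gb=$2
  # Second line of "df -Pk" output; column 4 is the available space in KiB
  avail_kb=$(df -Pk "$dir" | awk 'NR == 2 { print $4 }')
  [ "$avail_kb" -ge $((needed_gb * 1024 * 1024)) ]
}

# Example (assumes the Ollama service in compose.yml is named "ollama"):
#   has_free_space . 40 && docker compose exec ollama ollama pull llama3.1:70b
```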
4. Run Extract Mode
Extract mode performs Named Entity Recognition (NER) to extract biological terms from BioSample records.
docker compose exec app bsllmner2_extract \
--bs-entries tests/data/example_biosample.json \
--model llama3.1:70b \
--debug
For all extract CLI options, see Extract Mode.
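The command above hard-codes llama3.1:70b. For a quick first trial on smaller hardware, the hypothetical wrapper below reads the model name from a MODEL environment variable and falls back to llama3.1:70b when MODEL is unset. The llama3.1:8b tag in the example comes from the Ollama model library. Its extraction quality is not evaluated here; see tests/model-evaluation/README.md for benchmarking.

```shell
# Hypothetical wrapper around the extract command shown above.
# Usage: run_extract INPUT_JSON   (model from $MODEL, default llama3.1:70b)
run_extract() {
  docker compose exec app bsllmner2_extract \
    --bs-entries "$1" \
    --model "${MODEL:-llama3.1:70b}" \
    --debug
}

# Example:
#   MODEL=llama3.1:8b run_extract tests/data/example_biosample.json
```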
5. Run Select Mode
Select mode extends extract mode by mapping the extracted terms to ontology entries, using the ontology files prepared in step 2.
docker compose exec app bsllmner2_select \
--bs-entries tests/data/example_biosample.json \
--model llama3.1:70b \
--select-config scripts/select-config.json \
--debug
For all select CLI options, see Select Mode.
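The example above passes a single file to --bs-entries. To process several files, the hypothetical loop below runs select mode once per file, assuming every file has the same format as tests/data/example_biosample.json. Paths are passed to the container unchanged, so DIR should be a relative path inside the project directory. compose.yml mounts that directory at /app, which is the mapping the tests/data/... paths above already rely on.

```shell
# Hypothetical helper: run select mode once for each *.json file in DIR.
# Model comes from $MODEL (default llama3.1:70b); stops at the first failure.
run_select_batch() {
  dir=$1
  for f in "$dir"/*.json; do
    # Skip the unexpanded pattern when DIR contains no .json files
    [ -e "$f" ] || continue
    docker compose exec app bsllmner2_select \
      --bs-entries "$f" \
      --model "${MODEL:-llama3.1:70b}" \
      --select-config scripts/select-config.json || return 1
  done
}

# Example:
#   run_select_batch tests/data
```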
6. Inspect Results
Results are saved under bsllmner2-results/, in an extract/ subdirectory for extract mode and a select/ subdirectory for select mode.
View the extract result:
# Show run metadata
jq '.run_metadata' bsllmner2-results/extract/*.json
# Show extracted values
jq '.output[] | {accession, output}' bsllmner2-results/extract/*.json
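The *.json glob in the commands above makes jq process every run stored in the directory. To inspect only the newest run, the hypothetical helper below picks the most recently modified result file for a given mode.

```shell
# Hypothetical helper: print the most recently modified JSON result file for
# MODE (extract or select) under BASE (default: bsllmner2-results).
# Prints nothing if there are no matching files.
latest_result() {
  mode=$1
  base=${2:-bsllmner2-results}
  # ls -t sorts newest first; head keeps only the first entry
  ls -t "$base/$mode"/*.json 2>/dev/null | head -n 1
}

# Example:
#   jq '.run_metadata' "$(latest_result extract)"
```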
View the select result:
Result Structure
- Extract results: bsllmner2-results/extract/{run_name}.json - Contains extracted entities and metadata
- Select results: bsllmner2-results/select/select_{run_name}.json - Contains ontology-mapped results for each field
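The naming pattern above makes it possible to pair an extract result with the matching select result. The sketch below assumes that both files for a run share the same {run_name}, as the shared placeholder suggests. select_path_for is a hypothetical helper that only rewrites the path; it does not check whether the file exists.

```shell
# Hypothetical helper: map an extract result path to the select result path
# for the same run, following the naming shown above:
#   <results>/extract/{run_name}.json  ->  <results>/select/select_{run_name}.json
select_path_for() {
  extract_file=$1
  run_name=$(basename "$extract_file" .json)
  results_dir=$(dirname "$(dirname "$extract_file")")
  echo "$results_dir/select/select_$run_name.json"
}

# Example:
#   select_path_for bsllmner2-results/extract/my_run.json
```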
For the full result schema, see Data Formats.
Next Steps
- ChIP-Atlas data processing: See chip-atlas.md for processing ChIP-Atlas data with hg38/mm10
- Model evaluation: See tests/model-evaluation/README.md for benchmarking different LLM models
- Custom configuration: Create your own select-config.json to customize field extraction and ontology mapping
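A practical way to build a custom configuration is to copy scripts/select-config.json, edit the copy, and check that it still parses before passing it to --select-config. The sketch below uses Python's standard json.tool module through python3 on the host. The helper name valid_json and the file name my-select-config.json are hypothetical. The helper checks JSON syntax only, not the fields bsllmner2 expects (see Data Formats and Select Mode for those).

```shell
# Hypothetical helper: succeed when FILE parses as JSON (syntax check only).
valid_json() {
  python3 -m json.tool "$1" > /dev/null 2>&1
}

# Example:
#   cp scripts/select-config.json my-select-config.json
#   # ... edit my-select-config.json ...
#   valid_json my-select-config.json && echo "my-select-config.json parses"
#   # then pass: --select-config my-select-config.json
```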