Getting Started

This guide walks you through running bsllmner-mk2 for the first time.

For installation details (Docker Compose, uv, GPU configuration), see Installation.

1. Start the Service

docker compose up -d --build

If this is your first time running bsllmner-mk2, complete the full setup in Installation before starting the service.

2. Download Ontology Files

The ontology files are used by Select mode (Stage 2) for mapping extracted terms to ontology entries. The download script places files inside the container at /app/ontology/. Because compose.yml mounts ${PWD}:/app, they are also available on the host at ./ontology/.

2.1 Run Download Script

docker compose exec app python3 scripts/download_ontology_files.py

This downloads the following ontology files to ontology/:

cellosaurus.obo - Cell line database
cell_ontology.owl - Cell Ontology
uberon.owl - UBERON (anatomy ontology)
mondo.owl - MONDO (disease ontology)
chebi.owl - ChEBI (chemical entities)
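
After the download finishes, a quick check that all five files landed in ontology/ can save a failed Select run later. The following is a minimal sketch, not part of bsllmner-mk2: the function name check_ontology_files is hypothetical, and the file names are taken from the list above.

```shell
# Hypothetical helper: report ontology files that are missing or empty.
# $1 is the ontology directory (defaults to ./ontology on the host).
check_ontology_files() {
  dir="${1:-ontology}"
  missing=0
  for f in cellosaurus.obo cell_ontology.owl uberon.owl mondo.owl chebi.owl; do
    # -s is true only if the file exists and has a size greater than zero
    if [ ! -s "$dir/$f" ]; then
      echo "missing or empty: $dir/$f"
      missing=$((missing + 1))
    fi
  done
  # Succeed only when nothing was reported
  [ "$missing" -eq 0 ]
}

# Usage (from the repository root):
#   check_ontology_files ontology && echo "all ontology files present"
```

The function only checks presence and size; it does not validate the contents of the files.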

2.2 Convert Cellosaurus OBO to OWL

Cellosaurus is downloaded in OBO format and needs to be converted to OWL:

cd ontology
docker run -v "$PWD":/work -w /work --rm -it obolibrary/robot robot convert \
  -i ./cellosaurus.obo \
  -o ./cellosaurus.owl \
  --format owl
cd ..
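
Re-running the conversion on every setup is unnecessary if cellosaurus.owl is already up to date. As a sketch under that assumption, the hypothetical helper below (needs_conversion is not part of the project) succeeds only when the OWL file is missing, empty, or older than the OBO file, so it can guard the docker run command above.

```shell
# Hypothetical helper: decide whether the OBO file must be (re)converted.
# $1 is the OBO source, $2 is the OWL target.
needs_conversion() {
  obo="$1"
  owl="$2"
  # Convert when the target is missing or empty, or when the source is newer.
  [ ! -s "$owl" ] || [ "$obo" -nt "$owl" ]
}

# Usage (from the repository root):
#   if needs_conversion ontology/cellosaurus.obo ontology/cellosaurus.owl; then
#     echo "run the robot convert step"
#   fi
```

The comparison relies on file modification times only, so a freshly downloaded cellosaurus.obo will trigger a new conversion even if its contents are unchanged.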

3. (Optional) Pre-pull LLM Model

The LLM model is automatically downloaded on first use via the Ollama API. No manual pull is required.

To pre-download the model before running (recommended for large models like 70b):

docker compose exec ollama ollama pull llama3.1:70b

The model (~40GB for 70b) is stored in the ollama-data/ directory.
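
To confirm that a model is already available before starting a long run, the output of ollama list can be checked. The sketch below assumes that output has a header line followed by one row per model with the model name in the first column; the function name model_is_pulled is hypothetical.

```shell
# Hypothetical helper: read `ollama list` output on stdin and succeed
# if the model name given as $1 appears in the first column.
model_is_pulled() {
  # NR > 1 skips the header row; exit status 0 means the model was found.
  awk -v m="$1" 'NR > 1 && $1 == m { found = 1 } END { exit !found }'
}

# Usage (-T disables TTY allocation so the output can be piped):
#   docker compose exec -T ollama ollama list | model_is_pulled llama3.1:70b \
#     && echo "model already pulled"
```

The match is exact, so the tag (for example :70b) must be included in the name passed to the function.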

4. Run Extract Mode

Extract mode performs Named Entity Recognition (NER) to extract biological terms from BioSample records.

docker compose exec app bsllmner2_extract \
  --bs-entries tests/data/example_biosample.json \
  --model llama3.1:70b \
  --debug

For all extract CLI options, see Extract Mode.

5. Run Select Mode

Select mode extends extract mode by mapping extracted terms to ontology entries.

docker compose exec app bsllmner2_select \
  --bs-entries tests/data/example_biosample.json \
  --model llama3.1:70b \
  --select-config scripts/select-config.json \
  --debug

For all select CLI options, see Select Mode.

6. Inspect Results

Results are saved in bsllmner2-results/:

# List result files
ls bsllmner2-results/extract/
ls bsllmner2-results/select/

View the extract result:

# Show run metadata
jq '.run_metadata' bsllmner2-results/extract/*.json

# Show extracted values
jq '.output[] | {accession, output}' bsllmner2-results/extract/*.json

View the select result:

# Show mapped ontology terms
jq '.[0].results' bsllmner2-results/select/*.json
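
The *.json globs above match every result file in the directory, so after several runs jq prints output for all of them. A small sketch for picking only the most recently modified file is shown below; latest_result is a hypothetical helper, and it assumes result file names contain no newlines.

```shell
# Hypothetical helper: print the most recently modified .json file in $1,
# or nothing if the directory contains no .json files.
latest_result() {
  # ls -t sorts by modification time, newest first
  ls -t "$1"/*.json 2>/dev/null | head -n 1
}

# Usage:
#   jq '.run_metadata' "$(latest_result bsllmner2-results/extract)"
```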

Result Structure

Extract results: bsllmner2-results/extract/{run_name}.json
  Contains extracted entities and metadata
Select results: bsllmner2-results/select/select_{run_name}.json
  Contains ontology-mapped results for each field
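
When scripting over result files, it can help to recover {run_name} from a path. The sketch below relies only on the two naming patterns listed above; run_name_of is a hypothetical helper and is not provided by bsllmner-mk2.

```shell
# Hypothetical helper: print the run name encoded in a result file path.
# Strips the directory and the .json suffix, and removes the select_ prefix
# only for files that live in a directory named "select".
run_name_of() {
  name=$(basename "$1" .json)
  case "$(basename "$(dirname "$1")")" in
    select) name=${name#select_} ;;
  esac
  printf '%s\n' "$name"
}

# Usage:
#   run_name_of bsllmner2-results/select/select_myrun.json   # prints: myrun
#   run_name_of bsllmner2-results/extract/myrun.json         # prints: myrun
```

Here myrun is a placeholder name used for illustration only.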

For the full result schema, see Data Formats.

Next Steps

ChIP-Atlas data processing: See chip-atlas.md for processing ChIP-Atlas data with hg38/mm10
Model evaluation: See tests/model-evaluation/README.md for benchmarking different LLM models
Custom configuration: Create your own select-config.json to customize field extraction and ontology mapping