# Data Formats

## BioSample JSON Input (`bs_entries`)

A list of BioSample entries. Both JSON array and JSONL (one JSON object per line) formats are supported. Each entry must have an `accession` field.
```json
[
  {
    "accession": "SAMN00000001",
    "title": "HeLa cell RNA-seq",
    "characteristics": {
      "cell_line": "HeLa",
      "organism": "Homo sapiens"
    }
  }
]
```

JSONL format:

```
{"accession": "SAMN00000001", "title": "HeLa cell RNA-seq", ...}
{"accession": "SAMN00000002", "title": "HEK293 cell ChIP-seq", ...}
```
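Since both layouts are accepted, a loader can attempt a whole-document JSON parse first and fall back to line-by-line parsing. A minimal sketch; the `load_bs_entries` helper is illustrative, not part of the tool's API:

```python
import json

def load_bs_entries(text: str) -> list[dict]:
    """Parse BioSample entries from a JSON array or JSONL string."""
    text = text.strip()
    try:
        data = json.loads(text)
        # A single top-level object is treated as a one-entry list.
        entries = data if isinstance(data, list) else [data]
    except json.JSONDecodeError:
        # Fall back to JSONL: one JSON object per non-empty line.
        entries = [json.loads(line) for line in text.splitlines() if line.strip()]
    # Every entry must carry an accession field.
    for entry in entries:
        if "accession" not in entry:
            raise ValueError(f"entry missing 'accession': {entry}")
    return entries
```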
## Mapping TSV (for evaluation)

A TSV file used for evaluating Select accuracy. A header row is required.

Note: The `extraction answer` column is the output of a previous tool (MetaSRA), not a human-curated ground truth, and it is not used for evaluation. Only `mapping answer ID` (human-curated) is used as the gold standard for Select mode evaluation.
| Column | Description |
|---|---|
| `BioSample ID` | BioSample accession |
| `Experiment type` | Experiment type (e.g. RNA-seq) |
| `extraction answer` | Previous tool output (not used for evaluation) |
| `mapping answer ID` | Human-curated ground truth mapping ID (used for Select evaluation) |
| `mapping answer label` | Ground truth mapping label |

Example:

```
BioSample ID	Experiment type	extraction answer	mapping answer ID	mapping answer label
SAMN00000001	RNA-seq	HeLa	CVCL_0030	HeLa
SAMN00000002	RNA-seq	HEK293	CVCL_0045	HEK293
```
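The evaluation convention above (gold standard = `mapping answer ID`, ratios stored as 0-1) can be sketched with the standard `csv` module. `select_accuracy` is an illustrative helper, not the tool's own evaluator:

```python
import csv
import io

def select_accuracy(tsv_text: str, predictions: dict[str, str]) -> float:
    """Fraction of BioSamples whose predicted term ID matches the
    human-curated 'mapping answer ID' column (returned as a 0-1 ratio)."""
    rows = list(csv.DictReader(io.StringIO(tsv_text), delimiter="\t"))
    correct = sum(
        1 for row in rows
        if predictions.get(row["BioSample ID"]) == row["mapping answer ID"]
    )
    return correct / len(rows) if rows else 0.0

tsv = (
    "BioSample ID\tExperiment type\textraction answer\tmapping answer ID\tmapping answer label\n"
    "SAMN00000001\tRNA-seq\tHeLa\tCVCL_0030\tHeLa\n"
    "SAMN00000002\tRNA-seq\tHEK293\tCVCL_0045\tHEK293\n"
)
# One of two predictions matches the gold ID -> accuracy 0.5.
accuracy = select_accuracy(tsv, {"SAMN00000001": "CVCL_0030", "SAMN00000002": "CVCL_9999"})
```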
## Extract Result JSON (`ExtractResult`)

Saved to `bsllmner2-results/extract/{run_name}.json`.
```json
{
  "entries": [
    {
      "accession": "SAMN00000001",
      "extracted": { "cell_line": "HeLa" },
      "raw_output": "{\"cell_line\": \"HeLa\"}",
      "llm_timing": {
        "total_duration": 1000000000,
        "load_duration": 100000000,
        "eval_count": 50,
        "eval_duration": 500000000,
        "prompt_eval_count": 100
      }
    }
  ],
  "run_metadata": {
    "run_name": "llama3.1:70b_20250101_120000",
    "model": "llama3.1:70b",
    "thinking": false,
    "start_time": "2025-01-01T12:00:00Z",
    "end_time": "2025-01-01T12:10:00Z",
    "status": "completed",
    "processing_time_sec": 600.0,
    "total_entries": 1
  },
  "performance": null,
  "errors": []
}
```
### Key Fields

| Path | Type | Description |
|---|---|---|
| `entries[].accession` | string | BioSample accession |
| `entries[].extracted` | dict \| list \| null | Parsed extraction result |
| `entries[].raw_output` | string \| null | Raw JSON string from LLM |
| `entries[].llm_timing` | LlmTimingFields | Lightweight timing data (nanoseconds) |
| `run_metadata.run_name` | string | Run identifier |
| `run_metadata.model` | string | Model name |
| `run_metadata.start_time` | datetime | ISO 8601 UTC start time |
| `run_metadata.end_time` | datetime \| null | ISO 8601 UTC end time |
| `run_metadata.status` | "running" \| "completed" \| "failed" | Run status |
| `run_metadata.processing_time_sec` | float \| null | Processing time (seconds) |
| `run_metadata.total_entries` | int \| null | Total processed entries |
| `errors` | list[ErrorLog] | Error information |
### LlmTimingFields

Lightweight timing fields extracted from `ChatResponse` (nanoseconds). Replaces the full `ChatResponse` in persisted output.

| Field | Type | Description |
|---|---|---|
| `total_duration` | int | Total duration (ns) |
| `load_duration` | int | Model load duration (ns) |
| `eval_count` | int | Number of tokens generated |
| `eval_duration` | int | Token generation duration (ns) |
| `prompt_eval_count` | int | Number of prompt tokens |
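Since all durations are nanoseconds, deriving readable metrics is a matter of scaling. A minimal sketch with an illustrative `timing_summary` helper (not part of the tool):

```python
def timing_summary(t: dict) -> dict:
    """Derive human-readable metrics from nanosecond LlmTimingFields."""
    NS = 1e9
    return {
        "total_sec": t["total_duration"] / NS,
        # Latency excludes model load time (cold-start cost).
        "latency_sec": (t["total_duration"] - t["load_duration"]) / NS,
        # Generation speed: tokens produced per second of eval time.
        "tokens_per_sec": t["eval_count"] / (t["eval_duration"] / NS),
    }

summary = timing_summary({
    "total_duration": 1_000_000_000,
    "load_duration": 100_000_000,
    "eval_count": 50,
    "eval_duration": 500_000_000,
    "prompt_eval_count": 100,
})
```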
## Select Result JSON (`SelectResult`)

Saved to `bsllmner2-results/select/select_{run_name}.json`.
```json
{
  "entries": [
    {
      "extract": {
        "accession": "SAMN00000001",
        "extracted": { "cell_line": "HeLa", "tissue": "cervix" },
        "raw_output": "{\"cell_line\": \"HeLa\", \"tissue\": \"cervix\"}",
        "llm_timing": { "total_duration": 0, "load_duration": 0, "eval_count": 0, "eval_duration": 0, "prompt_eval_count": 0 }
      },
      "search_results": {
        "cell_line": {
          "HeLa": [
            {
              "term_uri": "http://purl.obolibrary.org/obo/CVCL_0030",
              "term_id": "CVCL:0030",
              "prop_uri": "http://www.w3.org/2000/01/rdf-schema#label",
              "value": "HeLa",
              "label": "HeLa",
              "exact_match": true,
              "text2term_score": null,
              "reasoning": null,
              "comments": ["Disease: Cervical adenocarcinoma"]
            }
          ]
        }
      },
      "text2term_results": {},
      "select_timings": {
        "cell_line": {
          "HeLa": { "total_duration": 500000000, "load_duration": 0, "eval_count": 20, "eval_duration": 200000000, "prompt_eval_count": 50 }
        }
      },
      "results": {
        "cell_line": [
          {
            "value": "HeLa",
            "term_id": "CVCL:0030",
            "term_uri": "http://purl.obolibrary.org/obo/CVCL_0030",
            "label": "HeLa",
            "exact_match": true,
            "reasoning": "Exact match found for HeLa"
          }
        ]
      }
    }
  ],
  "run_metadata": {
    "run_name": "llama3.1:70b_20250101_120000",
    "model": "llama3.1:70b",
    "thinking": false,
    "start_time": "2025-01-01T12:00:00Z",
    "end_time": "2025-01-01T12:15:00Z",
    "status": "completed",
    "processing_time_sec": 900.0,
    "total_entries": 1
  },
  "evaluation": null,
  "performance": null,
  "errors": []
}
```
### Key Fields

| Path | Type | Description |
|---|---|---|
| `entries[].extract` | ExtractEntry | Embedded extract result for this entry |
| `entries[].search_results` | dict[field, dict[value, list[SearchResult]]] | Stage 2a ontology search results |
| `entries[].text2term_results` | dict[field, dict[value, list[SearchResult]]] | Stage 2b text2term results |
| `entries[].select_timings` | dict[field, dict[value, LlmTimingFields]] | Per-field LLM timing |
| `entries[].results` | dict[field, list[ResolvedValue]] | Final mapping results |
| `evaluation` | EvaluationMetrics \| null | Evaluation metrics (independent from RunMetadata). All ratio fields (accuracy, precision, recall, f1) are stored as 0–1 ratios, not percentages. |
| `errors` | list[ErrorLog] | Error information |
### ResolvedValue

Unified result type for Select mode output.

| Field | Type | Description |
|---|---|---|
| `value` | string | Original extracted value |
| `term_id` | string \| null | Matched ontology term ID |
| `term_uri` | string \| null | Matched ontology term URI |
| `label` | string \| null | Ontology term label |
| `exact_match` | bool \| null | Whether it was an exact match |
| `reasoning` | string \| null | LLM reasoning for selection |
## Select Config JSON

Configuration file for Select mode. Defines the ontology file, prompt, and filter for each field.

```json
{
  "fields": {
    "cell_line": {
      "ontology_file": "/app/ontology/cellosaurus.owl",
      "prompt_description": "Cell line is a group of cells that are genetically identical...",
      "ontology_filter": { "hasDbXref": "NCBI_TaxID:9606" },
      "value_type": "string"
    },
    "drug": {
      "ontology_file": "/app/ontology/chebi.owl",
      "prompt_description": "Drug is a chemical or biological substance...",
      "value_type": "array"
    },
    "gene_perturbation": {
      "prompt_description": "Experimental perturbation applied to the target gene...",
      "value_type": "array"
    }
  }
}
```

For the full specification of each field, see Select Mode - Select Config Customization.
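A minimal sanity check over such a config can be sketched as follows. `validate_select_config` is illustrative and assumes only the constraints visible in the example above (`prompt_description` required, `value_type` either `"string"` or `"array"`, `ontology_file` optional):

```python
def validate_select_config(config: dict) -> list[str]:
    """Return a list of problems found in a Select config dict.
    Mirrors the structure shown above; not the tool's own validator."""
    problems = []
    for name, spec in config.get("fields", {}).items():
        if "prompt_description" not in spec:
            problems.append(f"{name}: missing prompt_description")
        if spec.get("value_type") not in ("string", "array"):
            problems.append(f"{name}: value_type must be 'string' or 'array'")
    return problems

config = {
    "fields": {
        "cell_line": {
            "ontology_file": "/app/ontology/cellosaurus.owl",
            "prompt_description": "Cell line ...",
            "value_type": "string",
        },
        "drug": {"prompt_description": "Drug ...", "value_type": "array"},
        "bad_field": {"value_type": "list"},
    }
}
problems = validate_select_config(config)
```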
## Prompt YAML

Prompts are defined in YAML as a list of `role` and `content` pairs.

```yaml
- role: system
  content: |-
    You are a smart curator of biological data
- role: user
  content: |-
    I will input JSON formatted metadata of a sample...
    Here is the input metadata:
```

`role` must be one of `"system"`, `"user"`, or `"assistant"`.
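After parsing the YAML (e.g. with `yaml.safe_load`), the role constraint can be enforced on the resulting list of dicts. A sketch with an illustrative `validate_prompt` helper (not part of the tool's API):

```python
VALID_ROLES = {"system", "user", "assistant"}

def validate_prompt(messages: list[dict]) -> None:
    """Check that a parsed prompt YAML is a list of role/content pairs
    whose roles are restricted to system/user/assistant."""
    for i, msg in enumerate(messages):
        if set(msg) != {"role", "content"}:
            raise ValueError(f"message {i}: expected exactly 'role' and 'content' keys")
        if msg["role"] not in VALID_ROLES:
            raise ValueError(f"message {i}: invalid role {msg['role']!r}")

prompt = [
    {"role": "system", "content": "You are a smart curator of biological data"},
    {"role": "user", "content": "I will input JSON formatted metadata of a sample..."},
]
validate_prompt(prompt)  # well-formed: no exception raised
```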
## Format JSON Schema

A JSON Schema that controls the LLM output format. It is passed to the Ollama `format` parameter.

```json
{
  "$schema": "http://json-schema.org/draft-07/schema#",
  "type": "object",
  "properties": {
    "cell_line": { "type": ["string", "null"] }
  },
  "required": ["cell_line"],
  "additionalProperties": true
}
```

In Select mode, the schema is dynamically generated from the SelectConfig field definitions (`build_extract_schema_for_select`). For `value_type: "array"`, the property is generated as `{"type": ["array", "null"], "items": {"type": "string"}}`. The generated schema always includes `"additionalProperties": false`.
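The generation rules just described can be sketched as follows. This is not the actual `build_extract_schema_for_select` implementation, only a minimal version consistent with the stated behavior (string fields become nullable strings, array fields become nullable string arrays, `additionalProperties` forced to `false`):

```python
def build_schema(fields: dict) -> dict:
    """Sketch of Select-mode schema generation from field definitions."""
    properties = {}
    for name, spec in fields.items():
        if spec.get("value_type") == "array":
            properties[name] = {"type": ["array", "null"], "items": {"type": "string"}}
        else:
            properties[name] = {"type": ["string", "null"]}
    return {
        "type": "object",
        "properties": properties,
        "required": list(properties),
        "additionalProperties": False,  # always emitted in Select mode
    }

schema = build_schema({
    "cell_line": {"value_type": "string"},
    "drug": {"value_type": "array"},
})
```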
## PerformanceSummary

Performance data is embedded in the `performance` field of ExtractResult and SelectResult. There is no separate benchmark file; all data lives inside the result JSON.

### Key Fields

| Path | Type | Description |
|---|---|---|
| `performance.total_input_entries` | int | Total input entries |
| `performance.completed_count` | int | Entries that completed processing |
| `performance.total_wall_sec` | float \| null | Total wall-clock time (seconds) |
| `performance.stage_timings[]` | StageTimings[] | Per-batch stage breakdown |
| `performance.ner_llm_timing` | LlmTimingSummary \| null | Aggregated NER LLM timing stats |
| `performance.select_llm_timing` | LlmTimingSummary \| null | Aggregated Select LLM timing stats (Select mode only) |
| `performance.disk_io` | DiskIoTimings | Disk I/O timing breakdown (Select mode only) |

Accuracy metrics (accuracy, precision, recall, f1) are in `SelectResult.evaluation`, not in PerformanceSummary.
### LlmTimingSummary Fields

| Field | Description |
|---|---|
| `call_count` | Number of LLM calls |
| `total_duration_sec` | Sum of `total_duration` across all calls |
| `mean_latency_sec` | Mean latency per call (`total_duration` - `load_duration`) |
| `p50/p95/p99_latency_sec` | Latency percentiles |
| `mean_tokens_per_sec` | Mean generation speed (`eval_count` / `eval_duration`) |
| `p50/p95_tokens_per_sec` | Tokens/sec percentiles |
| `mean_load_duration_sec` | Mean model load time (high values indicate cold starts) |
| `max_load_duration_sec` | Max model load time |
| `total_prompt_tokens` | Total prompt tokens processed |
| `total_eval_tokens` | Total tokens generated |

For interpretation guidance, see benchmarking.md.
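A sketch of how per-call `LlmTimingFields` could be aggregated into these statistics. The helper is illustrative, not the tool's code, and omits the p95/p99 percentiles and load-duration stats the real summary also reports:

```python
import statistics

def summarize_timings(timings: list[dict]) -> dict:
    """Aggregate per-call LlmTimingFields (nanoseconds) into
    LlmTimingSummary-style statistics."""
    NS = 1e9
    # Latency per call excludes model load time.
    latencies = [(t["total_duration"] - t["load_duration"]) / NS for t in timings]
    # Generation speed per call: tokens generated per second of eval time.
    tps = [t["eval_count"] / (t["eval_duration"] / NS) for t in timings]
    return {
        "call_count": len(timings),
        "total_duration_sec": sum(t["total_duration"] for t in timings) / NS,
        "mean_latency_sec": statistics.mean(latencies),
        "p50_latency_sec": statistics.median(latencies),
        "mean_tokens_per_sec": statistics.mean(tps),
        "total_prompt_tokens": sum(t["prompt_eval_count"] for t in timings),
        "total_eval_tokens": sum(t["eval_count"] for t in timings),
    }

calls = [
    {"total_duration": 1_000_000_000, "load_duration": 100_000_000,
     "eval_count": 50, "eval_duration": 500_000_000, "prompt_eval_count": 100},
    {"total_duration": 2_000_000_000, "load_duration": 0,
     "eval_count": 100, "eval_duration": 1_000_000_000, "prompt_eval_count": 200},
]
summary = summarize_timings(calls)
```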