
Subruns

Subruns allow you to create separate tracked sessions within a single so-record execution. This is particularly useful for evaluation scripts where you want each sample to be tracked independently.

Basic Usage

Use the launch context manager to create subruns:

from sovara import launch

for sample in samples:
    # Each pass through the loop opens its own tracked session (subrun)
    with launch("sample-name"):
        eval_sample(sample)

Each iteration creates a separate run in the Sovara UI, allowing you to:

  • View each sample's dataflow graph independently
  • Compare results across samples
  • Debug specific failed samples

Naming Subruns

Give descriptive names to your subruns for easy identification:

from sovara import launch

for i, sample in enumerate(samples):
    # Include the index and sample ID so each run is easy to find in the UI
    with launch(f"sample-{i}-{sample.id}"):
        result = process_sample(sample)

Concurrent Subruns

Subruns can run concurrently using Python's threading or multiprocessing:

from concurrent.futures import ThreadPoolExecutor
from sovara import launch

def process_sample(sample):
    # Each worker thread enters its own subrun context
    with launch(f"sample-{sample.id}"):
        return run_evaluation(sample)

with ThreadPoolExecutor(max_workers=4) as executor:
    results = list(executor.map(process_sample, samples))

Each concurrent subrun maintains its own context and appears as a separate run in the UI.
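The same pattern can be adapted to a process pool for CPU-bound work. The sketch below is illustrative only: it assumes launch can be entered inside a worker process just as it is inside a thread, and that samples and run_evaluation are defined as in the threading example above.

from concurrent.futures import ProcessPoolExecutor
from sovara import launch

def process_sample(sample):
    # Each worker process opens its own subrun context
    with launch(f"sample-{sample.id}"):
        return run_evaluation(sample)

if __name__ == "__main__":
    with ProcessPoolExecutor(max_workers=4) as executor:
        results = list(executor.map(process_sample, samples))

Note that with a process pool the worker function must live at module level so it can be pickled; keeping the launch call inside the worker ensures each process reports its own session.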

How It Works

When you enter a subrun context:

  1. A new session ID is generated
  2. The server is notified of the new session
  3. All LLM calls within the context are associated with that session
  4. The dataflow graph is built for that specific session

When the context exits, the session is closed and the graph is finalized.
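For intuition, this lifecycle can be approximated with a plain context manager. The following is only a conceptual sketch: notify_server and finalize_graph are hypothetical stand-ins used to illustrate the steps above, not Sovara's actual internals.

import uuid
from contextlib import contextmanager

# Hypothetical stand-ins for Sovara's internals, for illustration only
def notify_server(event, session_id, name=None):
    print(f"{event}: session={session_id} name={name}")

def finalize_graph(session_id):
    print(f"finalizing dataflow graph for session {session_id}")

@contextmanager
def launch_sketch(name):
    session_id = str(uuid.uuid4())                     # 1. generate a new session ID
    notify_server("session_start", session_id, name)   # 2. notify the server
    try:
        # 3. LLM calls made inside the block are associated with this session,
        # 4. and the dataflow graph is built for it as those calls happen
        yield session_id
    finally:
        notify_server("session_end", session_id)       # on exit the session is closed
        finalize_graph(session_id)                     # and the graph is finalized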

[Diagram: subruns architecture]

Use Cases

Evaluation Pipelines

from sovara import launch

results = []
for sample in test_dataset:
    with launch(f"eval-{sample.id}"):
        prediction = agent.run(sample.input)
        score = evaluate(prediction, sample.expected)
        results.append(score)

print(f"Average score: {sum(results) / len(results)}")

A/B Testing

from sovara import launch

configs = [
    {"model": "gpt-4", "temperature": 0.7},
    {"model": "gpt-4", "temperature": 0.2},
]

for config in configs:
    with launch(f"config-{config['model']}-t{config['temperature']}"):
        run_benchmark(config)

Debugging Specific Cases

from sovara import launch

failed_samples = [s for s in samples if s.status == "failed"]

for sample in failed_samples:
    with launch(f"debug-{sample.id}"):
        # Run with verbose logging
        result = agent.run(sample.input, verbose=True)

Examples

See the example_workflows/debug_examples/ directory for working examples of subruns in action.

Next Steps