The ModelBench Workbench is a powerful tool for creating robust, testable prompts and running comprehensive benchmarks across multiple models. It allows you to take your experiments from the Playground and turn them into structured tests that can be run repeatedly and at scale.
1. Start by creating a new prompt or importing one from the Playground.
2. Identify the parts of your prompt that you want to vary and convert them to inputs.
3. Create test cases by defining different input values and desired outcomes.
4. Set up your benchmark by selecting models and the number of rounds.
5. Run the benchmark and analyze the results (the sketch after this list illustrates steps 2–5).
6. Refine your prompt based on the results and create new versions for further testing.
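Since the Workbench itself is a UI, the sketch below recreates the same workflow in plain Python so the moving parts are explicit. Every name in it (`PROMPT_TEMPLATE`, `test_cases`, `run_model`, the model IDs) is hypothetical and illustrative, not part of any ModelBench API: a prompt template with an input, test cases that pair input values with a desired-outcome check, and a pass-rate summary across models and rounds.

```python
# Hypothetical sketch of the Workbench workflow -- names are
# illustrative, not a ModelBench API.
from collections import defaultdict

# Steps 1-2: a prompt with its variable part converted to an input.
PROMPT_TEMPLATE = "Summarize the following support ticket in one sentence:\n{ticket}"

# Step 3: test cases pair input values with a desired-outcome check.
test_cases = [
    {"inputs": {"ticket": "My invoice total is wrong."}, "expect": "invoice"},
    {"inputs": {"ticket": "The app crashes on login."},  "expect": "crash"},
]

# Step 4: benchmark setup -- models to compare and rounds per case.
models = ["model-a", "model-b"]
rounds = 3

def run_model(model: str, prompt: str) -> str:
    """Placeholder for a real model call (assumption)."""
    return f"[{model}] response to: {prompt[:40]}"

# Step 5: run the benchmark and tally pass rates per model.
passes = defaultdict(int)
total = len(test_cases) * rounds
for model in models:
    for case in test_cases:
        prompt = PROMPT_TEMPLATE.format(**case["inputs"])
        for _ in range(rounds):
            output = run_model(model, prompt)
            if case["expect"].lower() in output.lower():
                passes[model] += 1

for model in models:
    print(f"{model}: {passes[model]}/{total} cases passed")
```

Running multiple rounds matters because model outputs are nondeterministic: comparing pass rates across rounds separates prompts that perform consistently from ones that got lucky on a single run.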
The Workbench lets you move beyond ad-hoc experimentation into rigorous, data-driven prompt engineering. Use it to ensure your prompts are robust and perform consistently across different scenarios and models.