The ModelBench Workbench is a powerful tool for creating robust, testable prompts and running comprehensive benchmarks across multiple models. It allows you to take your experiments from the Playground and turn them into structured tests that can be run repeatedly and at scale.
1. Start by creating a new prompt or importing one from the Playground.
2. Identify the parts of your prompt that you want to vary and convert them to inputs.
3. Create test cases by defining different input values and desired outcomes.
4. Set up your benchmark by selecting models and the number of rounds.
5. Run the benchmark and analyze the results (the sketch after this list illustrates steps 2–5).
6. Refine your prompt based on the results and create new versions for further testing.
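Since the Workbench itself is a UI, the sketch below recreates the same workflow in plain Python so the moving parts are explicit. Every name in it (`PROMPT_TEMPLATE`, `test_cases`, `run_model`, the model IDs) is hypothetical and illustrative, not part of any ModelBench API: a prompt template with an input, test cases that pair input values with a desired-outcome check, and a pass-rate summary across models and rounds.

```python
# Hypothetical sketch of the Workbench workflow -- names are
# illustrative, not a ModelBench API.
from collections import defaultdict

# Steps 1-2: a prompt with its variable part converted to an input.
PROMPT_TEMPLATE = "Summarize the following support ticket in one sentence:\n{ticket}"

# Step 3: test cases pair input values with a desired-outcome check.
test_cases = [
    {"inputs": {"ticket": "My invoice total is wrong."}, "expect": "invoice"},
    {"inputs": {"ticket": "The app crashes on login."},  "expect": "crash"},
]

# Step 4: benchmark setup -- models to compare and rounds per case.
models = ["model-a", "model-b"]
rounds = 3

def run_model(model: str, prompt: str) -> str:
    """Placeholder for a real model call (assumption)."""
    return f"[{model}] response to: {prompt[:40]}"

# Step 5: run the benchmark and tally pass rates per model.
passes = defaultdict(int)
total = len(test_cases) * rounds
for model in models:
    for case in test_cases:
        prompt = PROMPT_TEMPLATE.format(**case["inputs"])
        for _ in range(rounds):
            output = run_model(model, prompt)
            if case["expect"].lower() in output.lower():
                passes[model] += 1

for model in models:
    print(f"{model}: {passes[model]}/{total} cases passed")
```

Running multiple rounds matters because model outputs are nondeterministic: comparing pass rates across rounds separates prompts that perform consistently from ones that got lucky on a single run.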
The Workbench lets you move beyond ad-hoc experimentation into rigorous, data-driven prompt engineering. Use it to ensure your prompts are robust and perform consistently across different scenarios and models.