Overview

The ModelBench Playground is a powerful environment for testing and comparing LLMs. It allows you to interact with multiple models simultaneously, add custom tools, and refine your prompts in real time.

Key Features

Model Selection

  • Choose from a wide range of models, including GPT-4, GPT-4 Mini, and many others.
  • Compare multiple models side by side to evaluate their performance.

Tool Integration

  • Add custom tools to enhance model capabilities.
  • Example: Adding a fetch_url_content tool that lets models retrieve web content (see the sketch below).
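
A definition for the fetch_url_content tool might look like the following. This is a minimal sketch in the common JSON-schema style used for LLM tool calling; the tool name comes from the example above, but the url parameter and the exact field layout are assumptions, and the format the tool section expects may differ.

```json
{
  "name": "fetch_url_content",
  "description": "Fetch a web page and return its text content to the model.",
  "parameters": {
    "type": "object",
    "properties": {
      "url": {
        "type": "string",
        "description": "Full URL of the page to fetch, e.g. https://example.com"
      }
    },
    "required": ["url"]
  }
}
```

Once a schema like this is added, models that support tool calling can request fetch_url_content during a run.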

Prompt Engineering

  • Write and refine prompts in real time.
  • Test how different models respond to the same prompt.

Response Analysis

  • View detailed logs of each interaction, including:
    • Exact API requests
    • Token usage
    • Associated costs

Sharing

  • Easily share your prompts and results with others using a generated link.
  • Collaborate with other prompt engineers and get feedback on your work.

How to Use

  1. Select the models you want to compare from the available options.
  2. Add any necessary tools by pasting their JSON schema into the tool section.
  3. Write your prompt in the input area.
  4. Run the prompt and observe how different models respond.
  5. Refine your prompt based on the results and repeat the process.
  6. Use the “Show Log” feature to view detailed information about each interaction.
  7. Share your work using the “Share” button to generate a public link.

The Playground is your sandbox for quick experimentation and model comparison. For more structured testing and benchmarking, check out our Workbench feature.