Test Sets

What are test sets? Test sets are groups of tests that can be executed together.

Test sets organize related tests into collections for batch execution. When you generate tests, all of the created tests are automatically grouped into a single test set. You can also manually add tests to a test set or remove them from it as needed.

Test sets inherit shared types, behaviors, categories, topics and sources from their tests.

Test Set Types

Every test set has a type that determines how its tests are executed:

Type	Description
Single-Turn	Tests that evaluate individual prompt/response exchanges. Each test sends a single input and evaluates the response. Ideal for RAG systems, classification tasks, and standalone response quality.
Multi-Turn	Tests that evaluate conversational interactions across multiple turns. Each test defines a goal and the system conducts an automated multi-turn conversation to assess the endpoint’s behavior. Ideal for chatbots, agents, and dialogue systems.

A test set's type is assigned when the test set is created, and it determines which metrics can be applied during evaluation. When you generate tests or import them from files, the type is inferred automatically from the tests: if any test is multi-turn, the test set is classified as Multi-Turn; otherwise it is Single-Turn.
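The inference rule above can be illustrated with a short sketch. The names `TestType`, `Test`, and `infer_test_set_type` are hypothetical and do not come from the product; the sketch also assumes that an empty set of tests defaults to Single-Turn, which this page does not specify.

```python
from dataclasses import dataclass
from enum import Enum
from typing import List


class TestType(Enum):
    SINGLE_TURN = "Single-Turn"
    MULTI_TURN = "Multi-Turn"


@dataclass
class Test:
    # Hypothetical minimal test record: a name plus its turn type.
    name: str
    test_type: TestType


def infer_test_set_type(tests: List[Test]) -> TestType:
    # A single multi-turn test is enough to make the whole set Multi-Turn.
    if any(t.test_type is TestType.MULTI_TURN for t in tests):
        return TestType.MULTI_TURN
    # Assumption: all single-turn tests (or no tests at all) yields Single-Turn.
    return TestType.SINGLE_TURN


tests = [Test("refund policy", TestType.SINGLE_TURN), Test("booking flow", TestType.MULTI_TURN)]
print(infer_test_set_type(tests).value)  # Multi-Turn
```

Because the check is "any", adding one multi-turn test to an otherwise single-turn import changes the classification of the whole set.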

Executing Test Sets

Executing a test set runs all of its tests against your AI application's endpoint and records how the application responds. Each execution creates a Test Run that captures all of the results.

To execute a test set, select it from the Test Sets page and configure:

Execution Target

Project: The current project
Endpoint: The endpoint within the project to execute tests against

Execution Mode

Parallel (default): Tests run simultaneously for faster execution
Sequential: Tests run one after another, better for rate-limited endpoints

Tags: Optional tags to categorize and find this test run
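To make the difference between the two execution modes concrete, here is a minimal sketch of how parallel and sequential execution might differ. It is not the product's implementation: `execute_test_set`, `TestRun`, `call_endpoint`, and `max_workers` are hypothetical names. The sketch also assumes each test reduces to a single prompt string, as in a Single-Turn set.

```python
from concurrent.futures import ThreadPoolExecutor
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Optional


@dataclass
class TestRun:
    # Hypothetical record of one execution: target, mode, tags, and results.
    endpoint: str
    mode: str
    tags: List[str]
    results: Dict[str, str] = field(default_factory=dict)


def execute_test_set(
    prompts: Dict[str, str],
    call_endpoint: Callable[[str], str],
    endpoint: str,
    mode: str = "parallel",
    tags: Optional[List[str]] = None,
    max_workers: int = 8,
) -> TestRun:
    run = TestRun(endpoint=endpoint, mode=mode, tags=list(tags or []))
    if mode == "parallel":
        # Send several requests at once; map() keeps results in input order.
        with ThreadPoolExecutor(max_workers=max_workers) as pool:
            outputs = pool.map(call_endpoint, prompts.values())
            run.results = dict(zip(prompts.keys(), outputs))
    elif mode == "sequential":
        # One request at a time, which is gentler on rate-limited endpoints.
        for name, prompt in prompts.items():
            run.results[name] = call_endpoint(prompt)
    else:
        raise ValueError(f"unknown execution mode: {mode!r}")
    return run


def fake_endpoint(prompt: str) -> str:
    # Stand-in for a real AI application endpoint.
    return prompt.upper()


run = execute_test_set({"greeting": "hello"}, fake_endpoint, "demo", tags=["smoke"])
print(run.results)  # {'greeting': 'HELLO'}
```

Both modes produce the same results. The trade-off is throughput versus request rate: parallel mode can finish faster, while sequential mode keeps at most one request in flight.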

Next Steps

- Generate tests to create test sets
- View execution progress in Results Overview
- Track historical performance in Test Runs