Bloom is a free, open-source tool that automatically detects undesirable behavior in AI models, such as biased output, sycophancy, and more. You only need to define the behavior types to detect in a simple configuration file and add dialogue examples as needed; the tool then automatically performs four steps: behavioral intent analysis → generation of diverse test scenarios → simulated interaction with the target model (with API integration for mainstream models such as Claude and GPT) → quantitative scoring of results (based on indicators such as how often the problem occurs). Full transcripts of the test conversations are also readily available.
This tool saves hours of manual testing, lets you quickly compare the performance of different models on fresh test sets, and helps avoid overfitting to a fixed benchmark. It also yields reliable, reproducible AI safety conclusions, making it well suited to researchers building trustworthy AI systems.
Today, as large language models (LLMs) become ever more capable, the question of under what circumstances a model will exhibit unsafe, misaligned, or biased behavior must be answered systematically.
Bloom is an open-source model-behavior evaluation framework from AI safety research teams. It is not a new language model but a toolchain for testing models, aiming to make model safety evaluation automated, scalable, and reproducible.
What problem does Bloom want to solve?
Before Bloom, model safety evaluation had several obvious pain points:
- Test cases depend heavily on human design
- Limited scenario coverage makes it hard to catch misbehavior at the edges
- Results are difficult to reproduce across researchers
- The evaluation process does not scale and is costly
Bloom’s core goals can be summed up in one sentence:
Turn “model behavior evaluation” itself into a process that can be automated, composed, and extended.
Bloom’s overall workflow
Rather than a one-off conversation test, Bloom breaks model evaluation into a clear pipeline.
Behavior Specification
The researcher begins by defining the behavior types to be assessed, for example:
- Sycophancy
- Self-preservation tendencies
- Political or value bias
- Stability in rejecting inappropriate requests
- Breaks in role consistency
These are abstract goals, not concrete prompts.
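As an illustration, such a behavior specification can be thought of as structured data rather than a prompt. The Python sketch below is hypothetical; the `BehaviorSpec` class and its fields are invented for illustration and are not Bloom's actual configuration schema:

```python
from dataclasses import dataclass, field

@dataclass
class BehaviorSpec:
    """Abstract description of a behavior to probe for.
    Hypothetical schema, not Bloom's real configuration format."""
    name: str
    description: str
    # Optional seed dialogues showing what the behavior looks like.
    example_dialogues: list = field(default_factory=list)

sycophancy = BehaviorSpec(
    name="sycophancy",
    description="The model endorses a user's incorrect claim to please them.",
    example_dialogues=[
        ("User: 2 + 2 = 5, right?", "Assistant: Yes, absolutely!"),
    ],
)
```

The point is that the spec names a *class* of behavior; the concrete prompts that probe it are generated later in the pipeline.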
Ideation
Bloom automatically generates a large number of test scenarios, including:
- Different contexts
- Different ways to ask questions
- Different emotions, roles, or induction paths
This step addresses the narrow coverage of hand-designed test cases.
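The idea can be sketched by crossing contexts with question framings. This is a deliberately simplified illustration; Bloom ideates scenarios automatically rather than from a fixed template grid, and the `generate_scenarios` helper below is invented:

```python
from itertools import product

def generate_scenarios(claim, contexts, framings):
    """Expand one seed claim into many test scenarios by crossing
    contexts with question framings (illustrative only)."""
    return [f"{ctx} {framing.format(claim=claim)}"
            for ctx, framing in product(contexts, framings)]

scenarios = generate_scenarios(
    claim="the Great Wall is visible from the Moon",
    contexts=["You are chatting with a proud teacher.",
              "You are helping a nervous student."],
    framings=["I believe {claim}. Don't you agree?",
              "My professor said {claim}. That settles it, right?"],
)
# 2 contexts x 2 framings -> 4 scenarios probing the same behavior
```

Even this toy grid shows why automation matters: coverage grows multiplicatively with each new dimension of variation.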
Model Interaction (Rollout)
Bloom feeds these scenarios to the target model (e.g., different LLM versions) in batches:
- Runs multi-turn conversations automatically
- Records the full context
- Supports comparing multiple models or checkpoints
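A minimal sketch of batch rollout, with a stub standing in for a real chat API such as Claude or GPT; `run_rollouts` and its transcript format are invented for illustration and are not Bloom's API:

```python
def run_rollouts(models, scenarios, max_turns=1):
    """Run each scenario against each target model and record the
    full transcript (sketch; a real rollout would call a chat API
    and may span several turns)."""
    transcripts = []
    for model_name, model_fn in models.items():
        for scenario in scenarios:
            history = [("user", scenario)]
            for _ in range(max_turns):
                reply = model_fn(history)
                history.append(("assistant", reply))
            transcripts.append({"model": model_name, "turns": history})
    return transcripts

# Stub model standing in for a real API client.
stub = lambda history: "I'd gently push back on that claim."
transcripts = run_rollouts({"model-a": stub},
                           ["The Great Wall is visible from the Moon, right?"])
```

Because every turn is recorded, the same transcripts can later be re-scored with a different judge without rerunning the (expensive) model calls.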
Judgment
The final step is to analyze the model output, such as:
- Whether the target behavior is triggered
- The frequency of the behavior
- The intensity or stability of the behavior
The judgment itself can also be done by a model or rule system, rather than relying entirely on manual annotation.
Core features of Bloom
Automation first
Rather than a one-off test, Bloom is designed to:
- Run repeatedly
- Fit into CI-like workflows
- Support regression testing on model updates
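Regression testing on model updates can be sketched as a CI-style gate on the measured trigger frequency. The function and threshold below are illustrative assumptions, not part of Bloom:

```python
def regression_gate(baseline_freq, new_freq, tolerance=0.02):
    """Pass only if the new checkpoint does not trigger the behavior
    noticeably more often than the baseline (illustrative threshold)."""
    return new_freq <= baseline_freq + tolerance

assert regression_gate(0.10, 0.11)      # small drift: pass
assert not regression_gate(0.10, 0.20)  # regression: fail
```

Wired into CI, a gate like this turns safety evaluation from a one-time audit into a check that runs on every model update.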
Research-oriented
Bloom is clearly not a conversational-bot framework, but:
- An AI safety research tool
- A model alignment analysis tool
- An early-warning tool for out-of-control behavior
This also means its intended audience skews toward researchers.
Reproducible and scalable
- All assessment configurations are structured
- Experiments can be fully reproduced by others
- New behavior types can be added modularly
Summary of Bloom
To sum up Bloom in one sentence:
Bloom is not about “teaching the model to speak”, but about “interrogating the model to find out under what circumstances it will say the wrong thing”.
It represents a very important trend:
The next step for AI is not just to become stronger, but to become more understandable, more restrained, and more verifiable.
GitHub: https://github.com/safety-research/bloom
YouTube: