Prompt quality is a moving target.
You write a prompt. It works with GPT-4o. Then Anthropic releases Claude 3.7 and your benchmarks shift. Add cost pressure (GPT-4 can cost 5× more than Haiku on some tasks) and you need a way to test variants systematically.
PromptOptimizer runs every prompt variant against every model you care about. It scores responses on rubrics you define. Cost and latency are tracked automatically.
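The core idea is a grid evaluation: every prompt variant crossed with every model, each response scored and metered. A minimal sketch of that loop, assuming stand-in `call_model` and `score_rubric` functions (names, signatures, and the keyword-based rubric are illustrative, not PromptOptimizer's actual API):

```python
from itertools import product

def call_model(model: str, prompt: str) -> dict:
    # Stub: a real client would call the provider's API and return
    # the response text plus usage metadata (cost, latency).
    return {"text": f"[{model}] reply to: {prompt}",
            "cost_usd": 0.001, "latency_s": 0.4}

def score_rubric(response: str, rubric: dict) -> float:
    # Stub rubric: fraction of required keywords present in the response.
    hits = sum(1 for kw in rubric["keywords"] if kw in response)
    return hits / len(rubric["keywords"])

variants = ["Summarize: {doc}", "Summarize in 3 bullets: {doc}"]
models = ["gpt-4o", "claude-3-7-sonnet", "claude-3-haiku"]
rubric = {"keywords": ["reply"]}

results = []
for prompt, model in product(variants, models):
    out = call_model(model, prompt)
    results.append({
        "prompt": prompt,
        "model": model,
        "score": score_rubric(out["text"], rubric),
        "cost_usd": out["cost_usd"],
        "latency_s": out["latency_s"],
    })

# Every variant runs on every model: 2 prompts × 3 models = 6 rows,
# each with a score, cost, and latency you can rank and compare.
assert len(results) == len(variants) * len(models)
```

Swapping the stubs for real API clients turns the same loop into a working harness; the cost and latency fields come free from most providers' usage metadata.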
Available Q3 2026.
The template is in final polish. If you want it customized for your stack now, we can start a custom version that drops into your existing eval pipeline.