I have a test set with 40+ test cases. I primarily use custom evaluators, some of which also use LLM calls to evaluate an output. Some cases take a very long time to execute and potentially even time ...