Fi" benchmark designed to rigorously measure the reliability of agentic AI automation. The benchmark models enterprise document workflows that traditionally require tedious manual work. Thunk.