News

In this study, we design and use evaluation models to both evaluate and autonomously refine the performance of digital agents that browse the web or control mobile devices. The evaluator and ...
llm-jp-eval-mm is a lightweight framework for evaluating visual-language models across various benchmark tasks, mainly focusing on Japanese tasks.