Abstract: While Multi-modal Language Models (MLMs) demonstrate impressive multimodal ability, they still struggle on providing factual and precise responses for tasks like visual question answering ...
Some results have been hidden because they may be inaccessible to you
Show inaccessible results