indirect_caption — caption-substitute VQA leaderboard

Each captioner produces a description of the image; qwen3-30b-instruct then answers the VQA question from that description alone, without access to the image. The original lmms-eval task prompt and metric are preserved verbatim. The evaluation covers 12 VQA-style tasks with 300 randomly sampled examples each (seed=0). Rows are sorted by Avg by default.

Source: projects/indirect-caption/cap_substitute/data/cap_substitute_results.csv
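
The caption-substitute setup described above can be sketched roughly as follows. This is a minimal illustration, not the project's actual code: the function names (sample_indices, answer_via_caption, average_scores), the way the caption is prepended to the task prompt, the use of Python's random module for sampling, and the reading of Avg as an unweighted mean over tasks are all assumptions. The captioner and answerer are passed in as plain callables so the sketch runs without any model.

```python
import random
from typing import Any, Callable, Dict, List


def sample_indices(n_total: int, k: int = 300, seed: int = 0) -> List[int]:
    """Pick up to k distinct example indices, reproducibly (assumed sampling scheme)."""
    rng = random.Random(seed)  # local RNG so the global random state is untouched
    k = min(k, n_total)        # tasks smaller than k are used in full
    return sorted(rng.sample(range(n_total), k))


def answer_via_caption(
    image: Any,
    task_prompt: str,
    captioner: Callable[[Any], str],
    answerer: Callable[[str], str],
) -> str:
    """Caption the image, then answer from text only.

    The task prompt is passed through unchanged; the caption is simply
    placed in front of it (hypothetical template).
    """
    caption = captioner(image)
    text_input = f"Image description:\n{caption}\n\n{task_prompt}"
    return answerer(text_input)  # the answerer never sees the image


def average_scores(task_scores: Dict[str, float]) -> float:
    """Unweighted mean over tasks (assumed definition of Avg)."""
    return sum(task_scores.values()) / len(task_scores)


if __name__ == "__main__":
    # Stub models, for illustration only.
    captioner = lambda img: "a red cube on a wooden table"
    answerer = lambda text: "red" if "red cube" in text else "unknown"

    idx = sample_indices(n_total=1000)
    print(len(idx))
    print(answer_via_caption(None, "What color is the cube?", captioner, answerer))
```

Passing the models as callables keeps the answering step identical across captioners, which is the point of the comparison: only the description changes from row to row.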