News

These new tools provide step-by-step explanations, solutions, and interactive 3D models to aid visual learning for STEM (science, technology, engineering, and math) subjects.
Welcome to the official repository for DynaMath: A Dynamic Visual Benchmark for Evaluating Mathematical Reasoning Robustness of Vision-Language Models. This repository contains the code, resources, and ...
We describe the development of a visual model to represent the implementation of an ambitious mathematics program, which serves as an example of a complex educational reform. Visual models can be both ...
The evaluation on MathVerse highlighted that, while models like Qwen-VL-Max and InternLM-XComposer2 experienced a boost in performance (over 5% accuracy increase) without visual inputs, GPT-4V ...
Large Language Models (LLMs) and Large Multimodal Models (LMMs) exhibit impressive problem-solving skills in many tasks and domains, but their ability in mathematical reasoning in visual contexts has ...
The latest round of language models, like GPT-4o and Gemini 1.5 Pro, are touted as "multimodal," able to understand images and audio as well as text. But a new study makes clear that they don't ...
In this paper, we study the capability of visual context-based mathematical reasoning within the rapidly evolving field of Large Multimodal Models (LMMs). Achieving visual context-based mathematical ...
Although diffusion models advance condition-based visual generation, they suffer from speed and cost issues, unlike faster autoregressive methods that are limited in performance. To address these, we ...