News

Model Information: The app uses a pre-trained ViLT model for Visual Question Answering. The model and processor are loaded from the "dandelin/vilt-b32-finetuned-vqa" checkpoint.
Visual Question Answering Model Introduction: Input an image and a yes/no question, and the model outputs the answer to the question.
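As a rough illustration of the flow described above (load the checkpoint, pass in an image and a question, read back an answer), here is a minimal sketch using the Hugging Face transformers classes ViltProcessor and ViltForQuestionAnswering. The function names load_vilt, top_answer, and answer_question are hypothetical helpers introduced for this example and are not taken from the app itself. The sketch assumes transformers, torch, and Pillow are installed, and that the image is a PIL image.

```python
CHECKPOINT = "dandelin/vilt-b32-finetuned-vqa"


def load_vilt(checkpoint=CHECKPOINT):
    """Load the processor and model from the checkpoint (hypothetical helper)."""
    # Imported lazily so top_answer() below can be used without transformers.
    from transformers import ViltProcessor, ViltForQuestionAnswering

    processor = ViltProcessor.from_pretrained(checkpoint)
    model = ViltForQuestionAnswering.from_pretrained(checkpoint)
    return processor, model


def top_answer(logits, id2label):
    """Return the label with the highest score (plain-Python argmax)."""
    best = max(range(len(logits)), key=lambda i: logits[i])
    return id2label[best]


def answer_question(image, question, processor, model):
    """Run one image/question pair through the model (hypothetical helper)."""
    # The processor tokenizes the question and preprocesses the image together.
    encoding = processor(image, question, return_tensors="pt")
    outputs = model(**encoding)
    # outputs.logits has shape (batch, num_answers); take the single example.
    logits = outputs.logits[0].detach().tolist()
    return top_answer(logits, model.config.id2label)
```

A call might look like `processor, model = load_vilt()` followed by `answer_question(Image.open("photo.jpg"), "Is there a cat?", processor, model)`, where "photo.jpg" is a placeholder path. The first call downloads the checkpoint, so it needs network access.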
In recent years, multimodal visual question answering (VQA), which fuses image visual features with question text features, has attracted wide attention from researchers.
In this study, a VQA model for fruit tree diseases based on multimodal feature fusion was designed. Fusing images and Q&A knowledge of disease management, the model obtains the decision-making answer ...