When you open the app, you'll see a title "Visual Question Answering with ViLT". Upload an image: Click on the "Choose an image file" button and select an image from your local machine. Supported ...
This is a Streamlit app for Visual Question Answering (VQA) using the ViLT (Vision-and-Language Transformer) model. Given an image and a question, the app uses a pre-trained ViLT model to predict the ...
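The flow described above (upload an image, enter a question, get a predicted answer) could be sketched roughly as follows. This is a minimal sketch, not the app's actual source. It assumes the Hugging Face `transformers` classes `ViltProcessor` and `ViltForQuestionAnswering` and the public checkpoint `dandelin/vilt-b32-finetuned-vqa`, which the description does not name. The helper names `load_vilt`, `answer_question`, and `main` are hypothetical.

```python
from typing import Any, Tuple


def load_vilt(checkpoint: str = "dandelin/vilt-b32-finetuned-vqa") -> Tuple[Any, Any]:
    """Load a ViLT processor and VQA model.

    The checkpoint name is an assumption; the app description does not say
    which pre-trained weights it uses. Weights are downloaded on first use.
    """
    # Imported lazily so the module can be imported without transformers installed.
    from transformers import ViltForQuestionAnswering, ViltProcessor

    processor = ViltProcessor.from_pretrained(checkpoint)
    model = ViltForQuestionAnswering.from_pretrained(checkpoint)
    return processor, model


def answer_question(image: Any, question: str, processor: Any, model: Any) -> str:
    """Return the highest-scoring answer label for an image/question pair."""
    # The processor tokenizes the question and preprocesses the image together.
    encoding = processor(image, question, return_tensors="pt")
    outputs = model(**encoding)
    # The logits hold one score per candidate answer; pick the best one.
    best_index = outputs.logits.argmax(-1).item()
    return model.config.id2label[best_index]


def main() -> None:
    """Hypothetical Streamlit front end matching the described steps."""
    import streamlit as st
    from PIL import Image

    st.title("Visual Question Answering with ViLT")
    # The list of supported formats is truncated in the description,
    # so no file-type filter is applied here.
    uploaded = st.file_uploader("Choose an image file")
    question = st.text_input("Question")
    if uploaded is not None and question:
        image = Image.open(uploaded).convert("RGB")
        processor, model = load_vilt()
        st.image(image)
        st.write(answer_question(image, question, processor, model))
```

To try it, add a call to `main()` at the end of the file and start it with `streamlit run`. Keeping `answer_question` separate from the UI code makes the inference step easy to test with stand-in objects.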
In recent years, multimodal visual question answering (VQA), which fuses visual features from images with textual features from questions, has attracted wide attention from researchers.
Keywords: disease decision-making, deep learning, multimodal fusion, visual question answer, bilinear model, co-attention mechanism. Citation: Lan Y, Guo Y, Chen Q, Lin S, Chen Y and Deng X (2023) ...