News

In this work, we present a method for building grounded representations by structuring the sensorimotor data of an agent. The aim is to encode sensory inputs into internal states that describe ...
Our key contributions include curating a large dataset featuring diverse radiologic 2D/3D image-text pairs, pretraining RadCLIP as a vision-language foundation model on this dataset, developing a ...