In addition, we find that visual and non-visual sources of rotation elicit similar responses in VIP, suggesting multisensory combination of visual and non-visual cues in representing rotations.
Most existing work on the Room-to-Room VLN problem uses only RGB images and does not consider the local context around candidate views, so it lacks sufficient visual cues about the surrounding environment.
We introduce a self-supervised learning method that focuses on beneficial properties of representations and their ability to generalize to real-world tasks. The method incorporates rotation ...