Abstract: The virtual-to-real paradigm, i.e., training models on virtual data and then applying them to solve real-world problems, has attracted more and more attention from various domains by ...
Language understanding is inherently multimodal. Whether we read, listen, or converse, our brains go beyond words to draw on visual scenes, prosody, prior ...