Abstract: Visual affordance grounding aims to segment all possible interaction regions between people and objects from an image/video, which benefits many applications, such as robot grasping and ...
You can access the current version of the book in the chapters directory or in PDF format (both Light and Dark modes are available) by clicking here. Note that this ...
Previous research has investigated the application of Multimodal Large Language Models (MLLMs) in understanding 3D scenes by interpreting them as videos. These approaches generally depend on ...
Abstract: Weakly supervised video anomaly detection aims to locate abnormal activities in untrimmed videos without the need for frame-level supervision. Prior work has utilized graph convolution ...
一些您可能无法访问的结果已被隐去。
显示无法访问的结果