ECCV Conference Papers
ECCV 2024 Papers
The DOI links will be inaccessible until released by Springer.
Is Retain Set All You Need in Machine Unlearning? Restoring Performance of Unlearned Models with Out-Of-Distribution Images
Jacopo Bonato*, Marco Cotogni, Luigi Sabetta*
[pdf ]
Octopus: Embodied Vision-Language Programmer from Environmental Feedback
Jingkang Yang, Yuhao Dong, Shuai Liu, Bo Li, Ziyue Wang, ChenCheng Jiang, Haoran Tan, Jiamu Kang, Yuanhan Zhang, Kaiyang Zhou, Ziwei Liu*
[pdf ]
FunQA: Towards Surprising Video Comprehension
Binzhu Xie, Sicheng Zhang, Zitang Zhou, Bo Li, Yuanhan Zhang, Jack Hessel, Jingkang Yang, Ziwei Liu*
[pdf ]
4D Contrastive Superflows are Dense 3D Representation Learners
Xiang Xu*, Lingdong Kong, Hui Shuai, Wenwei Zhang, Liang Pan, Kai Chen, Ziwei Liu, Qingshan Liu*
[pdf ]
ItTakesTwo: Leveraging Peer Representations for Semi-supervised LiDAR Semantic Segmentation
Yuyuan Liu*, Yuanhong Chen, Hu Wang, Vasileios Belagiannis, Ian Reid, Gustavo Carneiro
[pdf ]
Ponymation: Learning Articulated 3D Animal Motions from Unlabeled Online Videos
Keqiang Sun, Dor Litvak, Yunzhi Zhang, Hongsheng Li, Jiajun Wu*, Shangzhe Wu*
[pdf ]
Robust Fitting on a Gate Quantum Computer
Frances F Yang*, Michele Sasdelli, Tat-Jun Chin
[pdf ]
H-V2X: A Large Scale Highway Dataset for BEV Perception
Chang Liu*, MingXu Zhu, Cong Ma
[pdf ]
Learning Camouflaged Object Detection from Noisy Pseudo Label
Jin Zhang*, Ruiheng Zhang*, Yanjiao Shi, Zhe Cao, Nian Liu, Fahad Shahbaz Khan
[pdf ]
Weakly Supervised 3D Object Detection via Multi-Level Visual Guidance
Kuan-Chih Huang*, Yi-Hsuan Tsai, Ming-Hsuan Yang
[pdf ]
Deblur e-NeRF: NeRF from Motion-Blurred Events under High-speed or Low-light Conditions
Weng Fei Low*, Gim Hee Lee
[pdf ]
CLR-GAN: Improving GANs Stability and Quality via Consistent Latent Representation and Reconstruction
Shengke Sun, Ziqian Luan, Zhanshan Zhao*, Shijie Luo, Shuzhen Han*
[pdf ]
Learn from the Learnt: Source-Free Active Domain Adaptation via Contrastive Sampling and Visual Persistence
Mengyao Lyu, Tianxiang Hao, Xinhao Xu, Hui Chen*, Zijia Lin, Jungong Han, Guiguang Ding*
[pdf ]
PromptIQA: Boosting the Performance and Generalization for No-Reference Image Quality Assessment via Prompts
Zewen Chen, Haina Qin, Juan Wang, Chunfeng Yuan, Bing Li*, Weiming Hu, Leon Wang
[pdf ]
Motion Mamba: Efficient and Long Sequence Motion Generation
Zeyu Zhang, Akide Liu, Ian Reid, Richard Hartley, Bohan Zhuang, Hao Tang*
[pdf ]
Radiative Gaussian Splatting for Efficient X-ray Novel View Synthesis
Yuanhao Cai*, Yixun Liang, Jiahao Wang, Angtian Wang, Yulun Zhang, Xiaokang Yang, Zongwei Zhou, Alan Yuille
[pdf ]
Tracking Meets LoRA: Faster Training, Larger Model, Stronger Performance
Liting Lin, Heng Fan, Zhipeng Zhang, Yaowei Wang*, Yong Xu, Haibin Ling*
[pdf ]
A Direct Approach to Viewing Graph Solvability
Federica Arrigoni*, Andrea Fusiello, Tomas Pajdla
[pdf ]
CoR-GS: Sparse-View 3D Gaussian Splatting via Co-Regularization
Jiawei Zhang, Jiahe Li, Xiaohan Yu, Lei Huang, Lin Gu, Jin Zheng*, Xiao Bai*
[pdf ]
SeFlow: A Self-Supervised Scene Flow Method in Autonomous Driving
Qingwen Zhang*, Yi Yang, Peizheng Li, Olov Andersson, Patric Jensfelt
[pdf ]
ZeST: Zero-Shot Material Transfer from a Single Image
Ta-Ying Cheng, Prafull Sharma, Andrew Markham, Niki Trigoni, Varun Jampani*
[pdf ]
3D Congealing: 3D-Aware Image Alignment in the Wild
Yunzhi Zhang*, Zizhang Li, Amit Raj, Andreas Engelhardt, Yuanzhen Li, Tingbo Hou, Jiajun Wu, Varun Jampani
[pdf ]
SMooDi: Stylized Motion Diffusion Model
Lei Zhong, Yiming Xie, Varun Jampani, Deqing Sun, Huaizu Jiang*
[pdf ]
ZipLoRA: Any Subject in Any Style by Effectively Merging LoRAs
Viraj Shah, Nataniel Ruiz, Forrester Cole, Erika Lu, Svetlana Lazebnik, Yuanzhen Li, Varun Jampani*
[pdf ]
SV3D: Novel Multi-view Synthesis and 3D Generation from a Single Image using Latent Video Diffusion
Vikram Voleti*, Chun-Han Yao, Mark Boss, Adam Letts, David Pankratz, Dmitrii Tochilkin, Christian Laforte, Robin Rombach, Varun Jampani*
[pdf ]
WordRobe: Text-Guided Generation of Textured 3D Garments
Astitva Srivastava*, Pranav Manu, Amit Raj, Varun Jampani, Avinash Sharma
[pdf ]
Learning to Generate Conditional Tri-plane for 3D-aware Expression Controllable Portrait Animation
Taekyung Ki*, Dongchan Min, Gyeongsu Chae*
[pdf ]
SimPB: A Single Model for 2D and 3D Object Detection from Multiple Cameras
Yingqi Tang, Zhaotie Meng, Guoliang Chen, Erkang Cheng*
[pdf ]
EMDM: Efficient Motion Diffusion Model for Fast, High-Quality Human Motion Generation
Wenyang Zhou, Zhiyang Dou*, Zeyu Cao, Zhouyingcheng Liao, Jingbo Wang, Wenjia Wang, Yuan Liu, Taku Komura, Wenping Wang, Lingjie Liu
[pdf ]
Editable Image Elements for Controllable Synthesis
Jiteng Mu*, Michaël Gharbi, Richard Zhang, Eli Shechtman, Nuno Vasconcelos, Xiaolong Wang, Taesung Park*
[pdf ]
Improving 2D Feature Representations by 3D-Aware Fine-Tuning
Yuanwen Yue*, Anurag Das, Francis Engelmann, Siyu Tang, Jan Eric Lenssen
[pdf ]
Self-supervised Feature Adaptation for 3D Industrial Anomaly Detection
Yuanpeng Tu, Boshen Zhang, Liang Liu, Yuxi Li, Jiangning Zhang, Yabiao Wang*, Chengjie Wang, Cairong Zhao*
[pdf ]
PCF-Lift: Panoptic Lifting by Probabilistic Contrastive Fusion
Runsong Zhu*, Shi Qiu*, Qianyi Wu, Ka-Hei Hui, Pheng-Ann Heng, Chi-Wing Fu
[pdf ]
SemGrasp: Semantic Grasp Generation via Language Aligned Discretization
Kailin Li*, Jingbo Wang, Lixin Yang, Cewu Lu*, Bo Dai
[pdf ]
MANIKIN: Biomechanically Accurate Neural Inverse Kinematics for Human Motion Estimation
Jiaxi Jiang*, Paul Streli, Xuejing Luo, Christoph Gebhardt, Christian Holz
[pdf ]
Simple Unsupervised Knowledge Distillation With Space Similarity
Aditya Singh*, Haohan Wang
[pdf ]
DragAPart: Learning a Part-Level Motion Prior for Articulated Objects
Ruining Li*, Chuanxia Zheng, Christian Rupprecht, Andrea Vedaldi
[pdf ]
Diffusion Bridges for 3D Point Cloud Denoising
Mathias Vogel Hüni, Keisuke Tateno, Marc Pollefeys, Federico Tombari, Marie-Julie Rakotosaona, Francis Engelmann*
[pdf ]
Optimizing Illuminant Estimation in Dual-Exposure HDR Imaging
Mahmoud Afifi*, Zhenhua Hu, Liang Liang
[pdf ]
BAM-DETR: Boundary-Aligned Moment Detection Transformer for Temporal Sentence Grounding in Videos
Pilhyeon Lee*, Hyeran Byun
[pdf ]
MarineInst: A Foundation Model for Marine Image Analysis with Instance Visual Description
Ziqiang Zheng*, Yiwei Chen, Huimin Zeng, Tuan-Anh Vu, Binh-Son Hua, Sai-Kit Yeung
[pdf ]
Superpixel-informed Implicit Neural Representation for Multi-Dimensional Data
Jia-Yi Li, Xi-Le Zhao*, Jian-Li Wang, Chao Wang, Min Wang
[pdf ]
EgoPoser: Robust Real-Time Egocentric Pose Estimation from Sparse and Intermittent Observations Everywhere
Jiaxi Jiang*, Paul Streli, Manuel Meier, Christian Holz
[pdf ]
Physics-Free Spectrally Multiplexed Photometric Stereo under Unknown Spectral Composition
Satoshi Ikehata*, Yuta Asano
[pdf ]
SplatFields: Neural Gaussian Splats for Sparse 3D and 4D Reconstruction
Marko Mihajlovic*, Sergey Prokudin, Siyu Tang, Robert Maier, Federica Bogo, Tony Tung, Edmond Boyer
[pdf ]
VFusion3D: Learning Scalable 3D Generative Models from Video Diffusion Models
Junlin Han*, Filippos Kokkinos, Philip Torr
[pdf ]
Alignist: CAD-Informed Orientation Distribution Estimation by Fusing Shape and Correspondences
Shishir Reddy Vutukur*, Junwen Huang, Rasmus Laurvig Haugaard, Benjamin Busam, Tolga Birdal
[pdf ]
Meta-Prompting for Automating Zero-shot Visual Recognition with LLMs
Muhammad Jehanzeb Mirza*, Leonid Karlinsky, Wei Lin, Sivan Doveh, Jakub Micorek, Mateusz Kozinski, Hilde Kuehne, Horst Possegger
[pdf ]
Physics-Based Interaction with 3D Objects via Video Generation
Tianyuan Zhang*, Hong-Xing Yu, Rundi Wu, Brandon Y Feng, Changxi Zheng, Noah Snavely, Jiajun Wu, William T. Freeman
[pdf ]
Reconstruction and Simulation of Elastic Objects with Spring-Mass 3D Gaussians
Licheng Zhong, Hong-Xing Yu, Jiajun Wu, Yunzhu Li*
[pdf ]
Deep Patch Visual SLAM
Lahav Lipson*, Zachary Teed, Jia Deng
[pdf ]
Surface Reconstruction for 3D Gaussian Splatting via Local Structural Hints
Qianyi Wu*, Jianmin Zheng, Jianfei Cai
[pdf ]
HeadGaS: Real-Time Animatable Head Avatars via 3D Gaussian Splatting
Helisa Dhamo*, Yinyu Nie, Arthur Moreau, Jifei Song, Richard Shaw, Yiren Zhou, Eduardo Pérez-Pellitero*
[pdf ]
LayeredFlow: A Real-World Benchmark for Non-Lambertian Multi-Layer Optical Flow
Hongyu Wen*, Erich Liang, Jia Deng
[pdf ]
Learning 3D Geometry and Feature Consistent Gaussian Splatting for Object Removal
Yuxin Wang, Qianyi Wu, Guofeng Zhang, Dan Xu*
[pdf ]
Motion-prior Contrast Maximization for Dense Continuous-Time Motion Estimation
Friedhelm Hamann*, Ziyun Wang, Ioannis Asmanis, Kenneth Chaney, Guillermo Gallego, Kostas Daniilidis
[pdf ]
Efficient Few-Shot Action Recognition via Multi-Level Post-Reasoning
Cong Wu, Xiao-Jun Wu*, Linze Li, Tianyang Xu, Zhenhua Feng, Josef Kittler
[pdf ]
Text2Place: Affordance-aware Text Guided Human Placement
Rishubh Parihar*, Harsh Gupta, Sachidanand VS, Venkatesh Babu Radhakrishnan
[pdf ]
OGNI-DC: Robust Depth Completion with Optimization-Guided Neural Iterations
Yiming Zuo*, Jia Deng
[pdf ]
Zero-Shot Multi-Object Scene Completion
Shun Iwase*, Katherine Liu, Vitor Guizilini, Adrien Gaidon, Kris Kitani, Rareș A Ambruș, Sergey Zakharov
[pdf ]
Beta-Tuned Timestep Diffusion Model
Tianyi Zheng*, Peng-Tao Jiang, Ben Wan, Hao Zhang, Jinwei Chen, Jia Wang*, Bo Li*
[pdf ]
POA: Pre-training Once for Models of All Sizes
Yingying Zhang*, Xin Guo, Jiangwei Lao, Lei Yu, Lixiang Ru, Jian Wang, Guo Ye, Huimei He, Jingdong Chen, Ming Yang*
[pdf ]
Taming Latent Diffusion Model for Neural Radiance Field Inpainting
Chieh Hubert Lin*, Changil Kim, Jia-Bin Huang, Qinbo Li, Chih-Yao Ma, Johannes Kopf, Ming-Hsuan Yang, Hung-Yu Tseng
[pdf ]
MapDistill: Boosting Efficient Camera-based HD Map Construction via Camera-LiDAR Fusion Model Distillation
Xiaoshuai Hao*, Ruikai Li, Hui Zhang, Rong Yin, Dingzhe Li, Sangil Jung, Seung-In Park, ByungIn Yoo, Haimei Zhao, Jing Zhang
[pdf ]
ByteEdit: Boost, Comply and Accelerate Generative Image Editing
Yuxi Ren, Jie Wu*, Yanzuo Lu, Huafeng Kuang, Xin Xia, Xionghui Wang, Qianqian Wang, Yixing Zhu, Pan Xie, Shiyin Wang, Xuefeng Xiao, Yitong Wang, Min Zheng, Lean Fu
[pdf ]
ProDepth: Boosting Self-Supervised Multi-Frame Monocular Depth with Probabilistic Fusion
Sungmin Woo*, Wonjoon Lee, Woo Jin Kim, Dogyoon Lee, Sangyoun Lee*
[pdf ]
High-Resolution and Few-shot View Synthesis from Asymmetric Dual-lens Inputs
Ruikang Xu, Mingde Yao, Yue Li, Yueyi Zhang, Zhiwei Xiong*
[pdf ]
Accelerating Image Super-Resolution Networks with Pixel-Level Classification
Jinho Jeong, Jinwoo Kim, Younghyun Jo, Seon Joo Kim*
[pdf ]
LASS3D: Language-Assisted Semi-Supervised 3D Semantic Segmentation with Progressive Unreliable Data Exploitation
Jianan Li*, Qiulei Dong*
[pdf ]
Contourlet Residual for Prompt Learning Enhanced Infrared Image Super-Resolution
Xingyuan Li, Jinyuan Liu*, Zhixin Chen, Yang Zou, Long Ma, Xin Fan, Risheng Liu
[pdf ]
Click-Gaussian: Interactive Segmentation to Any 3D Gaussians
Seokhun Choi, Hyeonseop Song, Jaechul Kim, Taehyeong Kim*, Hoseok Do*
[pdf ]
Random Walk on Pixel Manifolds for Anomaly Segmentation of Complex Driving Scenes
Zelong Zeng*, Kaname Tomite
[pdf ]
DySeT: a Dynamic Masked Self-distillation Approach for Robust Trajectory Prediction
Mozghan Pourkeshavarz*, Arielle Zhang, Amir Rasouli
[pdf ]
Track Everything Everywhere Fast and Robustly
Yunzhou Song, Jiahui Lei*, Ziyun Wang, Lingjie Liu, Kostas Daniilidis
[pdf ]
Towards Open-ended Visual Quality Comparison
Haoning Wu, Hanwei Zhu, Zicheng Zhang, Erli Zhang, Chaofeng Chen, Liang Liao, Chunyi Li, Annan Wang, Wenxiu Sun, Qiong Yan, Xiaohong Liu, Guangtao Zhai, Shiqi Wang, Weisi Lin*
[pdf ]
FreeInit: Bridging Initialization Gap in Video Diffusion Models
Tianxing Wu*, Chenyang Si, Yuming Jiang, Ziqi Huang, Ziwei Liu
[pdf ]
DenseNets Reloaded: Paradigm Shift Beyond ResNets and ViTs
DongHyun Kim, Byeongho Heo, Dongyoon Han*
[pdf ]
Eliminating Feature Ambiguity for Few-Shot Segmentation
Qianxiong Xu*, Guosheng Lin, Chen Change Loy, Cheng Long, Ziyue Li, Rui Zhao
[pdf ]
Soft Prompt Generation for Domain Generalization
Shuanghao Bai*, Yuedi Zhang, Wanqi Zhou, Zhirong Luan, Badong Chen*
[pdf ]
Shedding More Light on Robust Classifiers under the lens of Energy-based Models
Mujtaba Hussain Mirza*, Maria Rosaria Briglia*, Senad Beadini*, Iacopo Masi*
[pdf ]
LGM: Large Multi-View Gaussian Model for High-Resolution 3D Content Creation
Jiaxiang Tang*, Zhaoxi Chen, Xiaokang Chen, Tengfei Wang, Gang Zeng, Ziwei Liu
[pdf ]
Mahalanobis Distance-based Multi-view Optimal Transport for Multi-view Crowd Localization
Qi Zhang, Kaiyi Zhang, Antoni B. Chan, Hui Huang*
[pdf ]
RAW-Adapter: Adapting Pretrained Visual Model to Camera RAW Images
Ziteng Cui*, Tatsuya Harada
[pdf ]
SLEDGE: Synthesizing Driving Environments with Generative Models and Rule-Based Traffic
Kashyap Chitta*, Daniel Dauner, Andreas Geiger
[pdf ]
AFreeCA: Annotation-Free Counting for All
Adriano D'Alessandro*, Ali Mahdavi-Amiri, Ghassan Hamarneh
[pdf ]
Adversarially Robust Distillation by Reducing the Student-Teacher Variance Gap
Junhao Dong, Piotr Koniusz*, Junxi Chen, Yew-Soon Ong*
[pdf ]
LN3Diff: Scalable Latent Neural Fields Diffusion for Speedy 3D Generation
Yushi Lan, Fangzhou Hong, Shuai Yang, Shangchen Zhou, Xuyi Meng, Bo Dai, Xingang Pan, Chen Change Loy*
[pdf ]
Hierarchical Temporal Context Learning for Camera-based Semantic Scene Completion
Bohan Li*, Jiajun Deng, Wenyao Zhang, Zhujin Liang, Dalong Du, Xin Jin, Wenjun Zeng
[pdf ]
Equi-GSPR: Equivariant SE(3) Graph Network Model for Sparse Point Cloud Registration
Xueyang Kang*, Zhaoliang Luan, Kourosh Khoshelham, Bing Wang*
[pdf ]
GTP-4o: Modality-prompted Heterogeneous Graph Learning for Omni-modal Biomedical Representation
Chenxin Li*, Xinyu Liu, Cheng Wang, Yifan Liu, Weihao Yu, Jing Shao, Yixuan Yuan
[pdf ]
PromptCCD: Learning Gaussian Mixture Prompt Pool for Continual Category Discovery
Fernando Julio Cendra, Bingchen Zhao, Kai Han*
[pdf ]
Sapiens: Foundation for Human Vision Models
Rawal Khirodkar*, Timur Bagautdinov, Julieta Martinez, Zhaoen Su, Austin T James, Peter Selednik, Stuart Anderson, Shunsuke Saito
[pdf ]
Linearly Controllable GAN: Unsupervised Feature Categorization and Decomposition for Image Generation and Manipulation
Sehyung Lee*, Mijung Kim, Yeongnam Chae, Bjorn Stenger
[pdf ]
Generating Human Interaction Motions in Scenes with Text Control
Hongwei Yi*, Justus Thies, Michael J. Black, Xue Bin Peng, Davis Rempe*
[pdf ]
NOVUM: Neural Object Volumes for Robust Object Classification
Artur Jesslen*, Guofeng Zhang, Angtian Wang, Wufei Ma, Alan Yuille, Adam Kortylewski
[pdf ]
Align before Collaborate: Mitigating Feature Misalignment for Robust Multi-Agent Perception
Dingkang Yang, Dingkang Yang, Ke Li, Dongling Xiao, Zedian Shao, Peng Sun, Liang Song*
[pdf ]
HIMO: A New Benchmark for Full-Body Human Interacting with Multiple Objects
Xintao Lv, Liang Xu, Yichao Yan*, Xin Jin, Congsheng Xu, Wu Shuwen, Yifan Liu, Lincheng Li, Mengxiao Bi, Wenjun Zeng, Xiaokang Yang
[pdf ]
SAIR: Learning Semantic-aware Implicit Representation
Canyu Zhang*, Xiaoguang Li*, Qing Guo*, Song Wang*
[pdf ]
ColorMNet: A Memory-based Deep Spatial-Temporal Feature Propagation Network for Video Colorization
Yixin Yang, Jiangxin Dong, Jinhui Tang, Jinshan Pan*
[pdf ]
UNIC: Universal Classification Models via Multi-teacher Distillation
Yannis Kalantidis, Diane Larlus, Mert Bulent Sariyildiz*, Philippe Weinzaepfel, Thomas Lucas
[pdf ]
Instance-dependent Noisy-label Learning with Graphical Model Based Noise-rate Estimation
Arpit Garg*, Cuong Cao Nguyen, Rafael Felix, Thanh-Toan Do, Gustavo Carneiro
[pdf ]
Eliminating Warping Shakes for Unsupervised Online Video Stitching
Lang Nie, Chunyu Lin*, Kang Liao, Yun Zhang, Shuaicheng Liu, Rui Ai, Yao Zhao
[pdf ]
Vary: Scaling up the Vision Vocabulary for Large Vision-Language Models
Haoran Wei*, Lingyu Kong, Jinyue Chen, Liang Zhao, Zheng Ge, Jinrong Yang, Jianjian Sun, Chunrui Han, Xiangyu Zhang
[pdf ]
Merlin: Empowering Multimodal LLMs with Foresight Minds
En Yu, Liang Zhao, Yana Wei, Jinrong Yang, Dongming Wu, Lingyu Kong, Haoran Wei, Tiancai Wang, Zheng Ge, Xiangyu Zhang, Wenbing Tao*
[pdf ]
ViC-MAE: Self-Supervised Representation Learning from Images and Video with Contrastive Masked Autoencoders
Jefferson Hernandez*, Ruben Villegas, Vicente Ordonez
[pdf ]
E.T. the Exceptional Trajectory: Text-to-camera-trajectory generation with character awareness
Robin Courant*, Nicolas Dufour, Xi Wang, Marc Christie, Vicky Kalogeiton
[pdf ]
OphNet: A Large-Scale Video Benchmark for Ophthalmic Surgical Workflow Understanding
Ming Hu*, Peng Xia, Lin Wang, Siyuan Yan, Feilong Tang, Zhongxing Xu, Yimin Luo, Kaimin Song, Jurgen Leitner, Xuelian Cheng, Jun Cheng, Chi Liu, Kaijing Zhou*, Zongyuan Ge*
[pdf ]
SignAvatars: A Large-scale 3D Sign Language Holistic Motion Dataset and Benchmark
Zhengdi Yu, Shaoli Huang*, Yongkang Cheng, Tolga Birdal
[pdf ]
AttnZero: Efficient Attention Discovery for Vision Transformers
Lujun Li, Zimian Wei*, Peijie Dong, Wenhan Luo, Wei Xue, Qifeng Liu*, Yike Guo*
[pdf ]
Auto-GAS: Automated Proxy Discovery for Training-free Generative Architecture Search
Lujun Li, Haosen Sun, Shiwen Li, Peijie Dong, Wenhan Luo, Wei Xue, Qifeng Liu*, Yike Guo*
[pdf ]
Auto-DAS: Automated Proxy Discovery for Training-free Distillation-aware Architecture Search
Haosen Sun, Lujun Li*, Peijie Dong, Zimian Wei, Shitong Shao
[pdf ]
UniDream: Unifying Diffusion Priors for Relightable Text-to-3D Generation
Zexiang Liu, Yangguang Li, Youtian Lin, Xin Yu, Sida Peng, Yan-Pei Cao, Xiaojuan Qi, Xiaoshui Huang, Ding Liang*, Wanli Ouyang
[pdf ]
TimeCraft: Navigate Weakly-Supervised Temporal Grounded Video Question Answering via Bi-directional Reasoning
Huabin Liu, Xiao Ma, Cheng Zhong, Yang Zhang, Weiyao Lin*
[pdf ]
Spectral Subsurface Scattering for Material Classification
Haejoon Lee*, Aswin Sankaranarayanan
[pdf ]
nuCraft: Crafting High Resolution 3D Semantic Occupancy for Unified 3D Scene Understanding
Benjin Zhu*, Zhe Wang, Hongsheng Li*
[pdf ]
Dynamic Neural Radiance Field From Defocused Monocular Video
Xianrui Luo, Huiqiang Sun, Juewen Peng, Zhiguo Cao*
[pdf ]
PiTe: Pixel-Temporal Alignment for Large Video-Language Model
Yang Liu*, Pengxiang Ding, Siteng Huang, Min Zhang, Han Zhao, Donglin Wang
[pdf ]
CarFormer: Self-Driving with Learned Object-Centric Representations
Shadi Hamdan*, Fatma Guney
[pdf ]
FreeDiff: Progressive Frequency Truncation for Image Editing with Diffusion Models
Wei Wu*, Qingnan Fan, Shuai Qin, Hong Gu, Ruoyu Zhao, Antoni Chan*
[pdf ]
Plain-Det: A Plain Multi-Dataset Object Detector
Cheng Shi, Yuchen Zhu, Sibei Yang*
[pdf ]
Alternate Diverse Teaching for Semi-supervised Medical Image Segmentation
Zhen Zhao*, Zicheng Wang, Dian Yu, Longyue Wang*, Yixuan Yuan, Luping Zhou
[pdf ]
Cs2K: Class-specific and Class-shared Knowledge Guidance for Incremental Semantic Segmentation
Wei Cong*, Yang Cong, Yuyang Liu, Gan Sun
[pdf ]
Synchronous Diffusion for Unsupervised Smooth Non-Rigid 3D Shape Matching
Dongliang Cao*, Zorah Laehner, Florian Bernard
[pdf ]
Text-Guided Video Masked Autoencoder
David Fan*, Jue Wang, Shuai Liao, Zhikang Zhang, Vimal Bhat, Xinyu Li
[pdf ]
Diffusion Models for Open-Vocabulary Segmentation
Laurynas Karazija*, Iro Laina, Andrea Vedaldi, Christian Rupprecht
[pdf ]
Textual-Visual Logic Challenge: Understanding and Reasoning in Text-to-Image Generation
Peixi Xiong*, Michael A Kozuch, Nilesh Jain
[pdf ]
EvSign: Sign Language Recognition and Translation with Streaming Events
Pengyu Zhang*, Hao Yin, Zeren Wang, Wenyue Chen, Sheng Ming Li, Dong Wang, Huchuan Lu, Xu Jia
[pdf ]
QUAR-VLA: Vision-Language-Action Model for Quadruped Robots
Pengxiang Ding, Han Zhao, Wenjie Zhang, Wenxuan Song, Min Zhang, Siteng Huang, Ningxi Yang, Donglin Wang*
[pdf ]
Zero-shot Object Counting with Good Exemplars
Huilin Zhu, Jingling Yuan, Zhengwei Yang, Yu Guo, Xian Zhong*, Zheng Wang, Shengfeng He*
[pdf ]
TextDiffuser-2: Unleashing the Power of Language Models for Text Rendering
Jingye Chen*, Yupan Huang, Tengchao Lv, Lei Cui, Qifeng Chen, Furu Wei
[pdf ]
SFPNet: Sparse Focal Point Network for Semantic Segmentation on General LiDAR Point Clouds
Yanbo Wang*, Wentao Zhao, Cao Chuan, Tianchen Deng, Jingchuan Wang, Weidong Chen*
[pdf ]
PartSTAD: 2D-to-3D Part Segmentation Task Adaptation
Hyunjin Kim, Minhyuk Sung*
[pdf ]
FutureDepth: Learning to Predict the Future Improves Video Depth Estimation
Rajeev Yasarla*, Manish Kumar Singh, Hong Cai, Yunxiao Shi, Jisoo Jeong, Yinhao Zhu, Shizhong Han, Risheek Garrepalli, Fatih Porikli
[pdf ]
LLM as Copilot for Coarse-grained Vision-and-Language Navigation
Yanyuan Qiao*, Qianyi Liu, Jiajun Liu, Jing Liu, Qi Wu
[pdf ]
Raindrop Clarity: A Dual-Focused Dataset for Day and Night Raindrop Removal
Yeying Jin*, Xin Li, Jiadong Wang, Yan Zhan, Malu Zhang*
[pdf ]
Unsupervised Moving Object Segmentation with Atmospheric Turbulence
Dehao Qin*, Ripon K Saha, Woojeh Chung, Suren Jayasuriya, Jinwei Ye, Nianyi Li
[pdf ]
AccDiffusion: An Accurate Method for Higher-Resolution Image Generation
Zhihang Lin, Mingbao Lin, Meng Zhao, Rongrong Ji*
[pdf ]
Uncertainty-Driven Spectral Compressive Imaging with Spatial-Frequency Transformer
Lintao Peng, Siyu Xie, Liheng Bian*
[pdf ]
CaesarNeRF: Calibrated Semantic Representation for Few-Shot Generalizable Neural Rendering
Haidong Zhu, Tianyu Ding*, Tianyi Chen, Ilya Zharkov, Ram Nevatia, Luming Liang
[pdf ]
MapTracker: Tracking with Strided Memory Fusion for Consistent Vector HD Mapping
Jiacheng Chen*, Yuefan Wu, Jiaqi Tan, Hang Ma, Yasutaka Furukawa*
[pdf ]
Image Demoireing in RAW and sRGB Domains
Shuning Xu, Binbin Song, Xiangyu Chen, Xina Liu, Jiantao Zhou*
[pdf ]
LiDAR-Event Stereo Fusion with Hallucinations
Luca Bartolomei*, Matteo Poggi, Andrea Conti, Stefano Mattoccia*
[pdf ]
X-Former: Unifying Contrastive and Reconstruction Learning for MLLMs
Sirnam Swetha*, Jinyu Yang, Tal Neiman, Mamshad Nayeem Rizve, Son Tran, Benjamin Yao, Trishul A Chilimbi, Mubarak Shah
[pdf ]
Learning Anomalies with Normality Prior for Unsupervised Video Anomaly Detection
Haoyue Shi, Le Wang*, Sanping Zhou, Gang Hua, Wei Tang
[pdf ]
Revisiting Supervision for Continual Representation Learning
Daniel Marczak*, Sebastian Cygert*, Tomasz Trzcinski*, Bartlomiej Twardowski*
[pdf ]
FLAT: Flux-aware Imperceptible Adversarial Attacks on 3D Point Clouds
Keke Tang, Lujie Huang, Weilong Peng*, Daizong Liu, Xiaofei Wang, Yang Ma, Ligang Liu, Zhihong Tian
[pdf ]
MMBENCH: Is Your Multi-Modal Model an All-around Player?
Yuan Liu*, Haodong Duan*, Yuanhan Zhang, Bo Li, Songyang Zhang, Wangbo Zhao, Yike Yuan, Jiaqi Wang, Conghui He, Ziwei Liu, Kai Chen, Dahua Lin
[pdf ]
Implicit Filtering for Learning Neural Signed Distance Functions from 3D Point Clouds
Shengtao Li*, Ge Gao, Yudong Liu, Ming Gu, Yu-Shen Liu
[pdf ]
Unsupervised Exposure Correction
Ruodai Cui*, Li Niu, Guosheng Hu
[pdf ]
Anytime Continual Learning for Open Vocabulary Classification
Zhen Zhu*, Yiming Gong, Derek Hoiem*
[pdf ]
External Knowledge Enhanced 3D Scene Generation from Sketch
Zijie Wu, Mingtao Feng*, Yaonan Wang, He Xie, Weisheng Dong, Bo Miao, Ajmal Mian
[pdf ]
G3R: Gradient Guided Generalizable Reconstruction
Yun Chen*, Jingkang Wang, Ze Yang, Sivabalan Manivasagam*, Raquel Urtasun*
[pdf ]
DreamScene360: Unconstrained Text-to-3D Scene Generation with Panoramic Gaussian Splatting
Shijie Zhou*, Zhiwen Fan, Dejia Xu, Haoran Chang, Pradyumna Chari, Tejas K Bharadwaj, Suya You, Zhangyang Wang, Achuta Kadambi
[pdf ]
Frequency-Spatial Entanglement Learning for Camouflaged Object Detection
Yanguang Sun, Chunyan Xu, Jian Yang, Hanyu Xuan*, Lei Luo*
[pdf ]
VisionTrap: Vision-Augmented Trajectory Prediction Guided by Textual Descriptions
Seokha Moon, Hyun Woo, Hongbeen Park, Haeji Jung, Reza Mahjourian, Hyung-gun Chi, Hyerin Lim, Sangpil Kim, Jinkyu Kim*
[pdf ]
Occluded Gait Recognition with Mixture of Experts: An Action Detection Perspective
Panjian Huang, Yunjie Peng, Saihui Hou*, Chunshui Cao, Xu Liu, Zhiqiang He, Yongzhen Huang*
[pdf ]
EDTalk: Efficient Disentanglement for Emotional Talking Head Synthesis
Shuai Tan*, Bin Ji, Mengxiao Bi, Ye Pan*
[pdf ]
Groma: Localized Visual Tokenization for Grounding Multimodal Large Language Models
Chuofan Ma*, Yi Jiang*, Jiannan Wu, Zehuan Yuan, Xiaojuan Qi*
[pdf ]
On the Utility of 3D Hand Poses for Action Recognition
Md Salman Shamil*, Dibyadip Chatterjee, Fadime Sener, Shugao Ma, Angela Yao*
[pdf ]
DG-PIC: Domain Generalized Point-In-Context Learning for Point Cloud Understanding
Jincen Jiang, Qianyu Zhou, Yuhang Li, Xuequan Lu*, Meili Wang*, Lizhuang Ma, Jian Chang, Jian Jun Zhang
[pdf ]
Operational Open-Set Recognition and PostMax Refinement
Steve Cruz*, Ryan Rabinowitz, Manuel Günther, Terrance E. Boult
[pdf ]
ScaleDreamer: Scalable Text-to-3D Synthesis with Asynchronous Score Distillation
Zhiyuan Ma*, Yuxiang Wei, Yabin Zhang, Xiangyu Zhu, Zhen Lei, Lei Zhang
[pdf ]
SINDER: Repairing the Singular Defects of DINOv2
Haoqi Wang, Tong Zhang, Mathieu Salzmann*
[pdf ]
SEA-RAFT: Simple, Efficient, Accurate RAFT for Optical Flow
Yihan Wang*, Lahav O Lipson, Jia Deng
[pdf ]
Learning Differentially Private Diffusion Models via Stochastic Adversarial Distillation
Bochao Liu, Pengju Wang, Shiming Ge*
[pdf ]
General and Task-Oriented Video Segmentation
Mu Chen, Liulei Li, Wenguan Wang, Ruijie Quan, Yi Yang*
[pdf ]
VISAGE: Video Instance Segmentation with Appearance-Guided Enhancement
Hanjung Kim, Jaehyun Kang, Miran Heo, Sukjun Hwang, Seoung Wug Oh, Seon Joo Kim*
[pdf ]
LiFT: A Surprisingly Simple Lightweight Feature Transform for Dense ViT Descriptors
Saksham Suri*, Matthew Walmer, Kamal Gupta, Abhinav Shrivastava
[pdf ]
ControlNet++: Improving Conditional Controls with Efficient Consistency Feedback
Ming Li*, Taojiannan Yang, Huafeng Kuang, Jie Wu, Zhaoning Wang, Xuefeng Xiao, Chen Chen
[pdf ]
TF-FAS: Twofold-Element Fine-Grained Semantic Guidance for Generalizable Face Anti-Spoofing
Xudong Wang, Ke-Yue Zhang, Taiping Yao*, Qianyu Zhou, Shouhong Ding, Pingyang Dai*, Rongrong Ji
[pdf ]
Prompting Future Driven Diffusion Model for Hand Motion Prediction
Bowen Tang*, Kaihao Zhang*, Wenhan Luo*, Wei Liu, Hongdong Li
[pdf ]
Defect Spectrum: A Granular Look of Large-scale Defect Datasets with Rich Semantics
Shuai Yang, ZhiFei Chen, Pengguang Chen, Xi Fang, Yixun Liang, Shu Liu*, Yingcong Chen*
[pdf ]
Unveiling Advanced Frequency Disentanglement Paradigm for Low-Light Image Enhancement
Kun Zhou*, Xinyu Lin, Wenbo Li, Xiaogang Xu, Yuanhao Cai, Zhonghang Liu, Xiaoguang Han, Jiangbo Lu
[pdf ]
RAPiD-Seg: Range-Aware Pointwise Distance Distribution Networks for 3D LiDAR Segmentation
Li Li*, Hubert P. H. Shum, Toby P Breckon
[pdf ]
UMBRAE: Unified Multimodal Brain Decoding
Weihao Xia*, Raoul de Charette, A. Cengiz Oztireli, Jing-Hao Xue
[pdf ]
NavGPT-2: Unleashing Navigational Reasoning Capability for Large Vision-Language Models
Gengze Zhou*, Yicong Hong, Zun Wang, Xin Eric Wang, Qi Wu
[pdf ]
3D Single-object Tracking in Point Clouds with High Temporal Variation
Qiao Wu, Kun Sun, Pei An, Mathieu Salzmann, Yanning Zhang, Jiaqi Yang*
[pdf ]
Adaptive Multi-task Learning for Few-shot Object Detection
Yan Ren*, Yanling Li, Adams Wai-Kin Kong
[pdf ]
Event Trojan: Asynchronous Event-based Backdoor Attacks
Ruofei Wang*, Qing Guo, Haoliang Li, Renjie Wan*
[pdf ]
Stepwise Multi-grained Boundary Detector for Point-supervised Temporal Action Localization
Mengnan Liu, Le Wang*, Sanping Zhou, Kun Xia, Qi Wu, Qilin Zhang, Gang Hua
[pdf ]
Imaging Interiors: An Implicit Solution to Electromagnetic Inverse Scattering Problems
Ziyuan Luo, Boxin Shi, Haoliang Li, Renjie Wan*
[pdf ]
Dropout Mixture Low-Rank Adaptation for Visual Parameters-Efficient Fine-Tuning
Zhengyi Fang, Yue Wang, Ran Yi*, Lizhuang Ma
[pdf ]
OneTrack: Demystifying the Conflict Between Detection and Tracking in End-to-End 3D Trackers
Qitai Wang, Jiawei He, Yuntao Chen, Zhaoxiang Zhang*
[pdf ]
LoA-Trans: Enhancing Visual Grounding by Location-Aware Transformers
Ziling Huang*, Shin'ichi Satoh
[pdf ]
HAC: Hash-grid Assisted Context for 3D Gaussian Splatting Compression
Yihang Chen*, Qianyi Wu, Weiyao Lin*, Mehrtash Harandi, Jianfei Cai
[pdf ]
Energy-induced Explicit quantification for Multi-modality MRI fusion
Xiaoming Qi*, Yuan Zhang, Tong Wang, Guanyu Yang*, Yueming Jin*, Shuo Li
[pdf ]
ColorPeel: Color Prompt Learning with Diffusion Models via Color and Shape Disentanglement
Muhammad Atif Butt*, Kai Wang, Javier Vazquez-Corral, Joost van de Weijer
[pdf ]
Exemplar-free Continual Representation Learning via Learnable Drift Compensation
Alex Gomez-Villa*, Dipam Goswami, Kai Wang, Andy Bagdanov, Bartlomiej Twardowski, Joost van de Weijer
[pdf ]
Walker: Self-supervised Multiple Object Tracking by Walking on Temporal Object Appearance Graphs
Mattia Segù*, Luigi Piccinelli, Siyuan Li, Luc Van Gool, Fisher Yu, Bernt Schiele
[pdf ]
Spatio-Temporal Proximity-Aware Dual-Path Model for Panoramic Activity Recognition
Sumin Lee*, Yooseung Wang, Sangmin Woo, Changick Kim
[pdf ]
DiffiT: Diffusion Vision Transformers for Image Generation
Ali Hatamizadeh*, Jiaming Song, Guilin Liu, Jan Kautz, Arash Vahdat
[pdf ]
WebRPG: Automatic Web Rendering Parameters Generation for Visual Presentation
Zirui Shao, Feiyu Gao, Hangdi Xing, Zepeng Zhu, Zhi Yu*, Jiajun Bu, Qi Zheng, Cong Yao
[pdf ]
GPSFormer: A Global Perception and Local Structure Fitting-based Transformer for Point Cloud Understanding
Changshuo Wang*, Meiqing Wu, Siew-Kei Lam, Xin Ning, Shangshu Yu, Ruiping Wang, Weijun Li, Thambipillai Srikanthan
[pdf ]
FreeMotion: A Unified Framework for Number-free Text-to-Motion Synthesis
Ke Fan, Junshu Tang, Weijian Cao, Ran Yi*, Moran Li, Jingyu Gong, Jiangning Zhang, Yabiao Wang, Chengjie Wang, Lizhuang Ma*
[pdf ]
FSD-BEV: Foreground Self-Distillation for Multi-view 3D Object Detection
Zheng Jiang, Jinqing Zhang, Yanan Zhang, Qingjie Liu*, Zhenghui Hu*, Baohui Wang, Yunhong Wang
[pdf ]
SceneGraphLoc: Cross-Modal Coarse Visual Localization on 3D Scene Graphs
Yang Miao, Francis Engelmann, Olga Vysotska, Federico Tombari, Marc Pollefeys, Daniel Barath*
[pdf ]
ScanReason: Empowering 3D Visual Grounding with Reasoning Capabilities
Chenming Zhu, Tai Wang, Wenwei Zhang, Kai Chen, Xihui Liu*
[pdf ]
MathVerse: Does Your Multi-modal LLM Truly See the Diagrams in Visual Math Problems?
Renrui Zhang, Dongzhi Jiang, Yichi Zhang, Haokun Lin, Ziyu Guo, Pengshuo Qiu, Aojun Zhou, Pan Lu, Kai-Wei Chang, Peng Gao, Hongsheng Li*
[pdf ]
See and Think: Embodied Agent in Virtual Environment
Zhonghan Zhao, Xuan Wang, Wenhao Chai, Boyi Li, Shengyu Hao, Shidong Cao, Tian Ye, Gaoang Wang*
[pdf ]
PISR: Polarimetric Neural Implicit Surface Reconstruction for Textureless and Specular Objects
Guangcheng Chen*, Yicheng He, Li He, Hong Zhang
[pdf ]
Bridging the Gap Between Human Motion and Action Semantics via Kinematics Phrases
Xinpeng Liu, Yong-Lu Li*, Ailing Zeng, Zizheng Zhou, Yang You, Cewu Lu*
[pdf ]
VisFocus: Prompt-Guided Vision Encoders for OCR-Free Dense Document Understanding
Ofir Abramovich*, Niv Nayman*, Sharon Fogel, Inbal Lavi, Ron Litman, Shahar Tsiper, Royee Tichauer, Srikar Appalaraju, Shai Mazor, R. Manmatha
[pdf ]
Masked Angle-Aware Autoencoder for Remote Sensing Images
Zhihao Li*, Biao Hou, Siteng Ma, Zitong Wu, Xianpeng Guo, Bo Ren, Licheng Jiao
[pdf ]
Infinite-ID: Identity-preserved Personalization via ID-semantics Decoupling Paradigm
Yi Wu, Ziqiang Li, Heliang Zheng, Chaoyue Wang*, Bin Li*
[pdf ]
MultiGen: Zero-shot Image Generation from Multi-modal Prompts
Zhi-Fan Wu*, Lianghua Huang, Wei Wang, Yanheng Wei, Yu Liu
[pdf ]
GazeXplain: Learning to Predict Natural Language Explanations of Visual Scanpaths
Xianyu Chen*, Ming Jiang, Qi Zhao*
[pdf ]
Learning Chain of Counterfactual Thought for Bias-Robust Vision-Language Reasoning
Yifeng Zhang, Ming Jiang, Qi Zhao*
[pdf ]
SegGen: Supercharging Segmentation Models with Text2Mask and Mask2Img Synthesis
Hanrong Ye*, Jason Kuen, Qing Liu, Zhe Lin, Brian Price, Dan Xu*
[pdf ]
Sync from the Sea: Retrieving Alignable Videos from Large-Scale Datasets
Ishan Rajendrakumar Dave*, Fabian Caba, Mubarak Shah, Simon Jenni*
[pdf ]
FinePseudo: Improving Pseudo-Labelling through Temporal-Alignablity for Semi-Supervised Fine-Grained Action Recognition
Ishan Rajendrakumar Dave*, Mamshad Nayeem Rizve*, Mubarak Shah
[pdf ]
Elegantly Written: Disentangling Writer and Character Styles for Enhancing Online Chinese Handwriting
Yu Liu, Fatimah binti Khalid, Lei Wang, Youxi Zhang, Cunrui Wang*
[pdf ]
UniCode: Learning a Unified Codebook for Multimodal Large Language Models
Sipeng Zheng*, Bohan Zhou, Yicheng Feng, Ye Wang, Zongqing Lu*
[pdf ]
When Do We Not Need Larger Vision Models?
Baifeng Shi*, Ziyang Wu, Maolin Mao, Xin Wang, Trevor Darrell
[pdf ]
GVGEN: Text-to-3D Generation with Volumetric Representation
Xianglong He, Junyi Chen, Sida Peng, Di Huang, Yangguang Li, Xiaoshui Huang, Chun Yuan*, Wanli Ouyang, Tong He*
[pdf ]
Bidirectional Stereo Image Compression with Cross-Dimensional Entropy Model
Zhening Liu, Xinjie Zhang, Jiawei Shao, Zehong Lin*, Jun Zhang
[pdf ]
UniINR: Event-guided Unified Rolling Shutter Correction, Deblurring, and Interpolation
Yunfan Lu*, Guoqiang Liang, Yusheng Wang, Lin Wang, Hui Xiong*
[pdf ]
ReLoo: Reconstructing Humans Dressed in Loose Garments from Monocular Video in the Wild
Chen Guo*, Tianjian Jiang, Manuel Kaufmann, Chengwei Zheng, Julien Valentin, Jie Song*, Otmar Hilliges
[pdf ]
Weakly-supervised Camera Localization by Ground-to-satellite Image Registration
Yujiao Shi*, Hongdong Li, Akhil Perincherry, Ankit Vora
[pdf ]
Dataset Growth
Ziheng Qin*, Zhaopan Xu, YuKun Zhou, Kai Wang*, Zangwei Zheng, Zebang Cheng, Hao Tang, Lei Shang, Baigui Sun, Radu Timofte, Xiaojiang Peng, Hongxun Yao*, Yang You*
[pdf ]
MaRINeR: Enhancing Novel Views by Matching Rendered Images with Nearby References
Lukas Bösiger*, Mihai Dusmanu, Marc Pollefeys, Zuria Bauer
[pdf ]
Teaching Tailored to Talent: Adverse Weather Restoration via Prompt Pool and Depth-Anything Constraint
Sixiang Chen, Tian Ye, Kai Zhang, Zhaohu Xing, Yunlong Lin, Lei Zhu*
[pdf ]
MoE-DiffIR: Task-customized Diffusion Priors for Universal Compressed Image Restoration
Yulin Ren, Xin Li*, Bingchen Li, Xingrui Wang, Mengxi China Guo, Shijie Zhao, Li Zhang, Zhibo Chen*
[pdf ]
LEGO: Learning EGOcentric Action Frame Generation via Visual Instruction Tuning
Bolin Lai*, Xiaoliang Dai, Lawrence Chen, Guan Pang, James M Rehg, Miao Liu
[pdf ]
SQ-LLaVA: Self-Questioning for Large Vision-Language Assistant
Guohao Sun*, Can Qin, Jiaminan Wang, Zeyuan Chen, Ran Xu, Zhiqiang Tao
[pdf ]
Mesh2NeRF: Direct Mesh Supervision for Neural Radiance Field Representation and Generation
Yujin Chen*, Yinyu Nie, Benjamin Ummenhofer, Reiner Birkl, Michael Paulitsch, Matthias Müller, Matthias Niessner
[pdf ]
Listen to Look into the Future: Audio-Visual Egocentric Gaze Anticipation
Bolin Lai*, Fiona Ryan, Wenqi Jia, Miao Liu, James M Rehg
[pdf ]
R^2-Bench: Benchmarking the Robustness of Referring Perception Models under Perturbations
Xiang Li*, Kai Qiu, Jinglu Wang, Xiaohao Xu, Kashu Yamazaki, Hao Chen, Rita Singh, Xiaonan Huang, Bhiksha Raj
[pdf ]
Self-supervised co-salient object detection via feature correspondences at multiple scales
Souradeep Chakraborty*, Dimitris Samaras
[pdf ]
Differentiable Convex Polyhedra Optimization from Multi-view Images
Daxuan Ren*, Haiyi Mei, Hezi Shi, Jianmin Zheng, Jianfei Cai, Lei Yang
[pdf ]
SlotLifter: Slot-guided Feature Lifting for Learning Object-Centric Radiance Fields
Yu Liu, Baoxiong Jia*, Yixin Chen, Siyuan Huang
[pdf ]
SceneVerse: Scaling 3D Vision-Language Learning for Grounded Scene Understanding
Baoxiong Jia*, Yixin Chen, Huangyue Yu, Yan Wang, Xuesong Niu, Tengyu Liu, Qing Li, Siyuan Huang
[pdf ]
ADMap: Anti-disturbance Framework for Vectorized HD Map Construction
Haotian Hu, Fanyi Wang*, Yaonong Wang, Laifeng Hu, Jingwei Xu, Zhiwang Zhang*
[pdf ]
GaussianImage: 1000 FPS Image Representation and Compression by 2D Gaussian Splatting
Xinjie Zhang, Xingtong Ge, Tongda Xu, Dailan He, Yan Wang, Hongwei Qin, Guo Lu, Jing Geng*, Jun Zhang*
[pdf ]
PanoVOS: Bridging Non-panoramic and Panoramic Views with Transformer for Video Segmentation
Shilin Yan*, Xiaohao Xu, Renrui Zhang, Lingyi Hong, Wenchao Chen, Wenqiang Zhang, Wei Zhang*
[pdf ]
Evaluating Text-to-Visual Generation with Image-to-Text Generation
Zhiqiu Lin*, Deepak Pathak, Baiqi Li, Jiayao Li, Xide Xia, Graham Neubig, Pengchuan Zhang, Deva Ramanan
[pdf ]
SENC: Handling Self-collision in Neural Cloth Simulation
Zhouyingcheng Liao*, Sinan Wang, Taku Komura
[pdf ]
HybridBooth: Hybrid Prompt Inversion for Efficient Subject-Driven Generation
Shanyan Guan, Yanhao Ge, Ying Tai*, Jian Yang, Wei Li, Mingyu You*
[pdf ]
PartCraft: Crafting Creative Objects by Parts
Kam Woh Ng*, Xiatian Zhu, Yi-Zhe Song, Tao Xiang
[pdf ]
GeometrySticker: Enabling Ownership Claim of Recolorized Neural Radiance Fields
Xiufeng Huang*, Ka Chun Cheung, Simon See, Renjie Wan*
[pdf ]
PYRA: Parallel Yielding Re-Activation for Training-Inference Efficient Task Adaptation
Yizhe Xiong, Hui Chen*, Tianxiang Hao, Zijia Lin, Jungong Han, Yuesong Zhang, Guoxin Wang, Yongjun Bao, Guiguang Ding
[pdf ]
FineMatch: Aspect-based Fine-grained Image and Text Mismatch Detection and Correction
Hang Hua*, Jing Shi, Kushal Kafle, Simon Jenni, Daoan Zhang, John Collomosse, Scott Cohen, Jiebo Luo
[pdf ]
CrossScore: A Multi-View Approach to Image Evaluation and Scoring
Zirui Wang*, Wenjing Bian, Victor Adrian Prisacariu
[pdf ]
Modeling and Driving Human Body Soundfields through Acoustic Primitives
Chao Huang*, Dejan Markovic*, Chenliang Xu*, Alexander Richard*
[pdf ]
m&m’s: A Benchmark to Evaluate Tool-Use for multi-step multi-modal Tasks
Zixian Ma*, Weikai Huang, Jieyu Zhang, Tanmay Gupta, Ranjay Krishna
[pdf ]
Label-anticipated Event Disentanglement for Audio-Visual Video Parsing
Jinxing Zhou*, Dan Guo*, Yuxin Mao, Yiran Zhong, Xiaojun Chang, Meng Wang*
[pdf ]
High-Fidelity 3D Textured Shapes Generation by Sparse Encoding and Adversarial Decoding
Qi Zuo*, Xiaodong Gu, Yuan Dong, Zhengyi Zhao, Weihao Yuan, Qiu Lingteng, Liefeng Bo, Zilong Dong
[pdf ]
Semi-Supervised Video Desnowing Network via Temporal Decoupling Experts and Distribution-Driven Contrastive Regularization
Hongtao Wu, Angelica I Aviles-Rivero, Yijun Yang, Jingjing Ren, Sixiang Chen, Haoyu Chen, Lei Zhu*
[pdf ]
I-MedSAM: Implicit Medical Image Segmentation with Segment Anything
Xiaobao Wei, Jiajun Cao, Yizhu Jin, Ming Lu, Guangyu Wang, Shanghang Zhang*
[pdf ]
ReMamber: Referring Image Segmentation with Mamba Twister
Yuhuan Yang, Chaofan Ma, Jiangchao Yao, Zhun Zhong*, Ya Zhang, Yanfeng Wang*
[pdf ]
TalkingGaussian: Structure-Persistent 3D Talking Head Synthesis via Gaussian Splatting
Jiahe Li, Jiawei Zhang, Xiao Bai*, Jin Zheng*, Xin Ning, Jun Zhou, Lin Gu
[pdf ]
CAT: Enhancing Multimodal Large Language Model to Answer Questions in Dynamic Audio-Visual Scenarios
Qilang Ye, Zitong Yu*, Rui Shao, Xinyu Xie, Philip Torr, Xiaochun Cao
[pdf ]
Segmentation-guided Layer-wise Image Vectorization with Gradient Fills
Hengyu Zhou, Hui Zhang*, Bin Wang*
[pdf ]
Implicit Style-Content Separation using B-LoRA
Yarden Frenkel*, Yael Vinker, Ariel Shamir, Danny Cohen-Or
[pdf ]
OpenPSG: Open-set Panoptic Scene Graph Generation via Large Multimodal Models
Zijian Zhou*, Zheng Zhu, Holger Caesar, Miaojing Shi*
[pdf ]
ActionVOS: Actions as Prompts for Video Object Segmentation
Liangyang Ouyang*, Ruicong Liu, Yifei Huang*, Ryosuke Furuta, Yoichi Sato*
[pdf ]
FALIP: Visual Prompt as Foveal Attention Boosts CLIP Zero-Shot Performance
Jiedong Zhuang, Jiaqi Hu, Lianrui Mu, Rui Hu, Xiaoyu Liang, Jiangnan Ye, Haoji Hu*
[pdf ]
U-COPE: Taking a Further Step to Universal 9D Category-level Object Pose Estimation
Li Zhang*, Weiqing Meng, Yan Zhong, Bin Kong, Mingliang Xu, Jianming Du, Xue Wang, Rujing Wang, Liu Liu
[pdf ]
Integrating Markov Blanket Discovery into Causal Representation Learning for Domain Generalization
Naiyu Yin*, Hanjing Wang, Yue Yu, Tian Gao, Amit Dhurandhar, Qiang Ji
[pdf ]
Rotary Position Embedding for Vision Transformer
Byeongho Heo*, Song Park, Dongyoon Han, Sangdoo Yun
[pdf ]
Local All-Pair Correspondence for Point Tracking
Seokju Cho, Jiahui Huang, Jisu Nam, Honggyu An, Seungryong Kim*, Joon-Young Lee*
[pdf ]
MonoWAD: Weather-Adaptive Diffusion Model for Robust Monocular 3D Object Detection
Youngmin Oh, Hyung-Il Kim, Seong Tae Kim*, Jung Uk Kim*
[pdf ]
ReALFRED: An Embodied Instruction Following Benchmark in Photo-Realistic Environments
Taewoong Kim, Cheolhong Min, Byeonghwi Kim, Jinyeon Kim, Wonje Jeung, Jonghyun Choi*
[pdf ]
S^3D-NeRF: Single-Shot Speech-Driven Neural Radiance Field for High Fidelity Talking Head Synthesis
Dongze Li*, Kang Zhao*, Wei Wang*, Yifeng Ma, Bo Peng, Yingya Zhang, Jing Dong
[pdf ]
ActionSwitch: Class-agnostic Detection of Simultaneous Actions in Streaming Videos
Hyolim Kang, Jeongseok Hyun, Joungbin An, Youngjae Yu, Seon Joo Kim*
[pdf ]
Hierarchically Structured Neural Bones for Reconstructing Animatable Objects from Casual Videos
Subin Jeon, In Cho, Minsu Kim, Woong Oh Cho, Seon Joo Kim*
[pdf ]
PQ-SAM: Post-training Quantization for Segment Anything Model
Xiaoyu Liu*, Xin Ding, Lei Yu, Yuanyuan Xi, Wei Li, Zhijun Tu, Jie Hu, Hanting Chen, Baoqun Yin, Zhiwei Xiong*
[pdf ]
CPM: Class-conditional Prompting Machine for Audio-visual Segmentation
Yuanhong Chen*, Chong Wang, Yuyuan Liu, Hu Wang, Gustavo Carneiro
[pdf ]
Optimizing Factorized Encoder Models: Time and Memory Reduction for Scalable and Efficient Action Recognition
Shreyank N Gowda*, Anurag Arnab, Jonathan Huang
[pdf ]
DVLO: Deep Visual-LiDAR Odometry with Local-to-Global Feature Fusion and Bi-Directional Structure Alignment
Jiuming Liu, Dong Zhuo, Zhiheng Feng, Siting Zhu, Chensheng Peng, Zhe Liu, Hesheng Wang*
[pdf ]
CoLeaF: A Contrastive-Collaborative Learning Framework for Weakly Supervised Audio-Visual Video Parsing
Faegheh Sardari*, Armin Mustafa, Philip JB Jackson, Adrian Hilton
[pdf ]
Noise-assisted Prompt Learning for Image Forgery Detection and Localization
Dong Li, Jiaying Zhu, Xueyang Fu*, Xun Guo, Yidi Liu, Gang Yang, Jiawei Liu, Zheng-Jun Zha
[pdf ]
Data Collection-free Masked Video Modeling
Yuchi Ishikawa*, Masayoshi Kondo, Yoshimitsu Aoki
[pdf ]
Protecting NeRFs' Copyright via Plug-And-Play Watermarking Base Model
Qi Song*, Ziyuan Luo, Ka Chun Cheung, Simon See, Renjie Wan
[pdf ]
Pixel-Aware Stable Diffusion for Realistic Image Super-Resolution and Personalized Stylization
Tao Yang*, Rongyuan Wu, Peiran Ren, Xuansong Xie, Lei Zhang
[pdf ]
AnyControl: Create Your Artwork with Versatile Control on Text-to-Image Generation
Yanan Sun*, Yanchen Liu, Yinhao Tang, Wenjie Pei, Kai Chen
[pdf ]
SEED: A Simple and Effective 3D DETR in Point Clouds
Zhe Liu, Jinghua Hou, Xiaoqing Ye, Tong Wang, Jingdong Wang, Xiang Bai*
[pdf ]
AEDNet: Adaptive Embedding and Multiview-Aware Disentanglement for Point Cloud Completion
Zhiheng Fu, Longguang Wang, Lian Xu, Zhiyong Wang, Hamid Laga, Yulan Guo*, Farid Boussaid, Mohammed Bennamoun
[pdf ]
Synergy of Sight and Semantics: Visual Intention Understanding with CLIP
Qu Yang, Mang Ye*, Dacheng Tao
[pdf ]
Intrinsic Single-Image HDR Reconstruction
Sebastian Dille*, Chris Careaga*, Yagiz Aksoy
[pdf ]
T-MAE: Temporal Masked Autoencoders for Point Cloud Representation Learning
Weijie Wei*, Fatemeh Karimi Nejadasl, Theo Gevers, Martin R. Oswald*
[pdf ]
Pathology-knowledge Enhanced Multi-instance Prompt Learning for Few-shot Whole Slide Image Classification
Linhao Qu*, Dingkang Yang, Dan Huang, Qinhao Guo, Rongkui Luo, Shaoting Zhang, Xiaosong Wang*
[pdf ]
Towards Natural Language-Guided Drones: GeoText-1652 Benchmark with Spatial Relation Matching
Meng Chu, Zhedong Zheng*, Wei Ji, Tingyu Wang, Tat-Seng Chua
[pdf ]
BEAF: Observing BEfore-AFter Changes to Evaluate Hallucination in Vision-language Models
Moon Ye-Bin, Nam Hyeon-Woo, Wonseok Choi, Tae-Hyun Oh*
[pdf ]
Approaching Outside: Scaling Unsupervised 3D Object Detection from 2D Scene
Ruiyang Zhang*, Hu Zhang, Hang Yu, Zhedong Zheng*
[pdf ]
DATENeRF: Depth-Aware Text-based Editing of NeRFs
Sara Rojas Martinez*, Julien Philip, Kai Zhang, Sai Bi, Fujun Luan, Bernard Ghanem, Kalyan Sunkavalli
[pdf ]
XPSR: Cross-modal Priors for Diffusion-based Image Super-Resolution
Qu Yunpeng*, Kun Yuan, Kai Zhao, Qizhi Xie, Jinhua Hao, Ming Sun, Chao Zhou
[pdf ]
ABC Easy as 123: A Blind Counter for Exemplar-Free Multi-Class Class-agnostic Counting
Michael A Hobley*, Victor Adrian Prisacariu
[pdf ]
Category Adaptation Meets Projected Distillation in Generalized Continual Category Discovery
Grzegorz Rypeść*, Daniel Marczak, Sebastian Cygert, Tomasz Trzcinski, Bartlomiej Twardowski
[pdf ]
LaRa: Efficient Large-Baseline Radiance Fields
Anpei Chen*, Haofei Xu, Stefano Esposito, Siyu Tang, Andreas Geiger
[pdf ]
Bi-TTA: Bidirectional Test-Time Adapter for Remote Physiological Measurement
Haodong Li*, Hao Lu, Yingcong Chen*
[pdf ]
MAGR: Manifold-Aligned Graph Regularization for Continual Action Quality Assessment
Kanglei Zhou, Liyuan Wang, Xingxing Zhang, Hubert P. H. Shum, Frederick W. B. Li, Jianguo Li, Xiaohui Liang*
[pdf ]
Grounding Language Models for Visual Entity Recognition
Zilin Xiao*, Ming Gong, Paola Cascante-Bonilla, Xingyao Zhang, Jie Wu, Vicente Ordonez*
[pdf ]
ELSE: Efficient Deep Neural Network Inference through Line-based Sparsity Exploration
Zeqi Zhu*, Alberto Garcia-Ortiz, Luc Waeijen, Egor Bondarev, Arash Pourtaherian, Orlando Moreira
[pdf ]
DiffusionDepth: Diffusion Denoising Approach for Monocular Depth Estimation
Yiqun Duan*, Xianda Guo*, Zheng Zhu
[pdf ]
DC-Solver: Improving Predictor-Corrector Diffusion Sampler via Dynamic Compensation
Wenliang Zhao, Haolin Wang, Jie Zhou, Jiwen Lu*
[pdf ]
TRAM: Global Trajectory and Motion of 3D Humans from in-the-wild Videos
Yufu Wang*, Ziyun Wang, Lingjie Liu, Kostas Daniilidis
[pdf ]
MutDet: Mutually Optimizing Pre-training for Remote Sensing Object Detection
Ziyue Huang, Yongchao Feng, Qingjie Liu*, Yunhong Wang
[pdf ]
Self-Supervised Video Copy Localization with Regional Token Representation
Minlong Lu*, Yichen Lu, Siwei Nie, Xudong Yang, Xiaobo Zhang
[pdf ]
Enhancing Perceptual Quality in Video Super-Resolution through Temporally-Consistent Detail Synthesis using Diffusion Models
Claudio Rota*, Marco Buzzelli, Joost van de Weijer
[pdf ]
RoGUENeRF: A Robust Geometry-Consistent Universal Enhancer for NeRF
Sibi Catley-Chandar*, Richard Shaw, Gregory Slabaugh, Eduardo Pérez Pellitero
[pdf ]
Bridging the Gap: Studio-like Avatar Creation from a Monocular Phone Capture
ShahRukh Athar*, Shunsuke Saito, Stanislav Pidhorskyi, Zhengyu Yang, Chen Cao
[pdf ]
ControlLLM: Augment Language Models with Tools by Searching on Graphs
Zhaoyang Liu, Zeqiang Lai, Zhangwei Gao, Erfei Cui, Ziheng Li, Xizhou Zhu, Lewei Lu, Qifeng Chen*, Yu Qiao, Jifeng Dai, Wenhai Wang*
[pdf ]
UniTraj: A Unified Framework for Scalable Vehicle Trajectory Prediction
Lan Feng, Mohammadhossein Bahari*, Kaouther Messaoud, Eloi Zablocki, Matthieu Cord, Alexandre Alahi
[pdf ]
DreamDissector: Learning Disentangled Text-to-3D Generation from 2D Diffusion Priors
Zizheng Yan*, Jiapeng Zhou, Fanpeng Meng, Yushuang Wu, Lingteng Qiu, Zisheng Ye, Shuguang Cui, Guanying Chen, Xiaoguang Han*
[pdf ]
Vamos: Versatile Action Models for Video Understanding
Shijie Wang*, Qi Zhao, Minh Quan Do, Nakul Agarwal, Kwonjoon Lee, Chen Sun
[pdf ]
Prioritized Semantic Learning for Zero-shot Instance Navigation
Xinyu Sun*, Lizhao Liu, Hongyan Zhi, Ronghe Qiu, Junwei Liang*
[pdf ]
RoadPainter: Points Are Ideal Navigators for Topology transformER
Zhongxing Ma, Liang Shuang, Yongkun Wen, Weixin Lu, Guowei Wan*
[pdf ]
FouriScale: A Frequency Perspective on Training-Free High-Resolution Image Synthesis
Linjiang Huang*, Rongyao Fang, Aiping Zhang, Guanglu Song, Si Liu, Yu Liu, Hongsheng Li*
[pdf ]
Can OOD Object Detectors Learn from Foundation Models?
Jiahui Liu*, Xin Wen, Shizhen Zhao, Yingxian Chen, Xiaojuan Qi*
[pdf ]
Videoshop: Localized Semantic Video Editing with Noise-Extrapolated Diffusion Inversion
Xiang Fan*, Anand Bhattad, Ranjay Krishna
[pdf ]
MERLiN: Single-Shot Material Estimation and Relighting for Photometric Stereo
Ashish Tiwari*, Satoshi Ikehata, Shanmuganathan Raman
[pdf ]
Boosting 3D Single Object Tracking with 2D Matching Distillation and 3D Pre-training
Qiangqiang Wu, Yan Xia*, Jia Wan, Antoni Chan
[pdf ]
Diffusion-Based Image-to-Image Translation by Noise Correction via Prompt Interpolation
Junsung Lee, Minsoo Kang, Bohyung Han*
[pdf ]
Real-data-driven 2000 FPS Color Video from Mosaicked Chromatic Spikes
Siqi Yang*, Zhaojun Huang, Yakun Chang, Bin Fan, Zhaofei Yu, Boxin Shi
[pdf ]
Brain-ID: Learning Contrast-agnostic Anatomical Representations for Brain Imaging
Peirong Liu*, Oula Puonti, Xiaoling Hu, Daniel C. Alexander, Juan E. Iglesias
[pdf ]
TTT-MIM: Test-Time Training with Masked Image Modeling for Denoising Distribution Shifts
Youssef Mansour*, Xuyang Zhong, Serdar Caglar, Reinhard Heckel
[pdf ]
RadEdit: stress-testing biomedical vision models via diffusion image editing
Fernando Pérez-García, Sam Bond-Taylor, Pedro Sanchez, Boris van Breugel, Daniel Coelho de Castro, Harshita Sharma, Valentina Salvatelli, Maria Teodora A Wetscherek, Hannah CM Richardson, Lungren Matthew, Aditya Nori, Javier Alvarez-Valle, Ozan Oktay, Maximilian Ilse*
[pdf ]
SPAMming Labels: Efficient Annotations for the Trackers of Tomorrow
Orcun Cetintas*, Tim Meinhardt, Guillem Brasó, Laura Leal-Taixé
[pdf ]
AdaDiffSR: Adaptive Region-aware Dynamic acceleration Diffusion Model for Real-World Image Super-Resolution
Yuanting Fan, Chengxu Liu, Nengzhong Yin, Changlong Gao, Xueming Qian*
[pdf ]
Explicitly Guided Information Interaction Network for Cross-modal Point Cloud Completion
Xu Hang, Chen Long, Wenxiao Zhang*, Yuan Liu, Zhen Cao, Zhen Dong, Bisheng Yang
[pdf ]
Towards Real-world Event-guided Low-light Video Enhancement and Deblurring
Taewoo Kim, Jaeseok Jeong, Hoonhee Cho, Yuhwan Jeong, Kuk-Jin Yoon*
[pdf ]
Exploring Pre-trained Text-to-Video Diffusion Models for Referring Video Object Segmentation
Xuelu Feng, Dongdong Chen, Junsong Yuan, Chunming Qiao, Gang Hua, Zixin Zhu*
[pdf ]
TrackNeRF: Bundle Adjusting NeRF from Sparse and Noisy Views via Feature Tracks
Jinjie Mai*, Wenxuan Zhu, Sara Rojas, Jesus Zarzar, Abdullah Hamdi, Guocheng Qian, Bing Li, Silvio Giancola, Bernard Ghanem
[pdf ]
COHO: Context-Sensitive City-Scale Hierarchical Urban Layout Generation
Liu He*, Daniel Aliaga
[pdf ]
Joint RGB-Spectral Decomposition Model Guided Image Enhancement in Mobile Photography
Kailai Zhou*, Lijing Cai, Yibo Wang, Mengya Zhang, Bihan Wen, Qiu Shen*, Xun Cao
[pdf ]
SpatialFormer: Towards Generalizable Vision Transformers with Explicit Spatial Understanding
Han Xiao, Wenzhao Zheng, Sicheng Zuo, Peng Gao, Jie Zhou, Jiwen Lu*
[pdf ]
OccWorld: Learning a 3D Occupancy World Model for Autonomous Driving
Wenzhao Zheng, Weiliang Chen, Yuanhui Huang, Borui Zhang, Yueqi Duan, Jiwen Lu*
[pdf ]
MyVLM: Personalizing VLMs for User-Specific Queries
Yuval Alaluf*, Elad Richardson, Sergey Tulyakov, Kfir Aberman, Danny Cohen-Or
[pdf ]
AMEGO: Active Memory from long EGOcentric videos
Gabriele Goletto*, Tushar Nagarajan, Giuseppe Averta, Dima Damen
[pdf ]
Power Variable Projection for Initialization-Free Large-Scale Bundle Adjustment
Simon Weber*, Je Hyeong Hong, Daniel Cremers
[pdf ]
Collaborative Control for Geometry-Conditioned PBR Image Generation
Shimon Vainer, Mark Boss, Mathias Parger, Konstantin Kutsy, Dante De Nigris, Ciara Rowles, Nicolas Perony, Simon Donné*
[pdf ]
Co-synthesis of Histopathology Nuclei Image-Label Pairs using a Context-Conditioned Joint Diffusion Model
Seonghui Min, Hyun-Jic Oh, Won-Ki Jeong*
[pdf ]
One-stage Prompt-based Continual Learning
Youngeun Kim*, Yuhang Li, Priyadarshini Panda
[pdf ]
SpaceJAM: a Lightweight and Regularization-free Method for Fast Joint Alignment of Images
Nir Barel*, Ron A Shapira Weber*, Nir Mualem, Shahaf E Finder, Oren Freifeld*
[pdf ]
APL: Anchor-based Prompt Learning for One-stage Weakly Supervised Referring Expression Comprehension
Yaxin Luo, Jiayi Ji, Xiaofu Chen, Yuxin Zhang, Tianhe Ren, Gen Luo*
[pdf ]
GenQ: Quantization in Low Data Regimes with Generative Synthetic Data
Yuhang Li*, Youngeun Kim, Donghyun Lee, Souvik Kundu, Priyadarshini Panda
[pdf ]
MVDD: Multi-View Depth Diffusion Models
Zhen Wang*, Qiangeng Xu, Feitong Tan, Menglei Chai, Shichen Liu, Rohit Pandey, Sean Fanello, Achuta Kadambi, Yinda Zhang
[pdf ]
Rethinking Video-Text Understanding: Retrieval from Counterfactually Augmented Data
Wufei Ma*, Kai Li, Zhongshi Jiang, Moustafa Meshry, Qihao Liu, Huiyu Wang, Christian Haene, Alan Yuille
[pdf ]
Risk-Aware Self-Consistent Imitation Learning for Trajectory Planning in Autonomous Driving
Yixuan Fan*, Ya-Li Li, Shengjin Wang*
[pdf ]
Dual-level Adaptive Self-Labeling for Novel Class Discovery in Point Cloud Segmentation
Ruijie Xu*, Chuyu Zhang, Hui Ren, Xuming He
[pdf ]
EBDM: Exemplar-guided Image Translation with Brownian-bridge Diffusion Models
Eungbean Lee, Somi Jeong, Kwanghoon Sohn*
[pdf ]
DreamDrone: Text-to-Image Diffusion Models are Zero-shot Perpetual View Generators
Hanyang Kong*, Dongze Lian, Michael Bi Mi, Xinchao Wang*
[pdf ]
Harnessing Text-to-Image Diffusion Models for Category-Agnostic Pose Estimation
Duo Peng, Zhengbo Zhang, Ping Hu, Qiuhong Ke, David Yau, Jun Liu*
[pdf ]
SC4D: Sparse-Controlled Video-to-4D Generation and Motion Transfer
Zijie Wu*, Chaohui Yu, Yanqin Jiang, Chenjie Cao, Fan Wang, Xiang Bai*
[pdf ]
Overcoming Distribution Mismatch in Quantizing Image Super-Resolution Networks
Cheeun Hong, Kyoung Mu Lee*
[pdf ]
Large Motion Model for Unified Multi-Modal Motion Generation
Mingyuan Zhang*, Daisheng Jin, Chenyang Gu, Fangzhou Hong, Zhongang Cai, Jingfang Huang, Chongzhi Zhang, Xinying Guo, Lei Yang, Ying He, Ziwei Liu*
[pdf ]
FisherRF: Active View Selection and Mapping with Radiance Fields using Fisher Information
Wen Jiang*, Boshu Lei, Kostas Daniilidis*
[pdf ]
Occlusion Handling in 3D Human Pose Estimation with Perturbed Positional Encoding
Niloofar Azizi*, Mohsen Fayyaz, Horst Bischof
[pdf ]
Gradient-based Out-of-Distribution Detection
Taha Entesari*, Sina Sharifi*, Bardia Safaei*, Vishal Patel, Mahyar Fazlyab
[pdf ]
Event-based Mosaicing Bundle Adjustment
Shuang Guo*, Guillermo Gallego
[pdf ]
ProMerge: Prompt and Merge for Unsupervised Instance Segmentation
Dylan J Li, Gyungin Shin*
[pdf ]
M2D2M: Multi-Motion Generation from Text with Discrete Diffusion Models
Seunggeun Chi*, Hyung-gun Chi, Hengbo Ma, Nakul Agarwal, Faizan Siddiqui, Karthik Ramani*, Kwonjoon Lee*
[pdf ]
The Hard Positive Truth about Vision-Language Compositionality
Amita Kamath*, Cheng-Yu Hsieh, Kai-Wei Chang, Ranjay Krishna
[pdf ]
GaussCtrl: Multi-View Consistent Text-Driven 3D Gaussian Splatting Editing
Jing Wu*, Jia-Wang Bian, Xinghui Li, Guangrun Wang, Ian Reid, Philip Torr, Victor Adrian Prisacariu*
[pdf ]
Shapefusion: 3D localized human diffusion models
Rolandos Alexandros Potamias*, Michael Tarasiou, Stylianos Ploumpis, Stefanos Zafeiriou
[pdf ]
Eta Inversion: Designing an Optimal Eta Function for Diffusion-based Real Image Editing
Wonjun Kang, Kevin Galim, Hyung Il Koo*
[pdf ]
Prompting Language-Informed Distribution for Compositional Zero-Shot Learning
Wentao Bao*, Lichang Chen, Heng Huang, Yu Kong
[pdf ]
Wear-Any-Way: Manipulable Virtual Try-on via Sparse Correspondence Alignment
Mengting Chen*, Xi Chen, Zhonghua Zhai, Chen Ju, Xuewen Hong, Jinsong Lan, Shuai Xiao
[pdf ]
3iGS: Factorised Tensorial Illumination for 3D Gaussian Splatting
Zhe Jun Tang*, Tat-Jen Cham
[pdf ]
Distribution-Aware Robust Learning from Long-Tailed Data with Noisy Labels
Jae Soon Baik*, In Young Yoon, Kun Hoon Kim, Jun Won Choi*
[pdf ]
Free-Viewpoint Video of Outdoor Sports Using a Drone
Zhengdong Hong*
[pdf ]
Wavelength-Embedding-guided Filter-Array Transformer for Spectral Demosaicing
Haijin Zeng*, Hiep Luong, Wilfried Philips
[pdf ]
ConGeo: Robust Cross-view Geo-localization across Ground View Variations
Li Mi, Chang Xu*, Javiera Castillo Navarro, Syrielle Montariol, Wen Yang, Antoine Bosselut, Devis Tuia
[pdf ]
Generalizable Facial Expression Recognition
Yuhang Zhang, Xiuqi Zheng, Chenyi Liang, Jiani Hu*, Weihong Deng
[pdf ]
GAURA: Generalizable Approach for Unified Restoration and Rendering of Arbitrary Views
Vinayak Gupta*, Rongali Simhachala Venkata Girish, Mukund Varma T, Ayush Tewari, Kaushik Mitra
[pdf ]
Self-Supervised Any-Point Tracking by Contrastive Random Walks
Ayush Shrivastava*, Andrew Owens
[pdf ]
MixDQ: Memory-Efficient Few-Step Text-to-Image Diffusion Models with Metric-Decoupled Mixed Precision Quantization
Tianchen Zhao*, Xuefei Ning, Tongcheng Fang, Enshu Liu, Guyue Huang, Zinan Lin, Shengen Yan, Guohao Dai, Yu Wang
[pdf ]
Siamese Vision Transformers are Scalable Audio-visual Learners
Yan-Bo Lin*, Gedas Bertasius
[pdf ]
LCM-Lookahead for Encoder-based Text-to-Image Personalization
Rinon Gal*, Or Lichter, Elad Richardson, Or Patashnik, Amit Bermano, Gal Chechik, Danny Cohen-Or
[pdf ]
Towards Architecture-Agnostic Untrained Networks Priors for Image Reconstruction with Frequency Regularization
Yilin Liu, Yunkui Pang, Jiang Li, Yong Chen, Pew-Thian Yap*
[pdf ]
Towards Open-Ended Visual Recognition with Large Language Models
Qihang Yu*, Xiaohui Shen, Liang-Chieh Chen
[pdf ]
Ray-Distance Volume Rendering for Neural Scene Reconstruction
Ruihong Yin*, Yunlu Chen, Sezer Karaoglu, Theo Gevers
[pdf ]
ReNoise: Real Image Inversion Through Iterative Noising
Daniel Garibi*, Or Patashnik, Andrey Voynov, Hadar Averbuch-Elor, Danny Cohen-Or
[pdf ]
Attention Decomposition for Cross-Domain Semantic Segmentation
Liqiang He*, Sinisa Todorovic
[pdf ]
Be Yourself: Bounded Attention for Multi-Subject Text-to-Image Generation
Omer Dahary*, Or Patashnik, Kfir Aberman, Danny Cohen-Or
[pdf ]
Handling The Non-Smooth Challenge in Tensor SVD: A Multi-Objective Tensor Recovery Framework
Jingjing Zheng, Wanglong Lu, Wenzhe Wang, Yankai Cao*, Xiaoqin Zhang, Xianta Jiang
[pdf ]
RodinHD: High-Fidelity 3D Avatar Generation with Diffusion Models
Bowen Zhang, Yiji Cheng, Chunyu Wang*, Ting Zhang, Jiaolong Yang, Yansong Tang, Feng Zhao, Dong Chen, Baining Guo
[pdf ]
GRM: Large Gaussian Reconstruction Model for Efficient 3D Reconstruction and Generation
Yinghao Xu*, Zifan Shi, Wang Yifan, Hansheng Chen, Ceyuan Yang, Sida Peng, Yujun Shen, Gordon Wetzstein
[pdf ]
IRGen: Generative Modeling for Image Retrieval
Yidan Zhang*, Ting Zhang*, Dong Chen, Yujing Wang, Qi Chen, Xing Xie, Hao Sun, Weiwei Deng, Qi Zhang, Fan Yang, Mao Yang, Qingmin Liao, Jingdong Wang, Baining Guo
[pdf ]
Learning Trimodal Relation for Audio-Visual Question Answering with Missing Modality
Kyu Ri Park, Hong Joo Lee*, Jung Uk Kim*
[pdf ]
FastCAD: Real-Time CAD Retrieval and Alignment from Scans and Videos
Florian Maximilian Langer*, Jihong Ju, Georgi Dikov, Gerhard Reitmayr, Mohsen Ghafoorian
[pdf ]
A Simple Latent Diffusion Approach for Panoptic Segmentation and Mask Inpainting
Wouter Van Gansbeke*, Bert De Brabandere
[pdf ]
VISA: Reasoning Video Object Segmentation via Large Language Model
Cilin Yan, Haochen Wang, Shilin Yan, Xiaolong Jiang, Yao Hu, Guoliang Kang*, Weidi Xie, Efstratios Gavves
[pdf ]
Lego: Learning to Disentangle and Invert Personalized Concepts Beyond Object Appearance in Text-to-Image Diffusion Models
Saman Motamed*, Danda Pani Paudel, Luc Van Gool
[pdf ]
IDOL: Unified Dual-Modal Latent Diffusion for Human-Centric Joint Video-Depth Generation
Yuanhao Zhai*, Kevin Lin, Linjie Li, Chung-Ching Lin, Jianfeng Wang, Zhengyuan Yang, David Doermann, Junsong Yuan, Zicheng Liu, Lijuan Wang
[pdf ]
Scaling Backwards: Minimal Synthetic Pre-training?
Ryo Nakamura*, Ryu Tadokoro*, Ryosuke Yamada*, Yuki M Asano*, Iro Laina*, Christian Rupprecht*, Nakamasa Inoue*, Rio Yokota*, Hirokatsu Kataoka*
[pdf ]
BAMM: Bidirectional Autoregressive Motion Model
Ekkasit Pinyoanuntapong*, Muhammad Usama Saleem, Pu Wang, Minwoo Lee, Srijan Das, Chen Chen
[pdf ]
Event-based Head Pose Estimation: Benchmark and Method
Jiahui Yuan*, Hebei Li, Yansong Peng, Jin Wang, Yuheng Jiang, Yueyi Zhang*, Xiaoyan Sun
[pdf ]
Avatar Fingerprinting for Authorized Use of Synthetic Talking-Head Videos
Ekta Prashnani*, Koki Nagano, Shalini De Mello, David P Luebke, Orazio Gallo
[pdf ]
Towards Multi-modal Transformers in Federated Learning
Guangyu Sun*, Matias Mendieta, Aritra Dutta, Xin Li, Chen Chen
[pdf ]
Fisher Calibration for Backdoor-Robust Heterogeneous Federated Learning
Wenke Huang, Mang Ye*, Zekun Shi, Bo Du*, Dacheng Tao
[pdf ]
QueryCDR: Query-based Controllable Distortion Rectification Network for Fisheye Images
Pengbo Guo, Chengxu Liu, Xingsong Hou*, Xueming Qian
[pdf ]
Latent-INR: A Flexible Framework for Implicit Representations of Videos with Discriminative Semantics
Shishira R Maiya*, Anubhav Gupta, Matthew A Gwilliam, Max Ehrlich, Abhinav Shrivastava
[pdf ]
DCDM: Diffusion-Conditioned-Diffusion Model for Scene Text Image Super-Resolution
Shrey Singh*, Prateek Keserwani, Masakazu Iwamura*, Partha Pratim Roy
[pdf ]
Per-Gaussian Embedding-Based Deformation for Deformable 3D Gaussian Splatting
Jeongmin Bae, Seoha Kim, Youngsik Yun, Hahyun Lee, Gun Bang, Youngjung Uh*
[pdf ]
DreamMover: Leveraging the Prior of Diffusion Models for Image Interpolation with Large Motion
Liao Shen, Tianqi Liu, Huiqiang Sun, Xinyi Ye, Baopu Li, Jianming Zhang, Zhiguo Cao*
[pdf ]
CoLA: Conditional Dropout and Language-driven Robust Dual-modal Salient Object Detection
Shuang Hao, Chunlin Zhong, He Tang*
[pdf ]
Image-Feature Weak-to-Strong Consistency: An Enhanced Paradigm for Semi-Supervised Learning
Zhiyu Wu*, Jinshi Cui*
[pdf ]
RPBG: Towards Robust Neural Point-based Graphics in the Wild
Qingtian Zhu, Zizhuang Wei, Zhongtian Zheng, Yifan Zhan, Zhuyu Yao, Jiawang Zhang, Kejian Wu, Yinqiang Zheng*
[pdf ]
GaussReg: Fast 3D Registration with Gaussian Splatting
Jiahao Chang*, Yinglin Xu, Yihao Li, Yuantao Chen, Wensen Feng, Xiaoguang Han
[pdf ]
Efficient Diffusion Transformer with Step-wise Dynamic Attention Mediators
Yifan Pu*, Zhuofan Xia, Jiayi Guo, Dongchen Han, Qixiu Li, Duo Li, Yuhui Yuan, Ji Li, Yizeng Han, Shiji Song, Gao Huang*, Xiu Li*
[pdf ]
Open Vocabulary 3D Scene Understanding via Geometry Guided Self-Distillation
Pengfei Wang*, Yuxi Wang, Shuai Li, Zhaoxiang Zhang, Zhen Lei, Lei Zhang
[pdf ]
IAM-VFI: Interpolate Any Motion for Video Frame Interpolation with motion complexity map
Kihwan Yoon*, Yong Han Kim, Sungjei Kim*, Jinwoo Jeong*
[pdf ]
TIP: Tabular-Image Pre-training for Multimodal Classification with Incomplete Data
Siyi Du*, Shaoming Zheng, Yinsong Wang, Wenjia Bai, Declan P. O'Regan, Chen Qin*
[pdf ]
Diffusion Model is a Good Pose Estimator from 3D RF-Vision
Junqiao Fan, Jianfei Yang*, Yuecong Xu, Lihua Xie
[pdf ]
UPose3D: Uncertainty-Aware 3D Human Pose Estimation with Cross-View and Temporal Cues
Vandad Davoodnia*, Saeed Ghorbani, Marc-André Carbonneau, Alexandre Messier, Ali Etemad
[pdf ]
Learning 3D-aware GANs from Unposed Images with Template Feature Field
Xinya Chen, Hanlei Guo, Yanrui Bin, Shangzhan Zhang, Yuanbo Yang, Yujun Shen, Yue Wang, Yiyi Liao*
[pdf ]
TAPTR: Tracking Any Point with Transformers as Detection
Hongyang Li*, Hao Zhang, Shilong Liu, Zhaoyang Zeng, Tianhe Ren, Feng Li, Lei Zhang*
[pdf ]
Token Compensator: Altering Inference Cost of Vision Transformer without Re-Tuning
Shibo Jie, Yehui Tang, Jianyuan Guo, Zhi-Hong Deng*, Kai Han*, Yunhe Wang*
[pdf ]
Point-supervised Panoptic Segmentation via Estimating Pseudo Labels from Learnable Distance
Jing Li, Junsong Fan*, Zhaoxiang Zhang*
[pdf ]
BRAVE: Broadening the visual encoding of vision-language models
Oğuzhan Fatih Kar*, Alessio Tonioni*, Petra Poklukar, Achin Kulshrestha, Amir Zamir, Federico Tombari
[pdf ]
HUMOS: Human Motion Model Conditioned on Body Shape
Shashank Tripathi*, Omid Taheri, Christoph Lassner*, Michael J. Black*, Daniel Holden*, Carsten Stoll*
[pdf ]
Omni-Recon: Harnessing Image-based Rendering for General-Purpose Neural Radiance Fields
Yonggan Fu, Huaizhi Qu, Zhifan Ye, Chaojian Li, Kevin Zhao, Yingyan (Celine) Lin*
[pdf ]
MVDiffHD: A Dense High-resolution Multi-view Diffusion Model for Single or Sparse-view 3D Object Reconstruction
Shitao Tang*, Jiacheng Chen, Dilin Wang, Chengzhou Tang, Fuyang Zhang, Yuchen Fan, Vikas Chandra, Yasutaka Furukawa, Rakesh Ranjan
[pdf ]
FlowCon: Out-of-Distribution Detection using Flow-based Contrastive Learning
Saandeep Aathreya*, Shaun Canavan*
[pdf ]
LEIA: Latent View-invariant Embeddings for Implicit 3D Articulation
Archana Swaminathan*, Anubhav Gupta, Kamal Gupta, Shishira R Maiya, Vatsal Agarwal, Abhinav Shrivastava
[pdf ]
Un-EVIMO: Unsupervised Event-based Independent Motion Segmentation
Ziyun Wang*, Jinyuan Guo, Kostas Daniilidis
[pdf ]
Seeing the Unseen: A Frequency Prompt Guided Transformer for Image Restoration
Shihao Zhou, Jinshan Pan, Jinglei Shi*, Duosheng Chen, Lishen Qu, Jufeng Yang
[pdf ]
CityGaussian: Real-time High-quality Large-Scale Scene Rendering with Gaussians
Yang Liu, Chuanchen Luo, Lue Fan, Naiyan Wang, Junran Peng*, Zhaoxiang Zhang*
[pdf ]
Bayesian Evidential Deep Learning for Online Action Detection
Hongji Guo, Hanjing Wang, Qiang Ji*
[pdf ]
AdaNAT: Exploring Adaptive Policy for Token-Based Image Generation
Zanlin Ni, Yulin Wang, Renping Zhou, Rui Lu, Jiayi Guo, Jinyi Hu, Zhiyuan Liu, Yuan Yao*, Gao Huang*
[pdf ]
Rethinking Data Augmentation for Robust LiDAR Semantic Segmentation in Adverse Weather
Junsung Park, Kyungmin Kim, Hyunjung Shim*
[pdf ]
Diffusion-Generated Pseudo-Observations for High-Quality Sparse-View Reconstruction
Xinhang Liu*, Jiaben Chen, Shiu-Hong Kao, Yu-Wing Tai, Chi-Keung Tang
[pdf ]
Memory-Efficient Fine-Tuning for Quantized Diffusion Model
Hyogon Ryu, Seohyun Lim, Hyunjung Shim*
[pdf ]
VCD-Texture: Variance Alignment based 3D-2D Co-Denoising for Text-Guided Texturing
Shang Liu*, Chaohui Yu, Chenjie Cao, Wen Qian, Fan Wang*
[pdf ]
MotionLCM: Real-time Controllable Motion Generation via Latent Consistency Model
Wenxun Dai, Ling-Hao Chen, Jingbo Wang*, Jinpeng Liu, Bo Dai*, Yansong Tang
[pdf ]
Human Hair Reconstruction with Strand-Aligned 3D Gaussians
Egor Zakharov*, Vanessa Sklyarova, Michael J. Black, Giljoo Nam, Justus Thies, Otmar Hilliges
[pdf ]
COIN: Control-Inpainting Diffusion Prior for Human and Camera Motion Estimation
Jiefeng Li*, Ye Yuan, Davis Rempe, Haotian Zhang, Pavlo Molchanov, Cewu Lu, Jan Kautz, Umar Iqbal*
[pdf ]
SA-DVAE: Improving Zero-Shot Skeleton-Based Action Recognition by Disentangled Variational Autoencoders
Sheng-Wei Li, Zi-Xiang Wei, Wei-Jie Chen, Yi-Hsin Yu, Chih-Yuan Yang*, Jane Yung-jen Hsu*
[pdf ]
Bridge Past and Future: Overcoming Information Asymmetry in Incremental Object Detection
Qijie Mo, Yipeng Gao, Shenghao Fu, Junkai Yan, Ancong Wu*, Wei-Shi Zheng*
[pdf ]
Global-to-Pixel Regression for Human Mesh Recovery
Yabo Xiao, Mingshu HE*, Dongdong Yu
[pdf ]
Visible and Clear: Finding Tiny Objects in Difference Map
Bing Cao, Haiyu Yao, Pengfei Zhu*, Qinghua Hu
[pdf ]
Rethinking Image Super Resolution from Training Data Perspectives
Go Ohtani*, Ryu Tadokoro, Ryosuke Yamada, Yuki M Asano, Iro Laina, Christian Rupprecht, Nakamasa Inoue, Rio Yokota, Hirokatsu Kataoka, Yoshimitsu Aoki
[pdf ]
BlazeBVD: Make Scale-Time Equalization Great Again for Blind Video Deflickering
Xinmin Qiu, Congying Han, Zicheng Zhang, Bonan Li*, Tiande Guo, Pingyu Wang, Xuecheng Nie
[pdf ]
Efficient Inference of Vision Instruction-Following Models with Elastic Cache
Zuyan Liu, Benlin Liu, Jiahui Wang, Yuhao Dong, Guangyi Chen, Yongming Rao, Ranjay Krishna, Jiwen Lu*
[pdf ]
FreeCompose: Generic Zero-Shot Image Composition with Diffusion Prior
Zhekai Chen, Wen Wang, Zhen Yang, Zeqing Yuan, Hao Chen*, Chunhua Shen*
[pdf ]
Learning to Robustly Reconstruct Dynamic Scenes from Low-light Spike Streams
Liwen Hu*, Ziluo Ding, Mianzhi Liu, Lei Ma*, Tiejun Huang
[pdf ]
MarvelOVD: Marrying Object Recognition and Vision-Language Models for Robust Open-Vocabulary Object Detection
Kuo Wang, Lechao Cheng*, Weikai Chen, Pingping Zhang, Liang Lin, Fan Zhou, Guanbin Li*
[pdf ]
WildVidFit: Video Virtual Try-On in the Wild via Image-Based Controlled Diffusion Models
Zijian He, Peixin Chen, Guangrun Wang, Guanbin Li*, Philip Torr, Liang Lin
[pdf ]
Interactive 3D Object Detection with Prompts
Ruifei Zhang, Xiangru Lin, Wei Zhang, Jincheng Lu, Xuekuan Wang, Xiao Tan, Yingying Li, Errui Ding, Jingdong Wang, Guanbin Li*
[pdf ]
How Video Meetings Change Your Expression
Sumit Sarin*, Utkarsh Mall, Purva Tendulkar, Carl Vondrick
[pdf ]
GRACE: Graph-Based Contextual Debiasing for Fair Visual Question Answering
Yifeng Zhang, Ming Jiang, Qi Zhao*
[pdf ]
Neural Volumetric World Models for Autonomous Driving
Zanming Huang*, Jimuyang Zhang*, Eshed Ohn-Bar*
[pdf ]
IVTP: Instruction-guided Visual Token Pruning for Large Vision-Language Models
Kai Huang*, Hao Zou, Ye Xi, Bochen Wang, Zhen Xie, Liang Yu
[pdf ]
RegionDrag: Fast Region-Based Image Editing with Diffusion Models
Jingyi Lu, Xinghui Li, Kai Han*
[pdf ]
On the Error Analysis of 3D Gaussian Splatting and an Optimal Projection Strategy
Letian Huang, Jiayang Bai, Jie Guo*, Yuanqi Li, Yanwen Guo
[pdf ]
Bad Students Make Great Teachers: Active Learning Accelerates Large-Scale Visual Understanding
Talfan Evans*, Shreya Pathak, Hamza Merzic, Jonathan Richard Schwarz, Ryutaro Tanno, Olivier Henaff*
[pdf ]
Analytic-Splatting: Anti-Aliased 3D Gaussian Splatting via Analytic Integration
Zhihao Liang*, Qi Zhang*, Wenbo Hu, Ying Feng, Lei ZHU, Kui Jia*
[pdf ]
GRA: Detecting Oriented Objects through Group-wise Rotating and Attention
Jiangshan Wang*, Yifan Pu, Yizeng Han, Jiayi Guo, Yiru Wang, Xiu Li*, Gao Huang*
[pdf ]
Portrait4D-v2: Pseudo Multi-View Data Creates Better 4D Head Synthesizer
Yu Deng*, Duomin Wang, Baoyuan Wang
[pdf ]
CSOT: Cross-Scan Object Transfer for Semi-Supervised LiDAR Object Detection
Jinglin Zhan, Tiejun Liu, Rengang Li, Zhaoxiang Zhang, Yuntao Chen*
[pdf ]
Learning from the Web: Language Drives Weakly-Supervised Incremental Learning for Semantic Segmentation
Chang Liu, Giulia Rizzoli, Pietro Zanuttigh, Fu Li, Yi Niu*
[pdf ]
ShareGPT4V: Improving Large Multi-Modal Models with Better Captions
Lin Chen*, Jinsong Li, Xiaoyi Dong, Pan Zhang, Conghui He, Jiaqi Wang, Feng Zhao*, Dahua Lin*
[pdf ]
Eyes Closed, Safety On: Protecting Multimodal LLMs via Image-to-Text Transformation
Yunhao Gou*, Kai Chen, Zhili LIU, Lanqing Hong, Hang Xu, Zhenguo Li, Dit-Yan Yeung, James Kwok, Yu Zhang*
[pdf ]
Invertible Neural Warp for NeRF
Shin-Fang Chng*, Ravi Garg, Hemanth Saratchandran, Simon Lucey
[pdf ]
Enhancing Vectorized Map Perception with Historical Rasterized Maps
Xiaoyu Zhang, Guangwei Liu, Zihao Liu, Ningyi Xu, Yunhui Liu*, Ji Zhao
[pdf ]
Efficient and Versatile Robust Fine-Tuning of Zero-shot Models
Sungyeon Kim*, Boseung Jeong, Donghyun Kim, Suha Kwak*
[pdf ]
Part2Object: Hierarchical Unsupervised 3D Instance Segmentation
Cheng Shi, Yulin Zhang, Bin Yang, Jiajin Tang, Yuexin Ma, Sibei Yang*
[pdf ]
PetFace: A Large-Scale Dataset and Benchmark for Animal Identification
Risa Shinoda*, Kaede Shiohara
[pdf ]
MVSGaussian: Fast Generalizable Gaussian Splatting Reconstruction from Multi-View Stereo
Tianqi Liu, Guangcong Wang, Shoukang Hu, Liao Shen, Xinyi Ye, Yuhang Zang, Zhiguo Cao*, Wei Li, Ziwei Liu
[pdf ]
Zero-Shot Detection of AI-Generated Images
Davide Cozzolino, Giovanni Poggi, Matthias Niessner, Luisa Verdoliva*
[pdf ]
Language-Image Pre-training with Long Captions
Kecheng Zheng*, Yifei Zhang, Wei Wu, Fan Lu, Shuailei Ma, Xin Jin, Wei Chen, Yujun Shen
[pdf ]
GKGNet: Group K-Nearest Neighbor based Graph Convolutional Network for Multi-Label Image Recognition
Ruijie Yao, Sheng Jin, Lumin Xu, Wang Zeng, Wentao Liu, Chen Qian*, Ping Luo, Ji Wu*
[pdf ]
DISCO: Embodied Navigation and Interaction via Differentiable Scene Semantics and Dual-level Control
Xinyu Xu*, Shengcheng Luo, Yanchao Yang, Yong-Lu Li*, Cewu Lu*
[pdf ]
You Only Learn One Query: Learning Unified Human Query for Single-Stage Multi-Person Multi-Task Human-Centric Perception
Sheng Jin, Shuhuai Li, Tong Li, Wentao Liu*, Chen Qian, Ping Luo*
[pdf ]
Towards Real-World Adverse Weather Image Restoration: Enhancing Clearness and Semantics with Vision-Language Models
Jiaqi Xu*, Mengyang Wu, Xiaowei Hu*, Chi-Wing Fu, Qi Dou, Pheng-Ann Heng
[pdf ]
Facial Affective Behavior Analysis with Instruction Tuning
Yifan Li*, Anh Dao, Wentao Bao, Zhen Tan, Tianlong Chen, Huan Liu, Yu Kong
[pdf ]
CoReS: Orchestrating the Dance of Reasoning and Segmentation
Xiaoyi Bao, Siyang Sun, Shuailei Ma, Kecheng Zheng, Yuxin Guo, Guosheng Zhao, Yun Zheng, Xingang Wang*
[pdf ]
MagDiff: Multi-Alignment Diffusion for High-Fidelity Video Generation and Editing
Haoyu Zhao, Tianyi Lu, Jiaxi Gu, Xing Zhang, Qingping Zheng, Zuxuan Wu*, Hang Xu, Yu-Gang Jiang
[pdf ]
MambaIR: A Simple Baseline for Image Restoration with State-Space Model
Hang Guo*, Jinmin Li, Tao Dai*, Zhihao Ouyang, Xudong Ren, Shu-Tao Xia
[pdf ]
I Can't Believe It's Not Scene Flow!
Ishan Khatri*, Kyle Vedder*, Neehar Peri, Deva Ramanan, James Hays
[pdf ]
Rethinking Unsupervised Outlier Detection via Multiple Thresholding
Zhonghang Liu*, Panzhong Lu, Guoyang Xie, Zhichao Lu, Wen-Yan Lin
[pdf ]
Compress3D: a Compressed Latent Space for 3D Generation from a Single Image
Bowen Zhang*, Tianyu Yang*, Yu Li, Lei Zhang, Xi Zhao*
[pdf ]
Scalable Group Choreography via Variational Phase Manifold Learning
Nhat Le, Khoa Do, Xuan Bui, Tuong Do, Erman Tjiputra, Quang D. Tran, Anh Nguyen*
[pdf ]
Masked Video and Body-worn IMU Autoencoder for Egocentric Action Recognition
Mingfang Zhang, Yifei Huang*, Ruicong Liu, Yoichi Sato
[pdf ]
Mutual Learning for Acoustic Matching and Dereverberation via Visual Scene-driven Diffusion
Jian Ma, Wenguan Wang*, Yi Yang, Feng Zheng
[pdf ]
PoseSOR: Human Pose Can Guide Our Attention
Huankang Guan, Rynson W.H. Lau*
[pdf ]
TOD3Cap: Towards 3D Dense Captioning in Outdoor Scenes
Bu Jin, Yupeng Zheng*, Pengfei Li, Weize Li, Yuhang Zheng, Sujie Hu, Xinyu Liu, Jinwei Zhu, Zhijie Yan, Haiyang Sun, Kun Zhan, Peng Jia, Xiaoxiao Long, Yilun Chen, Hao Zhao
[pdf ]
Bi-directional Contextual Attention for 3D Dense Captioning
Minjung Kim*, Hyung Suk Lim, Soonyoung Lee, Bumsoo Kim*, Gunhee Kim*
[pdf ]
Multi-Person Pose Forecasting with Individual Interaction Perceptron and Prior Learning
Peng Xiao, Yi Xie, Xuemiao Xu*, Weihong Chen, Huaidong Zhang*
[pdf ]
InfMAE: A Foundation Model in The Infrared Modality
Fangcen Liu, Chenqiang Gao*, Yaming Zhang, Junjie Guo, Jinghao Wang, Deyu Meng
[pdf ]
TPA3D: Triplane Attention for Fast Text-to-3D Generation
Bin-Shih Wu*, Hong-En Chen*, Sheng-Yu Huang, Yu-Chiang Frank Wang
[pdf ]
Multi-Memory Matching for Unsupervised Visible-Infrared Person Re-Identification
Jiangming Shi, Xiangbo Yin, Yeyun Chen, Yachao Zhang, Zhizhong Zhang, Yuan Xie*, Yanyun Qu*
[pdf ]
LivePhoto: Real Image Animation with Text-guided Motion Control
Xi Chen, Zhiheng Liu, Mengting Chen, Yutong Feng, Yu Liu, Yujun Shen, Hengshuang Zhao*
[pdf ]
NeuSDFusion: A Spatial-Aware Generative Model for 3D Shape Completion, Reconstruction, and Generation
Ruikai Cui, Weizhe Liu*, Weixuan Sun, Senbo Wang, Taizhang Shang, Yang Li, Xibin Song, Han Yan, Zhennan Wu, Shenzhou Chen, Hongdong Li, Pan Ji
[pdf ]
AID-AppEAL: Automatic Image Dataset and Algorithm for Content Appeal Enhancement and Assessment Labeling
Sherry X. Chen*, Yaron Vaxman, Elad Ben Baruch, David Asulin, Aviad Moreshet, Misha Sra, Pradeep Sen
[pdf ]
SEDiff: Structure Extraction for Domain Adaptive Depth Estimation via Denoising Diffusion Models
Dongseok Shim*, Hyoun Jin Kim*
[pdf ]
Quantized Prompt for Efficient Generalization of Vision-Language Models
Tianxiang Hao, Xiaohan Ding*, Juexiao Feng, Yuhong Yang, Hui Chen, Guiguang Ding*
[pdf ]
Online Temporal Action Localization with Memory-Augmented Transformer
Youngkil Song, Dongkeun Kim, Minsu Cho, Suha Kwak*
[pdf ]
Efficient Cascaded Multiscale Adaptive Network for Image Restoration
Yichen Zhou*, Pan Zhou*, Teck Khim Ng
[pdf ]
MOFA-Video: Controllable Image Animation via Generative Motion Field Adaptions in Frozen Image-to-Video Diffusion Model
Muyao Niu, Xiaodong Cun*, Xintao Wang, Yong Zhang, Ying Shan, Yinqiang Zheng*
[pdf ]
Occlusion-Aware Seamless Segmentation
Yihong Cao, Jiaming Zhang, Hao Shi, Kunyu Peng, Yuhongxuan Zhang, Hui Zhang*, Rainer Stiefelhagen, Kailun Yang*
[pdf ]
OpenKD: Opening Prompt Diversity for Zero- and Few-shot Keypoint Detection
Changsheng Lu*, Zheyuan Liu, Piotr Koniusz*
[pdf ]
Referring Atomic Video Action Recognition
Kunyu Peng*, Jia Fu, Kailun Yang, Di Wen, Yufan Chen, Ruiping Liu, Junwei Zheng, Jiaming Zhang, Saquib Sarfraz, Rainer Stiefelhagen, Alina Roitberg
[pdf ]
Agent3D-Zero: An Agent for Zero-shot 3D Understanding
Sha Zhang, Di Huang, Jiajun Deng*, Shixiang Tang, Wanli Ouyang, Tong He*, Yanyong Zhang*
[pdf ]
Stream Query Denoising for Vectorized HD-Map Construction
Shuo Wang*, Fan Jia, Weixin Mao, Yingfei Liu, Yucheng Zhao, Zehui Chen, Tiancai Wang, Chi Zhang, Xiangyu Zhang, Feng Zhao*
[pdf ]
SAGS: Structure-Aware 3D Gaussian Splatting
Evangelos Ververas, Rolandos Alexandros Potamias*, Jifei Song, Jiankang Deng, Stefanos Zafeiriou
[pdf ]
Spherical Linear Interpolation and Text-Anchoring for Zero-shot Composed Image Retrieval
Young Kyun Jang*, Dat B Huynh, Ashish Shah, Wen-Kai Chen, Ser-Nam Lim*
[pdf ]
OneRestore: A Universal Restoration Framework for Composite Degradation
Yu Guo*, Yuan Gao, Yuxu Lu, Huilin Zhu, Wen Liu, Shengfeng He
[pdf ]
Beat-It: Beat-Synchronized Multi-Condition 3D Dance Generation
Zikai Huang, Xuemiao Xu*, Cheng Xu*, Huaidong Zhang, Chenxi Zheng, Jing Qin, Shengfeng He
[pdf ]
SkyMask: Attack-agnostic Robust Federated Learning with Fine-grained Learnable Masks
Peishen Yan, Hao Wang, Tao Song*, Yang Hua, Ruhui Ma, Ningxin Hu, Mohammad Reza Haghighat, Haibing Guan
[pdf ]
RePOSE: 3D Human Pose Estimation via Spatio-Temporal Depth Relational Consistency
Ziming Sun, Yuan Liang, Zejun Ma, Tianle Zhang, Linchao Bao, Guiqing Li, Shengfeng He*
[pdf ]
Pixel-GS: Density Control with Pixel-aware Gradient for 3D Gaussian Splatting
Zheng Zhang, Wenbo Hu*, Yixing Lao, Tong He, Hengshuang Zhao*
[pdf ]
WorldPose: A World Cup Dataset for Global 3D Human Pose Estimation
Tianjian Jiang*, Johsan Billingham, Sebastian Müksch, Juan J Zarate, Nicolas Evans, Martin R. Oswald, Marc Pollefeys, Otmar Hilliges, Manuel Kaufmann, Jie Song
[pdf ]
Language-Driven 6-DoF Grasp Detection Using Negative Prompt Guidance
Toan Nguyen, Minh Nhat Vu, Baoru Huang, An Dinh Vuong, Quan Vuong, Ngan Le, Thieu Vo, Anh Nguyen*
[pdf ]
COIN-Matting: Confounder Intervention for Image Matting
Zhaohe Liao, Jiangtong Li, Jun Lan, Huijia Zhu, Weiqiang Wang, Li Niu*, Liqing Zhang*
[pdf ]
SHINE: Saliency-aware HIerarchical NEgative Ranking for Compositional Temporal Grounding
Zixu Cheng*, Yujiang Pu*, Shaogang Gong, Parisa Kordjamshidi, Yu Kong
[pdf ]
Audio-driven Talking Face Generation with Stabilized Synchronization Loss
Dogucan Yaman*, Fevziye Irem Eyiokur, Leonard Bärmann, Hazim Kemal Ekenel, Alexander Waibel
[pdf ]
Propose, Assess, Search: Harnessing LLMs for Goal-Oriented Planning in Instructional Videos
Md Mohaiminul Islam*, Tushar Nagarajan, Huiyu Wang, Fu-Jen Chu, Kris Kitani, Gedas Bertasius, Xitong Yang
[pdf ]
Train Till You Drop: Towards Stable and Robust Source-free Unsupervised 3D Domain Adaptation
Björn Michele*, Alexandre Boulch, Tuan-Hung VU, Gilles Puy, Renaud Marlet, Nicolas Courty
[pdf ]
Learning to Obstruct Few-Shot Image Classification over Restricted Classes
Amber Yijia Zheng*, Chiao-An Yang*, Raymond A. Yeh
[pdf ]
RoofDiffusion: Constructing Roofs from Severely Corrupted Point Data via Diffusion
Kyle Shih-Huang Lo*, Jorg Peters, Eric Spellman
[pdf ]
L-DiffER: Single Image Reflection Removal with Language-based Diffusion Model
Yuchen Hong*, Haofeng Zhong*, Shuchen Weng, Jinxiu S Liang, Boxin Shi
[pdf ]
AdaShield: Safeguarding Multimodal Large Language Models from Structure-based Attack via Adaptive Shield Prompting
Yu Wang*, Xiaogeng Liu*, Yu Li*, Muhao Chen, Chaowei Xiao*
[pdf ]
OccGen: Generative Multi-modal 3D Occupancy Prediction for Autonomous Driving
Guoqing Wang, Zhongdao Wang, Pin Tang, Jilai Zheng, Xiangxuan Ren, Bailan Feng, Chao Ma*
[pdf ]
CrossGLG: LLM Guides One-shot Skeleton-based 3D Action Recognition in a Cross-level Manner
Tingbing Yan, Wenzheng Zeng*, Yang Xiao*, Xingyu Tong, Bo Tan, Zhiwen Fang, Zhiguo Cao, Joey Tianyi Zhou
[pdf ]
HYDRA: A Hyper Agent for Dynamic Compositional Visual Reasoning
Fucai Ke*, Zhixi Cai, Simindokht Jahangard, Weiqing Wang, Pari Delir Haghighi, Hamid Rezatofighi
[pdf ]
BrushNet: A Plug-and-Play Image Inpainting Model with Decomposed Dual-Branch Diffusion
Xuan Ju*, Xian Liu, Xintao Wang*, Yuxuan Bian, Ying Shan, Qiang Xu*
[pdf ]
LayoutDETR: Detection Transformer Is a Good Multimodal Layout Designer
Ning Yu*, Chia-chih Chen, Zeyuan Chen, Rui Meng, Gang Wu, Paul W Josel, Juan Carlos Niebles, Caiming Xiong, Ran Xu
[pdf ]
Blind image deblurring with noise-robust kernel estimation
Chanseok Lee*, Jeongsol Kim, Seungmin Lee, Jaehwang Jung, Yunje Cho, Taejoong Kim, Taeyong Jo, Myungjun Lee, Mooseok Jang*
[pdf ]
Binomial Self-compensation for Motion Error in Dynamic 3D Scanning
Geyou Zhang, Ce Zhu*, Kai Liu
[pdf ]
AddMe: Zero-shot Group-photo Synthesis by Inserting People into Scenes
Dongxu Yue, Maomao Li, Yunfei Liu, Ailing Zeng, Tianyu Yang, Qin Guo, Yu Li*
[pdf ]
Distill Gold from Massive Ores: Bi-level Data Pruning towards Efficient Dataset Distillation
Yue Xu, Yong-Lu Li*, Kaitong Cui, Ziyu Wang, Cewu Lu, Yu-Wing Tai, Chi-Keung Tang
[pdf ]
VersatileGaussian: Real-time Neural Rendering for Versatile Tasks using Gaussian Splatting
Renjie Li, Zhiwen Fan*, Bohua Wang, Peihao Wang, Zhangyang Wang, Xi Wu
[pdf ]
Momentum Auxiliary Network for Supervised Local Learning
Junhao Su, Changpeng Cai, Feiyu Zhu, Chenghao He, Xiaojie Xu, Dongzhi Guan*, Chenyang Si*
[pdf ]
HPFF: Hierarchical Locally Supervised Learning with Patch Feature Fusion
Junhao Su, Chenghao He, Feiyu Zhu, Xiaojie Xu, Dongzhi Guan, Chenyang Si*
[pdf ]
Rethinking LiDAR Domain Generalization: Single Source as Multiple Density Domains
Jaeyeul Kim, Jungwan Woo, Jeonghoon Kim, Sunghoon Im*
[pdf ]
Improving Zero-Shot Generalization for CLIP with Variational Adapter
Ziqian Lu, Fengli Shen, Mushui Liu, Yunlong Yu*, Xi Li
[pdf ]
Realistic Human Motion Generation with Cross-Diffusion Models
Zeping Ren, Shaoli Huang*, Xiu Li*
[pdf ]
EgoExo-Fitness: Towards Egocentric and Exocentric Full-Body Action Understanding
Yuan-Ming Li, Wei-Jin Huang, An-Lan Wang, Ling-An Zeng, Jing-Ke Meng*, Wei-Shi Zheng*
[pdf ]
Any Target Can be Offense: Adversarial Example Generation via Generalized Latent Infection
Youheng Sun, Shengming Yuan, Xuanhan Wang*, Lianli Gao, Jingkuan Song
[pdf ]
Towards Reliable Advertising Image Generation Using Human Feedback
Zhenbang Du*, Wei Feng, Haohan Wang, Yaoyu Li, Jingsen Wang, Jian Li, Zheng Zhang, Jingjing Lv, Xin Zhu, Junsheng Jin, Junjie Shen, Zhangang Lin, Jingping Shao
[pdf ]
Topology-Preserving Downsampling of Binary Images
Chia-Chia Chen*, Chi-Han Peng*
[pdf ]
ColorMAE: Exploring data-independent masking strategies in Masked AutoEncoders
Carlos Hinojosa*, Shuming Liu, Bernard Ghanem
[pdf ]
Classification Matters: Improving Video Action Detection with Class-Specific Attention
Jinsung Lee, Taeoh Kim, Inwoong Lee, Minho Shim, Dongyoon Wee, Minsu Cho, Suha Kwak*
[pdf ]
Improving Medical Multi-modal Contrastive Learning with Expert Annotations
Yogesh Kumar*, Pekka Marttinen
[pdf ]
Rethinking Data Bias: Dataset Copyright Protection via Embedding Class-wise Hidden Bias
Jinhyeok Jang*, ByungOk Han, Jaehong Kim, Chan-Hyun Youn
[pdf ]
Pose-Aware Self-Supervised Learning with Viewpoint Trajectory Regularization
Jiayun Wang*, Yubei Chen, Stella X. Yu
[pdf ]
SILC: Improving Vision Language Pretraining with Self-Distillation
Muhammad Ferjad Naeem*, Yongqin Xian, Xiaohua Zhai, Lukas Hoyer, Luc Van Gool, Federico Tombari
[pdf ]
Learning Semantic Latent Directions for Accurate and Controllable Human Motion Prediction
Guowei Xu, Jiale Tao, Wen Li*, Lixin Duan
[pdf ]
Leveraging temporal contextualization for video action recognition
Minji Kim, Dongyoon Han, Taekyung Kim*, Bohyung Han*
[pdf ]
ChEX: Interactive Localization and Region Description in Chest X-rays
Philip Müller*, Georgios Kaissis, Daniel Rueckert
[pdf ]
AdaGlimpse: Active Visual Exploration with Arbitrary Glimpse Position and Scale
Adam Pardyl*, Michał Wronka, Maciej Wołczyk, Kamil Adamczewski, Tomasz Trzcinski, Bartosz Zieliński*
[pdf ]
CLAP: Isolating Content from Style through Contrastive Learning with Augmented Prompts
Yichao Cai*, Yuhang Liu, Zhen Zhang, Javen Qinfeng Shi
[pdf ]
ZigMa: A DiT-style Zigzag Mamba Diffusion Model
Vincent Tao Hu*, Stefan A Baumann, Ming Gui, Olga Grebenkova, Pingchuan Ma, Johannes S Fischer, Bjorn Ommer
[pdf ]
EchoScene: Indoor Scene Generation via Information Echo over Scene Graph Diffusion
Guangyao Zhai*, Evin Pınar Örnek, Dave Zhenyu Chen, Ruotong Liao, Yan Di, Nassir Navab, Federico Tombari, Benjamin Busam
[pdf ]
On Calibration of Object Detectors: Pitfalls, Evaluation and Baselines
Selim Kuzucu*, Kemal Oksuz*, Jonathan Sadeghi, Puneet Dokania
[pdf ]
HAT: History-Augmented Anchor Transformer for Online Temporal Action Localization
Sakib Reza, Yuexi Zhang, Mohsen Moghaddam, Octavia Camps*
[pdf ]
Deep Nets with Subsampling Layers Unwittingly Discard Useful Activations at Test-Time
Chiao-An Yang*, Ziwei Liu, Raymond Yeh
[pdf ]
Safe-Sim: Safety-Critical Closed-Loop Traffic Simulation with Diffusion-Controllable Adversaries
Wei-Jer Chang*, Francesco Pittaluga, Masayoshi Tomizuka, Wei Zhan, Manmohan Chandraker
[pdf ]
Analysis-by-Synthesis Transformer for Single-View 3D Reconstruction
Dian Jia, Xiaoqian Ruan, Kun Xia, Zhiming Zou, Le Wang, Wei Tang*
[pdf ]
Challenging Forgets: Unveiling the Worst-Case Forget Sets in Machine Unlearning
Chongyu Fan, Jiancheng Liu*, Alfred Hero, Sijia Liu
[pdf ]
WaSt-3D: Wasserstein-2 Distance for Scene-to-Scene Stylization on 3D Gaussians
Dmytro Kotovenko*, Olga Grebenkova*, Nikolaos Sarafianos, Avinash Paliwal, Pingchuan Ma, Omid Poursaeed, Sreyas Mohan, Yuchen Fan, Yilei Li, Rakesh Ranjan, Bjorn Ommer
[pdf ]
SCLIP: Rethinking Self-Attention for Dense Vision-Language Inference
Feng Wang*, Jieru Mei, Alan Yuille
[pdf ]
Flying with Photons: Rendering Novel Views of Propagating Light
Anagh Malik*, Noah Juravsky, Ryan Po, Gordon Wetzstein, Kiriakos N. Kutulakos, David B. Lindell
[pdf ]
RGNet: A Unified Clip Retrieval and Grounding Network for Long Videos
Tanveer Hannan*, Md Mohaiminul Islam, Thomas Seidl, Gedas Bertasius
[pdf ]
MVSplat: Efficient 3D Gaussian Splatting from Sparse Multi-View Images
Yuedong Chen*, Haofei Xu, Chuanxia Zheng, Bohan Zhuang, Marc Pollefeys, Andreas Geiger, Tat-Jen Cham, Jianfei Cai
[pdf ]
3DGazeNet: Generalizing Gaze Estimation with Weak Supervision from Synthetic Views
Evangelos Ververas*, Polydefkis Gkagkos, Jiankang Deng, Michail C Doukas, Jia Guo, Stefanos Zafeiriou
[pdf ]
Removing Distributional Discrepancies in Captions Improves Image-Text Alignment
Mu Cai, Haotian Liu, Yuheng Li*, Yijun Li, Eli Shechtman, Zhe Lin, Yong Jae Lee, Krishna Kumar Singh
[pdf ]
Resilience of Entropy Model in Distributed Neural Networks
Milin Zhang*, Mohammad Abdi, Shahriar Rifat, Francesco Restuccia
[pdf ]
Rejection Sampling IMLE: Designing Priors for Better Few-Shot Image Synthesis
Chirag Vashist*, Shichong Peng, Ke Li
[pdf ]
Implicit Concept Removal of Diffusion Models
Zhili Liu*, Kai Chen, Yifan Zhang, Jianhua Han, Lanqing Hong, Hang Xu, Zhenguo Li, Dit-Yan Yeung, James Kwok
[pdf ]
PLOT: Text-based Person Search with Part Slot Attention for Corresponding Part Discovery
Jicheol Park, Dongwon Kim, Boseung Jeong, Suha Kwak*
[pdf ]
GS-LRM: Large Reconstruction Model for 3D Gaussian Splatting
Kai Zhang*, Sai Bi, Hao Tan, Yuanbo Xiangli, Nanxuan Zhao, Kalyan Sunkavalli, Zexiang Xu
[pdf ]
Robust-Wide: Robust Watermarking against Instruction-driven Image Editing
Runyi Hu, Jie Zhang*, Ting Xu, Jiwei Li, Tianwei Zhang
[pdf ]
OAPT: Offset-Aware Partition Transformer for Double JPEG Artifacts Removal
Qiao Mo, Yukang Ding, Jinhua Hao*, Qiang Zhu, Ming Sun, Chao Zhou, Feiyu Chen, Shuyuan Zhu*
[pdf ]
Formula-Supervised Visual-Geometric Pre-training
Ryosuke Yamada*, Kensho Hara*, Hirokatsu Kataoka, Koshi Makihara, Nakamasa Inoue, Rio Yokota, Yutaka Satoh
[pdf ]
VideoAgent: A Memory-augmented Multimodal Agent for Video Understanding
Yue Fan, Xiaojian Ma*, Rujie Wu, Yuntao Du, Jiaqi Li, Zhi Gao, Qing Li*
[pdf ]
Towards Unified Representation of Invariant-Specific Features in Missing Modality Face Anti-Spoofing
Guanghao Zheng, Yuchen Liu, Wenrui Dai*, Chenglin Li, Junni Zou, Hongkai Xiong
[pdf ]
Restoring Images in Adverse Weather Conditions via Histogram Transformer
Shangquan Sun, Wenqi Ren*, Xinwei Gao, Rui Wang, Xiaochun Cao
[pdf ]
PosFormer: Recognizing Complex Handwritten Mathematical Expression with Position Forest Transformer
Tongkun Guan, Chengyu Lin, Wei Shen*, Xiaokang Yang
[pdf ]
NGP-RT: Fusing Multi-Level Hash Features with Lightweight Attention for Real-Time Novel View Synthesis
Yubin Hu, Xiaoyang Guo, Yang Xiao, Jingwei Huang, Yong-Jin Liu*
[pdf ]
Elysium: Exploring Object-level Perception in Videos through Semantic Integration Using MLLMs
Han Wang*, Yanjie Wang, Ye Yongjie, Yuxiang Nie, Can Huang
[pdf ]
G2fR: Frequency Regularization in Grid-based Feature Encoding Neural Radiance Fields
Shuxiang Xie*, Shuyi Zhou, Ken Sakurada, Ryoichi Ishikawa, Masaki Onishi, Takeshi Oishi
[pdf ]
Getting it Right: Improving Spatial Consistency in Text-to-Image Models
Agneet Chatterjee*, Gabriela Ben Melech Stan, Estelle Guez Aflalo, Sayak Paul, Dhruba Ghosh, Tejas Gokhale, Ludwig Schmidt, Hanna Hajishirzi, Vasudev Lal, Chitta R Baral, Yezhou Yang
[pdf ]
Generating 3D House Wireframes with Semantics
Xueqi Ma, Yilin Liu, Wenjun Zhou, Ruowei Wang, Hui Huang*
[pdf ]
GeoWizard: Unleashing the Diffusion Priors for 3D Geometry Estimation from a Single Image
Xiao Fu*, Wei Yin, Mu Hu, Kaixuan Wang, Yuexin Ma, Ping Tan, Shaojie Shen, Dahua Lin, Xiaoxiao Long
[pdf ]
Shape-guided Configuration-aware Learning for Endoscopic-image-based Pose Estimation of Flexible Robotic Instruments
Yiyao Ma*, Kai Chen*, Hon-Sing Tong, Ruofeng Wei, Yui-Lun Ng, Ka-Wai Kwok*, Qi Dou*
[pdf ]
Nonverbal Interaction Detection
Jianan Wei, Tianfei Zhou, Yi Yang, Wenguan Wang*
[pdf ]
UniM2AE: Multi-modal Masked Autoencoders with Unified 3D Representation for 3D Perception in Autonomous Driving
Jian Zou, Tianyu Huang, Guanglei Yang*, Zhenhua Guo, Tao Luo*, Chun-Mei Feng, Wangmeng Zuo
[pdf ]
Responsible Visual Editing
Minheng Ni, Yeli Shen, Lei Zhang*, Wangmeng Zuo*
[pdf ]
Drag Anything: Motion Control for Anything using Entity Representation
Weijia Wu, Zhuang Li, Yuchao Gu, Rui Zhao, Yefei He, David Junhao Zhang, Mike Zheng Shou*, Yan Li, Tingting Gao, Zhang Di
[pdf ]
SegPoint: Segment Any Point Cloud via Large Language Model
Shuting He, Henghui Ding, Xudong Jiang, Bihan Wen*
[pdf ]
Navigation Instruction Generation with BEV Perception and Large Language Models
Sheng Fan, Rui Liu, Wenguan Wang*, Yi Yang
[pdf ]
Rebalancing Using Estimated Class Distribution for Imbalanced Semi-Supervised Learning under Class Distribution Mismatch
Taemin Park, Hyuck Lee, Heeyoung Kim*
[pdf ]
Vista3D: unravel the 3D darkside of a single image
Qiuhong Shen, Xingyi Yang, Michael Bi Mi, Xinchao Wang*
[pdf ]
The Fabrication of Reality and Fantasy: Scene Generation with LLM-Assisted Prompt Interpretation
Yi Yao, Chan-Feng Hsu*, Jhe-Hao Lin, Hongxia Xie, Terence Lin, Yi-Ning Huang, Hong-Han Shuai*, Wen-Huang Cheng*
[pdf ]
Detecting As Labeling: Rethinking LiDAR-camera Fusion in 3D Object Detection
Junjie Huang*, Yun Ye, Zhujin Liang, Yi Shan, Dalong Du
[pdf ]
FlashSplat: 2D to 3D Gaussian Splatting Segmentation Solved Optimally
Qiuhong Shen, Xingyi Yang, Xinchao Wang*
[pdf ]
Exploiting Dual-Correlation for Multi-frame Time-of-Flight Denoising
Guanting Dong*, Yueyi Zhang*, Xiaoyan Sun, Zhiwei Xiong
[pdf ]
Weak-to-Strong Compositional Learning from Generative Models for Language-based Object Detection
Kwanyong Park, Kuniaki Saito, Donghyun Kim*
[pdf ]
Domesticating SAM for Breast Ultrasound Image Segmentation via Spatial-frequency Fusion and Uncertainty Correction
Wanting Zhang, Huisi Wu*, Jing Qin
[pdf ]
CanonicalFusion: Generating Drivable 3D Human Avatars from Multiple Images
Jisu Shin, Junmyeong Lee, Seongmin Lee, Min-Gyu Park, Jumi Kang, Ju Hong Yoon, Hae-Gon Jeon*
[pdf ]
Camera Height Doesn't Change: Unsupervised Training for Metric Monocular Road-Scene Depth Estimation
Genki Kinoshita*, Ko Nishino
[pdf ]
Uni3DL: A Unified Model for 3D Vision-Language Understanding
Xiang Li*, Jian Ding, Zhaoyang Chen, Mohamed Elhoseiny
[pdf ]
Object-Aware NIR-to-Visible Translation
Yunyi Gao, Lin Gu, Qiankun Liu, Ying Fu*
[pdf ]
PaPr: Training-Free One-Step Patch Pruning with Lightweight ConvNets for Faster Inference
Tanvir Mahmud*, Burhaneddin Yaman, Chun-Hao Liu, Diana Marculescu
[pdf ]
GENIXER: Empowering Multimodal Large Language Models as a Powerful Data Generator
Henry Hengyuan Zhao*, Pan Zhou*, Mike Zheng Shou*
[pdf ]
BLINK: Multimodal Large Language Models Can See but Not Perceive
Xingyu Fu*, Yushi Hu*, Bangzheng Li, Yu Feng, Haoyu Wang, Xudong Lin, Dan Roth, Noah A Smith, Wei-Chiu Ma, Ranjay Krishna
[pdf ]
AFF-ttention! Affordances and Attention models for Short-Term Object Interaction Anticipation
Lorenzo Mur-Labadia*, Ruben Martinez-Cantin, Jose J Guerrero, Giovanni Maria Farinella, Antonino Furnari
[pdf ]
PreLAR: World Model Pre-training with Learnable Action Representation
Lixuan Zhang, Meina Kan, Shiguang Shan, Xilin Chen*
[pdf ]
Multi-HMR: Multi-Person Whole-Body Human Mesh Recovery in a Single Shot
Fabien Baradel*, Thomas LUCAS, Matthieu Armando, Salma Galaaoui, Romain Brégier, Philippe Weinzaepfel, Gregory Rogez
[pdf ]
De-confounded Gaze Estimation
Ziyang Liang, Yiwei Bao, Feng Lu*
[pdf ]
Diffusion Models for Monocular Depth Estimation: Overcoming Challenging Conditions
Fabio Tosi, Pierluigi Zama Ramirez, Matteo Poggi*
[pdf ]
FreestyleRet: Retrieving Images from Style-Diversified Queries
Hao Li*, Yanhao Jia, Peng Jin, Zesen Cheng, Kehan Li, Jialu Sui, Chang Liu, Li Yuan*
[pdf ]
ReGround: Improving Textual and Spatial Grounding at No Cost
Phillip Y. Lee, Minhyuk Sung*
[pdf ]
CardiacNet: Learning to Reconstruct Abnormalities for Cardiac Disease Assessment from Echocardiogram Videos
Jiewen Yang*, Yiqun Lin, Bin Pu, Jiarong GUO, Xiaowei Xu*, Xiaomeng Li*
[pdf ]
LaMI-DETR: Open-Vocabulary Detection with Language Model Instruction
Penghui Du, Yu Wang, Yifan Sun, Luting Wang, Yue Liao, Gang Zhang, Errui Ding, Yan Wang*, Jingdong Wang, Si Liu*
[pdf ]
Unrolled Decomposed Unpaired Learning for Controllable Low-Light Video Enhancement
Lingyu Zhu, Wenhan Yang, Baoliang Chen, Hanwei Zhu, Zhangkai Ni, Qi Mao, Shiqi Wang*
[pdf ]
Efficient Image Pre-Training with Siamese Cropped Masked Autoencoders
Alexandre Eymaël, Renaud Vandeghen*, Anthony Cioppa, Silvio Giancola, Bernard Ghanem, Marc Van Droogenbroeck
[pdf ]
VP-SAM: Taming Segment Anything Model for Video Polyp Segmentation via Disentanglement and Spatio-temporal Side Network
Zhixue Fang, Yuzhi Liu, Huisi Wu*, Jing Qin
[pdf ]
Dataset Enhancement with Instance-Level Augmentations
Orest Kupyn*, Christian Rupprecht
[pdf ]
FreeMotion: MoCap-Free Human Motion Synthesis with Multimodal Large Language Models
Zhikai Zhang, Yitang Li, Haofeng Huang, Mingxian Lin, Li Yi*
[pdf ]
Chameleon: A Data-Efficient Generalist for Dense Visual Prediction in the Wild
Donggyun Kim, Seongwoong Cho, Semin Kim, Chong Luo, Seunghoon Hong*
[pdf ]
Reliability in Semantic Segmentation: Can We Use Synthetic Data?
Thibaut Loiseau, Tuan-Hung Vu*, Mickael Chen, Patrick Pérez, Matthieu Cord
[pdf ]
SCPNet: Unsupervised Cross-modal Homography Estimation via Intra-modal Self-supervised Learning
Runmin Zhang*, Jun Ma, Lun Luo, Beinan Yu, Shu-Jie Chen, Junwei Li, Hui-Liang Shen, Si-Yuan Cao*
[pdf ]
SCAPE: A Simple and Strong Category-Agnostic Pose Estimator
Yujia Liang, Zixuan Ye, Wenze Liu, Hao Lu*
[pdf ]
Elevating All Zero-Shot Sketch-Based Image Retrieval Through Multimodal Prompt Learning
Mainak Singha*, Ankit Jha, Divyam Gupta, Pranav Singla, Biplab Banerjee
[pdf ]
Improving Knowledge Distillation via Regularizing Feature Direction and Norm
Yuzhu Wang, Lechao Cheng*, Manni Duan, Yongheng Wang, Zunlei Feng, Shu Kong
[pdf ]
3DFG-PIFu: 3D Feature Grids for Human Digitization from Sparse Views
Kennard Yanting Chan*, Fayao Liu, Guosheng Lin, Chuan Sheng Foo, Weisi Lin
[pdf ]
Lazy Diffusion Transformer for Interactive Image Editing
Yotam Nitzan*, Zongze Wu, Richard Zhang, Eli Shechtman, Danny Cohen-Or, Taesung Park, Michaël Gharbi
[pdf ]
Non-parametric Sensor Noise Modeling and Synthesis
Ali Mosleh*, Luxi Zhao, Atin Vikram Singh, Jaeduk Han, Abhijith Punnappurath, Marcus A Brubaker, Jihwan Choe, Michael S Brown
[pdf ]
Stripe Observation Guided Inference Cost-free Attention Mechanism
Zhongzhan Huang*, Shanshan Zhong, Wushao Wen, Jinghui Qin, Liang Lin*
[pdf ]
The Nerfect Match: Exploring NeRF Features for Visual Localization
Qunjie Zhou*, Maxim Maximov, Or Litany, Laura Leal-Taixé
[pdf ]
ComboVerse: Compositional 3D Assets Creation Using Spatially-Aware Diffusion Guidance
Yongwei Chen, Tengfei Wang, Tong Wu, Xingang Pan, Kui Jia*, Ziwei Liu
[pdf ]
Robust Calibration of Large Vision-Language Adapters
Balamurali Murugesan*, Julio Silva-Rodríguez, Ismail Ben Ayed, Jose Dolz
[pdf ]
Leveraging Hierarchical Feature Sharing for Efficient Dataset Condensation
Haizhong Zheng*, Jiachen Sun, Shutong Wu, Bhavya Kailkhura, Zhuoqing Morley Mao, Chaowei Xiao*, Atul Prakash*
[pdf ]
Improving Domain Generalization in Self-Supervised Monocular Depth Estimation via Stabilized Adversarial Training
Yuanqi Yao*, Gang Wu, Kui Jiang, Siao Liu, Jian Kuai, Xianming Liu, Junjun Jiang*
[pdf ]
milliFlow: Scene Flow Estimation on mmWave Radar Point Cloud for Human Motion Sensing
Fangqiang Ding*, Zhen Luo, Peijun Zhao, Chris Xiaoxuan Lu
[pdf ]
denoiSplit: a method for joint microscopy image splitting and unsupervised denoising
Ashesh Ashesh*, Florian Jug*
[pdf ]
AugDETR: Improving Multi-scale Learning for Detection Transformer
Jinpeng Dong, Yutong Lin, Chen Li, Sanping Zhou, Nanning Zheng*
[pdf ]
Spherical World-Locking for Audio-Visual Localization in Egocentric Videos
Heeseung Yun*, Ruohan Gao, Ishwarya Ananthabhotla, Anurag Kumar, Jacob Donley, Chao Li, Gunhee Kim, Vamsi Krishna Ithapu, Calvin Murdock*
[pdf ]
SPIN: Hierarchical Segmentation with Subpart Granularity in Natural Images
Josh David Myers-Dean*, Jarek T Reynolds, Brian Price, Yifei Fan, Danna Gurari
[pdf ]
SIGMA: Sinkhorn-Guided Masked Video Modeling
Mohammadreza Salehi*, Michael Dorkenwald*, Fida Mohammad Thoker, Efstratios Gavves, Cees Snoek, Yuki M Asano
[pdf ]
Generative Camera Dolly: Extreme Monocular Dynamic Novel View Synthesis
Basile Van Hoorick*, Rundi Wu, Ege Ozguroglu, Kyle Sargent, Ruoshi Liu, Pavel Tokmakov, Achal Dave, Changxi Zheng, Carl Vondrick
[pdf ]
Distribution Alignment for Fully Test-Time Adaptation with Dynamic Online Data Streams
Ziqiang Wang*, Zhixiang Chi, Yanan Wu, Li Gu, Zhi Liu*, Konstantinos N Plataniotis*, Yang Wang*
[pdf ]
Divide and Fuse: Body Part Mesh Recovery from Partially Visible Human Images
Tianyu Luan, Zhongpai Gao, Luyuan Xie, Abhishek Sharma, Hao Ding, Benjamin Planche, Meng Zheng, Ange Lou, Terrence Chen, Junsong Yuan, Ziyan Wu*
[pdf ]
Understanding Physical Dynamics with Counterfactual World Modeling
Rahul Venkatesh*, Honglin Chen*, Kevin Feigelis, Daniel M Bear, Khaled Jedoui, Klemen Kotar, Felix J Binder, Wanhee Lee, Sherry Liu, Kevin Smith, Judith E. Fan, Daniel Yamins
[pdf ]
MIGS: Multi-Identity Gaussian Splatting via Tensor Decomposition
Aggelina Chatziagapi*, Grigorios Chrysos, Dimitris Samaras
[pdf ]
4Diff: 3D-Aware Diffusion Model for Third-to-First Viewpoint Translation
Feng Cheng*, Mi Luo*, Huiyu Wang, Alex Dimakis, Lorenzo Torresani, Gedas Bertasius, Kristen Grauman
[pdf ]
Improving Point-based Crowd Counting and Localization Based on Auxiliary Point Guidance
I-Hsiang Chen, Wei-Ting Chen, Yu-Wei Liu, Ming-Hsuan Yang, Sy-Yen Kuo*
[pdf ]
Nymeria: A Massive Collection of Egocentric Multi-modal Human Motion in the Wild
Lingni Ma*, Yuting Ye, Rowan Postyeni, Alexander J Gamino, Vijay Baiyya, Luis Pesqueira, Kevin M Bailey, David Soriano Fosas, Fangzhou Hong, Vladimir Guzov, Yifeng Jiang, Hyo Jin Kim, Jakob Engel, Karen Liu, Ziwei Liu, Renzo De Nardi, Richard Newcombe
[pdf ]
DreamStruct: Understanding Slides and User Interfaces via Synthetic Data Generation
Yi-Hao Peng*, Faria Huq, Yue Jiang, Jason Wu, Xin Yue Li, Jeffrey Bigham, Amy Pavel
[pdf ]
SemTrack: A Large-scale Dataset for Semantic Tracking in the Wild
Pengfei Wang, Xiaofei Hui, Jing Wu, Zile Yang, Kian Eng Ong, Xinge Zhao, Beijia Lu, Dezhao Huang, Evan Ling, Weiling Chen, Keng Teck Ma, Minhoe Hur, Jun Liu*
[pdf ]
VideoMamba: Spatio-Temporal Selective State Space Model
Jinyoung Park*, Hee-Seon Kim, Kangwook Ko, Minbeom Kim, Changick Kim
[pdf ]
Text to Layer-wise 3D Clothed Human Generation
Junting Dong*, Qi Fang, Zehuan Huang, Xudong Xu, Jingbo Wang, Sida Peng, Bo Dai
[pdf ]
Texture-GS: Disentangle the Geometry and Texture for 3D Gaussian Splatting Editing
Tianxing Xu*, Wenbo Hu, Yu-Kun Lai, Ying Shan, Song-Hai Zhang
[pdf ]
Fully Sparse 3D Occupancy Prediction
Haisong Liu, Yang Chen, Haiguang Wang, Zetong Yang, Tianyu Li, Jia Zeng, Li Chen, Hongyang Li, Limin Wang*
[pdf ]
Is user feedback always informative? Retrieval Latent Defending for Semi-Supervised Domain Adaptation without Source Data
Junha Song*, Tae Soo Kim, Junha Kim, Gunhee Nam, Thijs Kooi, Jaegul Choo*
[pdf ]
CG-SLAM: Efficient Dense RGB-D SLAM in a Consistent Uncertainty-aware 3D Gaussian Field
Jiarui Hu, Xianhao Chen, Boyin Feng, Guanglin Li, Liangjing Yang, Hujun Bao, Guofeng Zhang, Zhaopeng Cui*
[pdf ]
Shifted Autoencoders for Point Annotation Restoration in Object Counting
Yuda Zou, Xin Xiao, Peilin Zhou, Zhichao Sun, Bo Du, Yongchao Xu*
[pdf ]
PointLLM: Empowering Large Language Models to Understand Point Clouds
Runsen Xu*, Xiaolong Wang, Tai Wang*, Yilun Chen, Jiangmiao Pang*, Dahua Lin
[pdf ]
GarmentAligner: Text-to-Garment Generation via Retrieval-augmented Multi-level Corrections
Shiyue Zhang, Zheng Chong, Xujie Zhang, Hanhui Li, Yuhao Cheng, Yiqiang Yan, Xiaodan Liang*
[pdf ]
Improving Agent Behaviors with RL Fine-tuning for Autonomous Driving
Zhenghao Peng, Wenjie Luo, Yiren Lu*, Tianyi Shen, Cole Gulino, Ari Seff, Justin Fu
[pdf ]
Enhancing Diffusion Models with Text-Encoder Reinforcement Learning
Chaofeng Chen*, Annan Wang, Haoning Wu, Liang Liao, Wenxiu Sun, Qiong Yan, Weisi Lin*
[pdf ]
Asymmetric Mask Scheme for Self-Supervised Real Image Denoising
Xiangyu Liao*, Tianheng Zheng, Jiayu Zhong, Pingping Zhang, Chao Ren*
[pdf ]
Omni6D: Large-Vocabulary 3D Object Dataset for Category-Level 6D Object Pose Estimation
Mengchen Zhang*, Tong Wu, Tai Wang, Tengfei Wang, Ziwei Liu, Dahua Lin*
[pdf ]
BAD-Gaussians: Bundle Adjusted Deblur Gaussian Splatting
Lingzhe Zhao, Peng Wang, Peidong Liu*
[pdf ]
Forest2Seq: Revitalizing Order Prior for Sequential Indoor Scene Synthesis
Qi Sun*, Hang Zhou, Wengang Zhou, Li Li, Houqiang Li
[pdf ]
BaSIC: BayesNet Structure Learning for Computational Scalable Neural Image Compression
Yufeng Zhang, Hang Yu, Shizhan Liu, Wenrui Dai, Weiyao Lin*
[pdf ]
FlexAttention for Efficient High-Resolution Vision-Language Models
Junyan Li*, Delin Chen, Tianle Cai, Peihao Chen, Yining Hong, Zhenfang Chen, Yikang Shen, Chuang Gan
[pdf ]
Repaint123: Fast and High-quality One Image to 3D Generation with Progressive Controllable Repainting
Junwu Zhang*, Zhenyu Tang, Yatian Pang, Xinhua Cheng, Peng Jin, Yida Wei, Xing Zhou, Munan Ning, Li Yuan*
[pdf ]
AnimatableDreamer: Text-Guided Non-rigid 3D Model Generation and Reconstruction with Canonical Score Distillation
Xinzhou Wang, Yikai Wang*, Junliang Ye, Fuchun Sun*, Zhengyi Wang, Ling Wang, Pengkun Liu, Kai Sun, Xintong Wang, Xie Wende, Fangfu Liu, Bin He
[pdf ]
Spatially-Variant Degradation Model for Dataset-free Super-resolution
Shaojie Guo, Haofei Song, Qingli Li, Yan Wang*
[pdf ]
DreamView: Injecting View-specific Text Guidance into Text-to-3D Generation
Junkai Yan, Yipeng Gao, Qize Yang, Xihan Wei, Xuansong Xie, Ancong Wu*, Wei-Shi Zheng*
[pdf ]
Learning Exhaustive Correlation for Spectral Super-Resolution: Where Spatial-Spectral Attention Meets Linear Dependence
Hongyuan Wang, Lizhi Wang*, Jiang Xu, Chang Chen, Xue Hu, Fenglong Song, Youliang Yan
[pdf ]
Local Action-Guided Motion Diffusion Model for Text-to-Motion Generation
Peng Jin*, Hao Li, Zesen Cheng, Kehan Li, Runyi Yu, Chang Liu*, Xiangyang Ji, Li Yuan*, Jie Chen
[pdf ]
EAFormer: Scene Text Segmentation with Edge-Aware Transformers
Haiyang Yu, Teng Fu, Bin Li*, Xiangyang Xue
[pdf ]
Benchmarks and Challenges in Pose Estimation for Egocentric Hand Interactions with Objects
Zicong Fan, Takehiko Ohkawa*, Linlin Yang, Nie Lin, Zhishan Zhou, Shihao Zhou, Jiajun Liang, Zhong Gao, Xuanyang Zhang, Xue Zhang, Fei Li, Liu Zheng, Feng Lu, Karim Abou Zeid, Bastian Leibe, Jeongwan On, Seungryul Baek, Aditya Prakash, Saurabh Gupta, Kun He, Yoichi Sato, Otmar Hilliges, Hyung Jin Chang, Angela Yao
[pdf ]
DetailSemNet: Elevating Signature Verification through Detail-Semantic Integration
Meng-Cheng Shih*, Tsai-Ling Huang, Yu-Heng Shih, Hong-Han Shuai, Hsuan-Tung Liu, Yi-Ren Yeh, Ching-Chun Huang*
[pdf ]
LaPose: Laplacian Mixture Shape Modeling for RGB-Based Category-Level Object Pose Estimation
Ruida Zhang, Ziqin Huang, Gu Wang, Chenyangguang Zhang, Yan Di, Xingxing Zuo, Jiwen Tang, Xiangyang Ji*
[pdf ]
Upper-body Hierarchical Graph for Skeleton Based Emotion Recognition in Assistive Driving
Jiehui Wu, Jiansheng Chen*, Qifeng Luo, Siqi Liu, Youze Xue, Huimin Ma
[pdf ]
Fine-Grained Scene Graph Generation via Sample-Level Bias Prediction
Yansheng Li, Tingzhu Wang*, Kang Wu, Linlin Wang, Xin Guo, Wenbin Wang
[pdf ]
Exploring Guided Sampling of Conditional GANs
Yifei Zhang*, Mengfei Xia, Yujun Shen, Jiapeng Zhu, Ceyuan Yang, Kecheng Zheng, Lianghua Huang, Yu Liu, Fan Cheng*
[pdf ]
MotionChain: Conversational Motion Controllers via Multimodal Prompts
Biao Jiang, Xin Chen, Chi Zhang, Fukun Yin, Zhuoyuan Li, Gang Yu, Jiayuan Fan*
[pdf ]
Idempotent Unsupervised Representation Learning for Skeleton-Based Action Recognition
Lilang Lin, Lehong Wu, Jiahang Zhang, Jiaying Liu*
[pdf ]
Latent Guard: a Safety Framework for Text-to-image Generation
Runtao Liu*, Ashkan Khakzar, Jindong Gu, Qifeng Chen*, Philip Torr, Fabio Pizzati*
[pdf ]
MacDiff: Unified Skeleton Modeling with Masked Conditional Diffusion
Lehong Wu*, Lilang Lin, Jiahang Zhang, Yiyang Ma, Jiaying Liu*
[pdf ]
TCC-Det: Temporarily consistent cues for weakly-supervised 3D detection
Jan Skvrna*, Lukáš Neumann
[pdf ]
OPEN: Object-wise Position Embedding for Multi-view 3D Object Detection
Jinghua Hou, Tong Wang, Xiaoqing Ye, Zhe Liu, Shi Gong, Xiao Tan, Errui Ding, Jingdong Wang, Xiang Bai*
[pdf ]
FoundPose: Unseen Object Pose Estimation with Foundation Features
Evin Pınar Örnek*, Yann Labbé, Bugra Tekin, Lingni Ma, Cem Keskin, Christian Forster, Tomas Hodan
[pdf ]
Early Preparation Pays Off: New Classifier Pre-tuning for Class Incremental Semantic Segmentation
Zhengyuan Xie, Haiquan Lu, Jia-wen Xiao, Enguang Wang, Le Zhang, Xialei Liu*
[pdf ]
Kalman-Inspired Feature Propagation for Video Face Super-Resolution
Ruicheng Feng, Chongyi Li, Chen Change Loy*
[pdf ]
Select and Distill: Selective Dual-Teacher Knowledge Transfer for Continual Learning on Vision-Language Models
Yu-Chu Yu*, Chi-Pin Huang, Jr-Jen Chen, Kai-Po Chang, Yung-Hsuan Lai, Fu-En Yang, Yu-Chiang Frank Wang
[pdf ]
VideoMamba: State Space Model for Efficient Video Understanding
Kunchang Li*, Xinhao Li, Yi Wang*, Yinan He, Yali Wang*, Limin Wang*, Yu Qiao*
[pdf ]
SAFNet: Selective Alignment Fusion Network for Efficient HDR Imaging
Lingtong Kong*, Bo Li, Yike Xiong, Hao Zhang, Hong Gu, Jinwei Chen
[pdf ]
Heterogeneous Graph Learning for Scene Graph Prediction in 3D Point Clouds
Yanni Ma, Hao Liu, Yun Pei, Yulan Guo*
[pdf ]
Reason2Drive: Towards Interpretable and Chain-based Reasoning for Autonomous Driving
Ming Nie, Renyuan Peng, Chunwei Wang, Xinyue Cai, Jianhua Han, Hang Xu*, Li Zhang*
[pdf ]
Omniview-Tuning: Boosting Viewpoint Invariance of Vision-Language Pre-training Models
Shouwei Ruan*, Yinpeng Dong, Liu Hanqing, Yao Huang, Hang Su, Xingxing Wei*
[pdf ]
Deep Cost Ray Fusion for Sparse Depth Video Completion
Jungeon Kim, Soongjin Kim, Jaesik Park, Seungyong Lee*
[pdf ]
GraphBEV: Towards Robust BEV Feature Alignment for Multi-Modal 3D Object Detection
Ziying Song, Lei Yang, Shaoqing Xu, Lin Liu, Dongyang Xu, Caiyan Jia*, Feiyang Jia, Li Wang
[pdf ]
DINO-Tracker: Taming DINO for Self-Supervised Point Tracking in a Single Video
Narek Tumanyan*, Assaf Singer, Shai Bagon, Tali Dekel
[pdf ]
GraspXL: Generating Grasping Motions for Diverse Objects at Scale
Hui Zhang*, Sammy Christen, Zicong Fan, Otmar Hilliges, Jie Song
[pdf ]
Source Prompt Disentangled Inversion for Boosting Image Editability with Diffusion Models
Ruibin Li*, Ruihuang Li, Song Guo, Lei Zhang
[pdf ]
Improving Intervention Efficacy via Concept Realignment in Concept Bottleneck Models
Nishad Singhi*, Jae Myung Kim, Karsten Roth, Zeynep Akata
[pdf ]
JointDreamer: Ensuring Geometry Consistency and Text Congruence in Text-to-3D Generation via Joint Score Distillation
ChenHan Jiang*, Yihan Zeng, Tianyang Hu, Songcen Xu, Wei Zhang, Hang Xu, Dit-Yan Yeung
[pdf ]
Brain Netflix: Scaling Data to Reconstruct Videos from Brain Signals
Camilo L Fosco*, Benjamin Lahner, Bowen Pan, Alex Andonian, Emilie L Josephs, Alex Lascelles, Aude Oliva
[pdf ]
Equivariant Spatio-Temporal Self-Supervision for LiDAR Object Detection
Deepti Hegde, Suhas Lohit*, Kuan-Chuan Peng*, Michael J. Jones, Vishal M. Patel
[pdf ]
SLAck: Semantic, Location, and Appearance Aware Open-Vocabulary Tracking
Siyuan Li*, Lei Ke, Yung-Hsu Yang, Luigi Piccinelli, Mattia Segù, Martin Danelljan, Luc Van Gool
[pdf ]
Tensorial template matching for fast cross-correlation with rotations and its application for tomography
Antonio Martinez-Sanchez*, Ulrike Homberg, J. M. Almira, Harold Phelippeau
[pdf ]
FreeAugment: Data Augmentation Search Across All Degrees of Freedom
Tom Bekor*, Niv Nayman, Lihi Zelnik-Manor
[pdf ]
Learning Representations of Satellite Images From Metadata Supervision
Jules Bourcier*, Gohar Dashyan, Karteek Alahari, Jocelyn Chanussot
[pdf ]
I2-SLAM: Inverting Imaging Process for Robust Photorealistic Dense SLAM
Gwangtak Bae, Changwoon Choi, Hyeongjun Heo, Sang Min Kim, Young Min Kim*
[pdf ]
FlashTex: Fast Relightable Mesh Texturing with LightControlNet
Kangle Deng*, Timothy Omernick, Alexander B Weiss, Deva Ramanan, Jun-Yan Zhu, Tinghui Zhou, Maneesh Agrawala
[pdf ]
GS-Pose: Category-Level Object Pose Estimation via Geometric and Semantic Correspondence
Pengyuan Wang*, Takuya Ikeda, Robert Lee, Koichi Nishiwaki
[pdf ]
ArtVLM: Attribute Recognition Through Vision-Based Prefix Language Modeling
William Yicheng Zhu*, Keren Ye*, Junjie Ke, Jiahui Yu, Leonidas Guibas, Peyman Milanfar, Feng Yang*
[pdf ]
PanoFree: Tuning-Free Holistic Multi-view Image Generation with Cross-view Self-Guidance
Aoming Liu*, Zhong Li*, Zhang Chen*, Nannan Li, Yi Xu, Bryan Plummer
[pdf ]
SOS: Segment Object System for Open-World Instance Segmentation With Object Priors
Christian Wilms*, Tim Rolff, Maris N Hillemann, Robert Johanson, Simone Frintrop
[pdf ]
Lagrangian Hashing for Compressed Neural Field Representations
Shrisudhan Govindarajan*, Zeno Sambugaro, Akhmedkhan Shabanov, Towaki Takikawa, Weiwei Sun, Daniel Rebain, Nicola Conci, Kwang Moo Yi, Andrea Tagliasacchi
[pdf ]
EDformer: Transformer-Based Event Denoising Across Varied Noise Levels
Bin Jiang, Bo Xiong, Bohan Qu, M. Salman Asif, You Zhou*, Zhan Ma*
[pdf ]
Foster Adaptivity and Balance in Learning with Noisy Labels
Mengmeng Sheng, Zeren Sun*, Tao Chen, Shuchao Pang, Yucheng Wang, Yazhou Yao*
[pdf ]
MetaAug: Meta-Data Augmentation for Post-Training Quantization
Cuong Van Pham*, Hoang Anh Dung, Cuong Cao Nguyen, Trung Le, Dinh Phung, Gustavo Carneiro, Thanh-Toan Do
[pdf ]
Thermal3D-GS: Physics-induced 3D Gaussians for Thermal Infrared Novel-view Synthesis
Qian Chen, Shihao Shu, Xiangzhi Bai*
[pdf ]
Cross-Platform Video Person ReID: A New Benchmark Dataset and Adaptation Approach
Shizhou Zhang, Wenlong Luo, De Cheng*, Qingchun Yang, Lingyan Ran, Yinghui Xing, Yanning Zhang
[pdf ]
Unleashing the Power of Prompt-driven Nucleus Instance Segmentation
Zhongyi Shui*, Yunlong Zhang, Kai Yao, Chenglu Zhu, Sunyi Zheng, Jingxiong Li, Honglin Li, Yuxuan Sun, Ruizhe Guo, Lin Yang*
[pdf ]
Gaze Target Detection Based on Head-Local-Global Coordination
Yaokun Yang, Feng Lu*
[pdf ]
3DSA: Multi-View 3D Human Pose Estimation With 3D Space Attention Mechanisms
Po Han Chen, Chia-Chi Tsai*
[pdf ]
Toward Tiny and High-quality Facial Makeup with Data Amplify Learning
Qiaoqiao Jin, Xuanhong Chen, Meiguang Jin, Ying Chen, Rui Shi, Yucheng Zheng, Yupeng Zhu, Bingbing Ni*
[pdf ]
An Economic Framework for 6-DoF Grasp Detection
Xiao-Ming Wu*, Jia-Feng Cai, Jian-Jian Jiang, Dian Zheng, Yi-Lin Wei, Wei-Shi Zheng*
[pdf ]
GaussianFormer: Scene as Gaussians for Vision-Based 3D Semantic Occupancy Prediction
Yuanhui Huang, Wenzhao Zheng, Yunpeng Zhang, Jie Zhou, Jiwen Lu*
[pdf ]
Powerful and Flexible: Personalized Text-to-Image Generation via Reinforcement Learning
Fanyue Wei, Wei Zeng, Zhenyang Li, Dawei Yin, Lixin Duan, Wen Li*
[pdf ]
AdaLog: Post-Training Quantization for Vision Transformers with Adaptive Logarithm Quantizer
Zhuguanyu Wu, Jiaxin Chen*, Hanwen Zhong, Di Huang, Yunhong Wang
[pdf ]
Multi-Label Cluster Discrimination for Visual Representation Learning
Xiang An, Kaicheng Yang, Xiangzi Dai, Ziyong Feng, Jiankang Deng*
[pdf ]
Plan, Posture and Go: Towards Open-vocabulary Text-to-Motion Generation
Jinpeng Liu, Wenxun Dai, Chunyu Wang, Yiji Cheng, Yansong Tang*, Xin Tong
[pdf ]
DAMSDet: Dynamic Adaptive Multispectral Detection Transformer with Competitive Query Selection and Adaptive Feature Fusion
Junjie Guo*, Chenqiang Gao*, Fangcen Liu, Deyu Meng, Xinbo Gao
[pdf ]
CLIP-Guided Generative Networks for Transferable Targeted Adversarial Attacks
Hao Fang, Jiawei Kong, Bin Chen*, Tao Dai, Hao Wu, Shu-Tao Xia
[pdf ]
Flash Cache: Reducing Bias in Radiance Cache Based Inverse Rendering
Benjamin Attal*, Dor Verbin, Ben Mildenhall, Peter Hedman, Jonathan T Barron, Matthew O'Toole, Pratul Srinivasan
[pdf ]
Progressive Classifier and Feature Extractor Adaptation for Unsupervised Domain Adaptation on Point Clouds
Zicheng Wang, Zhen Zhao, Yiming Wu, Luping Zhou*, Dong Xu*
[pdf ]
A New Dataset and Framework for Real-World Blurred Images Super-Resolution
Rui Qin, Ming Sun, Chao Zhou, Bin Wang*
[pdf ]
AddressCLIP: Empowering Vision-Language Models for City-wide Image Address Localization
Shixiong Xu, Chenghao Zhang, Lubin Fan*, Gaofeng Meng*, Shiming Xiang, Jieping Ye
[pdf ]
RISurConv: Rotation Invariant Surface Attention-Augmented Convolutions for 3D Point Cloud Classification and Segmentation
Zhiyuan Zhang*, Licheng Yang, Zhiyu Xiang
[pdf ]
StyleTokenizer: Defining Image Style by a Single Instance for Controlling Diffusion Models
Wen Li*, Muyuan Fang, Cheng Zou, Biao Gong, Ruobing Zheng, Meng Wang, Jingdong Chen, Ming Yang
[pdf ]
Bidirectional Uncertainty-Based Active Learning for Open-Set Annotation
Chen-Chen Zong, Ye-Wen Wang, Kun-Peng Ning, Hai-Bo Ye, Sheng-Jun Huang*
[pdf ]
Preventing Catastrophic Overfitting in Fast Adversarial Training: A Bi-level Optimization Perspective
Zhaoxin Wang*, Handing Wang*, Cong Tian, Yaochu Jin
[pdf ]
Projecting Points to Axes: Oriented Object Detection via Point-Axis Representation
Zeyang Zhao, Qilong Xue, Yifan Bai, Yuhang He, Xing Wei*, Yihong Gong
[pdf ]
SeiT++: Masked Token Modeling Improves Storage-efficient Training
Minhyun Lee, Song Park, Byeongho Heo, Dongyoon Han, Hyunjung Shim*
[pdf ]
Rectify the Regression Bias in Long-Tailed Object Detection
Ke Zhu, Minghao Fu, Jie Shao, Tianyu Liu, Jianxin Wu*
[pdf ]
MagicEraser: Erasing Any Objects via Semantics-Aware Control
Fan Li*, Zixiao Zhang, Yi Huang, Jianzhuang Liu, Renjing Pei, Bin Shao, Songcen Xu
[pdf ]
Reliable Spatial-Temporal Voxels For Multi-Modal Test-Time Adaptation
Haozhi Cao, Yuecong Xu, Jianfei Yang*, Pengyu Yin, Xingyu Ji, Shenghai Yuan, Lihua Xie
[pdf ]
Stable Preference: Redefining training paradigm of human preference model for Text-to-Image Synthesis
Hanting Li, Hongjing Niu, Feng Zhao*
[pdf ]
SparseSSP: 3D Subcellular Structure Prediction from Sparse-View Transmitted Light Images
Jintu Zheng, Yi Ding, Qizhe Liu, Yuehui Chen, Yi Cao, Ying Hu, Zenan Wang*
[pdf ]
NL2Contact: Natural Language Guided 3D Hand-Object Contact Modeling with Diffusion Model
Zhongqun Zhang*, Hengfei Wang, Ziwei Yu, Yihua Cheng*, Angela Yao, Hyung Jin Chang
[pdf ]
Self-Adapting Large Visual-Language Models to Edge Devices across Visual Modalities
Kaiwen Cai, ZheKai Duan, Gaowen Liu, Charles Fleming, Chris Xiaoxuan Lu*
[pdf ]
Diff-Tracker: Text-to-Image Diffusion Models are Unsupervised Trackers
Zhengbo Zhang*, Li Xu, Duo Peng, Hossein Rahmani, Jun Liu*
[pdf ]
Rethinking Tree-Ring Watermarking for Enhanced Multi-Key Identification
Hai Ci*, Pei Yang, Yiren Song, Mike Zheng Shou*
[pdf ]
3D Small Object Detection with Dynamic Spatial Pruning
Zhihao Sun, Ziwei Wang, Hongmin Liu, Jie Zhou, Jiwen Lu*, Xiuwei Xu*
[pdf ]
STSP: Spatial-Temporal Subspace Projection for Video Class-incremental Learning
Hao Cheng, Siyuan Yang, Chong Wang, Joey Tianyi Zhou, Alex Kot, Bihan Wen*
[pdf ]
Transferable 3D Adversarial Shape Completion using Diffusion Models
Xuelong Dai*, Bin Xiao
[pdf ]
OmniSat: Self-Supervised Modality Fusion for Earth Observation
Guillaume Astruc*, Nicolas Gonthier, Clement Mallet, Loic Landrieu
[pdf ]
Distilling Diffusion Models into Conditional GANs
MinGuk Kang*, Richard Zhang, Connelly Barnes, Sylvain Paris, Suha Kwak, Jaesik Park, Eli Shechtman, Jun-Yan Zhu, Taesung Park*
[pdf ]
Semantically Guided Representation Learning For Action Anticipation
Anxhelo Diko*, Danilo Avola, Bardh Prenkaj, Federico Fontana, Luigi Cinque
[pdf ]
MemBN: Robust Test-Time Adaptation via Batch Norm with Statistics Memory
Juwon Kang*, Nayeong Kim, Jungseul Ok, Suha Kwak*
[pdf ]
FREST: Feature RESToration for Semantic Segmentation under Multiple Adverse Conditions
Sohyun Lee, Namyup Kim, Sungyeon Kim, Suha Kwak*
[pdf ]
ScanTalk: 3D Talking Heads from Unregistered Scans
Federico Nocentini*, Thomas Besnier, Claudio Ferrari, Sylvain Arguillere, Stefano Berretti, Mohamed Daoudi
[pdf ]
Controllable Navigation Instruction Generation with Chain of Thought Prompting
Xianghao Kong, Jinyu Chen, Wenguan Wang*, Hang Su, Xiaolin Hu, Yi Yang, Si Liu*
[pdf ]
GiT: Towards Generalist Vision Transformer through Universal Language Interface
Haiyang Wang*, Hao Tang, Li Jiang, Shaoshuai Shi, Muhammad Ferjad Naeem, Hongsheng Li, Bernt Schiele, Liwei Wang
[pdf ]
ScatterFormer: Efficient Voxel Transformer with Scattered Linear Attention
Chenhang He*, Ruihuang Li, Guowen Zhang, Lei Zhang
[pdf ]
A Cephalometric Landmark Regression Method based on Dual-encoder for High-resolution X-ray Image
Chao Dai, Yang Wang*, Chaolin Huang, Zhou Jiakai, Qilin Xu, Minpeng Xu
[pdf ]
Exploring the Feature Extraction and Relation Modeling For Light-Weight Transformer Tracking
Jikai Zheng, Mingjiang Liang, Shaoli Huang, Jifeng Ning*
[pdf ]
LiveHPS++: Robust and Coherent Motion Capture in Dynamic Free Environment
Yiming Ren, Xiao Han, Yichen Yao, Xiaoxiao Long, Yujing Sun*, Yuexin Ma*
[pdf ]
You Only Need One Step: Fast Super-Resolution with Stable Diffusion via Scale Distillation
Mehdi Noroozi*, Isma Hadji*, Brais Martinez*, Adrian Bulat*, Georgios Tzimiropoulos*
[pdf ]
Gaussian Grouping: Segment and Edit Anything in 3D Scenes
Mingqiao Ye, Martin Danelljan, Fisher Yu, Lei Ke*
[pdf ]
CoMo: Controllable Motion Generation through Language Guided Pose Code Editing
Yiming Huang*, Weilin Wan, Yue Yang, Chris Callison-Burch, Mark Yatskar, Lingjie Liu
[pdf ]
MegaScenes: Scene-Level View Synthesis at Scale
Joseph Tung, Gene Chou*, Ruojin Cai, Guandao Yang, Kai Zhang, Gordon Wetzstein, Bharath Hariharan, Noah Snavely
[pdf ]
SuperGaussian: Repurposing Video Models for 3D Super Resolution
Yuan Shen*, Duygu Ceylan*, Paul Guerrero, Zexiang Xu, Niloy J. Mitra, Shenlong Wang, Anna Fruehstueck*
[pdf ]
Towards Model-Agnostic Dataset Condensation by Heterogeneous Models
Jun-Yeong Moon, Jung Uk Kim*, Gyeong-Moon Park*
[pdf ]
Goldfish: Vision-Language Understanding of Arbitrarily Long Videos
Kirolos Ataallah*, Xiaoqian Shen, Eslam Mohamed Abdelrahman*, Essam Sleiman, Mingchen Zhuge, Jian Ding, Deyao Zhu, Jürgen Schmidhuber, Mohamed Elhoseiny
[pdf ]
MeshFeat: Multi-Resolution Features for Neural Fields on Meshes
Mihir Mahajan*, Florian Hofherr*, Daniel Cremers
[pdf ]
Decoupling Common and Unique Representations for Multimodal Self-supervised Learning
Yi Wang*, Conrad M Albrecht, Nassim Ait Ali Braham, Chenying Liu, Zhitong Xiong, Xiao Xiang Zhu
[pdf ]
MM1: Methods, Analysis & Insights from Multimodal LLM Pre-training
Brandon McKinzie, Zhe Gan, Jean-Philippe Fauconnier, Samuel Dodge, Bowen Zhang, Philipp Dufter, Dhruti Shah, Futang Peng, Anton Belyi, Max A Schwarzer, Hongyu Hè, Xianzhi Du, Haotian Zhang, Karanjeet Singh, Doug Kang, Tom Gunter, Xiang Kong, Aonan Zhang, Jianyu Wang, Chong Wang, Nan Du, Tao Lei, Sam Wiseman, Mark Lee, Zirui Wang, Ruoming Pang, Peter Grasch, Alexander Toshev*, Yinfei Yang
[pdf ]
Optimizing Diffusion Models for Joint Trajectory Prediction and Controllable Generation
Yixiao Wang*, Chen Tang, Lingfeng Sun, Simone Rossi, Yichen Xie, Chensheng Peng, Thomas Hannagan, Stefano Sabatini, Nicola Poerio, Masayoshi Tomizuka, Wei Zhan
[pdf ]
2S-ODIS: Two-Stage Omni-Directional Image Synthesis by Geometric Distortion Correction
Atsuya Nakata*, Takao Yamanaka*
[pdf ]
Open-Vocabulary 3D Semantic Segmentation with Text-to-Image Diffusion Models
Xiaoyu Zhu*, Hao Zhou, Pengfei Xing, Long Zhao, Hao Xu, Junwei Liang, Alexander G. Hauptmann, Ting Liu, Andrew Gallagher
[pdf ]
D-SCo: Dual-Stream Conditional Diffusion for Monocular Hand-Held Object Reconstruction
Bowen Fu*, Gu Wang*, Chenyangguang Zhang, Yan Di, Ziqin Huang, Zhiying Leng, Fabian Manhardt, Xiangyang Ji*, Federico Tombari*
[pdf ]
Combining Generative and Geometry Priors for Wide-Angle Portrait Correction
Lan Yao, Chaofeng Chen, Xiaoming Li*, Zifei Yan, Wangmeng Zuo
[pdf ]
RealViformer: Investigating Attention for Real-World Video Super-Resolution
Yuehan Zhang*, Angela Yao
[pdf ]
Pairwise Distance Distillation for Unsupervised Real-World Image Super-Resolution
Yuehan Zhang*, Seungjun Lee, Angela Yao
[pdf ]
Decomposed Vector-Quantized Variational Autoencoder for Human Grasp Generation
Zhao Zhe*, Mengshi Qi, Huadong Ma
[pdf ]
UniFS: Universal Few-shot Instance Perception with Point Representations
Sheng Jin*, Ruijie Yao, Lumin Xu, Wentao Liu*, Chen Qian, Ji Wu, Ping Luo*
[pdf ]
SemanticHuman-HD: High Resolution Semantic disentangled 3D Human Generation
Peng Zheng, Tao Liu, Zili Yi, Rui Ma*
[pdf ]
CoherentGS: Sparse Novel View Synthesis with Coherent 3D Gaussians
Avinash Paliwal*, Wei Ye, Jinhui Xiong, Dmytro Kotovenko, Rakesh Ranjan, Vikas Chandra, Nima Khademi Kalantari
[pdf ]
Monocular Occupancy Prediction for Scalable Indoor Scenes
Hongxiao Yu, Yuqi Wang, Yuntao Chen, Zhaoxiang Zhang*
[pdf ]
Visual Grounding for Object-Level Generalization in Reinforcement Learning
Haobin Jiang, Zongqing Lu*
[pdf ]
3DEgo: 3D Editing on the Go!
Umar Khalid*, Hasan Iqbal*, Azib Farooq, Jing Hua, Chen Chen*
[pdf ]
Efficient Depth-Guided Urban View Synthesis
Sheng Miao*, Jiaxin Huang, Dongfeng Bai, Weichao Qiu, Liu Bingbing, Andreas Geiger, Yiyi Liao
[pdf ]
Probabilistic Weather Forecasting with Deterministic Guidance-based Diffusion Model
Donggeun Yoon, Minseok Seo, Doyi Kim, Yeji Choi, Donghyeon Cho*
[pdf ]
Domain-adaptive Video Deblurring via Test-time Blurring
Jin-Ting He*, Fu-Jen Tsai, Jia-Hao Wu, Yan-Tsung Peng, Chung-Chi Tsai, Chia-Wen Lin, Yen-Yu Lin
[pdf ]
Representing Topological Self-Similarity Using Fractal Feature Maps for Accurate Segmentation of Tubular Structures
Jiaxing Huang, Yanfeng Zhou, Yaoru Luo, Guole Liu, Heng Guo, Ge Yang*
[pdf ]
NeuroNCAP: Photorealistic Closed-loop Safety Testing for Autonomous Driving
William Ljungbergh*, Adam Tonderski, Joakim Johnander, Holger Caesar, Kalle Åström, Michael Felsberg, Christoffer Petersson
[pdf ]
OLAF: A Plug-and-Play Framework for Enhanced Multi-object Multi-part Scene Parsing
Pranav Gupta*, Rishubh Singh, Pradeep Shenoy, Ravi Kiran Sarvadevabhatla*
[pdf ]
Progressive Pretext Task Learning for Human Trajectory Prediction
Xiaotong Lin, Tianming Liang, Jianhuang Lai, Jian-Fang Hu*
[pdf ]
Hyperion – A fast, versatile symbolic Gaussian Belief Propagation framework for Continuous-Time SLAM
David Hug*, Ignacio Alzugaray, Margarita Chli
[pdf ]
Isomorphic Pruning for Vision Models
Gongfan Fang*, Xinyin Ma, Michael Bi Mi, Xinchao Wang*
[pdf ]
Attention Prompting on Image for Large Vision-Language Models
Runpeng Yu*, Weihao Yu*, Xinchao Wang*
[pdf ]
Learning Cross-hand Policies of High-DOF Reaching and Grasping
Qijin She, Shishun Zhang, Yunfan Ye, Ruizhen Hu, Kai Xu*
[pdf ]
Reprojection Errors as Prompts for Efficient Scene Coordinate Regression
Ting-Ru Liu*, Hsuan-Kung Yang, Jou-Min Liu, Chun-Wei Huang, Tsung-Chih Chiang, Quan Kong, Norimasa Kobori, Chun-Yi Lee
[pdf ]
Diffusion-Driven Data Replay: A Novel Approach to Combat Forgetting in Federated Class Continual Learning
Jinglin Liang, Jin Zhong, Hanlin Gu, Zhongqi Lu, Xingxing Tang, Gang Dai, Shuangping Huang*, Lixin Fan, Qiang Yang
[pdf ]
Long-Tail Temporal Action Segmentation with Group-wise Temporal Logit Adjustment
Zhanzhong Pang*, Fadime Sener, Shrinivas Ramasubramanian, Angela Yao
[pdf ]
REVISION: Rendering Tools Enable Spatial Fidelity in Vision-Language Models
Agneet Chatterjee*, Yiran Luo, Tejas Gokhale, Yezhou Yang, Chitta R Baral
[pdf ]
DreamMotion: Space-Time Self-Similar Score Distillation for Zero-Shot Video Editing
Hyeonho Jeong, Jinho Chang, Geon Yeong Park, Jong Chul Ye*
[pdf ]
VideoClusterNet: Self-Supervised and Adaptive Face Clustering for Videos
Devesh Walawalkar*, Pablo Garrido
[pdf ]
Unveiling Privacy Risks in Stochastic Neural Networks Training: Effective Image Reconstruction from Gradients
Yiming Chen*, Xiangyu Yang, Nikos Deligiannis
[pdf ]
Controlling the World by Sleight of Hand
Sruthi Sudhakar*, Ruoshi Liu, Basile Van Hoorick, Carl Vondrick, Richard Zemel
[pdf ]
Hiding Imperceptible Noise in Curvature-Aware Patches for 3D Point Cloud Attack
Mingyu Yang*, Daizong Liu, Keke Tang, Pan Zhou, Lixing Chen, Junyang Chen
[pdf ]
Interleaving One-Class and Weakly-Supervised Models with Adaptive Thresholding for Unsupervised Video Anomaly Detection
Yongwei Nie, Hao Huang, Chengjiang Long, Qing Zhang, Pradipta Maji, Hongmin Cai*
[pdf ]
Cross-Domain Learning for Video Anomaly Detection with Limited Supervision
Yashika Jain, Ali Dabouei*, Min Xu*
[pdf ]
YOLOv9: Learning What You Want to Learn Using Programmable Gradient Information
Chien-Yao Wang*, I-Hau Yeh, Hong-Yuan Mark Liao
[pdf ]
Unsupervised Multi-modal Medical Image Registration via Invertible Translation
Mengjie Guo*
[pdf ]
Functional Transform-Based Low-Rank Tensor Factorization for Multi-Dimensional Data Recovery
Jian-Li Wang, Xi-Le Zhao*
[pdf ]
CRM: Single Image to 3D Textured Mesh with Convolutional Reconstruction Model
Zhengyi Wang*, Yikai Wang, Yifei Chen, Chendong Xiang, Shuo Chen, Dajiang Yu, Chongxuan Li, Hang Su, Jun Zhu
[pdf ]
Domain Reduction Strategy for Non-Line-of-Sight Imaging
Hyunbo Shim, In Cho, Daekyu Kwon, Seon Joo Kim*
[pdf ]
HPE-Li: WiFi-enabled Lightweight Dual Selective Kernel Convolution for Human Pose Estimation
Toan D. Gian, Tien Dac Lai, Thien Van Luong, Kok-Seng Wong, Van-Dinh Nguyen*
[pdf ]
Cut out the Middleman: Revisiting Pose-based Gait Recognition
Yang Fu, Saihui Hou*, Shibei Meng, Xuecai Hu*, Chunshui Cao, Xu Liu, Yongzhen Huang
[pdf ]
HiEI: A Universal Framework for Generating High-quality Emerging Images from Natural Images
Jingmeng Li, Lukang Fu, Surun Yang, Hui Wei*
[pdf ]
High-Precision Self-Supervised Monocular Depth Estimation with Rich-Resource Prior
Jianbing Shen*, Wencheng Han
[pdf ]
SGS-SLAM: Semantic Gaussian Splatting For Neural Dense SLAM
Mingrui Li, Shuhong Liu, Heng Zhou, Guohao Zhu, Na Cheng, Tianchen Deng, Hongyu Wang*
[pdf ]
View Selection for 3D Captioning via Diffusion Ranking
Tiange Luo*, Justin Johnson, Honglak Lee
[pdf ]
OmniSSR: Zero-shot Omnidirectional Image Super-Resolution using Stable Diffusion Model
Runyi Li*, Xuhan Sheng, Weiqi Li, Jian Zhang*
[pdf ]
UDiffText: A Unified Framework for High-quality Text Synthesis in Arbitrary Images via Character-aware Diffusion Models
Yiming Zhao*, Zhouhui Lian*
[pdf ]
Confidence Self-Calibration for Multi-Label Class-Incremental Learning
Kaile Du*, Yifan Zhou, Fan Lyu, Yuyang Li, Chen Lu, Guangcan Liu*
[pdf ]
OMG: Occlusion-friendly Personalized Multi-concept Generation in Diffusion Models
Zhe Kong*, Yong Zhang*, Tianyu Yang, Tao Wang, Kaihao Zhang, Bizhu Wu, Guanying Chen, Wei Liu, Wenhan Luo*
[pdf ]
Versatile Incremental Learning: Towards Class and Domain-Agnostic Incremental Learning
Min-Yeong Park, Jae-Ho Lee, Gyeong-Moon Park*
[pdf ]
WeCromCL: Weakly Supervised Cross-Modality Contrastive Learning for Transcription-only Supervised Text Spotting
Jingjing Wu, Zhengyao Fang, Pengyuan Lyu, Chengquan Zhang, Fanglin Chen, Guangming Lu, Wenjie Pei*
[pdf ]
An Incremental Unified Framework for Small Defect Inspection
Jiaqi Tang, Hao Lu, Xiaogang Xu, Ruizheng Wu, Sixing Hu, Tong Zhang, Tsz Wa Cheng, Ming Ge, Ying-Cong Chen*, Fugee Tsung
[pdf ]
Enhancing Optimization Robustness in 1-bit Neural Networks through Stochastic Sign Descent
NianHui Guo*, Hong Guo, Christoph Meinel, Haojin Yang
[pdf ]
Temporally Consistent Stereo Matching
Jiaxi Zeng*, Chengtang Yao, Yuwei Wu*, Yunde Jia
[pdf ]
A Rotation-invariant Texture ViT for Fine-Grained Recognition of Esophageal Cancer Endoscopic Ultrasound Images
Tianyi Liu, Shuaishuai S Zhuang, Jiacheng Nie, Geng Chen, Yusheng Guo, Guangquan Zhou*, Jean-Louis Coatrieux, Yang Chen*
[pdf ]
BI-MDRG: Bridging Image History in Multimodal Dialogue Response Generation
Hee Suk Yoon, Eunseop Yoon, Joshua Tian Jin Tee, Kang Zhang, Yu-Jung Heo, Du-Seong Chang, Chang D. Yoo*
[pdf ]
Adapting Fine-Grained Cross-View Localization to Areas without Fine Ground Truth
Zimin Xia*, Yujiao Shi, Hongdong Li, Julian F. P. Kooij
[pdf ]
BeNeRF: Neural Radiance Fields from a Single Blurry Image and Event Stream
Wenpu Li, Pian Wan, Peng Wang, Jinghang Li, Yi Zhou, Peidong Liu*
[pdf ]
Human Motion Forecasting in Dynamic Domain Shifts: A Homeostatic Continual Test-time Adaptation Framework
Qiongjie Cui*, Huaijiang Sun, Bin Li, Jianfeng Lu, Weiqing Li
[pdf ]
CloudFixer: Test-Time Adaptation for 3D Point Clouds via Diffusion-Guided Geometric Transformation
Hajin Shim, Changhun Kim, Eunho Yang*
[pdf ]
DreamDiffusion: High-Quality EEG-to-Image Generation with Temporal Masked Signal Modeling and CLIP Alignment
Yunpeng Bai*, Xintao Wang, Yan-Pei Cao, Yixiao Ge, Chun Yuan, Ying Shan
[pdf ]
FRI-Net: Floorplan Reconstruction via Room-wise Implicit Representation
Honghao Xu, Juzhan Xu, Zeyu Huang, Pengfei Xu, Hui Huang, Ruizhen Hu*
[pdf ]
BugNIST - a Large Volumetric Dataset for Detection under Domain Shift
Patrick M Jensen, Vedrana A Dahl, Rebecca Engberg, Carsten Gundlach, Hans Martin Kjer, Anders B Dahl*
[pdf ]
SCP-Diff: Spatial-Categorical Joint Prior for Diffusion Based Semantic Image Synthesis
Huan-ang Gao, Mingju Gao, Jiaju Li, Wenyi Li, Rong Zhi, Hao Tang, Hao Zhao*
[pdf ]
PoseAugment: Generative Human Pose Data Augmentation with Physical Plausibility for IMU-based Motion Capture
Zhuojun Li*, Chun Yu*, Chen Liang, Yuanchun Shi
[pdf ]
PixArt-Sigma: Weak-to-Strong Training of Diffusion Transformer for 4K Text-to-Image Generation
Junsong Chen, Chongjian Ge, Enze Xie*, Yue Wu, Lewei Yao, Xiaozhe Ren, Zhongdao Wang, Ping Luo, Huchuan Lu, Zhenguo Li
[pdf ]
Hierarchical Gaussian Mixture Normalizing Flow Modeling for Unified Anomaly Detection
Xincheng Yao*, Ruoqi Li, Zefeng Qian, Lu Wang, Chongyang Zhang*
[pdf ]
A Closer Look at GAN Priors: Exploiting Intermediate Features for Enhanced Model Inversion Attacks
Yixiang Qiu*, Hao Fang, Hongyao Yu, Bin Chen*, Meikang Qiu, Shu-Tao Xia
[pdf ]
Improving Unsupervised Domain Adaptation: A Pseudo-Candidate Set Approach
Aveen Dayal*, Rishabh Lalla, Linga Reddy Cenkeramaddi, C. Krishna Mohan, Abhinav Kumar, Vineeth N Balasubramanian
[pdf ]
HeadStudio: Text to Animatable Head Avatars with 3D Gaussian Splatting
Zhenglin Zhou*, Fan Ma, Hehe Fan, Zongxin Yang, Yi Yang
[pdf ]
DetToolChain: A New Prompting Paradigm to Unleash Detection Ability of MLLM
Yixuan Wu*, Yizhou Wang, Shixiang Tang, Wenhao Wu, Tong He, Wanli Ouyang, Philip Torr, Jian Wu
[pdf ]
Surface-Centric Modeling for High-Fidelity Generalizable Neural Surface Reconstruction
Rui Peng, Shihe Shen, Kaiqiang Xiong, Huachen Gao, Jianbo Jiao, Xiaodong Gu, Ronggang Wang*
[pdf ]
HumanRefiner: Benchmarking Abnormal Human Generation and Refining with Coarse-to-fine Pose-Reversible Guidance
Guian Fang*, Wenbiao Yan, Yuanfan Guo, Jianhua Han, Zutao Jiang, Hang Xu, Shengcai Liao, Xiaodan Liang
[pdf ]
Multiscale Graph Texture Network
Ravishankar Evani*, Deepu Rajan, Shangbo Mao
[pdf ]
HyTAS: A Hyperspectral Image Transformer Architecture Search Benchmark and Analysis
Fangqin Zhou*, Mert Kilickaya, Joaquin Vanschoren, Ran Piao
[pdf ]
Integer-Valued Training and Spike-driven Inference Spiking Neural Network for High-performance and Energy-efficient Object Detection
Xinhao Luo, Man Yao, Yuhong Chou, Bo Xu, Guoqi Li*
[pdf ]
RepVF: A Unified Vector Fields Representation for Multi-task 3D Perception
Jianbing Shen, Chunliang Li, Wencheng Han, Junbo Yin, Sanyuan Zhao*
[pdf ]
Phase Concentration and Shortcut Suppression for Weakly Supervised Semantic Segmentation
Hoyong Kwon, Jaeseok Jeong, Sung-Hoon Yoon, Kuk-Jin Yoon*
[pdf ]
Group Testing for Accurate and Efficient Range-Based Near Neighbor Search for Plagiarism Detection
Harsh Shah*, Kashish Mittal, Ajit Rajwade*
[pdf ]
CompGS: Smaller and Faster Gaussian Splatting with Vector Quantization
K L Navaneet*, Kossar Pourahmadi Meibodi, Soroush Abbasi Koohpayegani, Hamed Pirsiavash
[pdf ]
SMILe: Leveraging Submodular Mutual Information For Robust Few-Shot Object Detection
Anay Majee*, Ryan X Sharp, Rishabh Iyer*
[pdf ]
Customize-A-Video: One-Shot Motion Customization of Text-to-Video Diffusion Models
Yixuan Ren*, Yang Zhou, Jimei Yang, Jing Shi, Difan Liu, Feng Liu, Mingi Kwon, Abhinav Shrivastava
[pdf ]
S-JEPA: A Joint Embedding Predictive Architecture for Skeletal Action Recognition
Mohamed Abdelfattah*, Alexandre Alahi
[pdf ]
∞-Brush: Controllable Large Image Synthesis with Diffusion Models in Infinite Dimensions
Minh-Quan Le*, Alexandros Graikos, Srikar Yellapragada, Rajarsi Gupta, Joel Saltz, Dimitris Samaras
[pdf ]
SwapAnything: Enabling Arbitrary Object Swapping in Personalized Image Editing
Jing Gu*, Nanxuan Zhao, Wei Xiong, Qing Liu, Zhifei Zhang, He Zhang, Jianming Zhang, HyunJoon Jung, Yilin Wang*, Xin Eric Wang*
[pdf ]
Interaction-centric Spatio-Temporal Context Reasoning for Multi-Person Video HOI Recognition
Yisong Wang, Nan Xi*, Jingjing Meng, Junsong Yuan
[pdf ]
Efficient Unsupervised Visual Representation Learning with Explicit Cluster Balancing
Ioannis Maniadis Metaxas*, Georgios Tzimiropoulos, Ioannis Patras
[pdf ]
ProTIP: Probabilistic Robustness Verification on Text-to-Image Diffusion Models against Stochastic Perturbation
Yi Zhang, Yun Tang, Wenjie Ruan, Xiaowei Huang, Siddartha Khastgir, Paul A Jennings, Xingyu Zhao*
[pdf ]
Leveraging Near-Field Lighting for Monocular Depth Estimation from Endoscopy Videos
Akshay Paruchuri*, Samuel Ehrenstein, Shuxian Wang, Inbar Fried, Stephen Pizer, Marc Niethammer, Roni Sengupta
[pdf ]
OvSW: Overcoming Silent Weights for Accurate Binary Neural Networks
Jingyang Xiang*, Zuohui Chen, Siqi Li, Qing Wu, Yong Liu
[pdf ]
Multistain Pretraining for Slide Representation Learning in Pathology
Guillaume Jaume*, Anurag J Vaidya*, Andrew Zhang, Andrew Song, Richard J Chen, Sharifa Sahai, Dandan Mo, Emilio Madrigal, Long P Le, Faisal Mahmood*
[pdf ]
T-Rex2: Towards Generic Object Detection via Text-Visual Prompt Synergy
Qing Jiang*, Feng Li, Zhaoyang Zeng, Shilong Liu, Tianhe Ren, Lei Zhang*
[pdf ]
Harmonizing Knowledge Transfer in Neural Network with Unified Distillation
Yaomin Huang, Faming Fang, Zaoming Yan, Chaomin Shen, Guixu Zhang*
[pdf ]
Mamba-ND: Selective State Space Modeling for Multi-Dimensional Data
Shufan Li*, Aditya Grover, Harkanwar Singh
[pdf ]
Click Prompt Learning with Optimal Transport for Interactive Segmentation
Jie Liu*, Haochen Wang, Wenzhe Yin, Jan-Jakob Sonke, Efstratios Gavves
[pdf ]
3D Human Pose Estimation via Non-Causal Retentive Networks
Kaili Zheng, Feixiang Lu, Yihao Lv, Liangjun Zhang, Chenyi Guo*, Ji Wu*
[pdf ]
OMR: Occlusion-Aware Memory-Based Refinement for Video Lane Detection
Dongkwon Jin, Chang-Su Kim*
[pdf ]
6DoF Head Pose Estimation through Explicit Bidirectional Interaction with Face Geometry
Sungho Chun, Ju Yong Chang*
[pdf ]
Latent Diffusion Prior Enhanced Deep Unfolding for Snapshot Spectral Compressive Imaging
Zongliang Wu*, Ruiying Lu, Ying Fu, Xin Yuan
[pdf ]
Multimodal Cross-Domain Few-Shot Learning for Egocentric Action Recognition
Masashi Hatano*, Ryo Hachiuma, Ryo Fujii, Hideo Saito
[pdf ]
Enhancing Tampered Text Detection through Frequency Feature Fusion and Decomposition
Zhongxi Chen, Shen Chen, Taiping Yao*, Ke Sun, Shouhong Ding, Xianming Lin*, Liujuan Cao, Rongrong Ji
[pdf ]
Modeling Label Correlations with Latent Context for Multi-Label Recognition
Zhaomin Chen*, Quan Cui, Ruoxi Deng, Jie Hu, Guodao Zhang*
[pdf ]
LLM as Dataset Analyst: Subpopulation Structure Discovery with Large Language Model
Yulin Luo, Ruichuan An, Bocheng Zou, Yiming Tang, Jiaming Liu, Shanghang Zhang*
[pdf ]
Finding a needle in a haystack: A Black-Box Approach to Invisible Watermark Detection
Minzhou Pan*, Zhenting Wang, Xin Dong, Vikash Sehwag, Lingjuan Lyu, Xue Lin
[pdf ]
DynoSurf: Neural Deformation-based Temporally Consistent Dynamic Surface Reconstruction
Yuxin Yao, Siyu Ren, Junhui Hou*, Zhi Deng, Juyong Zhang, Wenping Wang
[pdf ]
MOD-UV: Learning Mobile Object Detectors from Unlabeled Videos
Yihong Sun*, Bharath Hariharan
[pdf ]
ARoFace: Alignment Robustness to Improve Low-quality Face Recognition
Mohammad Saeed Ebrahimi Saadabadi*, Sahar Rahimi Malakshan, Ali Dabouei, Nasser Nasrabadi
[pdf ]
Learning Diffusion Models for Multi-View Anomaly Detection
Chieh Liu*, Yu-Min Chu*, Ting-I Hsieh*, Hwann-Tzong Chen*, Tyng-Luh Liu*
[pdf ]
Clearer Frames, Anytime: Resolving Velocity Ambiguity in Video Frame Interpolation
Zhihang Zhong, Gurunandan Krishnan, Xiao Sun, Yu Qiao, Sizhuo Ma*, Jian Wang*
[pdf ]
Multi-modal Relation Distillation for Unified 3D Representation Learning
Huiqun Wang, Yiping Bao, Panwang Pan, Zeming Li, Xiao Liu, Ruijie Yang, Di Huang*
[pdf ]
Strengthening Multimodal Large Language Model with Bootstrapped Preference Optimization
Renjie Pi*, Tianyang Han, Wei Xiong, Jipeng Zhang, Runtao Liu, Rui Pan, Tong Zhang
[pdf ]
Collaborative Vision-Text Representation Optimizing for Open-Vocabulary Segmentation
Siyu Jiao*, Hongguang Zhu, Yunchao Wei, Yao Zhao*, Jiannan Huang, Humphrey Shi
[pdf ]
Distributionally Robust Loss for Long-Tailed Multi-Label Image Classification
Dekun Lin*, Zhe Cui, Rui Chen, Tailai Peng, Xinran Xie, Xiaolin Qin
[pdf ]
MesonGS: Post-training Compression of 3D Gaussians via Efficient Attribute Transformation
Shuzhao Xie*, Weixiang Zhang, Chen Tang, Yunpeng Bai, Rongwei Lu, Shijia Ge, Zhi Wang
[pdf ]
LongVLM: Efficient Long Video Understanding via Large Language Models
Yuetian Weng, Mingfei Han, Haoyu He, Xiaojun Chang, Bohan Zhuang*
[pdf ]
The All-Seeing Project V2: Towards General Relation Comprehension of the Open World
Weiyun Wang, Yiming Ren, Haowen Luo, Tiantong Li, Chenxiang Yan, Zhe Chen, Wenhai Wang, Qingyun Li, Lewei Lu, Xizhou Zhu, Yu Qiao, Jifeng Dai*
[pdf ]
Neural Metamorphosis
Xingyi Yang*, Xinchao Wang*
[pdf ]
WHAC: World-grounded Humans and Cameras
Wanqi Yin, Zhongang Cai, Chen Wei, Fanzhou Wang, Ruisi Wang, Haiyi Mei, Weiye Xiao, Zhitao Yang, Qingping Sun, Atsushi Yamashita, Ziwei Liu, Lei Yang*
[pdf ]
Federated Learning with Local Openset Noisy Labels
Zonglin Di*, Zhaowei Zhu, Xiaoxiao Li, Yang Liu*
[pdf ]
Diff3DETR: Agent-based Diffusion Model for Semi-supervised 3D Object Detection
Jiacheng Deng*, Jiahao Lu, Tianzhu Zhang
[pdf ]
PSALM: Pixelwise Segmentation with Large Multi-modal Model
Zheng Zhang, Yeyao Ma, Enming Zhang, Xiang Bai*
[pdf ]
Layout-Corrector: Alleviating Layout Sticking Phenomenon in Discrete Diffusion Model
Shoma Iwai*, Atsuki Osanai, Shunsuke Kitada, Shinichiro Omachi
[pdf ]
Active Coarse-to-Fine Segmentation of Moveable Parts from Real Images
Ruiqi Wang*, Akshay Gadi Patil, Fenggen Yu, Hao Zhang
[pdf ]
Topo4D: Topology-Preserving Gaussian Splatting for High-Fidelity 4D Head Capture
Xuanchen Li, Yuhao Cheng, Xingyu Ren, Haozhe Jia, Di Xu, Wenhan Zhu, Yichao Yan*
[pdf ]
Learning Modality-agnostic Representation for Semantic Segmentation from Any Modalities
Xu Zheng*, Yuanhuiyi Lyu, Lin Wang*
[pdf ]
Kinetic Typography Diffusion Model
Seonmi Park, Inhwan Bae, Seunghyun Shin, Hae-Gon Jeon*
[pdf ]
Refine, Discriminate and Align: Stealing Encoders via Sample-Wise Prototypes and Multi-Relational Extraction
Shuchi Wu*, Chuan Ma*, Kang Wei*, Xiaogang Xu, Ming Ding, Yuwen Qian, Di Xiao, Tao Xiang
[pdf ]
Light-in-Flight for a World-in-Motion
Jongho Lee*, Ryan J Suess, Mohit Gupta
[pdf ]
GroupDiff: Diffusion-based Group Portrait Editing
Yuming Jiang, Nanxuan Zhao*, Qing Liu, Krishna Kumar Singh, Shuai Yang, Chen Change Loy, Ziwei Liu
[pdf ]
Faceptor: A Generalist Model for Face Perception
Lixiong Qin*, Mei Wang, Xuannan Liu, Yuhang Zhang, Wei Deng, Xiaoshuai Song, Weiran Xu*, Weihong Deng
[pdf ]
Inter-Class Topology Alignment for Efficient Black-Box Substitute Attacks
Lingzhuang Meng, Mingwen Shao*, Yuanjian Qiao, Wenjie Liu
[pdf ]
Segment3D: Learning Fine-Grained Class-Agnostic 3D Segmentation without Manual Labels
Rui Huang, Songyou Peng, Ayca Takmaz, Federico Tombari, Marc Pollefeys, Shiji Song, Gao Huang*, Francis Engelmann
[pdf ]
InsMapper: Exploring Inner-instance Information for Vectorized HD Mapping
Zhenhua Xu*, Kwan-Yee K. Wong, Hengshuang Zhao
[pdf ]
KDProR: A Knowledge-Decoupling Probabilistic Framework for Video-Text Retrieval
Xianwei Zhuang*, Hongxiang Li, Xuxin Cheng, Zhihong Zhu, Yuxin Xie, Yuexian Zou
[pdf ]
Category-level Object Detection, Pose Estimation and Reconstruction from Stereo Images
Chuanrui Zhang*, Yonggen Ling*, Minglei Lu, Minghan Qin, Haoqian Wang*
[pdf ]
Learning with Unmasked Tokens Drives Stronger Vision Learners
Taekyung Kim*, Sanghyuk Chun, Byeongho Heo, Dongyoon Han*
[pdf ]
Dual-stage Hyperspectral Image Classification Model with Spectral Supertoken
Peifu Liu, Tingfa Xu*, Jie Wang, Huan Chen, Huiyan Bai, Jianan Li*
[pdf ]
Multi-Task Domain Adaptation for Language Grounding with 3D Objects
Penglei Sun, Yaoxian Song, Xinglin Pan, Peijie Dong, Xiaofei Yang, Qiang Wang*, Zhixu Li, Tiefeng Li, Xiaowen Chu*
[pdf ]
Efficient Active Domain Adaptation for Semantic Segmentation by Selecting Information-rich Superpixels
Yuan Gao, Zilei Wang*, Yixin Zhang, Bohai Tu
[pdf ]
Efficient Training of Spiking Neural Networks with Multi-Parallel Implicit Stream Architecture
Zhigao Cao, Meng Li, Xiashuang Wang, Haoyu Wang, Fan Wang, Youjun Li, Zigang Huang*
[pdf ]
Camera-LiDAR Cross-modality Gait Recognition
Wenxuan Guo*, Yingping Liang, Zhiyu Pan, Ziheng Xi, Jianjiang Feng, Jie Zhou
[pdf ]
LiteSAM is Actually what you Need for segment Everything
Jianhai Fu, Yuanjie Yu, Ningchuan Li*, Yi Zhang, Qichao Chen, Jianping Xiong, Jun Yin, Zhiyu Xiang*
[pdf ]
IGNORE: Information Gap-based False Negative Loss Rejection for Single Positive Multi-Label Learning
Gyeong Ryeol Song, Noo-ri Kim, Jin-Seop Lee, Jee-Hyong Lee*
[pdf ]
Visual Prompting via Partial Optimal Transport
Mengyu Zheng*, Zhiwei Hao, Yehui Tang, Chang Xu*
[pdf ]
Modelling Competitive Behaviors in Autonomous Driving Under Generative World Model
Guanren Qiao, Guiliang Liu*, Guorui Quan, Rongxiao Qu
[pdf ]
Tendency-driven Mutual Exclusivity for Weakly Supervised Incremental Semantic Segmentation
Chongjie Si, Xuehui Wang, Xiaokang Yang, Wei Shen*
[pdf ]
AdaCLIP: Adapting CLIP with Hybrid Learnable Prompts for Zero-Shot Anomaly Detection
Yunkang Cao*, Jiangning Zhang, Luca Frittoli, Yuqi Cheng, Weiming Shen*, Giacomo Boracchi
[pdf ]
Pathformer3D: A 3D Scanpath Transformer for 360° Images
Rong Quan, Yantao Lai, Mengyu Qiu, Dong Liang*
[pdf ]
TransFusion -- A Transparency-Based Diffusion Model for Anomaly Detection
Matic Fučka*, Vitjan Zavrtanik, Danijel Skočaj
[pdf ]
SparseLIF: High-Performance Sparse LiDAR-Camera Fusion for 3D Object Detection
Hongcheng Zhang, Liu Liang, Pengxin Zeng*, Xiao Song, Zhe Wang
[pdf ]
3D Gaussian Parametric Head Model
Yuelang Xu, Lizhen Wang, Zerong Zheng, Zhaoqi Su, Yebin Liu*
[pdf ]
RING-NeRF: Rethinking Inductive Biases for Versatile and Efficient Neural Fields
Doriand Petit*, Steve Bourgeois, Dumitru Pavel, Vincent Gay-Bellile, Florian Chabot, Loïc Barthe
[pdf ]
Platypus: A Generalized Specialist Model for Reading Text in Various Forms
Peng Wang, Zhaohai Li, Jun Tang, Humen Zhong, Fei Huang, Zhibo Yang*, Cong Yao*
[pdf ]
Structured-NeRF: Hierarchical Scene Graph with Neural Representation
Zhide Zhong, Jiakai Cao, Songen Gu, Sirui Xie, Liyi Luo, Hao Zhao, Guyue Zhou, Haoang Li, Zike Yan*
[pdf ]
EGIC: Enhanced Low-Bit-Rate Generative Image Compression Guided by Semantic Segmentation
Nikolai Körber*, Eduard Kromer, Andreas Siebert, Sascha Hauke, Daniel Mueller-Gritschneder, Björn Schuller
[pdf ]
Plug-and-Play Learned Proximal Trajectory for 3D Sparse-View X-Ray Computed Tomography
Romain Vo*, Julie Escoda, Caroline Vienne, Etienne Decenciere
[pdf ]
PPAD: Iterative Interactions of Prediction and Planning for End-to-end Autonomous Driving
Zhili Chen, Maosheng Ye, Shuangjie Xu, Tongyi Cao, Qifeng Chen*
[pdf ]
Test-Time Stain Adaptation with Diffusion Models for Histopathology Image Classification
Cheng-Chang Tsai*, Yuan-Chih Chen, Chun-Shien Lu*
[pdf ]
Beyond MOT: Semantic Multi-Object Tracking
Yunhao Li, Qin Li, Hao Wang, Xue Ma, Jiali Yao, Shaohua Dong, Heng Fan, Libo Zhang*
[pdf ]
Temporal Event Stereo via Joint Learning with Stereoscopic Flow
Hoonhee Cho, Jae-Young Kang, Kuk-Jin Yoon*
[pdf ]
SAM-COD: SAM-guided Unified Framework for Weakly-Supervised Camouflaged Object Detection
Huafeng Chen, Pengxu Wei, Guangqian Guo, Shan Gao*
[pdf ]
Just a Hint: Point-Supervised Camouflaged Object Detection
Huafeng Chen, Dian Shao*, Guangqian Guo, Shan Gao*
[pdf ]
ManiGaussian: Dynamic Gaussian Splatting for Multi-task Robotic Manipulation
Guanxing Lu, Shiyi Zhang, Ziwei Wang*, Changliu Liu, Jiwen Lu, Yansong Tang
[pdf ]
Global-Local Collaborative Inference with LLM for Lidar-Based Open-Vocabulary Detection
Xingyu Peng, Yan Bai, Chen Gao, Lirong Yang, Fei Xia, Beipeng Mu, Xiaofei Wang, Si Liu*
[pdf ]
Learning High-resolution Vector Representation from Multi-Camera Images for 3D Object Detection
Zhili Chen, Shuangjie Xu, Maosheng Ye, Zian Qian, Xiaoyi Zou, Dit-Yan Yeung, Qifeng Chen*
[pdf ]
View-Consistent 3D Editing with Gaussian Splatting
Yuxuan Wang*, Xuanyu Yi, Zike Wu, Na Zhao, Long Chen, Hanwang Zhang
[pdf ]
E3V-K5: An Authentic Benchmark for Redefining Video-Based Energy Expenditure Estimation
Shengxuming Zhang, Lei Jin, Yifan Wang, Xinyu Wang, Xu Wen, Zunlei Feng*, Mingli Song
[pdf ]
GeoGaussian: Geometry-aware Gaussian Splatting for Scene Rendering
Yanyan Li*, Chenyu Lyu, Yan Di, Guangyao Zhai, Gim Hee Lee, Federico Tombari
[pdf ]
URS-NeRF: Unordered Rolling Shutter Bundle Adjustment for Neural Radiance Fields
Bo Xu*, Liu Ziao, Mengqi Guo, Jiancheng Li, Gim Hee Lee
[pdf ]
InstructIR: High-Quality Image Restoration Following Human Instructions
Marcos V. Conde*, Gregor Geigle, Radu Timofte
[pdf ]
Asynchronous Large Language Model Enhanced Planner for Autonomous Driving
Yuan Chen, Zi-han Ding, Ziqin Wang, Yan Wang*, Lijun Zhang, Si Liu*
[pdf ]
Make a Cheap Scaling: A Self-Cascade Diffusion Model for Higher-Resolution Adaptation
Lanqing Guo, Yingqing He, Haoxin Chen, Menghan Xia, Xiaodong Cun, Yufei Wang, Siyu Huang, Yong Zhang, Xintao Wang, Qifeng Chen, Ying Shan, Bihan Wen*
[pdf ]
LayoutFlow: Flow Matching for Layout Generation
Julian Jorge Andrade Guerreiro*, Naoto Inoue*, Kento Masui, Mayu Otani, Hideki Nakayama
[pdf ]
Making Large Language Models Better Planners with Reasoning-Decision Alignment
Zhijian Huang, Tao Tang, Shaoxiang Chen, Sihao Lin, Zequn Jie, Lin Ma, Guangrun Wang, Xiaodan Liang*
[pdf ]
R3D-AD: Reconstruction via Diffusion for 3D Anomaly Detection
Zheyuan Zhou, Le Wang, Naiyu Fang, Zili Wang, Lemiao Qiu*, Shuyou Zhang
[pdf ]
Representation Enhancement-Stabilization: Reducing Bias-Variance of Domain Generalization
Wei Huang*, Yilei Shi, Zhitong Xiong, Xiao Xiang Zhu
[pdf ]
Continual Learning for Remote Physiological Measurement: Minimize Forgetting and Simplify Inference
Qian Liang, Yan Chen, Yang Hu*
[pdf ]
An Optimization Framework to Enforce Multi-View Consistency for Texturing 3D Meshes
Zhengyi Zhao, Chen Song, Xiaodong Gu, Yuan Dong, Qi Zuo, Weihao Yuan, Zilong Dong*, Liefeng Bo, Qixing Huang*
[pdf ]
STAG4D: Spatial-Temporal Anchored Generative 4D Gaussians
Yifei Zeng, Yanqin Jiang, Siyu Zhu, Yuanxun Lu, Youtian Lin, Hao Zhu, Weiming Hu, Xun Cao, Yao Yao*
[pdf ]
RGBD GS-ICP SLAM
Seongbo Ha, Jiung Yeon, Hyeonwoo Yu*
[pdf ]
Efficient NeRF Optimization - Not All Samples Remain Equally Hard
Juuso Korhonen*, Goutham Rangu, Hamed Rezazadegan Tavakoli, Juho Kannala
[pdf ]
Revisiting Calibration of Wide-Angle Radially Symmetric Cameras
Andrea Porfiri Dal Cin*, Francesco Azzoni, Giacomo Boracchi, Luca Magri*
[pdf ]
Rawformer: Unpaired Raw-to-Raw Translation for Learnable Camera ISPs
Georgy Perevozchikov*, Nancy Mehta*, Mahmoud Afifi*, Radu Timofte*
[pdf ]
Robust Incremental Structure-from-Motion with Hybrid Features
Shaohui Liu*, Yidan Gao, Tianyi Zhang, Rémi Pautrat, Johannes L Schönberger, Viktor Larsson, Marc Pollefeys
[pdf ]
Revisiting Domain-Adaptive Object Detection in Adverse Weather by the Generation and Composition of High-Quality Pseudo-Labels
Rui Zhao, Huibin Yan, Shuoyao Wang*
[pdf ]
Prediction Exposes Your Face: Black-box Model Inversion via Prediction Alignment
Yufan Liu*, Wanqian Zhang, Dayan Wu, Zheng Lin, Jingzi Gu, Weiping Wang
[pdf ]
Noise Calibration: Plug-and-play Content-Preserving Video Enhancement using Pre-trained Video Diffusion Models
Qinyu Yang, Haoxin Chen, Yong Zhang*, Menghan Xia, Xiaodong Cun, Zhixun Su*, Ying Shan
[pdf ]
UniCal: Unified Neural Sensor Calibration
Ze Yang*, George G Chen, Haowei Zhang, Kevin Ta, Ioan Andrei Bârsan, Daniel Murphy, Sivabalan Manivasagam*, Raquel Urtasun*
[pdf ]
Mind the Interference: Retaining Pre-trained Knowledge in Parameter Efficient Continual Learning of Vision-Language Models
Longxiang Tang*, Zhuotao Tian, Kai Li, Chunming He, Hantao Zhou, Hengshuang Zhao, Xiu Li, Jiaya Jia
[pdf ]
Urban Waterlogging Detection: A Challenging Benchmark and Large-Small Model Co-Adapter
Suqi Song, Chenxu Zhang, Peng Zhang, Pengkun Li, Fenglong Song, Lei Zhang*
[pdf ]
Pseudo-Embedding for Generalized Few-Shot Point Cloud Segmentation
Chih-Jung Tsai, Hwann-Tzong Chen*, Tyng-Luh Liu
[pdf ]
WSI-VQA: Interpreting Whole Slide Images by Generative Visual Question Answering
Pingyi Chen*, Chenglu Zhu, Sunyi Zheng, Honglin Li, Lin Yang*
[pdf ]
ReMoS: 3D Motion-Conditioned Reaction Synthesis for Two-Person Interactions
Anindita Ghosh*, Rishabh Dabral, Vladislav Golyanik, Christian Theobalt, Philipp Slusallek
[pdf ]
Statewide Visual Geolocalization in the Wild
Florian Fervers*, Sebastian Bullinger, Christoph Bodensteiner, Michael Arens, Rainer Stiefelhagen
[pdf ]
Any2Point: Empowering Any-modality Transformers for Efficient 3D Understanding
Yiwen Tang, Ray Zhang, Jiaming Liu, Zoey Guo, Bin Zhao*, Zhigang Wang, Dong Wang*, Peng Gao, Hongsheng Li, Xuelong Li
[pdf ]
Trajectory-aligned Space-time Tokens for Few-shot Action Recognition
Pulkit Kumar*, Namitha Padmanabhan, Luke Luo, Sai Saketh Rambhatla, Abhinav Shrivastava
[pdf ]
EgoCVR: An Egocentric Benchmark for Fine-Grained Composed Video Retrieval
Thomas Hummel*, Shyamgopal Karthik, Mariana-Iuliana Georgescu, Zeynep Akata
[pdf ]
Synchronization of Projective Transformations
Rakshith Madhavan*, Andrea Fusiello, Federica Arrigoni
[pdf ]
TLControl: Trajectory and Language Control for Human Motion Synthesis
Weilin Wan*, Zhiyang Dou, Taku Komura, Wenping Wang, Dinesh Jayaraman, Lingjie Liu
[pdf ]
Insect Identification in the Wild: The AMI Dataset
Aditya Jain*, Fagner Cunha, Michael J Bunsen, Juan Sebastián Cañas, Léonard Pasi, Nathan Pinoy, Flemming Helsing, JoAnne Russo, Marc S Botham, Michael Sabourin, Jonathan Fréchette, Alexandre Anctil, Yacksecari Lopez, Eduardo Navarro, Filonila Pérez, Ana C Zamora, Jose Alejandro Ramirez-Silva, Jonathan Gagnon, Tom A August, Kim Bjerge, Alba Gomez Segura, Marc Belisle, Yves Basset, Kent P McFarland, David B Roy, Toke T Høye, Maxim Larrivee, David Rolnick
[pdf ]
Cross-view image geo-localization with Panorama-BEV Co-Retrieval Network
Junyan Ye, Zhutao Lv, Weijia Li*, Jinhua Yu, Haote Yang, Huaping Zhong, Conghui He*
[pdf ]
F-HOI: Toward Fine-grained Semantic-Aligned 3D Human-Object Interactions
Jie Yang, Xuesong Niu, Nan Jiang, Ruimao Zhang*, Siyuan Huang*
[pdf ]
Test-time Model Adaptation for Image Reconstruction Using Self-supervised Adaptive Layers
Yutian Zhao, Tianjing Zhang, Hui Ji*
[pdf ]
SHIC: Shape-Image Correspondences with no Keypoint Supervision
Aleksandar Shtedritski*, Christian Rupprecht, Andrea Vedaldi
[pdf ]
GenRC: Generative 3D Room Completion from Sparse Image Collections
Ming-Feng Li*, Yueh-Feng Ku, Hong-Xuan Yen, Chi Liu, Yu-Lun Liu, Albert Y Chen, Cheng-Hao Kuo, Min Sun
[pdf ]
A Probability-guided Sampler for Neural Implicit Surface Rendering
Gonçalo José Dias Pais, Valter André Piedade, Moitreya Chatterjee, Marcus Greiff, Pedro Miraldo*
[pdf ]
ReMatching: Low-Resolution Representations for Scalable Shape Correspondence
Filippo Maggioli*, Daniele Baieri, Emanuele Rodola, Simone Melzi
[pdf ]
Where am I? Scene Retrieval with Language
Jiaqi Chen*, Daniel Barath, Iro Armeni, Marc Pollefeys, Hermann Blum
[pdf ]
This Probably Looks Exactly Like That: An Invertible Prototypical Network
Zachariah Carmichael*, Timothy P Redgrave, Daniel Gonzalez Cedre, Walter Scheirer
[pdf ]
Arc2Face: A Foundation Model for ID-Consistent Human Faces
Foivos Paraperas Papantoniou*, Alexandros Lattas, Stylianos Moschoglou, Jiankang Deng, Bernhard Kainz, Stefanos Zafeiriou
[pdf ]
PhysAvatar: Learning the Physics of Dressed 3D Avatars from Visual Observations
Yang Zheng*, Qingqing Zhao, Guandao Yang, Wang Yifan, Donglai Xiang, Florian Dubost, Dmitry Lagun, Thabo Beeler, Federico Tombari, Leonidas Guibas, Gordon Wetzstein
[pdf ]
Revisiting Feature Disentanglement Strategy in Diffusion Training and Breaking Conditional Independence Assumption in Sampling
Wonwoong Cho*, Hareesh Ravi*, Midhun Harikumar, Vinh Khuc, Krishna Kumar Singh, Jingwan Lu, David Iseri Inouye*, Ajinkya Kale*
[pdf ]
SweepNet: Unsupervised Learning Shape Abstraction via Neural Sweepers
Mingrui Zhao*, Yizhi Wang, Fenggen Yu, Changqing Zou, Ali Mahdavi-Amiri
[pdf ]
Leveraging Thermal Modality to Enhance Reconstruction in Low-Light Conditions
Jiacong Xu*, Mingqian Liao, Ram Prabhakar Kathirvel, Vishal Patel
[pdf ]
On the Viability of Monocular Depth Pre-training for Semantic Segmentation
Dong Lao*, Fengyu Yang, Daniel Wang, Hyoungseob Park, Samuel Lu, Alex Wong, Stefano Soatto
[pdf ]
Fairness-aware Vision Transformer via Debiased Self-Attention
Yao Qiang, Chengyin Li, Prashant Khanduri, Dongxiao Zhu*
[pdf ]
EgoPet: Egomotion and Interaction Data from an Animal's Perspective
Amir Bar*, Arya Bakhtiar, Danny L Tran, Antonio Loquercio, Jathushan Rajasegaran, Yann LeCun, Amir Globerson, Trevor Darrell
[pdf ]
Deep Companion Learning: Enhancing Generalization Through Historical Consistency
Ruizhao Zhu*, Venkatesh Saligrama*
[pdf ]
Neural graphics texture compression supporting random access
Farzad Farhadzadeh*, Qiqi Hou, Hoang Le, Amir Said, Randall R Rauwendaal, Alex Bourd, Fatih Porikli
[pdf ]
Contrastive Learning with Synthetic Positives
Dewen Zeng*, Xinrong Hu, Yawen Wu, Xiaowei Xu, Yiyu Shi
[pdf ]
GeneralAD: Anomaly Detection Across Domains by Attending to Distorted Features
Luc P.J. Sträter*, Mohammadreza Salehi, Efstratios Gavves, Cees G.M. Snoek, Yuki M. Asano
[pdf ]
Interpretability-Guided Test-Time Adversarial Defense
Akshay Kulkarni*, Tsui-Wei Weng
[pdf ]
DIM: Dyadic Interaction Modeling for Social Behavior Generation
Minh Tran*, Di Chang, Maksim Siniukov, Mohammad Soleymani
[pdf ]
Tri²-plane: Thinking Head Avatar via Feature Pyramid
Luchuan Song*, Pinxin Liu, Lele Chen, Guojun Yin, Chenliang Xu
[pdf ]
ControlCap: Controllable Region-level Captioning
Yuzhong Zhao, Liu Yue, Zonghao Guo, Weijia Wu, Chen Gong, Qixiang Ye, Fang Wan*
[pdf ]
Free Lunch for Gait Recognition: A Novel Relation Descriptor
Jilong Wang*, Saihui Hou, Yan Huang, Chunshui Cao, Xu Liu, Yongzhen Huang, Tianzhu Zhang, Liang Wang*
[pdf ]
SegVG: Transferring Object Bounding Box to Segmentation for Visual Grounding
Weitai Kang*, Gaowen Liu, Mubarak Shah, Yan Yan
[pdf ]
Adaptive Correspondence Scoring for Unsupervised Medical Image Registration
Xiaoran Zhang*, John C. Stendahl, Lawrence H. Staib, Albert J. Sinusas, Alex Wong, James S. Duncan
[pdf ]
MaxFusion: Plug&Play Multi-Modal Generation in Text-to-Image Diffusion Models
Nithin Gopalakrishnan Nair*, Jeya Maria Jose Valanarasu, Vishal Patel
[pdf ]
Watch Your Steps: Local Image and Scene Editing by Text Instructions
Ashkan Mirzaei*, Tristan T Aumentado-Armstrong, Marcus A Brubaker, Jonathan Kelly, Alex Levinshtein, Konstantinos G Derpanis, Igor Gilitschenski
[pdf ]
Forget More to Learn More: Domain-specific Feature Unlearning for Semi-supervised and Unsupervised Domain Adaptation
Hritam Basak*, Zhaozheng Yin
[pdf ]
3x2: 3D Object Part Segmentation by 2D Semantic Correspondences
Anh Thai*, Weiyao Wang, Hao Tang, Stefan Stojanov, James M Rehg, Matt Feiszli
[pdf ]
Idea2Img: Iterative Self-Refinement with GPT-4V for Automatic Image Design and Generation
Zhengyuan Yang*, Jianfeng Wang, Linjie Li, Kevin Lin, Chung-Ching Lin, Zicheng Liu, Lijuan Wang
[pdf ]
Human-in-the-Loop Visual Re-ID for Population Size Estimation
Gustavo Perez*, Daniel Sheldon, Grant Van Horn, Subhransu Maji
[pdf ]
SEGIC: Unleashing the Emergent Correspondence for In-Context Segmentation
Lingchen Meng, Shiyi Lan, Hengduo Li, Jose M Alvarez, Zuxuan Wu*, Yu-Gang Jiang
[pdf ]
PointNeRF++: A multi-scale, point-based Neural Radiance Field
Weiwei Sun, Eduard Trulls, Yang-Che Tseng, Sneha Sambandam, Gopal Sharma, Andrea Tagliasacchi, Kwang Moo Yi*
[pdf ]
A Semantic Space is Worth 256 Language Descriptions: Make Stronger Segmentation Models with Descriptive Properties
Junfei Xiao, Ziqi Zhou, Wenxuan Li, Shiyi Lan, Jieru Mei, Zhiding Yu, Bingchen Zhao, Alan Yuille, Yuyin Zhou, Cihang Xie*
[pdf ]
UMG-CLIP: A Unified Multi-Granularity Vision Generalist for Open-World Understanding
Bowen Shi, Peisen Zhao, Zichen Wang, Yuhang Zhang, Yaoming Wang, Jin Li, Wenrui Dai, Junni Zou, Hongkai Xiong, Qi Tian, Xiaopeng Zhang*
[pdf ]
Fast View Synthesis of Casual Videos with Soup-of-Planes
Yao-Chih Lee*, Zhoutong Zhang, Kevin Blackburn-Matzen, Simon Niklaus, Jianming Zhang, Jia-Bin Huang, Feng Liu*
[pdf ]
Adaptive Human Trajectory Prediction via Latent Corridors
Neerja Thakkar*, Karttikeya Mangalam, Andrea Bajcsy, Jitendra Malik
[pdf ]
Video Question Answering with Procedural Programs
Rohan Choudhury*, Koichiro Niinuma, Kris Kitani, Laszlo A Jeni
[pdf ]
DGR-MIL: Exploring Diverse Global Representation in Multiple Instance Learning for Whole Slide Image Classification
Wenhui Zhu*, Xiwen Chen, Peijie Qiu, Aristeidis Sotiras, Abolfazl Razi, Yalin Wang
[pdf ]
TexGen: Text-Guided 3D Texture Generation with Multi-view Sampling and Resampling
Dong Huo*, Zixin Guo, Xinxin Zuo, Zhihao Shi, Juwei Lu, Peng Dai, Songcen Xu, Li Cheng, Yee-Hong Yang
[pdf ]
C2C: Component-to-Composition Learning for Zero-Shot Compositional Action Recognition
Rongchang Li, Zhenhua Feng, Tianyang Xu, Linze Li, Xiao-Jun Wu*, Muhammad Awais, Sara Atito, Josef Kittler
[pdf ]
LLMGA: Multimodal Large Language Model based Generation Assistant
Bin Xia*, Shiyin Wang, Yingfan Tao, Yitong Wang, Jiaya Jia
[pdf ]
Put Myself in Your Shoes: Lifting the Egocentric Perspective from Exocentric Videos
Mi Luo*, Zihui Xue, Alex Dimakis, Kristen Grauman
[pdf ]
Shape from Heat Conduction
Sriram Narayanan*, Mani Ramanagopal, Mark Sheinin, Aswin C. Sankaranarayanan, Srinivasa G. Narasimhan
[pdf ]
An Adaptive Screen-Space Meshing Approach for Normal Integration
Moritz Heep*, Eduard Zell
[pdf ]
Parrot: Pareto-optimal Multi-Reward Reinforcement Learning Framework for Text-to-Image Generation
Seung Hyun Lee*, Yinxiao Li, Junjie Ke, Innfarn Yoo, Han Zhang, Jiahui Yu, Qifei Wang, Fei Deng, Glenn Entis, Junfeng He, Gang Li, Sangpil Kim, Irfan Essa, Feng Yang*
[pdf ]
HandDGP: Camera-Space Hand Mesh Prediction with Differentiable Global Positioning
Eugene Valassakis, Guillermo Garcia-Hernando*
[pdf ]
Towards Latent Masked Image Modeling for Self-Supervised Visual Representation Learning
Yibing Wei*, Abhinav Gupta, Pedro Morgado*
[pdf ]
Nuvo: Neural UV Mapping for Unruly 3D Representations
Pratul Srinivasan*, Stephan J Garbin, Dor Verbin, Jonathan T Barron, Ben Mildenhall
[pdf ]
Towards High-Quality 3D Motion Transfer with Realistic Apparel Animation
Rong Wang*, Wei Mao, Changsheng Lu, Hongdong Li
[pdf ]
AnyHome: Open-Vocabulary Large-Scale Indoor Scene Generation with First-Person View Exploration
Rao Fu*, Zehao Wen, Zichen Liu, Srinath Sridhar
[pdf ]
Better Call SAL: Towards Learning to Segment Anything in Lidar
Aljosa Osep*, Tim Meinhardt, Francesco Ferroni, Neehar Peri, Deva Ramanan, Laura Leal-Taixé
[pdf ]
DGInStyle: Domain-Generalizable Semantic Segmentation with Image Diffusion Models and Stylized Semantic Control
Yuru Jia, Lukas Hoyer, Shengyu Huang, Tianfu Wang, Luc Van Gool, Konrad Schindler, Anton Obukhov*
[pdf ]
DECOLLAGE: 3D Detailization by Controllable, Localized, and Learned Geometry Enhancement
Qimin Chen*, Zhiqin Chen, Vladimir G. Kim, Noam Aigerman, Hao Zhang, Siddhartha Chaudhuri
[pdf ]
Scene-aware Human Motion Forecasting via Mutual Distance Prediction
Chaoyue Xing*, Wei Mao, Miaomiao Liu
[pdf ]
FSGS: Real-Time Few-shot View Synthesis using Gaussian Splatting
Zehao Zhu, Zhiwen Fan*, Yifan Jiang, Zhangyang Wang*
[pdf ]
Open Panoramic Segmentation
Junwei Zheng, Ruiping Liu, Yufan Chen, Kunyu Peng, Chengzhi Wu, Kailun Yang, Jiaming Zhang*, Rainer Stiefelhagen
[pdf ]
iMatching: Imperative Correspondence Learning
Zitong Zhan*, Dasong Gao, Yun-Jou Lin, Youjie Xia, Chen Wang*
[pdf ]
COSMU: Complete 3D human shape from monocular unconstrained images
Marco Pesavento*, Marco Volino, Adrian Hilton
[pdf ]
MAP-ADAPT: Real-Time Quality-Adaptive Semantic 3D Maps
Jianhao Zheng*, Daniel Barath, Marc Pollefeys, Iro Armeni*
[pdf ]
Appearance-based Refinement for Object-Centric Motion Segmentation
Junyu Xie*, Weidi Xie, Andrew Zisserman
[pdf ]
SemiVL: Semi-Supervised Semantic Segmentation with Vision-Language Guidance
Lukas Hoyer*, David Joseph Tan, Muhammad Ferjad Naeem, Luc Van Gool, Federico Tombari
[pdf ]
Open Vocabulary Multi-Label Video Classification
Rohit Gupta*, Mamshad Nayeem Rizve, Jayakrishnan Unnikrishnan, Ashish Tawari, Son Tran, Mubarak Shah, Benjamin Yao, Trishul A Chilimbi
[pdf ]
Optimal Transport of Diverse Unsupervised Tasks for Robust Learning from Noisy Few-Shot Data
Xiaofan Que, Qi Yu*
[pdf ]
Regularizing Dynamic Radiance Fields with Kinematic Fields
Woobin Im, Geonho Cha, Sebin Lee, Jumin Lee, Juhyeong Seon, Dongyoon Wee, Sungeui Yoon*
[pdf ]
MICDrop: Masking Image and Depth Features via Complementary Dropout for Domain-Adaptive Semantic Segmentation
Linyan Yang*, Lukas Hoyer*, Mark Weber, Tobias Fischer, Dengxin Dai, Laura Leal-Taixé, Daniel Cremers, Marc Pollefeys, Luc Van Gool
[pdf ]
Efficient Pre-training for Localized Instruction Generation of Procedural Videos
Anil Batra*, Davide Moltisanti, Laura Sevilla-Lara, Marcus Rohrbach, Frank Keller
[pdf ]
MTKD: Multi-Teacher Knowledge Distillation for Image Super-Resolution
Yuxuan Jiang*, Chen Feng, Fan Zhang, David Bull
[pdf ]
DEAL: Disentangle and Localize Concept-level Explanations for VLMs
Tang Li*, Mengmeng Ma, Xi Peng
[pdf ]
Fast Encoding and Decoding for Implicit Video Representation
Hao Chen*, Saining Xie, Ser-Nam Lim, Abhinav Shrivastava
[pdf ]
Surf-D: Generating High-Quality Surfaces of Arbitrary Topologies Using Diffusion Models
Zhengming Yu*, Zhiyang Dou, Xiaoxiao Long, Cheng Lin, Zekun Li, Yuan Liu, Norman Müller, Taku Komura, Marc Habermann, Christian Theobalt, Xin Li, Wenping Wang*
[pdf ]
Diffusion-Refined VQA Annotations for Semi-Supervised Gaze Following
Qiaomu Miao*, Alexandros Graikos, Jingwei Zhang, Sounak Mondal, Minh Hoai, Dimitris Samaras
[pdf ]
IMMA: Immunizing text-to-image Models against Malicious Adaptation
Amber Yijia Zheng*, Raymond A. Yeh
[pdf ]
Motion-Oriented Compositional Neural Radiance Fields for Monocular Dynamic Human Modeling
Jaehyeok Kim, Dongyoon Wee, Dan Xu*
[pdf ]
GeoCalib: Learning Single-image Calibration with Geometric Optimization
Alexander Veicht*, Paul-Edouard Sarlin*, Philipp Lindenberger, Marc Pollefeys
[pdf ]
3D Open-Vocabulary Panoptic Segmentation with 2D-3D Vision-Language Distillation
Zihao Xiao*, Longlong Jing, Shangxuan Wu, Alex Zihao Zhu, Jingwei Ji, Chiyu Max Jiang, Wei-Chih Hung, Thomas Funkhouser, Weicheng Kuo, Anelia Angelova, Yin Zhou, Shiwei Sheng
[pdf ]
Semicalibrated Relative Pose from an Affine Correspondence and Monodepth
Petr Hruby*, Marc Pollefeys, Daniel Barath
[pdf ]
Global Structure-from-Motion Revisited
Linfei Pan*, Daniel Barath, Marc Pollefeys, Johannes L Schönberger
[pdf ]
MobileNetV4: Universal Models for the Mobile Ecosystem
Danfeng Qin*, Chas H Leichner, Manolis Delakis, Marco Fornoni, Shixin Luo, Fan Yang, Weijun Wang, Colby Banbury, Chengxi Ye, Berkin Akin, Vaibhav Aggarwal, Tenghui Zhu, Daniele Moro, Andrew Howard
[pdf ]
Gravity-aligned Rotation Averaging with Circular Regression
Linfei Pan*, Marc Pollefeys, Daniel Barath
[pdf ]
MoMA: Multimodal LLM Adapter for Fast Personalized Image Generation
Kunpeng Song*, Yizhe Zhu*, Bingchen Liu*, Qing Yan*, Ahmed Elgammal*, Xiao Yang*
[pdf ]
Find n' Propagate: Open-Vocabulary 3D Object Detection in Urban Environments
Djamahl Etchegaray*, Zi Helen Huang, Tatsuya Harada, Yadan Luo
[pdf ]
Quanta Video Restoration
Prateek Chennuri*, Yiheng Chi, Enze Jiang, GM Dilshan Godaliyadda*, Abhiram Gnanasambandam*, Hamid R Sheikh, Istvan Gyongy, Stanley H Chan*
[pdf ]
Concept Sliders: LoRA Adaptors for Precise Control in Diffusion Models
Rohit Gandikota*, Joanna Materzynska, Tingrui Zhou, Antonio Torralba, David Bau
[pdf ]
CAT-SAM: Conditional Tuning for Few-Shot Adaptation of Segment Anything Model
Aoran Xiao, Weihao Xuan, Heli Qi, Yun Xing, Ruijie Ren, Xiaoqin Zhang, Ling Shao, Shijian Lu*
[pdf ]
ScribblePrompt: Fast and Flexible Interactive Segmentation for Any Biomedical Image
Hallee E. Wong*, Marianne Rakic, John Guttag, Adrian V. Dalca
[pdf ]
POCA: Post-training Quantization with Temporal Alignment for Codec Avatars
Jian Meng*, Yuecheng Li*, Leo (Chenghui) Li, Syed Shakib Sarwar, Dilin Wang, Jae-sun Seo*
[pdf ]
HYPE: Hyperbolic Entailment Filtering for Underspecified Images and Texts
Wonjae Kim*, Sanghyuk Chun, Taekyung Kim, Dongyoon Han, Sangdoo Yun
[pdf ]
Finding Meaning in Points: Weakly Supervised Semantic Segmentation for Event Cameras
Hoonhee Cho, Sung-Hoon Yoon, Hyeokjun Kweon, Kuk-Jin Yoon*
[pdf ]
Unsupervised Dense Prediction using Differentiable Normalized Cuts
Yanbin Liu*, Stephen Gould
[pdf ]
Boosting the Power of Small Multimodal Reasoning Models to Match Larger Models with Self-Consistency Training
Cheng Tan*, Jingxuan Wei*, Zhangyang Gao, Linzhuang Sun, Siyuan Li, Ruifeng Guo, BiHui Yu, Stan Z. Li*
[pdf ]
Scaling Up Personalized Image Aesthetic Assessment via Task Vector Customization
Jooyeol Yun*, Jaegul Choo
[pdf ]
AutoDIR: Automatic All-in-One Image Restoration with Latent Diffusion
Yitong Jiang*, Zhaoyang Zhang, Tianfan Xue, Jinwei Gu*
[pdf ]
Receler: Reliable Concept Erasing of Text-to-Image Diffusion Models via Lightweight Erasers
Chi-Pin Huang*, Kai-Po Chang, Chung-Ting Tsai, Yung-Hsuan Lai, Fu-En Yang, Yu-Chiang Frank Wang
[pdf ]
EINet: Point Cloud Completion via Extrapolation and Interpolation
Pingping Cai*, Canyu Zhang, Lingjia Shi, Lili Wang, Nasrin Imanpour, Song Wang
[pdf ]
Personalized Video Relighting With an At-Home Light Stage
Jun Myeong Choi*, Max Christman, Roni Sengupta
[pdf ]
Temporal Residual Guided Diffusion Framework for Event-Driven Video Reconstruction
Lin Zhu*, Yunlong Zheng, Yijun Zhang, Xiao Wang, Lizhi Wang, Hua Huang
[pdf ]
A Secure Image Watermarking Framework with Statistical Guarantees via Adversarial Attacks on Secret Key Networks
Feiyu Chen*, Wei Lin, Ziquan Liu, Antoni Chan
[pdf ]
SPIRE: Semantic Prompt-Driven Image Restoration
Chenyang Qi*, Zhengzhong Tu, Keren Ye, Mauricio Delbracio, Peyman Milanfar, Qifeng Chen, Hossein Talebi
[pdf ]
Free-ATM: Harnessing Free Attention Masks for Representation Learning on Diffusion-Generated Images
David Junhao Zhang*, Mutian Xu, Jay Zhangjie Wu, Chuhui Xue, Wenqing Zhang, Xiaoguang Han, Song Bai, Mike Zheng Shou*
[pdf ]
HiT-SR: Hierarchical Transformer for Efficient Image Super-Resolution
Xiang Zhang*, Yulun Zhang, Fisher Yu
[pdf ]
Audio-Synchronized Visual Animation
Lin Zhang, Shentong Mo, Yijing Zhang, Pedro Morgado*
[pdf ]
Expressive Whole-Body 3D Gaussian Avatar
Gyeongsik Moon*, Takaaki Shiratori, Shunsuke Saito
[pdf ]
Canonical Shape Projection is All You Need for 3D Few-shot Class Incremental Learning
Ali Cheraghian*, Zeeshan Hayder, Sameera Ramasinghe, Shafin Rahman, Javad Jafaryahya, Lars Petersson, Mehrtash Harandi
[pdf ]
Controllable Human-Object Interaction Synthesis
Jiaman Li*, Alexander Clegg, Roozbeh Mottaghi, Jiajun Wu, Xavier Puig, C. Karen Liu
[pdf ]
High-Fidelity and Transferable NeRF Editing by Frequency Decomposition
Yisheng He*, Weihao Yuan*, Siyu Zhu, Zilong Dong, Liefeng Bo, Qixing Huang
[pdf ]
DoughNet: A Visual Predictive Model for Topological Manipulation of Deformable Objects
Dominik Bauer*, Zhenjia Xu, Shuran Song
[pdf ]
PAV: Personalized Head Avatar from Unstructured Video Collection
Akin Caliskan*, Berkay Kicanaoglu, Hyeongwoo Kim
[pdf ]
Strike a Balance in Continual Panoptic Segmentation
Jinpeng Chen, Runmin Cong*, Yuxuan Luo, Horace Ho Shing Ip, Sam Kwong*
[pdf ]
In Defense of Lazy Visual Grounding for Open-Vocabulary Semantic Segmentation
Dahyun Kang, Minsu Cho*
[pdf ]
MultiDelete for Multimodal Machine Unlearning
Jiali Cheng*, Hadi Amiri
[pdf ]
Unified Local-Cloud Decision-Making via Reinforcement Learning
Kathakoli Sengupta, Zhongkai Shangguan, Sandesh Bharadwaj, Sanjay Arora, Eshed Ohn-Bar*, Renato Mancuso
[pdf ]
UniTalker: Scaling up Audio-Driven 3D Facial Animation through A Unified Model
Xiangyu Fan*, Jiaqi Li, Zhiqian Lin, Weiye Xiao, Lei Yang*
[pdf ]
Robo-ABC: Affordance Generalization Beyond Categories via Semantic Correspondence for Robot Manipulation
Yuanchen Ju, Kaizhe Hu, Guowei Zhang, Gu Zhang, Mingrun Jiang, Huazhe Xu*
[pdf ]
Efficient Frequency-Domain Image Deraining with Contrastive Regularization
Ning Gao, Xingyu Jiang, Xiuhui Zhang, Yue Deng*
[pdf ]
Stitched ViTs are Flexible Vision Backbones
Zizheng Pan*, Jing Liu, Haoyu He, Jianfei Cai, Bohan Zhuang*
[pdf ]
TrajPrompt: Aligning Color Trajectory with Vision-Language Representations
Li-Wu Tsao*, Hao-Tang Tsui, Yu-Rou Tuan, Pei-Chi Chen, Kuan-Lin Wang, Jhih-Ciang Wu, Hong-Han Shuai*, Wen-Huang Cheng
[pdf ]
SemReg: Semantics Constrained Point Cloud Registration
Sheldon Fung, Xuequan Lu*, Dasith de Silva Edirimuni, Wei Pan, Xiao Liu, Hongdong Li
[pdf ]
Cascade-Zero123: One Image to Highly Consistent 3D with Self-Prompted Nearby Views
Yabo Chen, Jiemin Fang, Yuyang Huang, Taoran Yi, Xiaopeng Zhang*, Lingxi Xie, Xinggang Wang, Wenrui Dai*, Hongkai Xiong, Qi Tian
[pdf ]
RoScenes: A Large-scale Multi-view 3D Dataset for Roadside Perception
Xiaosu Zhu, Hualian Sheng, Sijia Cai, Bing Deng, Shaopeng Yang, Qiao Liang, Ken Chen, Lianli Gao, Jingkuan Song*, Jieping Ye*
[pdf ]
ReSyncer: Rewiring Style-based Generator for Unified Audio-Visually Synced Facial Performer
Jiazhi Guan*, Zhiliang Xu, Hang Zhou, Kaisiyuan Wang, Shengyi He, Zhanwang Zhang, Borong Liang, Haocheng Feng, Errui Ding, Jingtuo Liu, Jingdong Wang, Youjian Zhao, Ziwei Liu
[pdf ]
Language-Driven Physics-Based Scene Synthesis and Editing via Feature Splatting
Ri-Zhao Qiu*, Ge Yang, Weijia Zeng, Xiaolong Wang
[pdf ]
AlignDiff: Aligning Diffusion Models for General Few-Shot Segmentation
Ri-Zhao Qiu*, Yu-Xiong Wang, Kris Hauser
[pdf ]
SkateFormer: Skeletal-Temporal Transformer for Human Action Recognition
Jeonghyeok Do, Munchurl Kim*
[pdf ]
R^2-Tuning: Efficient Image-to-Video Transfer Learning for Video Temporal Grounding
Ye Liu, Jixuan He, Wanhua Li*, Junsik Kim, Donglai Wei, Hanspeter Pfister, Chang Wen Chen*
[pdf ]
Tree-D Fusion: Simulation-Ready Tree Dataset from Single Images with Diffusion Priors
Jae Joong Lee, Bosheng Li, Sara M Beery, Jonathan Huang, Songlin Fei, Raymond A. Yeh, Bedrich Benes*
[pdf ]
Parameterization-driven Neural Surface Reconstruction for Object-oriented Editing in Neural Rendering
Baixin Xu, Jiangbei Hu, Fei Hou, Kwan-Yee Lin, Wayne Wu, Chen Qian, Ying He*
[pdf ]
DomainFusion: Generalizing To Unseen Domains with Latent Diffusion Models
Yuyang Huang, Yabo Chen, Yuchen Liu, Xiaopeng Zhang*, Wenrui Dai*, Hongkai Xiong, Qi Tian
[pdf ]
Open-Set Recognition in the Age of Vision-Language Models
Dimity Miller*, Niko Suenderhauf, Alex Kenna, Keita Mason
[pdf ]
Unsqueeze [CLS] Bottleneck to Learn Rich Representations
Qing Su*, Shihao Ji
[pdf ]
Robust Multimodal Learning via Representation Decoupling
Shicai Wei, Yang Luo, Yuji Wang, Chunbo Luo*
[pdf ]
Object-Conditioned Energy-Based Attention Map Alignment in Text-to-Image Diffusion Models
Yasi Zhang*, Peiyu Yu, Ying Nian Wu
[pdf ]
WiMANS: A Benchmark Dataset for WiFi-based Multi-user Activity Sensing
Shuokang Huang*, Kaihan Li, Di You, Yichong Chen, Arvin Lin, Siying Liu, Xiaohui Li, Julie A. McCann*
[pdf ]
Embedding-Free Transformer with Inference Spatial Reduction for Efficient Semantic Segmentation
Hyunwoo Yu, Yubin Cho, Beoungwoo Kang, Seunghun Moon, Kyeongbo Kong, Suk-Ju Kang*
[pdf ]
VeCLIP: Improving CLIP Training via Visual-enriched Captions
Zhengfeng Lai*, Haotian Zhang, Bowen Zhang, Wentao Wu, Haoping Bai, Aleksei Timofeev, Xianzhi Du, Zhe Gan, Jiulong Shan, Chen-Nee Chuah, Yinfei Yang, Meng Cao
[pdf ]
Three Things We Need to Know About Transferring Stable Diffusion to Visual Dense Prediction Tasks
Manyuan Zhang*, Guanglu Song, Xiaoyu Shi, Yu Liu, Hongsheng Li
[pdf ]
Learning Representations from Foundation Models for Domain Generalized Stereo Matching
Yongjian Zhang, Longguang Wang, Kunhong Li, WANG Yun, Yulan Guo*
[pdf ]
Spike-Temporal Latent Representation for Energy-Efficient Event-to-Video Reconstruction
Jianxiong Tang*, Jian-Huang Lai*, Lingxiao Yang, Xiaohua Xie
[pdf ]
Effective Lymph Nodes Detection in CT Scans Using Location Debiased Query Selection and Contrastive Query Representation in Transformer
Qinji Yu*, Yirui Wang*, Ke Yan, Haoshen Li, Dazhou Guo, Li Zhang, Na Shen, Qifeng Wang, Xiaowei Ding, Le Lu, Xianghua Ye*, Dakai Jin*
[pdf ]
Chat-Edit-3D: Interactive 3D Scene Editing via Text Prompts
Shuangkang Fang*, Yufeng Wang*, Yi-Hsuan Tsai, Yi Yang, Wenrui Ding, Shuchang Zhou, Ming-Hsuan Yang
[pdf ]
Event-Adapted Video Super-Resolution
Zeyu Xiao, Dachun Kai, Yueyi Zhang, Zheng-Jun Zha, Xiaoyan Sun, Zhiwei Xiong*
[pdf ]
Look Hear: Gaze Prediction for Speech-directed Human Attention
Sounak Mondal*, Seoyoung Ahn, Zhibo Yang, Niranjan Balasubramanian, Dimitris Samaras, Gregory Zelinsky, Minh Hoai
[pdf ]
Raising the Ceiling: Conflict-Free Local Feature Matching with Dynamic View Switching
Xiaoyong Lu*, Songlin Du*
[pdf ]
Q&A Prompts: Discovering Rich Visual Clues through Mining Question-Answer Prompts for VQA requiring Diverse World Knowledge
Haibo Wang*, Weifeng Ge*
[pdf ]
Catastrophic Overfitting: A Potential Blessing in Disguise
MN Zhao, Lihe Zhang*, Yuqiu Kong, Baocai Yin
[pdf ]
Long-range Turbulence Mitigation: A Large-scale Dataset and A Coarse-to-fine Framework
Shengqi Xu, Run Sun, Yi Chang*, Shuning Cao, Xueyao Xiao, Luxin Yan
[pdf ]
SparseCtrl: Adding Sparse Controls to Text-to-Video Diffusion Models
Yuwei Guo, Ceyuan Yang*, Anyi Rao, Maneesh Agrawala, Dahua Lin*, Bo Dai*
[pdf ]
Visual Alignment Pre-training for Sign Language Translation
Peiqi Jiao, Yuecong Min, Xilin Chen*
[pdf ]
Parrot Captions Teach CLIP to Spot Text
Yiqi Lin, Conghui He*, Alex Jinpeng Wang, Bin Wang, Weijia Li, Mike Zheng Shou
[pdf ]
Solving Motion Planning Tasks with a Scalable Generative Model
Yihan Hu*, Siqi Chai, Zhening Yang, Jingyu Qian, Kun Li, Wenxin Shao, Haichao Zhang, Wei Xu, Qiang Liu*
[pdf ]
Griffon: Spelling out All Object Locations at Any Granularity with Large Language Models
Yufei Zhan, Yousong Zhu*, Zhiyang Chen, Fan Yang, Ming Tang, Jinqiao Wang
[pdf ]
Vision-Language Action Knowledge Learning for Semantic-Aware Action Quality Assessment
Huangbiao Xu, Xiao Ke*, Yuezhou Li, Rui Xu, Huanqi Wu, Xiaofeng Lin, Wenzhong Guo
[pdf ]
Knowledge Transfer with Simulated Inter-Image Erasing for Weakly Supervised Semantic Segmentation
Tao Chen*, Xiruo Jiang, Gensheng Pei, Zeren Sun, Yucheng Wang, Yazhou Yao
[pdf ]
BurstM: Deep Burst Multi-scale SR using Fourier Space with Optical Flow
EungGu Kang*, Byeonghun Lee, Sunghoon Im, Kyong Hwan Jin
[pdf ]
Diffusion Reward: Learning Rewards via Conditional Video Diffusion
Tao Huang*, Guangqi Jiang, Yanjie Ze, Huazhe Xu*
[pdf ]
Recursive Visual Programming
Jiaxin Ge*, Sanjay Subramanian, Baifeng Shi, Roei Herzig, Trevor Darrell
[pdf ]
LLaVA-Grounding: Grounded Visual Chat with Large Multimodal Models
Hao Zhang*, Hongyang Li, Feng Li, Tianhe Ren, Xueyan Zou, Shilong Liu, Shijia Huang, Jianfeng Gao, Lei Zhang, Chunyuan Li, Jianwei Yang
[pdf ]
Prompt-Driven Contrastive Learning for Transferable Adversarial Attacks
Hunmin Yang, Jongoh Jeong, Kuk-Jin Yoon*
[pdf ]
Learning to Adapt SAM for Segmenting Cross-domain Point Clouds
Xidong Peng, Runnan Chen, Feng Qiao, Lingdong Kong, Youquan Liu, Yujing Sun, Tai Wang, Xinge Zhu*, Yuexin Ma*
[pdf ]
Learning to Enhance Aperture Phasor Field for Non-Line-of-Sight Imaging
In Cho, Hyunbo Shim, Seon Joo Kim*
[pdf ]
ViewFormer: Exploring Spatiotemporal Modeling for Multi-View 3D Occupancy Perception via View-Guided Transformers
Jinke Li*, Xiao He*, Chonghua Zhou, Xiaoqiang Cheng, Yang Wen, Dan Zhang*
[pdf ]
Fine-grained Dynamic Network for Generic Event Boundary Detection
Ziwei Zheng, Lijun He, Le Yang, Fan Li*
[pdf ]
Take A Step Back: Rethinking the Two Stages in Visual Reasoning
Mingyu Zhang, Jiting Cai, Mingyu Liu, Yue Xu, Cewu Lu, Yong-Lu Li*
[pdf ]
AlignZeg: Mitigating Objective Misalignment for Zero-shot Semantic Segmentation
Jiannan Ge*, Lingxi Xie, Hongtao Xie, Pandeng Li, Xiaopeng Zhang, Yongdong Zhang, Qi Tian
[pdf ]
Learning with Counterfactual Explanations for Radiology Report Generation
Mingjie Li*, Haokun Lin, Liang Qiu, Xiaodan Liang*, Ling Chen, Abdulmotaleb Elsaddik, Xiaojun Chang
[pdf ]
SpeedUpNet: A Plug-and-Play Adapter Network for Accelerating Text-to-Image Diffusion Models
Weilong Chai*, Dandan Zheng, Jiajiong Cao, Zhiquan Chen, Changbao Wang, Chenguang Ma
[pdf ]
Better Regression Makes Better Test-time Adaptive 3D Object Detection
Jiakang Yuan, Bo Zhang, Kaixiong Gong, Xiangyu Yue, Botian Shi, Yu Qiao, Tao Chen*
[pdf ]
ShapeLLM: Universal 3D Object Understanding for Embodied Interaction
Zekun Qi, Runpei Dong, Shaochen Zhang, Haoran Geng, Chunrui Han, Zheng Ge, Li Yi*, Kaisheng Ma*
[pdf ]
Content-Aware Radiance Fields: Aligning Model Complexity with Scene Intricacy Through Learned Bitwidth Quantization
Weihang Liu, Xue Xian Zheng, Jingyi Yu, Xin Lou*
[pdf ]
Finding Visual Task Vectors
Alberto Hojel*, Yutong Bai, Trevor Darrell, Amir Globerson, Amir Bar*
[pdf ]
Connecting Consistency Distillation to Score Distillation for Text-to-3D Generation
Zongrui Li*, Minghui Hu, Qian Zheng*, Xudong Jiang
[pdf ]
Event Camera Data Dense Pre-training
Yan Yang, Liyuan Pan*, Liu Liu
[pdf ]
Distractors-Immune Representation Learning with Cross-modal Contrastive Regularization for Change Captioning
Yunbin Tu*, Liang Li, Li Su, Chenggang Yan, Qingming Huang
[pdf ]
Rethinking Image-to-Video Adaptation: An Object-centric Perspective
Rui Qian*, Shuangrui Ding, Dahua Lin
[pdf ]
Layer-Wise Relevance Propagation with Conservation Property for ResNet
Seitaro Otsuki*, Tsumugi Iida*, Félix Doublet*, Tsubasa Hirakawa*, Takayoshi Yamashita*, Hironobu Fujiyoshi*, Komei Sugiura*
[pdf ]
DECap: Towards Generalized Explicit Caption Editing via Diffusion Mechanism
Zhen Wang, Xinyun Jiang, Jun Xiao, Tao Chen, Long Chen*
[pdf ]
EgoLifter: Open-world 3D Segmentation for Egocentric Perception
Qiao Gu*, Zhaoyang Lv*, Duncan Frost, Simon Green, Julian Straub, Chris Sweeney*
[pdf ]
MEVG: Multi-event Video Generation with Text-to-Video Models
Gyeongrok Oh*, Jaehwan Jeong, Sieun Kim, Wonmin Byeon, Jinkyu Kim, Sungwoong Kim, Sangpil Kim*
[pdf ]
Open-Vocabulary SAM: Segment and Recognize Twenty-thousand Classes Interactively
Haobo Yuan, Xiangtai Li*, Chong Zhou, Yining Li, Kai Chen, Chen Change Loy
[pdf ]
Data-to-Model Distillation: Data-Efficient Learning Framework
Ahmad Sajedi*, Samir Khaki, Lucy Z. Liu, Ehsan Amjadian, Yuri A. Lawryshyn, Konstantinos N. Plataniotis
[pdf ]
DiffuX2CT: Diffusion Learning to Reconstruct CT Images from Biplanar X-Rays
Xuhui Liu, Zhi Qiao, Runkun Liu, Hong Li, Xiantong Zhen*, Zhen Qian, Juan Zhang*, Baochang Zhang
[pdf ]
AdaIFL: Adaptive Image Forgery Localization via a Dynamic and Importance-aware Transformer Network
Yuxi Li*, Fuyuan Cheng, Wangbo Yu, Guangshuo Wang, Guibo Luo*, Yuesheng Zhu*
[pdf ]
ComFusion: Enhancing Personalized Generation by Instance-Scene Compositing and Fusion
Yan Hong*, Yuxuan Duan, Bo Zhang, Haoxing Chen, Jun Lan, Huijia Zhu, Weiqiang Wang, Jianfu Zhang*
[pdf ]
ML-SemReg: Boosting Point Cloud Registration with Multi-level Semantic Consistency
Shaocheng Yan, Pengcheng Shi, Jiayuan Li*
[pdf ]
Mask as Supervision: Leveraging Unified Mask Information for Unsupervised 3D Pose Estimation
Yuchen Yang, Yu Qiao, Xiao Sun*
[pdf ]
MoVideo: Motion-Aware Video Generation with Diffusion Models
Jingyun Liang*, Yuchen Fan, Kai Zhang*, Radu Timofte, Luc Van Gool, Rakesh Ranjan
[pdf ]
SHERL: Synthesizing High Accuracy and Efficient Memory for Resource-Limited Transfer Learning
Haiwen Diao*, Bo Wan, Xu Jia, Yunzhi Zhuge, Ying Zhang, Huchuan Lu*, Long Chen
[pdf ]
MonoTTA: Fully Test-Time Adaptation for Monocular 3D Object Detection
Hongbin Lin, Yifan Zhang, Shuaicheng Niu, Shuguang Cui, Zhen Li*
[pdf ]
RangeLDM: Fast Realistic LiDAR Point Cloud Generation
Qianjiang Hu, Zhimin Zhang, Wei Hu*
[pdf ]
Learn to Optimize Denoising Scores: A Unified and Improved Diffusion Prior for 3D Generation
Xiaofeng Yang*, Yiwen Chen, Cheng Chen, Chi Zhang, Yi Xu, Xulei Yang, Fayao Liu, Guosheng Lin
[pdf ]
Be-Your-Outpainter: Mastering Video Outpainting through Input-Specific Adaptation
Fu-Yun Wang*, Xiaoshi Wu, Zhaoyang Huang, Xiaoyu Shi, Dazhong Shen, Guanglu Song, Yu Liu, Hongsheng Li*
[pdf ]
Physically Plausible Color Correction for Neural Radiance Fields
Qi Zhang*, Ying Feng, Hongdong Li*
[pdf ]
Unifying 3D Vision-Language Understanding via Promptable Queries
Ziyu Zhu*, Zhuofan Zhang, Xiaojian Ma, Xuesong Niu, Yixin Chen, Baoxiong Jia, Zhidong Deng*, Siyuan Huang*, Qing Li*
[pdf ]
Model Stock: All we need is just a few fine-tuned models
Dong-Hwan Jang, Sangdoo Yun, Dongyoon Han*
[pdf ]
Motion-Guided Latent Diffusion for Temporally Consistent Real-world Video Super-resolution
Xi Yang*, Chenhang He, Jianqi Ma, Lei Zhang
[pdf ]
PoseCrafter: One-Shot Personalized Video Synthesis Following Flexible Pose Control
Yong Zhong, Min Zhao, Zebin You, Xiaofeng Yu, Changwang Zhang, Chongxuan Li*
[pdf ]
MAD-DR: Map Compression for Visual Localization with Matchness Aware Descriptor Dimension Reduction
Qiang Wang*
[pdf ]
Benchmarking Object Detectors with COCO: A New Path Forward
Shweta Singh, Aayan Yadav, Jitesh Jain, Humphrey Shi, Justin Johnson, Karan Desai*
[pdf ]
Adaptive High-Frequency Transformer for Diverse Wildlife Re-Identification
Chenyue Li, Shuoyi Chen, Mang Ye*
[pdf ]
WPS-SAM: Towards Weakly-Supervised Part Segmentation with Foundation Models
Xin-Jian Wu*, Ruisong Zhang, Jie Qin, Shijie Ma, Cheng-Lin Liu*
[pdf ]
Lane Graph as Path: Continuity-preserving Path-wise Modeling for Online Lane Graph Construction
Bencheng Liao, Shaoyu Chen, Bo Jiang, Tianheng Cheng, Qian Zhang, Wenyu Liu, Chang Huang, Xinggang Wang*
[pdf ]
DeCo: Decoupled Human-Centered Diffusion Video Editing with Motion Consistency
Xiaojing Zhong, Xinyi Huang, Xiaofeng Yang, Guosheng Lin*, Qingyao Wu*
[pdf ]
Unleashing the Potential of the Semantic Latent Space in Diffusion Models for Image Dehazing
Zizheng Yang, Hu Yu, Bing Li, Jinghao Zhang, Jie Huang, Feng Zhao*
[pdf ]
Uncertainty-aware sign language video retrieval with probability distribution modeling
Xuan Wu*, Hongxiang Li, Yuanjiang Luo, Xuxin Cheng, Xianwei Zhuang, Meng Cao, Keren Fu*
[pdf ]
NeRMo: Learning Implicit Neural Representations for 3D Human Motion Prediction
Dong Wei, Huaijiang Sun, Xiaoning Sun*, Shengxiang Hu
[pdf ]
Bridging Synthetic and Real Worlds for Pre-training Scene Text Detectors
Tongkun Guan, Wei Shen*, Xue Yang, Xuehui Wang, Xiaokang Yang
[pdf ]
VLAD-BuFF: Burst-aware Fast Feature Aggregation for Visual Place Recognition
Ahmad Khaliq, Ming Xu, Stephen Hausler, Michael J Milford, Sourav Garg*
[pdf ]
DSA: Discriminative Scatter Analysis for Early Smoke Segmentation
Lujian Yao*, Haitao Zhao*, Jingchao Peng, Zhongze Wang, Kaijie Zhao
[pdf ]
SAFARI: Adaptive Sequence Transformer for Weakly Supervised Referring Expression Segmentation
Sayan Nag*, Koustava Goswami, Srikrishna Karanam
[pdf ]
KFD-NeRF: Rethinking Dynamic NeRF with Kalman Filter
Yifan Zhan, Zhuoxiao Li, Muyao Niu, Zhihang Zhong, Shohei Nobuhara, Ko Nishino, Yinqiang Zheng*
[pdf ]
Physical-Based Event Camera Simulator
Haiqian Han, Jiacheng Lyu, Jianing Li*, Henglu Wei, Cheng Li, Yajing Wei, Shu Chen, Xiangyang Ji*
[pdf ]
V-IRL: Grounding Virtual Intelligence in Real Life
Jihan Yang*, Runyu Ding, Ellis L Brown, Xiaojuan Qi, Saining Xie
[pdf ]
Adversarial Prompt Tuning for Vision-Language Models
Jiaming Zhang, Xingjun Ma*, Xin Wang, Lingyu Qiu, Jiaqi Wang, Yu-Gang Jiang, Jitao Sang*
[pdf ]
Relightable 3D Gaussians: Realistic Point Cloud Relighting with BRDF Decomposition and Ray Tracing
Jian Gao, Chun Gu, Youtian Lin, Zhihao Li, Hao Zhu, Xun Cao, Li Zhang*, Yao Yao*
[pdf ]
Mono-ViFI: A Unified Learning Framework for Self-supervised Single- and Multi-frame Monocular Depth Estimation
Jinfeng Liu*, Lingtong Kong, Bo Li, Zerong Wang, Hong Gu, Jinwei Chen
[pdf ]
CC-SAM: Enhancing SAM with Cross-feature Attention and Context for Ultrasound Image Segmentation
Shreyank N Gowda*, David A Clifton
[pdf ]
An Efficient and Effective Transformer Decoder-Based Framework for Multi-Task Visual Grounding
Wei Chen, Long Chen, Yu Wu*
[pdf ]
Think2Drive: Efficient Reinforcement Learning by Thinking with Latent World Model for Autonomous Driving (in CARLA-v2)
Qifeng Li*, Xiaosong Jia, Shaobo Wang, Junchi Yan
[pdf ]
PanGu-Draw: Advancing Resource-Efficient Text-to-Image Synthesis with Time-Decoupled Training and Reusable Coop-Diffusion
Guansong Lu*, Yuanfan Guo, Jianhua Han, Minzhe Niu, Yihan Zeng, Songcen Xu, Zeyi Huang, Zhao Zhong, Wei Zhang, Hang Xu
[pdf ]
X-InstructBLIP: A Framework for Aligning Image, 3D, Audio, Video to LLMs and its Emergent Cross-modal Reasoning
Artemis Panagopoulou*, Le Xue, Ning Yu, Li Junnan, Dongxu Li, Shafiq Joty, Ran Xu, Silvio Savarese, Caiming Xiong, Juan Carlos Niebles
[pdf ]
Learning Neural Volumetric Pose Features for Camera Localization
Jingyu Lin, Jiaqi Gu, Bojian Wu, Lubin Fan*, Renjie Chen*, Ligang Liu, Jieping Ye
[pdf ]
Betrayed by Attention: A Simple yet Effective Approach for Self-supervised Video Object Segmentation
Shuangrui Ding*, Rui Qian, Haohang Xu, Dahua Lin, Hongkai Xiong
[pdf ]
REFRAME: Reflective Surface Real-Time Rendering for Mobile Devices
Chaojie Ji*, Yufeng Li, Yiyi Liao
[pdf ]
Self-Training Room Layout via Geometry-aware Ray-casting
Bolivar Solarte*, Chin-Hsuan Wu*, Jin-Cheng Jhang*, Jonathan Lee*, Yi-Hsuan Tsai*, Min Sun*
[pdf ]
Closed-Loop Unsupervised Representation Disentanglement with $\beta$-VAE Distillation and Diffusion Probabilistic Feedback
Xin Jin*, Bohan Li*, Baao Xie, Wenyao Zhang, Jinming Liu, Ziqiang Li, Tao Yang, Wenjun Zeng
[pdf ]
Rethinking Weakly-supervised Video Temporal Grounding From a Game Perspective
Xiang Fang, Zeyu Xiong, Wanlong Fang, Xiaoye Qu, Chen Chen, Jianfeng Dong, Keke Tang, Pan Zhou*, Yu Cheng, Daizong Liu*
[pdf ]
Every Pixel Has its Moments: Ultra-High-Resolution Unpaired Image-to-Image Translation via Dense Normalization
Ming-Yang Ho, Che-Ming Wu, Min-Sheng Wu, Yufeng Jane Tseng*
[pdf ]
ZoLA: Zero-Shot Creative Long Animation Generation with Short Video Model
Fu-Yun Wang*, Zhaoyang Huang*, Qiang Ma, Guanglu Song, Xudong Lu, Weikang Bian, Yijin Li, Yu Liu, Hongsheng Li*
[pdf ]
Parameter-Efficient and Memory-Efficient Tuning for Vision Transformer: A Disentangled Approach
Taolin Zhang, Jiawang Bai, Zhihe Lu, Dongze Lian, Genping Wang*, Xinchao Wang*, Shu-Tao Xia
[pdf ]
Restore Anything with Masks: Leveraging Mask Image Modeling for Blind All-in-One Image Restoration
Chujie Qin, Ruiqi Wu, Zikun Liu, Xin Lin, Chun-Le Guo, Hyun Hee Park, Chongyi Li*
[pdf ]
When Fast Fourier Transform Meets Transformer for Image Restoration
Xingyu Jiang, Xiuhui Zhang, Ning Gao, Yue Deng*
[pdf ]
Dolphins: Multimodal Language Model for Driving
Yingzi Ma, Yulong Cao, Jiachen Sun, Marco Pavone, Chaowei Xiao*
[pdf ]
Rethinking Video Deblurring with Wavelet-Aware Dynamic Transformer and Diffusion Model
Chen Rao, Guangyuan Li, Zehua Lan, Jiakai Sun, Junsheng Luan, Wei Xing*, Lei Zhao*, Huaizhong Lin*, Jianfeng Dong, Dalong Zhang
[pdf ]
CamoTeacher: Dual-Rotation Consistency Learning for Semi-Supervised Camouflaged Object Detection
Xunfa Lai, Zhiyu Yang, Jie Hu, ShengChuan Zhang*, Liujuan Cao, Guannan Jiang, Songan Zhang, Zhiyu Wang, Rongrong Ji
[pdf ]
Placing Objects in Context via Inpainting for Out-of-distribution Segmentation
Pau de Jorge Aranda*, Riccardo Volpi, Puneet Dokania, Philip Torr, Gregory Rogez
[pdf ]
Textual Grounding for Open-vocabulary Visual Information Extraction in Layout-diversified Documents
Mengjun Cheng, Chengquan Zhang, Chang Liu*, Yuke Li, Bohan Li, Kun Yao, Xiawu Zheng, Rongrong Ji, Jie Chen
[pdf ]
Teddy: Efficient Large-Scale Dataset Distillation via Taylor-Approximated Matching
Ruonan Yu, Songhua Liu, Jingwen Ye, Xinchao Wang*
[pdf ]
Rethinking and Improving Visual Prompt Selection for In-Context Learning Segmentation Framework
Wei Suo, Lanqing Lai, Mengyang Sun, Hanwang Zhang, Peng Wang*, Yanning Zhang
[pdf ]
D4-VTON: Dynamic Semantics Disentangling for Differential Diffusion based Virtual Try-On
Zhaotong Yang, Zicheng Jiang, Xinzhe Li, Huiyu Zhou, Junyu Dong, Huaidong Zhang, Yong Du*
[pdf ]
TC4D: Trajectory-Conditioned Text-to-4D Generation
Sherwin Bahmani*, Xian Liu, Wang Yifan, Ivan Skorokhodov, Victor Rong, Ziwei Liu, Xihui Liu, Jeong Joon Park, Sergey Tulyakov, Gordon Wetzstein, Andrea Tagliasacchi, David B Lindell
[pdf ]
Blind Image Deconvolution by Generative-based Kernel Prior and Initializer via Latent Encoding
Jiangtao Zhang, Zongsheng Yue*, Hui Wang, Qian Zhao*, Deyu Meng
[pdf ]
AdvDiff: Generating Unrestricted Adversarial Examples using Diffusion Models
Xuelong Dai*, Kaisheng Liang, Bin Xiao
[pdf ]
Improving Text-guided Object Inpainting with Semantic Pre-inpainting
Yifu Chen, Jingwen Chen, Yingwei Pan*, Yehao Li, Ting Yao, Zhineng Chen, Tao Mei
[pdf ]
Personalized Federated Domain-Incremental Learning based on Adaptive Knowledge Matching
Yichen Li, Wenchao Xu, Haozhao Wang*, Yining Qi*, Jingcai Guo, Ruixuan Li*
[pdf ]
ST-LDM: A Universal Framework for Text-Grounded Object Generation in Real Images
Xiangtian Xue, Jiasong Wu*, Youyong Kong, Lotfi Senhadji, Huazhong Shu
[pdf ]
RS-NeRF: Neural Radiance Fields from Rolling Shutter Images
Muyao Niu, Tong Chen, Yifan Zhan, Zhuoxiao Li, Xiang Ji, Yinqiang Zheng*
[pdf ]
Region-Adaptive Transform with Segmentation Prior for Image Compression
Yuxi Liu*, Wenhan Yang, Huihui Bai, Yunchao Wei, Yao Zhao
[pdf ]
Enhancing Tracking Robustness with Auxiliary Adversarial Defense Networks
Zhewei Wu, Ruilong Yu, Qihe Liu*, Shuying Cheng, Shilin Qiu, Shijie Zhou
[pdf ]
SLIM: Spuriousness Mitigation with Minimal Human Annotations
Xiwei Xuan*, Ziquan Deng, Hsuan-Tien Lin, Kwan-Liu Ma
[pdf ]
Uncertainty Calibration with Energy Based Instance-wise Scaling in the Wild Dataset
Mijoo Kim, Junseok Kwon*
[pdf ]
X-Pose: Detecting Any Keypoints
Jie Yang, Ailing Zeng*, Ruimao Zhang*, Lei Zhang
[pdf ]
M^2Depth: Self-supervised Two-Frame Multi-camera Metric Depth Estimation
Yingshuang Zou*, Yikang Ding, Xi Qiu, Haoqian Wang*, Haotian Zhang*
[pdf ]
UniMD: Towards Unifying Moment Retrieval and Temporal Action Detection
Yingsen Zeng, Yujie Zhong*, Chengjian Feng, Lin Ma
[pdf ]
DyFADet: Dynamic Feature Aggregation for Temporal Action Detection
Le Yang*, Ziwei Zheng, Yizeng Han, Hao Cheng, Shiji Song, Gao Huang, Fan Li
[pdf ]
LLaMA-VID: An Image is Worth 2 Tokens in Large Language Models
Yanwei Li*, Chengyao Wang, Jiaya Jia
[pdf ]
MetaCap: Meta-learning Priors from Multi-View Imagery for Sparse-view Human Performance Capture and Rendering
Guoxing Sun*, Rishabh Dabral, Pascal Fua, Christian Theobalt, Marc Habermann
[pdf ]
DiffPMAE: Diffusion Masked Autoencoders for Point Cloud Reconstruction
Yanlong Li*, Chamara Madarasingha, Kanchana Thilakarathna
[pdf ]
Multi-branch Collaborative Learning Network for 3D Visual Grounding
Zhipeng Qian, Yiwei Ma, Zhekai Lin, Jiayi Ji, Xiawu Zheng, Xiaoshuai Sun*, Rongrong Ji
[pdf ]
DynamiCrafter: Animating Open-domain Images with Video Diffusion Priors
Jinbo Xing*, Menghan Xia, Yong Zhang, Haoxin Chen, Wangbo Yu, Hanyuan Liu, Gongye Liu, Xintao Wang, Ying Shan, Tien-Tsin Wong
[pdf ]
Motion Aware Event Representation-driven Image Deblurring
Zhijing Sun, Xueyang Fu, Longzhuo Huang, Aiping Liu, Zheng-Jun Zha*
[pdf ]
Turbo: Informativity-Driven Acceleration Plug-In for Vision-Language Large Models
Chen Ju*, Haicheng Wang, Haozhe Cheng, Xu Chen, Zhonghua Zhai, Weilin Huang, Jinsong Lan, Shuai Xiao*, Bo Zheng
[pdf ]
WildRefer: 3D Object Localization in Large-scale Dynamic Scenes with Multi-modal Visual Data and Natural Language
Zhenxiang Lin, Xidong Peng, Peishan Cong, Ge Zheng, Yujing Sun, Yuenan Hou, Xinge Zhu, Sibei Yang, Yuexin Ma*
[pdf ]
RCS-Prompt: Learning Prompt to Rearrange Class Space for Prompt-based Continual Learning
Longrong Yang, Hanbin Zhao, Yunlong Yu*, Xiaodong Zeng, Xi Li*
[pdf ]
Text-Anchored Score Composition: Tackling Condition Misalignment in Text-to-Image Diffusion Models
Luozhou Wang*, Guibao Shen, Wenhang Ge, Guangyong Chen, Yijun Li, Yingcong Chen*
[pdf ]
Grounding DINO: Marrying DINO with Grounded Pre-Training for Open-Set Object Detection
Shilong Liu*, Zhaoyang Zeng, Tianhe Ren, Feng Li, Hao Zhang, Jie Yang, Qing Jiang, Chunyuan Li, Jianwei Yang, Hang Su, Jun Zhu, Lei Zhang*
[pdf ]
Make Your ViT-based Multi-view 3D Detectors Faster via Token Compression
Dingyuan Zhang, Dingkang Liang*, Zichang Tan, Xiaoqing Ye, Cheng Zhang, Jingdong Wang, Xiang Bai*
[pdf ]
OV-Uni3DETR: Towards Unified Open-Vocabulary 3D Object Detection via Cycle-Modality Propagation
Zhenyu Wang*, Ya-Li Li, Taichi Liu, Hengshuang Zhao, Shengjin Wang
[pdf ]
CatchBackdoor: Backdoor Detection via Critical Trojan Neural Path Fuzzing
Haibo Jin, Ruoxi Chen, Jinyin Chen, Haibin Zheng, Yang Zhang, Haohan Wang*
[pdf ]
UCIP: A Universal Framework for Compressed Image Super-Resolution using Dynamic Prompt
Xin Li*, Bingchen Li, Yeying Jin, Cuiling Lan, Hanxin Zhu, Yulin Ren, Zhibo Chen
[pdf ]
LLaVA-Plus: Learning to Use Tools for Creating Multimodal Agents
Shilong Liu*, Hao Cheng, Haotian Liu, Hao Zhang, Feng Li, Tianhe Ren, Xueyan Zou, Jianwei Yang, Hang Su, Jun Zhu, Lei Zhang, Jianfeng Gao, Chunyuan Li*
[pdf ]
ClearCLIP: Decomposing CLIP Representations for Dense Vision-Language Inference
Mengcheng Lan, Chaofeng Chen, Yiping Ke, Xinjiang Wang, Litong Feng*, Wayne Zhang
[pdf ]
Two-Stage Active Learning for Efficient Temporal Action Segmentation
Yuhao Su, Ehsan Elhamifar*
[pdf ]
TexDreamer: Towards Zero-Shot High-Fidelity 3D Human Texture Generation
Yufei Liu, Junwei Zhu, Junshu Tang, Shijie Zhang, Jiangning Zhang, Weijian Cao, Chengjie Wang, Yunsheng Wu, Dongjin Huang*
[pdf ]
MVPGS: Excavating Multi-view Priors for Gaussian Splatting from Sparse Input Views
Wangze Xu, Huachen Gao, Shihe Shen, Rui Peng, Jianbo Jiao, Ronggang Wang*
[pdf ]
Domain-Adaptive 2D Human Pose Estimation via Dual Teachers in Extremely Low-Light Conditions
Yihao Ai*, Yifei Qi, Bo Wang, Yu Cheng, Xinchao Wang, Robby T. Tan
[pdf ]
Towards More Practical Group Activity Detection: A New Benchmark and Model
Dongkeun Kim, Youngkil Song, Minsu Cho, Suha Kwak*
[pdf ]
Depicting Beyond Scores: Advancing Image Quality Assessment through Multi-modal Language Models
Zhiyuan You*, Zheyuan Li, Jinjin Gu*, Zhenfei Yin, Tianfan Xue*, Chao Dong*
[pdf ]
Zero-Shot Image Feature Consensus with Deep Functional Maps
Xinle Cheng, Congyue Deng*, Adam Harley, Yixin Zhu*, Leonidas Guibas*
[pdf ]
WindPoly: Polygonal Mesh Reconstruction via Winding Numbers
Xin He, Chenlei Lv, Pengdi Huang, Hui Huang*
[pdf ]
MinD-3D: Reconstruct High-quality 3D objects in Human Brain
Jianxiong Gao, Yuqian Fu, Yun Wang, Xuelin Qian, Jianfeng Feng, Yanwei Fu*
[pdf ]
Tokenize Anything via Prompting
Ting Pan*, Lulu Tang, Xinlong Wang*, Shiguang Shan
[pdf ]
Geospecific View Generation - Geometry-Context Aware High-resolution Ground View Inference from Satellite Views
Ningli Xu, Rongjun Qin*
[pdf ]
Scissorhands: Scrub Data Influence via Connection Sensitivity in Networks
Jing Wu*, Mehrtash Harandi
[pdf ]
City-on-Web: Real-time Neural Rendering of Large-scale Scenes on the Web
Kaiwen Song, Xiaoyi Zeng, Chenqu Ren, Juyong Zhang*
[pdf ]
GRAPE: Generalizable and Robust Multi-view Facial Capture
Jing Li, Di Kang, Zhenyu He*
[pdf ]
Training-Free Model Merging for Multi-target Domain Adaptation
Wenyi Li, Huan-ang Gao, Mingju Gao, Beiwen Tian, Rong Zhi, Hao Zhao*
[pdf ]
Multi-RoI Human Mesh Recovery with Camera Consistency and Contrastive Losses
Yongwei Nie, Changzhen Liu, Chengjiang Long, Qing Zhang, Guiqing Li, Hongmin Cai*
[pdf ]
Co-Student: Collaborating Strong and Weak Students for Sparsely Annotated Object Detection
Lianjun Wu, Jiangxiao Han, Zengqiang Zheng, Xinggang Wang*
[pdf ]
Open-Vocabulary Camouflaged Object Segmentation
Youwei Pang, Xiaoqi Zhao, JiaMing Zuo, Lihe Zhang*, Huchuan Lu
[pdf ]
SmartControl: Enhancing ControlNet for Handling Rough Visual Conditions
Xiaoyu Liu, Yuxiang Wei, Ming Liu*, Xianhui Lin, Peiran Ren, Xuansong Xie, Wangmeng Zuo
[pdf ]
InterFusion: Text-Driven Generation of 3D Human-Object Interaction
Sisi Dai, Wenhao Li, Haowen Sun, Haibin Huang, Chongyang Ma, Hui Huang, Kai Xu*, Ruizhen Hu*
[pdf ]
GLARE: Low Light Image Enhancement via Generative Latent Feature based Codebook Retrieval
Han Zhou, Wei Dong, Xiaohong Liu*, Shuaicheng Liu, Xiongkuo Min, Guangtao Zhai, Jun Chen*
[pdf ]
DriveDreamer: Towards Real-world-driven World Models for Autonomous Driving
Xiaofeng Wang*, Zheng Zhu, Guan Huang, Chen Xinze, Jiagang Zhu, Jiwen Lu
[pdf ]
Flow-Assisted Motion Learning Network for Weakly-Supervised Group Activity Recognition
Muhammad Adi Nugroho*, Sangmin Woo, Sumin Lee, Jinyoung Park, Yooseung Wang, Donguk Kim, Changick Kim
[pdf ]
NeRF-XL: NeRF at Any Scale with Multi-GPU
Ruilong Li*, Sanja Fidler, Angjoo Kanazawa, Francis Williams
[pdf ]
CoSIGN: Few-Step Guidance of ConSIstency Model to Solve General INverse Problems
Jiankun Zhao, Bowen Song, Liyue Shen*
[pdf ]
The First to Know: How Token Distributions Reveal Hidden Knowledge in Large Vision-Language Models?
Qinyu Zhao*, Ming Xu, Kartik Gupta, Akshay Asthana, Liang Zheng, Stephen Gould
[pdf ]
Compositional Substitutivity of Visual Reasoning for Visual Question Answering
Chuanhao Li, Zhen Li, Chenchen Jing*, Yuwei Wu*, Mingliang Zhai, Yunde Jia
[pdf ]
LightenDiffusion: Unsupervised Low-Light Image Enhancement with Latent-Retinex Diffusion Models
Hai Jiang, Ao Luo, Xiaohong Liu, Songchen Han, Shuaicheng Liu*
[pdf ]
DNI: Dilutional Noise Initialization for Diffusion Video Editing
Sunjae Yoon, Gwanhyeong Koo, Ji Woo Hong, Chang D. Yoo*
[pdf ]
Two-Stage Video Shadow Detection via Temporal-Spatial Adaption
Xin Duan, Yu Cao, Lei Zhu, Gang Fu, Xin Wang, Renjie Zhang, Ping Li*
[pdf ]
Towards Physical World Backdoor Attacks against Skeleton Action Recognition
Qichen Zheng, Yi Yu, Siyuan Yang*, Jun Liu, Kwok-Yan Lam, Alex Kot
[pdf ]
SAM-guided Graph Cut for 3D Instance Segmentation
Haoyu Guo*, He Zhu, Sida Peng, Yuang Wang, Yujun Shen, Ruizhen Hu*, Xiaowei Zhou*
[pdf ]
Fully Authentic Visual Question Answering Dataset from Online Communities
Chongyan Chen*, Mengchen Liu, Noel C Codella, Yunsheng Li, Lu Yuan, Danna Gurari
[pdf ]
Active Generation for Image Classification
Tao Huang, Jiaqi Liu, Shan You*, Chang Xu
[pdf ]
FuseTeacher: Modality-fused Encoders are Strong Vision Supervisors
Chen-Wei Xie*, Siyang Sun, Liming Zhao, Pandeng Li, Shuailei Ma, Yun Zheng
[pdf ]
Learning Local Pattern Modularization for Point Cloud Reconstruction from Unseen Classes
Chao Chen, Yu-Shen Liu*, Zhizhong Han
[pdf ]
Understanding Multi-compositional learning in Vision and Language models via Category Theory
Sotirios Panagiotis Chytas*, Hyunwoo J Kim, Vikas Singh
[pdf ]
FedRA: A Random Allocation Strategy for Federated Tuning to Unleash the Power of Heterogeneous Clients
Shangchao Su, Bin Li*, Xiangyang Xue
[pdf ]
Panel-Specific Degradation Representation for Raw Under-Display Camera Image Restoration
Youngjin Oh*, Keuntek Lee, Jooyoung Lee, Dae-Hyun Lee, Nam Ik Cho
[pdf ]
Unlocking Textual and Visual Wisdom: Open-Vocabulary 3D Object Detection Enhanced by Comprehensive Guidance from Text and Image
Pengkun Jiao*, Na Zhao*, Jingjing Chen, Yu-Gang Jiang
[pdf ]
Diffusion-Guided Weakly Supervised Semantic Segmentation
Sung-Hoon Yoon, Hoyong Kwon, Jaeseok Jeong, Daehee Park, Kuk-Jin Yoon*
[pdf ]
Weakly-Supervised Spatio-Temporal Video Grounding with Variational Cross-Modal Alignment
Yang Jin*, Yadong Mu*
[pdf ]
When Pedestrian Detection Meets Multi-Modal Learning: Generalist Model and Benchmark Dataset
Yi Zhang, Wang Zeng, Sheng Jin, Chen Qian*, Ping Luo, Wentao Liu
[pdf ]
NVS-Adapter: Plug-and-Play Novel View Synthesis from a Single Image
Yoonwoo Jeong, Jinwoo Lee, Chiheon Kim, Minsu Cho*, Doyup Lee*
[pdf ]
Segment and Recognize Anything at Any Granularity
Feng Li*, Hao Zhang, Peize Sun, Xueyan Zou, Shilong Liu, Chunyuan Li, Jianwei Yang, Lei Zhang*, Jianfeng Gao*
[pdf ]
Real-time Holistic Robot Pose Estimation with Unknown States
Shikun Ban, Juling Fan, Xiaoxuan Ma, Wentao Zhu*, Yu Qiao*, Yizhou Wang
[pdf ]
CLOSER: Towards Better Representation Learning for Few-Shot Class-Incremental Learning
Junghun Oh, Sungyong Baik, Kyoung Mu Lee*
[pdf ]
A Simple Baseline for Spoken Language to Sign Language Translation with 3D Avatars
Ronglai Zuo, Fangyun Wei*, Zenggui Chen, Brian Mak, Jiaolong Yang, Xin Tong
[pdf ]
An accurate detection is not all you need to combat label noise in web-noisy datasets
Paul Albert*, Kevin McGuinness, Eric Arazo, Tarun Krishna, Noel O'Connor, Jack Valmadre
[pdf ]
Online Vectorized HD Map Construction using Geometry
Zhixin Zhang, Yiyuan Zhang, Xiaohan Ding, Fusheng Jin*, Xiangyu Yue
[pdf ]
Image-adaptive 3D Lookup Tables for Real-time Image Enhancement with Bilateral Grids
Wontae Kim*, Nam Ik Cho*
[pdf ]
Learned HDR Image Compression for Perceptually Optimal Storage and Display
Peibei Cao, Haoyu Chen, Jingzhe Ma, Yu-Chieh Yuan, Zhiyong Xie, Xin Xie, Haiqing Bai, Kede Ma*
[pdf ]
Sparse Beats Dense: Rethinking Supervision in Radar-Camera Depth Completion
Huadong Li, Minhao Jing, Jin Wang, Shichao Dong, Jiajun Liang, Haoqiang Fan, Renhe Ji*
[pdf ]
Non-Exemplar Domain Incremental Learning via Cross-Domain Concept Integration
Qiang Wang*, Yuhang He, Songlin Dong, Xinyuan Gao, Shaokun Wang, Yihong Gong
[pdf ]
Free-VSC: Free Semantics from Visual Foundation Models for Unsupervised Video Semantic Compression
Yuan Tian*, Guo Lu*, Guangtao Zhai*
[pdf ]
Improving Virtual Try-On with Garment-focused Diffusion Models
Siqi Wan, Yehao Li, Jingwen Chen, Yingwei Pan*, Ting Yao, Yang Cao, Tao Mei
[pdf ]
Ray Denoising: Depth-aware Hard Negative Sampling for Multi-view 3D Object Detection
Feng Liu*, Tengteng Huang, Qianjing Zhang, Haotian Yao, Chi Zhang, Fang Wan, Qixiang Ye, Yanzhao Zhou*
[pdf ]
Disentangled Generation and Aggregation for Robust Radiance Fields
Shihe Shen, Huachen Gao, Wangze Xu, Rui Peng, Luyang Tang, Kaiqiang Xiong, Jianbo Jiao, Ronggang Wang*
[pdf ]
UNIKD: UNcertainty-Filtered Incremental Knowledge Distillation for Neural Implicit Representation
Mengqi Guo*, Chen Li, Hanlin Chen, Gim Hee Lee
[pdf ]
Subspace Prototype Guidance for Mitigating Class Imbalance in Point Cloud Semantic Segmentation
Jiawei Han, Kaiqi Liu*, Wei Li, Guangzhi Chen
[pdf ]
MoAI: Mixture of All Intelligence for Large Language and Vision Models
Byung-Kwan Lee, Beomchan Park, Chae Won Kim, Yong Man Ro*
[pdf ]
Semantic-guided Robustness Tuning for Few-Shot Transfer Across Extreme Domain Shift
Kangyu Xiao*, Zilei Wang, Junjie Li
[pdf ]
Revisit Event Generation Model: Self-Supervised Learning of Event-to-Video Reconstruction with Implicit Neural Representations
Zipeng Wang*, Yunfan Lu, Lin Wang*
[pdf ]
SDPT: Synchronous Dual Prompt Tuning for Fusion-based Visual-Language Pre-trained Models
Yang Zhou*, Yongjian Wu, Jiya Saiyin, Bingzheng Wei, Maode Lai, Eric I Chang, Yan Xu*
[pdf ]
Open-World Dynamic Prompt and Continual Visual Representation Learning
Youngeun Kim, Jun Fang*, Qin Zhang, Zhaowei Cai, Yantao Shen, Rahul Duggal, Dripta S. Raychaudhuri, Zhuowen Tu, Yifan Xing, Onkar Dabeer
[pdf ]
Learning Video Context as Interleaved Multimodal Sequences
Kevin Qinghong Lin, Pengchuan Zhang, Difei Gao, Xide Xia, Joya Chen, Ziteng Gao, Jinheng Xie, Xuhong Xiao, Mike Zheng Shou*
[pdf ]
Learning Unsigned Distance Functions from Multi-view Images with Volume Rendering Priors
Wenyuan Zhang, Kanle Shi, Yu-Shen Liu*, Zhizhong Han
[pdf ]
Dense Multimodal Alignment for Open-Vocabulary 3D Scene Understanding
Ruihuang Li*, Zhengqiang Zhang, Chenhang He, Zhiyuan Ma, Vishal Patel, Lei Zhang
[pdf ]
Deep Feature Surgery: Towards Accurate and Efficient Multi-Exit Networks
Cheng Gong, Yao Chen*, Qiuyang Luo, Ye Lu, Tao Li, Yuzhi Zhang, Yufei Sun*, Le Zhang
[pdf ]
Multi-scale Cross Distillation for Object Detection in Aerial Images
Kun Wang, Zi Wang, Zhang Li*, Xichao Teng, Yang Li
[pdf ]
Progressive Proxy Anchor Propagation for Unsupervised Semantic Segmentation
Hyun Seok Seong, WonJun Moon, SuBeen Lee, Jae-Pil Heo*
[pdf ]
Within the Dynamic Context: Inertia-aware 3D Human Modeling with Pose Sequence
Yutong Chen, Yifan Zhan, Zhihang Zhong*, Wei Wang, Xiao Sun*, Yu Qiao, Yinqiang Zheng
[pdf ]
Revisit Human-Scene Interaction via Space Occupancy
Xinpeng Liu, Haowen Hou, Yanchao Yang, Yong-Lu Li*, Cewu Lu
[pdf ]
Face Adapter for Pre-Trained Diffusion Models with Fine-Grained ID and Attribute Control
Yue Han*, Junwei Zhu, Keke He, Xu Chen, Yanhao Ge, Wei Li, Xiangtai Li, Jiangning Zhang, Chengjie Wang, Yong Liu
[pdf ]
WeConvene: Learned Image Compression with Wavelet-Domain Convolution and Entropy Model
Haisheng Fu*, Jie Liang, Zhenman Fang, Jingning Han, Feng Liang, Guohe Zhang
[pdf ]
Grid-Attention: Enhancing Computational Efficiency of Large Vision Models without Fine-Tuning
Pengyu Li*, Biao Wang, Tianchu Guo, Xian-Sheng Hua
[pdf ]
Mitigating Background Shift in Class-Incremental Semantic Segmentation
Gilhan Park, WonJun Moon, SuBeen Lee, Tae-Young Kim, Jae-Pil Heo*
[pdf ]
Relation DETR: Exploring Explicit Position Relation Prior for Object Detection
Xiuquan Hou, Meiqin Liu*, Senlin Zhang, Ping Wei, Badong Chen, Xuguang Lan
[pdf ]
BKDSNN: Enhancing the Performance of Learning-based Spiking Neural Networks Training with Blurred Knowledge Distillation
Zekai Xu, Kang You, Qinghai Guo, Xiang Wang, Zhezhi He*
[pdf ]
Agent Attention: On the Integration of Softmax and Linear Attention
Dongchen Han, Tianzhu Ye, Yizeng Han, Zhuofan Xia, Siyuan Pan, Pengfei Wan, Shiji Song, Gao Huang*
[pdf ]
Learning by Aligning 2D Skeleton Sequences and Multi-Modality Fusion
Quoc-Huy Tran*, Muhammad Ahmed, Murad Popattia, Muhammad Hassan Ahmed, Andrey Konin, Zeeshan Zia
[pdf ]
Resolving Scale Ambiguity in Multi-view 3D Reconstruction using Dual-Pixel Sensors
Kohei Ashida*, Hiroaki Santo, Fumio Okura, Yasuyuki Matsushita
[pdf ]
Object-Oriented Anchoring and Modal Alignment in Multimodal Learning
Shibin Mei, Bingbing Ni*, Hang Wang, Chenglong Zhao, Fengfa Hu, Zhiming Pi, BiLian Ke
[pdf ]
Towards Stable 3D Object Detection
Jiabao Wang, Qiang Meng, Guochao Liu, Liujiang Yan, Ke Wang, Ming-Ming Cheng, Qibin Hou*
[pdf ]
FYI: Flip Your Images for Dataset Distillation
Byunggwan Son*, Youngmin Oh, Donghyeon Baek, Bumsub Ham*
[pdf ]
On-the-fly Category Discovery for LiDAR Semantic Segmentation
Hyeonseong Kim, Sung-Hoon Yoon, Minseok Kim, Kuk-Jin Yoon*
[pdf ]
Dual-Camera Smooth Zoom on Mobile Phones
Renlong Wu, Zhilu Zhang*, Yu Yang, Wangmeng Zuo
[pdf ]
ProtoComp: Diverse Point Cloud Completion with Controllable Prototype
Xumin Yu, Yanbo Wang, Jie Zhou, Jiwen Lu*
[pdf ]
CONDA: Condensed Deep Association Learning for Co-Salient Object Detection
Long Li, Nian Liu*, Dingwen Zhang, Zhongyu Li, Salman Khan, Rao Anwer, Hisham Cholakkal, Junwei Han*, Fahad Shahbaz Khan
[pdf ]
Cascade Prompt Learning for Visual-Language Model Adaptation
Ge Wu, Xin Zhang, Zheng Li, Zhaowei Chen, Jiajun Liang, Jian Yang, Xiang Li*
[pdf ]
PolyRoom: Room-aware Transformer for Floorplan Reconstruction
Yuzhou Liu, Lingjie Zhu, Xiaodong Ma, Hanqiao Ye, Xiang Gao, Xianwei Zheng, Shuhan Shen*
[pdf ]
BenchLMM: Benchmarking Cross-style Visual Capability of Large Multimodal Models
Rizhao Cai*, Zirui Song, Dayan Guan*, Zhenhao Chen, Yaohang Li, Xing Luo, Chenyu Yi, Alex Kot
[pdf ]
SMFANet: A Lightweight Self-Modulation Feature Aggregation Network for Efficient Image Super-Resolution
Mingjun Zheng, Long Sun, Jiangxin Dong, Jinshan Pan*
[pdf ]
HENet: Hybrid Encoding for End-to-end Multi-task 3D Perception from Multi-view Cameras
Zhongyu Xia, ZhiWei Lin, Xinhao Wang, Yongtao Wang*, Yun Xing, Shengxiang Qi, Nan Dong, Ming-Hsuan Yang
[pdf ]
Hierarchical Unsupervised Relation Distillation for Source Free Domain Adaptation
Bowei Xing*, Xianghua Ying, Ruibin Wang, Ruohao Guo, Ji Shi, Wenzhen Yue
[pdf ]
Customized Generation Reimagined: Fidelity and Editability Harmonized
Jian Jin, Yang Shen, Zhenyong Fu*, Jian Yang*
[pdf ]
AUFormer: Vision Transformers are Parameter-Efficient Facial Action Unit Detectors
Kaishen Yuan, Zitong Yu*, Xin Liu*, Weicheng Xie, Huanjing Yue, Jingyu Yang
[pdf ]
Improving Video Segmentation via Dynamic Anchor Queries
Yikang Zhou, Tao Zhang*, Xiangtai Li*, Shunping Ji*, Shuicheng Yan
[pdf ]
Controllable Contextualized Image Captioning: Directing the Visual Narrative through User-Defined Highlights
Shunqi Mao*, Chaoyi Zhang, Hang Su, Hwanjun Song, Igor Shalyminov, Weidong Cai
[pdf ]
Diffusion Models as Optimizers for Efficient Planning in Offline RL
Renming Huang, Yunqiang Pei, Guoqing Wang*, Yangming Zhang, Yang Yang, Peng Wang, Heng Tao Shen
[pdf ]
Enhanced Sparsification via Stimulative Training
Shengji Tang, Weihao Lin, Hancheng Ye, Peng Ye, Chong Yu, Baopu Li, Tao Chen*
[pdf ]
How Many Unicorns Are in This Image? A Safety Evaluation Benchmark for Vision LLMs
Haoqin Tu*, Chenhang Cui, Zijun Wang, Yiyang Zhou, Bingchen Zhao, Junlin Han, Wangchunshu Zhou, Huaxiu Yao, Cihang Xie*
[pdf ]
NeuroPictor: Refining fMRI-to-Image Reconstruction via Multi-individual Pretraining and Multi-level Modulation
Jingyang Huo, Yikai Wang, Yanwei Fu*, Xuelin Qian, Chong Li, Yun Wang, Jianfeng Feng
[pdf ]
Coarse-to-Fine Implicit Representation Learning for 3D Hand-Object Reconstruction from a Single RGB-D Image
Xingyu Liu, Pengfei Ren, Jingyu Wang*, Qi Qi, Haifeng Sun, Zirui Zhuang*, Jianxin Liao
[pdf ]
Efficient Snapshot Spectral Imaging: Calibration-Free Parallel Structure with Aperture Diffraction Fusion
Tao Lv*, Lihao Hu, Shiqiao Li, Chenglong Huang, Xun Cao
[pdf ]
Enhancing Recipe Retrieval with Foundation Models: A Data Augmentation Perspective
Fangzhou Song, Bin Zhu, Yanbin Hao*, Shuo Wang
[pdf ]
PapMOT: Exploring Adversarial Patch Attack against Multiple Object Tracking
Jiahuan Long*, Tingsong Jiang*, Wen Yao*, Shuai Jia*, Weijia Zhang*, Weien Zhou*, Chao Ma*, Xiaoqian Chen*
[pdf ]
HiDiffusion: Unlocking Higher-Resolution Creativity and Efficiency in Pretrained Diffusion Models
Shen Zhang, Zhaowei Chen, Zhenyu Zhao, Yuhao Chen, Yao Tang, Jiajun Liang*
[pdf ]
On the Approximation Risk of Few-Shot Class-Incremental Learning
Xuan Wang, Zhong Ji*, Xiyao Liu, Yanwei Pang, Jungong Han
[pdf ]
Syn-to-Real Domain Adaptation for Point Cloud Completion via Part-based Approach
Yunseo Yang, Jihun Kim, Kuk-Jin Yoon*
[pdf ]
Learn to Preserve and Diversify: Parameter-Efficient Group with Orthogonal Regularization for Domain Generalization
Jiajun Hu, Jian Zhang, Lei Qi*, Yinghuan Shi*, Yang Gao
[pdf ]
SCOMatch: Alleviating Overtrusting in Open-set Semi-supervised Learning
Zerun Wang*, Liuyu Xiang, Lang Huang, Jiafeng Mao, Ling Xiao, Toshihiko Yamasaki
[pdf ]
Region-aware Distribution Contrast: A Novel Approach to Multi-Task Partially Supervised Learning
Meixuan Li, Tianyu Li, Guoqing Wang*, Peng Wang, Yang Yang, Jie Zou
[pdf ]
MasterWeaver: Taming Editability and Face Identity for Personalized Text-to-Image Generation
Yuxiang Wei, Zhilong Ji, Jinfeng Bai, Hongzhi Zhang, Lei Zhang*, Wangmeng Zuo*
[pdf ]
PointRegGPT: Boosting 3D Point Cloud Registration using Generative Point-Cloud Pairs for Training
Suyi Chen, Hao Xu, Haipeng Li, Kunming Luo, Guanghui Liu, Chi-Wing Fu, Ping Tan, Shuaicheng Liu*
[pdf ]
General Geometry-aware Weakly Supervised 3D Object Detection
Guowen Zhang*, Junsong Fan, Liyi Chen, Zhaoxiang Zhang, Zhen Lei, Lei Zhang
[pdf ]
Long-CLIP: Unlocking the Long-Text Capability of CLIP
Beichen Zhang*, Pan Zhang, Xiaoyi Dong*, Yuhang Zang, Jiaqi Wang*
[pdf ]
Dolfin: Diffusion Layout Transformers without Autoencoder
Yilin Wang, Zeyuan Chen, Liangjun Zhong, Zheng Ding, Zhuowen Tu*
[pdf ]
Real-time 3D-aware Portrait Editing from a Single Image
Qingyan Bai*, Zifan Shi, Yinghao Xu, Hao Ouyang, Qiuyu Wang, Ceyuan Yang, Xuan Wang, Gordon Wetzstein, Yujun Shen*, Qifeng Chen*
[pdf ]
StructLDM: Structured Latent Diffusion for 3D Human Generation
Tao Hu, Fangzhou Hong, Ziwei Liu*
[pdf ]
Image Compression for Machine and Human Vision With Spatial-Frequency Adaptation
Han Li*, Shaohui Li*, Shuangrui Ding, Wenrui Dai*, Maida Cao, Chenglin Li, Junni Zou, Hongkai Xiong
[pdf ]
Beyond the Contact: Discovering Comprehensive Affordance for 3D Objects from Pre-trained 2D Diffusion Models
Hyeonwoo Kim, Sookwan Han, Patrick Kwon, Hanbyul Joo*
[pdf ]
Norma: A Noise Robust Memory-Augmented Framework for Whole Slide Image Classification
Yu Bai, Bo Zhang*, Zheng Zhang, Shuo Yan, Zibo Ma, Wu Liu, Xiuzhuang Zhou, Xiangyang Gong, Wendong Wang
[pdf ]
Continuous Memory Representation for Anomaly Detection
Joo Chan Lee*, Taejune Kim, Eunbyung Park*, Simon S Woo*, Jong Hwan Ko*
[pdf ]
InstaStyle: Inversion Noise of a Stylized Image is Secretly a Style Adviser
Xing Cui, Zekun Li, Peipei Li*, Huaibo Huang, Xuannan Liu, Zhaofeng He
[pdf ]
PACE: Pose Annotations in Cluttered Environments
Yang You*, Kai Xiong, Zhening Yang, Zhengxiang Huang, Junwei Zhou, Ruoxi Shi, Zhou Fang, Adam Harley, Leonidas Guibas, Cewu Lu*
[pdf ]
CMTA: Cross-Modal Temporal Alignment for Event-guided Video Deblurring
Taewoo Kim, Hoonhee Cho, Kuk-Jin Yoon*
[pdf ]
CountFormer: Multi-View Crowd Counting Transformer
Hong Mo*, Xiong Zhang*, Jianchao Tan, Cheng Yang, Qiong Gu, Bo Hang, Wenqi Ren
[pdf ]
Textual Knowledge Matters: Cross-Modality Co-Teaching for Generalized Visual Class Discovery
Haiyang Zheng, Nan Pu, Wenjing Li*, Nicu Sebe, Zhun Zhong*
[pdf ]
Continuous SO(3) Equivariant Convolution for 3D Point Cloud Analysis
Jaein Kim, Hee Bin Yoo, Dong-Sig Han, Yeon-Ji Song, Byoung-Tak Zhang*
[pdf ]
EA-VTR: Event-Aware Video-Text Retrieval
Zongyang Ma*, Ziqi Zhang, Yuxin Chen, Zhongang Qi, Chunfeng Yuan, Bing Li, Yingmin Luo, Xu Li, Xiaojuan Qi, Ying Shan, Weiming Hu
[pdf ]
Privacy-Preserving Adaptive Re-Identification without Image Transfer
Hamza Rami*, Jhony H. Giraldo, Nicolas Winckler, Stéphane Lathuilière
[pdf ]
A Simple Low-bit Quantization Framework for Video Snapshot Compressive Imaging
Miao Cao*, Lishun Wang, Huan Wang, Xin Yuan
[pdf ]
DIFFender: Diffusion-Based Adversarial Defense against Patch Attacks
Caixin Kang*, Yinpeng Dong, Zhengyi Wang, Shouwei Ruan, Yubo Chen, Hang Su*, Xingxing Wei*
[pdf ]
Hybrid Video Diffusion Models with 2D Triplane and 3D Wavelet Representation
Kihong Kim, Haneol Lee, Jihye Park, Seyeon Kim, Kwang Hee Lee, Seungryong Kim*, Jaejun Yoo*
[pdf ]
Background Adaptation with Residual Modeling for Exemplar-Free Class-Incremental Semantic Segmentation
Anqi Zhang, Guangyu Gao*
[pdf ]
Efficient Diffusion-Driven Corruption Editor for Test-Time Adaptation
Yeongtak Oh, Jonghyun Lee, Jooyoung Choi, Dahuin Jung, Uiwon Hwang*, Sungroh Yoon*
[pdf ]
Learning to Unlearn for Robust Machine Unlearning
Mark He Huang*, Lin Geng Foo, Jun Liu*
[pdf ]
Emergent Visual-Semantic Hierarchies in Image-Text Representations
Morris Alper*, Hadar Averbuch-Elor
[pdf ]
Context-Guided Spatial Feature Reconstruction for Efficient Semantic Segmentation
Zhenliang Ni, Xinghao Chen*, Yingjie Zhai, Yehui Tang, Yunhe Wang*
[pdf ]
DriveLM: Driving with Graph Visual Question Answering
Chonghao Sima*, Katrin Renz, Kashyap Chitta, Li Chen, Zhang Hanxue, Chengen Xie, Jens Beißwenger, Ping Luo, Andreas Geiger, Hongyang Li
[pdf ]
Neural Spectral Decomposition for Dataset Distillation
Shaolei Yang, Shen Cheng, Mingbo Hong, Haoqiang Fan, Xing Wei, Shuaicheng Liu*
[pdf ]
Beyond Viewpoint: Robust 3D Object Recognition under Arbitrary Views through Joint Multi-Part Representation
Linlong Fan, Ye Huang*, Yanqi Ge, Wen Li, Lixin Duan
[pdf ]
Learning Non-Linear Invariants for Unsupervised Out-of-Distribution Detection
Lars Doorenbos*, Raphael Sznitman, Pablo Márquez Neila
[pdf ]
Dynamic Retraining-Updating Mean Teacher for Source-Free Object Detection
Trinh Le Ba Khanh*, Huy-Hung Nguyen, Long Hoang Pham, Duong Nguyen-Ngoc Tran, Jae Wook Jeon*
[pdf ]
Knowledge-enhanced Visual-Language Pretraining for Computational Pathology
Xiao Zhou, Xiaoman Zhang, Chaoyi Wu, Ya Zhang, Weidi Xie, Yan-Feng Wang*
[pdf ]
Adaptive Multi-modal Fusion of Spatially Variant Kernel Refinement with Diffusion Model for Blind Image Super-Resolution
Junxiong Lin*, Yan Wang, Zeng Tao, Boyang Wang, Qing Zhao, Haoran Wang, Xuan Tong, Xinji Mai, Yuxuan Lin, Wei Song, Jiawen Yu, Shaoqi Yan, Wenqiang Zhang
[pdf ]
Disentangled Clothed Avatar Generation from Text Descriptions
Jionghao Wang*, Yuan Liu, Zhiyang Dou, Zhengming Yu, Yongqing Liang, Cheng Lin, Rong Xie, Li Song*, Xin Li, Wenping Wang*
[pdf ]
Real Appearance Modeling for More General Deepfake Detection
Jiahe Tian, Cai Yu, Xi Wang, Peng Chen, Zihao Xiao, Jiao Dai, Yesheng Chai*, Jizhong Han
[pdf ]
6DGS: 6D Pose Estimation from a Single Image and a 3D Gaussian Splatting Model
Matteo Bortolon*, Theodore Tsesmelis, Stuart James, Fabio Poiesi, Alessio Del Bue
[pdf ]
Dual-Decoupling Learning and Metric-Adaptive Thresholding for Semi-Supervised Multi-Label Learning
Jia-Hao Xiao, Ming-Kun Xie, Heng-Bo Fan, Gang Niu, Masashi Sugiyama, Sheng-Jun Huang*
[pdf ]
V2X-Real: a Large-Scale Dataset for Vehicle-to-Everything Cooperative Perception
Hao Xiang, Xin Xia, Zhaoliang Zheng, Runsheng Xu, Letian Gao, Zewei Zhou, Xu Han, Xinkai Ji, Mingxi Li, Zonglin Meng, Li Jin, Mingyue Lei, Zhaoyang Ma, Zihang He, Haoxuan Ma, Yunshuang Yuan, Yingqian Zhao, Jiaqi Ma*
[pdf ]
VQ-HPS: Human Pose and Shape Estimation in a Vector-Quantized Latent Space
Guénolé Fiche*, Simon Leglaive, Xavier Alameda-Pineda, Antonio Agudo, Francesc Moreno
[pdf ]
Attention Beats Linear for Fast Implicit Neural Representation Generation
Shuyi Zhang, Ke Liu, Jingjun Gu, Xiaoxu Cai, Zhihua Wang, Jiajun Bu, Haishuai Wang*
[pdf ]
HARIVO: Harnessing Text-to-Image Models for Video Generation
Mingi Kwon, Seoung Wug Oh, Yang Zhou, Joon-Young Lee, Difan Liu, Haoran Cai, Baqiao Liu, Feng Liu, Youngjung Uh*
[pdf ]
Deep Online Probability Aggregation Clustering
Yuxuan Yan, Na Lu*, Ruofan Yan
[pdf ]
WRIM-Net: Wide-Ranging Information Mining Network for Visible-Infrared Person Re-Identification
Yonggan Wu, Ling-Chao Meng*, Yuan Zichao, Sixian Chan, Hong-Qiang Wang*
[pdf ]
Reliable and Efficient Concept Erasure of Text-to-Image Diffusion Models
Chao Gong*, Kai Chen, Zhipeng Wei, Jingjing Chen*, Yu-Gang Jiang
[pdf ]
Visual Text Generation in the Wild
Yuanzhi Zhu, Jiawei Liu, Feiyu Gao, Wenyu Liu*, Xinggang Wang, Peng Wang, Fei Huang, Cong Yao, Zhibo Yang*
[pdf ]
Length-Aware Motion Synthesis via Latent Diffusion
Alessio Sampieri*, Alessio Palma, Indro Spinelli, Fabio Galasso
[pdf ]
Attention-Challenging Multiple Instance Learning for Whole Slide Image Classification
Yunlong Zhang*, Honglin Li, Yuxuan Sun, Chenglu Zhu, Sunyi Zheng, Lin Yang*
[pdf ]
An Optimal Control View of LoRA and Binary Controller Design for Vision Transformers
Chi Zhang*, Jingpu Cheng, Qianxiao Li
[pdf ]
Exploring Phrase-Level Grounding with Text-to-Image Diffusion Model
Danni Yang, Ruohan Dong, Jiayi Ji, Yiwei Ma, Haowei Wang, Xiaoshuai Sun*, Rongrong Ji
[pdf ]
FocusDiffuser: Perceiving Local Disparities for Camouflaged Object Detection
Jianwei Zhao*, Xin Li, Fan Yang, Qiang Zhai*, Ao Luo, Zhicheng Jiao, Hong Cheng
[pdf ]
Improving image synthesis with diffusion-negative sampling
Alakh Desai*, Nuno Vasconcelos
[pdf ]
AvatarPose: Avatar-guided 3D Pose Estimation of Close Human Interaction from Sparse Multi-view Videos
Feichi Lu*, Zijian Dong*, Jie Song, Otmar Hilliges
[pdf ]
FedVAD: Enhancing Federated Video Anomaly Detection with GPT-Driven Semantic Distillation
Fan Qi*, Ruijie Pan, Huaiwen Zhang, Changsheng Xu*
[pdf ]
SignGen: End-to-End Sign Language Video Generation with Latent Diffusion
Fan Qi*, Yu Duan, Changsheng Xu, Huaiwen Zhang*
[pdf ]
Idling Neurons, Appropriately Lenient Workload During Fine-tuning Leads to Better Generalization
Hongjing Niu*, Hanting Li, Bin Li, Feng Zhao*
[pdf ]
Diffusion Prior-Based Amortized Variational Inference for Noisy Inverse Problems
Sojin Lee, Dogyun Park, Inho Kong, Hyunwoo J. Kim*
[pdf ]
The Gaussian Discriminant Variational Autoencoder (GdVAE): A Self-Explainable Model with Counterfactual Explanations
Anselm Haselhoff*, Kevin Trelenberg, Fabian Küppers, Jonas Schneider
[pdf ]
Accelerating Image Generation with Sub-path Linear Approximation Model
Chen Xu, Tianhui Song, Weixin Feng, Xubin Li, Tiezheng Ge, Bo Zheng, Limin Wang*
[pdf ]
Safe-CLIP: Removing NSFW Concepts from Vision-and-Language Models
Samuele Poppi*, Tobia Poppi*, Federico Cocchi, Marcella Cornia, Lorenzo Baraldi, Rita Cucchiara
[pdf ]
TetraDiffusion: Tetrahedral Diffusion Models for 3D Shape Generation
Nikolai Kalischek*, Torben Peters, Jan Dirk Wegner, Konrad Schindler
[pdf ]
Camera Calibration using a Collimator System
Shunkun Liang, Banglei Guan*, Zhenbao Yu, Pengju Sun, Yang Shang
[pdf ]
Label-free Neural Semantic Image Synthesis
Jiayi Wang*, Kevin A Laube, Yumeng Li, Jan Hendrik Metzen, Shin-I Cheng, Julio Borges, Anna Khoreva
[pdf ]
Exploring Reliable Matching with Phase Enhancement for Night-time Semantic Segmentation
Yuwen Pan*, Rui Sun, Naisong Luo, Tianzhu Zhang, Yongdong Zhang
[pdf ]
Multiscale Sliced Wasserstein Distances as Perceptual Color Difference Measures
Jiaqi He, Zhihua Wang, Leon Wang, Tsein-I Liu, Yuming Fang, Qilin Sun*, Kede Ma
[pdf ]
DiscoMatch: Fast Discrete Optimisation for Geometrically Consistent 3D Shape Matching
Paul Roetzer*, Ahmed Abbas*, Dongliang Cao, Florian Bernard, Paul Swoboda
[pdf ]
Switch Diffusion Transformer: Synergizing Denoising Tasks with Sparse Mixture-of-Experts
Byeongjun Park, Hyojun Go, Jin-Young Kim, Sangmin Woo, Seokil Ham, Changick Kim*
[pdf ]
FARSE-CNN: Fully Asynchronous, Recurrent and Sparse Event-Based CNN
Riccardo Santambrogio*, Marco Cannici, Matteo Matteucci
[pdf ]
ConDense: Consistent 2D-3D Pre-training for Dense and Sparse Features from Multi-View Images
Xiaoshuai Zhang*, Zhicheng Wang, Howard Zhou, Soham Ghosh, Danushen L Gnanapragasam, Varun Jampani, Hao Su, Leonidas Guibas
[pdf ]
MTA-CLIP: Language-Guided Semantic Segmentation with Mask-Text Alignment
Anurag Das*, Xinting Hu, Li Jiang, Bernt Schiele
[pdf ]
Event-Aided Time-To-Collision Estimation for Autonomous Driving
Jinghang Li, Bangyan Liao, Xiuyuan Lu, Peidong Liu, Shaojie Shen, Yi Zhou*
[pdf ]
The Devil is in the Statistics: Mitigating and Exploiting Statistics Difference for Generalizable Semi-supervised Medical Image Segmentation
Muyang Qiu, Jian Zhang, Lei Qi, Qian Yu, Yinghuan Shi*, Yang Gao
[pdf ]
VEON: Vocabulary-Enhanced Occupancy Prediction
Jilai Zheng, Pin Tang, Zhongdao Wang, Guoqing Wang, Xiangxuan Ren, Bailan Feng, Chao Ma*
[pdf ]
Adapt without Forgetting: Distill Proximity from Dual Teachers in Vision-Language Models
Mengyu Zheng*, Yehui Tang, Zhiwei Hao, Kai Han, Yunhe Wang, Chang Xu*
[pdf ]
The Sky's the Limit: Relightable Outdoor Scenes via a Sky-pixel Constrained Illumination Prior and Outside-In Visibility
James A D Gardner*, Evgenii Kashin, Bernhard Egger, William Smith
[pdf ]
DiffFAS: Face Anti-Spoofing via Generative Diffusion Models
Xinxu Ge, Xin Liu*, Zitong Yu*, Jingang Shi, Chun Qi, Jie Li, Heikki Kälviäinen
[pdf ]
Hetecooper: Feature Collaboration Graph for Heterogeneous Collaborative Perception
Congzhang Shao, Guiyang Luo*, Quan Yuan*, Yifu Chen, Yilin Liu, Gong Kexin, Jinglin Li
[pdf ]
Learning-based Axial Video Motion Magnification
Kwon Byung-Ki, Oh Hyun-Bin, Kim Jun-Seong, Hyunwoo Ha, Tae-Hyun Oh*
[pdf ]
Simplifying Source-Free Domain Adaptation for Object Detection: Effective Self-Training Strategies and Performance Insights
Yan Hao, Florent Forest*, Olga Fink
[pdf ]
Class-Incremental Learning with CLIP: Adaptive Representation Adjustment and Parameter Fusion
Linlan Huang, Xusheng Cao, Haori Lu, Xialei Liu*
[pdf ]
cDP-MIL: Robust Multiple Instance Learning via Cascaded Dirichlet Process
Yihang Chen, Tsai Hor Chan, Guosheng Yin, Yuming Jiang, Lequan Yu*
[pdf ]
Causality-inspired Discriminative Feature Learning in Triple Domains for Gait Recognition
Haijun Xiong, Bin Feng*, Xinggang Wang, Wenyu Liu
[pdf ]
Retargeting Visual Data with Deformation Fields
Tim Elsner*, Julia Berger, Tong Wu, Victor Czech, Lin Gao, Leif Kobbelt
[pdf ]
Delving Deep into Engagement Prediction of Short Videos
Dasong Li, Wenjie Li, Baili Lu, Hongsheng Li, Sizhuo Ma, Gurunandan Krishnan, Jian Wang*
[pdf ]
Flexible Distribution Alignment: Towards Long-tailed Semi-supervised Learning with Proper Calibration
Emanuel Sanchez Aimar*, Nathaniel D Helgesen, Yonghao Xu, Marco Kuhlmann, Michael Felsberg
[pdf ]
CLEO: Continual Learning of Evolving Ontologies
Shishir Muralidhara*, Saqib Bukhari, Georg Dr. Schneider, Didier Stricker, René Schuster
[pdf ]
SpecFormer: Guarding Vision Transformer Robustness via Maximum Singular Value Penalization
Xixu Hu, Runkai Zheng, Jindong Wang*, Cheuk Hang Leung, Qi Wu*, Xing Xie
[pdf ]
Wavelet Convolutions for Large Receptive Fields
Shahaf E Finder*, Roy Amoyal, Eran Treister, Oren Freifeld*
[pdf ]
BK-SDM: A Lightweight, Fast, and Cheap Version of Stable Diffusion
Bo-Kyeong Kim*, Hyoung-Kyu Song, Thibault Castells, Shinkook Choi
[pdf ]
Language-Assisted Skeleton Action Understanding for Skeleton-Based Temporal Action Segmentation
Haoyu Ji, Bowen Chen, Xinglong Xu, Weihong Ren, Zhiyong Wang*, Honghai Liu
[pdf ]
Leveraging scale- and orientation-covariant features for planar motion estimation
Marcus Valtonen Örnhag*, Alberto Jaenal
[pdf ]
Understanding and Mitigating Human-Labelling Errors in Supervised Contrastive Learning
Zijun Long*, Lipeng Zhuang, George W Killick, Richard McCreadie, Gerardo Aragon-Camarasa, Paul Henderson
[pdf ]
Adaptive Parametric Activation
Konstantinos P Alexandridis*, Jiankang Deng, Anh Nguyen, Shan Luo
[pdf ]
Distractor-Free Novel View Synthesis via Exploiting Memorization Effect in Optimization
Yukun Wang*, Kunhong Li, Minglin Chen, Longguang Wang, Shunbo Zhou, Kaiwen Xue, Yulan Guo*
[pdf ]
VEGS: View Extrapolation of Urban Scenes in 3D Gaussian Splatting using Learned Priors
Sungwon Hwang, Min-Jung Kim, Taewoong Kang, Jayeon Kang, Jaegul Choo*
[pdf ]
HGL: Hierarchical Geometry Learning for Test-time Adaptation in 3D Point Cloud Segmentation
Tianpei Zou, Sanqing Qu, Zhijun Li, Alois C. Knoll, 何 良华, Guang Chen*, Changjun Jiang
[pdf ]
SWinGS: Sliding Windows for Dynamic 3D Gaussian Splatting
Richard Shaw*, Michal Nazarczuk, Jifei Song, Arthur Moreau, Sibi Catley-Chandar, Helisa Dhamo, Eduardo Pérez Pellitero
[pdf ]
Temporal-Mapping Photography for Event Cameras
Yuhan Bao, Lei Sun*, Yuqin Ma, Kaiwei Wang*
[pdf ]
Shape2Scene: 3D Scene Representation Learning Through Pre-training on Shape Data
Tuo Feng, Wenguan Wang, Ruijie Quan, Yi Yang*
[pdf ]
LineFit: A Geometric Approach for Fitting Line Segments in Images
Marion Boyer, David Youssefi, Florent Lafarge*
[pdf ]
Six-Point Method for Multi-Camera Systems with Reduced Solution Space
Banglei Guan, Ji Zhao*, Laurent Kneip
[pdf ]
Mew: Multiplexed Immunofluorescence Image Analysis through an Efficient Multiplex Network
Sukwon Yun, Jie Peng, Alexandro E Trevino, Chanyoung Park, Tianlong Chen*
[pdf ]
Champ: Controllable and Consistent Human Image Animation with 3D Parametric Guidance
Shenhao Zhu, Junming Leo Chen, Zuozhuo Dai, Zilong Dong, Yinghui Xu, Xun Cao, Yao Yao, Hao Zhu*, Siyu Zhu*
[pdf ]
AdaDistill: Adaptive Knowledge Distillation for Deep Face Recognition
Fadi Boutros*, Vitomir Struc, Naser Damer
[pdf ]
HERGen: Elevating Radiology Report Generation with Longitudinal Data
Fuying Wang, Shenghui Du, Lequan Yu*
[pdf ]
Labeled Data Selection for Category Discovery
Bingchen Zhao*, Nico Lang, Serge Belongie, Oisin Mac Aodha*
[pdf ]
Dependency-aware Differentiable Neural Architecture Search
Buang Zhang*, Xinle Wu, Hao Miao, Bin Yang, Chenjuan Guo
[pdf ]
WAS: Dataset and Methods for Artistic Text Segmentation
Xudong Xie, Yuzhe Li, Yang Liu, Zhifei Zhang, Zhaowen Wang, Wei Xiong, Xiang Bai*
[pdf ]
CLIFF: Continual Latent Diffusion for Open-Vocabulary Object Detection
Wuyang Li, Xinyu Liu, Jiayi Ma, Yixuan Yuan*
[pdf ]
GMT: Enhancing Generalizable Neural Rendering via Geometry-Driven Multi-Reference Texture Transfer
Youngho Yoon, Hyun-Kurl Jang, Kuk-Jin Yoon*
[pdf ]
Norface: Improving Facial Expression Analysis by Identity Normalization
Hanwei Liu*, Rudong An, Zhimeng Zhang, Bowen Ma, Wei Zhang, Yan Song, Yujing Hu, Chen Wei, Yu Ding*
[pdf ]
Unlocking Attributes' Contribution to Successful Camouflage: A Combined Textual and Visual Analysis Strategy
Hong Zhang, Yixuan Lyu, Qian Yu, Hanyang Liu, Huimin Ma, Yuan Ding, Yifan Yang*
[pdf ]
SNeRV: Spectra-preserving Neural Representation for Video
Jina Kim*, Jihoo Lee*, Jewon Kang*
[pdf ]
COMO: Compact Mapping and Odometry
Eric Dexheimer*, Andrew Davison
[pdf ]
OAT: Object-Level Attention Transformer for Gaze Scanpath Prediction
Yini Fang*, Jingling Yu, Haozheng Zhang, Ralf van der Lans, Bertram E Shi
[pdf ]
SelfSwapper: Self-Supervised Face Swapping via Shape Agnostic Masked AutoEncoder
Jaeseong Lee*, Junha Hyung*, Sohyun Jeong, Jaegul Choo*
[pdf ]
EgoPoseFormer: A Simple Baseline for Stereo Egocentric 3D Human Pose Estimation
Chenhongyi Yang*, Anastasia Tkach, Shreyas Hampali, Linguang Zhang, Elliot J Crowley, Cem Keskin
[pdf ]
An Information Theoretical View for Out-Of-Distribution Detection
Hu Jinjing, Wenrui Liu, Hong Chang*, Bingpeng Ma, Shiguang Shan, Xilin Chen
[pdf ]
DMiT: Deformable Mipmapped Tri-Plane Representation for Dynamic Scenes
Jing-Wen Yang, Jia-Mu Sun, Yong-Liang Yang, Jie Yang, Ying Shan, Yan-Pei Cao, Lin Gao*
[pdf ]
Gated Temporal Diffusion for Stochastic Long-term Dense Anticipation
Olga Zatsarynna*, Emad Bahrami*, Yazan Abu Farha, Gianpiero Francesca, Jürgen Gall*
[pdf ]
Gradient-Aware for Class-Imbalanced Semi-supervised Medical Image Segmentation
Wenbo Qi, Jiafei Wu*, S. C. Chan*
[pdf ]
HowToCaption: Prompting LLMs to Transform Video Annotations at Scale
Nina Shvetsova*, Anna Kukleva, Xudong Hong, Christian Rupprecht, Bernt Schiele, Hilde Kuehne
[pdf ]
LabelDistill: Label-guided Cross-modal Knowledge Distillation for Camera-based 3D Object Detection
Sanmin Kim, Youngseok Kim, Sihwan Hwang, Hyeonjun Jeong, Dongsuk Kum*
[pdf ]
Beyond the Data Imbalance: Employing the Heterogeneous Datasets for Vehicle Maneuver Prediction
Hyeongseok Jeon, Sanmin Kim, Abi Rahman Syamil, Junsoo Kim, Dongsuk Kum*
[pdf ]
On Pretraining Data Diversity for Self-Supervised Learning
Hasan Abed Al Kader Hammoud*, Tuhin Das, Fabio Pizzati*, Philip Torr, Adel Bibi, Bernard Ghanem
[pdf ]
Look Around and Learn: Self-Training Object Detection by Exploration
Gianluca Scarpellini*, Stefano Rosa*, Pietro Morerio, Lorenzo Natale, Alessio Del Bue
[pdf ]
Bayesian Self-Training for Semi-Supervised 3D Segmentation
Ozan Unal*, Christos Sakaridis, Luc Van Gool
[pdf ]
Motion and Structure from Event-based Normal Flow
Zhongyang Ren, Bangyan Liao, Delei Kong, Jinghang Li, Peidong Liu, Laurent Kneip, Guillermo Gallego, Yi Zhou*
[pdf ]
ParCo: Part-Coordinating Text-to-Motion Synthesis
Qiran Zou, Shangyuan Yuan, Shian Du, Yu Wang, Chang Liu, Yi Xu, Jie Chen, Xiangyang Ji*
[pdf ]
Learning to Complement and to Defer to Multiple Users
Zheng Zhang, Wenjie Ai, Kevin Wells, David M Rosewarne, Thanh-Toan Do, Gustavo Carneiro*
[pdf ]
Tiny Models are the Computational Saver for Large Models
Qingyuan Wang*, Barry Cardiff, Antoine Frappé, Benoit Larras, Deepu John*
[pdf ]
DragVideo: Interactive Drag-style Video Editing
Yufan Deng, Ruida Wang, Yuhao Zhang, Yu-Wing Tai*, Chi-Keung Tang*
[pdf ]
Multi-Sentence Grounding for Long-term Instructional Video
Zeqian Li, Qirui Chen, Tengda Han, Ya Zhang, Yan-Feng Wang, Weidi Xie*
[pdf ]
Do Generalised Classifiers really work on Human Drawn Sketches?
Hmrishav Bandyopadhyay*, Pinaki Nath Chowdhury, Aneeshan Sain, Subhadeep Koley, Tao Xiang, Ayan Kumar Bhunia, Yi-Zhe Song
[pdf ]
KMTalk: Speech-Driven 3D Facial Animation with Key Motion Embedding
Zhihao Xu, Shengjie Gong, Jiapeng Tang, Lingyu Liang, Yining Huang, Haojie Li, Shuangping Huang*
[pdf ]
Head360: Learning a Parametric 3D Full-Head for Free-View Synthesis in 360°
Yuxiao He, Yiyu Zhuang, Yanwen Wang, Yao Yao, Siyu Zhu, Xiaoyu Li, Qi Zhang, Xun Cao, Hao Zhu*
[pdf ]
MotionDirector: Motion Customization of Text-to-Video Diffusion Models
Rui Zhao, Yuchao Gu, Jay Zhangjie Wu, David Junhao Zhang, Jia-Wei Liu, Weijia Wu, Jussi Keppo, Mike Zheng Shou*
[pdf ]
Text2LiDAR: Text-guided LiDAR Point Clouds Generation via Equirectangular Transformer
Yang Wu*, Kaihua Zhang, Jianjun Qian, Jin Xie*, Jian Yang
[pdf ]
Enhanced Motion Forecasting with Visual Relation Reasoning
Sungjune Kim, Hadam Baek, Seunggwan Lee, Hyung-gun Chi, Hyerin Lim, Jinkyu Kim*, Sangpil Kim*
[pdf ]
Rate-Distortion-Cognition Controllable Versatile Neural Image Compression
Jinming Liu*, Ruoyu Feng, Yunpeng Qi, Qiuyu Chen, Zhibo Chen, Wenjun Zeng, Xin Jin
[pdf ]
Temporal As a Plugin: Unsupervised Video Denoising with Pre-Trained Image Denoisers
Zixuan Fu*, Lanqing Guo, Chong Wang, Yufei Wang, Zhihao Li, Bihan Wen
[pdf ]
LiDAR-based All-weather 3D Object Detection via Prompting and Distilling 4D Radar
Yujeong Chae, Hyeonseong Kim, Changgyoon Oh, Minseok Kim, Kuk-Jin Yoon*
[pdf ]
MM-SafetyBench: A Benchmark for Safety Evaluation of Multimodal Large Language Models
Xin Liu*, Yichen Zhu, Jindong Gu, Yunshi Lan, Chao Yang, Yu Qiao
[pdf ]
Post-training Quantization with Progressive Calibration and Activation Relaxing for Text-to-Image Diffusion Models
Siao Tang, Xin Wang*, Hong Chen, Chaoyu Guan, Zewen Wu, Yansong Tang, Wenwu Zhu*
[pdf ]
Scene Coordinate Reconstruction: Posing of Image Collections via Incremental Learning of a Relocalizer
Eric Brachmann*, Jamie Wynn, Shuai Chen, Tommaso Cavallari, Aron Monszpart, Daniyar Turmukhambetov, Victor Adrian Prisacariu
[pdf ]
Diffusion Models are Geometry Critics: Single Image 3D Editing Using Pre-Trained Diffusion Priors
Ruicheng Wang*, Jianfeng Xiang, Jiaolong Yang, Xin Tong
[pdf ]
Weakly Supervised Co-training with Swapping Assignments for Semantic Segmentation
Xinyu Yang*, Hossein Rahmani, Dame S Black, Bryan M Williams
[pdf ]
StoryImager: A Unified and Efficient Framework for Coherent Story Visualization and Completion
Ming Tao*, Bingkun Bao*, Hao Tang, Yaowei Wang, Changsheng Xu
[pdf ]
ST-LLM: Large Language Models Are Effective Temporal Learners
Ruyang Liu, Chen Li, Haoran Tang, Yixiao Ge, Ying Shan, Ge Li*
[pdf ]
Exact Diffusion Inversion via Bidirectional Integration Approximation
Guoqiang Zhang*, J.P. Lewis, W. Bastiaan Kleijn
[pdf ]
Textual Query-Driven Mask Transformer for Domain Generalized Segmentation
Byeonghyun Pak, Byeongju Woo, Sunghwan Kim, Dae-hwan Kim, Hoseong Kim*
[pdf ]
EmoTalk3D: High-Fidelity Free-View Synthesis of Emotional 3D Talking Head
Qianyun He, Xinya Ji, Yicheng Gong, Yuanxun Lu, Zhengyu Diao, Linjia Huang, Yao Yao, Siyu Zhu, Zhan Ma, Songcen Xu, Xiaofei Wu, Zixiao Zhang, Xun Cao, Hao Zhu*
[pdf ]
Arbitrary-Scale Video Super-Resolution with Structural and Textural Priors
Wei Shang*, Dongwei Ren*, Wanying Zhang, Yuming Fang, Wangmeng Zuo, Kede Ma
[pdf ]
Object-Centric Diffusion for Efficient Video Editing
Kumara Kahatapitiya*, Adil Karjauv, Davide Abati*, Fatih Porikli, Yuki M Asano, Amirhossein Habibian
[pdf ]
Single-Mask Inpainting for Voxel-based Neural Radiance Fields
Jiafu Chen*, Tianyi Chu, Jiakai Sun, Wei Xing, Lei Zhao
[pdf ]
McGrids: Monte Carlo-Driven Adaptive Grids for Iso-Surface Extraction
Daxuan Ren*, Hezi Shi, Jianmin Zheng, Jianfei Cai
[pdf ]
Freeview Sketching: View-Aware Fine-Grained Sketch-Based Image Retrieval
Aneeshan Sain*, Pinaki Nath Chowdhury, Subhadeep Koley, Ayan Kumar Bhunia, Yi-Zhe Song
[pdf ]
Adapt2Reward: Adapting Video-Language Models to Generalizable Robotic Rewards via Failure Prompts
Yanting Yang, Minghao Chen*, Qibo Qiu, Jiahao Wu, Wenxiao Wang, Binbin Lin, Ziyu Guan, Xiaofei He
[pdf ]
Diffusion for Natural Image Matting
Yihan Hu*, Yiheng Lin, Wei Wang, Yao Zhao, Yunchao Wei*, Humphrey Shi
[pdf ]
Agglomerative Token Clustering
Joakim Bruslund Haurum*, Sergio Escalera, Graham W. Taylor*, Thomas B. Moeslund
[pdf ]
CMD: A Cross Mechanism Domain Adaptation Dataset for 3D Object Detection
Jinhao Deng, Wei Ye, Hai Wu, Qiming Xia, Xun Huang, Xin Li, Jin Fang, Wei Li*, Chenglu Wen*, Cheng Wang
[pdf ]
Unleashing Text-to-Image Diffusion Prior for Zero-Shot Image Captioning
Jianjie Luo, Jingwen Chen, Yehao Li, Yingwei Pan*, Jianlin Feng, Hongyang Chao, Ting Yao
[pdf ]
ClusteringSDF: Self-Organized Neural Implicit Surfaces for 3D Decomposition
Tianhao Wu*, Chuanxia Zheng, Qianyi Wu, Tat-Jen Cham
[pdf ]
NAMER: Non-Autoregressive Modeling for Handwritten Mathematical Expression Recognition
Chenyu Liu, Jia Pan, Jinshui Hu, Baocai Yin, Bing Yin, Mingjun Chen, Cong Liu, Jun Du*, Qingfeng Liu
[pdf ]
GIVT: Generative Infinite-Vocabulary Transformers
Michael Tschannen*, Cian Eastwood, Fabian Mentzer
[pdf ]
Mismatch Quest: Visual and Textual Feedback for Image-Text Misalignment
Brian Gordon*, Yonatan Bitton*, Yonatan Shafir, Roopal Garg, Xi Chen, Dani Lischinski, Daniel Cohen-Or, Idan Szpektor
[pdf ]
Regulating Model Reliance on Non-Robust Features by Smoothing Input Marginal Density
Peiyu Yang*, Naveed Akhtar, Mubarak Shah, Ajmal Mian
[pdf ]
Multi-Modal Video Dialog State Tracking in the Wild
Adnen Abdessaied*, Lei Shi, Andreas Bulling
[pdf ]
Factorized Diffusion: Perceptual Illusions by Noise Decomposition
Daniel Geng*, Inbum Park, Andrew Owens
[pdf ]
To Generate or Not? Safety-Driven Unlearned Diffusion Models Are Still Easy To Generate Unsafe Images ... For Now
Yimeng Zhang*, Jinghan Jia, Xin Chen, Aochuan Chen, Yihua Zhang, Jiancheng Liu, Ke Ding, Sijia Liu
[pdf ]
Dissecting Dissonance: Benchmarking Large Multimodal Models Against Self-Contradictory Instructions
Jin Gao, Lei Gan, Yuankai Li, Yixin Ye, Dequan Wang*
[pdf ]
StereoGlue: Joint Feature Matching and Robust Estimation
Daniel Barath*, Dmytro Mishkin, Luca Cavalli, Paul-Edouard Sarlin, Petr Hruby, Marc Pollefeys
[pdf ]
Boosting Transferability in Vision-Language Attacks via Diversification along the Intersection Region of Adversarial Trajectory
Sensen Gao, Xiaojun Jia*, Xuhong Ren, Ivor Tsang, Qing Guo*
[pdf ]
Leveraging Enhanced Queries of Point Sets for Vectorized Map Construction
Zihao Liu, Xiaoyu Zhang, Guangwei Liu, Ji Zhao*, Ningyi Xu*
[pdf ]
Robust Zero-Shot Crowd Counting and Localization with Adaptive Resolution SAM
Jia Wan*, Qiangqiang Wu, Wei Lin, Antoni Chan
[pdf ]
AWOL: Analysis WithOut synthesis using Language
Silvia Zuffi*, Michael J. Black
[pdf ]
OneVOS: Unifying Video Object Segmentation with All-in-One Transformer Framework
Wanyun Li, Pinxue Guo, Xinyu Zhou, Lingyi Hong, Yangji He, Xiangyu Zheng, Wei Zhang*, Wenqiang Zhang*
[pdf ]
M3DBench: Towards Omni 3D Assistant with Interleaved Multi-modal Instructions
Mingsheng Li, Xin Chen, Chi Zhang, Sijin Chen, Hongyuan Zhu, Fukun Yin, Zhuoyuan Li, Gang Yu, Tao Chen*
[pdf ]
MSD: A Benchmark Dataset for Floor Plan Generation of Building Complexes
Casper van Engelenburg*, Fatemeh Mostafavi, Emanuel Kuhn, Yuntae Jeon, Michael Franzen, Matthias Standfest, Jan van Gemert, Seyran Khademi
[pdf ]
End-to-End Rate-Distortion Optimized 3D Gaussian Representation
Henan Wang*, Hanxin Zhu, Tianyu He, Runsen Feng, Jiajun Deng, Jiang Bian, Zhibo Chen
[pdf ]
Temporal Residual Jacobians for Rig-free Motion Transfer
Sanjeev Muralikrishnan*, Niladri Shekhar Dutt, Siddhartha Chaudhuri, Noam Aigerman, Vladimir Kim, Matthew Fisher, Niloy Mitra
[pdf ]
LetsMap: Unsupervised Representation Learning for Label-Efficient Semantic BEV Mapping
Nikhil Gosala*, Kürsat Petek, B Ravi Kiran, Senthil Yogamani, Paulo L. J. Drews-Jr, Wolfram Burgard, Abhinav Valada
[pdf ]
Deblurring 3D Gaussian Splatting
Byeonghyeon Lee*, Howoong Lee, Xiangyu Sun, Usman Ali, Eunbyung Park*
[pdf ]
Taming Lookup Tables for Efficient Image Retouching
Sidi Yang, Binxiao Huang, Mingdeng Cao, Yatai Ji, Hanzhong Guo, Ngai Wong, Yujiu Yang*
[pdf ]
DualDn: Dual-domain Denoising via Differentiable ISP
Ruikang Li, Yujin Wang*, Shiqi Chen, Fan Zhang, Jinwei Gu, Tianfan Xue
[pdf ]
Quantization-Friendly Winograd Transformations for Convolutional Neural Networks
Vladimir Protsenko*, Vladimir Kryzhanovskiy, Alexander Filippov
[pdf ]
A Task is Worth One Word: Learning with Task Prompts for High-Quality Versatile Image Inpainting
Junhao Zhuang, Yanhong Zeng, Wenran Liu, Chun Yuan*, Kai Chen*
[pdf ]
Self-supervised Shape Completion via Involution and Implicit Correspondences
Mengya Liu*, Ajad Chhatkuli, Janis Postels, Luc Van Gool, Federico Tombari
[pdf ]
From Fake to Real: Pretraining on Balanced Synthetic Images to Prevent Spurious Correlations in Image Recognition
Maan Qraitem*, Kate Saenko, Bryan A. Plummer
[pdf ]
Cross-Domain Few-Shot Object Detection via Enhanced Open-Set Object Detector
Yuqian Fu*, Yu Wang, Yixuan Pan, Xingyu Qiu, Lian Huai, Zeyu Shangguan, Tong Liu, Yanwei Fu, Luc Van Gool, Xingqun Jiang
[pdf ]
NICP: Neural ICP for 3D Human Registration at Scale
Riccardo Marin*, Enric Corona, Gerard Pons-Moll
[pdf ]
PredBench: Benchmarking Spatio-Temporal Prediction across Diverse Disciplines
ZiDong Wang*, Zeyu Lu*, Di Huang*, Tong He, Xihui Liu, Wanli Ouyang, Lei Bai*
[pdf ]
FontStudio: Shape-Adaptive Diffusion Model for Coherent and Consistent Font Effect Generation
Xinzhi Mu*, Li Chen, Bohan Chen, Shuyang Gu, Jianmin Bao, Dong Chen, Ji Li, Yuhui Yuan
[pdf ]
Chronologically Accurate Retrieval for Temporal Grounding of Motion-Language Models
Kent Fujiwara*, Mikihiro Tanaka, Qing Yu
[pdf ]
StableDrag: Stable Dragging for Point-based Image Editing
Yutao Cui, Xiaotong Zhao, Guozhen Zhang, Shengming Cao, Kai Ma, Limin Wang*
[pdf ]
Improving Feature Stability during Upsampling -- Spectral Artifacts and the Importance of Spatial Context
Shashank Agnihotri*, Julia Grabinski, Margret Keuper
[pdf ]
Dynamic Data Selection for Efficient SSL via Coarse-to-Fine Refinement
Aditay Tripathi*, Pradeep Shenoy, Anirban Chakraborty
[pdf ]
Neural Surface Detection for Unsigned Distance Fields
Federico Stella*, Nicolas Talabot, Hieu Le, Pascal Fua
[pdf ]
One-Shot Diffusion Mimicker for Handwritten Text Generation
Gang Dai, Yifan Zhang, Quhui Ke, Qiangya Guo, Shuangping Huang*
[pdf ]
Event-Based Motion Magnification
Yutian Chen, Shi Guo*, Yu Fangzheng, Feng Zhang, Jinwei Gu, Tianfan Xue
[pdf ]
Improving Neural Surface Reconstruction with Feature Priors from Multi-View Images
Xinlin Ren*, Chenjie Cao, Yanwei Fu*, Xiangyang Xue
[pdf ]
Towards Multimodal Sentiment Analysis Debiasing via Bias Purification
Dingkang Yang, Mingcheng Li, Dongling Xiao, Yang Liu, Kun Yang, Zhaoyu Chen, Yuzheng Wang, Peng Zhai*, Ke Li, Lihua Zhang*
[pdf ]
Kernel Diffusion: An Alternate Approach to Blind Deconvolution
Yash Sanghvi*, Yiheng Chi, Stanley Chan
[pdf ]
MUSES: The Multi-Sensor Semantic Perception Dataset for Driving under Uncertainty
Tim Broedermann*, David Brüggemann, Christos Sakaridis, Kevin Ta, Odysseas Liagouris, Jason Corkill, Luc Van Gool
[pdf ]
Discovering Novel Actions from Open World Egocentric Videos with Object-Grounded Visual Commonsense Reasoning
Sanjoy Kundu, Shubham Trehan, Sathyanarayanan N Aakur*
[pdf ]
Bidirectional Progressive Transformer for Interaction Intention Anticipation
Zichen Zhang*, Hongchen Luo, Wei Zhai*, Yu Kang, Yang Cao
[pdf ]
Reinforcement Learning Meets Visual Odometry
Nico Messikommer*, Giovanni Cioffi, Mathias Gehrig, Davide Scaramuzza
[pdf ]
Bucketed Ranking-based Losses for Efficient Training of Object Detectors
Feyza Yavuz*, Baris Can Cam, Adnan Harun Dogan, Kemal Oksuz, Emre Akbas, Sinan Kalkan
[pdf ]
Robustness Tokens: Towards Adversarial Robustness of Transformers
Brian Pulfer*, Yury Belousov, Slava Voloshynovskiy
[pdf ]
RSL-BA: Rolling Shutter Line Bundle Adjustment
Yongcong Zhang, Bangyan Liao, Yifei Xue, Lu Chen, Peidong Liu, Yizhen Lao*
[pdf ]
DecentNeRFs: Decentralized Neural Radiance Fields from Crowdsourced Images
Zaid Tasneem*, Akshat Dave, Abhishek Singh, Kushagra Tiwary, Praneeth Vepakomma, Ashok Veeraraghavan, Ramesh Raskar
[pdf ]
DreamMesh: Jointly Manipulating and Texturing Triangle Meshes for Text-to-3D Generation
Haibo Yang, Yang Chen, Yingwei Pan*, Ting Yao, Zhineng Chen, Zuxuan Wu, Yu-Gang Jiang, Tao Mei
[pdf ]
Unveiling Typographic Deceptions: Insights of the Typographic Vulnerability in Large Vision-Language Models
Hao Cheng, Erjia Xiao, Jindong Gu, Le Yang, Jinhao Duan, Jize Zhang, Jiahang Cao, Kaidi Xu, Renjing Xu*
[pdf ]
N2F2: Hierarchical Scene Understanding with Nested Neural Feature Fields
Yash Bhalgat*, Iro Laina, Joao F Henriques, Andrew Zisserman, Andrea Vedaldi
[pdf ]
ConceptExpress: Harnessing Diffusion Models for Single-image Unsupervised Concept Extraction
Shaozhe Hao*, Kai Han*, Zhengyao Lv, Shihao Zhao, Kwan-Yee K. Wong*
[pdf ]
PairingNet: A Learning-based Pair-searching and -matching Network for Image Fragments
Rixin Zhou*, Ding Xia, Yi Zhang, Honglin Pang, Xi Yang, Chuntao Li
[pdf ]
Skeleton-based Group Activity Recognition via Spatial-Temporal Panoramic Graph
Zhengcen Li, Xinle Chang, Yueran Li, Jingyong Su*
[pdf ]
Towards Multimodal Open-Set Domain Generalization and Adaptation through Self-supervision
Hao Dong*, Eleni Chatzi*, Olga Fink*
[pdf ]
ReCON: Training-Free Acceleration for Text-to-Image Synthesis with Retrieval of Concept Prompt Trajectories
Chen-Yi Lu*, Shubham Agarwal, Md Mehrab Tanjim, Kanak Mahadik, Anup Rao, Subrata Mitra, Shiv K Saini, Saurabh Bagchi, Somali Chaterji
[pdf ]
AMES: Asymmetric and Memory-Efficient Similarity Estimation for Instance-level Retrieval
Pavel Suma*, Giorgos Kordopatis-Zilos, Ahmet Iscen, Giorgos Tolias
[pdf ]
TCAN: Animating Human Images with Temporally Consistent Pose Guidance using Diffusion Models
Jeongho Kim*, Min-Jung Kim*, Junsoo Lee, Jaegul Choo*
[pdf ]
3D Hand Sequence Recovery from Real Blurry Images and Event Stream
JoonKyu Park, Gyeongsik Moon, Weipeng Xu, Evan Kaseman, Takaaki Shiratori, Kyoung Mu Lee*
[pdf ]
GlobalPointer: Large-Scale Plane Adjustment with Bi-Convex Relaxation
Bangyan Liao, Zhenjun Zhao, Lu Chen, Haoang Li, Daniel Cremers, Peidong Liu*
[pdf ]
Dissolving Is Amplifying: Towards Fine-Grained Anomaly Detection
Jian Shi*, Pengyi Zhang, Ni Zhang, Hakim Ghazzai, Peter Wonka
[pdf ]
StyleCity: Large-Scale 3D Urban Scenes Stylization
Yingshu Chen, Huajian Huang*, Tuan-Anh Vu, Ka Chun Shum, Sai-Kit Yeung
[pdf ]
ViG-Bias: Visually Grounded Bias Discovery and Mitigation
Badr-Eddine Marani*, Mohamed Hanini, Nihitha Malayarukil, Stergios Christodoulidis, Maria Vakalopoulou, Enzo Ferrante
[pdf ]
DiffBIR: Toward Blind Image Restoration with Generative Diffusion Prior
Xinqi Lin*, Jingwen He, Ziyan Chen, Zhaoyang Lyu, Bo Dai, Fanghua Yu, Yu Qiao, Wanli Ouyang, Chao Dong*
[pdf ]
Assessing Sample Quality via the Latent Space of Generative Models
Jingyi Xu*, Hieu Le, Dimitris Samaras
[pdf ]
Relightable Neural Actor with Intrinsic Decomposition and Pose Control
Diogo Carbonera Luvizon*, Vladislav Golyanik, Adam Kortylewski, Marc Habermann, Christian Theobalt
[pdf ]
Sur^2f: A Hybrid Representation for High-Quality and Efficient Surface Reconstruction from Multi-view Images
Zhangjin Huang*, Zhihao Liang, Kui Jia*
[pdf ]
HO-Gaussian: Hybrid Optimization of 3D Gaussian Splatting for Urban Scenes
Zhuopeng Li*, Yilin Zhang, Chenming Wu, Jianke Zhu*, Liangjun Zhang
[pdf ]
Pseudo-keypoint RKHS Learning for Self-supervised 6DoF Pose Estimation
Yangzheng Wu*, Michael Alan Greenspan
[pdf ]
Consistent 3D Line Mapping
Xulong Bai, Hainan Cui*, Shuhan Shen*
[pdf ]
Distributed Active Client Selection With Noisy Clients Using Model Association Scores
Kwang In Kim*
[pdf ]
PixOOD: Pixel-Level Out-of-Distribution Detection
Tomas Vojir*, Jan Sochman, Jiri Matas
[pdf ]
GarmentCodeData: A Dataset of 3D Made-to-Measure Garments With Sewing Patterns
Maria Korosteleva*, Timur Levent Kesdogan, Fabian Kemper, Stephan Wenninger, Jasmin Koller, Yuhan Zhang, Mario Botsch, Olga Sorkine-Hornung
[pdf ]
Towards a Density Preserving Objective Function for Learning on Point Sets
Haritha Jayasinghe*, Ioannis Brilakis
[pdf ]
AnatoMask: Enhancing Medical Image Segmentation with Reconstruction-guided Self-masking
Yuheng Li, Tianyu Luan, Yizhou Wu, Shaoyan Pan, Yenho Chen, Xiaofeng Yang*
[pdf ]
VF-NeRF: Viewshed Fields for Rigid NeRF Registration
Leo Segre*, Shai Avidan
[pdf ]
Task-Driven Uncertainty Quantification in Inverse Problems via Conformal Prediction
Jeffrey Wen*, Rizwan Ahmad, Phillip Schniter
[pdf ]
Trainable Highly-expressive Activation Functions
Irit Chelly*, Shahaf E. Finder, Shira Ifergane, Oren Freifeld
[pdf ]
Region-Aware Sequence-to-Sequence Learning for Hyperspectral Denoising
JiaHua Xiao, Yang Liu, Xing Wei*
[pdf ]
Self-Supervised Representation Learning for Adversarial Attack Detection
Yi Li*, Plamen Angelov, Neeraj Suri
[pdf ]
Do text-free diffusion models learn discriminative visual representations?
Soumik Mukhopadhyay*, Matthew A Gwilliam*, Yosuke Yamaguchi, Vatsal Agarwal, Namitha Padmanabhan, Archana Swaminathan, Tianyi Zhou, Jun Ohya, Abhinav Shrivastava
[pdf ]
Clean & Compact: Efficient Data-Free Backdoor Defense with Model Compactness
Huy Phan*, Jinqi Xiao, Yang Sui, Tianfang Zhang, Zijie Tang, Cong Shi, Yan Wang, Yingying Chen, Bo Yuan
[pdf ]
DOCCI: Descriptions of Connected and Contrasting Images
Yasumasa Onoe*, Sunayana Rane, Zachary E Berger, Yonatan Bitton, Jaemin Cho, Roopal Garg, Alexander Ku, Zarana Parekh, Jordi Pont-Tuset, Garrett Tanzer, Su Wang, Jason M Baldridge
[pdf ]
EAS-SNN: End-to-End Adaptive Sampling and Representation for Event-based Detection with Recurrent Spiking Neural Networks
Ziming Wang, Ziling Wang, Huaning Li, Lang Qin, Runhao Jiang, De Ma*, Huajin Tang*
[pdf ]
AttentionHand: Text-driven Controllable Hand Image Generation for 3D Hand Reconstruction in the Wild
Junho Park, Kyeongbo Kong, Suk-Ju Kang*
[pdf ]
Dataset Quantization with Active Learning based Adaptive Sampling
Zhenghao Zhao*, Yuzhang Shang, Junyi Wu, Yan Yan
[pdf ]
LogoSticker: Inserting Logos into Diffusion Models for Customized Generation
Mingkang Zhu, Xi Chen, Zhongdao Wang, Hengshuang Zhao*, Jiaya Jia*
[pdf ]
LEROjD: Lidar Extended Radar-Only Object Detection
Patrick Palmer*, Martin Krüger, Stefan Schütte, Richard Altendorfer, Ganesh Adam, Torsten Bertram
[pdf ]
ProCreate, Don't Reproduce! Propulsive Energy Diffusion for Creative Generation
Jack Lu*, Ryan Teehan*, Mengye Ren*
[pdf ]
Match-Stereo-Videos: Bidirectional Alignment for Consistent Dynamic Stereo Matching
Junpeng Jing*, Ye Mao, Krystian Mikolajczyk*
[pdf ]
Probabilistic Image-Driven Traffic Modeling via Remote Sensing
Scott Workman*, Armin Hadzic
[pdf ]
IntrinsicAnything: Learning Diffusion Priors for Inverse Rendering Under Unknown Illumination
Xi Chen*, Sida Peng, Dongchen Yang, Yuan Liu, Bowen Pan, Chengfei Lyu, Xiaowei Zhou*
[pdf ]
VideoStudio: Generating Consistent-Content and Multi-Scene Videos
Fuchen Long, Zhaofan Qiu*, Ting Yao, Tao Mei
[pdf ]
Semantic Residual Prompts for Continual Learning
Martin Menabue*, Emanuele Frascaroli, Matteo Boschini, Enver Sangineto, Lorenzo Bonicelli, Angelo Porrello*, Simone Calderara
[pdf ]
TransCAD: A Hierarchical Transformer for CAD Sequence Inference from Point Clouds
Elona Dupont*, Kseniya Cherenkova, Dimitrios Mallis, Gleb A Gusev, Anis Kacem, Djamila Aouada
[pdf ]
ViGoR: Improving Visual Grounding of Large Vision Language Models with Fine-Grained Reward Modeling
Siming Yan*, Min Bai, Weifeng Chen, Xiong Zhou, Qixing Huang, Li Erran Li
[pdf ]
Mixture of Efficient Diffusion Experts Through Automatic Interval and Sub-Network Selection
Alireza Ganjdanesh*, Yan Kang, Yuchen Liu, Richard Zhang, Zhe Lin, Heng Huang
[pdf ]
Occupancy as Set of Points
Yiang Shi, Tianheng Cheng, Qian Zhang, Wenyu Liu, Xinggang Wang*
[pdf ]
UAV First-Person Viewers Are Radiance Field Learners
Liqi Yan*, Qifan Wang, Junhan Zhao, Qiang Guan, Zheng Tang, Jianhui Zhang, Dongfang Liu*
[pdf ]
Rethinking Few-shot Class-incremental Learning: Learning from Yourself
Yu-Ming Tang, Yi-Xing Peng, Jingke Meng*, Wei-Shi Zheng
[pdf ]
ProSub: Probabilistic Open-Set Semi-Supervised Learning with Subspace-Based Out-of-Distribution Detection
Erik Wallin*, Lennart Svensson, Fredrik Kahl, Lars Hammarstrand
[pdf ]
A Fair Ranking and New Model for Panoptic Scene Graph Generation
Julian Lorenz*, Alexander Pest, Daniel Kienzle, Katja Ludwig, Rainer Lienhart
[pdf ]
Pick-a-back: Selective Device-to-Device Knowledge Transfer in Federated Continual Learning
HyungJune Lee*, JinYi Yoon
[pdf ]
Compensation Sampling for Improved Convergence in Diffusion Models
Hui Lu*, Albert Ali Salah, Ronald Poppe
[pdf ]
Situated Instruction Following
So Yeon Min*, Xavier Puig, Devendra Singh Chaplot, Tsung-Yen Yang, Priyam Parashar, Akshara Rai, Ruslan Salakhutdinov, Yonatan Bisk, Roozbeh Mottaghi
[pdf ]
Holodepth: Programmable Depth-Varying Projection via Computer-Generated Holography
Dorian Chan*, Matthew O'Toole, Sizhuo Ma, Jian Wang*
[pdf ]
SceneScript: Reconstructing Scenes With An Autoregressive Structured Language Model
Armen Avetisyan*, Christopher Xie, Henry Howard-Jenkins, Tsun-Yi Yang, Samir Aroudj, Suvam Patra, Fuyang Zhang, Luke Holland, Duncan Frost, Campbell Orme, Jakob Engel, Edward Miller, Richard Newcombe, Vasileios Balntas
[pdf ]
GalLop: Learning global and local prompts for vision-language models
Marc Lafon*, Elias Ramzi*, Clément Rambour, Nicolas Audebert, Nicolas Thome
[pdf ]
Depth on Demand: Streaming Dense Depth from a Low Frame Rate Active Sensor
Andrea Conti*, Matteo Poggi, Valerio Cambareri, Stefano Mattoccia
[pdf ]
Lossy Image Compression with Foundation Diffusion Models
Lucas Relic*, Roberto Azevedo, Markus Gross, Christopher Schroers*
[pdf ]
CLIP-DINOiser: Teaching CLIP a few DINO tricks for open-vocabulary semantic segmentation
Monika Wysoczańska*, Oriane Siméoni, Michaël Ramamonjisoa, Andrei Bursuc, Tomasz Trzciński, Patrick Pérez
[pdf ]
FMBoost: Boosting Latent Diffusion with Flow Matching
Johannes S Fischer*, Ming Gui, Pingchuan Ma, Nick Stracke, Stefan Andreas Baumann, Vincent Tao Hu, Björn Ommer
[pdf ]
COMPOSE: Comprehensive Portrait Shadow Editing
Andrew Z Hou*, Zhixin Shu, Xuaner Zhang, He Zhang, Yannick Hold-Geoffroy, Jae Shin Yoon, Xiaoming Liu
[pdf ]
LNL+K: Enhancing Learning with Noisy Labels Through Noise Source Knowledge Integration
Siqi Wang*, Bryan Plummer
[pdf ]
Diffusion Models as Data Mining Tools
Ioannis Siglidis*, Aleksander Holynski, Alexei A. Efros, Mathieu Aubry, Shiry Ginosar
[pdf ]
Graph Neural Network Causal Explanation via Neural Causal Models
Arman Behnam*, Binghui Wang
[pdf ]
Unsupervised, Online and On-The-Fly Anomaly Detection For Non-Stationary Image Distributions
Declan GD McIntosh*, Alexandra Branzan Albu
[pdf ]
Photorealistic Object Insertion with Diffusion-Guided Inverse Rendering
Ruofan Liang, Zan Gojcic, Merlin Nimier-David, David Acuna, Nandita Vijaykumar, Sanja Fidler, Zian Wang*
[pdf ]
GAReT: Cross-view Video Geolocalization with Adapters and Auto-Regressive Transformers
Manu S Pillai*, Mamshad Nayeem Rizve, Mubarak Shah
[pdf ]
SAMFusion: Sensor-Adaptive Multimodal Fusion for 3D Object Detection in Adverse Weather
Edoardo Palladin*, Roland Dietze*, Praveen Narayanan, Mario Bijelic, Felix Heide
[pdf ]
Generating Physically Realistic and Directable Human Motions from Multi-Modal Inputs
Aayam Shrestha, Pan Liu*, German Ros, Kai Yuan*, Alan Fern
[pdf ]
CoTracker: It is Better to Track Together
Nikita Karaev*, Ignacio Rocco, Ben Graham, Natalia Neverova, Andrea Vedaldi, Christian Rupprecht
[pdf ]
SPHINX: A Mixer of Weights, Visual Embeddings and Image Scales for Multi-modal Large Language Models
Ziyi Lin, Dongyang Liu, Renrui Zhang, Peng Gao*, Longtian Qiu, Han Xiao, Han Qiu, Wenqi Shao, Keqin Chen, Jiaming Han, Siyuan Huang, Yichi Zhang, Xuming He, Yu Qiao*, Hongsheng Li*
[pdf ]
PathMMU: A Massive Multimodal Expert-Level Benchmark for Understanding and Reasoning in Pathology
Yuxuan Sun*, Hao Wu, Chenglu Zhu, Sunyi Zheng, Qizi Chen, Kai Zhang, Yunlong Zhang, Dan Wan, Xiaoxiao Lan, Mengyue Zheng, Jingxiong Li, Xinheng Lyu, Tao Lin*, Lin Yang*
[pdf ]
Improving Adversarial Transferability via Model Alignment
Avery Ma*, Amir-massoud Farahmand, Yangchen Pan, Philip Torr, Jindong Gu
[pdf ]
RealGen: Retrieval Augmented Generation for Controllable Traffic Scenarios
Wenhao Ding*, Yulong Cao, Ding Zhao, Chaowei Xiao, Marco Pavone
[pdf ]
ADen: Adaptive Density Representations for Sparse-view Camera Pose Estimation
Hao Tang, Weiyao Wang, Pierre Gleize, Matt Feiszli*
[pdf ]
Embodied Understanding of Driving Scenarios
Yunsong Zhou*, Linyan Huang, Qingwen Bu, Jia Zeng, Tianyu Li, Hang Qiu, Hongzi Zhu, Minyi Guo, Yu Qiao, Hongyang Li
[pdf ]
Learning to Drive via Asymmetric Self-Play
Chris Zhang*, Sourav Biswas, Kelvin Wong, Kion Fallah, Lunjun Zhang, Dian Chen, Sergio Casas, Raquel Urtasun
[pdf ]
OpenIns3D: Snap and Lookup for 3D Open-vocabulary Instance Segmentation
Zhening Huang, Xiaoyang Wu, Xi Chen, Hengshuang Zhao*, Lei Zhu, Joan Lasenby*
[pdf ]
ViLA: Efficient Video-Language Alignment for Video Question Answering
Xijun Wang*, Junbang Liang, Chun-Kai Wang, Kenan Deng, Yu Lou, Ming C Lin, Shan Yang
[pdf ]
Factorizing Text-to-Video Generation by Explicit Image Conditioning
Rohit Girdhar*, Mannat Singh, Andrew Brown, Quentin Duval, Samaneh Azadi, Sai Saketh Rambhatla, Mian Akbar Shah, Xi Yin, Devi Parikh, Ishan Misra
[pdf ]
MobileDiffusion: Instant Text-to-Image Generation on Mobile Devices
Yang Zhao*, Zhisheng Xiao*, Yanwu Xu, Haolin Jia, Tingbo Hou
[pdf ]
Open-Set Biometrics: Beyond Good Closed-Set Models
Yiyang Su, Minchul Kim, Feng Liu, Anil Jain, Xiaoming Liu*
[pdf ]
UNIT: Backdoor Mitigation via Automated Neural Distribution Tightening
Siyuan Cheng*, Guangyu Shen, Kaiyuan Zhang, Guanhong Tao, Shengwei An, Hanxi Guo, Shiqing Ma, Xiangyu Zhang
[pdf ]
Which Model Generated This Image? A Model-Agnostic Approach for Origin Attribution
Fengyuan Liu, Haochen Luo, Yiming Li, Philip Torr, Jindong Gu*
[pdf ]
Osmosis: RGBD Diffusion Prior for Underwater Image Restoration
Opher Bar Nathan*, Deborah Levy, Tali Treibitz, Dan Rosenbaum
[pdf ]
Towards Adaptive Pseudo-label Learning for Semi-Supervised Temporal Action Localization
Feixiang Zhou, Bryan Williams, Hossein Rahmani*
[pdf ]
Computing the Lipschitz constant needed for fast scene recovery from CASSI measurements
Niels Chr Overgaard*, Anders Holst
[pdf ]
DatasetNeRF: Efficient 3D-aware Data Factory with Generative Radiance Fields
Yu Chi*, Fangneng Zhan, Sibo Wu, Christian Theobalt, Adam Kortylewski
[pdf ]
Flowed Time of Flight Radiance Fields
Mikhail Okunev*, Marc Mapeke, Benjamin Attal, Christian Richardt, Matthew O'Toole, James Tompkin
[pdf ]
3D-GOI: 3D GAN Omni-Inversion for Multifaceted and Multi-object Editing
Haoran Li, Long Ma, Haolin Shi, Yanbin Hao, Yong Liao*, Lechao Cheng, Peng Yuan Zhou*
[pdf ]
Fast Registration of Photorealistic Avatars for VR Facial Animation
Chaitanya Patel*, Shaojie Bai, Te-Li Wang, Jason Saragih, Shih-En Wei
[pdf ]
CoPT: Unsupervised Domain Adaptive Segmentation using Domain-Agnostic Text Embeddings
Cristina Mata*, Kanchana N Ranasinghe, Michael S Ryoo
[pdf ]
HiFi-Score: Fine-grained Image Description Evaluation with Hierarchical Parsing Graphs
Ziwei Yao, Ruiping Wang*, Xilin Chen
[pdf ]
Image-to-Lidar Relational Distillation for Autonomous Driving Data
Anas Mahmoud*, Ali Harakeh, Steven Waslander
[pdf ]
Thinking Outside the BBox: Unconstrained Generative Object Compositing
Gemma Canet Tarrés*, Zhe Lin, Zhifei Zhang, Jianming Zhang, Yizhi Song, Dan Ruta, Andrew Gilbert, John Collomosse, Soo Ye Kim
[pdf ]
Large-scale Reinforcement Learning for Diffusion Models
Yinan Zhang*, Eric Tzeng, Yilun Du, Dmitry Kislyuk*
[pdf ]
CoMusion: Towards Consistent Stochastic Human Motion Prediction via Motion Diffusion
Jiarui Sun*, Girish Chowdhary*
[pdf ]
FedHARM: Harmonizing Model Architectural Diversity in Federated Learning
Anestis Kastellos*, Athanasios Psaltis, Charalampos Z Patrikakis, Petros Daras
[pdf ]
EAGLES: Efficient Accelerated 3D Gaussians with Lightweight EncodingS
Sharath Girish*, Kamal Gupta, Abhinav Shrivastava
[pdf ]
Global Counterfactual Directions
Bartłomiej Sobieski*, Przemyslaw Biecek*
[pdf ]
TCLC-GS: Tightly Coupled LiDAR-Camera Gaussian Splatting for Autonomous Driving
Cheng Zhao*, Su Sun, Ruoyu Wang, Yuliang Guo, Jun-Jun Wan, Zhou Huang, Xinyu Huang, Yingjie Victor Chen, Liu Ren
[pdf ]
RT-Pose: A 4D Radar-Tensor based 3D Human Pose Estimation and Localization Benchmark
Yuan-Hao Ho, Jen-Hao Cheng, Sheng Yao Kuan, Zhongyu Jiang, Wenhao Chai, Hsiang-Wei Huang, Chih-Lung Lin, Jenq-Neng Hwang*
[pdf ]
EditShield: Protecting Unauthorized Image Editing by Instruction-guided Diffusion Models
Ruoxi Chen, Haibo Jin, Yixin Liu, Jinyin Chen*, Haohan Wang, Lichao Sun
[pdf ]
RICA^2: Rubric-Informed, Calibrated Assessment of Actions
Abrar Majeedi, Viswanatha Reddy Gajjala, Satya Sai Srinath Namburi GNVV, Yin Li*
[pdf ]
Region-centric Image-Language Pretraining for Open-Vocabulary Detection
Dahun Kim*, Anelia Angelova, Weicheng Kuo
[pdf ]
Commonly Interesting Images
Fitim Abdullahu*, Helmut Grabner*
[pdf ]
Contrasting Deepfakes Diffusion via Contrastive Learning and Global-Local Similarities
Lorenzo Baraldi*, Federico Cocchi, Marcella Cornia, Lorenzo Baraldi, Alessandro Nicolosi, Rita Cucchiara
[pdf ]
CriSp: Leveraging Tread Depth Maps for Enhanced Crime-Scene Shoeprint Matching
Samia Shafique*, Shu Kong, Charless Fowlkes
[pdf ]
Caltech Aerial RGB-Thermal Dataset in the Wild
Connor Lee*, Matthew Anderson, Nikhil Ranganathan, Xingxing Zuo, Kevin T Do, Georgia Gkioxari, Soon-Jo Chung
[pdf ]
Diffusion Soup: Model Merging for Text-to-Image Diffusion Models
Benjamin J Biggs*, Arjun Seshadri, Yang Zou, Achin Jain, Aditya Golatkar, Yusheng Xie, Alessandro Achille, Ashwin Swaminathan, Stefano Soatto
[pdf ]
Volumetric Rendering with Baked Quadrature Fields
Gopal Sharma*, Daniel Rebain, Kwang Moo Yi, Andrea Tagliasacchi
[pdf ]
CityGuessr: City-Level Video Geo-Localization on a Global Scale
Parth Parag Kulkarni*, Gaurav Kumar Nayak, Mubarak Shah
[pdf ]
Pseudo-Labelling Should Be Aware of Disguising Channel Activations
Changrui Chen, Kurt Debattista, Jungong Han*
[pdf ]
Bayesian Detector Combination for Object Detection with Crowdsourced Annotations
Zhi Qin Tan*, Olga Isupova, Gustavo Carneiro, Xiatian Zhu, Yunpeng Li
[pdf ]
Revising Densification in Gaussian Splatting
Samuel Rota Bulò*, Lorenzo Porzi, Peter Kontschieder
[pdf ]
FlexiEdit: Frequency-Aware Latent Refinement for Enhanced Non-Rigid Editing
Gwanhyeong Koo, Sunjae Yoon, Ji Woo Hong, Chang D. Yoo*
[pdf ]
Smoothness, Synthesis, and Sampling: Re-thinking Unsupervised Multi-View Stereo with DIV Loss
Alex Rich*, Noah Stier, Pradeep Sen, Tobias Hollerer
[pdf ]
Text Motion Translator: A Bi-Directional Model for Enhanced 3D Human Motion Generation from Open-Vocabulary Descriptions
Yijun Qian*, Jack Urbanek, Alexander Hauptmann, Jungdam Won
[pdf ]
UL-VIO: Ultra-lightweight Visual-Inertial Odometry with Noise Robust Test-time Adaptation
Jinho Park*, Se Young Chun, Mingoo Seok
[pdf ]
PolyOculus: Simultaneous Multi-view Image-based Novel View Synthesis
Jason J. Yu*, Tristan Aumentado-Armstrong, Fereshteh Forghani, Konstantinos G. Derpanis, Marcus A. Brubaker
[pdf ]
R3DS: Reality-linked 3D Scenes for Panoramic Scene Understanding
Qirui Wu*, Sonia Raychaudhuri, Daniel Ritchie, Manolis Savva, Angel X Chang
[pdf ]
A Graph-Based Approach for Category-Agnostic Pose Estimation
Or Hirschorn*, Shai Avidan
[pdf ]
Depth-guided NeRF Training via Earth Mover’s Distance
Anita Rau*, Josiah Aklilu, Floyd C Holsinger, Serena Yeung-Levy
[pdf ]
INTRA: Interaction Relationship-aware Weakly Supervised Affordance Grounding
Ji Ha Jang, Hoigi Seo, Se Young Chun*
[pdf ]
DEPICT: Diffusion-Enabled Permutation Importance for Image Classification Tasks
Sarah Jabbour*, Gregory Kondas, Ella Kazerooni, Michael Sjoding, David Fouhey, Jenna Wiens
[pdf ]
Meerkat: Audio-Visual Large Language Model for Grounding in Space and Time
Sanjoy Chowdhury*, Sayan Nag, Subhrajyoti Dasgupta, Jun Chen, Mohamed Elhoseiny, Ruohan Gao, Dinesh Manocha
[pdf ]
Diagnosing and Re-learning for Balanced Multimodal Learning
Yake Wei, Siwei Li, Ruoxuan Feng, Di Hu*
[pdf ]
Contribution-based Low-Rank Adaptation with Pre-training Model for Real Image Restoration
Dongwon Park, Hayeon Kim, Se Young Chun*
[pdf ]
Elucidating the Hierarchical Nature of Behavior with Masked Autoencoders
Lucas Stoffl, Andy Bonnetto, Stéphane D'Ascoli, Alexander Mathis*
[pdf ]
BeyondScene: Higher-Resolution Human-Centric Scene Generation With Pretrained Diffusion
Gwanghyun Kim, Hayeon Kim, Hoigi Seo, Dong Un Kang, Se Young Chun*
[pdf ]
SpaRP: Fast 3D Object Reconstruction and Pose Estimation from Sparse Views
Chao Xu, Ang Li, Linghao Chen, Yulin Liu, Ruoxi Shi, Hao Su*, Minghua Liu*
[pdf ]
MMEarth: Exploring Multi-Modal Pretext Tasks For Geospatial Representation Learning
Vishal Nedungadi, Ankit Kariryaa, Stefan Oehmcke, Serge Belongie, Christian Igel, Nico Lang*
[pdf ]
Discovering Unwritten Visual Classifiers with Large Language Models
Mia Chiquier*, Utkarsh Mall, Carl Vondrick
[pdf ]
LITA: Language Instructed Temporal-Localization Assistant
De-An Huang*, Shijia Liao, Subhashree Radhakrishnan, Hongxu Yin, Pavlo Molchanov, Zhiding Yu, Jan Kautz
[pdf ]
MARs: Multi-view Attention Regularizations for Patch-based Feature Recognition of Space Terrain
Timothy Chase Jr*, Karthik Dantu
[pdf ]
Ferret-UI: Grounded Mobile UI Understanding with Multimodal LLMs
Keen You*, Haotian Zhang, Eldon Schoop, Floris Weers, Amanda Swearngin, Jeff Nichols, Yinfei Yang, Zhe Gan
[pdf ]
Bridging the Pathology Domain Gap: Efficiently Adapting CLIP for Pathology Image Analysis with Limited Labeled Data
Zhengfeng Lai*, Joohi Chauhan, Brittany N. Dugger, Chen-Nee Chuah
[pdf ]
AugUndo: Scaling Up Augmentations for Monocular Depth Completion and Estimation
Yangchao Wu*, Tian Yu Liu, Hyoungseob Park, Stefano Soatto, Dong Lao, Alex Wong
[pdf ]
CARB-Net: Camera-Assisted Radar-Based Network for Vulnerable Road User Detection
Wei-Yu Lee*, Martin Dimitrievski, David Van Hamme, Jan Aelterman, Ljubomir Jovanov, Wilfried Philips
[pdf ]
SAH-SCI: Self-Supervised Adapter for Efficient Hyperspectral Snapshot Compressive Imaging
Haijin Zeng, Yuxi Liu, Yongyong Chen*, Youfa Liu, Chong Peng, Jingyong Su
[pdf ]
Minimalist Vision with Freeform Pixels
Jeremy Klotz*, Shree Nayar
[pdf ]
All You Need is Your Voice: Emotional Face Representation with Audio Perspective for Emotional Talking Face Generation
Seongho Kim, Byung Cheol Song*
[pdf ]
LatentEditor: Text Driven Local Editing of 3D Scenes
Umar Khalid*, Hasan Iqbal, Muhammad Tayyab, Md Nazmul Karim, Jing Hua, Chen Chen
[pdf ]
Single-Photon 3D Imaging with Equi-Depth Photon Histograms
Kaustubh Sadekar*, David Maier, Atul Ingle
[pdf ]
Asynchronous Bioplausible Neuron for Spiking Neural Networks for Event-Based Vision
Hussain Sajwani, Dimitrios Makris, Yahya Zweiri, Fariborz Baghaei Naeini, Sanket Kachole*
[pdf ]
Viewpoint textual inversion: discovering scene representations and 3D view control in 2D diffusion models
James Burgess*, Kuan-Chieh Wang, Serena Yeung-Levy
[pdf ]
POET: Prompt Offset Tuning for Continual Human Action Adaptation
Prachi Garg*, Joseph K J, Vineeth N Balasubramanian, Necati Cihan Camgoz, Chengde Wan, Kenrick Kin, Weiguang Si, Shugao Ma, Fernando de la Torre
[pdf ]
Domain Generalization of 3D Object Detection by Density-Resampling
Shuangzhi Li, Lei Ma, Xingyu Li*
[pdf ]
IG Captioner: Information Gain Captioners are Strong Zero-shot Classifiers
Chenglin Yang*, Siyuan Qiao, Yuan Cao, Yu Zhang, Tao Zhu, Alan Yuille, Jiahui Yu
[pdf ]
MRSP: Learn Multi-Representations of Single Primitive for Compositional Zero-Shot Learning
Dongyao Jiang, Hui Chen, Haodong Jing, Yongqiang Ma, Nanning Zheng*
[pdf ]
Cross-Domain Semantic Segmentation on Inconsistent Taxonomy using VLMs
Jeongkee Lim, Yusung Kim*
[pdf ]
TrafficNight: An Aerial Multimodal Benchmark For Nighttime Vehicle Surveillance
Guoxing Zhang, Yiming Liu, Xiaoyu Yang, Chao Huang*, Huang Hailong
[pdf ]
Loc3Diff: Local Diffusion for 3D Human Head Synthesis and Editing
Yushi Lan*, Feitong Tan, Qiangeng Xu, Di Qiu, Kyle Genova, Zeng Huang, Rohit Pandey, Sean Fanello, Thomas Funkhouser, Chen Change Loy, Yinda Zhang*
[pdf ]
Towards Open Domain Text-Driven Synthesis of Multi-Person Motions
Mengyi Shan, Lu Dong, Yutao Han, Yuan Yao, Tao Liu, Ifeoma Nwogu, Guo-Jun Qi, Mitchell K Hill*
[pdf ]
Generative End-to-End Autonomous Driving
Wenzhao Zheng, Ruiqi Song, Xianda Guo*, Chenming Zhang, Long Chen
[pdf ]
Learning to Distinguish Samples for Generalized Category Discovery
Fengxiang Yang, Nan Pu, Wenjing Li, Zhiming Luo*, Shaozi Li, Nicu Sebe, Zhun Zhong*
[pdf ]
COM Kitchens: An Unedited Overhead-view Procedural Videos Dataset as a Vision-Language Benchmark
Atsushi Hashimoto*, Koki Maeda, Tosho Hirasawa, Jun Harashima, Leszek Rybicki, Yusuke Fukasawa, Yoshitaka Ushiku
[pdf ]
PILoRA: Prototype Guided Incremental LoRA for Federated Class-Incremental Learning
Haiyang Guo*, Fei Zhu, Wenzhuo Liu, Xu-Yao Zhang*, Cheng-Lin Liu
[pdf ]
Diff-Reg: Diffusion Model in Doubly Stochastic Matrix Space for Registration Problem
Qianliang Wu*, Haobo Jiang*, Lei Luo, Jun Li, Yaqing Ding*, Jin Xie*, Jian Yang*
[pdf ]
WBP: Training-time Backdoor Attacks through Hardware-based Weight Bit Poisoning
Kunbei Cai*, Zhenkai Zhang, Qian Lou, Fan Yao*
[pdf ]
Towards Dual Transparent Liquid Level Estimation in Biomedical Lab: Dataset, Methods and Practice
Xiayu Wang, Ke Ma, Ruiyun Zhong, Xinggang Wang, Yi Fang, Yang Xiao, Tian Xia*
[pdf ]
Encapsulating Knowledge in One Prompt
Qi Li*, Runpeng Yu*, Xinchao Wang*
[pdf ]
Cross-Input Certified Training for Universal Perturbations
Changming Xu*, Gagandeep Singh
[pdf ]
Visual Relationship Transformation
Xiaoyu Xu*, Jiayan Qiu, Baosheng Yu, Zhou Wang
[pdf ]
Not Just Change the Labels, Learn the Features: Watermarking Deep Neural Networks with Multi-View Data
Yuxuan Li, Sarthak Kumar Maharana, Yunhui Guo*
[pdf ]
Delving into Adversarial Robustness on Document Tampering Localization
Huiru Shao, Zhuang Qian, Kaizhu Huang, Wei Wang, Xiaowei Huang, Qiufeng Wang*
[pdf ]
Adaptive Selection of Sampling-Reconstruction in Fourier Compressed Sensing
Seongmin Hong, Jaehyeok Bae, Jongho Lee*, Se Young Chun*
[pdf ]
Confidence-Based Iterative Generation for Real-World Image Super-Resolution
Jialun Peng, Xin Luo, Jingjing Fu*, Dong Liu*
[pdf ]
Learning Scalable Model Soup on a Single GPU: An Efficient Subspace Training Strategy
Tao Li*, Weisen Jiang, Fanghui Liu, Xiaolin Huang, James Kwok
[pdf ]
Correspondences of the Third Kind: Camera Pose Estimation from Object Reflection
Kohei Yamashita*, Vincent Lepetit, Ko Nishino
[pdf ]
Seeing Faces in Things: A Model and Dataset for Pareidolia
Mark T Hamilton*, Simon Stent, Vasha G DuTell, Anne Harrington, Jennifer E Corbett, Ruth Rosenholtz, William T. Freeman
[pdf ]
Cocktail Universal Adversarial Attack on Deep Neural Networks
Shaoxin Li*, Xiaofeng Liao, Xin Che, Xintong Li, Yong Zhang, Lingyang Chu*
[pdf ]
Gaussian Frosting: Editable Complex Radiance Fields with Real-Time Rendering
Antoine Guédon*, Vincent Lepetit
[pdf ]
AMD: Automatic Multi-step Distillation of Large-scale Vision Models
Cheng Han, Qifan Wang, Sohail A Dianat, Majid Rabbani, Raghuveer Rao, Yi Fang, Qiang Guan, Lifu Huang, Dongfang Liu*
[pdf ]
FairViT: Fair Vision Transformer via Adaptive Masking
Bowei Tian, Ruijie Du, Yanning Shen*
[pdf ]
TrojVLM: Backdoor Attack Against Vision Language Models
Weimin Lyu*, Lu Pang, Tengfei Ma, Haibin Ling, Chao Chen
[pdf ]
VisionLLaMA: A Unified LLaMA Backbone for Vision Tasks
Xiangxiang Chu*, Jianlin Su, Bo Zhang*, Chunhua Shen
[pdf ]
Frugal 3D Point Cloud Model Training via Progressive Near Point Filtering and Fused Aggregation
Donghyun Lee, Yejin Lee, Jae W. Lee*, Hongil Yoon*
[pdf ]
HVCLIP: High-dimensional Vector in CLIP for Unsupervised Domain Adaptation
Noranart Vesdapunt*, Kah Kuen Fu, Yue Wu, Xu Zhang, Pradeep Natarajan
[pdf ]
Improving 3D Semi-supervised Learning by Effectively Utilizing All Unlabelled Data
Sneha Paul*, Zachary Patterson, Nizar Bouguila
[pdf ]
PRET: Planning with Directed Fidelity Trajectory for Vision and Language Navigation
Renjie Lu, Jingke Meng*, Wei-Shi Zheng
[pdf ]
MART: MultiscAle Relational Transformer Networks for Multi-agent Trajectory Prediction
Seongju Lee, Junseok Lee, Yeonguk Yu, Taeri Kim, Kyoobin Lee*
[pdf ]
Expanding Scene Graph Boundaries: Fully Open-vocabulary Scene Graph Generation via Visual-Concept Alignment and Retention
Zuyao Chen, Jinlin Wu, Zhen Lei, Zhaoxiang Zhang, Chang Wen Chen*
[pdf ]
Few-shot NeRF by Adaptive Rendering Loss Regularization
Qingshan Xu*, Xuanyu Yi, Jianyao Xu, Wenbing Tao, Yew Soon Ong, Hanwang Zhang
[pdf ]
Investigating Style Similarity in Diffusion Models
Gowthami Somepalli*, Anubhav Gupta, Kamal Gupta, Shramay Palta, Micah Goldblum, Jonas A. Geiping, Abhinav Shrivastava, Tom Goldstein
[pdf ]
JDT3D: Addressing the Gaps in LiDAR-Based Tracking-by-Attention
Brian Cheong*, Jiachen Zhou*, Steven L Waslander*
[pdf ]
MagicMirror: Fast and High-Quality Avatar Generation with Constrained Search Space
Armand Comas, Di Qiu*, Menglei Chai, Marcel C. Bühler, Amit Raj, Ruiqi Gao, Qiangeng Xu, Mark J Matthews, Paulo Gotardo, Sergio Orts-Escolano, Thabo Beeler
[pdf ]
EntAugment: Entropy-Driven Adaptive Data Augmentation Framework for Image Classification
Suorong Yang*, Furao Shen*, Jian Zhao
[pdf ]
Timestep-Aware Correction for Quantized Diffusion Models
Yuzhe Yao, Feng Tian, Jun Chen*, Haonan Lin, Guang Dai, Yong Liu, Jingdong Wang
[pdf ]
SPARO: Selective Attention for Robust and Compositional Transformer Encodings for Vision
Ankit Vani*, Bac Nguyen, Samuel Lavoie, Ranjay Krishna, Aaron Courville
[pdf ]
Towards compact reversible image representations for neural style transfer
Xiyao Liu, Siyu Yang, Jian Zhang*, Gerald Schaefer, Jiya Li, Xunli Fan, Songtao Wu, Hui Fang*
[pdf ]
Out-of-Bounding-Box Triggers: A Stealthy Approach to Cheat Object Detectors
Tao Lin*, Lijia Yu*, Gaojie Jin*, Renjue Li*, Peng Wu*, Lijun Zhang*
[pdf ]
GTMS: A Gradient-driven Tree-guided Mask-free Referring Image Segmentation Method
Haoxin Lv, Tianxiong Zhong, Sanyuan Zhao*
[pdf ]
Long-term Temporal Context Gathering for Neural Video Compression
Linfeng Qi, Zhaoyang Jia, Jiahao Li, Bin Li, Houqiang Li, Yan Lu*
[pdf ]
VQA-Diff: Exploiting VQA and Diffusion for Zero-Shot Image-to-3D Vehicle Asset Generation in Autonomous Driving
Yibo Liu*, Zheyuan Yang, Guile Wu, Yuan Ren, Kejian Lin, Liu Bingbing, Yang Liu, Jinjun Shan
[pdf ]
From Pixels to Objects: A Hierarchical Approach for Part and Object Segmentation Using Local and Global Aggregation
Yunfei Xie*, Cihang Xie, Alan Yuille, Jieru Mei
[pdf ]
Leveraging Text Localization for Scene Text Removal via Text-aware Masked Image Modeling
Zixiao Wang*, Hongtao Xie, YuXin Wang, Yadong Qu, Fengjun Guo, Pengwei Liu
[pdf ]
Unmasking Bias in Diffusion Model Training
Hu Yu, Li Shen, Jie Huang, Hongsheng Li, Feng Zhao*
[pdf ]
Multimodal Label Relevance Ranking via Reinforcement Learning
Taian Guo, Taolin Zhang, Haoqian Wu, Hanjun Li, Ruizhi Qiao*, Xing Sun
[pdf ]
Animate Your Motion: Turning Still Images into Dynamic Videos
Mingxiao Li*, Bo Wan*, Sien Moens, Tinne Tuytelaars
[pdf ]
Layered Rendering Diffusion Model for Controllable Zero-Shot Image Synthesis
Zipeng Qi, Guoxi Huang*, Chenyang Liu, Fei Ye
[pdf ]
CIC-BART-SSA: Controllable Image Captioning with Structured Semantic Augmentation
Kalliopi Basioti*, Mohamed A Abdelsalam*, Federico Fancellu*, Vladimir Pavlovic*, Afsaneh Fazly*
[pdf ]
A Simple Background Augmentation Method for Object Detection with Diffusion Model
Yuhang Li, Xin Dong, Chen Chen, Weiming Zhuang, Lingjuan Lyu*
[pdf ]
Echoes of the Past: Boosting Long-tail Recognition via Reflective Learning
Qihao Zhao, Yalun Dai, Shen Lin, Wei Hu, Fan Zhang*, Jun Liu
[pdf ]
BlinkVision: A Benchmark for Optical Flow, Scene Flow and Point Tracking Estimation using RGB Frames and Events
Yijin Li, Yichen Shen, Zhaoyang Huang, Shuo Chen, Weikang Bian, Xiaoyu Shi, Fu-Yun Wang, Keqiang Sun, Hujun Bao, Zhaopeng Cui, Guofeng Zhang*, Hongsheng Li*
[pdf ]
A Unified Anomaly Synthesis Strategy with Gradient Ascent for Industrial Anomaly Detection and Localization
Qiyu Chen, Huiyuan Luo, Chengkan Lv*, Zhengtao Zhang
[pdf ]
Deep Polarization Cues for Single-shot Shape and Subsurface Scattering Estimation
Chenhao Li*, Trung Thanh Ngo, Hajime Nagahara
[pdf ]
Rethinking Features-Fused-Pyramid-Neck for Object Detection
Hulin Li*
[pdf ]
Spatial-Temporal Multi-level Association for Video Object Segmentation
Deshui Miao, Xin Li, Zhenyu He*, Huchuan Lu, Ming-Hsuan Yang
[pdf ]
Sparse Refinement for Efficient High-Resolution Semantic Segmentation
Zhijian Liu, Zhuoyang Zhang, Samir Khaki, Shang Yang, Haotian Tang, Chenfeng Xu, Kurt Keutzer, Song Han*
[pdf ]
Safeguard Text-to-Image Diffusion Models with Human Feedback Inversion
Sanghyun Kim*, Seohyeon Jung, Balhae Kim, Moonseok Choi, Jinwoo Shin, Juho Lee*
[pdf ]
An Explainable Vision Question Answer Model via Diffusion Chain-of-Thought
Chunhao Lu, Qiang Lu*, Jake Luo
[pdf ]
RaFE: Generative Radiance Fields Restoration
Zhongkai Wu, Ziyu Wan, Jing Zhang*, Jing Liao, Dong Xu
[pdf ]
UniProcessor: A Text-induced Unified Low-level Image Processor
Huiyu Duan*, Xiongkuo Min, Sijing Wu, Wei Shen, Guangtao Zhai
[pdf ]
Fast Sprite Decomposition from Animated Graphics
Tomoyuki Suzuki*, Kotaro Kikuchi, Kota Yamaguchi
[pdf ]
Learning Unified Reference Representation for Unsupervised Multi-class Anomaly Detection
Liren He, Zhengkai Jiang, Jinlong Peng, Wenbing Zhu, Liang Liu, Qiangang Du, Xiaobin Hu, Mingmin Chi*, Yabiao Wang*, Chengjie Wang*
[pdf ]
IRSAM: Advancing Segment Anything Model for Infrared Small Target Detection
Mingjin Zhang, Yuchun Wang*, Jie Guo*, Yunsong Li, Xinbo Gao, Jing Zhang
[pdf ]
PatchRefiner: Leveraging Synthetic Data for Real-Domain High-Resolution Monocular Metric Depth Estimation
Zhenyu Li*, Shariq Farooq Bhat, Peter Wonka
[pdf ]
A Geometric Distortion Immunized Deep Watermarking Framework with Robustness Generalizability
Linfeng Ma, Han Fang*, Tianyi Wei, Zijin Yang, Zehua Ma*, Weiming Zhang, Nenghai Yu
[pdf ]
Towards Robust Event-based Networks for Nighttime via Unpaired Day-to-Night Event Translation
Yuhwan Jeong, Hoonhee Cho, Kuk-Jin Yoon*
[pdf ]
CLAMP-ViT: Contrastive Data-Free Learning for Adaptive Post-Training Quantization of ViTs
Akshat Ramachandran*, Souvik Kundu*, Tushar Krishna*
[pdf ]
A Riemannian Approach for Spatiotemporal Analysis and Generation of 4D Tree-shaped Structures
Tahmina Khanam, Mohammed Bennamoun, Guan Wang, Guanjin Wang, Ferdous Sohel, Farid Boussaid, Anuj Srivastava, Hamid Laga*
[pdf ]
Dual-Path Adversarial Lifting for Domain Shift Correction in Online Test-time Adaptation
Yushun Tang, Shuoshuo Chen, Zhihe Lu, Xinchao Wang, Zhihai He*
[pdf ]
Data Overfitting for On-Device Super-Resolution with Dynamic Algorithm and Compiler Co-Design
Gen Li*, Zhihao Shu, Jie Ji, Minghai Qin, Fatemeh Afghah, Wei Niu, Xiaolong Ma*
[pdf ]
The Role of Masking for Efficient Supervised Knowledge Distillation of Vision Transformers
Seungwoo Son*, Jegwang Ryu, Namhoon Lee, Jaeho Lee*
[pdf ]
Training A Small Emotional Vision Language Model for Visual Art Comprehension
Jing Zhang, Liang Zheng*, Meng Wang, Dan Guo*
[pdf ]
UGG: Unified Generative Grasping
Jiaxin Lu, Hao Kang, Haoxiang Li, Bo Liu, Yiding Yang, Qixing Huang, Gang Hua*
[pdf ]
FrePolad: Frequency-Rectified Point Latent Diffusion for Point Cloud Generation
Chenliang Zhou*, Fangcheng Zhong, Param Hanji, Zhilin Guo, Kyle Thomas Fogarty, Alejandro Sztrajman, Hongyun Gao, A. Cengiz Oztireli
[pdf ]
Learning to Detect Multi-class Anomalies with Just One Normal Image Prompt
Bin-Bin Gao*
[pdf ]
GAMMA-FACE: GAussian Mixture Models Amend Diffusion Models for Bias Mitigation in Face Images
Basudha Pal*, Arunkumar Kannan*, Ram Prabhakar Kathirvel, Alice O'Toole, Rama Chellappa
[pdf ]
Reinforcement Learning Friendly Vision-Language Model for Minecraft
Haobin Jiang, Junpeng Yue, Hao Luo, Ziluo Ding, Zongqing Lu*
[pdf ]
Pseudo-RIS: Distinctive Pseudo-supervision Generation for Referring Image Segmentation
Seonghoon Yu, Paul Hongsuck Seo*, Jeany Son*
[pdf ]
Training-free Composite Scene Generation for Layout-to-Image Synthesis
Jiaqi Liu*, Tao Huang, Chang Xu
[pdf ]
Robustness Preserving Fine-tuning using Neuron Importance
Guangrui Li, Rahul Duggal*, Aaditya Singh, Kaustav Kundu, Bing Shuai, Jonathan Wu
[pdf ]
ProxyCLIP: Proxy Attention Improves CLIP for Open-Vocabulary Segmentation
Mengcheng Lan, Chaofeng Chen, Yiping Ke, Xinjiang Wang, Litong Feng, Wayne Zhang*
[pdf ]
PEA-Diffusion: Parameter-Efficient Adapter with Knowledge Distillation in non-English Text-to-Image Generation
Jian Ma, Chen Chen*, Qingsong Xie, Haonan Lu*
[pdf ]
Similarity of Neural Architectures using Adversarial Attack Transferability
Jaehui Hwang, Dongyoon Han, Byeongho Heo, Song Park, Sanghyuk Chun*, Jong-Seok Lee
[pdf ]
Dual-Rain: Video Rain Removal using Assertive and Gentle Teachers
Tingting Chen*, Beibei Lin, Yeying Jin, Wending Yan, Wei Ye, Yuan Yuan, Robby T. Tan
[pdf ]
PMT: Progressive Mean Teacher via Exploring Temporal Consistency for Semi-Supervised Medical Image Segmentation
Ning Gao, Sanping Zhou*, Le Wang, Nanning Zheng
[pdf ]
OmniACT: A Dataset and Benchmark for Enabling Multimodal Generalist Autonomous Agents for Desktop and Web
Raghav Kapoor*, Yash Parag Butala*, Melisa A Russak, Jing Yu Koh, Kiran Kamble, Waseem AlShikh, Ruslan Salakhutdinov
[pdf ]
AutoEval-Video: An Automatic Benchmark for Assessing Large Vision Language Models in Open-Ended Video Question Answering
Xiuyuan Chen, Yuan Lin*, Yuchen Zhang*, Weiran Huang*
[pdf ]
Reflective Instruction Tuning: Mitigating Hallucinations in Large Vision-Language Models
Jinrui Zhang, Teng Wang, Haigang Zhang, Ping Lu, Feng Zheng*
[pdf ]
Unsupervised Variational Translator for Bridging Image Restoration and High-Level Vision Tasks
Jiawei Wu, Zhi Jin*
[pdf ]
Diffusion Model for Robust Multi-Sensor Fusion in 3D Object Detection and BEV Segmentation
Duy Tho Le*, Hengcan Shi*, Jianfei Cai, Hamid Rezatofighi
[pdf ]
MeshAvatar: Learning High-quality Triangular Human Avatars from Multi-view Videos
Yushuo Chen*, Zerong Zheng, Zhe Li, Chao Xu, Yebin Liu
[pdf ]
Fast Point Cloud Geometry Compression with Context-based Residual Coding and INR-based Refinement
Hao Xu*, Xi Zhang, Xiaolin Wu*
[pdf ]
Scene-Conditional 3D Object Stylization and Composition
Jinghao Zhou*, Tomas Jakab, Philip Torr, Christian Rupprecht
[pdf ]
GenView: Enhancing View Quality with Pretrained Generative Model for Self-Supervised Learning
Xiaojie Li, Yibo Yang*, Xiangtai Li, Jianlong Wu*, Yue Yu, Bernard Ghanem, Min Zhang
[pdf ]
Revisit Anything: Visual Place Recognition via Image Segment Retrieval
Kartik Garg, Sai Shubodh, Shishir N Y Kolathaya, Madhava Krishna, Sourav Garg*
[pdf ]
EcoMatcher: Efficient Clustering Oriented Matcher for Detector-free Image Matching
Peiqi Chen*, Lei Yu, Yi Wan*, Yongjun Zhang*, Jian Wang, Liheng Zhong, Jingdong Chen, Ming Yang
[pdf ]
DGD: Dynamic 3D Gaussians Distillation
Isaac Labe, Noam Issachar, Itai Lang, Sagie Benaim*
[pdf ]
Semantic Diversity-aware Prototype-based Learning for Unbiased Scene Graph Generation
Jaehyeong Jeon*, Kibum Kim, Kanghoon Yoon, Chanyoung Park
[pdf ]
DiffuMatting: Synthesizing Arbitrary Objects with Matting-level Annotation
Xiaobin Hu, Xu Peng, Donghao Luo*, Xiaozhong Ji, Jinlong Peng, ZhengKai Jiang, Jiangning Zhang, Taisong Jin*, Chengjie Wang, Rongrong Ji
[pdf ]
Self-Guided Generation of Minority Samples Using Diffusion Models
Soobin Um, Jong Chul Ye*
[pdf ]
DEVIAS: Learning Disentangled Video Representations of Action and Scene
Kyungho Bae, Youngrae Kim, Geo Ahn, Jinwoo Choi*
[pdf ]
AD3: Introducing a score for Anomaly Detection Dataset Difficulty assessment using VIADUCT dataset
Jan D Lehr*, Jan H Philipps, Alik Sargsyan, Martin Pape, Jörg Krüger
[pdf ]
RoomTex: Texturing Compositional Indoor Scenes via Iterative Inpainting
Qi Wang*, Ruijie Lu, Xudong Xu, Jingbo Wang, Michael Yu Wang, Bo Dai, Gang Zeng, Dan Xu
[pdf ]
Class-Agnostic Object Counting with Text-to-Image Diffusion Model
Xiaofei Hui, Qian Wu, Hossein Rahmani, Jun Liu*
[pdf ]
Mask2Map: Vectorized HD Map Construction Using Bird's Eye View Segmentation Masks
Sehwan Choi*, Jun Won Choi, Jungho Kim, Hongjae Shin
[pdf ]
SUP-NeRF: A Streamlined Unification of Pose Estimation and NeRF for Monocular 3D Object Reconstruction
Yuliang Guo*, Abhinav Kumar, Cheng Zhao, Ruoyu Wang, Xinyu Huang, Liu Ren
[pdf ]
Forbes: Face Obfuscation Rendering via Backpropagation Refinement Scheme
Jintae Kim, Seungwon Yang, Seong-Gyun Jeong, Chang-Su Kim*
[pdf ]
Pyramid Diffusion for Fine 3D Large Scene Generation
Yuheng Liu*, Xinke Li, Xueting Li, Lu Qi*, Chongshou Li, Ming-Hsuan Yang
[pdf ]
ShoeModel: Learning to Wear on the User-specified Shoes via Diffusion Model
Wenyu Li*, Binghui Chen, Yifeng Geng, Xuansong Xie, Wangmeng Zuo
[pdf ]
A Watermark-Conditioned Diffusion Model for IP Protection
Rui Min*, Sen Li*, Hongyang Chen*, Minhao Cheng*
[pdf ]
Finding NeMo: Negative-mined Mosaic Augmentation for Referring Image Segmentation
Seongsu Ha, Chaeyun Kim, Donghwa Kim, Junho Lee, Sangho Lee, Joonseok Lee*
[pdf ]
SAFT: Towards Out-of-Distribution Generalization in Fine-Tuning
Bac Nguyen*, Stefan Uhlich, Fabien Cardinaux, Lukas Mauch, Marzieh Edraki, Aaron Courville
[pdf ]
FTBC: Forward Temporal Bias Correction for Optimizing ANN-SNN Conversion
Xiaofeng Wu*, Velibor Bojkovic, Bin Gu*, Kun Suo, Kai Zou
[pdf ]
Improving Vision and Language Concepts Understanding with Multimodal Counterfactual Samples
Chengen Lai, Shengli Song*, Sitong Yan, Guangneng Hu
[pdf ]
Centering the Value of Every Modality: Towards Efficient and Resilient Modality-agnostic Semantic Segmentation
Xu Zheng*, Yuanhuiyi Lyu, jiazhou zhou, Lin Wang*
[pdf ]
GTPT: Group-based Token Pruning Transformer for Efficient Human Pose Estimation
Haonan Wang, Jie Liu*, Jie Tang, Gangshan Wu, Bo Xu, Yanbing Chou, Yong Wang
[pdf ]
Lost in Translation: Modern Neural Networks Still Struggle With Small Realistic Image Transformations
Ofir Shifman*, Yair Weiss
[pdf ]
DIAL: Dense Image-text ALignment for Weakly Supervised Semantic Segmentation
Soojin Jang, JungMin Yun, JuneHyoung Kwon, Eunju Lee, YoungBin Kim*
[pdf ]
Rethinking Normalization Layers for Domain Generalizable Person Re-identification
Ren Nie, Jin Ding, Xue Zhou*, Xi Li
[pdf ]
Generalizing to Unseen Domains via Text-guided Augmentation
Daiqing Qi*, Handong Zhao, Aidong Zhang, Sheng Li
[pdf ]
VCP-CLIP: A visual context prompting model for zero-shot anomaly segmentation
Zhen Qu, Xian Tao*, Mukesh Prasad, Fei Shen, Zhengtao Zhang, Xinyi Gong, Guiguang Ding
[pdf ]
Lost in Translation: Latent Concept Misalignment in Text-to-Image Diffusion Models
Juntu Zhao, Junyu Deng, Yixin Ye, Chongxuan Li, Zhijie Deng*, Dequan Wang*
[pdf ]
Crowd-SAM: SAM as a smart annotator for object detection in crowded scenes
Zhi Cai, Yingjie Gao, Yaoyan Zheng, Nan Zhou, Di Huang*
[pdf ]
Zero-shot Text-guided Infinite Image Synthesis with LLM guidance
Soyeong Kwon, Taegyeong Lee, Taehwan Kim*
[pdf ]
Learning Dual-Level Deformable Implicit Representation for Real-World Scale Arbitrary Super-Resolution
Zhiheng Li, Muheng Li, Jixuan Fan, Lei Chen*, Yansong Tang, Jiwen Lu, Jie Zhou
[pdf ]
Boosting Gaze Object Prediction via Pixel-level Supervision from Vision Foundation Model
Yang Jin, Lei Zhang, Shi Yan, Bin Fan, Binglu Wang*
[pdf ]
Pro2SAM: Mask Prompt to SAM with Grid Points for Weakly Supervised Object Localization
Xi Yang, Songsong Duan*, Nannan Wang, Xinbo Gao
[pdf ]
Adaptive Multi-head Contrastive Learning
Lei Wang*, Piotr Koniusz, Tom Gedeon, Liang Zheng
[pdf ]
Rotated Orthographic Projection for Self-Supervised 3D Human Pose Estimation
YAO YAO, Yixuan Pan, Wenjun Shi, Dongchen Zhu, Lei Wang, Jiamao Li*
[pdf ]
Easing 3D Pattern Reasoning with Side-view Features for Semantic Scene Completion
Linxi Huan, Mingyue Dong, Linwei Yue, Shuhan Shen, Xianwei Zheng*
[pdf ]
DSMix: Distortion-Induced Saliency Map Based Pre-training for No-Reference Image Quality Assessment
Jinsong Shi, Pan Gao*, Xiaojiang Peng, Jie Qin
[pdf ]
MO-EMT-NAS: Multi-Objective Continuous Transfer of Architectural Knowledge Between Tasks from Different Datasets
PENG LIAO*, Xilu Wang*, Yaochu Jin*, Wenli Du*
[pdf ]
Text-to-Sticker: Style Tailoring Latent Diffusion Models for Human Expression
Animesh Sinha*, Bo Sun, Anmol Kalia, Arantxa Casanova, Elliot Blanchard, David Yan, Winnie Zhang, Tony Nelli, Jiahui Chen, Hardik Shah, Licheng Yu, Mitesh Kumar Singh, Ankit Ramchandani, Maziar Sanjabi, Sonal Gupta, Amy L Bearman, Dhruv Mahajan
[pdf ]
Adaptive Annealing for Robust Averaging
Sidhartha Chitturi*, Venu Madhav Govindu
[pdf ]
GRIDS: Grouped Multiple-Degradation Restoration with Image Degradation Similarity
Shuo Cao, Yihao Liu, Wenlong Zhang, Yu Qiao, Chao Dong*
[pdf ]
MaxMI: A Maximal Mutual Information Criterion for Manipulation Concept Discovery
Pei Zhou, Yanchao Yang*
[pdf ]
High-Quality Mesh Blendshape Generation from Face Videos via Neural Inverse Rendering
Xin Ming, Jiawei Li, Jingwang Ling, Libo Zhang, Feng Xu*
[pdf ]
Disentangling Masked Autoencoders for Unsupervised Domain Generalization
An Zhang*, Han Wang, Xiang Wang, Tat-Seng Chua
[pdf ]
Early Anticipation of Driving Maneuvers
Abdul Wasi Lone, Shankar Gangisetty*, Shyam Nandan Rai, C. V. Jawahar
[pdf ]
Bottom-Up Domain Prompt Tuning for Generalized Face Anti-Spoofing
Siqi Liu*, Qirui Wang, Pong C. Yuen
[pdf ]
SG-NeRF: Neural Surface Reconstruction with Scene Graph Optimization
Yiyang Chen, Siyan Dong*, Xulong Wang, Lulu Cai, Youyi Zheng, Yanchao Yang*
[pdf ]
On the Evaluation Consistency of Attribution-based Explanations
Jiarui Duan, Haoling Li, Haofei Zhang, Hao Jiang, Mengqi Xue, Li Sun, Mingli Song, Jie Song*
[pdf ]
Unified Embedding Alignment for Open-Vocabulary Video Instance Segmentation
Hao Fang, Peng Wu, Yawei Li, Xinxin Zhang, Xiankai Lu*
[pdf ]
InfoNorm: Mutual Information Shaping of Normals for Sparse-View Reconstruction
Xulong Wang, Siyan Dong*, Youyi Zheng, Yanchao Yang*
[pdf ]
DreamReward: Aligning Human Preference in Text-to-3D Generation
Junliang Ye, Fangfu Liu, Qixiu Li, Zhengyi Wang, Yikai Wang, Xinzhou Wang, Yueqi Duan*, Jun Zhu*
[pdf ]
Action2Sound: Ambient-Aware Generation of Action Sounds from Egocentric Videos
Changan Chen*, Puyuan Peng, Ami Baid, Zihui Xue, Wei-Ning Hsu, David Harwath, Kristen Grauman
[pdf ]
Frontier-enhanced Topological Memory with Improved Exploration Awareness for Embodied Visual Navigation
Xinru Cui, Qiming Liu, Zhe Liu, Hesheng Wang*
[pdf ]
MTMamba: Enhancing Multi-Task Dense Scene Understanding by Mamba-Based Decoders
Baijiong Lin*, Weisen Jiang, Pengguang Chen, Yu Zhang, Shu Liu, Yingcong Chen
[pdf ]
VITATECS: A Diagnostic Dataset for Temporal Concept Understanding of Video-Language Models
Shicheng Li, Lei Li, Yi Liu, Shuhuai Ren, Yuanxin Liu, Rundong Gao, Xu Sun*, Lu Hou
[pdf ]
Learning a Dynamic Privacy-preserving Camera Robust to Inversion Attacks
Jiacheng Cheng*, Xiang Dai, Jia Wan, Nick Antipa, Nuno Vasconcelos
[pdf ]
CadVLM: Bridging Language and Vision in the Generation of Parametric CAD Sketches
Sifan Wu*, Amir Hosein Khasahmadi, Mor Katz, Pradeep Kumar Jayaraman, Yewen Pu, Karl D.D. Willis, Bang Liu*
[pdf ]
Towards Image Ambient Lighting Normalization
Florin-Alexandru Vasluianu*, Tim Seizinger, Zongwei WU*, Rakesh Ranjan, Radu Timofte
[pdf ]
FedHide: Federated Learning by Hiding in the Neighbors
Hyunsin Park*, Sungrack Yun
[pdf ]
Toward INT4 Fixed-Point Training via Exploring Quantization Error for Gradients
Dohyung Kim, Junghyup Lee, Jeimin Jeon, JAEHYEON MOON, Bumsub Ham*
[pdf ]
SelEx: Self-Expertise in Fine-Grained Generalized Category Discovery
Sarah Rastegar*, Mohammadreza Salehi, Yuki M Asano, Hazel Doughty, Cees Snoek
[pdf ]
Self-Cooperation Knowledge Distillation for Novel Class Discovery
Yuzheng Wang*, Zhaoyu Chen, Dingkang Yang, Yunquan Sun, Lizhe Qi*
[pdf ]
EventBind: Learning a Unified Representation to Bind Them All for Event-based Open-world Understanding
jiazhou zhou*, Xu Zheng, Yuanhuiyi Lyu, Lin Wang
[pdf ]
GLAD: Towards Better Reconstruction with Global and Local Adaptive Diffusion Models for Unsupervised Anomaly Detection
Hang Yao, Ming Liu*, Zhicun Yin, Zifei Yan, Xiaopeng Hong, Wangmeng Zuo
[pdf ]
MedRAT: Unpaired Medical Report Generation via Auxiliary Tasks
Elad Hirsch*, Gefen Dawidowicz, Ayellet Tal
[pdf ]
Are Synthetic Data Useful for Egocentric Hand-Object Interaction Detection?
Rosario Leonardi*, Antonino Furnari, Francesco Ragusa, Giovanni Maria Farinella
[pdf ]
PoseEmbroider: Towards a 3D, Visual, Semantic-aware Human Pose Representation
Ginger Delmas*, Philippe Weinzaepfel, Francesc Moreno-Noguer, Gregory Rogez
[pdf ]
A Comparative Study of Image Restoration Networks for General Backbone Network Design
Xiangyu Chen*, Zheyuan Li, Yuandong Pu, Yihao Liu, Jiantao Zhou*, Yu Qiao, Chao Dong*
[pdf ]
Learned Image Enhancement via Color Naming
David Serrano-Lozano*, Luis Herranz, Michael S Brown, Javier Vazquez-Corral
[pdf ]
Synthesizing Time-varying BRDFs via Latent Space
Takuto Narumoto*, Hiroaki Santo, Fumio Okura
[pdf ]
HoloADMM: High-Quality Holographic Complex Field Recovery
Mazen Mel*, Paul Springer, Pietro Zanuttigh, Haitao Zhou, Alexander Gatto
[pdf ]
Fundamental Matrix Estimation Using Relative Depths
Yaqing Ding*, Václav Vávra, Snehal Bhayani, Qianliang Wu, Jian Yang, Zuzana Kukelova
[pdf ]
Gaussian Splatting on the Move: Blur and Rolling Shutter Compensation for Natural Camera Motion
Otto Seiskari*, Jerry Ylilammi, Valtteri Kaatrasalo, Pekka Rantalankila, Matias Turkulainen, Juho Kannala, Esa Rahtu, Arno Solin
[pdf ]
MTaDCS: Moving Trace and Feature Density-based Confidence Sample Selection under Label Noise
Qingzheng Huang, Xilin He, Xiaole Xian, Qinliang Lin, Weicheng Xie*, Siyang Song, Linlin Shen, Zitong Yu
[pdf ]
Towards Open-World Object-based Anomaly Detection via Self-Supervised Outlier Synthesis
Brian Kostadinov Shalon Isaac-Medina*, Yona Falinie Abdul Gaus*, Neelanjan Bhowmik, Toby P Breckon
[pdf ]
GroundUp: Rapid Sketch-Based 3D City Massing
Gizem Esra Unlu*, Mohamed Sayed, Yulia Gryaditskaya, Gabriel Brostow
[pdf ]
Guide-and-Rescale: Self-Guidance Mechanism for Effective Tuning-Free Real Image Editing
Vadim Titov*, Madina Khalmatova*, Alexandra Ivanova*, Dmitry P Vetrov, Aibek Alanov*
[pdf ]
DataDream: Few-shot Guided Dataset Generation
Jae Myung Kim*, Jessica Bader, Stephan Alaniz, Cordelia Schmid, Zeynep Akata
[pdf ]
LPViT: Low-Power Semi-structured Pruning for Vision Transformers
Kaixin Xu*, Zhe Wang*, Chunyun Chen, Xue Geng, Jie Lin, Xulei Yang, Min Wu*, Xiaoli Li, Weisi Lin*
[pdf ]
CipherDM: Secure Three-Party Inference for Diffusion Model Sampling
Xin Zhao, Xiaojun Chen*, Xudong Chen, He Li, Tingyu Fan, Zhendong Zhao
[pdf ]
Weighted Ensemble Models Are Strong Continual Learners
Imad Eddine MAROUF*, Subhankar Roy, Enzo Tartaglione, Stéphane Lathuilière
[pdf ]
GGRt: Towards Generalizable 3D Gaussians without Pose Priors in Real-Time
Hao Li, Yuanyuan Gao, Dingwen Zhang*, Chenming Wu, YALUN DAI, Chen Zhao, Haocheng Feng, Errui Ding, Jingdong Wang, Junwei Han
[pdf ]
A Unified Image Compression Method for Human Perception and Multiple Vision Tasks
Sha Guo, Lin Sui, Chen-Lin Zhang, Zhuo Chen, Wenhan Yang, Lingyu Duan*
[pdf ]
UniVoxel: Fast Inverse Rendering by Unified Voxelization of Scene Representation
Shuang Wu, Songlin Tang, Guangming Lu, Jianzhuang Liu, Wenjie Pei*
[pdf ]
Audio-visual Generalized Zero-shot Learning the Easy Way
Shentong Mo*, Pedro Morgado
[pdf ]
PartImageNet++ Dataset: Scaling up Part-based Models for Robust Recognition
Xiao Li*, Yining Liu, Na Dong, Sitian Qin, Xiaolin Hu
[pdf ]
Learning Equilibrium Transformation for Gamut Expansion and Color Restoration
Jun Xiao*, Changjian Shui, Zhi-Song Liu, Qian Ye, Kin-Man Lam
[pdf ]
Dyn-Adapter: Towards Disentangled Representation for Efficient Visual Recognition
Yurong Zhang*, Honghao Chen, Zhang Xinyu, Xiangxiang Chu, Li Song
[pdf ]
Physics-informed Knowledge Transfer for Underwater Monocular Depth Estimation
Jinghe Yang*, Mingming Gong, Ye Pu
[pdf ]
Robust Nearest Neighbors for Source-Free Domain Adaptation under Class Distribution Shift
Antonio Tejero-de-Pablos*, Riku Togashi, Mayu Otani, Shin'ichi Satoh
[pdf ]
Chains of Diffusion Models
Yanheng Wei*, Lianghua Huang*, Zhi-Fan Wu, Wei Wang, Yu Liu, Mingda Jia, Shuailei Ma
[pdf ]
Time-Efficient and Identity-Consistent Virtual Try-On Using A Variant of Altered Diffusion Models
Phuong Hoang Dam*, Jihoon Jeong*, Anh T Tran*, Daeyoung Kim*
[pdf ]
Feature Diversification and Adaptation for Federated Domain Generalization
Seunghan Yang*, Seokeon Choi, Hyunsin Park, Sungha Choi, Simyung Chang, Sungrack Yun
[pdf ]
Grounding Image Matching in 3D with MASt3R
Vincent Leroy*, Yohann Cabon, Jerome Revaud
[pdf ]
TP2O: Creative Text Pair-to-Object Generation using Balance Swap-Sampling
Jun Li*, Zedong Zhang, Jian Yang
[pdf ]
RoDUS: Robust Decomposition of Static and Dynamic Elements in Urban Scenes
Thang-Anh-Quan Nguyen*, Luis G Roldao Jimenez*, Nathan Piasco*, Moussab Bennehar*, Dzmitry Tsishkou*
[pdf ]
RecurrentBEV: A Long-term Temporal Fusion Framework for Multi-view 3D Detection
Ming Chang, Xishan Zhang*, Rui Zhang, Zhipeng Zhao, Guanhua He, Shaoli Liu
[pdf ]
Efficient Bias Mitigation Without Privileged Information
Mateo Espinosa Zarlenga*, Swami Sankaranarayanan, Jerone T. A. Andrews, Zohreh Shams, Mateja Jamnik, Alice Xiang
[pdf ]
MC-PanDA: Mask Confidence for Panoptic Domain Adaptation
Ivan Martinović*, Josip Šarić, Siniša Šegvić
[pdf ]
Learning Neural Deformation Representation for 4D Dynamic Shape Generation
Gyojin Han*, Jiwan Hur, Jaehyun Choi, Junmo Kim*
[pdf ]
Dynamic Guidance Adversarial Distillation with Enhanced Teacher Knowledge
Hyejin Park, Dongbo Min*
[pdf ]
Decomposition Betters Tracking Everything Everywhere
Rui Li, Dong Liu*
[pdf ]
Straightforward Layer-wise Pruning for More Efficient Visual Adaptation
Ruizi Han*, Jinglei Tang*
[pdf ]
Synchronization is All You Need: Exocentric-to-Egocentric Transfer for Temporal Action Segmentation with Unlabeled Synchronized Video Pairs
Camillo Quattrocchi*, Antonino Furnari, Daniele Di Mauro, Mario Valerio Giuffrida, Giovanni Maria Farinella
[pdf ]
LAPT: Label-driven Automated Prompt Tuning for OOD Detection with Vision-Language Models
Yabin Zhang*, Wenjie Zhu, Chenhang He, Lei Zhang*
[pdf ]
Domain Shifting: A Generalized Solution for Heterogeneous Cross-Modality Person Re-Identification
Yan Jiang, Xu Cheng*, Hao Yu, Xingyu Liu, Haoyu Chen, Guoying Zhao
[pdf ]
Self-Supervised Video Desmoking for Laparoscopic Surgery
Renlong Wu, Zhilu Zhang*, Shuohao Zhang, Longfei Gou, Haobin Chen, Lei Zhang, Hao Chen*, Wangmeng Zuo
[pdf ]
Removing Rows and Columns of Tokens in Vision Transformer enables Faster Dense Prediction without Retraining
Diwei Su, cheng fei, Jianxu Luo*
[pdf ]
Continuity Preserving Online CenterLine Graph Learning
Yunhui Han, Kun Yu, Zhiwei Li*
[pdf ]
Decomposition of Neural Discrete Representations for Large-Scale 3D Mapping
Minseong Park, Suhan Woo, Euntai Kim*
[pdf ]
MirrorGaussian: Reflecting 3D Gaussians for Reconstructing Mirror Reflections
Jiayue Liu, Xiao Tang, Freeman Cheng, Zihao Yang, Zhihao Li*, Jianzhuang Liu, Yi Huang, Jiaqi Lin, Shiyong Liu, Xiaofei Wu, Songcen Xu, Chun Yuan*
[pdf ]
Leveraging Representations from Intermediate Encoder-blocks for Synthetic Image Detection
Christos Koutlis*, Symeon Papadopoulos
[pdf ]
Exploring Vulnerabilities in Spiking Neural Networks: Direct Adversarial Attacks on Raw Event Data
Yanmeng Yao, Xiaohan Zhao, Bin Gu*
[pdf ]
HSR: Holistic 3D Human-Scene Reconstruction from Monocular Videos
Lixin Xue*, Chen Guo, Chengwei Zheng, Fangjinhua Wang, Tianjian Jiang, Hsuan-I Ho, Manuel Kaufmann, Jie Song, Otmar Hilliges
[pdf ]
Online Video Quality Enhancement with Spatial-Temporal Look-up Tables
Zefan Qu, Xinyang Jiang*, Yifan Yang, Dongsheng Li, Cairong Zhao*
[pdf ]
PARIS3D: Reasoning-based 3D Part Segmentation Using Large Multimodal Model
Amrin Kareem*, Jean Lahoud, Hisham Cholakkal*
[pdf ]
Self-Rectifying Diffusion Sampling with Perturbed-Attention Guidance
Donghoon Ahn, Hyoungwon Cho, Jaewon Min, Jungwoo Kim, Wooseok Jang, SeonHwa Kim, Hyun Hee Park, Kyong Hwan Jin*, Seungryong Kim*
[pdf ]
Localization and Expansion: A Decoupled Framework for Point Cloud Few-shot Semantic Segmentation
Zhaoyang Li*, Yuan Wang, Wangkai Li, Rui Sun, Tianzhu Zhang
[pdf ]
Think before Placement: Common Sense Enhanced Transformer for Object Placement
Yaxuan Qin, Jiayu Xu, Ruiping Wang*, Xilin Chen
[pdf ]
Oulu Remote-photoplethysmography Physical Domain Attacks Database (ORPDAD)
Marko Savic, Guoying Zhao*
[pdf ]
Leveraging Imperfect Restoration for Data Availability Attack
YI HUANG*, Jeremy Styborski*, Mingzhi Lyu*, Fan Wang*, Wai-Kin Adams Kong*
[pdf ]
3D Weakly Supervised Semantic Segmentation with 2D Vision-Language Guidance
Xiaoxu Xu, Yitian Yuan, Jinlong Li, Qiudan Zhang, Zequn Jie, Lin Ma, Hao Tang, Nicu Sebe, Xu Wang*
[pdf ]
Open-set Domain Adaptation via Joint Error based Multi-class Positive and Unlabeled Learning
Dexuan Zhang*, Thomas Westfechtel, Tatsuya Harada
[pdf ]
DoubleTake: Geometry Guided Depth Estimation
Mohamed Sayed*, Filippo Aleotti, Jamie Watson, Zawar Qureshi, Guillermo Garcia-Hernando, Gabriel Brostow, Sara Vicente, Michael Firman
[pdf ]
Empowering Embodied Visual Tracking with Visual Foundation Models and Offline RL
Fangwei Zhong*, Kui Wu, Hai Ci, Chu-ran Wang, Hao Chen
[pdf ]
Street Gaussians: Modeling Dynamic Urban Scenes with Gaussian Splatting
Yunzhi Yan*, Haotong Lin, Chenxu Zhou, Weijie Wang, Haiyang Sun, Kun Zhan, Xianpeng Lang, Xiaowei Zhou, Sida Peng*
[pdf ]
Images are Achilles' Heel of Alignment: Exploiting Visual Vulnerabilities for Jailbreaking Multimodal Large Language Models
Yifan Li*, hangyu guo, Kun Zhou, Wayne Xin Zhao, Ji-Rong Wen
[pdf ]
Edge-Guided Fusion and Motion Augmentation for Event-Image Stereo
Fengan Zhao*, Qianang Zhou, Junlin Xiong*
[pdf ]
MetaWeather: Few-Shot Weather-Degraded Image Restoration
Youngrae Kim*, Younggeol Cho, Thanh-Tung Nguyen, Seunghoon Hong, Dongman Lee*
[pdf ]
CPT-VR: Improving Surface Rendering via Closest Point Transform with View-Reflection Appearance
Zhipeng Hu, Yongqiang Zhang*, Chen Liu, Lincheng Li*, Sida Peng, Xiaowei Zhou, Changjie Fan, Xin Yu
[pdf ]
Close, But Not There: Boosting Geographic Distance Sensitivity in Visual Place Recognition
Sergio Izquierdo*, Javier Civera*
[pdf ]
HiFi-123: Towards High-fidelity One Image to 3D Content Generation
Wangbo Yu*, Li Yuan, Yan-Pei Cao, Xiangjun Gao, Xiaoyu Li, Wenbo Hu, Long Quan, Ying Shan, Yonghong Tian
[pdf ]
Revisiting Adaptive Cellular Recognition Under Domain Shifts: A Contextual Correspondence View
Jianan Fan*, Dongnan Liu, Canran Li, Hang Chang, Heng Huang, Filip Braet, Mei Chen, Weidong Cai*
[pdf ]
Good Teachers Explain: Explanation-Enhanced Knowledge Distillation
Amin Parchami-Araghi*, Moritz Böhle, Sukrut Rao, Bernt Schiele
[pdf ]
Stepping Stones: A Progressive Training Strategy for Audio-Visual Semantic Segmentation
Juncheng Ma, Peiwen Sun, Yaoting Wang, Di Hu*
[pdf ]
FRDiff: Feature Reuse for Universal Training-free Acceleration of Diffusion Models
Junhyuk So, Jungwon Lee, Eunhyeok Park*
[pdf ]
Möbius Transform for Mitigating Perspective Distortions in Representation Learning
Prakash Chandra Chhipa*, Meenakshi Subhash Chippa, Kanjar De, Rajkumar Saini, Marcus Liwicki, Mubarak Shah
[pdf ]
TAG: Text Prompt Augmentation for Zero-Shot Out-of-Distribution Detection
Xixi Liu*, Christopher Zach
[pdf ]
CVT-Occ: Cost Volume Temporal Fusion for 3D Occupancy Prediction
Zhangchen Ye, Tao Jiang, Chenfeng Xu, Yiming Li, Hang Zhao*
[pdf ]
SPVLoc: Semantic Panoramic Viewport Matching for 6D Camera Localization in Unseen Environments
Niklas Gard*, Anna Hilsmann, Peter Eisert
[pdf ]
Continual Learning and Unknown Object Discovery in 3D Scenes via Self-Distillation
Mohamed El Amine Boudjoghra*, Jean Lahoud, Salman Khan, Hisham Cholakkal, Rao M Anwer, Fahad Shahbaz Khan
[pdf ]
DiffCD: A Symmetric Differentiable Chamfer Distance for Neural Implicit Surface Fitting
Linus Härenstam-Nielsen*, Lu Sang, Abhishek Saroha, Nikita Araslanov*, Daniel Cremers*
[pdf ]
Lost and Found: Overcoming Detector Failures in Online Multi-Object Tracking
Lorenzo Vaquero*, Yihong Xu, Xavier Alameda-Pineda, Victor M. Brea, Manuel Mucientes
[pdf ]
Local Occupancy-Enhanced Object Grasping with Multiple Triplanar Projection
Kangqi Ma*, Hao Dong, Yadong Mu
[pdf ]
Region-Native Visual Tokenization
Mengyu Wang*, Yuyao Huang, Henghui Ding, Xinlong Wang, Tiejun Huang, Yao Zhao, Yunchao Wei, Shuicheng Yan
[pdf ]
SparseCraft: Few-Shot Neural Reconstruction through Stereopsis Guided Geometric Linearization
Mae Younes*, Amine Ouasfi, Adnane Boukhayma
[pdf ]
Sketch2Vox: Learning 3D Reconstruction from a Single Monocular Sketch Image
Fei Wang*
[pdf ]
DGE: Direct Gaussian 3D Editing by Consistent Multi-view Editing
Minghao Chen*, Iro Laina, Andrea Vedaldi
[pdf ]
The Lottery Ticket Hypothesis in Denoising: Towards Semantic-Driven Initialization
Jiafeng Mao*, Xueting Wang, Kiyoharu Aizawa
[pdf ]
Diffusion for Out-of-Distribution Detection on Road Scenes and Beyond
Silvio Galesso*, Philipp Schröppel*, Hssan Driss, Thomas Brox
[pdf ]
Rethinking Directional Parameterization in Neural Implicit Surface Reconstruction
Zijie Jiang*, Tianhan Xu*, Hiroharu Kato
[pdf ]
A Comprehensive Study of Multimodal Large Language Models for Image Quality Assessment
Tianhe Wu, Kede Ma*, Jie Liang, Yujiu Yang*, Lei Zhang
[pdf ]
Semi-Supervised Teacher-Reference-Student Architecture for Action Quality Assessment
Wulian Yun, Mengshi Qi, Fei Peng, Huadong Ma*
[pdf ]
Efficient Neural Video Representation with Temporally Coherent Modulation
Seungjun Shin*, Suji Kim*, Dokwan Oh
[pdf ]
Ref-AVS: Refer and Segment Objects in Audio-Visual Scenes
Yaoting Wang, Peiwen Sun, Dongzhan Zhou, Guangyao Li, Honggang Zhang, Di Hu*
[pdf ]
DreamScene: 3D Gaussian-based Text-to-3D Scene Generation via Formation Pattern Sampling
Haoran Li, Haolin Shi, Wenli Zhang, Wenjun Wu, Yong Liao*, Lin Wang, Lik-Hang Lee, Peng Yuan Zhou*
[pdf ]
Multi-modal Crowd Counting via a Broker Modality
Haoliang Meng, Xiaopeng Hong*, Chenhao Wang, Miao Shang, Wangmeng Zuo
[pdf ]
FastPCI: Motion-Structure Guided Fast Point Cloud Frame Interpolation
tianyu zhang, Guocheng Qian, Jin Xie*, Jian Yang
[pdf ]
Made to Order: Discovering monotonic temporal changes via self-supervised video ordering
Charig Yang*, Weidi Xie, Andrew Zisserman
[pdf ]
PARE-Net: Position-Aware Rotation-Equivariant Networks for Robust Point Cloud Registration
Runzhao Yao, Shaoyi Du*, Wenting Cui, Canhui Tang, Chengwu Yang
[pdf ]
Open-Vocabulary RGB-Thermal Semantic Segmentation
GuoQiang Zhao, JunJie Huang, Xiaoyun Yan*, Zhaojing Wang, Junwei Tang, Yangjun Ou, Xinrong Hu, Tao Peng
[pdf ]
MeshVPR: Citywide Visual Place Recognition Using 3D Meshes
Gabriele Berton*, Lorenz Junglas, Riccardo Zaccone, Thomas Pollok, Barbara Caputo, Carlo Masone
[pdf ]
Can Textual Semantics Mitigate Sounding Object Segmentation Preference?
Yaoting Wang, Peiwen Sun, Yuanchao Li, Honggang Zhang, Di Hu*
[pdf ]
Concise Plane Arrangements for Low-Poly Surface and Volume Modelling
Raphael Sulzer, Florent Lafarge*
[pdf ]
KeypointDETR: An End-to-End 3D Keypoint Detector
Hairong Jin, Yuefan Shen, Jianwen Lou, Kun Zhou, Youyi Zheng*
[pdf ]
ViPer: Visual Personalization of Generative Models via Individual Preference Learning
Sogand Salehi*, Mahdi Shafiei, Roman Bachmann, Teresa Yeo, Amir Zamir
[pdf ]
MLPHand: Real Time Multi-View 3D Hand Reconstruction via MLP Modeling
Jian Yang, Jiakun Li, Guoming Li, Huaiyu Wu, Zhen Shen, Zhaoxin Fan*
[pdf ]
uCAP: An Unsupervised Prompting Method for Vision-Language Models
A. Tuan Nguyen*, Kai Sheng Tai, Bor-Chun Chen, Satya Narayan Shukla, Hanchao Yu, Philip Torr, Tai-Peng Tian, Ser-Nam Lim
[pdf ]
LHRS-Bot: Empowering Remote Sensing with VGI-Enhanced Large Multimodal Language Model
Dilxat Muhtar, Zhenshi Li, Feng Gu, Xueliang Zhang*, Pengfeng Xiao
[pdf ]
How Far Can a 1-Pixel Camera Go? Solving Vision Tasks using Photoreceptors and Computationally Designed Visual Morphology
Andrei Atanov*, Rishubh Singh, Jiawei Fu, Isabella Yu, Andrew Spielberg, Amir Zamir
[pdf ]
MONTAGE: Monitoring Training for Attribution of Generative Diffusion Models
Jonathan Brokman*, Omer Hofman, Roman Vainshtein, Amit Giloni, Toshiya Shimizu, Inderjeet Singh, Oren Rachmil, Alon Zolfi, Asaf Shabtai, Yuki Unno, Hisashi Kojima
[pdf ]
Affective Visual Dialog: A Large-Scale Benchmark for Emotional Reasoning Based on Visually Grounded Conversations
Kilichbek Haydarov*, Xiaoqian Shen, Avinash Madasu, Mahmoud Salem, Li-Jia Li, Gamaleldin F Elsayed, Mohamed Elhoseiny
[pdf ]
Watching it in Dark: A Target-aware Representation Learning Framework for High-Level Vision Tasks in Low Illumination
Yunan Li*, Yihao Zhang, Shoude Li, Long Tian, DOU QUAN, Chaoneng Li, Qiguang Miao*
[pdf ]
Self-supervised visual learning from interactions with objects
Arthur Aubret*, Céline Teulière, Jochen Triesch
[pdf ]
OP-Align: Object-level and Part-level Alignment for Self-supervised Category-level Articulated Object Pose Estimation
Yuchen Che*, Ryo Furukawa, Asako Kanezaki
[pdf ]
BAFFLE: A Baseline of Backpropagation-Free Federated Learning
Haozhe Feng*, Tianyu Pang*, Chao Du, Wei Chen*, Shuicheng Yan, Min Lin
[pdf ]
Sequential Representation Learning via Static-Dynamic Conditional Disentanglement
Mathieu Cyrille Simon*, Pascal Frossard, Christophe De Vleeschouwer
[pdf ]
OmniNOCS: A unified NOCS dataset and model for 3D lifting of 2D objects
Akshay Krishnan*, Abhijit Kundu*, Kevis-Kokitsi Maninis, James Hays, Matthew Brown
[pdf ]
3R-INN: How to be climate friendly while consuming/delivering videos?
ZOUBIDA AMEUR*, Claire-Helene Demarty, Olivier LE MEUR, Daniel Menard
[pdf ]
Rethinking Deep Unrolled Model for Accelerated MRI Reconstruction
Bingyu Xin*, Meng Ye, Leon Axel, Dimitris N. Metaxas
[pdf ]
Towards Robust Full Low-bit Quantization of Super Resolution Networks
Denis S. Makhov*, Irina Zhelavskaya, Ruslan Ostapets, Dehua Song, Kirill Solodskikh
[pdf ]
Omni6DPose: A Benchmark and Model for Universal 6D Object Pose Estimation and Tracking
Jiyao Zhang, Weiyao Huang, Bo Peng, Mingdong Wu, Fei Hu, Zijian Chen, Bo Zhao, Hao Dong*
[pdf ]
Diverse Text-to-3D Synthesis with Augmented Text Embedding
Uy Dieu Tran*, Minh N. Hoang Luu*, Phong Ha Nguyen*, Khoi Nguyen*, Binh-Son Hua*
[pdf ]
Style-Extracting Diffusion Models for Semi-Supervised Histopathology Segmentation
Mathias Öttl*, Frauke Wilm, Jana Steenpass, Jingna Qiu, Matthias Rübner, Prof Arndt Hartmann, Matthias W. Beckmann, Peter Fasching, Andreas K Maier, Ramona Erber, Bernhard Kainz, Katharina Breininger
[pdf ]
LLMCO4MR: LLMs-aided Neural Combinatorial Optimization for Ancient Manuscript Restoration from Fragments with Case Studies on Dunhuang
Yuqing Zhang, Hangqi Li, Shengyu Zhang*, Runzhong Wang, Baoyi He, Huaiyong Dou, Junchi Yan*, Yongquan Zhang, Fei Wu
[pdf ]
Model Breadcrumbs: Scaling Multi-Task Model Merging with Sparse Masks
MohammadReza Davari*, Eugene Belilovsky
[pdf ]
AdversariaLeak: External Information Leakage Attack Using Adversarial Samples on Face Recognition Systems
Roye Katzav*, Amit Giloni, Edita Grolman*, Hiroo Saito, Tomoyuki Shibata, Tsukasa Omino, Misaki Komatsu, Yoshikazu Hanatani, Yuval Elovici, Asaf Shabtai
[pdf ]
iHuman: Instant Animatable Digital Humans From Monocular Videos
Pramish Paudel*, Anubhav Khanal, Danda Pani Paudel, Jyoti Tandukar, Ajad Chhatkuli
[pdf ]
SphereHead: Stable 3D Full-head Synthesis with Spherical Tri-plane Representation
Heyuan Li*, Ce Chen, Tianhao Shi, Yuda Qiu, Sizhe An, Guanying CHEN, Xiaoguang Han*
[pdf ]
Beyond Pixels: Semi-Supervised Semantic Segmentation with a Multi-scale Patch-based Multi-Label Classifier
Prantik Howlader*, Srijan Das, Hieu Le, Dimitris Samaras
[pdf ]
Glyph-ByT5: A Customized Text Encoder for Accurate Visual Text Rendering
Zeyu Liu, Weicong Liang, Zhanhao Liang, Chong Luo, Ji Li, Gao Huang, Yuhui Yuan*
[pdf ]
Solving the inverse problem of microscopy deconvolution with a residual Beylkin-Coifman-Rokhlin neural network
Rui Li, Mikhail Kudryashev, Artur Yakimovich*
[pdf ]
Face Reconstruction Transfer Attack as Out-of-Distribution Generalization
Yoon Gyo Jung*, Jaewoo Park, Xingbo Dong, Hojin Park, Andrew Beng Jin Teoh, Octavia Camps*
[pdf ]
FreeZe: Training-free zero-shot 6D pose estimation with geometric and vision foundation models
Andrea Caraffa*, Davide Boscaini, Amir Hamza, Fabio Poiesi
[pdf ]
Deep Diffusion Image Prior for Efficient OOD Adaptation in 3D Inverse Problems
Hyungjin Chung, Jong Chul Ye*
[pdf ]
Weighting Pseudo-Labels via High-Activation Feature Index Similarity and Object Detection for Semi-Supervised Segmentation
Prantik Howlader*, Hieu Le, Dimitris Samaras
[pdf ]
PartGLEE: A Foundation Model for Recognizing and Parsing Any Objects
Junyi Li, Junfeng Wu, Weizhi Zhao, Song Bai, Xiang Bai*
[pdf ]
WTS: A Pedestrian-Centric Traffic Video Dataset for Fine-grained Spatial-Temporal Understanding
Quan Kong*, Yuki Kawana, Rajat Saini, Ashutosh Kumar, Jingjing Pan, Ta Gu, Yohei Ozao, Balazs Opra, Yoichi Sato, Norimasa Kobori
[pdf ]
Spiking Wavelet Transformer
Yuetong Fang, Ziqing Wang, Lingfeng Zhang, Jiahang Cao, Honglei Chen, Renjing Xu*
[pdf ]
WAVE: Warping DDIM Inversion Features for Zero-shot Text-to-Video Editing
Yutang Feng, Sicheng Gao*, Yuxiang Bao, Xiaodi Wang, Shumin Han*, Juan Zhang*, Baochang Zhang, Angela Yao
[pdf ]
PDT Uav Target Detection Dataset for Pests and Diseases Tree
Mingle Zhou, Rui Xing, Delong Han, Zhiyong Qi, Gang Li*
[pdf ]
Hypernetworks for Generalizable BRDF Representation
Fazilet Gokbudak*, Alejandro Sztrajman, Chenliang Zhou, Fangcheng Zhong, Rafal Mantiuk, A. Cengiz Oztireli
[pdf ]
Photon Inhibition for Energy-Efficient Single-Photon Imaging
Lucas J Koerner*, Shantanu Gupta, Atul N Ingle, Mohit Gupta
[pdf ]
COD: Learning Conditional Invariant Representation for Domain Adaptation Regression
Hao-Ran Yang, Chuan-Xian Ren*, You-Wei Luo
[pdf ]
RANRAC: Robust Neural Scene Representations via Random Ray Consensus
Benno Buschmann*, Andreea Dogaru, Elmar Eisemann, Michael Weinmann, Bernhard Egger
[pdf ]
LayerDiff: Exploring Text-guided Multi-layered Composable Image Synthesis via Layer-Collaborative Diffusion Model
Runhui Huang, Kaixin Cai, Jianhua Han, Xiaodan Liang*, Renjing Pei, Guansong Lu, Songcen Xu, Wei Zhang, Hang Xu
[pdf ]
Characterizing Model Robustness via Natural Input Gradients
Adrian Rodriguez-Munoz*, Tongzhou Wang, Antonio Torralba
[pdf ]
UpFusion: Novel View Diffusion from Unposed Sparse View Observations
Bharath Raj Nagoor Kani*, Hsin-Ying Lee, Sergey Tulyakov, Shubham Tulsiani
[pdf ]
Four Ways to Improve Verbo-visual Fusion for Dense 3D Visual Grounding
Ozan Unal*, Christos Sakaridis, Suman Saha, Luc Van Gool
[pdf ]
SIMBA: Split Inference - Mechanisms, Benchmarks and Attacks
Abhishek Singh*, Vivek Sharma, Rohan Sukumaran, John J Mose, Jeffrey K Chiu, Justin Yu, Ramesh Raskar
[pdf ]
Tuning-Free Image Customization with Image and Text Guidance
Pengzhi Li, Qiang Nie, Ying Chen, Xi Jiang, Kai Wu, Yuhuan Lin, Yong Liu, Jinlong Peng, Chengjie Wang, Feng Zheng*
[pdf ]
FairDomain: Achieving Fairness in Cross-Domain Medical Image Segmentation and Classification
Yu Tian*, Congcong Wen, Min Shi, Muhammad Muneeb Afzal, Hao Huang, Muhammad Osama Khan, Yan Luo, Yi Fang, Mengyu Wang
[pdf ]
Emerging Property of Masked Token for Effective Pre-training
Hyesong Choi, Hunsang Lee, Seyoung Joung, Hyejin Park, Jiyeong Kim, Dongbo Min*
[pdf ]
DQ-DETR: DETR with Dynamic Query for Tiny Object Detection
Yi-Xin Huang*, Hou-I Liu, Hong-Han Shuai, Wen-Huang Cheng
[pdf ]
Track2Act: Predicting Point Tracks from Internet Videos enables Generalizable Robot Manipulation
Homanga Bharadhwaj*, Roozbeh Mottaghi, Abhinav Gupta, Shubham Tulsiani
[pdf ]
SWAG: Splatting in the Wild images with Appearance-conditioned Gaussians
Hiba Dahmani*, Moussab Bennehar, Nathan Piasco, Luis G Roldao Jimenez, Dzmitry Tsishkou
[pdf ]
Gaussian in the wild: 3D Gaussian Splatting for Unconstrained Image Collections
Dongbin Zhang*, Chuming Wang, Weitao Wang, Peihao Li, Minghan Qin, Haoqian Wang*
[pdf ]
Few-shot Defect Image Generation based on Consistency Modeling
Qingfeng Shi, Jing Wei, Fei Shen*, Zhengtao Zhang
[pdf ]
Taming CLIP for Fine-grained and Structured Visual Understanding of Museum Exhibits
Ada-Astrid Balauca*, Danda Pani Paudel, Kristina Toutanova, Luc Van Gool
[pdf ]
CLIP-DPO: Vision-Language Models as a Source of Preference for Fixing Hallucinations in LVLMs
Yassine Ouali*, Adrian Bulat*, Brais Martinez, Georgios Tzimiropoulos
[pdf ]
Masked Motion Prediction with Semantic Contrast for Point Cloud Sequence Learning
yuehui han*, Can Xu, Rui Xu, Jianjun Qian, Jin Xie
[pdf ]
Prompt-Based Test-Time Real Image Dehazing: A Novel Pipeline
Zixuan Chen, Zewei He*, Ziqian Lu, Xuecheng Sun, Zheming Lu
[pdf ]
Video Editing via Factorized Diffusion Distillation
Uriel Singer*, Amit Zohar*, Yuval Kirstain, Shelly Sheynin, Adam Polyak, Devi Parikh, Yaniv Taigman
[pdf ]
Trackastra: Transformer-based cell tracking for live-cell microscopy
Benjamin Gallusser, Martin Weigert*
[pdf ]
CogView3: Finer and Faster Text-to-Image Generation via Relay Diffusion
Wendi Zheng*, Jiayan Teng, Zhuoyi Yang, Weihan Wang, Jidong Chen, Xiaotao Gu, Yuxiao Dong*, Ming Ding*, Jie Tang*
[pdf ]
SiT: Exploring Flow and Diffusion-based Generative Models with Scalable Interpolant Transformers
Nanye Ma*, Mark Goldstein, Michael Albergo, Nicholas M Boffi, Eric Vanden-Eijnden*, Saining Xie*
[pdf ]
Learn to Memorize and to Forget: A Continual Learning Perspective of Dynamic SLAM
Baicheng Li*, Zike Yan*, Dong Wu, Hanqing Jiang, Hongbin Zha*
[pdf ]
Forecasting Future Videos from Novel Views via Disentangled 3D Scene Representation
Sudhir Yarram*, Junsong Yuan
[pdf ]
GMM-IKRS: Gaussian Mixture Models for Interpretable Keypoint Refinement and Scoring
Emanuele Santellani*, Martin Zach, Christian Sormann, Mattia Rossi, Andreas Kuhn, Friedrich Fraundorfer
[pdf ]
Get Your Embedding Space in Order: Domain-Adaptive Regression for Forest Monitoring
Sizhuo Li, Dimitri Gominski*, Martin Brandt, Xiaoye Tong, Philippe Ciais
[pdf ]
ObjectDrop: Bootstrapping Counterfactuals for Photorealistic Object Removal and Insertion
Daniel Winter*, Matan Cohen, Shlomi Fruchter, Yael Pritch, Alex Rav-Acha, Yedid Hoshen*
[pdf ]
CoDA: Instructive Chain-of-Domain Adaptation with Severity-Aware Visual Prompt Tuning
ZiYang Gong, FuHao Li, Yupeng Deng, Deblina Bhattacharjee, Xianzheng Ma*, Xiangwei Zhu*, Zhenming Ji*
[pdf ]
Curved Diffusion: A Generative Model With Optical Geometry Control
Andrey Voynov*, Amir Hertz, Moab Arar, Shlomi Fruchter, Daniel Cohen-Or
[pdf ]
Mini-Splatting: Representing Scenes with a Constrained Number of Gaussians
Guangchi Fang, Bing Wang*
[pdf ]
MeshSegmenter: Zero-Shot Mesh Segmentation via Texture Synthesis
Ziming Zhong*, Yanyu Xu, Jing Li, Jiale Xu, Zhengxin Li, Chaohui Yu, Shenghua Gao
[pdf ]
OTSeg: Multi-prompt Sinkhorn Attention for Zero-Shot Semantic Segmentation
Kwanyoung Kim, Yujin Oh, Jong Chul Ye*
[pdf ]
Skeleton Recall Loss for Connectivity Conserving and Resource Efficient Segmentation of Thin Tubular Structures
Yannick Kirchhoff*, Maximilian R Rokuss*, Saikat Roy*, Balint Kovacs, Constantin Ulrich, Tassilo Wald, Maximilian Zenk, Philipp Vollmuth, Jens Kleesiek, Fabian Isensee, Klaus H. Maier-Hein
[pdf ]
Conceptual Codebook Learning for Vision-Language Models
Yi Zhang*, Ke Yu, Siqi Wu, Zhihai He*
[pdf ]
LingoQA: Video Question Answering for Autonomous Driving
Ana-Maria Marcu*, Long Chen, Jan Hünermann, Alice Karnsund, Benoit Hanotte, Prajwal Chidananda, Saurabh Nair, Vijay Badrinarayanan, Alex Kendall, Jamie Shotton, Elahe Arani, Oleg Sinavski
[pdf ]
AnimateMe: 4D Facial Expressions via Diffusion Models
Dimitrios Gerogiannis*, Foivos Paraperas Papantoniou, Rolandos Alexandros Potamias, Alexandros Lattas, Stylianos Moschoglou, Stylianos Ploumpis, Stefanos Zafeiriou
[pdf ]
HaloQuest: A Visual Hallucination Dataset for Advancing Multimodal Reasoning
Zhecan Wang, Garrett Bingham*, Adams Wei Yu, Quoc V. Le, Thang Luong, Golnaz Ghiasi
[pdf ]
LATTE3D: Large-scale Amortized Text-To-Enhanced3D Synthesis
Kevin Xie*, Tianshi Cao, Jonathan P Lorraine, Jun Gao, James R Lucas, Antonio Torralba, Sanja Fidler, Xiaohui Zeng
[pdf ]
PreSight: Enhancing Autonomous Vehicle Perception with City-Scale NeRF Priors
Tianyuan Yuan*, Yucheng Mao, Jiawei Yang, Yicheng LIU, Yue Wang, Hang Zhao*
[pdf ]
Unveiling and Mitigating Memorization in Text-to-image Diffusion Models through Cross Attention
Jie Ren*, Yaxin Li, Shenglai Zeng, Han Xu, Lingjuan Lyu, Yue Xing, Jiliang Tang
[pdf ]
iNeMo: Incremental Neural Mesh Models for Robust Class-Incremental Learning
Tom Fischer*, Yaoyao Liu, Artur Jesslen, Noor Ahmed, Prakhar Kaushik, Angtian Wang, Alan Yuille, Adam Kortylewski, Eddy Ilg
[pdf ]
Context Diffusion: In-Context Aware Image Generation
Ivona Najdenkoska*, Animesh Sinha, Abhimanyu Dubey, Dhruv Mahajan, Vignesh Ramanathan, Filip Radenovic
[pdf ]
Pose Guided Fine-Grained Sign Language Video Generation
Tongkai Shi, Lianyu Hu, Fanhua Shang, Jichao Feng, Liu Peidong, Wei Feng*
[pdf ]
RAP: Retrieval-Augmented Planner for Adaptive Procedure Planning in Instructional Videos
Ali Zare*, Yulei Niu, Hammad Ayyubi, Shih-Fu Chang
[pdf ]
Certifiably Robust Image Watermark
Zhengyuan Jiang*, Moyang Guo, Yuepeng Hu, Jinyuan Jia, Neil Zhenqiang Gong
[pdf ]
Discover-then-Name: Task-Agnostic Concept Bottlenecks via Automated Concept Discovery
Sukrut Rao*, Sweta Mahajan*, Moritz Böhle, Bernt Schiele
[pdf ]
Online Zero-Shot Classification with CLIP
Qi Qian*, Juhua Hu
[pdf ]
SeA: Semantic Adversarial Augmentation for Last Layer Features from Unsupervised Representation Learning
Qi Qian*, Yuanhong Xu, Juhua Hu
[pdf ]
Unlocking the Potential of Federated Learning: The Symphony of Dataset Distillation via Deep Generative Latents
Yuqi Jia, Saeed Vahidian*, Jingwei Sun, Jianyi Zhang, Vyacheslav Kungurtsev, Neil Zhenqiang Gong, Yiran Chen
[pdf ]
Rethinking Fast Adversarial Training: A Splitting Technique To Overcome Catastrophic Overfitting
Masoumeh Zareapoor, Pourya Shamsolmoali*
[pdf ]
Quality Assured: Rethinking Annotation Strategies in Imaging AI
Tim Rädsch*, Annika Reinke, Vivienn Weru, Minu D. Tizabi, Nicholas Heller, Fabian Isensee, Annette Kopp-Schneider, Lena Maier-Hein*
[pdf ]
BRIDGE: Bridging Gaps in Image Captioning Evaluation with Stronger Visual Cues
Sara Sarto*, Marcella Cornia, Lorenzo Baraldi, Rita Cucchiara
[pdf ]
Enhancing Plausibility Evaluation for Generated Designs with Denoising Autoencoder
Jiajie Fan*, Amal Trigui*, Thomas Bäck, Hao Wang
[pdf ]
Weakly-Supervised 3D Hand Reconstruction with Knowledge Prior and Uncertainty Guidance
Yufei Zhang*, Jeffrey Kephart, Qiang Ji*
[pdf ]
3D Reconstruction of Objects in Hands without Real World 3D Supervision
Aditya Prakash*, Matthew Chang, Matthew Jin, Ruisen Tu, Saurabh Gupta
[pdf ]
To Supervise or Not to Supervise: Understanding and Addressing the Key Challenges of Point Cloud Transfer Learning
Souhail Hadgi*, Lei Li, Maks Ovsjanikov
[pdf ]
Parameterized Quasi-Physical Simulators for Dexterous Manipulations Transfer
Xueyi Liu*, Kangbo Lyu, Jieqiong Zhang, Tao Du, Li Yi*
[pdf ]
3D Hand Pose Estimation in Everyday Egocentric Images
Aditya Prakash*, Ruisen Tu, Matthew Chang, Saurabh Gupta
[pdf ]
Mitigating Perspective Distortion-induced Shape Ambiguity in Image Crops
Aditya Prakash*, Arjun Gupta, Saurabh Gupta
[pdf ]
Towards Neuro-Symbolic Video Understanding
Minkyu Choi*, Harsh Goel, Mohammad Omama, Yunhao Yang, Sahil Shah, Sandeep Chinchali
[pdf ]
Optimization-based Uncertainty Attribution Via Learning Informative Perturbations
Hanjing Wang*, Bashirul Azam Biswas, Qiang Ji
[pdf ]
Context-Aware Action Recognition: Introducing a Comprehensive Dataset for Behavior Contrast
Tatsuya Sasaki*, Yoshiki Ito, Satoshi Kondo
[pdf ]
Semi-supervised Segmentation of Histopathology Images with Noise-Aware Topological Consistency
Meilong Xu*, Xiaoling Hu, Saumya Gupta, Shahira Abousamra, Chao Chen
[pdf ]
Adaptive Compressed Sensing with Diffusion-Based Posterior Sampling
Noam Elata*, Tomer Michaeli, Michael Elad
[pdf ]
Instant Uncertainty Calibration of NeRFs Using a Meta-Calibrator
Niki Amini-Naieni*, Tomas Jakab, Andrea Vedaldi, Ronald Clark
[pdf ]
MetaAT: Active Testing for Label-Efficient Evaluation of Dense Recognition Tasks
Sanbao Su, Xin Li*, Thang Doan, Sima Behpour, Wenbin He, Liang Gou, Fei Miao, Liu Ren
[pdf ]
Salience-Based Adaptive Masking: Revisiting Token Dynamics for Enhanced Pre-training
Hyesong Choi, Hyejin Park, Kwang Moo Yi, Sungmin Cha, Dongbo Min*
[pdf ]
Data Augmentation via Latent Diffusion for Saliency Prediction
Bahar Aydemir*, Deblina Bhattacharjee, Tong Zhang, Mathieu Salzmann, Sabine Süsstrunk
[pdf ]
Explorative Inbetweening of Time and Space
Haiwen Feng*, Zheng Ding, Zhihao Xia, Simon Niklaus, Victoria Fernandez Abrevaya, Michael J. Black, Xuaner Zhang
[pdf ]
A Diffusion Model for Simulation Ready Coronary Anatomy with Morpho-skeletal Control
Karim Kadry*, Shreya Gupta, Jonas Sogbadji, Michiel Schaap, Kersten Petersen, Takuya Mizukami, Carlos Collet, Farhad R. Nezami, Elazer R Edelman
[pdf ]
Learning to Make Keypoints Sub-Pixel Accurate
Shinjeong Kim*, Marc Pollefeys, Daniel Barath
[pdf ]
Imaging with Confidence: Uncertainty Quantification for High-dimensional Undersampled MR Images
Frederik Hoppe*, Claudio Mayrink Verdun, Hannah Sophie Laus, Sebastian Endt, Marion Irene Menzel, Felix Krahmer, Holger Rauhut
[pdf ]
Generalizable Human Gaussians for Sparse View Synthesis
YoungJoong Kwon*, Baole Fang, Yixing Lu, Haoye Dong, Cheng Zhang, Francisco Vicente Carrasco, Albert Mosella-Montoro, Jianjin Xu, Shingo J Takagi, Daeil Kim, Aayush Prakash, Fernando de la Torre
[pdf ]
DrivingDiffusion: Layout-Guided Multi-View Driving Scenarios Video Generation with Latent Diffusion Model
Li Xiaofan*, Zhang Yifu*, Ye Xiaoqing*
[pdf ]
Evaluating the Adversarial Robustness of Semantic Segmentation: Trying Harder Pays Off
Levente Halmosi, Bálint Mohos, Márk Jelasity*
[pdf ]
SkyScenes: A Synthetic Dataset for Aerial Scene Understanding
Sahil S Khose*, Anisha Pal, Aayushi Agarwal, Deepanshi, Judy Hoffman, Prithvijit Chattopadhyay
[pdf ]
Large-Scale Multi-Hypotheses Cell Tracking Using Ultrametric Contours Maps
Jordão Bragantini*, Merlin Lange, Loïc A Royer
[pdf ]
GSD: View-Guided Gaussian Splatting Diffusion for 3D Reconstruction
Yuxuan Mu*, Xinxin Zuo, Chuan Guo, Yilin Wang, Juwei Lu, Xiaofei Wu, Songcen Xu, Peng Dai, Youliang Yan, Li Cheng
[pdf ]
AdaDiff: Accelerating Diffusion Models through Step-Wise Adaptive Computation
Shengkun Tang*, Yaqing Wang, Caiwen Ding, Yi Liang, Yao Li, Dongkuan Xu
[pdf ]
PFedEdit: Personalized Federated Learning via Automated Model Editing
Haolin Yuan*, William Paul, John Aucott, Philippe Burlina, Yinzhi Cao*
[pdf ]
De-Confusing Pseudo-Labels in Source-Free Domain Adaptation
Idit Diamant*, Amir Rosenfeld, Idan Achituve, Jacob Goldberger, Arnon Netzer
[pdf ]
GenerateCT: Text-Conditional Generation of 3D Chest CT Volumes
Ibrahim Ethem Hamamci*, Sezgin Er, Anjany Sekuboyina, Enis Simsar, Alperen Tezcan, Ayse Gulnihan Simsek, Sevval Nil Esirgun, Furkan Almas, Irem Dogan, Muhammed Furkan Dasdelen, Chinmay Prabhakar, Hadrien Reynaud, Sarthak Pati, Christian Bluethgen, Mehmet Kemal Ozdemir, Bjoern Menze
[pdf ]
EraseDraw: Learning to Insert Objects by Erasing Them from Images
Alper Canberk*, Maksym Bondarenko, Ege Ozguroglu, Ruoshi Liu, Carl Vondrick
[pdf ]
SuperFedNAS: Cost-Efficient Federated Neural Architecture Search for On-Device Inference
Alind Khare*, Animesh Agrawal, Aditya Annavajjala, Payman Behnam, Myungjin Lee, Hugo M Latapie, Alexey Tumanov
[pdf ]
Towards Reliable Evaluation and Fast Training of Robust Semantic Segmentation Models
Francesco Croce*, Naman D. Singh, Matthias Hein*
[pdf ]
Contrastive Region Guidance: Improving Grounding in Vision-Language Models without Training
David Wan*, Jaemin Cho, Elias Stengel-Eskin, Mohit Bansal
[pdf ]
Keypoint Promptable Re-Identification
Vladimir Somers*, Alexandre Alahi, Christophe De Vleeschouwer
[pdf ]
Merging and Splitting Diffusion Paths for Semantically Coherent Panoramas
Fabio Quattrini*, Vittorio Pippi, Silvia Cascianelli*, Rita Cucchiara
[pdf ]
DynMF: Neural Motion Factorization for Real-time Dynamic View Synthesis with 3D Gaussian Splatting
Angelos Kratimenos*, Jiahui Lei, Kostas Daniilidis
[pdf ]
Animal Avatars: Reconstructing Animatable 3D Animals from Casual Videos
Remy Sabathier*, David Novotny, Niloy Mitra
[pdf ]
Perceptual Evaluation of Audio-Visual Synchrony Grounded in Viewers’ Opinion Scores
Lucas Goncalves, Prashant Mathur*, Chandrashekhar Lavania, Metehan Cekic, Marcello Federico, Kyu Han
[pdf ]
MMVR: Millimeter-wave Multi-View Radar Dataset and Benchmark for Indoor Perception
Mohammad Mahbubur Rahman, Ryoma Yataka, Sorachi Kato, Pu Wang*, Peizhao Li, Adriano Cardace, Petros Boufounos
[pdf ]
Training A Secure Model against Data-Free Model Extraction
Zhenyi Wang*, Li Shen*, Junfeng Guo, Tiehang Duan, Siyu Luan, Tongliang Liu, Mingchen Gao
[pdf ]
EpipolarGAN: Omnidirectional Image Synthesis with Explicit Camera Control
Christopher May*, Daniel Aliaga
[pdf ]
TriNeRFLet: A Wavelet Based Triplane NeRF Representation
Rajaei Khatib*, Raja Giryes*
[pdf ]
EgoBody3M: Egocentric Body Tracking on a VR Headset using a Diverse Dataset
Amy Zhao, Chengcheng Tang, Lezi Wang, Yijing Li, Mihika Dave, Lingling Tao*, Christopher D. Twigg, Robert Y. Wang
[pdf ]
Photorealistic Video Generation with Diffusion Models
Agrim Gupta*, Lijun Yu, Kihyuk Sohn, Xiuye Gu, Meera Hahn, Li Fei-Fei, Irfan Essa, Lu Jiang, Jose Lezama
[pdf ]
RAVE: Residual Vector Embedding for CLIP-Guided Backlit Image Enhancement
Tatiana Gaintseva*, Martin Benning, Gregory Slabaugh*
[pdf ]
TIBET: Identifying and Evaluating Biases in Text-to-Image Generative Models
Aditya Chinchure*, Pushkar Shukla*, Gaurav Bhatt, Kiri Salij, Kartik Hosanagar, Leonid Sigal, Matthew Turk
[pdf ]
Object-Aware Query Perturbation for Cross-Modal Image-Text Retrieval
Naoya Sogi*, Takashi Shibata*, Makoto Terao*
[pdf ]
DECIDER: Leveraging Foundation Model Priors for Improved Model Failure Detection and Explanation
Rakshith Subramanyam*, Kowshik Thopalli*, Vivek Sivaraman Narayanaswamy, Jayaraman J. Thiagarajan
[pdf ]
Ex2Eg-MAE: A Framework for Adaptation of Exocentric Video Masked Autoencoders for Egocentric Social Role Understanding
Minh Tran*, Yelin Kim, Che-Chun Su, Min Sun, Cheng-Hao Kuo, Mohammad Soleymani
[pdf ]
Self-Supervised Audio-Visual Soundscape Stylization
Tingle Li*, Renhao Wang, Po-Yao Huang, Andrew Owens, Gopala Krishna Anumanchipalli
[pdf ]
SAVE: Protagonist Diversification with Structure Agnostic Video Editing
Yeji Song*, Wonsik Shin, Junsoo Lee, Jeesoo Kim, Nojun Kwak*
[pdf ]
VideoAgent: Long-form Video Understanding with Large Language Model as Agent
Xiaohan Wang*, Yuhui Zhang, Orr Zohar, Serena Yeung-Levy
[pdf ]
Meta-optimized Angular Margin Contrastive Framework for Video-Language Representation Learning
Thong Thanh Nguyen*, Yi Bin, Xiaobao Wu, Xinshuai Dong, Zhiyuan Hu, Khoi M Le, Cong-Duy Nguyen, See Kiong Ng, Anh Tuan Luu
[pdf ]
Source-Free Domain-Invariant Performance Prediction
Ekaterina Khramtsova*, Mahsa Baktashmotlagh, Guido Zuccon, Xi Wang, Mathieu Salzmann
[pdf ]
Improving Robustness to Model Inversion Attacks via Sparse Coding Architectures
Sayanton V. Dibbo*, Adam Breuer, Juston Moore, Michael Teti
[pdf ]
Constructing Concept-based Models to Mitigate Spurious Correlations with Minimal Human Effort
Jeeyung Kim*, Ze Wang, Qiang Qiu
[pdf ]
Direct Distillation between Different Domains
Jialiang Tang, Shuo Chen*, Gang Niu, Hongyuan Zhu, Joey Tianyi Zhou, Chen Gong*, Masashi Sugiyama
[pdf ]
Contrastive ground-level image and remote sensing pre-training improves representation learning for natural world imagery
Andy V Huynh*, Lauren Gillespie, Jael Lopez-Saucedo, Claire Tang, Rohan Sikand, Moisés Expósito-Alonso
[pdf ]
V-Trans4Style: Visual Transition Recommendation for Video Production Style Adaptation
Pooja Guhan*, Tsung-Wei Huang, Guan-Ming Su, Subhadra Gopalakrishnan, Dinesh Manocha
[pdf ]
GRiT: A Generative Region-to-text Transformer for Object Understanding
Jialian Wu*, Jianfeng Wang, Zhengyuan Yang, Zhe Gan, Zicheng Liu, Junsong Yuan, Lijuan Wang
[pdf ]
LRSLAM: Low-rank Representation of Signed Distance Fields in Dense Visual SLAM System
Hongbeen Park, Minjeong Park, Giljoo Nam, Jinkyu Kim*
[pdf ]
Learning Representation for Multitask Learning through Self-Supervised Auxiliary Learning
Seokwon Shin, Hyungrok Do, Youngdoo Son*
[pdf ]
Neural Poisson Solver: A Universal and Continuous Framework for Natural Signal Blending
Delong Wu, Hao Zhu, Qi Zhang, You Li, Xun Cao*, Zhan Ma*
[pdf ]
Geometry Fidelity for Spherical Images
Anders Christensen*, Nooshin Mojab*, Khushman Patel, Karan Ahuja, Zeynep Akata, Ole Winther, Mar Gonzalez Franco, Andrea Colaco
[pdf ]
BAGS: Blur Agnostic Gaussian Splatting through Multi-Scale Kernel Modeling
Cheng Peng*, Yutao Tang, Yifan Zhou, Nengyu Wang, Xijun Liu, Deming Li, Rama Chellappa
[pdf ]
CroMo-Mixup: Augmenting Cross-Model Representations for Continual Self-Supervised Learning
Erum Mushtaq*, Duygu Nur Yaldiz, Yavuz Faruk Bakman, Jie Ding, Chenyang Tao, Dimitrios Dimitriadis, Salman Avestimehr
[pdf ]
WoVoGen: World Volume-aware Diffusion for Controllable Multi-camera Driving Scene Generation
Jiachen Lu, Ze Huang, Zeyu Yang, Zhang Jiahui, Li Zhang*
[pdf ]
Benchmarking Spurious Bias in Few-Shot Image Classifiers
Guangtao Zheng*, Wenqian Ye, Aidong Zhang
[pdf ]
TurboEdit: Real-time text-based disentangled real image editing
Zongze Wu*, Nicholas I Kolkin, Jonathan Brandt, Richard Zhang, Eli Shechtman
[pdf ]
Soft Shadow Diffusion (SSD): Physics-inspired Learning for 3D Computational Periscopy
Fadlullah A Raji*, John Murray-Bruce*
[pdf ]
Augmented Neural Fine-tuning for Efficient Backdoor Purification
Nazmul Karim*, Abdullah Al Arafat, Umar Khalid, Zhishan Guo, Nazanin Rahnavard
[pdf ]
REDIR: Refocus-free Event-based De-occlusion Image Reconstruction
Qi Guo, Hailong Shi*, Huan Li, Jinsheng Xiao, Xingyu Gao*
[pdf ]
Free-Editor: Zero-shot Text-driven 3D Scene Editing
Nazmul Karim*, Hasan Iqbal, Umar Khalid, Chen Chen, Jing Hua
[pdf ]
DPA-Net: Structured 3D Abstraction from Sparse Views via Differentiable Primitive Assembly
Fenggen Yu*, Yiming Qian, Xu Zhang, Francisca Gil-Ureta, Brian Jackson, Eric Bennett, Hao Zhang
[pdf ]
An Empirical Study and Analysis of Text-to-Image Generation Using Large Language Model-Powered Textual Representation
Zhiyu Tan, Mengping Yang, Luozheng Qin, Hao Yang, Ye Qian, Qiang Zhou, Cheng Zhang, Hao Li*
[pdf ]
Few-shot Class Incremental Learning with Attention-Aware Self-Adaptive Prompt
Chenxi Liu*, Zhenyi Wang, Tianyi Xiong, Ruibo Chen, Yihan Wu, Junfeng Guo, Heng Huang*
[pdf ]
An Image is Worth 1/2 Tokens After Layer 2: Plug-and-Play Inference Acceleration for Large Vision-Language Models
Liang Chen, Haozhe Zhao, Tianyu Liu, Shuai Bai, Junyang Lin, Chang Zhou, Baobao Chang*
[pdf ]
Generalizable Symbolic Optimizer Learning
Xiaotian Song, Peng Zeng, Yanan Sun*, Andy Song
[pdf ]
Online Continuous Generalized Category Discovery
Keon-Hee Park, Hakyung Lee, Kyungwoo Song*, Gyeong-Moon Park*
[pdf ]
Bridging Different Language Models and Generative Vision Models for Text-to-Image Generation
Shihao Zhao*, Shaozhe Hao, Bojia Zi, Huaizhe Xu, Kwan-Yee K. Wong*
[pdf ]
Tackling Structural Hallucination in Image Translation with Local Diffusion
Seunghoi Kim*, Chen Jin, Tom Diethe, Matteo Figini, Henry FJ Tregidgo, Asher Mullokandov, Philip A Teare, Daniel Alexander
[pdf ]
Hierarchical Separable Video Transformer for Snapshot Compressive Imaging
Ping Wang*, Yulun Zhang, Lishun Wang, Xin Yuan*
[pdf ]
Unified Medical Image Pre-training in Language-Guided Common Semantic Space
Xiaoxuan He, Yifan Yang, Xinyang Jiang, Xufang Luo*, Haoji Hu, Siyun Zhao, Dongsheng Li, Yuqing Yang, Lili Qiu
[pdf ]
On the Vulnerability of Skip Connections to Model Inversion Attacks
Jun Hao Koh*, Sy-Tuyen Ho, Ngoc-Bao Nguyen, Ngai-Man Cheung
[pdf ]
Adversarial Robustification via Text-to-Image Diffusion Models
Daewon Choi, Jongheon Jeong, Huiwon Jang, Jinwoo Shin*
[pdf ]
Overcome Modal Bias in Multi-modal Federated Learning via Balanced Modality Selection
Yunfeng FAN*, Wenchao Xu*, Haozhao Wang, Fushuo Huo, Jinyu Chen, Song Guo
[pdf ]
Comprehensive Attribution: Inherently Explainable Vision Model with Feature Detector
Xianren Zhang, Dongwon Lee, Suhang Wang*
[pdf ]
Reinforcement Learning via Auxiliary Task Distillation
Abhinav N Harish*, Larry Heck, Josiah P Hanna, Zsolt Kira, Andrew Szot
[pdf ]
DHR: Dual Features-Driven Hierarchical Rebalancing in Inter- and Intra-Class Regions for Weakly-Supervised Semantic Segmentation
Sanghyun Jo, Fei Pan, In-Jae Yu, Kyungsu Kim*
[pdf ]
Pre-trained Visual Dynamics Representations for Efficient Policy Learning
Hao Luo*, Bohan Zhou, Zongqing Lu*
[pdf ]
View-Consistent Hierarchical 3D Segmentation Using Ultrametric Feature Fields
Haodi He, Colton Stearns, Adam Harley, Leonidas Guibas*
[pdf ]
Plug and Play: A Representation Enhanced Domain Adapter for Collaborative Perception
Tianyou Luo*, Quan Yuan*, Yuchen Xia, Guiyang Luo, Yujia Yang, Jinglin Li
[pdf ]
Follow the Rules: Reasoning for Video Anomaly Detection with Large Language Models
Yuchen Yang*, Kwonjoon Lee, Behzad Dariush, Yinzhi Cao*, Shao-Yuan Lo*
[pdf ]
SAM4MLLM: Enhance Multi-Modal Large Language Model for Referring Expression Segmentation
Yi-Chia Chen, Wei-Hua Li, Cheng Sun, Yu-Chiang Frank Wang, Chu-Song Chen*
[pdf ]
TTD: Text-Tag Self-Distillation Enhancing Image-Text Alignment in CLIP to Alleviate Single Tag Bias
Sanghyun Jo, Soohyun Ryu, Sungyub Kim, Eunho Yang, Kyungsu Kim*
[pdf ]
Learning Quantized Adaptive Conditions for Diffusion Models
Yuchen Liang*, Yuchuan Tian, Lei Yu, Huaao Tang, Jie Hu, Xiangzhong Fang, Hanting Chen*
[pdf ]
STAMP: Outlier-Aware Test-Time Adaptation with Stable Memory Replay
Yu Yongcan, Lijun Sheng, Ran He, Jian Liang*
[pdf ]
Remove Projective LiDAR Depthmap Artifacts via Exploiting Epipolar Geometry
Shengjie Zhu*, Girish Chandar Ganesan, Abhinav Kumar, Xiaoming Liu
[pdf ]
Accelerating Online Mapping and Behavior Prediction via Direct BEV Feature Attention
Xunjiang Gu, Guanyu Song, Igor Gilitschenski, Marco Pavone, Boris Ivanovic*
[pdf ]
High-Fidelity Modeling of Generalizable Wrinkle Deformation
Jingfan Guo, Jae Shin Yoon, Shunsuke Saito, Takaaki Shiratori, Hyun Soo Park*
[pdf ]
Instruction Tuning-free Visual Token Complement for Multimodal LLMs
Dongsheng Wang*, Jiequan Cui, Miaoge Li, Wang Lin, Bo Chen, Hanwang Zhang
[pdf ]
Exploring Conditional Multi-Modal Prompts for Zero-shot HOI Detection
Ting Lei, Shaofeng Yin, Yuxin Peng, Yang Liu*
[pdf ]
Training-free Video Temporal Grounding using Large-scale Pre-trained Models
Minghang Zheng, Xinhao Cai, Qingchao Chen, Yuxin Peng, Yang Liu*
[pdf ]
Revisit Self-supervision with Local Structure-from-Motion
Shengjie Zhu*, Xiaoming Liu
[pdf ]
FAMOUS: High-Fidelity Monocular 3D Human Digitization Using View Synthesis
Vishnu Mani Hema*, Shubhra Aich, Christian Haene, Jean-Charles Bazin, Fernando de la Torre
[pdf ]
Efficient Learning of Event-based Dense Representation using Hierarchical Memories with Adaptive Update
Uday Kamal*, Saibal Mukhopadhyay
[pdf ]
SNP: Structured Neuron-level Pruning to Preserve Attention Scores
KyungHwan Shim, Jaewoong Yun, Shinkook Choi*
[pdf ]
Multi-Granularity Sparse Relationship Matrix Prediction Network for End-to-End Scene Graph Generation
Lei Wang, Zejian Yuan, Badong Chen*
[pdf ]
Flash-Splat: 3D Reflection Removal with Flash Cues and Gaussian Splats
Mingyang Xie*, Haoming Cai, Sachin Shah, Yiran Xu, Brandon Y. Feng, Jia-Bin Huang, Christopher A. Metzler
[pdf ]
PALM: Predicting Actions through Language Models
Sanghwan Kim*, Daoji Huang, Yongqin Xian, Otmar Hilliges, Luc Van Gool, Xi Wang
[pdf ]
Motion Keyframe Interpolation for Any Human Skeleton using Point Cloud-based Human Motion Data Homogenisation
Clinton A Mo, Kun Hu*, Chengjiang Long, Dong Yuan, Zhiyong Wang
[pdf ]
SwiftBrush v2: Make Your One-step Diffusion Model Better Than Its Teacher
Trung Tuan Dao*, Thuan Hoang Nguyen, Thanh Van Le, Duc H Vu, Khoi Nguyen, Cuong Pham, Anh T Tran*
[pdf ]
Learning to Localize Actions in Instructional Videos with LLM-Based Multi-Pathway Text-Video Alignment
Yuxiao Chen*, Kai Li, Wentao Bao, Deep Patel, Yu Kong, Martin Renqiang Min, Dimitris N. Metaxas*
[pdf ]
Improving Hyperbolic Representations via Gromov-Wasserstein Regularization
Yifei Yang, Wonjun Lee, Dongmian Zou*, Gilad Lerman
[pdf ]
VSViG: Real-time Video-based Seizure Detection via Skeleton-based Spatiotemporal ViG
Yankun Xu*, Junzhe Wang, Yun-Hsuan Chen, Jie Yang, Wenjie Ming, Shuang Wang, Mohamad Sawan*
[pdf ]
DiffSurf: A Transformer-based Diffusion Model for Generating and Reconstructing 3D Surfaces in Pose
Yusuke Yoshiyasu*, Leyuan Sun
[pdf ]
Exploiting Supervised Poison Vulnerability to Strengthen Self-Supervised Defense
Jeremy Styborski*, Mingzhi Lyu*, Yi Huang*, Adams Kong*
[pdf ]
Dense Hand-Object(HO) GraspNet with Full Grasping Taxonomy and Dynamics
Woojin Cho, Jihyun Lee, Minjae Yi, Minje Kim, Taeyun Woo, Donghwan Kim, Taewook Ha, Hyokeun Lee, Je-Hwan Ryu, Woontack Woo, Tae-Kyun (T-K) Kim*
[pdf ]
Human Pose Recognition via Occlusion-Preserving Abstract Images
Saad Manzur*, Wayne B Hayes*
[pdf ]
DA-BEV: Unsupervised Domain Adaptation for Bird's Eye View Perception
Kai Jiang*, Jiaxing Huang, Weiying Xie, Jie Lei, Yunsong Li, Ling Shao, Shijian Lu
[pdf ]
SlimFlow: Training Smaller One-Step Diffusion Models with Rectified Flow
Yuanzhi Zhu*, Xingchao Liu, Qiang Liu*
[pdf ]
PhysGen: Rigid-Body Physics-Grounded Image-to-Video Generation
Shaowei Liu, Zhongzheng Ren, Saurabh Gupta, Shenlong Wang*
[pdf ]
Depth-Aware Blind Image Decomposition for Real-World Adverse Weather Recovery
Chao Wang*, Zhedong Zheng, Ruijie Quan, Yi Yang
[pdf ]
DreamSampler: Unifying Diffusion Sampling and Score Distillation for Image Manipulation
Jeongsol Kim, Geon Yeong Park, Jong Chul Ye*
[pdf ]
Reshaping the Online Data Buffering and Organizing Mechanism for Continual Test-Time Adaptation
Zhilin Zhu*, Xiaopeng Hong*, Zhiheng Ma, Weijun Zhuang, YaoHui Ma, Yong Dai, Yaowei Wang
[pdf ]
Personalized Privacy Protection Mask Against Unauthorized Facial Recognition
Ka-Ho Chow*, Sihao Hu, Tiansheng Huang, Ling Liu
[pdf ]
PosterLlama: Bridging Design Ability of Language Model to Content-Aware Layout Generation
Jaejung Seol, SeoJun Kim, Jaejun Yoo*
[pdf ]
PreciseControl: Enhancing Text-To-Image Diffusion Models with Fine-Grained Attribute Control
Rishubh Parihar*, Sachidanand VS, Sabariswaran Mani, Tejan Karmali, Venkatesh Babu RADHAKRISHNAN
[pdf ]
LG-Gaze: Learning Geometry-aware Continuous Prompts for Language-Guided Gaze Estimation
Pengwei Yin*, Jingjing Wang, Guanzhong Zeng, Di Xie, Jiang Zhu
[pdf ]
Efficient Training with Denoised Neural Weights
Yifan Gong*, Zheng Zhan, Yanyu Li, Yerlan Idelbayev, Andrey Zharkov, Kfir Aberman, Sergey Tulyakov, Yanzhi Wang, Jian Ren
[pdf ]
Learning the Unlearned: Mitigating Feature Suppression in Contrastive Learning
Jihai Zhang, Xiang Lan, Xiaoye Qu, Yu Cheng, Mengling Feng*, Bryan Hooi*
[pdf ]
Integration of Global and Local Representations for Fine-grained Cross-modal Alignment
Seungwan Jin, Hoyoung Choi, Taehyung Noh, Kyungsik Han*
[pdf ]
Local and Global Flatness for Federated Domain Generalization
Hao Yan, Yuhong Guo*
[pdf ]
SRPose: Two-view Relative Pose Estimation with Sparse Keypoints
Rui Yin, Yulun Zhang, Zherong Pan, Jianjun Zhu, Cheng Wang, Biao Jia*
[pdf ]
Deep Reward Supervisions for Tuning Text-to-Image Diffusion Models
Xiaoshi Wu, Yiming Hao, Manyuan Zhang*, Keqiang Sun, Zhaoyang Huang, Guanglu Song, Yu Liu, Hongsheng Li*
[pdf ]
Paying More Attention to Images: A Training-Free Method for Alleviating Hallucination in LVLMs
Shi Liu*, Kecheng Zheng*, Wei Chen*
[pdf ]
Inf-DiT: Upsampling any-resolution image with memory-efficient diffusion transformer
Zhuoyi Yang*, Heyang Jiang, Wenyi Hong, Jiayan Teng, Wendi Zheng, Yuxiao Dong, Ming Ding, Jie Tang
[pdf ]
Implicit Neural Models to Extract Heart Rate from Video
Pradyumna Chari*, Anirudh Bindiganavale Harish, Adnan Armouti, Alexander Vilesov, Sanjit Sarda, Laleh Jalilian, Achuta Kadambi
[pdf ]
Boost Your NeRF: A Model-Agnostic Mixture of Experts Framework for High Quality and Efficient Rendering
Francesco Di Sario*, Riccardo Renzulli, Marco Grangetto, Enzo Tartaglione
[pdf ]
PFGS: High Fidelity Point Cloud Rendering via Feature Splatting
Jiaxu Wang, Zhang Ziyi, Junhao He, Renjing Xu*
[pdf ]
Few-Shot Anomaly-Driven Generation for Anomaly Classification and Segmentation
Guan Gui, Bin-Bin Gao*, Jun Liu, Chengjie Wang, Yunsheng Wu
[pdf ]
E3M: Zero-Shot Spatio-Temporal Video Grounding with Expectation-Maximization Multimodal Modulation
Peijun Bao*, Zihao Shao, Wenhan Yang, Boon Poh Ng, Alex Kot
[pdf ]
EMO: Emote Portrait Alive - Generating Expressive Portrait Videos with Audio2Video Diffusion Model under Weak Conditions
Linrui Tian*, Qi Wang*, Bang Zhang*, Liefeng Bo*
[pdf ]
LMT-GP: Combined Latent Mean-Teacher and Gaussian Process for Semi-supervised Low-light Image Enhancement
Ye Yu, Fengxin Chen, Jun Yu*, Zhen Kan
[pdf ]
Veil Privacy on Visual Data: Concealing Privacy for Humans, Unveiling for DNNs
Shuchao Pang*, Ruhao Ma, Bing Li*, Yongbin Zhou, Yazhou Yao
[pdf ]
Efficient Vision Transformers with Partial Attention
Xuan-Thuy Vo*, Duy-Linh Nguyen, Adri Priadana, Kang-Hyun Jo*
[pdf ]
Generalized Coverage for More Robust Low-Budget Active Learning
Wonho Bae, Junhyug Noh, Danica J. Sutherland*
[pdf ]
Rasterized Edge Gradients: Handling Discontinuities Differentially
Stanislav Pidhorskyi*, Tomas Simon, Gabriel Schwartz, He Wen, Yaser Sheikh, Jason Saragih
[pdf ]
Enhancing Cross-Subject fMRI-to-Video Decoding with Global-Local Functional Alignment
Chong Li*, Xuelin Qian, Yun Wang, Jingyang Huo, Xiangyang Xue*, Yanwei Fu*, Jianfeng Feng
[pdf ]
FedTSA: A Cluster-based Two-Stage Aggregation Method for Model-heterogeneous Federated Learning
Boyu Fan*, Chenrui Wu, Xiang Su, Pan HUI
[pdf ]
LLaVA-UHD: an LMM Perceiving any Aspect Ratio and High-Resolution Images
Zonghao Guo, Ruyi Xu, Yuan Yao*, Junbo Cui, Zanlin Ni, Chunjiang Ge, Tat-Seng Chua, Zhiyuan Liu, Gao Huang*
[pdf ]
Learning Natural Consistency Representation for Face Forgery Video Detection
Daichi Zhang*, Zihao Xiao, Shikun Li, Fanzhao Lin, Jianmin Li, Shiming Ge*
[pdf ]
ZeroI2V: Zero-Cost Adaptation of Pre-Trained Transformers from Image to Video
Xinhao Li, Yuhan Zhu, Limin Wang*
[pdf ]
Zero-Shot Adaptation for Approximate Posterior Sampling of Diffusion Models in Inverse Problems
Yasar U Alcalar*, Mehmet Akcakaya
[pdf ]
R.A.C.E.: Robust Adversarial Concept Erasure for Secure Text-to-Image Diffusion Model
Changhoon Kim*, Kyle Min*, Yezhou Yang
[pdf ]
OpenSight: A Simple Open-Vocabulary Framework for LiDAR-Based Object Detection
Hu Zhang, Xu Jianhua, Tao Tang, Haiyang Sun, Xin Yu*, Zi Helen Huang*, Kaicheng Yu
[pdf ]
Few-Shot Image Generation by Conditional Relaxing Diffusion Inversion
Yu Cao*, Shaogang Gong
[pdf ]
Data Poisoning Quantization Backdoor Attack
Tran Huynh*, Anh Tran, Khoa Doan, Tung Pham
[pdf ]
DailyDVS-200: A Comprehensive Benchmark Dataset for Event-Based Action Recognition
Qi Wang, Zhou Xu, Yuming Lin, Jingtao Ye, Hongsheng Li, Guangming Zhu, Syed Afaq Ali Shah, Mohammed Bennamoun, Liang Zhang*
[pdf ]
On the Topology Awareness and Generalization Performance of Graph Neural Networks
Junwei Su*, Chuan Wu
[pdf ]
T-CorresNet: Template Guided 3D Point Cloud Completion with Correspondence Pooling Query Generation Strategy
Fan Duan, Jiahao Yu, Li Chen*
[pdf ]
A high-quality robust diffusion framework for corrupted dataset
Quan Dao*, Binh Ta, Tung Pham, Anh Tran
[pdf ]
Efficient 3D-Aware Facial Image Editing via Attribute-Specific Prompt Learning
Amandeep Kumar*, Muhammad Awais, Sanath Narayan, Hisham Cholakkal, Salman Khan, Rao Muhammad Anwer
[pdf ]
Distilling Knowledge from Large-Scale Image Models for Object Detection
Gang Li*, Wenhai Wang, Xiang Li, Ziheng Li, Jian Yang, Jifeng Dai, Yu Qiao, Shanshan Zhang*
[pdf ]
Embracing Events and Frames with Hierarchical Feature Refinement Network for Object Detection
Hu Cao, Zehua Zhang, Yan Xia, Xinyi Li, Jiahao Xia, Guang Chen*, Alois C. Knoll
[pdf ]
TimeLens-XL: Real-time Event-based Video Frame Interpolation with Large Motion
Shi Guo, Yutian Chen, Tianfan Xue, Jinwei Gu, Yongrui Ma*
[pdf ]
Scene-Graph ViT: End-to-End Open-Vocabulary Visual Relationship Detection
Tim Salzmann, Markus Ryll, Alex Bewley, Matthias Minderer*
[pdf ]
Self-Supervised Underwater Caustics Removal and Descattering via Deep Monocular SLAM
Jonathan Sauder*, Devis Tuia
[pdf ]
Enriching Information and Preserving Semantic Consistency in Expanding Curvilinear Object Segmentation Datasets
Qin Lei*, Jiang Zhong, Qizhu Dai
[pdf ]
Retrieval Robust to Object Motion Blur
Rong Zou, Marc Pollefeys, Denys Rozumnyi*
[pdf ]
Unsupervised Representation Learning by Balanced Self Attention Matching
Daniel Shalam*, Simon Korman*
[pdf ]
DualBEV: Unifying Dual View Transformation with Probabilistic Correspondences
Peidong Li*, Wancheng Shen, Qihao Huang, Dixiao Cui*
[