{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2025,9,19]],"date-time":"2025-09-19T02:24:29Z","timestamp":1758248669651,"version":"3.44.0"},"reference-count":62,"publisher":"Association for Computing Machinery (ACM)","issue":"5","funder":[{"name":"Science Foundation Ireland Centre for Research Training in Digitally-Enhanced Reality","award":["18\/CRT\/6224"],"award-info":[{"award-number":["18\/CRT\/6224"]}]}],"content-domain":{"domain":["dl.acm.org"],"crossmark-restriction":true},"short-container-title":["ACM Trans. Intell. Syst. Technol."],"published-print":{"date-parts":[[2025,10,31]]},"abstract":"<jats:p>\n            In this article, we propose Cross-Modal Association Models (C-MAMs), a novel approach for handling missing modalities during inference in multimodal learning. Unlike existing methods that modify the training process, C-MAMs generate missing modality features\n            <jats:italic toggle=\"yes\">post-training<\/jats:italic>\n            , preserving the integrity of the original multimodal model. In this article, we: (i) formalise the problem of missing modality inference and its challenges, (ii) introduce C-MAMs as a flexible, lightweight, post-hoc solution for reconstructing missing modality embeddings, (iii) evaluate their effectiveness across diverse datasets, tasks and baseline models, and (iv) analyse the quality of the generated versus the ground-truth features to quantify the reconstruction fidelity. Experimental results show that C-MAMs\n            <jats:italic toggle=\"yes\">significantly mitigate performance degradation<\/jats:italic>\n            due to missing modalities, in some cases fully restoring baseline performance, even when\n            <jats:italic toggle=\"yes\">trained on 10%<\/jats:italic>\n            of the data. We conclude that post-training feature reconstruction is an effective, targeted alternative to existing methods, with broad applicability in multimodal systems.\n          <\/jats:p>","DOI":"10.1145\/3746456","type":"journal-article","created":{"date-parts":[[2025,6,27]],"date-time":"2025-06-27T09:02:18Z","timestamp":1751014938000},"page":"1-48","update-policy":"https:\/\/doi.org\/10.1145\/crossmark-policy","source":"Crossref","is-referenced-by-count":0,"title":["Learning to Associate: Multimodal Inference with Fully\u00a0Missing Modalities"],"prefix":"10.1145","volume":"16","author":[{"ORCID":"https:\/\/orcid.org\/0000-0002-5544-9484","authenticated-orcid":false,"given":"Jack","family":"Geraghty","sequence":"first","affiliation":[{"name":"Computer Science, University College Dublin, Dublin, Ireland"}]},{"ORCID":"https:\/\/orcid.org\/0000-0001-9636-2556","authenticated-orcid":false,"given":"Andrew","family":"Hines","sequence":"additional","affiliation":[{"name":"Computer Science, University College Dublin, Dublin, Ireland"}]},{"ORCID":"https:\/\/orcid.org\/0000-0002-3712-6550","authenticated-orcid":false,"given":"Fatemeh","family":"Golpayegani","sequence":"additional","affiliation":[{"name":"Computer Science, University College Dublin, Dublin, Ireland"}]}],"member":"320","published-online":{"date-parts":[[2025,9,18]]},"reference":[{"key":"e_1_3_2_2_2","doi-asserted-by":"publisher","DOI":"10.1145\/3643855"},{"key":"e_1_3_2_3_2","doi-asserted-by":"publisher","DOI":"10.1109\/ICCV.2017.73"},{"key":"e_1_3_2_4_2","unstructured":"John Arevalo Thamar Solorio Manuel Montes-y-G\u00f3mez and Fabio A. Gonz\u00e1lez. 2017. Gated multimodal units for information fusion. arXiv:1702.01992. Retrieved from https:\/\/arxiv.org\/abs\/1702.01992"},{"key":"e_1_3_2_5_2","doi-asserted-by":"publisher","DOI":"10.18653\/v1\/P18-1208"},{"key":"e_1_3_2_6_2","doi-asserted-by":"publisher","DOI":"10.1145\/3650040"},{"key":"e_1_3_2_7_2","doi-asserted-by":"publisher","DOI":"10.1007\/s10579-008-9076-6"},{"key":"e_1_3_2_8_2","doi-asserted-by":"publisher","DOI":"10.1109\/TAFFC.2016.2515617"},{"key":"e_1_3_2_9_2","doi-asserted-by":"publisher","DOI":"10.1145\/3219819.3219963"},{"key":"e_1_3_2_10_2","doi-asserted-by":"publisher","DOI":"10.1145\/3394486.3403182"},{"key":"e_1_3_2_11_2","first-page":"46390","volume-title":"Advances in Neural Information Processing Systems","author":"Chen Minshuo","year":"2023","unstructured":"Minshuo Chen, Yu Bai, H. Vincent Poor, and Mengdi Wang. 2023. Efficient RL with impaired observability: Learning to act with delayed and missing state observations. In Advances in Neural Information Processing Systems. A. Oh, T. Naumann, A. Globerson, K. Saenko, M. Hardt, and S. Levine (Eds.), Vol. 36, Curran Associates, Inc., 46390\u201346418. Retrieved from https:\/\/proceedings.neurips.cc\/paper_files\/paper\/2023\/file\/9156b0f6dfa9bbd18c79cc459ef5d61c-Paper-Conference.pdf"},{"key":"e_1_3_2_12_2","doi-asserted-by":"publisher","DOI":"10.1145\/3606368"},{"key":"e_1_3_2_13_2","doi-asserted-by":"publisher","DOI":"10.1145\/3581783.3611696"},{"key":"e_1_3_2_14_2","doi-asserted-by":"publisher","DOI":"10.1109\/TMM.2021.3118287"},{"key":"e_1_3_2_15_2","unstructured":"Fei Ma Xiangxiang Xu Shao-Lun Huang and Lin Zhang. 2021. Maximum likelihood estimation for multimodal learning with missing modality. arXiv:2108.10513. Retrieved from https:\/\/arxiv.org\/abs\/2108.10513"},{"key":"e_1_3_2_16_2","first-page":"1","article-title":"Unsupervised multimodal anomaly detection with missing sources for liquid rocket engine","author":"Feng Yong","year":"2022","unstructured":"Yong Feng, Zijun Liu, Jinglong Chen, Haixin Lv, Jun Wang, and Xinwei Zhang. 2022. Unsupervised multimodal anomaly detection with missing sources for liquid rocket engine. IEEE Transactions on Neural Networks and Learning Systems 34 (2022), 1\u201315.","journal-title":"IEEE Transactions on Neural Networks and Learning Systems"},{"key":"e_1_3_2_17_2","doi-asserted-by":"publisher","DOI":"10.1109\/ICASSP.2017.7952261"},{"key":"e_1_3_2_18_2","volume-title":"Modelling and Representing Context (MRC), European Conference on Artificial Intelligence (ECAI)","author":"Geraghty Jack","year":"2023","unstructured":"Jack Geraghty, Andrew Hines, and Fatemeh Golpayegani. 2023. Understanding the relevancy of modality information in multimodal machine learning. In Modelling and Representing Context (MRC), European Conference on Artificial Intelligence (ECAI)."},{"key":"e_1_3_2_19_2","doi-asserted-by":"publisher","DOI":"10.1007\/s40860-022-00198-x"},{"key":"e_1_3_2_20_2","doi-asserted-by":"publisher","DOI":"10.3758\/s13423-013-0458-4"},{"key":"e_1_3_2_21_2","first-page":"1319","volume-title":"Proceedings of the 30th International Conference on Machine Learning (Proceedings of Machine Learning Research","author":"Goodfellow Ian","year":"2013","unstructured":"Ian Goodfellow, David Warde-Farley, Mehdi Mirza, Aaron Courville, and Yoshua Bengio. 2013. Maxout networks. In Proceedings of the 30th International Conference on Machine Learning (Proceedings of Machine Learning Research. PMLR, Atlanta, Georgia, 1319\u20131327."},{"key":"e_1_3_2_22_2","doi-asserted-by":"publisher","DOI":"10.1016\/j.cortex.2022.11.005"},{"key":"e_1_3_2_23_2","doi-asserted-by":"crossref","unstructured":"Devamanyu Hazarika Yingting Li Bo Cheng Shuai Zhao Roger Zimmermann and Soujanya Poria. 2022. Analyzing Modality Robustness in Multimodal Sentiment Analysis. arxiv:1512.03385. Retrieved from http:\/\/arxiv.org\/abs\/1512.03385","DOI":"10.18653\/v1\/2022.naacl-main.50"},{"key":"e_1_3_2_24_2","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR.2016.90"},{"key":"e_1_3_2_25_2","doi-asserted-by":"publisher","DOI":"10.1145\/3645099"},{"key":"e_1_3_2_26_2","doi-asserted-by":"publisher","DOI":"10.1145\/3472713"},{"key":"e_1_3_2_27_2","doi-asserted-by":"publisher","DOI":"10.1109\/ACII.2017.8273601"},{"key":"e_1_3_2_28_2","unstructured":"Will Kay Joao Carreira Karen Simonyan Brian Zhang Chloe Hillier Sudheendra Vijayanarasimhan Fabio Viola Tim Green Trevor Back Paul Natsev et al. 2017. The kinetics human action video dataset. arXiv:1705.06950. Retrieved from https:\/\/arxiv.org\/abs\/1705.06950"},{"key":"e_1_3_2_29_2","first-page":"594","volume-title":"Intelligent Data Engineering and Automated Learning (IDEAL \u201918)","author":"Masood Khan Nadia","year":"2018","unstructured":"Nadia Masood Khan and Gul Muhammad Khan. 2018. Signal reconstruction using evolvable recurrent neural networks. In Intelligent Data Engineering and Automated Learning (IDEAL \u201918). Springer International Publishing, Cham, 594\u2013602."},{"key":"e_1_3_2_30_2","doi-asserted-by":"publisher","DOI":"10.1109\/ACCESS.2020.2997255"},{"key":"e_1_3_2_31_2","doi-asserted-by":"publisher","DOI":"10.1145\/3394171.3413579"},{"key":"e_1_3_2_32_2","doi-asserted-by":"publisher","DOI":"10.1109\/TASE.2020.2971713"},{"key":"e_1_3_2_33_2","doi-asserted-by":"publisher","DOI":"10.1109\/JBHI.2017.2732287"},{"key":"e_1_3_2_34_2","doi-asserted-by":"publisher","DOI":"10.1609\/aaai.v35i3.16330"},{"key":"e_1_3_2_35_2","volume-title":"Proceedings of the European Conference on Computer Vision (ECCV) Workshops","author":"Matsuura Toshihiko","year":"2018","unstructured":"Toshihiko Matsuura, Kuniaki Saito, and Yoshitaka Ushiku. 2018. Generalized Bayesian canonical correlation analysis with missing modalities. In Proceedings of the European Conference on Computer Vision (ECCV) Workshops."},{"key":"e_1_3_2_36_2","doi-asserted-by":"publisher","DOI":"10.3758\/CABN.4.2.133"},{"key":"e_1_3_2_37_2","doi-asserted-by":"publisher","DOI":"10.1111\/j.1533-6077.2008.00150.x"},{"key":"e_1_3_2_38_2","first-page":"400","volume-title":"Companion Publication of the 2020 International Conference on Multimodal Interaction (ICMI \u201920 Companion)","author":"Parthasarathy Srinivas","year":"2021","unstructured":"Srinivas Parthasarathy and Shiva Sundaram. 2021. Training strategies to handle missing modalities for audio-visual expression recognition. In Companion Publication of the 2020 International Conference on Multimodal Interaction (ICMI \u201920 Companion). ACM, New York, NY, 400\u2013404."},{"key":"e_1_3_2_39_2","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR52688.2022.00806"},{"key":"e_1_3_2_40_2","doi-asserted-by":"publisher","DOI":"10.1109\/OJCS.2022.3206407"},{"key":"e_1_3_2_41_2","doi-asserted-by":"crossref","first-page":"443","DOI":"10.1016\/j.neunet.2023.03.003","article-title":"COM: Contrastive masked-attention model for incomplete multimodal learning","author":"Shuwei Qian","year":"2023","unstructured":"Qian Shuwei and Wang Chongjun. 2023. COM: Contrastive masked-attention model for incomplete multimodal learning. Neural Networks 162 (2023), 443\u2013455.","journal-title":"Neural Networks"},{"key":"e_1_3_2_42_2","doi-asserted-by":"publisher","DOI":"10.3758\/s13414-010-0073-7"},{"key":"e_1_3_2_43_2","doi-asserted-by":"publisher","DOI":"10.1609\/aaai.v38i13.29440"},{"key":"e_1_3_2_44_2","doi-asserted-by":"publisher","DOI":"10.1109\/TAFFC.2023.3274829"},{"key":"e_1_3_2_45_2","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR.2017.528"},{"key":"e_1_3_2_46_2","doi-asserted-by":"crossref","unstructured":"Valentin Vielzeuf Alexis Lechervy St\u00e9phane Pateux and Fr\u00e9d\u00e9ric Jurie. 2018. CentralNet: A multilayer approach for multimodal fusion. arXiv:1808.07275. Retrieved from https:\/\/arxiv.org\/abs\/1808.07275","DOI":"10.1007\/978-3-030-11024-6_44"},{"key":"e_1_3_2_47_2","doi-asserted-by":"publisher","DOI":"10.1145\/3394486.3403234"},{"key":"e_1_3_2_48_2","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR42600.2020.01271"},{"key":"e_1_3_2_49_2","doi-asserted-by":"publisher","DOI":"10.1145\/3579826"},{"key":"e_1_3_2_50_2","first-page":"5527","article-title":"Perception-aware cross-modal signal reconstruction: From audio-haptic to visual","author":"Wei Xin","year":"2022","unstructured":"Xin Wei, Yuyuan Yao, Haoyu Wang, and Liang Zhou. 2022. Perception-aware cross-modal signal reconstruction: From audio-haptic to visual. IEEE Transactions on Multimedia 25 (2022), 5527\u20135538.","journal-title":"IEEE Transactions on Multimedia"},{"key":"e_1_3_2_51_2","doi-asserted-by":"publisher","DOI":"10.1109\/TMM.2022.3194309"},{"key":"e_1_3_2_52_2","doi-asserted-by":"publisher","DOI":"10.5555\/2999134.2999173"},{"key":"e_1_3_2_53_2","unstructured":"Haiyang Xu Hui Zhang Kun Han Yun Wang Yiping Peng and Xiangang Li. 2019. Learning alignment for multimodal emotion recognition from speech. arXiv:1909.05645. Retrieved from https:\/\/arxiv.org\/abs\/1909.05645"},{"key":"e_1_3_2_54_2","doi-asserted-by":"publisher","DOI":"10.1007\/978-3-031-30675-4_19"},{"key":"e_1_3_2_55_2","doi-asserted-by":"publisher","DOI":"10.5555\/3635637.3663065"},{"key":"e_1_3_2_56_2","doi-asserted-by":"publisher","DOI":"10.1145\/3583076"},{"key":"e_1_3_2_57_2","doi-asserted-by":"publisher","DOI":"10.1016\/j.neuroimage.2012.03.059"},{"key":"e_1_3_2_58_2","doi-asserted-by":"publisher","DOI":"10.1145\/3474085.3475585"},{"key":"e_1_3_2_59_2","unstructured":"Amir Zadeh Rowan Zellers Eli Pincus and Louis-Philippe Morency. 2016. MOSI: Multimodal corpus of sentiment intensity and subjectivity analysis in online opinion videos. arXiv:1606.06259. Retrieved from https:\/\/arxiv.org\/abs\/1606.06259"},{"issue":"5","key":"e_1_3_2_60_2","first-page":"2402","article-title":"Deep partial Multi-View learning","volume":"44","author":"Zhang Changqing","year":"2022","unstructured":"Changqing Zhang, Yajie Cui, Zongbo Han, Joey Tianyi Zhou, Huazhu Fu, and Qinghua Hu. 2022. Deep partial Multi-View learning. IEEE Transactions on Pattern Analysis and Machine Intelligence 44, 5 (May 2022), 2402\u20132415.","journal-title":"IEEE Transactions on Pattern Analysis and Machine Intelligence"},{"key":"e_1_3_2_61_2","doi-asserted-by":"publisher","DOI":"10.1109\/TKDE.2018.2791607"},{"key":"e_1_3_2_62_2","doi-asserted-by":"publisher","DOI":"10.18653\/v1\/2021.acl-long.203"},{"key":"e_1_3_2_63_2","doi-asserted-by":"publisher","DOI":"10.1109\/ICCV.2017.244"}],"container-title":["ACM Transactions on Intelligent Systems and Technology"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/dl.acm.org\/doi\/pdf\/10.1145\/3746456","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,9,18]],"date-time":"2025-09-18T17:01:57Z","timestamp":1758214917000},"score":1,"resource":{"primary":{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/3746456"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2025,9,18]]},"references-count":62,"journal-issue":{"issue":"5","published-print":{"date-parts":[[2025,10,31]]}},"alternative-id":["10.1145\/3746456"],"URL":"https:\/\/doi.org\/10.1145\/3746456","relation":{},"ISSN":["2157-6904","2157-6912"],"issn-type":[{"type":"print","value":"2157-6904"},{"type":"electronic","value":"2157-6912"}],"subject":[],"published":{"date-parts":[[2025,9,18]]},"assertion":[{"value":"2024-09-17","order":0,"name":"received","label":"Received","group":{"name":"publication_history","label":"Publication History"}},{"value":"2025-06-12","order":2,"name":"accepted","label":"Accepted","group":{"name":"publication_history","label":"Publication History"}},{"value":"2025-09-18","order":3,"name":"published","label":"Published","group":{"name":"publication_history","label":"Publication History"}}]}}