{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2025,11,21]],"date-time":"2025-11-21T18:09:40Z","timestamp":1763748580171,"version":"3.41.0"},"reference-count":58,"publisher":"Association for Computing Machinery (ACM)","issue":"1s","license":[{"start":{"date-parts":[[2022,1,25]],"date-time":"2022-01-25T00:00:00Z","timestamp":1643068800000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/www.acm.org\/publications\/policies\/copyright_policy#Background"}],"funder":[{"name":"National cultural and tourism science and technology innovation project of China","award":["2021-97"],"award-info":[{"award-number":["2021-97"]}]}],"content-domain":{"domain":["dl.acm.org"],"crossmark-restriction":true},"short-container-title":["ACM Trans. Multimedia Comput. Commun. Appl."],"published-print":{"date-parts":[[2022,2,28]]},"abstract":"<jats:p>Hierarchical structure is a common characteristic for some kinds of videos (e.g., sports videos, game videos): The videos are composed of several actions hierarchically and there exist temporal dependencies among segments with different scales, where action labels can be enumerated. Our ideas are based on two observations: First, the actions are the fundamental units for people to understand these videos. Second, the humans summarize a video by iteratively observing and refining, i.e., observing segments in video and hierarchically refining the boundaries of important actions. Based on the above insights, we generate action proposals to construct the structure of the video and formulate the summarization process as a hierarchical refining process. We also train a hierarchical summarization network with deep Q-learning (HQSN) to achieve the refining process and explore temporal dependency. Besides, we collect a new dataset that consists of structured game videos with fine-grain actions and importance annotations. The experimental results demonstrate the effectiveness of the proposed method.<\/jats:p>","DOI":"10.1145\/3485472","type":"journal-article","created":{"date-parts":[[2022,1,25]],"date-time":"2022-01-25T15:06:00Z","timestamp":1643123160000},"page":"1-16","update-policy":"https:\/\/doi.org\/10.1145\/crossmark-policy","source":"Crossref","is-referenced-by-count":10,"title":["From Coarse to Fine: Hierarchical Structure-aware Video Summarization"],"prefix":"10.1145","volume":"18","author":[{"given":"Wenxu","family":"Li","sequence":"first","affiliation":[{"name":"Tianjin University, China and Imperial College London, London, UK"}]},{"ORCID":"https:\/\/orcid.org\/0000-0003-2155-4689","authenticated-orcid":false,"given":"Gang","family":"Pan","sequence":"additional","affiliation":[{"name":"Tianjin University, Tianjin, P.R. China"}]},{"given":"Chen","family":"Wang","sequence":"additional","affiliation":[{"name":"Tianjin University, Tianjin, P.R. China"}]},{"given":"Zhen","family":"Xing","sequence":"additional","affiliation":[{"name":"Fudan University, Shanghai, P.R. China"}]},{"given":"Zhenjun","family":"Han","sequence":"additional","affiliation":[{"name":"University of Chinese Academy of Sciences, Beijing, P.R. China"}]}],"member":"320","published-online":{"date-parts":[[2022,1,25]]},"reference":[{"key":"e_1_3_2_2_2","doi-asserted-by":"crossref","unstructured":"Laith Abualigah Mohammad Qassem Bashabsheh Hamzeh Alabool and Mohammad Shehab. 2020. Text summarization: A brief review. Recent Advances in NLP: The Case of Arabic Language Mohamed Abd Elaziz Mohammed A. A. Al-qaness Ahmed A. Ewees and Abdelghani Dahou (Eds.). Springer International Publishing 1\u201315. DOI:10.1007\/978-3-030-34614-0_1","DOI":"10.1007\/978-3-030-34614-0_1"},{"key":"e_1_3_2_3_2","doi-asserted-by":"publisher","DOI":"10.1145\/2964284.2964286"},{"key":"e_1_3_2_4_2","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR.2017.675"},{"issue":"164","key":"e_1_3_2_5_2","first-page":"3","article-title":"Hierarchical object detection with deep reinforcement learning","volume":"31","author":"Bueno M\u00edriam Bellver","year":"2017","unstructured":"M\u00edriam Bellver Bueno, Xavier Gir\u00f3-i Nieto, Ferran Marqu\u00e9s, and Jordi Torres. 2017. Hierarchical object detection with deep reinforcement learning. Deep Learn. Image Process. Applic. 31, 164 (2017), 3.","journal-title":"Deep Learn. Image Process. Applic."},{"key":"e_1_3_2_6_2","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR.2016.211"},{"key":"e_1_3_2_7_2","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR.2017.502"},{"key":"e_1_3_2_8_2","doi-asserted-by":"publisher","DOI":"10.3115\/v1\/D14-1179"},{"key":"e_1_3_2_9_2","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR.2015.7298981"},{"key":"e_1_3_2_10_2","doi-asserted-by":"publisher","DOI":"10.1109\/TCSVT.2018.2870832"},{"key":"e_1_3_2_11_2","doi-asserted-by":"publisher","DOI":"10.1007\/978-3-319-46487-9_47"},{"key":"e_1_3_2_12_2","doi-asserted-by":"publisher","DOI":"10.1007\/978-3-030-01216-8_5"},{"key":"e_1_3_2_13_2","doi-asserted-by":"publisher","DOI":"10.1109\/ICCV.2017.392"},{"key":"e_1_3_2_14_2","doi-asserted-by":"publisher","DOI":"10.5555\/2969033.2969058"},{"key":"e_1_3_2_15_2","doi-asserted-by":"publisher","DOI":"10.1007\/978-3-319-10584-0_33"},{"key":"e_1_3_2_16_2","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR.2015.7298928"},{"key":"e_1_3_2_17_2","doi-asserted-by":"publisher","DOI":"10.5555\/3504035.3504428"},{"key":"e_1_3_2_18_2","doi-asserted-by":"publisher","DOI":"10.5555\/3045118.3045167"},{"key":"e_1_3_2_19_2","doi-asserted-by":"publisher","DOI":"10.1109\/ICCVW.2019.00195"},{"key":"e_1_3_2_20_2","first-page":"5","volume-title":"Proceedings of the European Conference on Computer Vision THUMOS Workshop","author":"Karaman Svebor","year":"2014","unstructured":"Svebor Karaman, Lorenzo Seidenari, and Alberto Del Bimbo. 2014. Fast saliency based pooling of Fisher encoded dense trajectories. In Proceedings of the European Conference on Computer Vision THUMOS Workshop. Springer, Cham, 5."},{"key":"e_1_3_2_21_2","first-page":"1","volume-title":"Proceedings of the International Conference on Learning Representations","author":"Kingma Diederik P.","year":"2014","unstructured":"Diederik P. Kingma and Jimmy Ba. 2014. Adam: A method for stochastic optimization. In Proceedings of the International Conference on Learning Representations. arXiv.org, Ithaca, NY, 1\u201315."},{"key":"e_1_3_2_22_2","doi-asserted-by":"publisher","DOI":"10.5555\/2481023"},{"key":"e_1_3_2_23_2","doi-asserted-by":"publisher","DOI":"10.1109\/ICCVW.2019.00192"},{"key":"e_1_3_2_24_2","doi-asserted-by":"publisher","DOI":"10.1109\/TPAMI.2018.2852750"},{"key":"e_1_3_2_25_2","doi-asserted-by":"publisher","DOI":"10.1145\/2822907"},{"key":"e_1_3_2_26_2","doi-asserted-by":"publisher","DOI":"10.1007\/978-3-030-01225-0_1"},{"key":"e_1_3_2_27_2","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR.2016.316"},{"key":"e_1_3_2_28_2","doi-asserted-by":"publisher","DOI":"10.1109\/TMM.2018.2876046"},{"key":"e_1_3_2_29_2","first-page":"1","volume-title":"Proceedings of the Neural Information Processing Systems Deep Learning Workshop","author":"Mnih Volodymyr","year":"2013","unstructured":"Volodymyr Mnih, Koray Kavukcuoglu, David Silver, Alex Graves, Ioannis Antonoglou, Daan Wierstra, and Martin Riedmiller. 2013. Playing Atari with deep reinforcement learning. In Proceedings of the Neural Information Processing Systems Deep Learning Workshop. The MIT Press, Cambridge, MA, 1\u20139."},{"key":"e_1_3_2_30_2","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR.2019.00778"},{"key":"e_1_3_2_31_2","doi-asserted-by":"publisher","DOI":"10.1109\/ICCV.2017.395"},{"key":"e_1_3_2_32_2","doi-asserted-by":"publisher","DOI":"10.1109\/ICCVW.2019.00193"},{"key":"e_1_3_2_33_2","doi-asserted-by":"publisher","DOI":"10.5555\/3454287.3455008"},{"key":"e_1_3_2_34_2","doi-asserted-by":"publisher","DOI":"10.1007\/978-3-319-10599-4_35"},{"key":"e_1_3_2_35_2","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR.2017.128"},{"key":"e_1_3_2_36_2","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR.2017.131"},{"key":"e_1_3_2_37_2","doi-asserted-by":"publisher","DOI":"10.1145\/3235765.3235781"},{"key":"e_1_3_2_38_2","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR.2019.00809"},{"key":"e_1_3_2_39_2","doi-asserted-by":"publisher","DOI":"10.1109\/ICCVW.2019.00194"},{"key":"e_1_3_2_40_2","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR.2016.119"},{"key":"e_1_3_2_41_2","first-page":"5179","volume-title":"Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition","author":"Song Yale","year":"2015","unstructured":"Yale Song, Jordi Vallmitjana, Amanda Stent, and Alejandro Jaimes. 2015. TVSum: Summarizing web videos using titles. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. IEEE, New York, NY, 5179\u20135187."},{"key":"e_1_3_2_42_2","doi-asserted-by":"publisher","DOI":"10.1109\/TMM.2017.2751969"},{"key":"e_1_3_2_43_2","doi-asserted-by":"publisher","DOI":"10.5555\/3016100.3016191"},{"key":"e_1_3_2_44_2","first-page":"1","volume-title":"Proceedings of the European Conference on Computer Vision THUMOS Workshop","author":"Wang Limin","year":"2014","unstructured":"Limin Wang, Yu Qiao, and Xiaoou Tang. 2014. Action recognition and detection by combining motion and appearance features. In Proceedings of the European Conference on Computer Vision THUMOS Workshop. Springer, Cham, 1\u20136."},{"key":"e_1_3_2_45_2","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR.2018.00443"},{"key":"e_1_3_2_46_2","doi-asserted-by":"publisher","DOI":"10.5555\/3045390.3045601"},{"key":"e_1_3_2_47_2","first-page":"1","volume-title":"Proceedings of the International Conference on Pattern Recognition FGVRID Workshop","author":"Wenxu Li","year":"2020","unstructured":"Li Wenxu, Pan Gang, Wang Chen, Xing Zhen, Zhou Xiaozhou, Dong Xiaoxuan, and Zhang Jiawan. 2020. From coarse to fine: Hierarchical structure-aware video summarization. In Proceedings of the International Conference on Pattern Recognition FGVRID Workshop. IEEE, New York, NY, 1\u201313."},{"key":"e_1_3_2_48_2","doi-asserted-by":"publisher","DOI":"10.1109\/TMM.2012.2190924"},{"key":"e_1_3_2_49_2","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR.2016.293"},{"key":"e_1_3_2_50_2","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR.2016.337"},{"key":"e_1_3_2_51_2","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR.2017.148"},{"key":"e_1_3_2_52_2","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR.2016.120"},{"key":"e_1_3_2_53_2","doi-asserted-by":"publisher","DOI":"10.1007\/978-3-319-46478-7_47"},{"key":"e_1_3_2_54_2","doi-asserted-by":"publisher","DOI":"10.1109\/TIP.2016.2601493"},{"key":"e_1_3_2_55_2","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR.2018.00773"},{"key":"e_1_3_2_56_2","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR.2014.322"},{"key":"e_1_3_2_57_2","doi-asserted-by":"publisher","DOI":"10.1109\/ICCV.2017.317"},{"key":"e_1_3_2_58_2","doi-asserted-by":"publisher","DOI":"10.5555\/3504035.3504964"},{"key":"e_1_3_2_59_2","first-page":"1","volume-title":"Proceedings of the British Machine Vision Conference","author":"Zhou Kaiyang","year":"2018","unstructured":"Kaiyang Zhou, Tao Xiang, and Andrea Cavallaro. 2018. Video summarisation by classification with deep reinforcement learning. In Proceedings of the British Machine Vision Conference. Springer, Cham, 1\u201313."}],"container-title":["ACM Transactions on Multimedia Computing, Communications, and Applications"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/3485472","content-type":"unspecified","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/dl.acm.org\/doi\/pdf\/10.1145\/3485472","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,6,17]],"date-time":"2025-06-17T19:30:15Z","timestamp":1750188615000},"score":1,"resource":{"primary":{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/3485472"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2022,1,25]]},"references-count":58,"journal-issue":{"issue":"1s","published-print":{"date-parts":[[2022,2,28]]}},"alternative-id":["10.1145\/3485472"],"URL":"https:\/\/doi.org\/10.1145\/3485472","relation":{},"ISSN":["1551-6857","1551-6865"],"issn-type":[{"type":"print","value":"1551-6857"},{"type":"electronic","value":"1551-6865"}],"subject":[],"published":{"date-parts":[[2022,1,25]]},"assertion":[{"value":"2021-01-01","order":0,"name":"received","label":"Received","group":{"name":"publication_history","label":"Publication History"}},{"value":"2021-09-01","order":1,"name":"accepted","label":"Accepted","group":{"name":"publication_history","label":"Publication History"}},{"value":"2022-01-25","order":2,"name":"published","label":"Published","group":{"name":"publication_history","label":"Publication History"}}]}}