Perceptual Quality Optimization of Image Super-Resolution

2026 IEEE International Conference on Acoustics, Speech, and Signal Processing (IEEE ICASSP 2026)

4 – 8 May, 2026

Barcelona, Spain

[PDF]

Wei Zhou, Yixiao Li, Hadi Amirpour, Xiaoshuai Hao, Jiang Liu, Peng Wang, Hantao Liu

Abstract: Single image super-resolution (SR) has achieved remarkable progress with deep learning, yet most approaches rely on distortion-oriented losses or heuristic perceptual priors, which often lead to a trade-off between fidelity and visual quality. To address this issue, we propose an Efficient Perceptual Bi-directional Attention Network (Efficient-PBAN) that explicitly optimizes SR towards human-preferred quality. The proposed framework is trained on a newly constructed SR quality dataset that covers a wide range of state-of-the-art SR methods with corresponding human opinion scores. Using this dataset, Efficient-PBAN learns to predict perceptual quality in a way that correlates strongly with subjective judgments. The learned metric is further integrated into SR training as a differentiable perceptual loss, enabling closed-loop alignment between reconstruction and perceptual assessment. Extensive experiments demonstrate that our approach delivers superior perceptual quality.
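Since the paper's code is not reproduced here, the following is a minimal PyTorch sketch of the closed-loop idea: a learned, differentiable quality predictor scores the SR output, and its negated score serves as a perceptual loss next to a fidelity term. All names (`sr_net`, `quality_net`, the weight `alpha`) are illustrative assumptions, not Efficient-PBAN itself.

```python
# Hedged sketch: a learned perceptual metric as a differentiable SR loss.
# `sr_net` and `quality_net` are placeholder modules, not the paper's models.
import torch
import torch.nn.functional as F

def sr_training_step(sr_net, quality_net, lr_img, hr_img, optimizer, alpha=0.1):
    """One step: L1 fidelity plus a learned perceptual-quality term."""
    optimizer.zero_grad()
    sr_img = sr_net(lr_img)               # super-resolved output
    fidelity = F.l1_loss(sr_img, hr_img)  # distortion-oriented term
    # The quality predictor outputs a higher-is-better score, so we minimize
    # its negative to push the SR network toward human-preferred outputs.
    perceptual = -quality_net(sr_img).mean()
    loss = fidelity + alpha * perceptual
    loss.backward()
    optimizer.step()
    return loss.item()
```

In such a setup the quality predictor would typically be kept frozen, so that the SR network adapts to the metric rather than the metric being gamed during training.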

 


ProgressIQA: Progressive Curriculum and Ensemble Self-Training for Filter-Altered Image Quality Assessment

2026 IEEE International Conference on Acoustics, Speech, and Signal Processing (IEEE ICASSP 2026)

4 – 8 May, 2026

Barcelona, Spain

[PDF]

MohammadAli Hamidi, Hadi Amirpour, Christian Timmerer, Luigi Atzori

Abstract: Filter-altered images are increasingly prevalent in online visual communication, particularly on social media platforms. Assessing their perceived quality is essential for effectively managing visual communication. However, perceived quality is content-dependent and non-monotonic, posing challenges for distortion-centric Image Quality Assessment (IQA) models. The Image Manipulation Quality Assessment (IMQA) benchmark addressed this gap with a dual-stream baseline that fuses filter-aware and quality-aware encoders via an MS-CAM attention module. However, only eight of the ten dataset folds are publicly released, making the task more data-constrained than the original 10-fold protocol. To overcome this limitation, we propose ProgressIQA, a data-efficient framework that integrates ensemble self-training, label distribution stratification, and multi-stage progressive curriculum learning. Fold-specific models are ensembled to generate stable teacher predictions, which are used as pseudo-labels for external filter-augmented images. These pseudo-labels are then balanced through stratified sampling and combined with the original data in a progressive curriculum that transfers knowledge from coarse to fine resolution across stages. Under the restricted 8-fold protocol, ProgressIQA achieves PLCC 0.7082 / SROCC 0.7107, outperforming the IMQA baseline (0.5616 / 0.5486) and, in SROCC, even surpassing the original 10-fold evaluation (PLCC 0.7253 / SROCC 0.6870).
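As a rough, hypothetical illustration of the three ingredients, the sketch below ensembles fold-specific teachers into pseudo-labels, balances them by score stratification, and trains coarse-to-fine. The `predict`/`fit` APIs, bin counts, and resolutions are assumptions, not the authors' code.

```python
# Illustrative only: ensemble self-training + stratification + curriculum.
import numpy as np

def ensemble_pseudo_labels(fold_models, images):
    """Average fold-specific predictions into stable teacher pseudo-labels."""
    preds = np.stack([m.predict(images) for m in fold_models])  # (folds, N)
    return preds.mean(axis=0)

def stratified_subsample(scores, n_bins=10, per_bin=100, seed=0):
    """Balance the pseudo-label distribution by sampling evenly per score bin."""
    rng = np.random.default_rng(seed)
    edges = np.linspace(scores.min(), scores.max(), n_bins)
    bins = np.digitize(scores, edges)
    keep = []
    for b in np.unique(bins):
        idx = np.flatnonzero(bins == b)
        keep.extend(rng.choice(idx, size=min(per_bin, len(idx)), replace=False))
    return np.sort(np.asarray(keep))

def progressive_curriculum(student, data, resolutions=(128, 256, 384)):
    """Coarse-to-fine stages, each fine-tuning the previous stage's weights."""
    for res in resolutions:
        student.fit(data, input_resolution=res)  # hypothetical training API
    return student
```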

 


BiNR: Live Video Broadcasting Quality Assessment

2026 IEEE International Conference on Acoustics, Speech, and Signal Processing (IEEE ICASSP 2026)

4 – 8 May, 2026

Barcelona, Spain

[PDF]

Hadi Amirpour, MohammadAli Hamidi, Wei Zhou, Luigi Atzori, Christian Timmerer

Abstract: Live video broadcasting has become widely accessible through popular platforms such as Instagram, Facebook, and YouTube, enabling real-time content sharing and user interaction. While Quality of Experience (QoE) has been extensively studied for Video-on-Demand (VoD) services, the QoE of live broadcast videos remains relatively underexplored. In this paper, we address this gap by proposing a novel machine learning–based model for QoE prediction in live video broadcasting scenarios. Our approach, BiNR, introduces two models: BiNR_fast, which uses only bitstream features for ultra-fast QoE predictions, and BiNR_full, which integrates bitstream features with a pixel-based no-reference (NR) quality metric computed on the decoded signal. We evaluate multiple regression models to predict subjective QoE scores and further conduct feature importance analysis. Experimental results show that our full model achieves a Pearson Correlation Coefficient (PCC)/Spearman Rank Correlation Coefficient (SRCC) of 0.92/0.92 with subjective scores, significantly outperforming state-of-the-art methods.
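To illustrate the evaluation side, here is a toy scikit-learn/SciPy pipeline in the same spirit: fit a regressor on bitstream-level features, then report PCC/SRCC against subjective scores and inspect feature importances. The features and scores below are synthetic placeholders, not the BiNR data.

```python
# Toy stand-in for a bitstream-feature QoE regressor with PCC/SRCC reporting.
import numpy as np
from scipy.stats import pearsonr, spearmanr
from sklearn.ensemble import RandomForestRegressor

rng = np.random.default_rng(0)
X = rng.random((200, 4))  # imagined features: bitrate, QP, framerate, stalls
y = X @ np.array([2.0, -1.5, 0.5, -3.0]) + rng.normal(0, 0.2, 200)  # mock MOS

model = RandomForestRegressor(n_estimators=200, random_state=0)
model.fit(X[:150], y[:150])
pred = model.predict(X[150:])

print("PCC :", pearsonr(pred, y[150:])[0])
print("SRCC:", spearmanr(pred, y[150:])[0])
print("feature importances:", model.feature_importances_)  # analysis analog
```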

 


Dual-guided Generative Frame Interpolation

2026 IEEE International Conference on Acoustics, Speech, and Signal Processing (IEEE ICASSP 2026)

4 – 8 May, 2026

Barcelona, Spain

[PDF]

Yiying Wei (AAU, Austria), Hadi Amirpour (AAU, Austria) and Christian Timmerer (AAU, Austria)

Abstract: Video frame interpolation (VFI) aims to generate intermediate frames between given keyframes to enhance temporal resolution and visual smoothness. While conventional optical flow–based methods and recent generative approaches achieve promising results, they often struggle with large displacements, failing to maintain temporal coherence and semantic consistency. In this work, we propose dual-guided generative frame interpolation (DGFI), a framework that integrates semantic guidance from vision-language models and flow guidance into a pre-trained diffusion-based image-to-video (I2V) generator. Specifically, DGFI extracts textual descriptions and injects multimodal embeddings to capture high-level semantics, while estimated motion guidance promotes smooth transitions. Experiments on public datasets demonstrate the effectiveness of our dual-guided method over state-of-the-art approaches.
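Purely as a conceptual sketch (the abstract describes the components only at a high level, so every module below is a placeholder), the dual guidance might be wired up as follows.

```python
# Conceptual placeholder code: combining semantic and motion guidance when
# conditioning a pre-trained diffusion I2V generator. `i2v_model`, `vlm`, and
# `flow_net` are assumed interfaces, not the paper's implementation.
def dual_guided_interpolate(i2v_model, vlm, flow_net, frame0, frame1):
    # High-level semantics: describe the keyframes and embed the text.
    text_emb = vlm.embed(vlm.describe(frame0, frame1))
    # Low-level motion: estimated flow acts as a smooth trajectory prior.
    flow = flow_net(frame0, frame1)
    # Both signals condition the generator's sampling of in-between frames.
    return i2v_model.sample(first=frame0, last=frame1,
                            text_cond=text_emb, motion_cond=flow)
```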


The IEEE ICME Workshop on

Physical Principles for Reliable 3D Modelling in Multimedia (P3DMM)

July 5 to July 9, 2026, Bangkok, Thailand

CFP

Reliable 3D modelling is a foundational capability for many multimedia applications, yet achieving metrically accurate and physically meaningful 3D representations in real-world environments remains challenging. Variations in illumination, material properties, motion, sensor configurations, and environmental conditions often undermine the robustness and interpretability of purely data-driven approaches. This workshop focuses on advancing physically informed and physically interpretable 3D modelling, learning, and perception methods that explicitly incorporate physical principles to improve reliability, consistency, and trustworthiness across multimedia scenarios. This workshop aims to provide a unified forum for discussing how such physical principles can be systematically embedded into modern learning frameworks, including neural fields, radiance models, and multimodal foundation models.

In addition to physics-guided methods, the workshop welcomes contributions that integrate physical priors with diverse multimedia and sensing modalities, such as RGB-D, multi-view and video data, IMU and robotic kinematics, force and tactile sensing, acoustic measurements, and spectral or hyperspectral imaging. Particular emphasis is placed on methods that enhance physical consistency, interpretability, and measurement fidelity, enabling reliable 3D modelling for applications such as digital twins, intelligent manufacturing, robotics, biomedical imaging, and computational multimedia systems.

Call for Papers

We invite original submissions that address challenges and advances across the full spectrum of physics-informed 3D modelling. Topics of interest include, but are not limited to:

  • Learning-based 3D modelling with physical principles
  • Physics-coherent neural fields and radiance models
  • Shape, lighting and material decomposition with physical consistency
  • Modelling contact, collision and rigid/deformable body behaviour
  • Data-driven methods enriched by physical cues
  • Reliable 3D modelling in complex multimedia environments
  • Physical cues for digital twins and manufacturing
  • Multimedia applications requiring physically interpretable 3D models
  • Datasets, metrics and evaluations for physics-informed 3D modelling

Submission Guidance: Submit via CMT

Download CFP (PDF): Click here to download

Important Registration Note: All accepted papers need to be covered by a full registration.

 

Organizers


ELLMPEG: An Edge-based Agentic LLM Video Processing Tool

The 17th ACM Multimedia Systems Conference (MMSys’26)

Hong Kong SAR

4th – 8th April 2026

Zoha Azimi, Reza Farahani, Radu Prodan, Christian Timmerer

Abstract: Large language models (LLMs), the foundation of generative AI systems like ChatGPT, are transforming many fields and applications, including multimedia, enabling more advanced content generation, analysis, and interaction. However, cloud-based LLM deployments face three key limitations: high computational and energy demands, privacy and reliability risks from remote processing, and recurring API costs. Recent advances in agentic AI, especially in structured reasoning and tool use, offer a better way to exploit open, locally deployed tools and LLMs. This paper presents ELLMPEG, an edge-enabled agentic LLM framework for the automated generation of video-processing commands. ELLMPEG integrates tool-aware Retrieval-Augmented Generation (RAG) with iterative self-reflection to produce and locally verify executable FFmpeg and VVenC commands directly at the edge, eliminating reliance on external cloud APIs. To evaluate ELLMPEG, we collect a dedicated prompt dataset comprising 480 diverse queries covering different categories of FFmpeg and Versatile Video Codec (VVC) encoder (VVenC) commands. We validate command generation accuracy and evaluate four open-source LLMs based on command validity, tokens generated per second, inference time, and energy efficiency. We also execute the generated commands to assess their runtime correctness and practical applicability. Experimental results show that Qwen2.5, when augmented with the ELLMPEG framework, achieves an average command-generation accuracy of 78% with zero recurring API cost, outperforming all other open-source models across both the FFmpeg and VVenC datasets.
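A minimal sketch of the generate-verify-reflect loop described above might look like the following; `llm.generate` stands in for a locally deployed model, and only the Python standard-library calls are real.

```python
# Sketch: draft a command, execute it locally, feed failures back as reflection.
import shlex
import subprocess

def generate_verified_command(llm, user_query, max_rounds=3):
    """Iterate until the generated FFmpeg/VVenC command runs cleanly."""
    prompt = user_query
    for _ in range(max_rounds):
        cmd = llm.generate(prompt)  # placeholder for the local LLM call
        try:
            proc = subprocess.run(shlex.split(cmd), capture_output=True,
                                  text=True, timeout=120)
            if proc.returncode == 0:
                return cmd          # verified locally, no cloud API involved
            feedback = proc.stderr
        except (OSError, ValueError, subprocess.TimeoutExpired) as err:
            feedback = str(err)
        # Self-reflection: append the failure so the next draft can repair it.
        prompt = f"{user_query}\nThis command failed:\n{cmd}\nError:\n{feedback}"
    return None
```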


Residual U-Network: 3D Point Cloud-Based Automotive Pressure Field Prediction Model

18th International Congress on Image and Signal Processing, BioMedical Engineering, and Informatics (CISP-BMEI 2025)
October 25 – 27, 2025
Qingdao, China
http://www.cisp-bmei.cn/

[PDF]

Hezhi Li, Hongyou Chen, Lingfeng Qu, Baodan Tian, Yong Fan, Hadi Amirpour, and Christian Timmerer

Abstract: Automotive surface pressure field prediction is important for design optimization and performance evaluation of vehicle aerodynamics, fuel efficiency, and automotive safety. Although traditional computational fluid dynamics methods are accurate, they incur high computational costs and are time-consuming. Most existing deep learning methods show limitations in learning pressure variation features near complex geometric shapes of automotive exteriors. To address these issues, this paper proposes a deep learning method based on a hybrid architecture combining Residual Network (ResNet) and U-Network (UNet). The method processes 3D point cloud representations of automotive geometries by converting them into structured grid formats with signed distance function values for efficient neural network processing. The method improves predictive capability for complex geometric regions by integrating the Convolutional Block Attention Module (CBAM). In the model, the Residual Convolutional Block Attention Module (ResCBAM) combines residual connections with channel and spatial attention mechanisms to improve perception of key pressure field features. The Decoder Convolutional Block Attention Module (DeCBAM) fuses multi-scale feature information in the decoder pathway to recover fine feature details. The feature fusion module integrates global flow field distribution features extracted by the encoder with local geometric detail features reconstructed by the decoder. Additionally, an automated hyperparameter optimization strategy is employed to improve prediction accuracy and generalization capability. To validate model performance, experiments are conducted on three automotive surface pressure datasets. Experimental results demonstrate that the proposed model achieves better prediction accuracy and generalization capability than existing approaches.
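As a compact PyTorch sketch of the ResCBAM idea (a residual connection around channel-then-spatial attention, here on 3D feature volumes since the abstract converts point clouds to SDF grids), with all layer sizes assumed rather than taken from the paper:

```python
# Assumed-layout sketch of a residual block with CBAM-style attention on
# 3D feature volumes; sizes and placement are illustrative, not the paper's.
import torch
import torch.nn as nn

class ResCBAM(nn.Module):
    def __init__(self, channels, reduction=8):
        super().__init__()
        self.conv = nn.Sequential(
            nn.Conv3d(channels, channels, 3, padding=1), nn.ReLU(inplace=True),
            nn.Conv3d(channels, channels, 3, padding=1))
        # Channel attention: squeeze spatial dims, re-weight each channel.
        self.channel = nn.Sequential(
            nn.AdaptiveAvgPool3d(1),
            nn.Conv3d(channels, channels // reduction, 1), nn.ReLU(inplace=True),
            nn.Conv3d(channels // reduction, channels, 1), nn.Sigmoid())
        # Spatial attention: pool over channels, predict a spatial mask.
        self.spatial = nn.Sequential(nn.Conv3d(2, 1, 7, padding=3), nn.Sigmoid())

    def forward(self, x):
        out = self.conv(x)
        out = out * self.channel(out)
        pooled = torch.cat([out.mean(1, keepdim=True),
                            out.amax(1, keepdim=True)], dim=1)
        out = out * self.spatial(pooled)
        return out + x  # residual connection

# Shape check: ResCBAM(16)(torch.randn(1, 16, 32, 32, 32)) keeps the input shape.
```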
