2025 Asia Pacific Signal and Information Processing Association Annual Summit and Conference (APSIPA ASC)
22-24 October 2025, Singapore
Tutorials
Tutorial 1: Recent Advances in End-to-End Learned Image and Video Coding
Time: Wednesday, 22 Oct 2025, 08:00-11:30am
Venue: Lotus I
Presenters: Prof. Heming Sun and Prof. Wen-Hsiao Peng
Part I: Overview of Learned Image/Video Coding (by Prof. Peng; 15 mins)
Introduction to end-to-end learned image and video coding
The rate-distortion performance of SOTA learned image/video codecs
Standardization activities on neural image/video coding in JPEG and MPEG
Part II: End-to-End Learned Image Coding (by Prof. Sun; 70 mins)
Elements of end-to-end learned image coding
Review of a few notable tool features (e.g. fast context models)
Network pruning and quantization for learned image codecs
Implicit Neural Representation (INR)-based image coding systems
Real-time implementation of learned image codecs
Coffee Break (20 mins)
Part III: End-to-End Learned Video Coding (by Prof. Peng; 60 mins)
End-to-end learned video coding frameworks: residual coding, conditional coding, and conditional residual coding
Review of some notable systems
The explicit, implicit, and hybrid temporal buffering strategies
The rate-distortion-complexity trade-offs from the perspectives of coding frameworks and buffering strategies
Network quantization for learned video codecs
Part IV: Emerging Topics and Concluding Remarks (30 mins)
Emerging learned coding techniques for 3D/4D Gaussian Splatting and multi-modal large language models
Open issues and concluding remarks
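The learned codecs covered in Parts II and III are typically trained end-to-end with a joint rate-distortion objective L = R + λD, trading bitrate against reconstruction quality. A minimal numeric sketch of that trade-off (the function names, operating points, and λ value below are illustrative, not taken from the tutorial):

```python
import math

def rd_cost(rate_bpp, mse, lam):
    """Joint rate-distortion cost L = R + lambda * D used to train learned codecs."""
    return rate_bpp + lam * mse

def mse_to_psnr(mse, max_val=255.0):
    """Convert mean squared error to PSNR in dB for 8-bit images."""
    return 10.0 * math.log10(max_val ** 2 / mse)

# Two hypothetical operating points of a learned image codec:
# a low-rate point with higher distortion, and a high-rate point.
points = {"low-rate": {"rate_bpp": 0.25, "mse": 40.0},
          "high-rate": {"rate_bpp": 0.75, "mse": 12.0}}

lam = 0.5  # larger lambda penalizes distortion more, pushing toward higher rate
for name, pt in points.items():
    cost = rd_cost(pt["rate_bpp"], pt["mse"], lam)
    print(f"{name}: L = {cost:.2f}, PSNR = {mse_to_psnr(pt['mse']):.1f} dB")
```

Sweeping λ and retraining at each value is how a learned codec traces out its rate-distortion curve.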
Tutorial 2: Deep Speaker Modeling: Theories, Applications and Practice
Time: Wednesday, 22 Oct 2025, 08:00-11:30am
Venue: Lotus II
Presenters: Shuai Wang, Yanmin Qian, and Haizhou Li
Part I: Foundations and Recent Advances (60 mins)
Foundational theories and review of traditional methods in speaker modeling
Evolution of speaker representation techniques in the deep learning era
From i-vector to various deep speaker representations
Applications of self-supervised and semi-supervised learning in speaker modeling
Analysis of speaker representation capabilities in foundation speech models
Leveraging pretrained large models
Part II: Applications Beyond Recognition (60 mins)
Speaker-adaptive speech synthesis
Voice cloning technologies and ethical considerations
Speaker representation in few-shot and zero-shot speech synthesis
Personalized voice conversion systems
Speaker perception in multimodal human-computer interaction
Target speaker speech processing
Target speaker extraction
Target speaker speech recognition
Target speaker verification
Personalized VAD
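The target speaker tasks above all rest on comparing fixed-dimensional speaker embeddings; for verification, a common decision rule is cosine similarity against a threshold. A minimal sketch with toy vectors (real systems extract embeddings with a trained network, e.g. via a toolkit such as Wespeaker; the vectors and threshold here are illustrative assumptions):

```python
import math

def cosine_score(a, b):
    """Cosine similarity between two speaker embeddings."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return dot / (na * nb)

def verify(enroll, test, threshold=0.5):
    """Accept the trial as same-speaker if the score exceeds the threshold."""
    return cosine_score(enroll, test) >= threshold

# Toy 4-dimensional "embeddings" (real ones are typically 192-512 dims).
enrolled = [0.9, 0.1, 0.3, 0.2]
same_spk = [0.8, 0.2, 0.25, 0.3]   # close in direction -> high score
diff_spk = [-0.1, 0.9, -0.4, 0.1]  # different direction -> low score

print(verify(enrolled, same_spk))
print(verify(enrolled, diff_spk))
```

In practice the threshold is calibrated on a development set, e.g. at the equal error rate operating point.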
Part III: Challenges and Countermeasures (30 mins)
Domain adaptation and domain-invariant learning
Privacy-preserving speaker representations
Robustness and adversarial attack defense
Computational efficiency and model compression
Explainability techniques for speaker models
Part IV: Practical Implementation (30 mins)
Introduction to tools and frameworks
Wespeaker toolkit for speaker embedding learning
Wesep toolkit for target speech extraction
Case studies and demonstrations
Interactive discussion and Q&A session
Tutorial 3: From Detection to Direction: An Overview of Sound Event Localization and Detection
Time: Wednesday, 22 Oct 2025, 08:00-11:30am
Venue: Hibiscus III
Presenters: Jun Wei Yeow and Ee-Leng TAN
Part I: Overview of Sound Event Localization and Detection (SELD) (30 mins)
Introduction to SELD and its applications
History of SELD and its component tasks (Sound Event Detection and Sound Source Localization)
Recent advances and challenges in SELD
Publicly available SELD datasets
Part II: Core Technical Components of SELD (60 mins)
Spatial audio formats used for SELD, including First Order Ambisonics, microphone array signals, and binaural recordings
Contemporary feature extraction techniques that capture spatiotemporal cues needed for robust event detection and localization
Deep learning architectures designed for SELD, including convolutional recurrent networks (CRNNs), transformer-based models, and multi-branch or multi-task setups
Training strategies, such as multi-task learning (joint DOA and event classification), data augmentation for spatial audio, and domain adaptation techniques
Benchmark datasets and metrics, including a deep dive into the DCASE Challenge series as well as evaluation criteria such as localization errors, detection accuracies, and combined SELD scores
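The localization-error criterion mentioned above is commonly the great-circle angle between the estimated and reference directions of arrival (DOAs). A minimal sketch of that angular error computed on unit vectors (the azimuth/elevation values are illustrative):

```python
import math

def doa_to_unit_vector(azimuth_deg, elevation_deg):
    """Convert a DOA (azimuth, elevation) in degrees to a 3D unit vector."""
    az = math.radians(azimuth_deg)
    el = math.radians(elevation_deg)
    return (math.cos(el) * math.cos(az),
            math.cos(el) * math.sin(az),
            math.sin(el))

def angular_error_deg(ref, est):
    """Great-circle angle (degrees) between reference and estimated DOAs."""
    u = doa_to_unit_vector(*ref)
    v = doa_to_unit_vector(*est)
    dot = sum(a * b for a, b in zip(u, v))
    dot = max(-1.0, min(1.0, dot))  # guard against rounding outside [-1, 1]
    return math.degrees(math.acos(dot))

# Reference event at (30 deg azimuth, 10 deg elevation); the estimate is off
# by 20 deg in azimuth only, giving slightly under 20 deg of angular error.
print(angular_error_deg((30.0, 10.0), (50.0, 10.0)))
```

Averaging this error over correctly detected events yields the class-dependent localization error reported in the DCASE SELD evaluations.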
Coffee Break (30 mins)
Part III: Advanced and Emerging Topics (60 mins)
Semi-supervised and weakly labelled learning approaches
Robustness to reverberation, overlapping events, and unseen acoustic scenes
Multi-modal SELD systems that integrate complementary modalities, such as video recordings or motion sensors
Leveraging complementary acoustic scene classification (ASC) to improve SELD performance
Part IV: Real-Time Implementation of SELD (40 mins)
Real-time constraints and considerations
Lightweight models suitable for real-time and edge applications
Discussion and Q&A session
Tutorial 4: Adaptive Sensor Networks in Digital Health
Time: Wednesday, 22 Oct 2025, 08:00-11:30am
Venue: Peony I
Presenter: Prof. Saeid Sanei
This tutorial shows the importance of distributed networks and cooperation, concepts borrowed from the multi-agent communication systems domain, in modelling industrial, biological, and diagnostic systems. In many patient monitoring systems, such as multichannel EEG, electromyography (EMG), and electrocardiography (ECG), as well as industrial sensors such as smart meter networks, the sensor data can be aggregated in an adaptive manner. Adaptive cooperative networks are also used to model single- or multi-task systems in which the agents pursue multiple targets. In industry, deploying smart meter networks in a household area and transferring information between the smart meters can greatly reduce the peak energy supply. On the clinical side, an adaptive network can be devised to use multichannel EEG to translate brain function into body movement, or to model the link between two brains in a multi-subject (a.k.a. hyperscanning) scenario. Distributed array processing (beamforming) can improve the localization of brain responses to deep brain single-pulse electrical stimulation (SPES), applicable to the diagnosis of drug-resistant epileptic seizures. Likewise, a cooperative particle filtering approach can significantly enhance the identification and tracking of event-related potentials (ERPs) for monitoring brain degenerative diseases, fatigue, or cognitive deterioration. The tutorial runs for three hours, with an approximately 30-minute tea/coffee break in between.
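The adaptive cooperative networks described above are often realized with diffusion adaptation, in which each node takes a local LMS step on its own data and then averages its estimate with its neighbors. A minimal adapt-then-combine sketch on a toy three-node network (the topology, step size, and data model are illustrative assumptions, not the tutorial's own setup):

```python
import random

random.seed(0)

# Common unknown parameter vector that all nodes try to estimate.
w_true = [1.0, -2.0]

# Neighborhoods with uniform combination weights (all-to-all in this toy case;
# each node combines with every node, including itself).
neighbors = {0: [0, 1, 2], 1: [0, 1, 2], 2: [0, 1, 2]}
mu = 0.05  # LMS step size

w = [[0.0, 0.0] for _ in range(3)]  # per-node estimates

for _ in range(2000):
    # Adapt: each node takes a local LMS step on its own noisy measurement.
    psi = []
    for k in range(3):
        x = [random.gauss(0, 1), random.gauss(0, 1)]               # regressor
        d = sum(a * b for a, b in zip(w_true, x)) + random.gauss(0, 0.1)
        err = d - sum(a * b for a, b in zip(w[k], x))
        psi.append([wk + mu * err * xk for wk, xk in zip(w[k], x)])
    # Combine: average the intermediate estimates over each neighborhood.
    for k in range(3):
        nb = neighbors[k]
        w[k] = [sum(psi[j][i] for j in nb) / len(nb) for i in range(2)]

print(w[0])  # each node's estimate converges near w_true
```

Replacing the uniform weights with estimated network-connectivity weights, as in the outline below, lets the combination step favor better-informed neighbors.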
Part I: Outline of the Tutorial and the Material to Be Presented (by Prof. Saeid Sanei; ~3 hrs including tea/coffee break)
From adaptive filters to adaptive cooperative networks
Distributed sensor networks: definitions, examples, and applications
Adaptive cooperative network topologies
Single- and multi-task networks, optimizations, and applications
Estimation of network connectivity (information transfer) for accurate setting of the combination weights
Body sensor networks and their clinical applications
Cooperative systems in brain computer interfacing (BCI)
Distributed beamforming for seizure source localization of interictal epileptiform discharges and delayed/late brain responses to deep brain electrical stimulation
Distributed particle filtering for tracking brain event-related potentials (ERPs)
Distributed systems for crowd monitoring
Wider applications of cooperative systems (biological modelling, network security, and energy distribution)