2025 Asia Pacific Signal and Information Processing Association Annual Summit and Conference (APSIPA ASC)


22-24 October 2025, Singapore



Technical Program

Day 1: Wednesday, 22 Oct 2025 - Overview

08:00-10:00

D1-0800_IB Exhibition Setup

Location: Island Ballroom

08:00-11:30

D1-0800_L1 Tutorial 1: Recent Advances in End-to-End Learned Image and Video Coding

Location: Lotus I

08:00-11:30

D1-0800_L2 Tutorial 2: Deep Speaker Modeling: Theories, Applications and Practice

Location: Lotus II

08:00-11:30

D1-0800_H3 Tutorial 3: From Detection to Direction: An Overview of Sound Event Localization and Detection

Location: Hibiscus III

08:00-11:30

D1-0800_P1 Tutorial 4: Adaptive Sensor Networks in Digital Health

Location: Peony I

10:00-11:30

D1-1000_IB Registration

Location: Island Ballroom

11:30-13:30

D1-1130_IB Welcome Reception & Opening Ceremony

Location: Island Ballroom

13:30-14:30

D1-1330_IB Keynote 1 by Emanuël Habets

Location: Island Ballroom

14:30-16:00

D1-1430_IB Perspective 1: Voice Privacy & Security

Location: Island Ballroom

14:30-16:00

D1-1430_L1 Advanced Signal Processing and Machine Learning for Acoustic Scene Analysis and Signal Enhancement

Location: Lotus I

14:30-16:00

D1-1430_L2 Interactive, Natural, Expressive and Robust Conversation System

Location: Lotus II

14:30-16:00

D1-1430_H3 Generative AI Models for Vision-Based Applications

Location: Hibiscus III

14:30-16:00

D1-1430_P1 Multimedia Security & Forensics

Location: Peony I

14:30-16:00

D1-1430_P2 Wireless Communications & Networking

Location: Peony II

16:00-16:30

Break

16:30-18:00

D1-1630_IB Perspective 2: Utilization of the Foundation Models and the Future

Location: Island Ballroom

16:30-18:00

D1-1630_L1 Tracing the Fake: Deepfake Detection, Attribution & Spoof-aware Speaker Verification Across Languages & Accents

Location: Lotus I

16:30-18:00

D1-1630_L2 Speaker Modeling Beyond Speaker Recognition

Location: Lotus II

16:30-18:00

D1-1630_H3 Three-Minute Thesis (3MT) Competition

Location: Hibiscus III

16:30-18:00

D1-1630_P2 Advanced Multimedia Applications

Location: Peony II



Day 1: Wednesday, 22 Oct 2025 - With Papers

08:00-10:00

D1-0800_IB Exhibition Setup

Location: Island Ballroom

08:00-11:30

D1-0800_L1 Tutorial 1: Recent Advances in End-to-End Learned Image and Video Coding

Location: Lotus I

08:00-11:30

D1-0800_L2 Tutorial 2: Deep Speaker Modeling: Theories, Applications and Practice

Location: Lotus II

08:00-11:30

D1-0800_H3 Tutorial 3: From Detection to Direction: An Overview of Sound Event Localization and Detection

Location: Hibiscus III

08:00-11:30

D1-0800_P1 Tutorial 4: Adaptive Sensor Networks in Digital Health

Location: Peony I

10:00-11:30

D1-1000_IB Registration

Location: Island Ballroom

11:30-13:30

D1-1130_IB Welcome Reception & Opening Ceremony

Location: Island Ballroom

13:30-14:30

D1-1330_IB Keynote 1 by Emanuël Habets

Location: Island Ballroom

14:30-16:00

D1-1430_IB Perspective 1: Voice Privacy & Security

Location: Island Ballroom

D1-1430_IB.1 645 Speaker Privacy and Security in the Big Data Era: Protection and Defense against Deepfake

Liping Chen, Kong Aik Lee, Zhen-Hua Ling, Xin Wang, Rohan Kumar Das, Tomoki Toda, Haizhou Li

14:30-16:00

D1-1430_L1 Advanced Signal Processing and Machine Learning for Acoustic Scene Analysis and Signal Enhancement

Location: Lotus I

D1-1430_L1.1 51 MVDR beamforming for underdetermined sound source separation using iterative PSD estimation in beamspace

Jin Xuan Teh, Yusuke Hioka

D1-1430_L1.2 184 Exploring Dual-Mode Training for Real-time Target Speaker Extraction

Li Li, Shogo Seki

D1-1430_L1.3 186 Switching Constant Separating Vector for Moving Source Extraction with Geometric Constraints

Changda Chen, Yichen Yang, Yuehao Zhao, Shoji Makino, Jingdong Chen

D1-1430_L1.4 201 Neural Network-Assisted Joint DOA Estimation and Beamforming with First-Order Reflection Modeling

Yichen Yang, Chao Pan, Qiang Gao, Jacob Benesty, Shoji Makino, Jingdong Chen

D1-1430_L1.5 234 Speaker Localization in Classroom Environments Using GCC-PHAT Features and Mamba State Space Models with Ad-hoc Microphone Arrays

Rashed Iqbal, Christian Ritz, Jack Yang, Sarah Howard

D1-1430_L1.6 293 Joint Separation and Tracking of Moving Sources With Distributed Microphone Arrays Based on Time-Varying Inertial Spatial Models

Ryunosuke Nihei, Yoshiaki Bando, Aditya Arie Nugraha, Diego Di Carlo, Hiroyuki Ueda, Yosuke Ito, Kazuyoshi Yoshii

D1-1430_L1.7 418 Visually-Informed Multichannel Sound Source Separation Based on 3D Gaussian Primitives

Haruki Asano, Ryunosuke Nihei, Yoshiaki Bando, Aditya Arie Nugraha, Diego Di Carlo, Hiroyuki Ueda, Yosuke Ito, Kazuyoshi Yoshii

D1-1430_L1.8 439 Joint Optimization of Sampling Rate Offsets and Demixing Filters Using Auxiliary Function Method

Hayato Takeuchi, Takao Kawamura, Nobutaka Ono, Shoko Araki

D1-1430_L1.9 442 First Demonstration of Acoustic Scene Classification Based on Trained Sound-to-Light Conversion

Shun Kotsugi, Takao Kawamura, Nobutaka Ono

D1-1430_L1.10 474 Auxiliary-Function-Based Decentralized Independent Vector Analysis for Distributed Microphone Arrays

Kouei Yamaoka, Katsuhiro Morita, Norihiro Takamune, Hiroshi Saruwatari

D1-1430_L1.11 501 Interactive Spatial Audio Rendering on Mobile Devices: A Two-Stage User Interface with Adaptive HRTF Selection and Real-Time Room Acoustics Simulation

Shravan Raghunath, Kanishk AL, Sailesh S, Rishabh Gupta, Saurav Gupta, Ramesh R

D1-1430_L1.12 507 Are Identical Sounds Present in Distributed Recordings to Serve as Spatio-Temporal Anchors? A Case Study Using the SINS Database

Takao Kawamura, Nobutaka Ono

14:30-16:00

D1-1430_L2 Interactive, Natural, Expressive and Robust Conversation System

Location: Lotus II

D1-1430_L2.1 42 Language Adaptation Wake Word Spotting via Latent Space from Pre-trained Speech Models

Shifu Xiong, Hengshun Zhou, Kai Shen, Shi Cheng, Hang Chen, Genshun Wan, Kewei Li, Jun Du, Lirong Dai

D1-1430_L2.2 60 Identifying Speaker Information in Feed-Forward Layers of Self-Supervised Speech Transformers

Tzu-Quan Lin, Hsi-Chun Cheng, Hung-yi Lee, Hao Tang

D1-1430_L2.3 73 Multi-task Pretraining for Enhancing Interpretable L2 Pronunciation Assessment

Jiun-Ting Li, Bi-Cheng Yan, Yi-Cheng Wang, Berlin Chen

D1-1430_L2.4 78 End-to-End Integration of Speech Emotion Recognition and Voice Activity Detection with a Self-Supervised Model for Noise Robustness

Natsuo Yamashita, Masaaki Yamamoto, Yohei Kawaguchi

D1-1430_L2.5 102 SCSMT: A Multilingual Children's Speech Corpus for Singapore's Mother Tongues

Bowen Zhang, Nur Afiqah Abdul Latiff, Rong Tong, Donny Soh, Ian McLoughlin

D1-1430_L2.6 112 Reducing Orthographic Dependency on Paired Data by Probabilistic Integration via Syllabogram for Japanese Dialogue Speech Recognition

Ryu Takeda, Kazunori Komatani

D1-1430_L2.7 152 Expressive Prompting: Improving Emotion Intensity and Speaker Consistency in Zero-Shot TTS

Haoyu Wang, Chunyu Qiang, Tianrui Wang, Cheng Gong, Yu Jiang, Yuheng Lu, Chen Zhang, Longbiao Wang, Jianwu Dang

D1-1430_L2.8 199 Constructing an In-the-Wild Spoken Dialogue Dataset Based on YouTube Dialogue Videos

Yuki Sato, Sanae Yamashita, Shinnosuke Takamichi, Ryuichiro Higashinaka

D1-1430_L2.9 217 Emotional Text-To-Speech Based on Mutual-Information-Guided Emotion-Timbre Disentanglement

Jianing Yang, Sheng Li, Takahiro Shinozaki, Yuki Saito, Hiroshi Saruwatari

D1-1430_L2.10 321 Conversation Context-aware Direct Preference Optimization for Style-Controlled Speech Synthesis

Atsushi Kojima, Yusuke Fujita, Hao Shi, Tomoya Mizumoto, Mengjie Zhao, Yui Sudo

D1-1430_L2.11 483 A Hybrid Attention Mechanism to Improve Tacotron 2 Performance for Indonesian Text-to-Speech Synthesis

Angela Catherina, Bima Prihasto, Boby Mugi Pratama, Li-Wei Kang, Jia-Ching Wang

14:30-16:00

D1-1430_H3 Generative AI Models for Vision-Based Applications

Location: Hibiscus III

D1-1430_H3.1 91 TRUST: Token‑dRiven Ultrasound Style Transfer for Cross‑Device Adaptation

Nhat-Tuong Do-Tran, Ngoc-Hoang-Lam Le, Ian Chiu, Po-Tsun Paul Kuo, Ching-Chun Huang

D1-1430_H3.2 92 Two-Stage Transformer-based Deep Hyperspectral and Multispectral Image Fusion Network for Hyperspectral Image Super-Resolution

Wo-Yen Li, Chia-Ming Lee, Chih-Chung Hsu, Volodymyr Khylenko, Li-Wei Kang

D1-1430_H3.3 147 Pedestrian Detection based on Visible Guided Occlusion Handling

Lien-Chieh Huang, Ching-Te Chiu, Yung-Cheng Su

D1-1430_H3.4 242 Spatial-Frequency Guided Moiré Removal with Multi-Stage Feature Fusion

Chen Lo, Chia-Hung Yeh

D1-1430_H3.5 251 Registration of Infrared and Visible Images Using Style Transfer-Based Semantic Segmentation

Si-Ting Lin, Chih-Hung Han, Chieh-Ling Lee, Po-Chyi Su, Feng-Tsun Chien, Min-Kuan Chang

D1-1430_H3.6 533 Peransformer: Improving Low-informed Expressive Performance Rendering with Score-aware Discriminator

Xian He, Wei Zeng, Ye Wang

D1-1430_H3.7 536 Prompt-Based Vertebral Segmentation Using a Generative AI Approach in OVCF Spinal Radiographs

Po-Kai Su, Pei-Rong Jiang, Kai-Xuan Xu, Meng-Lei Su, Jiann-Her Lin, Hsin-Han Chiang, Hsiao-Chi Li

D1-1430_H3.8 561 A Dual-Stream Diffusion Model with Physically-Based Rendering for Single Image Reflection Removal

Cheng-Wei Hsu, Ming-Sui Lee

D1-1430_H3.9 615 Dynamic Facial Expression Recognition in the Wild Using Mamba-Style Selective SSM and Facial Attention Mechanism

Yudhistira Arditya Pratama, Theophilus Ezra Nugroho Pandin, Yi-Zeng Hsieh

14:30-16:00

D1-1430_P1 Multimedia Security & Forensics

Location: Peony I

D1-1430_P1.1 104 The Potential of LLMs for Generating Malicious Domain Names

Lim Kit Michael Ye, Kaijian Zheng, Ngai Fong Law, Jianping Li

D1-1430_P1.2 135 Reducing Implicit Class Imbalance in Unlabeled Datasets Using Text-Specified Sensitive Attributes

Kosei Suyama, Kazuaki Nakamura

D1-1430_P1.3 215 DRASP: A Dual-Resolution Attentive Statistics Pooling Framework for Automatic MOS Prediction

Cheng-Yeh Yang, Kuan-Tang Huang, Chien-Chun Wang, Hung-Shin Lee, Hsin-Min Wang, Berlin Chen

D1-1430_P1.4 348 Multimodal Large Language Model for Deepfake Video Detection and Description

Haoran Sun, Chen Cai, Kong Aik Lee, Lap-Pui Chau, Yi Wang

D1-1430_P1.5 379 Biometric Identification Using Default Mode Network Features Extracted from Eyes-Open Resting-State EEG Data

Parvathy Remesh, Jijomon Chettuthara Moncy, Vinod Achutavarrier Prasad

D1-1430_P1.6 410 Backdoor Poisoning Attack Against Face Spoofing Attack Detection Methods

Shota Iwamatsu, Koichi Ito, Takafumi Aoki

D1-1430_P1.7 464 Access Control for Diffusion Models by Random Masking the Covariance of Initial Noise Distribution

Temma Tanaka, Kazuaki Nakamura

14:30-16:00

D1-1430_P2 Wireless Communications & Networking

Location: Peony II

D1-1430_P2.1 39 Low-Complexity Sparse Channel Estimation for Reconfigurable Intelligent Surface-Aided MIMO

Wei-Lin Chiang, Shu-Yu Lin, Jung-Chun Chi, Yuan-Hao Huang

D1-1430_P2.2 50 BFIS: Efficient Unknown Protocol Feature Extraction Method For Satellite Communication Systems

Xianwen Ling, Kun Zhang, Rong Tong, Dianying Chen

D1-1430_P2.3 74 Outdoor Experiment of Deep Joint Source-Channel Coding Using FFT-Enabled Convolutional Neural Network for Image Transmission

Tomoka Mori, Hiroshi Tatsukawa, Yuji Kawai, Yoshinori Shinohara, Hiroki Ikeda, Daisuke Hisano

D1-1430_P2.4 128 On LSTM-Based Behavioral Modeling of Radio-Frequency Power Amplifiers with a Small Training Dataset

Ryoki Yamaguchi, Satoshi Miyata, Suehiro Shimauchi, Eiji Mochida, Seiji Fujiwara

D1-1430_P2.5 209 DL-based Optical Fibre Fault Detection for Healthcare Telesurgery Communication System

Khushi Shah, Lakshit Pathak, Akshita Abrol, Kanak Jain, Rajesh Gupta, Parishi Shah, Sudeep Tanwar, Umesh Bodkhe, Tong Rong

D1-1430_P2.6 270 Overcoming Imperfect Detection Limitations: Deep Learning-Based Calibration Strategy for Rotating Interferometric Arrays

Zhaohang Zhang, Chunzhe Wang, Zhen Huang, Yafeng Zhan

D1-1430_P2.7 287 A Regional Clustering Method Based on Propagation Similarity for Modeling Cumulative Interference from Large Numbers of Terminals

Tatsuro Hidaka, Osamu Takyu, Kei Inage, Takeo Fujii, Kohei Yoshida, Masayuki Ariyoshi

D1-1430_P2.8 290 Radio Frequency Fingerprinting-Based Device Identification Using Deep Metric Learning

Dinh Tuan Anh, Bui Tung Lam, Pham An Duy, Pham Minh Tuan, Tran Vinh Co, Nguyen Huu Tinh, Huynh Cong Bang

D1-1430_P2.9 351 GNSS Spoofing Detection Based on LSTM-TNN-CVAE Network

Chaowen Tang, Tian Qin

D1-1430_P2.10 456 Enhancing Speech Quality in Scintillating Satellite Communications: A Rician Fading Modeling Approach

Teh Kah Kuan, Sun Hanwu, Tran Huy Dat

16:00-16:30

Break

16:30-18:00

D1-1630_IB Perspective 2: Utilization of the Foundation Models and the Future

Location: Island Ballroom

D1-1630_IB.1 643 Foundation Models as Guardrails: LLM- and VLM-Based Approaches to Safety and Alignment

Huy Nguyen, Pride Kavumba, Tomoya Kurosawa, Koki Wataoka

16:30-18:00

D1-1630_L1 Tracing the Fake: Deepfake Detection, Attribution & Spoof-aware Speaker Verification Across Languages & Accents

Location: Lotus I

D1-1630_L1.1 126 NE-PADD: Leveraging Named Entity Knowledge for Robust Partial Audio Deepfake Detection via Attention Aggregation

Huhong Xian, Rui Liu, Berrak Sisman, Haizhou Li

D1-1630_L1.2 187 Robust Localization of Partially Fake Speech: Metrics and Out-of-Domain Evaluation

Hieu-Thi Luong, Inbal Rimon, Haim Permuter, Kong Aik Lee, Eng Siong Chng

D1-1630_L1.3 188 Mixture of Low-Rank Adapter Experts in Generalizable Audio Deepfake Detection

Janne Laakkonen, Ivan Kukanov, Ville Hautamäki

D1-1630_L1.4 203 Continual Audio Deepfake Detection via Universal Adversarial Perturbation

Wangjie Li, Lin Li, Qingyang Hong

D1-1630_L1.5 212 Exploring Source Features with Deep Residual Neural Networks for Replay Attack Detection

Suresh Veesa, Badugu Vamsi Krishna, Madhusudan Singh

D1-1630_L1.6 254 A Preliminary Study on Sectional Voice Anonymization and Detection

Shaoqi Tang, Zeyan Liu, Liping Chen, Kong Aik Lee, Tomoki Toda, Zhenhua Ling

D1-1630_L1.7 276 ArcticEcho: A Novel Speaker-Controlled Voice Cloning Dataset for Modern Deepfake Detection Benchmarking

Soham Gangopadhyay, Inderpreet Singh, Prateek Pandya, Ashish Mani, Sumit Goswami

D1-1630_L1.8 327 Variational Regularization for End-to-end Speech Deepfake Detection

Siqing Qin, Kong Aik Lee, Man-Wai Mak, Pasquale Lisena, Massimiliano Todisco

D1-1630_L1.9 508 A Wavelet Tour of Audio Deepfake Detection

Arth Shah, Aniket Pandey, Manav Giakwad, Hemant Patil

D1-1630_L1.10 509 Fusion of Modulation Spectrogram and SSL with Multi-head Attention for Fake Speech Detection

Rishith Sadashiv T N, Abhishek Bedge, Saisha Suresh Bore, Jagabandhu Mishra, Mrinmoy Bhattacharjee, S R Mahadeva Prasanna

16:30-18:00

D1-1630_L2 Speaker Modeling Beyond Speaker Recognition

Location: Lotus II

D1-1630_L2.1 41 SpkAugTSE: A Simple and Efficient Approach to Address Target Confusion in End-to-End Speaker Extraction

Zhenghai You, Zhenyu Zhou, Lantian Li, Dong Wang

D1-1630_L2.2 81 Interpolating Speaker Identities in Embedding Space for Data Expansion

Tianchi Liu, Ruijie Tao, Qiongqiong Wang, Yidi Jiang, Hardik B. Sailor, Ke Zhang, Jingru Lin, Haizhou Li

D1-1630_L2.3 101 MDD: a Mask Diffusion Detector to Protect Speaker Verification Systems from Adversarial Perturbations

Yibo Bai, Sizhou Chen, Michele Panariello, Xiao-Lei Zhang, Massimiliano Todisco, Nicholas Evans

D1-1630_L2.4 131 Fusing Multi-layer Features of the Pre-trained Model With Grouped Cross Attention for Spoofing Speech Detection

Yu Guan, Wu Guo, Jie Zhang, Zhijun Zhang

D1-1630_L2.5 132 Fusing Blocked Deep Features of Pre-Trained Models for Short-Duration Speaker Verification

Zhi jun Zhang, Wu Guo, Jie Zhang, Yu Guan

D1-1630_L2.6 182 Multi-level Adversarial Training with Data Augmentation for Robust Speaker Verification

Xiaolei Zhang, Zhihua Fang, Liang He

D1-1630_L2.7 189 Analysis of Speaker Verification Performance Trade-offs with Neural Audio Codec Transmission

Nirmalya Mallick Thakur, Jia Qi Yip, Eng Siong Chng

D1-1630_L2.8 197 Estimating Speaker’s Seating Position from Monaural Speech in a Simulated Vehicle Interior Sound Field

Masataka Kaneko, Wen-Chin Huang, Tomoki Toda

D1-1630_L2.9 333 TS-VAD+: Modularized Target-Speaker Voice Activity Detection for Robust Speaker Diarization

Tran The Anh, Azmat Adnan, Wu Yihao, Chng Eng Siong

D1-1630_L2.10 590 Are Multimodal Foundation Models All That Is Needed for Emofake Detection?

Mohd Mujtaba Akhtar, Girish, Orchid Chetia Phukan, Swarup Ranjan Behera, Pailla Balakrishna Reddy, Ananda Chandra Nayak, Sanjib Kumar Nayak, Arun Balaji Buduru

16:30-18:00

D1-1630_H3 Three-Minute Thesis (3MT) Competition

Location: Hibiscus III

16:30-18:00

D1-1630_P2 Advanced Multimedia Applications

Location: Peony II

D1-1630_P2.1 113 Ensemble Methods for Estimating the Localization of Coronary Stenosis from CT Images Using 3D CNN Models

Minori Kondo, Masaki Aono, Kazuki Shimizu, Masashi Hashimoto, Takeshi Miyaji, Kei Nomura

D1-1630_P2.2 158 Tiered Assessment for DSP Education: Exploring Students’ Motivation and Performance

Eliathamby Ambikairajah, Tharmakulasingam Sirojan, Vidhyasaharan Sethu

D1-1630_P2.3 297 An Investigation of Parameter Scheduling for Image Restoration in Optical Analog Circuits

Taisei Kato, Ryo Hayakawa, Soma Furusawa, Kazunori Hayashi, Youji Iiguni

D1-1630_P2.4 364 Robust Cloud Removal from Optical Satellite Images Using Synthetic Aperture Radar and Multimodal Embedding Prior

Taishin Miura, Shunsuke Ono, Ryo Matsuoka

D1-1630_P2.5 367 Reflection and Noise Separation from Polarized Images via Joint Nonnegative Matrix Factorization and Plug-and-Play Denoising

Maharu Oda, Ryo Matsuoka

D1-1630_P2.6 377 Gated probabilistic diffusion for temporal action segmentation

Yun LI, Hanmin Li, Kin-Man Lam

D1-1630_P2.7 382 Theory of Spherical VR model for Landscape Representation

Hiroyuki Nishimoto, Toru Takahashi, Masakazu Yoshida

D1-1630_P2.8 423 HyTver: A Novel Loss Function for Longitudinal Multiple Sclerosis Lesion Segmentation

Dayan Perera, Ting Fung Fung, Vishnu Monn Baskaran

D1-1630_P2.9 473 KH-FUNSD: A Hierarchical and Fine-Grained Layout Analysis Dataset for Low-Resource Khmer Business Document

Nimol Thuon, Jun Du

D1-1630_P2.10 516 Effective Speckle Noise Reduction Using Transformed Bayesian Likelihood with Wiener-Based and Sketch-Based Geometric Priors

Ming-Hsun Mo, Pin-Wen Huang, Jian-Jiun Ding