22-24 October 2025, Singapore

International Audio Laboratories Erlangen
Title: Advancing Speech Communication: From Spatial Capture to Runtime Adaptation
Abstract: Speech is at the heart of modern communication, from everyday voice and video calls to teleconferencing, telepresence, and AR/VR experiences. Yet making conversations sound natural across rooms, varying talker–microphone distances, and devices remains a challenge. In this talk, I will share how recent advances in speech and acoustic signal processing are bringing us closer to seamless, high-quality voice communication anywhere. We begin with spatial sound capture, revisiting classical beamforming approaches before introducing a new neural beamforming approach. This learning-based method enables us to “point our ears” in any direction with enhanced robustness and precise control. Next, we explore acoustic teleportation – a technique that transforms a recording made in one environment so that it sounds as if it were recorded in another. By disentangling speech and acoustic embeddings, we can effectively make the recording sound as if it were made in the target environment. Finally, we discuss efficient deployment of these technologies on resource-constrained devices using slimmable neural networks – models that can adapt their size and computational complexity on the fly. Such architectures support both fixed (device-driven) and dynamic (data-driven) runtime adaptation, allowing resource-constrained devices to deliver advanced speech processing. Looking forward, these developments pave the way for more natural and immersive speech communication.
Emanuël Habets received the M.Sc. (2002) and Ph.D. (2007) degrees in Electrical Engineering from Eindhoven University of Technology (TU/e), The Netherlands. From 2007 to 2009, he was a Postdoctoral Fellow at the Technion—Israel Institute of Technology and Bar-Ilan University, Israel, and from 2009 to 2010, he was a Research Fellow in the Communications and Signal Processing Group at Imperial College London, U.K. He is currently a Full Professor at the International Audio Laboratories Erlangen—a joint institute of Friedrich-Alexander-Universität Erlangen-Nürnberg (FAU) and Fraunhofer IIS—and Head of the Department for Speech and Audio Research at Fraunhofer IIS (home of mp3), Germany.
Dr. Habets’ research focuses on speech and acoustic signal processing; he has authored over 100 journal papers and 250 conference papers. He was Co-Chair of the 2013 IEEE Workshop on Applications of Signal Processing to Audio and Acoustics (WASPAA) in New Paltz, NY, and Co-Chair of the 2014 International Conference on Spatial Audio (ICSA) in Erlangen, Germany. He served on the IEEE Industry Digital Signal Processing Technology Standing Committee (2013–2015), as Associate Editor of IEEE Signal Processing Letters (2013–2017), and as Editor-in-Chief of the EURASIP Journal on Audio, Speech, and Music Processing (2016–2018). He is a founding member of the EURASIP Acoustic, Speech, and Music Signal Processing Technical Area Committee (Vice-Chair, 2015–2018; Chair, 2019–2021). With S. Gannot and I. Cohen, he received the 2014 IEEE Signal Processing Letters Best Paper Award. He currently serves on the Technical Program Committee of the 2025 IEEE Automatic Speech Recognition and Understanding Workshop (ASRU), on the IEEE Audio and Acoustic Signal Processing and IEEE Speech and Language Processing Technical Committees, and as a Senior Area Editor for the IEEE/ACM Transactions on Audio, Speech, and Language Processing.
Fellow: IEEE, Canadian Academy of Engineering
Department of Electrical and Computer Engineering / School of Biomedical Engineering
University of British Columbia, Canada
Title: Signal Processing Meets Deep Learning in Healthcare — A Case Study on Parkinson’s Disease (PD)
Abstract: Recent breakthroughs in artificial intelligence (AI), and deep learning models in particular, across numerous fields also come with big challenges: a lack of interpretability and explainability due to the “black box” nature of current deep learning models; vulnerability to adversarial attacks; and a scarcity of well-annotated data in real-world problems. Such challenges are particularly critical for biomedical and healthcare applications.
Our suggestion is to explore the intersection of traditional signal/image processing (SP/IP) and deep learning, leveraging domain knowledge together with the learning ability of deep models to make decision making clinically meaningful and explainable, while mitigating the deficiencies of both traditional SP/IP and black-box deep learning approaches. In this talk, I will first provide an overview and then focus on illustrative research on Parkinson’s Disease (PD), e.g., pose-estimation-based assessment and monitoring using video data. We propose innovative strategies (e.g., self-supervision, partial annotation, data synthesis) for training deep learning models without, or with a reduced need for, explicitly annotated data. The talk will conclude by brainstorming future research directions.
Jane Wang received the B.Sc. degree from Tsinghua University in 1996 and the M.Sc. and Ph.D. degrees from the University of Connecticut (UConn) in 2000 and 2002, respectively, all in electrical engineering. While at UConn, she received the annual Outstanding Engineering Doctoral Student Award. She was a Research Associate at the University of Maryland, College Park from 2002 to 2004. Since 2004, she has been with the University of British Columbia (UBC), where she is now a Professor.
She is an IEEE Fellow, a Fellow of the Canadian Academy of Engineering (FCAE), and a member of the College of New Scholars, Artists and Scientists of the Royal Society of Canada. Her research interests lie in statistical signal processing and machine learning, with current focuses on digital media and biomedical data analytics. She has co-authored 200+ journal and 140+ conference papers, and co-founded Cortic Tech., a Vancouver-based AI education startup whose team won First Grand Prize in the OpenCV AI Competition 2021.
Her professional service includes roles on numerous IEEE SPS technical committees (BISP, MMSP, MLSP, IFS), the IEEE Fellow Committee, and as an editor or associate editor for IEEE TSP, SPL, TMM, TIFS, TBME, and SPM. She is currently the Editor-in-Chief for IEEE SPL and has served as general/technical chair of many IEEE conferences including ICIP2026, MMSP2018, and DSLW2021.
Tsinghua Shenzhen International Graduate School (SIGS), PhD, MBA, P.Eng., FIEEE, FEIC, FCAE
Title: Record and Represent Human Movement – A Type of Generic Sequential Symbolic Notation System
Abstract: Before the digital age, recording and representing knowledge with temporal characteristics, such as music and dance, was challenging. Brilliant inventions such as sheet music have played a crucial role in our cultural development. In this talk, we first review Labanotation, a symbolic notation for a type of human body movement. We then illustrate how to take advantage of this powerful symbolic notation to recognize body-gesture elements using neural network learning. Further, we introduce a new symbolic notation system, HandLaban, to record and represent human hand movements. This type of generic sequential symbolic notation system translates between dynamic human movements and static sequences of symbols. It has great potential in AI applications such as digital humans, robotics, and LLM-based generative AI.
Xiao-Ping (Steven) Zhang received the B.S. and Ph.D. degrees from Tsinghua University in 1992 and 1996, respectively, both in electronic engineering. He holds an MBA in Finance and Economics with Honors from the University of Chicago Booth School of Business. He is Tsinghua Pengrui Chair Professor at Tsinghua Shenzhen International Graduate School (SIGS), Tsinghua University, and was the founding Dean of the Institute of Data and Information (iDI) at Tsinghua SIGS. He was previously with the Department of Electrical, Computer and Biomedical Engineering, Toronto Metropolitan University (formerly Ryerson University), Toronto, ON, Canada, as a Professor and Director of the Communication and Signal Processing Applications Laboratory (CASPAL), where he also served as Program Director of Graduate Studies. His research interests include sensor networks and IoT, machine learning/AI/robotics, statistical signal processing, image and multimedia content analysis, and applications in big data, finance, and marketing.
Dr. Zhang is a Fellow of the Canadian Academy of Engineering, a Fellow of the Engineering Institute of Canada, a Fellow of the IEEE, a registered Professional Engineer in Ontario, Canada, and a member of the Beta Gamma Sigma Honor Society. He was General Co-Chair of the 2021 IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP) and is General Co-Chair of ICASSP 2027. He was General Co-Chair of the 2017 GlobalSIP Symposium on Signal and Information Processing for Finance and Business and of the 2019 GlobalSIP Symposium on Signal, Information Processing and AI for Finance and Business, General Chair of ICME 2024 and BioCAS 2023, and an elected member of the ICME Steering Committee. He is Editor-in-Chief of the IEEE Journal of Selected Topics in Signal Processing. He served as Senior Area Editor for the IEEE Transactions on Image Processing and the IEEE Transactions on Signal Processing, and as Associate Editor for the IEEE Transactions on Image Processing, the IEEE Transactions on Multimedia, the IEEE Transactions on Circuits and Systems for Video Technology, the IEEE Transactions on Signal Processing, and IEEE Signal Processing Letters. He was selected as an IEEE Distinguished Lecturer by both the IEEE Signal Processing Society and the IEEE Circuits and Systems Society.