Papers by Chris Kyriakakis
Real-valued delayless subband affine projection algorithm for acoustic echo cancellation
[Figure: block diagram of the proposed structure. Two GDFT-based SSB analysis filter banks decompose the far-end signal x(t) and the error signal e(t) into K/2 subbands, each adapted by an affine projection filter (APA0 through APAK/2-1); per-band FFTs and frequency stacking, followed by an IFFT that discards the last L samples, assemble the fullband filter used to form the echo estimate d̂(t). The near end contributes the echo y(t) and ambient noise n(t) to d(t).]
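As background for the adaptation run in each subband, here is a minimal NumPy sketch of a single affine projection algorithm (APA) update. It shows only the generic fullband APA, not the paper's delayless subband structure; the step size `mu`, regularization `delta`, and projection order are illustrative choices:

```python
import numpy as np

def apa_step(w, X, d, mu=0.5, delta=1e-6):
    """One affine projection update.

    w : (L,)   current adaptive filter weights
    X : (L, P) columns are the last P input vectors
    d : (P,)   corresponding desired (near-end) samples
    """
    e = d - X.T @ w  # a-priori errors for the P constraints
    # Regularized projection onto the span of the last P input vectors
    w = w + mu * X @ np.linalg.solve(X.T @ X + delta * np.eye(X.shape[1]), e)
    return w, e
```

Larger projection orders P speed up convergence for correlated inputs at the cost of solving a P-by-P system per update, which is part of the motivation for subband decompositions like the one above.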
This paper describes an approach towards automating the identification of design problems with three-dimensional mediated or gaming environments through the capture and query of user-player behavior represented as a data schema that we have termed "immersidata". Analysis of data from a study of an educational computer game that we are developing shows that this approach is an effective way to pinpoint potential usability or design problems occurring in unfolding situational and episodic events that can interrupt or break user experience. As well as informing redesign, a key advantage of this cost-effective approach is that it considerably reduces the time evaluators spend analyzing hours of videoed study material.
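The paper's immersidata schema is not reproduced above. Purely as a hypothetical illustration of the idea (logging user-player behavior as queryable records), a record type and query might look like this, with all field and function names ours:

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class ImmersidataEvent:
    """One captured user-player interaction record (hypothetical fields)."""
    session_id: str
    timestamp: float              # seconds since session start
    avatar_position: tuple        # (x, y, z) in world coordinates
    action: str                   # e.g. "jump", "collide", "dialog_open"
    target_object: Optional[str]  # scene object involved, if any

def repeated_collisions(events, obj, window=5.0, threshold=3):
    """Flag an episode where the player collides with the same object
    at least `threshold` times within `window` seconds -- the kind of
    unfolding situational event that can signal a design problem."""
    hits = [e.timestamp for e in events
            if e.action == "collide" and e.target_object == obj]
    return any(sum(t0 <= t < t0 + window for t in hits) >= threshold
               for t0 in hits)
```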

EURASIP Journal on Advances in Signal Processing, 2003
Multichannel audio offers significant advantages for music reproduction, including the ability to provide better localization and envelopment, as well as reduced imaging distortion. On the other hand, multichannel audio is a demanding media type in terms of transmission requirements. Often, bandwidth limitations prohibit transmission of multiple audio channels. In such cases, an alternative is to transmit only one or two reference channels and recreate the rest of the channels at the receiving end. Here, we propose a system capable of synthesizing the required signals from a smaller set of signals recorded in a particular venue. These synthesized "virtual" microphone signals can be used to produce multichannel recordings that accurately capture the acoustics of that venue. Applications of the proposed system include transmission of multichannel audio over the current Internet infrastructure and, as an extension of the methods proposed here, remastering existing monophonic and stereophonic recordings for multichannel rendering.
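As a toy sketch of the "virtual microphone" idea: if a response for the target microphone position in the venue is available (measured or learned), a virtual channel can be approximated by filtering a reference channel with it. This is only a caricature of the system described above, which derives the mapping from actual recordings:

```python
import numpy as np
from scipy.signal import fftconvolve

def synthesize_virtual_mic(reference, venue_response):
    """Approximate a virtual microphone signal by convolving a
    reference recording with a response associated with the target
    microphone position in the venue (assumed given here)."""
    return fftconvolve(reference, venue_response, mode="full")[:len(reference)]
```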

In this paper we describe the underlying concepts behind the spatial sound renderer built at the University of Southern California's Immersive Audio Laboratory. In creating this sound rendering system, we were faced with three main challenges: first, rendering sound using head-related transfer functions (HRTFs); second, canceling the crosstalk terms; and third, localizing the listener's ears. For spatial sound rendering we use a two-layer method of modeling the HRTFs: the first layer accurately reproduces the ITDs and IADs, and the second layer reproduces the spectral characteristics of the HRTFs. A novel method, based on low-rank modeling, was developed for generating the required crosstalk cancellation filters as the listener moves. Using the Karhunen-Loève expansion, we can interpolate among listener positions from a small number of HRTF measurements. Finally, we present a head detection algorithm for tracking the location of the listener's ears in real time using a laser scanner.
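A minimal NumPy sketch of the Karhunen-Loève (principal component) machinery behind the interpolation step, assuming a matrix of measured HRTFs is given; the linear blending of weights here is a simple stand-in for whatever interpolation rule the authors actually fit:

```python
import numpy as np

def kl_basis(hrtfs, rank):
    """hrtfs: (num_positions, filter_length) measured HRTFs.
    Returns the mean, the top-`rank` basis vectors, and per-position
    expansion weights."""
    mean = hrtfs.mean(axis=0)
    U, s, Vt = np.linalg.svd(hrtfs - mean, full_matrices=False)
    basis = Vt[:rank]                   # (rank, filter_length)
    weights = (hrtfs - mean) @ basis.T  # (num_positions, rank)
    return mean, basis, weights

def interpolate_hrtf(mean, basis, w_a, w_b, alpha):
    """Blend the KL weights of two measured positions to approximate
    an intermediate listener position (alpha in [0, 1])."""
    w = (1 - alpha) * w_a + alpha * w_b
    return mean + w @ basis
```

Because only `rank` weights vary with position, updating filters as the listener moves is far cheaper than storing or re-measuring full HRTFs.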

IEEE Transactions on Multimedia, 2000
Immersive audio systems can be used to render virtual sound sources in three-dimensional (3-D) space around a listener. This is achieved by simulating the head-related transfer function (HRTF) amplitude and phase characteristics using digital filters. In this paper, we examine certain key signal processing considerations in spatial sound rendering over headphones and loudspeakers. We address the problem of crosstalk inherent in loudspeaker rendering and examine two methods for implementing crosstalk cancellation and loudspeaker frequency response inversion in real time. We demonstrate that it is possible to achieve crosstalk cancellation of 30 dB using both methods, but one of the two (the Fast RLS Transversal Filter Method) offers a significant advantage in terms of computational efficiency. Our analysis is easily extendable to nonsymmetric listening positions and moving listeners.
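For concreteness, a hedged frequency-domain sketch of two-loudspeaker crosstalk cancellation: invert the per-bin 2x2 acoustic transfer matrix from the loudspeakers to the ears, with Tikhonov regularization to keep ill-conditioned bins bounded. Neither of the paper's two real-time methods is reproduced here; this only shows the underlying inversion, with `eps` as an illustrative regularization constant:

```python
import numpy as np

def crosstalk_canceller(H, eps=1e-3):
    """H: (num_bins, 2, 2) complex transfer matrix per frequency bin,
    H[k, i, j] = path from loudspeaker j to ear i.
    Returns regularized inverse filters C such that H @ C ~ I."""
    eye = np.eye(2)
    C = np.empty_like(H)
    for k in range(H.shape[0]):
        Hk = H[k]
        # Tikhonov-regularized inverse: C = H^H (H H^H + eps I)^{-1}
        C[k] = np.conj(Hk.T) @ np.linalg.inv(Hk @ np.conj(Hk.T) + eps * eye)
    return C
```

Applying C to the binaural program material before it reaches the loudspeakers ideally leaves each ear hearing only its intended signal.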
An Interchannel Redundancy Removal Approach for High-Quality Multichannel Audio Compression
Multichannel audio can immerse a group of listeners in a seamless aural environment. Previously, we proposed a system capable of synthesizing the multiple channels of a virtual multichannel recording from a smaller set of reference recordings. This problem was termed multichannel audio resynthesis and the application was to reduce the excessive transmission requirements of multichannel audio. In this paper, we address the more general problem of multichannel audio synthesis, i.e. how to completely synthesize a multichannel audio recording from a specific stereophonic or monophonic recording, significantly enhancing the recording's quality. We approach this problem by extending the model employed for the resynthesis problem.
Multichannel audio is attracting rapidly increasing popularity in audio reproduction. In most cases, however, its transmission requirements are extremely demanding compared to the available bandwidth. One possible solution to this problem could be to transmit a reference channel and recreate the remaining channels at the receiving end. In this paper such a method is proposed by taking advantage of spectral conversion techniques that have been successfully applied to speech processing. Applications of the proposed system include transmission of multichannel audio over the current Internet infrastructure and, as an extension of the methods proposed here, remastering of existing monophonic and stereophonic recordings for multichannel rendering.
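For reference, one widely used form of spectral conversion from the voice-conversion literature maps a source feature frame through a Gaussian mixture model as a posterior-weighted sum of per-mixture linear regressions. The sketch below assumes trained GMM parameters are given and is not necessarily the exact variant used in the paper:

```python
import numpy as np

def convert_frame(x, priors, mu_x, mu_y, cov_xx, cov_yx):
    """GMM-based spectral conversion of one source feature frame x.

    priors         : (M,)      mixture weights
    mu_x, mu_y     : (M, D)    source / target means
    cov_xx, cov_yx : (M, D, D) source covariances and cross-covariances
    """
    M = len(priors)
    # Posterior probability of each mixture component given x
    resp = np.empty(M)
    for i in range(M):
        diff = x - mu_x[i]
        inv = np.linalg.inv(cov_xx[i])
        norm = np.sqrt(np.linalg.det(2 * np.pi * cov_xx[i]))
        resp[i] = priors[i] * np.exp(-0.5 * diff @ inv @ diff) / norm
    resp /= resp.sum()
    # Posterior-weighted sum of per-mixture source-to-target regressions
    y = np.zeros_like(mu_y[0])
    for i in range(M):
        y += resp[i] * (mu_y[i]
                        + cov_yx[i] @ np.linalg.inv(cov_xx[i]) @ (x - mu_x[i]))
    return y
```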

IEEE Transactions on Speech and Audio Processing, 2003
A new quality-scalable high-fidelity multichannel audio compression algorithm based on MPEG-2 Advanced Audio Coding (AAC) is presented in this research. The Karhunen-Loève Transform (KLT) is applied to multichannel audio signals in the pre-processing stage to remove inter-channel redundancy. Then, signals in decorrelated channels are compressed by a modified AAC main profile encoder. Finally, a channel transmission control mechanism is used to re-organize the bitstream so that the multichannel audio bitstream has a quality-scalable property when it is transmitted over a heterogeneous network. Experimental results show that, compared with AAC, the proposed algorithm achieves a better performance while maintaining a similar computational complexity at the regular bit rate of 64 kbit/sec/ch. When the bitstream is transmitted to narrow-band end users at a lower bit rate, packets of some channels can be dropped, and slightly degraded yet full-channel audio can still be reconstructed in a reasonable fashion without any additional computational cost.
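A minimal NumPy sketch of the KLT pre-processing step on a block of multichannel samples; in a real codec the transform matrix (or parameters from which it can be rebuilt) must be sent as side information so the decoder can invert it:

```python
import numpy as np

def klt_decorrelate(channels):
    """channels: (num_channels, num_samples) block of multichannel audio.
    Returns the decorrelated channels and the orthogonal transform matrix
    needed to invert the KLT at the decoder."""
    cov = np.cov(channels)               # inter-channel covariance
    eigvals, V = np.linalg.eigh(cov)     # columns of V are eigenvectors
    order = np.argsort(eigvals)[::-1]    # strongest component first
    V = V[:, order]
    return V.T @ channels, V             # decode with V @ transformed
```

Ordering the eigenchannels by energy is what makes dropping the packets of the weakest channels degrade quality gracefully rather than losing a specific loudspeaker feed.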

The Distributed Immersive Performance (DIP) project explores one of the most challenging goals of networked media technology: creating a seamless environment for remote and synchronous musical collaboration. A number of research groups have presented one-time demonstrations of distributed performance with varying degrees of success since the 1970s. None, as far as we know, has focused on capture and recording of musical experience and thorough analysis of realistic musical interaction in an environment constrained by network latency and reduced physical presence. In this paper we present a comprehensive framework for the capture, recording and replay of high-resolution video, audio and MIDI streams in an interactive environment for collaborative music performance, and user-based experiments to determine the effects of latency in aural response on performers' satisfaction with the ease of creating a tight ensemble, forming a musical interpretation, and adapting to the conditions. The experiments mark the beginning of our efforts to study comprehensively the effects of musical interaction over the Internet in a realistic performance setting. The users and evaluators of the system are the Tosheff Piano Duo, Vely Stoyanova and Ilia Tosheff, a professional duo who have won awards and concertized internationally. We present preliminary results from our first two sets of experiments: two players in the same room at separate keyboards with direct visual contact and delayed aural response from the partner (between 0 ms and 150 ms), and the same experiments with the players swapping parts. User responses are reported for experiments using Poulenc's Sonata for Piano Four-Hands, whose movements are the Prelude (score-recommended tempo of 132 bpm), Rustique (46 bpm) and Final (160 bpm). For the fast movements (the first and third), the players experienced the highest difficulty in creating a tight ensemble at 50 ms and above. In the fast but not-so-rapid first movement, the players almost always rated difficulty in creating a musical interpretation higher than ensemble (synchronization) difficulties. In general, the users judged that, with practice, they could adapt to delays below 50 ms.
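To relate these delay values to the music, a quick back-of-the-envelope computation (ours, not the paper's) expresses a delay as a fraction of one beat at a given tempo:

```python
def delay_in_beats(delay_ms, tempo_bpm):
    """Express an aural delay as a fraction of one beat at a given tempo."""
    beat_ms = 60_000 / tempo_bpm
    return delay_ms / beat_ms

# At the Final's 160 bpm, one beat lasts 375 ms, so a 50 ms delay
# already displaces the partner by roughly 13% of a beat:
print(f"{delay_in_beats(50, 160):.2f}")  # -> 0.13
```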

We present the architecture, technology and experimental applications of a real-time, multi-site, interactive and collaborative environment called Distributed Immersive Performance (DIP). The objective of DIP is to develop the technology for live, interactive musical performances in which the participants (subsets of musicians, the conductor and the audience) are in different physical locations and are interconnected by very high fidelity multichannel audio and video links. DIP is a specific realization of broader immersive technology: the creation of the complete aural and visual ambience that places a person or a group of people in a virtual space where they can experience events occurring at a remote site or communicate naturally regardless of their location. The DIP experimental system has interaction sites and servers in different locations on the USC campus and at several partners, including the New World Symphony of Miami Beach, FL. The sites have different types of equipment to test the effects of video and audio fidelity on the ease of use and functionality for different applications. Many sites have high-definition (HD) video or digital video (DV) quality images projected onto wide-screen wall displays completely integrated with an immersive audio reproduction system for a seamless, fully three-dimensional aural environment with the correct spatial sound localization for participants. The system is capable of storage and playback of the many streams of synchronized audio and video data (immersidata), and utilizes novel protocols for the low-latency, seamless, synchronized real-time delivery of immersidata over local area networks and wide-area networks such as Internet2. We discuss several recent interactive experiments using the system and many technical challenges common to the DIP scenario and a broader range of applications.
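As a toy model of the synchronized, low-latency delivery problem (not DIP's actual protocol), a playout buffer holds timestamped media packets and releases them in timestamp order once a fixed playout delay has elapsed; all names here are illustrative:

```python
import heapq
import itertools

class JitterBuffer:
    """Minimal playout buffer: absorb network jitter by delaying every
    packet to a common playout deadline, then release in timestamp order."""

    def __init__(self, playout_delay):
        self.playout_delay = playout_delay
        self.heap = []
        self._seq = itertools.count()  # tie-breaker for equal timestamps

    def push(self, timestamp, payload):
        heapq.heappush(self.heap, (timestamp, next(self._seq), payload))

    def pop_ready(self, now):
        """Return payloads whose playout deadline has passed."""
        ready = []
        while self.heap and self.heap[0][0] + self.playout_delay <= now:
            ready.append(heapq.heappop(self.heap)[2])
        return ready
```

The central design tension this exposes is the one the DIP experiments measure: a larger playout delay smooths over jitter but pushes the aural response latency toward the thresholds the performers found difficult.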
A Second Report on the User Experiments in the Distributed Immersive Performance Project
[Abstract unavailable; the indexed excerpt is from the paper's acknowledgements: "We thank Chris Sampson, Associate Dean for New Initiatives at the USC Thornton School of Music, for introducing us to the Tosheff Piano Duo, who have proved to be such wonderful collaborators on this project."]

We describe new techniques for interactive input and manipulation of three-dimensional data using a motion tracking system combined with an autostereoscopic display. Users interact with the system by means of video cameras that track a light source or a user's hand motions in space. We process this 3D tracking data with OpenGL to create or manipulate objects in virtual space. We then synthesize two to nine images as seen by virtual cameras observing the objects and interlace them to drive the autostereoscopic display. The light source is tracked within a separate interaction space, so that users interact with images appearing both inside and outside the display. With some displays that use nine images inside a viewing zone (such as the SG 202 autostereoscopic display from StereoGraphics), user head tracking is not necessary because there is a built-in left-right look-around capability. With such multi-view autostereoscopic displays, more than one user can see the interaction at the same time and more than one person can interact with the display.
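A naive sketch of the final interlacing step, assuming N rendered views: assign display column c to view c mod N. Real autostereoscopic panels such as the SG 202 map views at subpixel granularity according to panel-specific geometry, so this is only illustrative:

```python
import numpy as np

def interlace_views(views):
    """views: (N, H, W, 3) images rendered by N virtual cameras.
    Returns a column-interleaved composite where display column c
    shows view c mod N (a toy stand-in for real subpixel mapping)."""
    n, h, w, _ = views.shape
    out = np.empty((h, w, 3), dtype=views.dtype)
    for c in range(w):
        out[:, c] = views[c % n][:, c]
    return out
```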
Large complex networks have become an inseparable part of modern society. However, very little has been done to develop tools to manage and ensure the security of such networks. Network operators continue to slave over endless daily logs and alerts in a struggle to keep networks operational. Perhaps the most formidable enemy of network operations today is the volume of management data that must be perused. Expensive commercial products attempt to visualize data but with limited utility, as witnessed by the prevailing use of command-line interfaces and homegrown scripts. In addition to data collection tools, operators need to immediately observe and debug the effects of their actions; yet that information is buried deep in the data that pours daily from monitoring equipment. Thus, they need better ways to abstract network events and better, more informative ways to render them.