How to Measure Audio Latency in Networked Music Systems

08 Aug 2024

nmp

audio

latency

realtime

...

myImage
From Telematic Testing, an event that combined live music and dance using advanced networked technology in three different places, hosted at Popsenteret, Oslo, April 2024. Photo by Thusha Rajendran.

The most crucial factor of any Networked Music Performance (NMP) is latency. More specifically, their success is based on whether or not the technical system used is able to provide the ultra-low-latency conditions necessary to create a sense of common simultaneity between the musicians. I'm sure most of us have experienced laggy and latency-ridden Zoom sessions and felt how much jitter and delays can disrupt the flow of the conversation. When playing music together, these tolerances and thresholds are much lower. In fact, current NMP research dictates that even audio latency beyond 30-50ms is enough to disrupt the sense of simultaneity between two remote performers.

When considering how important low-latency audio transmissions are for NMPs, we understand how valuable it is to know how to accurately measure it. Although many modern audio transmission software and DAWs offer built-in latency features that can display network audio latency between two machines and estimate buffering latencies, these numbers are often unreliable and not comprehensive enough for pinpointing the true latency experienced in networked music systems. Measuring audio latency in NMP systems is a more complex procedure that requires attention to many additional factors, such as the distance and sound propagation from speakers to performers, digital-to-analog conversion stages, additional software and hardware peripherals, software settings, and more.

At the Department of Musicology at UiO, we have developed solid strategies for measuring audio and video latency in networked music systems after many years of running the MCT masters programme. In this post, I write about some of these strategies and share good practical ways to measure audio latency in networked music scenarios where multiple remote parties are involved. Hopefully, it can guide you to design your own blueprints to pinpoint the latencies you are looking for.

Content

  1. How to Measure
  2. What to Measure
  3. Measurement Procedure
    1. Verify The Measurement System
    2. Connect The Music System to The Measurement System
    3. Fine-tune The Audio Quality
    4. Measure The Latency
    5. Measure With Alternative Parameters (extra)
  4. Summary
  5. References

To do precision measurements, we need to know what to measure and how to best measure it so we can sketch out a good measurement procedure.

How to Measure

Sending an audio impulse through a signal chain and comparing it against some reference is the most widely used technique to measure audio latency in technical systems (Fasciani, 2020). The most crucial aspect of this technique is to ensure that the delayed and reference signal chains are subject to the exact same initial conditions before any measurement object is introduced. This is so important that, if we cannot empirically guarantee that these two signals are initially identical, we cannot trust any of the results.

Creating such technical conditions when in real-world NMP contexts can be challenging. There are usually a plethora of interconnected software and hardware modules to oversee and very little time to manage them. As mentioned, this is a problem for the validity of our measurements. However, it is possible to build an empirically solid measurement system that works well with very few components. I prefer to build measurement systems with a laptop connected to an external audio interface that has two or more inputs/outputs. To create the two identical signal chains, I physically connecting two of the outputs on the interface to two of the inputs with XLR or jack, effectively looping back two channels. Audio is administered then from one end of the chains and recorded at the other.

myImage
A loopback measurement system diagram using a laptop connected to an external audio interface. Looping back channels isolates the signals from any unknown/unwanted processes that might affect our latency measurement. Real example in the next image →

It's also possible to design a cheaper and more travel-friendly system using something like a smartphone connected to a handy recorder with two or more inputs. To create identical signal chains in this scenario, split the physical audio output from your smartphone/device and connect them to two physical inputs on the handy recorder. This method acheives the same signal isolation as the loopback method, only it requires some additional work to load the recorded audio into software for analysis.

In any case, it's important to be as scientific as possible and spend time verifying your measurement method before you put it into practice. The latency between the two signal chains should be zero if no additional system or module is inserted into the chain. Example plan:

myImage
A simpler measurement system can be created with a handy recorder connected to something like a laptop or smartphone where the audio is administered from the laptop/smartphone and recorded on the handy recorder. Real example in the next image →

To measure the latency, send one impulse audio signal to both the reference and delayed signal chain outputs and record them separately. Then, count the sample difference between the recorded reference and delayed impulses. Finally, factor in the sampling rate to get the latency metric. With the measurement system in place, what remains is simply to introduce the system you wish to measure into one of the signal chains. The latency between the recorded refence and delayed signals is an accurate representation of the latency in the inserted system.

To be able to best distinguish between the reference and delayed sound when counting samples, it's wise to choose audio impulses that are "short". This means audio that is short in length but also audio that features an immediate attack and sustain, like clapping sounds or, even better, a simple signal click.

myImage
The loopback measurement system with added measurement object inserted into one of the signal chains. Real examples in the next images →

What To Measure

In a networked music setting, what we are interested in measuring is usually called the Over-all One-way Source-to-Ear delay, or OOSE for short (Rottondi et.al, 2016). The OOSE delay encompasses everything from sound in air, traveling from the source of the sound to the microphone, to the sound propagating from the loudspeaker to the ears of listners at the destination. In between, we might find several AD/DA conversions, hardware modules, network and software layers.

myImage
OOSE delay diagram showing the typical components included in an NMP signal path. Any step along the signal path will cause additional latency to various degrees.

However, the OOSE delay metric only carries real value if it's measured in a system that can transmit clear and stable audio. In other words, it's practically meaningless to measure the latency of a NMP system with bad audio quality. However, enhancing the media fidelity in streaming contexts usually means entails adding additional processing stages that, in turn, add more latency. For instance, increasing audio buffer sizes and bit-rates, or experimenting with different compression settings, are all good modifications that can enhance audio fidelity, but they all do so at the expense of time. There's always a tradeoff between the quality, latency and stability of transmissions in networked music contexts that has to constantly be negotiated (Fasciani, 2020).

The desired audio latency measurement object in networked music contexts should be the OOSE delay of a system that is fine-tuned to produce stable and high-quality audio over a significant period with minimal dropouts. For clarity, we can call this expanded definition for Stable OOSE, or SOOSE for short.

But measuring SOOSE delays over networks presents us with another key challenge. How can we compare a delayed impulse to a reference if the delayed signal is sent to another location? We cannot also send the reference because we want to isolate and measure the network path. The solution is to adopt a Round-Trip Time (RTT) strategy instead, where the delayed impulse is sent to the remote location and looped back to the source before being measured (Carôt 2009, Rottondi et.al 2016, Fasciani 2020). This strategy is similar to how computers negotiate clock time, actively measuring RTT latencies from fixed locations and subtracting the results from a common source. RTT SOOSE measurements requires us to loop back the delayed impulse at the very end of the SOOSE delay chain, preferably with a microphone capturing the sound from the speaker and sending it back to the source. If the signal chains at both locations are technically similar, with matching software and hardware configurations, then dividing the RTT SOOSE delay metric by two should give you the one-way latency.

myImage
Of course, to be able to measure the true (S)OOSE delay, we need to adopt an RTT measurement strategy.

Finally, having insufficient bandwidth or an unstable network connection can heavily impact the quality and stability of an audio signal, producing dropouts and quality degradation. Therefore, it can be wise to check the network bandwidth whenever you are in a new location. Even though you don't need more than about 1.5Mbps of bandwidth to transmit high-quality stereo audio, it's always best to be as much in know as possible. There are several online speed test tools you can use the measure the bandwidth of a network. Personally, I recommend working with the iPerf networking tools for a little more advanced control.

Measurement Procedure

Now that we know what to measure and how we should measure it, it's time to construct a good measurement plan. Setting up networked music systems is usually pretty hectic with many concerns and very little time to manage them. Having a detailed plan can ensure precision and scientific validity in chaotic times. And the better the plan is, the less you have to think and stress when the iron is hot.

Below I have sketched out an example plan with four individual steps that, when followed, will give accurate and precise audio latency measurements. But first, let's discuss when you should measure. If it's a networked concert, I recommend doing all technical measurements on the day of the performance, for instance during the final soundchecks. Networks are fundamentally non-deterministic, meaning that the conditions might change from day to day, even from hour to hour. Measuring right before a show is therefore the best indication of the true latency in the system at the time when it matters most.

Step 1 - Verify The Measurement System

First thing is first, set up and verify the measurement system using the methods I've demonstrated in this post. This is also a good time to run some bandwidth tests and configure any network settings.

Step 2 - Connect The Music System to The Measurement System

Second, insert the measurement object (being the entire SOOSE networked music system) into the delayed signal chain of the measurement system. Connecting these two systems usually means connecting the output of the delayed signal chain to the mixer that sends audio to the remote location, and connecting a microphone to the signal chain input. The microphone is placed somewhere near the performers and the main PA speakers are directed towards the mic. When the audio impulses are sent over the networked and looped back from the remote location, they exit the PA and get picked up by the microphone.

myImage
Connecting the networked music system to the measurement system might require additional microphones and speakers at each end to be closest to the OOSE standard. However, it is also to simplify by administering and routing the measurement impulses directly to the mixer.

However, since we can pretty reliably calculate how long it takes sound to travel a specific distance through air, it is also possible to skip the extra mic and speaker stages, administer the measurement impulses directly to the mixer at one end, and route them back from the mixer at the remote end. In this case, just remember to take note of the distances between the speakers and the performers and factor these distances into the final calculations.

Step 3 - Fine-tune The Audio Quality

As stated, the key to unlocking a meaningful networked music practice is to achieve clear and stable audio transmissions. This is also necessary to be able to maintain the "stable" part of the SOOSE delay metric. Therefore, once the measurement and music systems are connected, the next step is to fine-tune the system for good audio quality.

The simplest way to analyze the audio quality is to send a constant sine tone through the delayed RTT signal chain and listen for any impurities or dropouts to the signal as it exits the local PA. This is an iterative process that involves adjusting various system settings until the audio is glitch-free, such as buffersize, bit-rate and sampling rate. I've written more extensively about this method in another post of mine about networked audio engineering practices, about how to fine-tune audio parameters to suit the network.

myImage
Send a reference sine tone through the RTT signal chain to monitor the audio (and network) quality. Here, I bypass the microphone and speaker at the remote location because it's often the AD/DA and network convesions that cause artifacts to audio signals, rarly because of microphones or speakers.

Step 4 - Measure The Latency

Finally, you can measure the latency of your system. Since these measurement practices are essentially statistical exercises, the more data you collect have, the better. In fact, recently I decided to re-develop some older MaxMSP code into software that can administer audio impulses, record, measure and store large amounts of latency datam all in one. Having this software gives me a lot of freedom with the added benefit of making my measurement system more systematic and reproducible. Another advantage of using specialized measurement software is that it opens up the possibilities of measuring other audio devices, such as VST plugins and virtual audio drivers.

myImage
Dedicated software (like ALEX) for batch collecting audio latency data from musical devices can be useful to get enough samples to properly evalute the system.

Read more and download my software (ALEX) for free via my previous post.

Step 5 (extra) - Measure With Alternative Parameters

In addition to the SOOSE delay, it can be well worth collecting data from different system settings to get a better understanding of the system and gain important clues to how to make it more efficient. For instance, experimenting with different buffer sizes, compression settings, and even without certain modules in the chain, will give you insights into how each component contributes to the overall latency. Experimenting with the physical setup, like different microphone and speaker setups, can also be worthwhile in this regard.

To measure the latency with alternative parameters, make a table beforehand where you systematically list all the configurations you want to test.

Summary

In this post, I've shared some theories and practical strategies to achieve good audio latency measurements in networked music systems. The post has covered how to best measure the latency, what you should aim to measure, and discussed what a good measurement procedure should look like.

In my experience, the three most important factors when measuring audio latency in a networked music system are: 1) considering the entire musical chain from mouth to ear, 2) ensuring the system has glitch-free and stable audio transmission, and 3) using a verified measurement method that accurately isolates the object you wish to measure. It's also wise to plan the procedure with concrete steps ahead of time and to consider using tailored software that can collect, crunch and analyze larger amounts of data.

Happy measuring!

References

Carôt, A., & Werner, C. (2009). Fundamentals and principles of musical telepresence. Journal of Science and Technology of the Arts, 26-37 Páginas. https://doi.org/10.7559/CITARJ.V1I1.6

Fasciani, Stefano. (2020). Network-Based Collaborative Music Making [Video]. YouTube. https://youtu.be/GZCueJeg168?si=HYGVmNREuthYHZay

Hope, C. (2024). Unit 6: Networked Music Performance. Mutor Music Technology Online Repository. https://mutor-2.github.io/HistoryAndPracticeOfMultimedia/units/06/index.html

Rottondi, C., Chafe, C., Allocchio, C., & Sarti, A. (2016). An Overview on Networked Music Performance Technologies. IEEE Access, 4, 8823–8843. https://doi.org/10.1109/ACCESS.2016.2628440

Leave a Comment