Exploring Dataset Sonification with Web Audio

20 Sep 2023







Creating audio-visual art on the Web is easier than ever, thanks to the growing capabilities of modern web browsers and the availability of open-source creative coding libraries. Together, they facilitate an environment where people with varying technical abilities and backgrounds can create meaningful and interactive art in the browser at record speeds.

As an audio enthusiast, I have been interested in the DSP abilities of these libraries for some time, and speficially how Web Audio can contribute to modern artistic and research practices. In early 2023, I ran a workshop in collaboration with the Creative Computing Hub Oslo (C2HO), specifically on how sonification can help us navigate and explore Big data on the Web.

In this post, I share some examples from my workshop, presenting two approaches to model-based sonification of large datasets in a web browser using the p5js library.


  1. Why Sonify?
  2. Example Dataset
  3. Multi-dimensional Plot
  4. The Sonic Orbs
  5. Final Thoughts
  6. Source Code

Why Sonify?

The most common approach to engaging with data online is through visual graphs and plots where various axis represent different data dimensions. However, with sonification, or Auditory Displays, we can use (non-speech) audio to convey information or perceptualize data (Hermann et.al, 2011). As you may not know, sonification is an established field of research, and the literature often differentiates among many kinds of sonification methods. For instance, the method of Audification refers to the direct playback of data samples or a direct translation of a data waveform into sound. So with audification, we can perhaps perceive patterns and irregularities in large datasets in record time.

Most real-world sonification applications augment existing visual-based tools, helping to extract valuable and coherent information in realtime. For example, the goal of model-based sonification (another sonification method) is to investigate how acoustic responses from dynamic (sometimes visual) environments can provide valuable information in response to user interactions. As such, model-based sonification systems are great for explorative data analysis and augmenting existing visual analysis methods.

We know that sonification applications can facilitate better accessibility practices and sometimes even be more efficient than data visualization techniques. We also know that rapidly developing technologies, such as the Web Audio API and WebAssembly, etc., make it easy to create powerful and fast DSP engines on the web. But how can we use sonification to navigate Big Data on the web? How useful would this be? And how do we do this from a practical standpoint? These questions inspired me to spend some time exploring dataset sonification using the p5js library and a big dataset of coffee quality.

Example Dataset

The dataset I am using for my examples is a simple and generic .csv spreadsheet detailing Arabic coffee quality from different farms over several years. I sourced the dataset on Kaggle.com, where you can find lots of high-quality and free datasets for your projects, in a variety of different formats.

Here is a taste of what my dataset looks like:

Each of the 1300 rows in the spreadsheet represents a specific coffee shipments (dispatches) between 2010 - 2017. On the other hand, the columns tell us useful information about the dispatches, detailing everything from the quality of the coffee to the number of bags in the shipment.

It should be noted that I am only using a small fraction of this dataset in my examples in this post.

Multi-dimensional Plot

Below is my first example of a model-based sonification system where I've augmented a traditional 2D plot with web audio. Each black dot reprents unique coffee dispatches; rows in the dataset spreadsheet. The height (vertical Y-axis) of the red dots equals coffee acidity levels over time (horizontal X-axis). The farm name is also printed for reference.

When you click the dots, the frequency of the played synth note represents the altitude of the farm. The higher the frequency of the note, the higher up the farm is. An additional/second oscillator is used as an LFO to control the amplitude of the first oscillator, creating an amplitude modulation synthesizer in the process. By listening to the frequency of the LFO and the varying characteristics of the synth notes, we can disseminate information about how many bags of coffee were part of every shipment. The higher the frequency of the LFO, the more coffee bags were produced. See if you can hear the differences.

A great thing about p5 are its Web Audio API utility functions for high-level declarative audio programming. This framework makes it easy to define and update audio parameters on the fly, making it ideal for designing interactive model-based sonification systems for the web. Together with some basic knowledge of sound synthesis, these utilities make it possible to build complex synths, fast, that can react to user input or any other dynamic data.

The code block below demonstrates some key p5 audio utilities and how easy it can be to manipulate parameters dynamically.

// ./ampmodsketch.ts
// Example of a simple p5 sketch in instance mode.
// amplitude modulation synthesis
const ampmodsketch = (p5: p5Types) => {
let sine: any
let LFO: any
p5.preload = (): void => {
// preload some data (for instance a .csv)
p5.setup = (): void => {
// create a canvas.
p5.createCanvas(200, 200);
// load an oscillator with a sinewave and set initial amplitude.
sine = new p5.constructor.Oscillator("sine");
// load another oscillator as an LFO
LFO = new p5.constructor.Oscillator("sawtooth");
LFO.disconnect(); // disconnect the LFO from the master output
// make so that the LFO controls the amplitude of the sine wave oscillator.
p5.draw = (): void => {
// changes the frequency and add logic to play synth
// based on the data loaded.
let myp5 = new p5(ampmodsketch);

But in a more general sense, re-creating traditional plots and graph structures with p5 is tedious work. Unlike other libraries, such as Plotly, GraficaJs and Matplotlib, p5 is not optimized nor designed for data visualizations and statistical analysis. So instead of replicating traditional plotting techniques and functionalities, we should try to take advantage of the dynamic and strong visual capabilities of p5 to create more tailored platforms.

The Sonic Orbs

Below is another example of a model-based sonification system with a more creative and object-oriented approach to visual and sonic design. Here, users can navigate through time by interacting with a slider, moving colored orbs around in a 2D plot. Each orb represents a unique coffee company in the dataset. Selected information about the company's coffee dispatches are mapped to various properties of the orb. In my particular example, the XY position of the orbs communicate the "balance" and "body" of coffee dispatches from 3 companies between 2010 - 2012.

For simplicity's sake, I used the same dimensions (body and balance) to control the sound of the orbs through the amplitude modulation synthesis I designed for my multi-dimensional plot example. The orb's carrier frequency is determined by its vertical position in the plot (reflecting the balance variable), while the frequency of the LFO is determined by the orb's horizontal position (reflecting the body variable).

In the code, I define an Orb class with some utility methods where each class instance represents a unique coffee owner/company from the dataset. So every Orb has the same basic features, such as shape, size, movement, sound, etc. But any data point can be mapped to any property of these orbs. When I create a "new Orb" I simply pass whatever data I want to see and hear as arguments and ensure that the current value of my slider is passed to every Orb in the draw function.

There are several advantages to using this object-oriented approach. First, representing high-level data structures from your dataset through objects in code is a good programming strategy. It's highly scalable, maintainable, and more compatible when working with larger, varying, and more complex datasets. For sonificatin purposes, its also an elegant way to handle polyphony and complex harmonic content. But more importantly, I think the Sonic Orb example is better at leveraging our complex ability to listen, deconstruct, and differentiate amongst subtle sound textures and musical harmonies in our environments.

// Example of a simple p5 sketch in instance mode.
// I create an Orb class where whose visual and sound
// is dynamically shaped by the current slider value.
const orbsketch = (p5: p5Types) => {
let data, data_body, data_balance, i, sliderHTML, sliderVal, orb1;
let owner_1 = "Kona Pacific Farmers Cooperative";
let color_1 = [150, 100, 190];
let size_1 = 12;
p5.preload = () => {
data = p5.loadTable("PATH/TO/DATASET");
data_balance = data.getColumn("balance")
data_body = data.getColumn("body")
p5.setup = () => {
p5.createCanvas(200, 200);
slider_html = p5.createSlider(0, 100, 0); // create a HTML slider
orb1 = new Orb(owner_1, color_1, size_1); // initialize a new Orb
p5.draw = () => {
sliderVal = sliderHTML.value(); // get the current slider value
orb1.update(data_balance[sliderVal], data_body[sliderVal]); // update my orbs on every new window frame.
let myp5 = new p5(orbsketch);
// My orb class where I define the behavior of my orbs.
class Orb {
constructor(owner, color, size) {
this.owner = owner;
this.color = color;
this.size = size;
this.position = [0,0];
this.oscillators = [];
// etc..
update(sliderVal, x, y){}
draw() {}
play() {}
// etc..

But how useful are these audio-augmented plots, really? And can they be used in any analytic or scientific way? Building complex sounds from data can increase the quality and amount of information perceived, but the mapping has to be high-quality, consistent, and reliable. For my orb example to be useful, all parameters and ranges would require fine-tuning and proper testing to ensure that users could pinpoint meaningful(!) data differences from the harmonic content. It would also be advantageous if users could configure more parameters and choose which data dimensions make sound. This would make the system more interactive and customizable for different purposes.

Final Thoughts

In summary, it's very much possible to enrich data visualizations with model-based sonification in a web browser using the p5js library. I demonstrate two approaches for how to do this that use audio to augment visual data displays as a way to reveal more dimensions in realtime. The first approach can be considered more traditional, using audio to augment simple 2D visual graphs, while the second approach is more experimental, with usual and complex visual-audio aspects.

Besides their educational and artistic potential, I think the biggest advantage of these audio-visual apps is that they can facilitate more diverse, accessible and multi-faceted forms of online interaction with complex data. However, it's important to note that there is no guarantee that this is the case. Doing user-testing is a tedious and complicated process usually requiring a variety of qualitative and quantitative data to be definitive. People also interpret sounds and music very subjectively, making matters even worse. But from my experience from working on this project, I believe you can come a long way by designing tools that are highly data-driven, consistent, and interactive(!). The web provides a unique platform for large-scale testing and distributing to people of all demographics, so interactivity is key.

Finally, although p5js is great for building easy-to-use, audio-visually robust web apps, it's not really a library designed to be the backbone of fast and powerful analysis software. In other words, programming analysis-heavy programs with p5 is not recommended. Rather, it's great for prototyping and smaller apps/projects.

Source Code

To get the full source for these examples, and much more, check out this workshop repository on my GitHub page. I recommend using the p5js Web Editor as your development environment so you can focus 100% on coding. However, you can also run the code from a local node development server, if you want.

In any case, ensure that you use a Chromium-based web browser, such as Chrome, Brave, or Vivaldi. I have not experimented with other browser engines, such as Firefox's Gecko, so you might run into unforseen issues there.


Hermann, T., Hunt, A., & Neuhoff, J. G. (Eds.). (2011). The Sonification Handbook (1st ed.). Logos Publishing House.