Audio Visualizations

Priya Kalyanakrishnan
May 20, 2021

We usually think of a data set as a table: variables in columns, observations in rows, and values of some data type in each cell. Raw multimedia files can be visualized too, even without transcribed metadata.

Goal: Measure and visualize sound to analyze features such as beats, decibel levels, and frequency content. This kind of analysis can help judge whether an audio file is demographically and statistically appropriate for an audience. Unlock another level of audio visualization instead of sticking with the status quo.

Prerequisites:

  • A Linux container, if you are not already running a Linux operating system (OS) such as Canonical's Ubuntu.
  • Install a version of Jupyter Notebook within the Linux container.
  • Install a version of Python (the tutorial uses 3.9) inside the Linux container.
  • Prior knowledge of math and sciences.
  • Previous experience with Python Programming.
  • Comfortable learning new Python Programming modules.
  • Must know how to read a variety of graphs and legends.

In this tutorial, we compare two audio files using modules and libraries, particularly librosa, matplotlib, and NumPy. The two audio files are Will Smith’s “Switch” and Cascada’s “Au Revoir”.

The following code visualizes both audio files in two ways: a sine wave plot built from trigonometric functions, and a decibel (dB) and hertz (Hz) plot.

Will Smith — Switch

Sine Wave:

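import librosa
import numpy as np
import matplotlib.pyplot as plt

# Load the track; librosa resamples to 22,050 Hz mono by default.
y, sr = librosa.load("01-Switch.m4a")

# Power spectrogram: squared magnitude of the short-time Fourier transform.
D = librosa.stft(y)
s = np.abs(D) ** 2

# Chroma features (12 pitch classes per frame), flattened into a running total.
chroma = librosa.feature.chroma_stft(S=s, sr=sr)
chroma = np.cumsum(chroma)

# For each cumulative value c, sample sin(x) at 50 points on [-c, c].
x = np.linspace(-chroma, chroma)
plt.plot(x, np.sin(x))
plt.xlabel('Angle [rad]')
plt.ylabel('sin(x)')
plt.axis('tight')
plt.show()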
Will Smith-Switch: Sine Wave
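
To see what the last few lines of the sine wave code do without loading an audio file, here is a minimal sketch on a made-up chroma matrix. The array `toy_chroma` and its values are assumptions for illustration only, not output from either song.

```python
import numpy as np

# Hypothetical stand-in for librosa.feature.chroma_stft output:
# 12 pitch classes (rows) by 4 frames (columns), values in [0, 1].
toy_chroma = np.linspace(0.0, 1.0, 48).reshape(12, 4)

# np.cumsum without an axis flattens the matrix and returns a running total.
running_total = np.cumsum(toy_chroma)

# With array endpoints, np.linspace returns one column per endpoint pair:
# 50 evenly spaced samples from -c to +c for every running-total value c.
x = np.linspace(-running_total, running_total)
curves = np.sin(x)

print(running_total.shape)  # (48,)
print(x.shape)              # (50, 48)
```

Because plt.plot(x, np.sin(x)) draws one line per column, a full-length song produces a very large number of overlapping sine segments, which is why the plot can take a while to render.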

Decibel & Hertz:

import librosa
import librosa.display
import matplotlib.pyplot as plt
import numpy as np

y, sr = librosa.load("01-Switch.m4a")
hop_length = 512

# Onset strength envelope, aggregated across frequency bands with the median.
onset_env = librosa.onset.onset_strength(y=y, sr=sr, hop_length=hop_length,
                                         aggregate=np.median)

# Estimate tempo and beat frames from the onset envelope.
tempo, beats = librosa.beat.beat_track(onset_envelope=onset_env, sr=sr)
beat_times = librosa.frames_to_time(beats, sr=sr)

times = librosa.times_like(onset_env, sr=sr, hop_length=hop_length)
M = librosa.feature.melspectrogram(y=y, sr=sr, hop_length=hop_length)

# Top panel: mel spectrogram in dB. Bottom panel: onset strength and beats.
fig, ax = plt.subplots(nrows=2, sharex=True)
librosa.display.specshow(librosa.power_to_db(M, ref=np.max), sr=sr,
                         y_axis='mel', x_axis='time', hop_length=hop_length,
                         ax=ax[0])
ax[0].label_outer()
ax[0].set(title='Mel spectrogram')
ax[1].plot(times, librosa.util.normalize(onset_env),
           label='Onset strength')
ax[1].vlines(times[beats], 0, 1, alpha=0.5, color='r',
             linestyle='--', label='Beats')
ax[1].legend()
ax[1].label_outer()
plt.show()
Will Smith-Switch: Decibel & Hertz
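
The colour scale in the mel spectrogram comes from librosa.power_to_db(M, ref=np.max), which expresses each power value in decibels relative to the loudest value. The sketch below reimplements that idea in plain Python for a short list of numbers. The function name `power_to_db_sketch` and the sample values are assumptions for illustration; the defaults `amin=1e-10` and `top_db=80.0` mirror librosa's documented defaults, but this is not the library's code.

```python
import math

def power_to_db_sketch(power_values, amin=1e-10, top_db=80.0):
    """Convert power values to dB relative to the largest value."""
    # Reference level: the maximum power, like ref=np.max in the article.
    ref_db = 10.0 * math.log10(max(amin, max(power_values)))
    # amin guards against log10(0) for silent bins.
    db = [10.0 * math.log10(max(amin, p)) - ref_db for p in power_values]
    # Clip everything more than top_db below the peak.
    floor = max(db) - top_db
    return [max(value, floor) for value in db]

# Hypothetical power values: full scale, 1/10, 1/1000, and silence.
levels = power_to_db_sketch([1.0, 0.1, 0.001, 0.0])
print([round(value, 1) for value in levels])  # [0.0, -10.0, -30.0, -80.0]
```

Each factor of ten in power is a 10 dB step, so the brightest colour in the plot sits at 0 dB and quieter regions fall toward -80 dB.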

Cascada — Au Revoir

Sine Wave:

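import librosa
import numpy as np
import matplotlib.pyplot as plt

# Load the track; librosa resamples to 22,050 Hz mono by default.
y, sr = librosa.load("02-Au-Revoir.m4a")

# Power spectrogram: squared magnitude of the short-time Fourier transform.
D = librosa.stft(y)
s = np.abs(D) ** 2

# Chroma features (12 pitch classes per frame), flattened into a running total.
chroma = librosa.feature.chroma_stft(S=s, sr=sr)
chroma = np.cumsum(chroma)

# For each cumulative value c, sample sin(x) at 50 points on [-c, c].
x = np.linspace(-chroma, chroma)
plt.plot(x, np.sin(x))
plt.xlabel('Angle [rad]')
plt.ylabel('sin(x)')
plt.axis('tight')
plt.show()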
Cascada-Au Revoir: Sine Wave

Decibel & Hertz:

import librosa
import librosa.display
import matplotlib.pyplot as plt
import numpy as np

y, sr = librosa.load("02-Au-Revoir.m4a")
hop_length = 512

# Onset strength envelope, aggregated across frequency bands with the median.
onset_env = librosa.onset.onset_strength(y=y, sr=sr, hop_length=hop_length,
                                         aggregate=np.median)

# Estimate tempo and beat frames from the onset envelope.
tempo, beats = librosa.beat.beat_track(onset_envelope=onset_env, sr=sr)
beat_times = librosa.frames_to_time(beats, sr=sr)

times = librosa.times_like(onset_env, sr=sr, hop_length=hop_length)
M = librosa.feature.melspectrogram(y=y, sr=sr, hop_length=hop_length)

# Top panel: mel spectrogram in dB. Bottom panel: onset strength and beats.
fig, ax = plt.subplots(nrows=2, sharex=True)
librosa.display.specshow(librosa.power_to_db(M, ref=np.max), sr=sr,
                         y_axis='mel', x_axis='time', hop_length=hop_length,
                         ax=ax[0])
ax[0].label_outer()
ax[0].set(title='Mel spectrogram')
ax[1].plot(times, librosa.util.normalize(onset_env),
           label='Onset strength')
ax[1].vlines(times[beats], 0, 1, alpha=0.5, color='r',
             linestyle='--', label='Beats')
ax[1].legend()
ax[1].label_outer()
plt.show()
Cascada-Au Revoir: Decibel & Hertz
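
The beat positions returned by librosa.beat.beat_track are frame indices, and librosa.frames_to_time turns them into seconds using the hop length and sample rate. The sketch below shows that arithmetic in plain Python, plus a rough tempo check from the spacing between beats. The beat frames are made-up values, the helper names are hypothetical, and the tempo estimate is a simplified cross-check rather than the method librosa uses internally.

```python
def frames_to_seconds(frames, sr=22050, hop_length=512):
    """Each frame advances hop_length samples; divide by sr for seconds."""
    return [frame * hop_length / sr for frame in frames]

def tempo_from_beats(beat_times):
    """Rough tempo in beats per minute from the mean gap between beats."""
    gaps = [later - earlier
            for earlier, later in zip(beat_times, beat_times[1:])]
    mean_gap = sum(gaps) / len(gaps)
    return 60.0 / mean_gap

# Hypothetical beat frames, roughly 21.5 frames apart.
beat_frames = [10, 31, 53, 74, 96]
beat_seconds = frames_to_seconds(beat_frames)
bpm = tempo_from_beats(beat_seconds)

print([round(t, 2) for t in beat_seconds])
print(round(bpm))
```

Comparing this figure between the two songs gives a quick numeric companion to the dashed beat lines in the plots.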

Conclusion:

In the visuals above, each audio file emphasizes different features, both quantitatively and qualitatively. Quantitatively, many of the measurements are related: onset strength, for example, is derived from changes in the decibel-scaled spectrogram. Qualitatively, the colours in the spectrograms convey where energy is present and how it moves over time, something the paired-coordinate (x-y) line plots do not show.
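
To make the link between onset strength and decibel measurements concrete, here is a minimal sketch of the underlying idea: take a dB-scaled spectrogram, keep only the frame-to-frame increases, and aggregate across frequency bands. The function `onset_strength_sketch` and the toy matrix are assumptions for illustration; librosa's own onset_strength adds padding and other details, so this is a simplification rather than its implementation.

```python
import numpy as np

def onset_strength_sketch(S_db, aggregate=np.median):
    """Simplified onset envelope from a dB spectrogram (bands x frames)."""
    # Change in level from each frame to the next, per frequency band.
    diff = np.diff(S_db, axis=1)
    # Keep only increases in energy; decreases are not onsets.
    increases = np.maximum(0.0, diff)
    # Combine the bands into one value per frame transition.
    return aggregate(increases, axis=0)

# Hypothetical spectrogram: 3 bands, 5 frames, quiet at -60 dB,
# then every band jumps to -20 dB at frame 2 and stays there.
S_db = np.full((3, 5), -60.0)
S_db[:, 2:] = -20.0

envelope = onset_strength_sketch(S_db)
print(envelope)  # one value per frame transition; the peak marks the jump
```

The single peak lines up with the moment the level rises, which is the kind of event the onset strength curve in the lower panel of each plot highlights.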

Takeaways:

  • Audio files can be explored in Python with Jupyter Notebook without a traditional tabular data set.
  • Almost all common audio file types (extensions .mp3, .wav, .m4a) are compatible, depending on the audio backends installed. Refer to the librosa documentation for more details.
  • Audio measurements (decibels, dB, and frequencies in hertz, Hz) can be taken directly from raw audio files.
  • Sine waves can provide a visualized version of consistency.

Try it out for yourself, and discover what you couldn’t typically see before.
