---
slug: 2016/11/pca-audio-compression
title: PCA audio compression
date: 2016-11-01 12:00:00
authors: [bspeice]
tags: []
---
In which I apply Machine Learning techniques to Digital Signal Processing to astounding failure.
Towards a new (and pretty poor) compression scheme
--------------------------------------------------
I'm going to be working with some audio data for a while as I prepare for a term project this semester. I'll be working (with a partner) to design a system for separating voices from music. Given my total lack of experience with [Digital Signal Processing][1], I figured that now was as good a time as any to work on a couple of fun projects that would get me up to speed.
The first project I want to work on: Designing a new compression scheme for audio data.
A Brief Introduction to Audio Compression
-----------------------------------------
Uncompressed audio files (typically those ending in `.wav`) are huge. Like, 10.5 megabytes per minute huge. Storage is cheap these days, but that's still an incredible amount of data that we don't really need. Instead, we'd like to compress that data so that it doesn't take up so much space. There are broadly two ways to accomplish this:
1. Lossless compression - Formats like [FLAC][2], [ALAC][3], and [Monkey's Audio (.ape)][4] all go down this route. The idea is that when you compress and uncompress a file, you get back exactly what you started with.
2. Lossy compression - Formats like [MP3][5], [Ogg Vorbis][6], and [AAC (`.m4a`)][7] are far more popular, but make a crucial tradeoff: we can shrink the file much further during compression, but the decompressed file won't be identical to the original.
There is a fundamental tradeoff at stake: lossy compression sacrifices some of the fidelity of the resulting file to save on storage space. Most people (I personally believe it's everybody) can't hear the difference, so this is an acceptable tradeoff. You get files that take up a tenth of the space, and nobody can tell there's a difference in audio quality.
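For a sense of scale, here is the arithmetic behind those numbers. It assumes CD-quality audio (44.1 kHz, 16-bit samples, stereo), which is typical for ripped tracks, and uses a 128 kbit/s MP3 as the lossy point of comparison; that bitrate is an assumed, common setting, not something measured here.

```python
# Assumed format: CD-quality PCM (44.1 kHz, 16-bit samples, stereo)
sample_rate = 44_100  # samples per second, per channel
bytes_per_sample = 2  # 16 bits
channels = 2

bytes_per_second = sample_rate * bytes_per_sample * channels
megabytes_per_minute = bytes_per_second * 60 / 1_000_000
kbps_uncompressed = bytes_per_second * 8 / 1_000

# Assumed comparison point: a common 128 kbit/s MP3
mp3_kbps = 128
ratio = kbps_uncompressed / mp3_kbps

print(f"Uncompressed: {megabytes_per_minute:.2f} MB/min ({kbps_uncompressed:.1f} kbit/s)")
print(f"128 kbit/s MP3 is about {ratio:.1f}x smaller")
```

This works out to roughly 10.6 MB per minute, in line with the 10.5 MB estimate, and the 128 kbit/s file is about 11 times smaller, close to the "tenth of the space" figure.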
A PCA-based Compression Scheme
------------------------------
What I want to try out is a [PCA][8] approach to encoding audio. The PCA technique comes from Machine Learning, where it is used for a process called [Dimensionality Reduction][9]. Put simply, the idea is the same as lossy compression: if we can find a representation that captures the data well enough, we can save on space. There are a lot of theoretical concerns that lead me to believe this compression style will not end well, but I'm interested to try it nonetheless.
PCA works as follows: Given a dataset with a number of features, it finds "new features" (linear combinations of the original ones, called principal components), ordered by how much of the data's variance each one captures. Keeping only the first few of them gives the closest possible approximation of the original data (in a least-squares sense) for that number of features. This is comparable to a scheme like MP3: Given an original signal, I want to find a way of representing it that gets approximately close to what the original was. The difference is that PCA is designed for statistical data, not signal data. But we won't let that stop us.
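In matrix terms, one standard way to write this down is textbook PCA computed through the singular value decomposition; the notation is my own choice for this post. $X$ is the $n \times d$ data matrix with one example per row (in the scheme described next, each row is a 1024-sample chunk of audio, so $d = 1024$), $V_k$ holds the first $k$ columns of $V$ (the principal components), and $\hat{X}$ is the best rank-$k$ approximation of the centered data in the least-squares sense (the Eckart-Young theorem).

```latex
\begin{aligned}
\mu       &= \tfrac{1}{n} X^{\top} \mathbf{1}_n
          && \text{column means of } X \in \mathbb{R}^{n \times d} \\
\tilde{X} &= X - \mathbf{1}_n \mu^{\top} = U \Sigma V^{\top}
          && \text{center the data, then take the SVD} \\
Z         &= \tilde{X} V_k \in \mathbb{R}^{n \times k}
          && \text{scores: the } k \text{ new features for each row} \\
\hat{X}   &= Z V_k^{\top} + \mathbf{1}_n \mu^{\top}
          && \text{rank-}k\text{ reconstruction of } X
\end{aligned}
```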
The idea is as follows: Given a signal, reshape it into 1024 columns by however many rows are needed (zero-padding if necessary). Run the PCA algorithm, and do dimensionality reduction with a couple of different settings. The number of components I choose determines the quality: if I use all 1024 components, I will essentially get back the original signal. If I use fewer components, I start losing some of the data that was in the original file. This will give me an idea of whether it's possible to actually build an encoding scheme off of this, or whether I'm wasting my time.
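Whether this could ever work as *compression* depends on how much has to be stored. Here is a rough count, as a sketch under my own assumptions: an encoder would keep the per-frame scores (frames × k), the k components (k × 1024), and the 1024 column means. The helper `pca_storage` and the 60-second mono clip are hypothetical, and it counts numbers rather than bytes; the scores would be floats, which may well need more bits than the original 16-bit samples.

```python
def pca_storage(n_samples: int, n_components: int, frame_size: int = 1024) -> dict:
    """Hypothetical helper: count the values (not bytes) a framed-PCA encoding stores.

    Assumes the encoder keeps the per-frame scores, the principal components,
    and the per-column mean that PCA subtracts.
    """
    n_frames = (n_samples + frame_size - 1) // frame_size  # rows after zero-padding
    scores = n_frames * n_components  # k numbers per frame
    components = n_components * frame_size  # k basis vectors of length frame_size
    mean = frame_size  # column means subtracted before PCA
    total = scores + components + mean
    return {
        "frames": n_frames,
        "original": n_samples,
        "pca": total,
        "ratio": total / n_samples,
    }


# A hypothetical 60-second mono clip at 44.1 kHz
clip_samples = 44_100 * 60
for k in (16, 64, 256, 1024):
    cost = pca_storage(clip_samples, k)
    print(f"k={k:4d}: {cost['pca']:>9,} values vs {cost['original']:,} ({cost['ratio']:.2f}x)")
```

With all 1024 components the encoding holds more values than the original signal, so any savings have to come from keeping k small.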
Running the Algorithm
---------------------
The audio I will be using comes from the song [Tabulasa][10], by [Broke for Free][11]. I'll be loading the audio signal into Python and using [Scikit-Learn][12] to actually run the PCA algorithm.
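As a sketch of what that experiment might look like with Scikit-Learn's `PCA` class (`fit_transform` to get the reduced representation, `inverse_transform` to map it back), the code below frames a signal, keeps `k` components, and rebuilds it. To stay self-contained it runs on a synthetic 10-second signal (a 440 Hz tone plus noise) standing in for one channel of the real WAV data. `pca_roundtrip` is a hypothetical helper, and `svd_solver="full"` is my choice to get an exact rather than randomized decomposition. Scikit-Learn only allows `n_components` up to the smaller of the number of frames and the frame size.

```python
import numpy as np
from sklearn.decomposition import PCA


def pca_roundtrip(signal, n_components, frame_size=1024):
    """Hypothetical helper: frame a 1-D signal, keep n_components, rebuild it.

    A sketch of the experiment only, not a codec: nothing is quantized or saved.
    """
    signal = np.asarray(signal, dtype=np.float64)
    n_samples = signal.shape[0]
    n_frames = -(-n_samples // frame_size)  # ceiling division
    padded = np.pad(signal, (0, n_frames * frame_size - n_samples))  # zero-pad the tail
    frames = padded.reshape(n_frames, frame_size)  # one frame per row

    pca = PCA(n_components=n_components, svd_solver="full")
    scores = pca.fit_transform(frames)  # shape (n_frames, n_components)
    rebuilt = pca.inverse_transform(scores)  # back to shape (n_frames, frame_size)
    return rebuilt.reshape(-1)[:n_samples]  # flatten and drop the padding


# Synthetic stand-in for one channel: 10 s of a 440 Hz tone plus noise at 44.1 kHz
rng = np.random.default_rng(0)
t = np.arange(44_100 * 10) / 44_100
signal = np.sin(2 * np.pi * 440 * t) + 0.1 * rng.standard_normal(t.size)

errors = {}
for k in (8, 64, 256):
    approx = pca_roundtrip(signal, k)
    errors[k] = np.sqrt(np.mean((signal - approx) ** 2))  # root-mean-square error
    print(f"{k:4d} components: RMSE = {errors[k]:.4f}")
```

Passing `tabulasa[:, 0]` from the loading step instead of the synthetic signal would run the same round trip on the song itself.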
We first need to convert the FLAC file I have to a WAV:
[1]: https://en.wikipedia.org/wiki/Digital_signal_processing
[2]: https://en.wikipedia.org/wiki/FLAC
[3]: https://en.wikipedia.org/wiki/Apple_Lossless
[4]: https://en.wikipedia.org/wiki/Monkey%27s_Audio
[5]: https://en.wikipedia.org/wiki/MP3
[6]: https://en.wikipedia.org/wiki/Vorbis
[7]: https://en.wikipedia.org/wiki/Advanced_Audio_Coding
[8]: https://en.wikipedia.org/wiki/Principal_component_analysis
[9]: https://en.wikipedia.org/wiki/Dimensionality_reduction
[10]: https://brokeforfree.bandcamp.com/track/tabulasa
[11]: https://brokeforfree.bandcamp.com/album/xxvii
[12]: http://scikit-learn.org/stable/modules/generated/sklearn.decomposition.PCA.html#sklearn.decomposition.PCA
```python
# Convert the FLAC file to 16-bit PCM WAV ("!" runs a shell command from IPython/Jupyter)
!ffmpeg -hide_banner -loglevel panic -i "Broke For Free/XXVII/01 Tabulasa.flac" -c:a pcm_s16le "Tabulasa.wav"
```
Then, let's go ahead and load a small sample so you can hear what is going on.
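```python
from IPython.display import Audio
from scipy.io import wavfile

# samplerate is in samples per second; tabulasa has shape (n_samples, n_channels)
samplerate, tabulasa = wavfile.read('Tabulasa.wav')
start = samplerate * 14  # 14 seconds in
end = start + samplerate * 10  # 10 second duration
# Play the left channel (column 0) of the selected clip
Audio(data=tabulasa[start:end, 0], rate=samplerate)
```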
import wav1 from "./1.wav";