diff --git a/archives.html b/archives.html index 35c80d0..db4a65d 100644 --- a/archives.html +++ b/archives.html @@ -82,6 +82,8 @@
Bradlee Speice, Tue 01 November 2016, Blog
+ + + ++
I'm going to be working with some audio data for a while as I get prepared for a term project this semester. I'll be working (with a partner) to design a system for separating voices from music. Given my total lack of experience with Digital Signal Processing I figured that now was as good a time as ever to work on a couple of fun projects that would get me back up to speed.
+The first project I want to work on: Designing a new compression scheme for audio data.
+Audio files when uncompressed (files ending with .wav
) are huge. Like, 10.5 Megabytes per minute huge. Storage is cheap these days, but that's still an incredible amount of data that we don't really need. Instead, we'd like to compress that data so that it's not taking up so much space. There are broadly two ways to accomplish this:
Lossless compression - Formats like FLAC, ALAC, and Monkey's Audio (.ape) all go down this route. The idea is that when you compress and uncompress a file, you get exactly the same as what you started with.
+Lossy compression - Formats like MP3, Ogg, and AAC (.m4a
) are far more popular, but make a crucial tradeoff: We can reduce the file size even more during compression, but the decompressed file won't be the same.
There is a fundamental tradeoff at stake: Using lossy compression sacrifices some of the integrity of the resulting file to save on storage space. Most people (I personally believe it's everybody) can't hear the difference, so this is an acceptable tradeoff. You have files that take up a 10th of the space, and nobody can tell there's a difference in audio quality.
+What I want to try out is a PCA approach to encoding audio. The PCA technique comes from Machine Learning, where it is used for a process called Dimensionality Reduction. Put simply, the idea is the same as lossy compression: if we can find a way that represents the data well enough, we can save on space. There are a lot of theoretical concerns that lead me to believe this compression style will not end well, but I'm interested to try it nonetheless.
+PCA works as follows: Given a dataset with a number of features, I find a way to approximate those original features using some "new features" that are statistically as close as possible to the original ones. This is comparable to a scheme like MP3: Given an original signal, I want to find a way of representing it that gets approximately close to what the original was. The difference is that PCA is designed for statistical data, and not signal data. But we won't let that stop us.
+The idea is as follows: Given a signal, reshape it into 1024 columns by however many rows are needed (zero-padded if necessary). Run the PCA algorithm, and do dimensionality reduction with a couple different settings. The number of components I choose determines the quality: If I use 1024 components, I will essentially be using the original signal. If I use a smaller number of components, I start losing some of the data that was in the original file. This will give me an idea of whether it's possible to actually build an encoding scheme off of this, or whether I'm wasting my time.
+The audio I will be using comes from the song Tabulasa, by Broke for Free. I'll be loading in the audio signal to Python and using Scikit-Learn to actually run the PCA algorithm.
+We first need to convert the FLAC file I have to a WAV:
+ +!ffmpeg -hide_banner -loglevel panic -i "Broke For Free/XXVII/01 Tabulasa.flac" "Tabulasa.wav" -c wav
+
Then, let's go ahead and load a small sample so you can hear what is going on.
+ +from IPython.display import Audio
+from scipy.io import wavfile
+
+samplerate, tabulasa = wavfile.read('Tabulasa.wav')
+
+start = samplerate * 14 # 10 seconds in
+end = start + samplerate * 10 # 5 second duration
+Audio(data=tabulasa[start:end, 0], rate=samplerate)
+