<!doctype html><htmllang=endir=ltrclass="blog-wrapper blog-post-page plugin-blog plugin-id-default"data-has-hydrated=false><metacharset=UTF-8><metaname=generatorcontent="Docusaurus v3.6.1"><titledata-rh=true>PCA audio compression | The Old Speice Guy</title><metadata-rh=truename=viewportcontent="width=device-width,initial-scale=1.0"><metadata-rh=truename=twitter:cardcontent=summary_large_image><metadata-rh=trueproperty=og:urlcontent=https://speice.io/2016/11/pca-audio-compression><metadata-rh=trueproperty=og:localecontent=en><metadata-rh=truename=docusaurus_localecontent=en><metadata-rh=truename=docusaurus_tagcontent=default><metadata-rh=truename=docsearch:languagecontent=en><metadata-rh=truename=docsearch:docusaurus_tagcontent=default><metadata-rh=trueproperty=og:titlecontent="PCA audio compression | The Old Speice Guy"><metadata-rh=truename=descriptioncontent="In which I apply Machine Learning techniques to Digital Signal Processing to astounding failure."><metadata-rh=trueproperty=og:descriptioncontent="In which I apply Machine Learning techniques to Digital Signal Processing to astounding failure."><metadata-rh=trueproperty=og:typecontent=article><metadata-rh=trueproperty=article:published_timecontent=2016-11-01T12:00:00.000Z><linkdata-rh=truerel=iconhref=/img/favicon.ico><linkdata-rh=truerel=canonicalhref=https://speice.io/2016/11/pca-audio-compression><linkdata-rh=truerel=alternatehref=https://speice.io/2016/11/pca-audio-compressionhreflang=en><linkdata-rh=truerel=alternatehref=https://speice.io/2016/11/pca-audio-compressionhreflang=x-default><scriptdata-rh=truetype=application/ld+json>{"@context":"https://schema.org","@id":"https://speice.io/2016/11/pca-audio-compression","@type":"BlogPosting","author":{"@type":"Person","name":"Bradlee Speice"},"dateModified":"2024-11-06T03:32:56.000Z","datePublished":"2016-11-01T12:00:00.000Z","description":"In which I apply Machine Learning techniques to Digital Signal Processing to astounding failure.","headline":"PCA 
audio compression","isPartOf":{"@id":"https://speice.io/","@type":"Blog","name":"Blog"},"keywords":[],"mainEntityOfPage":"https://speice.io/2016/11/pca-audio-compression","name":"PCA audio compression","url":"https://speice.io/2016/11/pca-audio-compression"}</script><linkrel=alternatetype=application/rss+xmlhref=/rss.xmltitle="The Old Speice Guy RSS Feed"><linkrel=alternatetype=application/atom+xmlhref=/atom.xmltitle="The Old Speice Guy Atom Feed"><linkrel=stylesheethref=/katex/katex.min.css><linkrel=stylesheethref=/assets/css/styles.16c3428d.css><scriptsrc=/assets/js/runtime~main.29a27dcf.jsdefer></script><scriptsrc=/assets/js/main.d461af80.jsdefer></script><bodyclass=navigation-with-keyboard><script>!function(){vart,e=function(){try{returnnewURLSearchParams(window.location.search).get("docusaurus-theme")}catch(t){}}()||function(){try{returnwindow.localStorage.getItem("theme")}catch(t){}}();t=null!==e?e:"light",document.documentElement.setAttribute("data-theme",t)}(),function(){try{for(var[t,e]ofnewURLSearchParams(window.location.search).entries())if(t.startsWith("docusaurus-data-")){vara=t.replace("docusaurus-data-","data-");document.documentElement.setAttribute(a,e)}}catch(t){}}()</script><divid=__docusaurus><divrole=regionaria-label="Skip to main content"><aclass=skipToContent_fXgnhref=#__docusaurus_skipToContent_fallback>Skip to main content</a></div><navaria-label=Mainclass="navbar navbar--fixed-top"><divclass=navbar__inner><divclass=navbar__items><buttonaria-label="Toggle navigation bar"aria-expanded=falseclass="navbar__toggle clean-btn"type=button><svgwidth=30height=30viewBox="0 0 30 30"aria-hidden=true><pathstroke=currentColorstroke-linecap=roundstroke-miterlimit=10stroke-width=2d="M4 7h22M4 15h22M4 23h22"/></svg></button><aclass=navbar__brandhref=/><divclass=navbar__logo><imgsrc=/img/logo.svgalt="Sierpinski Gasket"class="themedComponent_mlkZ themedComponent--light_NVdE"><imgsrc=/img/logo-dark.svgalt="Sierpinski Gasket"class="themedCompon
<h2class="anchor anchorWithStickyNavbar_LWe7"id=towards-a-new-and-pretty-poor-compression-scheme>Towards a new (and pretty poor) compression scheme<ahref=#towards-a-new-and-pretty-poor-compression-schemeclass=hash-linkaria-label="Direct link to Towards a new (and pretty poor) compression scheme"title="Direct link to Towards a new (and pretty poor) compression scheme"></a></h2>
<p>I'm going to be working with some audio data for a while as I prepare for a term project this semester. I'll be working (with a partner) to design a system for separating voices from music. Given my total lack of experience with <ahref=https://en.wikipedia.org/wiki/Digital_signal_processingtarget=_blankrel="noopener noreferrer">Digital Signal Processing</a>, I figured that now was as good a time as any to work on a couple of fun projects that would get me up to speed.</p>
<p>The first project I want to work on: Designing a new compression scheme for audio data.</p>
<h2class="anchor anchorWithStickyNavbar_LWe7"id=a-brief-introduction-to-audio-compression>A Brief Introduction to Audio Compression<ahref=#a-brief-introduction-to-audio-compressionclass=hash-linkaria-label="Direct link to A Brief Introduction to Audio Compression"title="Direct link to A Brief Introduction to Audio Compression"></a></h2>
<p>Uncompressed audio files (typically ending with <code>.wav</code>) are huge. Like, 10.5 Megabytes per minute huge at CD quality. Storage is cheap these days, but that's still an incredible amount of data that we don't really need. Instead, we'd like to compress that data so that it doesn't take up so much space. There are broadly two ways to accomplish this:</p>
<ol>
<li>
<p>Lossless compression - Formats like <ahref=https://en.wikipedia.org/wiki/FLACtarget=_blankrel="noopener noreferrer">FLAC</a>, <ahref=https://en.wikipedia.org/wiki/Apple_Losslesstarget=_blankrel="noopener noreferrer">ALAC</a>, and <ahref=https://en.wikipedia.org/wiki/Monkey%27s_Audiotarget=_blankrel="noopener noreferrer">Monkey's Audio (.ape)</a> all go down this route. The idea is that when you compress and uncompress a file, you get exactly the same as what you started with.</p>
</li>
<li>
<p>Lossy compression - Formats like <ahref=https://en.wikipedia.org/wiki/MP3target=_blankrel="noopener noreferrer">MP3</a>, <ahref=https://en.wikipedia.org/wiki/Vorbistarget=_blankrel="noopener noreferrer">Ogg Vorbis</a>, and <ahref=https://en.wikipedia.org/wiki/Advanced_Audio_Codingtarget=_blankrel="noopener noreferrer">AAC (<code>.m4a</code>)</a> are far more popular, but make a crucial tradeoff: We can reduce the file size even more during compression, but the decompressed file won't be identical to the original.</p>
</li>
</ol>
<p>Put another way: lossy compression sacrifices some of the integrity of the resulting file to save on storage space. Most people (I personally believe it's everybody) can't hear the difference, so this is an acceptable tradeoff. You get files that take up a tenth of the space, and nobody can tell there's a difference in audio quality.</p>
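<p>As a quick sanity check on the 10.5 Megabytes per minute figure, here is the back-of-the-envelope arithmetic. The CD-quality parameters (44.1 kHz, 16-bit, stereo) are my assumption; the numbers change with other sample rates or bit depths.</p>

```python
# Back-of-the-envelope size of one minute of uncompressed audio.
# Assumed parameters (CD quality): 44.1 kHz, 16-bit samples, stereo.
SAMPLE_RATE_HZ = 44_100
BYTES_PER_SAMPLE = 2   # 16 bits
CHANNELS = 2           # stereo
SECONDS = 60

bytes_per_minute = SAMPLE_RATE_HZ * BYTES_PER_SAMPLE * CHANNELS * SECONDS
megabytes_per_minute = bytes_per_minute / 1_000_000

# A lossy file at roughly a tenth of the size, as mentioned above.
lossy_megabytes_per_minute = megabytes_per_minute / 10

print(f"{bytes_per_minute:,} bytes per minute uncompressed")
print(f"{megabytes_per_minute:.2f} MB per minute uncompressed")
print(f"{lossy_megabytes_per_minute:.2f} MB per minute at a tenth of the size")
```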
<h2class="anchor anchorWithStickyNavbar_LWe7"id=a-pca-based-compression-scheme>A PCA-based Compression Scheme<ahref=#a-pca-based-compression-schemeclass=hash-linkaria-label="Direct link to A PCA-based Compression Scheme"title="Direct link to A PCA-based Compression Scheme"></a></h2>
<p>What I want to try out is a <ahref=https://en.wikipedia.org/wiki/Principal_component_analysistarget=_blankrel="noopener noreferrer">PCA</a> approach to encoding audio. The PCA technique comes from Machine Learning, where it is used for a process called <ahref=https://en.wikipedia.org/wiki/Dimensionality_reductiontarget=_blankrel="noopener noreferrer">Dimensionality Reduction</a>. Put simply, the idea is the same as lossy compression: if we can find a way that represents the data well enough, we can save on space. There are a lot of theoretical concerns that lead me to believe this compression style will not end well, but I'm interested to try it nonetheless.</p>
<p>PCA works as follows: Given a dataset with a number of features, I find a way to approximate those original features using some "new features" that are statistically as close as possible to the original ones. This is comparable to a scheme like MP3: Given an original signal, I want to find a way of representing it that gets reasonably close to what the original was. The difference is that PCA is designed for statistical data, and not signal data. But we won't let that stop us.</p>
<p>The idea is as follows: Given a signal, reshape it into 1024 columns by however many rows are needed (zero-padded if necessary). Run the PCA algorithm, and do dimensionality reduction with a couple of different settings. The number of components I choose determines the quality: If I use 1024 components, I will essentially be using the original signal. If I use a smaller number of components, I start losing some of the data that was in the original file. This will give me an idea of whether it's possible to actually build an encoding scheme off of this, or whether I'm wasting my time.</p>
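<p>Here is a minimal sketch of that reshape-then-reduce idea. To keep it self-contained it uses plain NumPy and an SVD rather than scikit-learn's <code>PCA</code> class, a synthetic test tone instead of real audio, and a block size of 64 instead of 1024 so it runs quickly. The function names (<code>to_blocks</code>, <code>pca_compress</code>, <code>pca_reconstruct</code>) are made up for illustration.</p>

```python
import numpy as np

def to_blocks(signal, block_size=1024):
    """Zero-pad a 1-D signal and reshape it to (n_blocks, block_size)."""
    n_pad = (-len(signal)) % block_size
    padded = np.concatenate([signal, np.zeros(n_pad, dtype=signal.dtype)])
    return padded.reshape(-1, block_size)

def pca_compress(blocks, n_components):
    """Project the blocks onto their top principal components."""
    mean = blocks.mean(axis=0)
    centered = blocks - mean
    # Rows of vt are principal directions, sorted by variance explained.
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    components = vt[:n_components]
    reduced = centered @ components.T
    return reduced, components, mean

def pca_reconstruct(reduced, components, mean, n_samples):
    """Undo the projection and drop the zero padding."""
    blocks = reduced @ components + mean
    return blocks.reshape(-1)[:n_samples]

# Synthetic stand-in for audio: two tones plus a little noise (assumed data).
rng = np.random.default_rng(0)
t = np.arange(64 * 200 + 17) / 8000.0
signal = (np.sin(2 * np.pi * 110 * t)
          + 0.3 * np.sin(2 * np.pi * 1760 * t)
          + 0.05 * rng.standard_normal(t.size))

blocks = to_blocks(signal, block_size=64)
errors = {}
for k in (64, 16, 4):
    reduced, components, mean = pca_compress(blocks, k)
    restored = pca_reconstruct(reduced, components, mean, signal.size)
    errors[k] = np.sqrt(np.mean((signal - restored) ** 2))
    print(f"{k:3d} components: RMS error {errors[k]:.6f}")
```

<p>With as many components as columns, the reconstruction error should be down at floating-point noise; with fewer components it grows, which is the quality knob described above.</p>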
<h2class="anchor anchorWithStickyNavbar_LWe7"id=running-the-algorithm>Running the Algorithm<ahref=#running-the-algorithmclass=hash-linkaria-label="Direct link to Running the Algorithm"title="Direct link to Running the Algorithm"></a></h2>
<p>The audio I will be using comes from the song <ahref=https://brokeforfree.bandcamp.com/track/tabulasatarget=_blankrel="noopener noreferrer">Tabulasa</a>, by <ahref=https://brokeforfree.bandcamp.com/album/xxviitarget=_blankrel="noopener noreferrer">Broke for Free</a>. I'll be loading the audio signal into Python and using <ahref=http://scikit-learn.org/stable/modules/generated/sklearn.decomposition.PCA.html#sklearn.decomposition.PCAtarget=_blankrel="noopener noreferrer">Scikit-Learn</a> to actually run the PCA algorithm.</p>
<p>We first need to convert the FLAC file I have to a WAV:</p>
<p>Now that we've got our functions set up, let's try actually running something. First, we'll use <code>n_components == block_size</code>, which implies that we should end up with the same signal we started with.</p>
<p>OK, that does indeed sound like what we originally had. Let's drastically cut down the number of components we're doing this with as a sanity check: the audio quality should become incredibly poor.</p>
<p>As expected, our reconstructed audio does sound incredibly poor! But there's something else very interesting going on here under the hood. Did you notice that the bassline comes across very well, but that there's no midrange or treble? The drums are almost entirely gone.</p>
<h2class="anchor anchorWithStickyNavbar_LWe7"id=drop-the-treble><ahref="https://youtu.be/Ua0KpfJsxKo?t=1m17s"target=_blankrel="noopener noreferrer">Drop the (Treble)</a><ahref=#drop-the-trebleclass=hash-linkaria-label="Direct link to drop-the-treble"title="Direct link to drop-the-treble"></a></h2>
<p>It will help to understand PCA more fully when trying to read this part, but I'll do my best to break it down. PCA tries to find a way to best represent the dataset using "components." Think of each "component" as containing some of the information you need in order to reconstruct the full audio. For example, you might have a "low frequency" component that contains all the information you need in order to hear the bassline. There might be other components that explain the high frequency things like singers, or melodies, that you also need.</p>
<p>What makes PCA interesting is that it attempts to find the "most important" components in explaining the signal. In a signal processing world, this means that PCA is trying to find the signal amongst the noise in your data. In our case, this means that PCA, when forced to work with a small number of components, will chuck out the noisy components first. It's doing its best to reconstruct the signal, but it has to make sacrifices somewhere.</p>
<p>So I've mentioned that PCA identifies the "noisy" components in our dataset. In this case, that is equivalent to saying that PCA removes the "high frequency" components. PCA ranks components by how much of the signal's variance they explain, and in this track most of that energy appears to sit in low-frequency content like the bassline, which is easy to represent. It's far more difficult to represent a high-frequency signal because it's changing all the time. When you force PCA to make a tradeoff by using a small number of components, the best it can hope to do is replicate the low-frequency sections and skip the high-frequency things.</p>
<p>This is a very interesting insight, and it also has echoes (pardon the pun) of how humans understand music in general. Other encoding schemes (like MP3, etc.) typically chop off a lot of the high-frequency range as well. There is typically a lot of high-frequency noise in audio that is nearly impossible to hear, so it's easy to remove it without anyone noticing. PCA ends up doing something similar, and while that certainly wasn't the intention, it is an interesting effect.</p>
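<p>The low-pass behavior can be reproduced on a toy signal. The sketch below is not the code used for the song: it builds a loud 110 Hz "bassline" plus a quieter 3050 Hz "treble" tone (frequencies, amplitudes, and block size are all values I picked for illustration), keeps only two principal components via a NumPy SVD, and then checks how much of each tone survives.</p>

```python
import numpy as np

SAMPLE_RATE = 8000
BLOCK_SIZE = 64
N_BLOCKS = 500
BASS_HZ, TREBLE_HZ = 110, 3050   # assumed test tones

t = np.arange(BLOCK_SIZE * N_BLOCKS) / SAMPLE_RATE
signal = (1.0 * np.sin(2 * np.pi * BASS_HZ * t)        # loud bass
          + 0.2 * np.sin(2 * np.pi * TREBLE_HZ * t))   # quiet treble

# One block per row, then PCA by way of an SVD of the centered blocks.
blocks = signal.reshape(N_BLOCKS, BLOCK_SIZE)
mean = blocks.mean(axis=0)
centered = blocks - mean
_, _, vt = np.linalg.svd(centered, full_matrices=False)

# Keep only the two components that explain the most variance.
components = vt[:2]
restored = ((centered @ components.T) @ components + mean).reshape(-1)

def amplitude_at(x, freq_hz):
    """Magnitude of the FFT bin closest to freq_hz."""
    spectrum = np.abs(np.fft.rfft(x))
    return spectrum[int(round(freq_hz * x.size / SAMPLE_RATE))]

bass_kept = amplitude_at(restored, BASS_HZ) / amplitude_at(signal, BASS_HZ)
treble_kept = amplitude_at(restored, TREBLE_HZ) / amplitude_at(signal, TREBLE_HZ)

print(f"bass retained:   {bass_kept:.3f}")
print(f"treble retained: {treble_kept:.3f}")
```

<p>The bass carries most of the variance, so the first two components are spent on it and the treble mostly disappears. If you flip the amplitudes so the treble is louder, the same code should keep the treble instead, which suggests the "low-pass" effect is really a "keep whatever is loudest" effect.</p>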
<h2class="anchor anchorWithStickyNavbar_LWe7"id=a-more-realistic-example>A More Realistic Example<ahref=#a-more-realistic-exampleclass=hash-linkaria-label="Direct link to A More Realistic Example"title="Direct link to A More Realistic Example"></a></h2>
<p>So we've seen the edge cases so far: Using a large number of components results in audio very close to the original, and using a small number of components acts as a low-pass filter. How about we develop something that sounds "good enough" in practice, that we can use as a benchmark for size? We'll use ourselves as judges of audio quality, and build another function to help us estimate how much space we need to store everything in.</p>
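<p>A rough way to estimate the storage needed is to count the values the decoder has to receive: the reduced data, the component matrix, and the per-column mean. The helper below is a hypothetical sketch (<code>pca_storage_ratio</code> is a name I made up), and it assumes every stored value takes as many bytes as one original sample, which is optimistic since PCA produces floating-point numbers.</p>

```python
import math

def pca_storage_ratio(n_samples, block_size, n_components):
    """Original sample count divided by the number of values PCA must store.

    Assumes each stored value costs the same as one original sample.
    """
    n_blocks = math.ceil(n_samples / block_size)
    reduced_values = n_blocks * n_components        # transformed data
    component_values = n_components * block_size    # reconstruction matrix
    mean_values = block_size                        # per-column mean
    stored = reduced_values + component_values + mean_values
    return n_samples / stored

# Four minutes of mono audio at 44.1 kHz (an assumed example length).
n_samples = 4 * 60 * 44_100

for block_size in (256, 1024, 4096):
    for n_components in (16, 64, block_size):
        ratio = pca_storage_ratio(n_samples, block_size, n_components)
        print(f"block_size={block_size:5d} n_components={n_components:5d} "
              f"ratio={ratio:6.2f}")
```

<p>Note that with <code>n_components == block_size</code> the ratio drops below 1: you store everything you started with, plus the matrix.</p>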
<p>As we can see, there are a couple of instances where we do nearly 20 times better on storage space than the uncompressed file. Let's hear what that sounds like:</p>
<p>And just out of curiosity, we can try something that has the same ratio of components to block size. This should be close to an apples-to-apples comparison.</p>
<p>The smaller block size definitely has better high-end response, but I personally think the larger block size sounds better overall.</p>
<h2class="anchor anchorWithStickyNavbar_LWe7"id=conclusions>Conclusions<ahref=#conclusionsclass=hash-linkaria-label="Direct link to Conclusions"title="Direct link to Conclusions"></a></h2>
<p>So, what do I think about audio compression using PCA?</p>
<p>Strangely enough, it actually works pretty well relative to what I expected. That said, it's a terrible idea in general.</p>
<p>First off, you don't really save any space. The component matrix needed to actually run the PCA algorithm takes up a lot of space on its own, so it's very difficult to save space without sacrificing a huge amount of audio quality. And even then, codecs like AAC sound very nice even at bitrates that this PCA method could only dream of.</p>
<p>Second, there's the issue of audio streaming. PCA relies on two pieces: the data stream, and a matrix used to reconstruct the original signal. While it is easy to stream the data, you can't stream that matrix. And even if you divided the stream up into small blocks to give you a small matrix, you must guarantee that the matrix arrives; if you don't have that matrix, the data stream will make no sense whatsoever.</p>