<!doctype html><html lang=en dir=ltr class="blog-wrapper blog-post-page plugin-blog plugin-id-default" data-has-hydrated=false><meta charset=UTF-8><meta name=generator content="Docusaurus v3.6.1"><title data-rh=true>PCA audio compression | The Old Speice Guy</title><meta data-rh=true name=viewport content="width=device-width,initial-scale=1.0"><meta data-rh=true name=twitter:card content=summary_large_image><meta data-rh=true property=og:url content=https://speice.io/2016/11/pca-audio-compression><meta data-rh=true property=og:locale content=en><meta data-rh=true name=docusaurus_locale content=en><meta data-rh=true name=docusaurus_tag content=default><meta data-rh=true name=docsearch:language content=en><meta data-rh=true name=docsearch:docusaurus_tag content=default><meta data-rh=true property=og:title content="PCA audio compression | The Old Speice Guy"><meta data-rh=true name=description content="In which I apply Machine Learning techniques to Digital Signal Processing to astounding failure."><meta data-rh=true property=og:description content="In which I apply Machine Learning techniques to Digital Signal Processing to astounding failure."><meta data-rh=true property=og:type content=article><meta data-rh=true property=article:published_time content=2016-11-01T12:00:00.000Z><link data-rh=true rel=icon href=/img/favicon.ico><link data-rh=true rel=canonical href=https://speice.io/2016/11/pca-audio-compression><link data-rh=true rel=alternate href=https://speice.io/2016/11/pca-audio-compression hreflang=en><link data-rh=true rel=alternate href=https://speice.io/2016/11/pca-audio-compression hreflang=x-default><script data-rh=true type=application/ld+json>{"@context":"https://schema.org","@id":"https://speice.io/2016/11/pca-audio-compression","@type":"BlogPosting","author":{"@type":"Person","name":"Bradlee Speice"},"dateModified":"2024-11-06T03:32:56.000Z","datePublished":"2016-11-01T12:00:00.000Z","description":"In which I apply Machine Learning techniques to 
Digital Signal Processing to astounding failure.","headline":"PCA audio compression","isPartOf":{"@id":"https://speice.io/","@type":"Blog","name":"Blog"},"keywords":[],"mainEntityOfPage":"https://speice.io/2016/11/pca-audio-compression","name":"PCA audio compression","url":"https://speice.io/2016/11/pca-audio-compression"}</script><link rel=alternate type=application/rss+xml href=/rss.xml title="The Old Speice Guy RSS Feed"><link rel=alternate type=application/atom+xml href=/atom.xml title="The Old Speice Guy Atom Feed"><link rel=stylesheet href=/katex/katex.min.css><link rel=stylesheet href=/assets/css/styles.16c3428d.css><script src=/assets/js/runtime~main.29a27dcf.js defer></script><script src=/assets/js/main.d461af80.js defer></script><body class=navigation-with-keyboard><script>!function(){var t,e=function(){try{return new URLSearchParams(window.location.search).get("docusaurus-theme")}catch(t){}}()||function(){try{return window.localStorage.getItem("theme")}catch(t){}}();t=null!==e?e:"light",document.documentElement.setAttribute("data-theme",t)}(),function(){try{for(var[t,e]of new URLSearchParams(window.location.search).entries())if(t.startsWith("docusaurus-data-")){var a=t.replace("docusaurus-data-","data-");document.documentElement.setAttribute(a,e)}}catch(t){}}()</script><div id=__docusaurus><div role=region aria-label="Skip to main content"><a class=skipToContent_fXgn href=#__docusaurus_skipToContent_fallback>Skip to main content</a></div><nav aria-label=Main class="navbar navbar--fixed-top"><div class=navbar__inner><div class=navbar__items><button aria-label="Toggle navigation bar" aria-expanded=false class="navbar__toggle clean-btn" type=button><svg width=30 height=30 viewBox="0 0 30 30" aria-hidden=true><path stroke=currentColor stroke-linecap=round stroke-miterlimit=10 stroke-width=2 d="M4 7h22M4 15h22M4 23h22"/></svg></button><a class=navbar__brand href=/><div class=navbar__logo><img src=/img/logo.svg alt="Sierpinski Gasket" 
class="themedComponent_mlkZ themedComponent--light_NVdE"><img src=/img/logo-dark.svg alt="Sierpinski Gasket" class="themedCompon
<h2 class="anchor anchorWithStickyNavbar_LWe7" id=towards-a-new-and-pretty-poor-compression-scheme>Towards a new (and pretty poor) compression scheme<a href=#towards-a-new-and-pretty-poor-compression-scheme class=hash-link aria-label="Direct link to Towards a new (and pretty poor) compression scheme" title="Direct link to Towards a new (and pretty poor) compression scheme"></a></h2>
<p>I'm going to be working with some audio data for a while as I get prepared for a term project this semester. I'll be working (with a partner) to design a system for separating voices from music. Given my total lack of experience with <a href=https://en.wikipedia.org/wiki/Digital_signal_processing target=_blank rel="noopener noreferrer">Digital Signal Processing</a>, I figured that now was as good a time as any to work on a couple of fun projects that would get me up to speed.</p>
<p>The first project I want to work on: Designing a new compression scheme for audio data.</p>
<h2 class="anchor anchorWithStickyNavbar_LWe7" id=a-brief-introduction-to-audio-compression>A Brief Introduction to Audio Compression<a href=#a-brief-introduction-to-audio-compression class=hash-link aria-label="Direct link to A Brief Introduction to Audio Compression" title="Direct link to A Brief Introduction to Audio Compression"></a></h2>
<p>Audio files when uncompressed (files ending with <code>.wav</code>) are huge. Like, 10.5 Megabytes per minute huge. Storage is cheap these days, but that's still an incredible amount of data that we don't really need. Instead, we'd like to compress that data so that it's not taking up so much space. There are broadly two ways to accomplish this:</p>
<ol>
<li>
<p>Lossless compression - Formats like <a href=https://en.wikipedia.org/wiki/FLAC target=_blank rel="noopener noreferrer">FLAC</a>, <a href=https://en.wikipedia.org/wiki/Apple_Lossless target=_blank rel="noopener noreferrer">ALAC</a>, and <a href=https://en.wikipedia.org/wiki/Monkey%27s_Audio target=_blank rel="noopener noreferrer">Monkey's Audio (.ape)</a> all go down this route. The idea is that when you compress and uncompress a file, you get exactly the same as what you started with.</p>
</li>
<li>
<p>Lossy compression - Formats like <a href=https://en.wikipedia.org/wiki/MP3 target=_blank rel="noopener noreferrer">MP3</a>, <a href=https://en.wikipedia.org/wiki/Vorbis target=_blank rel="noopener noreferrer">Ogg Vorbis</a>, and <a href=https://en.wikipedia.org/wiki/Advanced_Audio_Coding target=_blank rel="noopener noreferrer">AAC (<code>.m4a</code>)</a> are far more popular, but make a crucial tradeoff: We can reduce the file size even more during compression, but the decompressed file won't be identical to the original.</p>
</li>
</ol>
<p>There is a fundamental tradeoff at stake: Using lossy compression sacrifices some of the integrity of the resulting file to save on storage space. Most people (I personally believe it's everybody) can't hear the difference, so this is an acceptable tradeoff. You have files that take up a 10<sup>th</sup> of the space, and nobody can tell there's a difference in audio quality.</p>
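<p>As a rough check on the "10.5 Megabytes per minute" and "10<sup>th</sup> of the space" figures, here is a back-of-the-envelope sketch. The CD-quality parameters (44.1 kHz, 16-bit, stereo) and the 128 kbit/s MP3 bitrate are assumptions for illustration, and both function names are made up for this example.</p>

```python
def wav_bytes_per_minute(sample_rate=44_100, bits_per_sample=16, channels=2):
    """Bytes in one minute of uncompressed PCM audio (file header ignored)."""
    bytes_per_sample = bits_per_sample // 8
    return sample_rate * bytes_per_sample * channels * 60


def mp3_bytes_per_minute(bitrate_bits_per_second=128_000):
    """Bytes in one minute of a constant-bitrate stream (assumed bitrate)."""
    return bitrate_bits_per_second // 8 * 60


wav = wav_bytes_per_minute()
mp3 = mp3_bytes_per_minute()

print(wav)        # 10584000 bytes, a bit over 10.5 million
print(mp3)        # 960000 bytes
print(wav / mp3)  # ratio of uncompressed size to the assumed MP3 size
```

<p>Under those assumptions the lossy file comes out at roughly a tenth of the uncompressed size, which lines up with the estimate above.</p>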
<h2 class="anchor anchorWithStickyNavbar_LWe7" id=a-pca-based-compression-scheme>A PCA-based Compression Scheme<a href=#a-pca-based-compression-scheme class=hash-link aria-label="Direct link to A PCA-based Compression Scheme" title="Direct link to A PCA-based Compression Scheme"></a></h2>
<p>What I want to try out is a <a href=https://en.wikipedia.org/wiki/Principal_component_analysis target=_blank rel="noopener noreferrer">PCA</a> approach to encoding audio. The PCA technique comes from Machine Learning, where it is used for a process called <a href=https://en.wikipedia.org/wiki/Dimensionality_reduction target=_blank rel="noopener noreferrer">Dimensionality Reduction</a>. Put simply, the idea is the same as lossy compression: if we can find a way that represents the data well enough, we can save on space. There are a lot of theoretical concerns that lead me to believe this compression style will not end well, but I'm interested to try it nonetheless.</p>
<p>PCA works as follows: Given a dataset with a number of features, I find a way to approximate those original features using a smaller set of "new features" that stay statistically as close as possible to the original ones. This is comparable to a scheme like MP3: Given an original signal, I want to find a way of representing it that gets close to what the original was. The difference is that PCA is designed for statistical data, and not signal data. But we won't let that stop us.</p>
<p>The idea is as follows: Given a signal, reshape it into 1024 columns by however many rows are needed (zero-padded if necessary). Run the PCA algorithm, and do dimensionality reduction with a couple different settings. The number of components I choose determines the quality: If I use 1024 components, I will essentially be using the original signal. If I use a smaller number of components, I start losing some of the data that was in the original file. This will give me an idea of whether it's possible to actually build an encoding scheme off of this, or whether I'm wasting my time.</p>
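<p>The reshape-then-reduce idea can be sketched on a toy signal before touching any real audio. This is only a minimal sketch under a few assumptions: <code>block_pca_roundtrip</code> is a hypothetical helper, the PCA step is done with a plain NumPy SVD rather than Scikit-Learn, the block size is 16 instead of 1024 to keep it quick, and the "signal" is just random noise.</p>

```python
import numpy as np


def block_pca_roundtrip(signal, n_components, block_size):
    """Compress and immediately reconstruct a 1-D signal with block-wise PCA.

    Hypothetical helper for illustration; PCA is done with a plain SVD here.
    """
    signal = np.asarray(signal, dtype=np.float64)

    # Zero-pad so the length is a multiple of block_size, then reshape into
    # one row per block and one column per position inside the block.
    hanging = (-len(signal)) % block_size
    blocks = np.pad(signal, (0, hanging)).reshape(-1, block_size)

    # PCA: center the columns, then take the top right-singular vectors.
    mean = blocks.mean(axis=0)
    _, _, vt = np.linalg.svd(blocks - mean, full_matrices=False)
    components = vt[:n_components]

    # Project onto the kept components, then map back to sample space.
    transformed = (blocks - mean) @ components.T
    reconstructed = transformed @ components + mean
    return reconstructed.reshape(-1)[:len(signal)]


rng = np.random.default_rng(0)
demo = rng.standard_normal(1000)  # stand-in for real audio samples

full = block_pca_roundtrip(demo, n_components=16, block_size=16)
lossy = block_pca_roundtrip(demo, n_components=4, block_size=16)

print("max error, all 16 components:", np.abs(full - demo).max())
print("max error, 4 components:     ", np.abs(lossy - demo).max())
```

<p>Keeping every component should give back the input up to floating point error, while keeping only a few should leave a visible reconstruction error. That is the quality knob described above.</p>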
<h2 class="anchor anchorWithStickyNavbar_LWe7" id=running-the-algorithm>Running the Algorithm<a href=#running-the-algorithm class=hash-link aria-label="Direct link to Running the Algorithm" title="Direct link to Running the Algorithm"></a></h2>
<p>The audio I will be using comes from the song <a href=https://brokeforfree.bandcamp.com/track/tabulasa target=_blank rel="noopener noreferrer">Tabulasa</a>, by <a href=https://brokeforfree.bandcamp.com/album/xxvii target=_blank rel="noopener noreferrer">Broke for Free</a>. I'll be loading the audio signal into Python and using <a href=http://scikit-learn.org/stable/modules/generated/sklearn.decomposition.PCA.html#sklearn.decomposition.PCA target=_blank rel="noopener noreferrer">Scikit-Learn</a> to actually run the PCA algorithm.</p>
<p>We first need to convert the FLAC file I have to a WAV:</p>
<div class="language-python codeBlockContainer_Ckt0 theme-code-block" style="--prism-background-color:hsl(230, 1%, 98%);--prism-color:hsl(230, 8%, 24%)"><div class=codeBlockContent_biex><pre tabindex=0 class="prism-code language-python codeBlock_bY9V thin-scrollbar" style="background-color:hsl(230, 1%, 98%);color:hsl(230, 8%, 24%)"><code class=codeBlockLines_e6Vv><span class=token-line style="color:hsl(230, 8%, 24%)"><span class="token plain">!ffmpeg </span><span class="token operator" style="color:hsl(221, 87%, 60%)">-</span><span class="token plain">hide_banner </span><span class="token operator" style="color:hsl(221, 87%, 60%)">-</span><span class="token plain">loglevel panic </span><span class="token operator" style="color:hsl(221, 87%, 60%)">-</span><span class="token plain">i </span><span class="token string" style="color:hsl(119, 34%, 47%)">"Broke For Free/XXVII/01 Tabulasa.flac"</span><span class="token plain"> </span><span class="token string" style="color:hsl(119, 34%, 47%)">"Tabulasa.wav"</span><br></span></code></pre><div class=buttonGroup__atx><button type=button aria-label="Copy code to clipboard" title=Copy class=clean-btn><span class=copyButtonIcons_eSgA aria-hidden=true><svg viewBox="0 0 24 24" class=copyButtonIcon_y97N><path fill=currentColor d="M19,21H8V7H19M19,5H8A2,2 0 0,0 6,7V21A2,2 0 0,0 8,23H19A2,2 0 0,0 21,21V7A2,2 0 0,0 19,5M16,1H4A2,2 0 0,0 2,3V17H4V3H16V1Z"/></svg><svg viewBox="0 0 24 24" class=copyButtonSuccessIcon_LjdS><path fill=currentColor d=M21,7L9,19L3.5,13.5L4.91,12.09L9,16.17L19.59,5.59L21,7Z /></svg></span></button></div></div></div>
<p>Then, let's go ahead and load a small sample so you can hear what is going on.</p>
<div class="language-python codeBlockContainer_Ckt0 theme-code-block" style="--prism-background-color:hsl(230, 1%, 98%);--prism-color:hsl(230, 8%, 24%)"><div class=codeBlockContent_biex><pre tabindex=0 class="prism-code language-python codeBlock_bY9V thin-scrollbar" style="background-color:hsl(230, 1%, 98%);color:hsl(230, 8%, 24%)"><code class=codeBlockLines_e6Vv><span class=token-line style="color:hsl(230, 8%, 24%)"><span class="token keyword" style="color:hsl(301, 63%, 40%)">from</span><span class="token plain"> IPython</span><span class="token punctuation" style="color:hsl(119, 34%, 47%)">.</span><span class="token plain">display </span><span class="token keyword" style="color:hsl(301, 63%, 40%)">import</span><span class="token plain"> Audio</span><br></span><span class=token-line style="color:hsl(230, 8%, 24%)"><span class="token plain"></span><span class="token keyword" style="color:hsl(301, 63%, 40%)">from</span><span class="token plain"> scipy</span><span class="token punctuation" style="color:hsl(119, 34%, 47%)">.</span><span class="token plain">io </span><span class="token keyword" style="color:hsl(301, 63%, 40%)">import</span><span class="token plain"> wavfile</span><br></span><span class=token-line style="color:hsl(230, 8%, 24%)"><span class="token plain" style=display:inline-block></span><br></span><span class=token-line style="color:hsl(230, 8%, 24%)"><span class="token plain">samplerate</span><span class="token punctuation" style="color:hsl(119, 34%, 47%)">,</span><span class="token plain"> tabulasa </span><span class="token operator" style="color:hsl(221, 87%, 60%)">=</span><span class="token plain"> wavfile</span><span class="token punctuation" style="color:hsl(119, 34%, 47%)">.</span><span class="token plain">read</span><span class="token punctuation" style="color:hsl(119, 34%, 47%)">(</span><span class="token string" style="color:hsl(119, 34%, 47%)">'Tabulasa.wav'</span><span class="token punctuation" style="color:hsl(119, 34%, 47%)">)</span><span 
class="token plain"></span><br></span><span class=token-line style="color:hsl(230, 8%, 24%)"><span class="token plain" style=display:inline-block></span><br></span><span class=token-line style="color:hsl(230, 8%, 24%)"><span class="token plain">start </span><span class="token operator" style="color:hsl(221, 87%, 60%)">=</span><span class="token plain"> samplerate </span><span class="token operator" style="color:hsl(221, 87%, 60%)">*</span><span class="token plain"> </span><span class="token number" style="color:hsl(35, 99%, 36%)">14</span><span class="token plain"> </span><span class="token comment" style="color:hsl(230, 4%, 64%)"># 14 seconds in</span><span class="token plain"></span><br></span><span class=token-line style="color:hsl(230, 8%, 24%)"><span class="token plain">end </span><span class="token operator" style="color:hsl(221, 87%, 60%)">=</span><span class="token plain"> start </span><span class="token operator" style="color:hsl(221, 87%, 60%)">+</span><span class="token plain"> samplerate </span><span class="token operator" style="color:hsl(221, 87%, 60%)">*</span><span class="token plain"> </span><span class="token number" style="color:hsl(35, 99%, 36%)">10</span><span class="token plain"> </span><span class="token comment" style="color:hsl(230, 4%, 64%)"># 10 second duration</span><span class="token plain"></span><br></span><span class=token-line style="color:hsl(230, 8%, 24%)"><span class="token plain">Audio</span><span class="token punctuation" style="color:hsl(119, 34%, 47%)">(</span><span class="token plain">data</span><span class="token operator" style="color:hsl(221, 87%, 60%)">=</span><span class="token plain">tabulasa</span><span class="token punctuation" style="color:hsl(119, 34%, 47%)">[</span><span class="token plain">start</span><span class="token punctuation" style="color:hsl(119, 34%, 47%)">:</span><span class="token plain">end</span><span class="token punctuation" style="color:hsl(119, 34%, 47%)">,</span><span class="token plain"> 
</span><span class="token number" style="color:hsl(35, 99%, 36%)">0</span><span class="token punctuatio
<!-- -->
<audio controls src=/assets/medias/1-bc356a416dae6236d2e366a42bee2cd3.wav></audio>
<p>Next, we'll define the code we will be using to do PCA. It's very short, as the PCA algorithm is very simple.</p>
<div class="language-python codeBlockContainer_Ckt0 theme-code-block" style="--prism-background-color:hsl(230, 1%, 98%);--prism-color:hsl(230, 8%, 24%)"><div class=codeBlockContent_biex><pre tabindex=0 class="prism-code language-python codeBlock_bY9V thin-scrollbar" style="background-color:hsl(230, 1%, 98%);color:hsl(230, 8%, 24%)"><code class=codeBlockLines_e6Vv><span class=token-line style="color:hsl(230, 8%, 24%)"><span class="token keyword" style="color:hsl(301, 63%, 40%)">from</span><span class="token plain"> sklearn</span><span class="token punctuation" style="color:hsl(119, 34%, 47%)">.</span><span class="token plain">decomposition </span><span class="token keyword" style="color:hsl(301, 63%, 40%)">import</span><span class="token plain"> PCA</span><br></span><span class=token-line style="color:hsl(230, 8%, 24%)"><span class="token plain"></span><span class="token keyword" style="color:hsl(301, 63%, 40%)">import</span><span class="token plain"> numpy </span><span class="token keyword" style="color:hsl(301, 63%, 40%)">as</span><span class="token plain"> np</span><br></span><span class=token-line style="color:hsl(230, 8%, 24%)"><span class="token plain" style=display:inline-block></span><br></span><span class=token-line style="color:hsl(230, 8%, 24%)"><span class="token plain"></span><span class="token keyword" style="color:hsl(301, 63%, 40%)">def</span><span class="token plain"> </span><span class="token function" style="color:hsl(221, 87%, 60%)">pca_reduce</span><span class="token punctuation" style="color:hsl(119, 34%, 47%)">(</span><span class="token plain">signal</span><span class="token punctuation" style="color:hsl(119, 34%, 47%)">,</span><span class="token plain"> n_components</span><span class="token punctuation" style="color:hsl(119, 34%, 47%)">,</span><span class="token plain"> block_size</span><span class="token operator" style="color:hsl(221, 87%, 60%)">=</span><span class="token number" style="color:hsl(35, 99%, 36%)">1024</span><span class="token 
punctuation" style="color:hsl(119, 34%, 47%)">)</span><span class="token punctuation" style="color:hsl(119, 34%, 47%)">:</span><span class="token plain"></span><br></span><span class=token-line style="color:hsl(230, 8%, 24%)"><span class="token plain"> </span><br></span><span class=token-line style="color:hsl(230, 8%, 24%)"><span class="token plain"> </span><span class="token comment" style="color:hsl(230, 4%, 64%)"># First, zero-pad the signal so that it is divisible by the block_size</span><span class="token plain"></span><br></span><span class=token-line style="color:hsl(230, 8%, 24%)"><span class="token plain"> samples </span><span class="token operator" style="color:hsl(221, 87%, 60%)">=</span><span class="token plain"> </span><span class="token builtin" style="color:hsl(119, 34%, 47%)">len</span><span class="token punctuation" style="color:hsl(119, 34%, 47%)">(</span><span class="token plain">signal</span><span class="token punctuation" style="color:hsl(119, 34%, 47%)">)</span><span class="token plain"></span><br></span><span class=token-line style="color:hsl(230, 8%, 24%)"><span class="token plain"> hanging </span><span class="token operator" style="color:hsl(221, 87%, 60%)">=</span><span class="token plain"> block_size </span><span class="token operator" style="color:hsl(221, 87%, 60%)">-</span><span class="token plain"> np</span><span class="token punctuation" style="color:hsl(119, 34%, 47%)">.</span><span class="token plain">mod</span><span class="token punctuation" style="color:hsl(119, 34%, 47%)">(</span><span class="token plain">samples</span><span class="token punctuation" style="color:hsl(119, 34%, 47%)">,</span><span class="token plain"> block_size</span><span class="token punctuation" style="color:hsl(119, 34%, 47%)">)</span><span class="token plain"></span><br></span><span class=token-line style="color:hsl(230, 8%, 24%)"><span class="token plain"> padded </span><span class="token operator" style="color:hsl(221, 87%, 60%)">=</span><span 
class="token plain"> np</span><span class="token punctuation" style="color:hsl(119, 34%, 47%
<p>Now that we've got our functions set up, let's try actually running something. First, we'll use <code>n_components == block_size</code>, which implies that we should end up with the same signal we started with.</p>
<div class="language-python codeBlockContainer_Ckt0 theme-code-block" style="--prism-background-color:hsl(230, 1%, 98%);--prism-color:hsl(230, 8%, 24%)"><div class=codeBlockContent_biex><pre tabindex=0 class="prism-code language-python codeBlock_bY9V thin-scrollbar" style="background-color:hsl(230, 1%, 98%);color:hsl(230, 8%, 24%)"><code class=codeBlockLines_e6Vv><span class=token-line style="color:hsl(230, 8%, 24%)"><span class="token plain">tabulasa_left </span><span class="token operator" style="color:hsl(221, 87%, 60%)">=</span><span class="token plain"> tabulasa</span><span class="token punctuation" style="color:hsl(119, 34%, 47%)">[</span><span class="token punctuation" style="color:hsl(119, 34%, 47%)">:</span><span class="token punctuation" style="color:hsl(119, 34%, 47%)">,</span><span class="token number" style="color:hsl(35, 99%, 36%)">0</span><span class="token punctuation" style="color:hsl(119, 34%, 47%)">]</span><span class="token plain"></span><br></span><span class=token-line style="color:hsl(230, 8%, 24%)"><span class="token plain" style=display:inline-block></span><br></span><span class=token-line style="color:hsl(230, 8%, 24%)"><span class="token plain">_</span><span class="token punctuation" style="color:hsl(119, 34%, 47%)">,</span><span class="token plain"> _</span><span class="token punctuation" style="color:hsl(119, 34%, 47%)">,</span><span class="token plain"> reconstructed </span><span class="token operator" style="color:hsl(221, 87%, 60%)">=</span><span class="token plain"> pca_reduce</span><span class="token punctuation" style="color:hsl(119, 34%, 47%)">(</span><span class="token plain">tabulasa_left</span><span class="token punctuation" style="color:hsl(119, 34%, 47%)">,</span><span class="token plain"> </span><span class="token number" style="color:hsl(35, 99%, 36%)">1024</span><span class="token punctuation" style="color:hsl(119, 34%, 47%)">,</span><span class="token plain"> </span><span class="token number" style="color:hsl(35, 99%, 
36%)">1024</span><span class="token punctuation" style="color:hsl(119, 34%, 47%)">)</span><span class="token plain"></span><br></span><span class=token-line style="color:hsl(230, 8%, 24%)"><span class="token plain" style=display:inline-block></span><br></span><span class=token-line style="color:hsl(230, 8%, 24%)"><span class="token plain">Audio</span><span class="token punctuation" style="color:hsl(119, 34%, 47%)">(</span><span class="token plain">data</span><span class="token operator" style="color:hsl(221, 87%, 60%)">=</span><span class="token plain">reconstructed</span><span class="token punctuation" style="color:hsl(119, 34%, 47%)">[</span><span class="token plain">start</span><span class="token punctuation" style="color:hsl(119, 34%, 47%)">:</span><span class="token plain">end</span><span class="token punctuation" style="color:hsl(119, 34%, 47%)">]</span><span class="token punctuation" style="color:hsl(119, 34%, 47%)">,</span><span class="token plain"> rate</span><span class="token operator" style="color:hsl(221, 87%, 60%)">=</span><span class="token plain">samplerate</span><span class="token punctuation" style="color:hsl(119, 34%, 47%)">)</span><br></span></code></pre><div class=buttonGroup__atx><button type=button aria-label="Copy code to clipboard" title=Copy class=clean-btn><span class=copyButtonIcons_eSgA aria-hidden=true><svg viewBox="0 0 24 24" class=copyButtonIcon_y97N><path fill=currentColor d="M19,21H8V7H19M19,5H8A2,2 0 0,0 6,7V21A2,2 0 0,0 8,23H19A2,2 0 0,0 21,21V7A2,2 0 0,0 19,5M16,1H4A2,2 0 0,0 2,3V17H4V3H16V1Z"/></svg><svg viewBox="0 0 24 24" class=copyButtonSuccessIcon_LjdS><path fill=currentColor d=M21,7L9,19L3.5,13.5L4.91,12.09L9,16.17L19.59,5.59L21,7Z /></svg></span></button></div></div></div>
<!-- -->
<audio controls src=/assets/medias/2-bc356a416dae6236d2e366a42bee2cd3.wav></audio>
<p>OK, that does indeed sound like what we originally had. Let's drastically cut down the number of components we're doing this with as a sanity check: the audio quality should become incredibly poor.</p>
<div class="language-python codeBlockContainer_Ckt0 theme-code-block" style="--prism-background-color:hsl(230, 1%, 98%);--prism-color:hsl(230, 8%, 24%)"><div class=codeBlockContent_biex><pre tabindex=0 class="prism-code language-python codeBlock_bY9V thin-scrollbar" style="background-color:hsl(230, 1%, 98%);color:hsl(230, 8%, 24%)"><code class=codeBlockLines_e6Vv><span class=token-line style="color:hsl(230, 8%, 24%)"><span class="token plain">_</span><span class="token punctuation" style="color:hsl(119, 34%, 47%)">,</span><span class="token plain"> _</span><span class="token punctuation" style="color:hsl(119, 34%, 47%)">,</span><span class="token plain"> reconstructed </span><span class="token operator" style="color:hsl(221, 87%, 60%)">=</span><span class="token plain"> pca_reduce</span><span class="token punctuation" style="color:hsl(119, 34%, 47%)">(</span><span class="token plain">tabulasa_left</span><span class="token punctuation" style="color:hsl(119, 34%, 47%)">,</span><span class="token plain"> </span><span class="token number" style="color:hsl(35, 99%, 36%)">32</span><span class="token punctuation" style="color:hsl(119, 34%, 47%)">,</span><span class="token plain"> </span><span class="token number" style="color:hsl(35, 99%, 36%)">1024</span><span class="token punctuation" style="color:hsl(119, 34%, 47%)">)</span><span class="token plain"></span><br></span><span class=token-line style="color:hsl(230, 8%, 24%)"><span class="token plain" style=display:inline-block></span><br></span><span class=token-line style="color:hsl(230, 8%, 24%)"><span class="token plain">Audio</span><span class="token punctuation" style="color:hsl(119, 34%, 47%)">(</span><span class="token plain">data</span><span class="token operator" style="color:hsl(221, 87%, 60%)">=</span><span class="token plain">reconstructed</span><span class="token punctuation" style="color:hsl(119, 34%, 47%)">[</span><span class="token plain">start</span><span class="token punctuation" style="color:hsl(119, 
34%, 47%)">:</span><span class="token plain">end</span><span class="token punctuation" style="color:hsl(119, 34%, 47%)">]</span><span class="token punctuation" style="color:hsl(119, 34%, 47%)">,</span><span class="token plain"> rate</span><span class="token operator" style="color:hsl(221, 87%, 60%)">=</span><span class="token plain">samplerate</span><span class="token punctuation" style="color:hsl(119, 34%, 47%)">)</span><br></span></code></pre><div class=buttonGroup__atx><button type=button aria-label="Copy code to clipboard" title=Copy class=clean-btn><span class=copyButtonIcons_eSgA aria-hidden=true><svg viewBox="0 0 24 24" class=copyButtonIcon_y97N><path fill=currentColor d="M19,21H8V7H19M19,5H8A2,2 0 0,0 6,7V21A2,2 0 0,0 8,23H19A2,2 0 0,0 21,21V7A2,2 0 0,0 19,5M16,1H4A2,2 0 0,0 2,3V17H4V3H16V1Z"/></svg><svg viewBox="0 0 24 24" class=copyButtonSuccessIcon_LjdS><path fill=currentColor d=M21,7L9,19L3.5,13.5L4.91,12.09L9,16.17L19.59,5.59L21,7Z /></svg></span></button></div></div></div>
<!-- -->
<audio controls src=/assets/medias/3-e8092f56b531e18a0d335c0f391b46b9.wav></audio>
<p>As expected, our reconstructed audio does sound incredibly poor! But there's something else very interesting going on here under the hood. Did you notice that the bassline comes across very well, but that there's no midrange or treble? The drums are almost entirely gone.</p>
<h2 class="anchor anchorWithStickyNavbar_LWe7" id=drop-the-treble><a href="https://youtu.be/Ua0KpfJsxKo?t=1m17s" target=_blank rel="noopener noreferrer">Drop the (Treble)</a><a href=#drop-the-treble class=hash-link aria-label="Direct link to drop-the-treble" title="Direct link to drop-the-treble"></a></h2>
<p>It will help to understand PCA more fully when trying to read this part, but I'll do my best to break it down. PCA tries to find a way to best represent the dataset using "components." Think of each "component" as containing some of the information you need in order to reconstruct the full audio. For example, you might have a "low frequency" component that contains all the information you need in order to hear the bassline. There might be other components that explain the high frequency things like singers, or melodies, that you also need.</p>
<p>What makes PCA interesting is that it attempts to find the "most important" components in explaining the signal. In a signal processing world, this means that PCA is trying to find the signal amongst the noise in your data. In our case, this means that PCA, when forced to work with small numbers of components, will chuck out the noisy components first. It's doing its best to reconstruct the signal, but it has to make sacrifices somewhere.</p>
<p>So I've mentioned that PCA identifies the "noisy" components in our dataset. This is equivalent to saying that PCA removes the "high frequency" components in this case: it's very easy to represent a low-frequency signal like a bassline. It's far more difficult to represent a high-frequency signal because it's changing all the time. When you force PCA to make a tradeoff by using a small number of components, the best it can hope to do is replicate the low-frequency sections and skip the high-frequency things.</p>
<p>This is a very interesting insight, and it also has echoes (pardon the pun) of how humans understand music in general. Other encoding schemes (like MP3, etc.) typically chop off a lot of the high-frequency range as well. There is typically a lot of high-frequency noise in audio that is nearly impossible to hear, so it's easy to remove it without anyone noticing. PCA ends up doing something similar, and while that certainly wasn't the intention, it is an interesting effect.</p>
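<p>The "bass survives, treble disappears" effect can be reproduced on a made-up signal. This is a minimal sketch, not the code used for the audio clips: <code>pca_keep</code> is a hypothetical helper built on a NumPy SVD instead of Scikit-Learn, and the two tones, their periods, and their amplitudes are all assumptions. PCA ranks components by how much variance they explain, so the loud slow tone is what gets kept here.</p>

```python
import numpy as np


def pca_keep(blocks, n_components):
    """Reconstruct blocks from their top n_components principal components.

    Hypothetical helper; uses an SVD instead of scikit-learn's PCA class.
    """
    mean = blocks.mean(axis=0)
    _, _, vt = np.linalg.svd(blocks - mean, full_matrices=False)
    components = vt[:n_components]
    return (blocks - mean) @ components.T @ components + mean


def rms(x):
    """Root-mean-square level of an array."""
    return np.sqrt(np.mean(x ** 2))


block_size = 64
n = np.arange(block_size * 200)

# Made-up test signal: a loud slow tone plus a quiet fast one. The periods
# (50 and 5.3 samples) are arbitrary and chosen not to line up with the blocks.
low = 1.0 * np.sin(2 * np.pi * n / 50)
high = 0.1 * np.sin(2 * np.pi * n / 5.3)
signal = low + high

# Keep only two components, then flatten back into a 1-D signal.
recon = pca_keep(signal.reshape(-1, block_size), n_components=2).reshape(-1)
dropped = signal - recon

print("rms difference between reconstruction and low tone:", rms(recon - low))
print("rms of the high tone on its own:                   ", rms(high))
print("correlation of the dropped part with the high tone:",
      np.corrcoef(dropped, high)[0, 1])
```

<p>With only two components the reconstruction should sit very close to the slow tone, and what was thrown away should look almost exactly like the fast one. Swapping the amplitudes would be a quick way to see how much of the effect comes from loudness rather than frequency.</p>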
<h2 class="anchor anchorWithStickyNavbar_LWe7" id=a-more-realistic-example>A More Realistic Example<a href=#a-more-realistic-example class=hash-link aria-label="Direct link to A More Realistic Example" title="Direct link to A More Realistic Example"></a></h2>
<p>So we've seen the edge cases so far: Using a large number of components results in audio very close to the original, and using a small number of components acts as a low-pass filter. How about we develop something that sounds "good enough" in practice, that we can use as a benchmark for size? We'll use ourselves as judges of audio quality, and build another function to help us estimate how much space we need to store everything in.</p>
<div class="language-python codeBlockContainer_Ckt0 theme-code-block" style="--prism-background-color:hsl(230, 1%, 98%);--prism-color:hsl(230, 8%, 24%)"><div class=codeBlockContent_biex><pre tabindex=0 class="prism-code language-python codeBlock_bY9V thin-scrollbar" style="background-color:hsl(230, 1%, 98%);color:hsl(230, 8%, 24%)"><code class=codeBlockLines_e6Vv><span class=token-line style="color:hsl(230, 8%, 24%)"><span class="token keyword" style="color:hsl(301, 63%, 40%)">from</span><span class="token plain"> bz2 </span><span class="token keyword" style="color:hsl(301, 63%, 40%)">import</span><span class="token plain"> compress</span><br></span><span class=token-line style="color:hsl(230, 8%, 24%)"><span class="token plain"></span><span class="token keyword" style="color:hsl(301, 63%, 40%)">import</span><span class="token plain"> pandas </span><span class="token keyword" style="color:hsl(301, 63%, 40%)">as</span><span class="token plain"> pd</span><br></span><span class=token-line style="color:hsl(230, 8%, 24%)"><span class="token plain" style=display:inline-block></span><br></span><span class=token-line style="color:hsl(230, 8%, 24%)"><span class="token plain"></span><span class="token keyword" style="color:hsl(301, 63%, 40%)">def</span><span class="token plain"> </span><span class="token function" style="color:hsl(221, 87%, 60%)">raw_estimate</span><span class="token punctuation" style="color:hsl(119, 34%, 47%)">(</span><span class="token plain">transformed</span><span class="token punctuation" style="color:hsl(119, 34%, 47%)">,</span><span class="token plain"> pca</span><span class="token punctuation" style="color:hsl(119, 34%, 47%)">)</span><span class="token punctuation" style="color:hsl(119, 34%, 47%)">:</span><span class="token plain"></span><br></span><span class=token-line style="color:hsl(230, 8%, 24%)"><span class="token plain"> </span><span class="token comment" style="color:hsl(230, 4%, 64%)"># We assume that we'll be storing things as 16-bit 
WAV,</span><span class="token plain"></span><br></span><span class=token-line style="color:hsl(230, 8%, 24%)"><span class="token plain"> </span><span class="token comment" style="color:hsl(230, 4%, 64%)"># meaning two bytes per sample</span><span class="token plain"></span><br></span><span class=token-line style="color:hsl(230, 8%, 24%)"><span class="token plain"> signal_bytes </span><span class="token operator" style="color:hsl(221, 87%, 60%)">=</span><span class="token plain"> transformed</span><span class="token punctuation" style="color:hsl(119, 34%, 47%)">.</span><span class="token plain">tobytes</span><span class="token punctuation" style="color:hsl(119, 34%, 47%)">(</span><span class="token punctuation" style="color:hsl(119, 34%, 47%)">)</span><span class="token plain"></span><br></span><span class=token-line style="color:hsl(230, 8%, 24%)"><span class="token plain"> </span><span class="token comment" style="color:hsl(230, 4%, 64%)"># PCA stores the components as floating point, we'll assume</span><span class="token plain"></span><br></span><span class=token-line style="color:hsl(230, 8%, 24%)"><span class="token plain"> </span><span class="token comment" style="color:hsl(230, 4%, 64%)"># that means 32-bit floats, so 4 bytes per element</span><span class="token plain"></span><br></span><span class=token-line style="color:hsl(230, 8%, 24%)"><span class="token plain"> component_bytes </span><span class="token operator" style="color:hsl(221, 87%, 60%)">=</span><span class="token plain"> transformed</span><span class="token punctuation" style="color:hsl(119, 34%, 47%)">.</span><span class="token plain">tobytes</span><span class="token punctuation" style="color:hsl(119, 34%, 47%)">(</span><span class="token punctuation" style="color:hsl(119, 34%, 47%)">)</span><span class="token plain"></span><br></span><span class=token-line style="color:hsl(230, 8%, 24%)"><span class="token plain"> </span><br></span><span class=token-line style="color:hsl(230, 8%, 24%)"><span 
class="token plain"> </span><span class="token comment" style="color:hsl(230,
<div><table><thead><tr><th>(components, block size)<th>Raw<th>PCA<th>PCA w/ BZ2<tbody><tr><th>(1, 1)<td>69.054298<td>138.108597<td>16.431797<tr><th>(1, 2)<td>69.054306<td>69.054306<td>32.981380<tr><th>(1, 4)<td>69.054321<td>34.527161<td>16.715032<tr><th>(4, 32)<td>69.054443<td>17.263611<td>8.481735<tr><th>(16, 256)<td>69.054688<td>8.631836<td>4.274846<tr><th>(32, 256)<td>69.054688<td>17.263672<td>8.542909<tr><th>(64, 256)<td>69.054688<td>34.527344<td>17.097543<tr><th>(128, 1024)<td>69.054688<td>17.263672<td>9.430644<tr><th>(256, 1024)<td>69.054688<td>34.527344<td>18.870387<tr><th>(512, 1024)<td>69.054688<td>69.054688<td>37.800940<tr><th>(128, 2048)<td>69.062500<td>8.632812<td>6.185015<tr><th>(256, 2048)<td>69.062500<td>17.265625<td>12.366942<tr><th>(512, 2048)<td>69.062500<td>34.531250<td>24.736506<tr><th>(1024, 2048)<td>69.062500<td>69.062500<td>49.517493</table></div>
<p>As we can see, a couple of configurations take far less storage space than the uncompressed file; the best of them, (16, 256) with BZ2, is roughly 16 times smaller. Let's hear what that sounds like:</p>
<div class="language-python codeBlockContainer_Ckt0 theme-code-block" style="--prism-background-color:hsl(230, 1%, 98%);--prism-color:hsl(230, 8%, 24%)"><div class=codeBlockContent_biex><pre tabindex=0 class="prism-code language-python codeBlock_bY9V thin-scrollbar" style="background-color:hsl(230, 1%, 98%);color:hsl(230, 8%, 24%)"><code class=codeBlockLines_e6Vv><span class=token-line style="color:hsl(230, 8%, 24%)"><span class="token plain">_</span><span class="token punctuation" style="color:hsl(119, 34%, 47%)">,</span><span class="token plain"> _</span><span class="token punctuation" style="color:hsl(119, 34%, 47%)">,</span><span class="token plain"> reconstructed </span><span class="token operator" style="color:hsl(221, 87%, 60%)">=</span><span class="token plain"> pca_reduce</span><span class="token punctuation" style="color:hsl(119, 34%, 47%)">(</span><span class="token plain">tabulasa_left</span><span class="token punctuation" style="color:hsl(119, 34%, 47%)">,</span><span class="token plain"> </span><span class="token number" style="color:hsl(35, 99%, 36%)">16</span><span class="token punctuation" style="color:hsl(119, 34%, 47%)">,</span><span class="token plain"> </span><span class="token number" style="color:hsl(35, 99%, 36%)">256</span><span class="token punctuation" style="color:hsl(119, 34%, 47%)">)</span><span class="token plain"></span><br></span><span class=token-line style="color:hsl(230, 8%, 24%)"><span class="token plain">Audio</span><span class="token punctuation" style="color:hsl(119, 34%, 47%)">(</span><span class="token plain">data</span><span class="token operator" style="color:hsl(221, 87%, 60%)">=</span><span class="token plain">reconstructed</span><span class="token punctuation" style="color:hsl(119, 34%, 47%)">[</span><span class="token plain">start</span><span class="token punctuation" style="color:hsl(119, 34%, 47%)">:</span><span class="token plain">end</span><span class="token punctuation" style="color:hsl(119, 34%, 
47%)">]</span><span class="token punctuation" style="color:hsl(119, 34%, 47%)">,</span><span class="token plain"> rate</span><span class="token operator" style="color:hsl(221, 87%, 60%)">=</span><span class="token plain">samplerate</span><span class="token punctuation" style="color:hsl(119, 34%, 47%)">)</span><br></span></code></pre></div></div>
<audio controls src=/assets/medias/4-90047e615651067970475dc7f117aceb.wav></audio>
<p>It sounds incredibly poor though. Let's try something that's a bit more realistic:</p>
<div class="language-python codeBlockContainer_Ckt0 theme-code-block" style="--prism-background-color:hsl(230, 1%, 98%);--prism-color:hsl(230, 8%, 24%)"><div class=codeBlockContent_biex><pre tabindex=0 class="prism-code language-python codeBlock_bY9V thin-scrollbar" style="background-color:hsl(230, 1%, 98%);color:hsl(230, 8%, 24%)"><code class=codeBlockLines_e6Vv><span class=token-line style="color:hsl(230, 8%, 24%)"><span class="token plain">_</span><span class="token punctuation" style="color:hsl(119, 34%, 47%)">,</span><span class="token plain"> _</span><span class="token punctuation" style="color:hsl(119, 34%, 47%)">,</span><span class="token plain"> reconstructed </span><span class="token operator" style="color:hsl(221, 87%, 60%)">=</span><span class="token plain"> pca_reduce</span><span class="token punctuation" style="color:hsl(119, 34%, 47%)">(</span><span class="token plain">tabulasa_left</span><span class="token punctuation" style="color:hsl(119, 34%, 47%)">,</span><span class="token plain"> </span><span class="token number" style="color:hsl(35, 99%, 36%)">1</span><span class="token punctuation" style="color:hsl(119, 34%, 47%)">,</span><span class="token plain"> </span><span class="token number" style="color:hsl(35, 99%, 36%)">4</span><span class="token punctuation" style="color:hsl(119, 34%, 47%)">)</span><span class="token plain"></span><br></span><span class=token-line style="color:hsl(230, 8%, 24%)"><span class="token plain">Audio</span><span class="token punctuation" style="color:hsl(119, 34%, 47%)">(</span><span class="token plain">data</span><span class="token operator" style="color:hsl(221, 87%, 60%)">=</span><span class="token plain">reconstructed</span><span class="token punctuation" style="color:hsl(119, 34%, 47%)">[</span><span class="token plain">start</span><span class="token punctuation" style="color:hsl(119, 34%, 47%)">:</span><span class="token plain">end</span><span class="token punctuation" style="color:hsl(119, 34%, 47%)">]</span><span 
class="token punctuation" style="color:hsl(119, 34%, 47%)">,</span><span class="token plain"> rate</span><span class="token operator" style="color:hsl(221, 87%, 60%)">=</span><span class="token plain">samplerate</span><span class="token punctuation" style="color:hsl(119, 34%, 47%)">)</span><br></span></code></pre></div></div>
<audio controls src=/assets/medias/5-896767515da7b5a0fe46e9a205c1130f.wav></audio>
<p>And just out of curiosity, we can try something that has the same ratio of components to block size. This should be close to an apples-to-apples comparison.</p>
<div class="language-python codeBlockContainer_Ckt0 theme-code-block" style="--prism-background-color:hsl(230, 1%, 98%);--prism-color:hsl(230, 8%, 24%)"><div class=codeBlockContent_biex><pre tabindex=0 class="prism-code language-python codeBlock_bY9V thin-scrollbar" style="background-color:hsl(230, 1%, 98%);color:hsl(230, 8%, 24%)"><code class=codeBlockLines_e6Vv><span class=token-line style="color:hsl(230, 8%, 24%)"><span class="token plain">_</span><span class="token punctuation" style="color:hsl(119, 34%, 47%)">,</span><span class="token plain"> _</span><span class="token punctuation" style="color:hsl(119, 34%, 47%)">,</span><span class="token plain"> reconstructed </span><span class="token operator" style="color:hsl(221, 87%, 60%)">=</span><span class="token plain"> pca_reduce</span><span class="token punctuation" style="color:hsl(119, 34%, 47%)">(</span><span class="token plain">tabulasa_left</span><span class="token punctuation" style="color:hsl(119, 34%, 47%)">,</span><span class="token plain"> </span><span class="token number" style="color:hsl(35, 99%, 36%)">64</span><span class="token punctuation" style="color:hsl(119, 34%, 47%)">,</span><span class="token plain"> </span><span class="token number" style="color:hsl(35, 99%, 36%)">256</span><span class="token punctuation" style="color:hsl(119, 34%, 47%)">)</span><span class="token plain"></span><br></span><span class=token-line style="color:hsl(230, 8%, 24%)"><span class="token plain">Audio</span><span class="token punctuation" style="color:hsl(119, 34%, 47%)">(</span><span class="token plain">data</span><span class="token operator" style="color:hsl(221, 87%, 60%)">=</span><span class="token plain">reconstructed</span><span class="token punctuation" style="color:hsl(119, 34%, 47%)">[</span><span class="token plain">start</span><span class="token punctuation" style="color:hsl(119, 34%, 47%)">:</span><span class="token plain">end</span><span class="token punctuation" style="color:hsl(119, 34%, 
47%)">]</span><span class="token punctuation" style="color:hsl(119, 34%, 47%)">,</span><span class="token plain"> rate</span><span class="token operator" style="color:hsl(221, 87%, 60%)">=</span><span class="token plain">samplerate</span><span class="token punctuation" style="color:hsl(119, 34%, 47%)">)</span><br></span></code></pre></div></div>
<audio controls src=/assets/medias/6-756ec27a28b4fa02181f43ed9061f0b3.wav></audio>
<p>The smaller block size definitely has better high-end response, but I personally think the larger block size sounds better overall.</p>
<h2 class="anchor anchorWithStickyNavbar_LWe7" id=conclusions>Conclusions<a href=#conclusions class=hash-link aria-label="Direct link to Conclusions" title="Direct link to Conclusions"></a></h2>
<p>So, what do I think about audio compression using PCA?</p>
<p>Strangely enough, it actually works pretty well relative to what I expected. That said, it's a terrible idea in general.</p>
<p>First off, you don't really save any space. The component matrix needed to actually run the PCA algorithm takes up a lot of space on its own, so it's very difficult to save space without sacrificing a huge amount of audio quality. And even then, codecs like AAC sound very nice even at bitrates that this PCA method could only dream of.</p>
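<p>To make the cost of the component matrix concrete, here is a rough back-of-the-envelope sketch that counts the two pieces separately. The function name <code>pca_storage_bytes</code> and its parameters are hypothetical (they are not part of the code above), and it assumes the byte widths mentioned earlier: 2 bytes per transformed value and 4 bytes per matrix element, with a single shared matrix of shape (components, block size).</p>

```python
import math


def pca_storage_bytes(n_samples, n_components, block_size,
                      signal_width=2, component_width=4):
    """Hypothetical estimate: bytes for the transformed signal and the matrix."""
    # The signal is cut into blocks; each block is reduced to n_components values
    n_blocks = math.ceil(n_samples / block_size)
    signal_bytes = n_blocks * n_components * signal_width
    # Assumption: one shared component matrix of shape (n_components, block_size)
    component_bytes = n_components * block_size * component_width
    return signal_bytes, component_bytes


# Example: ten seconds of mono audio at an assumed 44.1 kHz sample rate
signal_bytes, component_bytes = pca_storage_bytes(441_000, 1024, 2048)
print(signal_bytes, component_bytes)
```

<p>Under these assumptions the matrix for a (1024, 2048) configuration is 8 MiB no matter how long the recording is, so its relative cost is worst for short clips and large blocks.</p>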
<p>Second, there's the issue of audio streaming. PCA relies on two pieces: the data stream, and a matrix used to reconstruct the original signal. While it is easy to stream the data, you can't stream that matrix. And even if you divided the stream up into small blocks to give you a small matrix, you must guarantee that the matrix arrives; if you don't have that matrix, the data stream will make no sense whatsoever.</p>
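<p>A minimal sketch of why the receiver is stuck without the matrix: decoding a block means multiplying its reduced coefficients back through the component matrix. The helper <code>decode_block</code> below is hypothetical and written in plain Python for illustration; it assumes the matrix has one row per component and ignores any mean-centering the PCA implementation may apply.</p>

```python
def decode_block(coefficients, components):
    """Rebuild one block of samples from its reduced coefficients."""
    block_size = len(components[0])
    # Each output sample is a weighted sum over the component rows
    return [
        sum(coef * row[i] for coef, row in zip(coefficients, components))
        for i in range(block_size)
    ]


# Toy example: 2 components, block size 4 (made-up numbers)
components = [
    [0.5, 0.5, 0.5, 0.5],
    [0.5, -0.5, 0.5, -0.5],
]
streamed = [[2.0, 0.0], [0.0, 4.0]]  # two blocks of coefficients

decoded = [decode_block(block, components) for block in streamed]
print(decoded)
```

<p>The coefficient stream on its own is just a list of numbers; the same coefficients decode to a different signal under a different matrix.</p>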
<p>All said, this was an interesting experiment. It's really cool seeing PCA used for signal analysis where I haven't seen it applied before, but I don't think it will lead to any practical results. Look forward to more signal processing stuff in the future!</div></article></main></div></div></div><footer class=footer><div class="container container-fluid"><div class="footer__bottom text--center"><div class=footer__copyright>Copyright © 2024 Bradlee
Speice</div></div></div></footer></div>