Status: passes rudimentary tests

Current goals:
- Start xforms
  - xform selection, pre- and post-transform in xform
  - first of the variations

Things to do (rather severely incomplete):
- LaunchContext thread distribution based on generated code register count and shared memory size
- qlocal storage
  - Performance implications of different state spaces
  - Shared / cache projected usage and its effect on above
  - Implement qlocal storage, and hide the complexity
- The `Feature` class
  - Transform count and per-transform code layout
  - Filter size, oversample, final buffer size
- Buffer allocation, clearing, reading from device
- Preview window
  - When/how to sample?
  - OpenGL interop worth it?
  - Implement
- Implement xforms
- Shuffle
  - State space implications, you know the drill
  - Implement
  - Test effects on quality by masking off writes on all but one lane and boosting the sample density to compensate (muuuuuch later on)
- DE
- Clean up code (particularly DSL stuff incl. injector)

Things to test:
- Debug flag/dict/whatever for entire project in general
- Iteration counters for IterThread

Things to benchmark:
- Kernel invocation and/or interrupt times (will high load freeze X?)
- MWC float conversion
- The entire scatter process
  - Radix sort of writeback coordinates
  - Log-copy-histogram approach
  - Direct reductions
  - Surface loads, stores, reductions