Status: passes rudimentary tests

Current goals:
- Draw some dang points!
    - Allocate buffer (can it be pre-allocated?)
    - Direct scatter linear points by GTID from flame number
    - Re-enable preview window
        - Execute frame, update texture, repeat
- Writeback of points to the buffer
    - Define writeback class, args
    - Do camera rotation across frameset
    - Postpone other kinds of testing and address clamping for now
- Start xforms
    - At first, fixed Sierpinski triangle or something
    - xform selection, pre- and post-transform in xform
    - first of the variations

Things to do (rather severely incomplete):
- LaunchContext thread distribution based on generated code register count
  and shared memory size
- qlocal storage
    - Performance implications of different state spaces
    - Shared / cache projected usage and its effect on above
    - Implement qlocal storage, and hide the complexity
- The `Feature` class
    - Transform count and per-transform code layout
    - Filter size, oversample, final buffer size
- Buffer allocation, clearing, reading from device
- Preview window
    - When/how to sample?
    - OpenGL interop worth it?
    - Implement
- Implement xforms
- Shuffle
    - State space implications, you know the drill
    - Implement
    - Test effects on quality by masking off writes on all but one lane and
      boosting the sample density to compensate (muuuuuch later on)
- DE
- Clean up code (particularly DSL stuff incl. injector)

Things to test:
- DeviceStream allocator and proper handling of corner cases
- Debug flag/dict/whatever for entire project in general
- Iteration counters for IterThread

Things to benchmark:
- Kernel invocation and/or interrupt times (will high load freeze X?)
- 1D/2D texture load+interpolation speeds vs constant memory loading
    - Must test under high SFU load
    - Tex uses separate cache? Has lower bandwidth penalty for gather?
- MWC float conversion
- The entire scatter process
    - Radix sort of writeback coordinates
    - Log-copy-histogram approach
    - Direct reductions
    - Surface loads, stores, reductions
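
Rough host-side sketch for the "MWC float conversion" item, to have a reference to compare a device version against. Assumptions, none of them taken from the project: a 32-bit multiply-with-carry step, a placeholder multiplier, and conversion to [0, 1) by scaling with 2**-32. The names `mwc_next`, `to_unit_float` and `sample` are hypothetical.

```python
MASK32 = 0xFFFFFFFF

# Placeholder multiplier for illustration only; not a vetted choice.
EXAMPLE_MULT = 4294883355


def mwc_next(state, carry, mult=EXAMPLE_MULT):
    """One multiply-with-carry step: low 32 bits become the new state,
    high bits become the new carry."""
    t = mult * state + carry
    return t & MASK32, t >> 32


def to_unit_float(x):
    """Map a 32-bit unsigned integer to [0, 1) by scaling with 2**-32.

    This holds in double precision (Python floats). In single precision
    the largest inputs round up to 1.0, so a device version would need
    to handle that case separately.
    """
    return x * 2.0 ** -32


def sample(n, state=12345, carry=1, mult=EXAMPLE_MULT):
    """Return n floats in [0, 1) from the sketch generator."""
    out = []
    for _ in range(n):
        state, carry = mwc_next(state, carry, mult)
        out.append(to_unit_float(state))
    return out
```

Timing `sample` on the host says nothing about GPU cost; it only pins down the expected values for a correctness check before benchmarking the real thing.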