mirror of
https://github.com/stevenrobertson/cuburn.git
synced 2025-07-12 03:05:14 -04:00
Rearrange the main render loop... again.
Using one stream with two pagelocked host buffers allows us to keep the GPU work queue full without pegging the CPU, and also reduces the number of cases where a host buffer gets overwritten before it can be written. devtid() was flaky, so this patch also introduces a ringbuffer to handle the 'slots' concept. It also introduces an adaptive number of temporal samples, which improves efficiency but also breaks the assumption that (ntemporal_samples % 256 == 0), which required some additional fixes.
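The 'slots' ringbuffer mentioned above can be pictured with a minimal Python sketch. This is not cuburn's code: the class name `SlotRing` and its methods are hypothetical, chosen only to illustrate handing out a fixed set of slot indices in rotation so that a slot is reused only after it has been retired.

```python
from collections import deque


class SlotRing:
    """Hypothetical fixed-size ring of slot indices (not cuburn's actual class)."""

    def __init__(self, nslots):
        if nslots <= 0:
            raise ValueError("nslots must be positive")
        self._free = deque(range(nslots))  # slots available for new work
        self._busy = deque()               # slots in flight, oldest first

    def acquire(self):
        """Return the next free slot, or None if every slot is in flight."""
        if not self._free:
            return None
        slot = self._free.popleft()
        self._busy.append(slot)
        return slot

    def release_oldest(self):
        """Retire the oldest in-flight slot and make it reusable."""
        slot = self._busy.popleft()
        self._free.append(slot)
        return slot
```

With two slots, a third `acquire()` returns `None` until `release_oldest()` is called, which is one way to keep a buffer from being reused while it is still in flight.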
@@ -227,8 +227,9 @@ class GenomePacker(HunkOCode):
 
 __global__
 void interp_{{tname}}({{tname}}* out, float *times, float *knots,
-        float tstart, float tstep, mwc_st *rctxes) {
+        float tstart, float tstep, mwc_st *rctxes, int maxid) {
     int id = gtid();
+    if (id >= maxid) return;
     out = &out[id];
     mwc_st rctx = rctxes[id];
     float time = tstart + id * tstep;
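The new `maxid` guard matters once the sample count is no longer a multiple of the block size: rounding the launch up to whole blocks creates surplus threads that must not touch `out` or `rctxes`. The following Python sketch emulates that arithmetic on the host. The block size of 256 is taken from the commit message's `% 256` assumption; that the kernel is launched with blocks of that size is an assumption, and the function names are hypothetical.

```python
def launch_dims(nthreads, block_size=256):
    """Return (blocks, threads_launched) when rounding up to whole blocks."""
    blocks = (nthreads + block_size - 1) // block_size  # ceiling division
    return blocks, blocks * block_size


def simulate_guarded_kernel(maxid, block_size=256):
    """Emulate the kernel's guard: only ids below maxid write a result."""
    _, launched = launch_dims(maxid, block_size)
    out = [None] * maxid
    for tid in range(launched):
        if tid >= maxid:
            continue  # mirrors `if (id >= maxid) return;`
        out[tid] = tid
    return out, launched
```

For example, 300 samples need 2 blocks and launch 512 threads, so 212 threads fall outside the valid range and exit early because of the guard.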