Rearrange the main render loop... again.

Using one stream with two page-locked host buffers lets us keep the
GPU work queue full without pegging the CPU, and also reduces how often
a host buffer gets overwritten before its contents can be written out.
devtid() was flaky, so this patch also introduces a ring buffer to
handle the 'slots' concept. It also introduces an adaptive number of
temporal samples, which improves efficiency but breaks the assumption
that (ntemporal_samples % 256 == 0); handling that required some
additional fixes.
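A minimal sketch of the slot ring buffer idea, assuming a fixed number of slots that each stand for one page-locked host buffer and are handed out round-robin. `SlotRing`, `acquire` and `release_oldest` are hypothetical names, not the patch's code. A real render loop would wait on a CUDA event for the slot's pending transfer instead of returning None; here the slot state is a plain Python list so the sketch runs without a GPU.

```python
from collections import deque


class SlotRing:
    """Hypothetical sketch: hand out buffer slots in round-robin order.

    Each slot stands in for one page-locked host buffer. A slot may not be
    reused until the consumer has released it (its contents written out),
    which is the overwrite hazard described above.
    """

    def __init__(self, nslots):
        if nslots < 1:
            raise ValueError("need at least one slot")
        self.nslots = nslots
        self.next = 0                 # index of the next slot to hand out
        self.busy = [False] * nslots  # True while a slot's data is unread
        self.pending = deque()        # filled slots, oldest first

    def acquire(self):
        """Return the next slot index, or None if it is still in use."""
        slot = self.next
        if self.busy[slot]:
            return None               # caller must drain a slot first
        self.busy[slot] = True
        self.pending.append(slot)
        self.next = (slot + 1) % self.nslots
        return slot

    def release_oldest(self):
        """Mark the oldest filled slot as consumed and return its index."""
        slot = self.pending.popleft()
        self.busy[slot] = False
        return slot
```

With two slots, a third `acquire()` returns None until the oldest slot has been released, so the host never reuses a buffer whose contents have not yet been written out.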
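Once the temporal sample count is no longer a multiple of 256, the launch has to round the grid up and let surplus threads exit early, which is what the new maxid argument to interp_{{tname}} allows. The pure-Python emulation below is a sketch under stated assumptions: 256 threads per block, gtid() returning a flat global thread index, and maxid being passed the temporal sample count. `launch_dims` and `emulate_interp` are hypothetical helpers, not part of the patch.

```python
BLOCK = 256  # assumed threads per block (the old % 256 invariant)


def launch_dims(nsamples, block=BLOCK):
    """Return (grid, block) large enough to cover nsamples threads."""
    if nsamples < 1:
        raise ValueError("nsamples must be positive")
    grid = (nsamples + block - 1) // block  # ceiling division
    return grid, block


def emulate_interp(nsamples, tstart, tstep, block=BLOCK):
    """Pure-Python stand-in for interp_*'s indexing with the maxid guard."""
    grid, block = launch_dims(nsamples, block)
    maxid = nsamples                  # assumed: maxid == temporal sample count
    out = [None] * nsamples
    for tid in range(grid * block):   # every thread the grid launches
        if tid >= maxid:              # mirrors `if (id >= maxid) return;`
            continue                  # without this, out[tid] is out of range
        out[tid] = tstart + tid * tstep
    return out
```

For 300 samples this launches two blocks (512 threads); the last 212 threads exit at the guard instead of writing past the end of `out`.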
Steven Robertson
2011-10-28 08:30:36 -04:00
parent 15f88383b1
commit 185823ad55
5 changed files with 127 additions and 113 deletions


@@ -227,8 +227,9 @@ class GenomePacker(HunkOCode):
 __global__
 void interp_{{tname}}({{tname}}* out, float *times, float *knots,
-float tstart, float tstep, mwc_st *rctxes) {
+float tstart, float tstep, mwc_st *rctxes, int maxid) {
 int id = gtid();
+if (id >= maxid) return;
 out = &out[id];
 mwc_st rctx = rctxes[id];
 float time = tstart + id * tstep;