mirror of
https://github.com/stevenrobertson/cuburn.git
synced 2025-06-30 13:26:27 -04:00
185823ad552a9ea442a6e677029ba729050bfdf5

Using one stream with two page-locked host buffers keeps the GPU work queue full without pegging the CPU, and also reduces the number of cases where a host buffer gets overwritten before its previous contents have been used. devtid() was flaky, so this patch also introduces a ring buffer to handle the 'slots' concept. Finally, it introduces an adaptive number of temporal samples, which improves efficiency but breaks the assumption that (ntemporal_samples % 256 == 0); that required some additional fixes.
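
As a rough illustration of the two ideas in the message above, here is a minimal pure-Python sketch. It is not taken from the cuburn source: `SlotRing` and `blocks_needed` are hypothetical names, and reading 256 as a threads-per-block count is an assumption based only on the `% 256` condition mentioned in the message. The ring hands out slot indices round-robin instead of deriving them from a device-side thread id, and the ceiling division shows one common way to size a launch when the sample count is no longer a multiple of 256.

```python
class SlotRing:
    """Hypothetical helper: hands out slot indices in round-robin order."""

    def __init__(self, nslots):
        if nslots < 1:
            raise ValueError("need at least one slot")
        self.nslots = nslots
        self._next = 0

    def acquire(self):
        # Return the current slot, then advance and wrap around.
        slot = self._next
        self._next = (self._next + 1) % self.nslots
        return slot


def blocks_needed(ntemporal_samples, threads_per_block=256):
    """Ceiling division: enough blocks to cover a sample count that is
    not necessarily a multiple of threads_per_block (256 is assumed)."""
    return (ntemporal_samples + threads_per_block - 1) // threads_per_block


# Two slots, matching the two host buffers described in the message.
ring = SlotRing(2)
order = [ring.acquire() for _ in range(5)]
```

With a rounded-up block count, the last block generally covers indices past the end of the data, so a kernel launched this way would also need a bounds check on its sample index.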
Languages: Python 92.8%, Cuda 6%, Shell 0.6%, C 0.6%