Mirror of https://github.com/stevenrobertson/cuburn.git (synced 2025-02-05 11:40:04 -05:00)
a439bf671d
There are 16 bar.sync() registers available per *chip*, not per block, and I was using number 8 in the shuffle code. Evidently the driver rewrites them per SM, but does not compact their range. Good to know.

| Name |
|---|
| __init__.py |
| cuda.py |
| device_code.py |
| ptx.py |
| render.py |
| variations.py |
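
The barrier numbering mentioned in the commit message can be illustrated with a minimal sketch. It relies only on what the PTX ISA documents: `bar.sync a{, b}` takes a barrier index `a` from 0 through 15 and an optional thread count `b` that must be a multiple of the warp size. The `BarrierAllocator` class and the `bar_sync` function are hypothetical names for illustration and are not part of cuburn. The sketch hands out the lowest free index instead of hard-coding a number such as 8; whether that avoids the driver behavior described above is not confirmed by the source.

```python
MAX_BARRIERS = 16  # PTX bar.sync accepts barrier indices 0..15
WARP_SIZE = 32     # assumption: warp size on current NVIDIA hardware


class BarrierAllocator:
    """Hypothetical helper (not part of cuburn): hands out the lowest
    free barrier index so generated PTX stays in a compact range."""

    def __init__(self, reserved=(0,)):
        # Barrier 0 is the one __syncthreads() uses, so reserve it by default.
        self._used = set(reserved)

    def alloc(self):
        # Scan upward and return the first index that is not yet taken.
        for idx in range(MAX_BARRIERS):
            if idx not in self._used:
                self._used.add(idx)
                return idx
        raise RuntimeError("all %d barrier indices are in use" % MAX_BARRIERS)

    def free(self, idx):
        # Return an index to the pool; freeing an unused index is a no-op.
        self._used.discard(idx)


def bar_sync(idx, nthreads=None):
    """Format a PTX bar.sync instruction for barrier `idx`.

    `nthreads`, if given, becomes the optional thread-count operand.
    """
    if not 0 <= idx < MAX_BARRIERS:
        raise ValueError("barrier index must be in 0..15, got %r" % (idx,))
    if nthreads is None:
        return "bar.sync %d;" % idx
    if nthreads <= 0 or nthreads % WARP_SIZE != 0:
        raise ValueError("thread count must be a positive multiple of the warp size")
    return "bar.sync %d, %d;" % (idx, nthreads)
```

For example, `bar_sync(BarrierAllocator().alloc(), 64)` produces `bar.sync 1, 64;`, since index 0 is reserved by default.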