PyCUDA implementation of a GPU-accelerated fractal flame renderer.
Steven Robertson a439bf671d Fix occupancy issues (1 block/SM when shuffle was on).
There are 16 bar.sync() registers available per *chip*, not per block, and I
was using number 8 in the shuffle code. Evidently the driver rewrites them per
SM, but does not compact their range. Good to know.
2010-09-12 11:09:47 -04:00
cuburn Fix occupancy issues (1 block/SM when shuffle was on). 2010-09-12 11:09:47 -04:00
helpers Experiments with larger CTAs for IterThread 2010-09-12 02:01:03 -04:00
bench.py Fixed bench.py, with the help of Device Assertions™!* 2010-09-11 00:16:43 -04:00
main.py A fake log filter stage while I work on other stuff 2010-09-12 02:32:03 -04:00
TODO Lots-o-stuff. 2010-09-09 11:36:14 -04:00
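
The occupancy drop described in the latest commit message can be illustrated with a toy model. This is only a sketch under the assumptions stated in that message: 16 barrier IDs shared by all blocks resident on an SM, and a driver that reserves the full range 0..N for a block whose highest barrier ID is N, without compacting it. The function name `max_blocks_per_sm` and the constant `BARRIERS_PER_SM` are hypothetical and are not part of cuburn or PyCUDA.

```python
# Toy occupancy model based only on the commit message's description.
# Assumption: an SM has 16 barrier IDs in total, and the driver reserves
# IDs 0..highest_id for each resident block without compacting the range.
BARRIERS_PER_SM = 16  # figure quoted in the commit message


def max_blocks_per_sm(highest_barrier_id: int) -> int:
    """Upper bound on resident blocks per SM imposed by barrier IDs alone."""
    if not 0 <= highest_barrier_id < BARRIERS_PER_SM:
        raise ValueError("barrier ID must be in 0..15")
    reserved_per_block = highest_barrier_id + 1
    return BARRIERS_PER_SM // reserved_per_block


if __name__ == "__main__":
    # Compare a kernel that only uses barrier 0 or 1 with one that uses 8.
    for barrier_id in (0, 1, 8):
        print(barrier_id, max_blocks_per_sm(barrier_id))
```

Under this model, using barrier 8 reserves nine IDs per block, so only one block fits per SM, which matches the symptom in the commit title. Registers, shared memory, and the hardware block limit cap occupancy independently, so the real figure may be lower than this bound.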