cuburn/cuburnlib
Steven Robertson 094890c324 Use shared memory for iter_count and have each CP processed by only one CTA.
Slower, but the code is a bit simpler conceptually, and the difference will be
more than accounted for by better scheduling towards the end of the process.
2010-09-07 14:54:50 -04:00
__init__.py Splitting things up a bit 2010-08-28 16:56:05 -04:00
cuda.py Use shared memory for iter_count and have each CP processed by only one CTA. 2010-09-07 14:54:50 -04:00
device_code.py Use shared memory for iter_count and have each CP processed by only one CTA. 2010-09-07 14:54:50 -04:00
ptx.py Use shared memory for iter_count and have each CP processed by only one CTA. 2010-09-07 14:54:50 -04:00
render.py Use shared memory for iter_count and have each CP processed by only one CTA. 2010-09-07 14:54:50 -04:00