Steven Robertson
6eaa80be7a
Added property ctx.warps_per_cta
2010-09-10 12:53:40 -04:00
Steven Robertson
094890c324
Use shared memory for iter_count and have each CP processed by only one CTA.
Slower, but the code is a bit simpler conceptually, and the difference will be
more than accounted for by better scheduling towards the end of the process.
2010-09-07 14:54:50 -04:00
Steven Robertson
aa065dc25d
Add the first of many microbenchmarks
2010-09-07 12:44:12 -04:00
Steven Robertson
f3298e0bed
Finally runs again
2010-09-06 11:18:20 -04:00
Steven Robertson
a23a493d68
Formatter improvements
2010-09-02 16:12:22 -04:00
Steven Robertson
32f68ea1d5
Remove some dead code
2010-09-01 22:46:55 -04:00
Steven Robertson
a3660ec6e4
PTX DSL working, at least well enough to pass MWCRNGTest
2010-09-01 21:09:40 -04:00
Steven Robertson
cceb75396f
Before I rip out tempita and start a DSL
2010-08-30 14:45:44 -04:00
Steven Robertson
0c78e972b1
Splitting things up a bit
2010-08-28 16:56:05 -04:00