Steven Robertson
97180003a4
Broken: Variations, CP stream implemented
2010-10-09 11:18:58 -04:00
Steven Robertson
576d2fa683
Switch to pyptx.
2010-10-07 11:21:43 -04:00
Steven Robertson
c0e3c1d599
Known broken checkin because I'm nervous.
2010-10-01 01:20:20 -04:00
Steven Robertson
b938c320a8
Last touchups before ripping out the DSL
2010-09-13 12:22:08 -04:00
Steven Robertson
e4aac6993f
A few touchups
2010-09-13 00:20:15 -04:00
Steven Robertson
e0b218feba
A new (somewhat experimental) approach to fusing
2010-09-12 23:45:38 -04:00
Steven Robertson
5a5fcf5bb9
Fix the unbelievably stupid bug I've been chasing for days.
2010-09-12 18:42:52 -04:00
Steven Robertson
2f48d01aa9
Fix linear variation typo
2010-09-12 17:38:51 -04:00
Steven Robertson
5c5122e8c8
Optimization doubles performance... but breaks the output (even more)
2010-09-12 17:17:08 -04:00
Steven Robertson
3e4e1d88a2
Allow device call exceptions to propagate after cleanup
2010-09-12 16:22:56 -04:00
Steven Robertson
70ca6d7729
Fix RNG test
2010-09-12 16:22:22 -04:00
Steven Robertson
a6141f492d
A byte is *8* bits
2010-09-12 15:48:31 -04:00
Steven Robertson
7ef0d334ca
...except I missed the file that actually contained the new method
2010-09-12 14:06:07 -04:00
Steven Robertson
6ed8907fcb
LaunchContext.get_per_thread
2010-09-12 13:45:55 -04:00
Steven Robertson
3265982fec
Change 'ctx.threads' to 'ctx.nthreads', as it should have been from the start
2010-09-12 11:13:53 -04:00
Steven Robertson
a439bf671d
Fix occupancy issues (1 block/SM when shuffle was on).
There are 16 bar.sync() registers available per *chip*, not per block, and I
was using number 8 in the shuffle code. Evidently the driver rewrites them per
SM, but does not compact their range. Good to know.
2010-09-12 11:09:47 -04:00
Steven Robertson
c13f6a06cf
Experiments with larger CTAs for IterThread
2010-09-12 02:01:03 -04:00
Steven Robertson
e2b1c161cf
More readable memory allocations
2010-09-12 01:13:22 -04:00
Steven Robertson
802ca1d585
Allow swapping out store methods for easier testing of performance
2010-09-12 01:09:04 -04:00
Steven Robertson
f368a99a16
Shuffle points between threads of a CTA
2010-09-12 00:17:18 -04:00
Steven Robertson
40a5ceafde
Use a somewhat better writeback mechanism for now
2010-09-12 00:16:35 -04:00
Steven Robertson
aa688564f1
Add Timeouter, for timing out infinite loops so data can be recovered.
2010-09-11 13:18:40 -04:00
Steven Robertson
a5d7c2cc1a
Use variations. This works, but is still fragile.
2010-09-11 13:15:36 -04:00
Steven Robertson
860d7b2fad
Add xforms and variations.
2010-09-11 13:10:41 -04:00
Steven Robertson
56404b629f
Add device assertions to standard library.
2010-09-11 00:12:02 -04:00
Steven Robertson
3932412539
Test to make sure floating point numbers were in the right range.
2010-09-10 19:36:39 -04:00
Steven Robertson
e71a8422e5
Make store_per_thread reuse gtid in multiple calls when possible
2010-09-10 18:45:32 -04:00
Steven Robertson
943e92b80c
Use pycuda SourceModule to work around crashes, and a few invocation touchups.
2010-09-10 18:02:37 -04:00
Steven Robertson
c3d12d07c2
Fix MWCRNGTest.
2010-09-10 18:01:50 -04:00
Steven Robertson
36f1c1c056
Rename "cuburnlib" (stupid) to "cuburn" (stupid but shorter)
--HG--
rename : cuburnlib/__init__.py => cuburn/__init__.py
rename : cuburnlib/cuda.py => cuburn/cuda.py
rename : cuburnlib/device_code.py => cuburn/device_code.py
rename : cuburnlib/ptx.py => cuburn/ptx.py
rename : cuburnlib/render.py => cuburn/render.py
2010-09-10 14:48:34 -04:00