b938c320a8 
					 
					
						
						
							
							Last touchups before ripping out the DSL  
						
						 
						
						
						
						
					 
					
						2010-09-13 12:22:08 -04:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						e4aac6993f 
					 
					
						
						
							
							A few touchups  
						
						 
						
						
						
						
					 
					
						2010-09-13 00:20:15 -04:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						e0b218feba 
					 
					
						
						
							
							A new (somewhat experimental) approach to fusing  
						
						 
						
						
						
						
					 
					
						2010-09-12 23:45:38 -04:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						5a5fcf5bb9 
					 
					
						
						
							
							Fix the unbelievably stupid bug I've been chasing for days.  
						
						 
						
						
						
						
					 
					
						2010-09-12 18:42:52 -04:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						2f48d01aa9 
					 
					
						
						
							
							Fix linear variation typo  
						
						 
						
						
						
						
					 
					
						2010-09-12 17:38:51 -04:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						5c5122e8c8 
					 
					
						
						
							
							Optimization doubles performance... but breaks the output (even more)  
						
						 
						
						
						
						
					 
					
						2010-09-12 17:17:08 -04:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						d01de61952 
					 
					
						
						
							
							Simultaneous occupancy microbenchmark  
						
						 
						
						
						
						
					 
					
						2010-09-12 16:23:24 -04:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						3e4e1d88a2 
					 
					
						
						
							
							Allow device call exceptions to propagate after cleanup  
						
						 
						
						
						
						
					 
					
						2010-09-12 16:22:56 -04:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						70ca6d7729 
					 
					
						
						
							
							Fix RNG test  
						
						 
						
						
						
						
					 
					
						2010-09-12 16:22:22 -04:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						a6141f492d 
					 
					
						
						
							
							A byte is *8* bits  
						
						 
						
						
						
						
					 
					
						2010-09-12 15:48:31 -04:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						7ef0d334ca 
					 
					
						
						
							
							...except I missed the file that actually contained the new method  
						
						 
						
						
						
						
					 
					
						2010-09-12 14:06:07 -04:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						6ed8907fcb 
					 
					
						
						
							
							LaunchContext.get_per_thread  
						
						 
						
						
						
						
					 
					
						2010-09-12 13:45:55 -04:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						3265982fec 
					 
					
						
						
							
							Change 'ctx.threads' to 'ctx.nthreads', as it should have been from the start  
						
						 
						
						
						
						
					 
					
						2010-09-12 11:13:53 -04:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						a439bf671d 
					 
					
						
						
							
							Fix occupancy issues (1 block/SM when shuffle was on).  
						
						 
						
						
						
						
						There are 16 bar.sync() registers available per *chip*, not per block, and I
was using number 8 in the shuffle code. Evidently the driver rewrites them per
SM, but does not compact their range. Good to know. 
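
One possible reading of the note above, offered as a rough sketch rather than a statement of how the hardware or driver actually allocates barriers: if the 16 barriers are shared by all blocks resident on an SM, and a block that uses barrier number 8 has the whole range 0..8 reserved because the range is not compacted, then only one such block fits per SM. The function name and the reservation model below are assumptions made for illustration; they do not come from the commit.

```python
def max_blocks_per_sm(highest_barrier_id, barriers_per_sm=16):
    """Estimate how many blocks fit on one SM when limited only by barriers.

    Assumed model (hypothetical): a block that uses barrier N reserves the
    whole range 0..N, i.e. N + 1 barriers, because the range is not compacted.
    """
    reserved_per_block = highest_barrier_id + 1
    return barriers_per_sm // reserved_per_block


# Using barrier number 8 reserves 9 of the 16 barriers per block.
print(max_blocks_per_sm(8))
# Using barrier number 1 instead would reserve only 2 per block.
print(max_blocks_per_sm(1))
```

Under this model, moving the shuffle code to a low-numbered barrier removes barriers as the limiting factor on occupancy, which is consistent with the "1 block/SM" symptom named in the commit title.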
						
						
					 
					
						2010-09-12 11:09:47 -04:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						ce0571deaf 
					 
					
						
						
							
							A fake log filter stage while I work on other stuff  
						
						 
						
						
						
						
					 
					
						2010-09-12 02:32:03 -04:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						c13f6a06cf 
					 
					
						
						
							
							Experiments with larger CTAs for IterThread  
						
						 
						
						
						
						
					 
					
						2010-09-12 02:01:03 -04:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						e2b1c161cf 
					 
					
						
						
							
							More readable memory allocations  
						
						 
						
						
						
						
					 
					
						2010-09-12 01:13:22 -04:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						802ca1d585 
					 
					
						
						
							
							Allow swapping out store methods for easier testing of performance  
						
						 
						
						
						
						
					 
					
						2010-09-12 01:09:04 -04:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						f368a99a16 
					 
					
						
						
							
							Shuffle points between threads of a CTA  
						
						 
						
						
						
						
					 
					
						2010-09-12 00:17:18 -04:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						40a5ceafde 
					 
					
						
						
							
							Use a somewhat better writeback mechanism for now  
						
						 
						
						
						
						
					 
					
						2010-09-12 00:16:35 -04:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						aa688564f1 
					 
					
						
						
							
							Add Timeouter, for timing out infinite loops so data can be recovered.  
						
						 
						
						
						
						
					 
					
						2010-09-11 13:18:40 -04:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						a5d7c2cc1a 
					 
					
						
						
							
							Use variations. This works, but is still fragile.  
						
						 
						
						
						
						
					 
					
						2010-09-11 13:15:36 -04:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						860d7b2fad 
					 
					
						
						
							
							Add xforms and variations.  
						
						 
						
						
						
						
					 
					
						2010-09-11 13:10:41 -04:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						383c0f1f9a 
					 
					
						
						
							
							Fixed bench.py, with the help of Device Assertions™!*  
						
						 
						
						
						
						
						* Not actually a trademark 
						
						
					 
					
						2010-09-11 00:16:43 -04:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						56404b629f 
					 
					
						
						
							
							Add device assertions to standard library.  
						
						 
						
						
						
						
					 
					
						2010-09-11 00:12:02 -04:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						3932412539 
					 
					
						
						
							
							Test to make sure floating point numbers were in the right range.  
						
						 
						
						
						
						
					 
					
						2010-09-10 19:36:39 -04:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						e71a8422e5 
					 
					
						
						
							
							Make store_per_thread reuse gtid in multiple calls when possible  
						
						 
						
						
						
						
					 
					
						2010-09-10 18:45:32 -04:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						943e92b80c 
					 
					
						
						
							
							Use pycuda SourceModule to work around crashes, and a few invocation touchups.  
						
						 
						
						
						
						
					 
					
						2010-09-10 18:02:37 -04:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						c3d12d07c2 
					 
					
						
						
							
							Fix MWCRNGTest.  
						
						 
						
						
						
						
					 
					
						2010-09-10 18:01:50 -04:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						36f1c1c056 
					 
					
						
						
							
							Rename "cuburnlib" (stupid) to "cuburn" (stupid but shorter)  
						
						 
						
						
						
						
						--HG--
rename : cuburnlib/__init__.py => cuburn/__init__.py
rename : cuburnlib/cuda.py => cuburn/cuda.py
rename : cuburnlib/device_code.py => cuburn/device_code.py
rename : cuburnlib/ptx.py => cuburn/ptx.py
rename : cuburnlib/render.py => cuburn/render.py 
						
						
					 
					
						2010-09-10 14:48:34 -04:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						4552589b35 
					 
					
						
						
							
							Refactor call() to be more elegant  
						
						 
						
						
						
						
					 
					
						2010-09-10 14:43:20 -04:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						fb4e5b75e9 
					 
					
						
						
							
							Add support for writing float literals in store_per_thread  
						
						 
						
						
						
						
					 
					
						2010-09-10 14:33:56 -04:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						6eaa80be7a 
					 
					
						
						
							
							Added property ctx.warps_per_cta  
						
						 
						
						
						
						
					 
					
						2010-09-10 12:53:40 -04:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						2f3ac42153 
					 
					
						
						
							
							Improved DataStream record format  
						
						 
						
						
						
						
					 
					
						2010-09-10 12:53:20 -04:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						7e0d36af7d 
					 
					
						
						
							
							Add performance tuning (maxnregs) to entries  
						
						 
						
						
						
						
					 
					
						2010-09-10 12:52:47 -04:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						086e4e4fb4 
					 
					
						
						
							
							Lots-o-stuff.  
						
						 
						
						
						
						
					 
					
						2010-09-09 11:36:14 -04:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						1f7b00b61e 
					 
					
						
						
							
							instmethod decorator: another hack (to get around ctx.ptx.instances[])  
						
						 
						
						
						
						
					 
					
						2010-09-08 13:12:46 -04:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						094890c324 
					 
					
						
						
							
							Use shared memory for iter_count and have each CP processed by only one CTA.  
						
						 
						
						
						
						
						Slower, but the code is a bit simpler conceptually, and the difference will be
more than accounted for by better scheduling towards the end of the process. 
						
						
					 
					
						2010-09-07 14:54:50 -04:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						aa065dc25d 
					 
					
						
						
							
							Add the first of many microbenchmarks  
						
						 
						
						
						
						
					 
					
						2010-09-07 12:44:12 -04:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						db72a7d496 
					 
					
						
						
							
							Allow register local name rebinding  
						
						 
						
						
						
						
					 
					
						2010-09-06 16:50:54 -04:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						e03f20392d 
					 
					
						
						
							
							Switch from to_inject() to object insertion. One less kludge to deal with.  
						
						 
						
						
						
						
					 
					
						2010-09-06 16:09:37 -04:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						ada0fe20c7 
					 
					
						
						
							
							Random floats (I think)  
						
						 
						
						
						
						
					 
					
						2010-09-06 14:19:06 -04:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						f3298e0bed 
					 
					
						
						
							
							Finally runs again  
						
						 
						
						
						
						
					 
					
						2010-09-06 11:18:20 -04:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						27e7fd82a3 
					 
					
						
						
							
							Time to go have nightmares about this code again (no really)  
						
						 
						
						
						
						
					 
					
						2010-09-03 00:52:27 -04:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						2c26ff9ab6 
					 
					
						
						
							
							* Fix deptrace typos  
						
						 
						
						
						
						
						* Add predicate support to DeviceStream fetches, making them even uglier
* Add `store_per_thread` to PTX stdlib 
						
						
					 
					
						2010-09-03 00:51:23 -04:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						a68fc064a1 
					 
					
						
						
							
							Added TODO, because even in my nightmares I underestimate how much is left  
						
						 
						
						
						
						
					 
					
						2010-09-03 00:08:58 -04:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						7e7fbda2cc 
					 
					
						
						
							
							Start of render module  
						
						 
						
						
						
						
					 
					
						2010-09-02 17:26:16 -04:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						a23a493d68 
					 
					
						
						
							
							Formatter improvements  
						
						 
						
						
						
						
					 
					
						2010-09-02 16:12:22 -04:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						731c637f80 
					 
					
						
						
							
							DataStream. Completely untested. I want to see the bugfixes in the log.  
						
						 
						
						
						
						
					 
					
						2010-09-02 16:11:44 -04:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						bf79dc7fa0 
					 
					
						
						
							
							Adjust softjoin and vec to be easier to format  
						
						 
						
						
						
						
					 
					
						2010-09-02 15:05:24 -04:00