Commit Graph

104 Commits

Author SHA1 Message Date
Steven Robertson
29c595ddc5 Move most warning/info statements to stderr 2017-05-15 12:00:11 -07:00
Steven Robertson
7dc58a0e1c Grow launch sizes and synchronize if they pile up 2017-05-15 00:43:10 -07:00
Steven Robertson
f58289af53 Hotspot writeback. 10x performance increase.
Create a map assigning two bits to every output bin. During the atomic
flush, compute a threshold for discarding writes altogether that would
keep us under 2% error - discard 1 of every 2 writes if we've already
accumulated 64 writes (hotspot value 1), 7 of 8 if we're above 256
(hotspot value 2), or 31 of 32 at 2048 (hotspot value 3). Pack this
value into a read-only buffer that can often be cached at L2, and for
particularly concentrated flames (which historically choke cuburn), L1.
During writeback, discard writes at the appropriate rate. During the
flush of the integer accumulator to the float, scale the integer
accumulators by the discard rate.

This works because for most flames, there's not a lot of interesting
stuff in the middle regimes; either stuff is very well defined, in which
case we pretty much know exactly what the color is going to be
(remember, the max 2% relative error gets log-scaled as well), or it's
loosely defined so we should keep it at full accuracy.

Of course, a 10x boost is best-case-ish - a long, high-res render. I
realized though that I really didn't care about low quality stuff and
should go for broke optimizing this for my use case, which is
ridiculously high res HDR stuff. (On pathological flames, on the other
hand, 10x is conservative; this easily gives us 100x.)
2017-05-09 21:16:43 -07:00
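
The hotspot scheme described in f58289af53 can be sketched on the host in Python. This is only an illustration of the arithmetic, not cuburn's CUDA code: the thresholds (64, 256, 2048) and discard rates (1 of 2, 7 of 8, 31 of 32) come from the commit message, while every function and constant name is hypothetical. The use of `>=` at each threshold, the packing of four two-bit values per byte, and the use of a random draw to pick which writes survive are all assumptions.

```python
import random

# Values from the commit message; all names here are hypothetical.
THRESHOLDS = (64, 256, 2048)   # accumulated writes needed for hotspot values 1..3
KEEP_ONE_IN = (1, 2, 8, 32)    # value 0 keeps every write; 1, 2, 3 keep 1/2, 1/8, 1/32


def hotspot_level(count):
    """Return the two-bit hotspot value for a bin holding `count` writes."""
    level = 0
    for threshold in THRESHOLDS:
        if count >= threshold:  # assumption: the threshold itself counts
            level += 1
    return level


def pack_levels(levels):
    """Pack two-bit hotspot values, four per byte (assumed layout)."""
    packed = bytearray((len(levels) + 3) // 4)
    for i, level in enumerate(levels):
        packed[i >> 2] |= (level & 3) << ((i & 3) * 2)
    return packed


def unpack_level(packed, i):
    """Read back the two-bit hotspot value for bin `i`."""
    return (packed[i >> 2] >> ((i & 3) * 2)) & 3


def write_back(n_writes, level, rng):
    """Count the writes that survive discarding at a fixed hotspot value."""
    keep = KEEP_ONE_IN[level]
    return sum(1 for _ in range(n_writes) if rng.randrange(keep) == 0)


def flush_to_float(int_count, level):
    """Scale the integer accumulator by the discard rate during the flush."""
    return float(int_count * KEEP_ONE_IN[level])


if __name__ == "__main__":
    rng = random.Random(0)
    kept = write_back(100_000, 3, rng)
    print(kept, flush_to_float(kept, 3))  # the estimate should land near 100000
```

Because each surviving write stands in for the ones discarded alongside it, the scaled total stays an unbiased estimate of the true count, while the number of atomic writes to a hot bin drops by up to 32x.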
Steven Robertson
8f21ffd4c3 Add right buffer. 2x allocation of uchar buffer. 2017-05-02 00:11:03 -07:00
Steven Robertson
3a3b3b33d1 Rename d_side to d_left (to add d_right later) 2017-05-02 00:07:08 -07:00
Steven Robertson
9892acbc7f Populate arch by default; add --keep 2017-04-24 16:39:15 -07:00
Steven Robertson
bdcaca1f97 Initial draft of hotspot deferral.
Build an array of one-bit flags for every pixel (addressed as u32 data).
If we have accumulated at least 64 points for that pixel, set the flag;
thereafter only write 1/16 (and multiply subsequent points that do get
written by 16).

The theory is, after 64 points, the color is pretty much locked in; this
lets us crank SPP up to get excellent coverage in dark areas but the
bright ones don't matter so much since they're fully resolved. Still
needs a lot of tuning to get peak performance, and the trigger threshold
may need to be scaled along with the render size. It also will likely
not scale as well to higher resolutions, because we rely on L2 cache to
make this fast.
2017-04-24 16:33:39 -07:00
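
The one-bit flag array from bdcaca1f97 can be sketched the same way. The 64-point trigger, the 1/16 write rate, and the 16x weight come from the commit message; the helper names, the bit order within each 32-bit word, and the use of a simple counter to choose which point in 16 gets written are assumptions made for illustration.

```python
POINT_THRESHOLD = 64  # points accumulated before a pixel is flagged (from the commit)
KEEP_ONE_IN = 16      # flagged pixels take 1 write in 16, weighted by 16


def make_flags(n_pixels):
    """Allocate one bit per pixel, stored as 32-bit words."""
    return [0] * ((n_pixels + 31) // 32)


def is_hot(flags, pixel):
    """Test the flag bit for `pixel` (word = pixel / 32, bit = pixel % 32)."""
    return ((flags[pixel >> 5] >> (pixel & 31)) & 1) == 1


def set_hot(flags, pixel):
    """Set the flag bit for `pixel`."""
    flags[pixel >> 5] |= 1 << (pixel & 31)


def update_flags(flags, counts):
    """Flag every pixel that has accumulated at least POINT_THRESHOLD points."""
    for pixel, count in enumerate(counts):
        if count >= POINT_THRESHOLD:
            set_hot(flags, pixel)


def write_point(accum, flags, pixel, sequence):
    """Accumulate one point; `sequence` is a hypothetical per-writer counter."""
    if not is_hot(flags, pixel):
        accum[pixel] += 1
    elif sequence % KEEP_ONE_IN == 0:
        accum[pixel] += KEEP_ONE_IN  # one weighted write stands in for 16
    # otherwise the write is dropped


if __name__ == "__main__":
    counts = [0] * 70
    counts[33] = 64
    flags = make_flags(len(counts))
    update_flags(flags, counts)
    accum = [0] * len(counts)
    for s in range(32):
        write_point(accum, flags, 33, s)  # flagged: 2 weighted writes
        write_point(accum, flags, 5, s)   # unflagged: 32 plain writes
    print(accum[33], accum[5])
```

Both pixels end up with the same total, but the flagged one reaches it with one sixteenth as many writes, which is where the saving on bright regions would come from.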
Steven Robertson
abcb3fa50f Look up renderers by name, rather than position 2015-10-10 15:58:36 -07:00
Steven Robertson
8da1821616 Bump gutter to 12px to align reads 2014-12-25 15:04:31 -08:00
Steven Robertson
d832b6bc98 Updates for the modern era 2013-12-21 15:32:37 -08:00
Steven Robertson
3294ba10d6 Support x264 10-bit output format. 2012-07-22 15:53:38 -07:00
Steven Robertson
bb852ff255 Make dist worker work with pipelining 2012-07-04 23:54:22 -07:00
Steven Robertson
8c7db9d0fc Changes to CUDA module loading
Modules may (once again) be compiled and loaded in separate stages,
including compiling without having a CUDA context on hand. Also, modules
will be reused if they are already loaded.
2012-05-20 13:05:28 -07:00
Steven Robertson
31234b986e Add new SmearClip filter, and make it the default.
Also removes haloclip's separate gamma; instead it will use colorclip's
gamma setting.

Also expanded side buffer to full size.
2012-04-16 01:30:17 -07:00
Steven Robertson
08d33ea593 Allow for customized blur width.
Also moves host pool to framebuffer for use by filters.
2012-04-16 01:25:34 -07:00
Steven Robertson
44869cc9ea Remove stray debugging statements 2012-04-14 23:42:38 -07:00
Steven Robertson
a4178c60fb Update the genome specs a bit 2012-04-14 22:55:00 -07:00
Steven Robertson
b53f703e6e Checkpoint! Renders again. Many fixes outstanding. 2012-04-10 08:44:25 -07:00
Steven Robertson
5d3b290c43 Fix typo 2012-03-18 17:29:34 -07:00
Steven Robertson
e726511c5a Fix print_interp_knots debugging helper 2012-03-16 20:51:50 -07:00
Steven Robertson
5a91d9f96c Hang on to old modules to avoid syncing 2012-02-15 10:07:47 -05:00
Steven Robertson
b6dfd2d980 Fix (same) logic error in RenderManager.render() 2012-02-14 10:30:34 -08:00
Steven Robertson
60a45c9a20 Sweeping refactor. More bugs undoubtedly remain. 2012-02-14 07:40:58 -08:00
Steven Robertson
6fba14e2f7 Okay, now I'm satisfied. 2012-01-29 18:49:19 -05:00
Steven Robertson
b4132c7cd9 Absurdly complicated enhancements to filtering. 2012-01-22 23:57:03 -05:00
Steven Robertson
c054c757bd Limit the maximum number of separate xf buffers 2012-01-22 23:52:09 -05:00
Steven Robertson
45b75d3fa5 Experimental bilateral filtering. 2012-01-21 00:06:15 -05:00
Steven Robertson
a803216551 Move argset to code.util 2012-01-21 00:03:28 -05:00
Steven Robertson
acbde65b9f Don't call set_format after set_address_2d 2012-01-20 11:22:27 -05:00
Steven Robertson
1398706886 Remove SS from DE, and improve performance. 2012-01-20 11:17:07 -05:00
Steven Robertson
8c29212821 Experimental supersampling and DE changes 2012-01-09 21:15:05 -05:00
Steven Robertson
de56383a61 Add new palette modes; use 'yuv' by default. 2011-12-23 09:50:03 -05:00
Steven Robertson
09725ba794 Correct dither fail. 2011-12-21 11:59:40 -05:00
Steven Robertson
529bf48982 Use functions for palette instead of silly objects 2011-12-17 18:45:33 -05:00
Steven Robertson
3b29bb2dc2 Drop stale fr0stlib dependency 2011-12-17 17:24:32 -05:00
Steven Robertson
c80b8a07a7 Another incompatible update to the genome format 2011-12-17 09:23:39 -05:00
Steven Robertson
b43481e374 New genome format to support flockutil 2011-12-15 11:11:05 -05:00
Steven Robertson
6c50e6dadc New atomic write mode 2011-12-10 12:18:00 -05:00
Steven Robertson
c5da1efc74 Lockless lossy shared memory writeback.
Barely tested! And yet it's going straight into master. Lucky you!
2011-12-09 16:13:23 -05:00
Steven Robertson
6bac3b3a95 Use reordered, lossy bit handling 2011-12-09 14:14:36 -05:00
Steven Robertson
d3ee6f36c2 Flat (pre-packed int) palettes in deferred mode. 2011-12-08 20:55:07 -05:00
Steven Robertson
b76208078f Deferred works again. Time to break it. 2011-12-08 15:28:10 -05:00
Steven Robertson
e106524701 Fix code in comment 2011-12-08 13:24:10 -05:00
Steven Robertson
b73461132c Use consts for image size instead of immediates.
This saves us from having to recompile if the frame size changes.
2011-12-08 12:07:22 -05:00
Steven Robertson
24c0c8ee56 Fix some color foibles (more yet remain) 2011-11-12 10:42:02 -05:00
Steven Robertson
eb43b151dc Deferred writeback. 2011-11-11 17:37:27 -05:00
Steven Robertson
3147fd40d2 Support CUDA 4.1. Split filtering into new module.
The new toolkit generates code for filtering which uses too many
registers, so this change splits filtering into its own module so that
it can have separate register usage limits during compiling. As a bonus,
this should improve startup time in general, since the filtering code
is now fixed and does not need to be recompiled.
2011-11-08 14:38:45 -05:00
Steven Robertson
185823ad55 Rearrange the main render loop... again.
Using one stream with two pagelocked host buffers allows us to keep the
GPU work queue full without pegging the CPU, and also reduces the
incidences where a host buffer will get overwritten before it can be
written. devtid() was flaky, so this patch also introduces a ringbuffer
to handle the 'slots' concept. It also introduces an adaptive number of
temporal samples, which improves efficiency but also killed the
assumption that (ntemporal_samples % 256 == 0), which required some
additional fixes.
2011-10-28 08:30:36 -04:00
Steven Robertson
f3a79b200c New badvals mechanism. 2011-10-27 12:59:58 -04:00
Steven Robertson
1faffa1d14 'fill_dptr' instead of 'zero_dptr' 2011-10-27 10:35:01 -04:00