Commit Graph

520 Commits

Author SHA1 Message Date
112a674520 Redesign distribution: now based on ssh, not zmq 2017-05-15 12:04:16 -07:00
c7654357a6 Move naming code into a common place 2017-05-15 12:01:59 -07:00
04702d7903 Add --list-devices option 2017-05-15 12:01:25 -07:00
29c595ddc5 Move most warning/info statements to stderr 2017-05-15 12:00:11 -07:00
9bcfc36b7a Retrieve out suffix without creating a renderer 2017-05-15 11:56:37 -07:00
636efcd059 Drop GL mode in main.py; sleep to reduce load 2017-05-15 00:44:15 -07:00
7dc58a0e1c Grow launch sizes and synchronize if they pile up 2017-05-15 00:43:10 -07:00
5402838a74 Disable ill-thought-through form of antialiasing 2017-05-15 00:41:30 -07:00
3528cd1da4 Force use of clang for compilation for Debian 2017-05-15 00:38:52 -07:00
f58289af53 Hotspot writeback. 10x performance increase.
Create a map assigning two bits to every output bin. During the atomic
flush, compute a threshold for discarding writes altogether that would
keep us under 2% error - discard 1 of every 2 writes if we've already
accumulated 64 writes (hotspot value 1), 7 of 8 if we're above 256
(hotspot value 2), or 31 of 32 at 2048 (hotspot value 3). Pack this
value into a read-only buffer that can often be cached at L2, and for
particularly concentrated flames (which historically choke cuburn), L1.
During writeback, discard writes at the appropriate rate. During the
flush of the integer accumulator to the float, scale the integer
accumulators by the discard rate.

This works because for most flames, there's not a lot of interesting
stuff in the middle regimes; either stuff is very well defined, in which
case we pretty much know exactly what the color is going to be
(remember, the max 2% relative error gets log-scaled as well), or it's
loosely defined so we should keep it at full accuracy.

Of course, a 10x boost is best-case-ish - a long, high-res render. I
realized though that I really didn't care about low quality stuff and
should go for broke optimizing this for my use case, which is
ridiculously high res HDR stuff. (On pathological flames, on the other
hand, 10x is conservative; this easily gives us 100x.)
2017-05-09 21:16:43 -07:00
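
The thresholds and discard rates described in the commit above can be sketched on the host side as follows. This is a minimal illustration, not the actual cuburn kernel code: every function name here is hypothetical, the layout of 16 two-bit values per 32-bit word is an assumption (the message only says two bits per bin), comparisons are assumed to be "at or above" each threshold, and a random draw stands in for whatever the real writeback uses to choose which writes to discard.

```python
import random

# Keep one of every N writes, indexed by the two-bit hotspot value (0..3).
# Rates come from the commit message: 1 of 2, 1 of 8, 1 of 32.
KEEP_ONE_IN = (1, 2, 8, 32)
# Accumulated-write counts at which hotspot values 1, 2 and 3 kick in.
THRESHOLDS = (64, 256, 2048)


def hotspot_value(accumulated_writes):
    """Return the hotspot value (0..3) for a bin (assumes >= comparisons)."""
    value = 0
    for level, threshold in enumerate(THRESHOLDS, start=1):
        if accumulated_writes >= threshold:
            value = level
    return value


def pack_hotspots(values):
    """Pack two-bit hotspot values, 16 per 32-bit word (assumed layout)."""
    words = [0] * ((len(values) + 15) // 16)
    for i, v in enumerate(values):
        words[i // 16] |= (v & 3) << (2 * (i % 16))
    return words


def lookup_hotspot(words, bin_index):
    """Read one bin's two-bit hotspot value back out of the packed map."""
    return (words[bin_index // 16] >> (2 * (bin_index % 16))) & 3


def should_write(hotspot, rng):
    """Decide whether a write survives; random selection is an assumption."""
    return rng.randrange(KEEP_ONE_IN[hotspot]) == 0


def flush_to_float(int_accum, hotspot):
    """Scale the integer accumulator to compensate for discarded writes."""
    return float(int_accum * KEEP_ONE_IN[hotspot])
```

Because the surviving writes are scaled by the inverse of the keep rate at flush time, the expected value of each bin is unchanged; only its variance grows, which is where the error bound in the message comes in.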
0bcde947b5 Go to 1024 contexts on Pascal 2017-05-09 21:15:03 -07:00
d1502e3b79 rings2 is not identity at high precision 2017-05-09 21:09:58 -07:00
d759d675be Always flush status lines 2017-05-09 21:09:40 -07:00
5af90b01a2 Fix a silly 'except e' (too much yavascrip in my life) 2017-05-09 21:09:00 -07:00
8fe4fbec1c Use yield scheduling to reduce CPU load 2017-05-09 21:07:58 -07:00
77afb2f4b5 Turns out spread is period 180 but spin 1/2 2017-05-09 19:59:21 -07:00
8f21ffd4c3 Add right buffer. 2x allocation of uchar buffer. 2017-05-02 00:11:03 -07:00
3a3b3b33d1 Rename d_side to d_left (to add d_right later) 2017-05-02 00:07:08 -07:00
f83e36d948 Add prores as an option on the command line 2017-04-24 16:39:44 -07:00
9892acbc7f Populate arch by default; add --keep 2017-04-24 16:39:15 -07:00
582221dd0f Always spit out lineinfo when possible 2017-04-24 16:38:51 -07:00
6bf428caee Move 'mktref' to util 2017-04-24 16:38:39 -07:00
bdcaca1f97 Initial draft of hotspot deferral.
Build an array of one-bit flags for every pixel (addressed as u32 data).
If we have accumulated at least 64 points for that pixel, set the flag;
thereafter only write 1/16 (and multiply subsequent points that do get
written by 16).

The theory is, after 64 points, the color is pretty much locked in; this
lets us crank SPP up to get excellent coverage in dark areas but the
bright ones don't matter so much since they're fully resolved. Still
needs a lot of tuning to get peak performance, and the trigger threshold
may need to be scaled along with the render size. It also will likely
not scale as well to higher resolutions, because we rely on L2 cache to
make this fast.
2017-04-24 16:33:39 -07:00
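
As a rough illustration of the one-bit scheme in the commit above, the flag array and the 1/16 write rate might look like the sketch below. The class and method names are hypothetical, the choice of which writes survive is modeled with a seeded random draw (the message does not say how they are picked), and plain Python lists stand in for GPU buffers.

```python
import random

THRESHOLD = 64    # points accumulated before a pixel is flagged
KEEP_ONE_IN = 16  # after flagging, only 1/16 of writes land


class DeferredAccumulator:
    """Toy model of per-pixel hotspot flags packed into 32-bit words."""

    def __init__(self, n_pixels, seed=0):
        self.density = [0.0] * n_pixels
        # One flag bit per pixel, addressed as u32 words.
        self.flags = [0] * ((n_pixels + 31) // 32)
        self.rng = random.Random(seed)

    def is_hot(self, pixel):
        return (self.flags[pixel >> 5] >> (pixel & 31)) & 1

    def add_point(self, pixel, value=1.0):
        if self.is_hot(pixel):
            # Drop 15 of every 16 writes on average (random choice assumed).
            if self.rng.randrange(KEEP_ONE_IN) != 0:
                return
            # Surviving writes carry the weight of the dropped ones.
            value *= KEEP_ONE_IN
        self.density[pixel] += value
        if self.density[pixel] >= THRESHOLD:
            self.flags[pixel >> 5] |= 1 << (pixel & 31)
```

For example, feeding 64 points into one pixel sets its flag, and every later write to that pixel either vanishes or adds 16.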
6b2b72a3fe Remove unused texture reference 2017-04-23 01:15:51 -07:00
c79db04490 Choose a GPU in main.py 2017-04-21 13:08:16 -07:00
b507c9d604 Make tiffs 16-bit using tifffile 2017-04-20 18:22:27 -07:00
c6fcaf472f Be sure to close the output files in main.py. 2017-04-20 17:52:37 -07:00
746aee9a75 Add a 'plainclip' filter.
This is useful for doing color manipulation on output renders, either by
hand in color grading software or automatically using renormalization
based on image statistics.
2017-04-20 17:51:05 -07:00
14f755e434 Add an FFmpeg ProRes handler. 2017-04-20 17:50:33 -07:00
96585e2ca5 Update YUV444p12 to be Rec. 709, studio swing. 2017-04-20 17:42:08 -07:00
f64cf79d8d Fixes to parse all gen198 genomes. Named palettes. 2017-04-20 13:54:46 -07:00
d1228ac303 Add a simple graph-walker for playback 2015-10-26 01:35:09 -07:00
36e3b7aca9 Compiler made register restrictions unnecessary 2015-10-26 01:34:35 -07:00
17b5a1a96f Spit out raw content.
Previewing using an Intensity Pro 4K and secret monitor-sauce.
2015-10-11 00:52:26 -07:00
cfe815f0d6 Min/max YUV output for 444p12.
Again, better solution forthcoming.
2015-10-11 00:51:58 -07:00
5f5e69f3a3 Crank up the dispatch params again after fix. 2015-10-11 00:51:22 -07:00
0e91d01528 Fix the noise issues on Maxwell GPUs.
AAAAUUUUUUGGGGGGHHH
2015-10-11 00:49:37 -07:00
37e6642d37 Add logencode filter. 2015-10-10 18:04:27 -07:00
37245085a9 Add pre-YUV output clamping to YUV444P12.
A better solution would cover all the output transforms, but... later.
2015-10-10 18:03:17 -07:00
0476bbfdce Fix default width/height 2015-10-10 16:01:58 -07:00
f93b4dbf23 Add YUV444P12 support 2015-10-10 16:01:11 -07:00
0ce1b51d16 Convert YUV to RGB before filtering 2015-10-10 15:59:57 -07:00
abcb3fa50f Look up renderers by name, rather than position 2015-10-10 15:58:36 -07:00
698d9c2337 Register filters with a class decorator 2015-10-10 15:58:13 -07:00
227a6016c2 Enable VP9 ARNR 2015-02-15 10:24:23 -08:00
51b1280e1e Better status messages for main.py 2015-02-14 17:51:13 -08:00
e70073175e Fix codec naming issue for jpeg 2015-02-14 17:50:19 -08:00
8dc629d91e Autoselect number of columns to use for VP9 2015-02-14 17:50:03 -08:00
e08444f74b Work around an overflow condition for now.
I'm not sure what's going wrong; the math still holds up at higher
densities, but when you crank up the samples-per-pixel count the
accumulators start overflowing stochastically, and when they do
they dump nonsense into the output. Until I have time, take a small
perf hit by flushing much more often.
2015-02-14 17:48:22 -08:00
8da1821616 Bump gutter to 12px to align reads 2014-12-25 15:04:31 -08:00