518 Commits

Author SHA1 Message Date
Steven Robertson
04702d7903 Add --list-devices option 2017-05-15 12:01:25 -07:00
Steven Robertson
29c595ddc5 Move most warning/info statements to stderr 2017-05-15 12:00:11 -07:00
Steven Robertson
9bcfc36b7a Retrieve out suffix without creating a renderer 2017-05-15 11:56:37 -07:00
Steven Robertson
636efcd059 Drop GL mode in main.py; sleep to reduce load 2017-05-15 00:44:15 -07:00
Steven Robertson
7dc58a0e1c Grow launch sizes and synchronize if they pile up 2017-05-15 00:43:10 -07:00
Steven Robertson
5402838a74 Disable ill-thought-through form of antialiasing 2017-05-15 00:41:30 -07:00
Steven Robertson
3528cd1da4 Force use of clang for compilation for Debian 2017-05-15 00:38:52 -07:00
Steven Robertson
f58289af53 Hotspot writeback. 10x performance increase.
Create a map assigning two bits to every output bin. During the atomic
flush, compute a threshold for discarding writes altogether that would
keep us under 2% error - discard 1 of every 2 writes if we've already
accumulated 64 writes (hotspot value 1), 7 of 8 if we're above 256
(hotspot value 2), or 31 of 32 at 2048 (hotspot value 3). Pack this
value into a read-only buffer that can often be cached at L2, and for
particularly concentrated flames (which historically choke cuburn), L1.
During writeback, discard writes at the appropriate rate. During the
flush of the integer accumulator to the float, scale the integer
accumulators by the discard rate.

This works because for most flames, there's not a lot of interesting
stuff in the middle regimes; either stuff is very well defined, in which
case we pretty much know exactly what the color is going to be
(remember, the max 2% relative error gets log-scaled as well), or it's
loosely defined so we should keep it at full accuracy.

Of course, a 10x boost is best-case-ish - a long, high-res render. I
realized though that I really didn't care about low quality stuff and
should go for broke optimizing this for my use case, which is
ridiculously high res HDR stuff. (On pathological flames, on the other
hand, 10x is conservative; this easily gives us 100x.)
2017-05-09 21:16:43 -07:00
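
The scheme described in the commit message above can be illustrated with a small host-side sketch. This is not the cuburn kernel code: the thresholds (64, 256, 2048) and discard rates (1 of 2, 7 of 8, 31 of 32) come from the message, while the function names, the use of `>=` at each threshold, the four-bins-per-byte packing layout, and the modulo-based discard test are all assumptions made for illustration.

```python
# Thresholds and discard rates as stated in the commit message; everything
# else (names, >= boundaries, 4-bins-per-byte packing) is an assumption.
THRESHOLDS = (64, 256, 2048)   # accumulated writes needed for levels 1..3
KEEP_ONE_IN = (1, 2, 8, 32)    # level 0 keeps every write


def hotspot_level(count):
    """Map a bin's accumulated write count to a 2-bit hotspot value."""
    return sum(count >= t for t in THRESHOLDS)


def pack_levels(levels):
    """Pack 2-bit hotspot values, four bins per byte, low bits first."""
    packed = bytearray((len(levels) + 3) // 4)
    for i, level in enumerate(levels):
        packed[i // 4] |= (level & 0b11) << (2 * (i % 4))
    return bytes(packed)


def unpack_level(packed, i):
    """Read back the hotspot value for bin i from the packed map."""
    return (packed[i // 4] >> (2 * (i % 4))) & 0b11


def keep_write(level, rand_u32):
    """Decide whether a write survives, given a uniform random integer."""
    return rand_u32 % KEEP_ONE_IN[level] == 0


def flush_to_float(int_accum, level):
    """Scale the integer accumulator to compensate for discarded writes."""
    return float(int_accum * KEEP_ONE_IN[level])
```

For example, `pack_levels([hotspot_level(c) for c in (0, 64, 300, 5000, 10)])` stores five bins in two bytes, and a bin at hotspot value 2 that received 10 surviving writes is flushed as `flush_to_float(10, 2)`, i.e. 80.0. The small packed map is what makes the lookup cheap enough to sit in cache, as the message notes.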
Steven Robertson
0bcde947b5 Go to 1024 contexts on Pascal 2017-05-09 21:15:03 -07:00
Steven Robertson
d1502e3b79 rings2 is not identity at high precision 2017-05-09 21:09:58 -07:00
Steven Robertson
d759d675be Always flush status lines 2017-05-09 21:09:40 -07:00
Steven Robertson
5af90b01a2 Fix a silly 'except e' (too much yavascrip in my life) 2017-05-09 21:09:00 -07:00
Steven Robertson
8fe4fbec1c Use yield scheduling to reduce CPU load 2017-05-09 21:07:58 -07:00
Steven Robertson
77afb2f4b5 Turns out spread is period 180 but spin 1/2 2017-05-09 19:59:21 -07:00
Steven Robertson
8f21ffd4c3 Add right buffer. 2x allocation of uchar buffer. 2017-05-02 00:11:03 -07:00
Steven Robertson
3a3b3b33d1 Rename d_side to d_left (to add d_right later) 2017-05-02 00:07:08 -07:00
Steven Robertson
f83e36d948 Add prores as an option on the command line 2017-04-24 16:39:44 -07:00
Steven Robertson
9892acbc7f Populate arch by default; add --keep 2017-04-24 16:39:15 -07:00
Steven Robertson
582221dd0f Always spit out lineinfo when possible 2017-04-24 16:38:51 -07:00
Steven Robertson
6bf428caee Move 'mktref' to util 2017-04-24 16:38:39 -07:00
Steven Robertson
bdcaca1f97 Initial draft of hotspot deferral.
Build an array of one-bit flags for every pixel (addressed as u32 data).
If we have accumulated at least 64 points for that pixel, set the flag;
thereafter only write 1/16 (and multiply subsequent points that do get
written by 16).

The theory is, after 64 points, the color is pretty much locked in; this
lets us crank SPP up to get excellent coverage in dark areas but the
bright ones don't matter so much since they're fully resolved. Still
needs a lot of tuning to get peak performance, and the trigger threshold
may need to be scaled along with the render size. It also will likely
not scale as well to higher resolutions, because we rely on L2 cache to
make this fast.
2017-04-24 16:33:39 -07:00
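
A rough sketch of the one-bit deferral idea from the draft commit above, written as plain Python rather than GPU code. The trigger of 64 points and the 1/16 write rate are taken from the message; the helper names, setting the flag at the moment the count is reached, and the modulo-based discard test are assumptions for illustration only.

```python
TRIGGER = 64       # points accumulated before a pixel is flagged (from the commit)
KEEP_ONE_IN = 16   # flagged pixels keep 1 of every 16 writes (from the commit)


def make_flags(n_pixels):
    """Allocate one flag bit per pixel, stored in 32-bit words."""
    return [0] * ((n_pixels + 31) // 32)


def set_flag(flags, pixel):
    """Mark a pixel as a hotspot."""
    flags[pixel // 32] |= 1 << (pixel % 32)


def is_hot(flags, pixel):
    """Return True if the pixel's flag bit is set."""
    return bool((flags[pixel // 32] >> (pixel % 32)) & 1)


def accumulate(accum, flags, pixel, rand_u32):
    """Add one point to accum[pixel], deferring most writes to flagged pixels."""
    if is_hot(flags, pixel):
        if rand_u32 % KEEP_ONE_IN != 0:
            return                      # write discarded
        accum[pixel] += KEEP_ONE_IN     # surviving write stands in for 16 points
    else:
        accum[pixel] += 1
        if accum[pixel] >= TRIGGER:
            set_flag(flags, pixel)
```

Because each surviving write to a flagged pixel is weighted by 16, the expected accumulated value is unchanged while the number of memory writes to bright pixels drops by roughly that factor.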
Steven Robertson
6b2b72a3fe Remove unused texture reference 2017-04-23 01:15:51 -07:00
Steven Robertson
c79db04490 Choose a GPU in main.py 2017-04-21 13:08:16 -07:00
Steven Robertson
b507c9d604 Make tiffs 16-bit using tifffile 2017-04-20 18:22:27 -07:00
Steven Robertson
c6fcaf472f Be sure to close the output files in main.py. 2017-04-20 17:52:37 -07:00
Steven Robertson
746aee9a75 Add a 'plainclip' filter.
This is useful for doing color manipulation on output renders, either by
hand in color grading software or automatically using renormalization
based on image statistics.
2017-04-20 17:51:05 -07:00
Steven Robertson
14f755e434 Add an FFmpeg ProRes handler. 2017-04-20 17:50:33 -07:00
Steven Robertson
96585e2ca5 Update YUV444p12 to be Rec. 709, studio swing. 2017-04-20 17:42:08 -07:00
Steven Robertson
f64cf79d8d Fixes to parse all gen198 genomes. Named palettes. 2017-04-20 13:54:46 -07:00
Steven Robertson
d1228ac303 Add a simple graph-walker for playback 2015-10-26 01:35:09 -07:00
Steven Robertson
36e3b7aca9 Compiler made register restrictions unnecessary 2015-10-26 01:34:35 -07:00
Steven Robertson
17b5a1a96f Spit out raw content.
Previewing using an Intensity Pro 4K and secret monitor-sauce.
2015-10-11 00:52:26 -07:00
Steven Robertson
cfe815f0d6 Min/max YUV output for 444p12.
Again, better solution forthcoming.
2015-10-11 00:51:58 -07:00
Steven Robertson
5f5e69f3a3 Crank up the dispatch params again after fix. 2015-10-11 00:51:22 -07:00
Steven Robertson
0e91d01528 Fix the noise issues on Maxwell GPUs.
AAAAUUUUUUGGGGGGHHH
2015-10-11 00:49:37 -07:00
Steven Robertson
37e6642d37 Add logencode filter. 2015-10-10 18:04:27 -07:00
Steven Robertson
37245085a9 Add pre-YUV output clamping to YUV444P12.
A better solution would cover all the output transforms, but... later.
2015-10-10 18:03:17 -07:00
Steven Robertson
0476bbfdce Fix default width/height 2015-10-10 16:01:58 -07:00
Steven Robertson
f93b4dbf23 Add YUV444P12 support 2015-10-10 16:01:11 -07:00
Steven Robertson
0ce1b51d16 Convert YUV to RGB before filtering 2015-10-10 15:59:57 -07:00
Steven Robertson
abcb3fa50f Look up renderers by name, rather than position 2015-10-10 15:58:36 -07:00
Steven Robertson
698d9c2337 Register filters with a class decorator 2015-10-10 15:58:13 -07:00
Steven Robertson
227a6016c2 Enable VP9 ARNR 2015-02-15 10:24:23 -08:00
Steven Robertson
51b1280e1e Better status messages for main.py 2015-02-14 17:51:13 -08:00
Steven Robertson
e70073175e Fix codec naming issue for jpeg 2015-02-14 17:50:19 -08:00
Steven Robertson
8dc629d91e Autoselect number of columns to use for VP9 2015-02-14 17:50:03 -08:00
Steven Robertson
e08444f74b Work around an overflow condition for now.
I'm not sure what's going wrong; the math still holds up at higher
densities, but when you crank up the samples-per-pixel count the
accumulators start overflowing stochastically, and when they do
they dump nonsense into the output. Until I have time, take a small
perf hit by flushing much more often.
2015-02-14 17:48:22 -08:00
Steven Robertson
8da1821616 Bump gutter to 12px to align reads 2014-12-25 15:04:31 -08:00
Steven Robertson
9a5c31ce37 Improve/fix vpx output 2014-12-25 15:04:10 -08:00
Steven Robertson
42f9ae2824 Pixlib fixes, a new yuv420p10 pix format, tests. 2014-12-25 14:36:02 -08:00