Commit Graph

519 Commits

Author SHA1 Message Date
Steven Robertson c7654357a6 Move naming code into a common place 2017-05-15 12:01:59 -07:00
Steven Robertson 04702d7903 Add --list-devices option 2017-05-15 12:01:25 -07:00
Steven Robertson 29c595ddc5 Move most warning/info statements to stderr 2017-05-15 12:00:11 -07:00
Steven Robertson 9bcfc36b7a Retrieve out suffix without creating a renderer 2017-05-15 11:56:37 -07:00
Steven Robertson 636efcd059 Drop GL mode in main.py; sleep to reduce load 2017-05-15 00:44:15 -07:00
Steven Robertson 7dc58a0e1c Grow launch sizes and synchronize if they pile up 2017-05-15 00:43:10 -07:00
Steven Robertson 5402838a74 Disable ill-thought-through form of antialiasing 2017-05-15 00:41:30 -07:00
Steven Robertson 3528cd1da4 Force use of clang for compilation for Debian 2017-05-15 00:38:52 -07:00
Steven Robertson f58289af53 Hotspot writeback. 10x performance increase.
Create a map assigning two bits to every output bin. During the atomic
flush, compute a threshold for discarding writes altogether that would
keep us under 2% error - discard 1 of every 2 writes if we've already
accumulated 64 writes (hotspot value 1), 7 of 8 if we're above 256
(hotspot value 2), or 31 of 32 at 2048 (hotspot value 3). Pack this
value into a read-only buffer that can often be cached at L2, and for
particularly concentrated flames (which historically choke cuburn), L1.
During writeback, discard writes at the appropriate rate. During the
flush of the integer accumulator to the float, scale the integer
accumulators by the discard rate.
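
As a rough illustration of the scheme described above, here is a minimal CPU-side sketch in Python for a single output bin. It is not the CUDA code from this commit. Only the thresholds (64, 256, 2048) and discard rates (1 of 2, 7 of 8, 31 of 32) come from the message; the function names, the per-write random draw, and the use of `>=` at every threshold are assumptions made for the example.

```python
import random

# Values from the commit message; all names below are hypothetical.
THRESHOLDS = (64, 256, 2048)   # accumulated writes needed for hotspot values 1, 2, 3
KEEP_ONE_IN = (1, 2, 8, 32)    # hotspot value -> keep 1 of every N writes


def hotspot_value(count):
    """Two-bit hotspot value for a bin that has accumulated `count` writes."""
    # Assumption: every threshold is inclusive (>=).
    return sum(count >= t for t in THRESHOLDS)


def should_keep(level, rng):
    """Decide at writeback whether a write survives at this hotspot level."""
    return rng.randrange(KEEP_ONE_IN[level]) == 0


def simulate_bin(writes_per_pass, passes, seed=0):
    """Accumulate writes into one bin, discarding and rescaling as described."""
    rng = random.Random(seed)
    total = 0.0   # stands in for the float accumulator
    level = 0     # hotspot value, fixed for the duration of a pass
    for _ in range(passes):
        acc = 0   # stands in for the integer accumulator
        for _ in range(writes_per_pass):
            if should_keep(level, rng):
                acc += 1
        # Flush: scale the integer count by the discard rate used this pass,
        # then recompute the hotspot value for the next pass.
        total += acc * KEEP_ONE_IN[level]
        level = hotspot_value(total)
    return total
```

With these assumptions, `simulate_bin(1000, 100)` performs far fewer accumulator updates than the 100,000 writes offered, while the scaled total remains an unbiased estimate of that count.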

This works because for most flames, there's not a lot of interesting
stuff in the middle regimes; either stuff is very well defined, in which
case we pretty much know exactly what the color is going to be
(remember, the max 2% relative error gets log-scaled as well), or it's
loosely defined so we should keep it at full accuracy.

Of course, a 10x boost is best-case-ish - a long, high-res render. I
realized though that I really didn't care about low quality stuff and
should go for broke optimizing this for my use case, which is
ridiculously high res HDR stuff. (On pathological flames, on the other
hand, 10x is conservative; this easily gives us 100x.)
2017-05-09 21:16:43 -07:00
Steven Robertson 0bcde947b5 Go to 1024 contexts on Pascal 2017-05-09 21:15:03 -07:00
Steven Robertson d1502e3b79 rings2 is not identity at high precision 2017-05-09 21:09:58 -07:00
Steven Robertson d759d675be Always flush status lines 2017-05-09 21:09:40 -07:00
Steven Robertson 5af90b01a2 Fix a silly 'except e' (too much yavascrip in my life) 2017-05-09 21:09:00 -07:00
Steven Robertson 8fe4fbec1c Use yield scheduling to reduce CPU load 2017-05-09 21:07:58 -07:00
Steven Robertson 77afb2f4b5 Turns out spread is period 180 but spin 1/2 2017-05-09 19:59:21 -07:00
Steven Robertson 8f21ffd4c3 Add right buffer. 2x allocation of uchar buffer. 2017-05-02 00:11:03 -07:00
Steven Robertson 3a3b3b33d1 Rename d_side to d_left (to add d_right later) 2017-05-02 00:07:08 -07:00
Steven Robertson f83e36d948 Add prores as an option on the command line 2017-04-24 16:39:44 -07:00
Steven Robertson 9892acbc7f Populate arch by default; add --keep 2017-04-24 16:39:15 -07:00
Steven Robertson 582221dd0f Always spit out lineinfo when possible 2017-04-24 16:38:51 -07:00
Steven Robertson 6bf428caee Move 'mktref' to util 2017-04-24 16:38:39 -07:00
Steven Robertson bdcaca1f97 Initial draft of hotspot deferral.
Build an array of one-bit flags for every pixel (addressed as u32 data).
If we have accumulated at least 64 points for that pixel, set the flag;
thereafter only write 1/16 (and multiply subsequent points that do get
written by 16).
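
A minimal sketch of the flag layout and the 1-in-16 rule, written in Python rather than as a GPU kernel. The 64-point trigger, the 1/16 write rate, the multiply-by-16, and the one-bit-per-pixel packing into 32-bit words come from the message; the helper names and the random draw used to pick which points survive are assumptions for illustration only.

```python
import random

HOT_THRESHOLD = 64   # points after which a pixel is flagged (from the message)
KEEP_ONE_IN = 16     # flagged pixels keep 1 of every 16 points (from the message)


def make_flags(num_pixels):
    """One flag bit per pixel, packed 32 to a word."""
    return [0] * ((num_pixels + 31) // 32)


def set_flag(flags, pixel):
    flags[pixel >> 5] |= 1 << (pixel & 31)


def is_hot(flags, pixel):
    return bool((flags[pixel >> 5] >> (pixel & 31)) & 1)


def record_point(counts, flags, pixel, rng):
    """Add one point to `pixel`, applying the deferral rule once it is flagged."""
    if is_hot(flags, pixel):
        # Keep 1 of 16 points and weight each kept point by 16.
        if rng.randrange(KEEP_ONE_IN) == 0:
            counts[pixel] += KEEP_ONE_IN
    else:
        counts[pixel] += 1
        if counts[pixel] >= HOT_THRESHOLD:
            set_flag(flags, pixel)
```

For example, after 64 calls to `record_point` on pixel 40, bit 8 of the second word is set, and every later update to that pixel adds either 0 or 16.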

The theory is, after 64 points, the color is pretty much locked in; this
lets us crank SPP up to get excellent coverage in dark areas but the
bright ones don't matter so much since they're fully resolved. Still
needs a lot of tuning to get peak performance, and the trigger threshold
may need to be scaled along with the render size. It also will likely
not scale as well to higher resolutions, because we rely on L2 cache to
make this fast.
2017-04-24 16:33:39 -07:00
Steven Robertson 6b2b72a3fe Remove unused texture reference 2017-04-23 01:15:51 -07:00
Steven Robertson c79db04490 Choose a GPU in main.py 2017-04-21 13:08:16 -07:00
Steven Robertson b507c9d604 Make tiffs 16-bit using tifffile 2017-04-20 18:22:27 -07:00
Steven Robertson c6fcaf472f Be sure to close the output files in main.py. 2017-04-20 17:52:37 -07:00
Steven Robertson 746aee9a75 Add a 'plainclip' filter.
This is useful for doing color manipulation on output renders, either by
hand in color grading software or automatically using renormalization
based on image statistics.
2017-04-20 17:51:05 -07:00
Steven Robertson 14f755e434 Add an FFmpeg ProRes handler. 2017-04-20 17:50:33 -07:00
Steven Robertson 96585e2ca5 Update YUV444p12 to be Rec. 709, studio swing. 2017-04-20 17:42:08 -07:00
Steven Robertson f64cf79d8d Fixes to parse all gen198 genomes. Named palettes. 2017-04-20 13:54:46 -07:00
Steven Robertson d1228ac303 Add a simple graph-walker for playback 2015-10-26 01:35:09 -07:00
Steven Robertson 36e3b7aca9 Compiler made register restrictions unnecessary 2015-10-26 01:34:35 -07:00
Steven Robertson 17b5a1a96f Spit out raw content.
Previewing using an Intensity Pro 4K and secret monitor-sauce.
2015-10-11 00:52:26 -07:00
Steven Robertson cfe815f0d6 Min/max YUV output for 444p12.
Again, better solution forthcoming.
2015-10-11 00:51:58 -07:00
Steven Robertson 5f5e69f3a3 Crank up the dispatch params again after fix. 2015-10-11 00:51:22 -07:00
Steven Robertson 0e91d01528 Fix the noise issues on Maxwell GPUs.
AAAAUUUUUUGGGGGGHHH
2015-10-11 00:49:37 -07:00
Steven Robertson 37e6642d37 Add logencode filter. 2015-10-10 18:04:27 -07:00
Steven Robertson 37245085a9 Add pre-YUV output clamping to YUV444P12.
A better solution would cover all the output transforms, but... later.
2015-10-10 18:03:17 -07:00
Steven Robertson 0476bbfdce Fix default width/height 2015-10-10 16:01:58 -07:00
Steven Robertson f93b4dbf23 Add YUV444P12 support 2015-10-10 16:01:11 -07:00
Steven Robertson 0ce1b51d16 Convert YUV to RGB before filtering 2015-10-10 15:59:57 -07:00
Steven Robertson abcb3fa50f Look up renderers by name, rather than position 2015-10-10 15:58:36 -07:00
Steven Robertson 698d9c2337 Register filters with a class decorator 2015-10-10 15:58:13 -07:00
Steven Robertson 227a6016c2 Enable VP9 ARNR 2015-02-15 10:24:23 -08:00
Steven Robertson 51b1280e1e Better status messages for main.py 2015-02-14 17:51:13 -08:00
Steven Robertson e70073175e Fix codec naming issue for jpeg 2015-02-14 17:50:19 -08:00
Steven Robertson 8dc629d91e Autoselect number of columns to use for VP9 2015-02-14 17:50:03 -08:00
Steven Robertson e08444f74b Work around an overflow condition for now.
I'm not sure what's going wrong; the math still holds up at higher
densities, but when you crank up the samples-per-pixel count the
accumulators start overflowing stochastically, and when they do
they dump nonsense into the output. Until I have time, take a small
perf hit by flushing much more often.
2015-02-14 17:48:22 -08:00
Steven Robertson 8da1821616 Bump gutter to 12px to align reads 2014-12-25 15:04:31 -08:00
Steven Robertson 9a5c31ce37 Improve/fix vpx output 2014-12-25 15:04:10 -08:00