fractorium/Source/EmberCL/RendererCL.h
Person 90ec5b8246 --User changes:
-Show common folder locations such as documents, downloads, pictures in the sidebar in all file dialogs.
 -Warning message about exceeding memory in final render dialog now suggests strips as the solution to the problem.
 -Strips now has a tooltip explaining what it does.
 -Allow more digits in the spinners in the color section of the flame tab.
 -Add manually adjustable size spinners in the final render dialog. Percentage scale and absolute size are fully synced.
 -Default prefix in final render is now the filename when doing animations (coming from sequence section of the library tab).
 -Changed the elliptic variation back to using a less precise version for float, and a more precise version for double. The last release had it always using double.
 -New applied xaos table that shows a read-only view of actual weights by taking the base xform weights and multiplying them by the xaos values.
 -New table in the xaos tab that gives a graphical representation of the probability that each xform is chosen, with and without xaos.
 -Add button to transpose the xaos rows and columns.
 -Add support for importing .chaos files from Chaotica.
 --Pasting back to Chaotica will work for most, but not all, variations due to incompatible parameter names in some.
 -Curves are now splines instead of Bezier. This adds compatibility with Chaotica, but breaks it for Apophysis. Xmls are still pastable, but the color curves will look different.
 --The curve editor on the palette tab can now add points by clicking on the lines and remove points by clicking on the points themselves, just like Chaotica.
 --Splines are saved in four new xml fields: overall_curve, red_curve, green_curve and blue_curve.
 -Allow for specifying the percentage of a sub batch each thread should iterate through per kernel call when running with OpenCL. This gives a roughly 1% performance increase because fewer kernel calls are made while iterating.
 --This field is present for interactive editing (where it's not very useful) and in the final render dialog.
 --On the command line, this is specified as --sbpctth for EmberRender and EmberAnimate.
 -Allow double clicking to toggle the supersample field in the flame tab between 1 and 2 for easily checking the effect of the field.
 -When showing affine values as polar coordinates, show angles normalized to 360 to match Chaotica.
 -Fuse Count spinner now toggles between 15 and 100 when double clicking for easily checking the effect of the field.
 -Added field for limiting the range in the x and y direction that the initial points are chosen from.
 -Added a field called K2 which is an alternative way to set brightness, ignored when zero.
 --This has no effect for many variations, but has a noticeable effect for some.
 -Added new variations:
 arcsech
 arcsech2
 arcsinh
 arctanh
 asteria
 block
 bwraps_rand
 circlecrop2
 coth_spiral
 crackle2
 depth_blur
 depth_blur2
 depth_gaussian
 depth_gaussian2
 depth_ngon
 depth_ngon2
 depth_sine
 depth_sine2
 dragonfire
 dspherical
 dust
 excinis
 exp2
 flipx
 flowerdb
 foci_p
 gaussian
 glynnia2
 glynnsim4
 glynnsim5
 henon
 hex_rand
 hex_truchet
 hypershift
 lazyjess
 lens
 lozi
 modulusx
 modulusy
 oscilloscope2
 point_symmetry
 pointsymmetry
 projective
 pulse
 rotate
 scry2
 shift
 smartshape
 spher
 squares
 starblur2
 swirl3
 swirl3r
 tanh_spiral
 target0
 target2
 tile_hlp
 truchet_glyph
 truchet_inv
 truchet_knot
 unicorngaloshen
 vibration
 vibration2
 --hex_truchet, hex_rand should always use double. They are extremely sensitive.
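
The applied xaos table and the probability table described above can be illustrated with a small sketch. It is not Ember's code: `AppliedWeights` and `ToProbabilities` are hypothetical helpers, and it assumes that row `from` of the xaos matrix holds the modifiers applied to each destination xform's base weight.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical helper, not part of Ember. Assumes xaos[from][to] scales the
// base weight of xform `to` when the previous xform was `from`.
std::vector<double> AppliedWeights(const std::vector<double>& baseWeights,
                                   const std::vector<std::vector<double>>& xaos,
                                   std::size_t from)
{
    assert(from < xaos.size() && xaos[from].size() == baseWeights.size());
    std::vector<double> applied(baseWeights.size());

    for (std::size_t to = 0; to < baseWeights.size(); ++to)
        applied[to] = baseWeights[to] * xaos[from][to];

    return applied;
}

// Normalize weights so they sum to 1, giving the chance each xform is chosen.
// A row of all zeros is returned unchanged.
std::vector<double> ToProbabilities(std::vector<double> weights)
{
    double sum = 0;

    for (double w : weights)
        sum += w;

    if (sum > 0)
        for (double& w : weights)
            w /= sum;

    return weights;
}
```

Calling `ToProbabilities(baseWeights)` gives the probabilities without xaos, and `ToProbabilities(AppliedWeights(baseWeights, xaos, from))` gives them with xaos for a given previous xform.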

--Bug fixes:
 -Bounds sign was flipped for x coordinate of world space when center was not zero.
 -Right clicking and dragging spinner showed menu on mouse up, even if it was very far away.
 -Text boxes for size in final render dialog were hard to type in. This was the same bug the xform weight field used to have, so it is fixed the same way.
 -Fix spelling to be plural in toggle color speed box.
 -Stop using the blank user palette to generate flames. Either put colored palettes in it, or exclude it from randoms.
 -Clicking the random palette button for a palette file with only one palette in it would freeze the program.
 -Clicking none scale in final render did not re-render the preview.
 -Use less precision on random xaos. No need for 12 decimal places.
 -The term sub batch is overloaded in the options dialog. Change the naming and tooltip of those settings for cpu and opencl.
 --Also made clear in the tooltip for the default opencl quality setting that the value is per device.
 -The arrows spinner in palette editor appears like a read-only label. Made it look like a spinner.
 -Fix border colors for various spin boxes and table headers in the style sheet. Requires reload.
 -Fix a bug in the bwraps variation which would produce different results than Chaotica and Apophysis.
 -Synth was allowed to be selected for random flame generation when using an Nvidia card but it shouldn't have been because Nvidia has a hard time compiling synth.
 -A casting bug in the OpenCL kernels for log scaling and density filtering was preventing successful compilations on Intel iGPUs. Fixed even though we don't support anything other than AMD and Nvidia.
 -Palette rotation (click and drag) position was not being reset when loading a new flame.
 -When the xform circles were hidden, opening and closing the options dialog would improperly reshow them.
 -Double click toggle was broken on integer spin boxes.
 -Fixed tab order of some controls.
 -Creating a palette from a jpg in the palette editor only produced a single color.
 --Needed to package imageformats/qjpeg.dll with the Windows installer.
 -The basic memory benchmark test flame was not really testing memory. Make it more spread out.
 -Remove the temporal samples field from the flame tab. It was never used because it's only an animation parameter, which is specified in the final render dialog or on the command line with EmberAnimate.
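
The two palette fixes above (skipping the blank user palette, and the freeze when a file holds only one palette) can be sketched as follows. This is an illustration under assumptions, not the actual `Palette` or `PaletteList` code: `SimplePalette`, `IsEmpty` as a free function and `RandomNonEmpty` are hypothetical, and the sketch filters candidates up front rather than retrying, so it cannot loop forever.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <random>
#include <vector>

using Color = std::array<float, 4>; // Assumed RGBA layout.

// Hypothetical stand-in for a palette: just a list of colors.
struct SimplePalette
{
    std::vector<Color> entries;
};

// A palette counts as empty when every entry is black (alpha is ignored).
bool IsEmpty(const SimplePalette& palette)
{
    for (const auto& c : palette.entries)
        if (c[0] != 0 || c[1] != 0 || c[2] != 0)
            return false;

    return true;
}

// Pick a random palette that is not all black.
// Returns nullptr when no such palette exists.
const SimplePalette* RandomNonEmpty(const std::vector<SimplePalette>& list, std::mt19937& rng)
{
    std::vector<std::size_t> candidates;

    for (std::size_t i = 0; i < list.size(); ++i)
        if (!IsEmpty(list[i]))
            candidates.push_back(i);

    if (candidates.empty())
        return nullptr;

    std::uniform_int_distribution<std::size_t> pick(0, candidates.size() - 1);
    return &list[candidates[pick(rng)]];
}
```

Because the candidate list is built once, a file with a single palette returns that palette immediately, and a file with only blank palettes returns `nullptr` for the caller to handle.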

--Code changes:
 -Add IsEmpty() to Palette to determine if a palette is all black.
 -Attempt to avoid selecting a blank palette in PaletteList::GetRandomPalette().
 -Add function ScanForChaosNodes() and some associated helper functions in XmlToEmber.
 -Make variation param name correction be case insensitive in XmlToEmber.
 -Report error when assigning a variation param value in XmlToEmber.
 -Add SubBatchPercentPerThread() method to RendererCL.
 -Override enterEvent() and leaveEvent() in DoubleSpinBox and SpinBox to prevent the context menu from showing up on right mouse up after already leaving the spinner.
 -Filtering the mouse wheel event in TableWidget no longer appears to be needed. It was probably an old Qt bug that has been fixed.
 -Gui/ember syncing code in the final render dialog needed to be reworked to accommodate absolute sizes.
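
The effect of `SubBatchPercentPerThread()` can be made concrete with a little arithmetic. The header's own comment gives the default: 0.025 of a 10,240 iteration sub batch is 256 iterations per thread. The sketch below reproduces that figure; `ItersPerThread` and `KernelCalls` are hypothetical helpers, and the rounding and clamping shown are assumptions, since the header does not show how RendererCL converts the percentage.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstddef>

// Hypothetical helper: iterations each thread runs per kernel call.
// Rounds to nearest and clamps to [1, subBatchSize]; both are assumptions.
std::size_t ItersPerThread(std::size_t subBatchSize, float percentPerThread)
{
    assert(subBatchSize > 0);
    const double raw = static_cast<double>(subBatchSize) * static_cast<double>(percentPerThread);
    const long long rounded = std::llround(raw);
    const long long clamped = std::max<long long>(1, std::min<long long>(rounded, static_cast<long long>(subBatchSize)));
    return static_cast<std::size_t>(clamped);
}

// Hypothetical helper: kernel calls needed for one thread to cover a sub batch.
std::size_t KernelCalls(std::size_t subBatchSize, std::size_t itersPerThread)
{
    assert(itersPerThread > 0);
    return (subBatchSize + itersPerThread - 1) / itersPerThread; // Ceiling division.
}
```

With the defaults, `ItersPerThread(10240, 0.025f)` is 256 and `KernelCalls(10240, 256)` is 40. Raising the percentage lowers the number of calls, which is where the small speedup mentioned in the user changes comes from.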
2019-04-13 19:00:46 -07:00

265 lines
11 KiB
C++

#pragma once
#include "EmberCLPch.h"
#include "OpenCLWrapper.h"
#include "DEOpenCLKernelCreator.h"
#include "FinalAccumOpenCLKernelCreator.h"
#include "RendererClDevice.h"
/// <summary>
/// RendererCLBase and RendererCL classes.
/// </summary>
namespace EmberCLns
{
/// <summary>
/// Serves only as an interface for OpenCL specific rendering functions.
/// </summary>
class EMBERCL_API RendererCLBase
{
public:
virtual ~RendererCLBase() { }
virtual bool ReadFinal(v4F* pixels) { return false; }
virtual bool ClearFinal() { return false; }
virtual bool AnyNvidia() const { return false; }
bool OptAffine() const { return m_OptAffine; }
void OptAffine(bool optAffine) { m_OptAffine = optAffine; }
std::function<void(void)> m_CompileBegun;
protected:
bool m_OptAffine = false;
};
/// <summary>
/// RendererCL is a derivation of the basic CPU renderer which
/// overrides various functions to render on the GPU using OpenCL.
/// This supports multi-GPU rendering and is done in the following manner:
/// -When rendering a single image, the iterations will be split between devices in sub batches.
/// -When animating, a renderer for each device will be created by the calling code,
/// and the frames will each be rendered by a single device as available.
/// The synchronization across devices is done through a single atomic counter.
/// Since this class derives from EmberReport and also contains an
/// OpenCLWrapper member which also derives from EmberReport, the
/// reporting functions are overridden to aggregate the errors from
/// both sources.
/// Template argument T expected to be float or double.
/// Template argument bucketT must always be float.
/// </summary>
template <typename T, typename bucketT>
class EMBERCL_API RendererCL : public Renderer<T, bucketT>, public RendererCLBase
{
using EmberNs::Renderer<T, bucketT>::RendererBase::Abort;
using EmberNs::Renderer<T, bucketT>::RendererBase::EarlyClip;
using EmberNs::Renderer<T, bucketT>::RendererBase::EnterResize;
using EmberNs::Renderer<T, bucketT>::RendererBase::LeaveResize;
using EmberNs::Renderer<T, bucketT>::RendererBase::FinalRasW;
using EmberNs::Renderer<T, bucketT>::RendererBase::FinalRasH;
using EmberNs::Renderer<T, bucketT>::RendererBase::SuperRasW;
using EmberNs::Renderer<T, bucketT>::RendererBase::SuperRasH;
using EmberNs::Renderer<T, bucketT>::RendererBase::SuperSize;
using EmberNs::Renderer<T, bucketT>::RendererBase::BytesPerChannel;
using EmberNs::Renderer<T, bucketT>::RendererBase::TemporalSamples;
using EmberNs::Renderer<T, bucketT>::RendererBase::ItersPerTemporalSample;
using EmberNs::Renderer<T, bucketT>::RendererBase::FuseCount;
using EmberNs::Renderer<T, bucketT>::RendererBase::DensityFilterOffset;
using EmberNs::Renderer<T, bucketT>::RendererBase::PrepFinalAccumVector;
using EmberNs::Renderer<T, bucketT>::RendererBase::Paused;
using EmberNs::Renderer<T, bucketT>::RendererBase::m_ProgressParameter;
using EmberNs::Renderer<T, bucketT>::RendererBase::m_YAxisUp;
using EmberNs::Renderer<T, bucketT>::RendererBase::m_LockAccum;
using EmberNs::Renderer<T, bucketT>::RendererBase::m_Abort;
using EmberNs::Renderer<T, bucketT>::RendererBase::m_LastIter;
using EmberNs::Renderer<T, bucketT>::RendererBase::m_LastIterPercent;
using EmberNs::Renderer<T, bucketT>::RendererBase::m_Stats;
using EmberNs::Renderer<T, bucketT>::RendererBase::m_Callback;
using EmberNs::Renderer<T, bucketT>::RendererBase::m_Rand;
using EmberNs::Renderer<T, bucketT>::RendererBase::m_RenderTimer;
using EmberNs::Renderer<T, bucketT>::RendererBase::m_IterTimer;
using EmberNs::Renderer<T, bucketT>::RendererBase::m_ProgressTimer;
using EmberNs::Renderer<T, bucketT>::RendererBase::EmberReport::AddToReport;
using EmberNs::Renderer<T, bucketT>::RendererBase::m_ResizeCs;
using EmberNs::Renderer<T, bucketT>::RendererBase::m_ProcessAction;
using EmberNs::Renderer<T, bucketT>::m_RotMat;
using EmberNs::Renderer<T, bucketT>::m_Ember;
using EmberNs::Renderer<T, bucketT>::m_Csa;
using EmberNs::Renderer<T, bucketT>::m_CurvesSet;
using EmberNs::Renderer<T, bucketT>::CenterX;
using EmberNs::Renderer<T, bucketT>::CenterY;
using EmberNs::Renderer<T, bucketT>::K1;
using EmberNs::Renderer<T, bucketT>::K2;
using EmberNs::Renderer<T, bucketT>::Supersample;
using EmberNs::Renderer<T, bucketT>::HighlightPower;
using EmberNs::Renderer<T, bucketT>::HistBuckets;
using EmberNs::Renderer<T, bucketT>::AccumulatorBuckets;
using EmberNs::Renderer<T, bucketT>::GetDensityFilter;
using EmberNs::Renderer<T, bucketT>::GetSpatialFilter;
using EmberNs::Renderer<T, bucketT>::CoordMap;
using EmberNs::Renderer<T, bucketT>::XformDistributions;
using EmberNs::Renderer<T, bucketT>::XformDistributionsSize;
using EmberNs::Renderer<T, bucketT>::m_Dmap;
using EmberNs::Renderer<T, bucketT>::m_DensityFilter;
using EmberNs::Renderer<T, bucketT>::m_SpatialFilter;
public:
RendererCL(const vector<pair<size_t, size_t>>& devices, bool shared = false, GLuint outputTexID = 0);
RendererCL(const RendererCL<T, bucketT>& renderer) = delete;
RendererCL<T, bucketT>& operator = (const RendererCL<T, bucketT>& renderer) = delete;
virtual ~RendererCL() = default;
//Non-virtual member functions for OpenCL specific tasks.
bool Init(const vector<pair<size_t, size_t>>& devices, bool shared, GLuint outputTexID);
bool SetOutputTexture(GLuint outputTexID);
//Iters per kernel/block/grid.
inline size_t IterCountPerKernel() const;
inline size_t IterCountPerBlock() const;
inline size_t IterCountPerGrid() const;
//Kernels per block.
inline size_t IterBlockKernelWidth() const;
inline size_t IterBlockKernelHeight() const;
inline size_t IterBlockKernelCount() const;
//Kernels per grid.
inline size_t IterGridKernelWidth() const;
inline size_t IterGridKernelHeight() const;
inline size_t IterGridKernelCount() const;
//Blocks per grid.
inline size_t IterGridBlockWidth() const;
inline size_t IterGridBlockHeight() const;
inline size_t IterGridBlockCount() const;
bool ReadHist(size_t device);
bool ReadAccum();
bool ReadPoints(size_t device, vector<PointCL<T>>& vec);
bool ClearHist();
bool ClearHist(size_t device);
bool ClearAccum();
bool WritePoints(size_t device, vector<PointCL<T>>& vec);
#ifdef TEST_CL
bool WriteRandomPoints(size_t device);
#endif
void SubBatchPercentPerThread(float f);
float SubBatchPercentPerThread() const;
const string& IterKernel() const;
const string& DEKernel() const;
const string& FinalAccumKernel() const;
//Access to underlying OpenCL structures. Use cautiously.
const vector<unique_ptr<RendererClDevice>>& Devices() const;
//Virtual functions overridden from RendererCLBase.
virtual bool ReadFinal(v4F* pixels) override;
virtual bool ClearFinal() override;
//Public virtual functions overridden from Renderer or RendererBase.
virtual size_t MemoryAvailable() override;
virtual bool Ok() const override;
virtual size_t SubBatchSize() const override;
virtual size_t ThreadCount() const override;
virtual bool CreateDEFilter(bool& newAlloc) override;
virtual bool CreateSpatialFilter(bool& newAlloc) override;
virtual eRendererType RendererType() const override;
virtual bool Shared() const override;
virtual void ClearErrorReport() override;
virtual string ErrorReportString() override;
virtual vector<string> ErrorReport() override;
virtual bool RandVec(vector<QTIsaac<ISAAC_SIZE, ISAAC_INT>>& randVec) override;
virtual bool AnyNvidia() const override;
#ifndef TEST_CL
protected:
#endif
//Protected virtual functions overridden from Renderer.
virtual bool Alloc(bool histOnly = false) override;
virtual bool ResetBuckets(bool resetHist = true, bool resetAccum = true) override;
virtual eRenderStatus LogScaleDensityFilter(bool forceOutput = false) override;
virtual eRenderStatus GaussianDensityFilter() override;
virtual eRenderStatus AccumulatorToFinalImage(vector<v4F>& pixels, size_t finalOffset) override;
virtual EmberStats Iterate(size_t iterCount, size_t temporalSample) override;
#ifndef TEST_CL
private:
#endif
//Private functions for making and running OpenCL programs.
bool BuildIterProgramForEmber(bool doAccum = true);
bool RunIter(size_t iterCount, size_t temporalSample, size_t& itersRan);
eRenderStatus RunLogScaleFilter();
eRenderStatus RunDensityFilter();
eRenderStatus RunFinalAccum();
bool ClearBuffer(size_t device, const string& bufferName, uint width, uint height, uint elementSize);
bool RunDensityFilterPrivate(size_t kernelIndex, size_t gridW, size_t gridH, size_t blockW, size_t blockH, uint chunkSizeW, uint chunkSizeH, uint colChunkPass, uint rowChunkPass);
int MakeAndGetDensityFilterProgram(size_t ss, uint filterWidth);
int MakeAndGetFinalAccumProgram();
int MakeAndGetGammaCorrectionProgram();
bool CreateHostBuffer();
bool SumDeviceHist();
void FillSeeds();
//Private functions passing data to OpenCL programs.
void ConvertDensityFilter();
void ConvertSpatialFilter();
void ConvertEmber(Ember<T>& ember, EmberCL<T>& emberCL, vector<XformCL<T>>& xformsCL);
void ConvertCarToRas(const CarToRas<T>& carToRas);
std::string ErrorStr(const std::string& loc, const std::string& error, RendererClDevice* dev);
bool m_Init = false;
bool m_Shared = false;
bool m_DoublePrecision = typeid(T) == typeid(double);
float m_SubBatchPercentPerThread = 0.025f;//0.025 * 10,240 gives a default value of 256 iters per thread for the default sub batch size of 10,240 which almost all flames will use.
//It's critical that these numbers never change. They are
//based on the cuburn model of each kernel launch containing
//256 threads. 32 wide by 8 high. Everything done in the OpenCL
//iteration kernel depends on these dimensions.
size_t m_IterCountPerKernel = 256;
size_t m_IterBlocksWide = 64, m_IterBlockWidth = 32;
size_t m_IterBlocksHigh = 2, m_IterBlockHeight = 8;
size_t m_MaxDEBlockSizeW;
size_t m_MaxDEBlockSizeH;
//Buffer names.
string m_EmberBufferName = "Ember";
string m_XformsBufferName = "Xforms";
string m_ParVarsBufferName = "ParVars";
string m_GlobalSharedBufferName = "GlobalShared";
string m_SeedsBufferName = "Seeds";
string m_DistBufferName = "Dist";
string m_CarToRasBufferName = "CarToRas";
string m_DEFilterParamsBufferName = "DEFilterParams";
string m_SpatialFilterParamsBufferName = "SpatialFilterParams";
string m_DECoefsBufferName = "DECoefs";
string m_DEWidthsBufferName = "DEWidths";
string m_DECoefIndicesBufferName = "DECoefIndices";
string m_SpatialFilterCoefsBufferName = "SpatialFilterCoefs";
string m_CurvesCsaName = "CurvesCsa";
string m_HostBufferName = "Host";
string m_HistBufferName = "Hist";
string m_AccumBufferName = "Accum";
string m_FinalImageName = "Final";
string m_PointsBufferName = "Points";
//Kernels.
string m_IterKernel;
cl::ImageFormat m_PaletteFormat;
cl::ImageFormat m_FinalFormat;
cl::Image2D m_Palette;
cl::ImageGL m_AccumImage;
GLuint m_OutputTexID;
EmberCL<T> m_EmberCL;
vector<XformCL<T>> m_XformsCL;
vector<vector<glm::highp_uvec2>> m_Seeds;
CarToRasCL<T> m_CarToRasCL;
DensityFilterCL<bucketT> m_DensityFilterCL;
SpatialFilterCL<bucketT> m_SpatialFilterCL;
IterOpenCLKernelCreator<T> m_IterOpenCLKernelCreator;
DEOpenCLKernelCreator m_DEOpenCLKernelCreator;
FinalAccumOpenCLKernelCreator m_FinalAccumOpenCLKernelCreator;
pair<string, vector<T>> m_Params;
pair<string, vector<T>> m_GlobalShared;
vector<unique_ptr<RendererClDevice>> m_Devices;
Ember<T> m_LastBuiltEmber;
};
}
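
The fixed iteration dimensions declared in the header (`m_IterCountPerKernel`, `m_IterBlocksWide`, `m_IterBlockWidth`, `m_IterBlocksHigh`, `m_IterBlockHeight`) imply the totals that the `Iter*` accessors presumably report. The sketch below works them out. It is only an illustration: the `IterDims` struct is hypothetical, and the formulas are inferred from the accessor names and comments, where "kernel" appears to mean a single thread, not from the accessor definitions, which this header does not contain.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical mirror of the constants in RendererCL. Formulas are assumptions
// inferred from the accessor names, not copied from the implementation.
struct IterDims
{
    std::size_t itersPerKernel = 256;            // m_IterCountPerKernel.
    std::size_t blocksWide = 64, blockWidth = 32; // m_IterBlocksWide, m_IterBlockWidth.
    std::size_t blocksHigh = 2, blockHeight = 8;  // m_IterBlocksHigh, m_IterBlockHeight.

    // Threads in one block: 32 wide by 8 high.
    std::size_t BlockKernelCount() const { return blockWidth * blockHeight; }

    // Blocks in the whole grid: 64 wide by 2 high.
    std::size_t GridBlockCount() const { return blocksWide * blocksHigh; }

    // Threads across and down the whole grid.
    std::size_t GridKernelWidth() const { return blocksWide * blockWidth; }
    std::size_t GridKernelHeight() const { return blocksHigh * blockHeight; }
    std::size_t GridKernelCount() const { return GridKernelWidth() * GridKernelHeight(); }

    // Iterations done by one block and by the whole grid in a single launch.
    std::size_t ItersPerBlock() const { return itersPerKernel * BlockKernelCount(); }
    std::size_t ItersPerGrid() const { return itersPerKernel * GridKernelCount(); }
};
```

Under these assumptions a block holds 256 threads, the grid holds 128 blocks or 32,768 threads, and one launch at 256 iterations per thread performs 8,388,608 iterations.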