fractorium/Source/EmberCL/IterOpenCLKernelCreator.cpp

840 lines
28 KiB
C++
Raw Normal View History

#include "EmberCLPch.h"
#include "IterOpenCLKernelCreator.h"
0.4.1.5 Beta 11/28/2014 --User Changes Remove limit on the number of xforms allowable on the GPU. This was previously 21. Show actual strips count to be used in parens outside of user specified strips count on final render dialog. Allow for adjustment of iteration depth and fuse count per ember and save/read these values with the xml. Iteration optimizations on both CPU and GPU. Automatically adjust default quality spinner value when using CPU/GPU to 10/30, respectively. --Bug Fixes Fix severe randomization bug with OpenCL. Fix undo list off by one error when doing a new edit anywhere but the end of the undo list. Make integer variation parameters use 4 decimal places in the variations list like all the others. New build of the latest Qt to fix scroll bar drawing bug. Prevent grid from showing as much when pressing control to increase a spinner's increment speed. Still shows sometimes, but better than before. --Code Changes Pass count and fuse to iterator as a structure now to allow for passing more params in the future. Slightly different grid/block logic when running DE filtering on the GPU. Attempt a different way of doing DE, but #define out because it ended up not being faster. Restructure some things to allow for a variable length xforms buffer to be passed to the GPU. Add sub batch size and fuse count as ember members, and remove them from the renderer classes. Remove m_LastPass from Renderer. It should have been removed with passes. Pass seeds as a buffer to the OpenCL iteration kernel, rather than a single seed that gets modified. Slight optimization on CPU accum. Use case statement instead of if/else for xform chosing in OpenCL for a 2% speedup on params with large numbers of xforms. Add SizeOf() wrapper around sizeof(vec[0]) * vec.size(). Remove LogScaleSum() functions from the CPU and GPU because they're no longer used since passes were removed. Make some OpenCLWrapper getters const. Better ogranize RendererCL methods that return grid dimensions.
2014-11-28 04:37:51 -05:00
//#define STRAIGHT_RAND 1
#define USE_CASE 1
namespace EmberCLns
{
/// <summary>
/// Constructor that sets up some basic entry point strings and creates
/// the zeroization kernel string since it requires no conditional inputs.
/// </summary>
template <typename T>
IterOpenCLKernelCreator<T>::IterOpenCLKernelCreator()
{
m_IterEntryPoint = "IterateKernel";
m_ZeroizeEntryPoint = "ZeroizeKernel";
--User changes -Add support for multiple GPU devices. --These options are present in the command line and in Fractorium. -Change scheme of specifying devices from platform,device to just total device index. --Single number on the command line. --Change from combo boxes for device selection to a table of all devices in Fractorium. -Temporal samples defaults to 100 instead of 1000 which was needless overkill. --Bug fixes -EmberAnimate, EmberRender, FractoriumSettings, FinalRenderDialog: Fix wrong order of arguments to Clamp() when assigning thread priority. -VariationsDC.h: Fix NVidia OpenCL compilation error in DCTriangleVariation. -FractoriumXformsColor.cpp: Checking for null pixmap pointer is not enough, must also check if the underlying buffer is null via call to QPixmap::isNull(). --Code changes -Ember.h: Add case for FLAME_MOTION_NONE and default in ApplyFlameMotion(). -EmberMotion.h: Call base constructor. -EmberPch.h: #pragma once only on Windows. -EmberToXml.h: --Handle different types of exceptions. --Add default cases to ToString(). -Isaac.h: Remove unused variable in constructor. -Point.h: Call base constructor in Color(). -Renderer.h/cpp: --Add bool to Alloc() to only allocate memory for the histogram. Needed for multi-GPU. --Make CoordMap() return a const ref, not a pointer. -SheepTools.h: --Use 64-bit types like the rest of the code already does. --Fix some comment misspellings. -Timing.h: Make BeginTime(), EndTime(), ElapsedTime() and Format() be const functions. -Utils.h: --Add new functions Equal() and Split(). --Handle more exception types in ReadFile(). --Get rid of most legacy blending of C and C++ argument parsing. -XmlToEmber.h: --Get rid of most legacy blending of C and C++ code from flam3. --Remove some unused variables. -EmberAnimate: --Support multi-GPU processing that alternates full frames between devices. --Use OpenCLInfo instead of OpenCLWrapper for --openclinfo option. --Remove bucketT template parameter, and hard code float in its place. --If a render fails, exit since there is no point in continuing an animation with a missing frame. --Pass variables to threaded save better, which most likely fixes a very subtle bug that existed before. --Remove some unused variables. -EmberGenome, EmberRender: --Support multi-GPU processing that alternates full frames between devices. --Use OpenCLInfo instead of OpenCLWrapper for --openclinfo option. --Remove bucketT template parameter, and hard code float in its place. -EmberRender: --Support multi-GPU processing that alternates full frames between devices. --Use OpenCLInfo instead of OpenCLWrapper for --openclinfo option. --Remove bucketT template parameter, and hard code float in its place. --Only print values when not rendering with OpenCL, since they're always 0 in that case. -EmberCLPch.h: --#pragma once only on Windows. --#include <atomic>. -IterOpenCLKernelCreator.h: Add new kernel for summing two histograms. This is needed for multi-GPU. -OpenCLWrapper.h: --Move all OpenCL info related code into its own class OpenCLInfo. --Add members to cache the values of global memory size and max allocation size. -RendererCL.h/cpp: --Redesign to accomodate multi-GPU. --Constructor now takes a vector of devices. --Remove DumpErrorReport() function, it's handled in the base. --ClearBuffer(), ReadPoints(), WritePoints(), ReadHist() and WriteHist() now optionally take a device index as a parameter. --MakeDmap() override and m_DmapCL member removed because it no longer applies since the histogram is always float since the last commit. --Add new function SumDeviceHist() to sum histograms from two devices by first copying to a temporary on the host, then a temporary on the device, then summing. --m_Calls member removed, as it's now per-device. --OpenCLWrapper removed. --m_Seeds member is now a vector of vector of seeds, to accomodate a separate and different array of seeds for each device. --Added member m_Devices, a vector of unique_ptr of RendererCLDevice. -EmberCommon.h --Added Devices() function to convert from a vector of device indices to a vector of platform,device indices. --Changed CreateRenderer() to accept a vector of devices to create a single RendererCL which will split work across multiple devices. --Added CreateRenderers() function to accept a vector of devices to create multiple RendererCL, each which will render on a single device. --Add more comments to some existing functions. -EmberCommonPch.h: #pragma once only on Windows. -EmberOptions.h: --Remove --platform option, it's just sequential device number now with the --device option. --Make --out be OPT_USE_RENDER instead of OPT_RENDER_ANIM since it's an error condition when animating. It makes no sense to write all frames to a single image. --Add Devices() function to parse comma separated --device option string and return a vector of device indices. --Make int and uint types be 64-bit, so intmax_t and size_t. --Make better use of macros. -JpegUtils.h: Make string parameters to WriteJpeg() and WritePng() be const ref. -All project files: Turn off buffer security check option in Visual Studio (/Gs-) -deployment.pri: Remove the line OTHER_FILES +=, it's pointless and was causing problems. -Ember.pro, EmberCL.pro: Add CONFIG += plugin, otherwise it wouldn't link. -EmberCL.pro: Add new files for multi-GPU support. -build_all.sh: use -j4 and QMAKE=${QMAKE:/usr/bin/qmake} -shared_settings.pri: -Add version string. -Remove old DESTDIR definitions. -Add the following lines or else nothing would build: CONFIG(release, debug|release) { CONFIG += warn_off DESTDIR = ../../../Bin/release } CONFIG(debug, debug|release) { DESTDIR = ../../../Bin/debug } QMAKE_POST_LINK += $$quote(cp --update ../../../Data/flam3-palettes.xml $${DESTDIR}$$escape_expand(\n\t)) LIBS += -L/usr/lib -lpthread -AboutDialog.ui: Another futile attempt to make it look correct on Linux. -FinalRenderDialog.h/cpp: --Add support for multi-GPU. --Change from combo boxes for device selection to a table of all devices. --Ensure device selection makes sense. --Remove "FinalRender" prefix of various function names, it's implied given the context. -FinalRenderEmberController.h/cpp: --Add support for multi-GPU. --Change m_FinishedImageCount to be atomic. --Move CancelRender() from the base to FinalRenderEmberController<T>. --Refactor RenderComplete() to omit any progress related functionality or image saving since it can be potentially ran in a thread. --Consolidate setting various renderer fields into SyncGuiToRenderer(). -Fractorium.cpp: Allow for resizing of the options dialog to show the entire device table. -FractoriumCommon.h: Add various functions to handle a table showing the available OpenCL devices on the system. -FractoriumEmberController.h/cpp: Remove m_FinalImageIndex, it's no longer needed. -FractoriumRender.cpp: Scale the interactive sub batch count and quality by the number of devices used. -FractoriumSettings.h/cpp: --Temporal samples defaults to 100 instead of 1000 which was needless overkill. --Add multi-GPU support, remove old device,platform pair. -FractoriumToolbar.cpp: Disable OpenCL toolbar button if there are no devices present on the system. -FractoriumOptionsDialog.h/cpp: --Add support for multi-GPU. --Consolidate more assignments in DataToGui(). --Enable/disable CPU/OpenCL items in response to OpenCL checkbox event. -Misc: Convert almost everything to size_t for unsigned, intmax_t for signed.
2015-09-12 21:33:45 -04:00
m_SumHistEntryPoint = "SumHisteKernel";
m_ZeroizeKernel = CreateZeroizeKernelString();
--User changes -Add support for multiple GPU devices. --These options are present in the command line and in Fractorium. -Change scheme of specifying devices from platform,device to just total device index. --Single number on the command line. --Change from combo boxes for device selection to a table of all devices in Fractorium. -Temporal samples defaults to 100 instead of 1000 which was needless overkill. --Bug fixes -EmberAnimate, EmberRender, FractoriumSettings, FinalRenderDialog: Fix wrong order of arguments to Clamp() when assigning thread priority. -VariationsDC.h: Fix NVidia OpenCL compilation error in DCTriangleVariation. -FractoriumXformsColor.cpp: Checking for null pixmap pointer is not enough, must also check if the underlying buffer is null via call to QPixmap::isNull(). --Code changes -Ember.h: Add case for FLAME_MOTION_NONE and default in ApplyFlameMotion(). -EmberMotion.h: Call base constructor. -EmberPch.h: #pragma once only on Windows. -EmberToXml.h: --Handle different types of exceptions. --Add default cases to ToString(). -Isaac.h: Remove unused variable in constructor. -Point.h: Call base constructor in Color(). -Renderer.h/cpp: --Add bool to Alloc() to only allocate memory for the histogram. Needed for multi-GPU. --Make CoordMap() return a const ref, not a pointer. -SheepTools.h: --Use 64-bit types like the rest of the code already does. --Fix some comment misspellings. -Timing.h: Make BeginTime(), EndTime(), ElapsedTime() and Format() be const functions. -Utils.h: --Add new functions Equal() and Split(). --Handle more exception types in ReadFile(). --Get rid of most legacy blending of C and C++ argument parsing. -XmlToEmber.h: --Get rid of most legacy blending of C and C++ code from flam3. --Remove some unused variables. -EmberAnimate: --Support multi-GPU processing that alternates full frames between devices. --Use OpenCLInfo instead of OpenCLWrapper for --openclinfo option. --Remove bucketT template parameter, and hard code float in its place. --If a render fails, exit since there is no point in continuing an animation with a missing frame. --Pass variables to threaded save better, which most likely fixes a very subtle bug that existed before. --Remove some unused variables. -EmberGenome, EmberRender: --Support multi-GPU processing that alternates full frames between devices. --Use OpenCLInfo instead of OpenCLWrapper for --openclinfo option. --Remove bucketT template parameter, and hard code float in its place. -EmberRender: --Support multi-GPU processing that alternates full frames between devices. --Use OpenCLInfo instead of OpenCLWrapper for --openclinfo option. --Remove bucketT template parameter, and hard code float in its place. --Only print values when not rendering with OpenCL, since they're always 0 in that case. -EmberCLPch.h: --#pragma once only on Windows. --#include <atomic>. -IterOpenCLKernelCreator.h: Add new kernel for summing two histograms. This is needed for multi-GPU. -OpenCLWrapper.h: --Move all OpenCL info related code into its own class OpenCLInfo. --Add members to cache the values of global memory size and max allocation size. -RendererCL.h/cpp: --Redesign to accomodate multi-GPU. --Constructor now takes a vector of devices. --Remove DumpErrorReport() function, it's handled in the base. --ClearBuffer(), ReadPoints(), WritePoints(), ReadHist() and WriteHist() now optionally take a device index as a parameter. --MakeDmap() override and m_DmapCL member removed because it no longer applies since the histogram is always float since the last commit. --Add new function SumDeviceHist() to sum histograms from two devices by first copying to a temporary on the host, then a temporary on the device, then summing. --m_Calls member removed, as it's now per-device. --OpenCLWrapper removed. --m_Seeds member is now a vector of vector of seeds, to accomodate a separate and different array of seeds for each device. --Added member m_Devices, a vector of unique_ptr of RendererCLDevice. -EmberCommon.h --Added Devices() function to convert from a vector of device indices to a vector of platform,device indices. --Changed CreateRenderer() to accept a vector of devices to create a single RendererCL which will split work across multiple devices. --Added CreateRenderers() function to accept a vector of devices to create multiple RendererCL, each which will render on a single device. --Add more comments to some existing functions. -EmberCommonPch.h: #pragma once only on Windows. -EmberOptions.h: --Remove --platform option, it's just sequential device number now with the --device option. --Make --out be OPT_USE_RENDER instead of OPT_RENDER_ANIM since it's an error condition when animating. It makes no sense to write all frames to a single image. --Add Devices() function to parse comma separated --device option string and return a vector of device indices. --Make int and uint types be 64-bit, so intmax_t and size_t. --Make better use of macros. -JpegUtils.h: Make string parameters to WriteJpeg() and WritePng() be const ref. -All project files: Turn off buffer security check option in Visual Studio (/Gs-) -deployment.pri: Remove the line OTHER_FILES +=, it's pointless and was causing problems. -Ember.pro, EmberCL.pro: Add CONFIG += plugin, otherwise it wouldn't link. -EmberCL.pro: Add new files for multi-GPU support. -build_all.sh: use -j4 and QMAKE=${QMAKE:/usr/bin/qmake} -shared_settings.pri: -Add version string. -Remove old DESTDIR definitions. -Add the following lines or else nothing would build: CONFIG(release, debug|release) { CONFIG += warn_off DESTDIR = ../../../Bin/release } CONFIG(debug, debug|release) { DESTDIR = ../../../Bin/debug } QMAKE_POST_LINK += $$quote(cp --update ../../../Data/flam3-palettes.xml $${DESTDIR}$$escape_expand(\n\t)) LIBS += -L/usr/lib -lpthread -AboutDialog.ui: Another futile attempt to make it look correct on Linux. -FinalRenderDialog.h/cpp: --Add support for multi-GPU. --Change from combo boxes for device selection to a table of all devices. --Ensure device selection makes sense. --Remove "FinalRender" prefix of various function names, it's implied given the context. -FinalRenderEmberController.h/cpp: --Add support for multi-GPU. --Change m_FinishedImageCount to be atomic. --Move CancelRender() from the base to FinalRenderEmberController<T>. --Refactor RenderComplete() to omit any progress related functionality or image saving since it can be potentially ran in a thread. --Consolidate setting various renderer fields into SyncGuiToRenderer(). -Fractorium.cpp: Allow for resizing of the options dialog to show the entire device table. -FractoriumCommon.h: Add various functions to handle a table showing the available OpenCL devices on the system. -FractoriumEmberController.h/cpp: Remove m_FinalImageIndex, it's no longer needed. -FractoriumRender.cpp: Scale the interactive sub batch count and quality by the number of devices used. -FractoriumSettings.h/cpp: --Temporal samples defaults to 100 instead of 1000 which was needless overkill. --Add multi-GPU support, remove old device,platform pair. -FractoriumToolbar.cpp: Disable OpenCL toolbar button if there are no devices present on the system. -FractoriumOptionsDialog.h/cpp: --Add support for multi-GPU. --Consolidate more assignments in DataToGui(). --Enable/disable CPU/OpenCL items in response to OpenCL checkbox event. -Misc: Convert almost everything to size_t for unsigned, intmax_t for signed.
2015-09-12 21:33:45 -04:00
m_SumHistKernel = CreateSumHistKernelString();
}
/// <summary>
/// Accessors.
/// </summary>
template <typename T> const string& IterOpenCLKernelCreator<T>::ZeroizeKernel() const { return m_ZeroizeKernel; }
template <typename T> const string& IterOpenCLKernelCreator<T>::ZeroizeEntryPoint() const { return m_ZeroizeEntryPoint; }
--User changes -Add support for multiple GPU devices. --These options are present in the command line and in Fractorium. -Change scheme of specifying devices from platform,device to just total device index. --Single number on the command line. --Change from combo boxes for device selection to a table of all devices in Fractorium. -Temporal samples defaults to 100 instead of 1000 which was needless overkill. --Bug fixes -EmberAnimate, EmberRender, FractoriumSettings, FinalRenderDialog: Fix wrong order of arguments to Clamp() when assigning thread priority. -VariationsDC.h: Fix NVidia OpenCL compilation error in DCTriangleVariation. -FractoriumXformsColor.cpp: Checking for null pixmap pointer is not enough, must also check if the underlying buffer is null via call to QPixmap::isNull(). --Code changes -Ember.h: Add case for FLAME_MOTION_NONE and default in ApplyFlameMotion(). -EmberMotion.h: Call base constructor. -EmberPch.h: #pragma once only on Windows. -EmberToXml.h: --Handle different types of exceptions. --Add default cases to ToString(). -Isaac.h: Remove unused variable in constructor. -Point.h: Call base constructor in Color(). -Renderer.h/cpp: --Add bool to Alloc() to only allocate memory for the histogram. Needed for multi-GPU. --Make CoordMap() return a const ref, not a pointer. -SheepTools.h: --Use 64-bit types like the rest of the code already does. --Fix some comment misspellings. -Timing.h: Make BeginTime(), EndTime(), ElapsedTime() and Format() be const functions. -Utils.h: --Add new functions Equal() and Split(). --Handle more exception types in ReadFile(). --Get rid of most legacy blending of C and C++ argument parsing. -XmlToEmber.h: --Get rid of most legacy blending of C and C++ code from flam3. --Remove some unused variables. -EmberAnimate: --Support multi-GPU processing that alternates full frames between devices. --Use OpenCLInfo instead of OpenCLWrapper for --openclinfo option. --Remove bucketT template parameter, and hard code float in its place. --If a render fails, exit since there is no point in continuing an animation with a missing frame. --Pass variables to threaded save better, which most likely fixes a very subtle bug that existed before. --Remove some unused variables. -EmberGenome, EmberRender: --Support multi-GPU processing that alternates full frames between devices. --Use OpenCLInfo instead of OpenCLWrapper for --openclinfo option. --Remove bucketT template parameter, and hard code float in its place. -EmberRender: --Support multi-GPU processing that alternates full frames between devices. --Use OpenCLInfo instead of OpenCLWrapper for --openclinfo option. --Remove bucketT template parameter, and hard code float in its place. --Only print values when not rendering with OpenCL, since they're always 0 in that case. -EmberCLPch.h: --#pragma once only on Windows. --#include <atomic>. -IterOpenCLKernelCreator.h: Add new kernel for summing two histograms. This is needed for multi-GPU. -OpenCLWrapper.h: --Move all OpenCL info related code into its own class OpenCLInfo. --Add members to cache the values of global memory size and max allocation size. -RendererCL.h/cpp: --Redesign to accomodate multi-GPU. --Constructor now takes a vector of devices. --Remove DumpErrorReport() function, it's handled in the base. --ClearBuffer(), ReadPoints(), WritePoints(), ReadHist() and WriteHist() now optionally take a device index as a parameter. --MakeDmap() override and m_DmapCL member removed because it no longer applies since the histogram is always float since the last commit. --Add new function SumDeviceHist() to sum histograms from two devices by first copying to a temporary on the host, then a temporary on the device, then summing. --m_Calls member removed, as it's now per-device. --OpenCLWrapper removed. --m_Seeds member is now a vector of vector of seeds, to accomodate a separate and different array of seeds for each device. --Added member m_Devices, a vector of unique_ptr of RendererCLDevice. -EmberCommon.h --Added Devices() function to convert from a vector of device indices to a vector of platform,device indices. --Changed CreateRenderer() to accept a vector of devices to create a single RendererCL which will split work across multiple devices. --Added CreateRenderers() function to accept a vector of devices to create multiple RendererCL, each which will render on a single device. --Add more comments to some existing functions. -EmberCommonPch.h: #pragma once only on Windows. -EmberOptions.h: --Remove --platform option, it's just sequential device number now with the --device option. --Make --out be OPT_USE_RENDER instead of OPT_RENDER_ANIM since it's an error condition when animating. It makes no sense to write all frames to a single image. --Add Devices() function to parse comma separated --device option string and return a vector of device indices. --Make int and uint types be 64-bit, so intmax_t and size_t. --Make better use of macros. -JpegUtils.h: Make string parameters to WriteJpeg() and WritePng() be const ref. -All project files: Turn off buffer security check option in Visual Studio (/Gs-) -deployment.pri: Remove the line OTHER_FILES +=, it's pointless and was causing problems. -Ember.pro, EmberCL.pro: Add CONFIG += plugin, otherwise it wouldn't link. -EmberCL.pro: Add new files for multi-GPU support. -build_all.sh: use -j4 and QMAKE=${QMAKE:/usr/bin/qmake} -shared_settings.pri: -Add version string. -Remove old DESTDIR definitions. -Add the following lines or else nothing would build: CONFIG(release, debug|release) { CONFIG += warn_off DESTDIR = ../../../Bin/release } CONFIG(debug, debug|release) { DESTDIR = ../../../Bin/debug } QMAKE_POST_LINK += $$quote(cp --update ../../../Data/flam3-palettes.xml $${DESTDIR}$$escape_expand(\n\t)) LIBS += -L/usr/lib -lpthread -AboutDialog.ui: Another futile attempt to make it look correct on Linux. -FinalRenderDialog.h/cpp: --Add support for multi-GPU. --Change from combo boxes for device selection to a table of all devices. --Ensure device selection makes sense. --Remove "FinalRender" prefix of various function names, it's implied given the context. -FinalRenderEmberController.h/cpp: --Add support for multi-GPU. --Change m_FinishedImageCount to be atomic. --Move CancelRender() from the base to FinalRenderEmberController<T>. --Refactor RenderComplete() to omit any progress related functionality or image saving since it can be potentially ran in a thread. --Consolidate setting various renderer fields into SyncGuiToRenderer(). -Fractorium.cpp: Allow for resizing of the options dialog to show the entire device table. -FractoriumCommon.h: Add various functions to handle a table showing the available OpenCL devices on the system. -FractoriumEmberController.h/cpp: Remove m_FinalImageIndex, it's no longer needed. -FractoriumRender.cpp: Scale the interactive sub batch count and quality by the number of devices used. -FractoriumSettings.h/cpp: --Temporal samples defaults to 100 instead of 1000 which was needless overkill. --Add multi-GPU support, remove old device,platform pair. -FractoriumToolbar.cpp: Disable OpenCL toolbar button if there are no devices present on the system. -FractoriumOptionsDialog.h/cpp: --Add support for multi-GPU. --Consolidate more assignments in DataToGui(). --Enable/disable CPU/OpenCL items in response to OpenCL checkbox event. -Misc: Convert almost everything to size_t for unsigned, intmax_t for signed.
2015-09-12 21:33:45 -04:00
template <typename T> const string& IterOpenCLKernelCreator<T>::SumHistKernel() const { return m_SumHistKernel; }
template <typename T> const string& IterOpenCLKernelCreator<T>::SumHistEntryPoint() const { return m_SumHistEntryPoint; }
template <typename T> const string& IterOpenCLKernelCreator<T>::IterEntryPoint() const { return m_IterEntryPoint; }
/// <summary>
/// Create the iteration kernel string using the Cuburn method.
/// Template argument expected to be float or double.
/// </summary>
/// <param name="ember">The ember to create the kernel string for</param>
/// <param name="params">The parametric variation #define string</param>
/// <param name="doAccum">Debugging parameter to include or omit accumulating to the histogram. Default: true.</param>
/// <returns>The kernel string</returns>
template <typename T>
string IterOpenCLKernelCreator<T>::CreateIterKernelString(Ember<T>& ember, string& parVarDefines, bool lockAccum, bool doAccum)
{
bool doublePrecision = typeid(T) == typeid(double);
0.4.1.3 Beta 10/14/2014 --User Changes Size is no longer fixed to the window size. Size scaling is done differently in the final render dialog. This fixes several bugs. Remove Xml saving size from settings and options dialog, it no longer applies. Final render can be broken into strips. Set default save path to the desktop if none is found in the settings file. Set default output size to 1920x1080 if none is found in the settings file. --Bug Fixes Better memory size reporting in final render dialog. --Code Changes Migrate to C++11, Qt 5.3.1, and Visual Studio 2013. Change most instances of unsigned int to size_t, and int to intmax_t. Add m_OrigPixPerUnit and m_ScaleType to Ember for scaling purposes. Replace some sprintf_s() calls in XmlToEmber with ostringstream. Move more non-templated members into RendererBase. Add CopyVec() overload that takes a per element function pointer. Add vector Memset(). Replace '&' with '+' instead of "&amp;" in XmlToEmber for much faster parsing. Break strips rendering out into EmberCommon and call from EmberRender and Fractorium. Make AddAndWriteBuffer() just call WriteBuffer(). Make AddAndWriteImage() delete the existing image first before replacing it. Add SetOutputTexture() to RendererCL to support making new textures in response to resize events. Remove multiple return statements in RendererCL, and replace with a bool that tracks results. Add ToDouble(), MakeEnd(), ToString() and Exists() wrappers in Fractorium. Add Size() wrapper in EmberFile. Make QString function arguments const QString&, and string with const string&. Make ShowCritical() wrapper for invoking a message box from another thread. Add combo box to TwoButtonWidget and rename.
2014-10-14 11:53:15 -04:00
size_t i, v, varIndex, varCount, totalXformCount = ember.TotalXformCount();
ostringstream kernelIterBody, xformFuncs, os;
vector<Variation<T>*> variations;
xformFuncs << "\n" << parVarDefines << endl;
ember.GetPresentVariations(variations);
for (auto var : variations) if (var) xformFuncs << var->OpenCLFuncsString();
for (i = 0; i < totalXformCount; i++)
{
Xform<T>* xform = ember.GetTotalXform(i);
//size_t totalVarCount = xform->TotalVariationCount();
bool needPrecalcSumSquares = false;
bool needPrecalcSqrtSumSquares = false;
bool needPrecalcAngles = false;
bool needPrecalcAtanXY = false;
bool needPrecalcAtanYX = false;
v = varIndex = varCount = 0;
xformFuncs <<
"void Xform" << i << "(__constant XformCL* xform, __constant real_t* parVars, Point* inPoint, Point* outPoint, uint2* mwc)\n" <<
"{\n"
" real_t transX, transY, transZ;\n"
" real4 vIn, vOut = 0.0;\n";
//Determine if any variations, regular, pre, or post need precalcs.
while (Variation<T>* var = xform->GetVariation(v++))
{
needPrecalcSumSquares |= var->NeedPrecalcSumSquares();
needPrecalcSqrtSumSquares |= var->NeedPrecalcSqrtSumSquares();
needPrecalcAngles |= var->NeedPrecalcAngles();
needPrecalcAtanXY |= var->NeedPrecalcAtanXY();
needPrecalcAtanYX |= var->NeedPrecalcAtanYX();
}
if (needPrecalcSumSquares)
xformFuncs << "\treal_t precalcSumSquares;\n";
if (needPrecalcSqrtSumSquares)
xformFuncs << "\treal_t precalcSqrtSumSquares;\n";
if (needPrecalcAngles)
{
xformFuncs << "\treal_t precalcSina;\n";
xformFuncs << "\treal_t precalcCosa;\n";
}
if (needPrecalcAtanXY)
xformFuncs << "\treal_t precalcAtanxy;\n";
if (needPrecalcAtanYX)
xformFuncs << "\treal_t precalcAtanyx;\n";
xformFuncs << "\treal_t tempColor = outPoint->m_ColorX = xform->m_ColorSpeedCache + (xform->m_OneMinusColorCache * inPoint->m_ColorX);\n\n";
if (xform->PreVariationCount() + xform->VariationCount() == 0)
{
xformFuncs <<
" outPoint->m_X = (xform->m_A * inPoint->m_X) + (xform->m_B * inPoint->m_Y) + xform->m_C;\n" <<
" outPoint->m_Y = (xform->m_D * inPoint->m_X) + (xform->m_E * inPoint->m_Y) + xform->m_F;\n" <<
" outPoint->m_Z = inPoint->m_Z;\n";
}
else
{
xformFuncs <<
" transX = (xform->m_A * inPoint->m_X) + (xform->m_B * inPoint->m_Y) + xform->m_C;\n" <<
" transY = (xform->m_D * inPoint->m_X) + (xform->m_E * inPoint->m_Y) + xform->m_F;\n" <<
" transZ = inPoint->m_Z;\n";
varCount = xform->PreVariationCount();
if (varCount > 0)
{
xformFuncs << "\n\t//Apply each of the " << varCount << " pre variations in this xform.\n";
//Output the code for each pre variation in this xform.
for (varIndex = 0; varIndex < varCount; varIndex++)
{
if (Variation<T>* var = xform->GetVariation(varIndex))
{
xformFuncs << "\n\t//" << var->Name() << ".\n";
xformFuncs << var->PrecalcOpenCLString();
xformFuncs << xform->ReadOpenCLString(VARTYPE_PRE) << endl;
xformFuncs << var->OpenCLString() << endl;
xformFuncs << xform->WriteOpenCLString(VARTYPE_PRE, var->AssignType()) << endl;
}
}
}
if (xform->VariationCount() > 0)
{
if (xform->NeedPrecalcSumSquares())
xformFuncs << "\tprecalcSumSquares = SQR(transX) + SQR(transY);\n";
if (xform->NeedPrecalcSqrtSumSquares())
xformFuncs << "\tprecalcSqrtSumSquares = sqrt(precalcSumSquares);\n";
if (xform->NeedPrecalcAngles())
{
xformFuncs << "\tprecalcSina = transX / Zeps(precalcSqrtSumSquares);\n";
xformFuncs << "\tprecalcCosa = transY / Zeps(precalcSqrtSumSquares);\n";
}
if (xform->NeedPrecalcAtanXY())
xformFuncs << "\tprecalcAtanxy = atan2(transX, transY);\n";
if (xform->NeedPrecalcAtanYX())
xformFuncs << "\tprecalcAtanyx = atan2(transY, transX);\n";
xformFuncs << "\n\toutPoint->m_X = 0;";
xformFuncs << "\n\toutPoint->m_Y = 0;";
xformFuncs << "\n\toutPoint->m_Z = 0;\n";
xformFuncs << "\n\t//Apply each of the " << xform->VariationCount() << " regular variations in this xform.\n\n";
xformFuncs << xform->ReadOpenCLString(VARTYPE_REG);
varCount += xform->VariationCount();
//Output the code for each regular variation in this xform.
for (; varIndex < varCount; varIndex++)
{
if (Variation<T>* var = xform->GetVariation(varIndex))
{
xformFuncs << "\n\t//" << var->Name() << ".\n"
<< var->OpenCLString() << (varIndex == varCount - 1 ? "\n" : "\n\n")
<< xform->WriteOpenCLString(VARTYPE_REG, ASSIGNTYPE_SUM);
}
}
}
else
{
xformFuncs <<
" outPoint->m_X = transX;\n"
" outPoint->m_Y = transY;\n"
" outPoint->m_Z = transZ;\n";
}
}
if (xform->PostVariationCount() > 0)
{
varCount += xform->PostVariationCount();
xformFuncs << "\n\t//Apply each of the " << xform->PostVariationCount() << " post variations in this xform.\n";
//Output the code for each post variation in this xform.
for (; varIndex < varCount; varIndex++)
{
if (Variation<T>* var = xform->GetVariation(varIndex))
{
xformFuncs << "\n\t//" << var->Name() << ".\n";
xformFuncs << var->PrecalcOpenCLString();
xformFuncs << xform->ReadOpenCLString(VARTYPE_POST) << endl;
xformFuncs << var->OpenCLString() << endl;
xformFuncs << xform->WriteOpenCLString(VARTYPE_POST, var->AssignType()) << (varIndex == varCount - 1 ? "\n" : "\n\n");
}
}
}
if (xform->HasPost())
{
xformFuncs <<
"\n\t//Apply post affine transform.\n"
"\treal_t tempX = outPoint->m_X;\n"
"\n"
"\toutPoint->m_X = (xform->m_PostA * tempX) + (xform->m_PostB * outPoint->m_Y) + xform->m_PostC;\n" <<
"\toutPoint->m_Y = (xform->m_PostD * tempX) + (xform->m_PostE * outPoint->m_Y) + xform->m_PostF;\n";
}
xformFuncs << "\toutPoint->m_ColorX = outPoint->m_ColorX + xform->m_DirectColor * (tempColor - outPoint->m_ColorX);\n";
xformFuncs << "}\n"
<< "\n";
}
os <<
ConstantDefinesString(doublePrecision) <<
InlineMathFunctionsString <<
ClampRealFunctionString <<
RandFunctionString <<
PointCLStructString <<
XformCLStructString <<
EmberCLStructString <<
UnionCLStructString <<
CarToRasCLStructString <<
CarToRasFunctionString;
if (lockAccum)
os << AtomicString();
os <<
xformFuncs.str() <<
"__kernel void " << m_IterEntryPoint << "(\n" <<
" uint iterCount,\n"
" uint fuseCount,\n"
0.4.1.5 Beta 11/28/2014 --User Changes Remove limit on the number of xforms allowable on the GPU. This was previously 21. Show actual strips count to be used in parens outside of user specified strips count on final render dialog. Allow for adjustment of iteration depth and fuse count per ember and save/read these values with the xml. Iteration optimizations on both CPU and GPU. Automatically adjust default quality spinner value when using CPU/GPU to 10/30, respectively. --Bug Fixes Fix severe randomization bug with OpenCL. Fix undo list off by one error when doing a new edit anywhere but the end of the undo list. Make integer variation parameters use 4 decimal places in the variations list like all the others. New build of the latest Qt to fix scroll bar drawing bug. Prevent grid from showing as much when pressing control to increase a spinner's increment speed. Still shows sometimes, but better than before. --Code Changes Pass count and fuse to iterator as a structure now to allow for passing more params in the future. Slightly different grid/block logic when running DE filtering on the GPU. Attempt a different way of doing DE, but #define out because it ended up not being faster. Restructure some things to allow for a variable length xforms buffer to be passed to the GPU. Add sub batch size and fuse count as ember members, and remove them from the renderer classes. Remove m_LastPass from Renderer. It should have been removed with passes. Pass seeds as a buffer to the OpenCL iteration kernel, rather than a single seed that gets modified. Slight optimization on CPU accum. Use case statement instead of if/else for xform chosing in OpenCL for a 2% speedup on params with large numbers of xforms. Add SizeOf() wrapper around sizeof(vec[0]) * vec.size(). Remove LogScaleSum() functions from the CPU and GPU because they're no longer used since passes were removed. Make some OpenCLWrapper getters const. Better ogranize RendererCL methods that return grid dimensions.
2014-11-28 04:37:51 -05:00
" __global uint2* seeds,\n"
" __constant EmberCL* ember,\n"
0.4.1.5 Beta 11/28/2014 --User Changes Remove limit on the number of xforms allowable on the GPU. This was previously 21. Show actual strips count to be used in parens outside of user specified strips count on final render dialog. Allow for adjustment of iteration depth and fuse count per ember and save/read these values with the xml. Iteration optimizations on both CPU and GPU. Automatically adjust default quality spinner value when using CPU/GPU to 10/30, respectively. --Bug Fixes Fix severe randomization bug with OpenCL. Fix undo list off by one error when doing a new edit anywhere but the end of the undo list. Make integer variation parameters use 4 decimal places in the variations list like all the others. New build of the latest Qt to fix scroll bar drawing bug. Prevent grid from showing as much when pressing control to increase a spinner's increment speed. Still shows sometimes, but better than before. --Code Changes Pass count and fuse to iterator as a structure now to allow for passing more params in the future. Slightly different grid/block logic when running DE filtering on the GPU. Attempt a different way of doing DE, but #define out because it ended up not being faster. Restructure some things to allow for a variable length xforms buffer to be passed to the GPU. Add sub batch size and fuse count as ember members, and remove them from the renderer classes. Remove m_LastPass from Renderer. It should have been removed with passes. Pass seeds as a buffer to the OpenCL iteration kernel, rather than a single seed that gets modified. Slight optimization on CPU accum. Use case statement instead of if/else for xform chosing in OpenCL for a 2% speedup on params with large numbers of xforms. Add SizeOf() wrapper around sizeof(vec[0]) * vec.size(). Remove LogScaleSum() functions from the CPU and GPU because they're no longer used since passes were removed. Make some OpenCLWrapper getters const. Better ogranize RendererCL methods that return grid dimensions.
2014-11-28 04:37:51 -05:00
" __constant XformCL* xforms,\n"
" __constant real_t* parVars,\n"
" __global uchar* xformDistributions,\n"//Using uchar is quicker than uint. Can't be constant because the size can be too large to fit when using xaos.//FINALOPT
" __constant CarToRasCL* carToRas,\n"
" __global real4reals_bucket* histogram,\n"
" uint histSize,\n"
" __read_only image2d_t palette,\n"
" __global Point* points\n"
"\t)\n"
"{\n"
" bool fuse, ok;\n"
" uint threadIndex = INDEX_IN_BLOCK_2D;\n"
0.4.1.5 Beta 11/28/2014 --User Changes Remove limit on the number of xforms allowable on the GPU. This was previously 21. Show actual strips count to be used in parens outside of user specified strips count on final render dialog. Allow for adjustment of iteration depth and fuse count per ember and save/read these values with the xml. Iteration optimizations on both CPU and GPU. Automatically adjust default quality spinner value when using CPU/GPU to 10/30, respectively. --Bug Fixes Fix severe randomization bug with OpenCL. Fix undo list off by one error when doing a new edit anywhere but the end of the undo list. Make integer variation parameters use 4 decimal places in the variations list like all the others. New build of the latest Qt to fix scroll bar drawing bug. Prevent grid from showing as much when pressing control to increase a spinner's increment speed. Still shows sometimes, but better than before. --Code Changes Pass count and fuse to iterator as a structure now to allow for passing more params in the future. Slightly different grid/block logic when running DE filtering on the GPU. Attempt a different way of doing DE, but #define out because it ended up not being faster. Restructure some things to allow for a variable length xforms buffer to be passed to the GPU. Add sub batch size and fuse count as ember members, and remove them from the renderer classes. Remove m_LastPass from Renderer. It should have been removed with passes. Pass seeds as a buffer to the OpenCL iteration kernel, rather than a single seed that gets modified. Slight optimization on CPU accum. Use case statement instead of if/else for xform chosing in OpenCL for a 2% speedup on params with large numbers of xforms. Add SizeOf() wrapper around sizeof(vec[0]) * vec.size(). Remove LogScaleSum() functions from the CPU and GPU because they're no longer used since passes were removed. Make some OpenCLWrapper getters const. Better ogranize RendererCL methods that return grid dimensions.
2014-11-28 04:37:51 -05:00
" uint pointsIndex = INDEX_IN_GRID_2D;\n"
" uint i, itersToDo;\n"
" uint consec = 0;\n"
//" int badvals = 0;\n"
" uint histIndex;\n"
" real_t p00, p01;\n"
" Point firstPoint, secondPoint, tempPoint;\n"
0.4.1.5 Beta 11/28/2014 --User Changes Remove limit on the number of xforms allowable on the GPU. This was previously 21. Show actual strips count to be used in parens outside of user specified strips count on final render dialog. Allow for adjustment of iteration depth and fuse count per ember and save/read these values with the xml. Iteration optimizations on both CPU and GPU. Automatically adjust default quality spinner value when using CPU/GPU to 10/30, respectively. --Bug Fixes Fix severe randomization bug with OpenCL. Fix undo list off by one error when doing a new edit anywhere but the end of the undo list. Make integer variation parameters use 4 decimal places in the variations list like all the others. New build of the latest Qt to fix scroll bar drawing bug. Prevent grid from showing as much when pressing control to increase a spinner's increment speed. Still shows sometimes, but better than before. --Code Changes Pass count and fuse to iterator as a structure now to allow for passing more params in the future. Slightly different grid/block logic when running DE filtering on the GPU. Attempt a different way of doing DE, but #define out because it ended up not being faster. Restructure some things to allow for a variable length xforms buffer to be passed to the GPU. Add sub batch size and fuse count as ember members, and remove them from the renderer classes. Remove m_LastPass from Renderer. It should have been removed with passes. Pass seeds as a buffer to the OpenCL iteration kernel, rather than a single seed that gets modified. Slight optimization on CPU accum. Use case statement instead of if/else for xform chosing in OpenCL for a 2% speedup on params with large numbers of xforms. Add SizeOf() wrapper around sizeof(vec[0]) * vec.size(). Remove LogScaleSum() functions from the CPU and GPU because they're no longer used since passes were removed. Make some OpenCLWrapper getters const. Better ogranize RendererCL methods that return grid dimensions.
2014-11-28 04:37:51 -05:00
" uint2 mwc = seeds[pointsIndex];\n"
" float4 palColor1;\n"
" int2 iPaletteCoord;\n"
" const sampler_t paletteSampler = CLK_NORMALIZED_COORDS_FALSE |\n"//Coords from 0 to 255.
" CLK_ADDRESS_CLAMP_TO_EDGE |\n"//Clamp to edge
" CLK_FILTER_NEAREST;\n"//Don't interpolate
" uint threadXY = (THREAD_ID_X + THREAD_ID_Y);\n"
" uint threadXDivRows = (THREAD_ID_X / (NTHREADS / THREADS_PER_WARP));\n"
" uint threadsMinus1 = NTHREADS - 1;\n"
;
os <<
"\n"
0.4.1.5 Beta 11/28/2014 --User Changes Remove limit on the number of xforms allowable on the GPU. This was previously 21. Show actual strips count to be used in parens outside of user specified strips count on final render dialog. Allow for adjustment of iteration depth and fuse count per ember and save/read these values with the xml. Iteration optimizations on both CPU and GPU. Automatically adjust default quality spinner value when using CPU/GPU to 10/30, respectively. --Bug Fixes Fix severe randomization bug with OpenCL. Fix undo list off by one error when doing a new edit anywhere but the end of the undo list. Make integer variation parameters use 4 decimal places in the variations list like all the others. New build of the latest Qt to fix scroll bar drawing bug. Prevent grid from showing as much when pressing control to increase a spinner's increment speed. Still shows sometimes, but better than before. --Code Changes Pass count and fuse to iterator as a structure now to allow for passing more params in the future. Slightly different grid/block logic when running DE filtering on the GPU. Attempt a different way of doing DE, but #define out because it ended up not being faster. Restructure some things to allow for a variable length xforms buffer to be passed to the GPU. Add sub batch size and fuse count as ember members, and remove them from the renderer classes. Remove m_LastPass from Renderer. It should have been removed with passes. Pass seeds as a buffer to the OpenCL iteration kernel, rather than a single seed that gets modified. Slight optimization on CPU accum. Use case statement instead of if/else for xform chosing in OpenCL for a 2% speedup on params with large numbers of xforms. Add SizeOf() wrapper around sizeof(vec[0]) * vec.size(). Remove LogScaleSum() functions from the CPU and GPU because they're no longer used since passes were removed. Make some OpenCLWrapper getters const. Better ogranize RendererCL methods that return grid dimensions.
2014-11-28 04:37:51 -05:00
#ifndef STRAIGHT_RAND
" __local Point swap[NTHREADS];\n"
" __local uint xfsel[NWARPS];\n"
0.4.1.5 Beta 11/28/2014 --User Changes Remove limit on the number of xforms allowable on the GPU. This was previously 21. Show actual strips count to be used in parens outside of user specified strips count on final render dialog. Allow for adjustment of iteration depth and fuse count per ember and save/read these values with the xml. Iteration optimizations on both CPU and GPU. Automatically adjust default quality spinner value when using CPU/GPU to 10/30, respectively. --Bug Fixes Fix severe randomization bug with OpenCL. Fix undo list off by one error when doing a new edit anywhere but the end of the undo list. Make integer variation parameters use 4 decimal places in the variations list like all the others. New build of the latest Qt to fix scroll bar drawing bug. Prevent grid from showing as much when pressing control to increase a spinner's increment speed. Still shows sometimes, but better than before. --Code Changes Pass count and fuse to iterator as a structure now to allow for passing more params in the future. Slightly different grid/block logic when running DE filtering on the GPU. Attempt a different way of doing DE, but #define out because it ended up not being faster. Restructure some things to allow for a variable length xforms buffer to be passed to the GPU. Add sub batch size and fuse count as ember members, and remove them from the renderer classes. Remove m_LastPass from Renderer. It should have been removed with passes. Pass seeds as a buffer to the OpenCL iteration kernel, rather than a single seed that gets modified. Slight optimization on CPU accum. Use case statement instead of if/else for xform chosing in OpenCL for a 2% speedup on params with large numbers of xforms. Add SizeOf() wrapper around sizeof(vec[0]) * vec.size(). Remove LogScaleSum() functions from the CPU and GPU because they're no longer used since passes were removed. Make some OpenCLWrapper getters const. Better ogranize RendererCL methods that return grid dimensions.
2014-11-28 04:37:51 -05:00
#endif
"\n"
" iPaletteCoord.y = 0;\n"
"\n"
" if (fuseCount > 0)\n"
" {\n"
" fuse = true;\n"
" itersToDo = fuseCount;\n"
" firstPoint.m_X = MwcNextNeg1Pos1(&mwc);\n"
" firstPoint.m_Y = MwcNextNeg1Pos1(&mwc);\n"
" firstPoint.m_Z = 0.0;\n"
" firstPoint.m_ColorX = MwcNext01(&mwc);\n"
" firstPoint.m_LastXfUsed = 0;\n"
" }\n"
" else\n"
" {\n"
" fuse = false;\n"
" itersToDo = iterCount;\n"
" firstPoint = points[pointsIndex];\n"
" }\n"
"\n";
//This is done once initially here and then again after each swap-sync in the main loop.
//This along with the randomness that the point shuffle provides gives sufficient randomness
//to produce results identical to those produced on the CPU.
os <<
0.4.1.5 Beta 11/28/2014 --User Changes Remove limit on the number of xforms allowable on the GPU. This was previously 21. Show actual strips count to be used in parens outside of user specified strips count on final render dialog. Allow for adjustment of iteration depth and fuse count per ember and save/read these values with the xml. Iteration optimizations on both CPU and GPU. Automatically adjust default quality spinner value when using CPU/GPU to 10/30, respectively. --Bug Fixes Fix severe randomization bug with OpenCL. Fix undo list off by one error when doing a new edit anywhere but the end of the undo list. Make integer variation parameters use 4 decimal places in the variations list like all the others. New build of the latest Qt to fix scroll bar drawing bug. Prevent grid from showing as much when pressing control to increase a spinner's increment speed. Still shows sometimes, but better than before. --Code Changes Pass count and fuse to iterator as a structure now to allow for passing more params in the future. Slightly different grid/block logic when running DE filtering on the GPU. Attempt a different way of doing DE, but #define out because it ended up not being faster. Restructure some things to allow for a variable length xforms buffer to be passed to the GPU. Add sub batch size and fuse count as ember members, and remove them from the renderer classes. Remove m_LastPass from Renderer. It should have been removed with passes. Pass seeds as a buffer to the OpenCL iteration kernel, rather than a single seed that gets modified. Slight optimization on CPU accum. Use case statement instead of if/else for xform chosing in OpenCL for a 2% speedup on params with large numbers of xforms. Add SizeOf() wrapper around sizeof(vec[0]) * vec.size(). Remove LogScaleSum() functions from the CPU and GPU because they're no longer used since passes were removed. Make some OpenCLWrapper getters const. Better ogranize RendererCL methods that return grid dimensions.
2014-11-28 04:37:51 -05:00
#ifndef STRAIGHT_RAND
" if (THREAD_ID_Y == 0 && THREAD_ID_X < NWARPS)\n"
" xfsel[THREAD_ID_X] = MwcNext(&mwc) & " << CHOOSE_XFORM_GRAIN_M1 << ";\n"//It's faster to do the & here ahead of time than every time an xform is looked up to use inside the loop.
"\n"
0.4.1.5 Beta 11/28/2014 --User Changes Remove limit on the number of xforms allowable on the GPU. This was previously 21. Show actual strips count to be used in parens outside of user specified strips count on final render dialog. Allow for adjustment of iteration depth and fuse count per ember and save/read these values with the xml. Iteration optimizations on both CPU and GPU. Automatically adjust default quality spinner value when using CPU/GPU to 10/30, respectively. --Bug Fixes Fix severe randomization bug with OpenCL. Fix undo list off by one error when doing a new edit anywhere but the end of the undo list. Make integer variation parameters use 4 decimal places in the variations list like all the others. New build of the latest Qt to fix scroll bar drawing bug. Prevent grid from showing as much when pressing control to increase a spinner's increment speed. Still shows sometimes, but better than before. --Code Changes Pass count and fuse to iterator as a structure now to allow for passing more params in the future. Slightly different grid/block logic when running DE filtering on the GPU. Attempt a different way of doing DE, but #define out because it ended up not being faster. Restructure some things to allow for a variable length xforms buffer to be passed to the GPU. Add sub batch size and fuse count as ember members, and remove them from the renderer classes. Remove m_LastPass from Renderer. It should have been removed with passes. Pass seeds as a buffer to the OpenCL iteration kernel, rather than a single seed that gets modified. Slight optimization on CPU accum. Use case statement instead of if/else for xform chosing in OpenCL for a 2% speedup on params with large numbers of xforms. Add SizeOf() wrapper around sizeof(vec[0]) * vec.size(). Remove LogScaleSum() functions from the CPU and GPU because they're no longer used since passes were removed. Make some OpenCLWrapper getters const. Better ogranize RendererCL methods that return grid dimensions.
2014-11-28 04:37:51 -05:00
#endif
" barrier(CLK_LOCAL_MEM_FENCE);\n"
"\n"
" for (i = 0; i < itersToDo; i++)\n"
" {\n";
os <<
" consec = 0;\n"
"\n"
" do\n"
" {\n";
0.4.1.5 Beta 11/28/2014 --User Changes Remove limit on the number of xforms allowable on the GPU. This was previously 21. Show actual strips count to be used in parens outside of user specified strips count on final render dialog. Allow for adjustment of iteration depth and fuse count per ember and save/read these values with the xml. Iteration optimizations on both CPU and GPU. Automatically adjust default quality spinner value when using CPU/GPU to 10/30, respectively. --Bug Fixes Fix severe randomization bug with OpenCL. Fix undo list off by one error when doing a new edit anywhere but the end of the undo list. Make integer variation parameters use 4 decimal places in the variations list like all the others. New build of the latest Qt to fix scroll bar drawing bug. Prevent grid from showing as much when pressing control to increase a spinner's increment speed. Still shows sometimes, but better than before. --Code Changes Pass count and fuse to iterator as a structure now to allow for passing more params in the future. Slightly different grid/block logic when running DE filtering on the GPU. Attempt a different way of doing DE, but #define out because it ended up not being faster. Restructure some things to allow for a variable length xforms buffer to be passed to the GPU. Add sub batch size and fuse count as ember members, and remove them from the renderer classes. Remove m_LastPass from Renderer. It should have been removed with passes. Pass seeds as a buffer to the OpenCL iteration kernel, rather than a single seed that gets modified. Slight optimization on CPU accum. Use case statement instead of if/else for xform chosing in OpenCL for a 2% speedup on params with large numbers of xforms. Add SizeOf() wrapper around sizeof(vec[0]) * vec.size(). Remove LogScaleSum() functions from the CPU and GPU because they're no longer used since passes were removed. Make some OpenCLWrapper getters const. Better ogranize RendererCL methods that return grid dimensions.
2014-11-28 04:37:51 -05:00
//If xaos is present, the a hybrid of the cuburn method is used.
//This makes each thread in a row pick the same offset into a distribution, using xfsel.
//However, the distribution the offset is in, is determined by firstPoint.m_LastXfUsed.
if (ember.XaosPresent())
{
os <<
0.4.1.5 Beta 11/28/2014 --User Changes Remove limit on the number of xforms allowable on the GPU. This was previously 21. Show actual strips count to be used in parens outside of user specified strips count on final render dialog. Allow for adjustment of iteration depth and fuse count per ember and save/read these values with the xml. Iteration optimizations on both CPU and GPU. Automatically adjust default quality spinner value when using CPU/GPU to 10/30, respectively. --Bug Fixes Fix severe randomization bug with OpenCL. Fix undo list off by one error when doing a new edit anywhere but the end of the undo list. Make integer variation parameters use 4 decimal places in the variations list like all the others. New build of the latest Qt to fix scroll bar drawing bug. Prevent grid from showing as much when pressing control to increase a spinner's increment speed. Still shows sometimes, but better than before. --Code Changes Pass count and fuse to iterator as a structure now to allow for passing more params in the future. Slightly different grid/block logic when running DE filtering on the GPU. Attempt a different way of doing DE, but #define out because it ended up not being faster. Restructure some things to allow for a variable length xforms buffer to be passed to the GPU. Add sub batch size and fuse count as ember members, and remove them from the renderer classes. Remove m_LastPass from Renderer. It should have been removed with passes. Pass seeds as a buffer to the OpenCL iteration kernel, rather than a single seed that gets modified. Slight optimization on CPU accum. Use case statement instead of if/else for xform chosing in OpenCL for a 2% speedup on params with large numbers of xforms. Add SizeOf() wrapper around sizeof(vec[0]) * vec.size(). Remove LogScaleSum() functions from the CPU and GPU because they're no longer used since passes were removed. Make some OpenCLWrapper getters const. Better ogranize RendererCL methods that return grid dimensions.
2014-11-28 04:37:51 -05:00
#ifdef STRAIGHT_RAND
" secondPoint.m_LastXfUsed = xformDistributions[MwcNext(&mwc) & " << CHOOSE_XFORM_GRAIN_M1 << " + (" << CHOOSE_XFORM_GRAIN << " * (firstPoint.m_LastXfUsed + 1u))];\n\n";
0.4.1.5 Beta 11/28/2014 --User Changes Remove limit on the number of xforms allowable on the GPU. This was previously 21. Show actual strips count to be used in parens outside of user specified strips count on final render dialog. Allow for adjustment of iteration depth and fuse count per ember and save/read these values with the xml. Iteration optimizations on both CPU and GPU. Automatically adjust default quality spinner value when using CPU/GPU to 10/30, respectively. --Bug Fixes Fix severe randomization bug with OpenCL. Fix undo list off by one error when doing a new edit anywhere but the end of the undo list. Make integer variation parameters use 4 decimal places in the variations list like all the others. New build of the latest Qt to fix scroll bar drawing bug. Prevent grid from showing as much when pressing control to increase a spinner's increment speed. Still shows sometimes, but better than before. --Code Changes Pass count and fuse to iterator as a structure now to allow for passing more params in the future. Slightly different grid/block logic when running DE filtering on the GPU. Attempt a different way of doing DE, but #define out because it ended up not being faster. Restructure some things to allow for a variable length xforms buffer to be passed to the GPU. Add sub batch size and fuse count as ember members, and remove them from the renderer classes. Remove m_LastPass from Renderer. It should have been removed with passes. Pass seeds as a buffer to the OpenCL iteration kernel, rather than a single seed that gets modified. Slight optimization on CPU accum. Use case statement instead of if/else for xform chosing in OpenCL for a 2% speedup on params with large numbers of xforms. Add SizeOf() wrapper around sizeof(vec[0]) * vec.size(). Remove LogScaleSum() functions from the CPU and GPU because they're no longer used since passes were removed. Make some OpenCLWrapper getters const. Better ogranize RendererCL methods that return grid dimensions.
2014-11-28 04:37:51 -05:00
#else
" secondPoint.m_LastXfUsed = xformDistributions[xfsel[THREAD_ID_Y] + (" << CHOOSE_XFORM_GRAIN << " * (firstPoint.m_LastXfUsed + 1u))];\n\n";//Partial cuburn hybrid.
#endif
}
else
{
os <<
0.4.1.5 Beta 11/28/2014 --User Changes Remove limit on the number of xforms allowable on the GPU. This was previously 21. Show actual strips count to be used in parens outside of user specified strips count on final render dialog. Allow for adjustment of iteration depth and fuse count per ember and save/read these values with the xml. Iteration optimizations on both CPU and GPU. Automatically adjust default quality spinner value when using CPU/GPU to 10/30, respectively. --Bug Fixes Fix severe randomization bug with OpenCL. Fix undo list off by one error when doing a new edit anywhere but the end of the undo list. Make integer variation parameters use 4 decimal places in the variations list like all the others. New build of the latest Qt to fix scroll bar drawing bug. Prevent grid from showing as much when pressing control to increase a spinner's increment speed. Still shows sometimes, but better than before. --Code Changes Pass count and fuse to iterator as a structure now to allow for passing more params in the future. Slightly different grid/block logic when running DE filtering on the GPU. Attempt a different way of doing DE, but #define out because it ended up not being faster. Restructure some things to allow for a variable length xforms buffer to be passed to the GPU. Add sub batch size and fuse count as ember members, and remove them from the renderer classes. Remove m_LastPass from Renderer. It should have been removed with passes. Pass seeds as a buffer to the OpenCL iteration kernel, rather than a single seed that gets modified. Slight optimization on CPU accum. Use case statement instead of if/else for xform chosing in OpenCL for a 2% speedup on params with large numbers of xforms. Add SizeOf() wrapper around sizeof(vec[0]) * vec.size(). Remove LogScaleSum() functions from the CPU and GPU because they're no longer used since passes were removed. Make some OpenCLWrapper getters const. Better ogranize RendererCL methods that return grid dimensions.
2014-11-28 04:37:51 -05:00
#ifdef STRAIGHT_RAND
" secondPoint.m_LastXfUsed = xformDistributions[MwcNext(&mwc) & " << CHOOSE_XFORM_GRAIN_M1 << "];\n\n";//For testing, using straight rand flam4/fractron style instead of cuburn.
0.4.1.5 Beta 11/28/2014 --User Changes Remove limit on the number of xforms allowable on the GPU. This was previously 21. Show actual strips count to be used in parens outside of user specified strips count on final render dialog. Allow for adjustment of iteration depth and fuse count per ember and save/read these values with the xml. Iteration optimizations on both CPU and GPU. Automatically adjust default quality spinner value when using CPU/GPU to 10/30, respectively. --Bug Fixes Fix severe randomization bug with OpenCL. Fix undo list off by one error when doing a new edit anywhere but the end of the undo list. Make integer variation parameters use 4 decimal places in the variations list like all the others. New build of the latest Qt to fix scroll bar drawing bug. Prevent grid from showing as much when pressing control to increase a spinner's increment speed. Still shows sometimes, but better than before. --Code Changes Pass count and fuse to iterator as a structure now to allow for passing more params in the future. Slightly different grid/block logic when running DE filtering on the GPU. Attempt a different way of doing DE, but #define out because it ended up not being faster. Restructure some things to allow for a variable length xforms buffer to be passed to the GPU. Add sub batch size and fuse count as ember members, and remove them from the renderer classes. Remove m_LastPass from Renderer. It should have been removed with passes. Pass seeds as a buffer to the OpenCL iteration kernel, rather than a single seed that gets modified. Slight optimization on CPU accum. Use case statement instead of if/else for xform chosing in OpenCL for a 2% speedup on params with large numbers of xforms. Add SizeOf() wrapper around sizeof(vec[0]) * vec.size(). Remove LogScaleSum() functions from the CPU and GPU because they're no longer used since passes were removed. Make some OpenCLWrapper getters const. Better ogranize RendererCL methods that return grid dimensions.
2014-11-28 04:37:51 -05:00
#else
" secondPoint.m_LastXfUsed = xformDistributions[xfsel[THREAD_ID_Y]];\n\n";
0.4.1.5 Beta 11/28/2014 --User Changes Remove limit on the number of xforms allowable on the GPU. This was previously 21. Show actual strips count to be used in parens outside of user specified strips count on final render dialog. Allow for adjustment of iteration depth and fuse count per ember and save/read these values with the xml. Iteration optimizations on both CPU and GPU. Automatically adjust default quality spinner value when using CPU/GPU to 10/30, respectively. --Bug Fixes Fix severe randomization bug with OpenCL. Fix undo list off by one error when doing a new edit anywhere but the end of the undo list. Make integer variation parameters use 4 decimal places in the variations list like all the others. New build of the latest Qt to fix scroll bar drawing bug. Prevent grid from showing as much when pressing control to increase a spinner's increment speed. Still shows sometimes, but better than before. --Code Changes Pass count and fuse to iterator as a structure now to allow for passing more params in the future. Slightly different grid/block logic when running DE filtering on the GPU. Attempt a different way of doing DE, but #define out because it ended up not being faster. Restructure some things to allow for a variable length xforms buffer to be passed to the GPU. Add sub batch size and fuse count as ember members, and remove them from the renderer classes. Remove m_LastPass from Renderer. It should have been removed with passes. Pass seeds as a buffer to the OpenCL iteration kernel, rather than a single seed that gets modified. Slight optimization on CPU accum. Use case statement instead of if/else for xform chosing in OpenCL for a 2% speedup on params with large numbers of xforms. Add SizeOf() wrapper around sizeof(vec[0]) * vec.size(). Remove LogScaleSum() functions from the CPU and GPU because they're no longer used since passes were removed. Make some OpenCLWrapper getters const. Better ogranize RendererCL methods that return grid dimensions.
2014-11-28 04:37:51 -05:00
#endif
}
for (i = 0; i < ember.XformCount(); i++)
{
0.4.1.5 Beta 11/28/2014 --User Changes Remove limit on the number of xforms allowable on the GPU. This was previously 21. Show actual strips count to be used in parens outside of user specified strips count on final render dialog. Allow for adjustment of iteration depth and fuse count per ember and save/read these values with the xml. Iteration optimizations on both CPU and GPU. Automatically adjust default quality spinner value when using CPU/GPU to 10/30, respectively. --Bug Fixes Fix severe randomization bug with OpenCL. Fix undo list off by one error when doing a new edit anywhere but the end of the undo list. Make integer variation parameters use 4 decimal places in the variations list like all the others. New build of the latest Qt to fix scroll bar drawing bug. Prevent grid from showing as much when pressing control to increase a spinner's increment speed. Still shows sometimes, but better than before. --Code Changes Pass count and fuse to iterator as a structure now to allow for passing more params in the future. Slightly different grid/block logic when running DE filtering on the GPU. Attempt a different way of doing DE, but #define out because it ended up not being faster. Restructure some things to allow for a variable length xforms buffer to be passed to the GPU. Add sub batch size and fuse count as ember members, and remove them from the renderer classes. Remove m_LastPass from Renderer. It should have been removed with passes. Pass seeds as a buffer to the OpenCL iteration kernel, rather than a single seed that gets modified. Slight optimization on CPU accum. Use case statement instead of if/else for xform chosing in OpenCL for a 2% speedup on params with large numbers of xforms. Add SizeOf() wrapper around sizeof(vec[0]) * vec.size(). Remove LogScaleSum() functions from the CPU and GPU because they're no longer used since passes were removed. Make some OpenCLWrapper getters const. Better ogranize RendererCL methods that return grid dimensions.
2014-11-28 04:37:51 -05:00
#ifdef USE_CASE
if (i == 0)
{
os <<
" switch (secondPoint.m_LastXfUsed)\n"
" {\n";
}
0.4.1.5 Beta 11/28/2014 --User Changes Remove limit on the number of xforms allowable on the GPU. This was previously 21. Show actual strips count to be used in parens outside of user specified strips count on final render dialog. Allow for adjustment of iteration depth and fuse count per ember and save/read these values with the xml. Iteration optimizations on both CPU and GPU. Automatically adjust default quality spinner value when using CPU/GPU to 10/30, respectively. --Bug Fixes Fix severe randomization bug with OpenCL. Fix undo list off by one error when doing a new edit anywhere but the end of the undo list. Make integer variation parameters use 4 decimal places in the variations list like all the others. New build of the latest Qt to fix scroll bar drawing bug. Prevent grid from showing as much when pressing control to increase a spinner's increment speed. Still shows sometimes, but better than before. --Code Changes Pass count and fuse to iterator as a structure now to allow for passing more params in the future. Slightly different grid/block logic when running DE filtering on the GPU. Attempt a different way of doing DE, but #define out because it ended up not being faster. Restructure some things to allow for a variable length xforms buffer to be passed to the GPU. Add sub batch size and fuse count as ember members, and remove them from the renderer classes. Remove m_LastPass from Renderer. It should have been removed with passes. Pass seeds as a buffer to the OpenCL iteration kernel, rather than a single seed that gets modified. Slight optimization on CPU accum. Use case statement instead of if/else for xform chosing in OpenCL for a 2% speedup on params with large numbers of xforms. Add SizeOf() wrapper around sizeof(vec[0]) * vec.size(). Remove LogScaleSum() functions from the CPU and GPU because they're no longer used since passes were removed. Make some OpenCLWrapper getters const. Better ogranize RendererCL methods that return grid dimensions.
2014-11-28 04:37:51 -05:00
os <<
" case " << i << ":\n"
" {\n" <<
" Xform" << i << "(&(xforms[" << i << "]), parVars, &firstPoint, &secondPoint, &mwc);\n" <<
" break;\n"
" }\n";
if (i == ember.XformCount() - 1)
{
os <<
" }\n";
}
#else
if (i == 0)
os <<
" if (secondPoint.m_LastXfUsed == " << i << ")\n";
else
os <<
" else if (secondPoint.m_LastXfUsed == " << i << ")\n";
os <<
" {\n" <<
0.4.1.5 Beta 11/28/2014 --User Changes Remove limit on the number of xforms allowable on the GPU. This was previously 21. Show actual strips count to be used in parens outside of user specified strips count on final render dialog. Allow for adjustment of iteration depth and fuse count per ember and save/read these values with the xml. Iteration optimizations on both CPU and GPU. Automatically adjust default quality spinner value when using CPU/GPU to 10/30, respectively. --Bug Fixes Fix severe randomization bug with OpenCL. Fix undo list off by one error when doing a new edit anywhere but the end of the undo list. Make integer variation parameters use 4 decimal places in the variations list like all the others. New build of the latest Qt to fix scroll bar drawing bug. Prevent grid from showing as much when pressing control to increase a spinner's increment speed. Still shows sometimes, but better than before. --Code Changes Pass count and fuse to iterator as a structure now to allow for passing more params in the future. Slightly different grid/block logic when running DE filtering on the GPU. Attempt a different way of doing DE, but #define out because it ended up not being faster. Restructure some things to allow for a variable length xforms buffer to be passed to the GPU. Add sub batch size and fuse count as ember members, and remove them from the renderer classes. Remove m_LastPass from Renderer. It should have been removed with passes. Pass seeds as a buffer to the OpenCL iteration kernel, rather than a single seed that gets modified. Slight optimization on CPU accum. Use case statement instead of if/else for xform chosing in OpenCL for a 2% speedup on params with large numbers of xforms. Add SizeOf() wrapper around sizeof(vec[0]) * vec.size(). Remove LogScaleSum() functions from the CPU and GPU because they're no longer used since passes were removed. Make some OpenCLWrapper getters const. Better ogranize RendererCL methods that return grid dimensions.
2014-11-28 04:37:51 -05:00
" Xform" << i << "(&(xforms[" << i << "]), parVars, &firstPoint, &secondPoint, &mwc);\n" <<
" }\n";
0.4.1.5 Beta 11/28/2014 --User Changes Remove limit on the number of xforms allowable on the GPU. This was previously 21. Show actual strips count to be used in parens outside of user specified strips count on final render dialog. Allow for adjustment of iteration depth and fuse count per ember and save/read these values with the xml. Iteration optimizations on both CPU and GPU. Automatically adjust default quality spinner value when using CPU/GPU to 10/30, respectively. --Bug Fixes Fix severe randomization bug with OpenCL. Fix undo list off by one error when doing a new edit anywhere but the end of the undo list. Make integer variation parameters use 4 decimal places in the variations list like all the others. New build of the latest Qt to fix scroll bar drawing bug. Prevent grid from showing as much when pressing control to increase a spinner's increment speed. Still shows sometimes, but better than before. --Code Changes Pass count and fuse to iterator as a structure now to allow for passing more params in the future. Slightly different grid/block logic when running DE filtering on the GPU. Attempt a different way of doing DE, but #define out because it ended up not being faster. Restructure some things to allow for a variable length xforms buffer to be passed to the GPU. Add sub batch size and fuse count as ember members, and remove them from the renderer classes. Remove m_LastPass from Renderer. It should have been removed with passes. Pass seeds as a buffer to the OpenCL iteration kernel, rather than a single seed that gets modified. Slight optimization on CPU accum. Use case statement instead of if/else for xform chosing in OpenCL for a 2% speedup on params with large numbers of xforms. Add SizeOf() wrapper around sizeof(vec[0]) * vec.size(). Remove LogScaleSum() functions from the CPU and GPU because they're no longer used since passes were removed. Make some OpenCLWrapper getters const. Better ogranize RendererCL methods that return grid dimensions.
2014-11-28 04:37:51 -05:00
#endif
}
0.4.1.5 Beta 11/28/2014 --User Changes Remove limit on the number of xforms allowable on the GPU. This was previously 21. Show actual strips count to be used in parens outside of user specified strips count on final render dialog. Allow for adjustment of iteration depth and fuse count per ember and save/read these values with the xml. Iteration optimizations on both CPU and GPU. Automatically adjust default quality spinner value when using CPU/GPU to 10/30, respectively. --Bug Fixes Fix severe randomization bug with OpenCL. Fix undo list off by one error when doing a new edit anywhere but the end of the undo list. Make integer variation parameters use 4 decimal places in the variations list like all the others. New build of the latest Qt to fix scroll bar drawing bug. Prevent grid from showing as much when pressing control to increase a spinner's increment speed. Still shows sometimes, but better than before. --Code Changes Pass count and fuse to iterator as a structure now to allow for passing more params in the future. Slightly different grid/block logic when running DE filtering on the GPU. Attempt a different way of doing DE, but #define out because it ended up not being faster. Restructure some things to allow for a variable length xforms buffer to be passed to the GPU. Add sub batch size and fuse count as ember members, and remove them from the renderer classes. Remove m_LastPass from Renderer. It should have been removed with passes. Pass seeds as a buffer to the OpenCL iteration kernel, rather than a single seed that gets modified. Slight optimization on CPU accum. Use case statement instead of if/else for xform chosing in OpenCL for a 2% speedup on params with large numbers of xforms. Add SizeOf() wrapper around sizeof(vec[0]) * vec.size(). Remove LogScaleSum() functions from the CPU and GPU because they're no longer used since passes were removed. Make some OpenCLWrapper getters const. Better ogranize RendererCL methods that return grid dimensions.
2014-11-28 04:37:51 -05:00
os <<
"\n"
" ok = !BadVal(secondPoint.m_X) && !BadVal(secondPoint.m_Y);\n"
//" ok = !BadVal(secondPoint.m_X) && !BadVal(secondPoint.m_Y) && !BadVal(secondPoint.m_Z);\n"
"\n"
" if (!ok)\n"
" {\n"
" firstPoint.m_X = MwcNextNeg1Pos1(&mwc);\n"
" firstPoint.m_Y = MwcNextNeg1Pos1(&mwc);\n"
" firstPoint.m_Z = 0.0;\n"
" firstPoint.m_ColorX = secondPoint.m_ColorX;\n"
" consec++;\n"
//" badvals++;\n"
" }\n"
" }\n"
" while (!ok && consec < 5);\n"
"\n"
" if (!ok)\n"
" {\n"
" secondPoint.m_X = MwcNextNeg1Pos1(&mwc);\n"
" secondPoint.m_Y = MwcNextNeg1Pos1(&mwc);\n"
" secondPoint.m_Z = 0.0;\n"
" }\n"
0.4.1.5 Beta 11/28/2014 --User Changes Remove limit on the number of xforms allowable on the GPU. This was previously 21. Show actual strips count to be used in parens outside of user specified strips count on final render dialog. Allow for adjustment of iteration depth and fuse count per ember and save/read these values with the xml. Iteration optimizations on both CPU and GPU. Automatically adjust default quality spinner value when using CPU/GPU to 10/30, respectively. --Bug Fixes Fix severe randomization bug with OpenCL. Fix undo list off by one error when doing a new edit anywhere but the end of the undo list. Make integer variation parameters use 4 decimal places in the variations list like all the others. New build of the latest Qt to fix scroll bar drawing bug. Prevent grid from showing as much when pressing control to increase a spinner's increment speed. Still shows sometimes, but better than before. --Code Changes Pass count and fuse to iterator as a structure now to allow for passing more params in the future. Slightly different grid/block logic when running DE filtering on the GPU. Attempt a different way of doing DE, but #define out because it ended up not being faster. Restructure some things to allow for a variable length xforms buffer to be passed to the GPU. Add sub batch size and fuse count as ember members, and remove them from the renderer classes. Remove m_LastPass from Renderer. It should have been removed with passes. Pass seeds as a buffer to the OpenCL iteration kernel, rather than a single seed that gets modified. Slight optimization on CPU accum. Use case statement instead of if/else for xform chosing in OpenCL for a 2% speedup on params with large numbers of xforms. Add SizeOf() wrapper around sizeof(vec[0]) * vec.size(). Remove LogScaleSum() functions from the CPU and GPU because they're no longer used since passes were removed. Make some OpenCLWrapper getters const. Better ogranize RendererCL methods that return grid dimensions.
2014-11-28 04:37:51 -05:00
#ifndef STRAIGHT_RAND
"\n"//Rotate points between threads. This is how randomization is achieved.
" uint swr = threadXY + ((i & 1u) * threadXDivRows);\n"
" uint sw = (swr * THREADS_PER_WARP + THREAD_ID_X) & threadsMinus1;\n"
"\n"
//Write to another thread's location.
" swap[sw] = secondPoint;\n"
"\n"
//Populate randomized xform index buffer with new random values.
" if (THREAD_ID_Y == 0 && THREAD_ID_X < NWARPS)\n"
" xfsel[THREAD_ID_X] = MwcNext(&mwc) & " << CHOOSE_XFORM_GRAIN_M1 << ";\n"
"\n"
" barrier(CLK_LOCAL_MEM_FENCE);\n"
//Another thread will have written to this thread's location, so read the new value and use it for accumulation below.
" firstPoint = swap[threadIndex];\n"
0.4.1.5 Beta 11/28/2014 --User Changes Remove limit on the number of xforms allowable on the GPU. This was previously 21. Show actual strips count to be used in parens outside of user specified strips count on final render dialog. Allow for adjustment of iteration depth and fuse count per ember and save/read these values with the xml. Iteration optimizations on both CPU and GPU. Automatically adjust default quality spinner value when using CPU/GPU to 10/30, respectively. --Bug Fixes Fix severe randomization bug with OpenCL. Fix undo list off by one error when doing a new edit anywhere but the end of the undo list. Make integer variation parameters use 4 decimal places in the variations list like all the others. New build of the latest Qt to fix scroll bar drawing bug. Prevent grid from showing as much when pressing control to increase a spinner's increment speed. Still shows sometimes, but better than before. --Code Changes Pass count and fuse to iterator as a structure now to allow for passing more params in the future. Slightly different grid/block logic when running DE filtering on the GPU. Attempt a different way of doing DE, but #define out because it ended up not being faster. Restructure some things to allow for a variable length xforms buffer to be passed to the GPU. Add sub batch size and fuse count as ember members, and remove them from the renderer classes. Remove m_LastPass from Renderer. It should have been removed with passes. Pass seeds as a buffer to the OpenCL iteration kernel, rather than a single seed that gets modified. Slight optimization on CPU accum. Use case statement instead of if/else for xform chosing in OpenCL for a 2% speedup on params with large numbers of xforms. Add SizeOf() wrapper around sizeof(vec[0]) * vec.size(). Remove LogScaleSum() functions from the CPU and GPU because they're no longer used since passes were removed. Make some OpenCLWrapper getters const. Better ogranize RendererCL methods that return grid dimensions.
2014-11-28 04:37:51 -05:00
#else
" firstPoint = secondPoint;\n"//For testing, using straight rand flam4/fractron style instead of cuburn.
#endif
"\n"
" if (fuse)\n"
" {\n"
" if (i >= fuseCount - 1)\n"
" {\n"
" i = 0;\n"
" fuse = false;\n"
" itersToDo = iterCount;\n"
" barrier(CLK_LOCAL_MEM_FENCE);\n"//Sort of seems necessary, sort of doesn't. Makes no speed difference.
" }\n"
"\n"
" continue;\n"
" }\n"
"\n";
if (ember.UseFinalXform())
{
0.4.1.3 Beta 10/14/2014 --User Changes Size is no longer fixed to the window size. Size scaling is done differently in the final render dialog. This fixes several bugs. Remove Xml saving size from settings and options dialog, it no longer applies. Final render can be broken into strips. Set default save path to the desktop if none is found in the settings file. Set default output size to 1920x1080 if none is found in the settings file. --Bug Fixes Better memory size reporting in final render dialog. --Code Changes Migrate to C++11, Qt 5.3.1, and Visual Studio 2013. Change most instances of unsigned int to size_t, and int to intmax_t. Add m_OrigPixPerUnit and m_ScaleType to Ember for scaling purposes. Replace some sprintf_s() calls in XmlToEmber with ostringstream. Move more non-templated members into RendererBase. Add CopyVec() overload that takes a per element function pointer. Add vector Memset(). Replace '&' with '+' instead of "&amp;" in XmlToEmber for much faster parsing. Break strips rendering out into EmberCommon and call from EmberRender and Fractorium. Make AddAndWriteBuffer() just call WriteBuffer(). Make AddAndWriteImage() delete the existing image first before replacing it. Add SetOutputTexture() to RendererCL to support making new textures in response to resize events. Remove multiple return statements in RendererCL, and replace with a bool that tracks results. Add ToDouble(), MakeEnd(), ToString() and Exists() wrappers in Fractorium. Add Size() wrapper in EmberFile. Make QString function arguments const QString&, and string with const string&. Make ShowCritical() wrapper for invoking a message box from another thread. Add combo box to TwoButtonWidget and rename.
2014-10-14 11:53:15 -04:00
size_t finalIndex = ember.TotalXformCount() - 1;
//CPU takes an extra step here to preserve the opacity of the randomly selected xform, rather than the final xform's opacity.
//The same thing takes place here automatically because secondPoint.m_LastXfUsed is used below to retrieve the opacity when accumulating.
0.4.1.5 Beta 11/28/2014 --User Changes Remove limit on the number of xforms allowable on the GPU. This was previously 21. Show actual strips count to be used in parens outside of user specified strips count on final render dialog. Allow for adjustment of iteration depth and fuse count per ember and save/read these values with the xml. Iteration optimizations on both CPU and GPU. Automatically adjust default quality spinner value when using CPU/GPU to 10/30, respectively. --Bug Fixes Fix severe randomization bug with OpenCL. Fix undo list off by one error when doing a new edit anywhere but the end of the undo list. Make integer variation parameters use 4 decimal places in the variations list like all the others. New build of the latest Qt to fix scroll bar drawing bug. Prevent grid from showing as much when pressing control to increase a spinner's increment speed. Still shows sometimes, but better than before. --Code Changes Pass count and fuse to iterator as a structure now to allow for passing more params in the future. Slightly different grid/block logic when running DE filtering on the GPU. Attempt a different way of doing DE, but #define out because it ended up not being faster. Restructure some things to allow for a variable length xforms buffer to be passed to the GPU. Add sub batch size and fuse count as ember members, and remove them from the renderer classes. Remove m_LastPass from Renderer. It should have been removed with passes. Pass seeds as a buffer to the OpenCL iteration kernel, rather than a single seed that gets modified. Slight optimization on CPU accum. Use case statement instead of if/else for xform chosing in OpenCL for a 2% speedup on params with large numbers of xforms. Add SizeOf() wrapper around sizeof(vec[0]) * vec.size(). Remove LogScaleSum() functions from the CPU and GPU because they're no longer used since passes were removed. Make some OpenCLWrapper getters const. Better ogranize RendererCL methods that return grid dimensions.
2014-11-28 04:37:51 -05:00
os <<
" if ((xforms[" << finalIndex << "].m_Opacity == 1) || (MwcNext01(&mwc) < xforms[" << finalIndex << "].m_Opacity))\n"
" {\n"
" tempPoint.m_LastXfUsed = secondPoint.m_LastXfUsed;\n"
" Xform" << finalIndex << "(&(xforms[" << finalIndex << "]), parVars, &secondPoint, &tempPoint, &mwc);\n"
" secondPoint = tempPoint;\n"
" }\n"
"\n";
}
os << CreateProjectionString(ember);
if (doAccum)
{
os <<
" p00 = secondPoint.m_X - ember->m_CenterX;\n"
" p01 = secondPoint.m_Y - ember->m_CenterY;\n"
" tempPoint.m_X = (p00 * ember->m_RotA) + (p01 * ember->m_RotB) + ember->m_CenterX;\n"
" tempPoint.m_Y = (p00 * ember->m_RotD) + (p01 * ember->m_RotE) + ember->m_CenterY;\n"
"\n"
//Add this point to the appropriate location in the histogram.
" if (CarToRasInBounds(carToRas, &tempPoint))\n"
" {\n"
" CarToRasConvertPointToSingle(carToRas, &tempPoint, &histIndex);\n"
"\n"
" if (histIndex < histSize)\n"//Provides an extra level of safety and makes no speed difference.
" {\n";
//Basic texture index interoplation does not produce identical results
//to the CPU. So the code here must explicitly do the same thing and not
//rely on the GPU texture coordinate lookup.
if (ember.m_PaletteMode == PALETTE_LINEAR)
{
os <<
" real_t colorIndexFrac;\n"
" real_t colorIndex = secondPoint.m_ColorX * COLORMAP_LENGTH;\n"
" int intColorIndex = (int)colorIndex;\n"
" float4 palColor2;\n"
"\n"
" if (intColorIndex < 0)\n"
" {\n"
" intColorIndex = 0;\n"
" colorIndexFrac = 0;\n"
" }\n"
" else if (intColorIndex >= COLORMAP_LENGTH_MINUS_1)\n"
" {\n"
" intColorIndex = COLORMAP_LENGTH_MINUS_1 - 1;\n"
" colorIndexFrac = 1.0;\n"
" }\n"
" else\n"
" {\n"
" colorIndexFrac = colorIndex - (real_t)intColorIndex;\n"//Interpolate between intColorIndex and intColorIndex + 1.
" }\n"
"\n"
" iPaletteCoord.x = intColorIndex;\n"//Palette operations are strictly float because OpenCL does not support dp64 textures.
" palColor1 = read_imagef(palette, paletteSampler, iPaletteCoord);\n"
" iPaletteCoord.x += 1;\n"
" palColor2 = read_imagef(palette, paletteSampler, iPaletteCoord);\n"
" palColor1 = (palColor1 * (1.0f - (float)colorIndexFrac)) + (palColor2 * (float)colorIndexFrac);\n";//The 1.0f here *must* have the 'f' suffix at the end to compile.
}
else if (ember.m_PaletteMode == PALETTE_STEP)
{
os <<
" iPaletteCoord.x = (int)(secondPoint.m_ColorX * COLORMAP_LENGTH);\n"
" palColor1 = read_imagef(palette, paletteSampler, iPaletteCoord);\n";
}
if (lockAccum)
{
os <<
" AtomicAdd(&(histogram[histIndex].m_Reals[0]), palColor1.x * (real_bucket_t)xforms[secondPoint.m_LastXfUsed].m_VizAdjusted);\n"//Always apply opacity, even though it's usually 1.
" AtomicAdd(&(histogram[histIndex].m_Reals[1]), palColor1.y * (real_bucket_t)xforms[secondPoint.m_LastXfUsed].m_VizAdjusted);\n"
" AtomicAdd(&(histogram[histIndex].m_Reals[2]), palColor1.z * (real_bucket_t)xforms[secondPoint.m_LastXfUsed].m_VizAdjusted);\n"
" AtomicAdd(&(histogram[histIndex].m_Reals[3]), palColor1.w * (real_bucket_t)xforms[secondPoint.m_LastXfUsed].m_VizAdjusted);\n";
}
else
{
os <<
" histogram[histIndex].m_Real4 += (palColor1 * (real_bucket_t)xforms[secondPoint.m_LastXfUsed].m_VizAdjusted);\n";//real_bucket_t should always be float.
}
os <<
" }\n"//histIndex < histSize.
" }\n"//CarToRasInBounds.
"\n"
" barrier(CLK_GLOBAL_MEM_FENCE);\n";//Barrier every time, whether or not the point was in bounds, else artifacts will occur when doing strips.
}
os <<
" }\n"//Main for loop.
"\n"
//At this point, iterating for this round is done, so write the final points back out
//to the global points buffer to be used as inputs for the next round. This preserves point trajectory
//between kernel calls.
#ifdef TEST_CL_BUFFERS//Use this to populate with test values and read back in EmberTester.
" points[pointsIndex].m_X = MwcNextNeg1Pos1(&mwc);\n"
" points[pointsIndex].m_Y = MwcNextNeg1Pos1(&mwc);\n"
" points[pointsIndex].m_Z = MwcNextNeg1Pos1(&mwc);\n"
" points[pointsIndex].m_ColorX = MwcNextNeg1Pos1(&mwc);\n"
#else
" points[pointsIndex] = firstPoint;\n"
0.4.1.5 Beta 11/28/2014 --User Changes Remove limit on the number of xforms allowable on the GPU. This was previously 21. Show actual strips count to be used in parens outside of user specified strips count on final render dialog. Allow for adjustment of iteration depth and fuse count per ember and save/read these values with the xml. Iteration optimizations on both CPU and GPU. Automatically adjust default quality spinner value when using CPU/GPU to 10/30, respectively. --Bug Fixes Fix severe randomization bug with OpenCL. Fix undo list off by one error when doing a new edit anywhere but the end of the undo list. Make integer variation parameters use 4 decimal places in the variations list like all the others. New build of the latest Qt to fix scroll bar drawing bug. Prevent grid from showing as much when pressing control to increase a spinner's increment speed. Still shows sometimes, but better than before. --Code Changes Pass count and fuse to iterator as a structure now to allow for passing more params in the future. Slightly different grid/block logic when running DE filtering on the GPU. Attempt a different way of doing DE, but #define out because it ended up not being faster. Restructure some things to allow for a variable length xforms buffer to be passed to the GPU. Add sub batch size and fuse count as ember members, and remove them from the renderer classes. Remove m_LastPass from Renderer. It should have been removed with passes. Pass seeds as a buffer to the OpenCL iteration kernel, rather than a single seed that gets modified. Slight optimization on CPU accum. Use case statement instead of if/else for xform chosing in OpenCL for a 2% speedup on params with large numbers of xforms. Add SizeOf() wrapper around sizeof(vec[0]) * vec.size(). Remove LogScaleSum() functions from the CPU and GPU because they're no longer used since passes were removed. Make some OpenCLWrapper getters const. Better ogranize RendererCL methods that return grid dimensions.
2014-11-28 04:37:51 -05:00
" seeds[pointsIndex] = mwc;\n"
#endif
" barrier(CLK_GLOBAL_MEM_FENCE);\n"
"}\n";
return os.str();
}
/// <summary>
/// Create an OpenCL string of #defines and a corresponding host side vector for parametric variation values.
/// Parametric variations present a special problem in the iteration code.
/// The values can't be passed in with the array of other xform values because
/// the length of the parametric values is unknown.
/// This is solved by passing a separate buffer of values dedicated specifically
/// to parametric variations.
/// In OpenCL, a series of #define constants are declared which specify the indices in
/// the buffer where the various values are stored.
/// The possibility of a parametric variation type being present in multiple xforms is taken
/// into account by appending the xform index to the #define, thus making each unique.
/// The kernel creator then uses these to retrieve the values in the iteration code.
/// Example:
/// Xform1: Curl (curl_c1: 1.1, curl_c2: 2.2)
/// Xform2: Curl (curl_c1: 4.4, curl_c2: 5.5)
/// Xform3: Blob (blob_low: 1, blob_high: 2, blob_waves: 3)
///
/// Host vector to be passed as arg to the iter kernel call:
/// [1.1][2.2][4.4][5.5][1][2][3]
///
/// #defines in OpenCL to access the buffer:
///
/// #define CURL_C1_1 0
/// #define CURL_C2_1 1
/// #define CURL_C1_2 2
/// #define CURL_C2_2 3
/// #define BLOB_LOW_3 4
/// #define BLOB_HIGH_3 5
/// #define BLOB_WAVES_ 6
///
/// The variations the use these #defines by first looking up the index of the
/// xform they belong to in the parent ember and generating the OpenCL string based on that
0.4.1.3 Beta 10/14/2014 --User Changes Size is no longer fixed to the window size. Size scaling is done differently in the final render dialog. This fixes several bugs. Remove Xml saving size from settings and options dialog, it no longer applies. Final render can be broken into strips. Set default save path to the desktop if none is found in the settings file. Set default output size to 1920x1080 if none is found in the settings file. --Bug Fixes Better memory size reporting in final render dialog. --Code Changes Migrate to C++11, Qt 5.3.1, and Visual Studio 2013. Change most instances of unsigned int to size_t, and int to intmax_t. Add m_OrigPixPerUnit and m_ScaleType to Ember for scaling purposes. Replace some sprintf_s() calls in XmlToEmber with ostringstream. Move more non-templated members into RendererBase. Add CopyVec() overload that takes a per element function pointer. Add vector Memset(). Replace '&' with '+' instead of "&amp;" in XmlToEmber for much faster parsing. Break strips rendering out into EmberCommon and call from EmberRender and Fractorium. Make AddAndWriteBuffer() just call WriteBuffer(). Make AddAndWriteImage() delete the existing image first before replacing it. Add SetOutputTexture() to RendererCL to support making new textures in response to resize events. Remove multiple return statements in RendererCL, and replace with a bool that tracks results. Add ToDouble(), MakeEnd(), ToString() and Exists() wrappers in Fractorium. Add Size() wrapper in EmberFile. Make QString function arguments const QString&, and string with const string&. Make ShowCritical() wrapper for invoking a message box from another thread. Add combo box to TwoButtonWidget and rename.
2014-10-14 11:53:15 -04:00
/// in their overridden OpenCLString() functions.
/// Template argument expected to be float or double.
/// </summary>
/// <param name="ember">The ember to create the values from</param>
/// <param name="params">The string,vector pair to store the values in</param>
/// <param name="doVals">True if the vector should be populated, else false. Default: true.</param>
/// <param name="doString">True if the string should be populated, else false. Default: true.</param>
template <typename T>
void IterOpenCLKernelCreator<T>::ParVarIndexDefines(Ember<T>& ember, pair<string, vector<T>>& params, bool doVals, bool doString)
{
0.4.1.3 Beta 10/14/2014 --User Changes Size is no longer fixed to the window size. Size scaling is done differently in the final render dialog. This fixes several bugs. Remove Xml saving size from settings and options dialog, it no longer applies. Final render can be broken into strips. Set default save path to the desktop if none is found in the settings file. Set default output size to 1920x1080 if none is found in the settings file. --Bug Fixes Better memory size reporting in final render dialog. --Code Changes Migrate to C++11, Qt 5.3.1, and Visual Studio 2013. Change most instances of unsigned int to size_t, and int to intmax_t. Add m_OrigPixPerUnit and m_ScaleType to Ember for scaling purposes. Replace some sprintf_s() calls in XmlToEmber with ostringstream. Move more non-templated members into RendererBase. Add CopyVec() overload that takes a per element function pointer. Add vector Memset(). Replace '&' with '+' instead of "&amp;" in XmlToEmber for much faster parsing. Break strips rendering out into EmberCommon and call from EmberRender and Fractorium. Make AddAndWriteBuffer() just call WriteBuffer(). Make AddAndWriteImage() delete the existing image first before replacing it. Add SetOutputTexture() to RendererCL to support making new textures in response to resize events. Remove multiple return statements in RendererCL, and replace with a bool that tracks results. Add ToDouble(), MakeEnd(), ToString() and Exists() wrappers in Fractorium. Add Size() wrapper in EmberFile. Make QString function arguments const QString&, and string with const string&. Make ShowCritical() wrapper for invoking a message box from another thread. Add combo box to TwoButtonWidget and rename.
2014-10-14 11:53:15 -04:00
size_t i, j, k, size = 0, xformCount = ember.TotalXformCount();
Xform<T>* xform;
ostringstream os;
if (doVals)
params.second.clear();
for (i = 0; i < xformCount; i++)
{
if ((xform = ember.GetTotalXform(i)))
{
0.4.1.3 Beta 10/14/2014 --User Changes Size is no longer fixed to the window size. Size scaling is done differently in the final render dialog. This fixes several bugs. Remove Xml saving size from settings and options dialog, it no longer applies. Final render can be broken into strips. Set default save path to the desktop if none is found in the settings file. Set default output size to 1920x1080 if none is found in the settings file. --Bug Fixes Better memory size reporting in final render dialog. --Code Changes Migrate to C++11, Qt 5.3.1, and Visual Studio 2013. Change most instances of unsigned int to size_t, and int to intmax_t. Add m_OrigPixPerUnit and m_ScaleType to Ember for scaling purposes. Replace some sprintf_s() calls in XmlToEmber with ostringstream. Move more non-templated members into RendererBase. Add CopyVec() overload that takes a per element function pointer. Add vector Memset(). Replace '&' with '+' instead of "&amp;" in XmlToEmber for much faster parsing. Break strips rendering out into EmberCommon and call from EmberRender and Fractorium. Make AddAndWriteBuffer() just call WriteBuffer(). Make AddAndWriteImage() delete the existing image first before replacing it. Add SetOutputTexture() to RendererCL to support making new textures in response to resize events. Remove multiple return statements in RendererCL, and replace with a bool that tracks results. Add ToDouble(), MakeEnd(), ToString() and Exists() wrappers in Fractorium. Add Size() wrapper in EmberFile. Make QString function arguments const QString&, and string with const string&. Make ShowCritical() wrapper for invoking a message box from another thread. Add combo box to TwoButtonWidget and rename.
2014-10-14 11:53:15 -04:00
size_t varCount = xform->TotalVariationCount();
for (j = 0; j < varCount; j++)
{
if (ParametricVariation<T>* parVar = dynamic_cast<ParametricVariation<T>*>(xform->GetVariation(j)))
{
for (k = 0; k < parVar->ParamCount(); k++)
{
if (doString)
os << "#define " << ToUpper(parVar->Params()[k].Name()) << "_" << i << " " << size << endl;//Uniquely identify this param in this variation in this xform.
if (doVals)
params.second.push_back(parVar->Params()[k].ParamVal());
size++;
}
}
}
}
}
if (doString)
{
os << "\n";
params.first = os.str();
}
}
/// <summary>
/// Determine whether the two embers passed in differ enough
/// to require a rebuild of the iteration code.
/// A rebuild is required if they differ in the following ways:
/// Xform count
/// Final xform presence
/// Xaos presence
/// Palette accumulation mode
/// Xform post affine presence
/// Variation count
/// Variation type
/// Template argument expected to be float or double.
/// </summary>
/// <param name="ember1">The first ember to compare</param>
/// <param name="ember2">The second ember to compare</param>
/// <returns>True if a rebuild is required, else false</returns>
template <typename T>
bool IterOpenCLKernelCreator<T>::IsBuildRequired(Ember<T>& ember1, Ember<T>& ember2)
{
0.4.1.3 Beta 10/14/2014 --User Changes Size is no longer fixed to the window size. Size scaling is done differently in the final render dialog. This fixes several bugs. Remove Xml saving size from settings and options dialog, it no longer applies. Final render can be broken into strips. Set default save path to the desktop if none is found in the settings file. Set default output size to 1920x1080 if none is found in the settings file. --Bug Fixes Better memory size reporting in final render dialog. --Code Changes Migrate to C++11, Qt 5.3.1, and Visual Studio 2013. Change most instances of unsigned int to size_t, and int to intmax_t. Add m_OrigPixPerUnit and m_ScaleType to Ember for scaling purposes. Replace some sprintf_s() calls in XmlToEmber with ostringstream. Move more non-templated members into RendererBase. Add CopyVec() overload that takes a per element function pointer. Add vector Memset(). Replace '&' with '+' instead of "&amp;" in XmlToEmber for much faster parsing. Break strips rendering out into EmberCommon and call from EmberRender and Fractorium. Make AddAndWriteBuffer() just call WriteBuffer(). Make AddAndWriteImage() delete the existing image first before replacing it. Add SetOutputTexture() to RendererCL to support making new textures in response to resize events. Remove multiple return statements in RendererCL, and replace with a bool that tracks results. Add ToDouble(), MakeEnd(), ToString() and Exists() wrappers in Fractorium. Add Size() wrapper in EmberFile. Make QString function arguments const QString&, and string with const string&. Make ShowCritical() wrapper for invoking a message box from another thread. Add combo box to TwoButtonWidget and rename.
2014-10-14 11:53:15 -04:00
size_t i, j, xformCount = ember1.TotalXformCount();
if (xformCount != ember2.TotalXformCount())
return true;
if (ember1.UseFinalXform() != ember2.UseFinalXform())
return true;
if (ember1.XaosPresent() != ember2.XaosPresent())
return true;
if (ember1.m_PaletteMode != ember2.m_PaletteMode)
return true;
if (ember1.ProjBits() != ember2.ProjBits())
return true;
for (i = 0; i < xformCount; i++)
{
Xform<T>* xform1 = ember1.GetTotalXform(i);
Xform<T>* xform2 = ember2.GetTotalXform(i);
0.4.1.3 Beta 10/14/2014 --User Changes Size is no longer fixed to the window size. Size scaling is done differently in the final render dialog. This fixes several bugs. Remove Xml saving size from settings and options dialog, it no longer applies. Final render can be broken into strips. Set default save path to the desktop if none is found in the settings file. Set default output size to 1920x1080 if none is found in the settings file. --Bug Fixes Better memory size reporting in final render dialog. --Code Changes Migrate to C++11, Qt 5.3.1, and Visual Studio 2013. Change most instances of unsigned int to size_t, and int to intmax_t. Add m_OrigPixPerUnit and m_ScaleType to Ember for scaling purposes. Replace some sprintf_s() calls in XmlToEmber with ostringstream. Move more non-templated members into RendererBase. Add CopyVec() overload that takes a per element function pointer. Add vector Memset(). Replace '&' with '+' instead of "&amp;" in XmlToEmber for much faster parsing. Break strips rendering out into EmberCommon and call from EmberRender and Fractorium. Make AddAndWriteBuffer() just call WriteBuffer(). Make AddAndWriteImage() delete the existing image first before replacing it. Add SetOutputTexture() to RendererCL to support making new textures in response to resize events. Remove multiple return statements in RendererCL, and replace with a bool that tracks results. Add ToDouble(), MakeEnd(), ToString() and Exists() wrappers in Fractorium. Add Size() wrapper in EmberFile. Make QString function arguments const QString&, and string with const string&. Make ShowCritical() wrapper for invoking a message box from another thread. Add combo box to TwoButtonWidget and rename.
2014-10-14 11:53:15 -04:00
size_t varCount = xform1->TotalVariationCount();
if (xform1->HasPost() != xform2->HasPost())
return true;
if (varCount != xform2->TotalVariationCount())
return true;
for (j = 0; j < varCount; j++)
if (xform1->GetVariation(j)->VariationId() != xform2->GetVariation(j)->VariationId())
return true;
}
return false;
}
/// <summary>
/// Create the zeroize kernel string.
/// OpenCL comes with no way to zeroize a buffer like memset()
/// would do on the CPU. So a special kernel must be ran to set a range
/// of memory addresses to zero.
/// </summary>
/// <returns>The kernel string</returns>
template <typename T>
string IterOpenCLKernelCreator<T>::CreateZeroizeKernelString()
{
ostringstream os;
os <<
ConstantDefinesString(typeid(T) == typeid(double)) <<//Double precision doesn't matter here since it's not used.
"__kernel void " << m_ZeroizeEntryPoint << "(__global uchar* buffer, uint width, uint height)\n"
"{\n"
" if (GLOBAL_ID_X >= width || GLOBAL_ID_Y >= height)\n"
" return;\n"
"\n"
" buffer[(GLOBAL_ID_Y * width) + GLOBAL_ID_X] = 0;\n"//Can't use INDEX_IN_GRID_2D here because the grid might be larger than the buffer to make even dimensions.
" barrier(CLK_GLOBAL_MEM_FENCE);\n"//Just to be safe.
"}\n"
"\n";
return os.str();
}
--User changes -Add support for multiple GPU devices. --These options are present in the command line and in Fractorium. -Change scheme of specifying devices from platform,device to just total device index. --Single number on the command line. --Change from combo boxes for device selection to a table of all devices in Fractorium. -Temporal samples defaults to 100 instead of 1000 which was needless overkill. --Bug fixes -EmberAnimate, EmberRender, FractoriumSettings, FinalRenderDialog: Fix wrong order of arguments to Clamp() when assigning thread priority. -VariationsDC.h: Fix NVidia OpenCL compilation error in DCTriangleVariation. -FractoriumXformsColor.cpp: Checking for null pixmap pointer is not enough, must also check if the underlying buffer is null via call to QPixmap::isNull(). --Code changes -Ember.h: Add case for FLAME_MOTION_NONE and default in ApplyFlameMotion(). -EmberMotion.h: Call base constructor. -EmberPch.h: #pragma once only on Windows. -EmberToXml.h: --Handle different types of exceptions. --Add default cases to ToString(). -Isaac.h: Remove unused variable in constructor. -Point.h: Call base constructor in Color(). -Renderer.h/cpp: --Add bool to Alloc() to only allocate memory for the histogram. Needed for multi-GPU. --Make CoordMap() return a const ref, not a pointer. -SheepTools.h: --Use 64-bit types like the rest of the code already does. --Fix some comment misspellings. -Timing.h: Make BeginTime(), EndTime(), ElapsedTime() and Format() be const functions. -Utils.h: --Add new functions Equal() and Split(). --Handle more exception types in ReadFile(). --Get rid of most legacy blending of C and C++ argument parsing. -XmlToEmber.h: --Get rid of most legacy blending of C and C++ code from flam3. --Remove some unused variables. -EmberAnimate: --Support multi-GPU processing that alternates full frames between devices. --Use OpenCLInfo instead of OpenCLWrapper for --openclinfo option. --Remove bucketT template parameter, and hard code float in its place. --If a render fails, exit since there is no point in continuing an animation with a missing frame. --Pass variables to threaded save better, which most likely fixes a very subtle bug that existed before. --Remove some unused variables. -EmberGenome, EmberRender: --Support multi-GPU processing that alternates full frames between devices. --Use OpenCLInfo instead of OpenCLWrapper for --openclinfo option. --Remove bucketT template parameter, and hard code float in its place. -EmberRender: --Support multi-GPU processing that alternates full frames between devices. --Use OpenCLInfo instead of OpenCLWrapper for --openclinfo option. --Remove bucketT template parameter, and hard code float in its place. --Only print values when not rendering with OpenCL, since they're always 0 in that case. -EmberCLPch.h: --#pragma once only on Windows. --#include <atomic>. -IterOpenCLKernelCreator.h: Add new kernel for summing two histograms. This is needed for multi-GPU. -OpenCLWrapper.h: --Move all OpenCL info related code into its own class OpenCLInfo. --Add members to cache the values of global memory size and max allocation size. -RendererCL.h/cpp: --Redesign to accomodate multi-GPU. --Constructor now takes a vector of devices. --Remove DumpErrorReport() function, it's handled in the base. --ClearBuffer(), ReadPoints(), WritePoints(), ReadHist() and WriteHist() now optionally take a device index as a parameter. --MakeDmap() override and m_DmapCL member removed because it no longer applies since the histogram is always float since the last commit. --Add new function SumDeviceHist() to sum histograms from two devices by first copying to a temporary on the host, then a temporary on the device, then summing. --m_Calls member removed, as it's now per-device. --OpenCLWrapper removed. --m_Seeds member is now a vector of vector of seeds, to accomodate a separate and different array of seeds for each device. --Added member m_Devices, a vector of unique_ptr of RendererCLDevice. -EmberCommon.h --Added Devices() function to convert from a vector of device indices to a vector of platform,device indices. --Changed CreateRenderer() to accept a vector of devices to create a single RendererCL which will split work across multiple devices. --Added CreateRenderers() function to accept a vector of devices to create multiple RendererCL, each which will render on a single device. --Add more comments to some existing functions. -EmberCommonPch.h: #pragma once only on Windows. -EmberOptions.h: --Remove --platform option, it's just sequential device number now with the --device option. --Make --out be OPT_USE_RENDER instead of OPT_RENDER_ANIM since it's an error condition when animating. It makes no sense to write all frames to a single image. --Add Devices() function to parse comma separated --device option string and return a vector of device indices. --Make int and uint types be 64-bit, so intmax_t and size_t. --Make better use of macros. -JpegUtils.h: Make string parameters to WriteJpeg() and WritePng() be const ref. -All project files: Turn off buffer security check option in Visual Studio (/Gs-) -deployment.pri: Remove the line OTHER_FILES +=, it's pointless and was causing problems. -Ember.pro, EmberCL.pro: Add CONFIG += plugin, otherwise it wouldn't link. -EmberCL.pro: Add new files for multi-GPU support. -build_all.sh: use -j4 and QMAKE=${QMAKE:/usr/bin/qmake} -shared_settings.pri: -Add version string. -Remove old DESTDIR definitions. -Add the following lines or else nothing would build: CONFIG(release, debug|release) { CONFIG += warn_off DESTDIR = ../../../Bin/release } CONFIG(debug, debug|release) { DESTDIR = ../../../Bin/debug } QMAKE_POST_LINK += $$quote(cp --update ../../../Data/flam3-palettes.xml $${DESTDIR}$$escape_expand(\n\t)) LIBS += -L/usr/lib -lpthread -AboutDialog.ui: Another futile attempt to make it look correct on Linux. -FinalRenderDialog.h/cpp: --Add support for multi-GPU. --Change from combo boxes for device selection to a table of all devices. --Ensure device selection makes sense. --Remove "FinalRender" prefix of various function names, it's implied given the context. -FinalRenderEmberController.h/cpp: --Add support for multi-GPU. --Change m_FinishedImageCount to be atomic. --Move CancelRender() from the base to FinalRenderEmberController<T>. --Refactor RenderComplete() to omit any progress related functionality or image saving since it can be potentially ran in a thread. --Consolidate setting various renderer fields into SyncGuiToRenderer(). -Fractorium.cpp: Allow for resizing of the options dialog to show the entire device table. -FractoriumCommon.h: Add various functions to handle a table showing the available OpenCL devices on the system. -FractoriumEmberController.h/cpp: Remove m_FinalImageIndex, it's no longer needed. -FractoriumRender.cpp: Scale the interactive sub batch count and quality by the number of devices used. -FractoriumSettings.h/cpp: --Temporal samples defaults to 100 instead of 1000 which was needless overkill. --Add multi-GPU support, remove old device,platform pair. -FractoriumToolbar.cpp: Disable OpenCL toolbar button if there are no devices present on the system. -FractoriumOptionsDialog.h/cpp: --Add support for multi-GPU. --Consolidate more assignments in DataToGui(). --Enable/disable CPU/OpenCL items in response to OpenCL checkbox event. -Misc: Convert almost everything to size_t for unsigned, intmax_t for signed.
2015-09-12 21:33:45 -04:00
template <typename T>
string IterOpenCLKernelCreator<T>::CreateSumHistKernelString()
{
ostringstream os;
os <<
ConstantDefinesString(typeid(T) == typeid(double)) <<//Double precision doesn't matter here since it's not used.
"__kernel void " << m_SumHistEntryPoint << "(__global real4_bucket* source, __global real4_bucket* dest, uint width, uint height, uint clear)\n"
"{\n"
" if (GLOBAL_ID_X >= width || GLOBAL_ID_Y >= height)\n"
" return;\n"
"\n"
" dest[(GLOBAL_ID_Y * width) + GLOBAL_ID_X] += source[(GLOBAL_ID_Y * width) + GLOBAL_ID_X];\n"//Can't use INDEX_IN_GRID_2D here because the grid might be larger than the buffer to make even dimensions.
"\n"
" if (clear)\n"
" source[(GLOBAL_ID_Y * width) + GLOBAL_ID_X] = 0;\n"
"\n"
" barrier(CLK_GLOBAL_MEM_FENCE);\n"//Just to be safe.
"}\n"
"\n";
return os.str();
}
/// <summary>
/// Create the string for 3D projection based on the 3D values of the ember.
/// Projection is done on the second point.
/// If any of these fields toggle between 0 and nonzero between runs, a recompile is triggered.
/// </summary>
/// <param name="ember">The ember to create the projection string for</param>
/// <returns>The kernel string</returns>
template <typename T>
string IterOpenCLKernelCreator<T>::CreateProjectionString(Ember<T>& ember)
{
0.4.1.3 Beta 10/14/2014 --User Changes Size is no longer fixed to the window size. Size scaling is done differently in the final render dialog. This fixes several bugs. Remove Xml saving size from settings and options dialog, it no longer applies. Final render can be broken into strips. Set default save path to the desktop if none is found in the settings file. Set default output size to 1920x1080 if none is found in the settings file. --Bug Fixes Better memory size reporting in final render dialog. --Code Changes Migrate to C++11, Qt 5.3.1, and Visual Studio 2013. Change most instances of unsigned int to size_t, and int to intmax_t. Add m_OrigPixPerUnit and m_ScaleType to Ember for scaling purposes. Replace some sprintf_s() calls in XmlToEmber with ostringstream. Move more non-templated members into RendererBase. Add CopyVec() overload that takes a per element function pointer. Add vector Memset(). Replace '&' with '+' instead of "&amp;" in XmlToEmber for much faster parsing. Break strips rendering out into EmberCommon and call from EmberRender and Fractorium. Make AddAndWriteBuffer() just call WriteBuffer(). Make AddAndWriteImage() delete the existing image first before replacing it. Add SetOutputTexture() to RendererCL to support making new textures in response to resize events. Remove multiple return statements in RendererCL, and replace with a bool that tracks results. Add ToDouble(), MakeEnd(), ToString() and Exists() wrappers in Fractorium. Add Size() wrapper in EmberFile. Make QString function arguments const QString&, and string with const string&. Make ShowCritical() wrapper for invoking a message box from another thread. Add combo box to TwoButtonWidget and rename.
2014-10-14 11:53:15 -04:00
size_t projBits = ember.ProjBits();
ostringstream os;
if (projBits)
{
if (projBits & PROJBITS_BLUR)
{
if (projBits & PROJBITS_YAW)
{
os <<
" real_t dsin, dcos;\n"
" real_t t = MwcNext01(&mwc) * M_2PI;\n"
" real_t z = secondPoint.m_Z - ember->m_CamZPos;\n"
" real_t x = ember->m_C00 * secondPoint.m_X + ember->m_C10 * secondPoint.m_Y;\n"
" real_t y = ember->m_C01 * secondPoint.m_X + ember->m_C11 * secondPoint.m_Y + ember->m_C21 * z;\n"
"\n"
" z = ember->m_C02 * secondPoint.m_X + ember->m_C12 * secondPoint.m_Y + ember->m_C22 * z;\n"
"\n"
" real_t zr = Zeps(1 - ember->m_CamPerspective * z);\n"
" real_t dr = MwcNext01(&mwc) * ember->m_BlurCoef * z;\n"
"\n"
" dsin = sin(t);\n"
" dcos = cos(t);\n"
"\n"
" secondPoint.m_X = (x + dr * dcos) / zr;\n"
" secondPoint.m_Y = (y + dr * dsin) / zr;\n"
" secondPoint.m_Z -= ember->m_CamZPos;\n";
}
else
{
os <<
" real_t y, z, zr;\n"
" real_t dsin, dcos;\n"
" real_t t = MwcNext01(&mwc) * M_2PI;\n"
"\n"
" z = secondPoint.m_Z - ember->m_CamZPos;\n"
" y = ember->m_C11 * secondPoint.m_Y + ember->m_C21 * z;\n"
" z = ember->m_C12 * secondPoint.m_Y + ember->m_C22 * z;\n"
" zr = Zeps(1 - ember->m_CamPerspective * z);\n"
"\n"
" dsin = sin(t);\n"
" dcos = cos(t);\n"
"\n"
" real_t dr = MwcNext01(&mwc) * ember->m_BlurCoef * z;\n"
"\n"
" secondPoint.m_X = (secondPoint.m_X + dr * dcos) / zr;\n"
" secondPoint.m_Y = (y + dr * dsin) / zr;\n"
" secondPoint.m_Z -= ember->m_CamZPos;\n";
}
}
else if ((projBits & PROJBITS_PITCH) || (projBits & PROJBITS_YAW))
{
if (projBits & PROJBITS_YAW)
{
os <<
" real_t z = secondPoint.m_Z - ember->m_CamZPos;\n"
" real_t x = ember->m_C00 * secondPoint.m_X + ember->m_C10 * secondPoint.m_Y;\n"
" real_t y = ember->m_C01 * secondPoint.m_X + ember->m_C11 * secondPoint.m_Y + ember->m_C21 * z;\n"
" real_t zr = Zeps(1 - ember->m_CamPerspective * (ember->m_C02 * secondPoint.m_X + ember->m_C12 * secondPoint.m_Y + ember->m_C22 * z));\n"
"\n"
" secondPoint.m_X = x / zr;\n"
" secondPoint.m_Y = y / zr;\n"
" secondPoint.m_Z -= ember->m_CamZPos;\n";
}
else
{
os <<
" real_t z = secondPoint.m_Z - ember->m_CamZPos;\n"
" real_t y = ember->m_C11 * secondPoint.m_Y + ember->m_C21 * z;\n"
" real_t zr = Zeps(1 - ember->m_CamPerspective * (ember->m_C12 * secondPoint.m_Y + ember->m_C22 * z));\n"
"\n"
" secondPoint.m_X /= zr;\n"
" secondPoint.m_Y = y / zr;\n"
" secondPoint.m_Z -= ember->m_CamZPos;\n";
}
}
else
{
os <<
" real_t zr = Zeps(1 - ember->m_CamPerspective * (secondPoint.m_Z - ember->m_CamZPos));\n"
"\n"
" secondPoint.m_X /= zr;\n"
" secondPoint.m_Y /= zr;\n"
" secondPoint.m_Z -= ember->m_CamZPos;\n";
}
}
return os.str();
}
template EMBERCL_API class IterOpenCLKernelCreator<float>;
#ifdef DO_DOUBLE
template EMBERCL_API class IterOpenCLKernelCreator<double>;
#endif
}