--User changes
 -Add support for multiple GPU devices.
  --These options are present in the command line and in Fractorium.
 -Change scheme of specifying devices from platform,device to just total device index.
  --Single number on the command line.
  --Change from combo boxes for device selection to a table of all devices in Fractorium.
 -Temporal samples defaults to 100 instead of 1000 which was needless overkill.

--Bug fixes
 -EmberAnimate, EmberRender, FractoriumSettings, FinalRenderDialog: Fix wrong order of arguments to Clamp() when assigning thread priority.
 -VariationsDC.h: Fix NVidia OpenCL compilation error in DCTriangleVariation.
 -FractoriumXformsColor.cpp: Checking for null pixmap pointer is not enough, must also check if the underlying buffer is null via call to QPixmap::isNull().

--Code changes
 -Ember.h: Add case for FLAME_MOTION_NONE and default in ApplyFlameMotion().
 -EmberMotion.h: Call base constructor.
 -EmberPch.h: #pragma once only on Windows.
 -EmberToXml.h:
  --Handle different types of exceptions.
  --Add default cases to ToString().
 -Isaac.h: Remove unused variable in constructor.
 -Point.h: Call base constructor in Color().
 -Renderer.h/cpp:
  --Add bool to Alloc() to only allocate memory for the histogram. Needed for multi-GPU.
  --Make CoordMap() return a const ref, not a pointer.
 -SheepTools.h:
  --Use 64-bit types like the rest of the code already does.
  --Fix some comment misspellings.
 -Timing.h: Make BeginTime(), EndTime(), ElapsedTime() and Format() be const functions.
 -Utils.h:
  --Add new functions Equal() and Split().
  --Handle more exception types in ReadFile().
  --Get rid of most legacy blending of C and C++ argument parsing.
 -XmlToEmber.h:
  --Get rid of most legacy blending of C and C++ code from flam3.
  --Remove some unused variables.
 -EmberAnimate:
  --Support multi-GPU processing that alternates full frames between devices.
  --Use OpenCLInfo instead of OpenCLWrapper for --openclinfo option.
  --Remove bucketT template parameter, and hard code float in its place.
  --If a render fails, exit since there is no point in continuing an animation with a missing frame.
  --Pass variables to threaded save better, which most likely fixes a very subtle bug that existed before.
  --Remove some unused variables.
 -EmberGenome, EmberRender:
  --Support multi-GPU processing that alternates full frames between devices.
  --Use OpenCLInfo instead of OpenCLWrapper for --openclinfo option.
  --Remove bucketT template parameter, and hard code float in its place.
 -EmberRender:
  --Support multi-GPU processing that alternates full frames between devices.
  --Use OpenCLInfo instead of OpenCLWrapper for --openclinfo option.
  --Remove bucketT template parameter, and hard code float in its place.
  --Only print values when not rendering with OpenCL, since they're always 0 in that case.
 -EmberCLPch.h:
  --#pragma once only on Windows.
  --#include <atomic>.
 -IterOpenCLKernelCreator.h: Add new kernel for summing two histograms. This is needed for multi-GPU.
 -OpenCLWrapper.h:
  --Move all OpenCL info related code into its own class OpenCLInfo.
  --Add members to cache the values of global memory size and max allocation size.
 -RendererCL.h/cpp:
  --Redesign to accommodate multi-GPU.
  --Constructor now takes a vector of devices.
  --Remove DumpErrorReport() function, it's handled in the base.
  --ClearBuffer(), ReadPoints(), WritePoints(), ReadHist() and WriteHist() now optionally take a device index as a parameter.
  --MakeDmap() override and m_DmapCL member removed because it no longer applies since the histogram is always float since the last commit.
  --Add new function SumDeviceHist() to sum histograms from two devices by first copying to a temporary on the host, then a temporary on the device, then summing.
  --m_Calls member removed, as it's now per-device.
  --OpenCLWrapper removed.
  --m_Seeds member is now a vector of vector of seeds, to accommodate a separate and different array of seeds for each device.
  --Added member m_Devices, a vector of unique_ptr of RendererCLDevice.
 -EmberCommon.h:
  --Added Devices() function to convert from a vector of device indices to a vector of platform,device indices.
  --Changed CreateRenderer() to accept a vector of devices to create a single RendererCL which will split work across multiple devices.
  --Added CreateRenderers() function to accept a vector of devices to create multiple RendererCL, each of which will render on a single device.
  --Add more comments to some existing functions.
 -EmberCommonPch.h: #pragma once only on Windows.
 -EmberOptions.h:
  --Remove --platform option, it's just sequential device number now with the --device option.
  --Make --out be OPT_USE_RENDER instead of OPT_RENDER_ANIM since it's an error condition when animating. It makes no sense to write all frames to a single image.
  --Add Devices() function to parse comma separated --device option string and return a vector of device indices.
  --Make int and uint types be 64-bit, so intmax_t and size_t.
  --Make better use of macros.
 -JpegUtils.h: Make string parameters to WriteJpeg() and WritePng() be const ref.
 -All project files: Turn off buffer security check option in Visual Studio (/Gs-)
 -deployment.pri: Remove the line OTHER_FILES +=, it's pointless and was causing problems.
 -Ember.pro, EmberCL.pro: Add CONFIG += plugin, otherwise it wouldn't link.
 -EmberCL.pro: Add new files for multi-GPU support.
 -build_all.sh: use -j4 and QMAKE=${QMAKE:/usr/bin/qmake}
 -shared_settings.pri:
  -Add version string.
  -Remove old DESTDIR definitions.
  -Add the following lines or else nothing would build:
   CONFIG(release, debug|release) {
    CONFIG += warn_off
    DESTDIR = ../../../Bin/release
   }
   CONFIG(debug, debug|release) {
    DESTDIR = ../../../Bin/debug
   }
   QMAKE_POST_LINK += $$quote(cp --update ../../../Data/flam3-palettes.xml $${DESTDIR}$$escape_expand(\n\t))
   LIBS += -L/usr/lib -lpthread
 -AboutDialog.ui: Another futile attempt to make it look correct on Linux.
 -FinalRenderDialog.h/cpp:
  --Add support for multi-GPU.
  --Change from combo boxes for device selection to a table of all devices.
  --Ensure device selection makes sense.
  --Remove "FinalRender" prefix of various function names, it's implied given the context.
 -FinalRenderEmberController.h/cpp:
  --Add support for multi-GPU.
  --Change m_FinishedImageCount to be atomic.
  --Move CancelRender() from the base to FinalRenderEmberController<T>.
  --Refactor RenderComplete() to omit any progress related functionality or image saving since it can potentially be run in a thread.
  --Consolidate setting various renderer fields into SyncGuiToRenderer().
 -Fractorium.cpp: Allow for resizing of the options dialog to show the entire device table.
 -FractoriumCommon.h: Add various functions to handle a table showing the available OpenCL devices on the system.
 -FractoriumEmberController.h/cpp: Remove m_FinalImageIndex, it's no longer needed.
 -FractoriumRender.cpp: Scale the interactive sub batch count and quality by the number of devices used.
 -FractoriumSettings.h/cpp:
  --Temporal samples defaults to 100 instead of 1000 which was needless overkill.
  --Add multi-GPU support, remove old device,platform pair.
 -FractoriumToolbar.cpp: Disable OpenCL toolbar button if there are no devices present on the system.
 -FractoriumOptionsDialog.h/cpp:
  --Add support for multi-GPU.
  --Consolidate more assignments in DataToGui().
  --Enable/disable CPU/OpenCL items in response to OpenCL checkbox event.
 -Misc: Convert almost everything to size_t for unsigned, intmax_t for signed.
2015-09-12 18:33:45 -07:00
parent a4aae06b02
commit 018ba26b5f
85 changed files with 3869 additions and 2517 deletions
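
The commit message describes a comma separated --device option that is parsed into a vector of total device indices. The parsing code itself is not part of this excerpt, so the following is only a minimal sketch of the idea. `ParseDeviceIndices` is a hypothetical name, and silently skipping malformed fields is an assumption rather than the project's documented behavior.

```cpp
#include <cassert>
#include <cstddef>
#include <exception>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical helper, not the project's Devices() function:
// split a string such as "0,2" into total device indices.
// Fields that are empty or not fully numeric are skipped (an assumption).
std::vector<std::size_t> ParseDeviceIndices(const std::string& arg)
{
	std::vector<std::size_t> indices;
	std::istringstream is(arg);
	std::string field;

	while (std::getline(is, field, ','))
	{
		try
		{
			std::size_t pos = 0;
			unsigned long long val = std::stoull(field, &pos);

			if (pos == field.size())//Reject trailing garbage such as "2x".
				indices.push_back(static_cast<std::size_t>(val));
		}
		catch (const std::exception&)
		{
			//Empty or non-numeric field: ignore it.
		}
	}

	return indices;
}
```

Under these assumptions, `ParseDeviceIndices("0,2")` yields the indices 0 and 2, which a caller could then translate to platform,device pairs.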
@@ -1,4 +1,6 @@
-#pragma once
+#ifdef WIN32
+	#pragma once
+#endif

 /// <summary>
 /// Precompiled header file. Place all system includes here with appropriate #defines for different operating systems and compilers.
@@ -37,6 +39,7 @@
 #include <CL/cl.hpp>

 #include <algorithm>
+#include <atomic>
 #include <cstdio>
 #include <cstdlib>
 #include <fstream>
@@ -15,7 +15,9 @@ IterOpenCLKernelCreator<T>::IterOpenCLKernelCreator()
 {
 	m_IterEntryPoint = "IterateKernel";
 	m_ZeroizeEntryPoint = "ZeroizeKernel";
+	m_SumHistEntryPoint = "SumHisteKernel";
 	m_ZeroizeKernel = CreateZeroizeKernelString();
+	m_SumHistKernel = CreateSumHistKernelString();
 }

 /// <summary>
@@ -24,6 +26,8 @@ IterOpenCLKernelCreator<T>::IterOpenCLKernelCreator()

 template <typename T> const string& IterOpenCLKernelCreator<T>::ZeroizeKernel() const { return m_ZeroizeKernel; }
 template <typename T> const string& IterOpenCLKernelCreator<T>::ZeroizeEntryPoint() const { return m_ZeroizeEntryPoint; }
+template <typename T> const string& IterOpenCLKernelCreator<T>::SumHistKernel() const { return m_SumHistKernel; }
+template <typename T> const string& IterOpenCLKernelCreator<T>::SumHistEntryPoint() const { return m_SumHistEntryPoint; }
 template <typename T> const string& IterOpenCLKernelCreator<T>::IterEntryPoint() const { return m_IterEntryPoint; }

 /// <summary>
@@ -703,6 +707,30 @@ string IterOpenCLKernelCreator<T>::CreateZeroizeKernelString()
 	return os.str();
 }

+template <typename T>
+string IterOpenCLKernelCreator<T>::CreateSumHistKernelString()
+{
+	ostringstream os;
+
+	os <<
+		ConstantDefinesString(typeid(T) == typeid(double)) <<//Double precision doesn't matter here since it's not used.
+		"__kernel void " << m_SumHistEntryPoint << "(__global real4_bucket* source, __global real4_bucket* dest, uint width, uint height, uint clear)\n"
+		"{\n"
+		"	if (GLOBAL_ID_X >= width || GLOBAL_ID_Y >= height)\n"
+		"		return;\n"
+		"\n"
+		"	dest[(GLOBAL_ID_Y * width) + GLOBAL_ID_X] += source[(GLOBAL_ID_Y * width) + GLOBAL_ID_X];\n"//Can't use INDEX_IN_GRID_2D here because the grid might be larger than the buffer to make even dimensions.
+		"\n"
+		"	if (clear)\n"
+		"		source[(GLOBAL_ID_Y * width) + GLOBAL_ID_X] = 0;\n"
+		"\n"
+		"	barrier(CLK_GLOBAL_MEM_FENCE);\n"//Just to be safe.
+		"}\n"
+		"\n";
+
+	return os.str();
+}
+
 /// <summary>
 /// Create the string for 3D projection based on the 3D values of the ember.
 /// Projection is done on the second point.
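
For readers who want to check the new summing kernel's logic on the host, the following is a hedged CPU sketch of the same per-element operation. `SumHistReference` is a hypothetical name, and each bucket is reduced to a single float for brevity, whereas the kernel operates on real4_bucket values. The gridW and gridH parameters stand in for a launch grid that may be padded beyond the buffer dimensions.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical CPU reference for the histogram summing kernel.
// One float per bucket is a simplification; the kernel uses 4-component buckets.
void SumHistReference(std::vector<float>& source, std::vector<float>& dest,
					  std::size_t width, std::size_t height,
					  std::size_t gridW, std::size_t gridH, bool clear)
{
	for (std::size_t y = 0; y < gridH; y++)
	{
		for (std::size_t x = 0; x < gridW; x++)
		{
			if (x >= width || y >= height)//Same bounds check as the kernel: padded work items do nothing.
				continue;

			std::size_t i = (y * width) + x;

			dest[i] += source[i];

			if (clear)
				source[i] = 0;
		}
	}
}
```

Indexing with `(y * width) + x` rather than the grid width mirrors the kernel comment about the grid possibly being larger than the buffer.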
@@ -26,6 +26,8 @@ public:
 	IterOpenCLKernelCreator();
 	const string& ZeroizeKernel() const;
 	const string& ZeroizeEntryPoint() const;
+	const string& SumHistKernel() const;
+	const string& SumHistEntryPoint() const;
 	const string& IterEntryPoint() const;
 	string CreateIterKernelString(Ember<T>& ember, string& parVarDefines, bool lockAccum = false, bool doAccum = true);
 	static void ParVarIndexDefines(Ember<T>& ember, pair<string, vector<T>>& params, bool doVals = true, bool doString = true);
@@ -33,18 +35,21 @@ public:

 private:
 	string CreateZeroizeKernelString();
+	string CreateSumHistKernelString();
 	string CreateProjectionString(Ember<T>& ember);

 	string m_IterEntryPoint;
 	string m_ZeroizeKernel;
 	string m_ZeroizeEntryPoint;
+	string m_SumHistKernel;
+	string m_SumHistEntryPoint;
 };

 #ifdef OPEN_CL_TEST_AREA
-typedef void (*KernelFuncPointer) (uint gridWidth, uint gridHeight, uint blockWidth, uint blockHeight,
-								   uint BLOCK_ID_X, uint BLOCK_ID_Y, uint THREAD_ID_X, uint THREAD_ID_Y);
+typedef void (*KernelFuncPointer) (size_t gridWidth, size_t gridHeight, size_t blockWidth, size_t blockHeight,
+								   size_t BLOCK_ID_X, size_t BLOCK_ID_Y, size_t THREAD_ID_X, size_t THREAD_ID_Y);

-static void OpenCLSim(uint gridWidth, uint gridHeight, uint blockWidth, uint blockHeight, KernelFuncPointer func)
+static void OpenCLSim(size_t gridWidth, size_t gridHeight, size_t blockWidth, size_t blockHeight, KernelFuncPointer func)
 {
 	cout << "OpenCLSim(): " << endl;
 	cout << "	Params: " << endl;
@@ -53,13 +58,13 @@ static void OpenCLSim(uint gridWidth, uint gridHeight, uint blockWidth, uint blo
 	cout << "		blockW: " << blockWidth << endl;
 	cout << "		blockH: " << blockHeight << endl;

-	for (uint i = 0; i < gridHeight; i += blockHeight)
+	for (size_t i = 0; i < gridHeight; i += blockHeight)
 	{
-		for (uint j = 0; j < gridWidth; j += blockWidth)
+		for (size_t j = 0; j < gridWidth; j += blockWidth)
 		{
-			for (uint k = 0; k < blockHeight; k++)
+			for (size_t k = 0; k < blockHeight; k++)
 			{
-				for (uint l = 0; l < blockWidth; l++)
+				for (size_t l = 0; l < blockWidth; l++)
 				{
 					func(gridWidth, gridHeight, blockWidth, blockHeight, j / blockWidth, i / blockHeight, l, k);
 				}
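
The nested loops in OpenCLSim() walk every block of the grid and every thread within each block. As an illustration only, the sketch below records the global coordinate that each simulated work item would map to. `VisitedGlobalIds` is a hypothetical helper, and it assumes non-zero block dimensions that divide the grid dimensions evenly.

```cpp
#include <cassert>
#include <cstddef>
#include <utility>
#include <vector>

// Hypothetical helper: reproduce the OpenCLSim() loop order and collect
// the global (x, y) id of each simulated work item.
// Assumes blockW and blockH are non-zero and divide gridW and gridH evenly.
std::vector<std::pair<std::size_t, std::size_t>> VisitedGlobalIds(std::size_t gridW, std::size_t gridH,
																  std::size_t blockW, std::size_t blockH)
{
	std::vector<std::pair<std::size_t, std::size_t>> visited;

	for (std::size_t i = 0; i < gridH; i += blockH)//Block rows.
	{
		for (std::size_t j = 0; j < gridW; j += blockW)//Block columns.
		{
			for (std::size_t k = 0; k < blockH; k++)//Thread row within the block.
			{
				for (std::size_t l = 0; l < blockW; l++)//Thread column within the block.
				{
					std::size_t blockIdX = j / blockW;
					std::size_t blockIdY = i / blockH;

					visited.emplace_back((blockIdX * blockW) + l, (blockIdY * blockH) + k);
				}
			}
		}
	}

	return visited;
}
```

With a 4x2 grid and 2x1 blocks, this visits eight work items, one per global coordinate.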
@@ -0,0 +1,406 @@
+#include "EmberCLPch.h"
+#include "OpenCLInfo.h"
+
+namespace EmberCLns
+{
+/// <summary>
+/// Initialize and return a reference to the one and only OpenCLInfo object.
+/// </summary>
+/// <returns>A reference to the only OpenCLInfo object.</returns>
+OpenCLInfo& OpenCLInfo::Instance()
+{
+	static OpenCLInfo instance;
+
+	return instance;
+}
+
+/// <summary>
+/// Initialize all platforms and devices and keep information about them in lists.
+/// </summary>
+OpenCLInfo::OpenCLInfo()
+{
+	cl_int err;
+	vector<cl::Platform> platforms;
+	vector<vector<cl::Device>> devices;
+	intmax_t workingPlatformIndex = -1;
+
+	m_Init = false;
+	cl::Platform::get(&platforms);
+	devices.resize(platforms.size());
+	m_Platforms.reserve(platforms.size());
+	m_Devices.reserve(platforms.size());
+	m_DeviceNames.reserve(platforms.size());
+	m_AllDeviceNames.reserve(platforms.size());
+	m_DeviceIndices.reserve(platforms.size());
+
+	for (size_t i = 0; i < platforms.size(); i++)
+		platforms[i].getDevices(CL_DEVICE_TYPE_ALL, &devices[i]);
+
+	for (size_t platform = 0; platform < platforms.size(); platform++)
+	{
+		bool platformOk = false;
+		bool deviceOk = false;
+		cl::Context context;
+
+		if (CreateContext(platforms[platform], context, false))//Platform is ok, now do context. Unshared by default.
+		{
+			size_t workingDeviceIndex = 0;
+
+			for (size_t device = 0; device < devices[platform].size(); device++)//Context is ok, now do devices.
+			{
+				auto q = cl::CommandQueue(context, devices[platform][device], 0, &err);//At least one GPU device is present, so create a command queue.
+
+				if (CheckCL(err, "cl::CommandQueue()"))
+				{
+					if (!platformOk)
+					{
+						m_Platforms.push_back(platforms[platform]);
+						m_PlatformNames.push_back(platforms[platform].getInfo<CL_PLATFORM_VENDOR>(nullptr) + " " + platforms[platform].getInfo<CL_PLATFORM_NAME>(nullptr) + " " + platforms[platform].getInfo<CL_PLATFORM_VERSION>(nullptr));
+						workingPlatformIndex++;
+						platformOk = true;
+					}
+
+					if (!deviceOk)
+					{
+						m_Devices.push_back(vector<cl::Device>());
+						m_DeviceNames.push_back(vector<string>());
+						m_Devices.back().reserve(devices[platform].size());
+						m_DeviceNames.back().reserve(devices[platform].size());
+						deviceOk = true;
+					}
+
+					m_Devices.back().push_back(devices[platform][device]);
+					m_DeviceNames.back().push_back(devices[platform][device].getInfo<CL_DEVICE_VENDOR>(nullptr) + " " + devices[platform][device].getInfo<CL_DEVICE_NAME>(nullptr));// + " " + devices[platform][device].getInfo<CL_DEVICE_VERSION>());
+					m_AllDeviceNames.push_back(m_DeviceNames.back().back());
+					m_DeviceIndices.push_back(pair<size_t, size_t>(workingPlatformIndex, workingDeviceIndex++));
+					m_Init = true;//If at least one platform and device succeeded, OpenCL is ok. It's now ok to begin building and running programs.
+				}
+			}
+		}
+	}
+}
+
+/// <summary>
+/// Get a const reference to the vector of available platforms.
+/// </summary>
+/// <returns>A const reference to the vector of available platforms</returns>
+const vector<cl::Platform>& OpenCLInfo::Platforms() const
+{
+	return m_Platforms;
+}
+
+/// <summary>
+/// Get a const reference to the platform name at the specified index.
+/// </summary>
+/// <param name="platform">The platform index to get the name of</param>
+/// <returns>The platform name if found, else empty string</returns>
+const string& OpenCLInfo::PlatformName(size_t platform) const
+{
+	static string s;
+	return platform < m_PlatformNames.size() ? m_PlatformNames[platform] : s;
+}
+
+/// <summary>
+/// Get a const reference to a vector of all available platform names on the system as a vector of strings.
+/// </summary>
+/// <returns>All available platform names on the system as a vector of strings</returns>
+const vector<string>& OpenCLInfo::PlatformNames() const
+{
+	return m_PlatformNames;
+}
+
+/// <summary>
+/// Get a const reference to a vector of vectors of all available devices on the system.
+/// Each outer vector is a different platform.
+/// </summary>
+/// <returns>All available devices on the system, grouped by platform.</returns>
+const vector<vector<cl::Device>>& OpenCLInfo::Devices() const
+{
+	return m_Devices;
+}
+
+/// <summary>
+/// Get a const reference to the device name at the specified index on the platform
+/// at the specified index.
+/// </summary>
+/// <param name="platform">The platform index of the device</param>
+/// <param name="device">The device index</param>
+/// <returns>The name of the device if found, else empty string</returns>
+const string& OpenCLInfo::DeviceName(size_t platform, size_t device) const
+{
+	static string s;
+
+	if (platform < m_Platforms.size() && platform < m_Devices.size())
+		if (device < m_Devices[platform].size())
+			return m_DeviceNames[platform][device];
+
+	return s;
+}
+
+/// <summary>
+/// Get a const reference to a vector of pairs of size_t which contain the platform,device
+/// indices of all available devices on the system.
+/// </summary>
+/// <returns>All available devices on the system as platform,device index pairs</returns>
+const vector<pair<size_t, size_t>>& OpenCLInfo::DeviceIndices() const
+{
+	return m_DeviceIndices;
+}
+
+/// <summary>
+/// Get a const reference to a vector of all available device names on the system as a vector of strings.
+/// </summary>
+/// <returns>All available device names on the system as a vector of strings</returns>
+const vector<string>& OpenCLInfo::AllDeviceNames() const
+{
+	return m_AllDeviceNames;
+}
+
+/// <summary>
+/// Get a const reference to a vector of all available device names on the platform
+/// at the specified index as a vector of strings.
+/// </summary>
+/// <param name="platform">The platform index whose devices names will be returned</param>
+/// <returns>All available device names on the platform at the specified index as a vector of strings if within range, else empty vector.</returns>
+const vector<string>& OpenCLInfo::DeviceNames(size_t platform) const
+{
+	static vector<string> v;
+
+	if (platform < m_DeviceNames.size())
+		return m_DeviceNames[platform];
+
+	return v;
+}
+
+/// <summary>
+/// Get the total device index at the specified platform and device index.
+/// </summary>
+/// <param name="platform">The platform index of the device</param>
+/// <param name="device">The device index within the platform</param>
+/// <returns>The total device index if found, else 0</returns>
+size_t OpenCLInfo::TotalDeviceIndex(size_t platform, size_t device) const
+{
+	size_t index = 0;
+	pair<size_t, size_t> p{ platform, device };
+
+	for (size_t i = 0; i < m_DeviceIndices.size(); i++)
+	{
+		if (p == m_DeviceIndices[i])
+		{
+			index = i;
+			break;
+		}
+	}
+
+	return index;
+}
+
+/// <summary>
+/// Create a context that is optionally shared with OpenGL and place it in the
+/// passed in context ref parameter.
+/// </summary>
+/// <param name="platform">The platform object to create the context on</param>
+/// <param name="context">The context object to store the result in</param>
+/// <param name="shared">True if shared with OpenGL, else not shared.</param>
+/// <returns>True if success, else false.</returns>
+bool OpenCLInfo::CreateContext(const cl::Platform& platform, cl::Context& context, bool shared)
+{
+	cl_int err;
+
+	if (shared)
+	{
+		//Define OS-specific context properties and create the OpenCL context.
+		#if defined (__APPLE__) || defined(MACOSX)
+			CGLContextObj kCGLContext = CGLGetCurrentContext();
+			CGLShareGroupObj kCGLShareGroup = CGLGetShareGroup(kCGLContext);
+			cl_context_properties props[] =
+			{
+				CL_CONTEXT_PROPERTY_USE_CGL_SHAREGROUP_APPLE, (cl_context_properties)kCGLShareGroup,
+				0
+			};
+
+			context = cl::Context(CL_DEVICE_TYPE_GPU, props, nullptr, nullptr, &err);//May need to tinker with this on Mac.
+		#else
+			#if defined WIN32
+				cl_context_properties props[] =
+				{
+					CL_GL_CONTEXT_KHR, (cl_context_properties)wglGetCurrentContext(),
+					CL_WGL_HDC_KHR, (cl_context_properties)wglGetCurrentDC(),
+					CL_CONTEXT_PLATFORM, reinterpret_cast<cl_context_properties>((platform)()),
+					0
+				};
+
+				context = cl::Context(CL_DEVICE_TYPE_GPU, props, nullptr, nullptr, &err);
+			#else
+				cl_context_properties props[] =
+				{
+					CL_GL_CONTEXT_KHR, cl_context_properties(glXGetCurrentContext()),
+					CL_GLX_DISPLAY_KHR, cl_context_properties(glXGetCurrentDisplay()),
+					CL_CONTEXT_PLATFORM, reinterpret_cast<cl_context_properties>((platform)()),
+					0
+				};
+
+				context = cl::Context(CL_DEVICE_TYPE_GPU, props, nullptr, nullptr, &err);
+			#endif
+		#endif
+	}
+	else
+	{
+		cl_context_properties props[3] =
+		{
+			CL_CONTEXT_PLATFORM,
+			reinterpret_cast<cl_context_properties>((platform)()),
+			0
+		};
+
+		context = cl::Context(CL_DEVICE_TYPE_ALL, props, nullptr, nullptr, &err);
+	}
+
+	return CheckCL(err, "cl::Context()");
+}
+
+/// <summary>
+/// Return whether at least one device has been found and properly initialized.
+/// </summary>
+/// <returns>True if success, else false.</returns>
+bool OpenCLInfo::Ok() const
+{
+	return m_Init;
+}
+
+/// <summary>
+/// Get all information about all platforms and devices.
+/// </summary>
+/// <returns>A string with all information about all platforms and devices</returns>
+string OpenCLInfo::DumpInfo() const
+{
+	ostringstream os;
+	vector<size_t> sizes;
+
+	os.imbue(locale(""));
+
+	for (size_t platform = 0; platform < m_Platforms.size(); platform++)
+	{
+		os << "Platform " << platform << ": " << PlatformName(platform) << endl;
+
+		for (size_t device = 0; device < m_Devices[platform].size(); device++)
+		{
+			os << "Device " << device << ": " << DeviceName(platform, device) << endl;
+			os << "CL_DEVICE_OPENCL_C_VERSION: " << GetInfo<string>(platform, device, CL_DEVICE_OPENCL_C_VERSION) << endl;
+			os << "CL_DEVICE_LOCAL_MEM_SIZE: " << GetInfo<cl_ulong>(platform, device, CL_DEVICE_LOCAL_MEM_SIZE) << endl;
+			os << "CL_DEVICE_LOCAL_MEM_TYPE: " << GetInfo<cl_uint>(platform, device, CL_DEVICE_LOCAL_MEM_TYPE) << endl;
+			os << "CL_DEVICE_MAX_COMPUTE_UNITS: " << GetInfo<cl_uint>(platform, device, CL_DEVICE_MAX_COMPUTE_UNITS) << endl;
+			os << "CL_DEVICE_MAX_READ_IMAGE_ARGS: " << GetInfo<cl_uint>(platform, device, CL_DEVICE_MAX_READ_IMAGE_ARGS) << endl;
+			os << "CL_DEVICE_MAX_WRITE_IMAGE_ARGS: " << GetInfo<cl_uint>(platform, device, CL_DEVICE_MAX_WRITE_IMAGE_ARGS) << endl;
+			os << "CL_DEVICE_MAX_MEM_ALLOC_SIZE: " << GetInfo<cl_ulong>(platform, device, CL_DEVICE_MAX_MEM_ALLOC_SIZE) << endl;
+			os << "CL_DEVICE_ADDRESS_BITS: " << GetInfo<cl_uint>(platform, device, CL_DEVICE_ADDRESS_BITS) << endl;
+
+			os << "CL_DEVICE_GLOBAL_MEM_CACHE_TYPE: " << GetInfo<cl_uint>(platform, device, CL_DEVICE_GLOBAL_MEM_CACHE_TYPE) << endl;
+			os << "CL_DEVICE_GLOBAL_MEM_CACHELINE_SIZE: " << GetInfo<cl_uint>(platform, device, CL_DEVICE_GLOBAL_MEM_CACHELINE_SIZE) << endl;
+			os << "CL_DEVICE_GLOBAL_MEM_CACHE_SIZE: " << GetInfo<cl_ulong>(platform, device, CL_DEVICE_GLOBAL_MEM_CACHE_SIZE) << endl;
+			os << "CL_DEVICE_GLOBAL_MEM_SIZE: " << GetInfo<cl_ulong>(platform, device, CL_DEVICE_GLOBAL_MEM_SIZE) << endl;
+			os << "CL_DEVICE_MAX_CONSTANT_BUFFER_SIZE: " << GetInfo<cl_ulong>(platform, device, CL_DEVICE_MAX_CONSTANT_BUFFER_SIZE) << endl;
+
+			os << "CL_DEVICE_MAX_CONSTANT_ARGS: " << GetInfo<cl_uint>(platform, device, CL_DEVICE_MAX_CONSTANT_ARGS) << endl;
+			os << "CL_DEVICE_MAX_WORK_ITEM_DIMENSIONS: " << GetInfo<cl_uint>(platform, device, CL_DEVICE_MAX_WORK_ITEM_DIMENSIONS) << endl;
+			os << "CL_DEVICE_MAX_WORK_GROUP_SIZE: " << GetInfo<size_t>(platform, device, CL_DEVICE_MAX_WORK_GROUP_SIZE) << endl;
+
+			sizes = GetInfo<vector<size_t>>(platform, device, CL_DEVICE_MAX_WORK_ITEM_SIZES);
+			os << "CL_DEVICE_MAX_WORK_ITEM_SIZES: " << sizes[0] << ", " << sizes[1] << ", " << sizes[2] << endl << endl;
+
+			if (device != m_Devices[platform].size() - 1 && platform != m_Platforms.size() - 1)
+				os << endl;
+		}
+
+		os << endl;
+	}
+
+	return os.str();
+}
+
+/// <summary>
+/// Check an OpenCL return value for errors.
+/// </summary>
+/// <param name="err">The error code to inspect</param>
+/// <param name="name">A description of where the value was gotten from</param>
+/// <returns>True if success, else false.</returns>
+bool OpenCLInfo::CheckCL(cl_int err, const char* name)
+{
+	if (err != CL_SUCCESS)
+	{
+		ostringstream ss;
+		ss << "ERROR: " << ErrorToStringCL(err) << " in " << name << "." << endl;
+		m_ErrorReport.push_back(ss.str());
+	}
+
+	return err == CL_SUCCESS;
+}
+
+/// <summary>
+/// Translate an OpenCL error code into a human readable string.
+/// </summary>
+/// <param name="err">The error code to translate</param>
+/// <returns>A human readable description of the error passed in</returns>
+string OpenCLInfo::ErrorToStringCL(cl_int err)
+{
+	switch (err)
+	{
+		case CL_SUCCESS:								   return "Success";
+		case CL_DEVICE_NOT_FOUND:						   return "Device not found";
+		case CL_DEVICE_NOT_AVAILABLE:					   return "Device not available";
+		case CL_COMPILER_NOT_AVAILABLE:					   return "Compiler not available";
+		case CL_MEM_OBJECT_ALLOCATION_FAILURE:			   return "Memory object allocation failure";
+		case CL_OUT_OF_RESOURCES:						   return "Out of resources";
+		case CL_OUT_OF_HOST_MEMORY:						   return "Out of host memory";
+		case CL_PROFILING_INFO_NOT_AVAILABLE:			   return "Profiling information not available";
+		case CL_MEM_COPY_OVERLAP:						   return "Memory copy overlap";
+		case CL_IMAGE_FORMAT_MISMATCH:					   return "Image format mismatch";
+		case CL_IMAGE_FORMAT_NOT_SUPPORTED:				   return "Image format not supported";
+		case CL_BUILD_PROGRAM_FAILURE:					   return "Program build failure";
+		case CL_MAP_FAILURE:							   return "Map failure";
+		case CL_MISALIGNED_SUB_BUFFER_OFFSET:			   return "Misaligned sub buffer offset";
+		case CL_EXEC_STATUS_ERROR_FOR_EVENTS_IN_WAIT_LIST: return "Exec status error for events in wait list";
+		case CL_INVALID_VALUE:							   return "Invalid value";
+		case CL_INVALID_DEVICE_TYPE:					   return "Invalid device type";
+		case CL_INVALID_PLATFORM:						   return "Invalid platform";
+		case CL_INVALID_DEVICE:							   return "Invalid device";
+		case CL_INVALID_CONTEXT:						   return "Invalid context";
+		case CL_INVALID_QUEUE_PROPERTIES:				   return "Invalid queue properties";
+		case CL_INVALID_COMMAND_QUEUE:					   return "Invalid command queue";
+		case CL_INVALID_HOST_PTR:						   return "Invalid host pointer";
+		case CL_INVALID_MEM_OBJECT:						   return "Invalid memory object";
+		case CL_INVALID_IMAGE_FORMAT_DESCRIPTOR:		   return "Invalid image format descriptor";
+		case CL_INVALID_IMAGE_SIZE:						   return "Invalid image size";
+		case CL_INVALID_SAMPLER:						   return "Invalid sampler";
+		case CL_INVALID_BINARY:							   return "Invalid binary";
+		case CL_INVALID_BUILD_OPTIONS:					   return "Invalid build options";
+		case CL_INVALID_PROGRAM:						   return "Invalid program";
+		case CL_INVALID_PROGRAM_EXECUTABLE:				   return "Invalid program executable";
+		case CL_INVALID_KERNEL_NAME:					   return "Invalid kernel name";
+		case CL_INVALID_KERNEL_DEFINITION:				   return "Invalid kernel definition";
+		case CL_INVALID_KERNEL:							   return "Invalid kernel";
+		case CL_INVALID_ARG_INDEX:						   return "Invalid argument index";
+		case CL_INVALID_ARG_VALUE:						   return "Invalid argument value";
+		case CL_INVALID_ARG_SIZE:						   return "Invalid argument size";
+		case CL_INVALID_KERNEL_ARGS:					   return "Invalid kernel arguments";
+		case CL_INVALID_WORK_DIMENSION:					   return "Invalid work dimension";
+		case CL_INVALID_WORK_GROUP_SIZE:				   return "Invalid work group size";
+		case CL_INVALID_WORK_ITEM_SIZE:					   return "Invalid work item size";
+		case CL_INVALID_GLOBAL_OFFSET:					   return "Invalid global offset";
+		case CL_INVALID_EVENT_WAIT_LIST:				   return "Invalid event wait list";
+		case CL_INVALID_EVENT:							   return "Invalid event";
+		case CL_INVALID_OPERATION:						   return "Invalid operation";
+		case CL_INVALID_GL_OBJECT:						   return "Invalid OpenGL object";
+		case CL_INVALID_BUFFER_SIZE:					   return "Invalid buffer size";
+		case CL_INVALID_MIP_LEVEL:						   return "Invalid mip-map level";
+		case CL_INVALID_GLOBAL_WORK_SIZE:				   return "Invalid global work size";
+		case CL_INVALID_PROPERTY:						   return "Invalid property";
+		default:
+		{
+			ostringstream ss;
+			ss << "<Unknown error code> " << err;
+			return ss.str();
+		}
+	}
+}
+}
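
The constructor and TotalDeviceIndex() above flatten the platform,device hierarchy into a single list, so that a total device index is just a position in that list. The sketch below illustrates that mapping without any OpenCL dependency. `FlattenDevices` and `TotalIndex` are hypothetical names, and the per-platform device counts stand in for devices that passed the command queue test.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <utility>
#include <vector>

using DevicePair = std::pair<std::size_t, std::size_t>;

// Hypothetical helper: build the flat list of platform,device pairs.
// Platforms with no working devices do not consume a platform index,
// which mirrors the workingPlatformIndex counter in the constructor.
std::vector<DevicePair> FlattenDevices(const std::vector<std::size_t>& workingDevicesPerPlatform)
{
	std::vector<DevicePair> all;
	std::size_t workingPlatform = 0;

	for (std::size_t count : workingDevicesPerPlatform)
	{
		if (count == 0)
			continue;

		for (std::size_t device = 0; device < count; device++)
			all.emplace_back(workingPlatform, device);

		workingPlatform++;
	}

	return all;
}

// Hypothetical helper: position of a pair in the flat list, else 0,
// matching the documented return value of TotalDeviceIndex().
std::size_t TotalIndex(const std::vector<DevicePair>& all, std::size_t platform, std::size_t device)
{
	auto it = std::find(all.begin(), all.end(), DevicePair(platform, device));

	return it != all.end() ? static_cast<std::size_t>(it - all.begin()) : 0;
}
```

Note that a result of 0 does not distinguish "first device" from "not found", so a caller that needs the difference would have to check the pair against the list itself.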
@@ -0,0 +1,69 @@
+#pragma once
+
+#include "EmberCLPch.h"
+
+/// <summary>
+/// OpenCLInfo class.
+/// </summary>
+
+namespace EmberCLns
+{
+/// <summary>
+/// Keeps information about all valid OpenCL devices on this system.
+/// Devices which do not successfully create a test command queue are not
+/// added to the list.
+/// The pattern is singleton, so there is only one instance per program,
+/// retrievable by reference via the Instance() function.
+/// This class derives from EmberReport, so the caller is able
+/// to retrieve a text dump of error information if any errors occur.
+/// </summary>
+class EMBERCL_API OpenCLInfo : public EmberReport
+{
+public:
+	static OpenCLInfo& Instance();
+	const vector<cl::Platform>& Platforms() const;
+	const string& PlatformName(size_t platform) const;
+	const vector<string>& PlatformNames() const;
+	const vector<vector<cl::Device>>& Devices() const;
+	const string& DeviceName(size_t platform, size_t device) const;
+	const vector<pair<size_t, size_t>>& DeviceIndices() const;
+	const vector<string>& AllDeviceNames() const;
+	const vector<string>& DeviceNames(size_t platform) const;
+	size_t TotalDeviceIndex(size_t platform, size_t device) const;
+	string DumpInfo() const;
+	bool Ok() const;
+	bool CreateContext(const cl::Platform& platform, cl::Context& context, bool shared);
+	bool CheckCL(cl_int err, const char* name);
+	string ErrorToStringCL(cl_int err);
+
+	/// <summary>
+	/// Get device information for the specified field.
+	/// Template argument expected to be cl_ulong, cl_uint or cl_int.
+	/// </summary>
+	/// <param name="platform">The index of the platform to use</param>
+	/// <param name="device">The index of the device to use</param>
+	/// <param name="name">The device field/feature to query</param>
+	/// <returns>The value of the field</returns>
+	template<typename T>
+	T GetInfo(size_t platform, size_t device, cl_device_info name) const
+	{
+		T val = T();
+
+		if (platform < m_Devices.size() && device < m_Devices[platform].size())
+			m_Devices[platform][device].getInfo(name, &val);
+
+		return val;
+	}
+
+private:
+	OpenCLInfo();
+
+	bool m_Init;
+	vector<cl::Platform> m_Platforms;
+	vector<vector<cl::Device>> m_Devices;
+	vector<string> m_PlatformNames;
+	vector<vector<string>> m_DeviceNames;
+	vector<pair<size_t, size_t>> m_DeviceIndices;
+	vector<string> m_AllDeviceNames;
+};
+}
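
The class comment describes a singleton reachable through Instance(), with accessors that return an empty value when an index is out of range. A minimal sketch of that pattern, free of OpenCL, might look as follows. `DeviceRegistry`, `Add()` and `Name()` are hypothetical and exist only for illustration; OpenCLInfo fills its lists in its constructor instead of exposing a mutator.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical stand-in for a singleton such as OpenCLInfo.
class DeviceRegistry
{
public:
	// The function-local static is constructed once, on first use.
	static DeviceRegistry& Instance()
	{
		static DeviceRegistry instance;

		return instance;
	}

	DeviceRegistry(const DeviceRegistry&) = delete;
	DeviceRegistry& operator = (const DeviceRegistry&) = delete;

	// Illustration-only mutator; not part of the class in this commit.
	void Add(const std::string& name) { m_Names.push_back(name); }

	// Return the name at the index, else a reference to a static empty string.
	const std::string& Name(std::size_t i) const
	{
		static const std::string empty;

		return i < m_Names.size() ? m_Names[i] : empty;
	}

private:
	DeviceRegistry() = default;//Private so callers must go through Instance().

	std::vector<std::string> m_Names;
};
```

Returning a reference to a static empty string keeps the accessor's return type a const reference even when the index is invalid, which is the same approach PlatformName() and DeviceName() take.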
@@ -5,33 +5,23 @@ namespace EmberCLns
 {
 /// <summary>
 /// Constructor that sets everything to an uninitialized state.
-/// No OpenCL setup is done here, the caller must explicitly do it.
+/// No OpenCL setup is done here other than what's done in the
+/// global OpenCLInfo object. The caller must explicitly do it.
 /// </summary>
 OpenCLWrapper::OpenCLWrapper()
+	: m_Info(OpenCLInfo::Instance())
 {
 	m_Init = false;
 	m_Shared = false;
 	m_PlatformIndex = 0;
 	m_DeviceIndex = 0;
 	m_LocalMemSize = 0;
-	cl::Platform::get(&m_Platforms);
-	m_Devices.resize(m_Platforms.size());

-	for (size_t i = 0; i < m_Platforms.size(); i++)
-		m_Platforms[i].getDevices(CL_DEVICE_TYPE_ALL, &m_Devices[i]);
-}
-
-/// <summary>
-/// Determine if OpenCL is available on the system.
-/// </summary>
-/// <returns>True if any OpenCL platform and at least 1 device within that platform exists on the system, else false.</returns>
-bool OpenCLWrapper::CheckOpenCL()
-{
-	for (size_t i = 0; i < m_Platforms.size(); i++)
-		for (size_t j = 0; j < m_Devices[i].size(); j++)
-			return true;
-
-	return false;
+	//Pre-allocate some space to avoid temporary copying.
+	m_Programs.reserve(4);
+	m_Buffers.reserve(4);
+	m_Images.reserve(4);
+	m_GLImages.reserve(4);
 }

 /// <summary>
@@ -42,35 +32,40 @@ bool OpenCLWrapper::CheckOpenCL()
 /// <param name="device">The index device of the device to use</param>
 /// <param name="shared">True if shared with OpenGL, else false.</param>
 /// <returns>True if success, else false.</returns>
-bool OpenCLWrapper::Init(uint platform, uint device, bool shared)
+bool OpenCLWrapper::Init(size_t platformIndex, size_t deviceIndex, bool shared)
 {
 	cl_int err;
+	auto& platforms = m_Info.Platforms();
+	auto& devices = m_Info.Devices();

 	m_Init = false;
 	m_ErrorReport.clear();

-	if (m_Platforms.size() > 0)
+	if (m_Info.Ok())
 	{
-		if (platform < m_Platforms.size() && platform < m_Devices.size())
+		if (platformIndex < platforms.size() && platformIndex < devices.size())
 		{
-			m_PlatformIndex = platform;//Platform is ok, now do context.
+			cl::Context context;

-			if (CreateContext(shared))
+			if (m_Info.CreateContext(platforms[platformIndex], context, shared))//Platform index is within range, now do context.
 			{
-				//Context is ok, now do device.
-				if (device < m_Devices[m_PlatformIndex].size())
+				if (deviceIndex < devices[platformIndex].size())//Context is ok, now do device.
 				{
-					//At least one GPU device is present, so create a command queue.
-					m_Queue = cl::CommandQueue(m_Context, m_Devices[m_PlatformIndex][device], 0, &err);
+					auto q = cl::CommandQueue(context, devices[platformIndex][deviceIndex], 0, &err);//Device index is within range, so create a command queue.

-					if (CheckCL(err, "cl::CommandQueue()"))
+					if (m_Info.CheckCL(err, "cl::CommandQueue()"))//Everything was successful so assign temporaries to members.
 					{
-						m_DeviceIndex = device;
-						m_Platform = m_Platforms[m_PlatformIndex];
-						m_Device = m_Devices[m_PlatformIndex][device];
+						m_Platform = platforms[platformIndex];
+						m_Device = devices[platformIndex][deviceIndex];
+						m_Context = context;
+						m_Queue = q;
+						m_PlatformIndex = platformIndex;
+						m_DeviceIndex = deviceIndex;
 						m_DeviceVec.clear();
 						m_DeviceVec.push_back(m_Device);
-						m_LocalMemSize = uint(GetInfo<cl_ulong>(m_PlatformIndex, m_DeviceIndex, CL_DEVICE_LOCAL_MEM_SIZE));
+						m_LocalMemSize = size_t(m_Info.GetInfo<cl_ulong>(m_PlatformIndex, m_DeviceIndex, CL_DEVICE_LOCAL_MEM_SIZE));
+						m_GlobalMemSize = size_t(m_Info.GetInfo<cl_ulong>(m_PlatformIndex, m_DeviceIndex, CL_DEVICE_GLOBAL_MEM_SIZE));
+						m_MaxAllocSize = size_t(m_Info.GetInfo<cl_ulong>(m_PlatformIndex, m_DeviceIndex, CL_DEVICE_MAX_MEM_ALLOC_SIZE));
 						m_Shared = shared;
 						m_Init = true;//Command queue is ok, it's now ok to begin building and running programs.
 					}
@@ -96,11 +91,11 @@ bool OpenCLWrapper::AddProgram(const string& name, const string& program, const

 	if (CreateSPK(name, program, entryPoint, spk, doublePrecision))
 	{
-		for (auto& program : m_Programs)
+		for (auto& p : m_Programs)
 		{
-			if (name == program.m_Name)
+			if (name == p.m_Name)
 			{
-				program = spk;
+				p = spk;
 				return true;
 			}
 		}
@@ -144,7 +139,7 @@ bool OpenCLWrapper::AddBuffer(const string& name, size_t size, cl_mem_flags flag
 		{
 			cl::Buffer buff(m_Context, flags, size, nullptr, &err);

-			if (!CheckCL(err, "cl::Buffer()"))
+			if (!m_Info.CheckCL(err, "cl::Buffer()"))
 				return false;

 			NamedBuffer nb(buff, name);
@@ -157,7 +152,7 @@ bool OpenCLWrapper::AddBuffer(const string& name, size_t size, cl_mem_flags flag

 			cl::Buffer buff(m_Context, flags, size, nullptr, &err);//Create the new buffer.

-			if (!CheckCL(err, "cl::Buffer()"))
+			if (!m_Info.CheckCL(err, "cl::Buffer()"))
 				return false;

 			NamedBuffer nb(buff, name);//Make a named buffer out of the new buffer.
@@ -215,7 +210,7 @@ bool OpenCLWrapper::WriteBuffer(const string& name, void* data, size_t size)
 /// <param name="data">A pointer to the buffer</param>
 /// <param name="size">The size in bytes of the buffer</param>
 /// <returns>True if success, else false.</returns>
-bool OpenCLWrapper::WriteBuffer(uint bufferIndex, void* data, size_t size)
+bool OpenCLWrapper::WriteBuffer(size_t bufferIndex, void* data, size_t size)
 {
 	if (m_Init && (bufferIndex < m_Buffers.size()) && (GetBufferSize(bufferIndex) == size))
 	{
@@ -225,7 +220,7 @@ bool OpenCLWrapper::WriteBuffer(uint bufferIndex, void* data, size_t size)
 		e.wait();
 		m_Queue.finish();

-		if (CheckCL(err, "cl::CommandQueue::enqueueWriteBuffer()"))
+		if (m_Info.CheckCL(err, "cl::CommandQueue::enqueueWriteBuffer()"))
 			return true;
 	}

@@ -253,7 +248,7 @@ bool OpenCLWrapper::ReadBuffer(const string& name, void* data, size_t size)
 /// <param name="data">A pointer to a buffer to copy the data to</param>
 /// <param name="size">The size in bytes of the buffer</param>
 /// <returns>True if success, else false.</returns>
-bool OpenCLWrapper::ReadBuffer(uint bufferIndex, void* data, size_t size)
+bool OpenCLWrapper::ReadBuffer(size_t bufferIndex, void* data, size_t size)
 {
 	if (m_Init && (bufferIndex < m_Buffers.size()) && (GetBufferSize(bufferIndex) == size))
 	{
@@ -263,7 +258,7 @@ bool OpenCLWrapper::ReadBuffer(uint bufferIndex, void* data, size_t size)
 		e.wait();
 		m_Queue.finish();

-		if (CheckCL(err, "cl::CommandQueue::enqueueReadBuffer()"))
+		if (m_Info.CheckCL(err, "cl::CommandQueue::enqueueReadBuffer()"))
 			return true;
 	}

@@ -289,7 +284,7 @@ int OpenCLWrapper::FindBufferIndex(const string& name)
 /// </summary>
 /// <param name="name">The name of the buffer to search for</param>
 /// <returns>The size of the buffer if found, else 0.</returns>
-uint OpenCLWrapper::GetBufferSize(const string& name)
+size_t OpenCLWrapper::GetBufferSize(const string& name)
 {
 	int bufferIndex = FindBufferIndex(name);

@@ -301,10 +296,10 @@ uint OpenCLWrapper::GetBufferSize(const string& name)
 /// </summary>
 /// <param name="name">The index of the buffer to get the size of</param>
 /// <returns>The size of the buffer if found, else 0.</returns>
-uint OpenCLWrapper::GetBufferSize(uint bufferIndex)
+size_t OpenCLWrapper::GetBufferSize(size_t bufferIndex)
 {
 	if (m_Init && (bufferIndex < m_Buffers.size()))
-		return uint(m_Buffers[bufferIndex].m_Buffer.getInfo<CL_MEM_SIZE>(nullptr));
+		return m_Buffers[bufferIndex].m_Buffer.getInfo<CL_MEM_SIZE>(nullptr);

 	return 0;
 }
@@ -350,12 +345,12 @@ bool OpenCLWrapper::AddAndWriteImage(const string& name, cl_mem_flags flags, con
 				IMAGEGL2D imageGL(m_Context, flags, GL_TEXTURE_2D, 0, texName, &err);
 				NamedImage2DGL namedImageGL(imageGL, name);

-				if (CheckCL(err, "cl::ImageGL()"))
+				if (m_Info.CheckCL(err, "cl::ImageGL()"))
 				{
 					m_GLImages.push_back(namedImageGL);

 					if (data)
-						return WriteImage2D(uint(m_GLImages.size() - 1), true, width, height, row_pitch, data);//OpenGL images/textures require a separate write.
+						return WriteImage2D(m_GLImages.size() - 1, true, width, height, row_pitch, data);//OpenGL images/textures require a separate write.
 					else
 						return true;
 				}
@@ -364,7 +359,7 @@ bool OpenCLWrapper::AddAndWriteImage(const string& name, cl_mem_flags flags, con
 			{
 				NamedImage2D namedImage(cl::Image2D(m_Context, flags, format, width, height, row_pitch, data, &err), name);

-				if (CheckCL(err, "cl::Image2D()"))
+				if (m_Info.CheckCL(err, "cl::Image2D()"))
 				{
 					m_Images.push_back(namedImage);
 					return true;
@@ -381,7 +376,7 @@ bool OpenCLWrapper::AddAndWriteImage(const string& name, cl_mem_flags flags, con
 				{
 					NamedImage2DGL namedImageGL(IMAGEGL2D(m_Context, flags, GL_TEXTURE_2D, 0, texName, &err), name);//Sizes are different, so create new.

-					if (CheckCL(err, "cl::ImageGL()"))
+					if (m_Info.CheckCL(err, "cl::ImageGL()"))
 					{
 						m_GLImages[imageIndex] = namedImageGL;
 					}
@@ -403,7 +398,7 @@ bool OpenCLWrapper::AddAndWriteImage(const string& name, cl_mem_flags flags, con

 					NamedImage2D namedImage(cl::Image2D(m_Context, flags, format, width, height, row_pitch, data, &err), name);

-					if (CheckCL(err, "cl::Image2D()"))
+					if (m_Info.CheckCL(err, "cl::Image2D()"))
 					{
 						m_Images[imageIndex] = namedImage;
 						return true;
@@ -430,7 +425,7 @@ bool OpenCLWrapper::AddAndWriteImage(const string& name, cl_mem_flags flags, con
 /// <param name="row_pitch">The row pitch (usually zero)</param>
 /// <param name="data">The image data</param>
 /// <returns>True if success, else false.</returns>
-bool OpenCLWrapper::WriteImage2D(uint index, bool shared, ::size_t width, ::size_t height, ::size_t row_pitch, void* data)
+bool OpenCLWrapper::WriteImage2D(size_t index, bool shared, ::size_t width, ::size_t height, ::size_t row_pitch, void* data)
 {
 	if (m_Init)
 	{
@@ -457,7 +452,7 @@ bool OpenCLWrapper::WriteImage2D(uint index, bool shared, ::size_t width, ::size
 				m_Queue.finish();

 				bool b = EnqueueReleaseGLObjects(imageGL);
-				return CheckCL(err, "cl::enqueueWriteImage()") && b;
+				return m_Info.CheckCL(err, "cl::enqueueWriteImage()") && b;
 			}
 		}
 		else if (!shared && index < m_Images.size())
@@ -465,7 +460,7 @@ bool OpenCLWrapper::WriteImage2D(uint index, bool shared, ::size_t width, ::size
 			err = m_Queue.enqueueWriteImage(m_Images[index].m_Image, CL_TRUE, origin, region, row_pitch, 0, data, nullptr, &e);
 			e.wait();
 			m_Queue.finish();
-			return CheckCL(err, "cl::enqueueWriteImage()");
+			return m_Info.CheckCL(err, "cl::enqueueWriteImage()");
 		}
 	}

@@ -505,7 +500,7 @@ bool OpenCLWrapper::ReadImage(const string& name, ::size_t width, ::size_t heigh
 /// <param name="shared">True if shared with an OpenGL texture, else false.</param>
 /// <param name="data">A pointer to a buffer to copy the data to</param>
 /// <returns>True if success, else false.</returns>
-bool OpenCLWrapper::ReadImage(uint imageIndex, ::size_t width, ::size_t height, ::size_t row_pitch, bool shared, void* data)
+bool OpenCLWrapper::ReadImage(size_t imageIndex, ::size_t width, ::size_t height, ::size_t row_pitch, bool shared, void* data)
 {
 	if (m_Init)
 	{
@@ -529,13 +524,13 @@ bool OpenCLWrapper::ReadImage(uint imageIndex, ::size_t width, ::size_t height,
 			{
 				err = m_Queue.enqueueReadImage(m_GLImages[imageIndex].m_Image, true, origin, region, row_pitch, 0, data);
 				bool b = EnqueueReleaseGLObjects(m_GLImages[imageIndex].m_Image);
-				return CheckCL(err, "cl::enqueueReadImage()") && b;
+				return m_Info.CheckCL(err, "cl::enqueueReadImage()") && b;
 			}
 		}
 		else if (!shared && imageIndex < m_Images.size())
 		{
 			err = m_Queue.enqueueReadImage(m_Images[imageIndex].m_Image, true, origin, region, row_pitch, 0, data);
-			return CheckCL(err, "cl::enqueueReadImage()");
+			return m_Info.CheckCL(err, "cl::enqueueReadImage()");
 		}
 	}

@@ -572,7 +567,7 @@ int OpenCLWrapper::FindImageIndex(const string& name, bool shared)
 /// <param name="name">The name of the image to search for</param>
 /// <param name="shared">True if shared with an OpenGL texture, else false.</param>
 /// <returns>The size of the 2D image if found, else 0.</returns>
-uint OpenCLWrapper::GetImageSize(const string& name, bool shared)
+size_t OpenCLWrapper::GetImageSize(const string& name, bool shared)
 {
 	int imageIndex = FindImageIndex(name, shared);
 	return GetImageSize(imageIndex, shared);
@@ -584,7 +579,7 @@ uint OpenCLWrapper::GetImageSize(const string& name, bool shared)
 /// <param name="imageIndex">Index of the image to search for</param>
 /// <param name="shared">True if shared with an OpenGL texture, else false.</param>
 /// <returns>The size of the 2D image if found, else 0.</returns>
-uint OpenCLWrapper::GetImageSize(uint imageIndex, bool shared)
+size_t OpenCLWrapper::GetImageSize(size_t imageIndex, bool shared)
 {
 	size_t size = 0;

@@ -593,6 +588,7 @@ uint OpenCLWrapper::GetImageSize(uint imageIndex, bool shared)
 		if (shared && imageIndex < m_GLImages.size())
 		{
 			vector<cl::Memory> images;
+
 			images.push_back(m_GLImages[imageIndex].m_Image);
 			IMAGEGL2D image = m_GLImages[imageIndex].m_Image;

@@ -608,7 +604,7 @@ uint OpenCLWrapper::GetImageSize(uint imageIndex, bool shared)
 		}
 	}

-	return uint(size);
+	return size;
 }

 /// <summary>
@@ -671,7 +667,7 @@ bool OpenCLWrapper::CreateImage2D(cl::Image2D& image2D, cl_mem_flags flags, cl::
 					data,
 					&err);

-		return CheckCL(err, "cl::Image2D()");
+		return m_Info.CheckCL(err, "cl::Image2D()");
 	}

 	return false;
@@ -699,7 +695,7 @@ bool OpenCLWrapper::CreateImage2DGL(IMAGEGL2D& image2DGL, cl_mem_flags flags, GL
 					texobj,
 					&err);

-		return CheckCL(err, "cl::ImageGL()");
+		return m_Info.CheckCL(err, "cl::ImageGL()");
 	}

 	return false;
@@ -734,7 +730,7 @@ bool OpenCLWrapper::EnqueueAcquireGLObjects(IMAGEGL2D& image)
 		images.push_back(image);
 		cl_int err = m_Queue.enqueueAcquireGLObjects(&images);
 		m_Queue.finish();
-		return CheckCL(err, "cl::CommandQueue::enqueueAcquireGLObjects()");
+		return m_Info.CheckCL(err, "cl::CommandQueue::enqueueAcquireGLObjects()");
 	}

 	return false;
@@ -769,7 +765,7 @@ bool OpenCLWrapper::EnqueueReleaseGLObjects(IMAGEGL2D& image)
 		images.push_back(image);
 		cl_int err = m_Queue.enqueueReleaseGLObjects(&images);
 		m_Queue.finish();
-		return CheckCL(err, "cl::CommandQueue::enqueueReleaseGLObjects()");
+		return m_Info.CheckCL(err, "cl::CommandQueue::enqueueReleaseGLObjects()");
 	}

 	return false;
@@ -787,7 +783,7 @@ bool OpenCLWrapper::EnqueueAcquireGLObjects(const VECTOR_CLASS<cl::Memory>* memO
 		cl_int err = m_Queue.enqueueAcquireGLObjects(memObjects);

 		m_Queue.finish();
-		return CheckCL(err, "cl::CommandQueue::enqueueAcquireGLObjects()");
+		return m_Info.CheckCL(err, "cl::CommandQueue::enqueueAcquireGLObjects()");
 	}

 	return false;
@@ -805,7 +801,7 @@ bool OpenCLWrapper::EnqueueReleaseGLObjects(const VECTOR_CLASS<cl::Memory>* memO
 		cl_int err = m_Queue.enqueueReleaseGLObjects(memObjects);

 		m_Queue.finish();
-		return CheckCL(err, "cl::CommandQueue::enqueueReleaseGLObjects()");
+		return m_Info.CheckCL(err, "cl::CommandQueue::enqueueReleaseGLObjects()");
 	}

 	return false;
@@ -829,7 +825,7 @@ bool OpenCLWrapper::CreateSampler(cl::Sampler& sampler, cl_bool normalizedCoords
 				filterMode,
 				&err);

-	return CheckCL(err, "cl::Sampler()");
+	return m_Info.CheckCL(err, "cl::Sampler()");
 }

 /// <summary>
@@ -840,7 +836,7 @@ bool OpenCLWrapper::CreateSampler(cl::Sampler& sampler, cl_bool normalizedCoords
 /// <param name="argIndex">Index of the argument</param>
 /// <param name="name">The name of the buffer</param>
 /// <returns>True if success, else false.</returns>
-bool OpenCLWrapper::SetBufferArg(uint kernelIndex, uint argIndex, const string& name)
+bool OpenCLWrapper::SetBufferArg(size_t kernelIndex, cl_uint argIndex, const string& name)
 {
 	int bufferIndex = OpenCLWrapper::FindBufferIndex(name);

@@ -855,7 +851,7 @@ bool OpenCLWrapper::SetBufferArg(uint kernelIndex, uint argIndex, const string&
 /// <param name="argIndex">Index of the argument</param>
 /// <param name="bufferIndex">Index of the buffer</param>
 /// <returns>True if success, else false.</returns>
-bool OpenCLWrapper::SetBufferArg(uint kernelIndex, uint argIndex, uint bufferIndex)
+bool OpenCLWrapper::SetBufferArg(size_t kernelIndex, cl_uint argIndex, size_t bufferIndex)
 {
 	if (m_Init && bufferIndex < m_Buffers.size())
 		return SetArg<cl::Buffer>(kernelIndex, argIndex, m_Buffers[bufferIndex].m_Buffer);
@@ -872,7 +868,7 @@ bool OpenCLWrapper::SetBufferArg(uint kernelIndex, uint argIndex, uint bufferInd
 /// <param name="shared">True if shared with an OpenGL texture, else false</param>
 /// <param name="name">The name of the 2D image</param>
 /// <returns>True if success, else false.</returns>
-bool OpenCLWrapper::SetImageArg(uint kernelIndex, uint argIndex, bool shared, const string& name)
+bool OpenCLWrapper::SetImageArg(size_t kernelIndex, cl_uint argIndex, bool shared, const string& name)
 {
 	if (m_Init)
 	{
@@ -892,7 +888,7 @@ bool OpenCLWrapper::SetImageArg(uint kernelIndex, uint argIndex, bool shared, co
 /// <param name="shared">True if shared with an OpenGL texture, else false</param>
 /// <param name="imageIndex">Index of the 2D image</param>
 /// <returns>True if success, else false.</returns>
-bool OpenCLWrapper::SetImageArg(uint kernelIndex, uint argIndex, bool shared, uint imageIndex)
+bool OpenCLWrapper::SetImageArg(size_t kernelIndex, cl_uint argIndex, bool shared, size_t imageIndex)
 {
 	cl_int err;

@@ -901,12 +897,12 @@ bool OpenCLWrapper::SetImageArg(uint kernelIndex, uint argIndex, bool shared, ui
 		if (shared && imageIndex < m_GLImages.size())
 		{
 			err = m_Programs[kernelIndex].m_Kernel.setArg(argIndex, m_GLImages[imageIndex].m_Image);
-			return CheckCL(err, "cl::Kernel::setArg()");
+			return m_Info.CheckCL(err, "cl::Kernel::setArg()");
 		}
 		else if (!shared && imageIndex < m_Images.size())
 		{
 			err = m_Programs[kernelIndex].m_Kernel.setArg(argIndex, m_Images[imageIndex].m_Image);
-			return CheckCL(err, "cl::Kernel::setArg()");
+			return m_Info.CheckCL(err, "cl::Kernel::setArg()");
 		}
 	}

@@ -938,8 +934,8 @@ int OpenCLWrapper::FindKernelIndex(const string& name)
 /// <param name="blockHeight">Height of each block</param>
 /// <param name="blockDepth">Depth of each block</param>
 /// <returns>True if success, else false.</returns>
-bool OpenCLWrapper::RunKernel(uint kernelIndex, uint totalGridWidth, uint totalGridHeight, uint totalGridDepth,
-	uint blockWidth, uint blockHeight, uint blockDepth)
+bool OpenCLWrapper::RunKernel(size_t kernelIndex, size_t totalGridWidth, size_t totalGridHeight, size_t totalGridDepth,
+	size_t blockWidth, size_t blockHeight, size_t blockDepth)
 {
 	if (m_Init && kernelIndex < m_Programs.size())
 	{
@@ -953,183 +949,24 @@ bool OpenCLWrapper::RunKernel(uint kernelIndex, uint totalGridWidth, uint totalG

 		e.wait();
 		m_Queue.finish();
-		return CheckCL(err, "cl::CommandQueue::enqueueNDRangeKernel()");
+		return m_Info.CheckCL(err, "cl::CommandQueue::enqueueNDRangeKernel()");
 	}

 	return false;
 }

-/// <summary>
-/// Get device information for the specified field.
-/// Template argument expected to be cl_ulong, cl_uint or cl_int;
-/// </summary>
-/// <param name="name">The device field/feature to query</param>
-/// <returns>The value of the field</returns>
-template<typename T>
-T OpenCLWrapper::GetInfo(size_t platform, size_t device, cl_device_info name) const
-{
-	T val;
-
-	if (platform < m_Devices.size() && device < m_Devices[platform].size())
-		m_Devices[platform][device].getInfo(name, &val);
-
-	return val;
-}
-
-/// <summary>
-/// Get the platform name at the specified index.
-/// </summary>
-/// <param name="i">The platform index to get the name of</param>
-/// <returns>The platform name if found, else empty string</returns>
-string OpenCLWrapper::PlatformName(size_t platform)
-{
-	if (platform < m_Platforms.size())
-		return m_Platforms[platform].getInfo<CL_PLATFORM_VENDOR>(nullptr) + " " + m_Platforms[platform].getInfo<CL_PLATFORM_NAME>(nullptr) + " " + m_Platforms[platform].getInfo<CL_PLATFORM_VERSION>(nullptr);
-	else
-		return "";
-}
-
-/// <summary>
-/// Get all available platform names on the system as a vector of strings.
-/// </summary>
-/// <returns>All available platform names on the system as a vector of strings</returns>
-vector<string> OpenCLWrapper::PlatformNames()
-{
-	vector<string> platforms;
-
-	platforms.reserve(m_Platforms.size());
-
-	for (size_t i = 0; i < m_Platforms.size(); i++)
-		platforms.push_back(PlatformName(i));
-
-	return platforms;
-}
-
-/// <summary>
-/// Get the device name at the specified index on the platform
-/// at the specified index.
-/// </summary>
-/// <param name="platform">The platform index of the device</param>
-/// <param name="device">The device index</param>
-/// <returns>The name of the device if found, else empty string</returns>
-string OpenCLWrapper::DeviceName(size_t platform, size_t device)
-{
-	string s;
-
-	if (platform < m_Platforms.size() && platform < m_Devices.size())
-		if (device < m_Devices[platform].size())
-			s = m_Devices[platform][device].getInfo<CL_DEVICE_VENDOR>(nullptr) + " " + m_Devices[platform][device].getInfo<CL_DEVICE_NAME>(nullptr);// + " " + m_Devices[platform][device].getInfo<CL_DEVICE_VERSION>();
-
-	return s;
-}
-
-/// <summary>
-/// Get all available device names on the platform at the specified index as a vector of strings.
-/// </summary>
-/// <param name="platform">The platform index of the devices to query</param>
-/// <returns>All available device names on the platform at the specified index as a vector of strings</returns>
-vector<string> OpenCLWrapper::DeviceNames(size_t platform)
-{
-	uint i = 0;
-	string s;
-	vector<string> devices;
-
-	do
-	{
-		s = DeviceName(platform, i);
-
-		if (s != "")
-			devices.push_back(s);
-
-		i++;
-	} while (s != "");
-
-	return devices;
-}
-
-/// <summary>
-/// Get all availabe device and platform names as one contiguous string.
-/// </summary>
-/// <returns>A string with all available device and platform names</returns>
-string OpenCLWrapper::DeviceAndPlatformNames()
-{
-	ostringstream os;
-	vector<string> deviceNames;
-
-	for (size_t platform = 0; platform < m_Platforms.size(); platform++)
-	{
-		os << PlatformName(platform) << endl;
-
-		deviceNames = DeviceNames(platform);
-
-		for (size_t device = 0; device < m_Devices[platform].size(); device++)
-			os << "\t" << deviceNames[device] << endl;
-	}
-
-	return os.str();
-}
-
-/// <summary>
-/// Get all information about the currently used device.
-/// </summary>
-/// <returns>A string with all information about the currently used device</returns>
-string OpenCLWrapper::DumpInfo()
-{
-	ostringstream os;
-	vector<size_t> sizes;
-
-	os.imbue(std::locale(""));
-
-	for (size_t platform = 0; platform < m_Platforms.size(); platform++)
-	{
-		os << "Platform " << platform << ": " << PlatformName(platform) << endl;
-
-		for (size_t device = 0; device < m_Devices[platform].size(); device++)
-		{
-			os << "Device " << device << ": " << DeviceName(platform, device) << endl;
-			os << "CL_DEVICE_OPENCL_C_VERSION: "		  << GetInfo<string>  (platform, device, CL_DEVICE_OPENCL_C_VERSION)		  << endl;
-			os << "CL_DEVICE_LOCAL_MEM_SIZE: "			  << GetInfo<cl_ulong>(platform, device, CL_DEVICE_LOCAL_MEM_SIZE)			  << endl;
-			os << "CL_DEVICE_LOCAL_MEM_TYPE: "			  << GetInfo<cl_uint> (platform, device, CL_DEVICE_LOCAL_MEM_TYPE)			  << endl;
-			os << "CL_DEVICE_MAX_COMPUTE_UNITS: "		  << GetInfo<cl_uint> (platform, device, CL_DEVICE_MAX_COMPUTE_UNITS)		  << endl;
-			os << "CL_DEVICE_MAX_READ_IMAGE_ARGS: "		  << GetInfo<cl_uint> (platform, device, CL_DEVICE_MAX_READ_IMAGE_ARGS)		  << endl;
-			os << "CL_DEVICE_MAX_WRITE_IMAGE_ARGS: "	  << GetInfo<cl_uint> (platform, device, CL_DEVICE_MAX_WRITE_IMAGE_ARGS)	  << endl;
-			os << "CL_DEVICE_MAX_MEM_ALLOC_SIZE: "		  << GetInfo<cl_ulong>(platform, device, CL_DEVICE_MAX_MEM_ALLOC_SIZE)		  << endl;
-			os << "CL_DEVICE_ADDRESS_BITS: "			  << GetInfo<cl_uint> (platform, device, CL_DEVICE_ADDRESS_BITS)			  << endl;
-
-			os << "CL_DEVICE_GLOBAL_MEM_CACHE_TYPE: "	  << GetInfo<cl_uint> (platform, device, CL_DEVICE_GLOBAL_MEM_CACHE_TYPE)	  << endl;
-			os << "CL_DEVICE_GLOBAL_MEM_CACHELINE_SIZE: " << GetInfo<cl_uint> (platform, device, CL_DEVICE_GLOBAL_MEM_CACHELINE_SIZE) << endl;
-			os << "CL_DEVICE_GLOBAL_MEM_CACHE_SIZE: "	  << GetInfo<cl_ulong>(platform, device, CL_DEVICE_GLOBAL_MEM_CACHE_SIZE)	  << endl;
-			os << "CL_DEVICE_GLOBAL_MEM_SIZE: "			  << GetInfo<cl_ulong>(platform, device, CL_DEVICE_GLOBAL_MEM_SIZE)			  << endl;
-			os << "CL_DEVICE_MAX_CONSTANT_BUFFER_SIZE: "  << GetInfo<cl_ulong>(platform, device, CL_DEVICE_MAX_CONSTANT_BUFFER_SIZE)  << endl;
-
-			os << "CL_DEVICE_MAX_CONSTANT_ARGS: "		  << GetInfo<cl_uint> (platform, device, CL_DEVICE_MAX_CONSTANT_ARGS)		  << endl;
-			os << "CL_DEVICE_MAX_WORK_ITEM_DIMENSIONS: "  << GetInfo<cl_uint> (platform, device, CL_DEVICE_MAX_WORK_ITEM_DIMENSIONS)  << endl;
-			os << "CL_DEVICE_MAX_WORK_GROUP_SIZE: "		  << GetInfo<::size_t>(platform, device, CL_DEVICE_MAX_WORK_GROUP_SIZE)		  << endl;
-
-			sizes = GetInfo<vector< ::size_t>>(platform, device, CL_DEVICE_MAX_WORK_ITEM_SIZES);
-			os << "CL_DEVICE_MAX_WORK_ITEM_SIZES: "		  << sizes[0] << ", " << sizes[1] << ", " << sizes[2] << endl << endl;
-
-			if (device != m_Devices[platform].size() - 1 && platform != m_Platforms.size() - 1)
-				os << endl;
-		}
-
-		os << endl;
-	}
-
-	return os.str();
-}
-
 /// <summary>
 /// OpenCL properties, getters only.
 /// </summary>
 bool OpenCLWrapper::Ok() const { return m_Init; }
 bool OpenCLWrapper::Shared() const { return m_Shared; }
-cl::Context OpenCLWrapper::Context() const { return m_Context; }
-uint OpenCLWrapper::PlatformIndex() const { return m_PlatformIndex; }
-uint OpenCLWrapper::DeviceIndex() const { return m_DeviceIndex; }
-size_t OpenCLWrapper::GlobalMemSize() const { return GetInfo<cl_ulong>(PlatformIndex(), DeviceIndex(), CL_DEVICE_GLOBAL_MEM_SIZE); }
-uint OpenCLWrapper::LocalMemSize() const { return m_LocalMemSize; }
-size_t OpenCLWrapper::MaxAllocSize() const { return GetInfo<cl_ulong>(PlatformIndex(), DeviceIndex(), CL_DEVICE_MAX_MEM_ALLOC_SIZE); }
+const cl::Context& OpenCLWrapper::Context() const { return m_Context; }
+size_t OpenCLWrapper::PlatformIndex() const { return m_PlatformIndex; }
+size_t OpenCLWrapper::DeviceIndex() const { return m_DeviceIndex; }
+const string& OpenCLWrapper::DeviceName() const { return m_Info.DeviceName(m_PlatformIndex, m_DeviceIndex); }
+size_t OpenCLWrapper::LocalMemSize() const { return m_LocalMemSize; }
+size_t OpenCLWrapper::GlobalMemSize() const { return m_GlobalMemSize; }
+size_t OpenCLWrapper::MaxAllocSize() const { return m_MaxAllocSize; }

 /// <summary>
 /// Makes the even grid dims.
@@ -1138,7 +975,7 @@ size_t OpenCLWrapper::MaxAllocSize() const { return GetInfo<cl_ulong>(PlatformIn
 /// <param name="blockH">The block h.</param>
 /// <param name="gridW">The grid w.</param>
 /// <param name="gridH">The grid h.</param>
-void OpenCLWrapper::MakeEvenGridDims(uint blockW, uint blockH, uint& gridW, uint& gridH)
+void OpenCLWrapper::MakeEvenGridDims(size_t blockW, size_t blockH, size_t& gridW, size_t& gridH)
 {
 	if (gridW % blockW != 0)
 		gridW += (blockW - (gridW % blockW));
@@ -1147,67 +984,6 @@ void OpenCLWrapper::MakeEvenGridDims(uint blockW, uint blockH, uint& gridW, uint
 		gridH += (blockH - (gridH % blockH));
 }

-/// <summary>
-/// Create a context that is optionall shared with OpenGL.
-/// </summary>
-/// <param name="shared">True if shared with OpenGL, else not shared.</param>
-/// <returns>True if success, else false.</returns>
-bool OpenCLWrapper::CreateContext(bool shared)
-{
-	cl_int err;
-
-	if (shared)
-	{
-		//Define OS-specific context properties and create the OpenCL context.
-		#if defined (__APPLE__) || defined(MACOSX)
-			CGLContextObj kCGLContext = CGLGetCurrentContext();
-			CGLShareGroupObj kCGLShareGroup = CGLGetShareGroup(kCGLContext);
-			cl_context_properties props[] =
-			{
-				CL_CONTEXT_PROPERTY_USE_CGL_SHAREGROUP_APPLE, (cl_context_properties)kCGLShareGroup,
-				0
-			};
-
-			m_Context = cl::Context(CL_DEVICE_TYPE_GPU, props, nullptr, nullptr, &err);//May need to tinker with this on Mac.
-		#else
-			#if defined WIN32
-				cl_context_properties props[] =
-				{
-					CL_GL_CONTEXT_KHR, (cl_context_properties)wglGetCurrentContext(),
-					CL_WGL_HDC_KHR, (cl_context_properties)wglGetCurrentDC(),
-					CL_CONTEXT_PLATFORM, reinterpret_cast<cl_context_properties>((m_Platforms[m_PlatformIndex])()),
-					0
-				};
-
-				m_Context = cl::Context(CL_DEVICE_TYPE_GPU, props, nullptr, nullptr, &err);
-			#else
-				cl_context_properties props[] =
-				{
-					CL_GL_CONTEXT_KHR, cl_context_properties(glXGetCurrentContext()),
-					CL_GLX_DISPLAY_KHR, cl_context_properties(glXGetCurrentDisplay()),
-					CL_CONTEXT_PLATFORM, reinterpret_cast<cl_context_properties>((m_Platforms[m_PlatformIndex])()),
-					0
-				};
-
-				m_Context = cl::Context(CL_DEVICE_TYPE_GPU, props, nullptr, nullptr, &err);
-			#endif
-		#endif
-	}
-	else
-	{
-		cl_context_properties props[3] =
-		{
-			CL_CONTEXT_PLATFORM,
-			reinterpret_cast<cl_context_properties>((m_Platforms[m_PlatformIndex])()),
-			0
-		};
-
-		m_Context = cl::Context(CL_DEVICE_TYPE_ALL, props, nullptr, nullptr, &err);
-	}
-
-	return CheckCL(err, "cl::Context()");
-}
-
 /// <summary>
 /// Create an Spk object created by compiling the program arguments passed in.
 /// </summary>
@@ -1235,107 +1011,21 @@ bool OpenCLWrapper::CreateSPK(const string& name, const string& program, const s
 			//err = spk.m_Program.build(m_DeviceVec, "-cl-mad-enable -cl-no-signed-zeros -cl-fast-relaxed-math -cl-single-precision-constant");//This can cause some rounding.
 			//err = spk.m_Program.build(m_DeviceVec, "-cl-mad-enable -cl-single-precision-constant");

-		if (CheckCL(err, "cl::Program::build()"))
+		if (m_Info.CheckCL(err, "cl::Program::build()"))
 		{
 			//Building of program is ok, now create kernel with the specified entry point.
 			spk.m_Kernel = cl::Kernel(spk.m_Program, entryPoint.c_str(), &err);

-			if (CheckCL(err, "cl::Kernel()"))
+			if (m_Info.CheckCL(err, "cl::Kernel()"))
 				return true;//Everything is ok.
 		}
 		else
 		{
 			for (auto& i : m_DeviceVec)
-				m_ErrorReport.push_back(spk.m_Program.getBuildInfo<CL_PROGRAM_BUILD_LOG>(i));
+				m_ErrorReport.push_back(spk.m_Program.getBuildInfo<CL_PROGRAM_BUILD_LOG>(i, nullptr));
 		}
 	}

 	return false;
 }
-
-/// <summary>
-/// Check an OpenCL return value for errors.
-/// </summary>
-/// <param name="err">The error code to inspect</param>
-/// <param name="name">A description of where the value was gotten from</param>
-/// <returns>True if success, else false.</returns>
-bool OpenCLWrapper::CheckCL(cl_int err, const char* name)
-{
-	if (err != CL_SUCCESS)
-	{
-		ostringstream ss;
-		ss << "ERROR: " << ErrorToStringCL(err) << " in " << name << "." << std::endl;
-		m_ErrorReport.push_back(ss.str());
-	}
-
-	return err == CL_SUCCESS;
-}
-
-/// <summary>
-/// Translate an OpenCL error code into a human readable string.
-/// </summary>
-/// <param name="err">The error code to translate</param>
-/// <returns>A human readable description of the error passed in</returns>
-std::string OpenCLWrapper::ErrorToStringCL(cl_int err)
-{
-	switch (err)
-	{
-		case CL_SUCCESS:								   return "Success";
-		case CL_DEVICE_NOT_FOUND:						   return "Device not found";
-		case CL_DEVICE_NOT_AVAILABLE:					   return "Device not available";
-		case CL_COMPILER_NOT_AVAILABLE:					   return "Compiler not available";
-		case CL_MEM_OBJECT_ALLOCATION_FAILURE:			   return "Memory object allocation failure";
-		case CL_OUT_OF_RESOURCES:						   return "Out of resources";
-		case CL_OUT_OF_HOST_MEMORY:						   return "Out of host memory";
-		case CL_PROFILING_INFO_NOT_AVAILABLE:			   return "Profiling information not available";
-		case CL_MEM_COPY_OVERLAP:						   return "Memory copy overlap";
-		case CL_IMAGE_FORMAT_MISMATCH:					   return "Image format mismatch";
-		case CL_IMAGE_FORMAT_NOT_SUPPORTED:				   return "Image format not supported";
-		case CL_BUILD_PROGRAM_FAILURE:					   return "Program build failure";
-		case CL_MAP_FAILURE:							   return "Map failure";
-		case CL_MISALIGNED_SUB_BUFFER_OFFSET:			   return "Misaligned sub buffer offset";
-		case CL_EXEC_STATUS_ERROR_FOR_EVENTS_IN_WAIT_LIST: return "Exec status error for events in wait list";
-		case CL_INVALID_VALUE:							   return "Invalid value";
-		case CL_INVALID_DEVICE_TYPE:					   return "Invalid device type";
-		case CL_INVALID_PLATFORM:						   return "Invalid platform";
-		case CL_INVALID_DEVICE:							   return "Invalid device";
-		case CL_INVALID_CONTEXT:						   return "Invalid context";
-		case CL_INVALID_QUEUE_PROPERTIES:				   return "Invalid queue properties";
-		case CL_INVALID_COMMAND_QUEUE:					   return "Invalid command queue";
-		case CL_INVALID_HOST_PTR:						   return "Invalid host pointer";
-		case CL_INVALID_MEM_OBJECT:						   return "Invalid memory object";
-		case CL_INVALID_IMAGE_FORMAT_DESCRIPTOR:		   return "Invalid image format descriptor";
-		case CL_INVALID_IMAGE_SIZE:						   return "Invalid image size";
-		case CL_INVALID_SAMPLER:						   return "Invalid sampler";
-		case CL_INVALID_BINARY:							   return "Invalid binary";
-		case CL_INVALID_BUILD_OPTIONS:					   return "Invalid build options";
-		case CL_INVALID_PROGRAM:						   return "Invalid program";
-		case CL_INVALID_PROGRAM_EXECUTABLE:				   return "Invalid program executable";
-		case CL_INVALID_KERNEL_NAME:					   return "Invalid kernel name";
-		case CL_INVALID_KERNEL_DEFINITION:				   return "Invalid kernel definition";
-		case CL_INVALID_KERNEL:							   return "Invalid kernel";
-		case CL_INVALID_ARG_INDEX:						   return "Invalid argument index";
-		case CL_INVALID_ARG_VALUE:						   return "Invalid argument value";
-		case CL_INVALID_ARG_SIZE:						   return "Invalid argument size";
-		case CL_INVALID_KERNEL_ARGS:					   return "Invalid kernel arguments";
-		case CL_INVALID_WORK_DIMENSION:					   return "Invalid work dimension";
-		case CL_INVALID_WORK_GROUP_SIZE:				   return "Invalid work group size";
-		case CL_INVALID_WORK_ITEM_SIZE:					   return "Invalid work item size";
-		case CL_INVALID_GLOBAL_OFFSET:					   return "Invalid global offset";
-		case CL_INVALID_EVENT_WAIT_LIST:				   return "Invalid event wait list";
-		case CL_INVALID_EVENT:							   return "Invalid event";
-		case CL_INVALID_OPERATION:						   return "Invalid operation";
-		case CL_INVALID_GL_OBJECT:						   return "Invalid OpenGL object";
-		case CL_INVALID_BUFFER_SIZE:					   return "Invalid buffer size";
-		case CL_INVALID_MIP_LEVEL:						   return "Invalid mip-map level";
-		case CL_INVALID_GLOBAL_WORK_SIZE:				   return "Invalid global work size";
-		case CL_INVALID_PROPERTY:						   return "Invalid property";
-		default:
-		{
-			ostringstream ss;
-			ss << "<Unknown error code> " << err;
-			return ss.str();
-		}
-	}
-}
 }
@@ -1,6 +1,7 @@
 #pragma once

 #include "EmberCLPch.h"
+#include "OpenCLInfo.h"

 /// <summary>
 /// OpenCLWrapper, Spk, NamedBuffer, NamedImage2D, NamedImage2DGL classes.
@@ -91,7 +92,7 @@ public:
 /// <summary>
 /// Running kernels in OpenCL can require quite a bit of setup, tear down and
 /// general housekeeping. This class helps shield the user from such hassles.
-/// It's main utility is in holding collections of programs, buffers and images
+/// Its main utility is in holding collections of programs, buffers and images
 /// all identified by names. That way, a user can access them as needed without
 /// having to pollute their code.
 /// In addition, writing to an existing object by name determines if the object
@@ -103,8 +104,7 @@ class EMBERCL_API OpenCLWrapper : public EmberReport
 {
 public:
 	OpenCLWrapper();
-	bool CheckOpenCL();
-	bool Init(uint platform, uint device, bool shared = false);
+	bool Init(size_t platformIndex, size_t deviceIndex, bool shared = false);

 	//Programs.
 	bool AddProgram(const string& name, const string& program, const string& entryPoint, bool doublePrecision);
@@ -114,22 +114,22 @@ public:
 	bool AddBuffer(const string& name, size_t size, cl_mem_flags flags = CL_MEM_READ_WRITE);
 	bool AddAndWriteBuffer(const string& name, void* data, size_t size, cl_mem_flags flags = CL_MEM_READ_WRITE);
 	bool WriteBuffer(const string& name, void* data, size_t size);
-	bool WriteBuffer(uint bufferIndex, void* data, size_t size);
+	bool WriteBuffer(size_t bufferIndex, void* data, size_t size);
 	bool ReadBuffer(const string& name, void* data, size_t size);
-	bool ReadBuffer(uint bufferIndex, void* data, size_t size);
+	bool ReadBuffer(size_t bufferIndex, void* data, size_t size);
 	int FindBufferIndex(const string& name);
-	uint GetBufferSize(const string& name);
-	uint GetBufferSize(uint bufferIndex);
+	size_t GetBufferSize(const string& name);
+	size_t GetBufferSize(size_t bufferIndex);
 	void ClearBuffers();

 	//Images.
 	bool AddAndWriteImage(const string& name, cl_mem_flags flags, const cl::ImageFormat& format, ::size_t width, ::size_t height, ::size_t row_pitch, void* data = NULL, bool shared = false, GLuint texName = 0);
-	bool WriteImage2D(uint index, bool shared, ::size_t width, ::size_t height, ::size_t row_pitch, void* data);
+	bool WriteImage2D(size_t index, bool shared, size_t width, size_t height, size_t row_pitch, void* data);
 	bool ReadImage(const string& name, ::size_t width, ::size_t height, ::size_t row_pitch, bool shared, void* data);
-	bool ReadImage(uint imageIndex, ::size_t width, ::size_t height, ::size_t row_pitch, bool shared, void* data);
+	bool ReadImage(size_t imageIndex, ::size_t width, ::size_t height, ::size_t row_pitch, bool shared, void* data);
 	int FindImageIndex(const string& name, bool shared);
-	uint GetImageSize(const string& name, bool shared);
-	uint GetImageSize(uint imageIndex, bool shared);
+	size_t GetImageSize(const string& name, bool shared);
+	size_t GetImageSize(size_t imageIndex, bool shared);
 	bool CompareImageParams(cl::Image& image, cl_mem_flags flags, const cl::ImageFormat& format, ::size_t width, ::size_t height, ::size_t row_pitch);
 	void ClearImages(bool shared);
 	bool CreateImage2D(cl::Image2D& image2D, cl_mem_flags flags, cl::ImageFormat format, ::size_t width, ::size_t height, ::size_t row_pitch = 0, void* data = NULL);
@@ -143,10 +143,10 @@ public:
 	bool CreateSampler(cl::Sampler& sampler, cl_bool normalizedCoords, cl_addressing_mode addressingMode, cl_filter_mode filterMode);

 	//Arguments.
-	bool SetBufferArg(uint kernelIndex, uint argIndex, const string& name);
-	bool SetBufferArg(uint kernelIndex, uint argIndex, uint bufferIndex);
-	bool SetImageArg(uint kernelIndex, uint argIndex, bool shared, const string& name);
-	bool SetImageArg(uint kernelIndex, uint argIndex, bool shared, uint imageIndex);
+	bool SetBufferArg(size_t kernelIndex, cl_uint argIndex, const string& name);
+	bool SetBufferArg(size_t kernelIndex, cl_uint argIndex, size_t bufferIndex);
+	bool SetImageArg(size_t kernelIndex, cl_uint argIndex, bool shared, const string& name);
+	bool SetImageArg(size_t kernelIndex, cl_uint argIndex, bool shared, size_t imageIndex);

 	/// <summary>
 	/// Set an argument in the specified kernel, at the specified argument index.
@@ -157,13 +157,13 @@ public:
 	/// <param name="arg">The argument value to set</param>
 	/// <returns>True if success, else false</returns>
 	template <typename T>
-	bool SetArg(uint kernelIndex, uint argIndex, T arg)
+	bool SetArg(size_t kernelIndex, cl_uint argIndex, T arg)
 	{
 		if (m_Init && kernelIndex < m_Programs.size())
 		{
 			cl_int err = m_Programs[kernelIndex].m_Kernel.setArg(argIndex, arg);

-			return CheckCL(err, "cl::Kernel::setArg()");
+			return m_Info.CheckCL(err, "cl::Kernel::setArg()");
 		}

 		return false;
@@ -171,47 +171,37 @@ public:

 	//Kernels.
 	int FindKernelIndex(const string& name);
-	bool RunKernel(uint kernelIndex, uint totalGridWidth, uint totalGridHeight, uint totalGridDepth, uint blockWidth, uint blockHeight, uint blockDepth);
-
-	//Info.
-	template<typename T>
-	T GetInfo(size_t platform, size_t device, cl_device_info name) const;
-	string PlatformName(size_t platform);
-	vector<string> PlatformNames();
-	string DeviceName(size_t platform, size_t device);
-	vector<string> DeviceNames(size_t platform);
-	string DeviceAndPlatformNames();
-	string DumpInfo();
+	bool RunKernel(size_t kernelIndex, size_t totalGridWidth, size_t totalGridHeight, size_t totalGridDepth, size_t blockWidth, size_t blockHeight, size_t blockDepth);

 	//Accessors.
 	bool Ok() const;
 	bool Shared() const;
-	cl::Context Context() const;
-	uint PlatformIndex() const;
-	uint DeviceIndex() const;
-	uint LocalMemSize() const;
+	const cl::Context& Context() const;
+	size_t PlatformIndex() const;
+	size_t DeviceIndex() const;
+	const string& DeviceName() const;
+	size_t TotalDeviceIndex() const;
+	size_t LocalMemSize() const;
 	size_t GlobalMemSize() const;
 	size_t MaxAllocSize() const;

-	static void MakeEvenGridDims(uint blockW, uint blockH, uint& gridW, uint& gridH);
+	static void MakeEvenGridDims(size_t blockW, size_t blockH, size_t& gridW, size_t& gridH);

 private:
-	bool CreateContext(bool shared);
 	bool CreateSPK(const string& name, const string& program, const string& entryPoint, Spk& spk, bool doublePrecision);
-	bool CheckCL(cl_int err, const char* name);
-	std::string ErrorToStringCL(cl_int err);

 	bool m_Init;
 	bool m_Shared;
-	uint m_PlatformIndex;
-	uint m_DeviceIndex;
-	uint m_LocalMemSize;
+	size_t m_PlatformIndex;
+	size_t m_DeviceIndex;
+	size_t m_LocalMemSize;
+	size_t m_GlobalMemSize;
+	size_t m_MaxAllocSize;
 	cl::Platform m_Platform;
 	cl::Context m_Context;
 	cl::Device m_Device;
 	cl::CommandQueue m_Queue;
-	std::vector<cl::Platform> m_Platforms;
-	std::vector<std::vector<cl::Device>> m_Devices;
+	OpenCLInfo& m_Info;
 	std::vector<cl::Device> m_DeviceVec;
 	std::vector<Spk> m_Programs;
 	std::vector<NamedBuffer> m_Buffers;
@@ -2,9 +2,9 @@

 #include "EmberCLPch.h"
 #include "OpenCLWrapper.h"
-#include "IterOpenCLKernelCreator.h"
 #include "DEOpenCLKernelCreator.h"
 #include "FinalAccumOpenCLKernelCreator.h"
+#include "RendererClDevice.h"

 /// <summary>
 /// RendererCLBase and RendererCL classes.
@@ -26,12 +26,17 @@ public:
 /// <summary>
 /// RendererCL is a derivation of the basic CPU renderer which
 /// overrides various functions to render on the GPU using OpenCL.
+/// It supports multi-GPU rendering, which is done in the following manner:
+///		-When rendering a single image, the iterations will be split between devices in sub batches.
+///		-When animating, a renderer for each device will be created by the calling code,
+///			and the frames will each be rendered by a single device as available.
+/// The synchronization across devices is done through a single atomic counter.
 /// Since this class derives from EmberReport and also contains an
 /// OpenCLWrapper member which also derives from EmberReport, the
 /// reporting functions are overridden to aggregate the errors from
 /// both sources.
-/// It does not support different types for T and bucketT, so it only has one template argument
-/// and uses both for the base.
+/// Template argument T is expected to be float or double.
+/// Template argument bucketT must always be float.
 /// </summary>
 template <typename T, typename bucketT>
 class EMBERCL_API RendererCL : public Renderer<T, bucketT>, public RendererCLBase
@@ -65,6 +70,8 @@ using EmberNs::Renderer<T, bucketT>::RendererBase::m_RenderTimer;
 using EmberNs::Renderer<T, bucketT>::RendererBase::m_IterTimer;
 using EmberNs::Renderer<T, bucketT>::RendererBase::m_ProgressTimer;
 using EmberNs::Renderer<T, bucketT>::RendererBase::EmberReport::m_ErrorReport;
+using EmberNs::Renderer<T, bucketT>::RendererBase::m_ResizeCs;
+using EmberNs::Renderer<T, bucketT>::RendererBase::m_ProcessAction;
 using EmberNs::Renderer<T, bucketT>::m_RotMat;
 using EmberNs::Renderer<T, bucketT>::m_Ember;
 using EmberNs::Renderer<T, bucketT>::m_Csa;
@@ -82,45 +89,45 @@ using EmberNs::Renderer<T, bucketT>::GetSpatialFilter;
 using EmberNs::Renderer<T, bucketT>::CoordMap;
 using EmberNs::Renderer<T, bucketT>::XformDistributions;
 using EmberNs::Renderer<T, bucketT>::XformDistributionsSize;
+using EmberNs::Renderer<T, bucketT>::m_Dmap;
 using EmberNs::Renderer<T, bucketT>::m_DensityFilter;
 using EmberNs::Renderer<T, bucketT>::m_SpatialFilter;

 public:
-	RendererCL(uint platform = 0, uint device = 0, bool shared = false, GLuint outputTexID = 0);
+	RendererCL(const vector<pair<size_t, size_t>>& devices, bool shared = false, GLuint outputTexID = 0);
 	~RendererCL();

 	//Non-virtual member functions for OpenCL specific tasks.
-	bool Init(uint platform, uint device, bool shared, GLuint outputTexID);
+	bool Init(const vector<pair<size_t, size_t>>& devices, bool shared, GLuint outputTexID);
 	bool SetOutputTexture(GLuint outputTexID);

 	//Iters per kernel/block/grid.
-	inline uint IterCountPerKernel() const;
-	inline uint IterCountPerBlock() const;
-	inline uint IterCountPerGrid() const;
+	inline size_t IterCountPerKernel() const;
+	inline size_t IterCountPerBlock() const;
+	inline size_t IterCountPerGrid() const;

 	//Kernels per block.
-	inline uint IterBlockKernelWidth() const;
-	inline uint IterBlockKernelHeight() const;
-	inline uint IterBlockKernelCount() const;
+	inline size_t IterBlockKernelWidth() const;
+	inline size_t IterBlockKernelHeight() const;
+	inline size_t IterBlockKernelCount() const;

 	//Kernels per grid.
-	inline uint IterGridKernelWidth() const;
-	inline uint IterGridKernelHeight() const;
-	inline uint IterGridKernelCount() const;
+	inline size_t IterGridKernelWidth() const;
+	inline size_t IterGridKernelHeight() const;
+	inline size_t IterGridKernelCount() const;

 	//Blocks per grid.
-	inline uint IterGridBlockWidth() const;
-	inline uint IterGridBlockHeight() const;
-	inline uint IterGridBlockCount() const;
+	inline size_t IterGridBlockWidth() const;
+	inline size_t IterGridBlockHeight() const;
+	inline size_t IterGridBlockCount() const;

-	uint PlatformIndex();
-	uint DeviceIndex();
-	bool ReadHist();
+	bool ReadHist(size_t device);
 	bool ReadAccum();
-	bool ReadPoints(vector<PointCL<T>>& vec);
+	bool ReadPoints(size_t device, vector<PointCL<T>>& vec);
 	bool ClearHist();
+	bool ClearHist(size_t device);
 	bool ClearAccum();
-	bool WritePoints(vector<PointCL<T>>& vec);
+	bool WritePoints(size_t device, vector<PointCL<T>>& vec);
 #ifdef TEST_CL
 	bool WriteRandomPoints();
 #endif
@@ -136,7 +143,6 @@ public:
 	virtual size_t MemoryAvailable() override;
 	virtual bool Ok() const override;
 	virtual void NumChannels(size_t numChannels) override;
-	virtual void DumpErrorReport() override;
 	virtual void ClearErrorReport() override;
 	virtual size_t SubBatchSize() const override;
 	virtual size_t ThreadCount() const override;
@@ -151,8 +157,7 @@ public:
 protected:
 #endif
 	//Protected virtual functions overridden from Renderer.
-	virtual void MakeDmap(T colorScalar) override;
-	virtual bool Alloc() override;
+	virtual bool Alloc(bool histOnly = false) override;
 	virtual bool ResetBuckets(bool resetHist = true, bool resetAccum = true) override;
 	virtual eRenderStatus LogScaleDensityFilter() override;
 	virtual eRenderStatus GaussianDensityFilter() override;
@@ -162,17 +167,19 @@ protected:
 #ifndef TEST_CL
 private:
 #endif
+	void Init();
 	//Private functions for making and running OpenCL programs.
 	bool BuildIterProgramForEmber(bool doAccum = true);
 	bool RunIter(size_t iterCount, size_t temporalSample, size_t& itersRan);
 	eRenderStatus RunLogScaleFilter();
 	eRenderStatus RunDensityFilter();
 	eRenderStatus RunFinalAccum();
-	bool ClearBuffer(const string& bufferName, uint width, uint height, uint elementSize);
-	bool RunDensityFilterPrivate(uint kernelIndex, uint gridW, uint gridH, uint blockW, uint blockH, uint chunkSizeW, uint chunkSizeH, uint chunkW, uint chunkH);
+	bool ClearBuffer(size_t device, const string& bufferName, uint width, uint height, uint elementSize);
+	bool RunDensityFilterPrivate(size_t kernelIndex, size_t gridW, size_t gridH, size_t blockW, size_t blockH, uint chunkSizeW, uint chunkSizeH, uint chunkW, uint chunkH);
 	int MakeAndGetDensityFilterProgram(size_t ss, uint filterWidth);
 	int MakeAndGetFinalAccumProgram(double& alphaBase, double& alphaScale);
 	int MakeAndGetGammaCorrectionProgram();
+	bool SumDeviceHist();
 	void FillSeeds();

 	//Private functions passing data to OpenCL programs.
@@ -182,15 +189,12 @@ private:
 	void ConvertCarToRas(const CarToRas<T>& carToRas);

 	bool m_Init;
-	bool m_NVidia;
 	bool m_DoublePrecision;
-	uint m_IterCountPerKernel;
-	uint m_IterBlocksWide, m_IterBlockWidth;
-	uint m_IterBlocksHigh, m_IterBlockHeight;
-	uint m_MaxDEBlockSizeW;
-	uint m_MaxDEBlockSizeH;
-	uint m_WarpSize;
-	size_t m_Calls;
+	size_t m_IterCountPerKernel;
+	size_t m_IterBlocksWide, m_IterBlockWidth;
+	size_t m_IterBlocksHigh, m_IterBlockHeight;
+	size_t m_MaxDEBlockSizeW;
+	size_t m_MaxDEBlockSizeH;

 	//Buffer names.
 	string m_EmberBufferName;
@@ -214,7 +218,6 @@ private:
 	//Kernels.
 	string m_IterKernel;

-	OpenCLWrapper m_Wrapper;
 	cl::ImageFormat m_PaletteFormat;
 	cl::ImageFormat m_FinalFormat;
 	cl::Image2D m_Palette;
@@ -222,8 +225,7 @@ private:
 	GLuint m_OutputTexID;
 	EmberCL<T> m_EmberCL;
 	vector<XformCL<T>> m_XformsCL;
-	vector<glm::highp_uvec2> m_Seeds;
-	Palette<float> m_DmapCL;//Used instead of the base class' m_Dmap because OpenCL only supports float textures. Likely not needed if we switch to float only hist.
+	vector<vector<glm::highp_uvec2>> m_Seeds;
 	CarToRasCL<T> m_CarToRasCL;
 	DensityFilterCL<bucketT> m_DensityFilterCL;
 	SpatialFilterCL<bucketT> m_SpatialFilterCL;
@@ -231,6 +233,7 @@ private:
 	DEOpenCLKernelCreator m_DEOpenCLKernelCreator;
 	FinalAccumOpenCLKernelCreator m_FinalAccumOpenCLKernelCreator;
 	pair<string, vector<T>> m_Params;
+	vector<unique_ptr<RendererClDevice>> m_Devices;
 	Ember<T> m_LastBuiltEmber;
 };
 }
@@ -0,0 +1,60 @@
+#include "EmberCLPch.h"
+#include "RendererClDevice.h"
+
+namespace EmberCLns
+{
+/// <summary>
+/// Constructor that assigns members.
+/// The object is not fully initialized at this point; the caller
+/// must manually call Init().
+/// </summary>
+/// <param name="doublePrec">True if double precision is requested, else false</param>
+/// <param name="platform">The index of the platform to use</param>
+/// <param name="device">The index of the device to use</param>
+/// <param name="shared">True if shared with OpenGL, else false.</param>
+RendererClDevice::RendererClDevice(bool doublePrec, size_t platform, size_t device, bool shared)
+	: m_Info(OpenCLInfo::Instance())
+{
+	m_Init = false;
+	m_Shared = shared;
+	m_NVidia = false;
+	m_WarpSize = 0;
+	m_Calls = 0;
+	m_PlatformIndex = platform;
+	m_DeviceIndex = device;
+}
+
+/// <summary>
+/// Initialization of the OpenCLWrapper member.
+/// </summary>
+/// <returns>True if success, else false.</returns>
+bool RendererClDevice::Init()
+{
+	bool b = true;
+
+	if (!m_Wrapper.Ok())
+	{
+		m_Init = false;
+		b = m_Wrapper.Init(m_PlatformIndex, m_DeviceIndex, m_Shared);
+	}
+
+	if (b && m_Wrapper.Ok() && !m_Init)
+	{
+		m_NVidia = ToLower(m_Info.PlatformName(m_PlatformIndex)).find("nvidia") != string::npos && m_Wrapper.LocalMemSize() > (32 * 1024);
+		m_WarpSize = m_NVidia ? 32 : 64;
+		m_Init = true;
+	}
+
+	return b;
+}
+
+/// <summary>
+/// OpenCL property accessors, getters only.
+/// </summary>
+bool RendererClDevice::Ok() const { return m_Init; }
+bool RendererClDevice::Shared() const { return m_Shared; }
+bool RendererClDevice::Nvidia() const { return m_NVidia; }
+size_t RendererClDevice::WarpSize() const { return m_WarpSize; }
+size_t RendererClDevice::PlatformIndex() const { return m_PlatformIndex; }
+size_t RendererClDevice::DeviceIndex() const { return m_DeviceIndex; }
+}
@@ -0,0 +1,42 @@
+#pragma once
+
+#include "EmberCLPch.h"
+#include "OpenCLWrapper.h"
+#include "IterOpenCLKernelCreator.h"
+
+/// <summary>
+/// RendererClDevice class.
+/// </summary>
+
+namespace EmberCLns
+{
+/// <summary>
+/// Class to manage a device that does the iteration portion of
+/// the rendering process. Having a separate class for this purpose
+/// enables multi-GPU support.
+/// </summary>
+class EMBERCL_API RendererClDevice : public EmberReport
+{
+public:
+	RendererClDevice(bool doublePrec, size_t platform, size_t device, bool shared);
+	bool Init();
+	bool Ok() const;
+	bool Shared() const;
+	bool Nvidia() const;
+	size_t WarpSize() const;
+	size_t PlatformIndex() const;
+	size_t DeviceIndex() const;
+
+	size_t m_Calls;
+	OpenCLWrapper m_Wrapper;
+
+private:
+	bool m_Init;
+	bool m_Shared;
+	bool m_NVidia;
+	size_t m_WarpSize;
+	size_t m_PlatformIndex;
+	size_t m_DeviceIndex;
+	OpenCLInfo& m_Info;
+};
+}