As part of a small team of developers, I created a networking layer for the CUDA and OpenCL APIs, allowing existing applications to take advantage of remote GPU resources. The layer is a full drop-in reimplementation of both APIs, so remote resources can be used transparently by any GPU-enabled application.
Functionally, the adaptor has three main components:
- A client-side library that serves as a replacement for the CUDA and OpenCL libraries. It intercepts calls to those libraries and forwards them over the network.
- An application server that accepts commands from clients, executes them with native API calls, and returns the results to the clients.
- A code generation tool that parses CUDA and OpenCL header files, along with user annotations and custom code provided separately, and outputs the code required for serialization and deserialization of native API call arguments.
Because of the complexity of the software, an extensive test suite was added, including the official OpenCL conformance tests from Khronos, CUDA samples, custom-built unit tests, and multiple existing applications for checking both correctness and performance. Since the product is aimed at compute clusters and other compute-intensive environments, performance was a key consideration in all parts of the project.