A SWIG-generated wrapper of the CUDA Driver API.

This project is a SWIG-generated C# wrapper for version 9 of the CUDA Driver API, compiled under .NET Standard 2.0. It targets CUDA GPU Computing Toolkit version 9.1.85, Visual Studio 2017 15.4.5, and 64-bit GPU targets. Older releases of the NVIDIA GPU Toolkit are not supported. Version 9 of the Toolkit must be installed, and you must have an NVIDIA GPU of the Pascal generation or newer. Support for 32-bit targets has been dropped because the Toolkit itself no longer supports them.

Of the entire GPU Computing Toolkit API, only the Driver API is exposed. There is currently no documentation for the Swigged.CUDA API, but it mirrors that of the Driver API. For usage, please study the example below and the example in the source code on GitHub.

This wrapper is .NET Standard 1.1 compliant. Therefore, it can be used in almost any .NET Framework, .NET Standard, or .NET Core app or library. However, it has not been ported to Linux, and it is only minimally tested under Mono.

What is Swigged.CUDA?

Swigged.CUDA is a low-level API that gives .NET programmers access to the CUDA Driver API. If you want to run against the CUDA Runtime API, you will need to look elsewhere. Swigged.CUDA is not an API for writing kernels in C#. Instead, you must write your kernels in CUDA C++ and generate PTX, CUBIN, or OBJ files that Swigged.CUDA can use.


Where can I get Swigged.CUDA?

You can access the source for Swigged.CUDA on GitHub: https://github.com/kaby76/swigged.cuda . Alternatively, you can download a prebuilt package from NuGet: https://www.nuget.org/packages/swigged.cuda/ .

Using the API from NuGet

.NET Framework App on Windows

Use the Package Manager GUI in VS 2017 to add the package “swigged.cuda”. Or, download the package from NuGet (https://www.nuget.org/packages/swigged.cuda) and add the package “swigged.cuda” from the NuGet Package Manager Console.

Set up the build of your C# application with Platform = “AnyCPU”, Configuration = “Debug” or “Release”. In the Properties for the application, uncheck “Prefer 32-bit”. Note: you must uncheck “Prefer 32-bit” because version 9 of the NVIDIA GPU Toolkit does not support 32-bit targets at all.

You may need to copy swigged.cuda.native.dll to the executable directory, or change the swigged.cuda.targets file in your package directory, if you changed the paths for compiler and linker output to non-standard locations, or if I was in error.
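The two setup requirements above (a 64-bit process, and swigged.cuda.native.dll sitting next to the executable) can be checked at startup before any CUDA call is made. The following is only a sketch: `CudaPreflight` is a hypothetical helper that is not part of Swigged.CUDA, and it assumes the native DLL is expected in the application's base directory.

```csharp
using System;
using System.IO;

// Hypothetical helper, not part of Swigged.CUDA.
public static class CudaPreflight
{
    // File name of the native library, as mentioned in the text above.
    public const string NativeLibrary = "swigged.cuda.native.dll";

    // True when the current process runs as 64-bit (pointers are 8 bytes).
    public static bool Is64BitProcess()
    {
        return IntPtr.Size == 8;
    }

    // True when the native library exists in the given directory.
    public static bool NativeLibraryPresent(string directory)
    {
        return File.Exists(Path.Combine(directory, NativeLibrary));
    }

    // Throws a descriptive exception if either requirement is not met.
    public static void Verify()
    {
        if (!Is64BitProcess())
            throw new InvalidOperationException(
                "This process is 32-bit; uncheck \"Prefer 32-bit\" in the project properties.");

        // Assumption: the DLL is expected next to the executable.
        if (!NativeLibraryPresent(AppContext.BaseDirectory))
            throw new FileNotFoundException(
                "Native library not found in " + AppContext.BaseDirectory, NativeLibrary);
    }
}
```

Calling `CudaPreflight.Verify()` as the first statement of `Main` turns a missing DLL or a 32-bit process into a readable error message before the first Driver API call.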

Note: you must use Visual Studio 2017 version 15.4.5 or earlier. Unfortunately, version 9 of the GPU Computing Toolkit does not work with Visual Studio 2017 version 15.5.1.


using System;
using System.Runtime.InteropServices;
using Swigged.Cuda;

namespace ConsoleApp1
{
    class Program
    {
        static unsafe void Main(string[] args)
        {
            // The Driver API must be initialized before any other call.
            var res = Cuda.cuInit(0);
            if (res != CUresult.CUDA_SUCCESS) throw new Exception();

            // Device api.
            res = Cuda.cuDeviceGet(out int device, 0);
            if (res != CUresult.CUDA_SUCCESS) throw new Exception();
            res = Cuda.cuDeviceGetPCIBusId(out string pciBusId, 100, device);
            if (res != CUresult.CUDA_SUCCESS) throw new Exception();
            res = Cuda.cuDeviceGetName(out string name, 100, device);
            if (res != CUresult.CUDA_SUCCESS) throw new Exception();

            res = Cuda.cuCtxCreate_v2(out CUcontext cuContext, 0, device);
            if (res != CUresult.CUDA_SUCCESS) throw new Exception();

            // CUDA C++ source of the kernel. Shown for reference only;
            // the program loads the PTX below, not this source.
            string cu_kernel = @"
#include <stdio.h>

__global__
void kern(int * ar)
{
	int i = threadIdx.x;
	if (i < 11)
		ar[i] = ar[i] + 1;
}
";
            // Command used to compile the source above to PTX (reference only).
            string compile_string = @"
nvcc --ptx --gpu-architecture=sm_20 -ccbin ""C:\Program Files (x86)\Microsoft Visual Studio 14.0\VC\bin"" y.cu";

            // PTX produced by the command above.
            string kernel = @"
// Generated by NVIDIA NVVM Compiler
// Compiler Build ID: CL-21373419
// Cuda compilation tools, release 8.0, V8.0.55
// Based on LLVM 3.4svn

.version 5.0
.target sm_20
.address_size 64

	// .globl	_Z4kernPi

.visible .entry _Z4kernPi(
	.param .u64 _Z4kernPi_param_0
)
{
	.reg .pred 	%p<2>;
	.reg .b32 	%r<4>;
	.reg .b64 	%rd<5>;

	ld.param.u64 	%rd1, [_Z4kernPi_param_0];
	mov.u32 	%r1, %tid.x;
	setp.gt.s32	%p1, %r1, 10;
	@%p1 bra 	BB0_2;

	cvta.to.global.u64 	%rd2, %rd1;
	mul.wide.s32 	%rd3, %r1, 4;
	add.s64 	%rd4, %rd2, %rd3;
	ld.global.u32 	%r2, [%rd4];
	add.s32 	%r3, %r2, 1;
	st.global.u32 	[%rd4], %r3;

BB0_2:
	ret;
}
";
            // Load the PTX as a module and look up the kernel by its mangled name.
            IntPtr ptr = Marshal.StringToHGlobalAnsi(kernel);
            res = Cuda.cuModuleLoadData(out CUmodule cuModule, ptr);
            if (res != CUresult.CUDA_SUCCESS) throw new Exception();
            res = Cuda.cuModuleGetFunction(out CUfunction helloWorld, cuModule, "_Z4kernPi");
            if (res != CUresult.CUDA_SUCCESS) throw new Exception();

            // Host data: each value is one less than the character code it becomes.
            int[] v = { 'G', 'd', 'k', 'k', 'n', (char)31, 'v', 'n', 'q', 'k', 'c' };
            GCHandle handle = GCHandle.Alloc(v, GCHandleType.Pinned);
            IntPtr pointer = handle.AddrOfPinnedObject();

            // Allocate device memory and copy the host data to it.
            res = Cuda.cuMemAlloc_v2(out IntPtr dptr, 11*sizeof(int));
            if (res != CUresult.CUDA_SUCCESS) throw new Exception();
            res = Cuda.cuMemcpyHtoD_v2(dptr, pointer, 11*sizeof(int));
            if (res != CUresult.CUDA_SUCCESS) throw new Exception();

            // The kernel parameter list is an array of pointers to each argument.
            IntPtr[] x = new IntPtr[] { dptr };
            GCHandle handle2 = GCHandle.Alloc(x, GCHandleType.Pinned);
            IntPtr pointer2 = handle2.AddrOfPinnedObject();

            IntPtr[] kp = new IntPtr[] { pointer2 };
            fixed (IntPtr* kernelParams = kp)
            {
                res = Cuda.cuLaunchKernel(helloWorld,
                    1, 1, 1, // grid has one block.
                    11, 1, 1, // block has 11 threads.
                    0, // no shared memory
                    default(CUstream), // default stream
                    (IntPtr)kernelParams, // pointers to the kernel arguments
                    IntPtr.Zero // no extra launch options
                );
            }
            if (res != CUresult.CUDA_SUCCESS) throw new Exception();

            // Copy the result back into the pinned host array.
            res = Cuda.cuMemcpyDtoH_v2(pointer, dptr, 11*sizeof(int));
            if (res != CUresult.CUDA_SUCCESS) throw new Exception();
        }
    }
}
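
To check what the example should leave in `v` after the copy back to the host, the kernel's effect can be reproduced on the CPU. The sketch below needs no GPU; `KernelExpectation` and its methods are hypothetical names introduced for illustration and are not part of Swigged.CUDA.

```csharp
using System;
using System.Linq;

// Hypothetical helper, not part of Swigged.CUDA.
public static class KernelExpectation
{
    // Host-side equivalent of the kernel: add 1 to each of the first 11 elements.
    public static int[] IncrementOnHost(int[] input)
    {
        int[] result = (int[])input.Clone();
        for (int i = 0; i < result.Length && i < 11; i++)
            result[i] = result[i] + 1;
        return result;
    }

    // Interpret each integer as a character code and build a string.
    public static string Decode(int[] codes)
    {
        return new string(codes.Select(c => (char)c).ToArray());
    }
}
```

Calling `KernelExpectation.Decode(KernelExpectation.IncrementOnHost(v))` with the array `v` from the example yields "Hello world", which is also what `KernelExpectation.Decode(v)` should return once the GPU run has completed.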

Console example provided in the source code

Please see https://github.com/kaby76/swigged.cuda/tree/master/ConsoleApp1 for a stand-alone example that uses the Swigged.CUDA API from NuGet.

Alternative CUDA Driver APIs for C#

ManagedCuda 8.0 https://www.nuget.org/packages/ManagedCuda-80/ http://kunzmi.github.io/managedCuda/

“ManagedCuda aims an easy integration of NVidia’s CUDA in .net applications written in C#, Visual Basic or any other .net language.” Although it is very good, it is not compatible with .NET Standard and .NET Core apps and libraries.

CSCuda https://www.nuget.org/packages/CSCuda/ https://github.com/DNRY/CSCuda

I haven’t tried this, but it looks like a fine wrapper library. However, it does not seem to expose the CUDA Driver API; instead it exposes the CUDA Runtime API, NPP, and cuBLAS.