Swigged.CUDA

Status

Introduction

This project is a SWIG-generated wrapper for the NVIDIA CUDA Driver API Version 9.x in C#, compiled under Net Standard 2.0, targetting Windows and Ubuntu, and 64-bit NVIDIA GPU Kepler or newer installed. Support of 32-bit targets has been dropped due to NVIDIA no longer supporting 32-bit targets.

Of the entire GPU Computing Toolkit API, only the Driver API is exposed. There is at the moment no documentation for the Swigged.CUDA API, but it mirrors that of the Driver API. Alternatively, please study the example below and the example in the source code on Github.

What is Swigged.CUDA?

Swigged.CUDA is a low-level API for NET programmers to access the CUDA Driver API. Swigged.CUDA supports execution of compiled CUDA/C++ kernels in either PTX, CUBIN, or OBJ files. If you want to run against the CUDA Runtime API, you will need to look elsewhere. Swigged.CUDA is not an API for writing kernels in C#. If you want that, see my companion project Campy.

Targets

Windows 10 (x64), Ubuntu 16.04 (x64), CUDA GPU Computing Toolkit 9.x, NET Core 2.0/Framework 4.6.1+/Standard 2.0.

Where can I get Swigged.CUDA?

You can access the source for Swigged.CUDA at Github: https://github.com/kaby76/swigged.cuda . Alternatively, you can download a pre-build Nuget package from Nuget: https://www.nuget.org/packages/swigged.cuda/ .

Swigged.CUDA under a minute

Copy and paste the following code in a Bash shell.

#!/bin/bash
mkdir test
cd test
dotnet new console
cat - << HERE > Program.cs
using System;
using System.Linq;
using System.Collections.Generic;
using System.Runtime.InteropServices;
using Swigged.Cuda;

namespace test
{
    class Program
    {
        static unsafe void Main(string[] args)
        {
            Cuda.cuInit(0);

            // Device api.
            var res = Cuda.cuDeviceGet(out int device, 0);
            if (res != CUresult.CUDA_SUCCESS) throw new Exception();
            res = Cuda.cuDeviceGetPCIBusId(out string pciBusId, 100, device);
            if (res != CUresult.CUDA_SUCCESS) throw new Exception();
            res = Cuda.cuDeviceGetName(out string name, 100, device);
            if (res != CUresult.CUDA_SUCCESS) throw new Exception();

            res = Cuda.cuCtxCreate_v2(out CUcontext cuContext, 0, device);
            if (res != CUresult.CUDA_SUCCESS) throw new Exception();
            string kernel = @"
//
// Generated by NVIDIA NVVM Compiler
//
// Compiler Build ID: CL-21373419
// Cuda compilation tools, release 8.0, V8.0.55
// Based on LLVM 3.4svn
//

                    .version 5.0
                    .target sm_20
                    .address_size 64

    // .globl   _Z4kernPi

                    .visible .entry _Z4kernPi(
                .param .u64 _Z4kernPi_param_0
                )
            {
                .reg .pred  %p<2>;
                .reg .b32   %r<4>;
                .reg .b64   %rd<5>;


                ld.param.u64    %rd1, [_Z4kernPi_param_0];
                mov.u32     %r1, %tid.x;
                setp.gt.s32 %p1, %r1, 10;
                @%p1 bra    BB0_2;

                cvta.to.global.u64  %rd2, %rd1;
                mul.wide.s32    %rd3, %r1, 4;
                add.s64     %rd4, %rd2, %rd3;
                ld.global.u32   %r2, [%rd4];
                add.s32     %r3, %r2, 1;
                st.global.u32   [%rd4], %r3;

BB0_2:
                ret;
            }
            ";
            IntPtr ptr = Marshal.StringToHGlobalAnsi(kernel);
            res = Cuda.cuModuleLoadData(out CUmodule cuModule, ptr);
            if (res != CUresult.CUDA_SUCCESS) throw new Exception();
            res = Cuda.cuModuleGetFunction(out CUfunction helloWorld, cuModule, "_Z4kernPi");
            if (res != CUresult.CUDA_SUCCESS) throw new Exception();
            int[] v = { 'G', 'd', 'k', 'k', 'n', (char)31, 'v', 'n', 'q', 'k', 'c' };
            GCHandle handle = GCHandle.Alloc(v, GCHandleType.Pinned);
            IntPtr pointer = IntPtr.Zero;
            pointer = handle.AddrOfPinnedObject();
            res = Cuda.cuMemAlloc_v2(out IntPtr dptr, 11*sizeof(int));
            if (res != CUresult.CUDA_SUCCESS) throw new Exception();
            res = Cuda.cuMemcpyHtoD_v2(dptr, pointer, 11*sizeof(int));
            if (res != CUresult.CUDA_SUCCESS) throw new Exception();

            IntPtr[] x = new IntPtr[] { dptr };
            GCHandle handle2 = GCHandle.Alloc(x, GCHandleType.Pinned);
            IntPtr pointer2 = IntPtr.Zero;
            pointer2 = handle2.AddrOfPinnedObject();

            IntPtr[] kp = new IntPtr[] { pointer2 };
            fixed (IntPtr* kernelParams = kp)
            {
                res = Cuda.cuLaunchKernel(helloWorld,
                    1, 1, 1, // grid has one block.
                    11, 1, 1, // block has 11 threads.
                    0, // no shared memory
                    default(CUstream),
                    (IntPtr)kernelParams,
                    (IntPtr)IntPtr.Zero
                    );
            }
            if (res != CUresult.CUDA_SUCCESS) throw new Exception();
            res = Cuda.cuMemcpyDtoH_v2(pointer, dptr, 11*sizeof(int));
            if (res != CUresult.CUDA_SUCCESS) throw new Exception();
            Cuda.cuCtxDestroy_v2(cuContext);
            var aofc = v.Select(c => (char)c).ToArray();
            System.Console.WriteLine("Result = " + new string(aofc));
        }
    }
}
HERE
dotnet add package swigged.cuda
echo NOTE-SEIGGED.CUDA REQUIRES UNSAFE CODE. THE CSPROJ FILE MUST BE UPDATED.
cat - << HERE > test.csproj
<Project Sdk="Microsoft.NET.Sdk">

  <PropertyGroup>
    <OutputType>Exe</OutputType>
    <TargetFramework>netcoreapp2.0</TargetFramework>
    <AllowUnsafeBlocks>true</AllowUnsafeBlocks>
  </PropertyGroup>

  <ItemGroup>
    <PackageReference Include="swigged.cuda" Version="9.2148.1" />
  </ItemGroup>

</Project>
HERE
dotnet build
unameOut="$(uname -s)"
case "${unameOut}" in
    Linux*)
	    dotnet publish -r ubuntu.16.04-x64
	    cd bin/Debug/netcoreapp2.0/ubuntu.16.04-x64/publish/
        ./test
	;;
    Darwin*)
	    echo Cannot target Mac yet.
	    exit 1
	;;
    CYGWIN*)
	    dotnet publish -r win-x64
	    cd bin/Debug/netcoreapp2.0/win-x64/publish/
        ./test.exe
	;;
    MINGW*)
	    dotnet publish -r win-x64
	    cd bin/Debug/netcoreapp2.0/win-x64/publish/
        ./test.exe
	;;
    *)
	echo Unknown machine.
	exit 1
	;;
esac
echo Output should be "Result = Hello world".

Console example provided in the source code

Please see https://github.com/kaby76/swigged.cuda/ for additional examples.

Alternative CUDA Driver APIs for C#

ManagedCuda 8.0 https://www.nuget.org/packages/ManagedCuda-80/ http://kunzmi.github.io/managedCuda/

“ManagedCuda aims an easy integration of NVidia’s CUDA in .net applications written in C#, Visual Basic or any other .net language.” Although it is very good, it just isn’t compatible with Net Standard and Net Core apps and libraries–which is why I wrote Swigged.CUDA. The project hasn’t been updated for a while, but there was a recent fork in the repo to add Net Standard compatibility.

CSCuda https://www.nuget.org/packages/CSCuda/ https://github.com/DNRY/CSCuda

I haven’t tried this, but it looks like a fine wrapper library, albeit it does not seem to expose the CUDA Driver API, rather the CUDA Runtime API, NPP, and CUBLAS.