Raytracing in Vulkan using C# — Part 1

by Jens

In this series we will build a small raytracer similar to the one from the famous “Ray Tracing in One Weekend” series. However, we will use Vulkan compute shaders to generate the images.

I use C# to build the raytracer but you should be able to follow along in any programming language with existing Vulkan bindings.

We will use unsafe code in C#, so we first have to enable this feature by adding the following to the project’s .csproj file.

<PropertyGroup>
  <AllowUnsafeBlocks>true</AllowUnsafeBlocks>
</PropertyGroup>

Initializing Vulkan

We will start with the initial steps to set up Vulkan using the Silk.NET bindings. Install the Packages Silk.NET.Vulkan, Silk.NET.Vulkan.Extensions.EXT and Silk.NET.Vulkan.Extensions.KHR using NuGet.

We will create a giant class — VkContext — for most of the Vulkan stuff which we will refactor eventually.

Silk.NET bindings use an API object (Vk in this case) which we need to initialize first. We also create some fields for the Instance, DebugUtilsMessengerEXT, PhysicalDevice and Device structs.

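using System.Runtime.InteropServices; //Marshal, RuntimeInformation
using Silk.NET.Core.Native; //SilkMarshal
using Silk.NET.Vulkan;
using Silk.NET.Vulkan.Extensions.EXT; //ExtDebugUtils
using Buffer = Silk.NET.Vulkan.Buffer; //avoids the clash with System.Buffer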

namespace RaytracingVulkan;

public unsafe class VkContext
{
    private readonly Vk _vk = Vk.GetApi();

    private readonly Instance _instance;
    private readonly DebugUtilsMessengerEXT _debugUtilsMessenger;
    private readonly PhysicalDevice _physicalDevice;
    private readonly Device _device;
}

Creating the Vulkan Instance

First we create the Vulkan instance with debugging features enabled. To do so we create two List<string>, one for the instance extensions and one for the validation layers. We also add the extensions required on macOS if needed.

public VkContext()
{
    //enable debugging features
    var enabledInstanceExtensions = new List<string> {ExtDebugUtils.ExtensionName};
    var enabledLayers = new List<string> {"VK_LAYER_KHRONOS_validation"};
    var flags = InstanceCreateFlags.None;
        
    //check for macOS
    if(RuntimeInformation.IsOSPlatform(OSPlatform.OSX))
    {
        enabledInstanceExtensions.Add("VK_KHR_portability_enumeration");
        flags |= InstanceCreateFlags.EnumeratePortabilityBitKhr;
    }
}

Vulkan uses structs to pass data to the construction methods. They are always named like the object suffixed with “CreateInfo”; sometimes the suffix KHR or EXT is also added to specify the extension package the structs belong to. They all have a special field “SType” which is used to specify their type. To create an Instance we therefore need an InstanceCreateInfo.

var instanceInfo = new InstanceCreateInfo
{
    SType = StructureType.InstanceCreateInfo
};

The layer and extension names go into two fields of InstanceCreateInfo, both of type byte**:

public byte** PpEnabledLayerNames;
public byte** PpEnabledExtensionNames;

Using the SilkMarshal-class we can convert string arrays to byte** as follows. Note: As done in Silk.NET we prefix a pointer variable with p and a pointer to a pointer with pP.

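var pPEnabledLayers = (byte**) SilkMarshal.StringArrayToPtr(enabledLayers.ToArray());
var pPEnabledInstanceExtensions = (byte**) SilkMarshal.StringArrayToPtr(enabledInstanceExtensions.ToArray());

Next we describe the application and the debug messenger with an ApplicationInfo and a DebugUtilsMessengerCreateInfoEXT.
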
var appInfo = new ApplicationInfo
{
    SType = StructureType.ApplicationInfo,
    ApiVersion = Vk.Version13 //Version 1.3
};

var debugInfo  = new DebugUtilsMessengerCreateInfoEXT
{
    SType = StructureType.DebugUtilsMessengerCreateInfoExt,
    MessageSeverity =DebugUtilsMessageSeverityFlagsEXT.ErrorBitExt | DebugUtilsMessageSeverityFlagsEXT.WarningBitExt,
    MessageType = DebugUtilsMessageTypeFlagsEXT.ValidationBitExt | DebugUtilsMessageTypeFlagsEXT.PerformanceBitExt | DebugUtilsMessageTypeFlagsEXT.DeviceAddressBindingBitExt,
    PfnUserCallback = (DebugUtilsMessengerCallbackFunctionEXT) DebugCallback
};

The PfnUserCallback uses Console.WriteLine to output debug information.

private static uint DebugCallback(DebugUtilsMessageSeverityFlagsEXT severityFlags,
                                      DebugUtilsMessageTypeFlagsEXT messageTypeFlags,
                                      DebugUtilsMessengerCallbackDataEXT* pCallbackData,
                                      void* pUserData)
{
    var message = Marshal.PtrToStringAnsi((nint)pCallbackData->PMessage);
    Console.WriteLine($"[Vulkan]: {severityFlags}: {message}");
    return Vk.False;
}

Now we have all information we need to create a Vulkan Instance.

var instanceInfo = new InstanceCreateInfo
{
    SType = StructureType.InstanceCreateInfo,
    Flags = flags,
    EnabledLayerCount = (uint) enabledLayers.Count,
    PpEnabledLayerNames = pPEnabledLayers,
    EnabledExtensionCount = (uint) enabledInstanceExtensions.Count,
    PpEnabledExtensionNames = pPEnabledInstanceExtensions,
    PApplicationInfo = &appInfo,
    PNext = &debugInfo
};
if (_vk.CreateInstance(instanceInfo, null, out _instance) != Result.Success)
    throw new Exception("Instance could not be created");

Extensions in Silk.NET have their own API object — like Vk for Vulkan. We need ExtDebugUtils to create the DebugUtilsMessengerEXT which we can get from our instance. So we need to create a field for the ExtDebugUtils and get the extension object before we can create the DebugUtilsMessengerEXT using the debugInfo created before.

private readonly ExtDebugUtils _extDebugUtils;

public VkContext()
{
//... other code

    if(!_vk.TryGetInstanceExtension(_instance, out _extDebugUtils))
        throw new Exception($"Could not get instance extension {ExtDebugUtils.ExtensionName}");
    _extDebugUtils.CreateDebugUtilsMessenger(_instance, debugInfo, null, out _debugUtilsMessenger);
}

//free resources 
SilkMarshal.Free((nint) pPEnabledLayers);
SilkMarshal.Free((nint) pPEnabledInstanceExtensions);
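
To make the byte** handling less opaque, here is a minimal sketch of the memory shape such a parameter expects: a native array of pointers, each pointing to a null-terminated string. It only uses System.Runtime.InteropServices. AllocStringArray and FreeStringArray are hypothetical helper names; this is an illustration of the idea, not how SilkMarshal is implemented internally.

```csharp
using System;
using System.Runtime.InteropServices;

// Hypothetical helper: builds a native array of pointers to null-terminated
// ANSI strings, which is the memory shape a byte** parameter expects.
static IntPtr AllocStringArray(string[] values)
{
    var block = Marshal.AllocHGlobal(IntPtr.Size * values.Length);
    for (var i = 0; i < values.Length; i++)
    {
        var pString = Marshal.StringToHGlobalAnsi(values[i]);
        Marshal.WriteIntPtr(block, i * IntPtr.Size, pString);
    }
    return block;
}

// Hypothetical counterpart: frees every string first, then the pointer array itself.
static void FreeStringArray(IntPtr block, int count)
{
    for (var i = 0; i < count; i++)
        Marshal.FreeHGlobal(Marshal.ReadIntPtr(block, i * IntPtr.Size));
    Marshal.FreeHGlobal(block);
}

var layers = new[] { "VK_LAYER_KHRONOS_validation" };
var pLayers = AllocStringArray(layers);
// read the first entry back through the pointer array
var first = Marshal.PtrToStringAnsi(Marshal.ReadIntPtr(pLayers, 0));
Console.WriteLine(first);
FreeStringArray(pLayers, layers.Length);
```

Every allocation has a matching free, which is the same discipline the SilkMarshal.Free calls above follow.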

As we are using unsafe code we have to clean up unmanaged resources when the application is disposed. Let’s make VkContext implement IDisposable.

public void Dispose()
{
    _vk.DestroyInstance(_instance, null);
    _vk.Dispose();
    _extDebugUtils.Dispose();
}

If we create and dispose an instance of VkContext we get an error, which means debugging is working. We are destroying the Vulkan Instance while the DebugUtilsMessengerEXT that depends on it still exists, so the validation layers complain.

[Vulkan]: ErrorBitExt: Validation Error: [ VUID-vkDestroyInstance-instance-00629 ] Object 0: handle = 0x2ab8166c9f0, type = VK_OBJECT_TYPE_INSTANCE; Object 1: handle = 0xfd5b260000000001, type = VK_OBJECT_TYPE_DEBUG_UTILS_MESSENGER_EXT; | MessageID = 0x8b3d8e18 | OBJ ERROR : For VkInstance 0x2ab8166c9f0[], VkDebugUtilsMessengerEXT 0xfd5b260000000001[] has not been destroyed. The Vulkan spec states: All child objects created using instance must have been destroyed prior to destroying instance (https://vulkan.lunarg.com/doc/view/1.3.250.0/windows/1.3-extensions/vkspec.html#VUID-vkDestroyInstance-instance-00629)

So let’s destroy the DebugUtilsMessengerEXT first and the error message will be gone.

public void Dispose()
{
    _extDebugUtils.DestroyDebugUtilsMessenger(_instance, _debugUtilsMessenger, null);
    //... other code
}

Selecting a PhysicalDevice

With our Vulkan Instance we can now select a PhysicalDevice which is a representation of a GPU in our system. First we want to get all GPUs in our system — in this case we want to select the most potent device we have which is a discrete GPU. We could use more complex selection processes but in this case we should be fine with the following.

//select discrete gpu - if none is available use first device
var devices = _vk.GetPhysicalDevices(_instance);
foreach (var gpu in devices)
{
    var properties = _vk.GetPhysicalDeviceProperties(gpu);
    if (properties.DeviceType == PhysicalDeviceType.DiscreteGpu) _physicalDevice = gpu;
}
if (_physicalDevice.Handle == 0) _physicalDevice = devices.First();

Let’s output its name so we can see it worked.

var deviceProps = _vk.GetPhysicalDeviceProperties(_physicalDevice);
Console.WriteLine(SilkMarshal.PtrToString((nint)deviceProps.DeviceName));

This outputs NVIDIA GeForce GTX 1660 Ti with Max-Q Design on the laptop I am using. We do not have to add the PhysicalDevice to our Dispose method as it cannot be destroyed.
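
If the simple discrete-GPU check is not enough, a more complex selection could rank the devices by a score. The following is only a sketch under stated assumptions: ScoreDevice, the string type names and the weights are made up for illustration, and plain tuples stand in for the values one would read from GetPhysicalDeviceProperties and the device’s memory heaps.

```csharp
using System;
using System.Linq;

// Hypothetical scoring function: the type names and weights are assumptions
// chosen for illustration, not values defined by Vulkan or Silk.NET.
static int ScoreDevice(string deviceType, ulong deviceLocalBytes)
{
    var typeScore = deviceType switch
    {
        "DiscreteGpu" => 1000,
        "IntegratedGpu" => 100,
        _ => 0
    };
    // one extra point per GiB of device-local memory, capped so the type dominates
    var memoryScore = (int)Math.Min(deviceLocalBytes >> 30, 64UL);
    return typeScore + memoryScore;
}

var candidates = new (string Name, string Type, ulong Memory)[]
{
    ("Integrated", "IntegratedGpu", 2UL << 30),
    ("Discrete", "DiscreteGpu", 6UL << 30),
    ("Software", "Cpu", 16UL << 30)
};
// pick the candidate with the highest score
var best = candidates.OrderByDescending(c => ScoreDevice(c.Type, c.Memory)).First();
Console.WriteLine(best.Name); // Discrete
```

For this series the simple loop is enough, so we stay with it.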

Create a Logical Device

A logical device handles more or less all Vulkan-related work, such as creating other objects (e.g. images) and operating on them. We need a DeviceCreateInfo struct for the creation of the logical device. A device can have extensions just like the instance, so we do the same as above to get a byte** from a list of strings.

//... instance creation and gpu selection
var enabledDeviceExtensions = new List<string>(); //empty for now!
if (RuntimeInformation.IsOSPlatform(OSPlatform.OSX))
    enabledDeviceExtensions.Add("VK_KHR_portability_subset");
var pPEnabledDeviceExtensions = (byte**)SilkMarshal.StringArrayToPtr(enabledDeviceExtensions.ToArray());
var deviceCreateInfo = new DeviceCreateInfo
{
    SType = StructureType.DeviceCreateInfo,
    EnabledLayerCount = (uint) enabledLayers.Count,
    PpEnabledLayerNames = pPEnabledLayers,
    EnabledExtensionCount = (uint) enabledDeviceExtensions.Count,
    PpEnabledExtensionNames = pPEnabledDeviceExtensions
};
//... other SilkMarshal.Free calls
SilkMarshal.Free((nint) pPEnabledDeviceExtensions);

To create the device we have to fill the PQueueCreateInfos field with the device queue we want to use. We want a queue that supports graphics and compute. Let’s also save the queue and its index in a field.

private readonly uint _mainQueueIndex;
private readonly Queue _mainQueue;

public VkContext()
{
    //... other code (instance, physicaldevice, ...)
    var queueFamilyCount = 0u;
    _vk.GetPhysicalDeviceQueueFamilyProperties(_physicalDevice, ref queueFamilyCount, null);
    var queueFamilies = new QueueFamilyProperties[queueFamilyCount];
    fixed (QueueFamilyProperties* pQueueFamilies = queueFamilies)
        _vk.GetPhysicalDeviceQueueFamilyProperties(_physicalDevice, ref queueFamilyCount, pQueueFamilies);
    
    for (var i = 0u; i < queueFamilies.Length; i++)
    {
        //the queue family has to support graphics and compute
        if (queueFamilies[i].QueueFlags.HasFlag(QueueFlags.GraphicsBit) &&
            queueFamilies[i].QueueFlags.HasFlag(QueueFlags.ComputeBit))
        {
            _mainQueueIndex = i;
            break;
        }
    }

    var defaultPriority = 1.0f;
    var queueCreateInfo = new DeviceQueueCreateInfo
    {
        SType = StructureType.DeviceQueueCreateInfo,
        QueueCount = 1,
        QueueFamilyIndex = _mainQueueIndex,
        PQueuePriorities = &defaultPriority
    };

    // ... device creation
}
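
The two calls to GetPhysicalDeviceQueueFamilyProperties in the constructor follow a count-then-fill convention. As a standalone illustration, here is a sketch with a stand-in function: EnumerateFamilies and its string results are hypothetical and only mimic the calling convention, they are not part of Vulkan or Silk.NET.

```csharp
using System;

// Stand-in for a Vulkan-style enumeration call (hypothetical, not a real API):
// with items == null it only reports the count, otherwise it fills the array.
static void EnumerateFamilies(ref uint count, string[]? items)
{
    var available = new[] { "graphics+compute", "transfer", "compute" };
    if (items is null)
    {
        count = (uint)available.Length;
        return;
    }
    // never write more entries than the caller made room for
    count = Math.Min(count, (uint)available.Length);
    Array.Copy(available, items, (int)count);
}

var familyCount = 0u;
EnumerateFamilies(ref familyCount, null);       // first call: ask for the count
var families = new string[familyCount];
EnumerateFamilies(ref familyCount, families);   // second call: fill the array
Console.WriteLine(string.Join(", ", families));
```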

The pattern used for enumerating the QueueFamilyProperties is quite common in Vulkan. We first ask how many items to expect and then create an array of that size. Then we call GetPhysicalDeviceQueueFamilyProperties again, this time with a pointer to the array we created. We have to use a fixed block to pin the array’s address so that it is not moved by the garbage collector while Vulkan writes to it.

We can now add the queue information we gathered to the DeviceCreateInfo and create our Device, which we also need to destroy in our Dispose method. Furthermore we can obtain the main queue.

//Constructor
var deviceCreateInfo = new DeviceCreateInfo
{
    SType = StructureType.DeviceCreateInfo,
    EnabledLayerCount = (uint) enabledLayers.Count,
    PpEnabledLayerNames = pPEnabledLayers,
    EnabledExtensionCount = (uint) enabledDeviceExtensions.Count,
    PpEnabledExtensionNames = pPEnabledDeviceExtensions,
    QueueCreateInfoCount = 1,
    PQueueCreateInfos = &queueCreateInfo
};
if (_vk.CreateDevice(_physicalDevice, deviceCreateInfo, null, out _device) != Result.Success)
    throw new Exception("Could not create device");
_vk.GetDeviceQueue(_device, _mainQueueIndex, 0, out _mainQueue);

//Dispose
_vk.DestroyDevice(_device, null);

Finally we want to create a command pool on our device.

//fields
private readonly CommandPool _commandPool;

//constructor
var poolInfo = new CommandPoolCreateInfo
{
    SType = StructureType.CommandPoolCreateInfo,
    QueueFamilyIndex = _mainQueueIndex,
    Flags = CommandPoolCreateFlags.TransientBit | CommandPoolCreateFlags.ResetCommandBufferBit
};
if (_vk.CreateCommandPool(_device, poolInfo, null, out _commandPool) != Result.Success)
    throw new Exception("Could not create command pool");

//Dispose
_vk.DestroyCommandPool(_device, _commandPool, null);

We now create two helper functions for our queue in the VkContext class. One submits work to the queue, the other waits for the queue to become idle.

public Result SubmitMainQueue(SubmitInfo submitInfo, Fence fence) => _vk.QueueSubmit(_mainQueue, 1, submitInfo, fence);
private Result WaitForQueue() => _vk.QueueWaitIdle(_mainQueue);

Write Image Files

Creating Images and Buffers

We need to be able to write data to and read data from images. In Vulkan we do not write data to the image directly. We first write to a so-called staging buffer and then copy the data to our image (and vice versa). We first create two structs, AllocatedBuffer and AllocatedImage, which hold our Image or Buffer and the DeviceMemory it is bound to. We could also use ValueTuples here if we like.

public struct AllocatedBuffer
{
    public Buffer Buffer;
    public DeviceMemory Memory;
}
public struct AllocatedImage
{
    public Image Image;
    public DeviceMemory Memory;
}

In our VkContext we will add methods to create and delete images and buffers. We let the VkContext handle all this for simplicity. As before we need the matching CreateInfo structs, ImageCreateInfo and BufferCreateInfo, and fill in their parameters.

To create an image we need its width, height, desired format, memory properties and we need to also specify what we are going to do with the image (ImageUsageFlags). After creation we call the AllocateImage-Method which we will write in a few moments.

public AllocatedImage CreateImage(uint width, uint height, Format imageFormat, MemoryPropertyFlags memoryFlags, ImageUsageFlags imageUsageFlags)
{
    var imageInfo = new ImageCreateInfo
    {
        SType = StructureType.ImageCreateInfo,
        ImageType = ImageType.Type2D,
        Extent = new Extent3D(width, height, 1),
        Format = imageFormat,
        Samples = SampleCountFlags.Count1Bit,
        SharingMode = SharingMode.Exclusive,
        InitialLayout = ImageLayout.Undefined,
        Tiling = ImageTiling.Optimal,
        Usage = imageUsageFlags,
        MipLevels = 1,
        ArrayLayers = 1
    };
    _vk.CreateImage(_device, imageInfo, null, out var image);
    var deviceMemory = AllocateImage(image, memoryFlags);
    _vk.BindImageMemory(_device, image, deviceMemory, 0);
    return new AllocatedImage {Image = image, Memory = deviceMemory};
}

public void DestroyImage(AllocatedImage allocatedImage)
{
    _vk.FreeMemory(_device, allocatedImage.Memory, null);
    _vk.DestroyImage(_device, allocatedImage.Image, null);
}

The same principles apply to buffers, although fewer parameters are needed.

public AllocatedBuffer CreateBuffer(uint size, BufferUsageFlags usageFlags, MemoryPropertyFlags memoryFlags)
{
    var bufferInfo = new BufferCreateInfo
    {
        SType = StructureType.BufferCreateInfo,
        Usage = usageFlags,
        Size = size,
        SharingMode = SharingMode.Exclusive
    };
    _vk.CreateBuffer(_device, bufferInfo, null, out var buffer);
    var deviceMemory = AllocateBuffer(buffer, memoryFlags);
    _vk.BindBufferMemory(_device, buffer, deviceMemory, 0);
    return new AllocatedBuffer{Buffer = buffer, Memory = deviceMemory};
}

public void DestroyBuffer(AllocatedBuffer allocatedBuffer)
{
    _vk.FreeMemory(_device, allocatedBuffer.Memory, null);
    _vk.DestroyBuffer(_device, allocatedBuffer.Buffer, null);
}

We will now write our AllocateImage- and AllocateBuffer-Methods. As they are both similar they will call the Allocate-Method and just pass in different MemoryRequirements.

private DeviceMemory Allocate(MemoryRequirements memoryRequirements, MemoryPropertyFlags propertyFlags)
{
    var size = memoryRequirements.Size;
    var typeIndex = FindMemoryTypeIndex(memoryRequirements.MemoryTypeBits, propertyFlags);
    var allocInfo = new MemoryAllocateInfo
    {
        SType = StructureType.MemoryAllocateInfo,
        AllocationSize = size,
        MemoryTypeIndex = typeIndex
    };
    _vk.AllocateMemory(_device, allocInfo, null, out var deviceMemory);
    return deviceMemory;
}

private DeviceMemory AllocateImage(Image image, MemoryPropertyFlags propertyFlags)
{
    _vk.GetImageMemoryRequirements(_device, image, out var memReq);
    return Allocate(memReq, propertyFlags);
}

private DeviceMemory AllocateBuffer(Buffer buffer, MemoryPropertyFlags propertyFlags)
{
    _vk.GetBufferMemoryRequirements(_device, buffer, out var memReq);
    return Allocate(memReq, propertyFlags);
}

FindMemoryTypeIndex is implemented as follows.

private uint FindMemoryTypeIndex(uint filter, MemoryPropertyFlags flags)
{
    _vk.GetPhysicalDeviceMemoryProperties(_physicalDevice, out var props);
    for (var i = 0; i < props.MemoryTypeCount; i++)
    {
        if ((filter & (uint)(1 << i)) != 0u && (props.MemoryTypes[i].PropertyFlags & flags) == flags)
            return (uint)i;
    }
    throw new Exception("Unable to find suitable memory type");
}
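
To see what the two conditions in the loop do, here is a pure-logic sketch of the same search that runs without a GPU. The typeFlags array stands in for props.MemoryTypes[i].PropertyFlags, the flags are reduced to plain uint bits, and the three example memory types are an assumption made up for the demonstration.

```csharp
using System;

// Pure-logic version of the search: typeFlags[i] stands in for
// props.MemoryTypes[i].PropertyFlags, with flags reduced to plain uint bits.
static int FindMemoryTypeIndex(uint typeBits, uint[] typeFlags, uint required)
{
    for (var i = 0; i < typeFlags.Length; i++)
    {
        // bit i of typeBits tells whether the resource may use memory type i
        var allowedByResource = (typeBits & (1u << i)) != 0;
        // the memory type must offer every requested property
        var hasAllRequiredFlags = (typeFlags[i] & required) == required;
        if (allowedByResource && hasAllRequiredFlags) return i;
    }
    return -1; // the article's version throws instead
}

const uint DeviceLocal = 0x1, HostVisible = 0x2, HostCoherent = 0x4;
// made-up device with three memory types
var types = new[] { DeviceLocal, HostVisible | HostCoherent, DeviceLocal | HostVisible };
// typeBits 0b110: the resource may only live in memory type 1 or 2
Console.WriteLine(FindMemoryTypeIndex(0b110, types, HostVisible));               // 1
Console.WriteLine(FindMemoryTypeIndex(0b110, types, DeviceLocal | HostVisible)); // 2
Console.WriteLine(FindMemoryTypeIndex(0b001, types, HostVisible));               // -1
```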

Writing Data to Images

In our Main method we now create a VkContext, an AllocatedImage and an AllocatedBuffer. We want our image to use DeviceLocal memory (GPU memory) and use it as storage and as source and destination for transferring data. The buffer uses HostVisible memory (visible to the CPU) that is also HostCoherent, so our writes become visible to the GPU without an explicit flush, and we use it as transfer source. The image will for now be 500x500 pixels, each stored as one uint (4 bytes).

using var ctx = new VkContext();
var image = ctx.CreateImage(500, 500, Format.R8G8B8A8Unorm, MemoryPropertyFlags.DeviceLocalBit, ImageUsageFlags.StorageBit | ImageUsageFlags.TransferDstBit | ImageUsageFlags.TransferSrcBit);
var buffer = ctx.CreateBuffer(500*500*4, BufferUsageFlags.TransferSrcBit, MemoryPropertyFlags.HostVisibleBit | MemoryPropertyFlags.HostCoherentBit);

We fill an array of pixels in a simple nested for loop (for now on the CPU). We will then transfer the data to the GPU, do nothing with it, and transfer it back to the CPU, which is ridiculous but showcases the process.

var imageData = new uint[500 * 500];
for (var i = 0; i < 500; i++)
{
    for (var j = 0; j < 500; j++)
    {
        var x = i / 500f;
        var y = j / 500f;
        imageData[i + j * 500] = 0xff000000 | (uint) (x * 255f) << 8 | (uint) (y * 255f);
    }
}
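
The loop packs each pixel into one uint. On a little-endian CPU the lowest byte of the uint comes first in memory, so for R8G8B8A8 the lowest byte is red: the loop writes y into red, x into green and 0xff into alpha. A small sketch that makes this layout explicit (PackRgba is a hypothetical helper name, not used in the rest of the article):

```csharp
using System;

// Packs four 8-bit channels into one uint so that, on a little-endian CPU,
// the bytes land in memory in the order R, G, B, A (as R8G8B8A8 expects).
static uint PackRgba(byte r, byte g, byte b, byte a) =>
    (uint)r | (uint)g << 8 | (uint)b << 16 | (uint)a << 24;

var pixel = PackRgba(r: 255, g: 128, b: 0, a: 255);
Console.WriteLine($"0x{pixel:X8}"); // 0xFF0080FF

// the same expression as in the loop, for x = 1 and y = 0: opaque green
var fromLoop = 0xff000000 | (uint)(1f * 255f) << 8 | (uint)(0f * 255f);
Console.WriteLine(fromLoop == PackRgba(0, 255, 0, 255)); // True
```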

We need to add some methods to our VkContext: methods to map and unmap memory so that we can read from it and write to it, and a way to transition our image layout, which is Undefined right now. First the two passthrough methods for memory mapping, which we just want for convenience. A serious project would use a memory allocator like VMA (support in Silk.NET is currently work in progress; one could import sunkin351/VMASharp, a port of the Vulkan Memory Allocator to C# using Silk.NET bindings (github.com), as a git submodule).

public Result MapMemory(DeviceMemory memory, ref void* pData) => _vk.MapMemory(_device, memory, 0, Vk.WholeSize, 0, ref pData);
public void UnmapMemory(DeviceMemory memory) => _vk.UnmapMemory(_device, memory);

Now we need a way to transition image layouts. First we create a single use CommandBuffer (which is used in many C++ Vulkan Tutorials, too).

BeginSingleTimeCommands allocates a CommandBuffer from our CommandPool and starts recording with Vk.BeginCommandBuffer. EndSingleTimeCommands ends recording, submits the CommandBuffer to our main queue, waits for the queue to become idle and frees the CommandBuffer.

public CommandBuffer BeginSingleTimeCommands()
{
    var allocInfo = new CommandBufferAllocateInfo
    {
        SType = StructureType.CommandBufferAllocateInfo,
        CommandPool = _commandPool,
        CommandBufferCount = 1,
        Level = CommandBufferLevel.Primary
    };
    _vk.AllocateCommandBuffers(_device, allocInfo, out var commandBuffer);
    var beginInfo = new CommandBufferBeginInfo
    {
        SType = StructureType.CommandBufferBeginInfo,
        Flags = CommandBufferUsageFlags.None
    };
    _vk.BeginCommandBuffer(commandBuffer, beginInfo);
    return commandBuffer;
}

public void EndSingleTimeCommands(CommandBuffer cmd)
{
    _vk.EndCommandBuffer(cmd);
    var submitInfo = new SubmitInfo
    {
        SType = StructureType.SubmitInfo,
        CommandBufferCount = 1,
        PCommandBuffers = &cmd
    };
    SubmitMainQueue(submitInfo, default);
    WaitForQueue();
    _vk.FreeCommandBuffers(_device, _commandPool, 1, cmd);
}

To transition the image layout we use an ImageMemoryBarrier. It needs the current and the new layout of our image, from which we derive the PipelineStageFlags (we will come to pipelines later) and AccessMasks. The provided code only handles some transitions.

public void TransitionImageLayout(Image image, ImageLayout oldLayout, ImageLayout newLayout)
{
    var cmd = BeginSingleTimeCommands();
    var range = new ImageSubresourceRange(ImageAspectFlags.ColorBit, 0, 1, 0, 1);
    var barrierInfo = new ImageMemoryBarrier
    {
        SType = StructureType.ImageMemoryBarrier,
        OldLayout = oldLayout,
        NewLayout = newLayout,
        Image = image,
        SubresourceRange = range,
    };
    
    //determining AccessMasks and PipelineStageFlags from layouts
    PipelineStageFlags srcStage;
    PipelineStageFlags dstStage;
    if (oldLayout == ImageLayout.Undefined && newLayout is ImageLayout.TransferDstOptimal or ImageLayout.TransferSrcOptimal)
    {
        barrierInfo.SrcAccessMask = 0;
        barrierInfo.DstAccessMask = AccessFlags.TransferWriteBit;
        srcStage = PipelineStageFlags.TopOfPipeBit;
        dstStage = PipelineStageFlags.TransferBit;
    }
    else  if (oldLayout == ImageLayout.Undefined && newLayout is ImageLayout.General)
    {
        barrierInfo.SrcAccessMask = 0;
        barrierInfo.DstAccessMask = AccessFlags.ShaderReadBit;
        srcStage = PipelineStageFlags.TopOfPipeBit;
        dstStage = PipelineStageFlags.ComputeShaderBit;
    }
    else if (oldLayout == ImageLayout.TransferDstOptimal && newLayout == ImageLayout.ShaderReadOnlyOptimal)
    {
        barrierInfo.SrcAccessMask = AccessFlags.TransferWriteBit;
        barrierInfo.DstAccessMask = AccessFlags.ShaderReadBit;
        srcStage = PipelineStageFlags.TransferBit;
        dstStage = PipelineStageFlags.FragmentShaderBit;
    }
    else if (oldLayout == ImageLayout.TransferSrcOptimal && newLayout == ImageLayout.ShaderReadOnlyOptimal)
    {
        barrierInfo.SrcAccessMask = AccessFlags.TransferReadBit;
        barrierInfo.DstAccessMask = AccessFlags.ShaderReadBit;
        srcStage = PipelineStageFlags.TransferBit;
        dstStage = PipelineStageFlags.FragmentShaderBit;
    }
    else if (oldLayout == ImageLayout.ShaderReadOnlyOptimal && newLayout == ImageLayout.TransferSrcOptimal)
    {
        barrierInfo.SrcAccessMask = AccessFlags.ShaderReadBit;
        barrierInfo.DstAccessMask = AccessFlags.TransferReadBit;
        srcStage = PipelineStageFlags.FragmentShaderBit;
        dstStage = PipelineStageFlags.TransferBit;
    }
    else if (oldLayout == ImageLayout.TransferDstOptimal && newLayout == ImageLayout.General)
    {
        barrierInfo.SrcAccessMask = AccessFlags.TransferWriteBit;
        barrierInfo.DstAccessMask = AccessFlags.ShaderReadBit;
        srcStage = PipelineStageFlags.TransferBit;
        dstStage = PipelineStageFlags.ComputeShaderBit;
    }
    else if (oldLayout == ImageLayout.TransferSrcOptimal && newLayout == ImageLayout.General)
    {
        barrierInfo.SrcAccessMask = AccessFlags.TransferReadBit;
        barrierInfo.DstAccessMask = AccessFlags.ShaderReadBit;
        srcStage = PipelineStageFlags.TransferBit;
        dstStage = PipelineStageFlags.ComputeShaderBit;
    }
    else if (oldLayout == ImageLayout.General && newLayout == ImageLayout.TransferSrcOptimal)
    {
        barrierInfo.SrcAccessMask = AccessFlags.ShaderReadBit;
        barrierInfo.DstAccessMask = AccessFlags.TransferReadBit;
        srcStage = PipelineStageFlags.ComputeShaderBit;
        dstStage = PipelineStageFlags.TransferBit;
    }
    else throw new Exception("Currently unsupported Layout Transition");
    
    _vk.CmdPipelineBarrier(cmd, srcStage, dstStage, 0, 0, null, 0, null, 1, barrierInfo);
    EndSingleTimeCommands(cmd);
}

We are almost ready to see our first image. We only need two more passthrough methods in VkContext: CopyBufferToImage and CopyImageToBuffer.

public void CopyBufferToImage(Buffer buffer, Image image, Extent3D imageExtent)
{
    var cmd = BeginSingleTimeCommands();
    var layers = new ImageSubresourceLayers(ImageAspectFlags.ColorBit, 0, 0, 1);
    var copyRegion = new BufferImageCopy(0, 0, 0, layers, default, imageExtent);
    _vk.CmdCopyBufferToImage(cmd, buffer, image, ImageLayout.TransferDstOptimal, 1, copyRegion);
    EndSingleTimeCommands(cmd);
}

public void CopyImageToBuffer(Image image, Buffer buffer, Extent3D imageExtent, ImageLayout layout)
{
    var cmd = BeginSingleTimeCommands();
    var layers = new ImageSubresourceLayers(ImageAspectFlags.ColorBit, 0, 0, 1);
    var copyRegion = new BufferImageCopy(0, 0, 0, layers, default, imageExtent);
    _vk.CmdCopyImageToBuffer(cmd, image, layout, buffer, 1, copyRegion);
    EndSingleTimeCommands(cmd);
}

After the for loop in Main (Program.cs) we now do the following:

  1. Map the buffer’s memory to a fixed memory location
  2. Copy the imageData array to the mapped memory location
  3. Unmap the memory
  4. Transition the image’s layout to be used as transfer destination
  5. Copy our buffer to our image
  6. Transition the image’s layout to General (which could be used by a compute shader)
  7. Destroy our buffer

The process will look quite simple as we have all our nice methods in VkContext.

void* mappedData = default;
ctx.MapMemory(buffer.Memory, ref mappedData);
fixed (void* pImageData = imageData)
    //System.Buffer is spelled out because Silk.NET.Vulkan also defines a Buffer type
    System.Buffer.MemoryCopy(pImageData, mappedData,
                             imageData.Length * sizeof(uint),
                             imageData.Length * sizeof(uint));
ctx.UnmapMemory(buffer.Memory);
ctx.TransitionImageLayout(image.Image, ImageLayout.Undefined, ImageLayout.TransferDstOptimal);
ctx.CopyBufferToImage(buffer.Buffer, image.Image, new Extent3D(500, 500, 1));
ctx.TransitionImageLayout(image.Image, ImageLayout.TransferDstOptimal, ImageLayout.General);
ctx.DestroyBuffer(buffer);

Our data now should be in GPU memory which we could prove if we had our UI up and running yet.

Writing Data to Disk

Assuming a compute shader had filled our image with data, we would now like to see the image. As our program has no GUI we save the contents of the image to disk. To do so we reverse the steps we used to copy the array to the image.

  1. Create a new buffer used as transfer destination
  2. Transition the image’s layout to be used as transfer source
  3. Copy the image to our new buffer
  4. Transition the image’s layout back to General (optional)
  5. Map the buffer’s memory
  6. Copy the buffer’s content to a new uint-array
  7. Unmap the buffer’s memory
  8. Destroy our new buffer
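
//steps 1-4: create the readback buffer and copy the image into it
var newBuffer = ctx.CreateBuffer(500*500*4, BufferUsageFlags.TransferDstBit, MemoryPropertyFlags.HostVisibleBit | MemoryPropertyFlags.HostCoherentBit);
ctx.TransitionImageLayout(image.Image, ImageLayout.General, ImageLayout.TransferSrcOptimal);
ctx.CopyImageToBuffer(image.Image, newBuffer.Buffer, new Extent3D(500, 500, 1), ImageLayout.TransferSrcOptimal);
ctx.TransitionImageLayout(image.Image, ImageLayout.TransferSrcOptimal, ImageLayout.General);
//steps 5-8: map the buffer, copy its content to the CPU and clean up
var newImageData = new uint[500 * 500];
ctx.MapMemory(newBuffer.Memory, ref mappedData);
fixed (void* pNewImageData = newImageData)
    System.Buffer.MemoryCopy(mappedData, pNewImageData,
                             newImageData.Length * sizeof(uint),
                             newImageData.Length * sizeof(uint));
ctx.UnmapMemory(newBuffer.Memory);
ctx.DestroyBuffer(newBuffer);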

Our image is now back from our GPU. For the sake of simplicity we use SkiaSharp (install via NuGet) to save it to disk:

var info = new SKImageInfo(500, 500, SKColorType.Rgba8888, SKAlphaType.Premul);
using var bmp = new SKBitmap();
//keep the array pinned until encoding is done, the bitmap only references the pixels
fixed (uint* pImageData = newImageData)
{
    bmp.InstallPixels(info, (nint) pImageData, info.RowBytes);
    using var fs = File.Create("./render.png");
    bmp.Encode(fs, SKEncodedImageFormat.Png, 100);
}

Figure: UV coordinates rendered by our CPU

And there is our image! Nice…but useless 😉

ComputePipeline

Compiling Shaders

Now we get to the good stuff. Shaders! Let’s create our compute shader “raytracing.comp”. I will put it into the assets/shaders folder in the project root. The empty shader looks like this:

#version 450

void main() {

}

Vulkan does not deal with text-based shaders, so we need to compile each shader we want to use into a binary format called SPIR-V (file extension .spv). Luckily the Vulkan SDK comes with a tool for this: glslc. To find out whether the Vulkan SDK is installed and your machine is able to find glslc, open a command prompt and type glslc.

> glslc
glslc: error: no input files

As compiling every file by hand is tedious, we automate this process using the csproj file of our project. Add this block of XML to your project file. Essentially it uses glslc to compile the shaders and copies all content of the “assets” folder into the output directory. Play around with it if you like 😉

<!--Shader compilation-->
<ItemGroup>
    <None Update="$(ProjectDir)\assets\**\*">
        <CopyToOutputDirectory>Always</CopyToOutputDirectory>
    </None>
    <ShaderDir Include="$(ProjectDir)\assets\shaders\**\*" />
    <CompiledShaders Include="$(ProjectDir)\assets\shaders\**\*.spv" />
</ItemGroup>

<Target Name="CleanCompiledShaders" AfterTargets="Clean">
    <Message Text="Clean compiled shaders \n@(CompiledShaders)" />
    <Delete Files="@(CompiledShaders)" />
</Target>

<Target Name="CompileShaders" BeforeTargets="ResolveReferences">
    <Message Text="Compile Shaders \n@(ShaderDir)" />
    <Exec Command="glslc &quot;%(ShaderDir.FullPath)&quot; -o &quot;%(ShaderDir.FullPath).spv&quot;" Condition="'%(ShaderDir.Extension)' != '.spv'" />
    <Message Text="Copy Shaders \n@(CompiledShaders)" />
    <ItemGroup>
        <None Include="@(CompiledShaders)">
            <Link>assets/shaders/%(Filename)%(Extension)</Link>
            <CopyToOutputDirectory>Always</CopyToOutputDirectory>
        </None>
    </ItemGroup>
</Target>
<!--/Shader compilation-->

ComputePipeline

To run a compute shader we have to set up a few things: a compute pipeline and descriptor sets. First we add a method to load a shader into Vulkan.

public ShaderModule LoadShaderModule(string filename)
{
    var shaderCode = File.ReadAllBytes(filename);
    fixed (byte* pShaderCode = shaderCode)
    {
        var createInfo = new ShaderModuleCreateInfo
        {
            SType = StructureType.ShaderModuleCreateInfo,
            CodeSize = (nuint) shaderCode.Length,
            PCode = (uint*)pShaderCode,
        };
        _vk.CreateShaderModule(_device, createInfo, null, out var module);
        return module;
    }
}
public void DestroyShaderModule(ShaderModule shaderModule) => _vk.DestroyShaderModule(_device, shaderModule, null);
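
Before handing a file to CreateShaderModule it can be worth checking that it really is compiled SPIR-V and not, for example, the GLSL source. A SPIR-V module is a stream of 32-bit words that starts with the magic number 0x07230203. The sketch below is an optional sanity check; LooksLikeSpirv is a hypothetical helper name, and it assumes the file was written in little-endian byte order, as glslc does on common desktop platforms.

```csharp
using System;
using System.Buffers.Binary;

// Cheap plausibility check for a compiled shader (hypothetical helper):
// the size must be a multiple of 4 and the first word must be the SPIR-V magic number.
static bool LooksLikeSpirv(byte[] code) =>
    code.Length >= 4 &&
    code.Length % 4 == 0 &&
    BinaryPrimitives.ReadUInt32LittleEndian(code) == 0x07230203u;

var header = new byte[] { 0x03, 0x02, 0x23, 0x07 };
Console.WriteLine(LooksLikeSpirv(header));                                   // True
Console.WriteLine(LooksLikeSpirv(new byte[] { 0x23, 0x76, 0x65, 0x72 }));    // False ("#ver" of GLSL source)
```

Inside LoadShaderModule one could throw an exception when the check fails, which gives a clearer error than a validation message from Vulkan.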

All we need now to create a compute pipeline is a PipelineLayout. As we want to use images in our compute shader, we need a way to give our pipeline some information about what we plan to do. So we will use a DescriptorSetLayout in our PipelineLayout.

public DescriptorSetLayout CreateDescriptorSetLayout(DescriptorSetLayoutBinding[] bindings)
{
    fixed (DescriptorSetLayoutBinding* pBindings = bindings)
    {
        var layoutCreateInfo = new DescriptorSetLayoutCreateInfo
        {
            SType = StructureType.DescriptorSetLayoutCreateInfo,
            BindingCount = (uint)bindings.Length,
            PBindings = pBindings
        };
        _vk.CreateDescriptorSetLayout(_device, layoutCreateInfo, null, out var setLayout);
        return setLayout;
    }
}
public void DestroyDescriptorSetLayout(DescriptorSetLayout setLayout) =>
    _vk.DestroyDescriptorSetLayout(_device, setLayout, null);

We can now create a DescriptorSetLayout so let’s also add the method to create a PipelineLayout.

public PipelineLayout CreatePipelineLayout(DescriptorSetLayout setLayout)
{
    var layoutInfo = new PipelineLayoutCreateInfo
    {
        SType = StructureType.PipelineLayoutCreateInfo,
        SetLayoutCount = 1,
        PSetLayouts = &setLayout
    };
    _vk.CreatePipelineLayout(_device, layoutInfo, null, out var pipelineLayout);
    return pipelineLayout;
}
public void DestroyPipelineLayout(PipelineLayout layout) => _vk.DestroyPipelineLayout(_device, layout, null);

The compute pipeline needs information about the PipelineLayout and our ShaderModule.

public Pipeline CreateComputePipeline(PipelineLayout layout, ShaderModule shaderModule)
{
    var entryPoint = "main";
    var pEntryPoint = (byte*)Marshal.StringToHGlobalAnsi(entryPoint);
    var stageInfo = new PipelineShaderStageCreateInfo
    {
        SType = StructureType.PipelineShaderStageCreateInfo,
        Stage = ShaderStageFlags.ComputeBit,
        Module = shaderModule,
        PName = pEntryPoint,
        Flags = PipelineShaderStageCreateFlags.None
    };
    var computeInfo = new ComputePipelineCreateInfo
    {
        SType = StructureType.ComputePipelineCreateInfo,
        Layout = layout,
        Stage = stageInfo
    };
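    _vk.CreateComputePipelines(_device, default, 1, computeInfo, null, out var pipeline);
    //free the unmanaged copy of the entry point name (Marshal is in System.Runtime.InteropServices)
    Marshal.FreeHGlobal((nint)pEntryPoint);
    return pipeline;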
}
public void DestroyPipeline(Pipeline pipeline) => _vk.DestroyPipeline(_device, pipeline, null);

DescriptorSets

DescriptorSets are used to make data usable in our shaders. They are allocated from a DescriptorPool. For detailed information about DescriptorSets see the Descriptor Sets chapter of the Vulkan Guide (vkguide.dev). We first need to create a DescriptorPool. The method takes an array of DescriptorPoolSize as its parameter, which we will fill in when we update Program.cs.

public DescriptorPool CreateDescriptorPool(DescriptorPoolSize[] poolSizes)
{
    fixed (DescriptorPoolSize* pPoolSizes = poolSizes)
    {
        var createInfo = new DescriptorPoolCreateInfo
        {
            SType = StructureType.DescriptorPoolCreateInfo,
            PoolSizeCount = (uint) poolSizes.Length,
            PPoolSizes = pPoolSizes,
            MaxSets = 1,
            Flags = DescriptorPoolCreateFlags.FreeDescriptorSetBit
        };
        _vk.CreateDescriptorPool(_device, createInfo, null, out var descriptorPool);
        return descriptorPool;
    }
}
public void DestroyDescriptorPool(DescriptorPool descriptorPool) =>
    _vk.DestroyDescriptorPool(_device, descriptorPool, null);

To allocate a DescriptorSet we need the DescriptorPool and the DescriptorSetLayout.

public DescriptorSet AllocateDescriptorSet(DescriptorPool pool, DescriptorSetLayout setLayout)
{
    var allocInfo = new DescriptorSetAllocateInfo
    {
        SType = StructureType.DescriptorSetAllocateInfo,
        DescriptorPool = pool,
        DescriptorSetCount = 1,
        PSetLayouts = &setLayout
    };
    _vk.AllocateDescriptorSets(_device, allocInfo, out var descriptorSet);
    return descriptorSet;
}

Time to update our Program.cs file. Comment out everything we wrote before; we will copy and paste parts of it later.

These are the steps we need:

  1. Create an instance of VkContext
  2. Create a DescriptorPool and a DescriptorSetLayout
  3. Allocate a DescriptorSet
  4. Load the ShaderModule and create a PipelineLayout
  5. Create the compute pipeline

using var ctx = new VkContext();

//pipeline creation
var poolSizes = new DescriptorPoolSize[] {new() {Type = DescriptorType.StorageImage, DescriptorCount = 1000}};
var descriptorPool = ctx.CreateDescriptorPool(poolSizes);
var binding = new DescriptorSetLayoutBinding
{
    Binding = 0,
    DescriptorCount = 1,
    DescriptorType = DescriptorType.StorageImage,
    StageFlags = ShaderStageFlags.ComputeBit
};
var setLayout = ctx.CreateDescriptorSetLayout(new[] {binding});
var descriptorSet = ctx.AllocateDescriptorSet(descriptorPool, setLayout);

var shaderModule = ctx.LoadShaderModule("./assets/shaders/raytracing.comp.spv");
var pipelineLayout = ctx.CreatePipelineLayout(setLayout);
var pipeline = ctx.CreateComputePipeline(pipelineLayout, shaderModule);

To connect an image to our DescriptorSet, we need to specify a DescriptorImageInfo. If we look into the definition of the DescriptorImageInfo struct we will find that we need an ImageView for our image. We will add methods to create and destroy one to our VkContext.

public ImageView CreateImageView(Image image, Format imageFormat)
{
    var createInfo = new ImageViewCreateInfo
    {
        SType = StructureType.ImageViewCreateInfo,
        Image = image,
        ViewType = ImageViewType.Type2D,
        Format = imageFormat,
        SubresourceRange =
        {
            AspectMask = ImageAspectFlags.ColorBit,
            BaseMipLevel = 0,
            BaseArrayLayer = 0,
            LevelCount = 1,
            LayerCount = 1
        }
    };
    _vk.CreateImageView(_device, createInfo, null, out var imageView);
    return imageView;
}
public void DestroyImageView(ImageView imageView) => _vk.DestroyImageView(_device, imageView, null);

In our Program.cs file, after the pipeline creation section, we can now create an Image with an ImageView. As before we need to transition the image into our layout of choice, which will be General. We will then define the DescriptorImageInfo for our Image.

//image creation
var image = ctx.CreateImage(500, 500, Format.R8G8B8A8Unorm, MemoryPropertyFlags.DeviceLocalBit, ImageUsageFlags.StorageBit | ImageUsageFlags.TransferDstBit | ImageUsageFlags.TransferSrcBit);
var imageView = ctx.CreateImageView(image.Image, Format.R8G8B8A8Unorm);
ctx.TransitionImageLayout(image.Image, ImageLayout.Undefined, ImageLayout.General);
var imageInfo = new DescriptorImageInfo
{
    ImageLayout = ImageLayout.General,
    ImageView = imageView
};
ctx.UpdateDescriptorSetImage(ref descriptorSet, imageInfo, DescriptorType.StorageImage, 0);

We have not defined the UpdateDescriptorSetImage method yet, so let's add it to VkContext.

public void UpdateDescriptorSetImage(ref DescriptorSet set, DescriptorImageInfo imageInfo, DescriptorType type,
    uint binding)
{
    var write = new WriteDescriptorSet
    {
        SType = StructureType.WriteDescriptorSet,
        DstSet = set,
        DstBinding = binding,
        DstArrayElement = 0,
        DescriptorCount = 1,
        PImageInfo = &imageInfo,
        DescriptorType = type
    };
    _vk.UpdateDescriptorSets(_device, 1, &write, 0, default);
}

Execution of compute shaders

We’re almost there! All setup should be done by now. We only need some passthrough methods for the remaining necessary steps:

  1. Bind the compute pipeline
  2. Bind the DescriptorSet
  3. Execute the compute shader using the Vk.CmdDispatch method

//execute compute shader
var cmd = ctx.BeginSingleTimeCommands();
ctx.BindComputePipeline(cmd, pipeline);
ctx.BindComputeDescriptorSet(cmd, descriptorSet, pipelineLayout);
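//the shader uses a local size of 32 x 32, so round up to 16 x 16 groups to cover 500 x 500 pixels
ctx.Dispatch(cmd, (500 + 31) / 32, (500 + 31) / 32, 1);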
ctx.EndSingleTimeCommands(cmd);
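
The group counts passed to Dispatch depend on the local size declared in the shader (32 x 32 later in this article). The following is a minimal sketch of that arithmetic, assuming a hypothetical helper named GroupCount that is not part of Silk.NET or VkContext. It rounds up so that an image edge which is not a multiple of the local size is still fully covered.

```csharp
using System;

// Hypothetical helper (not part of Silk.NET): number of workgroups needed to
// cover `size` invocations when each workgroup handles `localSize` of them.
static uint GroupCount(uint size, uint localSize) => (size + localSize - 1) / localSize;

const uint width = 500, height = 500;
const uint localSize = 32; // must match local_size_x / local_size_y in the shader

uint groupsX = GroupCount(width, localSize);
uint groupsY = GroupCount(height, localSize);

// Rounding up means the last group per axis extends past the image edge.
Console.WriteLine($"{groupsX} x {groupsY} groups, {groupsX * localSize} invocations per row");
```

Because of the rounding, a few invocations end up outside the image. A common precaution is to return early in the shader when gl_GlobalInvocationID.xy lies outside imageSize(resultImage).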

In VkContext:

public void BindComputePipeline(CommandBuffer cmd, Pipeline pipeline) => _vk.CmdBindPipeline(cmd, PipelineBindPoint.Compute, pipeline);
public void BindComputeDescriptorSet(CommandBuffer cmd, DescriptorSet set, PipelineLayout layout) =>
    _vk.CmdBindDescriptorSets(cmd, PipelineBindPoint.Compute, layout, 0, 1, set, 0, null);
public void Dispatch(CommandBuffer cmd, uint groupCountX, uint groupCountY, uint groupCountZ) =>
    _vk.CmdDispatch(cmd, groupCountX, groupCountY, groupCountZ);

We can now clean up all pipeline related objects, the Image and ImageView.

//destroy pipeline objects
ctx.DestroyDescriptorPool(descriptorPool);
ctx.DestroyDescriptorSetLayout(setLayout);
ctx.DestroyShaderModule(shaderModule);
ctx.DestroyPipelineLayout(pipelineLayout);
ctx.DestroyPipeline(pipeline);

//destroy image objects
ctx.DestroyImageView(imageView);
ctx.DestroyImage(image);

If we ran our program now, nothing noticeable would happen: the shader does not contain any code yet and we are not using the image at all. Luckily we already have code to write the image file in a commented-out section from before! We will copy and modify that code and place it before we destroy the Image and its ImageView.

//copy data using a staging buffer
var newImageData = new uint[500 * 500];
ctx.MapMemory(buffer.Memory, ref mappedData);
fixed (void* pNewImageData = newImageData)
    Buffer.MemoryCopy(mappedData, pNewImageData,
        newImageData.Length * sizeof(uint),
        newImageData.Length * sizeof(uint));
ctx.UnmapMemory(buffer.Memory);
ctx.DestroyBuffer(buffer);

//save image
var info = new SKImageInfo(500, 500, SKColorType.Rgba8888, SKAlphaType.Premul);
using var bmp = new SKBitmap();
using var fs = File.Create("./render.png");
fixed (uint* pImageData = newImageData)
{
    //the bitmap only references the pixels, so keep the array pinned until encoding is done
    bmp.InstallPixels(info, (nint) pImageData, info.RowBytes);
    bmp.Encode(fs, SKEncodedImageFormat.Png, 100);
}
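
When debugging the output it can help to look at single pixels of newImageData. The sketch below is an illustration only: Unpack is a hypothetical helper, and it assumes a little-endian CPU, where the first byte in memory (red for R8G8B8A8Unorm) ends up in the lowest 8 bits of each uint.

```csharp
using System;

// Hypothetical helper: split one packed R8G8B8A8 pixel into its channels.
// Assumes little-endian byte order (red in the lowest byte of the uint).
static (byte R, byte G, byte B, byte A) Unpack(uint pixel) =>
    ((byte)(pixel & 0xFF),
     (byte)((pixel >> 8) & 0xFF),
     (byte)((pixel >> 16) & 0xFF),
     (byte)(pixel >> 24));

// Example value: opaque pink, as vec4(1, 0, 1, 1) is stored in an rgba8 image
uint pink = 0xFFFF00FF;
var (r, g, b, a) = Unpack(pink);
Console.WriteLine($"R={r} G={g} B={b} A={a}");
```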

We are done with the C# side now!

Writing a Compute Shader

Creating a white image

Our compute shader is pretty empty right now. After the version directive we define our workgroup size, which is 32 x 32 = 1024 invocations per workgroup, and a binding on slot 0 for our image (image2D).

layout (local_size_x = 32, local_size_y = 32, local_size_z = 1) in;
layout(binding = 0, rgba8) uniform writeonly image2D resultImage;

We now want to store some information in our image. Adding one line to our main function gives us a boring white image. But hey: we made the GPU create it! 🎉 imageStore is a built-in GLSL function which stores data in our image at a given texel; here we pass the coordinate of the current invocation (gl_GlobalInvocationID.xy).

#version 450

layout (local_size_x = 32, local_size_y = 32, local_size_z = 1) in;
layout(binding = 0, rgba8) uniform writeonly image2D resultImage;

void main() { 
    imageStore(resultImage, ivec2(gl_GlobalInvocationID.xy), vec4(1));
}

Calculate UV Coordinates

A white image is kind of boring, so let’s recreate our image from before. We can get the image's size by using the imageSize function, which returns an ivec2 (a vector of two ints). We convert it to floats (vec2) and divide gl_GlobalInvocationID.xy by it.

#version 450

layout (local_size_x = 32, local_size_y = 32, local_size_z = 1) in;
layout(binding = 0, rgba8) uniform writeonly image2D resultImage;

void main() {
    vec2 imageSize = vec2(imageSize(resultImage));
    vec2 uv = (gl_GlobalInvocationID.xy) / imageSize.xy;
    imageStore(resultImage, ivec2(gl_GlobalInvocationID.xy), vec4(uv, 0, 1));
}

This results in the following image:

UV coordinates rendered by our GPU

Our first Sphere

We now want to render our first sphere. I will not go into detail about the math as there are plenty of good resources on it. To name two of them:

  1. Ray Tracing in One Weekend
  2. https://youtu.be/4NshnkzOdI0?si=jjxeIGrv7wy-s7dw & https://youtu.be/v9vndyfk2U8?si=j8SHadDZe5mTt6ZM (TheCherno)

We will first remap our UV coordinates from the range 0 to 1 to the range -1 to 1 by multiplying by 2 and subtracting 1.

In a function “PerPixel” we will solve the quadratic equation used to determine whether we hit the sphere or not. We also need a ray struct which holds two vec3: one for the origin, one for the direction. We position our “camera” (it's not really a camera system yet) one unit away from the origin at z = 1 and shoot its ray through the transformed UV coordinate in the -z direction.

struct ray {
    vec3 origin;
    vec3 direction;
};

vec3 PerPixel(vec2 coord) {
    ray ray = ray(vec3(0,0,1), vec3(coord, -1));
    float sphereRadius = 0.5;
    //solve sphere equation

    //return black
    return vec3(0); 
}
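
For reference, here is a short sketch of where the quadratic comes from. It assumes the sphere is centered at the origin with radius $r$; the symbols $O$ and $D$ stand for ray.origin and ray.direction.

```latex
% Point on the ray and sphere equation
P(t) = O + tD, \qquad P \cdot P = r^2

% Substituting P(t) gives a quadratic in t
(D \cdot D)\,t^2 + 2\,(O \cdot D)\,t + (O \cdot O - r^2) = 0

% Coefficients and discriminant
a = D \cdot D, \quad b = 2\,(O \cdot D), \quad c = O \cdot O - r^2, \qquad \Delta = b^2 - 4ac

% Real solutions exist only for \Delta \ge 0; the nearer hit uses the minus sign
t = \frac{-b \pm \sqrt{\Delta}}{2a}
```

A negative discriminant means the ray misses the sphere, which is why checking its sign is enough to decide whether a pixel gets colored.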

All we need now is to solve the equation and return pink if we hit the sphere.

float a = dot(ray.direction, ray.direction);

float b = 2 * dot(ray.origin, ray.direction);

float c = dot(ray.origin, ray.origin) - sphereRadius * sphereRadius;

float discriminant = b * b - 4 * a * c;
if(discriminant >= 0) return vec3(1, 0, 1); //pink

An image

To make it a bit more interesting we calculate the normal vector for our hit.

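if(discriminant >= 0) {
    //we hit! take the nearer of the two intersections
    float t = (-b - sqrt(discriminant)) / (2 * a);
    vec3 rayAt = ray.origin + t * ray.direction;
    //the sphere sits at the origin, so the hit point itself points along the normal
    return normalize(rayAt);
}

More interesting sphere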

After some reorganization our shader looks like this:

#version 450

layout (local_size_x = 32, local_size_y = 32, local_size_z = 1) in;
layout(binding = 0, rgba8) uniform writeonly image2D resultImage;

struct ray {
    vec3 origin;
    vec3 direction;
};

struct sphere {
    vec3 center;
    float radius;
};

bool hitSphere(sphere sphere, ray ray, inout vec3 normal){
    //solve sphere equation
    vec3 origin = ray.origin - sphere.center;
    
    float a = dot(ray.direction, ray.direction);
    float b = 2 * dot(origin, ray.direction);
    float c = dot(origin, origin) - sphere.radius * sphere.radius;

    float discriminant = b * b - 4 * a * c;
    if(discriminant < 0) return false;
    
    //calculate hitpoint and normal (nearer of the two intersections)
    float t = (-b - sqrt(discriminant)) / (2 * a);
    vec3 hitPoint = ray.origin + t * ray.direction;
    normal = normalize(hitPoint - sphere.center);
    return true;
}

vec3 PerPixel(vec2 coord) {
    ray ray = ray(vec3(0,0,1), vec3(coord, -1));
    sphere sphere = sphere(vec3(0,0,0), 0.5);

    //if sphere is hit return its normal
    vec3 normal;    
    if (hitSphere(sphere, ray, normal)) {
        return normal;
    }
    
    //return black
    return vec3(0);
}

void main() {
    vec2 imageSize = vec2(imageSize(resultImage));
    
    //calculate uv coords
    vec2 uv = (gl_GlobalInvocationID.xy) / imageSize.xy;
    uv = uv * 2 - 1; //map -1 -> 1
    uv = vec2(uv.x, uv.y);
    
    vec3 color = PerPixel(uv);    
    color = clamp(color, 0, 1);
    
    imageStore(resultImage, ivec2(gl_GlobalInvocationID.xy), vec4(color, 1));
}
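
To check the intersection math without a GPU, it can be mirrored on the CPU. The following is a minimal sketch using System.Numerics; HitSphere is a hypothetical C# counterpart of the shader function, written with the standard nearest-root formula t = (-b - sqrt(discriminant)) / (2a) and the outward normal hitPoint - center.

```csharp
using System;
using System.Numerics;

// CPU counterpart of the shader's hitSphere (illustrative only).
// Returns false when the ray misses the sphere.
static bool HitSphere(Vector3 center, float radius, Vector3 rayOrigin, Vector3 rayDirection, out Vector3 normal)
{
    normal = default;
    var origin = rayOrigin - center;

    float a = Vector3.Dot(rayDirection, rayDirection);
    float b = 2 * Vector3.Dot(origin, rayDirection);
    float c = Vector3.Dot(origin, origin) - radius * radius;

    float discriminant = b * b - 4 * a * c;
    if (discriminant < 0) return false;

    // nearer of the two intersections along the ray
    float t = (-b - MathF.Sqrt(discriminant)) / (2 * a);
    var hitPoint = rayOrigin + t * rayDirection;
    normal = Vector3.Normalize(hitPoint - center);
    return true;
}

// Center pixel: uv = (0, 0), camera at z = 1 looking down -z
bool hit = HitSphere(Vector3.Zero, 0.5f, new Vector3(0, 0, 1), new Vector3(0, 0, -1), out var n);
Console.WriteLine($"center hit={hit}, normal={n}");

// Corner pixel: uv = (1, 1) passes outside the sphere
bool cornerHit = HitSphere(Vector3.Zero, 0.5f, new Vector3(0, 0, 1), new Vector3(1, 1, -1), out _);
Console.WriteLine($"corner hit={cornerHit}");
```

For the center pixel the normal points straight back at the camera along +z, which the shader writes out as blue; the corner ray has a negative discriminant and stays black.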

About Jens

Hi! I'm Jens, a Doctor of Natural Sciences! As a doctoral student in inorganic chemistry at Technische Universität Braunschweig, I explored the fascinating world of porphyrinoids in the research group of Prof. Dr. Martin Bröring. These structures, inspired by nature, are behind vital molecules such as heme, the red blood pigment, and chlorophyll, the green plant pigment. Besides science, software development is one of my interests. My journey began with an early fascination for software development, sparked at school by a programmable calculator. During my time as a doctoral student I combined my research with the development of software to advance scientific findings.
