Alifatisk a day ago

Very cool project, and AMD graphics cards deserve this kind of work! Very well done. May I ask, is there any reason why one would focus on a single type of graphics card instead of relying on a library that works for other variants too? Is it because you get more fine-grained control that you would otherwise lose at a higher abstraction level?

  • pxl-th a day ago

    Thanks!

    > May I ask, is there any reason why one would focus on a single type of graphics card instead of relying on a library that works for other variants too?

    AMDGPU.jl is actually one of several GPU backends supported by Julia. We also support CUDA, Metal, Intel (oneAPI) and OpenCL to varying degrees: https://github.com/JuliaGPU

    Each GPU backend implements a common array interface and a way to compile Julia code into low-level kernels, relying on the shared GPUCompiler.jl infrastructure: https://github.com/JuliaGPU/GPUCompiler.jl
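
    As a rough sketch (a hypothetical snippet, not taken from any of these packages), the common array interface means the same high-level code runs on any backend; only the array type changes:

      using AMDGPU  # or: using CUDA, Metal, oneAPI

      x = ROCArray(rand(Float32, 1024))   # CuArray(...) on Nvidia, MtlArray(...) on Apple
      y = sin.(x) .+ 2f0 .* x             # broadcasting compiles down to a GPU kernel
      sum(y)                              # reductions and other array operations work on every backend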

    Once that is done, users can write code and low-level kernels (using KernelAbstractions.jl) in a backend-agnostic manner.
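
    For example, a minimal backend-agnostic kernel with KernelAbstractions.jl might look like this (a sketch assuming a recent KernelAbstractions.jl release; the kernel name axpy! is just illustrative):

      using KernelAbstractions

      # One kernel definition that runs on the CPU and on every GPU backend.
      @kernel function axpy!(y, a, x)
          i = @index(Global, Linear)
          @inbounds y[i] = a * x[i] + y[i]
      end

      x = rand(Float32, 1024)           # swap in ROCArray/CuArray to run on a GPU
      y = rand(Float32, 1024)
      backend = get_backend(y)          # CPU() here; ROCBackend()/CUDABackend() for GPU arrays
      axpy!(backend)(y, 2f0, x; ndrange = length(y))
      KernelAbstractions.synchronize(backend)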

    Here are some examples of packages that target multiple GPU backends in this way:

    - Real-time Gaussian splatting, supporting AMD and Nvidia GPUs (and probably others with minor work): https://github.com/JuliaNeuralGraphics/GaussianSplatting.jl

    - AcceleratedKernels.jl, a standard-library-style collection of GPU algorithms: https://github.com/JuliaGPU/AcceleratedKernels.jl

    - NNop.jl, which implements Flash Attention and other fused NN kernels: https://github.com/pxl-th/NNop.jl

    - Flux.jl, a deep-learning library: https://github.com/FluxML/Flux.jl

  • jamiejquinn 18 hours ago

    OP answered the Julia-specific part, but I'll chime in with a solid, language-agnostic yes: there are still reasons a GPU dev might want to target specific hardware. GPU hardware is currently more vendor-specific than CPU hardware. Off the top of my head, major differences include specialised hardware like Nvidia's tensor cores, differences in common hardware like cache and register sizes, and niche features like Nvidia's combined Grace Hopper machines or AMD's CPU-GPU hybrid MI300A.

    Sophisticated high-level approaches (like Julia's) will be able to utilise or mitigate some of the differences, but I don't think we're fully vendor-agnostic quite yet (and probably won't ever be at the evolving cutting edge).

    • pxl-th 16 hours ago

      Definitely, with backend-agnostic code you can only target a common set of features. It's convenient to use where it makes sense, as it reduces complexity: one kernel for all backends. And you can actually go a long way with this without sacrificing too much performance.

      But for squeezing out maximum performance and using the latest features, you have to target each device individually.