Test: CUDA Volume Raycasting



  • CUDA-based application demonstrating GPU volume raycasting using the single-pass technique of Stegmaier et al.



The CUDA raycasting implementation follows the same approach as Stegmaier's, with some minor changes and simplifications:

  • CUDA kernel
  • The volume is a cube: width, height and depth are all the same.
  • Proxy geometry (the six quads of the cube) is normalized and translated so that its center coincides with the center of the scene.
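
The two simplifications above (a cubic volume and a centered, normalized proxy cube) make the mapping from a point on the proxy geometry to a voxel very short. The following is a minimal sketch, not the code from this project: it assumes that "normalized" means a cube of side 1 spanning [-0.5, 0.5] on each axis, and that the volume is stored as a flat array with x varying fastest. The names cubeToTexCoord, texCoordToVoxel and voxelOffset are hypothetical.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>

struct Vec3 { float x, y, z; };

// Assumption: the proxy cube spans [-0.5, 0.5]^3 and is centered at the origin,
// so shifting by 0.5 yields texture coordinates in [0, 1]^3.
Vec3 cubeToTexCoord(const Vec3& p)
{
    return Vec3{ p.x + 0.5f, p.y + 0.5f, p.z + 0.5f };
}

// Nearest-lower voxel index along one axis for a volume with n voxels per side.
// The result is clamped so that t == 1.0 or slightly out-of-range values stay valid.
int texCoordToVoxel(float t, int n)
{
    assert(n > 0);
    const int i = static_cast<int>(t * static_cast<float>(n));
    return std::min(std::max(i, 0), n - 1);
}

// Linear offset into a flat n*n*n array (assumed layout: x fastest, then y, then z).
std::size_t voxelOffset(const Vec3& p, int n)
{
    const Vec3 t = cubeToTexCoord(p);
    const std::size_t ix = static_cast<std::size_t>(texCoordToVoxel(t.x, n));
    const std::size_t iy = static_cast<std::size_t>(texCoordToVoxel(t.y, n));
    const std::size_t iz = static_cast<std::size_t>(texCoordToVoxel(t.z, n));
    const std::size_t side = static_cast<std::size_t>(n);
    return ix + side * (iy + side * iz);
}
```

Because width, height and depth are equal, a single size n is enough; a non-cubic volume would need one size per axis.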

The intersection code is unoptimized. It is a straightforward implementation of the ray-AABB intersection function shown in “Real-Time Collision Detection” by Christer Ericson. I suggest looking at the implementation of this function in the NVIDIA SDK volume rendering sample.
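
For readers unfamiliar with this kind of test, here is a minimal sketch of a slab-based ray/AABB intersection in plain C++. It follows the general slab idea only; it is not the code from this project, from the book, or from the NVIDIA sample. The Vec3 type, the function name intersectRayAABB and the parallel-ray epsilon are assumptions made for illustration.

```cpp
#include <algorithm>
#include <cmath>
#include <limits>
#include <utility>

struct Vec3 { float x, y, z; };

// Returns true if the ray origin + t * dir (t >= 0) hits the box [boxMin, boxMax].
// On a hit, tNear and tFar receive the entry and exit distances along the ray.
bool intersectRayAABB(const Vec3& origin, const Vec3& dir,
                      const Vec3& boxMin, const Vec3& boxMax,
                      float& tNear, float& tFar)
{
    const float o[3]  = { origin.x, origin.y, origin.z };
    const float d[3]  = { dir.x, dir.y, dir.z };
    const float lo[3] = { boxMin.x, boxMin.y, boxMin.z };
    const float hi[3] = { boxMax.x, boxMax.y, boxMax.z };

    float tMin = 0.0f;                               // start of the ray
    float tMax = std::numeric_limits<float>::max();  // no upper limit yet

    for (int i = 0; i < 3; ++i) {
        if (std::fabs(d[i]) < 1e-8f) {
            // Ray is parallel to this slab: it misses if the origin lies outside it.
            if (o[i] < lo[i] || o[i] > hi[i]) return false;
        } else {
            const float inv = 1.0f / d[i];
            float t1 = (lo[i] - o[i]) * inv;  // distance to one slab plane
            float t2 = (hi[i] - o[i]) * inv;  // distance to the other
            if (t1 > t2) std::swap(t1, t2);
            tMin = std::max(tMin, t1);        // latest entry over all slabs
            tMax = std::min(tMax, t2);        // earliest exit over all slabs
            if (tMin > tMax) return false;    // intervals do not overlap
        }
    }
    tNear = tMin;
    tFar  = tMax;
    return true;
}
```

In a ray caster, tNear and tFar would bound the sampling loop along the ray. With the centered, normalized cube described earlier, boxMin and boxMax would be fixed constants, assuming a unit-sized cube.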


  • Left click plus mouse movement to rotate the camera around the volume.
  • Right click to show the context menu.
  • Middle click plus vertical movement to increase/decrease the density window width.
  • Middle click plus horizontal movement to move right/left the density window.
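
The density window controls can be pictured as a center and a width on the density axis. The sketch below is an assumption about how such a window might work, since the post does not describe the actual mapping: it uses a linear ramp across the window, densities normalized to [0, 1], and a drag where horizontal pixels move the center and vertical pixels change the width. DensityWindow, applyWindow, dragWindow and unitsPerPixel are hypothetical names.

```cpp
#include <algorithm>
#include <cassert>

// Hypothetical window description; not taken from the project's source code.
struct DensityWindow {
    float center;  // position of the window on the normalized density axis
    float width;   // extent of the window; must stay positive
};

// Map a normalized density to [0, 1] with a linear ramp across the window.
// Densities below the window map to 0, densities above it map to 1.
float applyWindow(float density, const DensityWindow& w)
{
    assert(w.width > 0.0f);
    const float lower = w.center - 0.5f * w.width;
    const float t = (density - lower) / w.width;
    return std::min(std::max(t, 0.0f), 1.0f);
}

// Update the window from a mouse drag measured in pixels.
// Assumption: dx moves the window right/left, dy widens or narrows it.
void dragWindow(DensityWindow& w, int dx, int dy, float unitsPerPixel)
{
    const float minWidth = 1e-3f;  // keep the width positive
    w.center += static_cast<float>(dx) * unitsPerPixel;
    w.width = std::max(w.width + static_cast<float>(dy) * unitsPerPixel, minWidth);
}
```

A narrow window gives a steep ramp and therefore high contrast over a small density range; a wide window spreads the ramp over more of the data.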


  • C++
  • OpenGL 2.1
  • CUDA
  • GLUT for Win32
  • GLEW
  • Microsoft Visual Studio 2008 Professional Edition
  • Subversion
  • Redmine

Development/Build/Test Machine Specs

  • Intel(R) Core(TM) i7 CPU 920 @ 2.67GHz
  • 3072 MB RAM
  • Nvidia GeForce GTX 260
  • Nvidia Driver Version: 196.21 WHQL
  • Microsoft Windows 7 Professional x64


11 Responses to “Test: CUDA Volume Raycasting”

  1. stefanbanev says:

    Thanks for sharing the capabilities of CUDA for volume ray casting; it is quite revealing. You may get incomparably better performance from a CPU-based volumetric ray caster on the “i7 CPU 920”; however, it is a good exercise for learning CUDA's limitations. Texture mapping of small, dynamically prefetched 3D texture bricks with adaptive sampling density per brick is more suitable for CUDA. Anyway, good luck…

  2. Ruben Penalva says:

    Hi Stefan,
    thanks for posting your comment, I really appreciate it.

    This implementation is just a test, focused not on speed but on using CUDA with something I’m familiar with. You can expect really poor times. In the future I will try a pure CPU ray casting algorithm (multithreading + SIMD).

    Do you have experience with CT reconstruction and volume rendering?

    Ruben Penalva

  3. stefanbanev says:

    > experience with CT reconstruction and volume rendering?

    A lot with VR; I’m not at liberty to share it, though…

    Good luck with your experiments with VR/CUDA.


  4. DrManhatten says:

    Don’t believe the hype. The memory access and caching behavior for a realistically sized volume data set would be terrible on a CPU unless you align your data structures very well.

    Texture mapping (+ free filtering) and extremely fast memory and memory caching are what make a GPU run very fast.
    Even the i7 has nothing to compare with that, especially if you have to do trilinear filtering. This is what GPUs are best at. CUDA, however, is a stupid choice for implementing a raycaster since it does not give you any advantages.

    A CPU implementation might be closer in terms of speed when it comes down to isosurface

  5. Stefan says:

    >Texture mapping (+ free filtering) and extremely
    >fast memory and memory caching are what
    >make a GPU run very fast. Even the i7 has
    >nothing to compare with that.

    For texture mapping this statement is totally accurate, since TM can be implemented effectively on a SIMD machine.

    >Even the i7 has nothing to compare with that.

    Volumetric ray casting does not map well to SIMD, and the best CPU implementations badly beat (by an order of magnitude) the best GPU VR ray caster. If you would like, I can provide links so you can compare the best from the two camps side by side, provided you have sufficient hardware and accreditation with a respected research facility or university; otherwise you will have to wait until it becomes mainstream.

  6. Jean Luck says:

    I have:

    -Windows Vista 32-bit
    -NVIDIA GeForce 9300M GS
    -Visual Studio 2008
    I have this problem when I try to run the project (Debug Win32):
    Unhandled exception at 0x760142eb in CUDARayCasting.exe: Microsoft C++ exception: char at memory location 0x001df474..

    Can you help me, please?

  7. Ruben Penalva says:

    Hi Jean,
    try setting the “Working Directory” of the project to “$(OutDir)”. Thanks! :)

    Ruben Penalva

  8. Jean Luck says:

    It doesn’t work :(

    I still have the same problem. I think it depends on the variable SIZE, which is set to 4294967295.

    How can I resolve this kind of problem, please?

  9. Jun says:

    I get this problem: unhandled exception: Microsoft C++ exception: char in memory 0x0024ee7c.
    I use Win7 and VS2008; my CPU is an i7 and my graphics card is a GTX570.
    How can I solve it?
    Thanks so much.

  10. Manikandtan C K says:

    Hi, I tried building the code. I have CUDA 5.5 and the other required software, but I am getting these two link errors:
    1>kernel.cu.obj : error LNK2019: unresolved external symbol _cudaMalloc3DArray@20 referenced in function _CreateCUDAVolume
    1>kernel.cu.obj : error LNK2019: unresolved external symbol _cudaMallocArray@16 referenced in function _CreateCUDATransferFunction
    fatal error LNK1120: 2 unresolved externals

    I have checked the linker options for CUDA and found the path to be correct. I am using Visual Studio 2008.
    What am I doing wrong?

  11. Manikandtan C K says:

    Oh and to add to the earlier post, I can run the precompiled binaries without issues. Really need help on this one.