It was a question that inspired me to start programming fishpye (also known as raycl): what would it look like to be able to see in all directions at once?

The answer, it seems, is quite weird:

The project is written in Python and OpenCL. Enj did a fantastic series of PyOpenCL tutorials that are worth reading over if you’re curious.

If you want to give it a try, install Python and the OpenCL drivers for your graphics card. You will need a beefy graphics card; I used an NVIDIA GTX 570.

Ray-tracing and rasterization

The most common algorithm for rendering 3D graphics is rasterization. Graphics cards can execute it fast enough to update the screen in real time, mainly because it projects 3D vertices onto a 2D plane using really fast matrix operations. It’s also an embarrassingly parallel problem.

Another algorithm for rendering 3D graphics is ray-tracing. It is usually used to produce realistic-looking images, which follows naturally from the way it works: it essentially simulates each ray of light in reverse. A ray is traced from the eye of the viewer outwards until it hits an object. At that point, more rays are generated for shadows, reflections, and so on. Ray-tracing is not used frequently in games, because some renders can take hours or days to complete (for a single frame!), depending on the resolution and the complexity of the scene.

There’s another difference between rasterization and ray-tracing that I think doesn’t get enough attention. Rasterization is stuck with a linear transformation that projects 3D scenes onto a plane, while ray-tracing can project scenes onto any surface. If the camera in the scene were a complete sphere, the field of view (FOV) could extend to a full 360 degrees, yielding the desired fisheye effect.
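To make the spherical camera concrete, here is a minimal sketch of how a pixel could be turned into a ray direction. The post does not say which mapping fishpye uses, so the equirectangular layout (horizontal position as yaw, vertical position as pitch), the axis convention, and the function name `pixel_to_ray` are all assumptions made for illustration.

```python
import math

def pixel_to_ray(px, py, width, height, fov_h=2 * math.pi, fov_v=math.pi):
    """Map a pixel to a unit ray direction on a spherical camera.

    Assumed convention: +z is straight ahead, +y is up, and the image is an
    equirectangular unwrapping of the sphere around the camera.
    """
    # Normalised coordinates in (-0.5, 0.5), sampled at pixel centres.
    u = (px + 0.5) / width - 0.5
    v = (py + 0.5) / height - 0.5
    yaw = u * fov_h       # angle around the vertical axis
    pitch = -v * fov_v    # angle above the horizon (top of the image is up)
    x = math.cos(pitch) * math.sin(yaw)
    y = math.sin(pitch)
    z = math.cos(pitch) * math.cos(yaw)
    return (x, y, z)

# The middle of the image looks forward, the left and right edges look backward.
forward = pixel_to_ray(0, 0, 1, 1)
behind = pixel_to_ray(0, 256, 512, 512)
```

With the default 360 by 180 degree field of view every direction around the camera lands somewhere in the image, which is something a single planar projection cannot do.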

Memory considerations

While ray-tracing is an embarrassingly parallel problem, a GPU implementation fast enough to render in real time has not been written yet (to my knowledge). As I understand it, the problem is that to parallelize it, each thread on the GPU has to access the same regions of memory to check which objects exist in a given volume. This can become a major bottleneck. Rasterization does not have this problem because the vertex shader simply computes a function of the vertex coordinates; it does not have to perform random access on shared memory.

In OpenCL, the memory model is a little different from what you find in typical sequential-programming memory models. There are four address spaces:

  • Global, which every work-item (thread) can read and write in its entirety. Random access performance is typically poor, especially when multiple work-items need access to the same bytes of global memory.

  • Constant, which every work-item can read, but not write, in its entirety. Random read performance is good, but you usually only have a very small amount of space (32-64 KB on high-end graphics cards).

  • Local, which is shared between groups of work-items.

  • Private, which is not shared between work-items.

To get good performance, I decided to shove the entire scene description into constant memory, which required a compact representation.

In practice, I went with a format that is far from compact, a voxel grid, for the sake of simplifying the implementation. But by limiting the dimensions of the scene I could comfortably fit it within the 32 KB limit.
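The arithmetic behind that limit is easy to check. The sketch below assumes one byte per voxel and a 32 x 32 x 16 grid; neither figure comes from fishpye itself, and the helper names are hypothetical.

```python
CONSTANT_LIMIT = 32 * 1024  # the 32 KB constant-memory budget, in bytes

def grid_bytes(dims, bytes_per_voxel=1):
    """Size of a dense voxel grid (one byte per voxel is an assumption)."""
    nx, ny, nz = dims
    return nx * ny * nz * bytes_per_voxel

def flat_index(x, y, z, dims):
    """Position of voxel (x, y, z) in a flat buffer, x varying fastest."""
    nx, ny, _ = dims
    return x + nx * (y + ny * z)

dims = (32, 32, 16)                   # hypothetical scene dimensions
grid = bytearray(grid_bytes(dims))    # all voxels start empty (0)
grid[flat_index(3, 4, 5, dims)] = 1   # mark one voxel as solid

fits = grid_bytes(dims) <= CONSTANT_LIMIT
```

A dense grid grows with the cube of its side length: 32 x 32 x 32 one-byte voxels already use the full 32 KB, and 64 x 64 x 64 would need eight times that.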

Ray-casting and lighting hack

For the actual ray-tracing algorithm, I implemented Amanatides and Woo’s voxel traversal algorithm (PDF) to find the first objects that the rays from the camera hit. I didn’t implement secondary rays (shadow, lighting, reflection rays). Instead, I opted to treat the camera as a light source so that the same ray used to find an object could light it. This means surfaces angled away from the camera appear darker than the ones angled towards it. The effect is quite interesting.
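The traversal and the camera-as-light trick can be sketched in plain Python as follows. This is not the OpenCL kernel from fishpye: the set-of-tuples scene, the unit-sized voxels, the function names, and the cosine (Lambert-style) brightness formula are assumptions chosen to keep the example short, and the ray origin is assumed to lie inside the grid.

```python
import math

def traverse(solid, dims, origin, direction, max_steps=256):
    """Step a ray through unit voxels in the style of Amanatides and Woo.

    Returns (voxel, normal) for the first solid voxel, or None on a miss.
    `solid` is a set of (x, y, z) tuples; `origin` must be inside the grid.
    """
    pos = [int(math.floor(c)) for c in origin]
    step, t_max, t_delta = [], [], []
    for o, d, p in zip(origin, direction, pos):
        if d > 0:
            step.append(1)
            t_max.append((p + 1 - o) / d)   # distance to the next boundary
            t_delta.append(1 / d)           # distance between boundaries
        elif d < 0:
            step.append(-1)
            t_max.append((p - o) / d)
            t_delta.append(-1 / d)
        else:
            step.append(0)
            t_max.append(math.inf)          # never crosses this axis
            t_delta.append(math.inf)
    normal = (0, 0, 0)
    for _ in range(max_steps):
        if not all(0 <= pos[i] < dims[i] for i in range(3)):
            return None                     # the ray left the scene
        if tuple(pos) in solid:
            return tuple(pos), normal
        axis = t_max.index(min(t_max))      # nearest boundary decides the step
        pos[axis] += step[axis]
        t_max[axis] += t_delta[axis]
        n = [0, 0, 0]
        n[axis] = -step[axis]               # the face just crossed points back
        normal = tuple(n)
    return None

def headlight(direction, normal):
    """Brightness when the camera is the light; `direction` is unit length."""
    return max(0.0, -sum(d * n for d, n in zip(direction, normal)))

hit = traverse({(5, 2, 2)}, (8, 8, 8), (0.5, 2.5, 2.5), (1.0, 0.0, 0.0))
brightness = headlight((1.0, 0.0, 0.0), hit[1])
```

Because the light direction is the ray direction, no extra rays are needed: a face hit head-on gets full brightness, and the value falls off as the face turns away from the camera.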

Conclusion and future improvements

A lot could be improved, really. All the decisions I made were to simplify implementation. If I cared more about efficiency or features, things would have gone differently. Still, this was enough to get a feel for what 360-degree vision would be like at 200 FPS and 512x512 resolution.

A sparse voxel octree would pack the scene description into constant memory more efficiently, at the cost of greater cognitive overhead.
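As a rough illustration of where the savings come from, the sketch below builds an octree that stores a uniform cube as a single leaf. It is a pointer-style Python structure for clarity only; a version meant for constant memory would have to be flattened into an array. The function names and the set-of-tuples input are hypothetical.

```python
def build_octree(solid, origin, size):
    """Return True or False for a uniform cube, or a list of 8 children.

    `solid` is a set of (x, y, z) tuples and `size` must be a power of two.
    """
    if size == 1:
        return origin in solid
    half = size // 2
    ox, oy, oz = origin
    children = [
        build_octree(solid, (ox + dx * half, oy + dy * half, oz + dz * half), half)
        for dz in (0, 1) for dy in (0, 1) for dx in (0, 1)
    ]
    first = children[0]
    # Collapse eight identical leaves into a single leaf.
    if isinstance(first, bool) and all(c is first for c in children):
        return first
    return children

def count_nodes(node):
    """Number of nodes (leaves and branches) in the tree."""
    if isinstance(node, bool):
        return 1
    return 1 + sum(count_nodes(c) for c in node)

tree = build_octree({(0, 0, 0)}, (0, 0, 0), 8)
nodes = count_nodes(tree)
```

For an 8 x 8 x 8 scene containing a single solid voxel, the tree needs 25 nodes where the dense grid stores 512 voxels. The price is that every lookup has to walk the tree from the root.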

Fisheye Quake used multiple rasterized images, one for each face of a cube surrounding the camera, and stitched them together on the CPU. One particular improvement to fishpye would be to ditch the voxels and use a technique based on this. Perhaps the rasterized images could be stitched together on the GPU by ray-tracing them. This would remove the 32 KB limit and allow for much more complex scenes.
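The core of such a stitching step would be a lookup from a ray direction to a face of the cube and a position on that face. Here is a minimal sketch of that lookup; the face labels and the orientation of the (u, v) coordinates are an arbitrary convention chosen for this example, not the one used by Fisheye Quake or by any graphics API.

```python
def cube_face_uv(direction):
    """Pick the cube face a ray hits and where it lands on that face.

    Returns (face, u, v) with u and v in [0, 1]. `direction` must be non-zero.
    """
    x, y, z = direction
    ax, ay, az = abs(x), abs(y), abs(z)
    # The axis with the largest magnitude decides which face is hit.
    if ax >= ay and ax >= az:
        face = "+x" if x > 0 else "-x"
        a, b, m = y, z, ax
    elif ay >= az:
        face = "+y" if y > 0 else "-y"
        a, b, m = x, z, ay
    else:
        face = "+z" if z > 0 else "-z"
        a, b, m = x, y, az
    # Project onto the face plane at distance 1, then map [-1, 1] to [0, 1].
    u = (a / m + 1) / 2
    v = (b / m + 1) / 2
    return face, u, v

centre = cube_face_uv((0.0, 0.0, 1.0))
```

Each pixel of the fisheye image would compute its ray direction, call a lookup like this one, and sample the matching rasterized image at (u, v).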

Next time

Shortly I will write a post about how the portals I added to fishpye work.

Imported Comments

2012/01/20 » Andie

That’s a smart answer to a difficult question.



16 January 2012