Shaders
I started writing these demos in early 2009.
New: thumbnails below are stored in a generated texture atlas for your fast-browsing experience. Additionally, this page totals 262KB. Screenshots are dynamically loaded when clicked.
A histogram is built for the final image, counting luminance in 512 buckets. This is done in the fragment shader using OpenGL 4 atomic counters.
Once the histogram is built it is downloaded. Each luminance bucket is scaled according to its prefix sum and the total luminance. The values are smoothed to remove harsh changes before being uploaded and used to render the scene. Linear interpolation is used between mapped values.
With this method, specific ranges of luminance can be tone mapped without affecting the rest of the scene. The next step might be to introduce some sort of object localization to the algorithm.
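The remapping step described above can be sketched on the CPU roughly as follows. This is only a minimal C++ sketch, not the demo's code: it assumes the mapping is plain histogram equalization (each bucket maps to its normalized prefix sum), and the names `buildToneCurve`, `smoothCurve` and `applyToneCurve`, as well as the box filter used for smoothing, are hypothetical.

```cpp
#include <algorithm>
#include <array>
#include <cstddef>

constexpr std::size_t kBuckets = 512;  // bucket count taken from the text
using Histogram = std::array<unsigned, kBuckets>;
using ToneCurve = std::array<float, kBuckets>;

// Map each bucket to its normalized prefix sum (assumed: histogram equalization).
ToneCurve buildToneCurve(const Histogram& histogram) {
    unsigned long long total = 0;
    for (unsigned count : histogram) total += count;

    ToneCurve curve{};
    unsigned long long running = 0;
    for (std::size_t i = 0; i < kBuckets; ++i) {
        running += histogram[i];
        // An empty histogram falls back to an identity ramp.
        curve[i] = total > 0
            ? static_cast<float>(running) / static_cast<float>(total)
            : static_cast<float>(i) / static_cast<float>(kBuckets - 1);
    }
    return curve;
}

// Box-filter the curve to remove harsh changes (window clamped at the ends).
ToneCurve smoothCurve(const ToneCurve& curve, int radius) {
    ToneCurve out{};
    const int last = static_cast<int>(kBuckets) - 1;
    for (int i = 0; i <= last; ++i) {
        const int lo = std::max(i - radius, 0);
        const int hi = std::min(i + radius, last);
        float sum = 0.0f;
        for (int j = lo; j <= hi; ++j) sum += curve[static_cast<std::size_t>(j)];
        out[static_cast<std::size_t>(i)] = sum / static_cast<float>(hi - lo + 1);
    }
    return out;
}

// Look up a luminance in [0, 1], interpolating linearly between buckets.
float applyToneCurve(const ToneCurve& curve, float luminance) {
    const float clamped = std::min(std::max(luminance, 0.0f), 1.0f);
    const float x = clamped * static_cast<float>(kBuckets - 1);
    const std::size_t i0 = static_cast<std::size_t>(x);
    const std::size_t i1 = std::min(i0 + 1, kBuckets - 1);
    const float t = x - static_cast<float>(i0);
    return curve[i0] * (1.0f - t) + curve[i1] * t;
}
```

In the demo the last step would happen in the shader, with the smoothed curve uploaded as a small texture or buffer; here it is a plain function so the mapping is easy to check.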
Generated lens effects. These effects are caused by light bouncing between lenses, instead of refracting directly through to the sensor.
The star texture is generated by randomly, additively splatting quads with one white vertex near the center and the other three black near a random angle/radius. The quad is interpolated as two triangles, split such that the mid-grey value is near the 3 black vertices.
The rest of the flares are stored in a list with the following attributes:
- Colour
- Position along the ray
- 4 Radii giving intensities (off, on, still on, off) which use cosine interpolation.
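As a rough illustration of the four radii, here is a minimal C++ sketch of how an intensity could be evaluated for a given radius. The struct and function names are hypothetical, and the exact shape of the fade is an assumption based on the "off, on, still on, off" description.

```cpp
#include <cmath>

// Hypothetical names for the four radii: fade in between off0 and on0,
// full intensity between on0 and on1, fade out between on1 and off1.
struct FlareRadii { float off0, on0, on1, off1; };

// Cosine interpolation between a and b for t in [0, 1].
float cosineInterp(float a, float b, float t) {
    const float kPi = 3.14159265358979f;
    const float s = (1.0f - std::cos(t * kPi)) * 0.5f;
    return a + (b - a) * s;
}

float flareIntensity(const FlareRadii& r, float radius) {
    if (radius <= r.off0 || radius >= r.off1) return 0.0f;
    if (radius < r.on0) {
        return cosineInterp(0.0f, 1.0f, (radius - r.off0) / (r.on0 - r.off0));
    }
    if (radius <= r.on1) return 1.0f;
    return cosineInterp(1.0f, 0.0f, (radius - r.on1) / (r.off1 - r.on1));
}
```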
There's other stuff in this scene, but I've done most of it before.
Had to give this a go when I read about it. Fairly complex and currently slow. I'm hoping to speed it up at some point and add antialiasing and perhaps soft shadows.
What it does:
- Render scene to a G-buffer (positions, normals, colour).
- Create a per-pixel linked list buffer as a kind of shadow map.
- Add all positions from the G-buffer to the corresponding linked lists (per-pixel buckets). Depth, camera space fragment x/y coord and exact light space x/y coord are stored.
- Re-render the scene from the light's point of view. For every fragment, compare the depth to the positions in the appropriate linked list.
- For each position in shadow, set a 1 in a shadow mask (using the camera space fragment x/y coord).
- Do a post process pass, applying deferred shading with the shadow mask texture.
- Expand the triangle so fragments are rasterized for ALL intersecting pixels (done in the geometry shader).
- In the fragment shader, check each fragment position from the linked list against the triangle manually using the light space x/y coord (pass in 2D vectors from the geometry shader and check with Barycentric coordinates).
- If the check passes, go on to find the interpolated depth value (finally found a good source here. It's quite simple.) and compare it to the bucketed fragment's depth.
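The per-fragment check in the last two steps can be sketched like this. It is a minimal C++ version of the idea rather than the shader itself: the names are hypothetical, the `bias` parameter is an assumption, and depth is interpolated linearly, which is only exact when depth varies linearly across the triangle in light space.

```cpp
struct Vec2 { float x, y; };

// Twice the signed area of triangle (a, b, c).
float edge(Vec2 a, Vec2 b, Vec2 c) {
    return (b.x - a.x) * (c.y - a.y) - (b.y - a.y) * (c.x - a.x);
}

struct Bary { float u, v, w; bool inside; };

// Barycentric weights of p with respect to triangle (a, b, c).
Bary barycentric(Vec2 a, Vec2 b, Vec2 c, Vec2 p) {
    const float area = edge(a, b, c);
    if (area == 0.0f) return {0.0f, 0.0f, 0.0f, false};  // degenerate triangle
    const float u = edge(b, c, p) / area;  // weight of a
    const float v = edge(c, a, p) / area;  // weight of b
    const float w = 1.0f - u - v;          // weight of c
    return {u, v, w, u >= 0.0f && v >= 0.0f && w >= 0.0f};
}

// True when the stored sample lies behind the triangle as seen from the light.
bool sampleInShadow(Vec2 a, Vec2 b, Vec2 c, float za, float zb, float zc,
                    Vec2 samplePos, float sampleDepth, float bias) {
    const Bary bc = barycentric(a, b, c, samplePos);
    if (!bc.inside) return false;
    const float triangleDepth = bc.u * za + bc.v * zb + bc.w * zc;
    return triangleDepth + bias < sampleDepth;
}
```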
In progress.
Some older, less awesome videos. I can add all of these now because videos are embedded in a separate page :).
- Fractal: Midpoint displacement/diamond-square algorithm
- Perlin Noise
- The Fault Algorithm
- Smoothing and random noise
Source: terrain.frag
A texture and a normal map are also generated on the GPU. Colour can be added to the terrain texture based on min/max height and steepness limits, with a Perlin noise factor (thanks to Terragen for the ideas).
Textures can also be generated for sections of the terrain for LOD.
I'd quite like to take a look at tessellation shaders and attempt this kind of thing dynamically.
This demo uses geometry shaders to render the variance shadow maps to cube maps. The cube maps are blurred, again with geometry shaders, using a separable gaussian blur. While this is not a correct spherical blur, artefacts are barely noticeable. Each light is additively blended to the scene using deferred shading. Rendering to the shadow maps takes approximately 2/3 of the time taken to blend the lights with shadows into the scene.
Multisampling
In previous demos, multisampling was easily done using a multisample renderbuffer. Here I had a go at explicit multisampling, although there was not much to go on in the way of examples.
//create a multisample texture
glGenTextures(1, &tex);
glBindTexture(GL_TEXTURE_2D_MULTISAMPLE, tex);
glTexImage2DMultisample(GL_TEXTURE_2D_MULTISAMPLE, samples, format, x, y, GL_FALSE);
//... and to access the samples in a shader
vec4 pos = texelFetch(deferredPositionMS, pixCoord, sampleIndex);
There is also a GL_TEXTURE_RENDERBUFFER_NV texture which maps to a renderbuffer. This way a renderbuffer can be the render target, blitted and/or read in a shader using a uniform samplerRenderbuffer. However after trying both, it seems neither is faster.
A preliminary test to generate, animate and render trees. Definitions of node types and child probabilities are stored in text files. This format supports recursion so it's relatively simple to design new trees, albeit without a nice GUI and realtime sliders. Trees are generated CPU side and then uploaded to the GPU where they stay. Currently each node can have a maximum of 4 children.
A fragment shader updates the animation state and then the tree hierarchy, setting node positions relative to their parents etc. Then a bunch of points are passed through a geometry shader which will either spit out a stretched cube or plane for a branch or leaf. The fragment shader then does raycasting to render a cylinder inside branch cubes. Texture coordinates are generated for this and a normal map is applied.
This demo also uses a multisampling FBO, however some form of LOD is definitely required.
The tree generation needs quite a few more features implemented to give a more accurate tree. At the moment, much of it is random and needs weighting. The most evident abnormality is that gravity plays no role yet. The animation speed could probably be optimized as I've implemented quaternion slerp operations on the GPU. Matrices may just be faster anyway. The ray-cylinder intersection tests are not quite right and I should use billboards anyway.
Testing out a few methods to find a good balance between quality and performance. These images splat light (essentially deferred shading, spawning down sampled per-pixel point lights) which produces artefacts when scattering surfaces do not face the viewer. An edge finding algorithm could clear up the down sampling issues but I think ultimately multiple viewing angles are needed along with depth peeling.
The light attenuation is not calculated correctly yet. For example the second bounce (three bounces are computed in these images) is strangely brighter than the first. I'll get round to fixing it some day.
Soft shadows would be a great addition. A coarse indirect illumination along with SSAO may give a decent performance/quality trade-off.
Metaballs are essentially functions in a 3D space. They apply a radial influence which decreases over distance. This could be seen for example as a density volume. It is common to render this 3D space by defining a threshold to form an isosurface. This is the 3D form of a contour line. For any point in the volume, a value is given as the sum of all the metaballs' functions. The points where this value is equal to the threshold define the isosurface.
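To make the definition concrete, here is a minimal C++ sketch of such a field. The falloff function is hypothetical (the demo's actual function isn't given); any radial function that decreases with distance would fit the description.

```cpp
#include <vector>

struct Vec3 { float x, y, z; };
struct Metaball { Vec3 center; float radius; };

// Hypothetical falloff: (1 - d^2 / r^2)^2 inside the radius, zero outside.
float influence(const Metaball& ball, Vec3 p) {
    const float dx = p.x - ball.center.x;
    const float dy = p.y - ball.center.y;
    const float dz = p.z - ball.center.z;
    const float ratio = (dx * dx + dy * dy + dz * dz) / (ball.radius * ball.radius);
    if (ratio >= 1.0f) return 0.0f;
    const float k = 1.0f - ratio;
    return k * k;
}

// The field value at p is the sum of every metaball's influence.
float fieldValue(const std::vector<Metaball>& balls, Vec3 p) {
    float sum = 0.0f;
    for (const Metaball& ball : balls) sum += influence(ball, p);
    return sum;
}

// Points where the field equals the threshold lie on the isosurface;
// points at or above it are treated as inside.
bool insideSurface(const std::vector<Metaball>& balls, Vec3 p, float threshold) {
    return fieldValue(balls, p) >= threshold;
}
```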
This demo renders the isosurface using marching cubes, computed using geometry shaders. It starts by drawing a grid of points which are run through a geometry shader. Each point marks the center of a cube and the isosurface is evaluated at each corner. Triangles are generated from a lookup table based on the isosurface value, assuming the cube is not completely inside or outside the surface. In two screenshots only face normals are generated for each triangle.
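The per-cube classification is the usual marching cubes step, which can be sketched in C++ as below. The function names are hypothetical and the corner-to-bit ordering is left to whatever the lookup tables expect; the tables themselves are omitted.

```cpp
#include <array>
#include <cstdint>

// Build the 8-bit case index: bit i is set when corner i is inside the surface.
std::uint8_t cubeIndex(const std::array<float, 8>& cornerValues, float threshold) {
    std::uint8_t index = 0;
    for (int i = 0; i < 8; ++i) {
        if (cornerValues[static_cast<std::size_t>(i)] >= threshold) {
            index = static_cast<std::uint8_t>(index | (1u << i));
        }
    }
    return index;
}

// Cubes entirely outside (0) or entirely inside (255) produce no triangles.
bool cubeEmitsTriangles(std::uint8_t index) {
    return index != 0 && index != 255;
}
```

The resulting index is what would be used to look up the edge and triangle tables, which in this demo live in texture buffers.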
The biggest problem occurred when trying to pass the lookup tables to the geometry shader. Uniform memory is not big enough to support the table needed without compression, so texture buffers were used instead.
Normals can also be generated per-vertex by numerically differentiating the density function, since the surface normal follows its gradient. While this requires an extra 4 function lookups (points not aligned with the grid), the results are clearly worth it given a simple function. This causes a performance hit of approximately 50%.
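One way to get a per-vertex normal from four function lookups is a forward-difference gradient, sketched in C++ below. This is an assumption about the sampling pattern, not necessarily what the demo does; the function name and step size `h` are hypothetical, and the normal is taken as the negated gradient on the assumption that density decreases away from the surface.

```cpp
#include <cmath>
#include <functional>

struct Vec3 { float x, y, z; };

// Estimate the surface normal at p from four lookups of the density field.
Vec3 gradientNormal(const std::function<float(Vec3)>& field, Vec3 p, float h) {
    const float f0 = field(p);
    const float dx = field(Vec3{p.x + h, p.y, p.z}) - f0;
    const float dy = field(Vec3{p.x, p.y + h, p.z}) - f0;
    const float dz = field(Vec3{p.x, p.y, p.z + h}) - f0;
    const float len = std::sqrt(dx * dx + dy * dy + dz * dz);
    if (len == 0.0f) return Vec3{0.0f, 0.0f, 0.0f};  // flat field, no normal
    // Density falls off outwards, so the normal points against the gradient.
    return Vec3{-dx / len, -dy / len, -dz / len};
}
```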
The isosurface function should really be evaluated at each unique vertex and stored. For now, this demo simply calls the function in the geometry shader (shader source). GPU Gems 3 has a nice article on marching cubes although the geometry is static.
This demo uses the OpenGL 4 atomic operations to capture all fragments for each pixel. This is essentially depth peeling in a single pass. Currently the demo stores the fragments in multiple frame buffers which uses quite a lot of memory. A second pass sorts the fragments to be used for transparency.
ATI construct a linked list to store the fragments, which can save a huge amount of memory. "Pages" can be used to reduce atomic operation collisions.
This method takes a cube built from triangles and recursively subdivides triangles by splitting the longest edge in two. The split point is moved a set distance from the center of the cube and randomized.
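A single subdivision step might look like the following minimal C++ sketch. The names are hypothetical, and the randomization is left to the caller, who would pass the set distance plus a random offset as `distance`.

```cpp
#include <array>
#include <cmath>
#include <cstddef>

struct Vec3 { float x, y, z; };
using Triangle = std::array<Vec3, 3>;

float distanceSquared(Vec3 a, Vec3 b) {
    const float dx = b.x - a.x, dy = b.y - a.y, dz = b.z - a.z;
    return dx * dx + dy * dy + dz * dz;
}

// Index i of the longest edge, where edge i joins vertex i and vertex (i + 1) % 3.
std::size_t longestEdge(const Triangle& t) {
    std::size_t best = 0;
    float bestLength = -1.0f;
    for (std::size_t i = 0; i < 3; ++i) {
        const float length = distanceSquared(t[i], t[(i + 1) % 3]);
        if (length > bestLength) { bestLength = length; best = i; }
    }
    return best;
}

// Move p along the ray from center so it sits at the given distance from center.
Vec3 pushFromCenter(Vec3 p, Vec3 center, float distance) {
    const Vec3 d{p.x - center.x, p.y - center.y, p.z - center.z};
    const float len = std::sqrt(d.x * d.x + d.y * d.y + d.z * d.z);
    if (len == 0.0f) return p;
    const float s = distance / len;
    return Vec3{center.x + d.x * s, center.y + d.y * s, center.z + d.z * s};
}

// Split the longest edge at its displaced midpoint, keeping the winding order.
std::array<Triangle, 2> splitLongestEdge(const Triangle& t, Vec3 center, float distance) {
    const std::size_t i = longestEdge(t);
    const Vec3 a = t[i], b = t[(i + 1) % 3], c = t[(i + 2) % 3];
    const Vec3 mid{(a.x + b.x) * 0.5f, (a.y + b.y) * 0.5f, (a.z + b.z) * 0.5f};
    const Vec3 m = pushFromCenter(mid, center, distance);
    return std::array<Triangle, 2>{{Triangle{{a, m, c}}, Triangle{{m, b, c}}}};
}
```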
I haven't got round to getting smoothed normals and it's quite slow. The geometry feedback extension should improve the speed. Subdivision is difficult as the longest edge on each triangle may not be the same for adjacent triangles.
This scene contains
- Shadows
- Surface caustics
- Volumetric caustics
- Refraction
Based on this technique.
The first step of this approach renders depth and normals for front and back faces of refractive geometry from the light's perspective. The scene's (non-refractive geometry) depth is also rendered.
Linear ray marching refracts through the front face and then the back face of refractive geometry before intersecting with the scene. The refractive exit positions and the scene's intersection position are stored in a texture.
Lines are drawn in a separate eye space texture which is then blurred for volumetric caustics. Points are drawn to a light space texture to create a caustic map. The volumetric light texture is added to the scene and the caustic map is projected onto the scene.
This same ray marching method is repeated from the camera's perspective to render the refractive geometry after caustics have been added.
There are a few extras such as copying depth buffers around for correct occlusion. I'm sure some parts of the rendering pipeline for this could be reused and made more efficient but it runs in realtime on a 9600.
An attempt at using polygons instead of a caustics map was also made.
This demo uses floating point position and velocity textures which are double buffered.
Computation is done by drawing a full screen quad (although I was recently informed a single triangle is better) with the second buffers bound to an FBO. Then the particles are numerically integrated in the fragment shader, reading from the current buffers bound to sampler2Ds before writing the results through gl_FragData[]. The buffers are swapped and the process repeats. This way all data remains entirely on the GPU.
uniform sampler2D positions;
uniform sampler2D velocities;
//opengl3, use glBindFragDataLocation
out vec4 outPosition;
out vec4 outVelocity;
void main()
{
    //textureSize needs an explicit LOD and returns an ivec2
    vec2 coord = gl_FragCoord.xy / vec2(textureSize(positions, 0));
    vec3 pos = texture(positions, coord).rgb;
    vec3 vel = texture(velocities, coord).rgb;
    //sum gravity to N other particles
    vec3 acc = ...
    //integrate
    pos += ...
    vel += ...
    //pre-opengl3, use gl_FragData[0/1]
    outPosition = vec4(pos, 1.0);
    outVelocity = vec4(vel, 0.0);
}
One version uses geometry shaders to turn points into quads to render the particles. For pre-opengl3, a VBO of quads is stored where each vertex is given the texture coordinate of the particle in the position/velocity textures. Quads are rotated and stretched in their direction of travel to approximate motion blur.
Each particle gravitates towards 10 random particles each frame. To make it less boring, an HDR and bloom effect is added. Realtime frame rates (>60fps) can be achieved, depending on the number of particles each one gravitates towards.
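For reference, one update of a single particle could look like this on the CPU. It is a minimal C++ sketch under stated assumptions: the integrator is semi-implicit Euler, the force is inverse-square with a `softening` term, and all names and parameters are hypothetical, since the shader above elides these details.

```cpp
#include <cmath>
#include <cstddef>
#include <vector>

struct Vec3 { float x, y, z; };
struct Particle { Vec3 pos; Vec3 vel; };

// Read from the current buffer and return the particle's next state,
// mirroring the double-buffered textures.
Particle stepParticle(const std::vector<Particle>& current, std::size_t self,
                      const std::vector<std::size_t>& attractors,
                      float gravity, float softening, float dt) {
    Particle p = current[self];
    Vec3 acc{0.0f, 0.0f, 0.0f};
    for (std::size_t j : attractors) {
        if (j == self) continue;
        const Vec3 d{current[j].pos.x - p.pos.x,
                     current[j].pos.y - p.pos.y,
                     current[j].pos.z - p.pos.z};
        const float dist2 = d.x * d.x + d.y * d.y + d.z * d.z + softening;
        if (dist2 <= 0.0f) continue;  // coincident particles exert no force
        const float invDist3 = 1.0f / (dist2 * std::sqrt(dist2));
        acc.x += gravity * d.x * invDist3;
        acc.y += gravity * d.y * invDist3;
        acc.z += gravity * d.z * invDist3;
    }
    // Semi-implicit Euler: update velocity first, then position.
    p.vel.x += acc.x * dt;
    p.vel.y += acc.y * dt;
    p.vel.z += acc.z * dt;
    p.pos.x += p.vel.x * dt;
    p.pos.y += p.vel.y * dt;
    p.pos.z += p.vel.z * dt;
    return p;
}
```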
For a rand() function, a noise texture can be generated and/or combinations of particle velocity and position used with floor() and mod() to produce fairly random numbers. Sometimes the less random numbers can lead to interesting results.
Running on an 8800GTS.
Many point lights orbiting the origin.
The scene is rasterized, but positions, colour and normals are stored instead of calculating lighting on the fly as in the traditional method. This information (the G buffer) is used in a second pass which draws quads over point lights. For each pixel in the quad, the position and normal of the scene are sampled to calculate diffuse and specular light contributions. The result is additively blended into the final render.
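The per-pixel work in the second pass can be sketched as follows. This C++ version is only illustrative: the Blinn-Phong specular term, the linear distance falloff and every name here are assumptions, since the demo's lighting model isn't described in detail.

```cpp
#include <algorithm>
#include <cmath>

struct Vec3 { float x, y, z; };
struct Lighting { float diffuse; float specular; };

static Vec3 sub(Vec3 a, Vec3 b) { return Vec3{a.x - b.x, a.y - b.y, a.z - b.z}; }
static Vec3 add(Vec3 a, Vec3 b) { return Vec3{a.x + b.x, a.y + b.y, a.z + b.z}; }
static float dot(Vec3 a, Vec3 b) { return a.x * b.x + a.y * b.y + a.z * b.z; }
static Vec3 normalize(Vec3 v) {
    const float len = std::sqrt(dot(v, v));
    return len > 0.0f ? Vec3{v.x / len, v.y / len, v.z / len} : v;
}

// Contribution of one point light to one G-buffer sample (position + unit normal).
Lighting shadePointLight(Vec3 position, Vec3 normal, Vec3 eye,
                         Vec3 lightPos, float lightRadius, float shininess) {
    const Vec3 toLight = sub(lightPos, position);
    const float dist = std::sqrt(dot(toLight, toLight));
    if (dist <= 0.0f || dist >= lightRadius) return Lighting{0.0f, 0.0f};

    const Vec3 l = normalize(toLight);
    const float ndotl = std::max(dot(normal, l), 0.0f);
    if (ndotl <= 0.0f) return Lighting{0.0f, 0.0f};  // surface faces away

    const float attenuation = 1.0f - dist / lightRadius;  // assumed linear falloff
    const Vec3 h = normalize(add(l, normalize(sub(eye, position))));
    const float ndoth = std::max(dot(normal, h), 0.0f);
    return Lighting{ndotl * attenuation, std::pow(ndoth, shininess) * attenuation};
}
```

Each light's result would then be scaled by the light and surface colours and added to the framebuffer with additive blending.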
Running on an 8800GTS.
A double buffered texture stores height and velocity for the grid of water. Each frame the fragment shader applies a function to simulate water movement over the surface (somewhat incorrect but it looks ok). This heightmap is used to render distortion (again, approximated) and lighting of the water surface.
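A common way to do this kind of update is to pull each cell towards the average of its four neighbours. The C++ sketch below shows that scheme; it is a guess at the general shape, not the demo's actual function, and `stiffness`, `damping` and the clamp-to-edge borders are assumptions.

```cpp
#include <algorithm>
#include <cstddef>
#include <vector>

struct WaterGrid {
    int cols;
    int rows;
    std::vector<float> height;    // cols * rows values
    std::vector<float> velocity;  // cols * rows values
};

// Read from the current grid and return the next one (double buffering).
WaterGrid stepWater(const WaterGrid& cur, float stiffness, float damping) {
    WaterGrid next = cur;
    auto heightAt = [&cur](int x, int y) {
        x = std::min(std::max(x, 0), cur.cols - 1);  // clamp to edge
        y = std::min(std::max(y, 0), cur.rows - 1);
        return cur.height[static_cast<std::size_t>(y * cur.cols + x)];
    };
    for (int y = 0; y < cur.rows; ++y) {
        for (int x = 0; x < cur.cols; ++x) {
            const std::size_t i = static_cast<std::size_t>(y * cur.cols + x);
            const float average = (heightAt(x - 1, y) + heightAt(x + 1, y) +
                                   heightAt(x, y - 1) + heightAt(x, y + 1)) * 0.25f;
            // Accelerate towards the neighbour average, then damp.
            const float v = (cur.velocity[i] + (average - cur.height[i]) * stiffness) * damping;
            next.velocity[i] = v;
            next.height[i] = cur.height[i] + v;
        }
    }
    return next;
}
```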
The fish flock using a kd-tree, which runs 5000 boids/fish at 8fps or 2000 at 32fps.
The idea came from here. Rooms have varying walls and depths, and lights can be turned on and off. The wall textures are stored in a cube texture (shader source).
The method involves intersecting the room walls/ceiling/floor after the ray entrance position is found in tangent space. A random texture can then be chosen from the cube texture.
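The intersection step can be sketched in C++ as a ray against the inside of a unit box. This assumes the ray has already been transformed so that the room occupies [0, 1] on each axis; the names are hypothetical, and the face numbering simply follows OpenGL's cube map face order (+X, -X, +Y, -Y, +Z, -Z).

```cpp
#include <limits>

struct Vec3 { float x, y, z; };

struct RoomHit {
    int axis;           // 0 = x, 1 = y, 2 = z, or -1 if nothing was hit
    bool positiveSide;  // true for the plane at 1, false for the plane at 0
    float t;            // distance along the ray
    Vec3 point;         // hit position inside the unit room
};

// Find the first wall, floor or ceiling plane hit from inside the room.
RoomHit intersectRoom(Vec3 origin, Vec3 dir) {
    const float o[3] = {origin.x, origin.y, origin.z};
    const float d[3] = {dir.x, dir.y, dir.z};
    RoomHit hit{-1, false, std::numeric_limits<float>::infinity(), origin};
    for (int axis = 0; axis < 3; ++axis) {
        if (d[axis] == 0.0f) continue;  // parallel to this pair of planes
        const bool positive = d[axis] > 0.0f;
        const float plane = positive ? 1.0f : 0.0f;
        const float t = (plane - o[axis]) / d[axis];
        if (t >= 0.0f && t < hit.t) {
            hit.axis = axis;
            hit.positiveSide = positive;
            hit.t = t;
        }
    }
    if (hit.axis >= 0) {
        hit.point = Vec3{o[0] + d[0] * hit.t, o[1] + d[1] * hit.t, o[2] + d[2] * hit.t};
    }
    return hit;
}

// Cube map face index for the hit plane, or -1 if nothing was hit.
int cubeFace(const RoomHit& hit) {
    return hit.axis < 0 ? -1 : hit.axis * 2 + (hit.positiveSide ? 0 : 1);
}
```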
This same technique has been extended here ("instant animated grass").
For example, after bloom is applied with floating point textures, colour intensities of 0 to 1 map to 0 to 0.8. Colours greater than 1 map to 0.8 approaching 1.0 but may never actually get there.
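A curve with that behaviour can be written as a linear segment followed by an exponential shoulder, as in the C++ sketch below. The exact function is hypothetical; it is just one curve that maps 0 to 1 onto 0 to 0.8 and then approaches 1.0.

```cpp
#include <cmath>

// Map HDR intensity to display range: linear up to 1, then a soft shoulder.
float toneMap(float x) {
    const float knee = 0.8f;  // output value at input 1.0, from the text
    if (x <= 0.0f) return 0.0f;
    if (x <= 1.0f) return knee * x;
    // Slope is matched at x = 1 so the curve has no visible kink.
    const float rate = knee / (1.0f - knee);
    return 1.0f - (1.0f - knee) * std::exp(-(x - 1.0f) * rate);
}
```

In floating point the result does eventually round to exactly 1.0 for very large inputs, so "never actually get there" holds only in the mathematical sense.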
On a side note, the impressiveness of modern video game graphics is directly proportional, with positive correlation, to the amount of bloom. One can project this trend to approximate what games will look like in the future.
Here's a screenshot:
Email: <email hidden>
I haven't released the source for most of these, mainly because since writing them I always think of better ways to code it. If you want one of the examples or even just the GLSL, feel free to email me (or add a comment).