
Faster Gaussian Blur in GLSL

The popular technique for a Gaussian blur in GLSL is a two-pass blur, first in the horizontal and then in the vertical direction (or vice versa), also known as a separable convolution. A classic post on the subject is http://www.gamerendering.com/2008/10/11/gaussian-blur-filter-shader/, and many others you will find around the web do exactly the same. The performance of that shader can be improved quite a lot, though.

The bottleneck is in the dependent texture reads, which are any texture reads in a fragment shader where the coordinates depend on some calculation, or any texture reads in a vertex shader. A non-dependent texture read is one where the coordinates are either a constant or a varying. According to the PowerVR Performance Recommendations, a dependent texture read adds a lot of additional steps in the processing of your shader because the texture coordinates cannot be known ahead of time. When you use a constant or a varying, the hardware is able to pre-fetch the texture data before it gets to run your fragment shader, hence it becomes way more efficient.

In this Gaussian blur shader we can get rid of the dependent reads completely. The key is to compute all the texture coordinates in the vertex shader and store them in varyings. They will be interpolated automatically as usual, and you get the benefit of the texture pre-fetch that occurs before the GPU runs the fragment shader. OpenGL ES 2.0 guarantees at least 8 varying vectors, that is 32 floats, which allows us to pre-compute up to 16 vec2 texture coordinates in the vertex shader. The shaders below use 15 of them: v_texCoord plus 14 offsets.

We need two vertex shaders, one for the first pass which computes the horizontal texture coordinates and another for the second pass which computes the vertical texture coordinates. And we need a single fragment shader that we can use in both passes. This is what the first vertex shader looks like:

/* HBlurVertexShader.glsl */
attribute vec4 a_position;
attribute vec2 a_texCoord;

varying vec2 v_texCoord;
varying vec2 v_blurTexCoords[14];

void main()
{
    gl_Position = a_position;
    v_texCoord = a_texCoord;
    v_blurTexCoords[ 0] = v_texCoord + vec2(-0.028, 0.0);
    v_blurTexCoords[ 1] = v_texCoord + vec2(-0.024, 0.0);
    v_blurTexCoords[ 2] = v_texCoord + vec2(-0.020, 0.0);
    v_blurTexCoords[ 3] = v_texCoord + vec2(-0.016, 0.0);
    v_blurTexCoords[ 4] = v_texCoord + vec2(-0.012, 0.0);
    v_blurTexCoords[ 5] = v_texCoord + vec2(-0.008, 0.0);
    v_blurTexCoords[ 6] = v_texCoord + vec2(-0.004, 0.0);
    v_blurTexCoords[ 7] = v_texCoord + vec2( 0.004, 0.0);
    v_blurTexCoords[ 8] = v_texCoord + vec2( 0.008, 0.0);
    v_blurTexCoords[ 9] = v_texCoord + vec2( 0.012, 0.0);
    v_blurTexCoords[10] = v_texCoord + vec2( 0.016, 0.0);
    v_blurTexCoords[11] = v_texCoord + vec2( 0.020, 0.0);
    v_blurTexCoords[12] = v_texCoord + vec2( 0.024, 0.0);
    v_blurTexCoords[13] = v_texCoord + vec2( 0.028, 0.0);
}

As you can see, the v_blurTexCoords varying is an array of vec2 where we store a bunch of vectors around the texture coordinate of the vertex a_texCoord. Each vec2 in this array will be interpolated along the triangles being rendered, as expected. We do the same for the other vertex shader which pre-computes the vertical blur texture coordinates:

/* VBlurVertexShader.glsl */
attribute vec4 a_position;
attribute vec2 a_texCoord;

varying vec2 v_texCoord;
varying vec2 v_blurTexCoords[14];

void main()
{
    gl_Position = a_position;
    v_texCoord = a_texCoord;
    v_blurTexCoords[ 0] = v_texCoord + vec2(0.0, -0.028);
    v_blurTexCoords[ 1] = v_texCoord + vec2(0.0, -0.024);
    v_blurTexCoords[ 2] = v_texCoord + vec2(0.0, -0.020);
    v_blurTexCoords[ 3] = v_texCoord + vec2(0.0, -0.016);
    v_blurTexCoords[ 4] = v_texCoord + vec2(0.0, -0.012);
    v_blurTexCoords[ 5] = v_texCoord + vec2(0.0, -0.008);
    v_blurTexCoords[ 6] = v_texCoord + vec2(0.0, -0.004);
    v_blurTexCoords[ 7] = v_texCoord + vec2(0.0,  0.004);
    v_blurTexCoords[ 8] = v_texCoord + vec2(0.0,  0.008);
    v_blurTexCoords[ 9] = v_texCoord + vec2(0.0,  0.012);
    v_blurTexCoords[10] = v_texCoord + vec2(0.0,  0.016);
    v_blurTexCoords[11] = v_texCoord + vec2(0.0,  0.020);
    v_blurTexCoords[12] = v_texCoord + vec2(0.0,  0.024);
    v_blurTexCoords[13] = v_texCoord + vec2(0.0,  0.028);
}
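
The two vertex shaders differ only in the axis the offsets are applied to, so the 14 assignment lines can be generated instead of typed by hand. The Python sketch below is one way to do that. The names blur_offset_lines, varying_floats, taps_per_side and step are hypothetical and not part of the original code. It also counts the floats the varyings consume, to compare against the 32-float budget mentioned earlier. Keep in mind that the offsets are in normalized texture coordinates, so a step of 0.004 spans exactly one texel only on a texture 250 texels wide.

```python
# Sketch: generate the v_blurTexCoords assignments for either blur pass.
# All function and parameter names here are hypothetical helpers.

def blur_offset_lines(taps_per_side=7, step=0.004, horizontal=True):
    """Return GLSL assignment lines for the 2 * taps_per_side offset taps."""
    # The center tap reads v_texCoord directly, so it needs no array slot.
    multiples = [i for i in range(-taps_per_side, taps_per_side + 1) if i != 0]
    lines = []
    for slot, m in enumerate(multiples):
        offset = m * step
        # Horizontal pass offsets x, vertical pass offsets y.
        x, y = (offset, 0.0) if horizontal else (0.0, offset)
        lines.append(
            f"v_blurTexCoords[{slot:2d}] = v_texCoord + vec2({x:.3f}, {y:.3f});"
        )
    return lines


def varying_floats(taps_per_side):
    """Floats consumed by v_texCoord plus the vec2 offset array."""
    return 2 * (1 + 2 * taps_per_side)


if __name__ == "__main__":
    for line in blur_offset_lines(horizontal=True):
        print(line)
    print("// varying floats used:", varying_floats(7))
```

With the defaults this reproduces the horizontal offsets shown above (printed with three decimals), and varying_floats(7) gives 30, which fits within the 32 floats.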

The fragment shader just computes the gaussian-weighted average of the color of the texels on these pre-computed texture coordinates. Since it uses the generic v_blurTexCoords array, we can use this same fragment shader in both passes.

/* BlurFragmentShader.glsl */
precision mediump float;

uniform sampler2D s_texture;

varying vec2 v_texCoord;
varying vec2 v_blurTexCoords[14];

void main()
{
    gl_FragColor = vec4(0.0);
    gl_FragColor += texture2D(s_texture, v_blurTexCoords[ 0])*0.0044299121055113265;
    gl_FragColor += texture2D(s_texture, v_blurTexCoords[ 1])*0.00895781211794;
    gl_FragColor += texture2D(s_texture, v_blurTexCoords[ 2])*0.0215963866053;
    gl_FragColor += texture2D(s_texture, v_blurTexCoords[ 3])*0.0443683338718;
    gl_FragColor += texture2D(s_texture, v_blurTexCoords[ 4])*0.0776744219933;
    gl_FragColor += texture2D(s_texture, v_blurTexCoords[ 5])*0.115876621105;
    gl_FragColor += texture2D(s_texture, v_blurTexCoords[ 6])*0.147308056121;
    gl_FragColor += texture2D(s_texture, v_texCoord         )*0.159576912161;
    gl_FragColor += texture2D(s_texture, v_blurTexCoords[ 7])*0.147308056121;
    gl_FragColor += texture2D(s_texture, v_blurTexCoords[ 8])*0.115876621105;
    gl_FragColor += texture2D(s_texture, v_blurTexCoords[ 9])*0.0776744219933;
    gl_FragColor += texture2D(s_texture, v_blurTexCoords[10])*0.0443683338718;
    gl_FragColor += texture2D(s_texture, v_blurTexCoords[11])*0.0215963866053;
    gl_FragColor += texture2D(s_texture, v_blurTexCoords[12])*0.00895781211794;
    gl_FragColor += texture2D(s_texture, v_blurTexCoords[13])*0.0044299121055113265;
}

As you can see, every texture read (every call to the built-in texture2D function) receives a pre-computed texture coordinate. That is why the hardware is able to pre-fetch the texture data for these reads, which is a huge improvement. For more details about what happens in the hardware, check out section 5 of the PowerVR SGX Architecture Guide for Developers.
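
The post does not say where the weights in BlurFragmentShader.glsl come from. They appear to be consistent with a Gaussian density with a standard deviation of 2.5 sampled at integer offsets 0 to 6, with the outermost tap taking whatever remains so that all 15 weights sum to 1. That reading is an inference from the numbers, not something the original states. The Python sketch below recomputes them under that assumption; gaussian_weights, sigma and taps_per_side are hypothetical names.

```python
import math


def gaussian_weights(sigma=2.5, taps_per_side=7):
    """Weights for offsets 0..taps_per_side; index 0 is the center tap.

    Assumption: sigma = 2.5 is inferred from the shader constants.
    """
    norm = 1.0 / (sigma * math.sqrt(2.0 * math.pi))
    # Plain Gaussian density for the center and all but the outermost tap.
    weights = [
        norm * math.exp(-(i * i) / (2.0 * sigma * sigma))
        for i in range(taps_per_side)
    ]
    # Every tap except the center is used twice (once per side).
    used = weights[0] + 2.0 * sum(weights[1:])
    # Outermost tap: split the remainder so the full kernel sums to 1.
    weights.append((1.0 - used) / 2.0)
    return weights


if __name__ == "__main__":
    for offset, weight in enumerate(gaussian_weights()):
        print(offset, weight)
```

Keeping the sum at 1 matters because a kernel that sums to less than 1 would darken the image slightly on every pass.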

Then, to render the blurred image, we render to texture using off-screen buffers: the first pass is drawn with the HBlurVertexShader and the BlurFragmentShader, and the second pass is drawn with the VBlurVertexShader and the BlurFragmentShader, using the output of the first pass as its input texture. Of course, the second pass can be rendered directly into the main framebuffer if no further post-processing is going to take place.
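
The reason two 1D passes are enough is that the Gaussian kernel is separable: blurring rows and then columns gives the same result as a full 2D convolution with the outer product of the 1D kernel. With 15 taps that means 15 + 15 = 30 texture reads per pixel instead of 15 × 15 = 225. The Python sketch below checks the equivalence on the CPU with a small kernel. It is only an illustration, not how the shaders run; the function names are hypothetical, and edge handling is assumed to clamp coordinates, similar to a clamp-to-edge texture wrap mode.

```python
def clamp(i, n):
    """Clamp index i into [0, n - 1], mimicking clamp-to-edge sampling."""
    return max(0, min(n - 1, i))


def blur_1d(img, kernel, horizontal):
    """Apply a 1D kernel along rows (horizontal) or columns (vertical)."""
    h, w = len(img), len(img[0])
    r = len(kernel) // 2
    out = [[0.0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            acc = 0.0
            for k, weight in enumerate(kernel):
                d = k - r  # signed offset from the center tap
                sy = y if horizontal else clamp(y + d, h)
                sx = clamp(x + d, w) if horizontal else x
                acc += weight * img[sy][sx]
            out[y][x] = acc
    return out


def blur_2d(img, kernel):
    """Single-pass 2D blur using the outer product of the 1D kernel."""
    h, w = len(img), len(img[0])
    r = len(kernel) // 2
    out = [[0.0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            acc = 0.0
            for i, wy in enumerate(kernel):
                for j, wx in enumerate(kernel):
                    acc += wy * wx * img[clamp(y + i - r, h)][clamp(x + j - r, w)]
            out[y][x] = acc
    return out


def two_pass(img, kernel):
    """Horizontal pass followed by a vertical pass on its output."""
    return blur_1d(blur_1d(img, kernel, True), kernel, False)


if __name__ == "__main__":
    image = [[float((3 * x + 5 * y) % 7) for x in range(6)] for y in range(5)]
    kernel = [0.25, 0.5, 0.25]
    a, b = two_pass(image, kernel), blur_2d(image, kernel)
    print(max(abs(p - q) for ra, rb in zip(a, b) for p, q in zip(ra, rb)))
```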

The same technique can be applied to other filters that sample the texture at fixed offsets around each pixel. You can find an implementation of blur and sharpen filters in XBImageFilters.

