8. Vertex Shaders

A vertex shader converts coordinates, applies shadows, and performs other operations on input vertex attributes. The 3DS system has four vertex processors, each of which can handle vector data comprising four 24-bit floating-point values. The input vertex data is processed in parallel by the four vertex processors. Although the processors can run general-purpose calculations, they cannot read data from or write data to VRAM.

Like OpenGL ES 2.0, the 3DS system does not make a clear distinction between coordinates, normals, and other attributes stored with vertex data. Attributes determine how a vertex shader processes and outputs its input data. In general, the following types of attributes are input as vertex data for 3D graphics processing.

  • Vertex coordinates
  • Normal vectors
  • Tangent vectors
  • Texture coordinates
  • Vertex color

 

Vertex shaders are written in an assembly language that is unique to the PICA graphics core. We recommend that you refer to the Vertex Shader Reference as you continue to read this chapter.

8.1. Input Vertex Data

Vertex data input by the application is passed to the vertex shader through input registers that are bound to vertex attribute numbers. Using #pragma bind_symbol, the vertex shader specifies the names and registers for input data.

Code 8-1. Binding Data Names and Registers (in Shader Assembly)
#pragma bind_symbol(AttribPosition.xyzw, v0, v0) 

In this code sample, the xyzw components of the v0 input register are bound to the AttribPosition data name. You cannot bind more than one data name to the same input register. For an input register, the second and third arguments must specify the same register (or the third argument must be omitted).

The application uses the glBindAttribLocation() function to bind vertex attribute numbers and data names, the glEnableVertexAttribArray() function to enable bound vertex attribute numbers, and the glVertexAttribPointer() function or another function to input vertex data.

Code 8-2. Binding Vertex Attribute Numbers and Inputting Vertex Data
glBindAttribLocation(program, 0, "AttribPosition");
glEnableVertexAttribArray(0);
glVertexAttribPointer(0, 4, GL_FLOAT, GL_FALSE, 0, pointer); 

In this code sample, the vertex attribute number 0 is bound to data having the name AttribPosition, and vertex attributes for four components are specified. The register number and the vertex attribute number do not have to be the same.

Input registers that are not bound by #pragma bind_symbol have undefined values. Any input vertex data that is not a GL_FLOAT value is automatically converted into a GL_FLOAT value. However, note that the data is not normalized.
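
Because converted values are not normalized, a GL_UNSIGNED_BYTE component of 255 reaches the shader as 255.0 rather than 1.0. The following C sketch models this on the host side only. The function names are hypothetical, and the 1/255 scale factor is an assumption based on the usual OpenGL ES mapping for unsigned bytes; the vertex shader would have to apply an equivalent multiply itself.

```c
#include <assert.h>
#include <stdint.h>

/* Models the automatic conversion: the raw integer becomes a float
   with the same numeric value (no division by the type's maximum). */
float as_shader_input_u8(uint8_t raw)
{
    return (float)raw;
}

/* Models the multiply that the vertex shader would perform itself.
   The 1/255 scale is an assumption for unsigned 8-bit data. */
float normalize_u8_in_shader(float shader_input)
{
    const float scale = 1.0f / 255.0f;
    assert(shader_input >= 0.0f && shader_input <= 255.0f);
    return shader_input * scale;
}
```

In an actual shader, the scale would typically be held in a floating-point constant register and applied with a mul instruction.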

When a vertex buffer is used, the vertex attribute data type and data size combination affects transfer speed of the vertex data. For more information, see 15.13. Effect of Vertex Attribute Combinations on Vertex Data Transfer Speed.

Warning:

If vertex data is input through the glVertexAttribPointer() function, the fourth argument (specifying whether to normalize values) is ignored because normalization is not supported by the hardware. If normalized values are required, the vertex shader must normalize them explicitly.

You cannot specify GL_FIXED or GL_UNSIGNED_SHORT for the type parameter of the glVertexAttribPointer() function. If GL_FLOAT or GL_SHORT is specified, ptr must be a pointer that is 4-byte aligned or 2-byte aligned, respectively.

While a vertex shader is processing a single vertex, it must load vertex data from at least one input register (even a single component), or it may not run properly.
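
The alignment rule for GL_FLOAT and GL_SHORT pointers can be checked before a pointer is passed to glVertexAttribPointer(). The following is a minimal sketch of such a check. The AttribType enum and both function names are hypothetical stand-ins (so that the code compiles without a GL header), and the 1-byte alignment for the byte types is an assumption that is not stated in this chapter.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-ins for the GL type enums. */
typedef enum {
    ATTRIB_BYTE,
    ATTRIB_UNSIGNED_BYTE,
    ATTRIB_SHORT,
    ATTRIB_FLOAT
} AttribType;

/* Required pointer alignment in bytes. Only the GL_FLOAT (4) and
   GL_SHORT (2) cases come from the text; the rest is assumed. */
size_t required_alignment(AttribType type)
{
    switch (type) {
    case ATTRIB_FLOAT: return 4;
    case ATTRIB_SHORT: return 2;
    default:           return 1;
    }
}

/* Returns nonzero if ptr satisfies the alignment for the given type. */
int attrib_pointer_ok(AttribType type, const void *ptr)
{
    assert(ptr != NULL);
    return ((uintptr_t)ptr % required_alignment(type)) == 0;
}
```

A debug build could assert on attrib_pointer_ok() just before each call that inputs vertex data.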

8.2. Vertex Data Output

The vertex shader's processing results are written to the output registers that are mapped to output vertex attributes, and in this way the results are passed on to the next processing stage. Using #pragma output_map, the vertex shader specifies the vertex attribute names and registers for output data.

Code 8-3. Mapping Registers to Output Vertex Attributes and Writing to the Registers (in Shader Assembly)
#pragma output_map(position, o0)
mov     o0,     v0 

In this code sample, vertex coordinates are mapped to the o0 output register. Data is output as vertex coordinates by writing data to o0.

Because only a reserved shader can be used for fragment processing, the vertex shader's output vertex attributes will be fixed in advance. Output vertex attributes have the following names. The components that the vertex shader must set for the output attributes are fixed.

Table 8-1. Output Vertex Attributes

Attribute Name   Output Attribute             Required Components
position         Vertex coordinates.          4 components: x, y, z, and w.
color            Vertex color.                4 components: R, G, B, and A.
texture0         Texture coordinate 0         2 components: u and v.
texture0w        Texture coordinate 0         1 component: w.
texture1         Texture coordinate 1         2 components: u and v.
texture2         Texture coordinate 2         2 components: u and v.
quaternion       Quaternions.                 4 components: x, y, z, and w.
view             View vector.                 3 components: x, y, and z.
generic          General-purpose attribute.   Any number of components that are used by the geometry shader.

The vertex shader must write some value to every component (x, y, z, and w) of registers that have been mapped. Dummy values must be written to any unmapped (unused) components in the registers specified by #pragma output_map.

The vertex shader forcibly ends processing of a vertex once values have been written to all mapped registers, and then moves on to the next vertex. The shader must still call the end instruction, but any instructions that follow the write of the last attribute data might not be executed.

You can only write to (each component of) a single output register once while processing a single vertex. Correct operation is not guaranteed if data is written to the same component (of the same register) more than once.

Except for the generic attribute, output vertex attributes can be mapped to no more than seven output registers. To map eight or more non-generic output vertex attributes, you must map multiple attributes to a single register. In this case, multiple vertex attributes (up to a total of four components) must be packed into a single register. For example, you could map texture1 and texture2 to o0.xy and o0.zw, respectively.
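
The write-once and seven-register rules above can be checked mechanically for a planned output mapping. The following C sketch is one way to do so on the host. The OutputMapping structure, the function name, and the sample tables are all hypothetical, and the component chosen for each attribute in the samples is illustrative rather than required.

```c
#include <assert.h>
#include <stddef.h>

/* Component bits within one output register. */
enum { COMP_X = 1, COMP_Y = 2, COMP_Z = 4, COMP_W = 8, COMP_ALL = 15 };

typedef struct {
    const char *attribute;  /* for example, "texture1" */
    int reg;                /* output register number (0 for o0) */
    unsigned mask;          /* COMP_* bits that the attribute occupies */
    int is_generic;         /* nonzero for the generic attribute */
} OutputMapping;

/* Returns 1 if no component is claimed twice and the non-generic
   attributes use at most seven distinct registers; otherwise 0. */
int output_map_is_valid(const OutputMapping *maps, size_t count)
{
    size_t non_generic_regs = 0;
    for (size_t i = 0; i < count; ++i) {
        int reg_seen_before = 0;
        assert(maps[i].mask != 0 && maps[i].mask <= COMP_ALL);
        for (size_t j = 0; j < i; ++j) {
            if (maps[j].reg != maps[i].reg) continue;
            if (maps[j].mask & maps[i].mask) return 0;  /* overlap */
            if (!maps[j].is_generic) reg_seen_before = 1;
        }
        if (!maps[i].is_generic && !reg_seen_before) ++non_generic_regs;
    }
    return non_generic_regs <= 7;
}

/* Eight attributes in seven registers: texture1 and texture2 share o6. */
const OutputMapping packed_map[8] = {
    { "position",   0, COMP_ALL, 0 },
    { "color",      1, COMP_ALL, 0 },
    { "texture0",   2, COMP_X | COMP_Y, 0 },
    { "texture0w",  3, COMP_X, 0 },
    { "quaternion", 4, COMP_ALL, 0 },
    { "view",       5, COMP_X | COMP_Y | COMP_Z, 0 },
    { "texture1",   6, COMP_X | COMP_Y, 0 },
    { "texture2",   6, COMP_Z | COMP_W, 0 },
};

/* Two attributes that both claim o0.y, which violates the rules. */
const OutputMapping overlapping_map[2] = {
    { "texture1", 0, COMP_X | COMP_Y, 0 },
    { "texture2", 0, COMP_Y | COMP_Z, 0 },
};
```

Here packed_map passes the check, while overlapping_map fails because both entries claim the y component of the same register.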

8.3. Uniform Settings

When #pragma bind_symbol binds data names to non-input registers (floating-point constant registers, Boolean registers, or integer registers), the application can use the glUniform*() functions to set values in each register. You can transpose matrices with the glUniformMatrix*() functions (converting them from column-major to row-major order) by specifying GL_TRUE for transpose.

Sample code follows for both the vertex shader and the application.

Code 8-4. Binding Data Names to Non-Input Registers (in Shader Assembly)
#pragma bind_symbol ( ModelViewMatrix , c0 , c3 )
#pragma bind_symbol ( LoopCounter0 , i1 , i1 )
#pragma bind_symbol ( bFirst , b2 , b2 )
#pragma bind_symbol ( Scalar.x , c4, c4 ) 

Code 8-5. Setting Uniforms
GLint uniform_location;

GLfloat matrix[4][4];  // Modelview matrix, filled in by the application.
uniform_location = glGetUniformLocation(program, "ModelViewMatrix");
glUniform4fv(uniform_location, 4, &matrix[0][0]);  // Four vec4 values (c0 through c3).

GLfloat scalar_value = 1.0f;  // Example value.
uniform_location = glGetUniformLocation(program, "Scalar");
glUniform1f(uniform_location, scalar_value);

uniform_location = glGetUniformLocation(program, "bFirst");
glUniform1i(uniform_location, GL_TRUE);

GLint loop_setting[3] = { 4, 0, 1 };  // loop_count - 1, init, step
uniform_location = glGetUniformLocation(program, "LoopCounter0");
glUniform3iv(uniform_location, 1, loop_setting);  // count = 1 (a single ivec3).

The sample code specifies component x for the data name Scalar. Components can be specified this way when binding a floating-point constant register. You must specify components consecutively, in xyzw order. In other words, xy, zw, or yzw can be specified but not xz, yw, or xyw.
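
The consecutive-component rule can be expressed as a small string check. The following C sketch accepts exactly the selectors that form an unbroken run in xyzw order; the function name is hypothetical and the check is only an illustration of the rule, not part of the assembler.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Returns 1 if sel is a non-empty run of consecutive components taken
   in x, y, z, w order (such as "xy", "zw", or "yzw"); otherwise 0. */
int is_consecutive_selector(const char *sel)
{
    static const char order[] = "xyzw";
    const char *start;
    size_t len;

    assert(sel != NULL);
    len = strlen(sel);
    if (len == 0 || len > 4) return 0;

    start = strchr(order, sel[0]);
    if (start == NULL) return 0;

    /* The selector must match "xyzw" exactly from its first letter on. */
    return strncmp(start, sel, len) == 0;
}
```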

Integer registers control the loop instruction in a shader program. Each register is 24 bits wide: the loop count occupies bits 0 through 7, the initial value occupies bits 8 through 15, and the increment value occupies bits 16 through 23. The loop instruction initializes the loop counter register to the initial value, and then executes the instructions between loop and endloop one more time than the specified loop count (which is why Code 8-5 sets the count to the number of iterations minus one). The loop counter register is incremented by the increment value each time through the loop.
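
To make the bit layout and the iteration count concrete, the following C sketch packs the three fields and derives how often the loop body runs. The function names are hypothetical, and the sketch assumes that the loop counter does not wrap within the passes shown. An application does not pack the register itself; it passes the three values through glUniform3iv() as in Code 8-5.

```c
#include <assert.h>
#include <stdint.h>

/* Packs the three loop fields into the 24-bit layout:
   bits 0-7 loop count, bits 8-15 initial value, bits 16-23 increment. */
uint32_t pack_loop_register(uint8_t loop_count, uint8_t init, uint8_t step)
{
    return (uint32_t)loop_count
         | ((uint32_t)init << 8)
         | ((uint32_t)step << 16);
}

/* Number of times the body between loop and endloop is executed. */
unsigned loop_iterations(uint32_t reg)
{
    return (reg & 0xFFu) + 1u;
}

/* Value of the loop counter register during the given pass (0-based). */
unsigned loop_counter_at(uint32_t reg, unsigned pass)
{
    unsigned init = (reg >> 8) & 0xFFu;
    unsigned step = (reg >> 16) & 0xFFu;
    assert(pass < loop_iterations(reg));
    return init + step * pass;
}
```

With the values from Code 8-5 (4, 0, 1), the body runs five times and the loop counter takes the values 0 through 4.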

8.4. Notes for the Clip Coordinate System

The vertex shader outputs Z components in a clip coordinate system that differs from the one used in OpenGL ES.

OpenGL ES clips Z coordinates between -Wc and Wc, but the 3DS system clips them between 0 and -Wc (the sign is reversed). To use projective transformation matrices that are compatible with OpenGL ES, applications must convert the range from -Wc to Wc into the range from 0 to -Wc.

Converting Projective Transformation Matrices in the Application

Make the following conversion and set the resulting projection matrix as a uniform.

Code 8-6. Converting to an OpenGL ES-Compatible Projective Transformation Matrix
GLfloat projection[16];  // OpenGL ES-compatible projection matrix, already filled in.
projection[2] = (projection[2] + projection[3]) * (-0.5f);
projection[6] = (projection[6] + projection[7]) * (-0.5f);
projection[10] = (projection[10] + projection[11]) * (-0.5f);
projection[14] = (projection[14] + projection[15]) * (-0.5f); 

Converting in a Vertex Shader

Apply the projection conversion as follows.

Code 8-7. Converting to an OpenGL ES-Compatible Projective Transformation Matrix (in Shader Assembly)
#pragma output_map(position, o0)
#pragma bind_symbol(attrib_position, v0)
#pragma bind_symbol(modelview, c0, c3)
#pragma bind_symbol(projection, c4, c7)
def     c8, -0.5, -0.5, -0.5, -0.5

// Model View Transformation
dp4     r0.x, v0, c0
dp4     r0.y, v0, c1
dp4     r0.z, v0, c2
dp4     r0.w, v0, c3
// Projective Transformation
dp4     o0.x, r0, c4
dp4     o0.y, r0, c5
mov     r1, c6
add     r1, r1, c7
mul     r1, r1, c8
dp4     o0.z, r0, r1
dp4     o0.w, r0, c7 
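
The conversion in this section can be sanity-checked on the host. The following C sketch applies the same row operation as Code 8-6 and confirms that Z values of -Wc and Wc map to 0 and -Wc. The function names are hypothetical, and the column-major layout is an assumption inferred from the indices used in Code 8-6.

```c
#include <assert.h>
#include <stddef.h>

/* Applies the conversion from Code 8-6 in place to a column-major
   4x4 matrix: the Z row becomes -(Z row + W row) / 2. */
void convert_projection_z(float m[16])
{
    assert(m != NULL);
    for (int col = 0; col < 4; ++col) {
        m[col * 4 + 2] = (m[col * 4 + 2] + m[col * 4 + 3]) * -0.5f;
    }
}

/* Multiplies a column-major 4x4 matrix by a column vector. */
void transform(const float m[16], const float in[4], float out[4])
{
    for (int row = 0; row < 4; ++row) {
        out[row] = m[0 + row] * in[0] + m[4 + row] * in[1]
                 + m[8 + row] * in[2] + m[12 + row] * in[3];
    }
}

/* Converts an identity "projection" and returns the resulting Z for
   an input whose Z and W are z and w. */
float converted_z_for(float z, float w)
{
    float m[16] = { 1, 0, 0, 0,   0, 1, 0, 0,   0, 0, 1, 0,   0, 0, 0, 1 };
    float in[4] = { 0.0f, 0.0f, z, w };
    float out[4];

    convert_projection_z(m);
    transform(m, in, out);
    return out[2];
}
```

The identity matrix is used only to keep the arithmetic easy to follow; a real projection matrix is converted in the same way.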

8.5. Vertex Cache

Some of the vertex data created or processed by the vertex shader is saved in a cache. The vertex shader does not process input vertex data that is determined to be the same as the original vertex data saved in the cache, based on its vertex indices. Instead, the processed data in the cache is sent to the next process. The same vertex data is often processed more than once when it is input using GL_TRIANGLES, but this can be avoided if there is processed vertex data in the cache already.

The following conditions must be satisfied to use the vertex cache.

  • Vertex data must be input in a format that accesses vertex indices. In other words, the glDrawElements() function must be called to input vertex data.
  • Input vertex data must use a vertex buffer.

The vertex cache can save 32 vertex data entries. It is implemented with a proprietary algorithm that resembles the functionality of the LRU (least recently used) algorithm.

When the repeatedly accessed vertex data contains no more than 32 vertices, there is a higher chance of a cache hit. But the efficiency of the vertex cache is affected by conditions other than index order, including the usage state of the memory holding the index array, and the length of the shader executing as the vertex shader. For these reasons, the optimal index depends on the content, and there may not be a definitive answer.
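
When comparing index orderings, a rough estimate of cache misses can help. The following C sketch counts misses for a plain 32-entry LRU cache. This is an approximation only: the actual cache uses a proprietary algorithm that merely resembles LRU, and the other factors mentioned above are not modeled. The function name and the sample index array are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

#define CACHE_ENTRIES 32  /* capacity stated in this section */

/* Counts how many indices would miss a plain LRU cache holding
   CACHE_ENTRIES entries. Treat the result as a rough estimate. */
size_t count_lru_misses(const unsigned short *indices, size_t count)
{
    unsigned short cache[CACHE_ENTRIES];  /* cache[0] is most recent */
    size_t used = 0;
    size_t misses = 0;

    assert(indices != NULL || count == 0);
    for (size_t i = 0; i < count; ++i) {
        size_t pos = 0;
        while (pos < used && cache[pos] != indices[i]) ++pos;

        if (pos == used) {              /* not found: a miss */
            ++misses;
            if (used < CACHE_ENTRIES) ++used;
            pos = used - 1;             /* new slot, or the oldest if full */
        }
        /* Shift entries [0, pos) back by one and put this index in front. */
        for (size_t k = pos; k > 0; --k) cache[k] = cache[k - 1];
        cache[0] = indices[i];
    }
    return misses;
}

/* Two triangles that share an edge (GL_TRIANGLES order). */
const unsigned short quad_indices[6] = { 0, 1, 2, 2, 1, 3 };
```

For quad_indices, only the four distinct vertices miss; the two repeated indices are found in the cache.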

8.6. Querying the Vertex Shader

You can query the vertex shader for information about active vertex attributes and uniforms.

8.6.1. Getting Vertex Attribute Information

You can use the glGetActiveAttrib() function to get attribute information for the vertex data input to the vertex shader.

Code 8-8. Definition of the glGetActiveAttrib Function
void glGetActiveAttrib(GLuint program, GLuint index, GLsizei bufsize, 
                       GLsizei* length, GLint* size, GLenum* type, char* name); 

A GL_INVALID_OPERATION error is generated if program specifies an unlinked or otherwise invalid program object.

For index, specify a value between 0 and one less than the number of active vertex attributes. To obtain that number, call the glGetProgramiv() function on the program object specified by program, with GL_ACTIVE_ATTRIBUTES specified for pname. A GL_INVALID_VALUE error is generated if the specified value is negative or greater than or equal to the number of vertex attributes.

For bufsize, specify the size of the array specified in name. A GL_INVALID_VALUE error is generated if a negative value is specified.

The vertex attribute's type is returned in type. The vertex attribute's size is returned in size. This is the number of values indicated by type that are required to represent the vertex attribute.

The vertex attribute's name is returned in name. If the name does not fit in bufsize characters (including the terminating null character), only the first bufsize - 1 characters are stored, followed by a terminating null character. The number of characters stored in name is returned in length (excluding the terminating null character).

8.6.2. Getting Uniform Information

You can use the glGetActiveUniform() function to get uniform information registered with a program object. This is not limited to the vertex shader; you can also get information about uniforms used by the geometry shader and reserved fragment shader, which are described later.

Code 8-9. Definition of the glGetActiveUniform Function
void glGetActiveUniform(GLuint program, GLuint index, GLsizei bufsize, 
                        GLsizei* length, GLint* size, GLenum* type, char* name); 

A GL_INVALID_OPERATION error is generated if program specifies an unlinked or otherwise invalid program object.

For index, specify a value between 0 and one less than the uniform information count obtained from the glGetProgramiv() function when it is called on the program object specified by program and when pname is GL_ACTIVE_UNIFORMS. A GL_INVALID_VALUE error is generated if the specified value is negative or greater than or equal to the uniform information count.

For bufsize, specify the size of the array specified in name. A GL_INVALID_VALUE error is generated if a negative value is specified.

The type of value for the uniform setting is returned in type. The value's size is returned in size. This is the number of elements indicated by type that are required to represent the uniform setting. For example, GL_FLOAT_VEC4 is stored in type and 4 is stored in size for a 4×4 matrix, such as the modelview matrix.

The uniform's name is returned in name. If the name does not fit in bufsize characters (including the terminating null character), only the first bufsize - 1 characters are returned, followed by a terminating null character. The number of characters stored in name is returned in length (excluding the terminating null character).

8.6.3. Setting Categories

The glGetActiveAttrib() and glGetActiveUniform() functions can get the following types of values.

Table 8-2. List of Setting Categories

Category         Data Type   Number of Components   Major Use
GL_FLOAT         float       1                      Bias and scale values.
GL_FLOAT_VEC2    float       2                      Viewport settings.
GL_FLOAT_VEC3    float       3                      Colors (RGB) and directional vectors.
GL_FLOAT_VEC4    float       4                      Colors (RGBA) and transformation matrices.
GL_INT           int         1                      Mode settings.
GL_INT_VEC3      int         3                      Combiner source input.
GL_BOOL          bool        1                      Enabling and disabling features.
GL_SAMPLER_1D    int         1                      Specifying lookup tables.

This table lists only the categories that are actually used by uniforms in each shader.

8.7. Getting and Setting the Values of Multiple Uniforms

Functions are provided on the 3DS system for getting and setting multiple uniforms in a single call.

By calling the glUniformsDMP() function, you can set values in multiple uniforms at once for the program object that is currently bound.

Code 8-10. Setting a Group of Uniforms
void glUniformsDMP(GLuint n, GLint* locations, GLsizei* counts, 
                   const GLuint* value); 

For n, specify the number of uniforms to set.

For locations, specify a pointer to an array storing n uniform locations (which can be obtained by glGetUniformLocation()). For counts, specify a pointer to an array storing the number of elements in each of the n uniforms. Each entry corresponds to the count argument of the glUniform*() functions: store the number of elements to set for an array uniform and 1 for a non-array uniform.

For value, specify a pointer to an array storing the values to set in the uniforms. Because each uniform has a different amount of data, the indices for the values to store in value are not necessarily the same as the indices in locations and counts for the corresponding uniforms. You can mix both GLfloat and GLuint data in the uniforms that you set. Store 32-bit GLfloat data for the values to set in GLfloat uniforms.

This function does not perform any error-checking. Behavior is undefined if you specify an invalid value for any argument.
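
Because value mixes GLfloat and GLuint data in a single array of 32-bit words, one way to build it is to append each uniform's data in order while preserving the bit patterns. The following C sketch does this for a 4x4 matrix followed by a loop setting. It assumes that the data for successive uniforms is packed back to back, which is an interpretation of the description above rather than something this chapter states outright. All names are hypothetical, the fixed-width types stand in for the GL types, and glUniformsDMP() itself is not called.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Appends n floats to a buffer of 32-bit words, preserving their bit
   patterns, and returns the new write position. */
size_t push_floats(uint32_t *value, size_t pos, const float *src, size_t n)
{
    assert(sizeof(float) == sizeof(uint32_t));
    memcpy(&value[pos], src, n * sizeof(float));
    return pos + n;
}

/* Appends n integers to the same buffer. */
size_t push_ints(uint32_t *value, size_t pos, const int32_t *src, size_t n)
{
    memcpy(&value[pos], src, n * sizeof(int32_t));
    return pos + n;
}

uint32_t example_value[19];  /* would be passed as value */
int32_t example_counts[2];   /* would be passed as counts */

/* Fills the example arrays and returns the number of words written. */
size_t build_example_values(void)
{
    const float matrix[16] = { 1, 0, 0, 0,   0, 1, 0, 0,
                               0, 0, 1, 0,   0, 0, 0, 1 };
    const int32_t loop_setting[3] = { 4, 0, 1 };
    size_t pos = 0;

    example_counts[0] = 4;  /* four vec4 elements, as in Code 8-5 */
    pos = push_floats(example_value, pos, matrix, 16);

    example_counts[1] = 1;  /* one non-array integer uniform */
    pos = push_ints(example_value, pos, loop_setting, 3);

    return pos;             /* 16 + 3 words */
}
```

Note that the loop setting starts at word 16 of the value array even though it is only the second uniform, which illustrates why indices into value do not line up with indices into locations and counts.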

By calling the glGetUniformsDMP() function, you can get values from multiple uniforms at once for a specified program object.

Code 8-11. Getting a Group of Uniforms
void glGetUniformsDMP(GLuint program, GLuint n, GLint* locations, 
                      GLsizei* counts, GLuint* params); 

For program, specify the program object for which to get uniform values.

For n, specify the number of uniforms to get.

For locations, specify a pointer to an array storing n uniform locations (which can be obtained by glGetUniformLocation). For counts, specify a pointer to an array storing the number of elements in the n uniforms. The number of uniform elements is either 1 (for a non-array uniform) or the number of array elements from which to get values for an array uniform.

For params, specify a pointer to an array used to get the uniform values. Because each uniform has a different amount of data, the indices for the values stored in params are not necessarily the same as the indices in locations and counts for the corresponding uniforms. Both GLfloat and GLuint data can be mixed in the uniforms from which values are obtained. The GLfloat values obtained from GLfloat uniforms are stored as 32-bit data in params.

This function does not perform any error-checking. Behavior is undefined if you specify an invalid value for any argument.

8.8. Other Notes and Cautions

You can access uniform values with the glUniform*() and glGetUniform*() functions, using uniform locations that can be obtained with the glGetUniformLocation() function. By adding an offset to a location, you can specify and access a specific element in an array uniform. For example, the second element of an array uniform is accessed when 1 is added to the uniform's location.

A uniform's location is fixed when the glLinkProgram() function is called and, in principle, the value differs for each program object. The glUniform*() functions generate an error if the location is not related to the current program object. The glGetUniform*() functions generate an error if the location is related to a program object other than program. However, the glUniform*() and glGetUniform*() functions do not generate errors if the programs being queried are reserved fragment shaders and the location is specified using a bitwise OR with 0xFFF80000.

Do not include vertex attributes in #pragma output_map definitions when they do not need to be output. Otherwise, useless instructions are required, because values must be written to all defined vertex attributes (output registers) when the vertex shader is run. Outputting some attributes also causes parts of the GPU circuit to be clocked, so unnecessary output can needlessly run down the battery.

The maximum number of vertex attributes when rendering without using a vertex buffer is 16, and the maximum number of vertex attributes that can use a vertex buffer is 12. However, if 12 vertex attributes are rendered using a vertex buffer, be aware of the restrictions on vertex data placement (see 6.8.1. Restrictions Affecting Only glDrawElements).

In the vertex shader assembler, you can define up to 16 vertex attributes, but if the maximum number as determined according to the conditions above is exceeded, a GL_INVALID_OPERATION error may be generated when the rendering function is called.

