4. Command Lists

Command lists are new to the Nintendo 3DS. The gl and nngx() functions called for 3D graphics processing are recorded as commands and then executed together as a batch. Command lists are managed through command list objects. On the Nintendo 3DS, the command list is the unit in which 3D graphics rendering is executed.

Command lists contain two kinds of entries: 3D commands, which the GPU executes directly to write to its registers, and command requests, which communicate instructions from the CPU to the GPU. 3D commands accumulate in the 3D command buffer as gl and nngx() functions carry out rendering work and other tasks. Command requests are queued by the specific gl and nngx() functions that issue them. For information about the types of command requests, see 4.2. Command Request Types.

Figure 4-1. Command Lists

(Figure labels: Command List, 3D Command, 3D Command Buffer, Command Request)

When a 3D execution command queued as a command request is processed, the GPU loads the 3D commands from the 3D command buffer and executes them. Multiple 3D commands are handled and run together as a single command set. Each command set ends with a split command, so the GPU can tell where to stop loading the 3D command buffer.

Figure 4-2. Command Sets

(Figure labels: 3D Command Buffer, Command Request, 3D Execution Command, Command Set)

4.1. How to Use

The 3DS GPU renders 3D graphics by running commands in units of command lists. Applications create command list objects into which gl() functions and other functions accumulate 3D commands, which are then executed as one batch when the GPU runs the command list.

4.1.1. Creating Objects

First, use the nngxGenCmdlists() function to create the command list objects.

Code 4-1. Function for Creating Command Lists
void nngxGenCmdlists(GLsizei n, GLuint* cmdlists); 

This code creates n command list objects and stores their object names in cmdlists.

Command lists have their own namespace. The command list with an object name of 0 is reserved by the system.

Table 4-1. Errors Generated by the nngxGenCmdlists Function

Error Code	Cause
GL_ERROR_8000_DMP	A negative value was specified for n.
GL_ERROR_8001_DMP	The internal buffer failed to be allocated.

4.1.2. Binding Command Lists

Next, use the nngxBindCmdlist() function to bind a generated command list object to the GPU. 3D commands are accumulated in the bound command list's 3D command buffer.

Code 4-2. Function for Binding Command Lists
void nngxBindCmdlist(GLuint cmdlist); 

If cmdlist is set to an unused object name, that object is created.

Table 4-2. Errors Generated by the nngxBindCmdlist Function

Error Code	Cause
GL_ERROR_8004_DMP	The internal buffer failed to be allocated.
GL_ERROR_8005_DMP	Called while command caches and command lists were in a state of being saved. (For more information, see the CTR Programming Manual: Advanced Graphics.)

4.1.3. Allocating Memory Regions

Use the nngxCmdlistStorage() function to allocate a memory region for the bound command list.

Code 4-3. Allocating a Memory Region for a Command List
void nngxCmdlistStorage(GLsizei bufsize, GLsizei requestcount); 

Set bufsize to the size of the 3D command buffer and requestcount to the number of command requests that can be queued.

You must call the nngxBindCmdlist() and nngxCmdlistStorage() functions on each command list object that you create. Calls to nngxCmdlistStorage() are ignored when a command list with an object name of 0 is bound. If you call nngxCmdlistStorage() again on an object that already has an allocated region, that region is freed and reallocated.

A GL_ERROR_COMMANDBUFFER_FULL_DMP error is generated by the relevant function if 3D commands have accumulated past this allocated 3D command buffer size or if the 3D command buffer has not been set. A GL_ERROR_COMMANDREQUEST_FULL_DMP error is generated by the relevant function if command requests have been queued past this maximum queue size or if the buffer has not been set.
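
The create, bind, and allocate steps described so far can be sketched as one helper. This is a minimal sketch, not SDK code: the three nngx functions are replaced by host-side stand-ins so the example compiles off-device, the GL typedefs are assumed to match the usual OpenGL ES definitions, and create_cmdlist and the g_ variables are hypothetical names.

```c
/* Assumed typedefs: the usual OpenGL ES definitions. */
typedef int GLsizei;
typedef unsigned int GLuint;

/* Host-side stand-ins (NOT the real driver) that only record what was requested. */
static GLuint  g_next_name = 1;      /* object name 0 is reserved by the system */
static GLuint  g_bound = 0;
static GLsizei g_bufsize = 0;
static GLsizei g_requestcount = 0;

void nngxGenCmdlists(GLsizei n, GLuint* cmdlists) {
    for (GLsizei i = 0; i < n; ++i) cmdlists[i] = g_next_name++;
}

void nngxBindCmdlist(GLuint cmdlist) { g_bound = cmdlist; }

void nngxCmdlistStorage(GLsizei bufsize, GLsizei requestcount) {
    if (g_bound == 0) return;        /* ignored while object name 0 is bound */
    g_bufsize = bufsize;
    g_requestcount = requestcount;
}

/* Create one command list, bind it, and allocate its memory region. */
GLuint create_cmdlist(GLsizei bufsize, GLsizei requestcount) {
    GLuint name = 0;
    nngxGenCmdlists(1, &name);
    nngxBindCmdlist(name);
    nngxCmdlistStorage(bufsize, requestcount);
    return name;
}
```

The sizes passed to create_cmdlist are up to the application; a real program would also check for the errors listed in the tables of this section after each call.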

Table 4-3. Common Errors When Allocating a Memory Region for a Command List

Error Code	Cause
GL_ERROR_8006_DMP	Failed to allocate a memory region.
GL_ERROR_8007_DMP	Called on a command list that is being executed.
GL_ERROR_8008_DMP	A negative value was specified as an argument.

4.1.4. Running Command List Objects

Call the nngxRunCmdlist() function to start executing command requests that have been queued in the bound command list.

Code 4-4. Function for Executing Command Lists
void nngxRunCmdlist(void);
void nngxRunCmdlistByID(GLuint cmdlist); 

Execution is ignored if the bound command list has an object name of 0. Likewise, attempts to bind a different command list and run this function are ignored while command requests are executing.

After command requests have started executing, you can accumulate more commands in that same list or you can bind another command list and accumulate commands there. However, commands must be executed in the same order in which they were accumulated.

The nngxRunCmdlistByID() function runs the command list specified by cmdlist rather than the currently bound command list. Besides running the specified command list, it works the same as the nngxRunCmdlist() function.

Table 4-4. Common Errors When Running Command Lists

Error Code	Cause
GL_ERROR_8009_DMP	Called on a command list for which a memory region has not been allocated (nngxRunCmdlist).
GL_ERROR_80B1_DMP	Called on a command list for which a memory region has not been allocated (nngxRunCmdlistByID).

4.1.4.1. Getting the State of an Executing Command List

Determine whether a command list is running using the nngxGetIsRunning() function.

Code 4-5. Function for Determining Command List Execution State
GLboolean nngxGetIsRunning(void); 

This function returns GL_TRUE if a command list is currently running, regardless of whether the command list is currently bound.

Similarly, you can determine whether a command list is currently running by passing NN_GX_CMDLIST_IS_RUNNING for pname to the nngxGetCmdlistParameteri() function. This method can be used to determine whether the currently bound command list is running. For information about the nngxGetCmdlistParameteri() function, see 4.1.10. Getting Command List Parameters.

4.1.5. Destroying Command List Objects

You can call the nngxDeleteCmdlists() function to destroy command list objects that are no longer necessary.

Code 4-6. Function for Destroying Command Lists
void nngxDeleteCmdlists(GLsizei n, const GLuint* cmdlists); 

This destroys the command list objects specified by the n object names in cmdlists. A GL_ERROR_8003_DMP error occurs if any of the specified command list objects are currently being executed, but all other specified command list objects are still destroyed.

Table 4-5. Errors Generated by the nngxDeleteCmdlists Function

Error Code	Cause
GL_ERROR_8002_DMP	A negative value was specified for n.
GL_ERROR_8003_DMP	A command list included in cmdlists is still being executed.

4.1.6. Stopping a Command List

Call either of the following functions to stop a command list that is being executed.

Code 4-7. Functions for Stopping Command Lists
void nngxStopCmdlist(void);
void nngxReserveStopCmdlist(GLint id); 

When the nngxStopCmdlist() function is called, it waits for any executing command request to complete and then stops the command list. You cannot stop a command request after it has started executing (or is waiting to commence execution).

The nngxReserveStopCmdlist() function stops the command list immediately after the id-th accumulated command request finishes executing.

Call the nngxRunCmdlist() function to resume a stopped command list. Note, however, that this function will be ignored if it is called after the instruction to stop the command list but before the executing command requests finish executing.

Table 4-6. Errors Generated by the nngxReserveStopCmdlist Function

Error Code	Cause
GL_ERROR_800A_DMP	Called on a command list that is being executed.
GL_ERROR_800B_DMP	Zero, a negative number, or a value that exceeds the maximum number of command requests was specified for id.
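
The id restriction for nngxReserveStopCmdlist() can be checked on the application side before the call. The following is a small sketch under the assumption that max_requests is the requestcount value given to nngxCmdlistStorage(); the function name is hypothetical.

```c
#include <stdbool.h>

/* True when id is in the range 1..max_requests, that is, not zero,
 * not negative, and not beyond the maximum number of command requests. */
bool reserve_stop_id_is_valid(int id, int max_requests) {
    return id >= 1 && id <= max_requests;
}
```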

4.1.7. Splitting the 3D Command Buffer

Use the nngxSplitDrawCmdlist() function to add a buffer loading complete command to the 3D command buffer and start queuing render command requests. If a command list accumulates 3D commands while it is executing, its 3D commands are run up to the point at which they are split by this function.

Code 4-8. Function for Splitting the 3D Command Buffer
void nngxSplitDrawCmdlist(void); 

Render command requests are not queued until the buffer loading complete command is added. In addition to this function, other functions also queue render command requests. Because functions such as glClear and glTexImage2D must stop 3D command execution, they each add a buffer loading complete command, and then queue the render command requests.

Table 4-7. Common Errors Generated by the nngxSplitDrawCmdlist Function

Error Code	Cause
GL_ERROR_800C_DMP	Called when the bound command list’s object name is 0.
GL_ERROR_800D_DMP	The command request has already reached the maximum number of accumulated command requests allowable.
GL_ERROR_800E_DMP	The 3D command buffer is full because of commands added by this function.

These errors may also be generated by other functions that call this one internally.

4.1.7.1. Flushing the Accumulated 3D Command Buffer

When the nngxSplitDrawCmdlist() function is called, the buffer loading complete command is added and a render command request is queued even if no 3D commands have accumulated in the 3D command buffer. In other words, this function can add unneeded commands. We recommend using the nngxFlush3DCommand() or nngxFlush3DCommandNoCacheFlush() function instead, because these functions split the 3D command buffer only when 3D commands have accumulated. If cache flushes would otherwise occur multiple times, use the latter function and apply the cache contents all at once with the nngxUpdateBufferLight() function. This can reduce CPU overhead.

Code 4-9. Functions for Flushing the 3D Command Buffer
void nngxFlush3DCommand(void);
void nngxFlush3DCommandNoCacheFlush(void); 

These functions do not add a buffer loading complete command or a render command request if no 3D commands have accumulated in the 3D command buffer of the bound command list since the buffer was last split. They add them only when 3D commands have accumulated. If 3D commands accumulate while the command list is executing, the 3D commands execute up to the point where the buffer was split by these functions.
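
The difference between splitting and flushing can be illustrated with a toy model. This sketch only mirrors the bookkeeping described in this section; it is not the driver's implementation, and the CmdlistModel type and model_ function names are hypothetical.

```c
#include <stdbool.h>

/* Toy model of one command list; the field names are hypothetical. */
typedef struct {
    int bytes_since_split;   /* 3D command bytes accumulated since the last split */
    int queued_requests;     /* render command requests queued so far */
} CmdlistModel;

/* Models nngxSplitDrawCmdlist(): always splits and queues a request,
 * even when nothing has accumulated. */
void model_split(CmdlistModel* c) {
    c->bytes_since_split = 0;
    c->queued_requests += 1;
}

/* Models nngxFlush3DCommand(): splits only when 3D commands have accumulated.
 * Returns true if a render command request was queued. */
bool model_flush(CmdlistModel* c) {
    if (c->bytes_since_split == 0) return false;
    model_split(c);
    return true;
}
```

In this model, calling model_flush on an empty buffer leaves the request count unchanged, which is the saving that motivates the recommendation to prefer the flush functions.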

Table 4-8. Errors Generated by the nngxFlush3DCommand and nngxFlush3DCommandNoCacheFlush Functions

Error Code	Cause
GL_ERROR_8084_DMP, GL_ERROR_80AE_DMP	Called when the bound command list’s object name is 0.
GL_ERROR_8085_DMP, GL_ERROR_80AF_DMP	The command request has already reached the maximum number of accumulated command requests allowable.
GL_ERROR_8086_DMP, GL_ERROR_80B0_DMP	The 3D command buffer is full because of commands added by this function.

4.1.7.2. Partially Flushing the Accumulated 3D Command Buffer

The nngxFlush3DCommandPartially() function is provided for executing a specified size of 3D commands. This function extends the features of the nngxFlush3DCommand() function and can be called to correctly execute 3D commands that include the command buffer execution register kick command added by functions such as nngxAdd3DCommand. For more information, see 8.8.23. Command Buffer Execution Registers (0x0238 – 0x023D) in the 3DS Programming Manual: Advanced Graphics.

Code 4-10. Function for Partially Flushing the Command Buffer
void nngxFlush3DCommandPartially(GLsizei buffersize); 

Specify the size, in bytes, of the command buffer to be executed in buffersize. The number must be a multiple of 16.
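
A value for buffersize can be sanity-checked before the call. This is a minimal sketch that combines the multiple-of-16 rule with the error table for this function, which also rejects negative values; the helper name is hypothetical.

```c
#include <stdbool.h>

/* True when buffersize is non-negative and a multiple of 16 bytes. */
bool partial_flush_size_is_valid(int buffersize) {
    return buffersize >= 0 && (buffersize % 16) == 0;
}
```

Passing this check does not mean the size is correct: the value must still match the actual byte count up to and including the first kick command.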

The size must be correctly specified in buffersize from the address following the previous command flush to the first kick command (and including the kick command). If the wrong value is specified, the command is executed in an unintended order and the operation may not be able to complete properly.

The application must correctly flush the cache for the 3D commands accumulated between the previous command flush and the call to this function. The entire cache must be flushed after calling the function because commands that generate an interrupt are generated within this function. In addition, to prevent commands from being executed before the cache is flushed, this function cannot be called for a command list that is currently executing. Functions such as glClear, nngxTransferRenderImage, and glCopyTexImage2D flush commands in the same way as the nngxFlush3DCommand() function. Always perform a flush using this function before calling those functions.

When a kick command is added using the nngxAddJumpCommand() or nngxAddSubroutineCommand() function, the driver adjusts the execution size so that commands are executed up to the first kick command. It is unnecessary to call the nngxFlush3DCommandPartially() function when using those functions.

Note that when a partial flush is performed for a command buffer with a kick command added with the nngxAddSubroutineCommand() function, the execution size used is specified in buffersize instead of an execution size calculated by the driver.

Table 4-9. Errors Generated by the nngxFlush3DCommandPartially Function

Error Code	Cause
GL_ERROR_80A9_DMP	Called when the bound command list’s object name is 0.
GL_ERROR_80AA_DMP	The command request has already reached the maximum number of accumulated command requests allowable.
GL_ERROR_80AB_DMP	The 3D command buffer is full because of commands added by this function.
GL_ERROR_80AC_DMP	A value less than 0 or that is not a multiple of 16 was specified in buffersize.
GL_ERROR_80AD_DMP	Called on a command list that is being executed.

4.1.8. Clearing the Command List

The following function clears the command list and sets both the 3D command buffer and command request queue to the unused state (the state immediately after their memory regions are allocated).

Code 4-11. Function for Clearing the Command List
void nngxClearCmdlist(void); 
Table 4-10. Errors Generated by the nngxClearCmdlist Function

Error Code	Cause
GL_ERROR_800F_DMP	Called on a command list that is being executed.

4.1.8.1. Clearing the Command List and Filling Its 3D Command Buffer

The following function clears the command list and initializes the 3D command buffer with the specified data. Both the 3D command buffer and command request queue enter the unused state.

Code 4-12. Function for Clearing the Command List and Filling Its 3D Command Buffer
void nngxClearFillCmdlist(GLuint data); 
Table 4-11. Errors Generated by the nngxClearFillCmdlist Function

Error Code	Cause
GL_ERROR_8065_DMP	Called on a command list that is being executed.

4.1.9. Setting Command List Parameters

You can call the nngxSetCmdlistParameteri() function to set command list parameters.

Code 4-13. Function for Setting Command List Parameters
void nngxSetCmdlistParameteri(GLenum pname, GLint param); 
Table 4-12. Command List Parameters That Can Be Set

pname

Setting

NN_GX_CMDLIST_GAS_UPDATE

This parameter is set for individual command list objects and can have one of the following values.

  • GL_TRUE: Update the additive blend results for rendering gas density information.
  • GL_FALSE: Normal behavior (default)

If this parameter is GL_TRUE, when the nngxSplitDrawCmdlist() or nngxFlush3DCommand() function is called, additive blend results are updated for rendered gas density information after the accumulated render command requests have finished running.

If this parameter is GL_FALSE, normal behavior is restored; that is, commands that update gas density information are only accumulated when necessary.

This setting takes effect depending on whether it is GL_TRUE when nngxSplitDrawCmdlist or nngxFlush3DCommand is called. A setting of GL_TRUE does not have any effect when render command requests are executed. This setting also does not affect render command requests accumulated by any function call other than nngxSplitDrawCmdlist or nngxFlush3DCommand.

For more information about how additive blend results are updated for rendered gas density information, see the Gas Control Setting Registers section in the 3DS Programming Manual: Advanced Graphics.

Table 4-13. Errors Generated by the nngxSetCmdlistParameteri Function

Error Code	Cause
GL_ERROR_8015_DMP	Called on a command list that is being executed.
GL_ERROR_8016_DMP	An invalid value was specified for pname or param.

4.1.10. Getting Command List Parameters

You can call the nngxGetCmdlistParameteri() function to get command list parameters.

Code 4-14. Function for Getting Command List Parameters
void nngxGetCmdlistParameteri(GLenum pname, GLint* param); 
Table 4-14. Getting Command List Parameters

pname

Parameter Obtained

NN_GX_CMDLIST_IS_RUNNING

The command list execution state.

GL_TRUE: The command list is currently being executed.
GL_FALSE: The command list is not currently being executed.

NN_GX_CMDLIST_USED_BUFSIZE

The number of bytes accumulated in the 3D command buffer.

NN_GX_CMDLIST_USED_REQCOUNT

The number of accumulated command requests.

NN_GX_CMDLIST_MAX_BUFSIZE

The maximum 3D command buffer size.

This value was specified in the bufsize parameter to the nngxCmdlistStorage() function.

NN_GX_CMDLIST_MAX_REQCOUNT

The maximum number of command requests.

This value was specified in the requestcount parameter to the nngxCmdlistStorage() function.

NN_GX_CMDLIST_TOP_BUFADDR

The starting address of the 3D command buffer.

NN_GX_CMDLIST_BINDING

The object name of the command list that is currently bound.

NN_GX_CMDLIST_RUN_BUFSIZE

The number of executed bytes in the 3D command buffer.

NN_GX_CMDLIST_RUN_REQCOUNT

The number of executed command requests.

NN_GX_CMDLIST_TOP_REQADDR

The starting address of the data region used for the command request queue.

NN_GX_CMDLIST_NEXT_REQTYPE

The type of command request that is currently executing or will be executed next.

The value returned in param depends on the state of the command list that is currently bound. If the command list is currently stopped, the return value is the type of command request that will be executed next. If the command list is executing, the return value is the type of command request that is currently executing. NULL is returned when all command requests have finished executing.

Command request types are defined by the following macros.

NN_GX_CMDLIST_REQTYPE_DMA: DMA transfer command request.
NN_GX_CMDLIST_REQTYPE_RUN3D: Render command request.
NN_GX_CMDLIST_REQTYPE_FILLMEM: Memory fill command request.
NN_GX_CMDLIST_REQTYPE_POSTTRANS: Post transfer command request.
NN_GX_CMDLIST_REQTYPE_COPYTEX: Copy texture command request.

NN_GX_CMDLIST_NEXT_REQINFO

The command buffer's address and byte size.

The command buffer's address is stored in the first element of param and its size (in bytes) is stored in the second element. You must pass param a pointer to an array of at least two GLint values.

If the bound command list is currently stopped, parameter information is returned for the command request that will be executed next. If the bound command list is executing, parameter information is returned for the command request that is currently executing. Nothing is returned if all command requests have finished executing.

The function only returns information when a render command request is the command request that is currently executing or that will be executed next. Nothing is returned for any other type of command.

NN_GX_CMDLIST_HW_STATE

32 bits of data indicating the hardware state.

Each of the following bits is set to 1 to indicate that the command list is in the specified state.

bit 20: A post transfer command request is executing.
bit 19: A memory fill is in progress.
bit 18: An underrun error has occurred in the FIFO for the lower screen.
bit 17: An underrun error has occurred in the FIFO for the upper screen.
bit 16: The post vertex cache is busy.
bit 15: Bits [1:0] of register 0x0252 are 1.
bit 14: Vertex processor 3 is busy.
bit 13: Vertex processor 2 is busy.
bit 12: Vertex processor 1 is busy.
bit 11: Vertex processor 0 (which can also be used as the geometry shader processor) is busy.
bit 10: Bits [1:0] of register 0x0229 are nonzero.
bit 9: Input to the module that loads command buffers and vertex arrays is busy.
bit 8: Output from the module that loads command buffers and vertex arrays is busy.
bit 7: The early depth test module is busy.
bit 6: The per-fragment operation module is busy processing data from the module in the previous stage.
bit 5: The per-fragment operation module is busy accessing the framebuffer.
bit 4: The texture combiners are busy.
bit 3: Fragment lighting is busy.
bit 2: The texture units are busy.
bit 1: The rasterization module is busy.
bit 0: Triangle setup is busy.

NN_GX_CMDLIST_CURRENT_BUFADDR

Buffer address of the next 3D command stored in the currently bound command list.
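
The NN_GX_CMDLIST_HW_STATE value can be decoded with ordinary bit masks. The sketch below copies bit positions from the list above; the macro and function names are hypothetical, and it assumes the GLint returned in param has been converted to a uint32_t.

```c
#include <stdbool.h>
#include <stdint.h>

/* Bit positions come from the NN_GX_CMDLIST_HW_STATE description;
 * the macro names are hypothetical. */
#define HW_STATE_BIT(n)         (UINT32_C(1) << (n))
#define HW_STATE_FIFO_UNDERRUN  (HW_STATE_BIT(17) | HW_STATE_BIT(18))
#define HW_STATE_MEMORY_FILL    HW_STATE_BIT(19)
#define HW_STATE_POST_TRANSFER  HW_STATE_BIT(20)

/* How many of vertex processors 0 through 3 (bits 11 to 14) are reported busy. */
int hw_busy_vertex_processors(uint32_t state) {
    int count = 0;
    for (int bit = 11; bit <= 14; ++bit) {
        if (state & HW_STATE_BIT(bit)) ++count;
    }
    return count;
}

/* True if a FIFO underrun was reported for the upper or lower screen. */
bool hw_display_fifo_underrun(uint32_t state) {
    return (state & HW_STATE_FIFO_UNDERRUN) != 0;
}

/* True if a memory fill or a post transfer command request is in progress. */
bool hw_transfer_or_fill_active(uint32_t state) {
    return (state & (HW_STATE_MEMORY_FILL | HW_STATE_POST_TRANSFER)) != 0;
}
```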

Table 4-15. Common Errors When Getting Command List Parameters

Error Code	Cause
GL_ERROR_8017_DMP	An invalid value was specified for pname or param.
GL_ERROR_8018_DMP	A value other than NN_GX_CMDLIST_BINDING was specified for pname when the bound command list’s object name is 0.

4.1.11. Command Completion Interrupts

You can have an interrupt occur, and an interrupt handler called, when command requests in a command list finish. Register the interrupt handler with the nngxSetCmdlistCallback() function.

Code 4-15. Function for Registering an Interrupt Handler
void nngxSetCmdlistCallback(void (*func)(GLint)); 

An interrupt handler is valid only for the bound command list. If this function is called with func set to 0 (NULL), the handler is unregistered.

The interrupt handler is called from a different thread than the main thread, so mutual exclusion is needed when referencing any data shared with the main thread. However, mutual exclusion is not needed for data shared with any callback functions for the same graphics processing registered using the nngxSetVSyncCallback() function.
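
One way to share a completion flag between the handler thread and the main thread is sketched below. It swaps in C11 atomics in place of a mutex, which assumes the target toolchain provides stdatomic.h; that is not confirmed by this manual. The handler and variable names are hypothetical, and the handler's GLint argument is stored as-is because this section does not describe its meaning.

```c
#include <stdatomic.h>
#include <stdbool.h>

typedef int GLint;   /* assumed typedef: the usual OpenGL ES definition */

static atomic_bool g_cmdlist_done = false;   /* set by the handler, cleared by the main thread */
static atomic_int  g_handler_arg  = 0;       /* last value passed to the handler */

/* Handler with the signature expected by nngxSetCmdlistCallback(). */
void on_cmdlist_interrupt(GLint arg) {
    atomic_store(&g_handler_arg, arg);
    atomic_store(&g_cmdlist_done, true);
}

/* Main-thread side: returns true once per handler call and clears the flag.
 * atomic_exchange reads the old value and writes false in a single step. */
bool consume_cmdlist_done(void) {
    return atomic_exchange(&g_cmdlist_done, false);
}
```

The handler would be registered with nngxSetCmdlistCallback(on_cmdlist_interrupt) while the target command list is bound and not executing.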

Table 4-16. Common Errors When Setting Command List Callbacks

Error Code	Cause
GL_ERROR_8010_DMP	Called on a command list that is being executed.

Use the nngxEnableCmdlistCallback() function to specify a command request that triggers an interrupt when it finishes. Use the nngxDisableCmdlistCallback() function to disable the interrupt.

Code 4-16. Interrupt Control Functions
void nngxEnableCmdlistCallback(GLint id);
void nngxDisableCmdlistCallback(GLint id); 

An interrupt occurs upon completion of the id-th accumulated command request. You can call this function several times on a single command list with different id values to cause multiple interrupts. Note that id identifies a command request by the order in which it was accumulated, not the order in which it was executed. You can call nngxGetCmdlistParameteri() with pname set to NN_GX_CMDLIST_USED_REQCOUNT to get a value to specify for id. If id is -1, an interrupt occurs when all command requests accumulated in the command list have finished.

The command list is still executing when an interrupt handler is called, except when the interrupt is for the last command request accumulated in the command list. Consequently, the interrupt handler must not call any function that cannot be called while a command list is executing.

Even without registering an interrupt handler, you can determine when a command request has finished executing by calling nngxGetCmdlistParameteri() passing pname as NN_GX_CMDLIST_IS_RUNNING, and then waiting until you get a value of GL_FALSE.

Table 4-17. Common Errors When Enabling and Disabling Command List Callbacks

Error Code	Cause
GL_ERROR_8012_DMP, GL_ERROR_8014_DMP	Zero, a negative number other than -1, or a value equal to the maximum number of command requests was specified for id.

4.1.12. Waiting for Command Execution to Complete

You can call nngxWaitCmdlistDone to wait for all of the command requests accumulated in the command list to complete.

Code 4-17. Function That Waits for Commands to Complete
void nngxWaitCmdlistDone(void); 

Render command requests are executed until the point at which they are split. To execute all of the accumulated render command requests, call nngxSplitDrawCmdlist before this function.
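
The split-then-wait ordering can be sketched as follows. The three nngx functions are replaced by host-side stand-ins that only log the call order, so this is an illustration of sequencing and not of driver behavior; submit_and_wait, log_call, and g_log are hypothetical names.

```c
/* Host-side stand-ins (NOT the real driver): each one appends a letter to a log. */
static char g_log[8];
static int  g_len = 0;

static void log_call(char c) {
    if (g_len < 7) g_log[g_len++] = c;
    g_log[g_len] = '\0';
}

void nngxSplitDrawCmdlist(void) { log_call('S'); }
void nngxRunCmdlist(void)       { log_call('R'); }
void nngxWaitCmdlistDone(void)  { log_call('W'); }

/* Split first so every accumulated 3D command belongs to a render command
 * request, then start execution and block until the command list is done. */
void submit_and_wait(void) {
    nngxSplitDrawCmdlist();
    nngxRunCmdlist();
    nngxWaitCmdlistDone();
}
```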

This function does not return until command execution is complete. However, you can use the nngxSetTimeout() function to set a timeout period.

Code 4-18. Function for Setting a Timeout When Waiting for Command Execution to Complete
void nngxSetTimeout(GLint64EXT time, void (*callback)(void)); 

Set time to the number of ticks to wait before the nngxWaitCmdlistDone() function times out. Timeouts do not occur when a value of 0 is specified.

Set callback to the callback function to invoke when a timeout occurs. If this is NULL, a callback function is not invoked when the timeout occurs.

No timeouts occur by default because the initial values for time and callback are 0 and NULL respectively.

4.1.13. Adding a DMA Transfer Command Request

When the nngxAddVramDmaCommand() or nngxAddVramDmaCommandNoCacheFlush() function is called, a command request that runs a DMA transfer to VRAM is accumulated in the command list. The former flushes the source cache, but the latter does not. These functions support DMA transfers only from main memory to VRAM.

Code 4-19. Function for Adding a DMA Transfer Command Request
void nngxAddVramDmaCommand(
                        const GLvoid* srcaddr, GLvoid* dstaddr, GLsizei size);
void nngxAddVramDmaCommandNoCacheFlush(
                        const GLvoid* srcaddr, GLvoid* dstaddr, GLsizei size); 

An amount of data specified by size is transferred from the address specified by srcaddr to the address specified by dstaddr.

When calling the nngxAddVramDmaCommand() function, a GL_ERROR_8062_DMP error indicates that this function was called when no valid command list was bound, while a GL_ERROR_8064_DMP error indicates that size is negative.

When calling the nngxAddVramDmaCommandNoCacheFlush() function, a GL_ERROR_8090_DMP error indicates that this function was called when no valid command list was bound, and a GL_ERROR_8091_DMP error indicates that size is negative.

4.1.14. Adding an Anti-Aliasing Filter Transfer Command Request

When the nngxFilterBlockImage() function is called, a command request that transfers an image with an anti-aliasing filter applied is accumulated in the command list. (This is one kind of post-filter command request.) The image is transferred in block format, unconverted. The only supported anti-aliasing specification is 2×2.

Code 4-20. Function for Adding an Anti-Aliasing Filter Transfer Command Request
void nngxFilterBlockImage(const GLvoid* srcaddr, GLvoid* dstaddr, 
                          GLsizei width, GLsizei height, GLenum format); 

An image with a width, height, and format specified by width, height, and format respectively is transferred from the address specified by srcaddr to the address specified by dstaddr.

The width and height arguments are restricted as follows by the value specified for format.

Table 4-18. Format Restrictions on the Width and Height of Images to Be Transferred

format	width	height
GL_RGBA8_OES, GL_RGB8_OES	A multiple of 64, greater than or equal to 64.	A multiple of 16, greater than or equal to 64.
GL_RGBA4, GL_RGB5_A1, GL_RGB565	A multiple of 128, greater than or equal to 128.	A multiple of 16, greater than or equal to 128.

If the transfer source and destination memory regions overlap, the function works properly when the srcaddr and dstaddr values are the same, or when the srcaddr value is greater than the dstaddr value. The transfer results could be corrupted if the srcaddr value is smaller than the dstaddr value.

When the value for srcaddr specifies an address in device memory, the transfer results could be incorrect if the destination memory cache has not been flushed.
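
The size restrictions in Table 4-18 and the overlap rule can be checked before queuing the request. This is a minimal sketch: FilterFormatClass is a hypothetical stand-in that groups the format values by row of the table, and both function names are hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical grouping of the format argument by row of Table 4-18. */
typedef enum {
    FILTER_FMT_32_OR_24_BIT,   /* GL_RGBA8_OES, GL_RGB8_OES */
    FILTER_FMT_16_BIT          /* GL_RGBA4, GL_RGB5_A1, GL_RGB565 */
} FilterFormatClass;

/* Applies the width and height restrictions for the given format class. */
bool filter_block_size_is_valid(FilterFormatClass cls, int width, int height) {
    int width_unit = (cls == FILTER_FMT_16_BIT) ? 128 : 64;
    int min_dim    = width_unit;   /* the minimum equals the width unit in both rows */
    return width  >= min_dim && width  % width_unit == 0
        && height >= min_dim && height % 16 == 0;
}

/* For overlapping regions: safe when srcaddr equals or is above dstaddr. */
bool filter_block_overlap_is_safe(uintptr_t srcaddr, uintptr_t dstaddr) {
    return srcaddr >= dstaddr;
}
```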

Table 4-19. Errors Generated by the nngxFilterBlockImage Function

Error Code	Cause
GL_ERROR_8068_DMP	Called when a command list with an object name of 0 is bound or when there is no space in the command request queue.
GL_ERROR_8069_DMP	The address specified for srcaddr or dstaddr is not 8-byte aligned.
GL_ERROR_806A_DMP	A width or height value is specified that violates the restrictions.
GL_ERROR_806B_DMP	A format value is specified that is not listed in the restrictions.

4.1.15. Adding an Image Transfer Command Request

When the nngxTransferLinearImage() function is called, a command request that transfers an image to a render buffer or texture is accumulated in the command list. (This is one kind of copy-texture command request.) If the current 3D command buffer has accumulated unsplit commands, a split command is added, and then the transfer command request is added.

Although images are converted from linear format to block format while they are transferred, this conversion only affects addressing. If this function is called on a render buffer, the block mode setting automatically determines whether a conversion to 8 block addressing or 32 block addressing is applied during the transfer. If this function is called on a texture, a conversion to 8 block addressing is applied. In either case, you must flip an image in the V direction and convert its byte order before you transfer it.

Note:

For information about block mode, see Block Mode Settings in the 3DS Programming Manual: Advanced Graphics.

Code 4-21. Function for Adding an Image Transfer Command Request
void nngxTransferLinearImage(const GLvoid* srcaddr, GLuint dstid, 
                             GLenum target); 

For srcaddr, specify the starting address of the image to transfer. The image must have the same format, width, and height as the render buffer or texture to which it is transferred. However, the source pixel format must be 32-bit when the target pixel format is 24-bit because the hardware does not support transfers between 24-bit pixel formats. In this case, for each 4 bytes that are transferred, the first byte (the internal format's alpha component) is truncated.
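
The effect of the 32-bit to 24-bit truncation on byte layout can be illustrated on the CPU. This sketch only shows which bytes survive according to the description above; it is not how the hardware performs the transfer, and the function name is hypothetical.

```c
#include <stddef.h>
#include <stdint.h>

/* For each 4-byte source pixel, drop the first byte and keep the other three.
 * dst must have room for pixel_count * 3 bytes. Returns the bytes written. */
size_t drop_first_byte_of_each_pixel(const uint8_t* src, size_t pixel_count,
                                     uint8_t* dst) {
    for (size_t i = 0; i < pixel_count; ++i) {
        dst[i * 3 + 0] = src[i * 4 + 1];
        dst[i * 3 + 1] = src[i * 4 + 2];
        dst[i * 3 + 2] = src[i * 4 + 3];
    }
    return pixel_count * 3;
}
```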

The image is transferred to the render buffer or texture that has the object ID specified by dstid and the object type specified by target.

Table 4-20. Values to Specify for target and dstid

When target is:	Set dstid to:
GL_RENDERBUFFER	The object ID of a render buffer. If a value of 0 is specified, data is transferred to the color buffer that is attached to the current framebuffer.
GL_TEXTURE_2D	The object ID of a 2D texture.
GL_TEXTURE_CUBE_MAP_POSITIVE_X{,Y,Z}, GL_TEXTURE_CUBE_MAP_NEGATIVE_X{,Y,Z}	The object ID of a cube map texture.

The width and height of the target render buffer must be multiples of 8 in block 8 mode, or multiples of 32 in block 32 mode. Both the width and height must be at least 128.
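
The render buffer size restriction can be expressed as a small check. This is a sketch with a hypothetical function name; block_size is assumed to be 8 or 32 according to the current block mode.

```c
#include <stdbool.h>

/* True when both dimensions are at least 128 and multiples of the block size. */
bool linear_transfer_target_is_valid(int width, int height, int block_size) {
    if (block_size != 8 && block_size != 32) return false;
    return width  >= 128 && width  % block_size == 0
        && height >= 128 && height % block_size == 0;
}
```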

Table 4-21. Errors Generated by the nngxTransferLinearImage Function

Error Code	Cause
GL_ERROR_805B_DMP	Called when the bound command list’s object name is 0.
GL_ERROR_805C_DMP	The maximum number of command requests has already accumulated.
GL_ERROR_805D_DMP	The 3D command buffer is full because of commands added by this function.
GL_ERROR_805E_DMP	The render buffer or texture specified for dstid does not exist, or it does not have an allocated memory region.
GL_ERROR_805F_DMP	There is a violation of the width and height restrictions for the target render buffer.
GL_ERROR_8060_DMP	An invalid value was specified for target.
GL_ERROR_8067_DMP	The target render buffer or texture does not use 32-bit, 24-bit, or 16-bit pixel sizes.

4.1.16. Adding a Block-to-Linear Image Conversion and Transfer Command Request

A command request for converting a block image to a linear image and transferring the result can be added to the command list by calling the nngxAddB2LTransferCommand() function. (This is one kind of post-filter command request.) Although the nngxTransferRenderImage() function provides the same functionality, the nngxAddB2LTransferCommand() function is more versatile. They also differ in that the latter function adds only a transfer request command and does not add a split command.

Code 4-22. Function for Adding a Command Request for Converting From a Block Image to a Linear Image and Transferring
void nngxAddB2LTransferCommand(
    const GLvoid* srcaddr, GLsizei srcwidth, GLsizei srcheight, GLenum srcformat,
    GLvoid* dstaddr, GLsizei dstwidth, GLsizei dstheight, GLenum dstformat,
    GLenum aamode, GLboolean yflip, GLsizei blocksize); 

The srcaddr parameter specifies the transfer source (block image) address. The dstaddr parameter specifies the transfer destination (linear image) address. Both srcaddr and dstaddr must be 16-byte aligned.

The srcwidth, srcheight, dstwidth, and dstheight parameters specify the width and height of the transfer source image and of the transfer destination image, in pixels. The width and height of both images must be multiples of the block size (8 or 32). In addition, if the pixel size of the destination image is 24 bits and the block size is 8, the widths of both the source image and the destination image must be multiples of 16. The width and height of the destination image must be less than or equal to those of the source image. If 0 is specified for srcwidth, srcheight, dstwidth, or dstheight, the command is not issued.

The height and width of the source and destination images, as measured in pixels, must be at least as big as the minimum allowed. The minimum height and width for source images is 128. The minimum height and width for destination images depends on the anti-alias setting. If anti-aliasing is disabled, the minimum for both height and width is 128. If 2x1 anti-aliasing is enabled, the height minimum is 128 and the width minimum is 64. If 2x2 anti-aliasing is enabled, the minimum for both height and width is 64.
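The size restrictions above can be checked on the CPU before the command is added. The following is a minimal sketch only: the AaMode enumeration and the b2l_dimensions_ok() function are hypothetical names used for illustration and are not part of the SDK, and the check does not cover the source-to-destination size ratios required by each anti-alias mode.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-ins for the three anti-alias modes (not the SDK constants). */
typedef enum { AA_NOT_USED, AA_2X1, AA_2X2 } AaMode;

/* Returns true if the sizes satisfy the restrictions described in the text.
   dst_bits is the pixel size of the destination format (16, 24, or 32). */
bool b2l_dimensions_ok(int srcw, int srch, int dstw, int dsth,
                       int dst_bits, AaMode aa, int blocksize)
{
    assert(dst_bits == 16 || dst_bits == 24 || dst_bits == 32);

    if (blocksize != 8 && blocksize != 32) return false;
    /* A size of 0 means the command is not issued; treat it as not usable here. */
    if (srcw <= 0 || srch <= 0 || dstw <= 0 || dsth <= 0) return false;
    /* All four sizes must be multiples of the block size. */
    if (srcw % blocksize || srch % blocksize ||
        dstw % blocksize || dsth % blocksize) return false;
    /* 24-bit destination with block size 8: widths must be multiples of 16. */
    if (dst_bits == 24 && blocksize == 8 && (srcw % 16 || dstw % 16)) return false;
    /* The destination must not be larger than the source. */
    if (dstw > srcw || dsth > srch) return false;
    /* Minimum sizes: 128 for the source; the destination depends on the mode. */
    if (srcw < 128 || srch < 128) return false;
    int min_w = (aa == AA_NOT_USED) ? 128 : 64;
    int min_h = (aa == AA_2X2) ? 64 : 128;
    return dstw >= min_w && dsth >= min_h;
}
```

For example, a 256×256 source with a 64×64 destination passes this check only when 2x2 anti-aliasing is selected.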

The srcformat and dstformat parameters specify the pixel format of the source and destination image. The five types of pixel formats that can be specified are listed in the following table.

Table 4-22. Pixel Format Specifications

Definition      Bits   Description of Format
GL_RGBA4        16     The R, G, B, and alpha components are 4 bits each.
GL_RGB5_A1      16     The R, G, and B components are 5 bits each, and the alpha component is 1 bit.
GL_RGB565       16     The R and B components are 5 bits each, and the G component is 6 bits. No alpha component.
GL_RGB8_OES     24     The R, G, and B components are 8 bits each. No alpha component.
GL_RGBA8_OES    32     The R, G, B, and alpha components are 8 bits each.

Conversion to a pixel format with a higher pixel depth is not supported. For example, you cannot convert from a 24-bit format to a 32-bit format, or from a 16-bit format to the 24-bit or 32-bit format.
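The pixel-depth rule can be expressed as a small check. This is a sketch under stated assumptions: the PixelFormat enumeration and both functions are hypothetical illustrations (the enumerators are local stand-ins, not the GL constant values), and conversions between different formats of the same pixel size are assumed to be allowed because the text only rules out increases in pixel depth.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-ins for the five formats in Table 4-22. */
typedef enum { FMT_RGBA4, FMT_RGB5_A1, FMT_RGB565, FMT_RGB8, FMT_RGBA8 } PixelFormat;

/* Pixel size in bits, taken from the Bits column of Table 4-22. */
int pixel_bits(PixelFormat f)
{
    switch (f) {
    case FMT_RGBA4:
    case FMT_RGB5_A1:
    case FMT_RGB565: return 16;
    case FMT_RGB8:   return 24;
    case FMT_RGBA8:  return 32;
    }
    return 0; /* not one of the five formats */
}

/* A conversion is accepted only if it does not increase the pixel size. */
bool b2l_conversion_allowed(PixelFormat src, PixelFormat dst)
{
    int s = pixel_bits(src);
    int d = pixel_bits(dst);
    return s != 0 && d != 0 && d <= s;
}

/* Usage: the two cases named in the text. */
void conversion_example(void)
{
    assert(!b2l_conversion_allowed(FMT_RGB8, FMT_RGBA8));   /* 24-bit to 32-bit */
    assert(!b2l_conversion_allowed(FMT_RGB565, FMT_RGB8));  /* 16-bit to 24-bit */
}
```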

aamode specifies the anti-alias filter mode. The three modes that can be specified are listed in the following table. The widths and heights indicate the minimum dimensions of the source image relative to the destination image.

Table 4-23. Anti-Aliasing Specifications

Definition                   Anti-Aliasing                          Width     Height
NN_GX_ANTIALIASE_NOT_USED    No anti-aliasing.                      Equal     Equal
NN_GX_ANTIALIASE_2x1         Transferred using 2x1 anti-aliasing.   2 times   Equal
NN_GX_ANTIALIASE_2x2         Transferred using 2x2 anti-aliasing.   2 times   2 times

yflip specifies whether vertical flipping is enabled during image transfer. Flipping is performed if GL_TRUE (or a value other than 0) is specified. Flipping is not performed if GL_FALSE (or 0) is specified.

For blocksize, specify the block size used for the transfer source image (8 or 32).

Table 4-24. Errors Generated by the nngxAddB2LTransferCommand Function

Error Code           Cause
GL_ERROR_807C_DMP    A command list with object name 0 was bound, or there is no space in the command request queue.
GL_ERROR_807D_DMP    Either srcaddr or dstaddr is not 16-byte aligned.
GL_ERROR_807E_DMP    A value other than 8 or 32 is specified in blocksize.
GL_ERROR_807F_DMP    An invalid value is specified in aamode.
GL_ERROR_8080_DMP    An invalid value is specified in either srcformat or dstformat.
GL_ERROR_8081_DMP    The pixel size of dstformat is greater than that of srcformat.
GL_ERROR_8082_DMP    An invalid value is specified for srcwidth, srcheight, dstwidth, or dstheight.
GL_ERROR_8083_DMP    The specified width or height of the destination image is greater than the width or height in pixels of the source image.
GL_ERROR_80B7_DMP    The specified height or width of the source image was smaller than the minimum.
GL_ERROR_80B8_DMP    The specified height or width of the destination image was smaller than the minimum.

4.1.17. Adding a Linear-to-Block Image Conversion and Transfer Command Request

A command for converting from a linear image to a block image and then transferring the result can be added to the command list by calling the nngxAddL2BTransferCommand() function. (This is one kind of post-filter command request.) The nngxTransferLinearImage() function also provides the same functionality, but the nngxAddL2BTransferCommand() function is more versatile. They also differ in that the latter function adds only a transfer request command and does not add a split command.

Code 4-23. Function for Adding a Command Request for Converting From a Linear Image to a Block Image and Transferring
void nngxAddL2BTransferCommand(
        const GLvoid* srcaddr, GLvoid* dstaddr,
        GLsizei width, GLsizei height, GLenum format, GLsizei blocksize); 

srcaddr specifies the transfer source (linear image) address. dstaddr specifies the transfer destination (block image) address. Both srcaddr and dstaddr must be 16-byte aligned.

width and height specify the width and height, in pixels, of the transfer source and transfer destination images. The two images must have the same width and height, and each dimension must be 128 or greater and a multiple of the block size (8 or 32). In addition, if the bit depth of the source image is 24 bits, the image width must be a multiple of 32, even if the block size is 8. The command is not added if 0 is specified for either width or height.
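These restrictions can be verified before calling the function. The following is a minimal sketch, not SDK code: l2b_dimensions_ok() is a hypothetical helper, and its bits parameter is assumed to be the pixel size (16, 24, or 32) to which the 24-bit width rule applies.

```c
#include <assert.h>
#include <stdbool.h>

/* Returns true if width and height satisfy the restrictions described above. */
bool l2b_dimensions_ok(int width, int height, int bits, int blocksize)
{
    assert(bits == 16 || bits == 24 || bits == 32);

    if (blocksize != 8 && blocksize != 32) return false;
    /* Each dimension must be at least 128 (this also rejects 0). */
    if (width < 128 || height < 128) return false;
    /* Each dimension must be a multiple of the block size. */
    if (width % blocksize != 0 || height % blocksize != 0) return false;
    /* 24-bit images need a width that is a multiple of 32, even with block size 8. */
    if (bits == 24 && width % 32 != 0) return false;
    return true;
}
```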

format specifies the pixel format of the image being transferred. The specifiable pixel format is the same as that for the nngxAddB2LTransferCommand() function (Table 4-22). The source and destination images must have the same pixel format. Note, however, that if the format is 24-bit, the source image must be in 32-bit format because hardware does not support 24-bit to 24-bit transfer. In this case, the last byte of every 4 bytes of source data is thrown away.
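The effect of the 24-bit case on the pixel data can be modeled on the CPU. The sketch below is an illustration only: drop_last_byte_of_four() is a hypothetical function, and it shows only which source bytes survive, not the linear-to-block reordering that the hardware also performs.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Copies pixel_count pixels, keeping the first 3 of every 4 source bytes.
   src must hold 4 * pixel_count bytes and dst must hold 3 * pixel_count bytes. */
void drop_last_byte_of_four(const uint8_t *src, uint8_t *dst, size_t pixel_count)
{
    assert(src != NULL && dst != NULL);

    for (size_t i = 0; i < pixel_count; ++i) {
        dst[3 * i + 0] = src[4 * i + 0];
        dst[3 * i + 1] = src[4 * i + 1];
        dst[3 * i + 2] = src[4 * i + 2];
        /* src[4 * i + 3] is discarded. */
    }
}
```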

The blocksize parameter specifies the block size of the block image (the transfer destination) as either 8 or 32.

Table 4-25. Errors Generated by the nngxAddL2BTransferCommand Function

Error Code           Cause
GL_ERROR_806F_DMP    A command list with object name 0 was bound or there is no space in the command request queue.
GL_ERROR_8070_DMP    Either srcaddr or dstaddr is not 16-byte aligned.
GL_ERROR_8071_DMP    A value other than 8 or 32 is specified in blocksize.
GL_ERROR_8072_DMP    An invalid value is specified in either width or height.
GL_ERROR_8073_DMP    An invalid value is specified in format.

4.1.18. Adding a Block Image Transfer Command Request

A command request for transferring a block image can be added to the command list by calling the nngxAddBlockImageCopyCommand() function. The added command request lets you copy images between textures and render buffers that contain rendered images. Because the transfer is specified as a combination of transfer size and skip size, you can clip part of the source image region or paste into part of the destination image region. The main purpose of this function is to transfer block-format images, but because it does not perform format conversion, it can also be used to transfer other types of data.

Code 4-24. Function for Adding a Block Image Transfer Command Request
void nngxAddBlockImageCopyCommand(
        const GLvoid* srcaddr, GLsizei srcunit, GLsizei srcinterval,
        GLvoid* dstaddr, GLsizei dstunit, GLsizei dstinterval,
        GLsizei totalsize); 

Use the srcaddr parameter to specify the transfer source start address. dstaddr specifies the transfer destination start address. Both srcaddr and dstaddr must be 16-byte aligned.

totalsize specifies the total amount of data to be transferred, in bytes. totalsize must be 16-byte aligned.

srcunit and srcinterval specify, in bytes, the unit size used for reading from the transfer source and the skip size, respectively. If srcinterval is 0, memory is read continuously from the start address until totalsize is reached. If srcinterval is any value other than 0, srcunit bytes of data are read and then srcinterval bytes of the address being read are skipped, repeating alternately until the amount of data transferred reaches totalsize. This operation allows part of the source image to be clipped.

dstunit and dstinterval specify, in bytes, the unit size used for writing to the transfer destination and the skip size, respectively. If dstinterval is 0, memory is written continuously from the start address until totalsize is reached. If dstinterval is any value other than 0, dstunit bytes of data are written and then dstinterval bytes of the address being written are skipped, repeating alternately until the amount of data transferred reaches totalsize. This operation allows the image to be inserted into a portion of the memory region for the transfer destination image.

Figure 4-3. Sample Block Image Transfer

(Figure labels: srcaddr, srcunit, srcinterval, dstaddr, dstunit, dstinterval)

The srcunit, srcinterval, dstunit, and dstinterval parameters must be multiples of 16. Negative values and values greater than or equal to 0x100000 cannot be specified.
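The unit and interval behavior described above can be modeled with an ordinary CPU copy loop. The following sketch is an illustration under stated assumptions: simulate_block_copy() is a hypothetical function that mimics the addressing pattern in plain memory, it does not issue a command request, and the caller is assumed to supply buffers large enough for every unit and skip that is touched.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* CPU-side model of the read/skip and write/skip pattern.
   All sizes are in bytes. An interval of 0 means continuous access. */
void simulate_block_copy(const uint8_t *src, size_t srcunit, size_t srcinterval,
                         uint8_t *dst, size_t dstunit, size_t dstinterval,
                         size_t totalsize)
{
    assert(src != NULL && dst != NULL);

    size_t s = 0, d = 0;     /* byte offsets from the source and destination start */
    size_t sn = 0, dn = 0;   /* bytes handled so far in the current unit */

    for (size_t n = 0; n < totalsize; ++n) {
        dst[d++] = src[s++];
        /* After a full source unit, skip srcinterval bytes of the source. */
        if (srcinterval != 0 && ++sn == srcunit) { s += srcinterval; sn = 0; }
        /* After a full destination unit, skip dstinterval bytes of the destination. */
        if (dstinterval != 0 && ++dn == dstunit) { d += dstinterval; dn = 0; }
    }
}
```

With srcunit and srcinterval both set to 16 and dstinterval set to 0, every other 16-byte unit of the source is packed contiguously into the destination, which corresponds to the clipping case. Swapping the roles (srcinterval of 0, nonzero dstinterval) corresponds to the pasting case.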

When transferring rendering results, such as block images, note that the start address of the transfer image (at both the source and destination) is normally the upper-left corner of the image (or the lower-left corner in OpenGL ES), and that data is arranged in block units of 8×8 pixels when using a format with a block size of 8. For more information about the block format, see 7.10. Native PICA Format.

Table 4-26. Errors Generated by the nngxAddBlockImageCopyCommand Function

Error Code           Cause
GL_ERROR_8074_DMP    A command list with object name 0 was bound or there is no space in the command request queue.
GL_ERROR_8075_DMP    Either srcaddr or dstaddr is not 16-byte aligned.
GL_ERROR_8076_DMP    totalsize is not a multiple of 16.
GL_ERROR_8077_DMP    An invalid value was specified in srcunit, srcinterval, dstunit, or dstinterval.

4.1.19. Adding a Memory Fill Command Request

A command request for filling the specified region of memory with the specified data can be added to the command list by calling the nngxAddMemoryFillCommand() function. The command request added by this function can be used for purposes such as clearing the color buffer or the depth (and stencil) buffer. The glClear() function provides the same functionality, but the nngxAddMemoryFillCommand() function is more versatile: two memory regions of different sizes can be cleared simultaneously by configuring two channels whose parameters are specified independently.

Code 4-25. Function for Adding a Memory Fill Command Request
void nngxAddMemoryFillCommand(
        GLvoid* startaddr0, GLsizei size0, GLuint data0, GLsizei width0,
        GLvoid* startaddr1, GLsizei size1, GLuint data1, GLsizei width1); 

startaddr0, size0, data0, and width0 represent settings for Channel 0. startaddr1, size1, data1, and width1 represent settings for Channel 1. Memory is filled simultaneously for both Channel 0 and Channel 1. If the memory regions specified for Channel 0 and Channel 1 overlap, the fill data that is ultimately applied to the overlapping part is undefined.

startaddr0 and startaddr1 specify the start addresses of the memory regions. Addresses must be 16-byte aligned. If 0 is specified for an address, that channel is not used. If 0 is specified for startaddr0, no error checking is performed for size0, data0, or width0. If 0 is specified for startaddr1, no error checking is performed for size1, data1, or width1.

size0 and size1 specify the sizes of the memory regions, in bytes. Sizes must be multiples of 16.

data0 and data1 specify the fill pattern data. The specified fill pattern is repeatedly inserted into the memory region until it is full.

width0 and width1 specify the bit width of the fill pattern. The values 16, 24, or 32 can be specified for the bit width. If 16 is specified, the memory region is filled in 16-bit units using bits [15:0] of the data. If 24 is specified, the memory region is filled in 24-bit units using bits [23:0] of the data. If 32 is specified, the memory region is filled in 32-bit units using bits [31:0] of the data.
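The way a fill pattern repeats for each bit width can be modeled on the CPU. This is a sketch only: simulate_memory_fill() is a hypothetical function, and the order in which the bytes of each unit land in memory (least significant byte first) is an assumption made for illustration, not something stated in the text.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* CPU-side model of one fill channel. width is 16, 24, or 32 (bits).
   Assumption: each unit is stored least significant byte first. */
void simulate_memory_fill(uint8_t *start, size_t size, uint32_t data, unsigned width)
{
    assert(start != NULL);
    assert(width == 16 || width == 24 || width == 32);

    size_t unit = width / 8;   /* bytes per repeated unit: 2, 3, or 4 */
    for (size_t i = 0; i < size; ++i) {
        /* Only bits [width-1:0] of data are ever selected. */
        start[i] = (uint8_t)(data >> (8 * (i % unit)));
    }
}
```

Note that with a width of 24, a region whose size is a multiple of 16 does not necessarily hold a whole number of 3-byte units, so in this model the last unit is simply cut off at the end of the region.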

The following table provides fill pattern specifications (bit width and various parameter values) according to the render buffer format being used.

Table 4-27. Fill Pattern by Render Buffer Format

Render Buffer Format        Bit Width   R / D                   G / S                   B                       A
GL_RGBA8_OES                32          [31:24] 0 through 255   [23:16] 0 through 255   [15:8] 0 through 255    [7:0] 0 through 255
GL_RGB8_OES                 24          [23:16] 0 through 255   [15:8] 0 through 255    [7:0] 0 through 255     -
GL_RGBA4                    16          [15:12] 0 through 15    [11:8] 0 through 15     [7:4] 0 through 15      [3:0] 0 through 15
GL_RGB5_A1                  16          [15:11] 0 through 31    [10:6] 0 through 31     [5:1] 0 through 31      [0:0] 0 through 1
GL_RGB565                   16          [15:11] 0 through 31    [10:5] 0 through 63     [4:0] 0 through 31      -
GL_DEPTH24_STENCIL8_EXT     32          [23:0]                  [31:24]                 -                       -
GL_DEPTH_COMPONENT24_OES    24          [23:0]                  -                       -                       -
GL_DEPTH_COMPONENT16        16          [15:0]                  -                       -                       -
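The bit positions in Table 4-27 translate directly into shift-and-OR expressions for building the data0 and data1 values. The helper functions below are hypothetical names used for illustration and cover four of the formats as a sketch; the remaining formats follow the same pattern.

```c
#include <assert.h>
#include <stdint.h>

/* GL_RGBA8_OES (width 32): R [31:24], G [23:16], B [15:8], A [7:0]. */
uint32_t fill_pattern_rgba8(uint8_t r, uint8_t g, uint8_t b, uint8_t a)
{
    return ((uint32_t)r << 24) | ((uint32_t)g << 16) | ((uint32_t)b << 8) | a;
}

/* GL_RGB565 (width 16): R [15:11], G [10:5], B [4:0]. */
uint32_t fill_pattern_rgb565(uint32_t r, uint32_t g, uint32_t b)
{
    assert(r <= 31 && g <= 63 && b <= 31);
    return (r << 11) | (g << 5) | b;
}

/* GL_RGB5_A1 (width 16): R [15:11], G [10:6], B [5:1], A [0:0]. */
uint32_t fill_pattern_rgb5_a1(uint32_t r, uint32_t g, uint32_t b, uint32_t a)
{
    assert(r <= 31 && g <= 31 && b <= 31 && a <= 1);
    return (r << 11) | (g << 6) | (b << 1) | a;
}

/* GL_DEPTH24_STENCIL8_EXT (width 32): depth [23:0], stencil [31:24]. */
uint32_t fill_pattern_d24s8(uint32_t depth, uint8_t stencil)
{
    assert(depth <= 0xFFFFFFu);
    return ((uint32_t)stencil << 24) | depth;
}
```

For example, clearing a GL_DEPTH24_STENCIL8_EXT buffer to the maximum depth with a stencil value of 0 would use the pattern 0x00FFFFFF with a bit width of 32.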

Table 4-28. Errors Generated by the nngxAddMemoryFillCommand Function

Error Code           Cause
GL_ERROR_8078_DMP    A command list with object name 0 was bound, or there is no space in the command request queue.
GL_ERROR_8079_DMP    startaddr0 or startaddr1 is not 16-byte aligned.
GL_ERROR_807A_DMP    size0 or size1 is not a multiple of 16.
GL_ERROR_807B_DMP    An invalid value is specified in width0 or width1.

4.1.20. Moving the 3D Command Buffer Pointer

Call the nngxMoveCommandbufferPointer() function to move the pointer in the 3D command buffer of the currently bound command list. (This 3D command buffer pointer is the position in the 3D commands from which to start running the 3D commands.)

Code 4-26. Function for Moving the 3D Command Buffer Pointer
void nngxMoveCommandbufferPointer(GLint offset); 

Specify the amount by which to move the pointer (in bytes) as the offset parameter.

A GL_ERROR_8061_DMP error occurs when no command list is bound, or when this operation would move the pointer outside of the 3D command buffer region.

4.1.21. Adding Jump Commands

Call the nngxAddJumpCommand() function to add a jump command to the currently bound command list. The jump command executes the 3D commands in the specified memory region. Use a jump command to move execution to a different command list without causing any interrupts.

This function uses the command buffer execution PICA registers. Because it uses only channel 0, two registers (0x0238 and 0x023A) are written when this function is run. For more information, see 8.8.23. Command Buffer Execution Registers (0x0238 – 0x023D) and 8.8.23.1. Consecutive Execution of Command Buffers in 3DS Programming Manual: Advanced Graphics.

Code 4-27. Function for Adding Jump Commands
void nngxAddJumpCommand(const GLvoid* bufferaddr, GLsizei buffersize); 

In bufferaddr and buffersize, specify the address and size of the command buffer to move execution to. Both bufferaddr and buffersize must be multiples of 16.

The content of the destination command buffer (the command list specified by bufferaddr and buffersize) is not copied to the command buffer of the currently bound command list. A jump command changes the execution address of a command buffer and directly executes the destination command buffer. Consequently, the application must ensure that the jump destination memory cache has been flushed.

The last command executed at the jump destination must be a split command (a command to write to the split command setting register, added by the nngxSplitDrawCmdlist() function). Alternatively, this command could be another jump command. When using multiple jump commands, the last command in the last command buffer in the chain must be a split command.

This function adds a command request for a 3D execution command. A GL_ERROR_809A_DMP error occurs when this function is called immediately after the command buffer has been flushed (for example, by a call to the nngxFlush3DCommand() function) because doing so is meaningless. To add a 3D command to the command buffer immediately after a flush, call the nngxAdd3DCommand() function.

Table 4-29. Errors Generated by the nngxAddJumpCommand Function

Error Code           Cause
GL_ERROR_8096_DMP    The bound command list’s object name is 0.
GL_ERROR_8097_DMP    buffersize is 0 or less.
GL_ERROR_8098_DMP    buffersize is not a multiple of 16.
GL_ERROR_8099_DMP    bufferaddr is not a multiple of 16.
GL_ERROR_809A_DMP    This function was called immediately after the command buffer was flushed.
GL_ERROR_809B_DMP    The command request added by this function makes the queue overflow.
GL_ERROR_809C_DMP    The command added by this function makes the command buffer overflow.

4.1.22. Adding Subroutine Commands

Call the nngxAddSubroutineCommand() function to add two commands to the currently bound command list: a jump command that executes the 3D commands in the specified memory region, and a command that sets the address for returning to the command buffer that was jumped from. Use a subroutine command to execute another command list without causing any interrupts, as if it were a subroutine.

This function uses the command buffer execution PICA registers. Because it uses all channels, four registers (0x0238 through 0x023B) are written when this function is run. For more information, see 8.8.23. Command Buffer Execution Registers (0x0238 – 0x023D) and 8.8.23.1. Consecutive Execution of Command Buffers in 3DS Programming Manual: Advanced Graphics.

Code 4-28. Function for Adding Subroutine Commands
void nngxAddSubroutineCommand(const GLvoid* bufferaddr, GLsizei buffersize); 

In bufferaddr and buffersize, specify the address and size of the command buffer to move execution to. Both bufferaddr and buffersize must be multiples of 16.

The content of the destination command buffer (the command list specified by bufferaddr and buffersize) is not copied to the command buffer of the currently bound command list. A jump command changes the execution address of a command buffer and directly executes the destination command buffer. Consequently, the application must ensure that the jump destination memory cache has been flushed.

The jump command is executed on channel 0, and the command to return to the command buffer jumped from is executed on channel 1. Consequently, the last command executed at the jump destination must be a kick command for channel 1 (a command to write to the command buffer execution register 0x023D). Alternatively, this command could be a jump command to another command buffer, but the channel used by the jump must not be channel 0, and the last command in the last command buffer in the chain must be a kick command for channel 1. In addition, you must not write to the channel 1 address setting registers (0x0239 and 0x023B). This function adds a jump command (channel 0) and an address setting (channel 1). The application must place the channel 1 kick command and the jump commands within the subroutine.

This function does not add a command request for a 3D execution command. After calling this function, continue accumulating commands, and then execute them after flushing the command buffer, such as by using the nngxFlush3DCommand() function. Values written to the channel 1 size setting register (0x023B) added by this function are undefined until the command buffer is flushed. Operation is similarly undefined if you reuse the copied content of this register until the command buffer is flushed.

Table 4-30. Errors Generated by the nngxAddSubroutineCommand Function

Error Code           Cause
GL_ERROR_809D_DMP    The bound command list’s object name is 0.
GL_ERROR_809E_DMP    buffersize is 0 or less.
GL_ERROR_809F_DMP    buffersize is not a multiple of 16.
GL_ERROR_80A0_DMP    bufferaddr is not a multiple of 16.
GL_ERROR_80A1_DMP    The command added by this function makes the command buffer overflow.

4.2. Command Request Types

The following command requests are queued in a command list.

DMA Transfer Command Requests

These command requests use DMA transfers to send texture images and vertex buffers from main memory into VRAM.

These command requests are queued by glTexImage2D() and other functions that allocate texture regions, and by glBufferData() and other functions that allocate vertex buffer regions.

Render Command Requests

These command requests execute a single command set of 3D commands accumulated in the 3D command buffer.

When glClear(), glTexImage2D(), and other functions are called, they write a buffer loading complete 3D command and then queue the accumulated 3D command buffer as a single render command request.

The nngxSplitDrawCmdlist() function allows you to queue render command requests at any time.

Memory-Fill Command Requests

These command requests use the GPU memory-fill feature to clear a region allocated in VRAM using a specified data pattern.

These command requests specify a render buffer and are queued when the glClear() function is called. The glClear() function also requires a 3D command other than a memory-fill command request to be executed. In other words, when the glClear() function is called, it first writes 3D commands for the glClear() function and a buffer loading complete 3D command, and then it queues a render command request and a memory-fill command request.

Post-Transfer Command Requests

These command requests use the GPU post-filter feature to convert images rendered in PICA block format into a linear format that can be read by the LCDs.

These command requests are queued when the nngxTransferRenderImage() function is called. If the nngxSplitDrawCmdlist() function has not been called in advance to stop reading from the 3D command buffer, these command requests are queued after a buffer loading complete command is written and a render command request is queued.

Copy Texture Command Requests

These command requests copy GPU rendering results into memory as texture images.

These command requests are queued when the glCopyTexImage2D() or glCopyTexSubImage2D() function is called.

If the nngxSplitDrawCmdlist() function has not been called in advance to stop reading from the 3D command buffer, these command requests are queued after a buffer loading complete command is written and a render command request is queued.

4.3. Methods for Optimizing 3D Command Buffer Performance

Methods for optimizing performance during 3D command buffer execution are described below.

4.3.1. Changes in Load Speed due to Address and Size

The address and size of a 3D command buffer can have an effect on load speed at run time.

There are two types of command buffer execution: executing 3D execution commands queued in a command request, and executing the command buffer execution register.

When executing 3D execution commands, execution is affected by the address immediately after a split command added by nngxFlush3DCommand() or nngxSplitDrawCmdlist(), and by the size from that address up to the next split command. You can get the address of 3D commands being accumulated in the 3D command buffer by calling the nngxGetCmdlistParameteri() function and passing NN_GX_CMDLIST_CURRENT_BUFADDR for pname.

When executing using the command buffer execution register, execution is affected by the address and size of each of the following command buffers: a command buffer jumped to by a command added by nngxAddJumpCommand(), a command buffer executed as a subroutine by a command added by nngxAddSubroutineCommand(), and the command buffer executed when returning from a subroutine to the calling location.

If the 3D command buffer address is 128-byte aligned, and the size is a multiple of 256 bytes (256, 512, 768, and so on), transfer speed may be faster.

If the 3D command buffer address is not 128-byte aligned and the size starting from the previous 128-byte aligned address to the end of the 3D command buffer is a multiple of 256, speed may be increased. For example, if the 3D command buffer address and size are 0x20000010 and 0x1F0 respectively, the preceding 128-byte aligned address is only 0x10 earlier, at 0x20000000. The distance from there to the end is 0x1F0 + 0x10, which is 0x200 (and a multiple of 256).
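Both cases reduce to one calculation: measure the distance from the 128-byte aligned address at or before the start of the buffer to the end of the buffer, and check whether it is a multiple of 256. The following sketch reproduces the example above. The function names are hypothetical, and how padding would actually be added to a command buffer is outside the scope of this sketch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Bytes from the 128-byte aligned address at or before addr to the buffer end. */
uint32_t span_from_aligned_start(uint32_t addr, uint32_t size)
{
    return (addr & 127u) + size;
}

/* True if the span is a nonzero multiple of 256 bytes. */
bool has_favorable_layout(uint32_t addr, uint32_t size)
{
    return size != 0 && span_from_aligned_start(addr, size) % 256u == 0;
}

/* Number of extra bytes that would bring the span up to a multiple of 256. */
uint32_t padding_for_favorable_layout(uint32_t addr, uint32_t size)
{
    uint32_t rem = span_from_aligned_start(addr, size) % 256u;
    return rem == 0 ? 0 : 256u - rem;
}

/* Usage: the address and size from the example in the text. */
void layout_example(void)
{
    assert(span_from_aligned_start(0x20000010u, 0x1F0u) == 0x200u);
    assert(has_favorable_layout(0x20000010u, 0x1F0u));
}
```

When the address is already 128-byte aligned, the span equals the buffer size, so the same check covers the aligned case as well.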

The address and size of 3D command buffers have these characteristics due to implementation details of the GPU, but they may not have significant effect in some cases due to factors such as: where the buffer is stored, details of the 3D commands, or memory access conflicts with other modules.

4.3.2. Using Subroutine Execution

It may be possible to improve performance by using 3D command buffer subroutine execution.

4.3.2.1. Overview

3D command buffer subroutine execution uses the command buffer execution register. Ordinarily, 3D commands are stored in a continuous 3D command buffer and executed in sequence. With subroutine execution, a command buffer stored in a different location is executed in turn by using the command buffer address jump feature. This method is called command buffer subroutine execution because control flows as it does in a subroutine call: an address jump specifies the address of a 3D command buffer, the 3D command buffer at that location is executed, and execution then returns to the calling location.

For more information about using command buffer subroutine execution, see 4.1.22. Adding Subroutine Commands and also refer to Command Buffer Execution Registers in the 3DS Programming Manual: Advanced Graphics.

4.3.2.2. Effect on Behavior

Command buffer subroutine execution has the following advantages.

  • Only a jump command to the subroutine command buffer needs to be stored, eliminating the CPU processing needed to copy the 3D commands. The technique is effective for command sequences that are large and set frequently, such as those for loading reference table data or shader programs.
  • The subroutine command buffer is not copied to the current 3D command buffer, but is referenced directly by the GPU, allowing the total size of the command buffer to be reduced.
  • If the subroutine command buffer is stored in VRAM, GPU access to the command buffer is faster than if it is in main memory (device memory). If memory access to the command buffer is a performance bottleneck, this technique could improve overall system processing speed.

On the other hand, it has the following disadvantage.

  • Switching the address due to a jump command incurs memory access overhead. If the granularity of subroutines in the implementation is small and they are called frequently, a decrease in GPU processing speed could result.

The effect of converting to subroutines on processing performance is heavily influenced by issues such as memory access conflicts, so it is strongly dependent on the actual implementation of the application.

4.3.2.3. Storage Location

Command buffer access speed is faster in VRAM than in main memory (device memory), so we recommend storing subroutine command buffers in VRAM.

There is some memory access overhead when executing a subroutine command buffer using a jump command, but if the executed command buffer is stored in VRAM, this overhead is decreased.

To store a command buffer in VRAM, it must first be generated in device memory and then transferred to VRAM by DMA using the nngxAddVramDmaCommand() function. For information about DMA transfers to VRAM, see 4.1.13. Adding a DMA Transfer Command Request.

4.3.2.4. Balance Between Execution and Access Processes

Depending on the content of subroutine command buffers, the processing bottleneck could move between accessing and executing 3D commands.

If a 3D command is a register write command for the rasterization module or a later module, it requires two cycles to process, so execution is relatively processing-intensive. When the 3D commands are composed of burst commands, execution is even more processing-intensive relative to access processing. In this case, the bottleneck is in command execution, and the processing cost of memory access due to conversion to subroutines is hidden.

If a 3D command is a register write command for a module before the rasterization module, it requires only one cycle to process, so its execution load is light relative to the commands discussed in the previous paragraph. In this case, the bottleneck is more likely to be access processing, and the memory access cost incurred by conversion to subroutines is more likely to affect overall performance.

For information about the relative positioning of each module, see 2.2. Rendering Pipeline.

