4. Compiler

This chapter provides information about compilers and the optimization of generated code.

4.1. Coding With Optimization in Mind

4.1.1. Using the __restrict Keyword

By using the __restrict keyword, you can tell the compiler that the memory referenced by pointer or array function parameters does not overlap (for floating-point matrix operations, for example). This allows the compiler to perform optimizations that would not be possible if the parameters could alias one another.

Example:

MTX44*
MTX44Multiple(MTX44* pOut, const MTX44* __restrict p1, const MTX44* __restrict p2) 
{
    // ...
}

The __restrict keyword has no effect when it is used in a method of a template class, but it is effective when attached to methods that are specialized for float values. The __restrict keyword also works when it is added only to the definition in the CPP file and not to the declaration in the H file (the header).
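The definition-only usage can be sketched as follows. Vec4 and Vec4Add are hypothetical names used only for illustration, and f32 is assumed to be a 32-bit float. On compilers that accept __restrict, the qualifier applies to the parameter itself and not to the function type, so the unqualified declaration and the qualified definition still refer to the same function.

```cpp
#include <cassert>

typedef float f32;            // assumption: f32 is a 32-bit float

struct Vec4 { f32 v[4]; };    // hypothetical type, for illustration only

// Declaration as it might appear in the H file: no __restrict.
Vec4* Vec4Add(Vec4* pOut, const Vec4* p1, const Vec4* p2);

// Definition as it might appear in the CPP file. __restrict promises that
// pOut, p1, and p2 refer to non-overlapping memory, so the compiler does not
// need to reload *p1 or *p2 after each store through pOut.
Vec4* Vec4Add(Vec4* __restrict pOut, const Vec4* __restrict p1, const Vec4* __restrict p2)
{
    // Debug-only check of the no-overlap promise made to the compiler.
    assert(pOut != p1 && pOut != p2);

    for (int i = 0; i < 4; ++i)
    {
        pOut->v[i] = p1->v[i] + p2->v[i];
    }
    return pOut;
}
```

Callers must honor the promise: passing the same object as both pOut and p1 results in undefined behavior, which is why the sketch checks for it in debug builds.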

4.1.2. Using the register Modifier

In a calculation involving floating-point matrices, for example, you can make the dependencies between variables explicit by storing the calculated values in local variables declared with the register keyword. This makes it easier for the compiler to generate code that better accounts for latency, for example by combining independent calculations.

Example:

MTX34* MTX34Multiple(MTX34*__restrict pOut, const MTX34*__restrict p1, const MTX34*__restrict p2)
{
    const f32 (*const a)[4] = p1->m;
    const f32 (*const b)[4] = p2->m;
    register f32 m[3][4];
    // Store the calculated result in register variables
    m[0][0] = a[0][0] * b[0][0] + a[0][1] * b[1][0] + a[0][2] * b[2][0];
    //...
    pOut->m[0][0] = m[0][0];
    //...
}
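A complete version of this pattern might look like the following sketch. The MTX34 layout (three rows of four f32 values with an implicit fourth row of 0, 0, 0, 1) and the name MTX34MultiplySketch are assumptions made for illustration. Loops are used for brevity where the example above writes each element out. The register keyword is left out because it was removed from the language in C++17, so add it back on the declaration of m when building with RVCT.

```cpp
#include <cassert>

typedef float f32;               // assumption: f32 is a 32-bit float

struct MTX34 { f32 m[3][4]; };   // assumed layout: 3 rows x 4 columns

MTX34* MTX34MultiplySketch(MTX34* __restrict pOut, const MTX34* __restrict p1, const MTX34* __restrict p2)
{
    // Debug-only check of the no-overlap promise made with __restrict.
    assert(pOut != p1 && pOut != p2);

    const f32 (*const a)[4] = p1->m;
    const f32 (*const b)[4] = p2->m;
    f32 m[3][4];  // declare as "register f32 m[3][4];" when building with RVCT

    // Phase 1: compute every element into local variables. No store to pOut
    // happens here, so the twelve results are visibly independent of each other.
    for (int i = 0; i < 3; ++i)
    {
        for (int j = 0; j < 4; ++j)
        {
            m[i][j] = a[i][0] * b[0][j] + a[i][1] * b[1][j] + a[i][2] * b[2][j];
        }
        m[i][3] += a[i][3];  // contribution of the implicit fourth row (0, 0, 0, 1)
    }

    // Phase 2: write the results out in one pass.
    for (int i = 0; i < 3; ++i)
    {
        for (int j = 0; j < 4; ++j)
        {
            pOut->m[i][j] = m[i][j];
        }
    }
    return pOut;
}
```

Separating the calculation from the stores is the point of the pattern: the compiler is free to reorder the multiply-add sequences in the first phase without considering the writes in the second.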

4.1.3. Using Inline Assembly Code

You must use embedded assembly code to specify VFP instructions directly, but functions written in embedded assembly code cannot be optimized or inlined.

If you want the compiler to inline or optimize your code, you must use inline assembly code instead.

Instructions written using inline assembly code may be expanded into several instructions. The LDM, STM, LDRD, and STRD instructions may be expanded into LDR and STR instructions.

Note:

In ARMCC 4.1 and subsequent compiler versions, VFP instructions can be written using inline assembly code.

4.1.4. Optimization Warnings

If RVCT determines that it cannot optimize a function, it outputs information similar to the following and stops optimizing that particular function (after inline expansion). As a result, the slowest code is output for that function. When information such as the following is displayed, you can improve performance by revising your C++ code so that it no longer prevents optimization.

#1596-D:  Could not optimize:  External function reference prevents optimization

Pay attention to output related to optimization that begins with “Could not optimize:” or “Might not be able to optimize:”.

Note:

The CTR-SDK build system suppresses optimization warnings. To display these warnings, you must remove the --diag_suppress=optimizations option from CCFLAGS_WARNING in $(CTRSDK_ROOT)/build/omake/commondefs.cctype.RVCT.om.

4.2. Tuning at the Assembly Level

4.2.1. Referencing Disassembly Listings

You can use the following steps to disassemble and analyze the code output by the compiler.

The CTR-SDK build system outputs a file with the .dasm extension, which holds the disassembly results, to the same location as the executable image.

If --interleave=source_only is added to the DISAS flag in $(CTRSDK_ROOT)/build/omake/commondefs.cctype.RVCT.om, the output file will include C source code.

To examine a single function, compile the source file that contains it with armcc.exe as follows.

armcc.exe --debug -O3 -Otime --cpu MPCore --interleave -S test.cpp

When the source file is compiled with the -S option, only assembly code is generated. The output file has a .s extension.

Adding the --interleave option causes the output file to also contain the C source code and to have a .txt extension.

4.2.2. Analyzing CPU Stalls

You can analyze interlock delays caused by the processor pipeline, such as CPU stalls caused by VFP latency, using the relevant assembler warning messages.

First, compile the source to be analyzed with the -S option specified so that the assembly source is output.

armcc.exe --debug -O3 -Otime --cpu MPCore -S test.cpp

Warnings related to interlock delays are output when the generated assembly source is assembled with the --diag_warning 1563 option specified.

armasm --diag_warning 1563 --cpu=MPCore test.s

4.3. Reducing General Processing

You can reduce CPU processing to some extent by paying attention to the following points.

  • Avoid frequent use of virtual functions.
  • Reduce function calls by using inline functions or some other means.
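
As a rough illustration of both points, the sketch below sums scaled values first through a virtual call and then through a small inline method selected at compile time by a template parameter. All names (IScale, DoubleScaleVirtual, DoubleScale, SumVirtual, SumInline) are hypothetical. Whether the second form is actually faster depends on the compiler and the call site, so it is worth confirming in the disassembly listing.

```cpp
#include <cassert>

// Virtual dispatch: each call in the loop goes through the vtable, which
// usually prevents the compiler from inlining Apply().
struct IScale
{
    virtual ~IScale() {}
    virtual float Apply(float x) const = 0;
};

struct DoubleScaleVirtual : IScale
{
    virtual float Apply(float x) const { return x * 2.0f; }
};

float SumVirtual(const IScale& s, const float* data, int count)
{
    assert(count >= 0);
    float total = 0.0f;
    for (int i = 0; i < count; ++i)
    {
        total += s.Apply(data[i]);
    }
    return total;
}

// Static dispatch: the concrete type is known at compile time, so the
// compiler can inline Apply() and remove the function call from the loop.
struct DoubleScale
{
    inline float Apply(float x) const { return x * 2.0f; }
};

template <typename Scale>
float SumInline(const Scale& s, const float* data, int count)
{
    assert(count >= 0);
    float total = 0.0f;
    for (int i = 0; i < count; ++i)
    {
        total += s.Apply(data[i]);
    }
    return total;
}
```

Both functions compute the same result. The template version trades run-time flexibility for a call that can be resolved and inlined at compile time.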
