Implements the classes and functions for SIMD computations on integers and single-precision floating-point numbers.
Kind | Name | Description |
class | AxisAlignedBox | Class for representing axis-aligned bounding boxes (AABB). The class has data members to hold the minimum coordinates (point_min) and the maximum coordinates (point_max). |
class | Containment | The class with the collection of functions that determine containment relations. |
class | DistanceSq | The class with the collection of functions that compute squared distances. |
struct | each_float_tag | An empty structure used as a tag to represent a single-precision floating-point number. |
struct | each_int16_tag | An empty structure used as a tag to represent a signed 16-bit integer. |
struct | each_int32_tag | An empty structure used as a tag to represent a signed 32-bit integer. |
struct | each_int64_tag | An empty structure used as a tag to represent a signed 64-bit integer. |
struct | each_int8_tag | An empty structure used as a tag to represent a signed 8-bit integer. |
struct | each_select16_tag | An empty structure used as a tag to represent the selection of a lane in 16-bit units. |
struct | each_select32_tag | An empty structure used as a tag to represent the selection of a lane in 32-bit units. |
struct | each_select8_tag | An empty structure used as a tag to represent the selection of a lane in 8-bit units. |
struct | each_uint16_tag | An empty structure used as a tag to represent an unsigned 16-bit integer. |
struct | each_uint32_tag | An empty structure used as a tag to represent an unsigned 32-bit integer. |
struct | each_uint64_tag | An empty structure used as a tag to represent an unsigned 64-bit integer. |
struct | each_uint8_tag | An empty structure used as a tag to represent an unsigned 8-bit integer. |
class | F128 | The class for single-precision floating-point SIMD computations using 128-bit registers (XMM0-XMM15 for SSE, and Q0-Q15 for NEON). |
struct | Float3 | The type for reading and writing three-dimensional vectors in memory. Keeps float-type x, y, and z values as data members. |
struct | Float3x3 | The type for reading and writing 3x3 matrices in memory. The data member m is a 3x3 matrix. |
struct | Float3x4 | The type for reading and writing 3x4 matrices in memory. The data member m is a 3x4 matrix, stored with 16-byte alignment. |
struct | Float4 | The type for reading and writing four-dimensional vectors in memory. Keeps float-type x, y, z, and w values as data members. |
struct | Float4x3 | The type for reading and writing 4x3 matrices in memory. The data member m is a 4x3 matrix, stored with 16-byte alignment. |
struct | Float4x4 | The type for reading and writing 4x4 matrices in memory. The data member m is a 4x4 matrix, stored with 16-byte alignment. |
class | Frustum | Class representing the view frustum. |
class | I128 | The class for integer SIMD computations using 128-bit registers (XMM0-XMM15 for SSE, and Q0-Q15 for NEON). |
class | I64 | The class for 64-bit wide integer SIMD computations; otherwise the same as the I128 class. |
class | Intersection | The class with the collection of functions that determine intersections. |
class | Matrix | The class with the collection of functions that handle 4x4 matrices. |
class | OrientedBox | Class for representing oriented bounding boxes (OBB). This class has data members to hold the center coordinates (center), the size in the xyz directions (extent), and the rotation quaternion (rotation). |
class | Plane | The class with the collection of functions that handle planes in three-dimensional space. |
class | Quaternion | The class with the collection of functions that handle quaternions. |
struct | SimdMatrix | The structure for holding a 4x4 matrix. |
class | Sphere | The class with the collection of static member functions that handle spheres in three-dimensional space. This class cannot be instantiated. |
class | Vector3 | The class with the collection of functions that perform calculations on three-dimensional vectors. All of these functions ignore the value in lane 3. |
class | Vector4 | The class with the collection of functions that perform calculations on four-dimensional vectors. |
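The each_*_tag types above carry no data; an empty structure is useful only as an extra parameter whose type selects an overload at compile time. The sketch below illustrates that tag-dispatch technique in portable C++. It reuses two of the tag names listed above but defines them locally, and Lanes4 and SetValue are hypothetical stand-ins (a plain array instead of a 128-bit register), not nlib's declarations.

```cpp
#include <array>
#include <cassert>
#include <cstdint>
#include <cstring>

// Local stand-ins that mirror two of the tag names above: empty, no data members.
struct each_float_tag {};
struct each_uint32_tag {};

// Hypothetical stand-in for a 128-bit register holding four 32-bit lanes.
struct Lanes4 {
  std::array<std::uint32_t, 4> bits;
};

// Float overload: broadcast v to every lane, storing its IEEE-754 bit pattern.
inline Lanes4 SetValue(float v, each_float_tag) {
  std::uint32_t b;
  std::memcpy(&b, &v, sizeof b);
  return Lanes4{{b, b, b, b}};
}

// uint32 overload: broadcast v to every lane as a raw bit pattern, no conversion.
inline Lanes4 SetValue(std::uint32_t v, each_uint32_tag) {
  return Lanes4{{v, v, v, v}};
}
```

Because the tag is an empty type, the compiler can typically optimize the extra argument away, and the caller states explicitly whether a value is meant as a float or as a bit pattern. That is the distinction the SetValue rows of the DirectXMath correspondence table rely on.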
Return type | Function | Description |
template<size_t NumElem> void | MergeSortUint32A16(uint32_t* data) | Uses SIMD to merge sort a sequence of 32-bit unsigned integers. |
template<class PRED> const void* | nlib_memchr_pred(const void* s, PRED pred, size_t n) | A function template for examining the bytes in byte strings using SIMD instructions. |
template<class PRED> const void* | nlib_memchr_pred_not(const void* s, PRED pred, size_t n) | A function template for examining the bytes in byte strings using SIMD instructions. |
i128 | IsAlpha(i128 c) noexcept | Returns a mask of the bytes in c that are alphabetic letters. |
i128 | IsDigit(i128 c) noexcept | Returns a mask of the bytes in c that are the characters 0 through 9. |
i128 | IsAlnum(i128 c) noexcept | Returns a mask of the bytes in c that are alphabetic letters or the characters 0 through 9. |
i128 | IsSpace(i128 c) noexcept | Returns a mask of the bytes in c that are space characters (0x20, 0x09, 0x0A, 0x0D). |
i128 | IsXdigit(i128 c) noexcept | Returns a mask of the bytes in c that are hexadecimal digits. |
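The character-class functions above return a mask rather than a single boolean. The sketch below emulates that idea with a plain 16-byte array, assuming the common SIMD convention that a matching byte becomes 0xFF and a non-matching byte becomes 0x00, and pairs it with a predicate-driven scan in the spirit of nlib_memchr_pred. Bytes16, IsDigitMask, IsSpaceMask, and FindFirst are hypothetical names; the exact predicate interface and return semantics of nlib_memchr_pred are not described here, so treat this as an illustration of the technique, not of the library.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>

// Hypothetical stand-in for a 128-bit integer register: 16 byte lanes.
using Bytes16 = std::array<std::uint8_t, 16>;

// 0xFF in each lane whose byte is '0'..'9', 0x00 elsewhere (IsDigit-style mask).
inline Bytes16 IsDigitMask(const Bytes16& c) {
  Bytes16 m{};
  for (std::size_t i = 0; i < m.size(); ++i)
    m[i] = (c[i] >= '0' && c[i] <= '9') ? 0xFF : 0x00;
  return m;
}

// 0xFF in each lane holding 0x20, 0x09, 0x0A, or 0x0D (the IsSpace set above).
inline Bytes16 IsSpaceMask(const Bytes16& c) {
  Bytes16 m{};
  for (std::size_t i = 0; i < m.size(); ++i) {
    const std::uint8_t b = c[i];
    m[i] = (b == 0x20 || b == 0x09 || b == 0x0A || b == 0x0D) ? 0xFF : 0x00;
  }
  return m;
}

// Hypothetical scan: apply a mask predicate 16 bytes at a time and return a
// pointer to the first byte whose lane is set, or nullptr if none is.
template <class Pred>
const void* FindFirst(const void* s, Pred pred, std::size_t n) {
  const std::uint8_t* p = static_cast<const std::uint8_t*>(s);
  for (std::size_t off = 0; off < n; off += 16) {
    const std::size_t len = (n - off < 16) ? (n - off) : 16;
    Bytes16 block{};  // zero-padded when fewer than 16 bytes remain
    std::memcpy(block.data(), p + off, len);
    const Bytes16 m = pred(block);
    for (std::size_t i = 0; i < len; ++i)  // padding lanes are ignored
      if (m[i] != 0) return p + off + i;
  }
  return nullptr;
}
```

A real SIMD version would typically compress the mask into a bit field (for example with `_mm_movemask_epi8` on SSE) and locate the first set bit with a count-trailing-zeros instruction instead of the inner loop.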
- Description
- On architectures that support SIMD, code that uses SIMD computations can run far faster than code that does not: at least 2 to 3 times faster, and as much as 10 times faster.
- These computations can also be written separately for each SIMD architecture by using intrinsics directly, but writing code against a library that supports multiple SIMD architectures is far more productive.
- The SIMD library from nlib supports SSE4.1 and NEON (currently tested with NEONvsSSE.h, distributed by Intel). Code written with this library can therefore target both SSE4.1 and NEON.
- See also
- https://software.intel.com/en-us/blogs/2012/12/12/from-arm-neon-to-intel-mmxsse-automatic-porting-solution-tips-and-tricks
- I128, I64
- The I128 and I64 classes in the SIMD library support SIMD computations on 8-bit, 16-bit, 32-bit, and 64-bit integers, and consist solely of static member functions.
- The following kinds of computations are supported:
- Setting 64-bit and 128-bit values.
- Storing 64-bit and 128-bit values to memory with a specified alignment, and loading them from memory.
- Getting and setting individual 8-bit, 16-bit, 32-bit, and 64-bit values (lanes) within 64-bit and 128-bit values.
- Addition, subtraction, multiplication, summation, and horizontal addition on 8-bit, 16-bit, 32-bit, and 64-bit lanes.
- Maximum, minimum, and absolute values of 8-bit, 16-bit, 32-bit, and 64-bit lanes.
- Logical operations on 128-bit values.
- Comparisons of 8-bit, 16-bit, 32-bit, and 64-bit lanes that produce masks.
- Bit shifts of 8-bit, 16-bit, 32-bit, and 64-bit lanes.
- Byte-wise shifts and rotations of 128-bit values.
- Changing the bit width of each lane.
- Rearranging the lanes.
- Endian conversion of 16-bit, 32-bit, and 64-bit values.
- Miscellaneous operations.
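Comparison masks are most useful together with a bitwise select, which chooses between two values per lane without branching. A minimal scalar sketch, assuming a 128-bit value modeled as four 32-bit lanes; CmpGt32 and Select32 are hypothetical names, not nlib signatures (the DirectXMath table maps XMVectorSelect to F128::Select(), but its exact interface is not reproduced here).

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical stand-ins for a 128-bit register viewed as four 32-bit lanes.
using I32x4 = std::array<std::int32_t, 4>;
using U32x4 = std::array<std::uint32_t, 4>;

// Lane-wise a > b: all bits set (0xFFFFFFFF) where true, all bits clear where false.
inline U32x4 CmpGt32(const I32x4& a, const I32x4& b) {
  U32x4 m{};
  for (std::size_t i = 0; i < 4; ++i) m[i] = a[i] > b[i] ? 0xFFFFFFFFu : 0u;
  return m;
}

// Bitwise select: bits of a where the mask bit is 1, bits of b where it is 0.
inline I32x4 Select32(const U32x4& mask, const I32x4& a, const I32x4& b) {
  I32x4 r{};
  for (std::size_t i = 0; i < 4; ++i) {
    const std::uint32_t ua = static_cast<std::uint32_t>(a[i]);
    const std::uint32_t ub = static_cast<std::uint32_t>(b[i]);
    r[i] = static_cast<std::int32_t>((mask[i] & ua) | (~mask[i] & ub));
  }
  return r;
}
```

Selecting with the a > b mask yields the lane-wise maximum. The all-ones convention is what makes this work: ANDing with a full mask keeps every bit of a lane, and ANDing with zero discards it.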
- F128
- The F128 class in the same SIMD library performs SIMD computations on sets of four single-precision floating-point values, and it too consists solely of static member functions.
- The following kinds of computations are supported:
- Setting four single-precision floating-point values.
- Storing values to memory with a specified alignment, and loading them from memory.
- Getting and setting single-precision floating-point values in specific lanes.
- Conversion and casting to and from integer values.
- Addition, subtraction, multiplication, division, summation, and horizontal addition.
- Maximum, minimum, and absolute values.
- Value clamping.
- Reciprocals, square roots, and reciprocal square roots.
- Rounding.
- Logical operations on 128-bit values.
- Comparisons that produce masks.
- Trigonometric functions.
- Interpolation.
- Fast rearrangement of the four single-precision floating-point values held in a single 128-bit register.
- Fast selection of any four single-precision floating-point values from the eight held in two 128-bit registers.
- The F128 class can also be used in environments where SSE and NEON are unavailable, although without the performance benefit of SIMD.
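One way a single interface can sit on top of SSE, NEON, and a plain C++ fallback is compile-time selection of the register type and implementation. The sketch below assumes the compilers' predefined macros (`__SSE__` or `_M_X64` for x86, `__ARM_NEON` for NEON) and uses only standard intrinsics (`_mm_setr_ps`, `_mm_add_ps`, `_mm_storeu_ps`, `vld1q_f32`, `vaddq_f32`, `vst1q_f32`). The names f4, Set4, Add4, and Store4 are hypothetical, and the sketch does not reproduce nlib's implementation.

```cpp
#include <cassert>

#if defined(__SSE__) || defined(_M_X64)
#include <xmmintrin.h>
using f4 = __m128;  // one SSE register
inline f4 Set4(float a, float b, float c, float d) { return _mm_setr_ps(a, b, c, d); }
inline f4 Add4(f4 x, f4 y) { return _mm_add_ps(x, y); }
inline void Store4(float* out, f4 x) { _mm_storeu_ps(out, x); }
#elif defined(__ARM_NEON)
#include <arm_neon.h>
using f4 = float32x4_t;  // one NEON Q register
inline f4 Set4(float a, float b, float c, float d) {
  const float tmp[4] = {a, b, c, d};
  return vld1q_f32(tmp);
}
inline f4 Add4(f4 x, f4 y) { return vaddq_f32(x, y); }
inline void Store4(float* out, f4 x) { vst1q_f32(out, x); }
#else
struct f4 { float v[4]; };  // scalar fallback: same interface, no SIMD
inline f4 Set4(float a, float b, float c, float d) { return f4{{a, b, c, d}}; }
inline f4 Add4(f4 x, f4 y) {
  f4 r;
  for (int i = 0; i < 4; ++i) r.v[i] = x.v[i] + y.v[i];
  return r;
}
inline void Store4(float* out, f4 x) {
  for (int i = 0; i < 4; ++i) out[i] = x.v[i];
}
#endif
```

Calling code is identical on all three paths; only the fallback path gives up the speedup.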
- Purpose of the New SIMD Functions
- One might assume that it is enough to keep the existing interfaces of conventional vector and matrix functions unchanged and use SIMD only inside their implementations.
- That assumption is incorrect, for the reasons explained below.
- In conventional code, 16-byte values are normally neither passed nor returned by value. Instead, a function prototype like the following is used:
void doSomething(VEC4* result, const VEC4& a, const VEC4& b);
- The inputs a and b are passed as references (in effect, pointers) to two 16-byte values. As a result, the 32 bytes of data they refer to are not copied onto the stack; instead, the function loads each 16-byte value from memory when it needs it.
- Because the result is written through the result pointer rather than returned by value, the return value does not use stack space either.
- With SSE and NEON, however, a 16-byte value fits in a single register. On calling conventions that pass vector types in registers, two VEC4 arguments can therefore be passed by value in two registers, and a VEC4 result can be returned in a register as well.
- In that situation, loading and storing through indirect references is pure overhead, so a prototype like the following is more appropriate:
VEC4 doSomething(VEC4 a, VEC4 b);
For this reason, the library provides new functions designed specifically for SIMD.
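The two prototype styles can be contrasted in a small self-contained sketch. Vec4 is a hypothetical 16-byte stand-in for VEC4. With a real register type such as `__m128` (SSE) or `float32x4_t` (NEON), whether a by-value argument actually travels in a register depends on the ABI: the System V x86-64 and AArch64 procedure-call standards pass such vectors in vector registers, whereas the default Microsoft x64 convention passes `__m128` arguments by reference unless `__vectorcall` is used.

```cpp
#include <cassert>

// Hypothetical 16-byte vector type standing in for VEC4.
struct alignas(16) Vec4 {
  float x, y, z, w;
};

// Conventional style: inputs by const reference, result through a pointer.
// The callee loads a and b from memory and stores the result back to memory.
inline void AddByPointer(Vec4* result, const Vec4& a, const Vec4& b) {
  result->x = a.x + b.x;
  result->y = a.y + b.y;
  result->z = a.z + b.z;
  result->w = a.w + b.w;
}

// SIMD-friendly style: inputs and result by value. With a register type and a
// suitable ABI, each value can stay in one register with no memory round trip.
inline Vec4 AddByValue(Vec4 a, Vec4 b) {
  return Vec4{a.x + b.x, a.y + b.y, a.z + b.z, a.w + b.w};
}
```

The by-value form also composes naturally: `AddByValue(AddByValue(a, b), c)` needs no named temporaries, while the pointer form forces an intermediate variable in memory.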
- Correspondence With DirectXMath
- The table below shows the correspondence between the functions of the nlib SIMD library and the DirectXMath functions provided by Microsoft Corporation.
- Note that parameter order and computational results do not always match those of the corresponding DirectXMath functions.
DirectXMath (Vector Functions) | nlib SIMD Library |
Vector Arithmetic Functions | |
XMVectorAbs | F128::Abs() |
XMVectorAdd | F128::Add() |
XMVectorAddAngles | F128::AddAngle() |
XMVectorCeiling | F128::Ceil() |
XMVectorClamp | F128::Clamp() |
XMVectorDivide | F128::Div() |
XMVectorFloor | F128::Floor() |
XMVectorIsInfinite | F128::IsInfinite() |
XMVectorIsNaN | F128::IsNaN() |
XMVectorMax | F128::Max() |
XMVectorMin | F128::Min() |
XMVectorMod | |
XMVectorModAngles | F128::ModAngle() |
XMVectorMultiply | F128::Mult() |
XMVectorMultiplyAdd | F128::MultAdd() |
XMVectorNegate | F128::Negate() |
XMVectorNegativeMultiplySubtract | F128::MultSub() |
XMVectorPow | |
XMVectorReciprocal | F128::Recp() |
XMVectorReciprocalEst | F128::RecpEst() |
XMVectorReciprocalSqrt | F128::RecpSqrt() |
XMVectorReciprocalSqrtEst | F128::RecpSqrtEst() |
XMVectorRound | F128::Round() |
XMVectorSaturate | F128::Saturate() |
XMVectorScale | F128::Mult() |
XMVectorSqrt | F128::Sqrt() |
XMVectorSqrtEst | F128::SqrtEst() |
XMVectorSubtract | F128::Sub() |
XMVectorSubtractAngles | F128::SubAngle() |
XMVectorTruncate | F128::Truncate() |
Bit-Wise Vector Functions | |
XMVectorAndCInt | F128::AndNot() |
XMVectorAndInt | F128::And() |
XMVectorNorInt | F128::Or(), F128::Not() |
XMVectorNotEqual | F128::CmpNe() |
XMVectorNotEqualInt | F128::CastToI128(), F128::CmpEq(), F128::Not() |
XMVectorOrInt | F128::Or() |
XMVectorXorInt | F128::Xor() |
Vector Comparison Functions | |
XMVectorEqual | F128::CmpEq() |
XMVectorEqualInt | F128::CastToI128(), I128::CmpEq32() |
XMVectorGreater | F128::CmpGt() |
XMVectorGreaterOrEqual | F128::CmpGe() |
XMVectorLess | F128::CmpLt() |
XMVectorLessOrEqual | F128::CmpLe() |
XMVectorNearEqual | F128::CmpNearEq() |
Component-Wise Vector Functions | |
XMVectorMergeXY | F128::Permute<0, 4, 1, 5>() |
XMVectorMergeZW | F128::Permute<2, 6, 3, 7>() |
XMVectorPermute | Implemented as a function template. |
XMVectorRotateLeft | Implemented as a function template. |
XMVectorRotateRight | Implemented as a function template. |
XMVectorSelect | F128::Select() |
XMVectorSelectControl | I128::SetValue(uint32_t v, each_uint32_tag), F128::CastFromI128() |
XMVectorShiftLeft | Implemented as a function template. |
XMVectorSplatW | F128::SetValue<3>(f128 value, each_select32_tag) |
XMVectorSplatX | F128::SetValue<0>(f128 value, each_select32_tag) |
XMVectorSplatY | F128::SetValue<1>(f128 value, each_select32_tag) |
XMVectorSplatZ | F128::SetValue<2>(f128 value, each_select32_tag) |
XMVectorSwizzle | Implemented as a function template. |
Geometric Vector Functions | |
XMVectorBaryCentricV | F128::BaryCentric() |
XMVectorCatmullRomV | F128::CatmullRom() |
XMVectorHermiteV | F128::Hermite() |
XMVectorInBounds | F128::InBound() |
XMVectorLerpV | F128::Lerp() |
Vector Initialization Functions | |
XMVectorFalseInt | F128::SetZero() |
XMVectorReplicate | F128::SetValue(float v, each_float_tag) |
XMVectorReplicateInt | F128::SetValue(uint32_t v, each_uint32_tag) |
XMVectorSet | F128::SetValue(float a, float b, float c, float d) |
XMVectorSetInt | |
XMVectorSplatConstant | |
XMVectorSplatConstantInt | |
XMVectorSplatEpsilon | F128::SetValue(uint32_t v, each_uint32_tag), v = 0x34000000U |
XMVectorSplatInfinity | F128::SetValue(uint32_t v, each_uint32_tag), v = 0x7F800000U |
XMVectorSplatOne | F128::SetOne() |
XMVectorSplatQNaN | F128::SetValue(uint32_t v, each_uint32_tag), v = 0x7FC00000U |
XMVectorSplatSignMask | F128::SetValue(uint32_t v, each_uint32_tag), v = 0x80000000U |
XMVectorTrueInt | F128::SetValue(uint32_t v, each_uint32_tag), v = 0xFFFFFFFFU |
XMVectorZero | F128::SetZero() |
Transcendental Vector Functions | |
XMVectorACos | F128::ArcCos() |
XMVectorASin | F128::ArcSin() |
XMVectorATan | F128::ArcTan() |
XMVectorATan2 | F128::ArcTan2() |
XMVectorCos | F128::Cos() |
XMVectorCosH | F128::CosH() |
XMVectorExp | F128::Exp2() |
XMVectorExp2 | F128::Exp2() |
XMVectorExpE | F128::ExpE() |
XMVectorLog | F128::Log2() |
XMVectorLog2 | F128::Log2() |
XMVectorLogE | F128::LogE() |
XMVectorSin | F128::Sin() |
XMVectorSinCos | F128::SinCos() |
XMVectorSinH | F128::SinH() |
XMVectorTan | F128::Tan() |
XMVectorTanH | F128::TanH() |
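The Permute rows in the table use lane indices 0 to 3 for the first operand and 4 to 7 for the second. A scalar sketch of that indexing rule, with hypothetical names (Vec4f, PickLane, Permute4, Splat4) and std::array standing in for a 128-bit register, shows why Permute<0, 4, 1, 5> matches XMVectorMergeXY. Modeling a splat as a permute that repeats one index is only an illustration; the table maps the splats to F128::SetValue<N>(..., each_select32_tag), whose implementation is not described here.

```cpp
#include <array>
#include <cassert>
#include <cstddef>

// Hypothetical stand-in for a 128-bit register holding four floats.
using Vec4f = std::array<float, 4>;

// Lane I of the eight-lane concatenation [a0 a1 a2 a3 b0 b1 b2 b3].
template <std::size_t I>
float PickLane(const Vec4f& a, const Vec4f& b) {
  static_assert(I < 8, "lane index must be in 0-7");
  return I < 4 ? a[I % 4] : b[I % 4];
}

// Result lane k takes lane Ik of the concatenation (indices 0-3: a, 4-7: b).
template <std::size_t I0, std::size_t I1, std::size_t I2, std::size_t I3>
Vec4f Permute4(const Vec4f& a, const Vec4f& b) {
  return Vec4f{PickLane<I0>(a, b), PickLane<I1>(a, b),
               PickLane<I2>(a, b), PickLane<I3>(a, b)};
}

// Copy lane N of v into all four lanes, the effect of XMVectorSplatX/Y/Z/W.
template <std::size_t N>
Vec4f Splat4(const Vec4f& v) {
  static_assert(N < 4, "lane index must be in 0-3");
  return Permute4<N, N, N, N>(v, v);
}
```

Because the indices are template parameters, invalid lane numbers are rejected at compile time, and a SIMD implementation can pick the cheapest shuffle instruction for each index pattern.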
DirectXMath (DirectXMath Triangle Test Functions) | nlib SIMD Library |
TriangleTests::Intersects | Intersection |
DirectXMath (misc) | nlib SIMD Library |
_XM_NO_INTRINSICS_ | NLIB_F128_SIMD_NOUSE |
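Several Vector Initialization rows in the DirectXMath table express constants as raw 32-bit patterns passed to F128::SetValue(uint32_t v, each_uint32_tag). The sketch below decodes those patterns with std::memcpy to confirm what each means as an IEEE-754 single-precision float, and shows why a sign mask is useful: clearing bit 31 yields an absolute value. BitsToFloat, FloatToBits, AbsViaSignMask, CheckSplatConstants, and the k* constants are hypothetical names.

```cpp
#include <cassert>
#include <cfloat>
#include <cmath>
#include <cstdint>
#include <cstring>

// Bit patterns from the XMVectorSplat* / XMVectorTrueInt rows of the table.
constexpr std::uint32_t kEpsilonBits  = 0x34000000u;  // 2^-23, i.e. FLT_EPSILON
constexpr std::uint32_t kInfinityBits = 0x7F800000u;  // +infinity
constexpr std::uint32_t kQNaNBits     = 0x7FC00000u;  // quiet NaN
constexpr std::uint32_t kSignMaskBits = 0x80000000u;  // only the sign bit (-0.0f)
constexpr std::uint32_t kTrueIntBits  = 0xFFFFFFFFu;  // all bits set

// Reinterpret a 32-bit pattern as a float, and back, without undefined behavior.
inline float BitsToFloat(std::uint32_t bits) {
  float f;
  std::memcpy(&f, &bits, sizeof f);
  return f;
}
inline std::uint32_t FloatToBits(float f) {
  std::uint32_t bits;
  std::memcpy(&bits, &f, sizeof bits);
  return bits;
}

// Clearing the sign bit gives |x|, the idea behind masking with ~sign mask.
inline float AbsViaSignMask(float x) {
  return BitsToFloat(FloatToBits(x) & ~kSignMaskBits);
}

// True if each pattern decodes to the value described in its comment.
inline bool CheckSplatConstants() {
  return BitsToFloat(kEpsilonBits) == FLT_EPSILON &&
         std::isinf(BitsToFloat(kInfinityBits)) && BitsToFloat(kInfinityBits) > 0.0f &&
         std::isnan(BitsToFloat(kQNaNBits)) &&
         BitsToFloat(kSignMaskBits) == 0.0f && std::signbit(BitsToFloat(kSignMaskBits)) &&
         std::isnan(BitsToFloat(kTrueIntBits));  // an all-ones "true" mask is a NaN as a float
}
```

The last check is a reminder that comparison masks should be treated as bit patterns, not as float values.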