nlib
nn::nlib::simd Namespace Reference

Implements the classes and functions for SIMD computations on integers and single-precision floating-point numbers. More...

Classes

class  AxisAlignedBox
 Class for representing axis-aligned bounding boxes (AABB). The class has data members to hold the minimum coordinates (point_min) and the maximum coordinates (point_max). More...
 
class  Containment
 The class with the collection of functions that determine containment relations. More...
 
class  DistanceSq
 The class with the collection of functions that perform square-of-distance calculations. More...
 
struct  each_float_tag
 The tag for representing a single-precision floating-point number with an empty structure. More...
 
struct  each_int16_tag
 The tag for representing a signed 16-bit integer with an empty structure. More...
 
struct  each_int32_tag
 The tag for representing a signed 32-bit integer with an empty structure. More...
 
struct  each_int64_tag
 The tag for representing a signed 64-bit integer with an empty structure. More...
 
struct  each_int8_tag
 The tag for representing a signed 8-bit integer with an empty structure. More...
 
struct  each_select16_tag
 The tag for representing the selection of a lane divided into 16-bit units with an empty structure. More...
 
struct  each_select32_tag
 The tag for representing the selection of a lane divided into 32-bit units with an empty structure. More...
 
struct  each_select8_tag
 The tag for representing the selection of a lane divided into 8-bit units with an empty structure. More...
 
struct  each_uint16_tag
 The tag for representing an unsigned 16-bit integer with an empty structure. More...
 
struct  each_uint32_tag
 The tag for representing an unsigned 32-bit integer with an empty structure. More...
 
struct  each_uint64_tag
 The tag for representing an unsigned 64-bit integer with an empty structure. More...
 
struct  each_uint8_tag
 The tag for representing an unsigned 8-bit integer with an empty structure. More...
 
class  F128
 The class for single-precision floating-point SIMD computations using 128-bit registers (XMM0-XMM15 for SSE, and Q0-Q15 for NEON). More...
 
struct  Float3
 The type for reading and writing three-dimensional vectors in memory. Keeps float-type x, y, and z values as data members. More...
 
struct  Float3x3
 The type for reading and writing 3x3 matrices in memory. The data member m is a 3x3 matrix. More...
 
struct  Float3x4
 The type for reading and writing 3x4 matrices in memory. The data member m is a 3x4 matrix, located with 16-byte alignment. More...
 
struct  Float4
 The type for reading and writing four-dimensional vectors in memory. Keeps float-type x, y, z, and w values as data members. More...
 
struct  Float4x3
 The type for reading and writing 4x3 matrices in memory. The data member m is a 4x3 matrix, located with 16-byte alignment. More...
 
struct  Float4x4
 The type for reading and writing 4x4 matrices in memory. The data member m is a 4x4 matrix, located with 16-byte alignment. More...
 
class  Frustum
 Class representing the view frustum. More...
 
class  I128
 The class for integer SIMD computations using 128-bit registers (XMM0-XMM15 for SSE, and Q0-Q15 for NEON). More...
 
class  Intersection
 The class with the collection of functions that determine intersections. More...
 
class  Matrix
 The class with the collection of functions that handle 4x4 matrices. More...
 
class  OrientedBox
 Class for representing oriented bounding boxes (OBB). This class has data members to hold the center coordinates (center), the size in the xyz directions (extent), and the rotation quaternion (rotation). More...
 
class  Plane
 The class with the collection of functions that handle planes in three-dimensional space. More...
 
class  Quaternion
 The class with the collection of functions that handle quaternions. More...
 
struct  SimdMatrix
 The structure for keeping a 4x4 matrix. More...
 
class  Sphere
 The class with the collection of static member functions that handle spheres in three-dimensional space. This class cannot be instantiated. More...
 
class  Vector3
 The class with the collection of functions that perform calculations on three-dimensional vectors. All of these functions ignore the values set in lane 3. More...
 
class  Vector4
 The class with the collection of functions that perform calculations on four-dimensional vectors. More...
 

Typedefs

typedef nlib_f128_t f128
 nlib_f128_t is defined using typedef.
 
typedef nlib_f128x2_t f128x2
 nlib_f128x2_t is defined using typedef.
 
typedef const f128 f128arg
 const f128 or const f128& is defined using typedef.
 
typedef f128 SimdVector
 f128 is defined using typedef. Used when handling three-dimensional or four-dimensional vectors.
 
typedef f128arg SimdVectorArg
 f128arg is defined using typedef.
 
typedef f128 SimdQuaternion
 f128 is defined using typedef. Used when handling quaternions.
 
typedef f128arg SimdQuaternionArg
 f128arg is defined using typedef.
 
typedef f128 SimdPlane
 f128 is defined using typedef. Used when handling planes.
 
typedef f128arg SimdPlaneArg
 f128arg is defined using typedef.
 
typedef f128 SimdSphere
 f128 is defined using typedef. Used when handling spheres.
 
typedef f128arg SimdSphereArg
 f128arg is defined using typedef.
 
typedef nlib_i128_t i128
 nlib_i128_t is defined using typedef.
 

Functions

template<size_t NumElem>
void MergeSortUint32A16 (uint32_t *data) noexcept
 Uses SIMD to merge-sort an array of 32-bit unsigned integers in ascending order. More...
 
errno_t MergeSortUint32A16 (uint32_t *data, size_t n) noexcept
 Uses SIMD to merge-sort an array of 32-bit unsigned integers in ascending order. More...
 
template<class PRED >
const void * nlib_memchr_pred (const void *s, PRED pred, size_t n) noexcept
 A function template for examining the bytes in byte strings using SIMD instructions. More...
 
template<class PRED >
const void * nlib_memchr_pred_not (const void *s, PRED pred, size_t n) noexcept
 A function template for examining the bytes in byte strings using SIMD instructions. More...
 
i128 IsAlpha (i128 c) noexcept
 Masks alphabetic letters in c.
 
i128 IsDigit (i128 c) noexcept
 Masks the characters 0 through 9 in c.
 
i128 IsAlnum (i128 c) noexcept
 Masks alphabetic letters or the characters 0 through 9 in c.
 
i128 IsSpace (i128 c) noexcept
 Masks space characters (0x20, 0x09, 0x0A, 0x0D) in c.
 
i128 IsXdigit (i128 c) noexcept
 Masks hexadecimal characters in c.
 
template<class T , class Compare >
errno_t KeyIdxSortN (T **dst, T *const *src, size_t n, Compare comp) noexcept
 Sorts an array of pointers to T at high speed by reducing the sort of the object pointers to a sort of 32-bit non-negative integers. More...
 
template<class T >
errno_t KeyIdxSortN (T **dst, T *const *src, size_t n) noexcept
 Executes KeyIdxSortN(dst, src, n, std::less<T>()).
 
template<class T , class Compare >
errno_t KeyIdxSort (T **first, T **last, Compare comp) noexcept
 Allocates memory internally and executes KeyIdxSortN(T** dst, T* const* src, size_t n, Compare comp).
 
template<class T >
errno_t KeyIdxSort (T **first, T **last) noexcept
 Executes KeyIdxSort(first, last, std::less<T>()).
 

Variables

Tag Constants
constexpr const each_float_tag each_float = {}
 The tag for representing a single-precision floating-point number with an each_float_tag-type constant object.
 
constexpr const each_int8_tag each_int8 = {}
 The tag for representing a signed 8-bit integer with an each_int8_tag-type constant object.
 
constexpr const each_int16_tag each_int16 = {}
 The tag for representing a signed 16-bit integer with an each_int16_tag-type constant object.
 
constexpr const each_int32_tag each_int32 = {}
 The tag for representing a signed 32-bit integer with an each_int32_tag-type constant object.
 
constexpr const each_int64_tag each_int64 = {}
 The tag for representing a signed 64-bit integer with an each_int64_tag-type constant object.
 
constexpr const each_uint8_tag each_uint8 = {}
 The tag for representing an unsigned 8-bit integer with an each_uint8_tag-type constant object.
 
constexpr const each_uint16_tag each_uint16 = {}
 The tag for representing an unsigned 16-bit integer with an each_uint16_tag-type constant object.
 
constexpr const each_uint32_tag each_uint32 = {}
 The tag for representing an unsigned 32-bit integer with an each_uint32_tag-type constant object.
 
constexpr const each_uint64_tag each_uint64 = {}
 The tag for representing an unsigned 64-bit integer with an each_uint64_tag-type constant object.
 
constexpr const each_select32_tag each_select32 = {}
 The tag for representing the selection of a 32-bit lane with an each_select32_tag-type constant object.
 
constexpr const each_select16_tag each_select16 = {}
 The tag for representing the selection of a 16-bit lane with an each_select16_tag-type constant object.
 
constexpr const each_select8_tag each_select8 = {}
 The tag for representing the selection of an 8-bit lane with an each_select8_tag-type constant object.
 

Detailed Description

Implements the classes and functions for SIMD computations on integers and single-precision floating-point numbers.

Description
On architectures that support SIMD, code that uses SIMD computations can run far faster than code that does not (at least 2 to 3 times faster, and as much as 10 times faster).
These classes and functions could be implemented separately for each SIMD architecture by using intrinsics directly, but writing code against a library that supports several SIMD architectures is far more productive.
The nlib SIMD library supports SSE4.1 and NEON, so a program written with it can target both.
See also
https://software.intel.com/en-us/blogs/2012/12/12/from-arm-neon-to-intel-mmxsse-automatic-porting-solution-tips-and-tricks
I128, I64
The I128 and I64 classes included in the SIMD library support SIMD computations on 8-bit, 16-bit, 32-bit, and 64-bit wide integers, and comprise only static member functions.
The following kinds of computations are supported.
  • Setting 64-bit and 128-bit values.
  • Storing 64-bit and 128-bit values with the specified alignment to memory (and loading them from memory).
  • Getting and setting specific 8-bit, 16-bit, 32-bit, and 64-bit wide values inside 64-bit and 128-bit values. (The value is located in a lane.)
  • 8-bit, 16-bit, 32-bit, and 64-bit wide adding, subtracting, multiplying, summing, and horizontal adding.
  • The calculation of 8-bit, 16-bit, 32-bit, and 64-bit wide maximum, minimum, and absolute values.
  • Logical operations on 128-bit values.
  • Creation of masks by conducting comparison operations on 8-bit, 16-bit, 32-bit, and 64-bit wide values.
  • Bit unit shifts in widths of 8, 16, 32, and 64 bits.
  • Byte unit shift and rotation on 128-bit values.
  • Changing the bit width of each lane.
  • Sorting the lanes.
  • Endian conversion of 16-bit, 32-bit, and 64-bit wide values.
  • Miscellaneous
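The mask-creation item in the list above can be illustrated with a scalar model. This is only a sketch of the general technique, not nlib code: `Lanes32`, `CmpGtMask`, `SelectByMask`, and `MaxUnsigned` are hypothetical names, and a `std::array` of four 32-bit lanes stands in for a 128-bit value. A comparison yields all ones in each lane where it holds and zero elsewhere, and the mask then drives a branch-free selection.

```cpp
#include <array>
#include <cstddef>
#include <cstdint>

// Hypothetical stand-in for a 128-bit value split into four 32-bit lanes.
using Lanes32 = std::array<std::uint32_t, 4>;

// Lane-wise unsigned a > b: 0xFFFFFFFF where true, 0 where false.
inline Lanes32 CmpGtMask(const Lanes32& a, const Lanes32& b) {
    Lanes32 mask{};
    for (std::size_t i = 0; i < 4; ++i) {
        mask[i] = (a[i] > b[i]) ? 0xFFFFFFFFu : 0u;
    }
    return mask;
}

// Picks a[i] where the mask lane is all ones and b[i] where it is zero.
inline Lanes32 SelectByMask(const Lanes32& mask, const Lanes32& a, const Lanes32& b) {
    Lanes32 r{};
    for (std::size_t i = 0; i < 4; ++i) {
        r[i] = (mask[i] & a[i]) | (~mask[i] & b[i]);
    }
    return r;
}

// Lane-wise maximum built from the two primitives above.
inline Lanes32 MaxUnsigned(const Lanes32& a, const Lanes32& b) {
    return SelectByMask(CmpGtMask(a, b), a, b);
}
```

On real SIMD hardware each loop would typically correspond to a single instruction operating on all lanes at once.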
F128
The F128 class included in the same SIMD library is for SIMD computations on sets of four single-precision floating-point numbers, and it too comprises only static member functions.
The following kinds of computations are supported.
  • Setting four single-precision floating-point numbers.
  • Storing values with the specified alignment to memory (and loading them from memory).
  • Getting and setting single-precision floating-point numbers in specific lanes.
  • Conversion and casting to and from integer values.
  • Adding, subtracting, multiplying, dividing, summing, and horizontal adding.
  • Calculation of maximum, minimum, and absolute values.
  • Value clamping.
  • Reciprocals, Square Roots, and Reciprocals of Square Roots
  • Rounding.
  • Logical operations on 128-bit values.
  • Creation of masks by conducting comparison operations.
  • Trigonometric Functions
  • Interpolations
  • Fast sorting of sets of four single-precision floating-point numbers in a single 128-bit register.
  • Fast selection of any set of four single-precision floating-point numbers from among eight such numbers in two 128-bit registers.
The F128 class can be used even in environments where SSE and NEON cannot be used (but not with high performance).
Purpose of the New SIMD Functions
One might assume that it is enough to keep the interface of the conventional vector and matrix functions unchanged and to add SIMD support internally.
That assumption is incorrect, for the following reasons.
Conventional code normally avoids passing or returning 16 bytes of data by value. Functions that handle such data use a prototype like the one shown below.
void doSomething(VEC4* result, const VEC4& a, const VEC4& b);
The input parameters a and b are passed as references to two 16-byte objects. The 32 bytes of data they refer to are therefore not copied onto the stack; instead, the function loads them from memory 16 bytes at a time.
Stack use for the return value is likewise avoided because the result is not returned by value.
With SSE and NEON, 16 bytes of data fit in a single register, so a VEC4 can be passed as a single by-value argument. Two VEC4 values can therefore be passed by value in two registers, and the same applies to the return value.
In this situation, loading and storing data through indirect references is overhead. A function prototype like the one shown below is therefore more appropriate.
VEC4 doSomething(VEC4 a, VEC4 b);
For this reason, new functions were designed for SIMD.
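The two calling styles can be put side by side in a small sketch. `Vec4`, `AddByRef`, and `AddByValue` are hypothetical names used only for illustration; `Vec4` is a plain 16-byte struct standing in for VEC4, not a real SIMD register type. Whether a by-value argument actually travels in a register depends on the type, the ABI, and the compiler, so this shows only the difference in interface.

```cpp
#include <cassert>

// Hypothetical 16-byte vector type standing in for VEC4.
struct alignas(16) Vec4 {
    float x, y, z, w;
};

// Conventional style: inputs by reference, result through a pointer.
// The callee reads a and b from memory and writes the result to memory.
inline void AddByRef(Vec4* result, const Vec4& a, const Vec4& b) {
    assert(result != nullptr);
    result->x = a.x + b.x;
    result->y = a.y + b.y;
    result->z = a.z + b.z;
    result->w = a.w + b.w;
}

// SIMD-friendly style: inputs and result by value. With a genuine SIMD
// register type, each argument and the result can stay in a register.
inline Vec4 AddByValue(Vec4 a, Vec4 b) {
    return Vec4{a.x + b.x, a.y + b.y, a.z + b.z, a.w + b.w};
}
```

The by-value form also composes more naturally, because a call such as `AddByValue(AddByValue(a, b), c)` needs no named temporaries.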
Correspondence With DirectXMath
The table below shows the correspondence between the functions of the nlib SIMD library and the DirectXMath functions provided by the Microsoft Corporation.
Note that the parameter order and the computed results do not always match those of the corresponding DirectXMath functions.
DirectXMath (Vector Functions) nlib SIMD Library
Vector Arithmetic Functions
XMVectorAbs F128::Abs()
XMVectorAdd F128::Add()
XMVectorAddAngles F128::AddAngle()
XMVectorCeiling F128::Ceil()
XMVectorClamp F128::Clamp()
XMVectorDivide F128::Div()
XMVectorFloor F128::Floor()
XMVectorIsInfinite F128::IsInfinite()
XMVectorIsNaN F128::IsNaN()
XMVectorMax F128::Max()
XMVectorMin F128::Min()
XMVectorMod
XMVectorModAngles F128::ModAngle()
XMVectorMultiply F128::Mult()
XMVectorMultiplyAdd F128::MultAdd()
XMVectorNegate F128::Negate()
XMVectorNegativeMultiplySubtract F128::MultSub()
XMVectorPow
XMVectorReciprocal F128::Recp()
XMVectorReciprocalEst F128::RecpEst()
XMVectorReciprocalSqrt F128::RecpSqrt()
XMVectorReciprocalSqrtEst F128::RecpSqrtEst()
XMVectorRound F128::Round()
XMVectorSaturate F128::Saturate()
XMVectorScale F128::Mult()
XMVectorSqrt F128::Sqrt()
XMVectorSqrtEst F128::SqrtEst()
XMVectorSubtract F128::Sub()
XMVectorSubtractAngles F128::SubAngle()
XMVectorTruncate F128::Truncate()
Bit-Wise Vector Functions
XMVectorAndCInt F128::AndNot()
XMVectorAndInt F128::And()
XMVectorNorInt F128::Or(), F128::Not()
XMVectorNotEqual F128::CmpNe()
XMVectorNotEqualInt F128::CastToI128(), F128::CmpEq(), F128::Not()
XMVectorOrInt F128::Or()
XMVectorXorInt F128::Xor()
Vector Comparison Functions
XMVectorEqual F128::CmpEq()
XMVectorEqualInt F128::CastToI128(), I128::CmpEq32()
XMVectorGreater F128::CmpGt()
XMVectorGreaterOrEqual F128::CmpGe()
XMVectorLess F128::CmpLt()
XMVectorLessOrEqual F128::CmpLe()
XMVectorNearEqual F128::CmpNearEq()
Component-Wise Vector Functions
XMVectorMergeXY F128::Permute<0, 4, 1, 5>()
XMVectorMergeZW F128::Permute<2, 6, 3, 7>()
XMVectorPermute Implemented as a function template.
XMVectorRotateLeft Implemented as a function template.
XMVectorRotateRight Implemented as a function template.
XMVectorSelect F128::Select()
XMVectorSelectControl I128::SetValue(uint32_t v, each_uint32_tag), F128::CastFromI128()
XMVectorShiftLeft Implemented as a function template.
XMVectorSplatW F128::SetValue<3>(f128 value, each_select32_tag)
XMVectorSplatX F128::SetValue<0>(f128 value, each_select32_tag)
XMVectorSplatY F128::SetValue<1>(f128 value, each_select32_tag)
XMVectorSplatZ F128::SetValue<2>(f128 value, each_select32_tag)
XMVectorSwizzle Implemented as a function template.
Geometric Vector Functions
XMVectorBaryCentricV F128::BaryCentric()
XMVectorCatmullRomV F128::CatmullRom()
XMVectorHermiteV F128::Hermite()
XMVectorInBounds F128::InBound()
XMVectorLerpV F128::Lerp()
Vector Initialization Functions
XMVectorFalseInt F128::SetZero()
XMVectorReplicate F128::SetValue(float v, each_float_tag)
XMVectorReplicateInt F128::SetValue(uint32_t v, each_uint32_tag)
XMVectorSet F128::SetValue(float a, float b, float c, float d)
XMVectorSetInt
XMVectorSplatConstant
XMVectorSplatConstantInt
XMVectorSplatEpsilon F128::SetValue(uint32_t v, each_uint32_tag), v = 0x34000000U
XMVectorSplatInfinity F128::SetValue(uint32_t v, each_uint32_tag), v = 0x7F800000U
XMVectorSplatOne F128::SetOne()
XMVectorSplatQNaN F128::SetValue(uint32_t v, each_uint32_tag), v = 0x7FC00000U
XMVectorSplatSignMask F128::SetValue(uint32_t v, each_uint32_tag), v = 0x80000000U
XMVectorTrueInt F128::SetValue(uint32_t v, each_uint32_tag), v = 0xFFFFFFFFU
XMVectorZero F128::SetZero()
Transcendental Vector Functions
XMVectorACos F128::ArcCos()
XMVectorASin F128::ArcSin()
XMVectorATan F128::ArcTan()
XMVectorATan2 F128::ArcTan2()
XMVectorCos F128::Cos()
XMVectorCosH F128::CosH()
XMVectorExp F128::Exp2()
XMVectorExp2 F128::Exp2()
XMVectorExpE F128::ExpE()
XMVectorLog F128::Log2()
XMVectorLog2 F128::Log2()
XMVectorLogE F128::LogE()
XMVectorSin F128::Sin()
XMVectorSinCos F128::SinCos()
XMVectorSinH F128::SinH()
XMVectorTan F128::Tan()
XMVectorTanH F128::TanH()
DirectXMath (DirectXMath Library Template Functions) nlib SIMD Library
XMVectorPermute F128::Permute()
XMVectorRotateLeft F128::RotateRight() (not a typo)
XMVectorRotateRight F128::RotateLeft() (not a typo)
XMVectorShiftLeft F128::ShiftRight() (not a typo)
XMVectorSwizzle F128::Swizzle()
DirectXMath (DirectXMath Library Vector Accessor Functions) nlib SIMD Library
XMVectorGetByIndex F128::GetFloatByIndex()
XMVectorGetIntByIndex F128::GetUint32ByIndex()
XMVectorGetIntW F128::CastToI128(), I128::GetUint32FromLane<3>()
XMVectorGetIntX F128::CastToI128(), I128::GetUint32FromLane<0>()
XMVectorGetIntY F128::CastToI128(), I128::GetUint32FromLane<1>()
XMVectorGetIntZ F128::CastToI128(), I128::GetUint32FromLane<2>()
XMVectorGetW F128::GetFloatFromLane<3>()
XMVectorGetX F128::GetFloatFromLane<0>()
XMVectorGetY F128::GetFloatFromLane<1>()
XMVectorGetZ F128::GetFloatFromLane<2>()
XMVectorSetByIndex F128::SetFloatByIndex()
XMVectorSetIntByIndex
XMVectorSetIntW F128::CastToI128(), I128::SetUint32ToLane<3>()
XMVectorSetIntX F128::CastToI128(), I128::SetUint32ToLane<0>()
XMVectorSetIntY F128::CastToI128(), I128::SetUint32ToLane<1>()
XMVectorSetIntZ F128::CastToI128(), I128::SetUint32ToLane<2>()
XMVectorSetW F128::SetFloatToLane<3>()
XMVectorSetX F128::SetFloatToLane<0>()
XMVectorSetY F128::SetFloatToLane<1>()
XMVectorSetZ F128::SetFloatToLane<2>()
DirectXMath (4D Vector Functions) nlib SIMD Library
4D Vector Comparison Functions
XMVector4Equal Vector4::CmpEq()
XMVector4Greater Vector4::CmpGt()
XMVector4GreaterOrEqual Vector4::CmpGe()
XMVector4IsInfinite Vector4::IsInfinite()
XMVector4IsNaN Vector4::IsNaN()
XMVector4Less Vector4::CmpLt()
XMVector4LessOrEqual Vector4::CmpLe()
XMVector4NotEqual Vector4::CmpNe()
4D Vector Geometric Functions
XMVector4AngleBetweenNormals Vector4::GetAngle()
XMVector4Dot Vector4::Dot()
XMVector4InBounds Vector4::InBound()
XMVector4Length Vector4::Length()
XMVector4LengthEst Vector4::LengthEst()
XMVector4LengthSq Vector4::LengthSq()
XMVector4Normalize Vector4::Normalize()
XMVector4NormalizeEst Vector4::NormalizeEst()
XMVector4ReciprocalLength Vector4::RecpLength()
XMVector4ReciprocalLengthEst Vector4::RecpLengthEst()
XMVector4Reflect Vector4::Reflect()
4D Vector Transformation Functions
XMVector4Transform Vector4::Transform()
DirectXMath (3D Vector Functions) nlib SIMD Library
3D Vector Comparison Functions
XMVector3Equal Vector3::CmpEq()
XMVector3Greater Vector3::CmpGt()
XMVector3GreaterOrEqual Vector3::CmpGe()
XMVector3IsInfinite Vector3::IsInfinite()
XMVector3IsNaN Vector3::IsNaN()
XMVector3Less Vector3::CmpLt()
XMVector3LessOrEqual Vector3::CmpLe()
XMVector3NotEqual Vector3::CmpNe()
3D Vector Geometric Functions
XMVector3AngleBetweenNormals Vector3::GetAngle()
XMVector3Cross Vector3::Cross()
XMVector3Dot Vector3::Dot()
XMVector3InBounds Vector3::InBound()
XMVector3Length Vector3::Length()
XMVector3LengthEst Vector3::LengthEst()
XMVector3LengthSq Vector3::LengthSq()
XMVector3LinePointDistance DistanceSq::PointLine()
XMVector3Normalize Vector3::Normalize()
XMVector3NormalizeEst Vector3::NormalizeEst()
XMVector3Orthogonal
XMVector3ReciprocalLength Vector3::RecpLength()
XMVector3ReciprocalLengthEst Vector3::RecpLengthEst()
XMVector3Reflect Vector3::Reflect()
XMVector3Refract
3D Vector Transformation Functions
XMVector3InverseRotate Vector3::InvRotate()
XMVector3Project
XMVector3Rotate Vector3::Rotate()
XMVector3Transform Vector3::Transform()
XMVector3TransformCoord Vector3::TransformCoord()
XMVector3TransformNormal Vector3::TransformNormal()
XMVector3Unproject
DirectXMath (Plane Functions) nlib SIMD Library
XMPlaneDot Plane::Dot()
XMPlaneDotCoord Plane::DotCoord()
XMPlaneDotNormal Plane::DotNormal()
XMPlaneFromPointNormal Plane::FromPointAndNormal()
XMPlaneFromPoints Plane::FromPoint()
XMPlaneIntersectLine Intersection::PlaneLine()
XMPlaneIntersectPlane Intersection::PlanePlane()
XMPlaneNormalize Plane::Normalize()
XMPlaneNormalizeEst Plane::NormalizeEst()
XMPlaneTransform Plane::Transform()
DirectXMath (Quaternion Functions) nlib SIMD Library
XMQuaternionBaryCentric Quaternion::BaryCentric()
XMQuaternionConjugate Quaternion::Conjugate()
XMQuaternionDot Quaternion::Dot()
XMQuaternionEqual Quaternion::CmpEq()
XMQuaternionExp Quaternion::Exp()
XMQuaternionIdentity Quaternion::Identity()
XMQuaternionInverse Quaternion::Inverse()
XMQuaternionIsIdentity Quaternion::IsIdentity()
XMQuaternionIsInfinite Quaternion::IsInfinite()
XMQuaternionIsNaN Quaternion::IsNaN()
XMQuaternionLength Quaternion::Length()
XMQuaternionLengthSq Quaternion::LengthSq()
XMQuaternionLn Quaternion::Ln()
XMQuaternionMultiply Quaternion::Mult()
XMQuaternionNormalize Quaternion::Normalize()
XMQuaternionNormalizeEst Quaternion::NormalizeEst()
XMQuaternionNotEqual Quaternion::CmpNe()
XMQuaternionReciprocalLength Quaternion::RecpLength()
XMQuaternionRotationNormal Quaternion::FromRotationAxisAndSinCos()
XMQuaternionRotationMatrix Quaternion::FromRotationMatrix()
XMQuaternionRotationRollPitchYawFromVector Quaternion::FromRotationZXY()
XMQuaternionSlerp Quaternion::Slerp()
XMQuaternionSquad Quaternion::Squad()
XMQuaternionSquadSetup Quaternion::SquadSetup()
XMQuaternionToAxisAngle Quaternion::ToAxisAngle()
DirectXMath (Matrix Functions) nlib SIMD Library
XMMatrixAffineTransformation
XMMatrixDecompose Matrix::Decompose()
XMMatrixDeterminant Matrix::Determinant()
XMMatrixIdentity Matrix::Identity()
XMMatrixInverse Matrix::Inverse()
XMMatrixIsIdentity Matrix::IsIdentity()
XMMatrixIsInfinite Matrix::IsInfinite()
XMMatrixIsNaN Matrix::IsNaN()
XMMatrixLookAtLH Matrix::LookAtLh()
XMMatrixLookAtRH Matrix::LookAtRh()
XMMatrixLookToLH Matrix::LookToLh()
XMMatrixLookToRH Matrix::LookToRh()
XMMatrixMultiply Matrix::Mult()
XMMatrixMultiplyTranspose Matrix::MultTranspose()
XMMatrixOrthographicLH Matrix::OrthographicLh()
XMMatrixOrthographicOffCenterLH Matrix::OrthographicOffCenterLh()
XMMatrixOrthographicOffCenterRH Matrix::OrthographicOffCenterRh()
XMMatrixOrthographicRH Matrix::OrthographicRh()
XMMatrixPerspectiveFovLH Matrix::PerspectiveFovLh()
XMMatrixPerspectiveFovRH Matrix::PerspectiveFovRh()
XMMatrixPerspectiveLH Matrix::PerspectiveLh()
XMMatrixPerspectiveOffCenterLH Matrix::PerspectiveOffCenterLh()
XMMatrixPerspectiveOffCenterRH Matrix::PerspectiveOffCenterRh()
XMMatrixPerspectiveRH Matrix::PerspectiveRh()
XMMatrixReflect Matrix::Reflect()
XMMatrixRotationNormal Matrix::FromRotationAxisAndSinCos()
XMMatrixRotationQuaternion Matrix::FromRotationQuaternion()
XMMatrixRotationRollPitchYawFromVector Matrix::FromRotationZXY()
XMMatrixRotationX Matrix::FromRotationX()
XMMatrixRotationY Matrix::FromRotationY()
XMMatrixRotationZ Matrix::FromRotationZ()
XMMatrixScaling Matrix::FromScaling()
XMMatrixScalingFromVector Matrix::FromScaling()
XMMatrixShadow Matrix::Shadow()
XMMatrixTransformation
XMMatrixTranslation Matrix::FromTranslation()
XMMatrixTranslationFromVector Matrix::FromTranslation()
XMMatrixTranspose Matrix::Transpose()
DirectXMath (DirectXMath Library Vector Load Functions) nlib SIMD Library
XMLoadFloat F128::LoadA4()
XMLoadFloat4A F128::LoadA16()
XMLoadFloat3 Vector3::LoadFloat3()
XMLoadFloat4x4A Matrix::LoadFloat4x4()
XMLoadFloat4x3A Matrix::LoadFloat4x3()
XMLoadFloat3x3 Matrix::LoadFloat3x3()
DirectXMath (DirectXMath Library Vector Store Functions) nlib SIMD Library
XMStoreFloat F128::StoreA4()
XMStoreFloat4A F128::StoreA16()
XMStoreFloat3 Vector3::StoreFloat3()
XMStoreFloat4x4A Matrix::StoreFloat4x4()
XMStoreFloat4x3A Matrix::StoreFloat4x3()
XMStoreFloat3x3 Matrix::StoreFloat3x3()
DirectXMath (DirectXMath Library Classes) nlib SIMD Library
BoundingBox AxisAlignedBox
BoundingBox::Intersects Intersection
BoundingBox::Contains Containment
BoundingOrientedBox OrientedBox
BoundingOrientedBox::Intersects Intersection
BoundingOrientedBox::Contains Containment
BoundingSphere SimdSphere, Sphere
BoundingSphere::Intersects Intersection
BoundingSphere::Contains Containment
BoundingFrustum Frustum
BoundingFrustum::Intersects Intersection
BoundingFrustum::Contains Containment
DirectXMath (DirectXMath Triangle Test Functions) nlib SIMD Library
TriangleTests::Intersects Intersection
DirectXMath (misc) nlib SIMD Library
_XM_NO_INTRINSICS_ NLIB_F128_SIMD_NOUSE
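
The table maps XMVectorMergeXY and XMVectorMergeZW to Permute<0, 4, 1, 5> and Permute<2, 6, 3, 7>. The scalar model below sketches what such a permute computes, assuming that indices 0 to 3 select lanes of the first operand and indices 4 to 7 select lanes of the second. That convention is the one used by XMVectorPermute and is inferred here from the table; `Vec4f` and `PermuteModel` are hypothetical names and not part of nlib.

```cpp
#include <array>

// Hypothetical stand-in for a register holding four floats.
using Vec4f = std::array<float, 4>;

// Scalar model of a two-operand permute.
// Assumption: indices 0..3 pick lanes of a, indices 4..7 pick lanes of b.
template <int I0, int I1, int I2, int I3>
Vec4f PermuteModel(const Vec4f& a, const Vec4f& b) {
    static_assert(I0 >= 0 && I0 < 8, "I0 out of range");
    static_assert(I1 >= 0 && I1 < 8, "I1 out of range");
    static_assert(I2 >= 0 && I2 < 8, "I2 out of range");
    static_assert(I3 >= 0 && I3 < 8, "I3 out of range");
    auto pick = [&a, &b](int i) { return i < 4 ? a[i] : b[i - 4]; };
    return Vec4f{pick(I0), pick(I1), pick(I2), pick(I3)};
}
```

Under this convention, `PermuteModel<0, 4, 1, 5>(a, b)` yields (a.x, b.x, a.y, b.y), which matches the documented result of XMVectorMergeXY.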

Function Documentation

◆ KeyIdxSortN()

template<class T , class Compare >
nn::nlib::simd::KeyIdxSortN ( T **  dst,
T *const *  src,
size_t  n,
Compare  comp 
)
noexcept

Sorts an array of pointers to T at high speed by reducing the sort of the object pointers to a sort of 32-bit non-negative integers.

Template Parameters
T         Type of the object to be sorted.
Compare   Type of the function object used for comparison.
Parameters
[out] dst    Pointer to the array where the sort results will be stored.
[in]  src    Pointer to the array to be sorted.
[in]  n      The number of elements of dst and src.
[in]  comp   Function object that has a comparison function for T.
Return values
0        Success.
EINVAL   Indicates that n is too large.
ENOMEM   Memory allocation has failed.
Description
The algorithm works as follows.
  1. Gets a 32-bit non-negative integer key from each T-type object referenced from src, and concatenates that key with the object's index in src to create a 32-bit non-negative integer.
    • To this end, T must implement the uint32_t GetKey32() const; member function.
  2. Sorts the resulting array using MergeSortUint32A16(). The keys are sorted in ascending order.
  3. Further sorts elements that have the same key using std::stable_sort().
    • Note that the comparison uses comp, which is a function object that compares T, not T*.
  4. Takes the T-type pointers from the sorted array and stores them in dst. Note that the sort is stable.
The sort is faster than a normal sort for the following two reasons.
  • Only 32-bit non-negative integers, not the objects themselves, are referenced while the sort is actually performed, which minimizes the impact of cache misses when the objects are large.
  • Sorting 32-bit non-negative integers with MergeSortUint32A16() is fast.
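The steps above can be sketched with the standard library alone. This is a simplified model and not the nlib implementation: `KeyIdxSortSketch` and `Item` are hypothetical names, `std::sort` stands in for MergeSortUint32A16(), and the key and index are packed into a 64-bit integer for simplicity, whereas the text describes a 32-bit combined value. The sketch also gathers the pointers before resolving ties, which gives the same result as the order listed above.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <functional>
#include <vector>

// Simplified model of a key-index sort (assumption: 64-bit packing).
template <class T, class Compare>
void KeyIdxSortSketch(T** dst, T* const* src, std::size_t n, Compare comp) {
    assert(n <= 0xFFFFFFFFull);  // the index must fit in the low 32 bits
    // Step 1: pack (key, index) so that sorting integers orders by key
    // first and by original position second.
    std::vector<std::uint64_t> packed(n);
    for (std::size_t i = 0; i < n; ++i) {
        packed[i] = (static_cast<std::uint64_t>(src[i]->GetKey32()) << 32) |
                    static_cast<std::uint64_t>(i);
    }
    // Step 2: sort the integers only; no object is touched here.
    std::sort(packed.begin(), packed.end());
    // Step 4: gather the pointers in sorted order.
    for (std::size_t i = 0; i < n; ++i) {
        dst[i] = src[static_cast<std::size_t>(packed[i] & 0xFFFFFFFFull)];
    }
    // Step 3: resolve ties inside each run of equal keys with comp on T.
    std::size_t begin = 0;
    while (begin < n) {
        const std::uint64_t key = packed[begin] >> 32;
        std::size_t end = begin + 1;
        while (end < n && (packed[end] >> 32) == key) ++end;
        if (end - begin > 1) {
            std::stable_sort(dst + begin, dst + end,
                             [&comp](const T* l, const T* r) { return comp(*l, *r); });
        }
        begin = end;
    }
}

// Hypothetical example type: the key is a coarse prefix of the full order.
struct Item {
    std::uint32_t major;
    std::uint32_t minor;
    std::uint32_t GetKey32() const { return major; }
    bool operator<(const Item& rhs) const {
        return major != rhs.major ? major < rhs.major : minor < rhs.minor;
    }
};
```

For the result to agree with a full sort by comp, the key must be consistent with comp: an object with a smaller key must also compare less. `Item` satisfies this because its key is the most significant part of its ordering.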
Comparison with std::sort()
The following table compares the time (in milliseconds) required to sort an array of pointers to 64-byte objects (shuffled by std::random_shuffle()) 100,000 times. The times were measured on a Xeon W3565 (3.2 GHz) with Visual Studio 2013 (64-bit).
Number of Elements   KeyIdxSortN()   std::sort()
64                   145             219
128                  338             520
256                  843             1298
512                  1869            3083
1024                 4242            6809
2048                 9145            15277
4096                 21164           34641
8192                 45326           78387
Sample code
The data type to be sorted must be comparable and have the GetKey32() member function. The array of pointers to objects is sorted as follows.
class MyObj {
public:
    uint32_t GetKey32() const { ...... }
    bool operator<(const MyObj& rhs) const { ..... }
    ......
private:
    ......
};
......
MyObj* src[n];
......

Definition at line 272 of file SimdAlgorithm.h.

◆ MergeSortUint32A16() [1/2]

template<size_t NumElem>
nn::nlib::simd::MergeSortUint32A16 ( uint32_t *  data)
inline noexcept

Uses SIMD to merge-sort an array of 32-bit unsigned integers in ascending order.

Template Parameters
NumElem   The number of elements to sort. It must be a multiple of 16.
Parameters
[in] data   Pointer to the array of elements to be sorted. It must be aligned to 16 bytes.
Description
The number of elements must be a multiple of 16.
The speed of std::sort generally varies considerably with the standard library implementation and the input data. Even so, this function sorts about 10 times faster than std::sort for around 64 elements, and about 2 to 4 times faster for larger inputs (measured with Visual Studio 2013).
This function runs on a single thread and does not perform actions like creating threads.
See also
http://www.cse.uconn.edu/~zshi/course/cse5302/ref/chhugani08sorting.pdf
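
Because the element count must be a multiple of 16 and the data must be aligned to 16 bytes, inputs of other sizes need to be padded. The sketch below shows one way to prepare such a buffer. `RoundUpTo16`, `PaddedKeys`, and `FillPadded` are hypothetical helpers and not part of nlib. Padding with the maximum uint32_t value is an assumption chosen so that the padding sorts to the end of the array.

```cpp
#include <algorithm>
#include <cstddef>
#include <cstdint>
#include <limits>

// Rounds n up to the next multiple of 16.
constexpr std::size_t RoundUpTo16(std::size_t n) {
    return (n + 15) / 16 * 16;
}

// Hypothetical buffer that satisfies both constraints for N input values.
template <std::size_t N>
struct PaddedKeys {
    static_assert(N > 0, "N must be positive");
    alignas(16) std::uint32_t data[RoundUpTo16(N)];
    static constexpr std::size_t size() { return RoundUpTo16(N); }
};

// Copies the input and fills the tail with the largest uint32_t value,
// so that after an ascending sort the first N slots hold the sorted input.
template <std::size_t N>
void FillPadded(PaddedKeys<N>& out, const std::uint32_t (&in)[N]) {
    std::copy(in, in + N, out.data);
    std::fill(out.data + N, out.data + out.size(),
              std::numeric_limits<std::uint32_t>::max());
}
```

A buffer prepared this way could then be passed to a sort that requires 16-element granularity. Going by the signature above, the call would presumably look like `MergeSortUint32A16<16>(keys.data)` for a 16-element buffer.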

Definition at line 53 of file SimdAlgorithm.h.

◆ MergeSortUint32A16() [2/2]

nn::nlib::simd::MergeSortUint32A16 ( uint32_t *  data,
size_t  n 
)
noexcept

Uses SIMD to merge-sort an array of 32-bit unsigned integers in ascending order.

Parameters
[in] data   Pointer to the array of elements to be sorted. It must be aligned to 16 bytes.
[in] n      The number of elements. It must be a multiple of 16.
Return values
0        Success.
EINVAL   n is not a multiple of 16, or data is NULL or not aligned to 16 bytes.
ENOMEM   Indicates that internal memory allocation failed.
Description
Internally allocates the buffer needed for the merge sort from the heap.
See also
MergeSortUint32A16(uint32_t* data)
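
The EINVAL conditions listed above can be checked by the caller before the call. The helper below is a sketch of such a check. `CheckMergeSortArgs` is a hypothetical name, and the order in which nlib itself tests these conditions is not documented here.

```cpp
#include <cerrno>
#include <cstddef>
#include <cstdint>

// Returns 0 if the arguments meet the documented preconditions,
// EINVAL otherwise. Sketch only; not the nlib implementation.
inline int CheckMergeSortArgs(const std::uint32_t* data, std::size_t n) {
    if (data == nullptr) return EINVAL;                                    // data is NULL
    if (reinterpret_cast<std::uintptr_t>(data) % 16 != 0) return EINVAL;   // not 16-byte aligned
    if (n % 16 != 0) return EINVAL;                                        // not a multiple of 16
    return 0;
}
```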

◆ nlib_memchr_pred()

template<class PRED >
nn::nlib::simd::nlib_memchr_pred ( const void *  s,
PRED  pred,
size_t  n 
)
noexcept

A function template for examining the bytes in byte strings using SIMD instructions.

Template Parameters
PRED   Function type that takes an i128-type byte string and returns an i128-type mask.
Parameters
[in] s      The byte string to examine.
[in] pred   The function or function object that performs the examination.
[in] n      The length of the byte string that is the target of examination.
Return values
NULL   Indicates that s is NULL.
NULL   Alternatively, indicates that the byte searched for by pred could not be found.
In all other cases, returns the pointer to the byte that was searched for by pred.
Description
pred must return a mask created using a function such as I128::CmpEq8.
The mask must set each search-target byte to 0xFF.
This function returns the pointer to the first masked byte.
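
The contract between the function and pred can be modeled in scalar code. In the sketch below, `Block16`, `MemchrPredModel`, and `IsDigitMask` are hypothetical names, and a 16-byte `std::array` stands in for i128. The data is examined 16 bytes at a time, pred marks matching bytes with 0xFF, and the first marked byte within the valid length is returned. How nlib handles the final partial block is not described here, so zero-padding the tail is an assumption of this sketch.

```cpp
#include <array>
#include <cstddef>
#include <cstdint>
#include <cstring>

// Hypothetical stand-in for a 128-bit value viewed as 16 bytes.
using Block16 = std::array<std::uint8_t, 16>;

// Scalar model: returns the first byte whose mask lane is 0xFF, or nullptr.
template <class Pred>
const void* MemchrPredModel(const void* s, Pred pred, std::size_t n) {
    if (s == nullptr) return nullptr;
    const std::uint8_t* p = static_cast<const std::uint8_t*>(s);
    for (std::size_t off = 0; off < n; off += 16) {
        const std::size_t len = (n - off < 16) ? (n - off) : 16;
        Block16 block{};                       // tail bytes stay zero
        std::memcpy(block.data(), p + off, len);
        const Block16 mask = pred(block);
        for (std::size_t i = 0; i < len; ++i) {  // ignore lanes past the end
            if (mask[i] == 0xFF) return p + off + i;
        }
    }
    return nullptr;
}

// Example predicate: marks the characters '0' through '9'.
inline Block16 IsDigitMask(const Block16& c) {
    Block16 mask{};
    for (std::size_t i = 0; i < 16; ++i) {
        mask[i] = static_cast<std::uint8_t>((c[i] >= '0' && c[i] <= '9') ? 0xFF : 0x00);
    }
    return mask;
}
```

A SIMD implementation would replace the inner loop with a single operation that extracts the position of the first set lane from the mask.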

Definition at line 63 of file SimdAlgorithm.h.

◆ nlib_memchr_pred_not()

template<class PRED >
nn::nlib::simd::nlib_memchr_pred_not ( const void *  s,
PRED  pred,
size_t  n 
)
noexcept

A function template for examining the bytes in byte strings using SIMD instructions.

Template Parameters
PRED   Function type that takes an i128-type byte string and returns an i128-type mask.
Parameters
[in] s      The byte string to examine.
[in] pred   The function or function object that performs the examination.
[in] n      The length of the byte string that is the target of examination.
Return values
NULL   Indicates that s is NULL.
NULL   Alternatively, indicates that the byte searched for by pred could not be found.
In all other cases, returns the pointer to the byte that was searched for by pred.
Description
pred must return a mask created using a function such as I128::CmpEq8.
The mask must set each search-target byte to 0xFF.
This function returns the pointer to the first byte that is not masked.
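
A typical use of the negated search is skipping a leading run of bytes, such as whitespace. The scalar model below sketches that behavior. `Bytes16`, `MemchrPredNotModel`, and `IsSpaceMask` are hypothetical names, a 16-byte `std::array` stands in for i128, and the zero-padding of the final partial block is an assumption of this sketch. The space characters follow the set listed for IsSpace (0x20, 0x09, 0x0A, 0x0D).

```cpp
#include <array>
#include <cstddef>
#include <cstdint>
#include <cstring>

// Hypothetical stand-in for a 128-bit value viewed as 16 bytes.
using Bytes16 = std::array<std::uint8_t, 16>;

// Scalar model: returns the first byte whose mask lane is NOT 0xFF, or nullptr.
template <class Pred>
const void* MemchrPredNotModel(const void* s, Pred pred, std::size_t n) {
    if (s == nullptr) return nullptr;
    const std::uint8_t* p = static_cast<const std::uint8_t*>(s);
    for (std::size_t off = 0; off < n; off += 16) {
        const std::size_t len = (n - off < 16) ? (n - off) : 16;
        Bytes16 block{};                       // tail bytes stay zero
        std::memcpy(block.data(), p + off, len);
        const Bytes16 mask = pred(block);
        for (std::size_t i = 0; i < len; ++i) {  // ignore lanes past the end
            if (mask[i] != 0xFF) return p + off + i;
        }
    }
    return nullptr;
}

// Example predicate: marks the space characters 0x20, 0x09, 0x0A, and 0x0D.
inline Bytes16 IsSpaceMask(const Bytes16& c) {
    Bytes16 mask{};
    for (std::size_t i = 0; i < 16; ++i) {
        const bool space = c[i] == 0x20 || c[i] == 0x09 || c[i] == 0x0A || c[i] == 0x0D;
        mask[i] = static_cast<std::uint8_t>(space ? 0xFF : 0x00);
    }
    return mask;
}
```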

Definition at line 134 of file SimdAlgorithm.h.