DXMath library bug: XMVectorHermiteV implementation wrong

I assume that XMVectorHermiteV is supposed to do the same as XMVectorHermite while allowing independent interpolation factors per dimension/component. That is, a call is supposed to produce a result vector vr from points and tangents v0, vt0, v1, vt1 and a factor vector vs as:
vr[i] = (+2vs[i]^3 - 3vs[i]^2 + 1) * v0[i]
+ (vs[i]^3 - 2vs[i]^2 + vs[i]) * vt0[i]
+ (-2vs[i]^3 + 3vs[i]^2) * v1[i]
+ (vs[i]^3 - vs[i]^2) * vt1[i]
Under this assumption the implementation is wrong. In fact, the implementation does not deliver a Hermite spline at all. I cannot think of any use case where the current implementation would make sense. But in case I am the one who is not making sense, I would be glad to hear what the current implementation is supposed to do ;))
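For reference, the per-component behaviour assumed in this post could be sketched in plain C++ as follows. This is only an illustration of the formula above, not DirectXMath code: `Vec4`, `HermiteWeights`, `hermiteWeights`, `hermitePerComponent` and `checkEndpoints` are hypothetical names, and a `std::array<float, 4>` stands in for `XMVECTOR`.

```cpp
#include <array>
#include <cassert>
#include <cstddef>

// Stand-in for a 4-component vector (assumption: not the real XMVECTOR type).
using Vec4 = std::array<float, 4>;

// The four cubic Hermite basis weights for one interpolation factor s.
struct HermiteWeights { float p0, t0, p1, t1; };

HermiteWeights hermiteWeights(float s) {
    const float s2 = s * s;
    const float s3 = s2 * s;
    return { 2.0f * s3 - 3.0f * s2 + 1.0f,   // weight of v0
             s3 - 2.0f * s2 + s,             // weight of vt0
             -2.0f * s3 + 3.0f * s2,         // weight of v1
             s3 - s2 };                      // weight of vt1
}

// Per-component reading: component i is interpolated with its own factor vs[i].
Vec4 hermitePerComponent(const Vec4& v0, const Vec4& vt0,
                         const Vec4& v1, const Vec4& vt1, const Vec4& vs) {
    Vec4 vr{};
    for (std::size_t i = 0; i < 4; ++i) {
        const HermiteWeights w = hermiteWeights(vs[i]);
        vr[i] = w.p0 * v0[i] + w.t0 * vt0[i] + w.p1 * v1[i] + w.t1 * vt1[i];
    }
    return vr;
}

// Usage check: a factor of 0 yields the start point, a factor of 1 the end point,
// independently for each component.
void checkEndpoints() {
    const Vec4 v0{1.0f, 2.0f, 3.0f, 4.0f}, vt0{1.0f, 1.0f, 1.0f, 1.0f};
    const Vec4 v1{5.0f, 6.0f, 7.0f, 8.0f}, vt1{1.0f, 1.0f, 1.0f, 1.0f};
    const Vec4 vs{0.0f, 1.0f, 0.0f, 1.0f};
    const Vec4 vr = hermitePerComponent(v0, vt0, v1, vt1, vs);
    assert(vr[0] == v0[0] && vr[1] == v1[1] && vr[2] == v0[2] && vr[3] == v1[3]);
}
```

Under this reading every component of the result lies on its own Hermite curve between the matching components of v0 and v1.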
I could imagine that the current implementation was tested for correctness by comparing against the result of XMVectorHermite. That is, using something like:
vrs = XMVectorHermite(v0, vt0, v1, vt1, s);
vs = XMVectorSet(s, s, s, s);
vrv = XMVectorHermiteV(v0, vt0, v1, vt1, vs);
ASSERT_EQUAL(vrs, vrv);
But this is just one special case in which the implementation produces the correct result.
Saturday, February 22, 2014 1:34 PM
Question
Answers

DirectXMath includes a 'NO_INTRINSICS' implementation for each function. For XMVectorHermiteV it is:
XMVECTOR T2 = XMVectorMultiply(T, T);
XMVECTOR T3 = XMVectorMultiply(T, T2);
XMVECTOR P0 = XMVectorReplicate(2.0f * T3.vector4_f32[0] - 3.0f * T2.vector4_f32[0] + 1.0f);
XMVECTOR T0 = XMVectorReplicate(T3.vector4_f32[1] - 2.0f * T2.vector4_f32[1] + T.vector4_f32[1]);
XMVECTOR P1 = XMVectorReplicate(-2.0f * T3.vector4_f32[2] + 3.0f * T2.vector4_f32[2]);
XMVECTOR T1 = XMVectorReplicate(T3.vector4_f32[3] - T2.vector4_f32[3]);
XMVECTOR Result = XMVectorMultiply(P0, Position0);
Result = XMVectorMultiplyAdd(T0, Tangent0, Result);
Result = XMVectorMultiplyAdd(P1, Position1, Result);
Result = XMVectorMultiplyAdd(T1, Tangent1, Result);
return Result;
In your formulation above, that would be:
vr[i] = (+2vs[0]^3 - 3vs[0]^2 + 1) * v0[i]
+ (vs[1]^3 - 2vs[1]^2 + vs[1]) * vt0[i]
+ (-2vs[2]^3 + 3vs[2]^2) * v1[i]
+ (vs[3]^3 - vs[3]^2) * vt1[i]
In other words, the weight vector is PER TERM, not PER COMPONENT.
The documentation on MSDN actually describes use cases for this, such as calculating two sets at the same time.
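The per-term behaviour described in this answer can be reproduced without DirectXMath in a few lines of plain C++. The sketch below mirrors the scalar fallback quoted above under stated assumptions: `Vec4`, `hermitePerTerm` and `checkPerTerm` are hypothetical names, and a `std::array<float, 4>` stands in for `XMVECTOR`.

```cpp
#include <array>
#include <cassert>
#include <cstddef>

// Stand-in for a 4-component vector (assumption: not the real XMVECTOR type).
using Vec4 = std::array<float, 4>;

// Per-term weighting: t[0] drives the weight of position0, t[1] of tangent0,
// t[2] of position1 and t[3] of tangent1. Each weight is a single scalar that
// is applied to all four components of its vector.
Vec4 hermitePerTerm(const Vec4& position0, const Vec4& tangent0,
                    const Vec4& position1, const Vec4& tangent1,
                    const Vec4& t) {
    Vec4 t2{}, t3{};
    for (std::size_t i = 0; i < 4; ++i) {
        t2[i] = t[i] * t[i];
        t3[i] = t[i] * t2[i];
    }
    const float p0 = 2.0f * t3[0] - 3.0f * t2[0] + 1.0f;
    const float t0 = t3[1] - 2.0f * t2[1] + t[1];
    const float p1 = -2.0f * t3[2] + 3.0f * t2[2];
    const float t1 = t3[3] - t2[3];

    Vec4 result{};
    for (std::size_t i = 0; i < 4; ++i) {
        result[i] = p0 * position0[i] + t0 * tangent0[i]
                  + p1 * position1[i] + t1 * tangent1[i];
    }
    return result;
}

// Usage check with two weight vectors.
void checkPerTerm() {
    const Vec4 v0{1.0f, 2.0f, 3.0f, 4.0f}, vt0{1.0f, 1.0f, 1.0f, 1.0f};
    const Vec4 v1{5.0f, 6.0f, 7.0f, 8.0f}, vt1{1.0f, 1.0f, 1.0f, 1.0f};

    // Equal factors: behaves like the scalar version; at 0.5 with equal
    // tangents the result is the midpoint of v0 and v1.
    const Vec4 same = hermitePerTerm(v0, vt0, v1, vt1, Vec4{0.5f, 0.5f, 0.5f, 0.5f});
    // Mixed factors (0, 0, 1, 0): the term weights become 1, 0, 1, 0,
    // so every component of the result is v0[i] + v1[i].
    const Vec4 mixed = hermitePerTerm(v0, vt0, v1, vt1, Vec4{0.0f, 0.0f, 1.0f, 0.0f});

    for (std::size_t i = 0; i < 4; ++i) {
        assert(same[i] == 0.5f * (v0[i] + v1[i]));
        assert(mixed[i] == v0[i] + v1[i]);
    }
}
```

The second case illustrates why the two readings only agree when all four factors are equal: with mixed factors the four weights no longer come from one point on the Hermite basis curves.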
Proposed as answer by Chuck Walbourn - MSFT (Microsoft employee), Sunday, February 23, 2014 7:07 AM
Marked as answer by Dirk Steenpass, Sunday, February 23, 2014 4:28 PM
Sunday, February 23, 2014 7:07 AM
All replies

Thanks for the help. Point understood.
 Edited by Dirk Steenpass Friday, February 28, 2014 5:01 PM
Sunday, February 23, 2014 4:28 PM