C Downgrading code from SSE4.1 to SSE2

  • Question

  • Hello, I have code written with SSE4.1 intrinsics, but I need to downgrade it to SSE2 because my 10-year-old CPU does not support SSE4.1.

    P.S. The CPU is an AMD Phenom II X4 965.

    So let's begin:


    #include "./functions.h"
    #include <assert.h>
    #include <stdint.h>
    #include <string.h>
    //#ifdef __SSE4_1__
    #include <smmintrin.h>
    //#ifdef __SSE4_1__
    void write_memory_sse(void* array, size_t size) {
      __m128i* varray = (__m128i*) array;
      __m128i vals = _mm_set1_epi32(1);
      size_t i;
      for (i = 0; i < size / sizeof(__m128i); i++) {
        _mm_store_si128(&varray[i], vals);
        vals = _mm_add_epi16(vals, vals);
      }
    }

    void read_memory_sse(void* array, size_t size) {
      __m128i* varray = (__m128i*) array;
      __m128i accum = _mm_set1_epi32(0xDEADBEEF);
      size_t i;
      for (i = 0; i < size / sizeof(__m128i); i++) {
        accum = _mm_add_epi16(varray[i], accum);
      }
      // This is unlikely, and we want to make sure the reads are not optimized
      // away.
      assert(!_mm_testz_si128(accum, accum));
    }

    The main reason I want to downgrade from SSE4.1 to SSE2 is so that my old CPU can run the code. Right now it throws a runtime exception, STATUS_ILLEGAL_INSTRUCTION, and I have no idea how to work with CPU instructions.

    I also highly suspect that my old CPU does not support the size_t and __m128i types. The original source is: source

    • Edited by speed258, June 14, 2019, 8:52
    June 11, 2019, 13:29

All replies

  • Well, I can guarantee that your CPU does support size_t. It is a typedef of unsigned int for 32-bit applications and unsigned long long for 64-bit applications, and values of that type go into the regular CPU registers.

    __m128i is the integer view of the XMM registers that SSE instructions use. It was introduced along with SSE2, so any processor that supports SSE2 also supports __m128i.

    Interestingly, all of the intrinsics except _mm_testz_si128 are SSE2, so that one call is the only part that needs SSE4.1. The entire function is also optimised away in release builds anyway, because the compiler notices that the function does essentially nothing.
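
    Since _mm_testz_si128 is the only SSE4.1 call, one possible way to get down to SSE2 is to replace just that test. The following is a minimal sketch, not the original author's code: the names is_all_zero_sse2 and read_memory_sse2 are made up for illustration, and it assumes the buffer is 16-byte aligned, as the original code already does. It compares every byte against zero and checks that all 16 comparisons matched, which should give the same result as _mm_testz_si128(v, v).

```c
#include <assert.h>
#include <emmintrin.h>  /* SSE2 intrinsics only; <smmintrin.h> is not needed */
#include <stddef.h>

/* Hypothetical helper: returns 1 if all 128 bits of v are zero, else 0.
   Intended as an SSE2 stand-in for the SSE4.1 call _mm_testz_si128(v, v). */
static int is_all_zero_sse2(__m128i v) {
  /* Each byte equal to zero becomes 0xFF, every other byte becomes 0x00. */
  __m128i eq = _mm_cmpeq_epi8(v, _mm_setzero_si128());
  /* movemask gathers the top bit of each of the 16 bytes into bits 0..15. */
  return _mm_movemask_epi8(eq) == 0xFFFF;
}

/* SSE2-only variant of read_memory_sse (hypothetical name).
   Assumes array is 16-byte aligned. */
void read_memory_sse2(void* array, size_t size) {
  __m128i* varray = (__m128i*) array;
  __m128i accum = _mm_set1_epi32((int) 0xDEADBEEF);
  size_t i;
  for (i = 0; i < size / sizeof(__m128i); i++) {
    accum = _mm_add_epi16(_mm_load_si128(&varray[i]), accum);
  }
  /* Same purpose as the original assert: keep the reads from being
     optimized away in builds where assert is active. */
  assert(!is_all_zero_sse2(accum));
}
```

    write_memory_sse already uses only SSE2 intrinsics, so it should not need any change beyond including <emmintrin.h> instead of <smmintrin.h>. It is also worth checking that the compiler is not being told to target SSE4.1 for the whole file, since that could let it emit newer instructions on its own.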

    This is a signature. Any samples given are not meant to have error checking or show best practices. They are meant to just illustrate a point. I may also give inefficient code or introduce some problems to discourage copy/paste coding. This is because the major point of my posts is to aid in the learning process.

    June 24, 2019, 3:07