
SSE & AVX Vectorization

Marchete

First AVX Code: SQRT calculation

Now that we have reviewed the requirements, autovectorization, and the AVX intrinsics, we can write our first manually vectorized program. In this exercise, you need to vectorize a square-root calculation over float values. We will store the floats directly in the __m256 datatype, which holds eight floats per variable and avoids the overhead of loading data from plain float arrays into vectors.
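To make the idea of storing data directly in __m256 concrete, here is a minimal sketch. It is not the playground's own code: the helper names `fill_sequence` and `unpack` and the `AVX_FN` macro are hypothetical, introduced only for illustration. It assumes GCC or Clang on an x86 CPU with AVX support.

```cpp
#include <immintrin.h> // AVX intrinsics and the __m256 type
#include <cassert>
#include <cstddef>

// Hypothetical macro: enables AVX for a single function (GCC/Clang),
// as an alternative to a file-wide #pragma GCC target("avx").
#define AVX_FN __attribute__((target("avx")))

// Hypothetical helper: lane j of out[i] receives the value i*8 + j,
// so the array of vectors holds the sequence 0, 1, 2, ...
AVX_FN void fill_sequence(__m256* out, std::size_t vectors) {
    assert(out != nullptr);
    for (std::size_t i = 0; i < vectors; ++i) {
        float base = static_cast<float>(i * 8);
        // _mm256_setr_ps takes its arguments in memory order (lane 0 first).
        out[i] = _mm256_setr_ps(base, base + 1, base + 2, base + 3,
                                base + 4, base + 5, base + 6, base + 7);
    }
}

// Hypothetical helper: copies the eight lanes of one vector into a float array.
AVX_FN void unpack(const __m256* v, float* lanes) {
    _mm256_storeu_ps(lanes, *v); // unaligned store, valid for any float pointer
}
```

Because the data already lives in __m256 variables, later computations can operate on whole vectors without a separate load step; `unpack` is only needed when individual floats have to be read back.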

Vectorized SQRT
#pragma GCC optimize("O3","unroll-loops","omit-frame-pointer","inline") //Optimization flags
#pragma GCC option("arch=native","tune=native","no-zeroupper") //Architecture tuning hints (GCC may ignore this pragma)
#pragma GCC target("avx") //Enable AVX
#include <x86intrin.h> //SSE/AVX intrinsics
#include <bits/stdc++.h> //All main STD libraries
const int N = 64000000; //Number of tests
const int V = N/8; //Vectorized size
//linear function,
XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX

You will probably see a performance improvement of 600% or more. That is, once the data is loaded, AVX can run about 7 times as fast as scalar sqrtf. The theoretical limit is 8 times as fast, because each __m256 processes eight floats at once, but it is rarely achieved. On average, you can expect an improvement of between 300% and 600%.
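The comparison described above can be sketched as follows. This is an illustrative example under stated assumptions, not the hidden code of the exercise: the names `sqrt_scalar`, `sqrt_avx`, `time_ms`, and `AVX_FN` are hypothetical, the input length is assumed to be a multiple of 8, and unaligned loads and stores are used to keep the sketch short.

```cpp
#include <immintrin.h> // _mm256_loadu_ps, _mm256_sqrt_ps, _mm256_storeu_ps
#include <cassert>
#include <chrono>
#include <cmath>
#include <cstddef>

// Hypothetical macro: enables AVX for a single function (GCC/Clang).
#define AVX_FN __attribute__((target("avx")))

// Scalar baseline: one square root per loop iteration.
void sqrt_scalar(const float* in, float* out, std::size_t n) {
    for (std::size_t i = 0; i < n; ++i) out[i] = std::sqrt(in[i]);
}

// AVX version: each _mm256_sqrt_ps call computes eight square roots.
// Assumption: n is a multiple of 8 (no scalar tail loop in this sketch).
AVX_FN void sqrt_avx(const float* in, float* out, std::size_t n) {
    assert(n % 8 == 0);
    for (std::size_t i = 0; i < n; i += 8) {
        __m256 v = _mm256_loadu_ps(in + i);           // load 8 floats
        _mm256_storeu_ps(out + i, _mm256_sqrt_ps(v)); // store 8 results
    }
}

// Hypothetical timing helper: wall-clock milliseconds for one call of f.
template <class F>
double time_ms(F&& f) {
    auto t0 = std::chrono::steady_clock::now();
    f();
    auto t1 = std::chrono::steady_clock::now();
    return std::chrono::duration<double, std::milli>(t1 - t0).count();
}
```

Wrapping each function in `time_ms`, for example `time_ms([&]{ sqrt_avx(in, out, n); })`, and dividing the scalar time by the AVX time gives the speedup factor. The measured value depends on the CPU, the compiler flags, and whether the data fits in cache, so it may land well below the 8x ceiling.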
