SSE & AVX Vectorization
Marchete
53.6K views
First AVX Code: SQRT calculation
Now that we have reviewed all the requirements, the autovectorization, and AVX intrinsics, we can create our first manually vectorized program. In this exercise, you need to vectorize a sqrt calculation of float numbers. We will explicitly use the __m256 datatype to store our floats, reducing the overhead in data loading.
Vectorized SQRT
1
2
3
4
5
6
7
8
9
10
#pragma GCC optimize("O3","unroll-loops","omit-frame-pointer","inline") //Optimization flags
#pragma GCC option("arch=native","tune=native","no-zeroupper") //Enable AVX
#pragma GCC target("avx") //Enable AVX
#include <x86intrin.h> //SSE Extensions
#include <bits/stdc++.h> //All main STD libraries
const int N = 64000000; //Number of tests
const int V = N/8; //Vectorized size
//linear function,
XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX
You will probably see a 600% performance improvement or more. That is, once you have the data loaded, AVX will perform up to 7 times faster than normal sqrtf. The theoretical limit is 800%, but it's rarely achieved. You can expect between a 300% and 600% average increase.
Create your playground on Tech.io
This playground was created on Tech.io, our hands-on, knowledge-sharing platform for developers.
Join the CodinGame community on Discord to chat about puzzle contributions, challenges, streams, blog articles - all that good stuff!
JOIN US ON DISCORD Online Participants