Fast 6x12 Connected Components using bit-optimized BFS
emh
1,115 views
Fast 6x12 Connected Components using bit-optimized BFS
This is pretty standard BFS, just bit-optimized and with a lookup table for neighbours. Performance varies a lot depending on how densely populated the bitboards are, and that in turn depends a lot on the random seed they are initialized with. But you can expect 2-5 millions of iterations per 100ms, suitable for Smash the Code. Performance seems to be about same with and without global target set to sse+avx, but I do have an avx2 target locally around popcount code which improves performance by about 50%. For historical reasons I had some code that didn't optimize well with AVX on older gcc than 12, that's why I chose to enable it only for critical sections by default. Global default is without avx target. Enough talk, here's to the code:
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
#pragma GCC optimize("Ofast","unroll-loops","omit-frame-pointer","inline")
#pragma GCC option("arch=native","tune=native","no-zeroupper") //Enable AVX
// #pragma GCC target("avx2")
// #pragma GCC target("popcnt") //Enable popcount
// #pragma GCC target("sse,sse2,sse3,ssse3,sse4,popcnt,abm,mmx,avx,avx2") //Enable AVX
#include <x86intrin.h> //SSE Extensions
#include <bits/stdc++.h> //All main STD libraries
#include <cstdlib>
#include <fstream>
#include <iostream>
#include <sstream>
#include <chrono>
#include <bitset>
#define USE_AVX 1
typedef __uint128_t BOARD_T;
using namespace std;
using time_interval_t = std::chrono::microseconds;
using myClock = std::chrono::high_resolution_clock;
const unsigned int rowCount = 12;
const unsigned int colCount = 8;
const unsigned int realColCount = colCount;
const unsigned int twoRowsCellCount = 2 * colCount;
const unsigned int colorCount = 6;
const unsigned int cellCount = rowCount * colCount;
const unsigned int backgroundCount = 1;
const unsigned int maxComponents = backgroundCount + cellCount / 2;
const unsigned int maxComponentsPerRow = 3;
const unsigned int lookupTableSizeFor1Row = 1 << (1 * realColCount);
const unsigned int lookupTableSizeFor2Rows = 1 << (2 * realColCount);
union converter8
{
uint8_t num;
struct
{
uint8_t num8[1];
} bytes;
};
inline converter8 initializeConverter(uint8_t num) {
converter8 converter;
converter.num = num;
return converter;
}
union converter16
Create your playground on Tech.io
This playground was created on Tech.io, our hands-on, knowledge-sharing platform for developers.