Introduction to SYCL


SYCL Kernel Programming - The single_task API

Declare the vector addition kernel


The SYCL specification does not allow an unnamed type as the template argument that names a kernel, so the kernel name must be a named type. At a minimum, you need to declare the kernel class before using it; a declaration is enough, and the class never needs a definition.

The following examples are valid SYCL kernel names:

// class
class MyKernelClass;
// struct
struct MyKernelStruct;
// enum class
enum class MyKernelEnum : int;


Forward-declare the vector_addition class before the int main entry point of the application. The class serves only as the kernel's name, so it needs no definition.

Location in the source code:

// <<Declare vector_addition kernel function object>>

class vector_addition;

Implement the vector addition kernel


Now we need to implement the vector addition kernel and enqueue it for execution.

A kernel that runs as a single work-item, which corresponds to an ND-range of (1,1,1), can be enqueued with the single_task member function of the command group handler (cl::sycl::handler):

cgh.single_task<class kernel_lambda_name>([=]() { /* kernel body */ });

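Launching a kernel with single_task is equivalent to executing it on a single compute unit with one work-group that contains only one work-item. The kernel receives no work-item index, so accessor elements are addressed with explicit indices. Each buffer in this example holds a single element, so the kernel accesses index 0: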

// e.g., assign an accessor element in the kernel body
acc[0] = someValue;


Complete the vector addition kernel so that it uses the accessors for a, b, and c to compute c = a + b.

Location in the source code:

// <<Complete the vector addition computation>>

c_acc[0] = a_acc[0] + b_acc[0];

#include <iostream>
// The SYCL header
#include <SYCL/sycl.hpp>
namespace sycl = cl::sycl;
// <<Declare vector_addition kernel function object>>
class vector_addition;
int main(int argc, char **) {
  // <<Setup host memory>>
  // define input vectors
  sycl::float4 a = {1.0, 1.0, 1.0, 1.0}; // input 1
  sycl::float4 b = {1.0, 1.0, 1.0, 1.0}; // input 2
  // define output vector
  sycl::float4 c = {0.0, 0.0, 0.0, 0.0}; // output
  // <<Setup SYCL queue>>
  sycl::queue myQueue(sycl::default_selector{});
  { // Begin SYCL scope
    // <<Setup device memory>>
    // define input buffers
    sycl::buffer<sycl::float4, 1> buf_a(&a, sycl::range<1>{1});
    sycl::buffer<sycl::float4, 1> buf_b(&b, sycl::range<1>{1});
    // define output buffer
    sycl::buffer<sycl::float4, 1> buf_c(&c, sycl::range<1>{1});
    // Submit a command group functor for execution on a queue. This functor
    // encapsulates the kernel and the data needed for its execution.
    myQueue.submit([&](sycl::handler &cgh) {
      // <<Request device memory access>>
      // read accessors
      auto a_acc = buf_a.get_access<sycl::access::mode::read>(cgh);
      auto b_acc = buf_b.get_access<sycl::access::mode::read>(cgh);
      // write accessor
      auto c_acc = buf_c.get_access<sycl::access::mode::write>(cgh);
      // Enqueue the kernel for execution using the `single_task` API
      cgh.single_task<class vector_addition>([=]() {
        // <<Complete the vector addition computation>>
        // calculate: c = a+b;
        c_acc[0] = a_acc[0] + b_acc[0];
      });
    });
  } // End SYCL scope: destroying the buffers waits for the kernel and
    // copies the result back into c
  // Print the result (output format chosen for this listing)
  std::cout << "c = (" << static_cast<float>(c.x()) << ", "
            << static_cast<float>(c.y()) << ", "
            << static_cast<float>(c.z()) << ", "
            << static_cast<float>(c.w()) << ")" << std::endl;
  return 0;
}