I did some testing on my system. Indeed you can do something like this:
using namespace cl;
Context context({ devices[0], devices[1] });
CommandQueue queue(context); // queue to push commands to; note that no device is specified here, only the context is handed over
Program::Sources source;
string kernel_code = get_opencl_code();
source.push_back({ kernel_code.c_str(), kernel_code.length() });
Program program(context, source);
program.build("-cl-fast-relaxed-math -w");
I found that if the two devices are from different platforms (e.g. one Nvidia GPU and one Intel GPU), either clCreateContext throws a read access violation at runtime or program.build fails at runtime. If the two devices are from the same platform, the code compiles and runs, but it does not run on both devices. I tested with an Intel i7-8700K CPU and its integrated Intel UHD 630 GPU, and regardless of the order of the devices in the vector the context is created from, the code was always executed on the CPU. I verified this with the Windows Task Manager and with kernel execution time measurements (execution times are characteristic of each device).
You could also monitor device usage with a tool like Task Manager to see which device is actually doing the work. Let me know if you observe different behavior on your system.
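Besides external monitoring tools, you can also ask the queue itself which device it was bound to. A minimal sketch, assuming the OpenCL C++ bindings (`<CL/cl2.hpp>`) and an already created cl::CommandQueue:

```cpp
#include <CL/cl2.hpp>
#include <iostream>

// Print the name of the single device this queue is bound to.
// CL_QUEUE_DEVICE returns exactly one device, even if the queue's
// context was created from a vector of several devices.
void print_queue_device(const cl::CommandQueue& queue) {
    cl::Device device = queue.getInfo<CL_QUEUE_DEVICE>();
    std::cout << "Queue runs on: "
              << device.getInfo<CL_DEVICE_NAME>() << std::endl;
}
```

In the i7-8700K + UHD 630 case above, this would reveal which of the two devices the implicitly created queue actually picked.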
Generally, parallelization across multiple devices is not done by handing the context a vector of devices. Instead, you give each device its own dedicated context and queue and explicitly control which kernels are executed on which queue. This gives you full control over memory transfers and execution order / synchronization points.
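The per-device approach can be sketched as follows; this is a minimal outline assuming the OpenCL C++ bindings (`<CL/cl2.hpp>`), with error handling and the actual kernel source omitted:

```cpp
#include <CL/cl2.hpp>
#include <string>
#include <vector>

int main() {
    // Collect all devices from all platforms.
    std::vector<cl::Platform> platforms;
    cl::Platform::get(&platforms);
    std::vector<cl::Device> devices;
    for (auto& platform : platforms) {
        std::vector<cl::Device> platform_devices;
        platform.getDevices(CL_DEVICE_TYPE_ALL, &platform_devices);
        devices.insert(devices.end(),
                       platform_devices.begin(), platform_devices.end());
    }

    std::string kernel_code = "/* your kernel source here */";

    // One dedicated context, queue, and program per device.
    std::vector<cl::Context> contexts;
    std::vector<cl::CommandQueue> queues;
    std::vector<cl::Program> programs;
    for (auto& device : devices) {
        contexts.emplace_back(device);
        queues.emplace_back(contexts.back(), device); // queue bound to this device
        cl::Program program(contexts.back(), kernel_code);
        program.build("-cl-fast-relaxed-math -w");
        programs.push_back(program);
    }

    // Enqueue kernels on queues[i] to run them on devices[i].
    // Buffers must be created in the matching contexts[i], and any data
    // shared between devices has to be transferred explicitly via the host.
    return 0;
}
```

Because each queue is tied to exactly one device, there is no ambiguity about where a kernel runs, and cross-platform device combinations (e.g. Nvidia + Intel) work too, since each context only ever contains devices from a single platform.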