I need to measure the GPU run time of my code, and also the total run time (host and device together). My code runs two GPU kernels, with a host for loop in between that copies data. The outline below shows what my code looks like:
cuda event start
// FIRST kernel call <<<...>>>
// cuda memory copy result back from device to host
cudaDeviceSynchronize()
// copy host data to host array (CPU function loop)
// cuda memory copy from host to device
// SECOND kernel call <<<...>>>
cuda event stop
//memory copy back from device to host
As far as I know, CUDA events are the way to time a kernel, because they measure the time actually taken on the GPU. My questions and goals are:
1- With the events placed as shown above, will they record only the kernels and ignore the host functions?
2- Will the host loop affect the CUDA event timing?
3- My goal is to measure the GPU time alone, and also the GPU+CPU time together. Will the layout above achieve that, or should I use clock_gettime(CLOCK_REALTIME, timer) to time the host part?