I'm afraid I have to say that the Vulkan Tutorial is wrong. In its current state, it cannot guarantee that there are no memory hazards when only a single depth buffer is used. However, only a very small change would be required to make a single depth buffer sufficient.
Let's analyze the relevant steps of the code that are performed within drawFrame.
We have two different queues, presentQueue and graphicsQueue, and MAX_FRAMES_IN_FLIGHT concurrent frames. I refer to the "in flight index" as cf (short for currentFrame, which advances as currentFrame = (currentFrame + 1) % MAX_FRAMES_IN_FLIGHT). I use sem1 and sem2 for the two arrays of semaphores and fence for the array of fences.
The relevant steps in pseudocode are the following:
```
vkWaitForFences(..., fence[cf], ...);
vkAcquireNextImageKHR(..., /* signal when done: */ sem1[cf], ...);
vkResetFences(..., fence[cf]);
vkQueueSubmit(graphicsQueue, ...
    /* wait for: */ sem1[cf], /* wait stage: */ COLOR_ATTACHMENT_OUTPUT, ...
        vkCmdBeginRenderPass(cb[cf], ...);
            Subpass Dependency between EXTERNAL -> 0:
                srcStages = COLOR_ATTACHMENT_OUTPUT,
                srcAccess = 0,
                dstStages = COLOR_ATTACHMENT_OUTPUT,
                dstAccess = COLOR_ATTACHMENT_WRITE
            ...
            vkCmdDrawIndexed(cb[cf], ...);
            (Implicit!) Subpass Dependency between 0 -> EXTERNAL:
                srcStages = ALL_COMMANDS,
                srcAccess = COLOR_ATTACHMENT_WRITE|DEPTH_STENCIL_ATTACHMENT_WRITE,
                dstStages = BOTTOM_OF_PIPE,
                dstAccess = 0
        vkCmdEndRenderPass(cb[cf]);
    /* signal when done: */ sem2[cf], ...
    /* signal when done: */ fence[cf]
);
vkQueuePresentKHR(presentQueue, ... /* wait for: */ sem2[cf], ...);
```
All draw calls are performed on a single queue, graphicsQueue, so we must check whether commands on that queue could theoretically overlap.
Let us consider the events that are happening on the graphicsQueue in chronological order for the first two frames:
```
img[0] -> sem1[0] signal -> t|...|ef|fs|lf|co|b -> sem2[0] signal, fence[0] signal
img[1] -> sem1[1] signal -> t|...|ef|fs|lf|co|b -> sem2[1] signal, fence[1] signal
```
where t|...|ef|fs|lf|co|b stands for the pipeline stages that a draw call passes through:
t ... TOP_OF_PIPE
ef ... EARLY_FRAGMENT_TESTS
fs ... FRAGMENT_SHADER
lf ... LATE_FRAGMENT_TESTS
co ... COLOR_ATTACHMENT_OUTPUT
b ... BOTTOM_OF_PIPE
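The stage-ordering argument can be made concrete with a small toy model in C. This is a sketch under simplifying assumptions, not the Vulkan API: the stages above are hypothetical bit flags in their logical order, and, following the rule in the Vulkan specification, a source stage mask implicitly includes all logically earlier stages while a destination stage mask implicitly includes all logically later ones. `orders` is a hypothetical helper that reports whether a dependency orders a stage of earlier commands before a stage of later commands.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Toy model, NOT the Vulkan API: the legend's stages as bits in logical order. */
enum {
    T  = 1u << 0, /* TOP_OF_PIPE */
    EF = 1u << 1, /* EARLY_FRAGMENT_TESTS */
    FS = 1u << 2, /* FRAGMENT_SHADER */
    LF = 1u << 3, /* LATE_FRAGMENT_TESTS */
    CO = 1u << 4, /* COLOR_ATTACHMENT_OUTPUT */
    B  = 1u << 5  /* BOTTOM_OF_PIPE */
};
#define NUM_STAGES 6

/* A source stage mask implicitly includes all logically earlier stages. */
static uint32_t expand_src(uint32_t mask) {
    for (int i = NUM_STAGES - 1; i >= 0; --i)
        if (mask & (1u << i))
            return (1u << (i + 1)) - 1u; /* stage i and every stage before it */
    return 0;
}

/* A destination stage mask implicitly includes all logically later stages. */
static uint32_t expand_dst(uint32_t mask) {
    for (int i = 0; i < NUM_STAGES; ++i)
        if (mask & (1u << i))
            return ((1u << NUM_STAGES) - 1u) & ~((1u << i) - 1u); /* stage i and later */
    return 0;
}

/* Is stage `before` of earlier commands ordered ahead of stage `after` of
 * later commands by an execution dependency src_stages -> dst_stages? */
static bool orders(uint32_t src_stages, uint32_t dst_stages,
                   uint32_t before, uint32_t after) {
    assert(before != 0 && after != 0);
    return (expand_src(src_stages) & before) != 0 &&
           (expand_dst(dst_stages) & after) != 0;
}
```

For the tutorial's EXTERNAL -> 0 dependency (srcStages = dstStages = COLOR_ATTACHMENT_OUTPUT), `orders(CO, CO, LF, EF)` is false: the previous frame's LATE_FRAGMENT_TESTS fall inside the first scope, but the next frame's EARLY_FRAGMENT_TESTS fall outside the second scope, so nothing orders them.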
There might be an implicit dependency between sem2[i] signal -> present and sem1[i+1], but only if the swap chain provides a single image (or always returns the same image); in the general case, this cannot be assumed. That means nothing delays the subsequent frame from progressing immediately after the first frame has been handed over to presentation. The fences do not help either: after fence[i] is signaled, the next frame waits on fence[i+1], which is not signaled by frame i, so the fences do not prevent subsequent frames from progressing in the general case.
In other words, as far as I can tell, the second frame can start rendering while the first frame is still executing, and nothing prevents the two frames from accessing the depth buffer at the same time.
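The fence argument can be checked with a few lines of C. This sketch assumes MAX_FRAMES_IN_FLIGHT = 2 and that the fences are created in the signaled state; `frame_waited_on` is a hypothetical helper that returns which earlier frame's fence signal frame n waits for in vkWaitForFences.

```c
#include <assert.h>

/* Assumption: two frames in flight. */
#define MAX_FRAMES_IN_FLIGHT 2

/* Hypothetical helper: frame n uses cf = n % MAX_FRAMES_IN_FLIGHT and starts
 * with vkWaitForFences(fence[cf]). Returns the number of the frame whose
 * submission signals that fence, or -1 if fence[cf] has not been submitted
 * yet (assumed to be created signaled, so the wait returns immediately). */
static int frame_waited_on(int n) {
    assert(n >= 0);
    int previous = n - MAX_FRAMES_IN_FLIGHT; /* last frame that used the same cf */
    return previous >= 0 ? previous : -1;
}
```

Frame 1 waits on nothing, and in general frame n waits on frame n - 2, never on frame n - 1, so the host may submit frame n while frame n - 1 is still executing.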
The Fix:
If we want to use only a single depth buffer, we can fix the tutorial's code. The goal is that the ef and lf stages of a frame wait until the previous frame's draw calls have finished their depth accesses. I.e., we want to create the following scenario:
```
img[0] -> sem1[0] signal -> t|...|ef|fs|lf|co|b -> sem2[0] signal, fence[0] signal
img[1] -> sem1[1] signal -> t|...|________|ef|fs|lf|co|b -> sem2[1] signal, fence[1] signal
```
where _ indicates a wait operation.
To achieve this, we have to add a barrier that prevents subsequent frames from performing the EARLY_FRAGMENT_TESTS and LATE_FRAGMENT_TESTS stages at the same time. Since the draw calls are performed on only one queue, only the commands in graphicsQueue need such a barrier. The "barrier" can be established with the EXTERNAL -> 0 subpass dependency:
```
vkWaitForFences(..., fence[cf], ...);
vkAcquireNextImageKHR(..., /* signal when done: */ sem1[cf], ...);
vkResetFences(..., fence[cf]);
vkQueueSubmit(graphicsQueue, ...
    /* wait for: */ sem1[cf], /* wait stage: */ EARLY_FRAGMENT_TESTS, ...
        vkCmdBeginRenderPass(cb[cf], ...);
            Subpass Dependency between EXTERNAL -> 0:
                srcStages = EARLY_FRAGMENT_TESTS|LATE_FRAGMENT_TESTS,
                srcAccess = DEPTH_STENCIL_ATTACHMENT_WRITE,
                dstStages = EARLY_FRAGMENT_TESTS|LATE_FRAGMENT_TESTS,
                dstAccess = DEPTH_STENCIL_ATTACHMENT_WRITE|DEPTH_STENCIL_ATTACHMENT_READ
            ...
            vkCmdDrawIndexed(cb[cf], ...);
            (Implicit!) Subpass Dependency between 0 -> EXTERNAL:
                srcStages = ALL_COMMANDS,
                srcAccess = COLOR_ATTACHMENT_WRITE|DEPTH_STENCIL_ATTACHMENT_WRITE,
                dstStages = BOTTOM_OF_PIPE,
                dstAccess = 0
        vkCmdEndRenderPass(cb[cf]);
    /* signal when done: */ sem2[cf], ...
    /* signal when done: */ fence[cf]
);
vkQueuePresentKHR(presentQueue, ... /* wait for: */ sem2[cf], ...);
```
This should establish a proper barrier on graphicsQueue between the draw calls of different frames. Because it is an EXTERNAL -> 0 subpass dependency, its first synchronization scope covers the commands submitted earlier on the queue, so the render pass is synchronized with the previous frame's commands.
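The two conditions the fixed dependency has to meet can be checked with a toy model in C (hypothetical stage and access bits in logical pipeline order, not Vulkan's actual enums): every depth stage (EF and LF) of the previous frame must be ordered before every depth stage of the next frame, and the previous frame's depth writes must be made available and visible to the next frame's depth reads and writes. `depth_hazard_free` is a hypothetical helper for this sketch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Toy model, NOT the Vulkan API: stages as bits in logical pipeline order. */
enum { T = 1u << 0, EF = 1u << 1, FS = 1u << 2, LF = 1u << 3, CO = 1u << 4, B = 1u << 5 };
#define NUM_STAGES 6

/* Toy access bits (hypothetical names). */
enum { DS_READ = 1u << 0, DS_WRITE = 1u << 1, COLOR_WRITE = 1u << 2 };

/* A source stage mask implicitly includes all logically earlier stages. */
static uint32_t expand_src(uint32_t mask) {
    for (int i = NUM_STAGES - 1; i >= 0; --i)
        if (mask & (1u << i))
            return (1u << (i + 1)) - 1u;
    return 0;
}

/* A destination stage mask implicitly includes all logically later stages. */
static uint32_t expand_dst(uint32_t mask) {
    for (int i = 0; i < NUM_STAGES; ++i)
        if (mask & (1u << i))
            return ((1u << NUM_STAGES) - 1u) & ~((1u << i) - 1u);
    return 0;
}

/* Depth is accessed in EF and LF in both frames. The dependency is hazard-free
 * for depth if all depth stages of the previous frame are in the first scope,
 * all depth stages of the next frame are in the second scope, the previous
 * writes are made available (srcAccess), and they are made visible to the
 * next frame's depth reads and writes (dstAccess). */
static bool depth_hazard_free(uint32_t src_stages, uint32_t src_access,
                              uint32_t dst_stages, uint32_t dst_access) {
    const uint32_t depth_stages = EF | LF;
    bool execution = (expand_src(src_stages) & depth_stages) == depth_stages &&
                     (expand_dst(dst_stages) & depth_stages) == depth_stages;
    bool memory = (src_access & DS_WRITE) != 0 &&
                  (dst_access & (DS_READ | DS_WRITE)) == (DS_READ | DS_WRITE);
    return execution && memory;
}
```

`depth_hazard_free(CO, 0, CO, COLOR_WRITE)`, the tutorial's original dependency, returns false, while `depth_hazard_free(EF | LF, DS_WRITE, EF | LF, DS_READ | DS_WRITE)`, the dependency proposed above, returns true.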
Update: The wait stage for sem1[cf] also has to be changed from COLOR_ATTACHMENT_OUTPUT to EARLY_FRAGMENT_TESTS. This is because the layout transitions at vkCmdBeginRenderPass time happen after the first synchronization scope of the EXTERNAL -> 0 dependency (srcStages and srcAccess) and before its second synchronization scope (dstStages and dstAccess). The semaphore wait therefore has to chain into srcStages, i.e. the swapchain image must already be available at a stage in srcStages, so that the layout transition happens at the right point in time.
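The reason for the wait-stage change can be expressed in a toy model in C (hypothetical stage bits, not the Vulkan API). Following the specification's rule for execution dependency chains, the semaphore wait and the EXTERNAL -> 0 dependency only form a chain if the wait's second scope (the wait stage and all later stages) overlaps the dependency's first scope (srcStages and all earlier stages). `chains` is a hypothetical helper for this sketch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Toy model, NOT the Vulkan API: stages as bits in logical pipeline order. */
enum { T = 1u << 0, EF = 1u << 1, FS = 1u << 2, LF = 1u << 3, CO = 1u << 4, B = 1u << 5 };
#define NUM_STAGES 6

/* A source stage mask implicitly includes all logically earlier stages. */
static uint32_t expand_src(uint32_t mask) {
    for (int i = NUM_STAGES - 1; i >= 0; --i)
        if (mask & (1u << i))
            return (1u << (i + 1)) - 1u;
    return 0;
}

/* A destination (or wait) stage mask implicitly includes all later stages. */
static uint32_t expand_dst(uint32_t mask) {
    for (int i = 0; i < NUM_STAGES; ++i)
        if (mask & (1u << i))
            return ((1u << NUM_STAGES) - 1u) & ~((1u << i) - 1u);
    return 0;
}

/* The semaphore wait chains into the subpass dependency only if the wait's
 * second scope overlaps the dependency's first scope; only then is the layout
 * transition ordered after the swapchain image has been acquired. */
static bool chains(uint32_t wait_stage, uint32_t dep_src_stages) {
    return (expand_dst(wait_stage) & expand_src(dep_src_stages)) != 0;
}
```

`chains(CO, EF | LF)` is false, so with the old wait stage the layout transition would not be ordered after the acquire; `chains(EF, EF | LF)` is true.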