So, it is perfectly valid to pretend Vulkan is an OpenGL-like immediate API:
for( int i = 0; i < N; ++i ){
    cmdbuff.begin(); cmdUpdateUniform(u[i]); cmdbuff.end();
    vkQueueSubmit( q, cmdbuff ); // lookitme ama glUniform*()
    // some synchronization omitted
    cmdbuff.begin(); vkCmdDraw(obj[i]); cmdbuff.end();
    vkQueueSubmit( q, cmdbuff ); // lookitme ama glDraw*()
    vkQueueWaitIdle( q ); // lookitme ama glFinish()
}
There's a problem with this, though. An OpenGL driver would try to optimize this by making a latency vs. throughput trade-off. But in Vulkan we like to have some amount of control over latency, so a Vulkan driver won't (and shouldn't) optimize it that way.
So we can try to guess what the OpenGL driver would do:
cmdbuff.begin();
for( int i = 0; i < N; ++i ){
    cmdUpdateUniform(u[i]); // probably vkCmdUpdateBuffer
    // some synchronization omitted
    vkCmdDraw(obj[i]);
}
cmdbuff.end();
vkQueueSubmit( q, cmdbuff );
As you can see, the memory use is back (vkCmdUpdateBuffer stores all the uniforms in the command buffer), and the OpenGL driver probably has to do the same if it hopes to be performant (in an attempt to aggregate all draws into one GPU submit).
There is a small problem with this approach too. All the vkCmdDraw calls use the same uniform/buffer memory, so the previous vkCmdDraw needs to finish using that uniform before it is updated. There is a potential benefit in allowing the driver to proceed without having to synchronize each vkCmdDraw with the subsequent uniform update.
That is where the info you read online comes in. One way would be to have an array of uniforms and access the appropriate one using an index.
Another would be to bind different descriptors or pDynamicOffsets via vkCmdBindDescriptorSets.
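To make the pDynamicOffsets variant a bit more concrete, here is a minimal sketch of how the per-object offsets into one shared uniform buffer could be computed. The helper names alignUp and dynamicOffsets are made up for illustration, and the alignment is assumed to be queried from VkPhysicalDeviceLimits::minUniformBufferOffsetAlignment (the example values below are assumptions, not something your device necessarily reports).

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical helper: round `size` up to the next multiple of `alignment`.
// `alignment` must be a power of two; in a real application it would be
// VkPhysicalDeviceLimits::minUniformBufferOffsetAlignment.
inline std::uint32_t alignUp(std::uint32_t size, std::uint32_t alignment) {
    assert(alignment != 0 && (alignment & (alignment - 1)) == 0);
    return (size + alignment - 1) & ~(alignment - 1);
}

// Hypothetical helper: byte offset of each object's uniform block inside
// one buffer that holds the blocks back to back at an aligned stride.
inline std::vector<std::uint32_t> dynamicOffsets(std::uint32_t objectCount,
                                                 std::uint32_t blockSize,
                                                 std::uint32_t alignment) {
    const std::uint32_t stride = alignUp(blockSize, alignment);
    std::vector<std::uint32_t> offsets(objectCount);
    for (std::uint32_t i = 0; i < objectCount; ++i) {
        offsets[i] = i * stride;  // object i reads its uniforms from here
    }
    return offsets;
}
```

While recording, you would then call vkCmdBindDescriptorSets before each vkCmdDraw with dynamicOffsetCount of 1 and pDynamicOffsets pointing at offsets[i], using a descriptor of type VK_DESCRIPTOR_TYPE_UNIFORM_BUFFER_DYNAMIC. Each draw reads its own slice of the buffer, so no draw has to wait for a uniform update.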
A 4x4 single-precision matrix is 64 B. Assuming you have, let's say, 1024 3D objects, that is 64 kB. In this day and age that is insignificant as far as main or GPU memory is concerned, and it will be dwarfed by even a single texture or the other resources you will need.
If you experience significantly higher memory use, the problem is likely elsewhere.
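For a rough feel of the numbers above, the arithmetic can be written out as below. The 256 B stride and the 1024x1024 RGBA8 texture are assumptions picked for comparison only; the function names are hypothetical.

```cpp
#include <cstddef>

static_assert(sizeof(float) == 4, "sketch assumes 32-bit floats");

// One 4x4 single-precision matrix.
constexpr std::size_t kMatrixBytes = 4 * 4 * sizeof(float);

// Total uniform storage for `objectCount` objects at `strideBytes` each.
constexpr std::size_t uniformBytes(std::size_t objectCount,
                                   std::size_t strideBytes) {
    return objectCount * strideBytes;
}

// Size of a single uncompressed texture without mipmaps.
constexpr std::size_t textureBytes(std::size_t width, std::size_t height,
                                   std::size_t bytesPerTexel) {
    return width * height * bytesPerTexel;
}

// Tightly packed: 1024 * 64 B = 65536 B.
constexpr std::size_t kPacked = uniformBytes(1024, kMatrixBytes);
// Padded to an assumed 256 B offset alignment: 262144 B.
constexpr std::size_t kPadded = uniformBytes(1024, 256);
// Assumed 1024x1024 RGBA8 texture: 4194304 B.
constexpr std::size_t kTexture = textureBytes(1024, 1024, 4);
```

Even with every matrix padded to the assumed 256 B stride, the uniforms take 256 KiB, which is still 16 times smaller than that one 4 MiB texture.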