These are essentially two questions: how to avoid sending data when only part of a transformation sequence changes, and how to efficiently supply per-model data which may or may not have changed since the last frame.
Transformation Sequence
For the first, you have a transformation sequence. Your positions are in model space. You then conceptually transform them into world space, then to camera/view space, then finally to clip space, where you write the position to gl_Position
.
Most of these transformations are constant throughout a frame, but may change on a frame-to-frame basis. So you want to avoid changing data that doesn't strictly need to be changed.
If you want to do this, then clearly you cannot provide an "MVP" matrix. That is, you should not have a single matrix that contains the whole transformation. You should instead have a matrix that represents particular parts of the transformation.
However, you will need to do this decomposition for reasons other than performance. You cannot do many lighting operations in clip-space; as a non-linear space, it messes up lots of lighting operations. Therefore, if you're going to do lighting at all, you need a transformation that stops before clip space.
Camera/view space is the most common stopping point for lighting computations.
Now, if you use model-to-camera and camera-to-clip, then the model-to-camera matrix for every model will change when the camera changes, even if the model itself has not moved. And therefore, you may need to upload a bunch of matrices that don't strictly need to be changed.
To avoid that, you would need to use model-to-world and world-to-clip (in this case, you do your lighting in world space). The issue here is that you are exposed to the perils of world space Numerical precision may become problematic.
But is there a genuine performance issue here? Obviously it somewhat depends on the hardware. However, consider that many applications have hundreds if not thousands of objects, each with matrices that change every frame. An animated character usually has over a hundred matrices just for themselves that change every frame.
So it seems unlikely that the performance cost of uploading a few matrices that could have been constant is a real-world problem.
Per-Object Storage
What you really want is to separate your storage of per-object data from the program object itself. This can be done with UBOs or SSBOs; in both cases, you're storing uniform data in buffer objects.
The former are typically smaller in size (64KB or so), while the latter are essentially unbounded in their storage (16MB minimum). Obviously, the former are typically faster to access, but SSBOs shouldn't be considered to be slow.
Each object would have a section of the buffer that gets used for per-object data. And thus, you could choose to change it or not as you see fit.
Even so, such a system does not guarantee faster performance. For example, if the implementation is still reading from that buffer from last frame when you try to change it this frame, the implementation will have to either allocate new memory or just wait until the GPU is finished. This is not a hypothetical possibility; GPU rendering for complex scenes frequently lags a frame behind the CPU.
So to avoid that, you would need to double-buffer your per-object data. But when you do that, you will have to always upload their data, even if it doesn't change. Why? Because it might have changed two frames ago, and your double buffer has old data in it.
Basically, your goal of trying to avoid uploading of sometimes-static per-model data is just as likely to harm performance as to help it.