If the geometry indeed changes every frame, you should do it on the GPU.
Keep in mind that any solution that doesn't rely on immediate mode will be faster than what you have right now, so you might not even have to do it on the GPU.
But maybe you want to use shadow mapping instead, which is more efficient in some cases. It will also make it possible to render shadows for alpha tested objects like grass.
But it seems like you really need the resulting shadow geometry, so I'm not sure if that's an option for you.
Now back to the shadow volumes.
Extracting the shadow silhouette from a mesh using geometry shaders is a pretty complex process. But there's enough information about it on the internet.
Here's an article by Nvidia, which explains the process in detail:
Efficient and Robust Shadow Volumes Using Hierarchical Occlusion Culling and Geometry Shaders.
Here's another approach (from 2003) which doesn't even require geometry shaders, which could be interesting on low-end hardware:
http://de.slideshare.net/stefan_b/shadow-volumes-on-programmable-graphics-hardware
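To make the idea concrete, here is a minimal CPU-side sketch of the core silhouette test, written in Python for readability rather than as shader code. It assumes a closed mesh with consistent counter-clockwise winding and a point light. The function names (`faces_light`, `silhouette_edges`) are made up for this illustration and do not come from either of the linked documents. An edge is on the silhouette when exactly one of its two adjacent triangles faces the light.

```python
from collections import defaultdict

def sub(a, b):
    return (a[0] - b[0], a[1] - b[1], a[2] - b[2])

def cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])

def dot(a, b):
    return a[0] * b[0] + a[1] * b[1] + a[2] * b[2]

def faces_light(vertices, tri, light_pos):
    """True if the triangle's front side (CCW winding) is turned toward the light."""
    a, b, c = (vertices[i] for i in tri)
    normal = cross(sub(b, a), sub(c, a))
    return dot(normal, sub(light_pos, a)) > 0.0

def silhouette_edges(vertices, triangles, light_pos):
    """Return the edges shared by one light-facing and one back-facing triangle."""
    # Map each undirected edge to the facing flags of the triangles that use it.
    edge_facing = defaultdict(list)
    for tri in triangles:
        lit = faces_light(vertices, tri, light_pos)
        for i in range(3):
            edge = frozenset((tri[i], tri[(i + 1) % 3]))
            edge_facing[edge].append(lit)
    return {tuple(sorted(edge))
            for edge, flags in edge_facing.items()
            if len(flags) == 2 and flags[0] != flags[1]}

# Example data (assumed for illustration): a tetrahedron with outward-facing
# triangles, lit from below its bottom face.
tetra_vertices = [(0, 0, 0), (1, 0, 0), (0, 1, 0), (0, 0, 1)]
tetra_triangles = [(0, 2, 1), (0, 1, 3), (0, 3, 2), (1, 2, 3)]
tetra_light = (0.25, 0.25, -5.0)
tetra_silhouette = silhouette_edges(tetra_vertices, tetra_triangles, tetra_light)
```

On the GPU the same test is typically done per edge with adjacency information; this sketch only shows the logic, not an efficient implementation.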
If you don't need the most efficient solution (using the shadow silhouette), you can also simply extract every triangle of the mesh on its own. This is very easy using a geometry shader. I'd try that first before trying to implement silhouette extraction on the GPU.
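As a rough illustration of the per-triangle approach, the following Python sketch builds a closed shadow volume from a single light-facing triangle, which is roughly what a geometry shader would emit for each input triangle. It is a CPU reference, not shader code. The names (`extrude_point`, `triangle_shadow_volume`) and the finite `distance` parameter are assumptions made for this example; robust implementations often extrude to infinity instead.

```python
def sub(a, b):
    return (a[0] - b[0], a[1] - b[1], a[2] - b[2])

def cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])

def dot(a, b):
    return a[0] * b[0] + a[1] * b[1] + a[2] * b[2]

def extrude_point(p, light_pos, distance):
    """Move p away from the light by `distance` along the light-to-point ray."""
    d = sub(p, light_pos)
    length = dot(d, d) ** 0.5
    return tuple(p[i] + d[i] / length * distance for i in range(3))

def triangle_shadow_volume(a, b, c, light_pos, distance):
    """Return the triangles of the shadow volume cast by one triangle.

    Back-facing triangles are skipped, so a closed mesh does not produce
    every volume twice. A light-facing triangle yields 8 triangles:
    front cap, back cap, and two per side quad.
    """
    normal = cross(sub(b, a), sub(c, a))
    if dot(normal, sub(light_pos, a)) <= 0.0:
        return []
    a2, b2, c2 = (extrude_point(p, light_pos, distance) for p in (a, b, c))
    volume = [(a, b, c),      # front cap: the original triangle
              (c2, b2, a2)]   # back cap: extruded copy with reversed winding
    for p, q, p2, q2 in ((a, b, a2, b2), (b, c, b2, c2), (c, a, c2, a2)):
        volume.append((p, p2, q2))  # side quad, first half
        volume.append((p, q2, q))   # side quad, second half
    return volume

# Example data (assumed for illustration): a triangle in the z = 0 plane,
# lit from above.
light = (0.25, 0.25, 5.0)
lit_volume = triangle_shadow_volume((0, 0, 0), (1, 0, 0), (0, 1, 0), light, 10.0)
unlit_volume = triangle_shadow_volume((0, 0, 0), (0, 1, 0), (1, 0, 0), light, 10.0)
```

This produces far more geometry than silhouette extrusion, since every light-facing triangle gets its own volume, but the logic needs no adjacency information.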
About the "render to VBO" part of your question:
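You can capture the output of the geometry shader into a buffer object using transform feedback (core since OpenGL 3.0; Direct3D 10 calls it stream output). The captured buffer can be bound as a VBO for later draw calls without ever leaving the GPU. If you really need the geometry on the CPU, you can read it back with glGetBufferSubData or by mapping the buffer, but that forces a synchronization with the GPU and is usually slow.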