Caches are small because the silicon used to build them is quite expensive and, expecially on CISC-type CPUs, there might not be enough space on the chip to hold them. Also making chips bigger has it cost and there's the possibility that it won't fit in its socket, which adds many more issues. It's not that simple ;)
EDIT:
Well, I haven't got any papers about this, but I'll explain my opinion anyway with a simple question: if a programs needs x bytes of memory, what would be the difference if the cache's size is 10 * x bytes or 100 * x? Once all the data is loaded in the cache (which doesn't depend on its size at all), the difference is all in the cache's access speed. And given locality of reference, it's not necessary having everything on cache.
Also, having big chaches requires having better algorithm for searching requested data in it. For example accessing data in a fully associative caches will become slower than accessing the main memory as the cache size increases (which implies there are more and more places to look for the data). Considering multitasking system, though, introduces other issues which I don't actually know much of.
To conclude, the performance gain caused by increasing caches' size becomes slighter as it approaches the usual amount of data used by the whole software running on a given machine.