There are several aspects to size, and this may be confusing.
My question is: if we move the modules needed to mount the actual filesystem into the initrd rather than into the kernel image itself to save space, what do we gain in the case of a bootpImage, where the kernel and initrd are combined into a single image? The size of the combined image would increase even when using an initrd.
You are correct that the same amount of data will be transferred no matter which mechanism you choose. In fact, an initrd with module loading will be bigger than a fully statically linked kernel, and the boot time will be slower. Sounds bad.
A customized kernel, built specifically for the device with no extra hardware drivers and no module support, is always the best. The Debian handbook's section on kernel compilation gives two reasons a user may want to build a custom kernel:
- Limit the risk of security problems via feature minimization.
- Optimize memory consumption.
The second point is often the most critical: minimizing the amount of memory the running kernel consumes. The initrd (or initramfs) is a binary disk image that is loaded as a RAM disk. It is entirely user-space code with a single task: probe the devices and load the modules that provide the correct drivers for the system. Once this job is done, it mounts the real boot device or the normal root file system, and the initrd image is discarded.
So the initrd does not consume run-time memory: you get both a generic image and a fairly minimal run-time footprint.
I will say that the efforts made by distro people have on occasion created performance issues. Typically, ARM drivers were compiled for only one SoC; the source supported a whole SoC family, but only one member could be selected through conditional compilation. In more recent kernels, the ARM drivers always support the whole SoC family. The memory overhead is minimal, but using a function pointer for a low-level driver transfer function can limit the bandwidth of the controller.
The cache-flush routines have an option for multi-cache support. The function pointers force the compiler to spill registers around the indirect calls. However, if you compile for a specific cache type, the compiler can inline the functions, which often generates much better and smaller code. Most drivers do not have this type of infrastructure, but you will get better run-time behavior from a monolithic kernel tuned for your CPU, since several critical kernel functions will use inlined versions.
Drivers will not usually be faster when compiled into the kernel. Also, many systems support hot-plug via USB, PCMCIA, SDIO, etc.; those systems gain a memory advantage from module loading as well.