The point you are missing here is, how scheduler looks at threads or tasks. Well, the Linux kernel scheduler will treat them as individual scheduling entity, therefore will be counted and scheduled differently.
Now let's see what CFS documentation says - it has a simplistic approach of giving out even slice of CPU time to each runnable process, therefore, if there are 4 runnable process/threads they'll get 25% of cpu time each. But on real hardware it's not possible and to fix the issue vruntime was introduced (take more on this from here
Now come back to your example, if process A creates 100 threads and B creates 1 thread then the # of running processes or threads becomes 103 (assuming all are runnable state) then CFS will evenly share the cpu using formula 1/103 (cpu/number of running tasks). And the context switching is same for all the scheduling entities, threads only shares task's internal mm_struct and when they run they have their own sets of registers, task status to load up to start with. Hope this will help to understand better.