If you have non-splittable files then you are better off using larger block sizes - as large as the files themselves (or larger, it makes no difference).
If the block size is smaller than the overall file size then you run into the possibility that the blocks are not all on the same data node and you lose data locality. This isn't a problem with splittable files, as a map task will be created for each block.
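If you want to see how you'd actually do that, the block size can be set per file when the file is written. Below is a minimal sketch using the Hadoop FileSystem API; the class name, destination path, replication factor, and buffer size are just placeholders:

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.FSDataOutputStream;
    import org.apache.hadoop.fs.FileSystem;
    import org.apache.hadoop.fs.Path;

    public class WriteWithLargeBlock {
        public static void main(String[] args) throws Exception {
            FileSystem fs = FileSystem.get(new Configuration());

            // Illustrative destination for a non-splittable file (e.g. a gzip archive)
            Path dst = new Path("/data/archive.gz");

            // Pick a block size at least as large as the file so it stays in one block
            long blockSize = 2L * 1024 * 1024 * 1024; // 2 GB

            // FileSystem.create(path, overwrite, bufferSize, replication, blockSize)
            try (FSDataOutputStream out = fs.create(dst, true, 4096, (short) 3, blockSize)) {
                // ... write the file contents through 'out' here ...
            }
        }
    }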
As for an upper limit for block size, I know that for certain older versions of Hadoop the limit was 2 GB (above which the block contents were unobtainable) - see https://issues.apache.org/jira/browse/HDFS-96
There is no downside to storing smaller files with larger block sizes - to emphasize this point consider a 1 MB and a 2 GB file, each with a block size of 2 GB:
- 1 MB - 1 block, single entry in the Name Node, 1 MB physically stored on each data node replica
- 2 GB - 1 block, single entry in the Name Node, 2 GB physically stored on each data node replica
So other than the required physical storage, there is no downside to the Name Node block table (both files have a single entry in the block table).
The only possible downside is the time it takes to replicate a smaller versus a larger block, but on the flip side if a data node is lost from the cluster, then tasking 2000 x 1 MB blocks to replicate is slower than replicating a single 2 GB block.
Update - a worked example
Seeing as this is causing some confusion, here are some worked examples:
Say we have a system with a 300 MB HDFS block size, and to make things simpler we have a pseudo cluster with only one data node.
If you want to store a 1100 MB file, then HDFS will break that file up into at most 300 MB blocks and store them on the data node in special block indexed files. If you were to go to the data node and look at where it stores the indexed block files on physical disk you may see something like this:
/local/path/to/datanode/storage/0/blk_000000000000001 300 MB
/local/path/to/datanode/storage/0/blk_000000000000002 300 MB
/local/path/to/datanode/storage/0/blk_000000000000003 300 MB
/local/path/to/datanode/storage/0/blk_000000000000004 200 MB
Note that the file size isn't exactly divisible by 300 MB, so the final block of the file is sized as the modulo of the file size by the block size.
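To make that arithmetic explicit, here's a tiny sketch (plain Java, nothing Hadoop specific; the class name is just for illustration) that works out the block layout of the 1100 MB file:

    public class BlockLayout {
        public static void main(String[] args) {
            long mb = 1024L * 1024;
            long fileSize  = 1100 * mb;  // the 1100 MB file from the example
            long blockSize = 300 * mb;   // the 300 MB HDFS block size

            long fullBlocks  = fileSize / blockSize;                  // 3 full 300 MB blocks
            long lastBlock   = fileSize % blockSize;                  // 200 MB final block
            long totalBlocks = fullBlocks + (lastBlock > 0 ? 1 : 0);  // 4 blocks in total

            System.out.printf("%d blocks: %d x 300 MB + 1 x %d MB%n",
                    totalBlocks, fullBlocks, lastBlock / mb);
        }
    }

Running it prints 4 blocks: 3 x 300 MB + 1 x 200 MB, which matches the block files listed above.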
Now if we repeat the same exercise with a file smaller than the block size, say 1 MB, and look at how it would be stored on the data node:
/local/path/to/datanode/storage/0/blk_000000000000005 1 MB
Again note that the actual file stored on the data node is 1 MB, NOT a 300 MB file with 299 MB of zero padding (which I think is where the confusion is coming from).
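If you want to verify this against a running cluster, you can ask for a file's block locations and print the actual length of each block. A rough sketch (the class name is illustrative; the file path is taken from the command line):

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.BlockLocation;
    import org.apache.hadoop.fs.FileStatus;
    import org.apache.hadoop.fs.FileSystem;
    import org.apache.hadoop.fs.Path;

    public class ShowBlocks {
        public static void main(String[] args) throws Exception {
            FileSystem fs = FileSystem.get(new Configuration());
            FileStatus status = fs.getFileStatus(new Path(args[0])); // e.g. the 1 MB file

            // Each BlockLocation reports the actual bytes held in that block,
            // not the configured block size
            for (BlockLocation loc : fs.getFileBlockLocations(status, 0, status.getLen())) {
                System.out.printf("offset=%d length=%d hosts=%s%n",
                        loc.getOffset(), loc.getLength(), String.join(",", loc.getHosts()));
            }
        }
    }

For the 1 MB file you'd expect a single block with a length of 1048576 bytes, regardless of the 300 MB block size.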
Now where the block size does play a factor in efficiency is in the Name Node. For the above two examples, the Name Node needs to maintain a map of file names to block names and data node locations (as well as the total file size and block size):
filename index datanode
-------------------------------------------
fileA.txt blk_01 datanode1
fileA.txt blk_02 datanode1
fileA.txt blk_03 datanode1
fileA.txt blk_04 datanode1
-------------------------------------------
fileB.txt blk_05 datanode1
You can see that if you were to use a block size of 1 MB for fileA.txt, you'd need 1100 entries in the above map rather than 4 (which would require more memory in the Name Node). Also, pulling back all the blocks would be more expensive as you'd be making 1100 RPC calls to datanode1 rather than 4.
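To put numbers on that, a quick sketch (again just illustrative) counting the block entries needed for the 1100 MB file at the two block sizes:

    public class NameNodeEntries {
        public static void main(String[] args) {
            long mb = 1024L * 1024;
            long fileSize = 1100 * mb;  // fileA.txt from the example

            for (long blockSizeMb : new long[] {300, 1}) {
                long blockSize = blockSizeMb * mb;
                // One entry in the Name Node map (and one block to fetch) per block
                long entries = (fileSize + blockSize - 1) / blockSize; // ceiling division
                System.out.printf("block size %4d MB -> %4d block entries%n",
                        blockSizeMb, entries);
            }
        }
    }

This prints 4 entries for the 300 MB block size and 1100 for the 1 MB block size.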