0xfffb
is the 16-bit signed two's complement representation of -5. So in this machine, the offset is scaled by the (presumably fixed) instruction length to get a byte address. (It could have byte sized instructions, but that is not possible since the offset itself is 16-bits.) The architecture is such that by the time the branch is executed the PC is already incremented so a 0 offset is a NOP, a -1 offset branches to the branch itself, -2 branches to the instruction before the branch, etc. Count backwards until you get to loop:
. (Either there is some more info attached to the question, or known context, giving the details of the architecture I have used in making the answer, or it is a fairly badly written question.)
For the cache question, you mostly just have to know the names used to describe cache architectures (or "cache geometry", "cache shape" etc.). "2-way set associative" means there are two places in the cache any given address can be placed. There are 128 sets, each of which can hold two blocks because it is 2-way associative, and each block is 32-bytes. (I usually call the 32-byte structure a "cache line", though it appears here the word "block" refers specifically to the data that is stored there and "line" also includes the valid bit and tag, etc.) You then break down the address starting from the least significant bit going outwards in the cache geometry.
It looks like this is an instruction cache so we're going to insist the bottom two bits are 0 and organize the cache in 32-bit items. The block is 32-bytes or 5-bits. 2 are the "byte offset" which probably should just be 0, and then 3 bits to complete the 5-bit part which gets called a "block offset" (really offset within the block). (This subdivison of the 5 low bits doesn't really change anything here.) 128 entries in the set gives a 7-bit "index." The rest of the address, 20 bits, has to be used to tag the block to make sure it holds the address being looked up. (I.e. to determine cache hit or miss.) Plus we need one more bit to say whether there's actually data in the block.
Then we just add it all up -- 32-bytes or 256 bits for the data, plus 20 bits of tag and 1 valid bit, multiply by 128 sets and 2 ways.