To answer your question piece by piece, let us first start with the definition of the receptive field in this context:
The receptive field of an individual sensory neuron is the particular region of the sensory space (e.g., the body surface, or the visual field) in which a stimulus will modify the firing of that neuron.
As taken from Wikipedia. This means we are looking for all pixels in your input that affect the current output. Logically, if you perform a convolution (say, with a single 3x3 filter kernel), the receptive field of a single output pixel is the corresponding 3x3 region of the input that gets convolved in that specific step.
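To make the 3x3 case concrete, here is a minimal sketch of the usual per-axis bookkeeping for a chain of convolutions. It assumes plain convolutions without dilation; the function name `receptive_field` and the `(kernel_size, stride)` tuple format are made up for this illustration and do not come from any library.

```python
def receptive_field(layers):
    """Receptive field size (along one axis) of a chain of convolutions.

    layers: list of (kernel_size, stride) tuples, applied in order.
    Assumes no dilation; this is an illustrative helper, not a library API.
    """
    size = 1  # an input pixel only "sees" itself
    jump = 1  # distance, in input pixels, between neighbouring positions
    for kernel_size, stride in layers:
        # each extra kernel tap reaches `jump` input pixels further
        size += (kernel_size - 1) * jump
        jump *= stride
    return size


single_3x3 = receptive_field([(3, 1)])
three_stacked = receptive_field([(3, 1), (3, 1), (3, 1)])
print(single_3x3, three_stacked)
```

A single 3x3 convolution yields 3 per axis (the 3x3 region described above), and stacking three of them with stride 1 yields 7 per axis.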
Visually, in this graphic the underlying darker area marks the receptive field for specific pixels in the output:
Now, to answer your first question: Residual blocks of course still account for the receptive field! Let us denote the residual block as follows:
$F(X)$: residual block
$g_i(X)$: single convolutional block

Then we can write the residual block as $F(X) = g_3(g_2(g_1(X))) + X$, so in this case we would stack 3 convolutions (as an example). Of course, every single layer of this stack still alters the receptive field, exactly as explained in the beginning. Simply adding $X$ again will not change the receptive field. But that addition alone does not make a residual block.
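One way to check this is to track, for every position, the set of input indices it depends on. The sketch below does this in 1D for a block of three 3-wide convolutions plus the identity; the helpers `conv1d_deps` and `add_deps` are hypothetical names for this illustration, and zero-style "same" padding is assumed.

```python
def conv1d_deps(deps, kernel_size=3):
    """Propagate dependency sets through a stride-1 convolution ('same' padding)."""
    half = kernel_size // 2
    n = len(deps)
    out = []
    for i in range(n):
        merged = set()
        # an output position depends on everything its neighbours depend on
        for j in range(max(0, i - half), min(n, i + half + 1)):
            merged |= deps[j]
        out.append(merged)
    return out


def add_deps(a, b):
    """Element-wise addition: the result depends on both operands."""
    return [x | y for x, y in zip(a, b)]


n = 15
x = [{i} for i in range(n)]                       # each input pixel depends on itself
main = conv1d_deps(conv1d_deps(conv1d_deps(x)))   # g_3(g_2(g_1(X)))
block = add_deps(main, x)                         # ... + X

centre = n // 2
print(sorted(main[centre]))
print(sorted(block[centre]))
```

Both printed lists contain the same 7 indices: the identity path only contributes the centre pixel, which the convolutional path already covers.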
Similarly, skip connections usually do not affect the receptive field, because the path that skips layers will almost always have a different (mostly smaller) receptive field than the path it bypasses. As explained in your linked answer though, it will make a difference if your skip connection has a larger receptive field, since the overall receptive field is the maximum (more specifically, the union) of the regions reached along the different paths through your flow graph.
For the question about upsampling layers, you can guess the answer yourself by asking the following question:
Does the area of the input image get affected by upsampling anywhere within the image?
The answer should be "obviously not". Essentially, you are still looking at the same area of the input image; you just have a higher resolution now, and neighboring pixels might in fact look at the same area. To get back to the GIF above: if you had 4x the number of pixels in the green area, every pixel would still have to look at a particular input region in the blue area, which does not change in size. So no, upsampling does not affect this.
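As a rough illustration, the same dependency-set idea can be applied to upsampling. This sketch assumes nearest-neighbour upsampling in 1D (the answer does not say which kind of upsampling is meant, so that choice is an assumption), and the helper names are hypothetical.

```python
def conv1d_deps(deps, kernel_size=3):
    """Propagate dependency sets through a stride-1 convolution ('same' padding)."""
    half = kernel_size // 2
    n = len(deps)
    return [
        set().union(*deps[max(0, i - half):min(n, i + half + 1)])
        for i in range(n)
    ]


def upsample_nearest_deps(deps, factor=2):
    """Nearest-neighbour upsampling: every output position copies one source position."""
    return [deps[i // factor] for i in range(len(deps) * factor)]


n = 8
x = [{i} for i in range(n)]          # each input pixel depends on itself
feat = conv1d_deps(x)                # one 3-wide convolution
up = upsample_nearest_deps(feat)     # twice as many positions

largest_before = max(len(d) for d in feat)
largest_after = max(len(d) for d in up)
print(largest_before, largest_after)
print(up[6] == up[7] == feat[3])
```

Under this assumption the upsampled map has twice as many positions, but each one depends on exactly the same input region as the feature pixel it was copied from, and neighbouring upsampled pixels share that region.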
For the last question: this is closely related to the first question. The receptive field covers all the pixels that affect the output, so depending on which feature maps you are concatenating, the concatenation might change it.
Again, the resulting receptive field is the union of the receptive fields of the feature maps you are concatenating. If one is contained in the other (either $A \subseteq B$ or $B \subseteq A$, where $A$ and $B$ are the receptive fields of the feature maps to be concatenated), then the receptive field does not change: it is simply the larger of the two. Otherwise, the receptive field would be $A \cup B$.
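The two cases can be sketched with plain sets of pixel coordinates. The `region` helper and the concrete rectangles below are invented for the illustration; they only stand in for the receptive fields of two feature maps at one output location.

```python
def region(top, left, height, width):
    """Set of (row, col) input pixels covered by a rectangular receptive field."""
    return {(r, c)
            for r in range(top, top + height)
            for c in range(left, left + width)}


def concat_receptive_field(a, b):
    """Concatenated features depend on every pixel either input depends on."""
    return a | b


a = region(2, 2, 3, 3)   # 3x3 field
b = region(1, 1, 5, 5)   # 5x5 field that fully contains a
c = region(4, 4, 3, 3)   # 3x3 field that only partly overlaps a

nested = concat_receptive_field(a, b)
partial = concat_receptive_field(a, c)
print(nested == b)
print(len(partial))
```

In the nested case the result equals the larger field, while in the partly overlapping case the result is larger than either input (17 pixels here, since the two 3x3 fields share one pixel).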