I think that the terminology "in-place MSD radix sort" is a bit misleading because, as you noted, it's not an in-place algorithm under a strict definition of "in-place." The "in-place" terminology here most likely refers to the fact that, unlike LSD radix sort, the algorithm doesn't require an auxiliary array to temporarily store the elements from the original input array.
You are correct that the space usage of MSD radix sort is proportional to the number of digits in the largest input number. For notational simplicity, let n be the length of the input array and U be the largest number in the array. The runtime of MSD radix sort is then O(n log U), since the number of digits in U is O(log U). O(log U) is a very, very slowly-growing function. For reference, the number of atoms in the observable universe is about 10^80, which is roughly 2^266. Therefore, if you're sorting numbers that were generated by any physical process, the recursion depth will be at most a few hundred levels, which, while large, is definitely manageable.
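To make that depth bound concrete, here's a minimal sketch of a binary (radix-2) MSD radix sort for non-negative integers. The function name and exact structure are illustrative, not taken from any particular library. Each level of recursion peels off one bit, so the recursion depth is bounded by the bit length of the largest value, i.e. O(log U):

```python
def msd_radix_sort(a, lo=0, hi=None, bit=None):
    """Sort a[lo:hi] in place, one bit at a time, most-significant bit first.

    Illustrative sketch; assumes the values are non-negative integers.
    """
    if hi is None:
        hi = len(a)
    if bit is None:
        # Start at the highest bit of the largest value.
        bit = max(a, default=0).bit_length() - 1
    if hi - lo <= 1 or bit < 0:
        return

    # Partition: values with a 0 at this bit move to the front,
    # values with a 1 at this bit move to the back.
    mask = 1 << bit
    i, j = lo, hi - 1
    while i <= j:
        if a[i] & mask == 0:
            i += 1
        else:
            a[i], a[j] = a[j], a[i]
            j -= 1

    # Recurse on both halves with the next lower bit. Each level of the
    # recursion consumes one bit, so the depth is at most the bit length
    # of U, i.e. O(log U).
    msd_radix_sort(a, lo, i, bit - 1)
    msd_radix_sort(a, i, hi, bit - 1)


nums = [170, 45, 75, 90, 802, 24, 2, 66]
msd_radix_sort(nums)
print(nums)  # [2, 24, 45, 66, 75, 90, 170, 802]
```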
If you're sorting really large numbers - say, numbers with thousands and thousands of bits - then you're right to be worried about blowing out the stack. However, I'd argue that this is extremely unlikely to happen with a good implementation of MSD radix sort. There's a standard optimization in quicksort - which looks a lot like MSD radix sort - where instead of making two branching recursive calls, you make one recursive call on the smaller of the two ranges to sort, then reuse the current stack frame to sort the larger range. (This is essentially tail-call elimination.) Now, suppose you apply this to MSD radix sort. Since every newly-created stack frame works on the smaller of the two ranges, each new stack frame is guaranteed to hold at most half as many elements as the one below it. As a result, the maximum depth the stack can reach is O(log n) - not O(log U). For this to blow out your stack, you'd need a truly astronomically large input array, no matter how large the individual numbers are.
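Here's one way that optimization might look when applied to the binary sketch above; again, the name is illustrative and this is a sketch rather than a production implementation. The partition step is identical, but each frame recurses only on the smaller half and keeps looping on the larger one, so the stack depth is bounded by O(log n):

```python
def msd_radix_sort_shallow(a, lo=0, hi=None, bit=None):
    """Binary MSD radix sort that recurses only on the smaller partition.

    Illustrative sketch; assumes non-negative integers. The larger partition
    is handled by looping in the current frame, so the recursion depth is
    O(log n) regardless of how many bits the numbers have.
    """
    if hi is None:
        hi = len(a)
    if bit is None:
        bit = max(a, default=0).bit_length() - 1

    while hi - lo > 1 and bit >= 0:
        # Same bitwise partition as in the plain version above.
        mask = 1 << bit
        i, j = lo, hi - 1
        while i <= j:
            if a[i] & mask == 0:
                i += 1
            else:
                a[i], a[j] = a[j], a[i]
                j -= 1
        bit -= 1

        # Recurse on the smaller half, then continue this loop on the larger
        # half. Every recursive call handles at most half of the current
        # range, so the stack never gets deeper than about log2(n) frames.
        if i - lo < hi - i:
            msd_radix_sort_shallow(a, lo, i, bit)
            lo = i   # keep working on the larger (right) half in this frame
        else:
            msd_radix_sort_shallow(a, i, hi, bit)
            hi = i   # keep working on the larger (left) half in this frame
```

Note that the total work is unchanged; only the worst-case stack depth improves.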
In summary: you're right, the algorithm is not in place. However, because the stack depth is O(log U) in a naive implementation and O(log n) in an optimized one, you shouldn't need to worry about this unless you have a naive implementation and truly colossal inputs.