The two network architectures (single output and multi-output) correspond to binary, multi-class, and multi-label problems.
Let's consider the options you have:
Binary classification - You are predicting the probability of the positive class. The positive and negative classes are the only two options, so the network needs a single output: a probability between 0 and 1 (typically produced by a sigmoid activation). The loss function used here is binary_crossentropy.
Multi-class classification - You are predicting a probability between 0 and 1 for each of n classes (where n >= 2). If each sample belongs to exactly one class, it's called multi-class single-label classification. In that case the outputs typically come from a softmax, so they sum to 1, and the usual loss is categorical_crossentropy.
Multi-label classification - Each sample can belong to several classes at once, so you are working with a multi-class multi-label problem. You again get a probability between 0 and 1 for each of the n classes, but each one is predicted independently of the others (they need not sum to 1). The loss is the same one you would use for binary classification (binary_crossentropy), applied to each class.
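To make the difference between the three setups concrete, here is a minimal sketch in plain Python (no deep learning framework). The helper functions are hand-written illustrations of the usual definitions, not the library implementations, and the logits and targets are made-up values for a single sample.

```python
import math

def sigmoid(z):
    # Squashes one raw score into a probability in (0, 1).
    return 1.0 / (1.0 + math.exp(-z))

def softmax(zs):
    # Turns a list of raw scores into probabilities that sum to 1.
    m = max(zs)  # subtract the max for numerical stability
    exps = [math.exp(z - m) for z in zs]
    total = sum(exps)
    return [e / total for e in exps]

def binary_crossentropy(y_true, p, eps=1e-12):
    # y_true is 0 or 1; p is the predicted probability of the positive class.
    p = min(max(p, eps), 1.0 - eps)
    return -(y_true * math.log(p) + (1 - y_true) * math.log(1.0 - p))

def categorical_crossentropy(y_onehot, probs, eps=1e-12):
    # y_onehot has a single 1 marking the true class.
    return -sum(y * math.log(max(p, eps)) for y, p in zip(y_onehot, probs))

# Hypothetical raw network outputs (logits) for one sample.
logits = [2.0, -1.0, 0.5]

# Binary: a single output unit, sigmoid + binary cross-entropy.
p_positive = sigmoid(logits[0])
binary_loss = binary_crossentropy(1, p_positive)

# Multi-class single-label: softmax over all classes, one-hot target.
class_probs = softmax(logits)
multiclass_loss = categorical_crossentropy([1, 0, 0], class_probs)

# Multi-label: an independent sigmoid per class, multi-hot target,
# binary cross-entropy averaged over the classes.
label_probs = [sigmoid(z) for z in logits]
multi_hot_target = [1, 0, 1]
multilabel_loss = sum(
    binary_crossentropy(y, p) for y, p in zip(multi_hot_target, label_probs)
) / len(multi_hot_target)

print("binary:", p_positive, binary_loss)
print("multi-class:", class_probs, multiclass_loss)
print("multi-label:", label_probs, multilabel_loss)
```

Note that the softmax probabilities sum to 1 because the classes compete with each other, while the per-class sigmoid probabilities do not, because each label is decided on its own.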
So, at the end of the day, it comes down to how you set up your problem.