BP (branch prediction) and BTP (branch target prediction) are naturally closely related, but they're obviously not the same thing. I think your biggest confusion comes from the claim that since the BTP predicts the target of a given branch, it can tell you the outcome (i.e., which instruction will be executed next). That's not the case.
A branch target is the address this branch may send you to if it's taken. Whether or not the branch is taken is a completely different question, and it is addressed by the branch predictor. In fact, the two units usually work together in the early stages of the pipeline, and produce (if needed) both the taken/not-taken and the address prediction. Then comes the complicated logic that basically says: if it's a branch, and it's predicted taken (or is unconditional), then jump to the target if you have it (whether known or predicted).
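To make that last sentence concrete, here is a minimal Python sketch of the selection logic. It's a toy model under loud assumptions: Prediction and next_fetch_pc are hypothetical names, and fetch is assumed to advance by a fixed number of bytes (fetch_bytes) when there's no branch. Real front ends do this in hardware, with many more cases.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Prediction:
    """Hypothetical bundle of what the BP and BTP produced for one fetch address."""
    is_branch: bool          # does the front end believe there is a branch here?
    conditional: bool        # True if a taken/not-taken guess is needed
    predicted_taken: bool    # BP output (ignored for unconditional branches)
    target: Optional[int]    # BTP output, or None if no target is known


def next_fetch_pc(pc: int, fetch_bytes: int, p: Prediction) -> int:
    """Pick the next fetch address from the combined BP/BTP outputs."""
    sequential = pc + fetch_bytes
    if not p.is_branch:
        return sequential
    # Unconditional branches are always taken; conditional ones ask the BP.
    taken = p.predicted_taken if p.conditional else True
    if taken and p.target is not None:
        return p.target      # jump to the known or predicted target
    # Predicted not taken, or taken but no target available:
    # keep fetching sequentially and let a later stage redirect if needed.
    return sequential
```

Note that the BP's answer is simply ignored for unconditional branches, and the BTP's answer is ignored when the branch is predicted not taken, which is the sense in which they answer separate questions.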
As in the branch-type list you quoted yourself, the two questions, whether a branch needs a taken/not-taken prediction (is it conditional?) and whether it needs a target prediction (is it direct, i.e. does it have a fixed target, as you call it?), both apply, and each can go either way independently of the other, which gives you the 4 combinations you listed:
Unconditional direct branches, in theory, do not require any prediction: the CPU front end would simply read the target and "take" the branch (feeding the pipeline code from the new address). However, modern CPUs still need time to decode the branch and identify the target encoded in it, so to avoid stalls at the branch predictor (which normally sits at the head of the pipe), they have to predict that address as well. Confirming the prediction is simple, though (it can be done right after decode), so the misprediction penalty isn't very high. It could still be stalled by code cache / TLB misses, but it is otherwise the fastest case (though one might say the weakest).
Conditional direct branches know their target after decode (but again, it must be predicted before that), but can't tell whether the branch is taken until the condition is executed and the branch is resolved, which may be very far down the pipe. The condition may in turn depend on earlier instructions, so the branch could be stalled until the condition's sources are known. So two predictions are made, target and direction (unless the direction is fall-through, in which case there's no need for a target), but the direction is the riskier one. The branch predictor (actually, modern CPUs usually have several of them) makes an educated guess and continues fetching from there. Some studies, mostly in academia, have even tried fetching and executing both paths (although you can immediately see that this may explode exponentially, since there's usually a branch every few instructions, so it's usually reserved for hard-to-predict ones). Another popular option is "predicating" (mind the 'a' there) the two paths, i.e., sending some bits down the pipeline to mark which path each instruction belongs to, so the wrong path can be flushed easily once the branch is resolved. This is quite popular on dataflow machines due to the language structure, but that's an entirely new question.
Unconditional indirect branches: these are nasty since they're both common (every ret instruction, for example) and harder to predict. While the direction decision in the previous case was simple (and could always rely on some heuristics or pattern guessing), this one needs to provide an actual address, so the BTP probably has to see this specific branch with this specific target a few times before it learns the pattern.
Conditional indirect branches: well, bad luck, you need both predictions.
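For the direction half of the conditional cases above, the classic textbook illustration of an "educated guess" is a table of 2-bit saturating counters indexed by branch address (often called a bimodal predictor). The sketch below is that textbook scheme, not what any particular CPU uses; BimodalPredictor, the table size and the example address are illustrative choices.

```python
class BimodalPredictor:
    """Direction predictor: 2-bit saturating counters indexed by branch address.

    Counter values 0-1 predict "not taken", 2-3 predict "taken".
    """

    def __init__(self, index_bits: int = 10):
        self.mask = (1 << index_bits) - 1
        self.counters = [1] * (1 << index_bits)  # start as "weakly not taken"

    def _index(self, pc: int) -> int:
        return pc & self.mask

    def predict(self, pc: int) -> bool:
        return self.counters[self._index(pc)] >= 2

    def update(self, pc: int, taken: bool) -> None:
        """Train with the real outcome once the branch is resolved."""
        i = self._index(pc)
        if taken:
            self.counters[i] = min(3, self.counters[i] + 1)
        else:
            self.counters[i] = max(0, self.counters[i] - 1)


# A loop branch: taken 9 times, then not taken once on exit, repeated 10 times.
bp = BimodalPredictor()
trace = ([True] * 9 + [False]) * 10
misses = 0
for taken in trace:
    if bp.predict(0x4000) != taken:
        misses += 1
    bp.update(0x4000, taken)
print(misses)  # 11: two while warming up, then one per loop exit
```

The 2-bit hysteresis is why, once warmed up, the loop costs one misprediction per exit rather than two: a single not-taken outcome doesn't flip the prediction for the next entry.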
So, the decisions are orthogonal, but that doesn't mean the predictors have to be. Keep in mind that you have a single "stream" of branch history, so it probably pays to have the predictors related somehow, sharing some tables or some logic. Exactly how is a design decision that depends on the actual HW implementation. You're probably not going to get many details on how Intel/AMD do it, but there's plenty of academic research on the topic.
As for the second question: it's a bit broad, and again, you won't be able to get all the exact details on real CPUs, but you can get hints here and there. See, e.g., the diagram from this Haswell review (which may have appeared here before somewhere):
This diagram doesn't tell you everything: it's obviously missing the inputs to the BP/BTP, or even the distinction between them (which in itself already suggests they're probably built together), but it does show that this is apparently the first and foremost part of the pipeline. You need to predict the next instruction pointer before you can feed it into the fetch/decode/... pipeline (or the alternative uop-cache one). This probably means that the CPU starts every cycle (well, yes, everything is really done in parallel, but it helps to think of a pipeline as a staged process) by deciding which instruction to execute next. Let's say it knows where we were last time: the next instruction is then either a non-branch (ah, but what about variable-length instructions? That's another complication this unit needs to solve), or a branch, in which case this unit should guess which of the above types the branch belongs to, and predict the next instruction accordingly.
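A rough model of "predict before you decode" follows. To sidestep the variable-length problem, it assumes the front end fetches fixed-size aligned blocks (the 16-byte FETCH_BLOCK is an assumption for illustration) and that a BTB-like table keyed by block address is all it knows: a hit means "I believe a taken branch ends this block", a miss means "assume no branch". predict_fetch_stream is a hypothetical name.

```python
FETCH_BLOCK = 16  # assumed fetch granularity in bytes (illustrative only)


def predict_fetch_stream(start_pc: int, btb: dict, cycles: int) -> list:
    """Generate one predicted fetch address per cycle, without decoding anything.

    btb maps a fetch-block address to the predicted target of a taken branch
    believed to end that block; a miss means "assume no branch here".
    """
    stream = []
    pc = start_pc
    for _ in range(cycles):
        stream.append(pc)
        block = pc & ~(FETCH_BLOCK - 1)     # align down to the fetch block
        if block in btb:
            pc = btb[block]                 # believed taken branch: redirect
        else:
            pc = block + FETCH_BLOCK        # no known branch: next sequential block
    return stream


# A tight loop whose backward branch sits in the block at 0x1020.
print([hex(a) for a in predict_fetch_stream(0x1000, {0x1020: 0x1000}, 6)])
```

Nothing in this loop ever looks at instruction bytes, which is the point: the stream of guesses runs ahead of decode, and the units further down have to correct it.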
Note that I wrote "guess": if the diagram tells the truth, the decode stage is really far away, so you don't even know it's a branch at this point. So to answer your question, this BP/BTP unit needs to communicate with the execution/WB units so it knows the outcome of conditional branches, with the decode unit so it knows whether the instruction currently being decoded is a branch and what type it is, and with the different fetch pipelines to feed them its output. I'm guessing there are further connections with other units (e.g., some designs may decide to send code prefetches based on target predictions, etc.).
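The decode and execute feedback paths can be sketched as a function that decides which stage catches a wrong prediction. The stage numbers are hypothetical, redirect_stage is a made-up name, and the model ignores cases such as a non-branch wrongly believed to be a branch. It encodes the distinction made earlier: a direct target can be checked right after decode, while a direction or an indirect target is only known once the branch executes.

```python
from typing import Optional

# Hypothetical pipeline depths, chosen only for illustration (not real Haswell numbers).
DECODE_STAGE = 5
EXECUTE_STAGE = 15


def redirect_stage(conditional: bool, direct: bool,
                   pred_taken: bool, pred_target: Optional[int],
                   actual_taken: bool, actual_target: int) -> Optional[int]:
    """Return the stage whose feedback redirects the front end, or None if correct."""
    taken_ok = pred_taken == actual_taken
    target_ok = not actual_taken or pred_target == actual_target
    if taken_ok and target_ok:
        return None
    if direct and not conditional:
        return DECODE_STAGE      # decode sees both "always taken" and the encoded target
    if conditional and not taken_ok:
        return EXECUTE_STAGE     # direction is known only once the condition executes
    if direct:
        return DECODE_STAGE      # direction was right; decode supplies the real target
    return EXECUTE_STAGE         # an indirect target is known only after execution
```

The wrong-path work thrown away grows with the returned stage, which is why the later, execute-time redirects are the expensive ones.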