A short history of how branches are predicted:
When Great-Granny was programming
there was no prediction and no pre-fetch, soon she started pre-fetching the next instruction while executing the current instruction. Most of the times this was correct and improved the clock per instruction in most cases by one and otherwise nothing was lost. This already had a misprediction rate of only average 34% (59%-9%, H&P AQA p.81).
When Granny was programming
There was the problem that the CPU's were getting faster and added a Decoding stage to the pipeline, making it Fetch -> Decode -> Execute -> Write back. With 5 instructions between branches 2 fetches were lost every 5 instructions if the branch was backward or forward and was respectively taken and not taken. A quick research showed that most conditional backward branches were loops and most were taken and most forward was not taken, as they mostly were bad cases. With profiling we get down to 3%-24%
The advent of the dynamic branch predictor with the saturation counter
made life for the programmer easier. From the observation that most branches do what they did last time, having a list of counters address with the low bits of the address of a branch told if the branch was taken or not and the Branch Target Buffer provided the address to be fetched. On this local predictor it lowers the mis-prediction rate to 1%-18%.
This is all good and fine, but some branches are depended on how previous other branches acted. So if we have a history of the last branches take or not taken as 1 and 0 we have 2^H different predictors depending on the history. In practice the history bits are xor'ed with the branch lower address bits, using the same array as in the previous version.
The PRO of this is that the predictor can quickly learn patterns, the CON is if there is no pattern the branch will overwrite the previous branches bits. The PRO outweighs the CON as the locality is more important than branches that are not in the current (inner) loop. This global predictor improve the mis-prediction down to 1%-11%.
That is great, but in some cases the local predictor beats the global predictor so we want both. Xor-ing the local branch history with the address improves on the local branch prediction making it a 2 level predictor as well, just with local instead of global branch history. Adding a 3rd saturation counter for each branch that counts which was right we can select between them. This tournament predictor improves the misprediction rate with around 1% point compared with the global predictor.
Now your case is one in 100 branches in another direction.
Lets examine the local two level predictor, when we get to the one case the last H branches of this branches have all been in the same direction, lets say taken, making all history 1's so the branch predictor will have chosen a single entry in the local predictor table and it will be saturated to taken. This means it will in all cases case an mis-predict on the one case, and the next call where the branch will be taken will most likely be correctly predicted (barring aliasing to the branch table entry). So the local branch predictor can't be used as having a 100 bit long history would require a 2^100 large predictor.
Maybe the global predictor catch the case then, in the last 99 cases the branch was taken, so the predictors for the last 99 will have updated according to the different behaviour of the last H branches moving them to predict taken. So if the last H branches have independent behaviour from the current branch, then all the entries in the global branch prediction table will predict taken and so you will get a mis-predict.
But if a combination of previous branches, say the 3rd, 7th and 12th, all acted so that if the right combination of these were taken/not taken it would foreshadow the opposite behavior, the branch prediction entry of this combination would correctly predict the behaviour of the branch. The problem here is that if you only seldom, seen in the runtime over the program, updates this branch entry and other branches alias to it with their behaviour then it might fail to predict anyway.
Let assume the global branch behaviour actually predicts the right outcome based on the pattern of previous branches. Then you will most likely be mislead by the tournament predictor which says the local predictor is "always" right and the local predictor will always mis-predict for your case.
Note 1: The "always" should be taken with a small grain of sand, as other branches might pollute your branch table entries with aliasing to the same entry. The designers have tried to make this less likely with having 8K different entries, creatively rearranging the bits of the lower address of the branch.
Note 2: Other schemes might be able to solve this but unlikely as its 1 in 100.