Poor implementations of exception handlers push some kind of exception handler block for each try clause on the runtime stack as the try clause is entered, and pop it off as the try clause is exited. A location holding the address of the most recently pushed exception handler block is also maintained. Typically these exception handlers are chained together so they can be found by following links from the most recent to older versions. When an exception occurs, a pointer to the last-pushed EH handler block is found, and processing of that "try" clause's EH cases is checked. A hit on an EH case causes stack cleanup to occur back to the point of pushed EH, and control transfers to the EH case. No hits on the EH causes the next EH to be found, and the process repeats. The Windows 32-bit SEH scheme is a version of this.
This is a poor implementation because the program pays a runtime price for each try clause (push then pop) even when no exception occurs.
Good implementations simply record a table of ranges where try clauses occur. This means there's zero overhead to enter/exit a try clause. (My PARLANSE parallell programming langauge uses this technique). An exception looks up the PC of the exception point in the table, and passes control to the EH selected by the table. The EH code resets the stack as appropriate. Fast and pretty.
I think the Windows 64 bit EH is of this type, but I haven't looked carefully.
[EDIT April 2020: Just measured the cost of PARLANSE exceptions recently. 0nS (by design) if no exception; 25ns on an 3Ghz i7 from "throw" to "catch" to "acknowledge" (end empty catch). OP added a link measuring C++ exception handling at roughly 1000ns for the simplest kind, and a literally nonStandard handling scheme that clocks in at 57ns for exception or no exception; CPU clock rates for the C++ versions are a bit slower so these numbers are only for rough comparison.]