
I have a question about pessimistic versus optimistic locking. Everybody says that "optimistic locking is used when you don't expect many collisions."

For a school project I need to find the 'break-even' point at which pessimistic locking becomes more appropriate than optimistic locking.

Now, I would like to understand why such a break-even point exists. How is it possible that pessimistic locking is more costly (in speed, or in memory usage?) than optimistic locking?

I suspect it is because of the extra read operation that pessimistic locking requires. But optimistic locking also needs this extra read operation (just before the save), right?

Hopefully someone can explain this :) Thank you!


1 Answer


Pessimism vs optimism in concurrency control is about how transaction implementations handle interference with one another. (Notwithstanding any definitions expressed at your links or by specific products.)

The supposedly pessimistic attitude is: someone will interfere, so lock them out. The supposedly optimistic attitude is: maybe no one will interfere, so go ahead until completion, then roll some process back if there was interference.
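The two attitudes can be sketched as follows. This is a minimal illustration, not any product's implementation; the class and method names are my own, and the optimistic version uses a version number checked at commit time (the rollback here is simply retrying the work):

```python
import threading

class PessimisticRecord:
    """Lock first, then work: no one else can interfere meanwhile."""
    def __init__(self, value=0):
        self.value = value
        self._lock = threading.Lock()

    def update(self, fn):
        with self._lock:                     # lock others out up front
            self.value = fn(self.value)

class OptimisticRecord:
    """Work first, then check: redo the work if someone interfered."""
    def __init__(self, value=0):
        self.value = value
        self.version = 0
        self._lock = threading.Lock()        # only guards the short commit step

    def update(self, fn):
        while True:
            snap_version, snap_value = self.version, self.value
            new_value = fn(snap_value)       # do the work without any lock
            with self._lock:                 # brief commit-time check
                if self.version == snap_version:
                    self.value = new_value
                    self.version += 1
                    return
            # version moved: someone interfered; discard the work and retry

r = OptimisticRecord()
r.update(lambda v: v + 1)
print(r.value)  # 1
```

Note where the cost lands in each case: the pessimistic update makes other processes wait for the whole duration of `fn`, while the optimistic update may have to throw `fn`'s work away and repeat it.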

The costs are delays due to waiting by locked-out processes versus delays due to re-computation by rolled-back processes. We wish to optimize throughput given expected process properties and distribution.
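To make the break-even concrete, here is a deliberately crude cost model of my own (not from any textbook): every transaction needs W time units of useful work, the pessimistic scheme always pays a fixed locking overhead L, and the optimistic scheme conflicts with probability p and then redoes the whole W, giving a geometric number of attempts:

```python
def pessimistic_cost(W, L):
    # work plus fixed lock-management overhead; waiters absorb the delay
    return W + L

def optimistic_cost(W, p):
    # expected attempts until a conflict-free run is 1 / (1 - p)
    return W / (1.0 - p)

def break_even_p(W, L):
    # solve W / (1 - p) = W + L for the conflict probability p
    return L / (W + L)

W, L = 10e-3, 1e-3                 # hypothetical: 10 ms work, 1 ms overhead
p_star = break_even_p(W, L)
print(round(p_star, 4))            # ~0.0909: above ~9% conflicts, locking wins
```

The model is one-dimensional on purpose: it shows why a break-even point exists at all (retry cost grows with interference, lock overhead doesn't), not what its value is for a real DBMS.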

(In your question you address only a single process rather than a collection, and you ignore a process having to wait, or having to throw away work on rollback.)

EDIT

Think about what the words mean. Throughput involves work and time. A "break-even point" presumes a dimension (interference) along which a quantity (throughput) differs between schemes (pessimistic/optimistic). You have to come up with a way to characterize and measure work and interference. You can see others' takes on reasonable interference to test in textbooks and their bibliography references, e.g. On Optimistic Methods for Concurrency Control.

Experimentally, calculate throughput for each scheme by running your DBMS under varying amounts of interference.
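Before involving a real DBMS, you can rehearse the experiment as a Monte Carlo sketch. The parameters here are hypothetical (W units of work per attempt, L units of lock overhead, conflict probability p); the point is only the shape of the measurement, throughput as transactions per unit of total time:

```python
import random

def simulate(p, T=10_000, W=10, L=1, seed=0):
    """Return (pessimistic, optimistic) throughput for conflict probability p."""
    rng = random.Random(seed)
    pess_time = T * (W + L)           # locking serializes: fixed cost per txn
    opt_time = 0
    for _ in range(T):
        opt_time += W                 # first attempt always does the work
        while rng.random() < p:       # interference: roll back, redo the work
            opt_time += W
    return T / pess_time, T / opt_time

for p in (0.01, 0.05, 0.2, 0.5):
    tp_pess, tp_opt = simulate(p)
    print(f"p={p:.2f}: pessimistic={tp_pess:.4f}, optimistic={tp_opt:.4f}")
```

Sweeping p and plotting both curves shows them crossing; the crossing is your break-even point for this particular workload model.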

The reality is that different interference workloads (i.e. the expected process properties and distribution above) make the problem multidimensional. So you may want to calculate throughput as above for several different interference scenarios.