CAS belongs to a class of objects called "consensus objects", each of which has a consensus number; the maximum number of threads for which a given consensus object can solve the consensus problem.
The consensus problem goes like this: for some number of threads n, propose some value p and decide on one of the proposed values d such that n threads agrees on d.
CAS is the most "powerful" consensus object because its consensus number is infinite. That is, CAS can be used to solve the consensus problem among a theoretically infinite number of threads. It even does it in a wait-free manner.
This can't be done with atomic registers, test-and-set, fetch-add, stacks since they all have finite consensus numbers. There are proofs for these consensus numbers but that is another story...
The significance of all of this is that it can be proven that there exists a wait-free implementation of an object for n threads using a consensus object with a consensus number of at least n. CAS is especially powerful because you can use it to implement wait-free objects for an arbitrary number of threads.
As to why other RMW operations are useful? Some problems in multiprocessing don't really involve solving the consensus problem for some arbitrary number of threads. For example, mutual exclusion can be solved using less powerful RMW operations like test-and-set (a simple TAS lock), fetch-add (ticket lock) or atomic swap (CLH lock).
More information on consensus for shared memory at Wikipedia Consensus (computer_science) section: In_shared-memory_systems
Also, there's a whole chapter on consensus and universal constructions in Herlihy and Shavit's The Art of Multiprocessor Programming (WorldCat) that I highly recommend.