My approach would be to study the data: how would I separate the points in the XOR problem? When I started studying M.L. in general, and SVMs in particular, that is what I did: take a toy problem, draw it by hand, and try to separate the classes.
When I looked at the XOR problem the first time, it occurred to me that both purple points (below, left) have X and Y of the same sign, in one case both negative and in the other both positive, whereas both green points have X and Y of opposite signs. Therefore |X + Y| would be 0 (or very small with a bit of noise in the initial problem) for the green points, and 2 (or nearly 2) for the purple ones. Hence, adding a third coordinate Z = np.sqrt(np.square(X + Y)) will nicely separate the two sets:

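In code, a minimal sketch of the idea, assuming the four noiseless XOR points at (+/-1, +/-1):

```python
import numpy as np

# Toy XOR set: purple points have X and Y of the same sign,
# green points have X and Y of opposite signs.
X = np.array([[-1, -1], [1, 1],    # purple (class 0)
              [1, -1], [-1, 1]])   # green (class 1)

# Third coordinate: Z = sqrt((X + Y)^2) = |X + Y|
Z = np.sqrt(np.square(X[:, 0] + X[:, 1]))
print(Z)  # -> [2. 2. 0. 0.]: purple at Z = 2, green at Z = 0
```

With the purple points lifted to Z = 2 and the green ones at Z = 0, a horizontal plane separates the two sets.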
On a side note, Z is not too dissimilar a formulation from doug's rbf, if you consider that np.sqrt(np.square(X + Y)) is essentially the same as np.abs(X + Y) in this case.
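That equivalence is easy to verify numerically, since the square root of a square is the element-wise absolute value:

```python
import numpy as np

# sqrt(v^2) == |v| for any real array, so the two Z formulations coincide.
v = np.linspace(-3, 3, 13)
assert np.allclose(np.sqrt(np.square(v)), np.abs(v))
```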
I do not have access to Cristianini's paper, but I'd approach that problem too in a similar way, starting with a toy version (by the way, checkerboard code thanks to none other than doug):

A possible intuition here is that the sum of row and column indices for the black squares would always be even, whereas for the white squares it would always be odd, so adding as a third dimension something like (row_index + col_index) % 2 would do the trick in this simple version. In a larger, more complex checkerboard dataset, like this one I found on the web:

things aren't so simple, but perhaps one could cascade clustering to find the 16 clusters' mean X and Y locations (perhaps using medoids clustering), then apply a version of the "modulo kernel trick"?
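Sketched in code, the parity trick plus that clustering cascade might look like the following. Note this is an assumed toy board (16 noisy blobs on a 4x4 grid), not the dataset from the web, and KMeans is used purely as a simple stand-in for a medoids clusterer:

```python
import numpy as np
from sklearn.cluster import KMeans

rng = np.random.default_rng(0)

# Assumed toy board: 16 tight blobs, one per square of a 4x4 checkerboard.
centers = np.array([[i + 0.5, j + 0.5] for i in range(4) for j in range(4)])
pts = np.vstack([c + rng.normal(0, 0.1, size=(25, 2)) for c in centers])
labels = (np.floor(pts[:, 0]).astype(int)
          + np.floor(pts[:, 1]).astype(int)) % 2   # square color per point

# Step 1: find the 16 cluster centers (KMeans standing in for medoids).
km = KMeans(n_clusters=16, n_init=10, random_state=0).fit(pts)

# Step 2: "modulo kernel trick" -- convert each fitted center to (row, col)
# grid indices and take the parity of their sum.
grid = np.floor(km.cluster_centers_).astype(int)
parity = (grid[:, 0] + grid[:, 1]) % 2

# Third dimension per point: the parity of the cluster it belongs to.
Z = parity[km.labels_]
```

With blobs this tight, Z reproduces the square colors, so a linear separator along the Z axis splits the two classes.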
With the disclaimer that I have not worked with a ton of classification problems, so far I have found that in making a toy version of a complex one, I've usually gained a 'numerical' intuition as to the kind of solution that might work.
Finally, as posted in a comment to doug's answer, I don't find anything wrong with an empirical approach like his: studying the performance of all possible kernels by passing them to a grid search in nested cross-validation with the same algorithm (SVC) and changing only the kernel. You could add to that approach by plotting the respective margins in the transformed feature spaces: for example, for rbf, using the equation suggested by doug (and Sebastian Raschka's routine for plotting decision regions - cell 13 here).
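A minimal sketch of that nested setup, assuming scikit-learn and a stand-in dataset (make_moons) in place of the actual problem:

```python
import numpy as np
from sklearn.datasets import make_moons
from sklearn.model_selection import GridSearchCV, cross_val_score
from sklearn.svm import SVC

# Stand-in dataset; substitute your own X, y.
X, y = make_moons(n_samples=200, noise=0.2, random_state=0)

# Same algorithm (SVC) throughout; only the kernel changes.
param_grid = {'kernel': ['linear', 'poly', 'rbf', 'sigmoid']}

# Inner loop picks the kernel, outer loop scores that choice.
inner = GridSearchCV(SVC(), param_grid, cv=5)
outer_scores = cross_val_score(inner, X, y, cv=5)
print(outer_scores.mean())
```

The outer cross-validation gives a less optimistic estimate of the chosen kernel's performance than scoring the grid search on the same folds it used for selection.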
UPDATE, October 27, 2017
In a conversation on my slack channel, another geophysicist asked me about the case in which the XOR gate is defined with 0s and 1s as opposed to -1s and 1s (the latter being similar to a classic problem in exploration geophysics, hence my initial toy example).
If I were to tackle the XOR gate with 0s and 1s, and did not have the knowledge of the rbf kernel at my disposal, in this case too I'd sit and look at the problem in terms of the coordinates of those points and see if I could come up with a transformation.

My first observation here was that the Os sit on the x = y line, while the Xs sit off it, on the lines x - y = +/-1, so the difference x - y would be 0 (or small with a bit of noise) in one case, and +/-1 in the other. The absolute value takes care of the sign, hence Z = np.abs(X - Y) would work. Which, by the way, is very similar to doug's rbf = np.exp(-gamma * np.abs(x - y)**2) (another reason to upvote his answer); and in fact, his rbf is a more general solution, working in all XOR cases.
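The same kind of minimal sketch as before, now for the 0/1 version of the gate:

```python
import numpy as np

# XOR gate with 0/1 inputs: Os at (0, 0) and (1, 1), Xs at (0, 1) and (1, 0).
X = np.array([[0, 0], [1, 1],    # Os (class 0), on the x = y line
              [0, 1], [1, 0]])   # Xs (class 1), where x - y = +/-1

Z = np.abs(X[:, 0] - X[:, 1])
print(Z)  # -> [0 0 1 1]: the classes separate cleanly along Z
```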