30
votes

The signature of the String method for percent-escaping is:

func addingPercentEncoding(withAllowedCharacters: CharacterSet)
    -> String?

(This was stringByAddingPercentEncodingWithAllowedCharacters in Swift 2.)

Why does this method return an optional?

The documentation says that the method returns nil “if the transformation is not possible,” but it's unclear under what circumstances the escaping transformation could fail:

  • Characters are escaped using UTF-8, which is a complete Unicode encoding. Any valid Unicode character can be encoded using UTF-8, and thus can be escaped.

  • I thought perhaps the method applied some kind of sanity check for bad interactions between the set of allowed chars and the chars used for escaping, but this is not the case: the method succeeds no matter whether the set of allowed chars contains "%", and also succeeds if the allowed char set is empty.
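The two checks described above can be reproduced with a minimal sketch like the following. The sample string and the variable names are my own choices for illustration, not anything from the documentation.

```swift
import Foundation

// Illustrative input (an assumption): ASCII, a literal "%", and a non-ASCII character.
let input = "50% off café"

// Allowed set contains "%" itself: the call still returns a value.
let percentAllowed = input.addingPercentEncoding(
    withAllowedCharacters: CharacterSet(charactersIn: "%"))

// Empty allowed set: the call still returns a value.
let nothingAllowed = input.addingPercentEncoding(
    withAllowedCharacters: CharacterSet())

print(percentAllowed ?? "nil")
print(nothingAllowed ?? "nil")
```

Neither call returns nil, so an unusual allowed set does not seem to be what the optional is guarding against.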

As it stands, the optional return value appears to force a nonsensical error check.

2 Answers

44
votes

I filed a bug report with Apple about this, and heard back — with a very helpful response, no less!

Turns out (much to my surprise) that it’s possible to successfully create Swift strings that contain invalid Unicode in the form of unpaired UTF-16 surrogate chars. Such a string can cause UTF-8 encoding to fail. Here’s some code that illustrates this behavior:

import Foundation

// Succeeds (wat?!), even though 0xD800 is an unpaired high surrogate:
let str = String(
    bytes: [0xD8, 0x00] as [UInt8],
    encoding: .utf16BigEndian)!

// Returns nil:
str.addingPercentEncoding(withAllowedCharacters: .alphanumerics)
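Since nil is a real possibility, one way to deal with it is to turn the optional into a thrown error at a single call site. This is only a sketch: `PercentEscapeError` and `percentEscaped(_:allowing:)` are hypothetical names of my own, not Foundation API, and the default of `.urlQueryAllowed` is just an assumption about typical use.

```swift
import Foundation

// Hypothetical error type for this sketch (not part of Foundation).
enum PercentEscapeError: Error {
    case transformationFailed
}

// Hypothetical wrapper: surfaces the nil case as a thrown error
// instead of an optional that callers are tempted to force-unwrap.
func percentEscaped(
    _ value: String,
    allowing allowed: CharacterSet = .urlQueryAllowed
) throws -> String {
    guard let escaped = value.addingPercentEncoding(withAllowedCharacters: allowed) else {
        throw PercentEscapeError.transformationFailed
    }
    return escaped
}

// Ordinary input escapes as usual; the space becomes %20.
let escapedExample = try? percentEscaped("a b")
print(escapedExample ?? "nil")
```

Callers that really do not care can still write `try?`, but the failure mode is at least named and handled in one place.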
1
votes

Building on Paul Cantrell's answer, here is a small demonstration that the same method can also return nil in Objective-C, despite String and NSString being different beasts when it comes to encodings:

uint8_t bytes[2] = { 0xD8, 0x00 };
NSString *string = [[NSString alloc] initWithBytes:bytes length:2 encoding:NSUTF16BigEndianStringEncoding];
// \ud800
NSLog(@"%@", string);

NSString *escapedString = [string stringByAddingPercentEncodingWithAllowedCharacters:NSCharacterSet.URLHostAllowedCharacterSet];
// (null)
NSLog(@"%@", escapedString);

For fun, https://r12a.github.io/app-conversion/ percent-escapes the same string as:

Error%20in%20convertUTF162Char%3A%20low%20surrogate%20expected%2C%20b%3D0%21%00