Update: Handling this in CSS is wonderfully simple and low overhead, but you have no control over where breaks occur when they do. That's fine if you don't care, or your data has long alphanumeric runs without any natural breaks. We had lots of long file paths, URLs, and phone numbers, all of which have places it's significantly better to break at than others.
Our solution was to first use a regex replacement to put a zero-width space (​) after every 15 (say) characters that aren't whitespace or one of the special characters where we'd prefer breaks. We then do another replacement to put a zero-width space after those special characters.
Zero-width spaces are nice, because they aren't ever visible on screen; shy hyphens were confusing when they showed, because the data has significant hyphens. Zero-width spaces also aren't included when you copy text out of the browser.
The special break characters we're currently using are period, forward slash, backslash, comma, underscore, @, |, and hyphen. You wouldn't think you'd need do anything to encourage breaking after hyphens, but Firefox (3.6 and 4 at least) doesn't break by itself at hyphens surrounded by numbers (like phone numbers).
We also wanted to control the number of characters between artificial breaks, based on the layout space available. That meant that the regex to match long non-breaking runs needed to be dynamic. This gets called a lot, and we didn't want to be creating the same identical regexes over and over for performance reasons, so we used a simple regex cache, keyed by the regex expression and its flags.
Here's the code; you'd probably namespace the functions in a utility package:
makeWrappable = function(str, position)
{
if (!str)
return '';
position = position || 15; // default to breaking after 15 chars
// matches every requested number of chars that's not whitespace or one of the special chars defined below
var longRunsRegex = cachedRegex('([^\\s\\.\/\\,_@\\|-]{' + position + '})(?=[^\\s\\.\/\\,_@\\|-])', 'g');
return str
.replace(longRunsRegex, '$1​') // put a zero-width space every requested number of chars that's not whitespace or a special char
.replace(makeWrappable.SPECIAL_CHARS_REGEX, '$1​'); // and one after special chars we want to allow breaking after
};
makeWrappable.SPECIAL_CHARS_REGEX = /([\.\/\\,_@\|-])/g; // period, forward slash, backslash, comma, underscore, @, |, hyphen
cachedRegex = function(reString, reFlags)
{
var key = reString + (reFlags ? ':::' + reFlags : '');
if (!cachedRegex.cache[key])
cachedRegex.cache[key] = new RegExp(reString, reFlags);
return cachedRegex.cache[key];
};
cachedRegex.cache = {};
Test like this:
makeWrappable('12345678901234567890 12345678901234567890 1234567890/1234567890')
Update 2: It appears that zero-width spaces in fact are included in copied text in at least some circumstances, you just can't see them. Obviously, encouraging people to copy text with hidden characters in it is an invitation to have data like that entered into other programs or systems, even your own, where it may cause problems. For instance, if it ends up in a database, searches against it may fail, and search strings like this are likely to fail too. Using arrow keys to move through data like this requires (rightly) an extra keypress to move across the character you can't see, somewhat bizarre for users if they notice.
In a closed system, you can filter that character out on input to protect yourself, but that doesn't help other programs and systems.
All told, this technique works well, but I'm not certain what the best choice of break-causing character would be.
Update 3: Having this character end up in data is no longer a theoretical possibility, it's an observed problem. Users submit data copied off the screen, it gets saved in the db, searches break, things sort weirdly etc..
We did two things:
- Wrote a utility to remove them from all columns of all tables in all datasources for this app.
- Added filtering to remove it to our standard string input processor, so it's gone by the time any code sees it.
This works well, as does the technique itself, but it's a cautionary tale.
Update 4: We're using this in a context where the data fed to this may be HTML escaped. Under the right circumstances, it can insert zero-width spaces in the middle of HTML entities, with funky results.
Fix was to add ampersand to the list of characters we don't break on, like this:
var longRunsRegex = cachedRegex('([^&\\s\\.\/\\,_@\\|-]{' + position + '})(?=[^&\\s\\.\/\\,_@\\|-])', 'g');