None of the proposed answers works for surrogate pairs, which Java uses to encode characters outside the Unicode Basic Multilingual Plane (BMP).
Here is an example using three different techniques to iterate over the "characters" of a string (including the Java 8 Stream API). Note that this example includes characters from the Unicode Supplementary Multilingual Plane (SMP). You need a suitable font to display this example and its result correctly.
// String containing characters of the Unicode
// Supplementary Multilingual Plane (SMP)
// In that particular case, hieroglyphs.
String str = "The quick brown 𓃥 jumps over the lazy 𓊃𓍿𓅓𓃡";
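To make the problem concrete, here is a minimal sketch comparing the number of Java char values with the number of code points for a single supplementary character. The class SurrogatePairDemo and its two helpers are names made up for this illustration; U+13000 is simply the first code point of the Egyptian Hieroglyphs block.

```java
class SurrogatePairDemo {
    // Hypothetical helper: number of UTF-16 code units (Java `char` values).
    static int charUnits(String s) {
        return s.length();
    }

    // Hypothetical helper: number of Unicode code points.
    static int codePoints(String s) {
        return s.codePointCount(0, s.length());
    }

    public static void main(String[] args) {
        // U+13000 lies outside the BMP, so Java stores it as a surrogate pair.
        String glyph = new String(Character.toChars(0x13000));

        System.out.println(charUnits(glyph));   // 2
        System.out.println(codePoints(glyph));  // 1
        System.out.println(Character.isHighSurrogate(glyph.charAt(0))); // true
        System.out.println(Character.isLowSurrogate(glyph.charAt(1)));  // true
    }
}
```

A loop that steps through the string one char at a time therefore sees two halves of that hieroglyph, neither of which is a valid character on its own.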
Iterate over chars
The first solution is a simple loop over every char of the string:
/* 1 */
System.out.println(
        "\n\nUsing char iterator (do not work for surrogate pairs !)");
for (int pos = 0; pos < str.length(); ++pos) {
    char c = str.charAt(pos);
    System.out.printf("%s ", Character.toString(c));
    //                       ^^^^^^^^^^^^^^^^^^^^^
    //                       Convert to String as per OP request
}
Iterate over code points
The second solution also uses an explicit loop, but reads individual code points with codePointAt and advances the loop index by the value returned by charCount:
/* 2 */
System.out.println(
        "\n\nUsing Java 1.5 codePointAt(works as expected)");
for (int pos = 0; pos < str.length();) {
    int cp = str.codePointAt(pos);
    char[] chars = Character.toChars(cp);
    //             ^^^^^^^^^^^^^^^^^^^^^
    //             Convert to a `char[]`
    //             as code points outside the Unicode BMP
    //             will map to more than one Java `char`
    System.out.printf("%s ", new String(chars));
    //                       ^^^^^^^^^^^^^^^^^
    //                       Convert to String as per OP request
    pos += Character.charCount(cp);
    //     ^^^^^^^^^^^^^^^^^^^^^^^
    //     Increment pos by 1 or more depending on
    //     the number of Java `char` values required to
    //     encode that particular code point.
}
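If you want the pieces as String values rather than printed output, the same loop can be wrapped in a small helper. This is only a sketch: CodePointSplitter and split are hypothetical names, not part of the JDK.

```java
import java.util.ArrayList;
import java.util.List;

class CodePointSplitter {
    // Hypothetical helper: returns one String per Unicode code point.
    static List<String> split(String s) {
        List<String> result = new ArrayList<>();
        for (int pos = 0; pos < s.length(); ) {
            int cp = s.codePointAt(pos);
            // A supplementary code point becomes a two-char String.
            result.add(new String(Character.toChars(cp)));
            // Advance by 1 for BMP code points, by 2 for surrogate pairs.
            pos += Character.charCount(cp);
        }
        return result;
    }

    public static void main(String[] args) {
        String glyph = new String(Character.toChars(0x13000));
        List<String> parts = split("a" + glyph + "b");
        System.out.println(parts.size()); // 3
    }
}
```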
Iterate over code points using the Stream API
The third solution is basically the same as the second, but uses the Java 8 Stream API:
/* 3 */
System.out.println(
        "\n\nUsing Java 8 stream (works as expected)");
str.codePoints().forEach(
    cp -> {
        char[] chars = Character.toChars(cp);
        //             ^^^^^^^^^^^^^^^^^^^^^
        //             Convert to a `char[]`
        //             as code points outside the Unicode BMP
        //             will map to more than one Java `char`
        System.out.printf("%s ", new String(chars));
        //                       ^^^^^^^^^^^^^^^^^
        //                       Convert to String as per OP request
    });
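The stream variant can also collect the per-code-point strings instead of printing them. The sketch below assumes Java 8 or later; CodePointStreamDemo and split are hypothetical names chosen for the example.

```java
import java.util.List;
import java.util.stream.Collectors;

class CodePointStreamDemo {
    // Hypothetical helper: one String per code point, built with the Stream API.
    static List<String> split(String s) {
        return s.codePoints()                                      // IntStream of code points
                .mapToObj(cp -> new String(Character.toChars(cp))) // code point -> String
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        String glyph = new String(Character.toChars(0x13000));
        System.out.println(split("x" + glyph).size()); // 2
    }
}
```

On Java 11 and later, Character.toString(int) could replace the new String(Character.toChars(cp)) expression.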
Results
When you run that test program, you obtain:
Using char iterator (do not work for surrogate pairs !)
T h e q u i c k b r o w n ? ? j u m p s o v e r t h e l a z y ? ? ? ? ? ? ? ?
Using Java 1.5 codePointAt(works as expected)
T h e q u i c k b r o w n 𓃥 j u m p s o v e r t h e l a z y 𓊃 𓍿 𓅓 𓃡
Using Java 8 stream (works as expected)
T h e q u i c k b r o w n 𓃥 j u m p s o v e r t h e l a z y 𓊃 𓍿 𓅓 𓃡
As you can see (if you're able to display hieroglyphs properly), the first solution does not properly handle characters outside the Unicode BMP: each half of a surrogate pair is printed as a separate, invalid character. The other two solutions handle surrogate pairs correctly.
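One plausible explanation for the ? marks in the first result is that an unpaired surrogate cannot be encoded on its own, so the encoder substitutes a replacement. The sketch below shows this with String.getBytes and UTF-8; whether your console goes through exactly the same path is an assumption, and LoneSurrogateDemo and utf8 are hypothetical names.

```java
import java.nio.charset.StandardCharsets;

class LoneSurrogateDemo {
    // Hypothetical helper: encode a string as UTF-8 bytes.
    static byte[] utf8(String s) {
        return s.getBytes(StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        String glyph = new String(Character.toChars(0x13000));
        // Keep only the first half of the surrogate pair.
        String highOnly = String.valueOf(glyph.charAt(0));

        // The complete pair encodes to a 4-byte UTF-8 sequence.
        System.out.println(utf8(glyph).length);

        // A lone surrogate is malformed input; getBytes substitutes the
        // charset's replacement bytes instead of throwing an exception.
        byte[] replaced = utf8(highOnly);
        System.out.println(replaced.length + " byte(s), first = " + (char) replaced[0]);
    }
}
```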
char value? And he knows how to do substring() but just wants a "neater" way. FYI, I can say that substring() is the neatest way. – user845279
Character.toString fulfills all the necessary requirements and isn't messy at all. – Ricardo Altamirano
substring() directly. – user845279
endIndex (second parameter) of String.substring(int, int) is an exclusive index, and it won't throw an exception for index + 1 as long as index < length() -- which is true even for the last character in the string. – William Price