(I have a feeling that the answers above still don't state the differences and relationships between string and []rune very clearly, so I am adding another answer with an example.)
As @Strangework's answer said, string and []rune are quite different.
Differences - string & []rune:
A string value is a read-only slice of bytes, and a string literal is encoded in UTF-8. Each character in such a string takes 1 to 4 bytes, while each rune always takes 4 bytes (rune is an alias for int32).
- For string, both len() and indexing are based on bytes.
- For []rune, both len() and indexing are based on runes (int32 values).
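To make the byte counts concrete, here is a minimal sketch that reports how many UTF-8 bytes each character of a string occupies. The helper name encodedLens and the sample string are my own, chosen only for illustration; utf8.RuneLen comes from the standard unicode/utf8 package.

```go
package main

import (
	"fmt"
	"unicode/utf8"
)

// encodedLens returns the number of UTF-8 bytes that each character of s
// occupies. (Hypothetical helper, for illustration only.)
func encodedLens(s string) []int {
	var lens []int
	for _, r := range s { // ranging over a string decodes one rune per step
		lens = append(lens, utf8.RuneLen(r))
	}
	return lens
}

func main() {
	s := "a\u00e9你\U0001F600" // a, é, 你, and an emoji

	fmt.Println(encodedLens(s)) // [1 2 3 4]
	fmt.Println(len(s))         // 10: bytes in the string
	fmt.Println(len([]rune(s))) // 4: runes, each stored as a 4-byte int32
}
```

So the same text is 10 bytes as a string but 4 * 4 = 16 bytes of element data as a []rune.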
Relationships - string & []rune:
- When you convert from string to []rune, each UTF-8 character in that string becomes one rune.
- Similarly, in the reverse conversion from []rune to string, each rune becomes one UTF-8 character in the string.
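The two conversions above can be sketched as a round trip. This is only an illustration: replaceRune is a hypothetical helper of mine, not something from the standard library, and it assumes the input is valid UTF-8 and the index is in range.

```go
package main

import "fmt"

// replaceRune returns a copy of s whose i-th character (counted in runes,
// not bytes) is replaced by r. Hypothetical helper, for illustration only.
func replaceRune(s string, i int, r rune) string {
	rs := []rune(s)   // decode: one rune per UTF-8 character
	rs[i] = r         // a slice is mutable, a string is not
	return string(rs) // encode: each rune becomes UTF-8 bytes again
}

func main() {
	s := "hello你好"

	// For valid UTF-8, converting there and back gives the same string.
	fmt.Println(string([]rune(s)) == s) // true

	// Index 5 is the 6th character, whatever its byte width.
	fmt.Println(replaceRune(s, 5, '您')) // hello您好
}
```

Going through []rune is a simple way to edit a string by character position, at the cost of copying the data twice.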
Tips:
- You can convert between string and []rune, but they are still different, in both type and overall size.
(Here is an example to show that more clearly.)
Code
string_rune_compare.go:
// Compare string and []rune.
package main

import "fmt"

// stringAndRuneCompare prints the type, length, and elements of a string
// and of the []rune converted from it.
func stringAndRuneCompare() {
	// string
	s := "hello你好"
	fmt.Printf("%s, type: %T, len: %d\n", s, s, len(s))
	fmt.Printf("s[%d]: %v, type: %T\n", 0, s[0], s[0])
	li := len(s) - 1 // last index
	fmt.Printf("s[%d]: %v, type: %T\n\n", li, s[li], s[li])

	// []rune
	rs := []rune(s)
	fmt.Printf("%v, type: %T, len: %d\n", rs, rs, len(rs))
}

func main() {
	stringAndRuneCompare()
}
Execute:
go run string_rune_compare.go
Output:
hello你好, type: string, len: 11
s[0]: 104, type: uint8
s[10]: 189, type: uint8
[104 101 108 108 111 20320 22909], type: []int32, len: 7
Explanation:
The string hello你好 has length 11, because the first 5 characters each take only 1 byte, while the last 2 Chinese characters each take 3 bytes.
- Thus, total bytes = 5 * 1 + 2 * 3 = 11.
- Since len() on a string counts bytes, the first line prints len: 11.
- Since indexing a string is also byte-based, the following 2 lines print values of type uint8 (byte is an alias for uint8 in Go).
When the string is converted to []rune, the conversion finds 7 UTF-8 characters, and thus produces 7 runes.
- Since len() on []rune counts runes, the last line prints len: 7.
- If you index into a []rune, you access it rune by rune.
Since each rune comes from one UTF-8 character in the original string, you can also say that both len() and indexing on []rune are based on UTF-8 characters.
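As a small follow-up sketch on indexing, the snippet below contrasts byte indexing on the string with rune indexing on the slice, using the same hello你好 string. The runeOffsets helper is a name I made up for illustration; utf8.RuneCountInString is from the standard unicode/utf8 package.

```go
package main

import (
	"fmt"
	"unicode/utf8"
)

// runeOffsets returns the byte offset at which each character of s starts.
// (Hypothetical helper, for illustration only.)
func runeOffsets(s string) []int {
	var offs []int
	for i := range s { // i is a byte offset; it advances by each character's width
		offs = append(offs, i)
	}
	return offs
}

func main() {
	s := "hello你好"
	rs := []rune(s)

	fmt.Println(s[5])           // 228: only the first byte of 你
	fmt.Println(string(rs[5]))  // 你: the whole character
	fmt.Println(runeOffsets(s)) // [0 1 2 3 4 5 8]

	// Counts characters without allocating a []rune; equals len(rs) here.
	fmt.Println(utf8.RuneCountInString(s)) // 7
}
```

If you only need the character count or want to walk the characters in order, ranging over the string or calling utf8.RuneCountInString avoids building the []rune at all.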
[]rune can be set to a boolean, numeric, or string type. See stackoverflow.com/a/62739051/12817546. – Tom L