Mutable strings in Golang – Koki – Medium

 6 years ago
source link: https://medium.com/kokster/mutable-strings-in-golang-298d422d01bc

Golang strings are immutable. In general, immutable data is simpler to reason about, but it also means your program must allocate more memory to “change” that data. Sometimes, your program can’t afford that luxury. For example, there might not be any more memory to allocate. Another reason: you don’t want to create more work for the garbage collector.

In C, a string is a null-terminated sequence of chars — char*. Each char is a single byte, and the string keeps going until there’s a '\0' character. If you pointed at an arbitrary memory location and called it a C string, you’d see every byte in order until you hit a zero.

In Go, string is its own data type. At its core, it’s still a sequence of bytes, but:

  • It’s a fixed length. It doesn’t just continue on until a zero appears.
  • It comes with extra information: its length.
  • “Characters” or runes may span multiple bytes.
  • It’s immutable.
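The differences in this list can be observed directly. The following is a small sketch, not code from the original post; byteAndRuneCount is a hypothetical helper name introduced only for illustration. It shows that a zero byte does not end a Go string and that the byte length and the rune count can differ.

```go
package main

import (
	"fmt"
	"unicode/utf8"
)

// byteAndRuneCount is a hypothetical helper: it returns the length of s
// in bytes and the number of runes (Unicode code points) it contains.
func byteAndRuneCount(s string) (int, int) {
	return len(s), utf8.RuneCountInString(s)
}

func main() {
	// A zero byte is ordinary data; it does not terminate the string.
	withZero := "ab\x00cd"
	fmt.Println(len(withZero)) // 5

	// U+00E9 (e with acute accent) takes two bytes in UTF-8,
	// so the byte count and the rune count differ.
	b, r := byteAndRuneCount("h\u00e9llo")
	fmt.Println(b, r) // 6 5
}
```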

So string in Go carries some additional structure compared to char* in C. How does it do this? It’s actually a struct:

type StringHeader struct {
	Data unsafe.Pointer
	Len  int
}

Data here is analogous to the C string, and Len is the length. Go lays out struct fields in declaration order, so if you were to look at a string under the microscope, you’d see a pointer to the string's contents first and then Len. (The reflect package documents similar header structs, reflect.StringHeader and reflect.SliceHeader, whose Data field is a uintptr rather than an unsafe.Pointer.)
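One way to check the field order yourself is to ask the compiler for the field offsets. This is a minimal sketch that redeclares the StringHeader struct from above; fieldOffsets is a hypothetical helper name, not part of the original post.

```go
package main

import (
	"fmt"
	"unsafe"
)

// StringHeader mirrors the struct shown in the article.
type StringHeader struct {
	Data unsafe.Pointer
	Len  int
}

// fieldOffsets (hypothetical helper) reports the byte offsets of the
// Data and Len fields within StringHeader.
func fieldOffsets() (uintptr, uintptr) {
	var h StringHeader
	return unsafe.Offsetof(h.Data), unsafe.Offsetof(h.Len)
}

func main() {
	dataOff, lenOff := fieldOffsets()
	// Data sits at offset 0; Len follows one pointer width later.
	fmt.Println(dataOff, lenOff) // 0 8 on a 64-bit platform
}
```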

Before we start inspecting strings by looking at their StringHeader fields, how do we cast a string to a StringHeader in the first place? When you really need to convert from one Go type to another, use the unsafe package:

import (
	"unsafe"
)

s := "hello"
header := (*StringHeader)(unsafe.Pointer(&s))

unsafe.Pointer is an untyped pointer. It can point to any kind of value. It’s a way to tell the compiler, “Step aside. I know what I’m doing.” In this case, what we’re doing is converting a *string into an unsafe.Pointer into a *StringHeader.
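The same pointer-to-unsafe.Pointer-to-pointer pattern works for any two types with a compatible memory layout. As an illustrative sketch (floatBits is a hypothetical name, and this example is not from the original post), here it reinterprets a float64 as a uint64 and compares the result with the standard library's math.Float64bits:

```go
package main

import (
	"fmt"
	"math"
	"unsafe"
)

// floatBits (hypothetical helper) reinterprets the memory of a float64
// as a uint64 by converting *float64 -> unsafe.Pointer -> *uint64.
func floatBits(f float64) uint64 {
	return *(*uint64)(unsafe.Pointer(&f))
}

func main() {
	fmt.Printf("%#x\n", floatBits(1.0))                   // 0x3ff0000000000000
	fmt.Println(floatBits(1.0) == math.Float64bits(1.0)) // true
}
```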

[Figure: break glass, access underlying representation]

Now we have access to the underlying representation of the string. Ever wondered how len("hello") works? We can implement it ourselves:

func strLen(s string) int {
	// reinterpret the string as its header and read the length field
	header := (*StringHeader)(unsafe.Pointer(&s))
	return header.Len
}

Getting the length of a string is nice, but what about setting it? Here’s what happens if we artificially extend the length of a string:

s := "hello"
header := (*StringHeader)(unsafe.Pointer(&s))
header.Len = 100

// cast the header back to 'string' and print it
fmt.Print(*(*string)(unsafe.Pointer(header)))

/* on stdout:
helloint16int32int64panicslicestartuint8write (MB)
Value addr= code= ctxt: curg= list= m->p= p->m=
*/

By changing the Len field of the string header, we can expand the string to include other parts of memory. It’s interesting to observe this behavior, but it’s not something you’d actually want to use.
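Shrinking Len, by contrast, stays inside the string's own bytes. The sketch below is an assumption-laden illustration, not the post's code: prefix is a hypothetical helper that clamps n to the valid range and then lowers Len on a local copy of the header. In ordinary code, the slice expression s[:n] gives the same result without unsafe.

```go
package main

import (
	"fmt"
	"unsafe"
)

// StringHeader mirrors the struct shown in the article.
type StringHeader struct {
	Data unsafe.Pointer
	Len  int
}

// prefix (hypothetical helper) returns the first n bytes of s by
// lowering the Len field of the header. n is clamped to [0, len(s)],
// so the result never reaches past the original string's bytes.
func prefix(s string, n int) string {
	if n < 0 {
		n = 0
	}
	if n > len(s) {
		n = len(s)
	}
	// s is this function's own copy of the header, so the caller's
	// string keeps its original length.
	header := (*StringHeader)(unsafe.Pointer(&s))
	header.Len = n
	return s
}

func main() {
	original := "hello"
	fmt.Println(prefix(original, 2)) // he
	fmt.Println(original)            // hello
}
```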

Data :: unsafe.Pointer

You may have noticed that StringHeader has an unsafe.Pointer field which points to the string’s sequence of bytes. []byte also has a sequence of bytes. In fact, we can build a []byte from this pointer. Here’s what a slice actually looks like:

type SliceHeader struct {
	Data unsafe.Pointer
	Len  int
	Cap  int
}

It’s a lot like StringHeader, except it also has a Cap (capacity) field. What happens if we build a SliceHeader from the fields of a StringHeader?

func strToBytes(s string) []byte {
	header := (*StringHeader)(unsafe.Pointer(&s))
	bytesHeader := &SliceHeader{
		Data: header.Data,
		Len:  header.Len,
		Cap:  header.Len,
	}
	return *(*[]byte)(unsafe.Pointer(bytesHeader))
}

fmt.Print(strToBytes("hello")) // [104 101 108 108 111]

We’ve converted a string into a []byte. It’s just as easy to go the other direction:

func bytesToStr(b []byte) string {
	header := (*SliceHeader)(unsafe.Pointer(&b))
	strHeader := &StringHeader{
		Data: header.Data,
		Len:  header.Len,
	}
	return *(*string)(unsafe.Pointer(strHeader))
}

fmt.Print(bytesToStr([]byte{104, 101, 108, 108, 111})) // "hello"

Both string and []byte headers are using the same Data pointer, so they share memory. If you ever need to convert between string and []byte but there isn’t enough memory to perform a copy, this might be useful.
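Since Go 1.20, the unsafe package offers helper functions for the same zero-copy conversions without hand-written header structs. The following is a sketch under that assumption (it needs Go 1.20 or later); the function names strToBytesModern and bytesToStrModern are hypothetical, and the same sharing caveats apply as for the header-based versions.

```go
package main

import (
	"fmt"
	"unsafe"
)

// strToBytesModern (hypothetical name) returns a []byte that shares
// memory with s. The result must never be modified.
func strToBytesModern(s string) []byte {
	return unsafe.Slice(unsafe.StringData(s), len(s))
}

// bytesToStrModern (hypothetical name) returns a string that shares
// memory with b. Later writes to b show up in the string.
func bytesToStrModern(b []byte) string {
	return unsafe.String(unsafe.SliceData(b), len(b))
}

func main() {
	fmt.Println(strToBytesModern("hello")) // [104 101 108 108 111]

	b := []byte("hello")
	s := bytesToStrModern(b)
	b[0] = 'd'
	fmt.Println(s) // dello
}
```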

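A word of caution, however: string is meant to be immutable, but []byte is not. If you cast a string to []byte and try to modify the bytes, the program can crash with a segmentation fault, because string literals like the one below are typically stored in read-only memory.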

s := "hello"
b := strToBytes(s)
b[0] = 100

// panic: runtime error: invalid memory address or nil pointer dereference
// [signal SIGSEGV: segmentation violation code=0xffffffff addr=0x0 pc=0xd56a2]

Casting in the other direction doesn’t cause a segmentation fault, but then your supposedly immutable string can change:

b := []byte{104, 101, 108, 108, 111}
s := bytesToStr(b)
fmt.Print(s) // "hello"
b[0] = 100
fmt.Print(s) // "dello"
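For contrast, the built-in conversion string(b) copies the bytes, which is why a string made that way stays unchanged. A minimal sketch, with copyToString as a hypothetical wrapper name used only for illustration:

```go
package main

import "fmt"

// copyToString (hypothetical helper) uses the ordinary conversion,
// which allocates a new string and copies the bytes of b into it.
func copyToString(b []byte) string {
	return string(b)
}

func main() {
	b := []byte{104, 101, 108, 108, 111}
	s := copyToString(b)
	b[0] = 100
	fmt.Println(s)         // hello
	fmt.Println(string(b)) // dello
}
```

The cost of that safety is the extra allocation and copy that the unsafe versions avoid.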

Try it out

That’s a little introduction to what you can do with unsafe.Pointer and some knowledge of the underlying representation of Go types. If you’d like to play around with the code from this post (and a substr implementation), have a look at the online Go Playground here: play.golang.org/p/PAjwbct_ohF

