More Research Problems of Implementing Go
Dmitry Vyukov
Dmitry Vyukov
Go is an open source programming language that makes it easy to build simple, reliable, and efficient software.
Design began in late 2007.
Became open source in November 2009.
Developed entirely in the open; very active community.
Language stable as of Go 1, early 2012.
Work continues.
Started as an answer to software problems at Google:
Deployed: parts of YouTube, dl.google.com, Blogger, Google Code, Google Fiber, ...
4A simple but powerful and fun language.
For more background on design:
5Go is designed for building production systems at Google.
Plenty of research questions about how to implement Go well.
Go provides two important concepts:
A goroutine is a thread of control within the program, with its own local variables and stack. Cheap, easy to create.
A channel carries typed messages between goroutines.
8package main import "fmt" func main() { c := make(chan string) go func() { c <- "Hello" c <- "World" }() fmt.Println(<-c, <-c) }
Channels adopted from Hoare's Communicating Sequential Processes.
Go enables simple, safe concurrent programming.
It doesn't forbid bad programming.
Caveat: not purely memory safe; sharing is legal.
Passing a pointer over a channel is idiomatic.
Experience shows this is practical.
10Sequential network address resolution, given a work list:
// +build ignore,OMIT
package main
import (
"fmt"
"math/rand"
"time"
)
func lookup() {
for _, w := range worklist { w.addrs, w.err = LookupHost(w.host) }
}
func main() {
rand.Seed(time.Now().UnixNano())
t0 := time.Now()
lookup()
fmt.Printf("\n")
for _, w := range worklist {
if w.err != nil {
fmt.Printf("%s: error: %v\n", w.host, w.err)
continue
}
fmt.Printf("%s: %v\n", w.host, w.addrs)
}
fmt.Printf("total lookup time: %.3f seconds\n", time.Since(t0).Seconds())
}
var worklist = []*Work{
{host: "fast.com"},
{host: "slow.com"},
{host: "fast.missing.com"},
{host: "slow.missing.com"},
}
type Work struct {
host string
addrs []string
err error
}
func LookupHost(name string) (addrs []string, err error) {
t0 := time.Now()
defer func() {
fmt.Printf("lookup %s: %.3f seconds\n", name, time.Since(t0).Seconds())
}()
h := hosts[name]
if h == nil {
h = failure
}
return h(name)
}
type resolver func(string) ([]string, error)
var hosts = map[string]resolver{
"fast.com": delay(10*time.Millisecond, fixedAddrs("10.0.0.1")),
"slow.com": delay(2*time.Second, fixedAddrs("10.0.0.4")),
"fast.missing.com": delay(10*time.Millisecond, failure),
"slow.missing.com": delay(2*time.Second, failure),
}
func fixedAddrs(addrs ...string) resolver {
return func(string) ([]string, error) {
return addrs, nil
}
}
func delay(d time.Duration, f resolver) resolver {
return func(name string) ([]string, error) {
time.Sleep(d/2 + time.Duration(rand.Int63n(int64(d/2))))
return f(name)
}
}
func failure(name string) ([]string, error) {
return nil, fmt.Errorf("unknown host %v", name)
}
Concurrent network address resolution, given a work list:
// +build ignore,OMIT
package main
import (
"fmt"
"math/rand"
"time"
)
func lookup() {
done := make(chan bool) for _, w := range worklist { go func(w *Work) { w.addrs, w.err = LookupHost(w.host) done <- true }(w) } for range worklist { <-done }
}
func main() {
rand.Seed(time.Now().UnixNano())
t0 := time.Now()
lookup()
fmt.Printf("\n")
for _, w := range worklist {
if w.err != nil {
fmt.Printf("%s: error: %v\n", w.host, w.err)
continue
}
fmt.Printf("%s: %v\n", w.host, w.addrs)
}
fmt.Printf("total lookup time: %.3f seconds\n", time.Since(t0).Seconds())
}
var worklist = []*Work{
{host: "fast.com"},
{host: "slow.com"},
{host: "fast.missing.com"},
{host: "slow.missing.com"},
}
type Work struct {
host string
addrs []string
err error
}
func LookupHost(name string) (addrs []string, err error) {
t0 := time.Now()
defer func() {
fmt.Printf("lookup %s: %.3f seconds\n", name, time.Since(t0).Seconds())
}()
h := hosts[name]
if h == nil {
h = failure
}
return h(name)
}
type resolver func(string) ([]string, error)
var hosts = map[string]resolver{
"fast.com": delay(10*time.Millisecond, fixedAddrs("10.0.0.1")),
"slow.com": delay(2*time.Second, fixedAddrs("10.0.0.4")),
"fast.missing.com": delay(10*time.Millisecond, failure),
"slow.missing.com": delay(2*time.Second, failure),
}
func fixedAddrs(addrs ...string) resolver {
return func(string) ([]string, error) {
return addrs, nil
}
}
func delay(d time.Duration, f resolver) resolver {
return func(name string) ([]string, error) {
time.Sleep(d/2 + time.Duration(rand.Int63n(int64(d/2))))
return f(name)
}
}
func failure(name string) ([]string, error) {
return nil, fmt.Errorf("unknown host %v", name)
}
Select statements: switch for communication.
package main
import (
"fmt"
"time"
)
func main() {
c1 := make(chan int, 1)
c2 := make(chan int, 1)
c1 <- 42
select { case v := <-c1: fmt.Println("received from c1: ", v) case c2 <- 1: fmt.Println("sent to c2") case <-time.After(time.Second): fmt.Println("timed out") default: fmt.Println("nothing ready at the moment") }
}
That's select that makes efficient implementation difficult.
13Challenge: Make channel communication scale
Research question: lock-free channels?
14On the one hand we have arbitrary user programs:
No user hints!
16On the other hand we have complex hardware topology:
Challenge: make it all magically work efficiently
Current scheduler:
┌─┐ ┌─┐ ┌─┐ ┌─┐ ┌─┐ │ │ │ │ │ │ │ │ │ │ ├─┤ ├─┤ ├─┤ ├─┤ ├─┤ Global │ │ │G│ │ │ │ │ │ │ state ├─┤ ├─┤ ├─┤ ├─┤ ├─┤ │G│ │G│ │G│ │ │ │G│ ├─┤ ├─┤ ├─┤ ├─┤ ├─┤ │G│ │G│ │G│ │G│ │G│ └┬┘ └┬┘ └┬┘ └┬┘ └─┘ │ │ │ │ ↓ ↓ ↓ ↓ ┌─┬──────┐ ┌─┬──────┐ ┌─┬──────┐ ┌─┬──────┐ ┌────┐┌──────┐┌───────┐ │P│mcache│ │P│mcache│ │P│mcache│ │P│mcache│ │heap││timers││netpoll│ └┬┴──────┘ └┬┴──────┘ └┬┴──────┘ └┬┴──────┘ └────┘└──────┘└───────┘ │ │ │ │ ↓ ↓ ↓ ↓ ┌─┐ ┌─┐ ┌─┐ ┌─┐ ┌─┐ ┌─┐ ┌─┐ │M│ │M│ │M│ │M│ │M│ │M│ │M│ └─┘ └─┘ └─┘ └─┘ └─┘ └─┘ └─┘
G - goroutine; P - logical processor; M - OS thread (machine)
19Want:
Garbage collection simplifies APIs.
Fundamental to concurrency: too hard to track ownership otherwise.
Fundamental to interfaces: memory management details do not bifurcate otherwise-similar APIs.
Of course, adds cost, latency, complexity in run time system.
22Plenty of research about garbage collection, mostly in Java context.
Java collectors usually:
But Go is very different!
type Point struct { X, Y int } type Rectangle struct { Min, Max Point }
p := &rect.Max
Current GC: stop the world, parallel mark, start the world, concurrent sweep.
Concurrent mark is almost ready.
Cannot reuse Java GC algorithms directly.
Research question: what GC algorithm is the best fit for Go?
Do we need generations? Do we need compaction? What are efficient data structures that support interior pointers?
Based on ThreadSanitizer runtime, originally mainly targeted C/C++.
Traditional happens-before race detector based on vector clocks (devil in details!).
Works fine for Go, except:
$ go run -race lots_of_goroutines.go race: limit on 8192 simultaneously alive goroutines is exceeded, dying
Research question: race dectector that efficiently supports hundreds of thousands of goroutines?
27Deadlock on mutexes due to lock order inversion:
// thread 1 // thread 2 pthread_mutex_lock(&m1); pthread_mutex_lock(&m2); pthread_mutex_lock(&m2); pthread_mutex_lock(&m1); ... ... pthread_mutex_unlock(&m2); pthread_mutex_unlock(&m1); pthread_mutex_unlock(&m1); pthread_mutex_unlock(&m2);
Lock order inversions are easy to detect:
Go has channels and mutexes. Channels are semaphores. A mutex can be unlocked in
a different goroutine, so it is essentially a binary semaphore too.
Deadlock example:
// Parallel file tree walk. func worker(pendingItems chan os.FileInfo) for f := range pendingItems { if f.IsDir() { filepath.Walk(f.Name(), func(path string, info os.FileInfo, err error) error { pendingItems <- info }) } else { visit(f) } } }
pendingItems channel has limited capacity. All workers can block on send to pendingItems.
29Another deadlock example:
var ( c = make(chan T, 100) mtx sync.RWMutex ) // goroutine 1 // goroutine 2 // goroutine 3 // does send // does receive // "resizes" the channel mtx.RLock() mtx.RLock() mtx.Lock() c <- v v := <-c tmp := make(chan T, 200) mtx.RUnlock() mtx.RUnlock() copyAll(c, tmp) c = tmp mtx.Unlock()
RWMutex is fair for both readers and writers: when a writer arrives, new readers are not let to enter the critical section.
Goroutine 1 blocks on chan send; then goroutine 3 blocks on mtx.Lock; then goroutine 2 blocks on mtx.RLock.
Research question: how to detect deadlocks on semaphores?
No known theory to date.
31So now we have a new language with several complex implementations:
How do you test it?
33Csmith is a tool that generates random C programs that statically and dynamically conform to the C99 standard.
Gosmith is a tool that generates random Go programs that statically and dynamically conform to the Go standard.
Turned out to be much simpler than C: no undefined behavior all around!
But generates uninteresting programs from execution point of view: most of them deadlock or crash on nil deref.
Trophies:
Research question: how to generate random interesting concurrent Go programs?
Must:
Must not:
Plenty of research questions about how to implement Go well.
Dmitry Vyukov