More Research Problems of Implementing Go

Dmitry Vyukov

Google

About Go

Go is an open source programming language that makes it easy to build simple, reliable, and efficient software.

Design began in late 2007.

Became open source in November 2009.

Developed entirely in the open; very active community.
Language stable as of Go 1, early 2012.
Work continues.

2

Motivation for Go

3

Motivation for Go

Started as an answer to software problems at Google:

Deployed: parts of YouTube, dl.google.com, Blogger, Google Code, Google Fiber, ...

4

Go

A simple but powerful and fun language.

For more background on design:

5

Research and Go

Go is designed for building production systems at Google.

Plenty of research questions about how to implement Go well.

6

Concurrency

7

Concurrency

Go provides two important concepts:

A goroutine is a thread of control within the program, with its own local variables and stack. Cheap, easy to create.

A channel carries typed messages between goroutines.

8

Concurrency

package main

import "fmt"

func main() {
    c := make(chan string)
    go func() {
        c <- "Hello"
        c <- "World"
    }()
    fmt.Println(<-c, <-c)
}
9

Concurrency: CSP

Channels adopted from Hoare's Communicating Sequential Processes.

Go enables simple, safe concurrent programming.
It doesn't forbid bad programming.

Caveat: not purely memory safe; sharing is legal.
Passing a pointer over a channel is idiomatic.

Experience shows this is practical.

10

Concurrency

Sequential network address resolution, given a work list:

// +build ignore,OMIT

package main

import (
	"fmt"
	"math/rand"
	"time"
)

func lookup() {
    for _, w := range worklist {
        w.addrs, w.err = LookupHost(w.host)
    }
}

func main() {
	rand.Seed(time.Now().UnixNano())

	t0 := time.Now()
	lookup()

	fmt.Printf("\n")
	for _, w := range worklist {
		if w.err != nil {
			fmt.Printf("%s: error: %v\n", w.host, w.err)
			continue
		}
		fmt.Printf("%s: %v\n", w.host, w.addrs)
	}
	fmt.Printf("total lookup time: %.3f seconds\n", time.Since(t0).Seconds())
}

var worklist = []*Work{
	{host: "fast.com"},
	{host: "slow.com"},
	{host: "fast.missing.com"},
	{host: "slow.missing.com"},
}

type Work struct {
	host  string
	addrs []string
	err   error
}

func LookupHost(name string) (addrs []string, err error) {
	t0 := time.Now()
	defer func() {
		fmt.Printf("lookup %s: %.3f seconds\n", name, time.Since(t0).Seconds())
	}()
	h := hosts[name]
	if h == nil {
		h = failure
	}
	return h(name)
}

type resolver func(string) ([]string, error)

var hosts = map[string]resolver{
	"fast.com":         delay(10*time.Millisecond, fixedAddrs("10.0.0.1")),
	"slow.com":         delay(2*time.Second, fixedAddrs("10.0.0.4")),
	"fast.missing.com": delay(10*time.Millisecond, failure),
	"slow.missing.com": delay(2*time.Second, failure),
}

func fixedAddrs(addrs ...string) resolver {
	return func(string) ([]string, error) {
		return addrs, nil
	}
}

func delay(d time.Duration, f resolver) resolver {
	return func(name string) ([]string, error) {
		time.Sleep(d/2 + time.Duration(rand.Int63n(int64(d/2))))
		return f(name)
	}
}

func failure(name string) ([]string, error) {
	return nil, fmt.Errorf("unknown host %v", name)
}
11

Concurrency

Concurrent network address resolution, given a work list:

// +build ignore,OMIT

package main

import (
	"fmt"
	"math/rand"
	"time"
)

func lookup() {
    done := make(chan bool)

    for _, w := range worklist {
        go func(w *Work) {
            w.addrs, w.err = LookupHost(w.host)
            done <- true
        }(w)
    }

    for range worklist {
        <-done
    }
}

func main() {
	rand.Seed(time.Now().UnixNano())

	t0 := time.Now()
	lookup()

	fmt.Printf("\n")
	for _, w := range worklist {
		if w.err != nil {
			fmt.Printf("%s: error: %v\n", w.host, w.err)
			continue
		}
		fmt.Printf("%s: %v\n", w.host, w.addrs)
	}
	fmt.Printf("total lookup time: %.3f seconds\n", time.Since(t0).Seconds())
}

var worklist = []*Work{
	{host: "fast.com"},
	{host: "slow.com"},
	{host: "fast.missing.com"},
	{host: "slow.missing.com"},
}

type Work struct {
	host  string
	addrs []string
	err   error
}

func LookupHost(name string) (addrs []string, err error) {
	t0 := time.Now()
	defer func() {
		fmt.Printf("lookup %s: %.3f seconds\n", name, time.Since(t0).Seconds())
	}()
	h := hosts[name]
	if h == nil {
		h = failure
	}
	return h(name)
}

type resolver func(string) ([]string, error)

var hosts = map[string]resolver{
	"fast.com":         delay(10*time.Millisecond, fixedAddrs("10.0.0.1")),
	"slow.com":         delay(2*time.Second, fixedAddrs("10.0.0.4")),
	"fast.missing.com": delay(10*time.Millisecond, failure),
	"slow.missing.com": delay(2*time.Second, failure),
}

func fixedAddrs(addrs ...string) resolver {
	return func(string) ([]string, error) {
		return addrs, nil
	}
}

func delay(d time.Duration, f resolver) resolver {
	return func(name string) ([]string, error) {
		time.Sleep(d/2 + time.Duration(rand.Int63n(int64(d/2))))
		return f(name)
	}
}

func failure(name string) ([]string, error) {
	return nil, fmt.Errorf("unknown host %v", name)
}
12

Concurrency

Select statements: switch for communication.

package main

import (
	"fmt"
	"time"
)

func main() {
	c1 := make(chan int, 1)
	c2 := make(chan int, 1)
	c1 <- 42

    select {
    case v := <-c1:
        fmt.Println("received from c1: ", v)
    case c2 <- 1:
        fmt.Println("sent to c2")
    case <-time.After(time.Second):
        fmt.Println("timed out")
    default:
        fmt.Println("nothing ready at the moment")
    }
}

That's select that makes efficient implementation difficult.

13

Implementing Concurrency

Challenge: Make channel communication scale

Research question: lock-free channels?

14

Scheduling

15

Scheduling

On the one hand we have arbitrary user programs:

No user hints!

16

Scheduling

On the other hand we have complex hardware topology:

17

Scheduling

Challenge: make it all magically work efficiently

18

Scheduling

Current scheduler:

┌─┐         ┌─┐         ┌─┐         ┌─┐                  ┌─┐
│ │         │ │         │ │         │ │                  │ │
├─┤         ├─┤         ├─┤         ├─┤                  ├─┤ Global
│ │         │G│         │ │         │ │                  │ │ state
├─┤         ├─┤         ├─┤         ├─┤                  ├─┤
│G│         │G│         │G│         │ │                  │G│
├─┤         ├─┤         ├─┤         ├─┤                  ├─┤
│G│         │G│         │G│         │G│                  │G│
└┬┘         └┬┘         └┬┘         └┬┘                  └─┘
 │           │           │           │
 ↓           ↓           ↓           ↓
┌─┬──────┐  ┌─┬──────┐  ┌─┬──────┐  ┌─┬──────┐     ┌────┐┌──────┐┌───────┐
│P│mcache│  │P│mcache│  │P│mcache│  │P│mcache│     │heap││timers││netpoll│
└┬┴──────┘  └┬┴──────┘  └┬┴──────┘  └┬┴──────┘     └────┘└──────┘└───────┘
 │           │           │           │
 ↓           ↓           ↓           ↓
┌─┐         ┌─┐         ┌─┐         ┌─┐               ┌─┐ ┌─┐ ┌─┐
│M│         │M│         │M│         │M│               │M│ │M│ │M│
└─┘         └─┘         └─┘         └─┘               └─┘ └─┘ └─┘

G - goroutine; P - logical processor; M - OS thread (machine)

19

Scheduling

Want:

20

Garbage Collection

21

Garbage Collection

Garbage collection simplifies APIs.

Fundamental to concurrency: too hard to track ownership otherwise.

Fundamental to interfaces: memory management details do not bifurcate otherwise-similar APIs.

Of course, adds cost, latency, complexity in run time system.

22

Garbage Collection

Plenty of research about garbage collection, mostly in Java context.

Java collectors usually:

23

Garbage Collection

But Go is very different!

type Point struct {
    X, Y int
}
type Rectangle struct {
    Min, Max Point
}
p := &rect.Max
24

Implementing Garbage Collection

Current GC: stop the world, parallel mark, start the world, concurrent sweep.
Concurrent mark is almost ready.

Cannot reuse Java GC algorithms directly.

Research question: what GC algorithm is the best fit for Go?
Do we need generations? Do we need compaction? What are efficient data structures that support interior pointers?

25

Race and deadlock detection

26

Race detection

Based on ThreadSanitizer runtime, originally mainly targeted C/C++.
Traditional happens-before race detector based on vector clocks (devil in details!).
Works fine for Go, except:

$ go run -race lots_of_goroutines.go
race: limit on 8192 simultaneously alive goroutines is exceeded, dying

Research question: race dectector that efficiently supports hundreds of thousands of goroutines?

27

Deadlock detection

Deadlock on mutexes due to lock order inversion:

// thread 1                       // thread 2
pthread_mutex_lock(&m1);          pthread_mutex_lock(&m2);
pthread_mutex_lock(&m2);          pthread_mutex_lock(&m1);
...                               ...
pthread_mutex_unlock(&m2);        pthread_mutex_unlock(&m1);
pthread_mutex_unlock(&m1);        pthread_mutex_unlock(&m2);

Lock order inversions are easy to detect:

28

Deadlock detection

Go has channels and mutexes. Channels are semaphores. A mutex can be unlocked in
a different goroutine, so it is essentially a binary semaphore too.

Deadlock example:

// Parallel file tree walk.
func worker(pendingItems chan os.FileInfo)
    for f := range pendingItems {
        if f.IsDir() {
            filepath.Walk(f.Name(), func(path string, info os.FileInfo, err error) error {
                pendingItems <- info
            })
        } else {
            visit(f)
        }
    }
}

pendingItems channel has limited capacity. All workers can block on send to pendingItems.

29

Deadlock detection

Another deadlock example:

var (
    c = make(chan T, 100)
    mtx sync.RWMutex
 )

// goroutine 1      // goroutine 2         // goroutine 3
// does send        // does receive        // "resizes" the channel
mtx.RLock()         mtx.RLock()            mtx.Lock()
c <- v              v := <-c               tmp := make(chan T, 200)
mtx.RUnlock()       mtx.RUnlock()          copyAll(c, tmp)
                                           c = tmp
                                           mtx.Unlock()

RWMutex is fair for both readers and writers: when a writer arrives, new readers are not let to enter the critical section.
Goroutine 1 blocks on chan send; then goroutine 3 blocks on mtx.Lock; then goroutine 2 blocks on mtx.RLock.

30

Deadlock detection

Research question: how to detect deadlocks on semaphores?

No known theory to date.

31

Testing of the implementation

32

Testing of the implementation

So now we have a new language with several complex implementations:

How do you test it?

33

Testing of the implementation

Csmith is a tool that generates random C programs that statically and dynamically conform to the C99 standard.

34

Testing of the implementation

Gosmith is a tool that generates random Go programs that statically and dynamically conform to the Go standard.

Turned out to be much simpler than C: no undefined behavior all around!

35

Testing of the implementation

But generates uninteresting programs from execution point of view: most of them deadlock or crash on nil deref.

Trophies:

36

Testing of the implementation

Research question: how to generate random interesting concurrent Go programs?

Must:

Must not:

37

Research and Go

Plenty of research questions about how to implement Go well.

38

Thank you

Dmitry Vyukov

Google

Use the left and right arrow keys or click the left and right edges of the page to navigate between slides.
(Press 'H' or navigate to hide this message.)