2
votes

I'm trying to understand more about go's channels and goroutines, so I decided to make a little program that count words from a file, read by a bufio.NewScanner object:

nCPUs := flag.Int("cpu", 2, "number of CPUs to use")
flag.Parse()
runtime.GOMAXPROCS(*nCPUs)    

scanner := bufio.NewScanner(file)
lines := make(chan string)
results := make(chan int)

for i := 0; i < *nCPUs; i++ {
    go func() {
        for line := range lines {
            fmt.Printf("%s\n", line)
            results <- len(strings.Split(line, " "))
        }
    }()
}

for scanner.Scan(){
    lines <- scanner.Text()
}
close(lines)


acc := 0
for i := range results {
      acc += i
 }

fmt.Printf("%d\n", acc)

Now, in most examples I've found so far both the lines and results channels would be buffered, such as make(chan int, NUMBER_OF_LINES_IN_FILE). Still, after running this code, my program exists with a fatal error: all goroutines are asleep - deadlock! error message.

Basically my thought it's that I need two channels: one to communicate to the goroutine the lines from the file (as it can be of any size, I don't like to think that I need to inform the size in the make(chan) function call. The other channel would collect the results from the goroutine and in the main function I would use it to e.g. calculate an accumulated result.

What should be the best option to program in this manner with goroutines and channels? Any help is much appreciated.

2

2 Answers

6
votes

As @AndrewN has pointed out, the problem is each goroutine gets to the point where it's trying to send to the results channel, but those sends will block because the results channel is unbuffered and nothing reads from them until the for i := range results loop. You never get to that loop, because you first need to finish the for scanner.Scan() loop, which is trying to send all the lines down the lines channel, which is blocked because the goroutines are never looping back to the range lines because they're stuck sending to results.

The first thing you might try to do to fix this is to put the scanner.Scan() stuff in a goroutine, so that something can start reading off the results channel right away. However, the next problem you'll have is knowing when to end the for i := range results loop. You want to have something close the results channel, but only after the original goroutines are done reading off the lines channel. You could close the results channel right after closing the lines channel, however I think that might introduce a potential race, so the safest thing to do is also wait for the original two goroutines to be done before closing the results channel: (playground link):

package main

import "fmt"
import "runtime"
import "bufio"
import "strings"
import "sync"

func main() {
    runtime.GOMAXPROCS(2)

    scanner := bufio.NewScanner(strings.NewReader(`
hi mom
hi dad
hi sister
goodbye`))
    lines := make(chan string)
    results := make(chan int)

    wg := sync.WaitGroup{}
    for i := 0; i < 2; i++ {
        wg.Add(1)
        go func() {
            for line := range lines {
                fmt.Printf("%s\n", line)
                results <- len(strings.Split(line, " "))
            }
            wg.Done()
        }()
    }

    go func() {
        for scanner.Scan() {
            lines <- scanner.Text()
        }
        close(lines)
        wg.Wait()
        close(results)
    }()

    acc := 0
    for i := range results {
        acc += i
    }

    fmt.Printf("%d\n", acc)
}
4
votes

Channels in go are unbuffered by default, which means that none of the anonymous goroutines you spawn can send to the results channel until you start trying to receive from that channel. That doesn't start executing in the main program until scanner.Scan() is done filling up the line channel...which it's blocked from doing until your anonymous functions can send to the results channel and restart their loops. Deadlock.

The other problem in your code, even when trivially fixing the above by buffering the channels, is that for i := range results will also deadlock once there are no more results being fed into it, since the channel hasn't been closed.

Edit: Here's one potential solution, if you want to avoid buffered channels. Basically, the first issue is avoided by performing the send to the results channel via a new goroutine, allowing the lines loop to complete. The second issue (not knowing when to stop reading a channel) is avoided by counting the goroutines as they are created and explicitly closing down the channel when every goroutine is accounted for. It's probably better to do something similar with waitgroups, but this is just a very fast way to show how to do this unbuffered.