2 votes

Yes, it looks like one of the most duplicated questions on StackOverflow, but please take a few minutes to read it.

func _Crawl(url string, fetcher Fetcher, ch chan []string) {
    if store.Read(url) == true {
        return
    } else {
        store.Write(url)
    }

    body, urls, err := fetcher.Fetch(url)
    if err != nil {
        fmt.Printf("not found: %s\n", url)
    }
    fmt.Printf("found: %s %q\n", url, body)
    ch <- urls
}

func Crawl(url string, fetcher Fetcher) {
    UrlChannel := make(chan []string, 4)
    go _Crawl(url, fetcher, UrlChannel)
    for urls, ok := <-UrlChannel; ok; urls, ok = <-UrlChannel {
        for _, i := range urls {
            go _Crawl(i, fetcher, UrlChannel)
        }
    }
    close(UrlChannel) //The channel is closed.
}

func main() {
   Crawl("http://golang.org/", fetcher)
}

I'm closing the channel after the loop ends. The program returns correct results but raises this error at the end:

fatal error: all goroutines are asleep - deadlock!

goroutine 1 [chan receive]:
main.Crawl(0x113a2f, 0x12, 0x1800d0, 0x10432220)
    /tmp/sandbox854979773/main.go:55 +0x220
main.main()
    /tmp/sandbox854979773/main.go:61 +0x60

What is wrong with my goroutines?

The deadlock is occurring because the call to close is never reached. ok in the channel receive will be false only when the channel has been closed, and therefore execution is stuck in that loop. You should consider using a range loop (dave.cheney.net/2014/03/19/channel-axioms). If you are fine with reworking the design a bit, something like this can work as a hint: play.golang.org/p/aiuiyueyzB – abhink
Yes, the close of the channel is not reachable, but that's not the root of the problem here, and no, execution will get stuck only in the case of infinite urls. – Marcel Novy

2 Answers

2 votes

Well, after a first look, you can write a shorter for by just using range, like:

for urls := range UrlChannel { ... }

It will iterate until the channel is closed, and it looks much better.

Also, you have an early return in the first if of your function _Crawl(), so if that first condition is true the function will end and nothing will be passed to the channel, and the code that is receiving from that channel will wait forever.
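
As a hedged sketch of that point only (keeping your _Crawl signature and assuming the same store and fetcher from the exercise), make sure every call sends exactly one value on the channel, even when it bails out early, so the receiver is never left waiting:

func _Crawl(url string, fetcher Fetcher, ch chan []string) {
    if store.Read(url) {
        ch <- nil // already visited: still send, so the receiver doesn't block forever
        return
    }
    store.Write(url)

    body, urls, err := fetcher.Fetch(url)
    if err != nil {
        fmt.Printf("not found: %s\n", url)
        ch <- nil // send on error too, for the same reason
        return
    }
    fmt.Printf("found: %s %q\n", url, body)
    ch <- urls
}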

Another thing: inside your second for you're creating goroutines for each url, but you're not waiting for them, and actually those goroutines will try to send something to a closed channel. It seems that this is not actually happening, because in that case the code would panic; you can use a WaitGroup for this.

In summary, you have code with several possible deadlock conditions.

||| Super Edit |||:

I should tell you that your code is kind of messy and that the solution may be a simple WaitGroup, but I was afraid of making you feel bad because I found too many issues. If you really want to learn how to write concurrent code, you should first think about the code, or pseudo-code, without any concurrency at all, and then try to add the magic.

In your case, what I see is a recursive solution, since the urls are fetched from an HTML document in the form of a tree; it would be something like a DFS:

func crawl(url string, fetcher Fetcher) {
    // if we've visited this url just stop the recursion
    if store.Read(url) == true {
       return
    }
    store.Write(url)

    body, urls, err := fetcher.Fetch(url)
    if err != nil {
        fmt.Printf("not found: %s\n", url)
        return // early return: if there are no urls, there's no reason to continue
    }
    fmt.Printf("found: %s %q\n", url, body)

    // this part will change !!
    // ...
    for _, url := range urls {
        crawl(url, fetcher)
    }
    //
}

func main() {
    crawl("http://golang.org", fetcher)
}

Now the second step: identify the concurrent code. That's easy in this case, since each url can be fetched concurrently (sometimes in parallel). All we have to do is add a WaitGroup and create a goroutine for each url; we only have to update the for that fetches the urls (it's only the for block):

// this code will be in the comment: "this part will change !!"
//
// this var is just a thread-safe counter
var wg sync.WaitGroup

// set the WaitGroup counter with the len of urls slice
wg.Add(len(urls))

for _, url := range urls {

    // it's very important to pass the url as a parameter
    // because the var url changes on each loop iteration (url := range)
    go func(u string) {

        // Decrement the counter (-1) when the goroutine completes
        defer wg.Done()

        crawl(u, fetcher)

    }(url)
}

wg.Wait() // wait for all your goroutines
// ...

Future considerations: maybe you want to control the number of goroutines (or workers). For that you have to use something like Fan In or Fan Out; you can find more here: https://blog.golang.org/advanced-go-concurrency-patterns and https://blog.golang.org/pipelines
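
For example (just a hedged sketch, not part of the answer above; crawlAll, sem and maxWorkers are made-up names), a buffered channel used as a semaphore is a simple way to cap how many crawls run at once, without a full fan-out/fan-in pipeline:

// crawlAll launches one goroutine per url but never runs more than maxWorkers at a time
func crawlAll(urls []string, fetcher Fetcher) {
    const maxWorkers = 8
    sem := make(chan struct{}, maxWorkers)

    var wg sync.WaitGroup
    wg.Add(len(urls))

    for _, url := range urls {
        go func(u string) {
            defer wg.Done()

            sem <- struct{}{}        // acquire a slot; blocks while maxWorkers crawls are in flight
            defer func() { <-sem }() // release the slot when done

            crawl(u, fetcher)
        }(url)
    }

    wg.Wait()
}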

But don't be afraid of creating thousands of goroutines; in Go they're very cheap.

Note: I haven't compiled the code, so maybe it has a little bug somewhere :)

1 vote

Both the solutions described above and a range loop over a channel have the same issue: the loop ends only after the channel is closed, but the channel is closed only after the loop ends. So we need to know when to close the opened channel. I believe we need to count the started jobs (goroutines), but in this case I just use the depth parameter instead of a separate counter variable. Since it is a Tour exercise, it shouldn't be complicated.
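
Just to make that counting idea concrete, here is a hedged aside (not the code below, and CrawlCounted is a made-up name): keep an in-flight counter in the receiving loop and close the channel once it drops to zero. It only works if every started _Crawl sends exactly one value per call, for example by sending nil for already-visited urls, which the _Crawl below does not do.

func CrawlCounted(url string, fetcher Fetcher) {
    UrlChannel := make(chan []string, 4)

    pending := 1 // crawls started whose result hasn't been received yet
    go _Crawl(url, fetcher, UrlChannel)

    for urls := range UrlChannel {
        pending--
        for _, u := range urls {
            pending++
            go _Crawl(u, fetcher, UrlChannel)
        }
        if pending == 0 {
            close(UrlChannel) // no send can be in flight anymore, safe to close
        }
    }
}

The code below takes a different route and bounds the loop with the depth parameter instead of a counter.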

func _Crawl(url string, fetcher Fetcher, ch chan []string) {
    if store.Read(url) == false {
        store.Write(url)    
        body, urls, err := fetcher.Fetch(url)
        if err != nil {
            fmt.Printf("not found: %s\n", url)
        } else {
            fmt.Printf("found: %s %q\n", url, body)
        }
        ch <- urls
    }
}

func Crawl(url string, depth int, fetcher Fetcher) {
    UrlChannel := make(chan []string, 4)
    go _Crawl(url, fetcher, UrlChannel)
    for urls := range UrlChannel {
        for _, url := range urls {
            go _Crawl(url, fetcher, UrlChannel)
        }
        depth--
        if depth < 0 {
            close(UrlChannel)
        }
    }
}
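
Neither snippet in this thread shows the store or main, so, for completeness, here is a minimal sketch of how this version might be wired up, assuming a mutex-protected map like the one the Tour exercise suggests (the Store type and its field names are made up):

type Store struct {
    mu      sync.Mutex
    visited map[string]bool
}

func (s *Store) Read(url string) bool {
    s.mu.Lock()
    defer s.mu.Unlock()
    return s.visited[url]
}

func (s *Store) Write(url string) {
    s.mu.Lock()
    defer s.mu.Unlock()
    s.visited[url] = true
}

var store = Store{visited: make(map[string]bool)}

func main() {
    Crawl("http://golang.org/", 4, fetcher)
}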