Well, after a first look, you can write a shorter for loop by just using range, like:

for urls := range UrlChannel { ... }

It will iterate until the channel is closed, and it looks much better.
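For example, a minimal producer/consumer sketch (the channel name and sample values are made up for illustration):

package main

import "fmt"

func main() {
    urls := make(chan string)

    // the producer closes the channel when it is done sending
    go func() {
        for _, u := range []string{"http://golang.org/pkg/", "http://golang.org/cmd/"} {
            urls <- u
        }
        close(urls) // this is what ends the range loop below
    }()

    // range receives until the channel is closed, then exits cleanly
    for u := range urls {
        fmt.Println("got:", u)
    }
}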
Also, you have an early return in the first if of your function _Crawl(), so if that first condition is true the function will end and nothing will be passed to the channel, and the code that is receiving from that channel will wait forever.
Another thing: inside your second for you're creating goroutines for each url, but you're not waiting for them, and those goroutines will actually try to send something to a closed channel. It seems that this is not happening, because in that case the code would panic; you can use a WaitGroup for this.
In summary, you have code with several possible deadlock conditions.
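To make the first issue concrete, here is a minimal standalone sketch of that shape of bug (produce and the skip flag are made up for illustration): the sender returns early without sending, so the receiver blocks forever and the runtime kills the program.

package main

import "fmt"

// produce stands in for a function like your _Crawl(): when the
// early-return branch is taken, nothing is ever sent on ch
func produce(ch chan string, skip bool) {
    if skip {
        return // early return: the channel never receives a value
    }
    ch <- "work"
}

func main() {
    ch := make(chan string)
    go produce(ch, true)
    // blocks forever; the runtime reports:
    // fatal error: all goroutines are asleep - deadlock!
    fmt.Println(<-ch)
}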
||| Super Edit |||:
I should tell you that your code is kind of messy, and the solution may be a simple WaitGroup, but I was afraid of making you feel bad because I found too many issues. If you really want to learn how to write concurrent code, you should first think through the code (or pseudo-code) without any concurrency at all, and then try to add the magic.
In your case, what I see is a recursive solution, since the urls are fetched from an HTML document in the form of a tree; it would be something like a DFS:
func crawl(url string, fetcher Fetcher) {
    // if we've visited this url, just stop the recursion
    // (store is assumed to be a thread-safe set of visited urls;
    // see the sketch after main below)
    if store.Read(url) {
        return
    }
    store.Write(url)

    body, urls, err := fetcher.Fetch(url)
    if err != nil {
        fmt.Printf("not found: %s\n", url)
        return // early return: if there are no urls, there's no reason to continue
    }
    fmt.Printf("found: %s %q\n", url, body)

    // this part will change !!
    // ...
    for _, url := range urls {
        crawl(url, fetcher)
    }
    //
}

func main() {
    crawl("http://golang.org", fetcher)
}
Now the second step: identify the concurrent code. That's easy in this case, since each url can be fetched concurrently (sometimes in parallel). All we have to add is a WaitGroup and a goroutine for each url; we only have to update the for that fetches the urls (it's only the for block):
// this code goes where the comment says: "this part will change !!"
//
// this var is just a thread-safe counter
var wg sync.WaitGroup
// set the WaitGroup counter to the length of the urls slice
wg.Add(len(urls))
for _, url := range urls {
    // it's very important to pass the url as a parameter,
    // because the var url changes on each iteration (url := range);
    // more on this below
    go func(u string) {
        // decrement the counter (-1) when the goroutine completes
        defer wg.Done()
        crawl(u, fetcher)
    }(url)
}
wg.Wait() // wait for all your goroutines
// ...
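The parameter-passing comment in that loop deserves a standalone illustration. Up to Go 1.21, all the closures in a loop share a single url variable, so capturing it directly can make every goroutine see the last value; since Go 1.22 each iteration gets a fresh variable, but passing it explicitly still reads clearly. A runnable sketch of the safe version (the sample urls are made up):

package main

import (
    "fmt"
    "sync"
)

func main() {
    urls := []string{"http://a.example", "http://b.example", "http://c.example"}

    var wg sync.WaitGroup
    wg.Add(len(urls))
    for _, url := range urls {
        go func(u string) {
            defer wg.Done()
            // u is this goroutine's own copy, so it can never observe
            // a later iteration's value of url
            fmt.Println("fetching:", u)
        }(url)
    }
    wg.Wait()
}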
Future considerations: maybe you'll want to control the number of goroutines (or workers); for that you can use something like fan-in or fan-out. You can find more here (and a tiny sketch after the links):
https://blog.golang.org/advanced-go-concurrency-patterns
and
https://blog.golang.org/pipelines
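As a small taste of those patterns, one common way to bound how much work runs at once, short of a full pipeline, is a buffered channel used as a semaphore. This is just a sketch of the idea (the worker count and urls are made up), not a replacement for the posts above:

package main

import (
    "fmt"
    "sync"
)

func main() {
    const workers = 3
    sem := make(chan struct{}, workers) // at most `workers` slots
    var wg sync.WaitGroup

    urls := []string{"u1", "u2", "u3", "u4", "u5", "u6"}
    wg.Add(len(urls))
    for _, url := range urls {
        go func(u string) {
            defer wg.Done()
            sem <- struct{}{}        // acquire a slot (blocks while all workers are busy)
            defer func() { <-sem }() // release the slot when done
            fmt.Println("fetching:", u) // the real work would go here
        }(url)
    }
    wg.Wait()
}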
But don't be afraid of creating thousands of goroutines in Go; they're very cheap.
Note: I haven't compiled the code, so maybe it has a little bug somewhere :)
close is never reached. ok in the channel's receive will be false only when the channel has been closed, and therefore execution is stuck in that loop. You should consider using a range loop (dave.cheney.net/2014/03/19/channel-axioms). If you are fine with reworking the design a bit, something like this can work as a hint: play.golang.org/p/aiuiyueyzB – abhink