I'm creating a web crawler of sorts. When I use Anemone without storage to crawl a site, it eventually crashes due to memory issues.
So I installed Redis, redis-rb, etc. and changed my code to use the Redis storage. Now I get an error from Rails and the crawl does not finish. It does connect to Redis, since I can see activity when I monitor it, but the crawl stops.
I tested Redis on its own and it works fine.
Any ideas?
Errors
RuntimeError (-ERR wrong number of arguments for 'hgetall' command):
/usr/lib/ruby/gems/1.8/gems/ezmobius-redis-rb-0.1/lib/redis.rb:274:in `read_reply'
/usr/lib/ruby/gems/1.8/gems/ezmobius-redis-rb-0.1/lib/redis.rb:198:in `raw_call_command'
/usr/lib/ruby/gems/1.8/gems/ezmobius-redis-rb-0.1/lib/redis.rb:196:in `map'
/usr/lib/ruby/gems/1.8/gems/ezmobius-redis-rb-0.1/lib/redis.rb:196:in `raw_call_command'
/usr/lib/ruby/gems/1.8/gems/ezmobius-redis-rb-0.1/lib/redis.rb:161:in `call_command'
/usr/lib/ruby/gems/1.8/gems/ezmobius-redis-rb-0.1/lib/redis.rb:151:in `method_missing'
anemone (0.5.0) lib/anemone/storage/redis.rb:82:in `rget'
anemone (0.5.0) lib/anemone/storage/redis.rb:41:in `each'
anemone (0.5.0) lib/anemone/storage/redis.rb:40:in `each'
anemone (0.5.0) lib/anemone/storage/redis.rb:57:in `keys'
anemone (0.5.0) lib/anemone/storage/redis.rb:12:in `initialize'
anemone (0.5.0) lib/anemone/storage.rb:30:in `new'
anemone (0.5.0) lib/anemone/storage.rb:30:in `Redis'
app/controllers/processpages_controller.rb:332:in `crawlnewsite'
Redis monitor results:
+1297028372.281985 "keys" "anemone:pages:*"
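Read bottom-up, the trace shows the error is raised while the storage object is still being created: Anemone::Storage::Redis#initialize calls keys, which goes through each and then rget, where the HGETALL is sent. Together with the single KEYS "anemone:pages:*" line in the monitor output, that suggests one KEYS call followed by one HGETALL per matching key. The sketch below reproduces only that call pattern, assuming the frame names describe what the storage does; FakeRedis and each_stored_page are hypothetical stand-ins (not Anemone or redis-rb code), and the sketch does not reproduce the error itself.

```ruby
# Hypothetical in-memory stand-in for a Redis client. It records each
# command so the call order can be inspected. Values are Ruby hashes,
# standing in for the per-page Redis hashes the storage is assumed to use.
class FakeRedis
  attr_reader :log

  def initialize(data)
    @data = data
    @log = []
  end

  # Only trailing-"*" patterns are supported, which is enough here.
  def keys(pattern)
    @log << ["keys", pattern]
    prefix = pattern.chomp("*")
    @data.keys.select { |k| k.start_with?(prefix) }.sort
  end

  def hgetall(key)
    @log << ["hgetall", key]
    @data.fetch(key, {})
  end
end

# Sketch of the iteration the trace implies: one KEYS, then one HGETALL per key.
def each_stored_page(redis, key_prefix = "anemone")
  redis.keys("#{key_prefix}:pages:*").each do |rkey|
    yield rkey, redis.hgetall(rkey)
  end
end

redis = FakeRedis.new(
  "anemone:pages:http://example.com/"  => { "code" => "200" },
  "anemone:pages:http://example.com/a" => { "code" => "404" }
)
each_stored_page(redis) { |key, fields| puts "#{key} #{fields['code']}" }
p redis.log.map(&:first) # prints ["keys", "hgetall", "hgetall"]
```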
Versions/environment:
Redis: tried version 2 and 2.0.4, same problem with either version
Rails: 2.3.8
Ruby: 1.8.7 patchlevel 174
OS: CentOS 5
Redis port: default (6379)
Host: HostGator VPS / control panel x
Installed gems:
anemone 0.5.0 and its dependencies (nokogiri etc.)
redis (2.1.1)
redis-namespace (0.8.0)
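The gem list names redis 2.1.1, but every redis.rb frame in the error trace comes from a different gem directory, ezmobius-redis-rb-0.1, so it may be worth confirming which client library is actually being loaded. A minimal stdlib-only sketch: gems_in_backtrace is a hypothetical helper (not part of Anemone or redis-rb) that collects the gem name and version from each frame, handling both the raw RubyGems paths and the Rails-cleaned "anemone (0.5.0) lib/..." frames that appear in this trace.

```ruby
# Hypothetical helper: list the gems (as "name-version") that a backtrace
# passes through, in first-seen order. Two frame shapes are recognised:
#   /usr/lib/ruby/gems/1.8/gems/NAME-VERSION/lib/...   (raw RubyGems path)
#   NAME (VERSION) lib/...                             (Rails-cleaned frame)
# Frames from the application itself match neither pattern and are skipped.
def gems_in_backtrace(frames)
  found = []
  frames.each do |frame|
    if frame =~ %r{/gems/[^/]+/gems/([^/]+)/}
      found << $1
    elsif frame =~ /\A(\S+) \(([^)]+)\) /
      found << "#{$1}-#{$2}"
    end
  end
  found.uniq
end

trace = [
  "/usr/lib/ruby/gems/1.8/gems/ezmobius-redis-rb-0.1/lib/redis.rb:274:in `read_reply'",
  "anemone (0.5.0) lib/anemone/storage/redis.rb:82:in `rget'",
  "app/controllers/processpages_controller.rb:332:in `crawlnewsite'"
]
p gems_in_backtrace(trace) # prints ["ezmobius-redis-rb-0.1", "anemone-0.5.0"]
```

In the controller, the same helper could be applied to a live exception, e.g. `gems_in_backtrace(e.backtrace)` inside a `rescue => e` clause.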
code
require 'anemone'
require 'redis'
require 'redis-namespace'

Anemone.crawl(homepage) do |anemone|
  anemone.storage = Anemone::Storage.Redis

  anemone.on_every_page do |page|
    # ...tons of code
  end
end
config/redis.yml
defaults: &defaults
  host: localhost
  port: 6379

development:
  <<: *defaults

test:
  <<: *defaults

staging:
  <<: *defaults

production:
  <<: *defaults
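As shown, the code calls Anemone::Storage.Redis with no arguments, so nothing in that snippet reads config/redis.yml. If those settings are meant to reach the crawler, here is a minimal sketch of loading them with Ruby's standard YAML library. redis_options is a hypothetical helper, the file contents are embedded as a string so the example runs on its own, and it assumes a current Ruby, where YAML.safe_load needs aliases: true to resolve the &defaults / *defaults merge.

```ruby
require 'yaml'

# The config/redis.yml from the question (trimmed to two environments),
# embedded so the sketch is self-contained.
REDIS_YML = <<-YML
defaults: &defaults
  host: localhost
  port: 6379

development:
  <<: *defaults

production:
  <<: *defaults
YML

# Hypothetical helper: return the connection options for one environment,
# with symbol keys, the form in which redis-rb options are usually written.
def redis_options(yaml_text, env)
  # aliases: true lets safe_load resolve &defaults / *defaults and the << merge.
  config = YAML.safe_load(yaml_text, aliases: true)
  Hash[config.fetch(env).map { |k, v| [k.to_sym, v] }]
end

opts = redis_options(REDIS_YML, "production")
p opts
```

On the Ruby 1.8.7 setup described above, plain YAML.load should accept the aliases instead. Passing the result on as `Anemone::Storage.Redis(opts)` assumes, as appears to be the case in anemone 0.5.0, that the storage forwards its options hash to Redis.new; it is worth checking lib/anemone/storage.rb before relying on that.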