Some additions to the given set of answers:
First of all, if you are going to use Redis hashes efficiently, you must know
the maximum number of keys and the maximum value size in advance; otherwise, once they exceed hash-max-ziplist-value or hash-max-ziplist-entries, Redis will convert the hash into practically ordinary key/value pairs under the hood (see hash-max-ziplist-value, hash-max-ziplist-entries). And breaking out of the hash encoding under the hood IS REALLY BAD, because each ordinary key/value pair inside Redis costs about +90 bytes per pair.
It means that if you start with option two and accidentally break out of hash-max-ziplist-value, you will get +90 bytes per EACH ATTRIBUTE you have inside the user model! (actually not +90 but +70, see the console output below and the encoding-check sketch right after it)
# you need the me-redis and awesome_print gems to run this exact code
redis = Redis.include(MeRedis).configure( hash_max_ziplist_value: 64, hash_max_ziplist_entries: 512 ).new
=> #<Redis client v4.0.1 for redis://127.0.0.1:6379/0>
> redis.flushdb
=> "OK"
> ap redis.info(:memory)
{
"used_memory" => "529512",
**"used_memory_human" => "517.10K"**,
....
}
=> nil
# me_set( 't:i' ... ) is the same as hset( 't:i/512', i % 512 ... )
# txt is some English fiction book of around 56K characters,
# so we just take a random 63-character string from it
> redis.pipelined{ 10000.times{ |i| redis.me_set( "t:#{i}", txt[rand(50000), 63] ) } }; :done
=> :done
> ap redis.info(:memory)
{
"used_memory" => "1251944",
**"used_memory_human" => "1.19M"**, # ~ 72b per key/value
.....
}
> redis.flushdb
=> "OK"
# making **only one value** per hash of 512 values 1 byte over the limit costs the same as making them all 1 byte longer
> redis.pipelined{ 10000.times{ |i| redis.me_set( "t:#{i}", txt[rand(50000), i % 512 == 0 ? 65 : 63] ) } }; :done
> ap redis.info(:memory)
{
"used_memory" => "1876064",
"used_memory_human" => "1.79M", # ~ 134 bytes per pair
....
}
redis.pipelined{ 10000.times{ |i| redis.set( "t:#{i}", txt[rand(50000), 65] ) } };
ap redis.info(:memory)
{
"used_memory" => "2262312",
"used_memory_human" => "2.16M", #~155 byte per pair i.e. +90 bytes
....
}
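If you want to check on your own instance which encoding a hash actually got, here is a minimal sketch using plain redis-rb (no me-redis needed); the user:1 key name is just an illustration:

require 'redis'

redis = Redis.new

# current thresholds (on Redis 7+ the canonical config names are hash-max-listpack-*,
# but the ziplist names still work as aliases)
p redis.config('get', 'hash-max-ziplist-entries')
p redis.config('get', 'hash-max-ziplist-value')

# tune them only if you can guarantee field counts and value sizes stay below the limits
redis.config('set', 'hash-max-ziplist-entries', 512)
redis.config('set', 'hash-max-ziplist-value', 64)

# check which encoding a hash actually got
redis.hset('user:1', 'name', 'John')
p redis.object('encoding', 'user:1') # => "ziplist" (or "listpack"), NOT "hashtable"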
Regarding TheHippo's answer, the comments on option one are misleading:
hgetall/hmset/hmget come to the rescue if you need all fields or multiple get/set operations.
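For illustration, a small sketch with plain redis-rb (key and field names are hypothetical) showing that the one-hash-per-user layout does not force one command per field:

require 'redis'

redis = Redis.new

# write several fields in a single round trip
redis.hmset('user:1', 'name', 'John', 'bio', 'short bio')

# read only the fields you need, also in one round trip
p redis.hmget('user:1', 'name', 'bio') # => ["John", "short bio"]

# or fetch the whole object at once
p redis.hgetall('user:1')              # => {"name"=>"John", "bio"=>"short bio"}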
Regarding BMiner's answer:
The third option is actually really fun: for a dataset with max(id) < hash-max-ziplist-value this solution has O(N) complexity, because, surprise, Redis stores small hashes as an array-like container of length/key/value objects! As the Redis memory-optimization guide puts it:
But many times hashes contain just a few fields. When hashes are small we can instead just encode them in an O(N) data structure, like a linear array with length-prefixed key value pairs. Since we do this only when N is small, the amortized time for HGET and HSET commands is still O(1): the hash will be converted into a real hash table as soon as the number of elements it contains will grow too much
But you should not worry: you'll break hash-max-ziplist-entries very fast, and there you go, you are now actually at solution number 1.
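Here is a small sketch of that flip with plain redis-rb (the key name obj and the 512 threshold are just the values used above):

require 'redis'

redis = Redis.new
redis.config('set', 'hash-max-ziplist-entries', 512)

redis.del('obj')
512.times { |i| redis.hset('obj', i, 'x') }
p redis.object('encoding', 'obj') # => "ziplist" / "listpack" - compact, O(N) lookups

redis.hset('obj', 512, 'x')       # one field over the threshold...
p redis.object('encoding', 'obj') # => "hashtable" - O(1) lookups, ~+90 bytes per pair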
The second option will most likely turn into the fourth solution under the hood, because, as the question states:
Keep in mind that if I use a hash, the value length isn't predictable. They're not all short such as the bio example above.
And as already said: the fourth solution is the most expensive one for sure, +70 bytes per each attribute.
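The same flip can be triggered by a single long value, which is exactly how unpredictable attribute lengths degrade option two; a sketch with plain redis-rb and a hypothetical user:42 key:

require 'redis'

redis = Redis.new
redis.config('set', 'hash-max-ziplist-value', 64)

redis.del('user:42')
redis.hset('user:42', 'name', 'John')
p redis.object('encoding', 'user:42')   # => "ziplist" / "listpack"

redis.hset('user:42', 'bio', 'x' * 100) # one field longer than 64 bytes...
p redis.object('encoding', 'user:42')   # => "hashtable" - the whole hash is converted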
My suggestion on how to optimize such a dataset:
You've got two options:
If you cannot guarantee the max size of some user attributes, then go for the first solution, and if memory is crucial, compress the user JSON before storing it in Redis (see the sketch after these options).
If you can enforce a max size for all attributes, then you can set hash-max-ziplist-entries/value and use hashes either as one hash per user representation OR as the hash memory optimization from this topic of the Redis guide: https://redis.io/topics/memory-optimization, and store the user as a JSON string. Either way you may also compress long user attributes.
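A minimal sketch of the compression idea, using only the Ruby standard library (json, zlib) and plain redis-rb; the user hash and key name are illustrative:

require 'redis'
require 'json'
require 'zlib'

redis = Redis.new
user  = { id: 42, name: 'John', bio: 'a potentially long biography...' }

# write: serialize to JSON, deflate, store as a plain string key
redis.set("user:#{user[:id]}", Zlib::Deflate.deflate(user.to_json))

# read: fetch, inflate, parse back into a hash
restored = JSON.parse(Zlib::Inflate.inflate(redis.get('user:42')))
p restored['name'] # => "John"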