I'm looking for a way to generate a SHA-512 hash from a JSON string in Ruby that is independent of the order of the elements in it and independent of nesting, arrays, nested arrays and so on. I just want to hash the raw data along with its keys.

I tried a few approaches: converting the JSON into a Ruby hash, deep-sorting it by key, appending everything into one long string and hashing that. But I doubt my solution is the most efficient one, and I suspect there is a better way to do this.

EDIT

So far, I convert JSON into a Ruby hash. Then I try to use this function to get a canonical representation:

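  # Flattens a (possibly nested) structure into one string.
  # Hash keys are visited in sorted order; array order is kept as-is.
  def self.canonical_string_from_hash(value, key = nil)
    str = ""
    if value.is_a? Hash
      # Note: the key of a nested hash itself is not emitted.
      value.keys.sort.each do |k|
        str += canonical_string_from_hash(value[k], k)
      end
    elsif value.is_a? Array
      # Emit the array's key once, then every element without a key.
      str += key.to_s
      value.each do |v|
        str += canonical_string_from_hash(v)
      end
    else
      # Scalar: key followed by value (nil becomes an empty string).
      str += key ? "#{key}#{value}" : value.to_s
    end
    str
  end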

But I'm not sure if this is a good and efficient way to do this.

For example, this hash

hash = {
  id: 3,
  zoo: "test",
  global: [
    {ukulele: "ringding", blub: 3},
    {blub: nil, ukulele: "rangdang", guitar: "stringstring"}
  ],
  foo: {
    ids: [3,4,5],
    bar: "asdf"
  }
}

gets converted to this string:

barasdfids345globalblub3ukuleleringdingblubguitarstringstringukulelerangdangid3zootest
Please post the solution you have so far. - Frank Schmitt
You need a canonical form for your data, whereby you can assert that two things are "equivalent". Flattening a structure means that they are not truly equivalent (within Ruby) in a lot of ways that you don't care about for your digest, so you need to be clear about what that means to you (i.e. when two inputs have the same hash, what do you want to be able to say about them?). Actually the SHA part is pretty much irrelevant here, except to note that there is no shortcut where you could get the digest algorithm to do any of this for you; it works at the wrong level to help. - Neil Slater
Side note: SHA-512 hash values are not unique. There are far fewer of them than possible valid JSON strings. However, there are rather a lot of them, and collisions probably won't concern you. - Neil Slater
We're using this method to prevent a hash from being modified during communication. Before hashing, we append a long salt that both sides of the communication know, and then we hash that string. The other side can do the same with the parameters and the salt, and can then decide whether the request is the original request or was modified in some places (for example by a man in the middle). - 23tux
What is the expected behaviour in case of hash = { id: 3, zoo: "test", global: [ {blub: nil, ukulele: "rangdang", guitar: "stringstring"}, {ukulele: "ringding", blub: 3} ], foo: { ids: [3,4,5], bar: "asdf" } } ? Will it be treated as the same string, or should it be treated as a different string? - Anshul Goyal

1 Answer

But I'm not sure if this is a good and efficient way to do this.

Depends on what you are trying to do. Your canonical/equivalent structures need to represent what is important to you for the comparison. Removing details such as object structure makes sense if you consider two items with different structure but same string values equivalent.
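To make that trade-off concrete, here is a small sketch that reuses the function from the question, wrapped in a hypothetical `Canonical` module (my naming, not the asker's) so that it runs standalone. It shows pairs of structurally different inputs that flatten to the same string and therefore get the same SHA-512 digest:

```ruby
require 'digest'

# Hypothetical wrapper module; the method logic is the one from the question.
module Canonical
  def self.canonical_string_from_hash(value, key = nil)
    str = ""
    if value.is_a? Hash
      value.keys.sort.each { |k| str += canonical_string_from_hash(value[k], k) }
    elsif value.is_a? Array
      str += key.to_s
      value.each { |v| str += canonical_string_from_hash(v) }
    else
      str += key ? "#{key}#{value}" : value.to_s
    end
    str
  end

  # SHA-512 hex digest of the flattened string.
  def self.digest(value)
    Digest::SHA512.hexdigest(canonical_string_from_hash(value))
  end
end

# Each pair differs in structure but flattens to the same string.
pairs = [
  [{ id: 3 },          { foo: { id: 3 } }], # nesting key is dropped: "id3"
  [{ ids: [3, 4, 5] }, { ids: 345 }],       # array vs. scalar: "ids345"
  [{ ab: "c" },        { a: "bc" }]         # key/value boundary lost: "abc"
]

pairs.each do |left, right|
  puts Canonical.digest(left) == Canonical.digest(right) # prints true three times
end
```

Whether these pairs should count as equivalent is exactly the decision you have to make before choosing a canonical form.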

According to your comments, you are attempting to sign a request that is being transferred from one system to a second one. In other words you want security, not a measure of similarity or a digital fingerprint for some other purpose. Therefore equivalent requests are ones that are identical in all the ways that affect the processing that you want to protect. It is simpler, and very likely more secure, to lock down the raw bytes of data that transfer between your two systems.

In which case your whole approach needs a re-think. The reasons for that are probably best discussed on security.stackexchange.com

However, in brief:

  • Use an HMAC routine (HMAC-SHA512); it is designed for your purpose. Instead of a salt, this uses a secret, which is essentially the same thing (in fact you need to keep your salt secret in your implementation too, which is unusual for something called a salt), but it is combined with the SHA in a way that makes it resilient to a couple of attacks that are possible against simple concatenation followed by SHA. The worst of these is a length-extension attack: it is possible to extend the data and compute a valid SHA for the extended data without needing to know the salt. In other words, an attacker could take a known valid request and use it to forge other requests which will get past your security check. Your proposed solution looks vulnerable to this form of attack to me.

  • Unpacking the request and analysing the details to get a "canonical" view of it is not necessary, and it also reduces the security of your solution. The only reason for doing this is that you are for some reason unable to handle the request once it has been serialised to JSON, and are forced to work only with the de-serialised request at one end or the other. If that is purely a knowledge or convenience issue, then fix that problem rather than trying to roll your own security protocol using SHA-512.

  • You should sign the request, and check the signature, against the fully serialised JSON string. If you have to de-serialise data before you can verify it, then anything injected by a man-in-the-middle attacker has already reached your parser, and you are potentially exposed to attacks via the parser. You should work to reject suspect requests before any data processing has been done to them.
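The points above can be sketched roughly as follows. This is a minimal illustration rather than a complete protocol: the `SECRET` constant and the `sign`, `secure_equal?` and `valid_request?` helpers are hypothetical names of mine, and how the signature travels with the request is left to you. It uses `OpenSSL::HMAC` from Ruby's standard library and signs the raw serialised JSON, so nothing is parsed before the check:

```ruby
require 'openssl'

# Hypothetical shared secret; in practice load it from configuration,
# never hard-code it in source.
SECRET = "replace-with-a-long-random-secret"

# HMAC-SHA512 over the exact bytes that go over the wire.
def sign(raw_body, secret = SECRET)
  OpenSSL::HMAC.hexdigest(OpenSSL::Digest.new("SHA512"), secret, raw_body)
end

# Comparison that always inspects every byte, so timing does not reveal
# how many leading characters of the signature matched.
def secure_equal?(a, b)
  return false unless a.bytesize == b.bytesize
  diff = 0
  a.bytes.zip(b.bytes) { |x, y| diff |= x ^ y }
  diff.zero?
end

# Receiver side: verify first, parse the JSON only if this returns true.
def valid_request?(raw_body, signature, secret = SECRET)
  secure_equal?(sign(raw_body, secret), signature)
end

body      = '{"id":3,"zoo":"test"}'
signature = sign(body)

puts valid_request?(body, signature)                # true
puts valid_request?(body.sub("3", "4"), signature)  # false, body was tampered with
```

Because the signature covers the serialised string, the order of keys or the nesting no longer matters for verification: the receiver checks exactly the bytes the sender produced.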

TL;DR - Although this is not a direct answer to your question, the correct solution for you is to not write this code at all. Instead, you need to place your signature code closer to the inputs and outputs of the two services that need to trust each other.