I sort of had the same requirement as you.. so decided to throw my thoughts here.
1) There is a great tool for that: jol.
2) Arrays are objects too, and every object in java has two additional headers: mark and klass, usually 4 and 8 bytes in size (this can be tweaked via compressed pointers, but not going to go into details).
3) Is is important to note about the load factor here of the map (because it influences the resize of the internal array). Here is an example:
HashMap<Integer, Integer> map = new HashMap<>(16, 1);
for (int i = 0; i < 13; ++i) {
map.put(i, i);
}
System.out.println(GraphLayout.parseInstance(map).toFootprint());
HashMap<Integer, Integer> map2 = new HashMap<>(16);
for (int i = 0; i < 13; ++i) {
map2.put(i, i);
}
System.out.println(GraphLayout.parseInstance(map2).toFootprint());
Output of this is different(only the relevant lines):
1 80 80 [Ljava.util.HashMap$Node; // first case
1 144 144 [Ljava.util.HashMap$Node; // second case
See how the size is bigger for the second case because the backing array is twice as big (32 entries). You can only put 12 entries in a 16 size array, because the default load factor is 0.75: 16 * 0.75 = 12.
Why 144? The math here is easy: an array is an object, thus: 8+4 bytes for headers. Plus 32 * 4 for references = 140 bytes. Due to memory alignment of 8 bytes, there are 4 bytes for padding resulting in a total 144 bytes.
4) entries are stored inside either a Node or a TreeNode inside the map (Node is 32 bytes and TreeNode is 56 bytes). As you use ONLY integers, you will have only Nodes, as there should be no hash collisions. There might be collisions, but this does not yet mean that a certain array entry will be converted to a TreeNode, there is a threshold for that. We can easily prove that there will be Nodes only:
public static void main(String[] args) {
Map<Integer, List<Integer>> map = IntStream.range(0, 15_000_000).boxed()
.collect(Collectors.groupingBy(WillThereBeTreeNodes::hash)); // WillThereBeTreeNodes - current class name
System.out.println(map.size());
}
private static int hash(Integer key) {
int h = 0;
return (h = key.hashCode()) ^ h >>> 16;
}
The result of this will be 15_000_000, there was no merging, thus no hash-collisions.
5) When you create Integer objects there is pool for them (ranging from -127 to 128 - this can be tweaked as well, but let's not for simplicity).
6) an Integer is an object, thus it has 12 bytes header and 4 bytes for the actual int value.
With this in mind, let's try and see the output for 15_000_000 entries (since you are using a load factor of one, there is no need to create the internal capacity of 16_000_000). It will take a lot of time, so be patient. I also gave it a
-Xmx12G and -Xms12G
HashMap<Integer, Integer> map = new HashMap<>(15_000_000, 1);
for (int i = 0; i < 15_000_000; ++i) {
map.put(i, i);
}
System.out.println(GraphLayout.parseInstance(map).toFootprint());
Here is what jol said:
java.util.HashMap@9629756d footprint:
COUNT AVG SUM DESCRIPTION
1 67108880 67108880 [Ljava.util.HashMap$Node;
29999872 16 479997952 java.lang.Integer
1 48 48 java.util.HashMap
15000000 32 480000000 java.util.HashMap$Node
44999874 1027106880 (total)
Let's start from bottom.
total size of the hashmap footprint is: 1027106880 bytes or 1 027 MB.
Node instance is the wrapper class where each entry resides. it has a size of 32 bytes; there are 15 million entries, thus the line:
15000000 32 480000000 java.util.HashMap$Node
Why 32 bytes? It stores the hashcode(4 bytes), key reference (4 bytes), value reference (4 bytes), next Node reference (4 bytes), 12 bytes header, 4 bytes padding, resulting in 32 bytes total.
1 48 48 java.util.HashMap
A single hashmap instance - 48 bytes for it's internals.
If you really want to know why 48 bytes:
System.out.println(ClassLayout.parseClass(HashMap.class).toPrintable());
java.util.HashMap object internals:
OFFSET SIZE TYPE DESCRIPTION VALUE
0 12 (object header) N/A
12 4 Set AbstractMap.keySet N/A
16 4 Collection AbstractMap.values N/A
20 4 int HashMap.size N/A
24 4 int HashMap.modCount N/A
28 4 int HashMap.threshold N/A
32 4 float HashMap.loadFactor N/A
36 4 Node[] HashMap.table N/A
40 4 Set HashMap.entrySet N/A
44 4 (loss due to the next object alignment)
Instance size: 48 bytes
Space losses: 0 bytes internal + 4 bytes external = 4 bytes total
Next the Integer instances:
29999872 16 479997952 java.lang.Integer
30 million integer objects (minus 128 that are cached in the pool)
1 67108880 67108880 [Ljava.util.HashMap$Node;
we have 15_000_000 entries, but the internal array of a HashMap is a power of two size, that's 16,777,216 references of 4 bytes each.
16_777_216 * 4 = 67_108_864 + 12 bytes header + 4 padding = 67108880
HashMap
: 30_000_000Integer
objects (15_000_000 keys, 15_000_000 values) need another 240 MB - Thomas Kläger