Maps and memory leaks
When working with maps in Go, we need to understand some important characteristics of how a map grows and shrinks. Let's delve into this to prevent issues that can cause memory leaks.
First, to view a concrete example of this problem, let's design a scenario where we work with the following map:

m := make(map[int][128]byte)

Each value of m is an array of 128 bytes. We will do the following:
- Allocate an empty map.
- Add 1 million elements.
- Remove all the elements, and run a Garbage Collection (GC).
After each step, we want to print the size of the heap (using a printAlloc utility function). This shows us how this example behaves memory-wise:
package main

import (
	"fmt"
	"runtime"
)

func main() {
	n := 1_000_000
	m := make(map[int][128]byte)
	printAlloc()

	for i := 0; i < n; i++ { // Adds 1 million elements
		m[i] = [128]byte{}
	}
	printAlloc()

	for i := 0; i < n; i++ { // Deletes 1 million elements
		delete(m, i)
	}

	runtime.GC()         // Triggers a manual GC
	printAlloc()
	runtime.KeepAlive(m) // Keeps a reference to m so that the map isn't collected
}

func printAlloc() {
	var m runtime.MemStats
	runtime.ReadMemStats(&m)
	fmt.Printf("%d MB\n", m.Alloc/(1024*1024))
}
We allocate an empty map, add 1 million elements, remove 1 million elements, and then run a GC. We also make sure to keep a reference to the map using runtime.KeepAlive so that the map isn't collected as well. Let's run this example:
0 MB <-- After m is allocated
461 MB <-- After we add 1 million elements
293 MB <-- After we remove 1 million elements
What can we observe? At first, the heap size is minimal. Then it grows significantly after we add 1 million elements to the map. But if we expected the heap size to decrease after removing all the elements, this isn't how maps work in Go. In the end, even though the GC has collected all the elements, the heap size is still 293 MB. So the memory shrank, but not as we might have expected. What's the rationale? We need to delve into how a map works in Go.
A map provides an unordered collection of key-value pairs in which all the keys are distinct. In Go, a map is based on the hash table data structure: an array where each element is a pointer to a bucket of key-value pairs, as shown in figure 1.
Each bucket is a fixed-size array of eight elements. When an insertion targets a bucket that is already full (a bucket overflow), Go creates another bucket of eight elements and links the previous one to it, as shown in figure 2.
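To make this concrete, here is a simplified, illustrative model of a bucket for our map[int][128]byte. This is a sketch only; the actual runtime layout differs (it packs keys, values, and hash bits into separate arrays), but the shape is the same:

type bucket struct {
	keys     [8]int       // up to eight keys per bucket
	values   [8][128]byte // the corresponding values
	overflow *bucket      // link to the extra bucket created on overflow
}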
Under the hood, a Go map is a pointer to a runtime.hmap struct. This struct contains multiple fields, including a B field, giving the number of buckets in the map:
type hmap struct {
	B uint8 // log_2 of # of buckets
	// (can hold up to loadFactor * 2^B items)
	// ...
}
After adding 1 million elements, the value of B equals 18, which means 2^18 = 262,144 buckets. When we remove 1 million elements, what's the value of B? Still 18. Hence, the map still contains the same number of buckets.
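We can sanity-check this value with a back-of-the-envelope calculation. This sketch assumes the runtime's map load factor of 6.5, an internal constant of the classic map implementation that isn't exposed by any API:

package main

import "fmt"

func main() {
	const loadFactor = 6.5 // internal runtime constant (assumption, see above)
	fmt.Printf("%.0f\n", loadFactor*(1<<17)) // 851968: too few slots for 1 million elements
	fmt.Printf("%.0f\n", loadFactor*(1<<18)) // 1703936: enough, hence B = 18
}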
The reason is that the number of buckets in a map cannot shrink. Therefore, removing elements from a map doesn't impact the number of existing buckets; it just zeroes the slots in the buckets. A map can only grow and have more buckets; it never shrinks.
In the previous example, we went from 461 MB to 293 MB because the elements were collected, but running the GC didnât impact the map itself. Even the number of extra buckets (the buckets created because of overflows) remains the same.
Let's take a step back and discuss when the fact that a map cannot shrink can be a problem. Imagine building a cache using a map[int][128]byte. This map holds, per customer ID (the int), a sequence of 128 bytes. Now, suppose we want to save the last 1,000 customers. The map size will remain constant, so we shouldn't worry about the fact that a map cannot shrink.
However, let's say we want to store one hour of data. Meanwhile, our company has decided to have a big promotion for Black Friday: in one hour, we may have millions of customers connected to our system. But a few days after Black Friday, our map will contain the same number of buckets as during the peak time. This explains why we can experience high memory consumption that doesn't significantly decrease in such a scenario.
What are the solutions if we don't want to manually restart our service to release the memory consumed by the map? One solution could be to re-create a copy of the current map at regular intervals. For example, every hour, we can build a new map, copy all the elements, and release the previous one. The main drawback of this option is that following the copy and until the next garbage collection, we may consume twice the current memory for a short period.
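As a minimal sketch of this approach (the rebuild function and its name are illustrative, not part of the original example):

// rebuild copies every element into a fresh map so the runtime
// allocates only the buckets the current size actually requires.
func rebuild(old map[int][128]byte) map[int][128]byte {
	fresh := make(map[int][128]byte, len(old)) // size hint avoids growth during the copy
	for k, v := range old {
		fresh[k] = v
	}
	return fresh // once the caller drops old, the GC can reclaim its buckets
}

Until the old map becomes unreachable and is collected, both maps coexist, which is exactly the temporary doubling of memory mentioned above.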
Another solution would be to change the map type to store an array pointer: map[int]*[128]byte. This doesn't solve the fact that we will have a significant number of buckets; however, each bucket entry reserves only the size of a pointer for the value instead of 128 bytes (8 bytes on 64-bit systems and 4 bytes on 32-bit systems).
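As a sketch, here is the original example adapted to the pointer variant, reusing the printAlloc helper from earlier. Note that we take the address of a fresh array on each iteration, so every entry points to its own storage:

func main() {
	n := 1_000_000
	m := make(map[int]*[128]byte)
	printAlloc()

	for i := 0; i < n; i++ {
		v := [128]byte{} // a fresh array per iteration
		m[i] = &v        // the bucket slot holds only a pointer, not 128 bytes
	}
	printAlloc()

	for i := 0; i < n; i++ {
		delete(m, i)
	}

	runtime.GC()
	printAlloc()
	runtime.KeepAlive(m)
}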
Coming back to the original scenario, let's compare the memory consumption for each map type after each step. The following table shows the comparison.

Step | map[int][128]byte | map[int]*[128]byte
---|---|---
Allocate an empty map | 0 MB | 0 MB
Add 1 million elements | 461 MB | 182 MB
Remove all the elements and run a GC | 293 MB | 38 MB
Note
If a key or a value is over 128 bytes, Go won't store it directly in the map bucket. Instead, Go stores a pointer to reference the key or the value.
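For illustration, assuming the 128-byte threshold described in this note, these two declarations behave differently memory-wise:

func main() {
	inline := make(map[int][128]byte)   // 128-byte values are stored directly in the buckets
	indirect := make(map[int][129]byte) // over the threshold: buckets hold pointers to the values
	_, _ = inline, indirect
}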
As we have seen, adding n elements to a map and then deleting them all means keeping the same number of buckets in memory. So we must remember that because a Go map can only grow in size, its memory consumption can only grow as well; there is no automated strategy to shrink it. If this leads to high memory consumption, we can try different options, such as forcing Go to re-create the map or using pointers, to check whether the consumption can be optimized.