Memoization: A Simple Cache for Expensive Operations
By Aaron Speer
Published August 2, 2023
When building complex web applications, one of the most immediate ways to improve performance is to implement an intelligent caching strategy. Good caching prevents static content from being regenerated over and over, protects high-traffic pages from getting bogged down, and, if implemented correctly, allows the vast majority of your visitors to have a smooth, fast experience.
But caching isn’t without its pitfalls. Object Caches require a specific server setup, and often don’t work on older systems. Page Caches are incredibly effective, essentially transforming your application into static HTML, but they quickly become stale, and are notoriously difficult to use in combination with dynamic or user-specific content.
Thankfully, there’s another type of cache, and you can start using it in your code today without any changes to your server, no matter the hosting environment: Memoization!
A Simple Use Case
Memoization (sometimes called function caching) is the process of storing the results of an expensive operation so that it doesn’t run more than once during a single server request. That last bit is important because, unlike any persistent caches you may have dealt with before, memoization doesn’t hang on to its data between server requests. This makes it ideal for situations where multiple pieces of your codebase call the same method and expect the same result every time.
As an example we ran into recently at Gravity Forms, imagine you have a directory full of SVG images that you need to iterate over and get the file contents of in order to display them elsewhere in the application. At a basic level, it would look something like this:
    class SVG_Finder {

        public function svgs() {
            $svgs = array();

            foreach ( glob( '/path/to/svgs/*.svg' ) as $filename ) {
                $key          = pathinfo( $filename, PATHINFO_FILENAME );
                $svgs[ $key ] = file_get_contents( $filename );
            }

            return $svgs;
        }
    }
Nothing too complicated – we simply loop through the results of calling glob() on our file path and call file_get_contents() on each item we find, storing that in an array. This works well, and on most systems won’t take much time to process.
But now imagine that this method is called a hundred times by the core plugin and the various add-ons the site uses, and that the site lives on a shared server already over-taxed by the number of applications it hosts. Suddenly this simple loop that doesn’t impact load times when called in isolation can cause every page load to grind to a halt.
Thankfully, any time this method is called, it should return the exact same array of SVG contents. This means that after we call it once, we can reliably return the same array to anything that calls it again. An Object Cache would work here, as we could simply store the resulting array in the Cache and return that when called. As we pointed out earlier, however, we can’t always assume that an Object Cache will be present on a given server, so it would be best if we could find a way to cache these results without requiring one.
The answer lies in a clever usage of private static class properties.
Memoization via Static Properties
Static class properties, at their most basic level, are simply class properties which, just like any other type of property, exist within a given class and can be accessed from within that class context. Unlike normal class properties, however, static properties can be accessed without first instantiating the class to which they belong. That, in itself, isn’t particularly helpful for our purposes. We don’t necessarily need to be able to access the property outside the class.
As it turns out, static properties have another interesting aspect that is much more useful in our situation: no matter how many times the class is instantiated, called, or otherwise interacted with, the contents of the static property will remain in place.
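To see this behavior in isolation, here is a minimal sketch. The Request_Counter class is hypothetical and exists only for illustration; it shows two separate instances reading and writing the same static property.

```php
<?php
// Hypothetical class, used only to illustrate how static properties behave.
class Request_Counter {

    // Shared by every instance of the class for the life of the request.
    private static $count = 0;

    public function increment() {
        self::$count++;

        return self::$count;
    }
}

$first  = new Request_Counter();
$second = new Request_Counter();

$first->increment();           // The static count is now 1.
$total = $second->increment(); // A different instance sees the same property.

echo $total; // 2
```

Even though $second was never touched before its own call, it picks up where $first left off, because the value lives on the class rather than on either object.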
That means that anything we do to this static property will persist for the entire server request, which is precisely how we want our memoization to function! With this in mind, let’s update our SVG method to take advantage of a static property:
    class SVG_Finder {

        private static $svgs;

        public function svgs() {
            if ( ! is_null( self::$svgs ) ) {
                return self::$svgs;
            }

            $svgs = array();

            foreach ( glob( '/path/to/svgs/*.svg' ) as $filename ) {
                $key          = pathinfo( $filename, PATHINFO_FILENAME );
                $svgs[ $key ] = file_get_contents( $filename );
            }

            self::$svgs = $svgs;

            return $svgs;
        }
    }
We made a few changes here. First, we declared the property as `private static $svgs`. Since we don’t set a default value for it, the property will initialize as `null`.
Next, we updated the svgs() method to use the new property. This is relatively straightforward: before the method does any file parsing, we check whether the $svgs property is null. If it isn’t, we know the files have already been processed at least once, so we can safely return its value instead of moving on to our expensive process.
If it is null, however, we know that it hasn’t yet been populated, so we go through our process as usual. Before returning the values, we simply save the array to our $svgs property, ensuring that the next time this function is called, the values will exist and can be returned without running through the entire method again.
Just like that, we now have a function whose expensive work will only ever run a single time per request, no matter how many times it’s called in the application. You could call this function a thousand times in a loop, and it would only ever glob() the directory of SVGs a single time, saving a ton of processing power.
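One way to check that claim is to count how often the directory is actually scanned. The sketch below is illustrative only: the temporary directory, the MEMO_DEMO_DIR constant, the Counting_SVG_Finder class name, and the public $scans counter are all assumptions added for the demonstration and are not part of the original class.

```php
<?php
// Demo-only setup: a throwaway directory containing two tiny SVG files.
$dir = sys_get_temp_dir() . '/memo_demo_' . uniqid();
mkdir( $dir );
file_put_contents( $dir . '/circle.svg', '<svg><circle r="1"/></svg>' );
file_put_contents( $dir . '/square.svg', '<svg><rect width="1" height="1"/></svg>' );
define( 'MEMO_DEMO_DIR', $dir );

class Counting_SVG_Finder {

    private static $svgs;

    // Hypothetical counter, present only so we can observe the number of scans.
    public static $scans = 0;

    public function svgs() {
        if ( ! is_null( self::$svgs ) ) {
            return self::$svgs;
        }

        self::$scans++;
        $svgs = array();

        foreach ( glob( MEMO_DEMO_DIR . '/*.svg' ) as $filename ) {
            $key          = pathinfo( $filename, PATHINFO_FILENAME );
            $svgs[ $key ] = file_get_contents( $filename );
        }

        self::$svgs = $svgs;

        return $svgs;
    }
}

// A new instance on every iteration, to show the cache is not tied to one object.
for ( $i = 0; $i < 1000; $i++ ) {
    $finder = new Counting_SVG_Finder();
    $svgs   = $finder->svgs();
}

echo Counting_SVG_Finder::$scans . ' scan(s), ' . count( $svgs ) . ' SVGs found';

// Remove the demo files again.
array_map( 'unlink', glob( $dir . '/*.svg' ) );
rmdir( $dir );
```

Under these assumptions the script should report a single scan for the thousand calls, since every call after the first returns the stored array.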
That works great for a method that doesn’t take any arguments, but what if our svgs() method accepted a file path parameter? Thankfully, that’s an easy change to make.
Memoizing Functions with Parameters
Once you introduce parameters into your memoized functions, things can get a little bit trickier. With a bit of thought, however, we can update our cache to handle things gracefully. Here’s our updated code to handle this:
    class SVG_Finder {

        // Cached results, keyed by the file path that produced them.
        private static $svgs = array();

        public function svgs( $file_path ) {
            // Return the stored result if this path has already been scanned.
            if ( isset( self::$svgs[ $file_path ] ) ) {
                return self::$svgs[ $file_path ];
            }

            $svgs = array();

            foreach ( glob( $file_path . '/*.svg' ) as $filename ) {
                $key          = pathinfo( $filename, PATHINFO_FILENAME );
                $svgs[ $key ] = file_get_contents( $filename );
            }

            self::$svgs[ $file_path ] = $svgs;

            // $svgs is the result for this path, so return it directly.
            return $svgs;
        }
    }
Let’s go over the changes, starting from the top:
First, we updated the property to initialize as an array(). Each file path we process gets its own key in this array, so starting from an empty array makes that intent explicit and keeps any other array operations on the property from tripping over a null value.
Next, we updated our method to take a $file_path parameter. We pass this parameter to the glob() call, allowing the location of the SVGs to be dynamic.
Now to make our cache respect this parameter. Since we only have a single parameter to worry about, we can simply replace our is_null() check with an isset() check that looks for the parameter as a key within the array. If the key exists, we return the value stored under it.
Finally, we need to update how we populate our property after the first run. Here we use the $file_path as the array key and store the array of file contents we built from the glob() results as its value. This allows the $svgs property to hold a separate set of contents for every file path passed to the method, while eliminating duplicate scans of the same path.
With those changes in place, we now have a dynamic method, capable of accepting any file path argument, while still preventing identical calls from bogging down our system.
Pitfalls, Gotchas, and Limitations
It’s important to keep in mind that memoization is a great tool for very specific situations. One of its biggest limitations is the fact that there are many valid situations in which a single function call should return different results. For instance, if your function hits a database, it’s entirely possible for a table to be updated in between calls on a single server request. As such, memoizing those results would cause issues with stale data.
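If you do memoize data that can change mid-request, one common mitigation is to clear the cached entry whenever the underlying value is written. The following is a minimal sketch under stated assumptions: Settings_Reader is a hypothetical class, and its $table array merely stands in for a real database table.

```php
<?php
// Hypothetical class: the $table array stands in for a real database table.
class Settings_Reader {

    private static $table = array( 'color' => 'blue' );
    private static $cache = array();

    public static function get( $name ) {
        // array_key_exists(), unlike isset(), also treats a stored null as a cache hit.
        if ( array_key_exists( $name, self::$cache ) ) {
            return self::$cache[ $name ];
        }

        // Stand-in for an expensive query.
        $value = isset( self::$table[ $name ] ) ? self::$table[ $name ] : null;

        self::$cache[ $name ] = $value;

        return $value;
    }

    public static function update( $name, $value ) {
        self::$table[ $name ] = $value;

        // Drop the memoized copy so the next get() reads the fresh value.
        unset( self::$cache[ $name ] );
    }
}

$before = Settings_Reader::get( 'color' );   // 'blue', now memoized.
Settings_Reader::update( 'color', 'green' ); // Write, then invalidate.
$after  = Settings_Reader::get( 'color' );   // 'green', read fresh.

echo $before . ' -> ' . $after; // blue -> green
```

This only helps when every write goes through code you control. If something else can change the data behind your back, the safer choice is usually not to memoize that call at all.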
Another aspect to keep in mind is the size of the data you’re storing in your cache. Keeping dozens of very large datasets in memory during a request can quickly exhaust the memory limit of a managed server, resulting in fatal errors.
Finally, when memoizing function calls that utilize many arguments, there’s definitely an inflection point at which it becomes cumbersome to try to cache all the various permutations of those arguments. If you find yourself spending more time dealing with cache invalidation and instantiation than actually dealing with the data, it’s probably a sign that you’re attempting to memoize a function that simply doesn’t lend itself to being memoized.
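For a handful of arguments, one workable approach is to fold the whole argument list into a single cache key. The sketch below assumes a hypothetical Report_Builder class whose build() method and arguments are invented for illustration; the public $runs counter is there only so the effect can be observed.

```php
<?php
// Hypothetical class; the method and its arguments are illustrative only.
class Report_Builder {

    private static $cache = array();

    // Demo-only counter showing how often the expensive work actually runs.
    public static $runs = 0;

    public function build( $form_id, $status, array $fields ) {
        // Serialize the full argument list and hash it, so each distinct
        // combination of arguments gets its own cache key.
        $key = md5( serialize( array( $form_id, $status, $fields ) ) );

        if ( isset( self::$cache[ $key ] ) ) {
            return self::$cache[ $key ];
        }

        self::$runs++;

        // Stand-in for the expensive work.
        $result = sprintf( 'form %d / %s / %s', $form_id, $status, implode( ',', $fields ) );

        self::$cache[ $key ] = $result;

        return $result;
    }
}

$builder = new Report_Builder();
$builder->build( 1, 'active', array( 'name', 'email' ) );
$builder->build( 1, 'active', array( 'name', 'email' ) ); // Same arguments: served from cache.
$builder->build( 1, 'spam', array( 'name', 'email' ) );   // Different arguments: runs again.

echo Report_Builder::$runs; // 2
```

Note that serialize() is sensitive to both type and order, so 1 and '1', or the same fields in a different order, produce different keys. If those calls should share a result, normalize the arguments before building the key, and if the normalization starts to outgrow the function itself, that is the inflection point described above.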
Wrapping Up
Memoization is a quick, relatively simple, and effective way to improve the performance of your applications with very little overhead. As long as you pay close attention to what you’re caching and how it’s accessed, memoization can go a long way towards making even the most intense processes substantially faster for your end users.
So the next time you find yourself dealing with increasingly long load times in your application, take a look and see if memoizing some of your functions could help ease that burden. Chances are, faster performance is just a few lines of code away!