The other day I was having a conversation with a colleague about an asynchronous file-hashing operation that triggers off new objects uploaded to an S3 bucket. At one point we were talking throughput. The design has a notification configuration that sends the S3 events into an SQS queue for processing. This means for the first minute we have five Lambda functions each processing a file one at a time (batch size of 1: this is an implementation decision, and for the sake of this article we won’t get into larger batch sizes), then at the second minute 65, the third 125, and so on.
The napkin math of this discussion assumed a 1 GB average file size and an ideal 100 MBps throughput. At 10 seconds per file, that’s 6 files per minute per Lambda function, so during a scale-up we could expect to process 30 files in the first minute, 390 in the second, and 750 in the third. Our current implementation lets us hash through 1,170 of these (admittedly on the higher end) 1 GB files within 3 minutes.
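The scale-up arithmetic above can be sanity-checked in a few lines:

```python
FILES_PER_MIN = 6  # 1 GB file at 100 MBps = 10 s per file

# Concurrent Lambda functions in minutes 1-3 (5 to start, then +60 per minute)
functions_per_minute = [5, 65, 125]

total = 0
for minute, functions in enumerate(functions_per_minute, start=1):
    processed = functions * FILES_PER_MIN
    total += processed
    print(f"minute {minute}: {processed} files")

print(f"total: {total} files")  # total: 1170 files
```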
That is, if we actually get 100 MBps.
What can we actually expect?
The rest of my afternoon became focused on finding out what the realistic throughput for this code would be. I stripped the service’s Lambda function down to the key pieces and preloaded an S3 bucket with four test files at sizes we commonly expect coming into the system: 100 MB, 500 MB, 1 GB, and 5 GB.
Here’s the code:
Now came the testing portion. I needed to see how this code performs not only at different memory settings (remember: the memory setting also allocates CPU and network IO to our functions), but also at different chunk sizes streamed from S3 for each object. The code above uses the chunk size to stream X bytes of the S3 object into memory, update the hash digest with them, and discard them before moving on to the next chunk. This keeps our actual memory utilization very low. In fact, the above hashing operation works at 128 MB of memory for the Lambda function, even if the execution time isn’t great.
At this point I must inform you all that I did my testing like a barbarian of old by clicking “Invoke” in the console after changing my memory and chunk settings. If you’re looking to do performance testing, I recommend you go check out the AWS Lambda Power Tuning project for this. It’s pretty great.
The table below is the data I recorded as part of this effort. A few things to note that limit this data and make it incomplete:
- I only performed two runs at each configuration. This is a very limited data set, and there’s clearly environmental variance between executions that affected the times. These could have been leveled out, and outliers dropped, had I obtained a larger data set.
- With only one run being performed at a time, I have no clear indication whether a large number of parallel read operations on different files would impact read speeds. My assumption is no, but that is an assumption.
- While it would be possible to multi-thread this workflow, and potentially multi-process it at much higher memory settings, I don’t see the benefit for the added code complexity. Plus, splitting the download across threads likely won’t increase read speeds from S3, as there would then be multiple streams competing for bandwidth.
We’ll pick up on my thought process on the other side of this table.
| File Size (MB) | Memory Used (MB) | First Run (Seconds) | Second Run (Seconds) | Avg Speed (MBps) |
| --- | --- | --- | --- | --- |
| **128 MB Memory / 1 MB Chunk Size** | | | | |
| 100 | 82 | 4.91 | 7.49 | 16.13 |
| 500 | 82 | 27.26 | 30.74 | 17.24 |
| 1000 | 82 | 55.7 | 74.82 | 15.32 |
| 5000 | 82 | 329.68 | 329.68 | 15.17 |
| **128 MB Memory / 10 MB Chunk Size** | | | | |
| 100 | 108 | 5.18 | 7.64 | 15.60 |
| 500 | 108 | 21.26 | 24.96 | 21.64 |
| 1000 | 108 | 43.32 | 49.82 | 21.47 |
| 5000 | 108 | 217.84 | 240.16 | 21.83 |
| **256 MB Memory / 10 MB Chunk Size** | | | | |
| 100 | 108 | 2.65 | 2.51 | 38.76 |
| 500 | 108 | 9.8 | 10.8 | 48.54 |
| 1000 | 108 | 20.32 | 21.48 | 47.85 |
| 5000 | 108 | 118.7 | 99.08 | 45.92 |
| **256 MB Memory / 20 MB Chunk Size** | | | | |
| 100 | 138 | 2.64 | 2.4 | 39.68 |
| 500 | 138 | 9.74 | 9.92 | 50.86 |
| 1000 | 138 | 19.58 | 19.92 | 50.63 |
| 5000 | 138 | 99.42 | 97.44 | 50.80 |
| **256 MB Memory / 50 MB Chunk Size** | | | | |
| 100 | 245 | 3.55 | 2.71 | 31.95 |
| 500 | 245 | 12.68 | 12.52 | 39.68 |
| 1000 | 245 | 25.54 | 25.4 | 39.26 |
| 5000 | 245 | 128.06 | 127.5 | 39.13 |
| **512 MB Memory / 20 MB Chunk Size** | | | | |
| 100 | 137 | 1.5 | 1.16 | 75.19 |
| 500 | 137 | 5.38 | 5.34 | 93.28 |
| 1000 | 137 | 13.76 | 13.8 | 72.57 |
| 5000 | 137 | 69.74 | 69.74 | 71.69 |
| **512 MB Memory / 50 MB Chunk Size** | | | | |
| 100 | 245 | 2.07 | 1.73 | 52.63 |
| 500 | 245 | 6.74 | 6.52 | 75.41 |
| 1000 | 245 | 13.66 | 14.3 | 71.53 |
| 5000 | 245 | 68.78 | 70.21 | 71.95 |
| **1024 MB Memory / 20 MB Chunk Size** | | | | |
| 100 | 137 | 1.2 | 1.1 | 86.96 |
| 500 | 137 | 6.6 | 5.29 | 84.10 |
| 1000 | 137 | 14.57 | 13.93 | 70.18 |
| 5000 | 137 | 72.76 | 69.57 | 70.26 |
| **1024 MB Memory / 50 MB Chunk Size** | | | | |
| 100 | 246 | 1.33 | 1.21 | 78.74 |
| 500 | 246 | 6.52 | 6.53 | 76.63 |
| 1000 | 246 | 14.46 | 14.61 | 68.80 |
| 5000 | 246 | 72.65 | 72.69 | 68.80 |
| **2048 MB Memory / 20 MB Chunk Size** | | | | |
| 100 | 138 | 1.09 | 1.06 | 93.02 |
| 500 | 138 | 5.33 | 5.35 | 93.63 |
| 1000 | 138 | 13.89 | 13.91 | 71.94 |
| 5000 | 138 | 69.69 | 69.56 | 71.81 |
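For reference, the Avg Speed column is just the file size divided by the mean of the two runs:

```python
# e.g. the 1000 MB row at 128 MB memory / 1 MB chunks:
size_mb, run1, run2 = 1000, 55.70, 74.82
avg_speed = size_mb / ((run1 + run2) / 2)
print(round(avg_speed, 2))  # 15.32
```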
I started off at the default 128 MB and, thanks to a typo, 1 MB chunks (I thought I had written 10000000 😅 ). The smaller chunk size means we’re making many, many more requests to S3, so increasing it to 10 MB is a simple way to improve performance. At the higher chunk size we’re getting close to utilizing all the available memory, so we can’t increase it again at this setting.
I think it should be said that anyone deploying Lambda functions should default their memory setting to 256 MB to start. The leap in performance is clear no matter what you’re doing, and at per-millisecond billing there’s no reason not to go for it.
With the additional memory overhead I decided to see what would happen if I 5x’d the chunk size. While within the limit, performance actually decreased. Dropping the chunk size to 20 MB revealed a sweet spot (someone help me here, but I know I’ve heard the 20 MB number used in a few other places within AWS for chunking/in-memory caching) where we can now consistently get ~50 MBps reads from S3.
At 512 MB of memory and the 20 MB chunk size we’ve hit the optimal settings across object sizes: a 70+ MBps baseline with variance up to 90+ MBps.
If I were to do more intensive performance testing, I would focus here. Increasing memory to 1024 MB and 2048 MB improved read speeds for < 1 GB objects, but not for the ≥ 1 GB ones. I also tested 50 MB chunks at 512 MB and 1024 MB, but it again resulted in performance hits.
It might be tempting to look at the speed increases for < 1 GB files and say the function should run at the higher memory to burn through those faster, but the timing difference is insignificant in our context: 1.06 seconds at 2048 MB vs 1.5 seconds at our “optimal” 512 MB for 100 MB objects.
I say it that way because this system isn’t expected to deal with constant, high-volume ingress of objects to our bucket. Ingress will be inconsistent and spiky at certain points in a monthly cycle. Now, if I were expecting high-volume ingress at a more constant rate, I might find the increase warranted. ~3,400 100 MB objects per hour vs ~2,400 is a very different kind of measurement.
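Those hourly figures fall straight out of the per-object times in the table:

```python
# 100 MB objects: seconds per object -> objects per hour
for memory_mb, seconds in [(512, 1.5), (2048, 1.06)]:
    print(memory_mb, round(3600 / seconds))  # 512 -> 2400, 2048 -> 3396
```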
I hope you all enjoyed coming along for this little journey. Perhaps some day I’ll come back to it and put it through some proper performance tuning analysis.