Tuning concurrency settings for AWS S3 CLI
Here is an interesting use case I worked on recently: I had to process about 4 TB of 360° panoramic images stored in AWS S3 and generate tile images for them.
I had a Lambda function listening for s3:ObjectCreated
events on an S3 bucket, which would in turn process each image and generate the tiles. So all I had to do was copy the existing images from the original bucket to a temporary bucket and let the Lambda function react to the S3 events.
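The copy itself is a single AWS CLI command; a sketch along those lines (the bucket names and prefix here are placeholders, not the real ones):

```shell
# Copy every existing image into the temporary bucket; each new object
# fires an s3:ObjectCreated event that invokes the tiling Lambda.
# "source-bucket" and "temp-tiles-bucket" are hypothetical names.
aws s3 cp s3://source-bucket/panoramas/ s3://temp-tiles-bucket/panoramas/ --recursive
```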
Pretty simple, eh?
Well, here is the catch. I needed to control the rate at which the objects were copied, to make sure I didn't push the invocation rate through the roof and make the Lambda function throttle. By default, an AWS account has a limit on how many Lambda invocations can run in parallel: only 1,000 concurrent executions per account.
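A rough back-of-the-envelope (Little's law: concurrency ≈ arrival rate × duration) shows why the default limit is easy to hit; the rate and duration below are illustrative assumptions, not measured numbers:

```shell
# Concurrent Lambda executions ≈ object arrival rate × average invocation duration.
rate_per_sec=500   # assumed: objects copied per second
avg_duration=3     # assumed: seconds to tile one panorama
echo $((rate_per_sec * avg_duration))   # 1500, well above the 1000-execution default limit
```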
I was using the AWS CLI to copy the images from one bucket to another, and pretty soon I was hitting that limit, and the Lambda functions started to throttle.
Solution.
Luckily, the AWS CLI has some S3 configuration values for tuning concurrency, which I could easily adjust to my needs.
Setting the max_concurrent_requests
value in your AWS config (~/.aws/config) — note the settings are nested under the s3 key of a profile section:
[default]
s3 =
  max_concurrent_requests = 500
  max_queue_size = 10000
  use_accelerate_endpoint = true
All I had to do was tune the max_concurrent_requests
value, and after a little trial and error while monitoring the results, I was able to control the number of objects transferred per second and keep it within limits.
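The same values can also be written without editing the file by hand, using aws configure set (the value 50 here is just an illustrative starting point for throttling, not a recommendation):

```shell
# Each command persists the setting into ~/.aws/config under the default profile.
aws configure set default.s3.max_concurrent_requests 50
aws configure set default.s3.max_queue_size 10000
```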
Notes.
While in my case I wanted to throttle the number of objects being copied, tweaking the same configuration also lets you copy objects much faster for a different use case. If your machine has the resources to spawn more threads, you can increase the value of max_concurrent_requests
and have the objects copied much faster.
References:
https://docs.aws.amazon.com/cli/latest/topic/s3-config.html#configuration-values
Hope it's helpful for someone out there.