Are incomplete Multipart Uploads in AWS S3 hurting your budget?
June 21, 2023
By Timothy Hunt
Are Multipart Uploads Hurting Your Budget?
What are Multipart uploads?
In simple terms, your client splits the file up into multiple parts, uploads them individually, then AWS S3 reassembles them at the other end. There are a number of benefits:
- you can upload in parallel, providing better throughput
- if you have a network failure, and one chunk fails, you only need to re-upload that one chunk
- you can upload parts over time, because multipart uploads don't expire
- you can begin an upload before you know the final size, for example if you're have streaming data
Wait a minute, what was that? Multipart uploads don't expire? So what happens if our upload fails, and we never complete it? Then the chunks that got uploaded stay there. Forever.
So what can we do about this? First, we identify where we have significant incomplete multipart uploads, and then we implement a Lifecycle Rule to delete them.
Identify buckets with incomplete multipart uploads
To identify the buckets with significant incomplete Multipart Uploads, we'll use S3 Storage Lens. In the AWS S3 console, on the left sidebar, we’ll see a link to Storage Lens Dashboards.
If we've not worked in here yet, we will have a default-account-dashboard that we can use.
Partway down that page, we find a "Top N overview for [today's date]" section. In the Metric dropdown, we'll select "incomplete multipart upload bytes greater than 7 days old". If a Multipart Upload hasn't been completed within 7 days, it's likely that something failed and it's not going to recover (unless, of course, we are deliberately uploading parts over time, and that time can exceed 7 days. For our purposes, we'll assume that’s not the case). We also don't need to identify incomplete multipart uploads that are new, because we might still be in the process of uploading them.
That will give us the top 3 (by default) buckets that have stale incomplete multipart uploads. We can change that number up to a max of the top 25.
Create a lifecycle rule
For each of those buckets, we'll find them in the AWS S3 console. In the Management tab, we can Create a Lifecycle Rule.
We'll give it a meaningful name. We'll make sure we're applying this rule to all objects in the bucket, and AWS will ask us to confirm that we really mean that.
The action we want is "Delete expired object delete markers or incomplete multipart uploads", then select the "Delete incomplete multipart uploads" option from the new section of the form that appears.
We select 7 days, because we don't want to delete chunks when we're actively uploading them, and then we’ll create the rule.
It will take a couple of days for the rule to process the outstanding old incomplete multipart uploads, but before long, all the storage that the incomplete multipart uploads was using will no longer be consumed.
For a rough estimate of cost savings, if we already exceed 500TB/month and so we're in the cheapest tier of Standard S3, then the ~30TB in our example buckets that we're able to delete is costing us around $645/month, or around $7,700/year. The savings are greater if we're not storing as much and thus we are in a more expensive S3 tier.
Unless we have a very specific use case that means we expect multipart uploads to extend longer than 7 days, we should strongly consider applying this to all of our buckets in our Infrastructure as Code.