Three Ways to Count the Objects in an AWS S3 Bucket

source link: https://fuzzyblog.io/blog/aws/2019/10/24/three-ways-to-count-the-objects-in-an-aws-s3-bucket.html
Oct 24, 2019

AWS S3, the "Simple Storage Service", is the classic AWS service. It was among the first AWS services to launch, the first one I ever used, and it seemingly lies at the very heart of almost everything AWS does.

S3 behaves much like a filesystem (strictly speaking it is an object store, and its "files" are objects), so a logical thing to want is a count of the objects in an S3 bucket. Illustrated below are three ways to get one.

Method 1: aws s3 ls

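S3 is fundamentally a filesystem and you can just call ls on it. With --recursive the listing walks every object in the bucket, and --summarize adds totals at the end, which grep then picks out. Yep – ls in the cloud. blink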

aws s3 ls s3://adl-ohi/ --recursive --summarize | grep "Total Objects:"
Total Objects: 444803
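
If you would rather have the count as a number inside a script than as a grep'd line, the same command can be wrapped from Python. This is only a sketch: the function names parse_summary and summarize_bucket are made up for illustration, and it assumes the aws CLI is installed, configured, and run without --human-readable so that the size is printed as plain bytes.

```python
import re
import subprocess


def parse_summary(text):
    """Extract (object_count, total_bytes) from `aws s3 ls --summarize` output.

    total_bytes is None when no "Total Size:" line is present.
    """
    # Anchor at line start so an object key containing the same words cannot match;
    # listing lines begin with a timestamp, summary lines do not.
    count = re.search(r"^\s*Total Objects:\s*(\d+)", text, re.MULTILINE)
    size = re.search(r"^\s*Total Size:\s*(\d+)", text, re.MULTILINE)
    if count is None:
        raise ValueError("no 'Total Objects:' line found")
    total_bytes = int(size.group(1)) if size else None
    return int(count.group(1)), total_bytes


def summarize_bucket(bucket_name):
    """Run the Method 1 command for bucket_name and parse its summary lines."""
    result = subprocess.run(
        ["aws", "s3", "ls", f"s3://{bucket_name}/", "--recursive", "--summarize"],
        capture_output=True,
        text=True,
        check=True,  # raise if the CLI exits with an error
    )
    return parse_summary(result.stdout)
```

Calling summarize_bucket("adl-ohi") would return a tuple of the object count and the total size in bytes.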

Method 2: aws s3api

And since S3 is a modern filesystem, it actually has an API that you can call. Yep – a JSON API. blink blink

aws s3api list-objects --bucket adl-ohi --output json --query "[length(Contents[])]"
[
    448444
]
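
The same API can be called from Python through boto3. The sketch below uses the list_objects_v2 paginator rather than the list-objects call shown above, since a single request returns at most 1000 keys; the function names bucket_pages and count_objects are my own, not part of boto3, and working AWS credentials are assumed.

```python
def bucket_pages(bucket_name):
    """Yield the raw ListObjectsV2 response pages for one bucket."""
    # Imported here so count_objects below can be used without boto3 or AWS access.
    import boto3

    paginator = boto3.client("s3").get_paginator("list_objects_v2")
    yield from paginator.paginate(Bucket=bucket_name)


def count_objects(pages):
    """Count keys across response pages.

    A page for an empty listing carries no "Contents" entry, hence the default.
    """
    return sum(len(page.get("Contents", [])) for page in pages)
```

With credentials in place, count_objects(bucket_pages("adl-ohi")) would give the object count as a plain integer. Keeping the counting separate from the AWS call also makes it easy to check against hand-built pages.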

Method 3: A Python Example

Naturally, you can also just run code to do all this. I started with an example from the Stack Overflow link below that was written for boto and upgraded it to boto3. (As someone who is still a Python novice, I feel pretty good about doing this successfully; I remember when Ruby went through the same AWS v2 to v3 transition, and it sucked there too.) I also learned how to dynamically introspect the methods of Python objects as part of this debugging cycle.

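#!/usr/bin/env python3
# Usage: count_s3.py BUCKET_NAME

import sys
import boto3

s3 = boto3.resource('s3')
s3bucket = s3.Bucket(sys.argv[1])
size = 0        # running total in bytes
totalCount = 0  # running object count

# objects.all() pages through the whole bucket, yielding one summary per object
for obj in s3bucket.objects.all():
    totalCount += 1
    size += obj.size

print('total size:')
# divides by 1024**3, so the figure is in binary gigabytes (GiB) despite the GB label
print("%.3f GB" % (size / 1024 / 1024 / 1024))
print('total count:')
print(totalCount)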

which gives output like this:

python3 scratch/count_s3.py adl-ohi
total size:
0.298 GB
total count:
486468

Note: I have a live upload happening on another machine, so the numbers change between runs (which is why the three counts above differ), and that's actually fine.
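
For the introspection mentioned in Method 3, one possible approach is to list the public callables defined on an object's class. This is a sketch, not what I originally ran; public_methods is a made-up helper name, and it looks attributes up on the type rather than the instance so that it does not trigger properties that load data lazily.

```python
def public_methods(obj):
    """Return the sorted names of public callables defined on obj's class."""
    cls = type(obj)
    return sorted(
        name
        for name in dir(cls)
        # Skip private and dunder names; properties are not callable, so they drop out too.
        if not name.startswith("_") and callable(getattr(cls, name, None))
    )


if __name__ == "__main__":
    # A plain list stands in for an S3 resource object here.
    print(public_methods([]))
```

Pointing the same helper at s3.Bucket('adl-ohi') or at one of the objects yielded by objects.all() would show which actions boto3 exposes on them.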

References

