
Advanced AWS CLI with Jupyter Notebooks (Part 2)


Published on Jul 29, 2021 by Raghuveer Varahagiri

Getting started with managing Amazon S3 with AWS CLI

In a previous post we saw how to set up Jupyter Notebooks to work with the AWS CLI natively. In this installment we will build on that and bring some additional power to the table by adding a bit of Python magic. We will use Amazon S3 as the service to work through the examples. Let’s start.

Setup Notebook for AWS CLI

First we quickly go through the same steps as we did previously to get this notebook to work with the AWS CLI – setting up environment variables, IPython shell aliases, and enabling automagic.

In []:
%env AWS_PROFILE = handsonaws-demo
Out[]:
env: AWS_PROFILE=handsonaws-demo
In []:
%rehashx
In []:
%automagic 1
Out[]:

    
    Automagic is ON, % prefix IS NOT needed for line magics.
In []:
aws whoami
Out[]:

    {
        "UserId": "AIDA6DXXXXXXXXW52WEP",
        "Account": "97XXXXXX5541",
        "Arn": "arn:aws:iam::97XXXXXX5541:user/iamadmin"
    }

As noted in the previous post, even though we have enabled automagic, some commands will still only work with the shell magic prefix !. Luckily, if you run into one of those, the fix is as simple as inserting the ! prefix for that cell alone – you do not need to turn off automagic for the notebook or do anything else that affects other cells.
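
For example, a cell that misbehaves under automagic can simply be rerun through the shell by prefixing it with ! (a minimal sketch – the command here is just a placeholder):

In []:
!aws s3 ls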

Amazon S3 Basics

The AWS CLI offers two ways of interacting with S3 buckets and objects – the s3 and s3api commands. The former is a higher-level command that offers a limited set of abstracted S3 operations and makes working with S3 buckets and objects very easy. The latter, s3api, more closely replicates the API methods offered by S3 and allows more fine-grained control when you need it. We will mostly look at s3 here.
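
To get a quick feel for the difference, here is roughly how the same listing looks through each command family (a sketch using the demo bucket from later in this post; outputs omitted):

In []:
aws s3 ls s3://s3demo-20210410/
In []:
aws s3api list-objects-v2 --bucket s3demo-20210410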

There are also two other S3 commands offered by the CLI – s3control and s3outposts – that serve specialized purposes outside the scope of this post. The s3control command gives you access to control plane operations on S3 such as S3 Batch Operations, public access blocking, and Storage Lens.

One useful reference is the help documentation available through the CLI itself. Appending help to any command displays its documentation along with the list of all subcommands available under it.

Here is one place where regular automagic doesn’t work (at least for me – the aws ... help command seems to internally invoke the cat command to display the manual pages, which for some reason does not work from within my notebook setup on Windows 10, even when I alias the cat command to type; I need to look deeper into this).

The cell magic %%bash comes in handy here. The output is verbose – collapsing the output or enabling auto-scroll makes it more manageable in the notebook context. You can scroll back up to this cell whenever you need to refer to it.

In []:
%%bash
aws s3 help
Out[]:
s3
^^

Description
***********
This section explains prominent concepts and notations in the set of
high-level S3 commands provided.

    ....

Available Commands
******************

* cp
* ls
* mb
* mv
* presign
* rb
* rm
* sync
* website
In []:
%%bash
aws s3api help
Out[]:
s3api
^^^^^

Available Commands
******************

* abort-multipart-upload
* complete-multipart-upload
* copy-object
* create-bucket
* create-multipart-upload
* delete-bucket
* delete-bucket-analytics-configuration
* delete-bucket-cors
...
* put-object-tagging
* put-public-access-block
* restore-object
* select-object-content
* upload-part
* upload-part-copy
* wait

As you can see, the s3api command has a much larger set of subcommands, closely replicating the API methods.

We will start with the basics of the s3 command – listing buckets and objects, uploading and downloading objects, and so on – with an eye on some quirks.

One thing you will notice right away is that the s3 command produces free-form text output and does not support the --output parameter that lets you force the output to JSON or YAML. The s3api command, on the other hand, provides JSON output by default like most other CLI commands.
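
For instance, with s3api you can pick the output format or filter it with --query, which has no equivalent under s3 (a small sketch; outputs omitted):

In []:
aws s3api list-buckets --output yaml
In []:
aws s3api list-buckets --query "Buckets[].Name" --output json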

Reading S3 Buckets

List all buckets you own –

In []:
aws s3 ls
Out[]:

    2021-07-04 21:32:39 cf-templates-1bwhqi89f3g43-us-east-1
    2021-06-08 09:26:48 raghuvv-training-detailed-billing-reports-legacy
    2021-04-10 08:25:58 s3demo-20210410
    2021-04-10 12:28:19 s3demo-20210410122735
    2021-07-13 20:12:37 s3demo-logs

List all objects of a specific bucket –

In []:
aws s3 ls s3://s3demo-20210410/
Out[]:

                               PRE testprefix/
    2021-07-13 17:32:28     164736 aws.jpg
    2021-07-13 17:25:35     455833 casestudy.pdf
    2021-07-13 18:51:44      53237 india.jpg
    2021-07-15 17:10:31          0 test1.txt
    2021-07-15 16:10:23          0 test2.txt

By default, prefixes are listed with PRE preceding them, and any objects under those prefixes are omitted. This mimics the familiar directory structure of our local machines’ file systems. To recursively list all objects in the bucket – including those inside the folders (more correctly, prefixes) –

In []:
aws s3 ls s3://s3demo-20210410/ --recursive
Out[]:

    2021-07-13 17:32:28     164736 aws.jpg
    2021-07-13 17:25:35     455833 casestudy.pdf
    2021-07-13 18:51:44      53237 india.jpg
    2021-07-15 17:10:31          0 test1.txt
    2021-07-15 16:10:23          0 test2.txt
    2021-07-24 22:30:51          0 testprefix/
    2021-07-24 23:06:03         13 testprefix/hello.txt

The file sizes are listed as a total number of bytes by default. To make them easier for humans to read, you can use the --human-readable option, which shows the file sizes in Bytes/KiB/MiB/GiB/TiB/PiB/EiB.

In []:
aws s3 ls s3://s3demo-20210410/ --recursive --human-readable
Out[]:

    2021-07-13 17:32:28  160.9 KiB aws.jpg
    2021-07-13 17:25:35  445.1 KiB casestudy.pdf
    2021-07-13 18:51:44   52.0 KiB india.jpg
    2021-07-15 17:10:31    0 Bytes test1.txt
    2021-07-15 16:10:23    0 Bytes test2.txt
    2021-07-24 22:30:51    0 Bytes testprefix/
    2021-07-24 23:06:03   13 Bytes testprefix/hello.txt

You can use the --summarize option to display a summary of the total number of objects and their aggregate size.

In []:
aws s3 ls s3://s3demo-20210410/ --recursive --human-readable --summarize
Out[]:

    2021-07-13 17:32:28  160.9 KiB aws.jpg
    2021-07-13 17:25:35  445.1 KiB casestudy.pdf
    2021-07-13 18:51:44   52.0 KiB india.jpg
    2021-07-15 17:10:31    0 Bytes test1.txt
    2021-07-15 16:10:23    0 Bytes test2.txt
    2021-07-24 22:30:51    0 Bytes testprefix/
    2021-07-24 23:06:03   13 Bytes testprefix/hello.txt
    
    Total Objects: 7
       Total Size: 658.0 KiB

Note that wherever the s3 subcommand does not deal with the local file system, you can drop the s3:// prefix to the bucket name since there is no ambiguity (unlike, for example, the cp command – more on that below). But you might still prefer to keep the prefix anyway to make it obvious to the reader (which could be yourself in the future).

Here is an example with the s3:// prefix omitted.

In []:
aws s3 ls s3demo-20210410 --recursive --human-readable --summarize
Out[]:

    2021-07-13 17:32:28  160.9 KiB aws.jpg
    2021-07-13 17:25:35  445.1 KiB casestudy.pdf
    2021-07-13 18:51:44   52.0 KiB india.jpg
    2021-07-15 17:10:31    0 Bytes test1.txt
    2021-07-15 16:10:23    0 Bytes test2.txt
    2021-07-24 22:30:51    0 Bytes testprefix/
    2021-07-24 23:06:03   13 Bytes testprefix/hello.txt
    
    Total Objects: 7
       Total Size: 658.0 KiB
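
In contrast, for a command like cp – which mixes local paths and S3 URIs – the s3:// prefix is exactly what tells the CLI which argument is the bucket, so it cannot be dropped there. A small sketch (not run as part of this walkthrough):

In []:
# Without s3:// the target is not recognized as a bucket, so this would not upload to S3:
# aws s3 cp demo.txt s3demo-20210410
# With the prefix, the intent is unambiguous:
aws s3 cp demo.txt s3://s3demo-20210410/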

At this stage, if you prefer this format of ‘directory listing’ for all your buckets, you can save yourself some keystrokes and create an alias shortcut for it. (We covered AWS CLI aliases in the previous post.)

In []:
aws s3dir s3demo-20210410
Out[]:

    2021-07-13 17:32:28  160.9 KiB aws.jpg
    2021-07-13 17:25:35  445.1 KiB casestudy.pdf
    2021-07-13 18:51:44   52.0 KiB india.jpg
    2021-07-15 17:10:31    0 Bytes test1.txt
    2021-07-15 16:10:23    0 Bytes test2.txt
    2021-07-24 22:30:51    0 Bytes testprefix/
    2021-07-24 23:06:03   13 Bytes testprefix/hello.txt
    
    Total Objects: 7
       Total Size: 658.0 KiB

As you may have guessed, the above command uses an alias s3dir as shorthand for s3 ls --recursive --human-readable --summarize. You can echo the contents of the alias file for reference –

In []:
!type C:\Users\Raghu\.aws\cli\alias
Out[]:

    [toplevel]
    
    whoami = sts get-caller-identity
    
    s3dir = s3 ls --recursive --human-readable --summarize 

One thing you may not anticipate is that the s3 ls command returns all objects and prefixes whose key matches the provided string – even partially – as if it were a wildcard match. Here is an example –

In []:
aws s3 ls s3://s3demo-20210410/test
Out[]:

                               PRE testprefix/
    2021-07-15 17:10:31          0 test1.txt
    2021-07-15 16:10:23          0 test2.txt
In []:
aws s3dir s3://s3demo-20210410/test
Out[]:

    2021-07-15 17:10:31    0 Bytes test1.txt
    2021-07-15 16:10:23    0 Bytes test2.txt
    2021-07-24 22:30:51    0 Bytes testprefix/
    2021-07-24 23:06:03   13 Bytes testprefix/hello.txt
    
    Total Objects: 4
       Total Size: 13 Bytes

Writing to S3 buckets

Let’s create a file locally and upload it to S3.

In []:
echo helloworld > demo.txt
In []:
aws s3 cp demo.txt s3://s3demo-20210410/testprefix
Out[]:

    Completed 13 Bytes/13 Bytes (7 Bytes/s) with 1 file(s) remaining
    upload: .\demo.txt to s3://s3demo-20210410/testprefix           

While the above may look correct, there is a simple mistake that might trip up beginners. If you intend to upload a file to a “folder” (more correctly, a prefix) in an S3 bucket, you should take care to include the trailing forward slash / at the end of the prefix. Otherwise the last string testprefix in the example command above is treated as the key of the object itself in the target bucket (root).

In []:
aws s3dir s3demo-20210410
Out[]:

    2021-07-13 17:32:28  160.9 KiB aws.jpg
    2021-07-13 17:25:35  445.1 KiB casestudy.pdf
    2021-07-13 18:51:44   52.0 KiB india.jpg
    2021-07-15 17:10:31    0 Bytes test1.txt
    2021-07-15 16:10:23    0 Bytes test2.txt
    2021-07-25 16:08:31   13 Bytes testprefix
    2021-07-24 22:30:51    0 Bytes testprefix/
    2021-07-24 23:06:03   13 Bytes testprefix/hello.txt
    
    Total Objects: 8
       Total Size: 658.0 KiB

As you can see, a new object named “testprefix” (size 13 Bytes) got created, and no object was uploaded under the prefix. Let us get rid of this mistake and upload the file correctly – using a trailing / after the prefix.

In []:
aws s3 rm s3://s3demo-20210410/testprefix
Out[]:

    delete: s3://s3demo-20210410/testprefix
In []:
aws s3 cp demo.txt s3://s3demo-20210410/testprefix/
Out[]:

    Completed 13 Bytes/13 Bytes (8 Bytes/s) with 1 file(s) remaining
    upload: .\demo.txt to s3://s3demo-20210410/testprefix/demo.txt  

You can see the command reports the file uploaded with the correct prefix and key this time. By default, the key name in the target bucket stays the same as the local filename.

In []:
aws s3dir s3demo-20210410
Out[]:

    2021-07-13 17:32:28  160.9 KiB aws.jpg
    2021-07-13 17:25:35  445.1 KiB casestudy.pdf
    2021-07-13 18:51:44   52.0 KiB india.jpg
    2021-07-15 17:10:31    0 Bytes test1.txt
    2021-07-15 16:10:23    0 Bytes test2.txt
    2021-07-25 16:20:38    0 Bytes testprefix/
    2021-07-25 16:21:25   13 Bytes testprefix/demo.txt
    2021-07-24 23:06:03   13 Bytes testprefix/hello.txt
    
    Total Objects: 8
       Total Size: 658.0 KiB

You can specify a key name explicitly if you want the uploaded object to have a different name than the file has on your local machine.

In []:
aws s3 cp demo.txt s3://s3demo-20210410/testprefix/newdemo.txt
Out[]:

    Completed 13 Bytes/13 Bytes (7 Bytes/s) with 1 file(s) remaining
    upload: .\demo.txt to s3://s3demo-20210410/testprefix/newdemo.txt

Similar to recursive listing of objects, you can also upload to S3 recursively – that is, including all folders and subfolders – using the --recursive parameter.

I have a folder structure set up locally for demo purposes with a few files in it. There is also one subfolder (subfolder2a) that is empty.

In []:
!dir demoupload /s /b
Out[]:

    D:\dev\code\awscli\demoupload\folder1
    D:\dev\code\awscli\demoupload\folder2
    D:\dev\code\awscli\demoupload\folder3
    D:\dev\code\awscli\demoupload\test.txt
    D:\dev\code\awscli\demoupload\folder1\test.txt
    D:\dev\code\awscli\demoupload\folder2\subfolder2a
    D:\dev\code\awscli\demoupload\folder2\test.txt
    D:\dev\code\awscli\demoupload\folder3\subfolder3a
    D:\dev\code\awscli\demoupload\folder3\subfolder3b
    D:\dev\code\awscli\demoupload\folder3\test.txt
    D:\dev\code\awscli\demoupload\folder3\subfolder3a\test.txt
    D:\dev\code\awscli\demoupload\folder3\subfolder3b\test.txt
In []:
aws s3 cp demoupload s3://s3demo-20210410/demoupload
Out[]:

    upload failed: demoupload\ to s3://s3demo-20210410/demoupload Need to rewind the stream , but stream is not seekable.

Here we tried uploading a folder without the --recursive parameter, hence this error. It also leaves a 0 byte object named demoupload in the target bucket that we need to delete (see the cleanup step below).

Note that we also did not provide a trailing / after the target prefix, but that is optional when you use the --recursive parameter.
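
Cleaning up the stray 0 byte object is a one-liner (a sketch of the cleanup step; the key name matches the accidental object created above):

In []:
aws s3 rm s3://s3demo-20210410/demoupload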

In []:
aws s3 cp demoupload s3://s3demo-20210410/demoupload --recursive
Out[]:

    Completed 13 Bytes/78 Bytes (7 Bytes/s) with 6 file(s) remaining
    upload: demoupload\test.txt to s3://s3demo-20210410/demoupload/test.txt
    Completed 13 Bytes/78 Bytes (7 Bytes/s) with 5 file(s) remaining
    Completed 26 Bytes/78 Bytes (14 Bytes/s) with 5 file(s) remaining
    upload: demoupload\folder3\subfolder3b\test.txt to s3://s3demo-20210410/demoupload/folder3/subfolder3b/test.txt
    Completed 26 Bytes/78 Bytes (14 Bytes/s) with 4 file(s) remaining
    Completed 39 Bytes/78 Bytes (21 Bytes/s) with 4 file(s) remaining
    upload: demoupload\folder3\subfolder3a\test.txt to s3://s3demo-20210410/demoupload/folder3/subfolder3a/test.txt
    Completed 39 Bytes/78 Bytes (21 Bytes/s) with 3 file(s) remaining
    Completed 52 Bytes/78 Bytes (28 Bytes/s) with 3 file(s) remaining
    upload: demoupload\folder2\test.txt to s3://s3demo-20210410/demoupload/folder2/test.txt
    Completed 52 Bytes/78 Bytes (28 Bytes/s) with 2 file(s) remaining
    Completed 65 Bytes/78 Bytes (34 Bytes/s) with 2 file(s) remaining
    upload: demoupload\folder1\test.txt to s3://s3demo-20210410/demoupload/folder1/test.txt
    Completed 65 Bytes/78 Bytes (34 Bytes/s) with 1 file(s) remaining
    Completed 78 Bytes/78 Bytes (41 Bytes/s) with 1 file(s) remaining
    upload: demoupload\folder3\test.txt to s3://s3demo-20210410/demoupload/folder3/test.txt
In []:
aws s3dir s3demo-20210410
Out[]:

    2021-07-13 17:32:28  160.9 KiB aws.jpg
    2021-07-13 17:25:35  445.1 KiB casestudy.pdf
    2021-07-25 16:50:48   13 Bytes demoupload/folder1/test.txt
    2021-07-25 16:50:48   13 Bytes demoupload/folder2/test.txt
    2021-07-25 16:50:48   13 Bytes demoupload/folder3/subfolder3a/test.txt
    2021-07-25 16:50:48   13 Bytes demoupload/folder3/subfolder3b/test.txt
    2021-07-25 16:50:48   13 Bytes demoupload/folder3/test.txt
    2021-07-25 16:50:48   13 Bytes demoupload/test.txt
    2021-07-13 18:51:44   52.0 KiB india.jpg
    2021-07-15 17:10:31    0 Bytes test1.txt
    2021-07-15 16:10:23    0 Bytes test2.txt
    2021-07-25 16:20:38    0 Bytes testprefix/
    2021-07-25 16:21:25   13 Bytes testprefix/demo.txt
    2021-07-24 23:06:03   13 Bytes testprefix/hello.txt
    2021-07-25 16:30:11   13 Bytes testprefix/newdemo.txt
    
    Total Objects: 15
       Total Size: 658.1 KiB

You may notice that no empty prefix named subfolder2a was created in the target bucket, since the source folder was empty.

If you specifically want to create an empty folder (prefix) in a target bucket, you need to use the put-object operation of the s3api command, as follows.

In []:
aws s3api put-object --bucket s3demo-20210410 --key newfolder/
Out[]:

    {
        "ETag": "\"d41d8cd98f00b204e9800998ecf8427e\""
    }
In []:
aws s3dir s3demo-20210410
Out[]:

    2021-07-13 17:32:28  160.9 KiB aws.jpg
    2021-07-13 17:25:35  445.1 KiB casestudy.pdf
    2021-07-25 16:50:48   13 Bytes demoupload/folder1/test.txt
    2021-07-25 16:50:48   13 Bytes demoupload/folder2/test.txt
    2021-07-25 16:50:48   13 Bytes demoupload/folder3/subfolder3a/test.txt
    2021-07-25 16:50:48   13 Bytes demoupload/folder3/subfolder3b/test.txt
    2021-07-25 16:50:48   13 Bytes demoupload/folder3/test.txt
    2021-07-25 16:50:48   13 Bytes demoupload/test.txt
    2021-07-13 18:51:44   52.0 KiB india.jpg
    2021-07-25 16:58:28    0 Bytes newfolder/
    2021-07-15 17:10:31    0 Bytes test1.txt
    2021-07-15 16:10:23    0 Bytes test2.txt
    2021-07-25 16:20:38    0 Bytes testprefix/
    2021-07-25 16:21:25   13 Bytes testprefix/demo.txt
    2021-07-24 23:06:03   13 Bytes testprefix/hello.txt
    2021-07-25 16:30:11   13 Bytes testprefix/newdemo.txt
    
    Total Objects: 16
       Total Size: 658.1 KiB

Combining the power of Python with CLI

Using Jupyter notebooks, we can bring a bit of Python magic into the mix to make life easier and more interesting.

Python Variables

For a start, we can assign frequently used values to Python variables and reuse them inside AWS CLI commands by enclosing the variable names in {} characters.

In []:
bucketname = "s3demo-20210410"
In []:
aws s3dir {bucketname}
Out[]:

    2021-07-13 17:32:28  160.9 KiB aws.jpg
    2021-07-13 17:25:35  445.1 KiB casestudy.pdf
    2021-07-25 16:50:48   13 Bytes demoupload/folder1/test.txt
    2021-07-25 16:50:48   13 Bytes demoupload/folder2/test.txt
    2021-07-25 16:50:48   13 Bytes demoupload/folder3/subfolder3a/test.txt
    2021-07-25 16:50:48   13 Bytes demoupload/folder3/subfolder3b/test.txt
    2021-07-25 16:50:48   13 Bytes demoupload/folder3/test.txt
    2021-07-25 16:50:48   13 Bytes demoupload/test.txt
    2021-07-13 18:51:44   52.0 KiB india.jpg
    2021-07-25 16:58:28    0 Bytes newfolder/
    2021-07-15 17:10:31    0 Bytes test1.txt
    2021-07-15 16:10:23    0 Bytes test2.txt
    2021-07-25 16:20:38    0 Bytes testprefix/
    2021-07-25 16:21:25   13 Bytes testprefix/demo.txt
    2021-07-24 23:06:03   13 Bytes testprefix/hello.txt
    2021-07-25 16:30:11   13 Bytes testprefix/newdemo.txt
    
    Total Objects: 16
       Total Size: 658.1 KiB

Another handy use for Python is to create a random string or timestamp that you want to reuse across multiple commands within your notebook.

Date-time stamps that we repeatedly use in constructing S3 bucket names or prefixes are one good use case.

In []:
from datetime import datetime

rundatestamp = datetime.now().strftime('%Y%m%d')
runtimestamp = datetime.now().strftime('%Y%m%d%H%M%S')

prefixbydate = datetime.now().strftime('%Y/%m/%d')
prefixbyhour = datetime.now().strftime('%Y/%m/%d/%H')

print(prefixbydate)
print(prefixbyhour)
Out[]:

    2021/07/28
    2021/07/28/18
In []:
aws s3 cp demo.txt s3://{bucketname}/{prefixbydate}/
Out[]:

    Completed 13 Bytes/13 Bytes (7 Bytes/s) with 1 file(s) remaining
    upload: .\demo.txt to s3://s3demo-20210410/2021/07/28/demo.txt  
In []:
aws s3dir {bucketname}/{prefixbydate}
Out[]:

    2021-07-28 19:00:05   13 Bytes 2021/07/28/demo.txt
    
    Total Objects: 1
       Total Size: 13 Bytes
In []:
newbucketname = "s3demo"
newbucketname = newbucketname+'-'+rundatestamp
print(newbucketname)
Out[]:

    s3demo-20210728

Or a string of random digits to ensure an S3 bucket name is always unique.

In []:
import random
randomsuffix=str(random.randint(1000,9999))

newbucketname = "s3demo"
newbucketname = newbucketname+'-'+randomsuffix
print(newbucketname)
Out[]:

    s3demo-6423

Creating and Deleting Buckets

In []:
aws s3 mb s3://{newbucketname}
Out[]:

    make_bucket failed: s3://s3demo-3324 An error occurred (InvalidLocationConstraint) when calling the CreateBucket operation: The specified location-constraint is not valid
In []:
aws s3 mb s3://{newbucketname} --region us-east-1
Out[]:

    make_bucket: s3demo-6423

Using the %%time cell magic, you can time the execution of any command block (cell) if that is of interest.

In []:
%%time

aws s3 cp demoupload s3://{newbucketname}/{prefixbydate}/ --recursive
Out[]:

    Completed 13 Bytes/78 Bytes (7 Bytes/s) with 6 file(s) remainingWall time: 8.4 s
    
    Completed 26 Bytes/78 Bytes (14 Bytes/s) with 6 file(s) remaining
    upload: demoupload\folder2\test.txt to s3://s3demo-6423/2021/07/28/folder2/test.txt
    Completed 26 Bytes/78 Bytes (14 Bytes/s) with 5 file(s) remaining
    Completed 39 Bytes/78 Bytes (22 Bytes/s) with 5 file(s) remaining
    upload: demoupload\folder1\test.txt to s3://s3demo-6423/2021/07/28/folder1/test.txt
    Completed 39 Bytes/78 Bytes (22 Bytes/s) with 4 file(s) remaining
    Completed 52 Bytes/78 Bytes (29 Bytes/s) with 4 file(s) remaining
    upload: demoupload\folder3\test.txt to s3://s3demo-6423/2021/07/28/folder3/test.txt
    Completed 52 Bytes/78 Bytes (29 Bytes/s) with 3 file(s) remaining
    upload: demoupload\folder3\subfolder3b\test.txt to s3://s3demo-6423/2021/07/28/folder3/subfolder3b/test.txt
    Completed 52 Bytes/78 Bytes (29 Bytes/s) with 2 file(s) remaining
    Completed 65 Bytes/78 Bytes (36 Bytes/s) with 2 file(s) remaining
    upload: demoupload\folder3\subfolder3a\test.txt to s3://s3demo-6423/2021/07/28/folder3/subfolder3a/test.txt
    Completed 65 Bytes/78 Bytes (36 Bytes/s) with 1 file(s) remaining
    Completed 78 Bytes/78 Bytes (44 Bytes/s) with 1 file(s) remaining
    upload: demoupload\test.txt to s3://s3demo-6423/2021/07/28/test.txt
In []:
aws s3dir {newbucketname}
Out[]:

    2021-07-30 00:48:43   13 Bytes 2021/07/28/folder1/test.txt
    2021-07-30 00:48:43   13 Bytes 2021/07/28/folder2/test.txt
    2021-07-30 00:48:43   13 Bytes 2021/07/28/folder3/subfolder3a/test.txt
    2021-07-30 00:48:43   13 Bytes 2021/07/28/folder3/subfolder3b/test.txt
    2021-07-30 00:48:43   13 Bytes 2021/07/28/folder3/test.txt
    2021-07-30 00:48:43   13 Bytes 2021/07/28/test.txt
    
    Total Objects: 6
       Total Size: 78 Bytes

Before you can delete a bucket, you need to ensure it has no objects remaining in it; otherwise you will encounter a failure.

In []:
aws s3 rb s3://{newbucketname}
Out[]:

    remove_bucket failed: s3://s3demo-6423 An error occurred (BucketNotEmpty) when calling the DeleteBucket operation: The bucket you tried to delete is not empty

Emptying a bucket involves recursively deleting all objects.

In []:
%%time

aws s3 rm s3://{newbucketname} --recursive
Out[]:

    delete: s3://s3demo-6423/2021/07/28/folder2/test.txt
    delete: s3://s3demo-6423/2021/07/28/folder3/test.txt
    delete: s3://s3demo-6423/2021/07/28/folder3/subfolder3a/test.txt
    delete: s3://s3demo-6423/2021/07/28/folder1/test.txt
    delete: s3://s3demo-6423/2021/07/28/test.txt
    delete: s3://s3demo-6423/2021/07/28/folder3/subfolder3b/test.txt
    Wall time: 8.78 s

Now you can remove the empty bucket.

In []:
aws s3 rb s3://{newbucketname}
Out[]:

    remove_bucket: s3demo-6423
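
As an aside, the rb command also accepts a --force flag that empties the bucket and then deletes it in a single step, if you prefer to skip the separate rm – something along these lines (not run here):

In []:
aws s3 rb s3://{newbucketname} --force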

We will look at more bucket operations and more advanced uses of the CLI in subsequent installments.

Additional Tips & Notes

  1. Useful extensions for JupyterLab
  2. GNU Utils for Win32 – https://sourceforge.net/projects/unxutils/
  3. Git-bash for Windows – https://gitforwindows.org/

References

  1. AWS CLI (v2) reference for S3 command – https://awscli.amazonaws.com/v2/documentation/api/latest/reference/s3/index.html
  2. AWS CLI (v2) reference for S3api command – https://awscli.amazonaws.com/v2/documentation/api/latest/reference/s3api/index.html