Amazon S3 is part of the Amazon Web Services offering, and the name is an abbreviation for Simple Storage Service:
"Amazon S3 provides a simple web services interface that can be used to store and retrieve any amount of data, at any time, from anywhere on the web. It gives any developer access to the same highly scalable, reliable, secure, fast, inexpensive infrastructure that Amazon uses to run its own global network of web sites. The service aims to maximize benefits of scale and to pass those benefits on to developers."A few months ago, Jason Straub published an Oracle whitepaper on how to integrate Oracle Application Express (Apex) with Amazon S3.
As Jason points out, Amazon has a Free Usage Tier which allows you to get started using Amazon S3 for free. If you have ever bought a book from Amazon, they already have your credit card on file, so signing up for Amazon Web Services is quick and easy (and they won't start charging your credit card until the free trial period is over).
Introducing the S3 API for PL/SQL
Inspired by Jason's whitepaper, I decided to write a stand-alone PL/SQL API for Amazon S3. This API can be used in any PL/SQL solution, with or without Apex.
The API supports all common S3 operations, including:
- Authentication
- Creating new buckets
- Listing existing buckets
- Listing existing objects (with or without filtering)
- Creating (uploading) new objects (and setting an Access Control List - ACL)
- Generating download links (with or without expiry dates)
- Downloading objects
- Deleting objects
See the examples below for more details.
Use Cases
So what can you do with Amazon's S3 service in combination with a PL/SQL API?
I can think of several interesting use cases, some of which I might explore further in future posts:
- Backing up your database (use DBMS_DATAPUMP to dump a file to disk, then compress it using ZIP_UTIL_PKG, then encrypt it using CRYPTO_UTIL_PKG, and upload it to S3)
- Backing up your PL/SQL source code (use data dictionary views or DBMS_METADATA to extract the source code, optionally zip and/or encrypt it, and upload to S3)
- Backing up your Apex applications (use WWV_FLOW_UTILITIES.EXPORT_APPLICATION_TO_CLOB to generate export file, optionally zip and/or encrypt it, and upload to S3)
- Cloud storage for file uploads (instead of storing [large] files inside your database, store them in the cloud and download them on demand -- especially relevant for Oracle XE which has a file size limit)
- Serve static content (generate static [text, CSV, HTML, PDF] files from the database and upload to S3)
- Replication or shared storage (upload from one machine/database, download to another)
- Data loading or message processing (set up to poll for new incoming files - uploaded by other S3 clients - and process them)
Remember that all these things can be scheduled to run in the database using DBMS_JOB or DBMS_SCHEDULER.
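For example, a nightly backup job could be set up with DBMS_SCHEDULER along these lines (just a sketch; MY_BACKUP_PKG.BACKUP_TO_S3 is a hypothetical wrapper procedure around the export and upload steps):

    begin
      dbms_scheduler.create_job (
        job_name        => 'nightly_s3_backup',
        job_type        => 'STORED_PROCEDURE',
        job_action      => 'my_backup_pkg.backup_to_s3', -- hypothetical wrapper procedure
        repeat_interval => 'freq=daily; byhour=2',
        enabled         => true
      );
    end;
    /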
Where to get the Amazon S3 API for PL/SQL
You can download the API as part of the Alexandria Utility Library for PL/SQL.
Getting started
Download and compile the relevant PL/SQL API packages. Then register with Amazon for an S3 account and get your AWS keys (key and secret key), and log in to the AWS Management Console to get familiar with the basic operations.
If you are unfamiliar with Amazon S3, I recommend that you read this short getting started guide that describes the common operations.
In the following examples we shall see how you can do the same operations using PL/SQL.
Authentication
From your Amazon account control panel, you'll get the key strings you need to use the Amazon web services.
Before you call any of the following API methods, you must initialize the authentication package. You only have to do this once per database session (but remember, on the web, every page view is a separate database session, so in Apex you'll need to run this code for every page, typically as a Before Header page process).
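Here's a minimal sketch, assuming the authentication package from the library is named AMAZON_AWS_AUTH_PKG and that its INIT procedure takes your key, your secret key, and your server's offset from GMT (check the package specification for the exact signature):

    begin
      -- run once per database session, before calling any of the S3 operations
      amazon_aws_auth_pkg.init ('my-key', 'my-secret-key', p_gmt_offset => -1);
    end;
    /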
Creating new buckets
Buckets are what you use to organize your objects (files) in Amazon S3. Think of them as top-level folders, but note that you cannot create more than 100 buckets in a single account, and the bucket name must be unique across all user accounts on Amazon S3. So creating buckets is not really something you'd do very often, and usually it will be done manually (to resolve any name conflicts with existing buckets).
A bucket is associated with a specific region where your objects will be stored. For reasons of latency/speed and possibly legal issues, it makes sense to select a region that's close to you and your users (although you may actually want to locate it far away if the purpose is backup for a major disaster in your own area).
Here's how to create a new bucket via PL/SQL code:
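(A sketch, assuming the S3 package is named AMAZON_AWS_S3_PKG and its NEW_BUCKET procedure takes the bucket name and an optional region code.)

    begin
      -- remember: bucket names must be unique across all of Amazon S3
      amazon_aws_s3_pkg.new_bucket (p_bucket_name => 'my-test-bucket', p_region => 'EU');
    end;
    /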
You can then check the AWS Management Console to verify that the bucket has indeed been created (in the specified region).
Listing existing buckets
With one or more buckets created in your account, you can list the bucket names.
There are two ways to do this. The first is to retrieve an index-by PL/SQL table using the GET_BUCKET_LIST function:
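(A sketch; the collection type and record field names below are assumptions, so check the package specification.)

    declare
      l_buckets amazon_aws_s3_pkg.t_bucket_list;
    begin
      l_buckets := amazon_aws_s3_pkg.get_bucket_list;
      for i in 1 .. l_buckets.count loop
        dbms_output.put_line (l_buckets(i).bucket_name);
      end loop;
    end;
    /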
Alternatively, you can query via SQL using a pipelined function named GET_BUCKET_TAB:
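(Again a sketch; the column names are assumptions.)

    select bucket_name, creation_date
    from   table (amazon_aws_s3_pkg.get_bucket_tab);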
Creating (uploading) new objects
An "object" is a file, and this is really what the S3 service is all about, storing files. So let's upload a file or two to our new bucket!
The API lets you upload any BLOB data to S3 using the NEW_OBJECT procedure.
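A sketch of an upload follows; the parameter names are assumptions, and a small BLOB is built from a string here just to keep the example self-contained:

    declare
      l_blob blob;
    begin
      -- in practice the BLOB would typically come from a table, a file, or a generated report
      l_blob := to_blob (utl_raw.cast_to_raw ('Hello from the Oracle database!'));
      amazon_aws_s3_pkg.new_object (
        p_bucket_name  => 'my-test-bucket',
        p_key          => 'test/hello.txt',
        p_object       => l_blob,
        p_content_type => 'text/plain'
      );
    end;
    /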
When you upload a file to S3, the default Access Control List (ACL) makes sure that only the owner of the file (you!) can access (download) it.
Others get an "Access Denied" message (but see the "Generating download links" section for how to generate special time-limited download links).
There are a number of predefined ACLs that you can specify if, for example, you want to make the file publicly available.
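For example, passing the standard S3 "public-read" canned ACL (the P_ACL parameter name is an assumption):

    begin
      amazon_aws_s3_pkg.new_object (
        p_bucket_name  => 'my-test-bucket',
        p_key          => 'test/hello-public.txt',
        p_object       => to_blob (utl_raw.cast_to_raw ('Hello, world!')),
        p_content_type => 'text/plain',
        p_acl          => 'public-read' -- one of the standard S3 canned ACLs
      );
    end;
    /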
The file can then be freely downloaded by anyone (the use of HTTPS is optional).
A note about "folders": S3 has no concept of "folders", but you can simulate folders by using a forward slash in your file names (as seen in the previous example). Some S3 clients, such as the AWS management console, will present such files in a folder structure. As far as the PL/SQL API is concerned, the slash is simply part of the file name and has no special meaning.
Listing existing objects
Now that we have uploaded a couple of files, we can list the contents of the bucket via the GET_OBJECT_LIST function:
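(A sketch; the collection type and record field names are assumptions.)

    declare
      l_objects amazon_aws_s3_pkg.t_object_list;
    begin
      l_objects := amazon_aws_s3_pkg.get_object_list ('my-test-bucket');
      for i in 1 .. l_objects.count loop
        dbms_output.put_line (l_objects(i).key || ' (' || l_objects(i).size_bytes || ' bytes)');
      end loop;
    end;
    /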
You can also get a list in SQL via a pipelined function named GET_OBJECT_TAB:
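(Column names assumed.)

    select key, size_bytes, last_modified
    from   table (amazon_aws_s3_pkg.get_object_tab ('my-test-bucket'));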
In both cases, you can optionally specify a prefix that acts as a search filter for the file names you want to return, and/or the maximum number of items you want to return.
Generating download links
You can access a file that has been protected by an ACL by including a special checksum parameter in the URL.
The GET_DOWNLOAD_URL function lets you generate the URL needed to access the file. You can specify when the link should expire, so this means you can share a download link that will stop working after a specified amount of time, which can obviously be useful in a number of scenarios.
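A sketch, with the parameter names assumed:

    declare
      l_url varchar2(4000);
    begin
      -- the generated link stops working after the specified expiry date
      l_url := amazon_aws_s3_pkg.get_download_url (
        p_bucket_name => 'my-test-bucket',
        p_key         => 'test/hello.txt',
        p_expiry_date => sysdate + 1 -- valid for one day
      );
      dbms_output.put_line (l_url);
    end;
    /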
Pasting the generated URL into the browser allows us to access the file until the link expires.
Downloading objects
Downloading a file from S3 using PL/SQL is straightforward with a call to the GET_OBJECT function which returns a BLOB:
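(A sketch; the exact signature may differ.)

    declare
      l_blob blob;
    begin
      l_blob := amazon_aws_s3_pkg.get_object ('my-test-bucket', 'test/hello.txt');
      dbms_output.put_line ('Downloaded ' || dbms_lob.getlength (l_blob) || ' bytes.');
    end;
    /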
Deleting objects
Removing a file is likewise very simple, just call the DELETE_OBJECT procedure:
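(A sketch; the exact signature may differ.)

    begin
      amazon_aws_s3_pkg.delete_object ('my-test-bucket', 'test/hello.txt');
    end;
    /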
Summary
The ability to upload and download any file from the Oracle database to "the cloud" (and vice versa) via PL/SQL is extremely useful for a number of purposes.
Let me know if you find this API useful!