Monday, March 28, 2011

Amazon S3 API for PL/SQL



Amazon S3 is part of the Amazon Web Services offering, and the name is an abbreviation for Simple Storage Service:
"Amazon S3 provides a simple web services interface that can be used to store and retrieve any amount of data, at any time, from anywhere on the web. It gives any developer access to the same highly scalable, reliable, secure, fast, inexpensive infrastructure that Amazon uses to run its own global network of web sites. The service aims to maximize benefits of scale and to pass those benefits on to developers."
A few months ago, Jason Straub published an Oracle whitepaper on how to integrate Oracle Application Express (Apex) with Amazon S3.

As Jason points out, Amazon has a Free Usage Tier which allows you to get started using Amazon S3 for free. If you have ever bought a book from Amazon, they already have your credit card on file, so signing up for Amazon Web Services is quick and easy (and they won't start charging your credit card until the free trial period is over).

Introducing the S3 API for PL/SQL

Inspired by Jason's whitepaper, I decided to write a stand-alone PL/SQL API for Amazon S3. This API can be used in any PL/SQL solution, with or without Apex.

The API supports all common S3 operations, including:

  • Authentication
  • Creating new buckets
  • Listing existing buckets
  • Listing existing objects (with or without filtering)
  • Creating (uploading) new objects (and setting an Access Control List - ACL)
  • Generating download links (with or without expiry dates)
  • Downloading objects
  • Deleting objects


See the examples below for more details.

Use Cases

So what can you do with Amazon's S3 service in combination with a PL/SQL API?

I can think of several interesting use cases, some of which I might explore further in future posts:

  • Backing up your database (use DBMS_DATAPUMP to dump a file to disk, then compress it using ZIP_UTIL_PKG, then encrypt it using CRYPTO_UTIL_PKG, and upload it to S3)
  • Backing up your PL/SQL source code (use data dictionary views or DBMS_METADATA to extract the source code, optionally zip and/or encrypt it, and upload to S3)
  • Backing up your Apex applications (use WWV_FLOW_UTILITIES.EXPORT_APPLICATION_TO_CLOB to generate export file, optionally zip and/or encrypt it, and upload to S3)
  • Cloud storage for file uploads (instead of storing [large] files inside your database, store them in the cloud and download them on demand -- especially relevant for Oracle XE which has a file size limit)
  • Serve static content (generate static [text, CSV, HTML, PDF] files from the database and upload to S3)
  • Replication or shared storage (upload from one machine/database, download to another)
  • Data loading or message processing (set up to poll for new incoming files - uploaded by other S3 clients - and process them)

Remember that all these things can be scheduled to run in the database using DBMS_JOB or DBMS_SCHEDULER.
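
For example, a nightly backup job could be set up with DBMS_SCHEDULER along these lines (a minimal sketch; MY_BACKUP_PKG.BACKUP_TO_S3 is a hypothetical wrapper procedure that would dump, zip, encrypt and upload the backup using the S3 API):

begin
  dbms_scheduler.create_job (
    job_name        => 'nightly_s3_backup',
    job_type        => 'PLSQL_BLOCK',
    -- my_backup_pkg.backup_to_s3 is a hypothetical procedure of your own
    job_action      => 'begin my_backup_pkg.backup_to_s3; end;',
    start_date      => systimestamp,
    repeat_interval => 'freq=daily; byhour=2',
    enabled         => true
  );
end;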


Where to get the Amazon S3 API for PL/SQL

You can download the API as part of the Alexandria Utility Library for PL/SQL.


Getting started

Download and compile the relevant PL/SQL API packages. Then register with Amazon for an S3 account, get your AWS keys (access key and secret key), and log in to the AWS Management Console to get familiar with the basic operations.



If you are unfamiliar with Amazon S3, I recommend that you read this short getting started guide that describes the common operations.

In the following examples we shall see how you can do the same operations using PL/SQL.


Authentication

From your Amazon account control panel, you'll get the key strings (access key and secret key) you need to use the Amazon web services.

Before you call any of the following API methods, you must initialize the authentication package. You only have to do this once per database session (but remember, on the web, every page view is a separate database session, so in Apex you'll need to run this code for every page, typically as a Before Header page process).
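
Something like this (a minimal sketch; the parameter names are taken from the package but may differ slightly in your version, so check the package specification):

begin
  -- replace with your own AWS access key and secret key;
  -- p_gmt_offset is your database time's offset from GMT
  amazon_aws_auth_pkg.init ('my-access-key-id',
                            'my-secret-access-key',
                            p_gmt_offset => -1);
end;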



Creating new buckets

Buckets are what you use to organize your objects (files) in Amazon S3. Think of them as top-level folders, but note that you cannot create more than 100 buckets in a single account, and the bucket name must be unique across all user accounts on Amazon S3. So creating buckets is not really something you'd do very often, and usually it will be done manually (to resolve any name conflicts with existing buckets).

A bucket is associated with a specific region where your objects will be stored. For reasons of latency/speed and possibly legal issues, it makes sense to select a region that's close to you and your users (although you may actually want to locate it far away if the purpose is backup for a major disaster in your own area).

Here's how to create a new bucket via PL/SQL code:
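
A minimal sketch (the package also takes an optional region parameter, not shown here; check the package specification for its exact name and the available region values):

begin
  -- bucket names must be globally unique across all of Amazon S3
  amazon_aws_s3_pkg.new_bucket ('my-bucket-name');
end;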


You can then check the AWS Management Console to verify that the bucket has indeed been created (in the specified region).



Listing existing buckets

With one or more buckets created in your account, you can list the bucket names.

There are two ways to do this, either by retrieving an index-by PL/SQL table using the GET_BUCKET_LIST function:
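
A minimal sketch (the collection type and field name below are assumptions; verify them against the package specification):

declare
  l_bucket_list amazon_aws_s3_pkg.t_bucket_list;
begin
  l_bucket_list := amazon_aws_s3_pkg.get_bucket_list;
  for i in 1 .. l_bucket_list.count loop
    dbms_output.put_line (l_bucket_list(i).bucket_name);
  end loop;
end;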



or, alternatively, via SQL using a pipelined function named GET_BUCKET_TAB:
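
A simple query does the trick:

select *
  from table(amazon_aws_s3_pkg.get_bucket_tab) t;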



Creating (uploading) new objects

An "object" is a file, and this is really what the S3 service is all about, storing files. So let's upload a file or two to our new bucket!

The API lets you upload any BLOB data to S3 using the NEW_OBJECT procedure.
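
For example (a minimal sketch; the table MY_UPLOADED_FILES used here to fetch the BLOB is purely hypothetical, as the source of the BLOB is up to you):

declare
  l_blob blob;
begin
  -- my_uploaded_files is a hypothetical table; get the BLOB from wherever you like
  select file_content
    into l_blob
    from my_uploaded_files
   where file_name = 'the-file-name.pdf';

  amazon_aws_s3_pkg.new_object ('my-bucket-name',
                                'the-file-name.pdf',
                                l_blob,
                                'application/pdf');
end;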



When you upload a file to S3, the default Access Control List (ACL) makes sure that only the owner of the file (you!) can access (download) it.

Others get an "Access Denied" message (but see the "Generating download links" section below for how to generate special time-limited download links).



There are a number of predefined ACLs that you can specify if, for example, you want to make the file publicly available.
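
For example, to make an uploaded file publicly readable (a minimal sketch; the ACL constant name is an assumption, so check the package specification for the exact constants):

declare
  l_blob blob; -- populated as in the previous example
begin
  amazon_aws_s3_pkg.new_object ('my-bucket-name',
                                'images/the-file-name.gif',
                                l_blob,
                                'image/gif',
                                amazon_aws_s3_pkg.g_acl_public_read);
end;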



The file can then be freely downloaded by anyone (the use of HTTPS is optional).


A note about "folders": S3 has no concept of "folders", but you can simulate folders by using a forward slash in your file names (as seen in the previous example). Some S3 clients, such as the AWS management console, will present such files in a folder structure. As far as the PL/SQL API is concerned, the slash is simply part of the file name and has no special meaning.

Listing existing objects

Now that we have uploaded a couple of files, we can list the contents of the bucket via the GET_OBJECT_LIST function:
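
A minimal sketch (the collection type and field names are assumptions; check the package specification):

declare
  l_object_list amazon_aws_s3_pkg.t_object_list;
begin
  l_object_list := amazon_aws_s3_pkg.get_object_list ('my-bucket-name');
  for i in 1 .. l_object_list.count loop
    dbms_output.put_line (l_object_list(i).key || ' (' || l_object_list(i).size_bytes || ' bytes)');
  end loop;
end;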



You can also get a list in SQL via a pipelined function named GET_OBJECT_TAB:
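
The query is as simple as:

select *
  from table(amazon_aws_s3_pkg.get_object_tab('my-bucket-name')) t;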



In both cases, you can optionally specify a prefix that acts as a search filter for the file names you want to return, and/or the maximum number of items you want to return.
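
For instance, to return at most 50 objects whose names start with "images/" (the parameter names p_prefix and p_max_keys are assumed to be the same for the pipelined function as for GET_OBJECT_LIST):

select *
  from table(amazon_aws_s3_pkg.get_object_tab('my-bucket-name',
                                              p_prefix   => 'images/',
                                              p_max_keys => 50)) t;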



Generating download links

You can access a file that has been protected by an ACL by including a special checksum parameter in the URL.

The GET_DOWNLOAD_URL function lets you generate the URL needed to access the file. You can specify when the link should expire, so this means you can share a download link that will stop working after a specified amount of time, which can obviously be useful in a number of scenarios.
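
A minimal sketch (the parameter order and the use of a DATE for the expiry time are assumptions, so check the package specification):

declare
  l_url varchar2(4000);
begin
  -- generate a link that stops working 24 hours from now
  l_url := amazon_aws_s3_pkg.get_download_url ('my-bucket-name',
                                               'the-file-name.pdf',
                                               sysdate + 1);
  dbms_output.put_line (l_url);
end;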



Pasting the generated URL into the browser allows us to access the file.



Downloading objects

Downloading a file from S3 using PL/SQL is straightforward with a call to the GET_OBJECT function which returns a BLOB:
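
Something like this:

declare
  l_blob blob;
begin
  l_blob := amazon_aws_s3_pkg.get_object ('my-bucket-name', 'the-file-name.pdf');
  -- do whatever you want with the BLOB, such as storing it in a table
  -- or writing it to disk with UTL_FILE
  dbms_output.put_line ('downloaded ' || dbms_lob.getlength(l_blob) || ' bytes');
end;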



Deleting objects

Removing a file is likewise very simple, just call the DELETE_OBJECT procedure:
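
A single call is all it takes:

begin
  amazon_aws_s3_pkg.delete_object ('my-bucket-name', 'the-file-name.pdf');
end;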




Summary

The ability to upload and download any file from the Oracle database to "the cloud" (and vice versa) via PL/SQL is extremely useful for a number of purposes.

Let me know if you find this API useful!

48 comments:

  1. Very nice post and definitely useful.

    Thanks,
    Dimitri

    ReplyDelete
  2. Awesome, thanks for sharing.

    ReplyDelete
  3. Very nice Morten! It will be put to good use, rest assured!

    H.

    ReplyDelete
  4. You have opened up a lot of possibilites with this API. Nice work!

    ReplyDelete
  5. Morten, you are the best. Thanks for doing this. I had designs on something similar, but would have never done it as well.

    I am having one issue. I am able to list my buckets fine, but when I go to put a new object up there, I get;

    ORA-20000: The request signature we calculated does not match the signature you provided. Check your key and signing method.
    ORA-06512: at "PROD.AMAZON_AWS_S3_PKG", line 41
    ORA-06512: at "PROD.AMAZON_AWS_S3_PKG", line 69
    ORA-06512: at "PROD.AMAZON_AWS_S3_PKG", line 708
    ORA-06512: at line 13

    Same aws id/key, but this error. I suspect something to do with the timestamp, but am not clear why this would not fail for both calls (get_bucket_list and new_object). I can provide far more detail in email if needed. I suspect you may not have much time to answer, but I would appreciate a quick word on what possible issues might be.

    Again, you rock! Thanks so much.

    Andy

    ReplyDelete
  6. @Andy: When you upload a file, the file name, content-type (and optionally ACL) are also part of the calculated signature. What are your values for those parameters?

    Also, try putting a call to debug_pkg.debug_on (and remember to enable DBMS output) before you call the new_object procedure. That should give you the full error message (including details of the signature that Amazon expects).

    - Morten

    ReplyDelete
  7. Hi Morten,

    I double checked my parameters and realized I did have the wrong content-type which may have been part of the issue.

    After solving that I was getting a whole new issue, HTTP client error 403 - Forbidden.

    Tried playing with permissions of the buckets, but no luck.

    Created a new bucket, set permissions, and voila, it worked.

    Now in business. Thanks Again. Great stuff.

    Andy

    ReplyDelete
  8. Anonymous, May 31, 2011

    Extremely helpful, thanks a lot for this work.

    Fadi.

    ReplyDelete
  9. Hi. I tried to compile the package but it says that identifier T_STR_ARRAY must be declared. How should I declare this type?

    Thank you,
    Mike

    ReplyDelete
  10. @Anonymous: The declaration of T_STR_ARRAY (and other types) are found in the /pub/setup/types.sql file in the Alexandria PL/SQL Library download (zip) file.

    - Morten

    ReplyDelete
  11. Hello,
    Thanks for sharing this Package.

    I am getting this error when I try to create a bucket:

    begin
    TEST.debug_pkg.debug_on;
    TEST.amazon_aws_s3_pkg.new_bucket('my-bDDDucket-name');
    end;

    The error:
    ORA-29273: HTTP request failed
    ORA-06512: at "SYS.UTL_HTTP", line 1130
    ORA-24247: network access denied by access control list (ACL)
    ORA-06512: at "TEST.AMAZON_AWS_S3_PKG", line 177
    ORA-06512: at "TEST.AMAZON_AWS_S3_PKG", line 457
    ORA-06512: at line 3
    29273. 00000 - "HTTP request failed"
    *Cause: The UTL_HTTP package failed to execute the HTTP request.
    *Action: Use get_detailed_sqlerrm to check the detailed error message.
    Fix the error and retry the HTTP request.


    -----------------------------

    Keeping in mind that this procedure is executed fine:

    DECLARE
    http_req utl_http.req;
    http_resp utl_http.resp;
    BEGIN
    http_req := utl_http.begin_request('www.yahoo.com');
    http_resp := utl_http.get_response(http_req);
    utl_http.end_response(http_resp);
    END;

    ------------
    Any help please ?

    Regards,
    Fateh

    ReplyDelete
  12. @Fateh: The error message is pretty clear, "ORA-24247: network access denied by access control list (ACL)" means that you are on Oracle 11g and that you need to adjust the ACL, which by default blocks all network access. You should open for traffic to the Amazon AWS domain ("*.amazonaws.com") on port 80.

    See, for example, this article for more info: http://blog.whitehorses.nl/2010/03/17/oracle-11g-access-control-list-and-ora-24247/

    - Morten

    ReplyDelete
  13. Thanks, It is working fine now... Sorry to keep pestering you.

    But I got another error:
    The difference between the request time and the current time is too large.

    I understand that I have to sync my server clock with Amazon.

    This is my amazon console link :
    https://console.aws.amazon.com/s3/home?region=us-west-2

    I am in Dubai-UAE

    I tried to alter the session time:
    alter session set time_zone='US/Arizona';


    But got the same error.
    The difference between the request time and the current time is too large.

    Any help please ?

    Regard,

    Fateh

    ReplyDelete
  14. @Fateh: The amazon_aws_auth_pkg.init() procedure has a parameter to set your offset from GMT time. Use this to set your current time zone, before you call any other AWS operation.

    - Morten

    ReplyDelete
  15. Frank Hogenes, May 05, 2014

    Great solution, thanks !

    Is there a known issue when having > 999 files in a bucket?
    Query
    select *
    from table(amazon_aws_s3_pkg.get_object_tab('my-bucket-name')) t
    ends up in an infinite loop on Oracle 11.2.0.3.0, and I have pinpointed this issue to the number of files.

    Thanks, Frank

    ReplyDelete
  16. Morten,

    This is fabulous. But one quick question: Have you tried this on an Oracle on Amazon RDS database yet? I'd be curious as to your experiences with that platform. I don't see why it wouldn't work, except that RDS has restrictions on what libraries and accounts can be accessed.

    Thanks,

    John Julian

    ReplyDelete
  17. Hi,

    this is exactly what i was looking for and it's amazing!

    I have set it up, and overcome quite a few initial problems and errors.

    I can list my buckets and create a new bucket, but I cannot list the contents of a bucket.

    When trying to retrieve the contents of a bucket
    SELECT * FROM table (amazon_aws_s3_pkg.get_object_tab('my-new-bucket'));

    i'm getting this error

    (i've changed the RequestId and HostId just in case)

    I would be much obliged if anyone could help me on this, since i've been pounding on it for two days.

    Thank you.
    Teo

    ReplyDelete
  18. @John Julian Sr: I had a look in the documentation for Oracle on Amazon RDS (see http://docs.aws.amazon.com/AmazonRDS/latest/UserGuide/CHAP_Oracle.html) and it says that "Network access utilities such as utl_http, utl_tcp, utl_smtp, and utl_mail, are not supported at this time.". So anything that uses webservices (like this package) won't work with RDS, unfortunately. Ironic, given that RDS and S3 are services provided by the same company!

    - Morten

    ReplyDelete
  19. @Teo: I can't see the actual error message you posted (perhaps it contained XML tags and was "eaten" by the Blogger software?). You might want to re-post with the error message in plain text.

    - Morten

    ReplyDelete
  20. @Morten

    Hi Morten, and thanks for the reply. I found that the problem was with the marker parameter that allows get_object_tab to return results in multiple batches. I neglected it, so it now returns the full results at once, and it's working.

    I do have another question though, if you can help.

    The key values that i want to use contain Greek characters. nls_language in oracle is set to
    AMERICAN
    as our windows client machines have the following setting in registry for NLS_LANG
    AMERICAN_AMERICA.EL8MSWIN1253

    Are there any other settings for the aws s3 i could try to see if this works?

    Thank you

    ReplyDelete
  21. @Morten,

    Everything works fine, but i do have one last problem.

    I have created a bucket that contains directories that use Greek characters. I can view these directories and their contents correctly from the amazon s3 web console.

    Also, if I issue the following from a DOS prompt:
    aws s3 ls s3://my-new-bucket

    the contents are displayed correctly.

    But when i do this from pl/sql using
    SELECT * FROM table (amazon_aws_s3_pkg.get_object_tab('my-new-bucket'));

    the returning key values have incorrect characters (gibberish).

    I'm using oracle XE 10g and NLS_CHARACTERSET is set to AL32UTF8.


    Teo

    ReplyDelete
  22. Dear Morten,
    Thanks for your contributions.
    It worked perfectly for a Bucket in US region. But I got the below error when I tried to upload files to a Bucket in Frankfurt.

    ORA-20000: Please re-send this request to the specified temporary endpoint. Continue to use the original request endpoint for future requests.

    * I searched the net, but to no avail

    ReplyDelete
  23. @Anonymous: See this issue for more info about buckets in the Frankfurt AWS region: https://github.com/mortenbra/alexandria-plsql-utils/issues/24

    - Morten

    ReplyDelete
  24. Hi Morten, wow what a gr8 job!!

    It is like superHero now that i need to migrate around 500000 blobs from Oracle cmsdk to S3 :)


    I just want to know if it is possible to install ONLY the amazon_s3 related packages, and which of the others are needed....
    i suppose the types sql script is mandatory ... and which other?

    Thanks a lot, i will let you know how it goes :)

    ReplyDelete
  25. @ILSabbe: Thanks, glad the package is useful to you.

    To do a minimal install, first install the "core" packages and types:

    https://github.com/mortenbra/alexandria-plsql-utils/blob/master/setup/install_core.sql

    Then install the Amazon (S3) packages:

    https://github.com/mortenbra/alexandria-plsql-utils/blob/master/setup/install_amazon.sql

    - Morten

    ReplyDelete
  26. Very useful library.
    Thank you for your job! :)

    ReplyDelete
  27. Hi Morten,
    RDS now supports utl_file and utl_tcp and all that.
    If you have an object in S3 and want to push it into an Oracle directory in RDS (which is now also allowed)... I'm struggling to adapt what you have to do that.
    Have you thought about giving this utility a once-over, now that what Oracle can do in RDS has been extended a fair bit? Pulling files into db from S3 that way would be super helpful.
    Thank you!

    ReplyDelete
  28. @anonymous: Downloading an object from S3 is explained in the "Downloading objects" section. You call amazon_aws_s3_pkg.get_object () and it returns a BLOB. You can do whatever you want with this BLOB, including saving it to a directory on your database server using utl_file.

    - Morten

    ReplyDelete
  29. Hi Morten,
    Writing files to S3 now requires sigv4. Replacing dbms_crypto.hmac_sh1 with hmac_sh256 doesn't seem to do the trick, as even a GET fails with "The authorization mechanism you have provided is not supported. Please use AWS4-HMAC-SHA256.". Looking at sigv4 docs, looks like what is encrypted with it is also different, not just how it gets encrypted. Tried to tinker with the code, but not getting far. Any chance you might be working on fixing up the package, so it can be operational again?
    Thank you!

    ReplyDelete
  30. @Anonymous: Please see this issue regarding the AWS signature v4:

    https://github.com/mortenbra/alexandria-plsql-utils/issues/24

    - Morten

    ReplyDelete
  31. Anonymous, May 09, 2018

    Hi Morten,

    I did see issue 24. Sounds like everyone is having the same problem, namely that the answer to your question, "Have you implemented the full signing method as described in the Amazon docs for AWS signature version 4?", is "tried but did not succeed"... So I think everyone was kind of hoping you might have done it successfully. I certainly tried building the signature, but getting its complexity right all the way till the end was a problem.

    ReplyDelete
  32. Brian McGinity, August 18, 2018

    Great package. Very useful. To make this work over HTTPS, install stunnel (https://www.stunnel.org/) and then add this to the config:

    [AWS-S3]
    client = yes
    accept = 127.0.0.1:1921
    connect = s3.amazonaws.com:443
    ;sslVersion = TLSv1
    ;sslVersion = TLSv1.2

    and then change:
    g_aws_url_s3 constant varchar2(255) := 'http://s3.amazonaws.com/';

    to:
    g_aws_url_s3 constant varchar2(255) := 'http://localhost:1921/';

    ReplyDelete
  33. Brian McGinity, August 18, 2018

    Also, there is a bug in the get_object_list procedure.

    When fetching multiple rows, if the p_next_continuation_token contains certain characters, it errors with: "ORA-20000: The continuation token provided is incorrect".

    This is because utl_url.escape() does not work correctly (really this is an Oracle bug which has never been fixed).

    To fix this change this:
    l_clob := make_request (get_url(p_bucket_name) || '?list-type=2&continuation-token=' || utl_url.escape(p_next_continuation_token) || '&max-keys=' || p_max_keys || '&prefix=' || utl_url.escape(p_prefix), 'GET', l_header_names, l_header_values, null);

    to:
    l_clob := make_request (get_url(p_bucket_name) || '?list-type=2&continuation-token=' || eUrl(p_next_continuation_token) || '&max-keys=' || p_max_keys || '&prefix=' || eUrl(p_prefix), 'GET', l_header_names, l_header_values, null);

    and use this function:

    CREATE OR REPLACE function eUrl(p_str varchar2) return varchar2 is
      l_str  varchar2(32000) := regexp_replace(p_str, '[^[:print:]]');
      l_tmp  varchar2(32000);
      l_len  number default length(l_str);
      l_bad  varchar2(100) default ' >%}\~];?@&<#{|^[`/:=$+''"' || chr(10);
      l_char char(1);
    begin

      if ( l_str is NULL ) then
        return NULL;
      end if;

      for i in 1 .. l_len loop
        l_char := substr(l_str,i,1);
        if ( instr( l_bad, l_char ) > 0 ) then
          l_tmp := l_tmp || '%' || to_char(ascii(l_char),'fm0X');
        else
          l_tmp := l_tmp || l_char;
        end if;
      end loop;

      return l_tmp;

    end;
    /


    ReplyDelete
  34. I need to write to a bucket with a directory. What would be the syntax to accomplish this? Using bucket_name/directory_name gives me: 15:22:02 ORA-12545: Connect failed because target host or object does not exist.

    ReplyDelete
  35. I need to write to bucket with a sub directory, but when I attempt to do so I get the error below:

    15:22:02 ORA-12545: Connect failed because target host or object does not exist

    What would be the syntax to do this?
    amazon_aws_s3_pkg.new_object ('bucket/directory',
    'test.csv',
    l_blob,
    'text/csv');

    ReplyDelete
  36. I was a having a problem writing a file to a sub directory of a bucket.
    The following was giving errors:

    amazon_aws_s3_pkg.new_object ('bucket_name/directory_name',
    'test.csv',
    l_blob,
    'text/csv');

    Once I realized that directories were in actuality not buckets, the solution was straightforward:

    amazon_aws_s3_pkg.new_object ('bucket_name',
    'directory_name/test.csv',
    l_blob,
    'text/csv');
    This is the url that it builds:
    https://bucket_name.s3.amazonaws.com/directory_name/test.csv

    ReplyDelete
  37. How do you install when you aren't using apex?

    I am seeing this message on the AMAZON_AWS_S3_PKG package
    Error: PLS-00201: identifier 'WWV_FLOW_UTILITIES.URL_ENCODE2' must be declared

    ReplyDelete
  38. How do you install without apex?

    AMAZON_AWS_S3_PKG gives this error
    Error: PLS-00201: identifier 'WWV_FLOW_UTILITIES.URL_ENCODE2' must be declared

    ReplyDelete
  39. Hey, this blog is really awesome and was really useful in our project. But we are stuck with creating subfolders within S3 using PL/SQL... so please, if you could tell us how to create a subfolder inside the S3 bucket, it would be really helpful.

    Thanks and Regards,
    Nisha K

    ReplyDelete
  40. @Nisha K: There is no concept of folders in S3, but you can use slash in key names, which some clients present as folders, see http://stackoverflow.com/questions/1939743/amazon-s3-boto-how-to-create-folder

    So using the PL/SQL API you would simply do it like this:

    amazon_aws_s3_pkg.new_object ('my-bucket-name', 'my-new-folder/some-subfolder/the-file-name.gif', l_blob, 'image/gif');

    - Morten

    ReplyDelete
  41. Hi Morten,
    This is really helpful. Could you please advise if we will be able to use this package to transfer files from an RDS directory to an S3 bucket which is in a different account?

    ReplyDelete
  42. Hi Morten,

    Will you be able to help regarding connecting to the Amazon Seller Partner API using PL/SQL?

    It will be a great help.

    Thank you.

    ReplyDelete
  43. Hi Morten, how are you? I know the post is a little old, but today I need to perform the aws s3 integration using the packages you provided. I do everything as described in your tutorial but I get the error:

    ORA-28239: no key provided
    ORA-06512: at DBMS_CRYPTO_FFI
    ORA-06512: at DBMS_CRYPTO

    If you still read the message and manage to help me, I would appreciate it.

    Daniel
    daniel@asinc.com.br

    ReplyDelete
  44. I'm getting this error in each function! What have I missed?

    Error at line 1
    ORA-20000: Your request contains duplicate headers.
    ORA-06512: at "PROD1.AMAZON_AWS_S3_PKG", line 41
    ORA-06512: at "PROD1.AMAZON_AWS_S3_PKG", line 95
    ORA-06512: at "PROD1.AMAZON_AWS_S3_PKG", line 608
    ORA-06512: at "PROD1.AMAZON_AWS_S3_PKG", line 656
    ORA-06512: at line 5

    ReplyDelete
  45. I'm getting this error in each one. What have I missed?

    Error at line 1
    ORA-20000: Your request contains duplicate headers.
    ORA-06512: at "PROD1.AMAZON_AWS_S3_PKG", line 41
    ORA-06512: at "PROD1.AMAZON_AWS_S3_PKG", line 95
    ORA-06512: at "PROD1.AMAZON_AWS_S3_PKG", line 608
    ORA-06512: at "PROD1.AMAZON_AWS_S3_PKG", line 656
    ORA-06512: at line 5

    ReplyDelete
  46. Getting this message! What have I missed?
    Error at line 1
    ORA-20000: Your request contains duplicate headers.
    ORA-06512: at "AMAZON_AWS_S3_PKG", line 41
    ORA-06512: at "AMAZON_AWS_S3_PKG", line 95
    ORA-06512: at "AMAZON_AWS_S3_PKG", line 608
    ORA-06512: at "AMAZON_AWS_S3_PKG", line 656

    ReplyDelete
  47. @PC: The "duplicate headers" error is due to some change at Amazon. You can find the issue discussed (and the fix you need to do) here: https://github.com/mortenbra/alexandria-plsql-utils/issues/83

    ReplyDelete