I recently needed to retrieve/download a comma-separated values (CSV) file from a website, and insert the data into an Oracle database table.
After googling around a bit, I found various pieces of the solution on AskTom, ExpertsExchange and other sites, which I put together in the following generic utility package for CSV files.
Usage
Because I have implemented the main parsing routine as a pipelined function, you can process the data either using straight SQL, or in a PL/SQL program.
For example, you can download a CSV file as a CLOB directly from the web and return it as a table with a single statement:
select * from table(csv_util_pkg.clob_to_csv(httpuritype('http://www.foo.example/bar.csv').getclob()))
And maybe do a direct insert via INSERT .. SELECT:
insert into my_table (first_column, second_column)
select c001, c002
from table(csv_util_pkg.clob_to_csv(httpuritype('http://www.foo.example/bar.csv').getclob()))
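If the file starts with a header row, you could presumably skip it by filtering on the line_number attribute that clob_to_csv returns for each row (this sketch assumes the header comes back as line 1; check against your own data):

insert into my_table (first_column, second_column)
select c001, c002
from table(csv_util_pkg.clob_to_csv(httpuritype('http://www.foo.example/bar.csv').getclob()))
where line_number > 1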
You can of course also use SQL to filter the results (although this may affect performance):
select * from table(csv_util_pkg.clob_to_csv(httpuritype('http://www.foo.example/bar.csv').getclob())) where c002 = 'Chevy'
Or you can do it in a more procedural fashion, like this:
create table x_dump (
  clob_value clob,
  dump_date  date default sysdate,
  dump_id    number
);

declare
  l_clob clob;
  cursor l_cursor
  is
  select csv.*
  from x_dump d, table(csv_util_pkg.clob_to_csv(d.clob_value)) csv
  where d.dump_id = 1;
begin

  l_clob := httpuritype('http://www.foo.example/bar.csv').getclob();

  insert into x_dump (clob_value, dump_id) values (l_clob, 1);
  commit;

  dbms_lob.freetemporary (l_clob);

  for l_rec in l_cursor loop
    dbms_output.put_line ('row ' || l_rec.line_number || ', col 1 = ' || l_rec.c001);
  end loop;

end;
/
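If you do this regularly, the staging and loading steps could be wrapped in a small procedure. The sketch below is just one way of doing it; the target table my_table and its columns are made up for illustration:

-- hypothetical wrapper: stage the downloaded clob in x_dump, then insert the parsed rows
-- (my_table, first_column and second_column are placeholder names)
create or replace procedure load_csv_from_url (p_url in varchar2, p_dump_id in number)
as
  l_clob clob;
begin
  l_clob := httpuritype(p_url).getclob();

  insert into x_dump (clob_value, dump_id) values (l_clob, p_dump_id);
  commit;

  dbms_lob.freetemporary (l_clob);

  insert into my_table (first_column, second_column)
  select c001, c002
  from x_dump d, table(csv_util_pkg.clob_to_csv(d.clob_value)) csv
  where d.dump_id = p_dump_id;
  commit;
end load_csv_from_url;
/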
Auxiliary functions
There are a few additional functions in the package that are not necessary for normal usage, but may be useful if you are doing any sort of lower-level CSV parsing. The csv_to_array function operates on a single CSV-encoded line (so to use this you would have to split the CSV lines yourself first, and feed them one by one to this function):
declare
  l_array t_str_array;
  l_val   varchar2(4000);
begin

  l_array := csv_util_pkg.csv_to_array ('10,SMITH,CLERK,"1200,50"');

  for i in l_array.first .. l_array.last loop
    dbms_output.put_line('value ' || i || ' = ' || l_array(i));
  end loop;

  -- should output SMITH
  l_val := csv_util_pkg.get_array_value(l_array, 2);
  dbms_output.put_line('value = ' || l_val);

  -- should give an error message stating that there is no column called DEPTNO,
  -- because the array does not contain seven elements
  -- (leave the column name out to fail silently and return null instead of raising an exception)
  l_val := csv_util_pkg.get_array_value(l_array, 7, 'DEPTNO');
  dbms_output.put_line('value = ' || l_val);

end;
/
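Since csv_to_array works on one line at a time, you have to do the line splitting yourself if you start from a multi-line string. Here is a minimal sketch of that; the sample data and the instr/substr splitting are just for illustration, and assume Unix-style (chr(10)) line endings:

declare
  l_csv   varchar2(4000) := '10,SMITH,CLERK' || chr(10) || '20,JONES,MANAGER';
  l_line  varchar2(4000);
  l_array t_str_array;
  l_pos   pls_integer := 1;
  l_next  pls_integer;
begin
  loop
    -- find the next line break, then cut out one line and parse it
    l_next := instr(l_csv, chr(10), l_pos);
    if l_next = 0 then
      l_line := substr(l_csv, l_pos);
    else
      l_line := substr(l_csv, l_pos, l_next - l_pos);
    end if;
    l_array := csv_util_pkg.csv_to_array (l_line);
    dbms_output.put_line('first value on this line = ' || l_array(1));
    exit when l_next = 0;
    l_pos := l_next + 1;
  end loop;
end;
/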
Installation
In order to compile the package, you will need these SQL types in your schema:
create type t_str_array as table of varchar2(4000);
/

create type t_csv_line as object (
  line_number  number,
  line_raw     varchar2(4000),
  c001         varchar2(4000),
  c002         varchar2(4000),
  c003         varchar2(4000),
  c004         varchar2(4000),
  c005         varchar2(4000),
  c006         varchar2(4000),
  c007         varchar2(4000),
  c008         varchar2(4000),
  c009         varchar2(4000),
  c010         varchar2(4000),
  c011         varchar2(4000),
  c012         varchar2(4000),
  c013         varchar2(4000),
  c014         varchar2(4000),
  c015         varchar2(4000),
  c016         varchar2(4000),
  c017         varchar2(4000),
  c018         varchar2(4000),
  c019         varchar2(4000),
  c020         varchar2(4000)
);
/

create type t_csv_tab as table of t_csv_line;
/

UPDATE 04.04.2012: The latest version of the package itself (CSV_UTIL_PKG) can be found as part of the Alexandria Utility Library for PL/SQL.
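Once the types and the package have compiled, a quick smoke test could be something like this (just a made-up inline CSV string to verify that the parsing works):

select *
from table(csv_util_pkg.clob_to_csv(to_clob('10,SMITH,CLERK' || chr(10) || '20,JONES,MANAGER')))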
Performance
On my test server (not my laptop), it takes about 35 seconds to process 12,000 rows in CSV format. I don't consider this super-fast, but probably fast enough for many CSV processing scenarios.
If you have any performance-enhancing tips, do let me know!
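One thing worth checking when you measure this is how much of the time goes to the HTTP download versus the actual parsing. Timing against a CLOB that is already staged in a table (as in the x_dump example above) isolates the parsing step; a simple SQL*Plus sketch:

set timing on

select count(*)
from x_dump d, table(csv_util_pkg.clob_to_csv(d.clob_value)) csv
where d.dump_id = 1;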
Bonus: Exporting CSV data
You can also use this package to export CSV data, for example by using a query like this:
select csv_util_pkg.array_to_csv (t_str_array(company_id, company_name, company_type)) as the_csv_data
from company
order by company_name

THE_CSV_DATA
--------------------------------
260,Acorn Oil & Gas,EXT
261,Altinex,EXT
262,Amerada Hess,EXT
263,Atlantic Petroleum,EXT
264,Beryl,EXT
265,BG,EXT
266,Bow Valley Energy,EXT
267,BP,EXT
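To get that out to an actual file, one option is to spool the query from SQL*Plus; the settings below are just one reasonable combination:

set heading off
set feedback off
set pagesize 0
set trimspool on

spool companies.csv

select csv_util_pkg.array_to_csv (t_str_array(company_id, company_name, company_type))
from company
order by company_name;

spool off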
This might come in handy, even in these days of XML and JSON ... :-)