Sunday, December 27, 2009

Using Google Translate from PL/SQL

UPDATE, JANUARY 2013: Google no longer offers the Translate API for free: "Google Translate API v1 is no longer available as of December 1, 2011 and has been replaced by Google Translate API v2. Google Translate API v1 was officially deprecated on May 26, 2011. The decision to deprecate the API and replace it with the paid service was made due to the substantial economic burden caused by extensive abuse.More information.


In today's globalized world, being able to communicate in different languages is important. Personally, I'm struggling to get my level of Spanish above the "una cerveza, por favor" level, and Google Translate is a great tool that I use often.


Wouldn't it be great to have the power of Google Translate directly from SQL? Google exposes the translation service via a RESTful (JSON) API, so I decided to write a small PL/SQL wrapper for it.

Here is the package specification:

create or replace package google_translate_pkg
as

/*

Purpose:    PL/SQL wrapper package for Google Translate API

Remarks:   see http://code.google.com/apis/ajaxlanguage/documentation/ 

Who     Date        Description
------  ----------  -------------------------------------
MBR     25.12.2009  Created

*/

-- http://code.google.com/apis/ajaxlanguage/documentation/reference.html#LangNameArray
g_lang_AFRIKAANS               constant varchar2(5) := 'af';
g_lang_ALBANIAN                constant varchar2(5) := 'sq';
g_lang_AMHARIC                 constant varchar2(5) := 'am';
g_lang_ARABIC                  constant varchar2(5) := 'ar';
g_lang_ARMENIAN                constant varchar2(5) := 'hy';
g_lang_AZERBAIJANI             constant varchar2(5) := 'az';
g_lang_BASQUE                  constant varchar2(5) := 'eu';
g_lang_BELARUSIAN              constant varchar2(5) := 'be';
g_lang_BENGALI                 constant varchar2(5) := 'bn';
g_lang_BIHARI                  constant varchar2(5) := 'bh';
g_lang_BULGARIAN               constant varchar2(5) := 'bg';
g_lang_BURMESE                 constant varchar2(5) := 'my';
g_lang_CATALAN                 constant varchar2(5) := 'ca';
g_lang_CHEROKEE                constant varchar2(5) := 'chr';
g_lang_CHINESE                 constant varchar2(5) := 'zh';
g_lang_CHINESE_SIMPLIFIED      constant varchar2(5) := 'zh-CN';
g_lang_CHINESE_TRADITIONAL     constant varchar2(5) := 'zh-TW';
g_lang_CROATIAN                constant varchar2(5) := 'hr';
g_lang_CZECH                   constant varchar2(5) := 'cs';
g_lang_DANISH                  constant varchar2(5) := 'da';
g_lang_DHIVEHI                 constant varchar2(5) := 'dv';
g_lang_DUTCH                   constant varchar2(5) := 'nl';  
g_lang_ENGLISH                 constant varchar2(5) := 'en';
g_lang_ESPERANTO               constant varchar2(5) := 'eo';
g_lang_ESTONIAN                constant varchar2(5) := 'et';
g_lang_FILIPINO                constant varchar2(5) := 'tl';
g_lang_FINNISH                 constant varchar2(5) := 'fi';
g_lang_FRENCH                  constant varchar2(5) := 'fr';
g_lang_GALICIAN                constant varchar2(5) := 'gl';
g_lang_GEORGIAN                constant varchar2(5) := 'ka';
g_lang_GERMAN                  constant varchar2(5) := 'de';
g_lang_GREEK                   constant varchar2(5) := 'el';
g_lang_GUARANI                 constant varchar2(5) := 'gn';
g_lang_GUJARATI                constant varchar2(5) := 'gu';
g_lang_HEBREW                  constant varchar2(5) := 'iw';
g_lang_HINDI                   constant varchar2(5) := 'hi';
g_lang_HUNGARIAN               constant varchar2(5) := 'hu';
g_lang_ICELANDIC               constant varchar2(5) := 'is';
g_lang_INDONESIAN              constant varchar2(5) := 'id';
g_lang_INUKTITUT               constant varchar2(5) := 'iu';
g_lang_IRISH                   constant varchar2(5) := 'ga';
g_lang_ITALIAN                 constant varchar2(5) := 'it';
g_lang_JAPANESE                constant varchar2(5) := 'ja';
g_lang_KANNADA                 constant varchar2(5) := 'kn';
g_lang_KAZAKH                  constant varchar2(5) := 'kk';
g_lang_KHMER                   constant varchar2(5) := 'km';
g_lang_KOREAN                  constant varchar2(5) := 'ko';
g_lang_KURDISH                 constant varchar2(5) := 'ku';
g_lang_KYRGYZ                  constant varchar2(5) := 'ky';
g_lang_LAOTHIAN                constant varchar2(5) := 'lo';
g_lang_LATVIAN                 constant varchar2(5) := 'lv';
g_lang_LITHUANIAN              constant varchar2(5) := 'lt';
g_lang_MACEDONIAN              constant varchar2(5) := 'mk';
g_lang_MALAY                   constant varchar2(5) := 'ms';
g_lang_MALAYALAM               constant varchar2(5) := 'ml';
g_lang_MALTESE                 constant varchar2(5) := 'mt';
g_lang_MARATHI                 constant varchar2(5) := 'mr';
g_lang_MONGOLIAN               constant varchar2(5) := 'mn';
g_lang_NEPALI                  constant varchar2(5) := 'ne';
g_lang_NORWEGIAN               constant varchar2(5) := 'no';
g_lang_ORIYA                   constant varchar2(5) := 'or';
g_lang_PASHTO                  constant varchar2(5) := 'ps';
g_lang_PERSIAN                 constant varchar2(5) := 'fa';
g_lang_POLISH                  constant varchar2(5) := 'pl';
g_lang_PORTUGUESE              constant varchar2(5) := 'pt-PT';
g_lang_PUNJABI                 constant varchar2(5) := 'pa';
g_lang_ROMANIAN                constant varchar2(5) := 'ro';
g_lang_RUSSIAN                 constant varchar2(5) := 'ru';
g_lang_SANSKRIT                constant varchar2(5) := 'sa';
g_lang_SERBIAN                 constant varchar2(5) := 'sr';
g_lang_SINDHI                  constant varchar2(5) := 'sd';
g_lang_SINHALESE               constant varchar2(5) := 'si';
g_lang_SLOVAK                  constant varchar2(5) := 'sk';
g_lang_SLOVENIAN               constant varchar2(5) := 'sl';
g_lang_SPANISH                 constant varchar2(5) := 'es';
g_lang_SWAHILI                 constant varchar2(5) := 'sw';
g_lang_SWEDISH                 constant varchar2(5) := 'sv';
g_lang_TAJIK                   constant varchar2(5) := 'tg';
g_lang_TAMIL                   constant varchar2(5) := 'ta';
g_lang_TAGALOG                 constant varchar2(5) := 'tl';
g_lang_TELUGU                  constant varchar2(5) := 'te';
g_lang_THAI                    constant varchar2(5) := 'th';
g_lang_TIBETAN                 constant varchar2(5) := 'bo';
g_lang_TURKISH                 constant varchar2(5) := 'tr';
g_lang_UKRAINIAN               constant varchar2(5) := 'uk';
g_lang_URDU                    constant varchar2(5) := 'ur';
g_lang_UZBEK                   constant varchar2(5) := 'uz';
g_lang_UIGHUR                  constant varchar2(5) := 'ug';
g_lang_VIETNAMESE              constant varchar2(5) := 'vi';
g_lang_WELSH                   constant varchar2(5) := 'cy';
g_lang_YIDDISH                 constant varchar2(5) := 'yi';
g_lang_UNKNOWN                 constant varchar2(5) := '';


-- translate a piece of text
function translate_text (p_text in varchar2,
p_to_lang in varchar2,
p_from_lang in varchar2 := null,
p_use_cache in varchar2 := 'YES') return varchar2;

-- detect language code for text
function detect_lang (p_text in varchar2) return varchar2;

-- get number of texts in cache
function get_translation_cache_count return number;

-- clear translation cache
procedure clear_translation_cache;

end google_translate_pkg;
/




And here is the package body:

create or replace package body google_translate_pkg
as

/*

Purpose:    PL/SQL wrapper package for Google Translate API

Remarks:   see http://code.google.com/apis/ajaxlanguage/documentation/ 

Who     Date        Description
------  ----------  -------------------------------------
MBR     25.12.2009  Created

*/

m_http_referrer                constant varchar2(255) := 'your-domain-name-or-website-here'; -- insert your domain/website here (required by Google's terms of use)
m_api_key                      constant varchar2(255) := null; -- insert your Google API Key here (optional but recommended)

m_service_url                  constant varchar2(255) := 'http://ajax.googleapis.com/ajax/services/language/';
m_service_version              constant varchar2(10)  := '1.0';

m_max_text_size                constant pls_integer   := 500; -- can be increased up towards 32k, the cache name size (below) must be increased accordingly 

type t_translation_cache is table of varchar2(32000) index by varchar2(550);

m_translation_cache            t_translation_cache;
m_cache_id_separator           constant varchar2(1) := '|';


procedure add_to_cache (p_from_text in varchar2,
p_from_lang in varchar2,
p_to_text in varchar2,
p_to_lang in varchar2)
as
begin

/*

Purpose:    add translation to cache

Remarks:    

Who     Date        Description
------  ----------  -------------------------------------
MBR     25.12.2009  Created

*/

m_translation_cache (p_from_lang || m_cache_id_separator || p_to_lang || m_cache_id_separator || replace(substr(p_from_text,1,m_max_text_size), m_cache_id_separator, '')) := p_to_text;

end add_to_cache;


function get_from_cache (p_text in varchar2,
p_from_lang in varchar2,
p_to_lang in varchar2) return varchar2
as
l_returnvalue varchar2(32000);
begin

/*

Purpose:    get translation from cache

Remarks:    

Who     Date        Description
------  ----------  -------------------------------------
MBR     25.12.2009  Created

*/

begin
l_returnvalue := m_translation_cache (p_from_lang || m_cache_id_separator || p_to_lang || m_cache_id_separator || replace(substr(p_text,1,m_max_text_size), m_cache_id_separator, ''));
exception
when no_data_found then
l_returnvalue := null;
end;

return l_returnvalue;

end get_from_cache;


function get_clob_from_http_post (p_url in varchar2,
p_values in varchar2) return clob
as
l_request     utl_http.req;
l_response    utl_http.resp;
l_buffer      varchar2(32767);
l_returnvalue clob := ' ';
begin

/*

Purpose:    do a HTTP POST and get results back in a CLOB

Remarks:    

Who     Date        Description
------  ----------  -------------------------------------
MBR     25.12.2009  Created

*/

l_request := utl_http.begin_request (p_url, 'POST', utl_http.http_version_1_1);

utl_http.set_header (l_request, 'Referer', m_http_referrer); -- note that the actual header name is misspelled in the HTTP protocol
utl_http.set_header (l_request, 'Content-Type', 'application/x-www-form-urlencoded');
utl_http.set_header (l_request, 'Content-Length', to_char(length(p_values)));
utl_http.write_text (l_request, p_values);

l_response := utl_http.get_response (l_request);

if l_response.status_code = utl_http.http_ok then

begin
loop
utl_http.read_text (l_response, l_buffer);
dbms_lob.writeappend (l_returnvalue, length(l_buffer), l_buffer);
end loop;
exception
when utl_http.end_of_body then
null;
end;

end if;

utl_http.end_response (l_response);

return l_returnvalue;

end get_clob_from_http_post;


function translate_text (p_text in varchar2,
p_to_lang in varchar2,
p_from_lang in varchar2 := null,
p_use_cache in varchar2 := 'YES') return varchar2
as
l_values      varchar2(2000);
l_response    clob;
l_start_pos   pls_integer;
l_end_pos     pls_integer;
l_returnvalue varchar2(32000) := null;
begin

/*

Purpose:    translate a piece of text

Remarks:    if the "from" language is left blank, Google Translate will attempt to autodetect the language

Who     Date        Description
------  ----------  -------------------------------------
MBR     25.12.2009  Created
MBR     25.12.2009  Added cache for translations

*/

if trim(p_text) is not null then

if p_use_cache = 'YES' then
l_returnvalue := get_from_cache (p_text, p_from_lang, p_to_lang);
end if;

if l_returnvalue is null then

l_values := 'v=' || m_service_version || '&q=' || utl_url.escape (substr(p_text,1,m_max_text_size), false, 'UTF8') || '&langpair=' || p_from_lang || '|' || p_to_lang;

if m_api_key is not null then
l_values := l_values || '&key=' || m_api_key;
end if;

l_response := get_clob_from_http_post (m_service_url || 'translate', l_values);

if l_response is not null then

l_start_pos := instr(l_response, '{"translatedText":"');
l_start_pos := l_start_pos + 19;
l_end_pos := instr(l_response, '"', l_start_pos);

l_returnvalue := substr(l_response, l_start_pos, l_end_pos - l_start_pos);

if (p_use_cache = 'YES') and (l_returnvalue is not null) then
add_to_cache (p_text, p_from_lang, l_returnvalue, p_to_lang);
end if;

end if;

end if;

end if;

return l_returnvalue;

end translate_text;


function detect_lang (p_text in varchar2) return varchar2
as
l_url         varchar2(2000);
l_response    clob;
l_start_pos   pls_integer;
l_end_pos     pls_integer;
l_returnvalue varchar2(255);
begin

/*

Purpose:    detect language code for text

Remarks:    

Who     Date        Description
------  ----------  -------------------------------------
MBR     25.12.2009  Created

*/

if trim(p_text) is not null then

l_url := m_service_url || 'detect?v=' || m_service_version || '&q=' || utl_url.escape (substr(p_text,1,m_max_text_size), false, 'UTF8');

if m_api_key is not null then
l_url := l_url || '&key=' || m_api_key;
end if;

l_response := httpuritype(l_url).getclob();

l_start_pos := instr(l_response, '{"language":"');
l_start_pos := l_start_pos + 13;
l_end_pos := instr(l_response, '",', l_start_pos);

l_returnvalue := substr(l_response, l_start_pos, l_end_pos - l_start_pos);

end if;

return l_returnvalue;

end detect_lang;


function get_translation_cache_count return number
as
l_returnvalue number;
begin

/*

Purpose:    get number of texts in cache

Remarks:    

Who     Date        Description
------  ----------  -------------------------------------
MBR     25.12.2009  Created

*/

l_returnvalue := m_translation_cache.count;

return l_returnvalue;

end get_translation_cache_count;


procedure clear_translation_cache
as
begin

/*

Purpose:    clear translation cache

Remarks:    

Who     Date        Description
------  ----------  -------------------------------------
MBR     25.12.2009  Created

*/

m_translation_cache.delete;

end clear_translation_cache;


end google_translate_pkg;
/






So, let's take the package for a test drive!

Detecting languages



Google can (try to) figure out which language a specific text is in.

select google_translate_pkg.detect_lang ('hola mundo') as detect1,
google_translate_pkg.detect_lang ('ich bin ein berliner') as detect2
from dual

detect1              detect2             
-------------------- --------------------
es                   de                  

1 row selected.



Translating text



This is the most common usage of the package. Note that if you leave out the "from" language parameter, Google will attempt to autodetect the language.

select google_translate_pkg.translate_text ('excuse me, where is the toilet?', 'es') as the_phrase
from dual

the_phrase                              
----------------------------------------
Disculpe, ¿dónde está el baño?          

1 row selected.



Mass translation



Here is an example that translates several rows -- the product descriptions from the demo products table (from the Application Express demo application) -- into several languages.

select pi.product_id, pi.product_name, pi.product_description,
google_translate_pkg.translate_text (pi.product_description, 'es', 'en') as spanish_description,
google_translate_pkg.translate_text (pi.product_description, 'de', 'en') as german_description
from demo_product_info pi
order by pi.product_name


product_id product_name         product_description            spanish_description            german_description            
---------- -------------------- ------------------------------ ------------------------------ ------------------------------
3 Bluetooth Headset    Hands-Free without the wires!  Manos libres sin cables!       Hands-Free ohne Kabel!        
8 Classic Projector    Does not include transparencie No incluye transparencias o lá Enthält keine Folien oder Fett
s or grease pencil             piz de grasa                   stift                         

2 MP3 Player           Store up to 1000 songs and tak Almacena hasta 1000 canciones  Speichern Sie bis zu 1000 Song
e them with you                y llevarlos con usted          s, und nehmen Sie sie mit     

4 PDA Cell Phone       Combine your cell phone and PD Combine su teléfono celular y  Kombinieren Sie Ihre Handy und
A into one device              PDA en un solo dispositivo      PDA in einem Gerät           

5 Portable DVD Player  Small enough to take anywhere! Lo suficientemente pequeño com Klein genug, um überall hin mi
o para tener en cualquier luga tnehmen!                      
r!                             tnehmen!                      

10 Stereo Headphones    Noise-cancelling headphones pe El ruido auriculares con perfe Noise-Cancelling-Kopfhörer ide
rfect for the traveler         cta para el viajero            al für den Reisenden          

9 Ultra Slim Laptop    The power of a desktop in a po El poder de una computadora de Die Leistung eines Desktop in 
rtable design                   escritorio en un diseño portá ein tragbares Design          
rtable design                  til                            ein tragbares Design          

1 3.2 GHz Desktop PC   All the options, this machine  Todas las opciones, se carga e Alle Optionen, ist diese Masch
is loaded!                     sta máquina!                   ine geladen!                  

6 512 MB DIMM          Expand your PCs memory and gai Amplíe su PC la memoria y obte Erweitern Sie Ihren PC Speiche
n more performance             ner más rendimiento            r und gewinnen mehr Leistung  

7 54" Plasma Flat Scre Mount on the wall or ceiling,  Montar en la pared o el techo, Montage auf der Wand oder Deck
en                   the picture is crystal clear!   el panorama es claro!         e, das Bild ist glasklar!     


10 rows selected.



Apex translation



Combine this package with the Apex dictionary views to get a kick-start when you are translating your own Apex applications into other languages (some tweaking of the results is probably necessary...).

select item_name, label,
google_translate_pkg.translate_text (label, 'es', 'en') as spanish_label,
google_translate_pkg.translate_text (label, 'nl', 'en') as dutch_label
from apex_application_page_items
where application_id = 103
and label is not null
and display_as <> 'Hidden'



item_name            label                spanish_label        dutch_label         
-------------------- -------------------- -------------------- --------------------
P101_PASSWORD        Password             Contraseña           Wachtwoord          
P101_USERNAME        User Name            Nombre de usuario    Gebruikersnaam      
P11_CUSTOMER_ID      Customer             Cliente              Klant               
P29_CUSTOMER_INFO    Customer Info        Información del clie Customer Info       
nte                                      
P29_ORDER_TIMESTAMP  Order Date           Fecha de pedido      Orderdatum          
P29_ORDER_TOTAL      Order Total          Orden total          Bestel Totaal       
P29_USER_ID          Sales Rep            Sales Rep            Vertegenwoordiger   


7 rows selected.



Caching



I have built in a simple cache mechanism, so that you avoid the network traffic if the phrase has already been translated in your current session (and the performance benefit can be huge if you repeatedly translate the same strings).
You can specify whether you want to use the cache (it's on by default), and there is also a procedure to clear (reset) the cache.


Conclusion



It is time to throw out your old-fashioned dictionaries and fire up SQL Plus instead :-)