Oracle Character sets
Aino Andriessen
Demo1
nls_length_semantics
Initialization parameter
CHAR or BYTE (default: BYTE)
Relevant for multibyte character sets
Defines whether the length of character
columns and variables is counted in characters or in bytes
alter session set nls_length_semantics=CHAR;
not retroactive: existing columns keep their length semantics
recompile PL/SQL where necessary
system-wide: alter system set nls_length_semantics=CHAR;
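The following minimal Python sketch (not Oracle code) shows why the setting only matters for multibyte character sets. It assumes that the Python codecs cp1252 and utf-8 stand in for WE8MSWIN1252 and AL32UTF8. The helper name char_and_byte_length is hypothetical.

```python
# Sketch: character length vs byte length of the same string.
# Assumption: "cp1252" models WE8MSWIN1252, "utf-8" models AL32UTF8.

def char_and_byte_length(value: str, codec: str) -> tuple:
    """Return (length in characters, length in bytes) of value in codec."""
    return len(value), len(value.encode(codec))

for codec in ("cp1252", "utf-8"):
    for word in ("Rene", "René"):
        chars, nbytes = char_and_byte_length(word, codec)
        print(f"{codec:7} {word}: {chars} characters, {nbytes} bytes")
```

In the single-byte set the two numbers are always equal. Only in UTF-8 does "René" need 5 bytes for 4 characters, and that is where BYTE and CHAR semantics start to differ.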
nls_length_semantics 2
Specify the length semantics of character columns and variables
explicitly
create table demo (naam varchar2(4 char));
create table demo (naam varchar2(4 byte));
t_naam varchar2(4 char);
t_naam demo2.naam%TYPE;
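Below is a hypothetical Python model of the length check that VARCHAR2(4 CHAR) and VARCHAR2(4 BYTE) apply. It assumes an AL32UTF8 database, modelled with the utf-8 codec. The function fits is illustrative and not an Oracle API. The model deliberately ignores Oracle's absolute byte limit for VARCHAR2, which still applies to CHAR-semantics columns. In Oracle, the rejected value would raise ORA-12899 (value too large for column).

```python
# Hypothetical model of the length check for VARCHAR2(size BYTE|CHAR).
# Assumption: the database character set is AL32UTF8, modelled as "utf-8".
# Not modelled: Oracle's absolute byte limit for VARCHAR2.

def fits(value: str, size: int, semantics: str = "BYTE", codec: str = "utf-8") -> bool:
    """Return True if value fits in a column declared VARCHAR2(size semantics)."""
    if semantics == "CHAR":
        return len(value) <= size                 # count characters
    if semantics == "BYTE":
        return len(value.encode(codec)) <= size   # count bytes in the db charset
    raise ValueError(f"unknown length semantics: {semantics!r}")

print(fits("René", 4, "CHAR"))  # True: 4 characters fit in varchar2(4 char)
print(fits("René", 4, "BYTE"))  # False: 5 bytes do not fit in varchar2(4 byte)
```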
Demo2
Character encoding
Character set
A character set defines the 'mapping' between
a binary/hexadecimal code and a character
UTF8
WE8MSWIN1252
WE8ISO8859P1
JA16EUC
US7ASCII
WE8DEC
...
Code pages
IBM / Windows terminology
~ roughly equivalent to a character set
one code page per language
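A character set is only a mapping, so the same bytes can mean different things. This Python sketch reads two fixed bytes with several codecs that are assumed stand-ins for WE8MSWIN1252, WE8ISO8859P1 and WE8ISO8859P15. Byte 0x80 is the euro sign in cp1252 but an unprintable control code in Latin-1. The helper decode_with is hypothetical.

```python
# The same two bytes read with different character sets (Python codecs as stand-ins).
RAW = bytes([0x80, 0xE9])

def decode_with(raw: bytes, codecs: list) -> dict:
    """Decode raw with each codec; the bytes never change, only their meaning."""
    return {codec: raw.decode(codec) for codec in codecs}

for codec, text in decode_with(RAW, ["cp1252", "latin-1", "iso8859_15"]).items():
    print(f"{codec:10} {text!r}")
```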
Character sets 2
ASCII
1 byte (7 bits used)
128 characters
standard English letters without accents
ISO 8859 and Latin-1
1 byte (8 bits)
256 characters
CP-1252
Windows variant of Latin-1
UTF-8 (in Oracle: AL32UTF8)
variable length, multibyte
max 4 bytes
~100,000 characters defined
~1.1 million code points available
multilingual
ASCII codes are identical
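The two UTF-8 claims above can be checked with a short Python sketch. The 128 ASCII characters encode to the same single bytes in UTF-8, cp1252 and Latin-1. The Unicode code space ends at U+10FFFF, and its last code point needs the maximum of 4 UTF-8 bytes. The helper ascii_compatible is hypothetical, and the Python codecs are assumed stand-ins for the Oracle character sets.

```python
import sys

def ascii_compatible(codec: str) -> bool:
    """True if the 128 ASCII characters encode to the same bytes as in ASCII."""
    ascii_chars = "".join(chr(i) for i in range(128))
    return ascii_chars.encode(codec) == ascii_chars.encode("ascii")

for codec in ("utf-8", "cp1252", "latin-1"):
    print(codec, ascii_compatible(codec))

# Size of the Unicode code space (U+0000..U+10FFFF) and the longest UTF-8 sequence.
print(sys.maxunicode + 1)                        # 1114112 code points
print(len(chr(sys.maxunicode).encode("utf-8")))  # 4 bytes
```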
Examples
Character set    € hex (decimal)    é hex (decimal)
AL32UTF8         E282AC             C3A9 (50089)
WE8MSWIN1252     80 (128)           E9 (233)
ASCII            -                  -
WE8ISO8859P1     -                  E9 (233)
WE8ISO8859P15    A4 (164)           E9 (233)
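The table values can be reproduced with Python's codecs. The mapping from Oracle character set names to Python codec names below is an assumption for illustration. It is not something Oracle provides.

```python
# Reproduce the examples table with Python codecs standing in for Oracle character sets.
ORACLE_TO_PYTHON = {
    "AL32UTF8": "utf-8",
    "WE8MSWIN1252": "cp1252",
    "US7ASCII": "ascii",
    "WE8ISO8859P1": "latin-1",
    "WE8ISO8859P15": "iso8859_15",
}

def hex_code(char: str, codec: str) -> str:
    """Hex code of char in codec, or '-' when the character set lacks it."""
    try:
        return char.encode(codec).hex().upper()
    except UnicodeEncodeError:
        return "-"

for oracle_name, codec in ORACLE_TO_PYTHON.items():
    print(f"{oracle_name:14} {hex_code('€', codec):8} {hex_code('é', codec)}")
```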
Unicode / UTF-8 example
The image shows the number of bytes needed to store different kinds of characters in the
UTF-8 character set. The ASCII characters (C, t, and d) require one byte. The Latin and
Greek characters (á, ö, and Ø) require 2 bytes. The Asian character requires 3 bytes.
The supplementary character (treble clef sign) requires 4 bytes of storage.
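The byte counts described above can be checked in Python. The image's Asian character is not recoverable, so 中 (U+4E2D) is used as an assumed substitute. U+1D11E is the Unicode MUSICAL SYMBOL G CLEF (treble clef).

```python
# Assumption: 中 (U+4E2D) stands in for "the Asian character" from the image.
SAMPLES = ["C", "t", "d", "á", "ö", "Ø", "中", "\U0001D11E"]  # last one: treble clef

def utf8_sizes(chars: list) -> dict:
    """Number of UTF-8 bytes needed for each character."""
    return {ch: len(ch.encode("utf-8")) for ch in chars}

for ch, size in utf8_sizes(SAMPLES).items():
    print(f"U+{ord(ch):04X} {size} byte(s)")
```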
Diacritics and special characters
Diacritics are marks placed with a letter (above, below or
even through it) to change its pronunciation and so give
the (modified) letter a language-specific sound.
àÿęňĜş etc.
Special characters
ßæ¿
Diacritics and special characters 2
Single-byte character sets
1 byte per composed character
Not every combination is available
code pages
UTF-8
the diacritic has its own code
the composed character has its own code
which can usually be decomposed into the base character +
diacritic
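Python's standard unicodedata module illustrates this. "é" can be stored as one precomposed code point or as "e" plus a combining acute accent. The two forms have different bytes but are canonically equivalent, and normalization (NFC/NFD) converts between them.

```python
import unicodedata

composed = "\u00e9"      # é as one precomposed code point
decomposed = "e\u0301"   # e + COMBINING ACUTE ACCENT

print(composed == decomposed)                                # False: different code points
print(composed.encode("utf-8").hex())                        # c3a9
print(decomposed.encode("utf-8").hex())                      # 65cc81
print(unicodedata.normalize("NFC", decomposed) == composed)  # True
print(unicodedata.normalize("NFD", composed) == decomposed)  # True
```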
Database functions
Character functions
substr - substrb - substrc - substr2
instr - ...
length - lengthb
chr (n)
Returns a character corresponding to the number passed in as the argument in the
database character set
select chr (50089) from dual;
dump
Returns a VARCHAR2 value containing the datatype code, length in bytes, and internal
representation of expr. The returned result is always in the database character set.
select dump (naam, 1017) from demo2;
convert
Converts a character string from one character set to another
utl_raw
select utl_raw.cast_to_raw(naam) from demo2;
unistr()
Takes a string containing Unicode escapes (\xxxx) and returns it in the national character set
select unistr('Ren\00e9') from dual;
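The next block gives rough Python analogues of CHR, UTL_RAW.CAST_TO_RAW and UNISTR. It shows what the SQL examples above return in an AL32UTF8 database, modelled with the utf-8 codec. The helper names oracle_chr and cast_to_raw are hypothetical and only approximate the Oracle functions.

```python
# Rough analogues, assuming an AL32UTF8 database character set ("utf-8").

def oracle_chr(n: int, db_codec: str = "utf-8") -> str:
    """Interpret n as the big-endian byte sequence of one character in db_codec."""
    size = max(1, (n.bit_length() + 7) // 8)
    return n.to_bytes(size, "big").decode(db_codec)

def cast_to_raw(value: str, db_codec: str = "utf-8") -> str:
    """Hex representation of the stored bytes, as a RAW value would show them."""
    return value.encode(db_codec).hex().upper()

print(oracle_chr(50089))        # é  (50089 == 0xC3A9)
print(cast_to_raw("René"))      # 52656EC3A9
print("Ren\u00e9" == "René")    # True: Python's \u00e9 matches UNISTR's \00e9
```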
Demo3
nls_lang
Client character set
When the client NLS_LANG character set is set to
the same value as the database character set,
Oracle assumes that the data being sent or
received are of the same (correct) encoding, so no
conversions or validations may occur for
performance reasons. The data is just stored as
delivered by the client, bit by bit.
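The following hypothetical Python scenario shows how this pass-through goes wrong. The client actually sends UTF-8 bytes, but its NLS_LANG says WE8MSWIN1252, which equals the database character set, so Oracle stores the bytes unchanged. The codecs are assumed stand-ins for the Oracle character sets.

```python
# Hypothetical scenario: client NLS_LANG == database charset == WE8MSWIN1252,
# but the client (e.g. a UTF-8 terminal) really produces UTF-8 bytes.
typed = "René"
sent = typed.encode("utf-8")   # bytes actually sent by the client
stored = sent                  # stored bit by bit: no conversion, no validation

print(stored.decode("utf-8"))   # the same misconfigured client reads back: René
print(stored.decode("cp1252"))  # a correctly configured client reads: RenÃ©
print(len(stored))              # 5 "characters" stored in a single-byte database
```

The data looks correct to the client that wrote it and wrong to everyone else, which makes this kind of corruption hard to spot.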
nls_lang 2
language_territory.characterset
american_america.UTF8
dutch_the netherlands.WE8MSWIN1252
american_THE NETHERLANDS.WE8MSWIN1252
Set as environment variable NLS_LANG
Differs between the Windows GUI (WE8MSWIN1252) and
the command line (WE8PC850)
Not used by Java clients
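This short Python sketch splits an NLS_LANG value into its three parts and shows why the Windows GUI and console settings differ. The same "é" has a different byte in cp1252 (WE8MSWIN1252) and cp850 (WE8PC850). The parser parse_nls_lang is a hypothetical helper, not an Oracle utility.

```python
def parse_nls_lang(value: str) -> tuple:
    """Split 'language_territory.charset'; each part may be empty."""
    lang_territory, _, charset = value.partition(".")
    language, _, territory = lang_territory.partition("_")
    return language, territory, charset

print(parse_nls_lang("dutch_the netherlands.WE8MSWIN1252"))

# The same character has a different byte in the GUI and console code pages.
print("é".encode("cp1252").hex())  # e9 (WE8MSWIN1252, Windows GUI)
print("é".encode("cp850").hex())   # 82 (WE8PC850, Windows command line)
```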
Demo4
National character set
Support for another character set next to the
database character set
e.g. to allow Japanese in a WE8MSWIN1252 or ISO 8859
character set
Less necessary in a UTF8 database
Multibyte
NVARCHAR2, NCHAR, NCLOB etc.
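A Python sketch of why a national character set is needed. Japanese text cannot be encoded in cp1252 (a stand-in for WE8MSWIN1252) but fits in UTF-16 and UTF-8. The sketch assumes that the utf-16-be codec models AL16UTF16, the usual national character set. The helper can_encode is hypothetical.

```python
def can_encode(text: str, codec: str) -> bool:
    """True if every character of text exists in codec."""
    try:
        text.encode(codec)
        return True
    except UnicodeEncodeError:
        return False

japanese = "日本語"
print(can_encode(japanese, "cp1252"))     # False: not in WE8MSWIN1252
print(len(japanese.encode("utf-16-be")))  # 6 bytes in UTF-16 (assumed AL16UTF16)
print(len(japanese.encode("utf-8")))      # 9 bytes in UTF-8 (AL32UTF8)
```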
Case
TELETEX character set
no longer exists in Oracle
select convert(naam, 'TELETEX', 'UTF8') from
tabel;
Locale Builder
sql> select name from emp;
sql> select name from emp@db;
sql> select utl_raw.cast_to_varchar2 (utl_raw.cast_to_raw
(name)) from emp@db;
sql> select utl_raw.cast_to_varchar2 (utl_raw.cast_to_raw@db
(name)) from emp@db;
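In the last query the cast to RAW presumably runs on the remote side, so the original bytes cross the database link without character set conversion. Once the raw bytes are available, they can be decoded on the client. The Python sketch below assumes a TELETEX/T.61-style layout in which an accent byte precedes its base letter. The byte values in PREFIX_ACCENTS are hypothetical and should be checked against the real T.61 table before use.

```python
import unicodedata

# Hypothetical byte values: accent bytes placed BEFORE the base letter,
# as in TELETEX/T.61-style encodings. Not a verified T.61 table.
PREFIX_ACCENTS = {0xC1: "\u0300", 0xC2: "\u0301", 0xC8: "\u0308"}

def decode_prefix_accents(raw: bytes) -> str:
    """Decode ASCII text with prefix accent bytes into NFC Unicode."""
    out = []
    pending = ""                          # combining marks waiting for a base letter
    for b in raw:
        if b in PREFIX_ACCENTS:
            pending += PREFIX_ACCENTS[b]
        elif b < 0x80:
            out.append(chr(b) + pending)  # base letter followed by its accent(s)
            pending = ""
        else:
            out.append("\ufffd")          # byte not covered by this sketch
            pending = ""
    out.append(pending)                   # keep a dangling accent rather than drop it
    return unicodedata.normalize("NFC", "".join(out))

print(decode_prefix_accents(b"Ren\xc2e"))  # René
```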
Question
Diacritic-insensitive search
Case-insensitive search
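Inside Oracle, this is typically handled with linguistic comparison (NLS_COMP=LINGUISTIC combined with NLS_SORT=BINARY_CI for case-insensitive or BINARY_AI for accent- and case-insensitive matching). The Python sketch below shows the underlying idea on the client side: decompose with NFD, drop the combining marks, then casefold. The names fold and matches are hypothetical.

```python
import unicodedata

def fold(text: str) -> str:
    """Remove diacritics (via NFD) and case differences (via casefold)."""
    decomposed = unicodedata.normalize("NFD", text)
    base_only = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    return base_only.casefold()

def matches(needle: str, haystack: str) -> bool:
    """Diacritic- and case-insensitive substring search."""
    return fold(needle) in fold(haystack)

print(matches("rene", "RENÉ"))  # True
print(fold("àÿęňĜş"))           # ayengs
print(fold("ß"), fold("æ"))     # ss æ  (æ has no decomposition)
```

Special characters such as æ and ø have no Unicode decomposition, so a real implementation would need an extra mapping for them.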
Summary
nls_length_semantics
Always explicitly define a character column with its
type (CHAR or BYTE)
Oracle performs automatic character set
conversion
WYSINAWYG: what you see is not always what you get
Use a Java client
Working with character sets can be confusing
UTF8 is often the preferred character set