2
votes

I'm working on a database access layer and have just noticed that Oracle 11g seems to have some issues handling non-latin characters in the aliases.

It seeems that characters over 0x7F, in an alias, appear to count as two characters as far as the 30 character alias length limit is concerned.

For instance in both Oracle SQL Developer and ODP.net:

SELECT
 LENGTH('ÔÔÔÔÔÔÔÔÔÔÔÔÔÔÔ') "ÔÔÔÔÔÔÔÔÔÔÔÔÔÔÔ"
FROM DUAL

Works and reports a string length of 15, however:

SELECT
 LENGTH('ÔÔÔÔÔÔÔÔÔÔÔÔÔÔÔx') "ÔÔÔÔÔÔÔÔÔÔÔÔÔÔÔx"
FROM DUAL

reports an ORA-00972: 'identifier too long' error.

This seems to imply that the alias string is being encoded in a way that means the accented characters are becoming two characters.

Is this expected and does anyone know what the actual restriction/encoding is here?

I need a reliable way to determine if a provided alias string is permitted.

For what it's worth the Oracle settings are as follows:

Client:

  • NLS_LANG = ENGLISH_UNITED KINGDOM.WE8MSWIN1252

Database:

  • NLS_CHARACTERSET = AL32UTF8
  • NLS_NCHAR_CHARACTERSET = AL16UTF16
1

1 Answers

0
votes

column_name in dba_tab_cols is a varchar2(30 byte). That means that it stores up to 30 bytes of data. Your database character set is UTF-8 so each character may require up to 3 bytes of data which would mean, worst case, that you might be limited to 10 characters. Assuming that all your identifiers use valid Windows-1252 characters, I don't think any characters would require more than 2 bytes of storage.

If you're trying to determine whether an identifier is valid from a client programming language

  • Convert the identifier to UTF-8
  • Get the byte length of the UTF-8 encoded identifier
  • Check if the byte length is greater than 30