6
votes

I have a database with the below NLS settings

NLS_NCHAR_CHARACTERSET  - AL16UTF16

NLS_CHARACTERSET - AL32UTF8

There's a table with a clob column storing a base64 encoded data. Since the characters are mostly english and letters, I would assume each character takes up 1 byte only as clob using the charset of NLS_CHARACTERSET for encoding.

With a inline enabled clob column, the clob will be stored inline unless it goes more that 4096 bytes in size. However, when I tried to store a set of data with 2048 chars, I found that it is not stored inline (By checking the table DBA_TABLES). So does it mean each character is not using only 1 byte? Can anyone elaborate on this?

Another test added: Create a table with clob column with chunk size 8kb so that initial segment size is 65536 bytes. After insert a row with 32,768 chars in clob column. The 2nd extent creation can be told by querying dba_segments.

1
Is you column really a CLOB or NCLOB ? - Sylvain Leroux
@SylvainLeroux It is CLOB indeed. I even tried to create a new table and the observation is the same. - Quincy
Since the CLOB contains base64 encoded data, why not base64 decode it and store the bytes in a BLOB instead? First you would halve the bytes because of the CLOB stored as UCS-2 (Rustys answer), then you would reduce the size 25% further? The drawback would be that you probably would have to base64 encode it when you query it out, so it would be a choice between using more space (CLOB) or using more CPU (BLOB with decode/encode). - Kim Berg Hansen
@KimBergHansen This is a good suggestion. But I cant do anything with the definition as it's already in production. I am just trying to evaluate the disk usage. - Quincy

1 Answers

7
votes

http://docs.oracle.com/cd/E11882_01/server.112/e10729/ch6unicode.htm#r2c1-t12

It says:

Data in CLOB columns is stored in a format that is compatible with UCS-2 when the database character set is multibyte, such as UTF8 or AL32UTF8. This means that the storage space required for an English document doubles when the data is converted

So it looks like CLOB internally stores everything as UCS-2 (Unicode), i.e. 2 bytes fixed per symbol. Consequently, it stores inline 4096/2 = 2048 chars.