0
votes

Greenplum external table loading HDFS data, the data is as follows:

    S1000001 ^ @ ^ 200001 ^ @ ^ 300001
    S1000002 ^ @ ^ 200002 ^ @ ^ 300002

Separator is ^ @ ^

In greenplum external table mode loading, can only use a single delimiter, is there any way to customize the delimiter? Best to have an example, thank you.

I tried to modify the greenplum source code, in the copy.c file, modify the following code, build the table can be successful, but the data is wrong.

/* single byte encoding such as ascii, latinx and other */
if (strlen(delim) != 1 && !delim_off)
   ereport(ERROR,
      (errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
           errmsg("delimiter must be a single one-byte character, or \'off\'")));
1

1 Answers

0
votes

Greenplum doesn't support multi-byte delimiters. You can do this trick instead. First, pick a character that doesn't exist in your data. In this example, I'll use '~' but it can be any character that doesn't exist in your data.

create external table ext_example
(data text)
location ('<storage stuff here>')
format 'text' (delimiter as '~');

Next, use split_part to extract the columns you want.

insert into target_table (col1, col2, col3)
select split_part(data, '^ @ ^', 1) as col1,
       split_part(data, '^ @ ^', 2) as col2,
       split_part(data, '^ @ ^', 3) as col3
from ext_example;