I am generating my U-SQL script dynamically by reading a table schema. I have existing TSV files in Data Lake Store, and I need to append new data to those existing TSV files. However, the table schema may change between script generations, and the table may have extra columns added.

As far as I know, the U-SQL script needs to have the same number of columns as the TSV file. Is it possible to give these newly added columns default values? For example:

    @Result =
        EXTRACT id string,
                firstname string,
                lastname string,
                department string = "",
                emp_id int = 0
        FROM @inputfile
        USING Extractors.Tsv();

As you can see, the department and emp_id columns are newly added, and I want to write them to the output file with default values. If the columns are already present in the TSV, their values should be picked up; otherwise, default values should be inserted for them.

Thanks.

1 Answer

You currently cannot specify default values for columns in the schema of an EXTRACT statement. The currently recommended patterns are:

  1. Extract the columns that are actually present in each version of your TSV files (keep the files organized so you can easily tell which columns each one contains) and then use OUTER UNION ALL BY NAME ON (*) to union the rowsets. That adds null/default-valued columns to the rowsets that are missing columns.

  2. Write a custom extractor that takes a SqlMap as an argument and fills in the missing columns with the specified defaults.
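
A rough sketch of the first pattern, in case it helps. The file paths and the rowset names (@old, @new, @all) are made up for illustration, and it assumes the old files contain only id, firstname, and lastname while the new files also contain department and emp_id. It uses the OUTER form of the union by name, which is the variant that fills in columns missing on one side: null for nullable types such as string, and the .NET default (0 for int) for non-nullable types.

```
// Hypothetical paths; replace them with your own Data Lake Store locations.
DECLARE @oldfile string = "/data/employees_v1.tsv";
DECLARE @newfile string = "/data/employees_v2.tsv";
DECLARE @outfile string = "/output/employees_all.tsv";

// Old layout: only the three original columns are present.
@old =
    EXTRACT id string,
            firstname string,
            lastname string
    FROM @oldfile
    USING Extractors.Tsv();

// New layout: includes the two newly added columns.
@new =
    EXTRACT id string,
            firstname string,
            lastname string,
            department string,
            emp_id int
    FROM @newfile
    USING Extractors.Tsv();

// OUTER UNION keeps the columns from both sides; rows from @old get
// null for department and 0 for emp_id.
@all =
    SELECT * FROM @old
    OUTER UNION ALL BY NAME ON (*)
    SELECT * FROM @new;

// Fix the column order and turn the null department into "".
@result =
    SELECT id,
           firstname,
           lastname,
           department ?? "" AS department,
           emp_id
    FROM @all;

OUTPUT @result
TO @outfile
USING Outputters.Tsv();
```

Since the script is generated dynamically anyway, the generator can emit one EXTRACT per known file layout and leave it to the union to line the columns up. If you need a default other than null or 0, apply it in the final SELECT, as done for department here.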