1
votes

I have datetime data in my input and would like to load it correctly from Pig. I googled and learned it's suggested to load as chararray then covert to datetime with ToDate function. However, sometimes my datetime fields are NULL. Then, I'm getting NULL Pointer Exception from PIG when I try to apply the ToDate Function. I am trying to use bincond operator but I'm getting the following error:

mismatched input '?' expecting SEMI_COLON

Which does not make sense.

=====================================================================

Here is the code that I have so far:

transactions_edited = FOREACH transactions GENERATE  
  id,
  code,
  user_id,
  visit_code,
  channel_id,
  transaction_type,
  product_category,
  product_subcategory,
  specific_id,
  specific_type,
  email,
  cpf,
  name,
  last_name,
  gender,
  birth_date,
  phone_code,
  phone,
  additional_phone_code,
  additional_phone,
  zip_code,
  monthly_income,
  status,
  opportunity_status,
  ToDate(created_at,'yyyy-MM-dd HH:mm:ss') AS created_at,
  ToDate(updated_at,'yyyy-MM-dd HH:mm:ss') AS updated_at,
  old_status,
  old_masked_id,
  address_type,
  address,
  address_number,
  address_complement,
  neighborhood,
  city,
  state,
  landing_path,
  referrer,
  source,
  source_advertising,
  keyword,
  ad_id,
  ad_name,
  ad_network,
  ad_placement,
  ad_device,
  cpf_restriction,
  mother_name,
  registration_form_closed,
  (opportunity_status_updated_at is not NULL ) ?ToDate(opportunity_status_updated_at,'yyyy-MM-dd HH:mm:ss') : AS opportunity_status_updated_at,
  potential,
  interest,
  lead_id,
  (integrated_at is not NULL) ? ToDate(integrated_at,'yyyy-MM-dd HH:mm:ss') : AS integrated_at,
  starred,
  channel_input_type,
  rg 
  ;

Any help will be really appreciated.

Thanks !

1
Could you try (integrated_at is not NULL? ToDate(integrated_at,'yyyy-MM-dd HH:mm:ss') : NULL) ? I think you're very close, you only forgot to tell pig what to do if the condition is not true.AntonyBrd
Thank you ! It was the problem :)abarbosa

1 Answers

0
votes

There is a bit correction in ternary operator. Please use below modified code -

transactions_edited = FOREACH transactions GENERATE  
  id,
  code,
  user_id,
  visit_code,
  channel_id,
  transaction_type,
  product_category,
  product_subcategory,
  specific_id,
  specific_type,
  email,
  cpf,
  name,
  last_name,
  gender,
  birth_date,
  phone_code,
  phone,
  additional_phone_code,
  additional_phone,
  zip_code,
  monthly_income,
  status,
  opportunity_status,
  ToDate(created_at,'yyyy-MM-dd HH:mm:ss') AS created_at,
  ToDate(updated_at,'yyyy-MM-dd HH:mm:ss') AS updated_at,
  old_status,
  old_masked_id,
  address_type,
  address,
  address_number,
  address_complement,
  neighborhood,
  city,
  state,
  landing_path,
  referrer,
  source,
  source_advertising,
  keyword,
  ad_id,
  ad_name,
  ad_network,
  ad_placement,
  ad_device,
  cpf_restriction,
  mother_name,
  registration_form_closed,
  (opportunity_status_updated_at is not NULL ? ToDate(opportunity_status_updated_at,'yyyy-MM-dd HH:mm:ss') : opportunity_status_updated_at) AS opportunity_status_updated_at,
  potential,
  interest,
  lead_id,
  (integrated_at is not NULL ? ToDate(integrated_at,'yyyy-MM-dd HH:mm:ss') : integrated_at) AS integrated_at,
  starred,
  channel_input_type,
  rg 
  ;