2
votes

I am using cross join to access the data from 2 tables. But with the cross join, I get the error "d.DebugData not found in table "bigdata:RawDebug.CarrierDetails". Any help would be appreciated !!

SELECT 
HardwareId, DebugReason, DebugData, 
CASE
  WHEN REGEXP_MATCH(DebugData,'\\d+') THEN c.Network
  ELSE REGEXP_REPLACE(DebugData,'\\?',' ')
END
as ActualDebugData 
FROM(
 SELECT 
 HardwareId, DebugReason, DebugData
 FROM TABLE_DATE_RANGE([bigdata:RawDebug.T],TIMESTAMP ('2016-05-15'),TIMESTAMP('2016-05-15'))
 WHERE Reason = 500  
 ) as d
 CROSS JOIN (
   SELECT Network 
   FROM [bigdata:RawDebug.CarrierDetails] 
   WHERE Mcc = substr(d.DebugData,0,3) AND Mnc =   substr(d.DebugData,4,LENGTH(d.Reason - 1)) 
   LIMIT 1 
 ) AS c

Tried this but i get this error: " ON clause must be AND of = comparisons of one field name from each table, with all field names prefixed with table name."

 %%sql --module Test2
 DEFINE QUERY Test2
 SELECT 
 HardwareId, DebugReason, DebugData, 
 CASE
  WHEN REGEXP_MATCH(DebugData,'\\d+') THEN c.Network
  ELSE REGEXP_REPLACE(DebugData,'\\?',' ')
 END AS ActualDebugData 
 FROM (
 SELECT 
   HardwareId, DebugReason, DebugData, 
   SUBSTR(DebugData,0,3) AS d1, REGEXP_REPLACE(SUBSTR(DebugData,3,LENGTH(DebugData)-1),'%[^a-zA-Z0-9, ]%',' ') as d2
   FROM TABLE_DATE_RANGE([bigdata:RawDebug.T],TIMESTAMP('2016-05-15'),TIMESTAMP('2016-05-15'))
   WHERE DebugReason = 500  
   ) AS d
   LEFT JOIN (
   SELECT 
    Network, Mcc, Mnc
    ,ROW_NUMBER() OVER(PARTITION BY Mcc, Mnc) AS pos 
    FROM [bigdata:RawDebug.CarrierDetails] 
    ) AS c
    ON c.Mcc = INTEGER(d.d1) AND c.Mnc = INTEGER(d.d2)  
    WHERE c.pos = 1

I am adding the following structures:

 RawDebug:
 HardwareId   DebugReason   DebugData   
 550029358    50013            VER%     
 550029359    50013            RO%      
 550029360    50013            34020?   
 550029361    50013            34021?

When the DebugData has characters, then i have case statements matching it, when it has numerals, then i have to take a substring of first 3 characters and match it with Mcc in Carrierdetails and remaining characters and match it with Mnc in Carrierdetails.

With the recent query, it does not consider all the cases. Rather, it takes one specific number and uses tat ActualDebugData for all the rows.

1

1 Answers

1
votes
SELECT 
  HardwareId, DebugReason, DebugData, 
  CASE
    WHEN REGEXP_MATCH(DebugData,'\\d+') THEN c.Network
    ELSE REGEXP_REPLACE(DebugData,'\\?',' ')
  END AS ActualDebugData 
FROM (
  SELECT 
    HardwareId, DebugReason, DebugData, 
    SUBSTR(DebugData,0,3) AS d1, SUBSTR(DebugData,4,LENGTH(Reason - 1)) AS d2 
  FROM TABLE_DATE_RANGE([bigdata:RawDebug.T],TIMESTAMP('2016-05-15'),TIMESTAMP('2016-05-15'))
  WHERE Reason = 500  
) AS d
LEFT JOIN (
  SELECT 
    Network, Mcc, Mnc
    //,ROW_NUMBER() OVER(PARTITION BY Mcc, Mnc) AS pos 
  FROM [bigdata:RawDebug.CarrierDetails] 
) AS c
ON c.Mcc = d.d1 AND c.Mnc = d.d2  
//WHERE c.pos = 1 

If network is guaranteed to be unique for each entry in d - you can delete commented rows.
Otherwise you should uncomment them