I have a product information file with more than millions records. The CSV file looks like:
Product CategoryName SalesUnit Other Columns...
p1 a12 41
p2 x5 72
p3 x5 69
p4 c21 80
p5 b16 59
p6 x5 75
.. .. ..
And I have a mapping file(CategoryCode <-> CategoryName) as following. The mapping file has around 200 records:
CategoryCode CategoryName
1.0 a12
2.0 b13
3.0 b16
4.0 c12
5.0 c21
6.0 x5
.. ..
Eventually, I want to replace value of CategoryName with CategoryCode:
Product Category SalesUnit Other Colulmns..
p1 1.0 41
p2 6.0 72
p3 6.0 69
p4 5.0 80
p5 3.0 59
p6 6.0 75
.. .. ..
My approach is to use the udf of spark dataframe:
udf { (CategoryName: String) =>
if (CategoryName.trim() == "a12") 1.0
else if (CategoryName.trim() == "b13") 2.0
else if (CategoryName.trim() == "b16") 3.0
else if (CategoryName.trim() == "c12") 4.0
else if (CategoryName.trim() == "c21") 5.0
else if (CategoryName.trim() == "x5") 6.0
else if (CategoryName.trim() == "z12") 7.0
else if (...) ...
... ...
else 999.0
}
Any other elegant approach to achieve the replacement without by coding so many if...else clause? Thanks.