2
votes

As in HIVE doc https://cwiki.apache.org/confluence/display/Hive/LanguageManual+Types#LanguageManualTypes-string

HIVE can implicitly convert STRING to DOUBLE. I have a HIVE script which joins two table on some fields, among which there is one field defined as STRING. Unfortunately, there are two records one starts with '0' and the other isn't but the rest part are the same. This result in join two tables with wrong results.

In 'EXPLAIN SQL' I found uses UDFToDouble(field) convert this record to double. What are the possible cases that HIVE did this implicitly conversion? Improving hash/sort performance?

1

1 Answers

2
votes

If your script contains a JOIN clause that matches two columns, one STRING and the other DOUBLE, then Hive has to resolve the mismatch by casting one into the other's format. Just like in Oracle, MySQL, whatever.

The result is often disastrous, because of non-default STRING formats. Bottom line: just detect the type mismatchs in your data model, and manage the type conversions explicitly, all by yourself.

By the way, DOUBLE is disastrous in itself, because of subtle rounding errors, e.g. 3.0 /3.0 *3.0 will probably return 0.9999999999... which is not equal to 1.0; fortunately Hive now supports fixed-precision numbers such as DECIMAL(22,7)