0
votes

Is there any regex or translate or any other expression in hive consider only key board characters and ignore control characters and ascii characters in Hive table?

Example: regexp_replace(option_type,'[^a-zA-Z0-9]+','')
In the above expression only characters and numbers are considering but any keyboard special character data like %,&,*,.,?,.. available then i am getting output as blank.

Col: bhuvi?Where are you ?
Result: bhuviWhere are you
but i want output as bhuvi?Where are you?

like that if any special keyboard characters comes then it will appear as is and any control or ascii character comes it will ignore.

2

2 Answers

1
votes

you should consider that various keyboard layouts (languages) have various "special" characters, like german ö ä ü or spanish Ñ (just examples - not talking about asian, hebrew or arabic keyboards).

I see two solutions: 1.) Maybe you should define a list of allowed characters and put them into a character class, so you can heavily control what is allowed, but you might exclude most languages

2.) your you might have a look into regular expression unicode classes, you can allow any "letter" \p{L} or "number" \p{N} and even punctuation \p{P} and disallow only those characters you KNOW will cause problems like control characters \p{C} please see see regular-expression.info for more details about Unicode Regular Expressions

edit:

IF you want to stick with english only and can assume you will only have ASCII to allow, you can either type every key you find on your keyboard in a character class, as a not complete example: /^[-a-zA-Z0-9,.-;:_!"§$%&]+$/

or

you could use an ASCII table to determine the range of allowed characters, in your case a assume from "space" to "curly closing bracket" } and trick the character class in allowing all of them: /^[ -}]+$/

0
votes

I got the solution regexp_replace(option_type,'[^a-zA-Z0-9*!@+-/#$%()_=/<>?\|&]+',' ') works