0
votes

So I've been scouring SO for answer and I've seen some great SQL functions to help try and remove non-ascii characters from my db, but I wanted to post the entire question / process here first to see if maybe upstream on my select from db2 into sql there is a fix.

What I'm doing: Getting data from a db2 database into SQL

Issue: Non-ascii characters causing problems

Process: It's pretty simple. I have a SQL Insert statement to select a bunch of columns from a db2 linkedserver using open query

insert into [table](stuff) select (stuff) From Openquery(SSF400,'select stuff from table')

However, in my SQL db, when editing the landed table, I'm getting weird trailing characters that appear as a space in a sql select statement, but are actually artifacts in SQL Edit mode:

enter image description here

I've tried using a few functions I found here on SO to strip these characters, but after these function(s) I'm leftover with a combination of greek/english characters similar to the below:

enter image description here

I'm thinking there must be a better way for me to do the initial insert other than using openquery so that the junk characters don't come over. I know SQL pretty well, but DB2 not so much...any advice?

Update: There does seem to be a junk character or two in the source system. Discovered using iNavigator. Also, source system is using db2 v7r3m0

enter image description here

Update here is a screenshot of the regexp expression mentioned in the comments used in a query in iNavigator. Although several characters were removed, some do remain. The original column is on the left, the cleansed column is on the right.

enter image description here

Cheers, MD

1
What's your db2 version and platform? Are you able to check, if these junk characters are really in this column in the db2 database with other tools (some db2 client or some tool using db2 jdbc driver), and this is not a result of using openquery functionality? - Mark Barinstein
In addition, what are the data types and character sets of the source and target columns? select stuff is not particularly useful in itself. - mustaccio
@Mark From what I can tell using iNavigator, the as400 is using v7r3m0. I added a picture to my original post of what I see in the DB2 database using iNavigator, there does seem to be a funky character or two..i hadn't expanded the column fully to see that earlier, apologies. - user2178462
mustaccio the source system column is identified as CHAR(95) CCSID 37. When I created the landing table in SQL, I used the same CHAR(95) but CCSID 37 appears to be db2 specific syntax so that was omitted. Character set of source and destination are a product number that contains letters numbers and spaces(.e.g. PRODUCT 12345 or ABC1234 ) - user2178462
Try to remove non-ascii characters with the REGEXP_REPLACE(stuff, '[^\x00-\x7F]', '') expression, if it's suitable for you. - Mark Barinstein

1 Answers

0
votes

I would try REGEXP_REPLACE(stuff,'[^\u0020-\u007E\u0009\u000A\u000D]+','') which will remove everything that is not a character from the 7-bit ASCII set but also removes any 7-bit ASCII control characters apart from Tab, New Line and Carriage Return. It also removes DEL