Removing non-alphanumeric characters with sed

Question

I am trying to validate some inputs by removing a set of characters. Only alphanumeric characters plus period, underscore, and hyphen are allowed. I've tested the regex [^\w.-] at http://gskinner.com/RegExr/ and it matches what I want removed, so I'm not sure why sed is returning the opposite. What am I missing?

My end goal is to input "Â10.41.89.50 " and get "10.41.89.50".

I've tried:

echo "Â10.41.89.50 " | sed s/[^\w.-]//g returns Â...

echo "Â10.41.89.50 " | sed s/[\w.-]//g and echo "Â10.41.89.50 " | sed s/[\w^.-]//g both return Â10418950

I also tried the answer from "Skip/remove non-ascii character with sed", but nothing was removed.

Try adding the -r option to sed so it will recognize extended regular expressions. — Barmar
sed doesn't understand the special character classes like \w. Just use [a-zA-Z0-9_-]. — Mark Reed
neither -r nor using [a-zA-Z0-9_-] works. Well echo "Â10.41.89.50 " | sed s/[a-zA-Z0-9.-]//g returned Â but echo "Â10.41.89.50 " | sed s/[^a-zA-Z0-9.-]//g still returned Â10.41.89.50. — wanderingandy

iruvar · Accepted Answer · 2013-11-15T17:57:06

tr's -c (complement) flag, combined with -d (delete), may be an option:

echo "Â10.41.89.50-._ " | tr -cd '[:alnum:]._-'
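A rough sketch of the same idea wrapped in a function, with a sed variant for comparison. Two things in the original sed attempts are worth noting: the expression was unquoted, so the shell strips the backslash before sed sees it, and sed does not treat \w as a class inside a bracket expression anyway. The sketch below quotes the expression and uses the POSIX class [:alnum:] instead. It also sets LC_ALL=C, on the assumption (not confirmed in the question) that the Â reaches sed as bytes it does not accept as a valid character in the current locale; in the C locale every byte is its own character, so the negated set can match it. The function names sanitize_tr and sanitize_sed are made up for illustration.

```shell
#!/bin/sh
# Hypothetical helpers: keep only ASCII letters, digits, '.', '_' and '-'.

# tr variant: -c complements the set, -d deletes everything in the complement.
# LC_ALL=C makes tr classify input byte by byte, so non-ASCII bytes are dropped.
sanitize_tr() {
    printf '%s' "$1" | LC_ALL=C tr -cd '[:alnum:]._-'
}

# sed variant: the expression is single-quoted so the shell leaves it alone,
# and [:alnum:] replaces \w, which is not a class inside sed brackets.
sanitize_sed() {
    printf '%s' "$1" | LC_ALL=C sed 's/[^[:alnum:]._-]//g'
}

sanitize_tr "Â10.41.89.50 "; echo     # prints 10.41.89.50
sanitize_sed "Â10.41.89.50 "; echo    # prints 10.41.89.50
```

Note that in the C locale this strips every non-ASCII character, including accented letters, which is fine for IP addresses but may be too aggressive for other input.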
