0
votes

I want to remove one ASCII character and replace it with a non-ASCII one. My code is:

sed -e 's/[\d100\d130]/g' 

To explain: I want to replace "100" (ASCII, decimal) with "135" (decimal). In short, I want to swap one character for another, and the original one will be removed. Is this code valid?

Use tr: tr '\144' '\206'. - gniourf_gniourf
It doesn't work. I tried. @gniourf_gniourf - esrtr
What does "It doesn't work" mean? (Do you get an error? Aren't the d's replaced?) - gniourf_gniourf
d isn't replaced. @gniourf_gniourf - esrtr
(Apart from the obvious typo—it should be tr '\144' '\207'—see Thomas Dickey's answer). This is not going to edit your file… is this what you're expecting? - gniourf_gniourf

2 Answers

1
votes

This is not a valid sed command:

sed -e 's/[\d100\d135]/g'

Perhaps something like

sed -e 's/[\d100]/[\d135]/g'

In a quick test, this "works":

echo 'd' | sed -e 's/[\d100]/[\d135]/g'

The suggested tr command is close, but decimal 135 is octal 207, so it should be:

tr '\144' '\207'
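
The decimal-to-octal conversion is easy to get wrong by hand, as the comments show. A minimal sketch for checking it with printf; the helper name dec_to_octal is made up for illustration:

```shell
# dec_to_octal: hypothetical helper that prints each decimal argument
# as a three-digit octal number, one per line
dec_to_octal() {
    printf '%03o\n' "$@"
}

dec_to_octal 100 135   # prints 144 and 207
```

The two results are the values to use in tr '\144' '\207'.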

On a UTF-8 system, you will likely run into problems with 135, since a lone byte 135 is not valid UTF-8. The UTF-8 encoding of code point 135 uses two bytes, \302\207 (decimal 194 135), so

echo 'd' | sed -e 's/\d100/\d194\d135/g'

might be what OP intended. With my locale en_US.UTF-8, it produces a UTF-8 encoded 135, which shows up in vi-like-emacs as \u0087. This is valid UTF-8 but not printable, since U+0087 is a control character in Unicode. Given more information about the output OP intended, better advice can be offered.
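
Since neither byte sequence is printable, piping through od is one way to see what a command really emits. A minimal sketch, assuming od is available; the helper name hex_bytes is made up, and the replacement bytes are built with printf instead of the \dNNN escape so that the example does not depend on GNU sed:

```shell
# hex_bytes: hypothetical helper that prints stdin as space-separated hex bytes
hex_bytes() {
    od -An -tx1 | tr -s ' \n' ' ' | sed 's/^ //; s/ $//'
}

# Single byte 135 (octal 207), as produced by the tr approach
printf 'd\n' | LC_ALL=C tr '\144' '\207' | hex_bytes

# Two-byte UTF-8 encoding of U+0087 (octal 302 207), spliced into the sed script
utf8=$(printf '\302\207')
printf 'd\n' | LC_ALL=C sed "s/d/$utf8/g" | hex_bytes
```

The first pipeline should report 87 0a and the second c2 87 0a, where 0a is the trailing newline.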

1
votes

Decimal 100 is a "d", and 135 is "ç" (c with cedilla) in extended ASCII code pages such as IBM850.
Set a to a range of test values:

a="$(printf "$(printf '\\x%x' {95..105} 135 135 135 {130..140} )")"

Both of these work:

echo "$a"| tr '\144' '\207'
echo "$a"| sed -e $'s/\144/\207/g'    # Note the $

If you want to see these characters, write the output to a file and open it with encoding IBM850. In a text editor that supports it, you will see the following (ç three times, and the d changed to ç as well):

_`abcçefghiçççéâäàåçêëèïî
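
If no editor with IBM850 support is at hand, converting the bytes to UTF-8 on the command line is another way to view them. A minimal sketch, assuming an iconv that recognizes the charset name CP850 (an assumption about the local system) and a UTF-8 terminal:

```shell
# Byte 135 (octal 207) read as code page 850 and re-encoded as UTF-8
printf '\207\n' | iconv -f CP850 -t UTF-8

# The same conversion applied after tr has replaced d (octal 144)
printf 'abcdef\n' | LC_ALL=C tr '\144' '\207' | iconv -f CP850 -t UTF-8
```

On such a system, both lines should show ç wherever byte 135 was.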

UTF-8

For UTF-8, things are different.
In Unicode, ç is code point 231 (hex E7), and it is output with this:

$ printf $'\U0E7'
ç

Getting the UTF-8 encoding of values above 127 (7F) and up to 255 (FF) can be tricky because some Bash versions encode them incorrectly. This function converts a value to the correct character:

function chr_utf8 {
    local val
    [[ ${2?Missing Ordinal Value} -lt 0x80000000 ]] || return 1

    if [[ ${2} -lt 0x100 && ${2} -ge 0x80 ]]; then

        # bash 4.2 incorrectly encodes
        # \U000000ff as \xff so encode manually
        printf -v val '\\%03o\\%03o' $(( (${2}>>6)|0xc0 )) $(( (${2}&0x3f)|0x80 ))
    else
        printf -v val '\\U%08x' "${2}"
    fi
    printf -v ${1?Missing Dest Variable} ${val}
}

chr_utf8 a 231
echo "$a"
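
The manual branch of the function is plain two-byte UTF-8 arithmetic: the lead byte carries the high bits of the code point and the continuation byte the low six. A minimal sketch of just that step, with a made-up helper name utf8_two_byte_octal, valid only for code points 128 to 2047:

```shell
# utf8_two_byte_octal: hypothetical helper; prints the octal escapes of the
# two-byte UTF-8 encoding for a code point in the range 128..2047
utf8_two_byte_octal() {
    cp=$1
    lead=$(( (cp >> 6) | 0xC0 ))    # 110xxxxx: high bits of the code point
    cont=$(( (cp & 0x3F) | 0x80 ))  # 10xxxxxx: low six bits
    printf '\\%03o\\%03o\n' "$lead" "$cont"
}

utf8_two_byte_octal 231   # \303\247, the UTF-8 bytes of ç
utf8_two_byte_octal 135   # \302\207
```

Passing the printed escapes to printf as a format string yields the character itself, which is what chr_utf8 does in its last line.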

Conclusion

The solution was actually very simple:

echo "aadddcc" | sed $'s/d/\U0E7/g'       # echo $'\U0E7' should output ç
aaçççcc

Check that echo $'\U0E7' gives you a ç; if not, you need the function above.
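
If the shell does not handle $'\U0E7' correctly, another option is to build the character from its UTF-8 bytes with printf and splice it into the sed script. This is only a sketch; the variable name cedilla is made up, and the octal bytes \303\247 are the UTF-8 encoding of ç:

```shell
# Build ç from its two UTF-8 bytes (octal 303 247)
cedilla=$(printf '\303\247')

# Replace every d with ç
echo "aadddcc" | sed "s/d/$cedilla/g"
```

On a UTF-8 terminal this should print aaçççcc, the same result as the $'\U0E7' version.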