5
votes

Why doesn't the following bash code work?

for i in $( echo "emmbbmmaaddsb" | split -t "mm"  )
do
    echo "$i"
done

expected output:

e
bb
aaddsb
4
...huh? That's not what split does at all. As in, completely unrelated to its actual function. - Charles Duffy
Do you want to know how to split an arbitrary string on an arbitrary multi-character separator in bash? Why not edit your question to ask that instead, if it's what you really want to know? - Charles Duffy
split splits a file into a bunch of smaller files. Not names written to stdout, like your script expects, but actual files. And -t provides a single character it uses to determine where records begin and end, and thus to do those splits on record boundaries. - Charles Duffy
Of course not, BECAUSE YOU'RE EXPECTING NAMES WRITTEN TO STDOUT. I already told you it doesn't write names to stdout. - Charles Duffy
If nothing's written to stdout, nothing gets captured by a command substitution. - Charles Duffy
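
To illustrate the point made in the comments above, here is a minimal sketch (not part of the original thread). It runs split in a scratch directory and shows that the command substitution captures nothing while the pieces land on disk as files. The variable names and the three-line sample input are assumptions chosen for the demonstration.

```shell
#!/bin/bash
# Scratch directory so the files that split creates are easy to inspect and remove.
workdir=$(mktemp -d)

# Feed three lines to split (one line per output file) and capture its stdout.
captured=$(cd "$workdir" && printf 'a\nb\nc\n' | split -l 1)

# Nothing was written to stdout, so nothing was captured.
printf 'captured on stdout: [%s]\n' "$captured"

# The pieces exist as files instead.
pieces=( "$workdir"/* )
count=${#pieces[@]}
printf 'files created: %s\n' "$count"

rm -rf "$workdir"
```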

4 Answers

9
votes

Since you're expecting newlines, you can simply replace all instances of mm in your string with a newline. In pure native bash:

in='emmbbmmaaddsb'
sep='mm'
# quoting "$sep" makes it match literally, even if it contains glob characters
printf '%s\n' "${in//"$sep"/$'\n'}"
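
If the goal is to loop over the pieces, as the for loop in the question intends, one possible sketch is to read the newline-separated result into an array. This assumes bash 4 or later (for mapfile) and that the input itself contains no newlines. The names nl, parts and part are illustrative.

```shell
#!/bin/bash
in='emmbbmmaaddsb'
sep='mm'
nl=$'\n'

# Replace each separator with a newline, then read one array element per line.
# Assumption: the input contains no newlines of its own.
mapfile -t parts <<< "${in//"$sep"/$nl}"

for part in "${parts[@]}"; do
    printf '%s\n' "$part"
done
```

Keeping the pieces in an array avoids the word splitting and globbing that an unquoted command substitution in a for loop would apply.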

If you wanted to apply this replacement to a longer input stream, you might be better off using awk, as bash's built-in string manipulation doesn't scale well to more than a few kilobytes of content. The gsub_literal shell function from BashFAQ #21, which delegates the work to awk, is applicable:

# Taken from http://mywiki.wooledge.org/BashFAQ/021

# usage: gsub_literal STR REP
# replaces all instances of STR with REP. reads from stdin and writes to stdout.
gsub_literal() {
  # STR cannot be empty
  [[ $1 ]] || return

  # string manip needed to escape '\'s, so awk doesn't expand '\n' and such
  awk -v str="${1//\\/\\\\}" -v rep="${2//\\/\\\\}" '
    # get the length of the search string
    BEGIN {
      len = length(str);
    }

    {
      # empty the output string
      out = "";

      # continue looping while the search string is in the line
      while (i = index($0, str)) {
        # append everything up to the search string, and the replacement string
        out = out substr($0, 1, i-1) rep;

        # remove everything up to and including the first instance of the
        # search string from the line
        $0 = substr($0, i + len);
      }

      # append whatever is left
      out = out $0;

      print out;
    }
  '
}

...used, in this context, as:

gsub_literal "mm" $'\n' <your-input-file.txt >your-output-file.txt
8
votes

A more general example, which does not replace the multi-character delimiter with a single-character delimiter, is given below:

Using parameter expansions (from a comment by @gniourf_gniourf):

#!/bin/bash

str="LearnABCtoABCSplitABCaABCString"
delimiter=ABC
s=$str$delimiter
array=();
while [[ $s ]]; do
    array+=( "${s%%"$delimiter"*}" );
    s=${s#*"$delimiter"};
done;
declare -p array
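
The same loop can be wrapped in a function and applied to the string from the question. This is only a sketch: split_on and the global array parts are hypothetical names, and the guard against an empty delimiter is an addition, since an empty delimiter would never shorten the string and the loop would not terminate.

```shell
#!/bin/bash
# split_on STRING DELIMITER
# Hypothetical helper: fills the global array "parts" with the pieces of STRING.
split_on() {
    local str=$1 delimiter=$2
    [[ $delimiter ]] || return 1          # an empty delimiter would loop forever
    local s=$str$delimiter                # trailing sentinel: the last piece is handled like the others
    parts=()
    while [[ $s ]]; do
        parts+=( "${s%%"$delimiter"*}" )  # keep the text before the first delimiter
        s=${s#*"$delimiter"}              # drop that text together with the delimiter
    done
}

split_on 'emmbbmmaaddsb' 'mm'
declare -p parts
```

Here `%%` strips the longest suffix that starts at a delimiter, which leaves the first piece, and `#` strips the shortest prefix that ends with a delimiter, which advances to the next piece.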

A cruder, character-by-character approach:

#!/bin/bash

# main string
str="LearnABCtoABCSplitABCaABCString"

# delimiter string
delimiter="ABC"

#length of main string
strLen=${#str}
#length of delimiter string
dLen=${#delimiter}

#iterator for length of string
i=0
#length tracker for ongoing substring
wordLen=0
#starting position for ongoing substring
strP=0

array=()
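while [ "$i" -lt "$strLen" ]; do
    # quote both operands so spaces or glob characters cannot break the test
    if [ "$delimiter" = "${str:i:dLen}" ]; then
        array+=( "${str:strP:wordLen}" )
        strP=$(( i + dLen ))
        wordLen=0
        i=$strP
        # re-check at the new position so back-to-back delimiters are detected
        continue
    fi
    i=$(( i + 1 ))
    wordLen=$(( wordLen + 1 ))
done
# quoted so the last piece is kept intact, even if it is empty or contains spaces
array+=( "${str:strP:wordLen}" )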

declare -p array

Reference - Bash Tutorial - Bash Split String

7
votes

With awk, you can use gsub to replace all regex matches.

For the string in your question, to replace every run of two or more 'm' characters with a newline, run:

echo "emmbbmmaaddsb" | awk '{ gsub(/mm+/, "\n" ); print; }'

e
bb
aaddsb

The ‘g’ in gsub() stands for “global,” which means replace everywhere.

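You can also print just the Nth piece by replacing the separator with a space and printing the corresponding field. This works because modifying $0 makes awk re-split the record into fields, and it assumes the input contains no other whitespace. For example: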

echo "emmbbmmaaddsb" | awk '{ gsub(/mm+/, " " ); print $2; }'

bb
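
As an alternative sketch (not from the answer above), awk can also split on the multi-character separator directly through its field separator, since an FS longer than one character is treated as a regular expression. The variable name out is illustrative.

```shell
#!/bin/bash
# Use "mm" as the field separator and print each field on its own line.
out=$(echo "emmbbmmaaddsb" | awk -F 'mm' '{ for (i = 1; i <= NF; i++) print $i }')
printf '%s\n' "$out"
```

Because the separator is a regular expression here, characters such as `.` or `*` in a separator would need escaping to be matched literally.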

6
votes

The recommended tool for character substitution is sed's s command: s/regexp/replacement/ replaces the first match on each line, and s/regexp/replacement/g replaces every match. You do not even need a loop or variables.

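Pipe your echo output into sed and substitute the characters mm with the newline character \n (interpreting \n in the replacement is a GNU sed feature; other implementations, such as BSD sed, may not treat it as a newline):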

echo "emmbbmmaaddsb" | sed 's/mm/\n/g'

The output is:

e
bb
aaddsb
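
For sed implementations that do not interpret \n in the replacement, a sketch of the POSIX form is to put a backslash followed by a literal newline in the replacement text. The loop afterwards shows one way to consume the pieces. The names out and piece are illustrative.

```shell
#!/bin/bash
# A backslash followed by a real newline inserts a newline in the replacement.
out=$(echo "emmbbmmaaddsb" | sed 's/mm/\
/g')

# Read the result line by line instead of relying on word splitting.
while IFS= read -r piece; do
    printf '%s\n' "$piece"
done <<< "$out"
```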