Compatible answer
There are a lot of different ways to do this in bash.
However, it's important to first note that bash has many special features (so-called bashisms) that won't work in any other shell.
In particular, arrays, associative arrays, and pattern substitution, which are used in the solutions in this post as well as others in the thread, are bashisms and may not work under other shells that many people use.
For instance: on my Debian GNU/Linux, the standard shell is dash; I know many people who like to use another shell called ksh; and there is also a special tool called busybox with its own shell interpreter (ash).
Requested string
The string to be split in the above question is:
IN="[email protected];[email protected]"
I will use a modified version of this string to ensure that my solution is robust to strings containing whitespace, which could break other solutions:
IN="[email protected];[email protected];Full Name <[email protected]>"
Split string based on delimiter in bash (version >=4.2)
In pure bash, we can create an array with elements split by a temporary value for IFS (the input field separator). The IFS, among other things, tells bash which character(s) it should treat as a delimiter between elements when defining an array:
IN="[email protected];[email protected];Full Name <[email protected]>"
# save original IFS value so we can restore it later
oIFS="$IFS"
IFS=";"
declare -a fields=($IN)
IFS="$oIFS"
unset oIFS
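One caveat with the unquoted $IN expansion above: besides word splitting, bash also performs filename expansion on the result, so a field such as * would be replaced by the names of files in the current directory. A minimal sketch of a guard follows, assuming a hypothetical helper named split_on_semicolon (not part of the original answer) that writes to the global array fields. It also uses local IFS, so nothing has to be saved and restored by hand.

```bash
# Hypothetical helper: split $1 on ';' into the global array `fields`.
split_on_semicolon() {
    local IFS=';'                 # the change is confined to this function
    local had_noglob=0
    case $- in *f*) had_noglob=1 ;; esac
    set -f                        # disable filename expansion while splitting
    fields=($1)                   # unquoted on purpose: split on IFS
    (( had_noglob )) || set +f    # re-enable globbing only if it was enabled before
}

split_on_semicolon 'a;*;b c'
printf '> [%s]\n' "${fields[@]}"
# > [a]
# > [*]
# > [b c]
```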
In newer versions of bash, prefixing a command with an IFS definition changes the IFS for that command only and resets it to the previous value immediately afterwards. This means we can do the above in just one line:
IFS=\; read -ra fields <<<"$IN"   # -r keeps any backslashes in $IN literal
# after this command, the IFS resets back to its previous value (here, the default):
set | grep ^IFS=
# IFS=$' \t\n'
We can see that the string IN has been stored into an array named fields, split on the semicolons:
set | grep ^fields=\\\|^IN=
# fields=([0]="[email protected]" [1]="[email protected]" [2]="Full Name <[email protected]>")
# IN='[email protected];[email protected];Full Name <[email protected]>'
(We can also display the contents of these variables using declare -p:)
declare -p IN fields
# declare -- IN="[email protected];[email protected];Full Name <[email protected]>"
# declare -a fields=([0]="[email protected]" [1]="[email protected]" [2]="Full Name <[email protected]>")
Note that read is the quickest way to do the split because there are no forks or external resources called.
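Keep in mind that read stops at the first newline by default, so the one-liner above only splits the first line of a multi-line string. A sketch of one way around this, assuming the value contains no NUL bytes (bash variables cannot hold them anyway): terminate the input with a NUL and tell read to use it as the line delimiter. The names text and parts are made up for this illustration.

```bash
# Sample value (an assumption for illustration) with a newline inside the first field
text=$'line one\nline two;second field'

# -r keeps backslashes literal; -d '' makes read stop at a NUL byte instead of a newline
IFS=';' read -r -d '' -a parts < <(printf '%s\0' "$text")

# Show each element; the first one still contains its newline
printf '> [%s]\n' "${parts[@]}"
```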
Once the array is defined, you can use a simple loop to process each field (or, rather, each element in the array you've now defined):
# `"${fields[@]}"` expands to return every element of `fields` array as a separate argument
for x in "${fields[@]}" ;do
echo "> [$x]"
done
# > [[email protected]]
# > [[email protected]]
# > [Full Name <[email protected]>]
Or you could drop each field from the array after processing using a shifting approach, which I like:
while [ "$fields" ] ;do
echo "> [$fields]"
# slice the array
fields=("${fields[@]:1}")
done
# > [[email protected]]
# > [[email protected]]
# > [Full Name <[email protected]>]
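Note that the test [ "$fields" ] only looks at element 0, so the shifting loop ends early as soon as an empty field reaches the front of the array. If empty fields can occur, a variant that tests the element count instead may be safer. The sample data and the processed counter below are assumptions for illustration:

```bash
fields=("first" "" "third")   # sample data with an empty middle field
processed=0

# Loop while the array still has elements, regardless of their content
while (( ${#fields[@]} )); do
    echo "> [${fields[0]}]"
    processed=$((processed + 1))
    fields=("${fields[@]:1}")  # drop the element just handled
done
# > [first]
# > []
# > [third]
```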
And if you just want a simple printout of the array, you don't even need to loop over it:
printf "> [%s]\n" "${fields[@]}"
# > [[email protected]]
# > [[email protected]]
# > [Full Name <[email protected]>]
Update: recent bash >= 4.4
In newer versions of bash, you can also play with the command mapfile:
mapfile -td \; fields < <(printf "%s\0" "$IN")
This syntax preserves special characters, newlines and empty fields!
If you don't want to include empty fields, you could do the following:
mapfile -td \; fields <<<"$IN"
fields=("${fields[@]%$'\n'}") # drop '\n' added by '<<<'
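Because the -d option of mapfile only exists from bash 4.4 onwards, a script that has to run on older bash versions could check BASH_VERSINFO and fall back to read. This is only a sketch: split_fields is a hypothetical helper name, and the two branches are not guaranteed to treat every edge case identically.

```bash
# Hypothetical helper: split $1 on ';' into the global array `fields`.
split_fields() {
    local input=$1
    if (( BASH_VERSINFO[0] > 4 || (BASH_VERSINFO[0] == 4 && BASH_VERSINFO[1] >= 4) )); then
        # bash >= 4.4: mapfile accepts a custom delimiter with -d
        mapfile -td ';' fields < <(printf '%s' "$input")
    else
        # older bash: read up to a NUL byte and split on IFS instead
        IFS=';' read -r -d '' -a fields < <(printf '%s\0' "$input")
    fi
}

split_fields 'a;;b c'
declare -p fields
```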
With mapfile, you can also skip declaring an array and implicitly "loop" over the delimited elements, calling a function on each:
myPubliMail() {
printf "Seq: %6d: Sending mail to '%s'..." $1 "$2"
# mail -s "This is not a spam..." "$2" </path/to/body
printf "\e[3D, done.\n"
}
mapfile < <(printf "%s\0" "$IN") -td \; -c 1 -C myPubliMail
(Note: the \0 at the end of the format string is useless if you don't care about empty fields at the end of the string or they're not present.)
mapfile < <(echo -n "$IN") -td \; -c 1 -C myPubliMail
# Seq: 0: Sending mail to '[email protected]', done.
# Seq: 1: Sending mail to '[email protected]', done.
# Seq: 2: Sending mail to 'Full Name <[email protected]>', done.
Or you could use <<<, and in the function body include some processing to drop the newline it adds:
myPubliMail() {
local seq=$1 dest="${2%$'\n'}"
printf "Seq: %6d: Sending mail to '%s'..." $seq "$dest"
# mail -s "This is not a spam..." "$dest" </path/to/body
printf "\e[3D, done.\n"
}
mapfile <<<"$IN" -td \; -c 1 -C myPubliMail
# Renders the same output:
# Seq: 0: Sending mail to '[email protected]', done.
# Seq: 1: Sending mail to '[email protected]', done.
# Seq: 2: Sending mail to 'Full Name <[email protected]>', done.
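The callback runs in the current shell, so it can also update variables that remain visible after mapfile returns. A small sketch, with the hypothetical names tally and count, that simply counts the fields it is handed:

```bash
count=0

# mapfile calls this with the next array index as $1 and the field as $2
tally() {
    count=$((count + 1))
}

# -c 1 invokes the callback after every single field
mapfile -td ';' -c 1 -C tally < <(printf '%s' 'a;b;c')
echo "$count"
# 3
```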
Split string based on delimiter in shell
If you can't use bash, or if you want to write something that can be used in many different shells, you often can't use bashisms -- and this includes the arrays we've been using in the solutions above.
However, we don't need to use arrays to loop over "elements" of a string. There is a syntax used in many shells for deleting substrings of a string from the first or last occurrence of a pattern. (The lack of this approach in any solution posted so far is the main reason I'm writing this answer ;)
Note that * is a wildcard that stands for zero or more characters:
${var#*SubStr} # drops substring from start of string up to first occurrence of `SubStr`
${var##*SubStr} # drops substring from start of string up to last occurrence of `SubStr`
${var%SubStr*} # drops substring from last occurrence of `SubStr` to end of string
${var%%SubStr*} # drops substring from first occurrence of `SubStr` to end of string
As explained by Score_Under: # and % delete the shortest possible matching substring from the start and end of the string respectively, and ## and %% delete the longest possible matching substring.
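To make the four forms concrete, here is a small worked example on a sample value. The variable names are assumptions for illustration; the expansions themselves are standard in POSIX shells.

```sh
var='a;b;c'

after_first=${var#*;}     # shortest match of "*;" removed from the start
after_last=${var##*;}     # longest match of "*;" removed from the start
before_last=${var%;*}     # shortest match of ";*" removed from the end
before_first=${var%%;*}   # longest match of ";*" removed from the end

echo "$after_first"       # b;c
echo "$after_last"        # c
echo "$before_last"       # a;b
echo "$before_first"      # a
```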
Using the above syntax, we can create an approach where we extract substring "elements" from the string by deleting the substrings up to or after the delimiter.
The code block below works well in bash (including Mac OS's bash), dash, ksh, and busybox's ash:
(Thanks to Adam Katz's comment, which made this loop a lot simpler!)
IN="[email protected];[email protected];Full Name <[email protected]>"
while [ "$IN" != "$iter" ] ;do
# extract the substring from start of string up to delimiter.
iter=${IN%%;*}
# delete this first "element" AND the separator that follows it from $IN.
# ($iter is quoted so that any wildcard characters in it are matched literally)
IN="${IN#"$iter";}"
# Print (or do anything else with) the first "element".
echo "> [$iter]"
done
# > [[email protected]]
# > [[email protected]]
# > [Full Name <[email protected]>]
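One limitation of the loop above: it stops as soon as the remaining string equals the field just extracted, so an input whose last two fields are identical (for example a;a) prints only one of them. It also consumes $IN. A sketch of a variant that avoids both, assuming a hypothetical function name split_print and using only POSIX parameter expansion (the variables rest and field are global, since local is not strictly POSIX):

```sh
# Hypothetical helper: print every ';'-separated field of $1.
split_print() {
    rest="$1;"                  # append a delimiter so every field ends with one
    while [ -n "$rest" ]; do
        field=${rest%%;*}       # text before the first ';'
        rest=${rest#*;}         # drop that field together with its ';'
        printf '> [%s]\n' "$field"
    done
}

split_print 'a;a;Full Name'
# > [a]
# > [a]
# > [Full Name]
```

An empty argument is treated as a single empty field, which may or may not be what you want.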
Have fun!