We need a script that simulates associative arrays or a map-like data structure for shell scripting. Does anybody have one?
17 Answers
Another option, if portability is not your main concern, is to use associative arrays that are built in to the shell. This should work in bash 4.0 (available now on most major distros, though not on OS X unless you install it yourself), ksh, and zsh:
declare -A newmap
newmap[name]="Irfan Zulfiqar"
newmap[designation]=SSE
newmap[company]="My Own Company"
echo ${newmap[company]}
echo ${newmap[name]}
Depending on the shell, you may need to do a typeset -A newmap instead of declare -A newmap, or in some it may not be necessary at all.
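Beyond storing and reading values, a built-in associative array also supports membership tests, deletion, and iteration over keys. The following is a minimal sketch, assuming bash 4.0 or newer; it reuses the newmap data from the example above and is not meant as a complete reference.

```bash
#!/usr/bin/env bash
# Sketch only: requires bash 4.0 or newer for declare -A.
declare -A newmap=(
  [name]="Irfan Zulfiqar"
  [designation]="SSE"
  [company]="My Own Company"
)

# Test whether a key exists without depending on its value being non-empty.
if [[ -n "${newmap[company]+set}" ]]; then
  echo "company is present"
fi

# Delete a key and its value (quoted so the brackets are not globbed).
unset 'newmap[designation]'

# Iterate over the remaining keys; iteration order is unspecified, so sort.
for key in "${!newmap[@]}"; do
  printf '%s=%s\n' "$key" "${newmap[$key]}"
done | sort

echo "size: ${#newmap[@]}"
```

The `${newmap[key]+set}` form expands to `set` only when the key exists, which distinguishes a missing key from a key holding an empty string.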
Another non-bash 4 way.
#!/bin/bash
# A pretend Python dictionary with bash 3
ARRAY=( "cow:moo"
"dinosaur:roar"
"bird:chirp"
"bash:rock" )
for animal in "${ARRAY[@]}" ; do
KEY=${animal%%:*}
VALUE=${animal#*:}
printf "%s likes to %s.\n" "$KEY" "$VALUE"
done
echo -e "${ARRAY[1]%%:*} is an extinct animal which likes to ${ARRAY[1]#*:}\n"
You could throw an if statement for searching in there as well, e.g. if [[ $animal =~ blah ]] (note that bash regexes are not delimited by slashes), or whatever.
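As a rough illustration of that search idea, here is a small lookup helper for the "key:value" array. The function name lookup is made up for this sketch; it does a linear scan, so it suits small maps only.

```bash
#!/usr/bin/env bash
# Works in bash 3: the "map" is an indexed array of "key:value" strings.
ARRAY=( "cow:moo" "dinosaur:roar" "bird:chirp" "bash:rock" )

# lookup KEY: print the value stored for KEY, or return 1 if it is absent.
# (lookup is a hypothetical helper name, not part of the answer above.)
lookup() {
  local wanted=$1 entry
  for entry in "${ARRAY[@]}"; do
    # Compare the part before the first colon against the wanted key.
    if [[ ${entry%%:*} == "$wanted" ]]; then
      printf '%s\n' "${entry#*:}"
      return 0
    fi
  done
  return 1
}

lookup bird                          # prints "chirp"
lookup unicorn || echo "no such key"
```

Keys must not contain a colon, since the first colon separates key from value; values may contain colons.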
I think that you need to step back and think about what a map, or associative array, really is. All it is is a way to store a value for a given key, and get that value back quickly and efficiently. You may also want to be able to iterate over the keys to retrieve every key value pair, or delete keys and their associated values.
Now, think about a data structure you use all the time in shell scripting, and even just in the shell without writing a script, that has these properties. Stumped? It's the filesystem.
Really, all you need to have an associative array in shell programming is a temp directory. mktemp -d is your associative array constructor:
prefix=$(basename -- "$0")
map=$(mktemp -d "${TMPDIR:-/tmp}/${prefix}.XXXXXX")
echo somevalue >"${map}/key"
value=$(cat "${map}/key")
If you don't feel like using echo and cat, you can always write some little wrappers; these ones are modelled off of Irfan's, though they just output the value rather than setting arbitrary variables like $value:
#!/bin/sh
prefix=$(basename -- "$0")
mapdir=$(mktemp -d "${TMPDIR:-/tmp}/${prefix}.XXXXXX")
trap 'rm -r "${mapdir}"' EXIT
put() {
[ "$#" != 3 ] && exit 1
mapname=$1; key=$2; value=$3
[ -d "${mapdir}/${mapname}" ] || mkdir "${mapdir}/${mapname}"
printf '%s\n' "$value" >"${mapdir}/${mapname}/${key}"
}
get() {
[ "$#" != 2 ] && exit 1
mapname=$1; key=$2
cat "${mapdir}/${mapname}/${key}"
}
put "newMap" "name" "Irfan Zulfiqar"
put "newMap" "designation" "SSE"
put "newMap" "company" "My Own Company"
value=$(get "newMap" "company")
echo $value
value=$(get "newMap" "name")
echo $value
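The directory-backed map can also cover the other two operations mentioned earlier, iterating over keys and deleting them. This is a sketch under stated assumptions: the helper names keys and remove are invented here, and keys are assumed to contain no slashes, whitespace, or newlines.

```bash
#!/usr/bin/env bash
# Sketch: directory-backed map with key listing and deletion.
# keys() and remove() are hypothetical names, not part of the answer above.
mapdir=$(mktemp -d "${TMPDIR:-/tmp}/map.XXXXXX") || exit 1
trap 'rm -rf "$mapdir"' EXIT

put()    { mkdir -p "$mapdir/$1" && printf '%s\n' "$3" >"$mapdir/$1/$2"; }
get()    { cat "$mapdir/$1/$2"; }
remove() { rm -f "$mapdir/$1/$2"; }
keys()   { ls "$mapdir/$1"; }   # one key per line, sorted by ls

put newMap name "Irfan Zulfiqar"
put newMap designation "SSE"
put newMap company "My Own Company"
remove newMap designation

# Word splitting on the key list is safe only because keys have no whitespace.
for key in $(keys newMap); do
  echo "$key => $(get newMap "$key")"
done
```

Each key is simply a file name, so deleting a key is rm and listing keys is ls; no parsing of a serialized string is involved.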
edit: This approach is actually quite a bit faster than the linear search using sed suggested by the questioner, as well as more robust (it allows keys and values to contain -, =, space, and ":SP:"). The fact that it uses the filesystem does not make it slow; these files are actually never guaranteed to be written to the disk unless you call sync; for temporary files like this with a short lifetime, it's not unlikely that many of them will never be written to disk.
I did a few benchmarks of Irfan's code, Jerry's modification of Irfan's code, and my code, using the following driver program:
#!/bin/bash
mapimpl=$1
numkeys=$2
numvals=$3
. ./${mapimpl}.sh #/ <- fix broken stack overflow syntax highlighting
for (( i = 0 ; $i < $numkeys ; i += 1 ))
do
for (( j = 0 ; $j < $numvals ; j += 1 ))
do
put "newMap" "key$i" "value$j"
get "newMap" "key$i"
done
done
The results:
$ time ./driver.sh irfan 10 5
real 0m0.975s
user 0m0.280s
sys 0m0.691s
$ time ./driver.sh brian 10 5
real 0m0.226s
user 0m0.057s
sys 0m0.123s
$ time ./driver.sh jerry 10 5
real 0m0.706s
user 0m0.228s
sys 0m0.530s
$ time ./driver.sh irfan 100 5
real 0m10.633s
user 0m4.366s
sys 0m7.127s
$ time ./driver.sh brian 100 5
real 0m1.682s
user 0m0.546s
sys 0m1.082s
$ time ./driver.sh jerry 100 5
real 0m9.315s
user 0m4.565s
sys 0m5.446s
$ time ./driver.sh irfan 10 500
real 1m46.197s
user 0m44.869s
sys 1m12.282s
$ time ./driver.sh brian 10 500
real 0m16.003s
user 0m5.135s
sys 0m10.396s
$ time ./driver.sh jerry 10 500
real 1m24.414s
user 0m39.696s
sys 0m54.834s
$ time ./driver.sh irfan 1000 5
real 4m25.145s
user 3m17.286s
sys 1m21.490s
$ time ./driver.sh brian 1000 5
real 0m19.442s
user 0m5.287s
sys 0m10.751s
$ time ./driver.sh jerry 1000 5
real 5m29.136s
user 4m48.926s
sys 0m59.336s
To add to Irfan's answer, here is a shorter and faster version of get() since it requires no iteration over the map contents:
get() {
mapName=$1; key=$2
map=${!mapName}
value="$(echo $map |sed -e "s/.*--${key}=\([^ ]*\).*/\1/" -e 's/:SP:/ /g' )"
}
Bash4 supports this natively. Do not use grep or eval, they are the ugliest of hacks.
For a verbose, detailed answer with example code see: https://stackguides.com/questions/3467959
####################################################################
# Bash v3 does not support associative arrays
# and we cannot use ksh since all generic scripts are on bash
# Usage: map_put map_name key value
#
function map_put
{
alias "${1}$2"="$3"
}
# map_get map_name key
# @return value
#
function map_get
{
alias "${1}$2" | awk -F"'" '{ print $2; }'
}
# map_keys map_name
# @return map keys
#
function map_keys
{
alias -p | grep $1 | cut -d'=' -f1 | awk -F"$1" '{print $2; }'
}
Example:
mapName=$(basename $0)_map_
map_put $mapName "name" "Irfan Zulfiqar"
map_put $mapName "designation" "SSE"
for key in $(map_keys $mapName)
do
echo "$key = $(map_get $mapName $key)"
done
Now answering this question.
The following script simulates associative arrays in shell scripts. It's simple and very easy to understand.
The map is nothing but a growing string that stores key-value pairs as --name=Irfan --designation=SSE --company=My:SP:Own:SP:Company
Spaces in values are replaced with ':SP:'.
put() {
if [ "$#" != 3 ]; then exit 1; fi
mapName=$1; key=$2; value=`echo $3 | sed -e "s/ /:SP:/g"`
eval map="\"\$$mapName\""
map="`echo "$map" | sed -e "s/--$key=[^ ]*//g"` --$key=$value"
eval $mapName="\"$map\""
}
get() {
mapName=$1; key=$2; valueFound="false"
eval map=\$$mapName
for keyValuePair in ${map};
do
case "$keyValuePair" in
--$key=*) value=`echo "$keyValuePair" | sed -e 's/^[^=]*=//'`
valueFound="true"
esac
if [ "$valueFound" = "true" ]; then break; fi
done
value=`echo $value | sed -e "s/:SP:/ /g"`
}
put "newMap" "name" "Irfan Zulfiqar"
put "newMap" "designation" "SSE"
put "newMap" "company" "My Own Company"
get "newMap" "company"
echo $value
get "newMap" "name"
echo $value
edit: Just added another method to fetch all keys.
getKeySet() {
if [ "$#" != 1 ];
then
exit 1;
fi
mapName=$1;
eval map="\"\$$mapName\""
keySet=`
echo $map |
sed -e "s/=[^ ]*//g" -e "s/\([ ]*\)--/\1/g"
`
}
For Bash 3, there is a particular case that has a nice and simple solution:
If you don't want to handle a lot of variables, or your keys are not valid variable identifiers, and your array is guaranteed to have fewer than 256 items (a function's return status is limited to 0-255), you can abuse function return values. This solution does not require any subshell, as the value is readily available as a variable, nor any iteration, so performance screams. It's also very readable, almost like the Bash 4 version.
Here's the most basic version:
hash_index() {
case $1 in
'foo') return 0;;
'bar') return 1;;
'baz') return 2;;
esac
}
hash_vals=("foo_val"
"bar_val"
"baz_val");
hash_index "foo"
echo ${hash_vals[$?]}
Remember, use single quotes in case, else it's subject to globbing. Really useful for static/frozen hashes from the start, but one could write an index generator from a hash_keys=() array.
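One possible shape for such an index generator is sketched below. It builds the text of hash_index from a hash_keys array and defines it with eval. The name make_hash_index and the use of 255 as a "not found" status are assumptions made for this sketch.

```bash
#!/usr/bin/env bash
hash_keys=("foo" "bar" "baz")
hash_vals=("foo_val" "bar_val" "baz_val")

# make_hash_index (hypothetical name): generate hash_index from hash_keys.
make_hash_index() {
  local i body=""
  for i in "${!hash_keys[@]}"; do
    # printf %q quotes each key so the case pattern matches it literally.
    body+="$(printf '%q) return %d;;' "${hash_keys[$i]}" "$i") "
  done
  # 255 is reserved here to mean "no such key" (an assumption of this sketch).
  eval "hash_index() { case \$1 in ${body}*) return 255;; esac; }"
}

make_hash_index

# Capture the status right away; the || form also stays safe under set -e.
idx=0
hash_index "bar" || idx=$?
if [[ $idx -ne 255 ]]; then
  echo "${hash_vals[$idx]}"        # prints "bar_val"
fi
```

Because 255 is taken by the sentinel, this variant supports at most 255 keys (indices 0 through 254).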
Watch out: a key that matches no branch leaves the return status at 0, so the lookup defaults to the first element. You may want to set aside the zeroth element:
hash_index() {
case $1 in
'foo') return 1;;
'bar') return 2;;
'baz') return 3;;
esac
}
hash_vals=("" # sort of like returning null/nil for a non existent key
"foo_val"
"bar_val"
"baz_val");
hash_index "foo" || echo ${hash_vals[$?]} # It can't get more readable than this
Caveat: the length is now incorrect.
Alternatively, if you want to keep zero-based indexing, you can reserve another index value and guard against a non-existent key, but it's less readable:
hash_index() {
case $1 in
'foo') return 0;;
'bar') return 1;;
'baz') return 2;;
*) return 255;;
esac
}
hash_vals=("foo_val"
"bar_val"
"baz_val");
hash_index "foo"
idx=$? # save the status: the [[ ]] test below overwrites $?
[[ $idx -ne 255 ]] && echo ${hash_vals[$idx]}
Or, to keep the length correct, offset index by one:
hash_index() {
case $1 in
'foo') return 1;;
'bar') return 2;;
'baz') return 3;;
esac
}
hash_vals=("foo_val"
"bar_val"
"baz_val");
hash_index "foo" || echo ${hash_vals[$(($? - 1))]}
Yet another non-bash-4 (i.e., bash 3, Mac-compatible) way:
val_of_key() {
case $1 in
'A1') echo 'aaa';;
'B2') echo 'bbb';;
'C3') echo 'ccc';;
*) echo 'zzz';;
esac
}
for x in 'A1' 'B2' 'C3' 'D4'; do
y=$(val_of_key "$x")
echo "$x => $y"
done
Prints:
A1 => aaa
B2 => bbb
C3 => ccc
D4 => zzz
The function with the case acts like an associative array. Unfortunately it cannot use return, so it has to echo its output, but this is not a problem, unless you are a purist that shuns forking subshells.
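If the subshell does matter, one alternative is to have the function store its result in a global variable instead of echoing it. This is a sketch of that variation, not the answer's original approach; the variable name REPLY_VAL is a hypothetical choice.

```bash
#!/usr/bin/env bash
# Variant that stores the result in a global variable instead of echoing it,
# so no command substitution (and therefore no subshell) is needed.
# REPLY_VAL is a hypothetical name chosen for this sketch.
val_of_key() {
  case $1 in
    'A1') REPLY_VAL='aaa';;
    'B2') REPLY_VAL='bbb';;
    'C3') REPLY_VAL='ccc';;
    *)    REPLY_VAL='zzz';;
  esac
}

for x in 'A1' 'B2' 'C3' 'D4'; do
  val_of_key "$x"
  echo "$x => $REPLY_VAL"
done
```

The trade-off is that every caller shares one global, so the value must be read before the next call overwrites it.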
You can use dynamic variable names and let the variables names work like the keys of a hashmap.
For example, if you have an input file with two columns, name and credit, as in the example below, and you want to sum the income of each user:
Mary 100
John 200
Mary 50
John 300
Paul 100
Paul 400
David 100
The command below will sum everything, using dynamic variables as keys, in the form of map_${person}:
while read -r person money; do ((map_$person+=money)); done < INCOME_REPORT.log
To read the results:
set | grep map
The output will be:
map_David=100
map_John=500
map_Mary=150
map_Paul=500
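To read a single entry back, or to walk all entries without grepping the output of set, bash's indirect expansion can be used. The sketch below inlines a few sample rows so it is self-contained; it assumes keys are valid identifier characters and that no unrelated variable starts with map_.

```bash
#!/usr/bin/env bash
# Sample data inlined so the sketch runs without INCOME_REPORT.log.
while read -r person money; do
  (( map_$person += money ))
done <<'EOF'
Mary 100
John 200
Mary 50
EOF

# Look up one key through indirect expansion.
person=Mary
var="map_$person"
echo "$person has ${!var}"         # prints "Mary has 150"

# Enumerate every variable whose name starts with map_.
for var in "${!map_@}"; do
  echo "${var#map_} = ${!var}"
done
```

`${!var}` reads the variable whose name is stored in var, and `${!map_@}` expands to the names of all variables with the map_ prefix.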
Elaborating on these techniques, I'm developing a function on GitHub, shell_map, that works just like a HashMap object.
In order to create "HashMap instances", the shell_map function is able to create copies of itself under different names. Each new function copy will have a different $FUNCNAME variable. $FUNCNAME is then used to create a namespace for each map instance.
The map keys are global variables of the form $FUNCNAME_DATA_$KEY, where $KEY is the key added to the map. These variables are dynamic variables.
Below is a simplified version of it that you can use as an example.
#!/bin/bash
shell_map () {
local METHOD="$1"
case $METHOD in
new)
local NEW_MAP="$2"
# loads shell_map function declaration
test -n "$(declare -f shell_map)" || return
# declares in the Global Scope a copy of shell_map, under a new name.
eval "${_/shell_map/$2}"
;;
put)
local KEY="$2"
local VALUE="$3"
# declares a variable in the global scope
eval ${FUNCNAME}_DATA_${KEY}='$VALUE'
;;
get)
local KEY="$2"
local VALUE="${FUNCNAME}_DATA_${KEY}"
echo "${!VALUE}"
;;
keys)
declare | grep -Po "(?<=${FUNCNAME}_DATA_)\w+((?=\=))"
;;
name)
echo $FUNCNAME
;;
contains_key)
local KEY="$2"
compgen -v ${FUNCNAME}_DATA_${KEY} > /dev/null && return 0 || return 1
;;
clear_all)
while read var; do
unset $var
done < <(compgen -v ${FUNCNAME}_DATA_)
;;
remove)
local KEY="$2"
unset ${FUNCNAME}_DATA_${KEY}
;;
size)
compgen -v ${FUNCNAME}_DATA_ | wc -l
;;
*)
echo "unsupported operation '$1'."
return 1
;;
esac
}
Usage:
shell_map new credit
credit put Mary 100
credit put John 200
for customer in `credit keys`; do
value=`credit get $customer`
echo "customer $customer has $value"
done
credit contains_key "Mary" && echo "Mary has credit!"
I've found it true, as already mentioned, that the best performing method is to write out key/vals to a file, and then use grep/awk to retrieve them. It sounds like all sorts of unnecessary IO, but disk cache kicks in and makes it extremely efficient -- much faster than trying to store them in memory using one of the above methods (as the benchmarks show).
Here's a quick, clean method I like:
hinit() {
rm -f /tmp/hashmap.$1
}
hput() {
echo "$2 $3" >> /tmp/hashmap.$1
}
hget() {
grep "^$2 " /tmp/hashmap.$1 | awk '{ print $2 };'
}
hinit capitols
hput capitols France Paris
hput capitols Netherlands Amsterdam
hput capitols Spain Madrid
echo `hget capitols France` and `hget capitols Netherlands` and `hget capitols Spain`
If you wanted to enforce single-value per key, you could also do a little grep/sed action in hput().
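A rough sketch of that single-value variant follows. It rewrites the file without the old entry before appending the new one, using awk rather than grep/sed to avoid regex quoting of the key. HASHDIR is a hypothetical variable added here; as in the version above, keys and values are assumed to be single words.

```bash
#!/usr/bin/env bash
# Sketch of an hput that keeps exactly one value per key.
# HASHDIR is a hypothetical variable so the files need not live in /tmp.
HASHDIR=${HASHDIR:-/tmp}

hinit() { : > "$HASHDIR/hashmap.$1"; }   # create or truncate the map file

hput() {
  local file="$HASHDIR/hashmap.$1" tmp
  [ -f "$file" ] || : > "$file"
  tmp=$(mktemp "$file.XXXXXX") || return 1
  # Copy every line whose key differs (string comparison), then append.
  awk -v k="$2" '$1 != (k "")' "$file" > "$tmp"
  printf '%s %s\n' "$2" "$3" >> "$tmp"
  mv "$tmp" "$file"
}

hget() {
  awk -v k="$2" '$1 == (k "") { print $2 }' "$HASHDIR/hashmap.$1"
}

hinit capitals
hput capitals France Lyon
hput capitals France Paris       # replaces the earlier entry
hput capitals Spain Madrid
hget capitals France             # prints "Paris"
```

Writing to a temporary file and renaming it keeps the map file intact if the script is interrupted mid-update.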
What a pity I did not see the question before. I wrote a library, shell-framework, which contains, among other things, maps (associative arrays). The latest version of it can be found here.
Example:
#!/bin/bash
#include map library
shF_PATH_TO_LIB="/usr/lib/shell-framework"
source "${shF_PATH_TO_LIB}/map"
#simple example get/put
putMapValue "mapName" "mapKey1" "map Value 2"
echo "mapName[mapKey1]: $(getMapValue "mapName" "mapKey1")"
#redefine old value to new
putMapValue "mapName" "mapKey1" "map Value 1"
echo "after change mapName[mapKey1]: $(getMapValue "mapName" "mapKey1")"
#add two new pairs key/values and print all keys
putMapValue "mapName" "mapKey2" "map Value 2"
putMapValue "mapName" "mapKey3" "map Value 3"
echo -e "mapName keys are \n$(getMapKeys "mapName")"
#create new map
putMapValue "subMapName" "subMapKey1" "sub map Value 1"
putMapValue "subMapName" "subMapKey2" "sub map Value 2"
#and put it in mapName under key "mapKey4"
putMapValue "mapName" "mapKey4" "subMapName"
#check if under two key were placed maps
echo "is map mapName[mapKey3]? - $(if isMap "$(getMapValue "mapName" "mapKey3")" ; then echo Yes; else echo No; fi)"
echo "is map mapName[mapKey4]? - $(if isMap "$(getMapValue "mapName" "mapKey4")" ; then echo Yes; else echo No; fi)"
#print map with sub maps
printf "%s\n" "$(mapToString "mapName")"
Several years ago I wrote a script library for bash which supported associative arrays among other features (logging, configuration files, extended support for command line arguments, help generation, unit testing, etc). The library contains a wrapper for associative arrays and automatically switches to the appropriate model (native for bash 4, emulated for earlier versions). It was called shell-framework and was hosted at origo.ethz.ch, but that site has since closed. If someone still needs it I can share it with you.
The shell has no built-in map-like data structure, so I use raw strings to describe items, like this:
ARRAY=(
"item_A|attr1|attr2|attr3"
"item_B|attr1|attr2|attr3"
"..."
)
Then, to extract each item and its attributes:
for item in "${ARRAY[@]}"
do
item_name=$(echo "${item}"|awk -F "|" '{print $1}')
item_attr1=$(echo "${item}"|awk -F "|" '{print $2}')
item_attr2=$(echo "${item}"|awk -F "|" '{print $3}')
echo "${item_name}"
echo "${item_attr1}"
echo "${item_attr2}"
done
This may not be as clever as other people's answers, but it is easy to understand for people new to the shell.
I modified Vadim's solution with the following:
####################################################################
# Bash v3 does not support associative arrays
# and we cannot use ksh since all generic scripts are on bash
# Usage: map_put map_name key value
#
function map_put
{
alias "${1}$2"="$3"
}
# map_get map_name key
# @return value
#
function map_get {
if type -p "${1}$2"
then
alias "${1}$2" | awk -F "'" '{ print $2; }';
fi
}
# map_keys map_name
# @return map keys
#
function map_keys
{
alias -p | grep $1 | cut -d'=' -f1 | awk -F"$1" '{print $2; }'
}
The change is to map_get in order to prevent it from returning errors if you request a key that doesn't exist, though the side-effect is that it will also silently ignore missing maps, but it suited my use-case better since I just wanted to check for a key in order to skip items in a loop.
Late reply, but consider addressing the problem in this way, using the bash builtin read, as illustrated in the following code snippet from a ufw firewall script. This approach has the advantage of supporting as many delimited fields per record (not just 2) as desired. We have used the | delimiter because port range specifiers may require a colon, i.e. 6001:6010.
#!/usr/bin/env bash
readonly connections=(
'192.168.1.4/24|tcp|22'
'192.168.1.4/24|tcp|53'
'192.168.1.4/24|tcp|80'
'192.168.1.4/24|tcp|139'
'192.168.1.4/24|tcp|443'
'192.168.1.4/24|tcp|445'
'192.168.1.4/24|tcp|631'
'192.168.1.4/24|tcp|5901'
'192.168.1.4/24|tcp|6566'
)
function set_connections(){
local range proto port
for fields in "${connections[@]}"
do
IFS=$'|' read -r range proto port <<< "$fields"
ufw allow from "$range" proto "$proto" to any port "$port"
done
}
set_connections