
What are the syntax rules for identifiers, especially function and variable names, in Bash?

I wrote a Bash script and tested it on various versions of Bash on Ubuntu, Debian, Red Hat 5 and 6, and even an old Solaris 8 box. The script ran well, so it shipped.

Yet when a user tried it on SUSE machines, it gave a "not a valid identifier" error. Fortunately, my guess that there was an invalid character in the function name was right. The hyphens were messing it up.

The fact that a script that was at least somewhat tested would have completely different behaviour on another Bash or distro was disconcerting. How can I avoid this?

What version of bash was on these machines? (Specifically the SUSE machine?) - Etan Reisner
Unfortunately, I don't have easy access to them now. They were 3.something. - labyrinth

6 Answers


From the manual:

   Shell Function Definitions
       ...
       name () compound-command [redirection]
       function name [()] compound-command [redirection]

name is defined elsewhere:

       name   A word consisting only of alphanumeric characters and
              underscores, and beginning with an alphabetic character or
              an underscore. Also referred to as an identifier.

So, according to the manual, hyphens are not valid in function names. And yet, on my system, they do work:

$ bash --version
GNU bash, version 4.2.25(1)-release (x86_64-pc-linux-gnu)
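To see the gap between the manual and practice, here is a minimal bash sketch. It assumes bash is running in its default (non-POSIX) mode, and `is_identifier` is a hypothetical helper that simply encodes the manual's definition of `name` quoted above.

```shell
#!/bin/bash
# is_identifier (hypothetical helper): succeeds only if $1 matches the
# manual's "name": letters, digits and underscores, not starting with a digit.
is_identifier() {
  [[ $1 =~ ^[A-Za-z_][A-Za-z0-9_]*$ ]]
}

# In default (non-POSIX) mode, bash still accepts a hyphenated function name.
my-func() { echo "my-func ran"; }
my-func

for candidate in my-func my_func 9lives; do
  if is_identifier "$candidate"; then
    echo "$candidate: identifier"
  else
    echo "$candidate: not an identifier"
  fi
done
```

Per the last answer on this page, `bash --posix` limits function names to `[a-zA-Z_][0-9a-zA-Z_]*`, so the same hyphenated definition may be rejected there with a "not a valid identifier" error.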

The question was about "the rules", which have been answered in two different ways, each correct in some sense, depending on what you want to call "the rules". To flesh out @rici's point that you can shove just about any character into a function name, I wrote a small bash script that tries every possible (0-255) character as a function name, and also as the second character of a function name:

#!/bin/bash
ASCII=( nul soh stx etx eot enq ack bel bs tab nl vt np cr so si dle \
            dc1 dc2 dc3 dc4 nak syn etb can em sub esc fs gs rs us sp )

for((i=33; i < 127; ++i)); do
    printf -v Hex "%x" $i

    printf -v Chr "\x$Hex"
    ASCII[$i]="$Chr"
done
ASCII[127]=del
for((i=128; i < 256; ++i)); do
    ASCII[$i]=$(printf "0X%x" $i)
done

# ASCII table is now defined

function Test(){
    Illegal=""
    for((i=1; i <= 255; ++i)); do
        Name="$(printf \\$(printf '%03o' $i))"
        eval "function $1$Name(){ return 0; }; $1$Name ;" 2>/dev/null
        if [[ $? -ne 0 ]]; then
            Illegal+=" ${ASCII[$i]}"
            #        echo Illegal: "${ASCII[$i]}"
        fi
    done
    printf "Illegal: %s\n" "$Illegal"
}
echo "$BASH_VERSION"
Test
Test "x"

# can we really do funky crap like this?
function [}{(){
   echo "Let me take you to, funkytown!"
}
[}{    # why yes, we can!
# though editor auto-indent modes may punish us

I actually skip NUL (0x00), as that's the one character bash may object to finding in the input stream. The output from this script was:

4.4.0(1)-release
Illegal:  soh tab nl sp ! " # $ % & ' ( ) * 0 1 2 3 4 5 6 7 8 9 ; < > \ ` { | } ~ del
Illegal:  soh " $ & ' ( ) ; < > [ \ ` | del
Let me take you to, funkytown!

Note that bash happily lets me name my function "[}{". Probably my code is not quite rigorous enough to provide the exact rules for legality-in-practice, but it should give a flavor of what manner of abuse is possible. I wish I could mark this answer "For mature audiences only."
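One way to make such a probe more rigorous is to define each candidate in a separate bash process and then confirm that a function with exactly that name exists, rather than trusting the exit status alone. The sketch below uses a hypothetical helper, `can_define_function`, which is not part of the script above; it assumes bash's default mode, and because the name is spliced into `eval`, it should only ever be given test strings, never untrusted input.

```shell
#!/bin/bash
# can_define_function (hypothetical helper): succeeds if a child bash, in its
# default mode, accepts $1 as a function name AND then reports a function
# with exactly that name. The child process keeps test definitions from
# leaking into the calling shell.
# WARNING: $1 is spliced into eval, so only pass trusted test strings.
can_define_function() {
  bash -c '
    eval "function $1 { :; }" 2>/dev/null &&
      [[ $(declare -F -- "$1") == "$1" ]]
  ' _ "$1"
}

for candidate in 'my-func' '[}{' 'a b' 'x$y'; do
  if can_define_function "$candidate"; then
    echo "accepted: $candidate"
  else
    echo "rejected: $candidate"
  fi
done
```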


Command identifiers and variable names have different syntaxes. A variable name is restricted to alphanumeric characters and underscore, not starting with a digit. A command name, on the other hand, can be just about anything which doesn't contain bash metacharacters (and even then, they can be quoted).

In bash, function names can be command names, as long as they would be parsed as a WORD without quotes. (Except that, for some reason, they cannot be integers.) However, that is a bash extension. If the target machine is using some other shell (such as dash), it might not work, since the POSIX standard shell grammar only allows "NAME" in the function definition form (and also prohibits the use of reserved words).
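The difference between the two syntaxes is easy to observe in bash's default mode. In this sketch, `do-thing` and `my-var` are made-up names: the hyphenated function name is accepted, while `my-var=1` is not treated as an assignment at all, because `my-var` is not a valid variable name, so bash looks for a command literally named `my-var=1`.

```shell
#!/bin/bash
# A hyphen is fine in a function (command) name in bash's default mode...
do-thing() { echo "did the thing"; }
do-thing

# ...but not in a variable name: "my-var=1" is not an assignment, so bash
# tries to run a command called "my-var=1", which fails with status 127.
my-var=1 2>/dev/null || echo "my-var=1 failed with status $?"

# declare refuses the name explicitly instead.
declare my-var=1 2>/dev/null || echo "declare rejected my-var"
```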


From 3.3 Shell Functions:

Shell functions are a way to group commands for later execution using a single name for the group. They are executed just like a "regular" command. When the name of a shell function is used as a simple command name, the list of commands associated with that function name is executed. Shell functions are executed in the current shell context; no new process is created to interpret them.

Functions are declared using this syntax:

name () compound-command [ redirections ]

or

function name [()] compound-command [ redirections ]

and from 2 Definitions:

name

A word consisting solely of letters, numbers, and underscores, and beginning with a letter or underscore. Names are used as shell variable and function names. Also referred to as an identifier.
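As a small illustration of the grammar quoted above (the function names are just examples), both definition forms are accepted, the body can be any compound command (a `( ... )` subshell as well as a `{ ...; }` group), and a redirection written after the body applies every time the function runs.

```shell
#!/bin/bash
# Form 1: name () compound-command
greet() { echo "hello"; }

# Form 2: function name [()] compound-command; the parentheses are optional.
function shout { echo "HELLO"; }
function whisper() { echo "psst"; }

# The body may be any compound command; with a ( ... ) subshell body,
# the cd cannot change the caller's working directory.
show_root() ( cd / && pwd )

# A redirection after the body is applied on every call: stdout goes to stderr.
warn() { echo "warning: $1"; } >&2

greet; shout; whisper
show_root
warn "sent to stderr"
```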


This script tests which single characters are valid as one-character function names.

It finds 53 valid characters (a-z, A-Z and underscore) using a POSIX shell, and 220 valid characters with Bash v4.4.12.

The answer from Ron Burk is valid, but lacks the numbers.

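#!/bin/sh

# Marker file: each candidate function deletes it, so a missing file proves
# that defining *and* calling the function worked. (mktemp is not POSIX,
# but it is widely available.)
FILE=$(mktemp) || exit 1
I=0
VALID=0

while [ "$I" -lt 256 ]; do
        NAME="$(printf "\\$(printf '%03o' "$I")")"
        I=$((I + 1))

        : >"$FILE"
        # Subshell: a syntax error inside eval may abort a POSIX shell, and a
        # successful definition must not leak into this loop.
        ( eval "$NAME(){ rm \"$FILE\"; }; $NAME" ) 2>/dev/null

        if [ -f "$FILE" ]; then
                rm "$FILE"
        else
                VALID=$((VALID + 1))
                echo "$VALID/256 - OK: $NAME"
        fi
done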

Note: the biggest correction here is that a newline is never allowed in a function name.

My answer:

  • Bash --posix: [a-zA-Z_][0-9a-zA-Z_]*
  • Bash 3.0-4.4: [^#%0-9\0\1\9\10 "$&'();<>\`|\x7f][^\0\1\9\10 "$&'();<>\`|\x7f]*
  • Bash 5.0: [^#%0-9\0\9\10 "$&'();<>\`|][^\0\9\10 "$&'();<>\`|]*
    • \1 and \x7f work now
  • Bash 5.1: [^#%\0\9\10 "$&'();<>\`|][^\0\9\10 "$&'();<>\`|]*
    • Numbers can come first?! Yep!
  • Any bash 3-5: [^#%0-9\0\1\9\10 "$&'();<>\`|\x7f][^\0\1\9\10 "$&'();<>\`|\x7f]*
    • Same as 3.0-4.4
  • My suggestion (opinion): [^#%0-9\0-\f "$&'();<>\`|\x7f-\xff][^\0-\f "$&'();<>\`|\x7f-\xff]*
    • Positive version: [!*+,-./:=?@A-Z\[\]^_a-z{}~][#%0-9!*+,-./:=?@A-Z\[\]^_a-z{}~]*
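The positive form of the suggestion above can be turned into a quick pre-flight check for function names in a script. This is only a sketch: `is_suggested_name` is a hypothetical helper, its bracket expressions are my transcription of that positive pattern into an ERE for bash's `=~` (with `]` placed first and `-` last so both are literal), and it checks characters only, not other rules such as reserved words.

```shell
#!/bin/bash
# is_suggested_name (hypothetical helper): checks $1 against the positive
# version of the suggested pattern. In each bracket expression, "]" comes
# first and "-" comes last so that both are taken literally.
is_suggested_name() {
  local re='^[][!*+,./:=?@A-Za-z^_{}~-][][#%0-9!*+,./:=?@A-Za-z^_{}~-]*$'
  [[ $1 =~ $re ]]
}

for candidate in my-func 'x#1%' '[}{' 9lives '#x' 'a b'; do
  if is_suggested_name "$candidate"; then
    echo "ok:       $candidate"
  else
    echo "rejected: $candidate"
  fi
done
```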

My version of the test:

for ((x=1; x<256; x++)); do
  hex="$(printf "%02x" $x)"
  name="$(printf \\x${hex})"
  if [ "${x}" = "10" ]; then
    name=$'\n'
  fi
  if [ "$(echo -n "${name}" | xxd | awk '{print $2}')" != "${hex}" ]; then
    echo "$x failed first sanity check"
  fi
  (
    eval "function ${name}(){ echo ${x};}" &>/dev/null
    if test "$("${name}" 2>/dev/null)" != "${x}"; then
      eval "function ok${name}doe(){ echo ${x};}" &>/dev/null
      if test "$(type -t okdoe 2>/dev/null)" = "function"; then
        echo "${x} failed second sanity test"
      fi
      if test "$("ok${name}doe" 2>/dev/null)" != "${x}"; then
        echo "${x}(${name}) never works"
      else
        echo "${x}(${name}) cannot be first"
      fi
    else
      # Just assume everything over 128 is hard, unless this says otherwise
      if test "${x}" -gt 127; then
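        # grep's basic regular expressions have no \xNN escape, so match the
        # raw character as a fixed string (-F) against the whole line (-x).
        if declare -pF | grep -qxF -- "declare -f ${name}"; then
          echo "${x} works, but is actually not difficult"
          declare -pF | grep -xF -- "declare -f ${name}" | xxd
        fi
      elif ! declare -pF | grep -qxF -- "declare -f ${name}"; then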
        echo "${x} works, but is difficult in bash"
      fi
    fi
  )
done

Some additional notes:

  • Characters 1-31 are less than ideal, as they are more difficult to type.
  • Characters 128-255 are even less ideal in bash (except with bash 3.2 on macOS, which might be compiled differently), because commands like declare -pF do not render these characters, even though the functions are there in memory. This means any introspection code that relies on that output will incorrectly assume these functions do not exist. However, features like compgen still render the characters correctly.
  • Outside my testing scope, but some Unicode works too, although it is extra hard to paste or type on macOS over ssh.