How can I convert tabs to spaces in every file of a directory (possibly recursively)?
Also, is there a way of setting the number of spaces per tab?
Warning: This will break your repo. This will corrupt binary files, including those under .svn and .git! Read the comments before using!
find . -iname '*.java' -type f -exec sed -i.orig 's/\t/ /g' {} +
The original file is saved as [filename].orig.
Replace '*.java' with the file ending of the file type you are looking for. This way you can prevent accidental corruption of binary files.
Downsides:
Simple replacement with sed is okay but not the best possible solution. If there are "extra" spaces between the tabs, they will still be there after substitution, so the margins will be ragged. Tabs expanded in the middle of lines will also not work correctly. In bash, we can say instead
find . -name '*.java' ! -type d -exec bash -c 'expand -t 4 "$0" > /tmp/e && mv /tmp/e "$0"' {} \;
to apply expand to every Java file in the current directory tree. Remove or replace the -name argument if you're targeting some other file types. As one of the comments mentions, be very careful when removing -name or using a weak wildcard. You can easily clobber repository and other hidden files without intending to. This is why the original answer included this:
You should always make a backup copy of the tree before trying something like this in case something goes wrong.
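To see why plain substitution leaves ragged margins while expand does not, here is a minimal Python sketch (not taken from either answer). naive_replace is a hypothetical helper that mimics a fixed sed substitution; column_expand uses Python's built-in str.expandtabs, which, like expand -t, advances each tab to the next multiple of the tab size.

```python
def naive_replace(line: str, n: int = 4) -> str:
    # Replace every tab with exactly n spaces, as a fixed sed substitution does.
    return line.replace("\t", " " * n)


def column_expand(line: str, n: int = 4) -> str:
    # Move each tab to the next tab stop (every n columns), as expand -t n does.
    return line.expandtabs(n)


for row in ["ab\tx", "abc\tx", "\tx"]:
    print(repr(naive_replace(row)), repr(column_expand(row)))
```

With column expansion the x lands in column 4 on every row; with naive replacement its position depends on how much text precedes the tab, which is exactly the ragged-margin problem.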
Try the command line tool expand.
expand -i -t 4 input | sponge output
where
-i is used to expand only leading tabs on each line;
-t 4 sets tab stops every 4 columns (8 by default), so a tab at the start of a line becomes 4 spaces;
sponge is from the moreutils package, and avoids clearing the input file when input and output are the same file. On macOS, the package moreutils is available via Homebrew (brew install moreutils) or MacPorts (sudo port install moreutils).
Finally, you can use gexpand on macOS, after installing coreutils with Homebrew (brew install coreutils) or MacPorts (sudo port install coreutils).
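A rough Python model of what expand -i does to a single line may help clarify "only leading tabs". It assumes GNU semantics, where tabs are converted only inside the leading run of blanks (spaces and tabs) and any tab after the first non-blank character is left alone. expand_initial is a hypothetical name used only for this illustration.

```python
def expand_initial(line: str, tabsize: int = 8) -> str:
    # Find the end of the leading run of blanks (spaces and tabs).
    i = 0
    while i < len(line) and line[i] in " \t":
        i += 1
    # The prefix starts at column 0, so expandtabs picks the right tab stops;
    # everything from the first non-blank character onward is kept verbatim.
    return line[:i].expandtabs(tabsize) + line[i:]
```

Applied to a whole file this would be "".join(expand_initial(l, 4) for l in text.splitlines(keepends=True)), roughly the text that expand -i -t 4 input writes to sponge.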
Collecting the best comments from Gene's answer, the best solution by far is to use sponge from moreutils.
sudo apt-get install moreutils
# The complete one-liner:
find ./ -iname '*.java' -type f -exec bash -c 'expand -t 4 "$0" | sponge "$0"' {} \;
Explanation:
./ recursively searches from the current directory
-iname is a case-insensitive match (for both *.java and *.JAVA likes)
-type f finds only regular files (no directories or symlinks; note that it does not exclude binary files)
-exec bash -c executes the following commands in a subshell for each file name, {}
expand -t 4 expands all TABs using tab stops every 4 columns
sponge soaks up standard input (from expand) and writes it to a file (the same one)*.
NOTE: * A simple file redirection (> "$0") won't work here, because the shell would truncate the file before expand has read it.
Advantage: All original file permissions are retained and you don't have to manage intermediate tmp files yourself.
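The key property of sponge is ordering: it reads all of its input before it opens the output file. The following Python sketch imitates that ordering for a single file; sponge_expand is a hypothetical helper, and it uses str.expandtabs in place of the expand command. Reopening the same path with mode "w" truncates and rewrites the existing file rather than creating a new one, which is why the permission bits survive, matching the "Advantage" noted above.

```python
def sponge_expand(path: str, tabsize: int = 4) -> None:
    # Soak up the whole file first, as sponge does with its standard input...
    with open(path, "r", encoding="utf-8", newline="") as f:
        text = f.read()
    # ...and only then truncate and rewrite the same file. Because the existing
    # file is reused rather than replaced, its permission bits stay unchanged.
    with open(path, "w", encoding="utf-8", newline="") as f:
        f.write(text.expandtabs(tabsize))
```

Swapping the order (opening for writing before reading) reproduces the > "$0" problem: the file is already empty by the time it is read.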
Use backslash-escaped sed (with bash's $'...' quoting).
On Linux:
Replace all tabs with 1 hyphen in place, in all *.txt files:
sed -i $'s/\t/-/g' *.txt
Replace all tabs with 1 space in place, in all *.txt files:
sed -i $'s/\t/ /g' *.txt
Replace all tabs with 4 spaces in place, in all *.txt files:
sed -i $'s/\t/    /g' *.txt
On a Mac:
Replace all tabs with 4 spaces in place, in all *.txt files:
sed -i '' $'s/\t/    /g' *.txt
You can use the generally available pr command. For example, to convert tabs to four spaces, do this:
pr -t -e4 file > file.expanded
-t suppresses headers
-e[CHAR][NUM] expands input CHARs (tabs by default) to tab stops every NUM columns, so -e4 means every 4 columns
To convert all files in a directory tree recursively, while skipping binary files:
#!/bin/bash
num=4
shopt -s globstar nullglob
for f in **/*; do
[[ -f "$f" ]] || continue # skip if not a regular file
! grep -qI . "$f" && continue # skip binary (and empty) files
pr -t -e"$num" "$f" > "$f.expanded.$$" && mv "$f.expanded.$$" "$f"
done
The logic for skipping binary files is from this post.
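For comparison, the same "walk the tree, skip binaries" logic can be sketched in Python. This is an approximation, not what grep does internally: looks_binary treats a NUL byte in the first 8 KiB as the binary marker (the chunk size is an arbitrary assumption), and text_files skips hidden directories and files the way bash's **/* does without dotglob. Both names are hypothetical. Unlike grep -qI ., this keeps empty files.

```python
import os
from typing import Iterator


def looks_binary(path: str, chunk_size: int = 8192) -> bool:
    # Heuristic in the spirit of grep -I: a NUL byte in the first chunk
    # marks the file as binary.
    with open(path, "rb") as f:
        return b"\0" in f.read(chunk_size)


def text_files(top: str) -> Iterator[str]:
    for root, dirs, files in os.walk(top):
        # Prune hidden directories such as .git, like **/* without dotglob.
        dirs[:] = [d for d in dirs if not d.startswith(".")]
        for name in files:
            if name.startswith("."):
                continue
            path = os.path.join(root, name)
            # Regular files only, mirroring [[ -f "$f" ]] in the bash loop.
            if os.path.isfile(path) and not looks_binary(path):
                yield path
```

Each path yielded could then be rewritten with a tab-expanding helper such as str.expandtabs.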
NOTE:
"How can I convert tabs to spaces in every file of a directory (possibly recursively)?"
This is usually not what you want.
Do you want to do this for png images? PDF files? The .git directory? Your Makefile (which requires tabs)? A 5GB SQL dump?
You could, in theory, pass a whole lot of exclude options to find or whatever else you're using; but this is fragile, and will break as soon as you add other binary files.
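What you want is at least:
1. Skip files over a certain size.
2. Detect if a file is binary by checking for the presence of a NUL byte.
3. Only replace tabs at the start of a line (expand -i does this, sed doesn't).
As far as I know, there is no "standard" Unix utility that can do this, and it's not very easy to do with a shell one-liner, so a script is needed.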
A while ago I created a little script called sanitize_files which does exactly that. It also fixes some other common stuff like replacing \r\n with \n, adding a trailing \n, etc.
You can find a simplified script without the extra features and command-line arguments below, but I recommend you use the above script, as it's more likely to receive bugfixes and other updates than this post.
I would also like to point out, in response to some of the other answers here, that using shell globbing is not a robust way of doing this, because sooner or later you'll end up with more files than will fit in ARG_MAX (on older Linux kernels it was 128 KiB; on modern ones it's typically 2 MiB, which may seem a lot, but sooner or later it's not enough).
#!/usr/bin/env python3
#
# http://code.arp242.net/sanitize_files
#
import os

indent_width = 4  # number of spaces that replace each leading tab


def is_binary(data):
    return data.find(b'\000') >= 0


def should_ignore(path):
    keep = [
        # VCS systems
        '.git/', '.hg/', '.svn/', 'CVS/',
        # These files have significant whitespace/tabs, and cannot be edited
        # safely
        # TODO: there are probably more of these files..
        'Makefile', 'BSDmakefile', 'GNUmakefile', 'Gemfile.lock'
    ]
    for k in keep:
        if '/%s' % k in path:
            return True
    return False


def run(files):
    indent_find = b'\t'
    indent_replace = b' ' * indent_width
    for f in files:
        if should_ignore(f):
            print('Ignoring %s' % f)
            continue
        try:
            size = os.stat(f).st_size
        except FileNotFoundError as exc:
            # Unresolvable symlink, just ignore those
            print('%s is unresolvable, skipping (%s)' % (f, exc))
            continue
        if size == 0:
            continue
        if size > 1024 ** 2:
            print("Skipping `%s' because it's over 1MiB" % f)
            continue
        try:
            with open(f, 'rb') as fp:
                data = fp.read()
        except OSError as exc:
            print("Error: Unable to read `%s': %s" % (f, exc))
            continue
        if is_binary(data):
            print("Skipping `%s' because it looks binary" % f)
            continue
        data = data.split(b'\n')
        fixed_indent = False
        for i, line in enumerate(data):
            # Fix indentation: strip the leading tabs and count them
            repl_count = 0
            while line.startswith(indent_find):
                fixed_indent = True
                repl_count += 1
                line = line[len(indent_find):]
            # Store the re-indented line back into the list
            if repl_count > 0:
                data[i] = indent_replace * repl_count + line
        # Nothing changed: leave the file (and its mtime) alone
        if not fixed_indent:
            continue
        try:
            with open(f, 'wb') as fp:
                fp.write(b'\n'.join(data))
        except OSError as exc:
            print("Error: Unable to write to `%s': %s" % (f, exc))


if __name__ == '__main__':
    allfiles = []
    for root, dirs, files in os.walk(os.getcwd()):
        for f in files:
            allfiles.append(os.path.join(root, f))
    run(allfiles)
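The ARG_MAX concern raised in this answer is also why tools like xargs split a long file list across several command invocations. A minimal sketch of that batching idea follows. batch_args and its byte budget limit are hypothetical; a real budget would start from os.sysconf("SC_ARG_MAX") and leave headroom for the environment and the argument pointer array, and counting each argument as its encoded length plus a terminating NUL is only an approximation.

```python
from typing import Iterable, Iterator, List


def batch_args(paths: Iterable[str], limit: int) -> Iterator[List[str]]:
    # Group paths so each batch's total size (argument bytes plus one NUL
    # terminator each) stays within `limit`, roughly what xargs does.
    batch: List[str] = []
    size = 0
    for p in paths:
        cost = len(p.encode()) + 1
        if batch and size + cost > limit:
            yield batch
            batch, size = [], 0
        # A single path larger than the budget still gets a batch of its own.
        batch.append(p)
        size += cost
    if batch:
        yield batch
```

Each batch could then be handed to one invocation of whatever command processes the files, so the total never depends on how many files the tree contains.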
I like the "find" example above for the recursive application. To adapt it to be non-recursive, only changing files in the current directory that match a wildcard, the shell glob expansion can be sufficient for small numbers of files:
ls *.java | awk '{print "expand -t 4 ", $0, " > /tmp/e; mv /tmp/e ", $0}' | sh -v
If you want it silent after you trust that it works, just drop the -v on the sh command at the end.
Of course you can pick any set of files in the first command. For example, list only a particular subdirectory (or directories) in a controlled manner like this:
ls mod/*/*.php | awk '{print "expand -t 4 ", $0, " > /tmp/e; mv /tmp/e ", $0}' | sh
Or in turn run find(1) with some combination of depth parameters etc:
find mod/ -mindepth 1 -maxdepth 2 -name '*.php' | awk '{print "expand -t 4 ", $0, " > /tmp/e; mv /tmp/e ", $0}' | sh
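The awk programs above paste each file name into a shell command unquoted, so a name containing a space or a shell metacharacter would be split or misinterpreted by sh. A small Python sketch of generating the same command lines with proper quoting, using the standard library's shlex.quote; expand_command is a hypothetical helper name.

```python
import shlex


def expand_command(path: str, tabsize: int = 4) -> str:
    # Build the same "expand ... > /tmp/e; mv /tmp/e ..." line the awk
    # program prints, but quote the file name so sh treats it as one word.
    q = shlex.quote(path)
    return f"expand -t {tabsize} {q} > /tmp/e; mv /tmp/e {q}"
```

For example, printing "\n".join(expand_command(p) for p in sorted(glob.glob("*.java"))) and piping the result to sh -v would behave like the first ls | awk | sh command, including for names such as 'my file.java'.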
My recommendation is to use:
find . -name '*.lua' -exec ex '+%s/\t/ /g' -cwq {} \;
Comments:
sed is a stream editor. Use ex for in-place editing. This avoids creating extra temp files and spawning shells for each replacement as in the top answer.
Don't use find|xargs instead of find -exec. As pointed out by @gniourf-gniourf, this leads to problems with spaces, quotes and control chars in file names; cf. Wheeler.
You can use find with the tabs-to-spaces package for this.
First, install tabs-to-spaces:
npm install -g tabs-to-spaces
then, run this command from the root directory of your project:
find . -name '*' -exec t2s --spaces 2 {} \;
This will replace every tab character with 2 spaces in every file.
I used astyle to re-indent all my C/C++ code after finding mixed tabs and spaces. It also has options to force a particular brace style if you'd like.
One can use vim for that:
find -type f \( -name '*.css' -o -name '*.html' -o -name '*.js' -o -name '*.php' \) -execdir vim -c retab -c wq {} \;
As Carpetsmoker stated, it will retab according to your vim settings, and according to modelines in the files, if any. Also, it will replace tabs not only at the beginning of lines, which is generally not what you want; e.g., you might have literals containing tabs.
Download and run the following script to recursively convert hard tabs to soft tabs in plain text files.
Execute the script from inside the folder which contains the plain text files.
#!/bin/bash
find . -type f -not -path './.git/*' -exec grep -Iq . {} \; -print | while IFS= read -r file; do
echo "Converting... $file"
data=$(expand --initial -t 4 "$file")
rm "$file"
echo "$data" > "$file"
done
Git repository friendly method
git-tab-to-space() (
d="$(mktemp -d)"
git grep --cached -Il '' | grep -E "${1:-.}" | \
xargs -I'{}' bash -c '\
f="${1}/f" \
&& expand -t 4 "$0" > "$f" && \
chmod --reference="$0" "$f" && \
mv "$f" "$0"' \
'{}' "$d" \
;
rmdir "$d"
)
Act on all files under the current directory:
git-tab-to-space
Act only on C or C++ files:
git-tab-to-space '\.(c|h)(|pp)$'
You will likely want to restrict the file set like this, notably because of those annoying Makefiles, which require tabs.
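The command git grep --cached -Il '':
lists only tracked files, so nothing inside .git is touched
skips binary files (-I), which would otherwise be corrupted
as explained at: How to list all text (non-binary) files in a git repository?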
chmod --reference keeps the file permissions unchanged: https://unix.stackexchange.com/questions/20645/clone-ownership-and-permissions-from-another-file
Unfortunately I can't find a succinct POSIX alternative.
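Outside the shell, Python's standard library offers a portable counterpart to chmod --reference: shutil.copymode copies the permission bits from one path to another. The sketch below applies it to the same write-to-temp-then-rename pattern used by git-tab-to-space; expand_in_place is a hypothetical helper, and it expands all tabs with str.expandtabs as a stand-in for expand -t 4.

```python
import os
import shutil
import tempfile


def expand_in_place(path: str, tabsize: int = 4) -> None:
    with open(path, "r", encoding="utf-8", newline="") as f:
        text = f.read()
    # Create the temporary file in the same directory so os.replace() is a
    # rename within a single filesystem.
    fd, tmp = tempfile.mkstemp(dir=os.path.dirname(os.path.abspath(path)))
    try:
        with os.fdopen(fd, "w", encoding="utf-8", newline="") as f:
            f.write(text.expandtabs(tabsize))
        # Copy the permission bits from the original, like chmod --reference.
        shutil.copymode(path, tmp)
        os.replace(tmp, path)
    except BaseException:
        # Don't leave a stray temporary file behind on failure.
        os.unlink(tmp)
        raise
```

Without the copymode call the result would keep mkstemp's restrictive 0600 mode, which is the same problem chmod --reference solves in the shell version.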
If your codebase had the crazy idea to allow functional raw tabs in strings, use:
expand -i
and then have fun going over all non-start-of-line tabs one by one, which you can list with: Is it possible to git grep for tabs?
Tested on Ubuntu 18.04.
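After expand -i, the tabs that remain are the ones after the first non-blank character of a line. A small Python sketch that lists their positions, as an alternative to the git grep approach linked above; inner_tabs is a hypothetical helper, and it reports 1-based line numbers and 1-based character positions.

```python
from typing import List, Tuple


def inner_tabs(text: str) -> List[Tuple[int, int]]:
    # Report (line, character position) of every tab that comes after the
    # first non-blank character of its line: the tabs expand -i leaves alone.
    hits = []
    for lineno, line in enumerate(text.splitlines(), start=1):
        stripped = line.lstrip(" \t")
        offset = len(line) - len(stripped)
        for col, ch in enumerate(stripped, start=offset + 1):
            if ch == "\t":
                hits.append((lineno, col))
    return hits
```

For example, inner_tabs('\tfoo\n  x = "a\tb"\n') flags only the tab inside the string literal on line 2.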
The use of expand as suggested in other answers seems the most logical approach for this task alone.
That said, it can also be done with Bash and Awk in case you want to make some other modifications along with it.
If using Bash 4.0 or greater, the shopt option globstar can be used to search recursively with **.
With GNU Awk version 4.1 or greater, sed-like "in-place" file modifications can be made:
shopt -s globstar
gawk -i inplace '{gsub("\t"," ")}1' **/*.ext
In case you want to set the number of spaces per tab:
gawk -i inplace -v n=4 'BEGIN{for(i=1;i<=n;i++) c=c" "}{gsub("\t",c)}1' **/*.ext
Use the vim way:
$ ex +'bufdo retab' -cxa **/*.*
globstar (**) for recursion, activate by shopt -s globstar.
To restrict it to one file type, specify the extension, e.g. **/*.c.
To modify tabstop, add +'set ts=2'.
However, the downside is that it can replace tabs inside strings.
So for a slightly better solution (by using substitution), try:
$ ex -s +'bufdo %s/^\t\+/ /ge' -cxa **/*.*
Or by using the ex editor + expand utility:
$ ex -s +'bufdo!%!expand -t2' -cxa **/*.*
For trailing spaces, see: How to remove trailing whitespaces for multiple files?
You may add the following function into your .bash_profile:
# Convert tabs to spaces.
# Usage: retab *.*
# See: https://stackoverflow.com/q/11094383/55075
retab() {
ex +'set ts=2' +'bufdo retab' -cxa "$@"
}
pr is a wonderful utility for this. See this answer. – codeforester