It seems that ls doesn't sort the files correctly when doing a recursive call:
ls -altR . | head -n 3
How can I find the most recently modified file in a directory (including subdirectories)?
find . -type f -printf '%T@ %p\n' \
| sort -n | tail -1 | cut -f2- -d" "
For a huge tree, it might be hard for sort to keep everything in memory.
%T@ gives you the modification time as a Unix timestamp, sort -n sorts numerically, tail -1 takes the last line (highest timestamp), and cut -f2- -d" " cuts away the first field (the timestamp) from the output.
Edit: Just as -printf is probably GNU-only, ajreal's usage of stat -c is too. It is possible to do the same on BSD, but the formatting options are different (-f "%m %N", it would seem).
And I missed the plural part; if you want more than the latest file, just bump up the tail argument.
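As a minimal sketch of bumping the tail argument, the pipeline can be wrapped in a small function. The name newest_n and its two arguments are made up for illustration, and it assumes GNU find (for -printf) and file names without newlines:

```shell
# newest_n DIR [N]: print the N most recently modified files under DIR,
# oldest first, newest last. newest_n is a hypothetical helper name.
newest_n() {
    find "${1:-.}" -type f -printf '%T@ %p\n' |
        sort -n |
        tail -n "${2:-1}" |
        cut -d' ' -f2-
}
```

With this, newest_n . 5 would list the five newest files under the current directory.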
Following up on @plundra's answer, here's the BSD and OS X version:
find . -type f -print0 \
| xargs -0 stat -f "%m %N" \
| sort -rn | head -1 | cut -f2- -d" "
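If the same script has to run on both GNU and BSD systems, one option is to probe which stat flavour is installed. This is only a sketch: latest_file is a hypothetical name, and using stat -c as the probe is an assumption, not something from the answers above. It uses find -exec ... {} + instead of xargs.

```shell
# latest_file DIR: print the most recently modified file under DIR.
# latest_file is a hypothetical name; probing with "stat -c" is an
# assumed way to tell GNU stat from BSD stat.
latest_file() {
    if stat -c '%Y' . >/dev/null 2>&1; then
        # GNU stat: %Y = mtime in seconds since the epoch, %n = file name
        find "${1:-.}" -type f -exec stat -c '%Y %n' {} +
    else
        # BSD / OS X stat: %m = mtime in seconds since the epoch, %N = name
        find "${1:-.}" -type f -exec stat -f '%m %N' {} +
    fi | sort -rn | head -n 1 | cut -d' ' -f2-
}
```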
Instead of sorting the results and keeping only the last modified ones, you could use awk to print only the one with greatest modification time (in unix time):
find . -type f -printf "%T@\0%p\0" | awk '
{
if ($0>max) {
max=$0;
getline mostrecent
} else
getline
}
END{print mostrecent}' RS='\0'
This should be a faster way to solve your problem if the number of files is big enough.
I have used the NUL character (i.e. '\0') because, theoretically, a filename may contain any character (including space and newline) except that one.
If you don't have such pathological filenames in your system you can use the newline character as well:
find . -type f -printf "%T@\n%p\n" | awk '
{
if ($0>max) {
max=$0;
getline mostrecent
} else
getline
}
END{print mostrecent}' RS='\n'
In addition, this works in mawk too.
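The same single-pass idea can also be written without getline, by keeping the timestamp and the path on one line. This is only a sketch: the function name newest_awk is made up, it assumes GNU find, and, like the newline variant, it assumes there are no newlines in file names:

```shell
# newest_awk DIR: print the most recently modified file under DIR
# without sorting. newest_awk is a hypothetical name.
newest_awk() {
    find "${1:-.}" -type f -printf '%T@ %p\n' | awk '
        NR == 1 || $1 + 0 > max {
            max = $1 + 0              # newest timestamp seen so far
            sub(/^[^ ]+ /, "")        # drop the timestamp, keep the path
            newest = $0
        }
        END { if (NR > 0) print newest }'
}
```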
I had trouble finding the last modified file under Solaris 10. There, find does not have the -printf option and stat is not available. I discovered the following solution, which works well for me:
find . -type f | sed 's/.*/"&"/' | xargs ls -E | awk '{ print $6," ",$7 }' | sort | tail -1
To show the filename as well use
find . -type f | sed 's/.*/"&"/' | xargs ls -E | awk '{ print $6," ",$7," ",$9 }' | sort | tail -1
Explanation
find . -type f: finds and lists all files
sed 's/.*/"&"/': wraps the pathname in quotes to handle whitespace
xargs ls -E: sends the quoted path to ls; the -E option makes sure that a full timestamp (format year-month-day hour-minute-seconds-nanoseconds) is returned
awk '{ print $6," ",$7 }': extracts only date and time
awk '{ print $6," ",$7," ",$9 }': extracts date, time and filename
sort: returns the files sorted by date
tail -1: returns only the last modified file
I use something similar all the time, as well as the top-k list of most recently modified files. For large directory trees, it can be much faster to avoid sorting. In the case of just top-1 most recently modified file:
find . -type f -printf '%T@ %p\n' | perl -ne '@a=split(/\s+/, $_, 2); ($t,$f)=@a if $a[0]>$t; print $f if eof()'
On a directory containing 1.7 million files, I get the most recent one in 3.4s, a speed-up of 7.5x against the 25.5s solution using sort.
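For the top-k case, one way to avoid sorting the whole list is to keep only k candidates while streaming through the find output, and sort just those at the end. The sketch below uses awk rather than perl; the name top_k is hypothetical, k is assumed to be at least 1, and GNU find and newline-free file names are assumed. No timing claim is made for it.

```shell
# top_k DIR [K]: print the K newest files under DIR, newest first, keeping
# only K candidates in memory instead of sorting the whole list.
# top_k is a hypothetical name; K is assumed to be at least 1.
top_k() {
    find "${1:-.}" -type f -printf '%T@ %p\n' | awk -v k="${2:-5}" '
        {
            stamp = $1                    # timestamp exactly as find printed it
            sub(/^[^ ]+ /, "")            # $0 is now just the path
            if (n < k) {                  # still filling the first k slots
                n++; slot = n
            } else {
                slot = 1                  # locate the oldest kept entry
                for (i = 2; i <= k; i++)
                    if (ts[i] < ts[slot]) slot = i
                if (stamp + 0 <= ts[slot]) next
            }
            ts[slot] = stamp + 0; raw[slot] = stamp; path[slot] = $0
        }
        END { for (i = 1; i <= n; i++) print raw[i], path[i] }' |
        sort -rn | cut -d' ' -f2-
}
```

The final sort only ever sees at most k lines, so its cost does not grow with the size of the tree.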
On Ubuntu 13, the following does it, maybe a tad faster, as it reverses the sort and uses 'head' instead of 'tail', reducing the work. To show the 11 newest files in a tree:
find . -type f -printf '%T@ %p\n' | sort -n -r | head -n 11 | cut -f2- -d" " | sed -e 's,^\./,,' | xargs ls -U -l
This gives a complete ls listing without re-sorting and omits the annoying './' that 'find' puts on every file name.
Or, as a bash function:
treecent () {
local numl
if [[ 0 -eq $# ]] ; then
numl=11 # Or whatever default you want.
else
numl=$1
fi
find . -type f -printf '%T@ %p\n' | sort -n -r | head -n "${numl}" | cut -f2- -d" " | sed -e 's,^\./,,' | xargs ls -U -l
}
Still, most of the work was done by plundra's original solution. Thanks plundra.
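Since xargs splits its input on whitespace, file names containing spaces can trip up the pipeline. A possible variant, sketched here under the assumption that GNU find, sort, head, cut and xargs are available (the -z and -0 options are GNU extensions, and head -z and cut -z need a reasonably recent coreutils), passes NUL-terminated records all the way through. The name treecent_z is hypothetical, and this version does not strip the leading './':

```shell
# treecent_z [N]: like treecent, but passes NUL-terminated records through
# the whole pipeline. treecent_z is a hypothetical name.
treecent_z() {
    find . -type f -printf '%T@ %p\0' |
        sort -z -n -r |
        head -z -n "${1:-11}" |
        cut -z -d' ' -f2- |
        xargs -0 -r ls -U -l
}
```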
I faced the same issue: I needed to find the most recent file recursively, and find took around 50 minutes.
Here is a little script to do it faster:
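#!/bin/sh
CURRENT_DIR='.'
zob () {
    # Newest entry (file or directory) directly inside CURRENT_DIR.
    FILE=$(ls -Art1 "${CURRENT_DIR}" | tail -n 1)
    # Descend while the newest entry is a directory; test it relative to
    # CURRENT_DIR, not the working directory.
    if [ -n "${FILE}" ] && [ -d "${CURRENT_DIR}/${FILE}" ]; then
        CURRENT_DIR="${CURRENT_DIR}/${FILE}"
        zob
    fi
    # Print the full path of the entry that was reached.
    echo "${CURRENT_DIR}/${FILE}"
    exit
}
zob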
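It's a recursive function that gets the most recently modified item of a directory. If this item is a directory, the function calls itself and searches that directory, and so on. Note that a directory's modification time only changes when entries are added, removed or renamed in it, so this follows the newest entry at each level rather than scanning every file.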
I find the following shorter and with more interpretable output:
find . -type f -printf '%TF %TT %p\n' | sort | tail -1
Given the fixed length of the standardised ISO format datetimes, lexicographical sorting is fine and we don't need the -n option on the sort.
If you want to remove the timestamps again, you can use:
find . -type f -printf '%TFT%TT %p\n' | sort | tail -1 | cut -f2- -d' '
I wrote a pypi/github package for this question because I needed a solution as well.
https://github.com/bucknerns/logtail
Install:
pip install logtail
Usage: tails changed files
logtail <log dir> [<glob match: default=*.log>]
Usage2: Opens latest changed file in editor
editlatest <log dir> [<glob match: default=*.log>]
To search for files in /target_directory and all its sub-directories, that have been modified in the last 60 minutes:
$ find /target_directory -type f -mmin -60
To find the most recently modified files, sorted in the reverse order of update time (i.e., the most recently updated files first):
$ find /etc -type f -printf '%TY-%Tm-%Td %TT %p\n' | sort -r
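The two commands can be combined, for example to list only the files changed within the last N minutes, newest first. A small sketch, assuming GNU find; the function name recent_within and its arguments are made up for illustration:

```shell
# recent_within DIR MINUTES: files under DIR modified less than MINUTES
# minutes ago, newest first. recent_within is a hypothetical name.
recent_within() {
    find "$1" -type f -mmin "-$2" -printf '%TY-%Tm-%Td %TT %p\n' | sort -r
}
```

For instance, recent_within /target_directory 60 would show the last hour's changes with their timestamps.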
After using a find-based solution for years, I found myself wanting the ability to exclude directories like .git.
I switched to this rsync-based solution. Put this in ~/bin/findlatest:
#!/bin/sh
# Finds most recently modified files.
rsync -rL --list-only "$@" | grep -v '^d' | sort -k3,4r | head -5
Now findlatest . will list the 5 most recently modified files, and findlatest --exclude .git . will list the 5 excluding ones in .git.
This works by taking advantage of some little-used rsync functionality: "if a single source arg is specified [to rsync] without a destination, the files are listed in an output format similar to ls -l" (rsync man page).
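For comparison, find itself can skip a directory such as .git with -prune. A hedged sketch, assuming GNU find for -printf; the function name findlatest_prune and the fixed count of 5 are illustrative only:

```shell
# findlatest_prune DIR: the 5 newest files under DIR, newest first,
# skipping anything named .git. findlatest_prune is a hypothetical name.
findlatest_prune() {
    find "${1:-.}" -name .git -prune -o -type f -printf '%T@ %p\n' |
        sort -rn | head -n 5 | cut -d' ' -f2-
}
```

Unlike the rsync version, this does not accept arbitrary exclude or filter arguments; each extra directory needs its own -name test.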
The ability to take rsync args is useful in conjunction with rsync-based backup tools. For instance I use rsnapshot, and I back up an application directory with this rsnapshot.conf line:
backup /var/atlassian/application-data/jira/current/ home +rsync_long_args=--archive --filter="merge /opt/atlassian/jira/current/backups/rsync-excludes"
where rsync-excludes lists directories I don't want to back up:
- log/
- logs/
- analytics-logs/
- tmp/
- monitor/*.rrd4j
I can now see the latest files that will be backed up with:
findlatest /var/atlassian/application-data/jira/current/ --filter="merge /opt/atlassian/jira/current/backups/rsync-excludes"
Here is how to find and list the latest modified files in a directory with subdirectories. Hidden files are ignored on purpose. The time format can be customised.
$ find . -type f -not -path '*/\.*' -printf '%TY.%Tm.%Td %THh%TM %Ta %p\n' |sort -nr |head -n 10
Handles spaces in filenames well — not that these should be used!
2017.01.25 18h23 Wed ./indenting/Shifting blocks visually.mht
2016.12.11 12h33 Sun ./tabs/Converting tabs to spaces.mht
2016.12.02 01h46 Fri ./advocacy/2016.Vim or Emacs - Which text editor do you prefer?.mht
2016.11.09 17h05 Wed ./Word count - Vim Tips Wiki.mht
More find galore following the link.