18
votes

I have an Amazon S3 bucket that has tens of thousands of files in it. What's the easiest way to get a list of all the files, or a text file that lists all the filenames in the bucket?

I have tried listObjects(), but it seems that it only lists 1000 files.

amazon-s3-returns-only-1000-entries-for-one-bucket-and-all-for-another-bucket-u S3-Provider-does-not-get-more-than-1000-items-from-bucket

--> Listing Keys Using the AWS SDK for PHP, but in the AWS docs I read:

max-keys - string - Optional - The maximum number of results returned by the method call. The returned list will contain no more results than the specified value, but may return fewer. The default value is 1000.

AWS DOC FOR list_objects

Is there some way to list them all and print the list to a text file using the AWS PHP SDK?

Possible duplicate: quick-way-to-list-all-files-in-amazon-s3-bucket

I have reposted the question because I am looking for a solution in PHP.

Code :

$s3Client = S3Client::factory(array('key' => $access, 'secret' => $secret));

$response = $s3Client->listObjects(array('Bucket' => $bucket, 'MaxKeys' => 1000, 'Prefix' => 'files/'));
$files = $response->getPath('Contents');
foreach ($files as $file) {
    $filename = $file['Key'];
    print "\n\nFilename: " . $filename;
}
3
Note that in newer versions of the PHP SDK the client must be created like this instead: $s3Client = S3Client::factory(array('credentials' => array('key' => $access, 'secret' => $secret))); - TheStoryCoder
@TheStoryCoder: Thanks for the information - Hitesh

3 Answers

18
votes

To get more than 1000 objects, you must make multiple requests using the Marker parameter to tell S3 where you left off for each request. Using the Iterators feature of the AWS SDK for PHP makes it easier to get all of your objects, because it encapsulates the logic of making multiple API requests. Try this:

$objects = $s3Client->getListObjectsIterator(array(
    'Bucket' => $bucket,
    'Prefix' => 'files/'
));

foreach ($objects as $object) {
    echo $object['Key'] . "\n";
}

With the latest PHP SDK (as of March 2016), the code must be written like this instead:

$objects = $s3Client->getIterator('ListObjects', array(
    'Bucket' => $bucket,
    'Prefix' => 'files/'
));

foreach ($objects as $object) {
    echo $object['Key'] . "\n";
}
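
For reference, the manual Marker loop that the iterator hides can be sketched roughly as below. This is only an illustration, not the SDK's own implementation: `listAllKeys`, `fakeListObjects` and the `$listPage` callable are hypothetical names, and the in-memory stand-in exists only so the sketch runs without the SDK or real credentials. It assumes each page exposes `Contents` and `IsTruncated` as the ListObjects response does, and resumes from the last key of the previous page.

```php
<?php
// Sketch of Marker-based pagination. $listPage is a hypothetical callable
// standing in for a real call, for example:
//   function ($args) use ($s3Client) { return $s3Client->listObjects($args); }
function listAllKeys(callable $listPage, array $args)
{
    $keys = array();
    do {
        $page = $listPage($args);
        $contents = isset($page['Contents']) ? $page['Contents'] : array();
        foreach ($contents as $object) {
            $keys[] = $object['Key'];
        }
        // Stop when S3 reports no more results (or the page came back empty)
        $truncated = !empty($page['IsTruncated']) && count($contents) > 0;
        if ($truncated) {
            // Resume after the last key returned by this page
            $last = end($contents);
            $args['Marker'] = $last['Key'];
        }
    } while ($truncated);
    return $keys;
}

// In-memory stand-in for S3 (an assumption for this demo only): returns keys
// in sorted order, after the marker, at most $pageSize per call.
function fakeListObjects(array $allKeys, $pageSize)
{
    return function (array $args) use ($allKeys, $pageSize) {
        $marker = isset($args['Marker']) ? $args['Marker'] : '';
        $prefix = isset($args['Prefix']) ? $args['Prefix'] : '';
        $matching = array();
        foreach ($allKeys as $key) {
            $hasPrefix = ($prefix === '') || strpos($key, $prefix) === 0;
            if ($hasPrefix && strcmp($key, $marker) > 0) {
                $matching[] = $key;
            }
        }
        sort($matching, SORT_STRING);
        $contents = array();
        foreach (array_slice($matching, 0, $pageSize) as $key) {
            $contents[] = array('Key' => $key);
        }
        return array(
            'Contents' => $contents,
            'IsTruncated' => count($matching) > $pageSize,
        );
    };
}

// Demo: 2500 keys under files/ plus one key outside the prefix
$allKeys = array('other/skip.txt');
for ($i = 0; $i < 2500; $i++) {
    $allKeys[] = sprintf('files/%05d.txt', $i);
}
$keys = listAllKeys(
    fakeListObjects($allKeys, 1000),
    array('Bucket' => 'my-bucket', 'Prefix' => 'files/')
);
echo count($keys) . " keys listed\n";
```

In practice the iterator shown above is the simpler choice; the loop is only useful if you need to checkpoint the marker between requests.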
5
votes

Use a paginator to get all the files:

    $client = new S3Client([
        'version' => AWS_S3_CLIENT_FACTORY_VERSION,
        'region' => AWS_S3_CLIENT_FACTORY_REGION,

    ]);
    $objects = $client->getPaginator('ListObjects', ['Bucket' => "my-bucket"]);
    foreach ($objects as $listResponse) {
        $items = $listResponse->search("Contents[?starts_with(Key,'path/to/folder/')]");
        foreach($items as $item) {
            echo $item['Key'] . PHP_EOL;
        }
    }

To get all files, change the search to:

$listResponse->search("Contents[*]");
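
Since the question also asks for a text file, here is a minimal sketch of streaming the keys to disk one line at a time instead of echoing them. `writeKeysToFile` and `keysFromPages` are hypothetical helper names, and the `$pages` array is a stand-in shaped like ListObjects responses so the example runs without the SDK; with the SDK, the paginator from the code above would presumably take its place.

```php
<?php
// Hypothetical helper: writes one key per line and returns the number written.
// $keys can be an array or any Traversable (for example a generator).
function writeKeysToFile($keys, $path)
{
    $handle = fopen($path, 'w');
    if ($handle === false) {
        throw new RuntimeException("Cannot open $path for writing");
    }
    $count = 0;
    foreach ($keys as $key) {
        fwrite($handle, $key . PHP_EOL);
        $count++;
    }
    fclose($handle);
    return $count;
}

// Hypothetical helper: flattens pages shaped like ListObjects responses
// into a stream of keys, so the whole list never has to sit in memory.
function keysFromPages($pages)
{
    foreach ($pages as $page) {
        $contents = isset($page['Contents']) ? $page['Contents'] : array();
        foreach ($contents as $object) {
            yield $object['Key'];
        }
    }
}

// Stand-in pages for the demo (an assumption, not real S3 output)
$pages = array(
    array('Contents' => array(
        array('Key' => 'path/to/folder/a.txt'),
        array('Key' => 'path/to/folder/b.txt'),
    )),
    array('Contents' => array(
        array('Key' => 'path/to/folder/c.txt'),
    )),
);

$path = tempnam(sys_get_temp_dir(), 'keys');
$written = writeKeysToFile(keysFromPages($pages), $path);
echo "Wrote $written keys to $path\n";
```

Writing line by line keeps memory use flat, which matters once the bucket holds tens of thousands of keys.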
1
votes

The code below is just a workaround for this problem. I pointed it at my CDN bucket folder, which contains many folders named alphabetically (a-z and A-Z), so I made multiple requests, one per letter prefix, to list all the files.

This code lists mp4, pdf, png, jpg, or all files:

// Letter ranges a-z and A-Z, used as one-character key prefixes
$letters = array_merge(range('a', 'z'), range('A', 'Z'));
// Running total of matching files
$total = 0;
// Text file that receives the list of URLs
$File = "CDNFileList.txt";
// URLs collected across all prefixes
$file_list = array();

// Selected value of the dropdown list
$selectedoption = $_POST['cdn_dropdown_list'];
$labels = array(
    'pdf' => 'PDF DOCUMENTS',
    'jpg' => 'JPEG IMAGES',
    'png' => 'PNG IMAGES',
    'mp4' => 'MP4 VIDEOS',
    'all' => 'ALL CONTENTS',
);
$file_ext = isset($labels[$selectedoption]) ? $labels[$selectedoption] : '';

// Creating table
echo "<table style='width:300px' border='1'><tr><th colspan='2'>List of $file_ext</th></tr><tr><td><b>Name of the File</b></td><td><b>URL of the file</b></td></tr>";

// $s3Client, $bucket and $baseUrl are assumed to be defined earlier
foreach ($letters as $value) {
    // One request per prefix; each request still returns at most 1000 keys
    $response = $s3Client->listObjects(array('Bucket' => $bucket, 'MaxKeys' => 1000, 'Prefix' => 'files/'.$value));
    $files = $response->getPath('Contents');
    if (!$files) {
        continue; // no keys under this prefix
    }
    foreach ($files as $file) {
        $filename = $file['Key'];
        $filetype = strtolower(pathinfo($filename, PATHINFO_EXTENSION));
        if ('all' == $selectedoption || $filetype == $selectedoption) {
            $file_name = pathinfo($filename, PATHINFO_FILENAME);
            $url = htmlspecialchars($baseUrl.$filename, ENT_QUOTES);
            echo "<tr><td>".htmlspecialchars($file_name)."</td><td><a href='$url' target='_blank'>$url</a></td></tr>";
            array_push($file_list, $baseUrl.$filename.PHP_EOL);
            $total++;
        }
    }
}
echo "</table><br/>";
// Write the collected URLs to the text file (the original never wrote $File)
file_put_contents($File, $file_list);
print "\n\nTOTAL NO OF $file_ext ".$total;

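This is just a workaround for the 1000-key limit of a single listObjects call. Note that any single prefix holding more than 1000 objects will still be truncated, and keys that do not start with a letter are skipped. Hope it helps someone.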