9
votes

Has anyone had any success reading S3 buckets as subfolders?

folder1

-- subfolder2

---- file3

---- file4

-- file1

-- file2

folder2

-- subfolder3

-- file5

-- file6

My task is to read folder1. I expect to see subfolder2, file1 and file2, but NOT file3 or file4. Right now, because I restrict the bucket keys with prefix => 'folder1/', I still get file3 and file4, since they technically have the folder1/ prefix.

It seems the only way to really do this is to pull in all the keys under folder1 and then use string matching to exclude file3 and file4 from the results array.

Has anyone had experience doing this? I know FTP-style S3 clients like Transmit and Cyberduck must be doing this but it's not apparent from the S3 API itself.

Thanks in advance, Conrad

I've looked into both AWS::S3 and right_aws.

4
Thanks everyone for your answers. I ended up doing what coreyward suggested and calculated depth based on the count of '/' in the key. – chuboy
Here's my code: gist.github.com/797841 – chuboy
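For reference, the depth-counting approach described in the comment above can be sketched in plain Ruby (the key list is illustrative, no AWS calls involved):

```ruby
# Keep only the entries that sit directly under the given prefix.
# An entry is a direct child when the remainder of the key after
# the prefix contains no "/" (a file), or exactly one trailing
# "/" (a subfolder marker).
def direct_children(keys, prefix)
  keys.select do |key|
    next false unless key.start_with?(prefix)
    rest = key[prefix.length..-1]
    rest.count("/") == 0 || (rest.count("/") == 1 && rest.end_with?("/"))
  end
end

keys = [
  "folder1/subfolder2/",
  "folder1/subfolder2/file3",
  "folder1/subfolder2/file4",
  "folder1/file1",
  "folder1/file2",
]

p direct_children(keys, "folder1/")
# => ["folder1/subfolder2/", "folder1/file1", "folder1/file2"]
```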

4 Answers

7
votes

The S3 API has no notion of a folder. It does, however, allow for filenames with "/" in them, and it allows you to query with a prefix. You seem to be familiar with that already, but I just wanted to be clear.

When you query with a prefix of folder1/, S3 is going to return everything under that "folder". In order to manipulate only direct descendants, you are going to have to filter the results yourself in Ruby (pick your poison: reject or select). This isn't going to help performance (a common reason to use "folders" in S3), but it gets the job done.
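A minimal sketch of that filtering, assuming you already have the full key list back from a prefix query (key names are illustrative):

```ruby
prefix = "folder1/"
keys = [
  "folder1/file1",
  "folder1/file2",
  "folder1/subfolder2/file3",
  "folder1/subfolder2/file4",
]

# Strip the prefix so only the relative part of each key remains.
remainders = keys.map { |k| k[prefix.length..-1] }

# Direct files: no "/" left after stripping the prefix.
files = remainders.reject { |r| r.include?("/") }

# Immediate "subfolders": first path segment of the deeper keys.
subfolders = remainders.select { |r| r.include?("/") }
                       .map { |r| r.split("/").first + "/" }
                       .uniq

p files      # => ["file1", "file2"]
p subfolders # => ["subfolder2/"]
```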

1
votes

You've run into a limitation of the S3 API, and the only way to do this is to do the filtering on the client.

The best (and most performant) option would be to 'mirror' your S3 storage structure in a database, XML file, etc. and do your querying against that instead. Then just retrieve the files from S3 once the user has found the ones they want.
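One way to sketch such a mirror, using an in-memory hash standing in for a real database table (you would persist this and refresh it whenever objects are added or removed; the helper name is made up):

```ruby
require "set"

# Build a folder -> direct-children index from a flat key list.
def build_index(keys)
  index = Hash.new { |h, k| h[k] = Set.new }
  keys.each do |key|
    parts = key.split("/")
    parts.each_with_index do |part, i|
      parent = parts[0...i].join("/")
      parent += "/" unless parent.empty?
      # Intermediate segments are folders, so keep their trailing "/".
      child = (i == parts.size - 1) ? part : part + "/"
      index[parent] << child
    end
  end
  index
end

index = build_index([
  "folder1/file1",
  "folder1/subfolder2/file3",
  "folder2/file5",
])

p index["folder1/"].to_a
# => ["file1", "subfolder2/"]
```

Listing a "folder" is then a single hash lookup instead of an S3 round-trip plus client-side filtering.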

1
votes

Update: for version II of the AWS SDK

Amazon has now created iterators that allow for a 'prefixed search'. One can use this to emulate a directory/folder structure. For the structure in the question, something like the following should work (in PHP):

$client = S3Client::factory(array(
    'key'    => $this->aKey,
    'secret' => $this->sKey,
    'region' => $this->region,
));

$iterator = $client->getIterator('ListObjects', array(
    'Bucket' => 'folder1',
    'Prefix' => 'subfolder2/', // supposing that the forward slash has been used to emulate directories
));

foreach ($iterator as $object) {
    echo $object['Key'] . "\n"; // will echo only file3 and file4
}
0
votes

Here's a sample of using a Virtual File System with the S3 driver.

As said before, S3 has no concept of folders, but it provides the ability to fake them. Virtual File System uses these abilities to provide you with 'virtual folders':

http://alexeypetrushin.github.com/vfs/basics.html

http://alexeypetrushin.github.com/vfs/s3_basics.html