6
votes

How can I remove unwanted files in an S3 bucket as the output of a pipeline in CodePipeline, using CodeBuild's buildspec.yml file?

For example:

The build folder of a GitHub repo is put in the designated S3 bucket so the bucket can be used as a static website.

I pushed a file earlier to the bucket which I don't need anymore. How do I use the buildspec.yml file to "clean" the bucket before pushing the artifacts of my pipeline to the bucket?

An example buildspec.yml file:

version: 0.2

phases:
  build:
    commands:
      - mkdir build-output
      - find . -type d -name public -exec cp -R {} build-output \;
      - find . -mindepth 1 -name build-output -prune -o -exec rm -rf {} +
  post_build:
    commands:
      - mv build-output/**/* ./
      - mv build-output/* ./
      - rm -R build-output
artifacts:
  files:
    - '**/*'

Should the command:

rm -rf *

go in the build phase, like this?

build:
  commands:
    - aws s3 rm s3://mybucket/ --recursive

And how do I reference the right bucket instead of hardcoding the name in the file?

1
Do you need to remove unnecessary files from the bucket every time or is it just a one-off operation? - Milan Cermak
I don't have to, but if the build changes often, I don't want to leave too many unused files in prod, so I think it's best to "clean" the bucket every time the artefacts are pushed to it. Just my thought, I'm not sure if you and the others would do the same. - Viet

1 Answer

12
votes

To delete the files in the S3 bucket, you can use the aws s3 rm --recursive command as you already alluded to.

Rather than hardcoding the bucket name, you can pass it to CodeBuild as an environment variable on the build project, for example in a CloudFormation template:

ArtifactsBucket:
  Type: AWS::S3::Bucket
  Properties:
    BucketName: my-artifacts

CodeBuildProject:
  Type: AWS::CodeBuild::Project
  Properties:
    Environment:
      EnvironmentVariables:
        - Name: ARTIFACTS_BUCKET
          Value: !Ref ArtifactsBucket
          Type: PLAINTEXT
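
The delete only succeeds if the CodeBuild service role is allowed to list and remove objects in that bucket. A minimal sketch of such a policy follows. It assumes the project's role is declared in the same template under the hypothetical logical ID CodeBuildRole; the resource and policy names are illustrative as well:

```yaml
CodeBuildCleanupPolicy:                # illustrative logical ID
  Type: AWS::IAM::Policy
  Properties:
    PolicyName: allow-bucket-cleanup   # illustrative name
    Roles:
      - !Ref CodeBuildRole             # assumed: the role used by CodeBuildProject
    PolicyDocument:
      Version: "2012-10-17"
      Statement:
        # Listing lets the CLI enumerate the keys for --recursive
        - Effect: Allow
          Action: s3:ListBucket
          Resource: !GetAtt ArtifactsBucket.Arn
        # Deleting applies to the objects themselves, hence the /* suffix
        - Effect: Allow
          Action: s3:DeleteObject
          Resource: !Sub "${ArtifactsBucket.Arn}/*"
```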

In the buildspec, you can then refer to the ARTIFACTS_BUCKET env var, for example:

build:
  commands:
    - aws s3 rm --recursive "s3://${ARTIFACTS_BUCKET}/" 
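
Put together, a complete buildspec could look roughly like the sketch below. It assumes ARTIFACTS_BUCKET is set on the project as described above. The guard command, the placeholder build step, and the choice of the pre_build phase are assumptions of this sketch, not something CodeBuild requires:

```yaml
version: 0.2

phases:
  pre_build:
    commands:
      # Fail the build early if the variable is missing or empty
      - test -n "${ARTIFACTS_BUCKET}"
      # Empty the bucket before the new artifacts are built and deployed
      - aws s3 rm --recursive "s3://${ARTIFACTS_BUCKET}/"
  build:
    commands:
      # Placeholder: replace with the real build steps
      - echo "build steps go here"
artifacts:
  files:
    - '**/*'
```

With this ordering the bucket stays empty from the cleanup until the pipeline's deploy step finishes, so the site is briefly unavailable on every run.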

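Alternatively, you can declare a lifecycle rule on the bucket. For example, the rule below expires every object 30 days after it was created. Because the rule has no prefix or filter, it applies to all objects in the bucket, including ones the site still serves: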

ArtifactsBucket:
  Type: AWS::S3::Bucket
  Properties:
    BucketName: my-artifacts
    LifecycleConfiguration:
      Rules:
        - ExpirationInDays: 30
          Id: Expire objects in 30 days
          Status: Enabled