3 votes

I wrote a pipeline to build my Java application with Maven. I have feature branches and a master branch in my Git repository, so I have to separate the Maven goals package and deploy. Therefore I created two jobs in my pipeline. The last job needs the job results from the first job.
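A simplified sketch of the kind of pipeline I mean (job and stage names are only illustrative):

stages:
  - build
  - release

package:
  stage: build
  script:
    - mvn package

deploy:
  stage: release
  script:
    - mvn deploy
  # deploy only runs on the master branch
  only:
    - master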

I know that I have to cache the job results, but I don't want to

  • expose the job results to the GitLab UI
  • expose them to the next run of the pipeline

I tried the following solutions, without success.

Using cache

I followed How to deploy Maven projects to Artifactory with GitLab CI/CD:

Caching the .m2/repository folder (where all the Maven files are stored), and the target folder (where our application will be created), is useful for speeding up the process by running all Maven phases in a sequential order, therefore, executing mvn test will automatically run mvn compile if necessary.
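In .gitlab-ci.yml, that corresponds to roughly the following (a sketch based on that guide):

variables:
  # keep Maven's local repository inside the project directory so it can be cached
  MAVEN_OPTS: "-Dmaven.repo.local=.m2/repository"

cache:
  paths:
    - .m2/repository/
    - target/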

but this solution shares job results between pipelines; see Cache dependencies in GitLab CI/CD:

If caching is enabled, it’s shared between pipelines and jobs at the project level by default, starting from GitLab 9.0. Caches are not shared across projects.

and it also should not be used for passing job results within the same pipeline; see Cache vs artifacts:

Don’t use caching for passing artifacts between stages, as it is designed to store runtime dependencies needed to compile the project:

cache: For storing project dependencies

Caches are used to speed up runs of a given job in subsequent pipelines, by storing downloaded dependencies so that they don’t have to be fetched from the internet again (like npm packages, Go vendor packages, etc.) While the cache could be configured to pass intermediate build results between stages, this should be done with artifacts instead.

artifacts: Use for stage results that will be passed between stages.

Artifacts are files generated by a job which are stored and uploaded, and can then be fetched and used by jobs in later stages of the same pipeline. This data will not be available in different pipelines, but is available to be downloaded from the UI.

Using artifacts

This solution exposes the job results in the GitLab UI; see artifacts:

The artifacts will be sent to GitLab after the job finishes and will be available for download in the GitLab UI.
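For example, passing the target folder to later jobs would look roughly like this (job name is illustrative):

package:
  stage: build
  script:
    - mvn package
  artifacts:
    # upload the build output so later jobs can fetch it
    paths:
      - target/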

and there is no way to expire the artifacts right after the pipeline finishes; see artifacts:expire_in:

The value of expire_in is an elapsed time in seconds, unless a unit is provided.

Is there any way to cache job results only for the running pipeline?

The normal way to send build artifacts between jobs in a pipeline is by using artifacts. You write that you don't want to "expose the job results to GitLab UI"; what does that mean exactly? The build artifacts are available for download from the GitLab UI for the time you put into expire_in, and the build logs of your jobs are also stored in the GitLab UI. Is there some kind of security concern here? – MrBerta
@MrBerta You can browse and download artifacts of a job in the GitLab UI. That is at least unnecessary for intermediate data. Nobody wants to see that data, so it wastes space and performance. In some cases it could be a security issue, but right now I have no security issue. No sensitive data. – dur
@MrBerta BTW: It is also a little confusing that one part of the documentation recommends cache and the other part artifacts. There is room for improvement. – dur
It is the way that GitLab's CI architecture is set up. They want to support distributed runners that only communicate through the main GitLab instance. This is why all jobs push artifacts to the main instance and other jobs retrieve them from there. I also think that the cache works the same way, in that it uploads files to the main GitLab instance. The idea behind the cache is to be able to reuse external things that your pipeline needs and that can be shared between different pipelines. Do you have very large build artifacts? Otherwise artifacts are quite convenient. – MrBerta
Alright, so I wrote the comment before checking the documentation about cache... It is stored where the runner is installed or on S3. I will read a little bit about it. – MrBerta

1 Answer

2 votes

There is no way to send build artifacts between jobs in GitLab that keeps them only as long as the pipeline is running. This is how GitLab has designed its CI solution.

The recommended way to send build artifacts between jobs in GitLab is to use artifacts. This feature always uploads the files to the GitLab instance, which GitLab calls the coordinator in this case. These files are available through the GitLab UI, as you write. For most cases this is a complete waste of space, but in rare cases it is very useful, as you can download the artifacts and check why your pipeline broke.

The artifacts are available for download by project members who are at least Reporters, but they can be viewed by everybody if public pipelines are enabled. You can read more about permissions here.

To avoid filling up your hard disk or quotas, you should use expire_in. You could set it to just a few hours if you really don't want to waste space. I would not recommend this, though: if a job that depends on these artifacts fails and you retry it after the artifacts have expired, you will have to restart the whole pipeline. I usually set this to one week for intermediate build artifacts, as that often fits my needs.
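For example, inside the job definition (one week is just the value that usually fits my needs):

artifacts:
  paths:
    - target/
  # delete the uploaded artifacts after one week
  expire_in: 1 week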

If you want to use caches for keeping build artifacts, maybe because your build artifacts are huge and you need to optimize, it should be possible to use CI_PIPELINE_ID as the key of the cache (I haven't tested this):

cache:
  key: ${CI_PIPELINE_ID}
  paths:
    - target/

The files in the cache should be stored where your runner is installed. If you make sure that all jobs that need these build artifacts are executed by runners that have access to this cache, it should work.
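Also untested, but a fuller sketch of this approach could look like the following, using cache policies so that the packaging job only uploads the cache and the deploying job only downloads it (job names are illustrative):

package:
  stage: build
  script:
    - mvn package
  cache:
    key: ${CI_PIPELINE_ID}
    paths:
      - target/
    # push: only upload the cache after the job, never download it
    policy: push

deploy:
  stage: release
  script:
    - mvn deploy
  cache:
    key: ${CI_PIPELINE_ID}
    paths:
      - target/
    # pull: only download the cache before the job, never upload it
    policy: pull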

You could also try some of the other predefined environment variables as the key of your cache.