6
votes

I am fairly new to GitLab CI and I've been trying different approaches to use the node_modules directory throughout my entire pipeline. From what I've read in the official docs, cache and artifacts seem to be valid approaches to pass files between jobs:

cache is used to specify a list of files and directories which should be cached between jobs. You can only use paths that are within the project workspace.

However, my issue with the caching method is that the node_modules would be persisted between pipelines by default:

  • cache can be set globally and per-job.
  • from GitLab 9.0, caching is enabled and shared between pipelines and jobs by default.

I do not want to persist the node_modules between pipelines. What I actually want is to trigger a fresh install with npm in my setup stage and then allow all further jobs in the pipeline to use these modules. Hence, I started using artifacts instead of cache, which is described similarly:

artifacts is used to specify a list of files and directories which should be attached to the job after success. [...]

The artifacts will be sent to GitLab after the job finishes successfully and will be available for download in the GitLab UI. The dependency feature should be used in conjunction with artifacts and allows you to define the artifacts to pass between different jobs.

The artifact/dependency method seems usable in my case. However, both cache and artifacts are extremely inefficient and slow: the node_modules are installed and usable, but the entire directory then gets uploaded somewhere and re-downloaded before every subsequent job. (I would really love to know what happens here... Where do the modules go?)

Is there a better approach to run npm install only once at the beginning of the pipeline and then keep the node_modules in the pipeline during its entire runtime? I do not want to keep the node_modules after all jobs are finished so they don't need to be uploaded or downloaded anywhere.

Sample pipeline configuration file to reproduce the behavior:

image: node:lts

stages:
  - setup
  - build
  - test

node:
  stage: setup
  script:
    - npm install
  artifacts:
    paths:
      - node_modules/

build:
  stage: build
  script:
    - npm run build
  dependencies:
    - node

test:
  stage: test
  script:
    - npm run lint
    - npm run test
  dependencies:
    - node

1 Answer

2
votes

Where do the modules go?

By default, artifacts are saved on the main GitLab machine:

/var/opt/gitlab/gitlab-rails/shared/artifacts

Is there a better approach to run npm install only once at the beginning of the pipeline and then keep the node_modules in the pipeline during its entire runtime?

There are some options that you can try:

  1. Merge the setup and build stages into a single stage (see the merged-stage sketch below).

  2. Use a local npm cache on the builder machines for faster npm install times, or install through a private npm proxy registry such as Nexus or Artifactory (see the npm cache sketch below).

  3. Check whether the main GitLab machine and the builders are on the same network, so artifact uploads and downloads are faster.

  4. Consider packaging your build in Docker. You will get reusable Docker images across your GitLab stages, although there is the overhead of pushing the images to a Docker registry (see the Docker sketch below).
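
A minimal sketch of the merged-stage idea (option 1), assuming the same npm scripts as in the question; the dist/ path is a placeholder for whatever npm run build actually produces. Installing and building in one job means node_modules never has to leave that job, and only the much smaller build output is passed on as an artifact:

image: node:lts

stages:
  - build
  - test

build:
  stage: build
  script:
    - npm ci             # fresh install in the same job that needs it
    - npm run build
  artifacts:
    paths:
      - dist/            # placeholder for your actual build output directory

test:
  stage: test
  script:
    - npm ci             # reinstall here instead of transferring node_modules
    - npm run lint
    - npm run test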
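
A sketch of option 2, approximating a local npm cache with GitLab's cache feature and pointing npm at a hypothetical private proxy registry (replace the URL with your Nexus/Artifactory endpoint). Caching npm's tarball cache rather than node_modules keeps installs fast without persisting the installed modules themselves between pipelines:

variables:
  npm_config_cache: "$CI_PROJECT_DIR/.npm"   # npm reads npm_config_* env vars; keep its tarball cache in the workspace

build:
  stage: build
  script:
    # optionally install through a private proxy registry (hypothetical internal URL)
    - npm config set registry https://nexus.example.com/repository/npm-proxy/
    - npm ci
    - npm run build
  cache:
    paths:
      - .npm/            # cache downloaded tarballs between runs, not the installed node_modules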
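
And a rough sketch of option 4, assuming a runner with Docker-in-Docker, the predefined CI_REGISTRY* variables, and a Dockerfile that runs npm ci on top of node:lts. The setup stage bakes the dependencies into an image tagged per pipeline, and later jobs run inside that image instead of downloading artifacts:

build-image:
  stage: setup
  image: docker:latest
  services:
    - docker:dind
  script:
    # bake node_modules into an image tagged per pipeline (assumes a suitable Dockerfile in the repo)
    - docker login -u "$CI_REGISTRY_USER" -p "$CI_REGISTRY_PASSWORD" "$CI_REGISTRY"
    - docker build -t "$CI_REGISTRY_IMAGE/deps:$CI_PIPELINE_ID" .
    - docker push "$CI_REGISTRY_IMAGE/deps:$CI_PIPELINE_ID"

test:
  stage: test
  image: "$CI_REGISTRY_IMAGE/deps:$CI_PIPELINE_ID"
  script:
    - npm run lint
    - npm run test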