Container Security Scanning in CI/CD Pipelines
If you’re not scanning container images before they hit production, it’s only a matter of time before something ugly shows up in your environment. I learned this the hard way, and I’m going to walk you through exactly how I set up container security scanning in CI/CD pipelines so you don’t repeat my mistakes.
The Wake-Up Call
About two years ago, I was running a handful of microservices on ECS. Everything was humming along. Deployments were smooth, monitoring looked clean, the team was shipping features weekly. Life was good.
Then one Tuesday morning, our security team ran an ad-hoc scan across our container registry. They found a critical CVE in the base image of our main API service — a remote code execution vulnerability in OpenSSL that had been publicly disclosed three months earlier. Three months. That image had been rebuilt and redeployed dozens of times during that window, each time inheriting the same vulnerable base layer because nobody was checking.
The fix itself took twenty minutes: update the base image, rebuild, redeploy. But the conversation with leadership about why a known critical vulnerability sat in production for a quarter? That took considerably longer.
That incident changed how I think about container pipelines entirely. Scanning isn’t optional. It’s not a “nice to have” you bolt on later. It belongs in the pipeline from day one, and it needs to break builds when it finds something serious.
Why Pipeline-Integrated Scanning Matters
You might already be following container security best practices — minimal base images, non-root users, read-only filesystems. That’s all essential. But static best practices don’t catch the CVE that gets disclosed next Wednesday in a library three layers deep in your dependency tree.
Scanning at build time catches vulnerabilities before they reach any environment. Scanning at the registry level catches drift. Scanning at runtime catches things that slipped through both. You want all three, but the pipeline is where you get the most leverage because it’s the chokepoint every image passes through.
The key insight: scanning needs to be automated, opinionated, and blocking. If it’s a manual step someone has to remember, it won’t happen. If it reports warnings that nobody reads, it’s theatre. The scanner needs to fail the build when policy is violated.
Choosing a Scanner
There are several solid options. I’ve used all of these in production and they each have tradeoffs.
Trivy is my default recommendation for most teams. It’s open source, fast, covers OS packages and application dependencies, and has zero configuration to get started. Aqua Security maintains it actively and the vulnerability database updates are frequent.
Grype from Anchore is another strong open-source option. It’s slightly more opinionated about output formats and integrates well if you’re already in the Anchore ecosystem. I find Trivy’s ecosystem broader, but Grype is perfectly solid.
Snyk Container is the commercial option I reach for when teams need a dashboard, policy management across repos, and integration with developer workflows. It’s not cheap, but for larger organisations the centralised visibility is worth it.
AWS ECR Image Scanning comes in two flavours — basic scanning powered by Clair, and enhanced scanning powered by Amazon Inspector. If you’re already pushing images to ECR, enhanced scanning is worth enabling. It provides continuous monitoring, not just point-in-time scans.
I typically run Trivy in the pipeline as a gate, and ECR enhanced scanning as a continuous backstop. Belt and suspenders.
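Before wiring anything into CI, it's worth running the same gate locally to see what it catches. This mirrors the pipeline configuration that follows; `myapp:latest` is a placeholder image name:

```shell
# Same gate, run locally: non-zero exit code if any fixable
# CRITICAL or HIGH CVE is found in the image.
trivy image \
  --severity CRITICAL,HIGH \
  --ignore-unfixed \
  --exit-code 1 \
  myapp:latest
```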
Trivy in GitHub Actions
If you’re already building CI/CD pipelines with GitHub Actions, adding Trivy is straightforward. Here’s what a real scanning job looks like:
name: Build and Scan

on:
  push:
    branches: [main]
  pull_request:
    branches: [main]

jobs:
  build-and-scan:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4

      - name: Build image
        run: |
          docker build -t myapp:${{ github.sha }} .

      - name: Run Trivy vulnerability scanner
        uses: aquasecurity/trivy-action@master
        with:
          image-ref: myapp:${{ github.sha }}
          format: table
          exit-code: 1
          severity: CRITICAL,HIGH
          ignore-unfixed: true

      - name: Run Trivy and upload SARIF
        uses: aquasecurity/trivy-action@master
        if: always()
        with:
          image-ref: myapp:${{ github.sha }}
          format: sarif
          output: trivy-results.sarif
          severity: CRITICAL,HIGH

      - name: Upload scan results
        uses: github/codeql-action/upload-sarif@v3
        if: always()
        with:
          sarif_file: trivy-results.sarif
A few things to note here. The exit-code: 1 is what makes this a gate — the step fails if vulnerabilities matching the severity threshold are found. I set ignore-unfixed: true because there’s no point failing a build over a vulnerability that has no available fix yet. You can’t patch what doesn’t have a patch.
The second Trivy step runs with if: always() so it still produces the SARIF report even when the first step fails. That report gets uploaded to GitHub’s Security tab, giving you a nice UI for triaging findings.
Grype in GitLab CI
For GitLab, here’s how I wire up Grype:
stages:
  - build
  - scan
  - push

build:
  stage: build
  image: docker:24
  services:
    - docker:24-dind
  script:
    - docker build -t $CI_REGISTRY_IMAGE:$CI_COMMIT_SHA .
    - docker save $CI_REGISTRY_IMAGE:$CI_COMMIT_SHA -o image.tar
  artifacts:
    paths:
      - image.tar

scan:
  stage: scan
  image:
    name: anchore/grype:latest
    entrypoint: [""]   # the image's default entrypoint is grype itself; clear it so GitLab can run the script
  script:
    - grype docker-archive:image.tar --fail-on critical --output table
    - grype docker-archive:image.tar --output json > grype-report.json
  artifacts:
    paths:
      - grype-report.json
    when: always

push:
  stage: push
  image: docker:24
  services:
    - docker:24-dind
  script:
    - docker login -u $CI_REGISTRY_USER -p $CI_REGISTRY_PASSWORD $CI_REGISTRY
    - docker load -i image.tar
    - docker push $CI_REGISTRY_IMAGE:$CI_COMMIT_SHA
  only:
    - main
The --fail-on critical flag is the gate. The image only gets pushed if the scan passes. I save the image as a tarball between stages so we’re scanning the exact same artifact we built — no pulling from a registry, no race conditions.
Scanning Your Dockerfile and Dependencies Too
Image scanning catches vulnerabilities in what’s already built. But you can shift even further left by scanning the Dockerfile itself and your dependency lock files before the build even happens.
Trivy handles this natively:
- name: Scan Dockerfile for misconfigurations
  uses: aquasecurity/trivy-action@master
  with:
    scan-type: config
    scan-ref: .
    exit-code: 1
    severity: CRITICAL,HIGH

- name: Scan dependency files
  uses: aquasecurity/trivy-action@master
  with:
    scan-type: fs
    scan-ref: .
    exit-code: 1
    severity: CRITICAL
The config scan catches things like running as root, using latest tags, or exposing unnecessary ports. The filesystem scan catches vulnerable packages in your package-lock.json, go.sum, requirements.txt, or whatever your language uses. Both run before docker build, so you get faster feedback.
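Both scans are also available directly from the Trivy CLI, which is handy for catching issues locally before you even push (this assumes you run from the repository root):

```shell
# Misconfiguration scan: Dockerfiles, Kubernetes manifests, Terraform, etc.
trivy config --exit-code 1 --severity CRITICAL,HIGH .

# Filesystem scan: vulnerable packages in lock files
# (package-lock.json, go.sum, requirements.txt, ...)
trivy fs --exit-code 1 --severity CRITICAL .
```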
If you’re already building production-ready Docker images with multi-stage builds, you’re in a good position — smaller images mean a smaller attack surface and fewer things for the scanner to flag.
AWS ECR Enhanced Scanning
If your images end up in ECR — and if you’re on AWS, they probably should — enhanced scanning with Amazon Inspector adds a layer of continuous monitoring that pipeline scans alone can’t provide.
Pipeline scans are point-in-time. They tell you the image was clean when it was built. Enhanced scanning re-evaluates images when new CVEs are published, which is exactly the scenario that bit me with that three-month-old OpenSSL vulnerability.
Enable it at the registry level:
# CloudFormation
Resources:
  ECRRepository:
    Type: AWS::ECR::Repository
    Properties:
      RepositoryName: myapp
      ImageScanningConfiguration:
        ScanOnPush: true

  RegistryScanningConfig:
    Type: AWS::ECR::RegistryScanningConfiguration
    Properties:
      ScanType: ENHANCED
      Rules:
        - RepositoryFilters:
            - Filter: "*"
              FilterType: WILDCARD
          ScanFrequency: CONTINUOUS_SCAN
With enhanced scanning, Inspector continuously monitors your images and publishes findings to Security Hub. You can set up EventBridge rules to alert on critical findings:
  CriticalFindingRule:
    Type: AWS::Events::Rule
    Properties:
      EventPattern:
        source:
          - aws.inspector2
        detail-type:
          - Inspector2 Finding
        detail:
          severity:
            - CRITICAL
          status:
            - ACTIVE
      Targets:
        - Arn: !Ref AlertSNSTopic
          Id: critical-finding-alert
This closes the loop. Pipeline scanning catches known issues at build time. ECR enhanced scanning catches new disclosures against images already in your registry. Together, they cover the full lifecycle.
Policy Enforcement That Actually Works
Scanning without enforcement is just generating reports nobody reads. I’ve seen teams set up beautiful scanning pipelines that produce detailed vulnerability reports, and then deploy anyway because “we’ll fix it next sprint.” Next sprint never comes.
Here’s my approach to policy that actually sticks:
Hard gates for critical and high severity. If Trivy or Grype finds a critical or high CVE with a fix available, the build fails. No exceptions, no override buttons. This sounds harsh until you realise that most critical findings are fixed by bumping a base image version — a five-minute change.
Allowlisting with expiry dates. Sometimes you genuinely can’t fix something immediately. Maybe the fix introduces a breaking change that needs testing, or the vulnerability is in a component you don’t actually expose. For these cases, I use a .trivyignore file with comments explaining why and when it should be revisited:
# CVE-2024-XXXXX - not exploitable in our configuration, revisit by 2026-04-15
CVE-2024-XXXXX
# CVE-2024-YYYYY - fix requires major version bump, scheduled for sprint 47
CVE-2024-YYYYY
Track allowlist growth. If your ignore file keeps growing, something is wrong. Either your base images are too old, your dependencies are unmaintained, or your team is using the allowlist as a way to avoid doing the work. I review the ignore file in every sprint retrospective.
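A small script can enforce the expiry convention mechanically rather than relying on retrospectives alone. This is a sketch: the `revisit by YYYY-MM-DD` comment format is the convention from this post, not anything Trivy itself understands, so adapt the regex to whatever wording your team uses:

```python
# Flag .trivyignore entries whose "revisit by YYYY-MM-DD" date has passed.
# The comment convention is this post's, not a Trivy feature.
import re
from datetime import date

REVISIT = re.compile(r"revisit by (\d{4}-\d{2}-\d{2})")

def expired_entries(lines, today=None):
    """Return (cve_id, deadline) pairs whose revisit date is in the past."""
    today = today or date.today()
    expired = []
    deadline = None  # parsed from the comment line preceding a CVE line
    for line in lines:
        line = line.strip()
        if line.startswith("#"):
            match = REVISIT.search(line)
            deadline = date.fromisoformat(match.group(1)) if match else None
        elif line:
            if deadline and deadline < today:
                expired.append((line, deadline))
            deadline = None  # a CVE line consumes the pending deadline
    return expired

# Usage in CI (fail the job when entries are stale):
#   stale = expired_entries(open(".trivyignore").readlines())
#   if stale: raise SystemExit(f"stale allowlist entries: {stale}")
```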
Separate thresholds for different environments. I’m stricter about what goes to production than what runs in dev. A medium-severity finding might be acceptable in a development environment but not in prod. You can handle this with environment-specific severity flags:
scan-dev:
  script:
    - trivy image --exit-code 1 --severity CRITICAL $IMAGE

scan-prod:
  script:
    - trivy image --exit-code 1 --severity CRITICAL,HIGH $IMAGE
Handling the Noise
One of the biggest reasons teams abandon security scanning is alert fatigue. You enable scanning, get hit with 200 findings on day one, and everyone decides the tool is useless.
A few strategies that have worked for me:
Start with critical only. Get the pipeline working, get the team used to fixing critical findings, then gradually lower the threshold. Going from zero scanning to blocking on medium-severity findings overnight is a recipe for revolt.
Use ignore-unfixed. There’s nothing more demoralising than a build that fails because of a vulnerability nobody can fix yet. Trivy’s --ignore-unfixed flag filters these out. You’re still aware of them (they show up in the SARIF report), but they don’t block deployments.
Keep base images current. Half the findings I see in practice come from stale base images. If you’re building on node:18.12 when node:18.20 is available, you’re carrying months of accumulated vulnerabilities. Automate base image updates with Dependabot or Renovate. This pairs well with the multi-stage build patterns I covered in my post on building production-ready Docker images.
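For reference, a minimal Dependabot configuration in `.github/dependabot.yml` is enough to get PRs whenever the base image in your Dockerfile falls behind:

```yaml
version: 2
updates:
  - package-ecosystem: "docker"
    directory: "/"          # where the Dockerfile lives
    schedule:
      interval: "weekly"
```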
Separate OS findings from application findings. A vulnerability in libsystemd matters a lot less in a distroless container that doesn’t include systemd. Context matters, and good scanners let you filter by package type.
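With Trivy, that filtering is a single flag. Recent versions call it `--pkg-types`; older releases used `--vuln-type` for the same purpose, so check your installed version:

```shell
# Only report vulnerabilities in application dependencies, not OS packages
# (on older Trivy versions, substitute --vuln-type for --pkg-types).
trivy image --pkg-types library --severity CRITICAL,HIGH myapp:latest

# Conversely, OS packages only:
trivy image --pkg-types os --severity CRITICAL,HIGH myapp:latest
```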
A Complete Pipeline Example
Pulling it all together, here’s what a comprehensive scanning pipeline looks like in GitHub Actions. This is close to what I run in production:
name: Secure Build Pipeline

on:
  push:
    branches: [main]
  pull_request:

env:
  IMAGE: myapp
  ECR_REGISTRY: 123456789012.dkr.ecr.eu-west-1.amazonaws.com

jobs:
  pre-build-scan:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4

      - name: Scan IaC and Dockerfiles
        uses: aquasecurity/trivy-action@master
        with:
          scan-type: config
          scan-ref: .
          exit-code: 1
          severity: CRITICAL,HIGH

      - name: Scan dependencies
        uses: aquasecurity/trivy-action@master
        with:
          scan-type: fs
          scan-ref: .
          exit-code: 1
          severity: CRITICAL

  build-and-scan:
    needs: pre-build-scan
    runs-on: ubuntu-latest
    permissions:
      id-token: write
      contents: read
      security-events: write
    steps:
      - uses: actions/checkout@v4

      - name: Configure AWS credentials
        uses: aws-actions/configure-aws-credentials@v4
        with:
          role-to-assume: arn:aws:iam::123456789012:role/github-actions
          aws-region: eu-west-1

      - name: Login to ECR
        uses: aws-actions/amazon-ecr-login@v2

      - name: Build image
        run: docker build -t $ECR_REGISTRY/$IMAGE:${{ github.sha }} .

      - name: Scan image
        uses: aquasecurity/trivy-action@master
        with:
          image-ref: ${{ env.ECR_REGISTRY }}/${{ env.IMAGE }}:${{ github.sha }}
          format: table
          exit-code: 1
          severity: CRITICAL,HIGH
          ignore-unfixed: true
          trivyignores: .trivyignore

      - name: Generate SARIF report
        uses: aquasecurity/trivy-action@master
        if: always()
        with:
          image-ref: ${{ env.ECR_REGISTRY }}/${{ env.IMAGE }}:${{ github.sha }}
          format: sarif
          output: trivy-results.sarif

      - name: Upload SARIF
        uses: github/codeql-action/upload-sarif@v3
        if: always()
        with:
          sarif_file: trivy-results.sarif

      - name: Push to ECR
        if: github.ref == 'refs/heads/main'
        run: docker push $ECR_REGISTRY/$IMAGE:${{ github.sha }}
The pre-build scan catches Dockerfile misconfigurations and dependency vulnerabilities before we even build the image. The image scan catches everything else. The SARIF upload gives visibility in GitHub’s Security tab. And the push only happens on main, only after all scans pass.
Making It Stick
The technical setup is the easy part. The hard part is making security scanning a habit that the team actually values rather than resents.
A few things that helped my teams:
Make the feedback loop fast. If scanning adds ten minutes to every build, people will find ways around it. Trivy scans most images in under thirty seconds. If your scans are slow, check whether you’re re-downloading the vulnerability database every run — cache it.
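In GitHub Actions, one way to do that is caching Trivy's default cache directory between runs. A sketch, assuming Linux runners (where `~/.cache/trivy` is Trivy's default cache location; the key naming is just a convention):

```yaml
- name: Cache Trivy vulnerability DB
  uses: actions/cache@v4
  with:
    path: ~/.cache/trivy
    key: trivy-db-${{ github.run_id }}
    restore-keys: |
      trivy-db-
```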
Celebrate fixes, not findings. Nobody wants to be the person whose commit broke the build because of a security finding. Frame it differently: the pipeline caught something before it reached production. That’s a win.
Share context on findings. When a build fails, the developer needs to understand what the vulnerability is, whether it’s actually exploitable in their context, and what the fix is. Trivy’s table output is decent for this. For more detail, link to the CVE database entry.
Review and iterate. Your scanning policy isn’t set-and-forget. New tools emerge, new vulnerability categories matter, and your team’s tolerance for friction will change. Revisit your configuration quarterly.
Container security scanning in CI/CD isn’t glamorous work. Nobody’s writing blog posts about the CVE that didn’t make it to production because a pipeline caught it. But that’s exactly the point — the best security incidents are the ones that never happen. If you’re following a DevSecOps approach, pipeline scanning is one of the highest-leverage practices you can adopt. Start with Trivy, set exit-code: 1, and go from there.