Jenkins in the Ops space is in general already painful. Lately the deprecation of the multiple-scms plugin caused some headache, because we relied heavily on it to generate pipelines in a Seedjob based on the structure inside secondary repositories. We more or less started from scratch and now ship parameterized pipelines defined in Jenkinsfiles in those secondary repositories. Basically that is the way it should be: you store the pipeline definition along with the code you'd like to execute. In our case that is mostly terraform and ansible.

Problem

Directory structure is roughly "stage" -> "project" -> "service". We'd like to have one pipeline job per project, which dynamically reads all service folder names and offers them as available parameters. A service folder is the smallest entity we manage with terraform in a separate state file.
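
To make that concrete, a hypothetical layout could look like the following. The service names here are made up for illustration; only the myproject-I-dev/prod names reappear further down.

$ tree -d -L 3 terraform
terraform
├── dev
│   └── myproject-I-dev
│       ├── service-a
│       └── service-b
└── prod
    └── myproject-I-prod
        ├── service-a
        └── service-b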

Now Jenkins pipelines are intentionally limited, but you can add some groovy at will if you whitelist its usage in Jenkins. You have to click through some security dialogs to make it work, though.
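
The approvals end up under Manage Jenkins -> In-process Script Approval. For the java.io.File based folder discovery in the Jenkinsfile below, the entries to approve look roughly like the following; treat them as an assumption, your Jenkins instance will list the exact signatures it wants approved.

new java.io.File java.lang.String
method java.io.File listFiles
method java.io.File isDirectory
method java.io.File getName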

Jenkinsfile

This is basically a commented version of the Jenkinsfile we now copy around as a template, to be manually adjusted per project.

// Syntax: https://jenkins.io/doc/book/pipeline/syntax/
// project name as we use it in the folder structure and job name
def TfProject = "myproject-I-dev"
// directory relative to the repo checkout inside the jenkins workspace
def jobDirectory = "terraform/dev/${TfProject}"
// informational string to describe the stage or project
def stageEnvDescription = "DEV"

/* Attention please, if you rebuild the Jenkins instance consider the following:

- You've to run this job at least *thrice*. It first has to check out the
repository, then you've to add permissions for the groovy part, and on
the third run you can gather the list of available terraform folders.

- As a safeguard the first folder name is always the invalid string
"choose-one". That prevents accidental execution of a random project.

- If you add a new terraform folder you've to run the "choose-one" dummy rollout so
the dynamic parameters pick up the new folder. */

/* Here we hardcode the path to the correct job workspace on the jenkins host, and
   discover the service folder list. We have to filter it slightly to avoid temporary folders created by Jenkins (like @tmp folders). */
List tffolder = new File("/var/lib/jenkins/jobs/terraform ${TfProject}/workspace/${jobDirectory}").listFiles().findAll { it.isDirectory() && it.name ==~ /(?i)[a-z0-9_-]+/ }*.name.sort()
/* ensure the "choose-one" dummy entry is always the first in the list, otherwise
   initial executions might execute something. By default the first parameter is
   used if none is selected */
tffolder.add(0,"choose-one")

pipeline {
    agent any
    /* Show a choice parameter with the service directory list we stored
       above in the variable tffolder */
    parameters {
        choice(name: "TFFOLDER", choices: tffolder)
    }
    // Configure logrotation and coloring.
    options {
        buildDiscarder(logRotator(daysToKeepStr: "30", numToKeepStr: "100"))
        ansiColor("xterm")
    }
    // Set some variables for terraform to pick up the right service account.
    environment {
        GOOGLE_CLOUD_KEYFILE_JSON = '/var/lib/jenkins/cicd.json'
        GOOGLE_APPLICATION_CREDENTIALS = '/var/lib/jenkins/cicd.json'
    }

stages {
    stage('TF Plan') {
    /* Make sure on every stage that we only execute if the
       choice parameter is not the dummy one. Ensures we
       can run the pipeline smoothly for re-reading the
       service directories. */
    when { expression { params.TFFOLDER != "choose-one" } }
    steps {
        /* Initialize terraform and generate a plan in the selected
           service folder. */
        dir("${params.TFFOLDER}") {
        sh 'terraform init -no-color -upgrade=true'
        sh 'terraform plan -no-color -out myplan'
        }
        // Read in the repo name we act on for informational output.
        script {
            remoteRepo = sh(returnStdout: true, script: 'git remote get-url origin').trim()
        }
        echo "INFO: job *${JOB_NAME}* in *${params.TFFOLDER}* on branch *${GIT_BRANCH}* of repo *${remoteRepo}*"
    }
    }
    stage('TF Apply') {
    /* Run terraform apply only after manual acknowledgement, we have to
       make sure that the when { } condition is actually evaluated before
       the input. Default is input before when. */
    when {
        beforeInput true
        expression { params.TFFOLDER != "choose-one" }
    }
    input {
        message "Cowboy would you really like to run **${JOB_NAME}** in **${params.TFFOLDER}**"
        ok "Apply ${JOB_NAME} to ${stageEnvDescription}"
    }
    steps {
        dir("${params.TFFOLDER}") {
        sh 'terraform apply -no-color -input=false myplan'
        }
    }
    }
}
    post {
        failure {
            // You can also alert noisy chat platforms on failures if you like.
            echo "job failed"
        }
    }
}

job-dsl side of the story

Having all those when { } conditions in the pipeline stages above allows us to create a dependency between successful Seedjob executions and the pipeline jobs: a successful Seedjob run simply triggers them. This is important because the Seedjob execution itself resets all pipeline jobs, so your dynamic parameters are gone. By making sure we can re-execute the jobs safely, and by doing that automatically, we still have up-to-date parameterized pipelines whenever the Seedjob ran successfully.

The job-dsl script looks like this:

import javaposse.jobdsl.dsl.DslScriptLoader;
import javaposse.jobdsl.plugin.JenkinsJobManagement;
import javaposse.jobdsl.plugin.ExecuteDslScripts;
def params = [
    // Defaults are repo: mycorp/admin, branch: master, jenkinsFilename: Jenkinsfile
    pipelineJobs: [
        [name: 'terraform myproject-I-dev', jenkinsFilename: 'terraform/dev/myproject-I-dev/Jenkinsfile', upstream: 'Seedjob'],
        [name: 'terraform myproject-I-prod', jenkinsFilename: 'terraform/prod/myproject-I-prod/Jenkinsfile', upstream: 'Seedjob'],
    ],
]

params.pipelineJobs.each { job ->
    pipelineJob(job.name) {
        definition {
            cpsScm {
                // assume admin and branch master as a default, look for Jenkinsfile
                def repo = job.repo ?: 'mycorp/admin'
                def branch = job.branch ?: 'master'
                def jenkinsFilename = job.jenkinsFilename ?: 'Jenkinsfile'
                scm {
                    git("ssh://git@github.com/${repo}.git", branch)
                }
                scriptPath(jenkinsFilename)
            }
        }
        properties {
            pipelineTriggers {
                triggers {
                    if(job.upstream) {
                        upstream {
                            upstreamProjects("${job.upstream}")
                            threshold('SUCCESS')
                        }
                    }
                }
            }
        }
    }
}

Disadvantages

There are still a bunch of disadvantages you've to consider:

Jenkins Rebuilds are Painful

In general we rebuild our Jenkins instances quite frequently. With the approach outlined here in place, you've to allow the groovy script execution after the first Seedjob execution, and then go through at least another round of running the job, allowing permissions, and running the job again, until it's finally all up and running.

Copy around Jenkinsfile

Whenever you create a new project you've to copy around Jenkinsfiles for each and every stage and modify the variables at the top accordingly.
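
In practice that is a plain copy plus a search and replace on the variables at the top, roughly like this; the myproject-II-dev name is just a made-up example:

$ cp terraform/dev/myproject-I-dev/Jenkinsfile terraform/dev/myproject-II-dev/Jenkinsfile
$ sed -i 's/myproject-I-dev/myproject-II-dev/g' terraform/dev/myproject-II-dev/Jenkinsfile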

Keep the Seedjob definitions and Jenkinsfile in Sync

You not only have to copy the Jenkinsfile around, but you also have to keep the variables and names in sync with what you define for the Seedjob. Sadly the pipeline env-vars are not available outside of the pipeline when we execute the groovy parts.

Kudos

This setup was crafted with a lot of help by Michael and Eric.

Posted Wed Dec 23 14:20:04 2020

The latest docker 20.10.x release unlocks the buildx subcommands which allow for some sugar, like building something in a container and dumping the result to your local directory in one command.

Dockerfile

FROM docker-registry.mycorp.com/debian-node:lts as builder
USER service
COPY . /opt/service
RUN cd /opt/service; npm install; npm run build

FROM scratch as dist
COPY --from=builder /opt/service/dist /

build with

docker buildx build --target=dist --output type=local,dest=$(pwd)/pages/ .

Here we build a page, copy the result with all assets from the /opt/service/dist directory to an empty image and dump it into the local pages directory.
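
For comparison, without buildx you'd roughly need a build plus a throwaway container to get the same files out of the image, something like:

$ docker build --target=builder -t service-builder .
$ docker create --name service-tmp service-builder
$ docker cp service-tmp:/opt/service/dist ./pages
$ docker rm service-tmp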

Posted Mon Dec 21 12:12:29 2020

Another note to myself before I forget about this nifty usage of socat again. I was looking for something to mock a serial device, similar to a microcontroller which usually ends up as /dev/ttyACM0 and might output some text. What I found is a very helpful post on stackoverflow showing an example utilizing socat.

$ socat -d -d pty,rawer pty,rawer
2020/12/20 21:37:53 socat[29130] N PTY is /dev/pts/8
2020/12/20 21:37:53 socat[29130] N PTY is /dev/pts/11
2020/12/20 21:37:53 socat[29130] N starting data transfer loop with FDs [5,5] and [7,7]

Write whatever you need to the second pty, here /dev/pts/11, e.g.

$ i=0; while :; do echo "foo: ${i}" > /dev/pts/11; let i++; sleep 5; done

Now you can listen with whatever you like, e.g. some tool you work on, on the first pty, here /dev/pts/8. For demonstration purposes just use cat:

$ cat /dev/pts/8
foo: 0
foo: 1

socat is an awesome tool. Looking through the manpage you need some knowledge about sockets, but it's incredibly versatile.
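
One addition that might be handy, if I read the pty address options correctly: the link option lets you pin each pty to a stable path, so you do not have to track the changing /dev/pts numbers between runs. The /tmp paths here are just an example:

$ socat -d -d pty,rawer,link=/tmp/ttyMOCK0 pty,rawer,link=/tmp/ttyMOCK1

Then your tool can always open /tmp/ttyMOCK0 while you write your test data to /tmp/ttyMOCK1.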

Posted Sun Dec 20 21:52:25 2020

One of the most awesome helpers I carry around in my ~/bin since the early '00s is the sanity.pl script written by Andreas Gohr. It just recently came back into use when I started to archive some awesome Corona-enforced live session music with youtube-dl.

Update: Francois Marier pointed out that Debian contains the detox package, which has a similar functionality.
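
For the curious, a detox run could look roughly like this; I'm assuming the -v and --dry-run flags from the manpage here, and the file names are made up:

$ detox --dry-run -v some-live-session-*.webm
$ detox -v some-live-session-*.webm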

Posted Wed Oct 14 15:56:05 2020

Now that GitHub released v1.0 of the gh cli tool, and this is all over HN, it might make sense to write a note about the clumsy aliases and shell functions I cobbled together over the past months. The background story is that my dayjob moved from Bitbucket to GitHub. From my point of view the WebUI for Bitbucket is mediocre, but the one at GitHub is just awful and painful to use, especially for PR processing. So I longed for the terminal and ended up with gh and wtfutil as a dashboard.

The setup we have is painful on its own, with several orgs and repos which are more like monorepos covering several corners of infrastructure, and some which are very focused on a single component. All workflows are anti-GitHub workflows: you must have permission on the repo, create a feature branch in that repo, and open a PR for the merge back into master.
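
Translated to the terminal, one roundtrip of that workflow looks roughly like this (the branch name is just an example); the functions below wrap parts of it:

$ git switch -c feature/some-change
$ git push --set-upstream origin HEAD
$ gh pr create --fill --base master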

gh functions and aliases

# setup a token with perms to everything, dealing with SAML is a PITA
export GITHUB_TOKEN="c0ffee4711"
# I use a light theme on my terminal, so adjust the gh theme
export GLAMOUR_STYLE="light"

#simple aliases to poke at a PR
alias gha="gh pr review --approve"
alias ghv="gh pr view"
alias ghd="gh pr diff"

### github support functions, most invoked with a PR ID as $1

#primary function to review PRs
function ghs {
    gh pr view ${1}
    gh pr checks ${1}
    gh pr diff ${1}
}

# very custom PR create function relying on ORG and TEAM settings hard coded
# main idea is to create the PR with my team directly assigned as reviewer
function ghc {
    if git status | grep -q 'Untracked'; then
        echo "ERROR: untracked files in branch"
        git status
        return 1
    fi
    git push --set-upstream origin HEAD
    gh pr create -f -r "$(git remote -v | grep push | grep -oE 'myorg-[a-z]+')/myteam"
}

# merge a PR and update master if we're not in a different branch
function ghm {
    gh pr merge -d -r ${1}
    if [[ "$(git rev-parse --abbrev-ref HEAD)" == "master" ]]; then
        git pull
    fi
}

# get an overview over the files changed in a PR
function ghf {
    gh pr diff ${1} | diffstat -l
}

# generate a link to a commit in the WebUI to pass on to someone else
# input is a git commit hash
function ghlink {
    local repo="$(git remote -v | grep -E "github.+push" | cut -d':' -f 2 | cut -d'.' -f 1)"
    echo "https://github.com/${repo}/commit/${1}"
}

Update 2020-10-14: create pr from a branch with multiple commits

Bitbucket had a nice PR creation feature by default: if you created a PR from a branch with multiple commits, it derived the title from the branch name and created a PR description based on all commit messages. I replicated this behaviour, and the description text is opened in an editor (via $EDITOR) for you to edit. Feels more native now, like a git commit. In honor of Bitbucket it currently derives the PR title from the branch name, though I'm wondering if that should be changed to something more helpful. Lacking ideas at the moment.

function ghbbc {
    if git status | grep -q 'Untracked'; then
        echo "ERROR: untracked files in branch"
        git status
        return 1
    fi
    local commitmsg="$(mktemp ${XDG_RUNTIME_DIR}/ghbbc_commit.XXXXXXX)"
    git log --pretty=format:"%B" origin.. > ${commitmsg}
    eval "${EDITOR} ${commitmsg}"
    git push --set-upstream origin HEAD
    gh pr create \
    -r "$(git remote -v | grep push | grep -oE 'myorg-[a-z]+')/myteam" \
    -b "$(cat ${commitmsg})" \
    -t "$(git rev-parse --abbrev-ref HEAD)"
    rm ${commitmsg}
}

wtfutil

I have a terminal covering half my screen with small dashboards listing PRs for the repos I care about. For other repos I reverted back to mail notifications, which get sorted and processed from time to time. A sample dashboard config looks like this:

github_admin:
  apiKey: "c0ffee4711"
  baseURL: ""
  customQueries:
    othersPRs:
      title: "Pull Requests"
      filter: "is:open is:pr -author:hoexter -label:dependencies"
  enabled: true
  enableStatus: true
  showOpenReviewRequests: false
  showStats: false
  position:
    top: 0
    left: 0
    height: 3
    width: 1
  refreshInterval: 30
  repositories:
    - "myorg/admin"
  uploadURL: ""
  username: "hoexter"
  type: github

The -label:dependencies is used here to filter out dependabot PRs in the dashboard.

Workflow

Look at a PR with ghv $ID; if it's ok, ACK it with gha $ID. Create a PR from a feature branch with ghc and later on merge it with ghm $ID. The $ID is retrieved from looking at my wtfutil based dashboard.

Security Considerations

The world is full of bad jokes. For the WebUI access I've the full array of pain with SAML auth, which expires too often, and 2nd factor verification for my account backed by a Yubikey. But to work with the CLI you basically need an API token with full access; everything else drives you insane. So I gave in and generated exactly that. The end result is that I now have an API token - which is basically a password - with full power, stored in config files and environment variables. So the security features created around the login are all void now. Was that the aim of it after all?
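
The only small mitigation I can think of is to not keep the token verbatim in a shell rc file, but pull it from a local secret store, e.g. pass, at shell startup. That does not change the power of the token, it just keeps it out of plain text config files. A sketch, assuming a pass entry named github/api-token:

export GITHUB_TOKEN="$(pass show github/api-token)"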

Posted Fri Sep 18 12:05:30 2020

This is just a "warn your brothers" post for those who use Cloudflare Bot Management, and have customers which use MITM boxes to break up TLS 1.3 connections.

Be aware that right now some heuristic rules in the Cloudflare Bot Management score TLS 1.3 requests made by some MITM boxes with 1 - which equals "we're 99.99% sure that this is non-human browser traffic". While technically correct - the TLS connection hitting the Cloudflare Edge node is not established by a browser - that does not help your customer if you block those requests. If you do something like blocking requests with a BM score of 1 at the Cloudflare Edge, you might want to reconsider that for the moment and send a captcha challenge instead. While that is not a lot nicer, and still pisses people off, you might find a balance there between protecting yourself and still keeping some customers.
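
For illustration, the firewall rule expression for that could look roughly like the following, assuming the cf.bot_management.score and cf.client.bot fields are exposed on your plan, with the rule action switched from Block to a (captcha) Challenge:

(cf.bot_management.score eq 1 and not cf.client.bot)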

I've a confirmation for this happening with Cisco WSA, but it's likely also the case with other vendors. Breaking up TLS 1.2 seems to be stealthy enough in those appliances that it's not detected, so this issue creeps in as more enterprises roll out modern browsers.

You can now insert your own rant here about how bad the client-server internet of 2020 is, how bad it is that some of us rely on Cloudflare, and that they have accumulated way too big a market share. But the world is as it is. :(

Posted Tue Sep 8 14:08:35 2020

Update

I have to stand corrected. noahm@ wrote me, because the Debian Cloud Image maintainer only ever included python explicitly in the Azure images. The most likely explanation for the change in the Google images is that Google just ported the last parts of their own software to python 3, and subsequently removed python 2.

With some relief one can conclude that it's only our own fault that we did not build our own images, which include all of our own dependencies. Take it as a reminder to always build your own images. Always. Be it VMs or docker. Build your own image.

Original Post

Fun in the morning: we realized that the Debian Cloud image builds dropped python 2 and that this propagated to the Google-provided Debian/buster images. So in case you use something like ansible, have so far assumed python 2 as the default interpreter, and installed additional python 2 modules to support ansible modules, you now have to either install python 2 again or just move on to python 3.

We just try to suffer through it now, and set interpreter_python = auto in our ansible.cfg to anticipate the new default behaviour, which is planned for ansible 2.12. See also https://docs.ansible.com/ansible/latest/reference_appendices/interpreter_discovery.html
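
For reference, the relevant snippet in ansible.cfg is just:

[defaults]
interpreter_python = auto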

Another lesson to learn here: the GCE Debian stable images are not stable. Blends in nicely with this rant, though it's not 100% a Google Cloud foul this time.

Posted Mon Aug 24 11:12:17 2020

Note to myself so I do not have to figure this out every few months when I've to dig out a WLAN PSK from my existing configuration.

Step 1: Figure out the UUID of the network:

$ nmcli con show
NAME                  UUID                                  TYPE      DEVICE          
br-59d010130b86       d8672d3d-7cf6-484f-9ef8-e6ec3e73bef7  bridge    br-59d010130b86 
FRITZ!Box 7411        1ed1cec1-f586-4e75-ba6d-c9f6f4cba6e2  wifi      wlp4s0
[...]

Step 2: Request to view the PSK for this network based on the UUID

$ nmcli --show-secrets --fields 802-11-wireless-security.psk con show '1ed1cec1-f586-4e75-ba6d-c9f6f4cba6e2'
802-11-wireless-security.psk:           0815471123420511111
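
If you only want the bare PSK, e.g. to pipe it somewhere, the terse output mode should also work; the connection name works just as well as the UUID:

$ nmcli -s -g 802-11-wireless-security.psk con show 'FRITZ!Box 7411'
0815471123420511111
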
Posted Fri Aug 14 10:11:34 2020

Supply chain attacks are a known issue, and lately there was also a discussion around the relevance of reproducible builds. Looking in comparison at an average IT org doing something with the internet, I believe the pressing problem is neither supply chain attacks nor a lack of reproducible builds. The real problem is the amount of prefabricated binaries, supplied by someone else and created in an unknown build environment with unknown tools, that the average IT org requires to do anything.

The Mess the World Runs on

By chance I had an opportunity to look at what some other people I know use, and here is the list I could compile by scratching just at the surface:

  • 80% of what HashiCorp releases. Vagrant, packer, nomad, terraform, just all of it. In the case of terraform of course with a bunch of providers and for Vagrant with machine images from the official registry.
  • Lots of ansible usecases, usually retrieved by pip.
  • Jenkins + a myriad of plugins from the Jenkins plugin registry.
  • All the tools/SDKs of a cloud provider du jour to interface with the Cloud. Mostly via 3rd party Debian repository.
  • docker (the repo for dockerd) and DockerHub
  • Mobile SDKs.
  • Kafka fetched somewhere from apache.org.
  • Binary downloads from github. Many. Go and Rust make it possible.
  • Elastic, more or less the whole stack they offer via their Debian repo.
  • Postgres + the tools around it from the apt.postgresql.org Debian repo.
  • archive.debian.org because it's hard to keep up at times.
  • Maven Central.

Of course there are also all the script language repos - Python, Ruby, Node/Typescript - around as well.

Looking at myself, who's working in a different IT org but with a similar focus, I have the following lingering around on my work laptop, retrieved as a binary from a 3rd party:

  • dockerd from the docker repo
  • vscode from the microsoft repo
  • vivaldi from the vivaldi repo
  • Google Cloud SDK from the google repo
  • terraform + all the providers from hashicorp
  • govc from github
  • containerdiff from github (yes, by now included in Debian main)
  • github gh cli tool from github
  • wtfutil from github

Yes some of that is even non-free and might contain spyw^telemetry.

Takeaway I

Guessing based on the Pareto Principle, probably 80% of the software mentioned above is also open source software. But, and here we leave Pareto behind, close to none of it is built by the average IT org from source.

Why should the average IT org care about advanced issues like supply chain attacks on source code and their mitigations, when it already gets into very hot water the day DockerHub closes down, HashiCorp moves from open core to fully proprietary, or Elastic decides to no longer offer free binary builds?

The reality out there seems to be that the infrastructure of "modern" IT orgs is managed similarly to the Windows 95 installation of my childhood. You just grab running binaries from somewhere and run them. The main difference seems to be that you no longer have the inconvenience of downloading a .xls from geocities that you've to rename to .rar, and that it's legal.

Takeaway II

In the end the binary supply is like a drug for the user, and somehow the Debian project is also just another dealer / middle man in this setup. There are probably a lot of open questions to think about in that context.

Are we the better dealer because we care about signed sources we retrieve from upstream and because we engage in reproducible build projects?

Are our own means of distributing binaries any better than a binary download from github via https with a manual checksum verification, or the Debian repo at download.docker.com?
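
For reference, the manual checksum verification dance for a github binary download usually boils down to something like this; URLs and file names here are purely illustrative:

$ curl -sSLO https://github.com/example/tool/releases/download/v1.2.3/tool_1.2.3_linux_amd64.tar.gz
$ curl -sSLO https://github.com/example/tool/releases/download/v1.2.3/SHA256SUMS
$ sha256sum -c --ignore-missing SHA256SUMS
tool_1.2.3_linux_amd64.tar.gz: OK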

Is the approach of the BSD/Gentoo ports, where you have to compile at least some software from source, the better one?

Do I really want to know how some of the software is actually built?

Or some more candid ones, like: is gnutls a good choice for the https support in apt, and how solid is the gnupg code base? Update: Regarding apt there seems to be some movement.

Posted Thu Aug 13 18:14:25 2020

As some might have noticed, we now have Linux 5.7 in Debian/unstable and subsequently the in-kernel exFAT implementation created by Samsung is available. Thus we now have two exFAT implementations: the exfat fuse driver and the Linux kernel one. Since some comments and mails I received showed minor confusion, especially around the available tooling, it might help to clarify a bit what is required when.

Using the Samsung Linux Kernel Implementation

Probably the most common use case is that you just want to use the in-kernel implementation. Easy: install a Linux 5.7 kernel package for your architecture and either remove the exfat-fuse package or make sure you've version 1.3.0-2 or later installed. Then you can just run mount /dev/sdX /mnt and everything should be fine.

Your result will look something like this:

$ mount|grep sdb
/dev/sdb on /mnt type exfat (rw,relatime,fmask=0022,dmask=0022,iocharset=utf8,errors=remount-ro)

In the past this basic mount invocation utilized the mount.exfat helper, which was just a symlink to the helper shipped as /sbin/mount.exfat-fuse. The link was dropped from the package in 1.3.0-2. If you're running a not so standard setup, and would like to keep an older version of exfat-fuse installed, you must invoke mount -i to prevent mount from loading any helper to mount the filesystem. See also man 8 mount.
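
Spelled out, that looks something like this; the device name is an example, and you've to pass the filesystem type yourself since no helper gets involved:

$ sudo mount -i -t exfat /dev/sdb /mnt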

For those who care, mstone@ and myself had a brief discussion about this issue in #963752, which quickly brought me to the conclusion that it's in the best interest of a majority of users to just drop the symlink from the package.

Sticking to the Fuse Driver

If you would like to stick to the fuse driver you can of course just do it. I plan to continue to maintain all packages for the moment. Just keep the exfat-fuse package installed and use the mount.exfat-fuse helper directly. E.g.

$ sudo mount.exfat-fuse /dev/sdb /mnt
FUSE exfat 1.3.0
$ mount|grep sdb
/dev/sdb on /mnt type fuseblk (rw,nosuid,nodev,relatime,user_id=0,group_id=0,default_permissions,allow_other,blksize=4096)

In case this is something you would like to make permanent, I would recommend that you create yourself a mount.exfat symlink pointing at the mount.exfat-fuse helper.
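
That boils down to something like the following; adjust the path in case your setup differs:

$ sudo ln -s /sbin/mount.exfat-fuse /sbin/mount.exfat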

mkfs and fsck - exfat-utils vs exfatprogs

Besides the filesystem access itself, we now also have two implementations of the tooling that supports filesystem creation, fscking and label adjustment. The older one is exfat-utils by the creator of the fuse driver, which has been part of Debian since the fuse driver was packaged in 2012. New in Debian is the exfatprogs package written by the Samsung engineers. And here the situation starts to get a bit more complicated.

Both packages ship a mkfs.exfat and fsck.exfat tool, so we can not co-install them. In the end both packages declare a conflict with each other at the moment. As outlined in this thread I do not plan to overcomplicate the situation by using the alternatives system. I do feel strongly that this would just create more confusion without a real benefit. Since the tools do not have matching cli options, that could also cause additional issues and confusion.

I plan to keep that as is, at least for the bullseye release. Afterwards it's possible, depending on how the usage evolves, to drop mkfs.exfat and fsck.exfat from exfat-utils; they are in fact again only symlinks. A pain point might be tools interfacing with the differing implementations. Currently I see only three reverse dependencies, so that should be manageable to consolidate if required.

Last but not least it might be relevant to mention that the exfat-utils package also contains a dumpexfat tool, which could be helpful if you're more into forensics, or looking into other lower level analysis of an exFAT filesystem. Thus there is a bit of an interest to have those tools co-installed in some - I would say - niche cases.

buster-backports

Well, if you use buster with a backports kernel you're a bit on your own. In case you want to keep the fuse driver installed, but would still like to mount, e.g. for testing, with the kernel exFAT driver, you must use mount -i. I do not plan any uploads to buster-backports. If you need a mkfs.exfat on buster, I would recommend to just use the one from exfat-utils for now. It has been good enough for the past years and should not go sour before the bullseye release, which ships exfatprogs for you.

Kudos

My sincere kudos go to:

  • Andrew Nayenko, who wrote the exFAT fuse implementation which was very helpful to many people over the past years. He's a great upstream to work with.
  • Namjae Jeon and Hyunchul Lee, who maintain the Linux exFAT driver and exfatprogs. They are also very responsive upstreams and easy to work with.
  • Last but not least our ftp-master who reviewed the exfatprogs package way faster than what I had anticipated looking at the current NEW backlog.
Posted Fri Jul 17 17:58:03 2020