Infrastructure Archives - Timur and associates

Building Terraform Quick Start repo part 3 – Azure DevOps API

This is the third part of the series following our humble endeavors to automate Terraform deployment tasks. First part here, second part here. With housekeeping out of the way, let’s get on to the content.

Now that we’ve got the Terraform part sorted, we’d like to take this project to a logical conclusion and build a reusable template for our future endeavors.

Compile or script?

Our initial idea was to write a small console app that would gather parameters and make all API calls. But this repository got us thinking, that it’s way more efficient to be able to just run the script off GitHub. So, we went to the drawing board and ended up with a Bash script. It ain’t much but it’s honest work.

Ultimately the script goes to create an ADO Project, imports its own GitHub repo into a newly created project and proceeds to set up the pipeline.

Streamlining process

With this tooling, we can now automate most of our process. We’d start by obtaining the parameters and setting up required credentials:

In ADO we’ll need to create and grab a PAT of a user with permissions to manage the organization

In target Azure environment we need to start with finding the tenant id
While we’re collecting intel, we’d also grab target Subscription Name and Id.
Next step would be creating a Service Principal for Terraform.

By default, the principal has no permissions, so we’ll need to give it something like Contributor access on a Subscription we want to manage

Finally, it’s good practice to name Azure resources in such a way that it makes sense later. We come up with a distinct prefix for Terraform state storage account. Since storage accounts have strict naming policies, our prefix must be 2-13 characters long and must only contain alphanumerics.

Once all prep work is done, running script should produce an ADO project:

And running a default pipeline there should deploy Terraform management resource group (this is where state file will sit) and an actual workload – in our case it’s a Static Web App

Conclusion

This repository gives us a good starting point in our engagements with clients using ADO. As more clients start to pick GitHub as their platform of choice, we may have to upgrade it to use Actions. Until then, happy infrastructure-as-coding!

Building Terraform Quick Start repo part 2 – Zero touch pipeline

This is the second part of the series following our humble endeavors to automate Terraform deployment tasks. First part here. With housekeeping out of the way, let’s get on to the content.

For purposes of this exercise, it does not matter what we want to deploy. Can be a simple Web App or full fat Landing Zone. The pipeline itself remains unchanged.

Sample Infrastructure

Since we want an absolute minimum, we’ll go with one resource group and one Static Web App:

#============= main.tf ====================
terraform {
  backend "azurerm" { }

  required_providers {
    azurerm = {
      version = "~> 2.93"
    }
  }
}

# Set target subscription for deployment
provider "azurerm" {
  features {}
  subscription_id = var.subscription_id
}
#============= infra.tf ==================== 
resource "azurerm_resource_group" "main" {
  name = "${var.prefix}-${var.environment}-${var.location}-workload-rg"
  location = var.location
}

resource "azurerm_static_site" "main" {
  name = "${var.prefix}-${var.environment}-${var.location}-swa"
  resource_group_name = azurerm_resource_group.main.name
  location = var.location
}

We’ll focus on the pipeline though

Since our goal is to have as little human intervention as possible, we went with multi-stage YAML pipeline.

the YAML may look something like that:

trigger: none # intended to run manually
name: Deploy Terraform

pool:
  vmImage: 'ubuntu-latest'

variables:
  - group: 'bootstrap-state-variable-grp'

stages:
- stage: bootstrap_state
  displayName: 'Bootstrap TF State'
  jobs:
  - job: tf_bootstrap
    steps:
    - task: AzureResourceManagerTemplateDeployment@3
      inputs:
        deploymentScope: 'Subscription'
        azureResourceManagerConnection: '$(azureServiceConnection)'
        subscriptionId: '$(targetSubscriptionId)'
        location: '$(location)'
        csmFile: '$(Build.SourcesDirectory)/bicep/main.bicep' # on dev machine, compile into ARM (az bicep build --file .\bicep\main.bicep) and use that instead until agent gets update to 3.199.x
        deploymentOutputs: 'deploymentOutputs'
        overrideParameters: '-prefix $(prefix) -location $(location)'
    - script: |
        # this script takes output from ARM deployment and makes it available to steps further down the pipeline
        echo "##vso[task.setvariable variable=resourceGroupName;isOutput=true]`echo $DEPLOYMENT_OUTPUT | jq -r '.resourceGroupName.value'`"
        echo "##vso[task.setvariable variable=storageAccountName;isOutput=true]`echo $DEPLOYMENT_OUTPUT | jq -r '.storageAccountName.value'`"
        echo "##vso[task.setvariable variable=containerName;isOutput=true]`echo $DEPLOYMENT_OUTPUT | jq -r '.containerName.value'`"
        echo "##vso[task.setvariable variable=storageAccessKey;isOutput=true;isSecret=true]`echo $DEPLOYMENT_OUTPUT | jq -r '.storageAccessKey.value'`"
      # https://docs.microsoft.com/en-us/azure/devops/pipelines/process/variables?view=azure-devops&tabs=yaml%2Cbatch#share-variables-across-pipelines
      name: armOutputs # giving name to this task is extremely important as we will use it to reference the variables from later stages      
      env:
        DEPLOYMENT_OUTPUT: $(deploymentOutputs)

- stage: run_tf_plan # Build stage
  displayName: 'TF Plan'
  jobs:
  - job: tf_plan
    variables:
      # to be able to reference outputs from earlier stage, we start hierarchy from stageDependencies and address job outputs by full name: <stage_id>.<job_id>.outputs
      - name: resourceGroupName
        value: $[ stageDependencies.bootstrap_state.tf_bootstrap.outputs['armOutputs.resourceGroupName'] ]
      - name: storageAccountName
        value: $[ stageDependencies.bootstrap_state.tf_bootstrap.outputs['armOutputs.storageAccountName'] ]
      - name: containerName
        value: $[ stageDependencies.bootstrap_state.tf_bootstrap.outputs['armOutputs.containerName'] ]
      - name: storageAccessKey
        value: $[ stageDependencies.bootstrap_state.tf_bootstrap.outputs['armOutputs.storageAccessKey'] ]
    steps:              
      # check out TF code from git
      - checkout: self
        persistCredentials: true
      # init terraform and point the backend to correct storage account
      - task: TerraformTaskV2@2 # https://github.com/microsoft/azure-pipelines-extensions/blob/master/Extensions/Terraform/Src/Tasks/TerraformTask/TerraformTaskV2/task.json
        displayName: terraform init
        inputs:
          workingDirectory: '$(System.DefaultWorkingDirectory)/tf'
          backendServiceArm: $(azureServiceConnection)
          backendAzureRmResourceGroupName: $(resourceGroupName)
          backendAzureRmStorageAccountName: $(storageAccountName)
          backendAzureRmContainerName: $(containerName)
          backendAzureRmKey: '$(prefix)/terraform.tfstate'
        env:
          ARM_ACCESS_KEY: $(storageAccessKey)
      # run terraform plan and store it as a file so we can package it
      - task: TerraformTaskV2@2
        displayName: terraform plan
        inputs:
          workingDirectory: '$(System.DefaultWorkingDirectory)/tf'
          environmentServiceNameAzureRM: $(azureServiceConnection)
          command: 'plan'
          # feed tfvars file and set variables for azure backend (see TF files for usage)
          commandOptions: '-input=false -var-file=terraform.tfvars -var="prefix=$(prefix)" -var="location=$(location)" -var="subscription_id=$(targetSubscriptionId)" -out=$(prefix)-plan.tfplan'
        env:
          ARM_ACCESS_KEY: $(storageAccessKey)
      # package workspace into an artifact so we can publish it
      - task: ArchiveFiles@2
        inputs:
          displayName: 'Create Plan Artifact'
          rootFolderOrFile: '$(System.DefaultWorkingDirectory)/tf'
          includeRootFolder: false                
          archiveFile: '$(Build.ArtifactStagingDirectory)/$(Build.BuildId).zip'
          replaceExistingArchive: true
      # publish artifact to ADO
      - task: PublishBuildArtifacts@1
        inputs:
          displayName: 'Publish Plan Artifact'
          PathtoPublish: '$(Build.ArtifactStagingDirectory)'
          ArtifactName: '$(Build.BuildId)-tfplan'
          publishLocation: 'Container'          

- stage: run_tf_apply # Deploy stage
  dependsOn: 
    - bootstrap_state # adding extra dependencies so we can access armOutputs from earlier stages
    - run_tf_plan # by default next stage would have depended on the previous, but we broke that chain by depending on earlier stages
  displayName: 'TF Apply'
  jobs:  
  - deployment: tf_apply
    variables:
      # to be able to reference outputs from earlier stages, we start hierarchy from stageDependencies and address job outputs by full name: <stage_id>.<job_id>.outputs
      - name: storageAccessKey
        value: $[ stageDependencies.bootstrap_state.tf_bootstrap.outputs['armOutputs.storageAccessKey'] ]
    environment: 'dev' # required for deployment jobs. will need to authorise the pipeline to use it at first run
    strategy:
        runOnce:
          deploy:
            steps:
            # grab published artifact
            - task: DownloadBuildArtifacts@0
              inputs:
                artifactName: '$(Build.BuildId)-tfplan'
                displayName: 'Download Plan Artifact'
            # unpack the archive, we should end up with all necessary files in root of working directory
            - task: ExtractFiles@1
              inputs:
                archiveFilePatterns: '$(System.ArtifactsDirectory)/$(Build.BuildId)-tfplan/$(Build.BuildId).zip'
                destinationFolder: '$(System.DefaultWorkingDirectory)/'
                cleanDestinationFolder: false
                displayName: 'Extract Terraform Plan Artifact'
            - task: TerraformTaskV2@2
              displayName: terraform apply
              inputs:
                workingDirectory: $(System.DefaultWorkingDirectory)
                command: 'apply'
                commandOptions: '-auto-approve -input=false $(prefix)-plan.tfplan'
                environmentServiceNameAzureRM: $(azureServiceConnection)
              env:
                ARM_ACCESS_KEY: $(storageAccessKey)

Couple of notes regarding the pipeline

The pipeline is pretty straightforward so instead of going through it line by line, we just wanted to point out a few things that really helped us put this together

armOutputs is where we capture JSON outputs and feed them to pipeline.
Building on top of that, we had to import these variables in subsequent stages using stage dependencies. The pipeline can ultimately be represented as a tree containing stages on top level and ending with tasks as leaves. Keywords dependencies and stageDependencies tell us which level we’re looking at
For this trick to work, the requesting stage must depend on the stage where variables are exported from. By default, subsequent stages depend on the stages immediately preceding them. But in more complicated scenarios we can use dependsOn parameter and specify it ourselves.
Keen-eyed readers may notice we do not perform Terraform Install at all. This is very intentional, as Hosted Agent we’re using for this build already has TF 1.1.5 installed. It’s good enough for us but may need an upgrade in your case
The same point applies to using jq in our JSON parsing script – it’s already in there but your mileage may vary

Conclusion

With the build pipeline sorted, we’re yet another step closer to our zero-touch Terraform deployment nirvana. We already can grab the code and commit it into a fresh ADO project to give our workflow a boost. I’m not sharing the code just yet as there are still a couple of things we can do, so watch this space for more content!

Building Terraform Quick Start repo part 1 – Bootstrapping Azure remote state

We often get to come in, deploy cloud services for customers and get out. Some customers have established teams and processes, others have green fields and rely on us to do the right thing. Regardless of the level of investment, customers expect us to stick to the best practice and not only create bits of cloud infrastructure for them but also do the right thing and codify the infrastructure as much as possible. By default, we’d stick to Terraform for that.

Storing state

To be able to manage infrastructure and detect changes, Terraform needs a place to store current state of affairs. The easiest solution would be to store the state file locally but that’s not really an option for CI/CD pipelines. Luckily, we’ve got a bunch of backends to pick from.

This, however, leads to a chicken and egg situation where we can’t use Terraform to deploy storage backend without having access to storage backend where it can keep state file.

Bicep

So far, we’ve been mostly dealing with Azure so it made sense to prep a quick Bicep snippet to create required resources for us. One thing to keep in mind is the fact that Bicep by default deploys resources into resourceGroup scope. This implies we’ve already created a resource group, which is not exactly what we want to do. To switch it up we need to start at subscription level (this is what we are usually given, anyway) and create a resource group followed by whatever else we wanted. The recommended way to do that would be to declare main template for RG and reference a module with all other good stuff:

targetScope = 'subscription' // switching scopes here


// declaring some parameters so we can easier manage the pipeline later
@maxLength(13)
@minLength(2)
param prefix string
param tfstate_rg_name string = '${prefix}-terraformstate-rg'
@allowed([
  'australiaeast'
])
param location string

// creating resource group
resource rg 'Microsoft.Resources/resourceGroups@2021-01-01' = {
  name: tfstate_rg_name
  location: location
}

// Deploying storage account via module reference
module stg './tfstate-storage.bicep' = {
  name: 'storageDeployment'
  scope: resourceGroup(rg.name)
  params: {
    storageAccountName: '${prefix}statetf${take(uniqueString(prefix),4)}'
    location: location
  }
}

the module code would be important here:

param storageAccountName string
param location string
param containerName string = 'tfstate' 

output storageAccountName string = storageAccountName
output containerName string = containerName

resource storageAccount_resource 'Microsoft.Storage/storageAccounts@2021-06-01' = {
  name: storageAccountName
  location: location
  sku: {
    name: 'Standard_LRS'
  }
  kind: 'StorageV2'
  properties: {
    allowBlobPublicAccess: true
    networkAcls: {
      bypass: 'AzureServices'
      virtualNetworkRules: []
      ipRules: []
      defaultAction: 'Allow'
    }
    supportsHttpsTrafficOnly: true
    encryption: {
      services: {
        blob: {
          keyType: 'Account'
          enabled: true
        }
      }
      keySource: 'Microsoft.Storage'
    }
    accessTier: 'Hot'
  }
}

resource blobService_resource 'Microsoft.Storage/storageAccounts/blobServices@2021-06-01' = {
  parent: storageAccount_resource
  name: 'default'
  properties: {
    cors: {
      corsRules: []
    }
    deleteRetentionPolicy: {
      enabled: false
    }
  }
}

resource storageContainer_resource 'Microsoft.Storage/storageAccounts/blobServices/containers@2021-06-01' = {
  parent: blobService_resource
  name: containerName
  properties: {
    immutableStorageWithVersioning: {
      enabled: false
    }
    defaultEncryptionScope: '$account-encryption-key'
    denyEncryptionScopeOverride: false
    publicAccess: 'None'
  }
}

Assuming we just want to chuck all our assets into a repository and drive from there, it’d make sense to also write a simple ADO deployment pipeline. Previously we’d have to opt for AzureCLI task and do something like this:

- task: AzureCLI@2
  inputs:
    azureSubscription: $(azureServiceConnection)
    scriptType: bash
    scriptLocation: inlineScript
    inlineScript: |
      # steps to create RG
      az deployment group create --resource-group $(resourceGroupName) --template-file bicep/main.bicep

Luckily, the work has been done and starting with agent version 3.199, AzureResourceManagerTemplateDeployment does support Bicep deployments natively! Unfortunately, at the time of testing our ADO-hosted agent was still at version 3.198 so we had to cheat and compile Bicep down to ARM manually. The final pipeline, however, would look something like this:

trigger: none # intended to run manually

name: Deploy TF state backend via Bicep

pool:
  vmImage: 'ubuntu-latest'

variables:
  - group: "bootstrap-state-variable-grp" # define variable groups to point to correct subscription

steps:
- task: AzureResourceManagerTemplateDeployment@3
  inputs:
    deploymentScope: 'Subscription'
    azureResourceManagerConnection: $(azureServiceConnection)
    subscriptionId: $(targetSubscriptionId)
    location: $(location)
    templateLocation: 'Linked Artifact'
    csmFile: '$(System.DefaultWorkingDirectory)/bicep/main.bicep' # on dev machine, compile into ARM (az bicep build --file .\bicep\main.bicep) and use that instead until agent gets update to 3.199.x
    deploymentMode: 'Incremental'
    deploymentOutputs: 'storageAccountParameters'
    overrideParameters: '-prefix $(prefix) -location $(location)'

Running through ADO should yield us a usable storage account within a brand-new resource group:

Where to from here

Having dealt with foundations, we should be able to capture output of this step (we mostly care about storage account name as it’s got some randomness in it) and feed it to Terraform backend provider. We’ll cover it in the next part of this series.

Conclusion

Existing solutions in this space have so far relied on either PowerShell or az cli to do the job. That’s still doable but can get a bit bulky, especially if we want to query outputs. Now that Bicep support is landing in AzureResourceManagerTemplateDeploymentV3 directly, we will likely see this as a recommended approach.

Git SSH setup for VisualStudio

Every now and then we need to set ourselves up a new dev machine. And 99% of the time, that means setting up git source control. We believe that password authentication is a no-no, so we needed a quick way to bootstrap fresh Windows 10 install to use SSH key pairs.

This Is The Way

Setting things up would involve making sure OpenSSH is installed, ssh-agent is running and key pair is generated and registered with the agent. Finally, we’d go to http://dev.azure.com/{orgname}/_usersSettings/keys and paste public key in. This however is a laborious task, and most sources online seem to suggest doing it that way. We decided to simplify:

Set-ExecutionPolicy Bypass -Scope Process -Force; [System.Net.ServicePointManager]::SecurityProtocol = [System.Net.ServicePointManager]::SecurityProtocol -bor 3072; iex ((New-Object System.Net.WebClient).DownloadString('https://raw.githubusercontent.com/tkhadimullin/win-ssh-bootstrap/master/install.ps1'))

this will download and run the following:

if (-Not ([Security.Principal.WindowsPrincipal] [Security.Principal.WindowsIdentity]::GetCurrent()).IsInRole([Security.Principal.WindowsBuiltInRole] 'Administrator')) {
    Write-Warning  "Running as non-Admin user. Skipping environment checks"
} else {
    $capability = Get-WindowsCapability -Online | Where-Object Name -like "OpenSSH.Client*"

    if($capability.State -ne "Installed") {
        Write-Information "Installing OpenSSH client"
        Add-WindowsCapability -Online -Name $capability.Name
    } else {
        Write-Information "OpenSSH client installed"
    }

    $sshAgent = Get-Service ssh-agent
    if($sshAgent.Status -eq "Stopped") {$sshAgent | Start-Service}
    if($sshAgent.StartType -eq "Disabled") {$sshAgent | Set-Service -StartupType Automatic }
}

if([String]::IsNullOrWhiteSpace([Environment]::GetEnvironmentVariable("GIT_SSH"))) {
    [Environment]::SetEnvironmentVariable("GIT_SSH", "$((Get-Command ssh).Source)", [System.EnvironmentVariableTarget]::User)
}

$keyPath = Join-Path $env:Userprofile ".ssh\id_rsa" {
 # Assuming file name here
if(-not (Test-Path $keyPath)) { 
    ssh-keygen -q -f $keyPath -C "autogenerated_key" -N """" # empty password
    ssh-add -q -f $keyPath
} 

$line = Get-Content -Path "$($keyPath).pub" | Select-Object -First 1 # assuming file name and key index

Add-Type -AssemblyName System.Windows.Forms
Add-Type -AssemblyName System.Drawing
$form = New-Object System.Windows.Forms.Form
$form.Text = 'Your SSH Key'
$form.Size = New-Object System.Drawing.Size(600,150)
$form.StartPosition = 'CenterScreen'

$okButton = New-Object System.Windows.Forms.Button
$okButton.Location = New-Object System.Drawing.Point(260,70)
$okButton.Size = New-Object System.Drawing.Size(75,23)
$okButton.Text = 'OK'
$okButton.DialogResult = [System.Windows.Forms.DialogResult]::OK
$form.AcceptButton = $okButton
$form.Controls.Add($okButton)

$label = New-Object System.Windows.Forms.Label
$label.Location = New-Object System.Drawing.Point(10,10)
$label.Size = New-Object System.Drawing.Size(280,20)
$label.Text = 'Copy your key and paste into ADO:'
$form.Controls.Add($label)

$textBox = New-Object System.Windows.Forms.TextBox
$textBox.Location = New-Object System.Drawing.Point(10,30)
$textBox.Size = New-Object System.Drawing.Size(560,40)
$textBox.Text = $line
$textBox.ReadOnly = $true

$form.Controls.Add($textBox)
$form.Add_Shown({$textBox.Select()})
$form.Topmost = $true
$form.ShowDialog()

This script will take care of prerequisites (if run as admin) or try to generate a key in case everything else is done. Then it’ll paint a small window with public key:

The script makes a couple of assumptions about existing keys and will just roll with defaults. Nothing fancy at all. We also wanted to automate posting to ADO, but that did not happen (see below).

Setting up Visual Studio

Next order of business was to set up the IDE. It appears, Visual Studio would default to using password credentials, unless we set a GIT_SSH environment variable and point it to ssh.exe from OpenSSH distribution. The script will take care of that too.

Posting public key to Azure DevOps (not really)

ADO does not have an API for managing SSH keys. Therefore, generating PATs and service credentials will not going to help. We can try to make it happen by reverse engineering the front-end call and hoping it’s isolated enough for us to be able to repeat the procedure. Turns out, it’s indeed a matter of sending payload to https://dev.azure.com/{org}/_apis/Contribution/HierarchyQuery – this looks like a common message bus for ADO Extensions to post updates to:

{
    "contributionIds": [
        "ms.vss-token-web.personal-access-token-issue-session-token-provider"
    ],
    "dataProviderContext": {
        "properties": {
            "displayName": "key-name",
            "publicData": "ssh-rsa Aaaaaaaaaaaaaabbbbbb key-comment",
            "validFrom": "2021-11-30T08:00:00.000Z",
            "validTo": "2026-11-30T08:00:00.000Z",
            "scope": "app_token",
            "targetAccounts": [
                "xxxxxxxx-xxxx-xxxxx-xxxx-xxxxxxxxxxxx"
            ],
            "isPublic": true
        }
    }
}

The first issue waits us right in the payload: dataProviderContext.targetAccounts needs a value, but we could not find where to fetch it from. It’s loaded along with other content on the page, but opening it kind of eliminates the purpose of automating this task. And unfortunately, that’s not the only obstacle we’ve hit there.

Authentication

Front end relies on cookies to authenticate this request. We found that the only one we really need is UserAuthentication:

The value is standard JWT, issued by app.vstoken.visualstudio.com. Getting it requires us to register an app and have users go through oAuth flow. Also, since ADO works on concept of tenants and organisations, it is tricky to get the correct tenancy without interactive login. It seems doable, but we have deemed it to be not worth the effort. <sad_face_emoji_here>

Conclusion

Despite not being able to reach our fully automated nirvana, we’ve got to a state where we’d prep the system for SSH and surface the public key to copy-paste. It seems that reverse engineering the ADO frontend and extracting token from there is very much achievable, but at the stage we’d not pursue it. Publishing the code on GitHub gives us a faint hope the Community may push it across the line.

Setting up L2TP VPN with Mikrotik

For quite some time we wanted to be able to securely access our on-prem services, such as local NAS, IoT hub and Grafana. We have tried setting up PPTP but quickly realised that the technology has been long compromised. IPsec would be a great option, but it requires both ends of tunnel to have static IP addresses.

OpenVPN and AWS

Theoretically we can simply spin up an EC2 instance from the marketplace or even configure it manually, but we were feeling adventurous.

Setting up Client VPN Endpoint on AWS effectively stands up managed OpenVPN instance. We ended up not going with it (and we’ll get to reasons in a few moments), but let’s quickly go through steps one would need to take to pull it off. The setup is fairly complex and involved:

Set up server certificate in AWS Certificate manager. Public certificates are free, but we had to go through DNS-based ownership validation, which is not that hard but takes anywhere between 15 minutes and few hours and we were not planning to use that domain name to connect to our server anyway.
Make sure to pick up IP range that’s big enough (at least /22) and does not overlap with given VPC
Stand up some sort of Directory Service for user authentication. Cognito is not an option, and we don’t have AD readily available. Creating full fat AD just for VPN seemed overkill, so we created Simple AD (which is still surplus to needs). It would’ve been fine, but to manage it, we had to stand up a Windows EC2. We of course joined it into the domain. And this stage it became obvious, that creating a virtual EC2 appliance would probably be way easier, but we decided to proceed for the sake of science.
Finally, coming back to VPC we created a Gateway and VPN itself. One thing to keep in mind here is Transport protocol: Mikrotik only supports TCP. Yuck.
All we have left to do now would be to download .ovpn file and use it to set up our router. But unfortunately, this is where our shenanigans will have to stop: RouterOS does not support AES-256-GCM.

L2TP scripts

Since we were standing up compute resources anyway, our goal shifted towards finding the easiest way to set things up. And IPsec VPN Server Auto Setup Scripts delivered just that! Just running wget https://git.io/vpnquickstart -O vpn.sh && sudo sh vpn.sh on a fresh EC2 instance did the trick for us. One thing to remember is to save auto-generated credentials the script prints on exit – that’s almost all VPN server setup done.

Since we had a router on the other end and wanted access to internal resources, we had to log in again and add couple of routes into /etc/ppp/ip-up.local:

#!/bin/bash
/sbin/route add -net 192.168.99.0/24 gw $4 # see for parameters: https://tldp.org/HOWTO/PPP-HOWTO/x1455.html

We also wanted to use conditional routing on the client side and only route certain client machines through the tunnel. For that, /etc/sysconfig/iptables needed a little update:

# Modified by hwdsl2 VPN script
*nat
:POSTROUTING ACCEPT [0:0]
# autogenerated code here
-A POSTROUTING -s 192.168.99.0/24 -j MASQUERADE # adding our own network so it gets NATted

COMMIT

Finally, we needed to enable L2TP through AWS NSG:

Mikrotik setup

With WinBox, setting up VPN in RouterOS is pretty straightforward:

you may notice we opted to not use the VPN as default route. This solution comes with tradeoffs, but in our case, we wanted to only tunnel specific clients. For that we have set up policy routing. Added a Mangle rule where we mark all connections from chosen hosts and then assigned new routing table to these packets:

Conclusion

It is a bit unfortunate that in 2021 Mikrotik still does not properly support OpenVPN. On the other hand, it exposes a lot of configurability to cater for uncommon network layouts. And now we got a bit closer to realising its full potential.

ARR: Setting up

Not so long ago a client asked us to spec up their CI/CD pipeline. They are going through devops transformation and as part of their “speed up delivery” objective they wanted to minimize downtime when they deploy new versions of their software or run maintenance.

🟦🟩-deployments

First thing we wanted to try was to introduce blue-green approach. The application runs on IIS and luckily for us, Microsoft offers a solution there: ARR. There’s heaps documentation online, and most examples seem to point at scaling out by routing traffic to application round robin. In our case the application was not ready for that just yet so we decided to use it for directing all traffic to one backend server only while we deploy the inactive one:

Farming Web Farms

ARR introduces a concept of Web Farms. This basically is a logical grouping of content servers that ARR treats as one site. Each farm comes with settings on how caching should work, or what actual content servers are like. It’s pretty easy to set up when we’ve got one or two of there. But in our case we were looking at approximately 100 farms. Yikes! Overall the process is pretty simple: create farm, add content servers, create URL rewrite rule. Nothing fancy and documentation is plentiful. What we wanted to do however was to automate everything into one script that could later run remotely when triggered by CI/CD pipelines.

PowerShell to the rescue

Our requirements were pretty standard until we realized that there’s no easy way to insert URL rewrite rules into arbitrary positions in the list. So we implemented a set of dummy rules that the script uses as anchors to locate a place where to inject new rule. We also needed node health check to cut off inactive servers, the easiest was a plain text file in website root with words “UP” or “DOWN” so that we can swap ARR slots by simply updating a file. ARR supports a few ways to programmatically manage, but since we’re on Windows we picked PowerShell as our tool of choice and ended up with something like this:

function CheckIfExists($xpath, $name, $remove = $true) {      
   $existing = Get-WebConfigurationProperty -pspath $psPath "$xpath[@name='$name']" -Name .
   if($null -ne $existing) {      
      if($remove) {
         Clear-WebConfiguration -pspath $psPath -Filter "$xpath[@name='$name']"
      }
      return  $true
   }
function IndexOfNode($collection, $name) {
   $i=0
   for ($i=0; $i -lt $existing.Collection.Count; $i++)
   {
      if ($collection[$i].name -eq $name) 
      { 
         return $i
      }
   }
   return $i-1 #found nothing - return position at the end of collection
}

function CreateRule($name, $matchUrl = "*", $atAnchor = "") {
   $matchingPatternSyntax = if ($matchUrl -eq "*") {"Wildcard"} else { "ECMAScript" };

   $existing = Get-WebConfiguration -pspath $psPath "system.webServer/rewrite/globalRules"
   $index = IndexOfNode $existing.Collection $atAnchor

   Add-WebConfigurationProperty -pspath $psPath  -filter "system.webServer/rewrite/globalRules" -AtIndex $index -name "." -value @{name=$name;patternSyntax=$matchingPatternSyntax;stopProcessing='True';enabled='True'}
   Set-WebConfigurationProperty -pspath $psPath  -filter "system.webServer/rewrite/globalRules/rule[@name='$name']/match" -name "url" -value $matchUrl
}

function CreateRuleCondition($name, $in = "{HTTP_HOST}", $pattern, $negate = $false) 
{
   $value = @{
      input=$in;
      pattern=$pattern; 
   }

   if($negate -eq $true) {
      $value.Add("negate", "True")
   }

   Add-WebConfigurationProperty -pspath $psPath  -filter "system.webServer/rewrite/globalRules/rule[@name='$name']/conditions" -name "." -value $value
}

function CreateRewriteAction($name, $url) {   
   Set-WebConfigurationProperty -pspath $psPath  -filter "system.webServer/rewrite/globalRules/rule[@name='$name']/action" -name "type" -value "Rewrite"
   Set-WebConfigurationProperty -pspath $psPath  -filter "system.webServer/rewrite/globalRules/rule[@name='$name']/action" -name "url" -value "$url/{R:0}"
}

function CreateRedirectRule(
   $ruleName,      
   $matchUrl,
   $conditionHost,
   $farmName,
   $recreate = $true,
   $atName
) 
{
   if(CheckIfExists "system.webServer/rewrite/globalRules/rule" $ruleName $recreate) {
      if($recreate) {
         Write-Host "Removed existing $ruleName before proceeding"
      } else {
         Write-Host "Skipped existing $ruleName"
         return
      }
   }
   
   # Create a new rule
   CreateRule "$ruleName" -matchUrl $matchUrl -atAnchor $atName
   Set-WebConfigurationProperty -pspath $psPath  -filter "system.webServer/rewrite/globalRules/rule[@name='$farmName']/conditions" -name "logicalGrouping" -value "MatchAny"
   
   $conditionHost | ForEach-Object {
      CreateRuleCondition $ruleName -pattern $_
   }
   
   CreateRewriteAction $farmName "https://$farmName"
}

function CreateWebFarm(
      $farmName,
      $healthCheckUrl,
      $blueIpAddress,
      $greenIpAddress,
      $parentSiteHostName,
      $Recreate = $true
   )
{
   return; #debugging
   if(CheckIfExists "webFarms/webFarm" $farmName $Recreate) {
      if($Recreate) {
         Write-Host "Removed existing $farmName before proceeding"
      } else {
         Write-Host "Skipped existing $farmName"
         return
      }
   }
   
   Add-WebConfigurationProperty -pspath $psPath  -filter "webFarms" -name "." -value @{name=$farmName}
   Set-WebConfigurationProperty -pspath $psPath  -filter "webFarms/webFarm[@name='$farmName']/applicationRequestRouting/affinity" -name "useCookie" -value "True"
   Set-WebConfigurationProperty -pspath $psPath  -filter "webFarms/webFarm[@name='$farmName']/applicationRequestRouting/protocol/cache" -name "enabled" -value "False"
   Set-WebConfigurationProperty -pspath $psPath  -filter "webFarms/webFarm[@name='$farmName']/applicationRequestRouting/healthCheck" -name "url" -value $healthCheckUrl
   Set-WebConfigurationProperty -pspath $psPath  -filter "webFarms/webFarm[@name='$farmName']/applicationRequestRouting/healthCheck" -name "interval" -value "00:00:10"
   Set-WebConfigurationProperty -pspath $psPath  -filter "webFarms/webFarm[@name='$farmName']/applicationRequestRouting/healthCheck" -name "timeout" -value "00:00:5"
   Set-WebConfigurationProperty -pspath $psPath  -filter "webFarms/webFarm[@name='$farmName']/applicationRequestRouting/healthCheck" -name "responseMatch" -value "UP"

   Add-WebConfigurationProperty -pspath $psPath  -filter "webFarms/webFarm[@name='$farmName']" -name "." -value @{address=$blueIpAddress}
   Set-WebConfigurationProperty -pspath $psPath  -filter "webFarms/webFarm[@name='$farmName']/server[@address='$blueIpAddress']/applicationRequestRouting" -name "hostName" -value $parentSiteHostName

   Add-WebConfigurationProperty -pspath $psPath  -filter "webFarms/webFarm[@name='$farmName']" -name "." -value @{address=$greenIpAddress}
   Set-WebConfigurationProperty -pspath $psPath  -filter "webFarms/webFarm[@name='$farmName']/server[@address='$greenIpAddress']/applicationRequestRouting" -name "hostName" -value $parentSiteHostName
}

$host
 = "newservice.example.com"
$farm = "newservice-farm"

CreateWebFarm $farm "https://$farm/healthcheck.htm" $greenIp $blueIp $host
CreateRedirectRule -ruleName $farm -matchUrl "*" -conditionHost @($host) -farmName $farm -atName "--Inserting rules above this point--"

Running https with Docker

It’s not a secret we love Docker. And with recent changes to how Chrome treats SameSite cookies it’s become a bit of a pain to develop any sort of oAuth solutions with containers: these have to go over SSL so the browser takes it.

Tools like dotnet dev-certs do provide some relief by generating self-signed certs and adding those to trusted store on host machine. In short – most of the time, host-to-container development is not an issue.

What if we need more than one domain?

Sometimes there will be cases where we’d like to access the same service by two domain names. It might be useful if Host header is required:

we can opt for what’s known a SAN certificate. It’s an extension to x.509 that allows us to reuse the same cert for multiple domain names. We can then trust certificate on our dev machine and make Docker use the same cert for HTTPS:

#create a SAN cert for both server.docker.local and localhost
$cert = New-SelfSignedCertificate -DnsName "server.docker.local", "localhost" -CertStoreLocation cert:\localmachine\my

#export it for docker container to pick up later
$password = ConvertTo-SecureString -String "123123" -Force -AsPlainText
Export-PfxCertificate -Cert $cert -FilePath C:\https\aspnetapp.pfx -Password $password

# trust it on our host machine
$store = New-Object System.Security.Cryptography.X509Certificates.X509Store "TrustedPublisher","LocalMachine"
$store.Open("ReadWrite")
$store.Add($cert)
$store.Close()

More containers?

Sometimes we want one container to talk to another while retaining the ability to check up on things from localhost. Consider the following docker-compose:

version: '3'
services:
  client: # client process that needs to talk to server
    depends_on:
      - server
  server: # server that we'd also like to access from the outside
    image:     
    ports:
      - "8443:443"

This would roughtly translate to the following network layout:

Problems start

When one container needs to talk to another container it’s a slightly different story: dev tools don’t have control over containers and cannot magically trust certificates inside there. We can try opt for properly signed certificates (from Let’s Encrypt for example), but that’s a whole different story and is likely not worth it for development machines.

The above powershell script is also going to fall short as it’s only adding the cert onto development machine – containers will keep failing to validate the cert. The solution would build on the same principles: generate a self-signed cert and trust it everywhere. Since most Docker containers run Linux we might have better luck going the opposite direction and generating certs in PEM format using a well known tool OpenSSL. Then we’d use Dockerfiles to inject this cert into all our containers and finally we’d convert it into format suitable for our host Windows machine (PKCS#12).

$certPass = "password_here"
$certSubj = "host.docker.internal"
$certAltNames = "DNS:localhost,DNS:host.docker.internal,DNS:identity_server" # we can also add individual IP addresses here like so: IP:127.0.0.1
$opensslPath="path\to\openssl\binaries" #aOpenSSL needs to be present on the host, no installation is necessary though
$workDir="path\to\your\project"
$dockerDir=Join-Path $workDir "ProjectApi"

#generate a self-signed cert with multiple domains
Start-Process -NoNewWindow -Wait -FilePath (Join-Path $opensslPath "openssl.exe") -ArgumentList "req -x509 -nodes -days 365 -newkey rsa:2048 -keyout ",
                                          (Join-Path $workDir aspnetapp.key),
                                          "-out", (Join-Path $dockerDir aspnetapp.crt),
                                          "-subj `"/CN=$certSubj`" -addext `"subjectAltName=$certAltNames`""

# this time round we convert PEM format into PKCS#12 (aka PFX) so .net core app picks it up
Start-Process -NoNewWindow -Wait -FilePath (Join-Path $opensslPath "openssl.exe") -ArgumentList "pkcs12 -export -in ", 
                                           (Join-Path $dockerDir aspnetapp.crt),
                                           "-inkey ", (Join-Path $workDir aspnetapp.key),
                                           "-out ", (Join-Path $workDir aspnetapp.pfx),
                                           "-passout pass:$certPass"

$password = ConvertTo-SecureString -String $certPass -Force -AsPlainText
$cert = Get-PfxCertificate -FilePath (Join-Path $workDir "aspnetapp.pfx") -Password $password

# and still, trust it on our host machine
$store = New-Object System.Security.Cryptography.X509Certificates.X509Store [System.Security.Cryptography.X509Certificates.StoreName]::Root,"LocalMachine"
$store.Open("ReadWrite")
$store.Add($cert)
$store.Close()

Example: Running Identity Server

Now we have our certs (for example, located in %USERPROFILE%.aspnet\https). Here’s a quick how to tell asp.net core -base containers to pick them up:

docker pull your_docker_image
docker run --rm -it -p 8000:80 -p 8001:443 -e ASPNETCORE_URLS="https://+;http://+" -e ASPNETCORE_HTTPS_PORT=8001 -e ASPNETCORE_Kestrel__Certificates__Default__Password="123123" -e ASPNETCORE_Kestrel__Certificates__Default__Path=\https\aspnetapp.pfx -v %USERPROFILE%\.aspnet\https:C:\https\ your_docker_image

docker run <your image> --rm -it -p 8000:80 -p 8001:443 -e ASPNETCORE_URLS="https://+;http://+" -e ASPNETCORE_HTTPS_PORT=8001 -e ASPNETCORE_Kestrel__Certificates__Default__Password="123123" -e ASPNETCORE_Kestrel__Certificates__Default__Path=/https/aspnetapp.pfx

Or in docker-compose format:

version: '3'
services:
  identity_server:
    image: mcr.microsoft.com/dotnet/core/samples:aspnetapp    
    environment: 
      - ASPNETCORE_URLS=https://+:443;http://+:80
      - ASPNETCORE_Kestrel__Certificates__Default__Password=password_here
      - ASPNETCORE_Kestrel__Certificates__Default__Path=/https/aspnetapp.pfx
    volumes:
      - ~/.aspnet/https/:/https/:ro 
    container_name: identity_server
    ports:
      - "8443:443"
      - "8080:80"

Monitoring Mikrotik with InfluxDb

With unlimited fibre plans it is less important to keep track on who is the biggest content consumer in the office, but it still is interesting to find out.

Mikrotik

We’ve got a Mikrotik box serving as our gateway. It might be a total overkill, but it is rock solid and offers a ton of feaures with easy to follow configuration UI (they call it WinBox). One feature we’ll be particularly interested in is ip accounting. Despite intimidating wiki page, the set up is actually pretty simple:

tick a couple checkboxes
enable web access
and optionally supply ip of host that will be allowed to collect data from the router:

With data source out of the way it’s time to look the receiving side.

InfluxDB

The endgame goal was to feed the data into nice Grafana dashboards (we touched on it here). Luckily we found this dashboard that promised to be pretty much a turnkey solution. Unfortunately the exporter didn’t want to play nicely. The issue was with the exporter assuming that local network would be 192.168.0.0/16 (as well as hardcoded influxdb address). This appears to be correct, but we still couldn’t get it to parse our network of 192.168.1.0/24 (lack of Go knowledge might have been a contributing factor here too). What can we do then?

Build one!

Ladies and gentlemen, I give you mikrotik-acct! This collector is written with .net core 3.1 so should run on Docker with no real issues. It offers more knobs to play with through config file (which is hopefully all self-explanatory). One thing of interest here is actually a hepler method that tells us whether address is in our subnet or not: it appears, .net core doesn’t offer this functrionality out of the box, so we had to resort to the mighty Stack Overflow:

public static bool IsInSubnet(this IPAddress address, string subnetMask)
        {
            var slashIdx = subnetMask.IndexOf("/");
            if (slashIdx == -1)
            {
                // We only handle netmasks in format "IP/PrefixLength".
                throw new NotSupportedException("Only SubNetMasks with a given prefix length are supported.");
            }

            // First parse the address of the netmask before the prefix length.
            var maskAddress = IPAddress.Parse(subnetMask.Substring(0, slashIdx));

            if (maskAddress.AddressFamily != address.AddressFamily)
            {
                // We got something like an IPV4-Address for an IPv6-Mask. This is not valid.
                return false;
            }

            // Now find out how long the prefix is.
            int maskLength = int.Parse(subnetMask.Substring(slashIdx + 1));

            if (maskAddress.AddressFamily == AddressFamily.InterNetwork)
            {
                // Convert the mask address to an unsigned integer.
                var maskAddressBits = BitConverter.ToUInt32(maskAddress.GetAddressBytes().Reverse().ToArray(), 0);

                // And convert the IpAddress to an unsigned integer.
                var ipAddressBits = BitConverter.ToUInt32(address.GetAddressBytes().Reverse().ToArray(), 0);

                // Get the mask/network address as unsigned integer.
                uint mask = uint.MaxValue << (32 - maskLength);

                // https://stackoverflow.com/a/1499284/3085985
                // Bitwise AND mask and MaskAddress, this should be the same as mask and IpAddress
                // as the end of the mask is 0000 which leads to both addresses to end with 0000
                // and to start with the prefix.
                return (maskAddressBits & mask) == (ipAddressBits & mask);
            }

            if (maskAddress.AddressFamily == AddressFamily.InterNetworkV6)
            {
                // Convert the mask address to a BitArray.
                var maskAddressBits = new BitArray(maskAddress.GetAddressBytes());

                // And convert the IpAddress to a BitArray.
                var ipAddressBits = new BitArray(address.GetAddressBytes());

                if (maskAddressBits.Length != ipAddressBits.Length)
                {
                    throw new ArgumentException("Length of IP Address and Subnet Mask do not match.");
                }

                // Compare the prefix bits.
                for (int i = 0; i < maskLength; i++)
                {
                    if (ipAddressBits[i] != maskAddressBits[i])
                    {
                        return false;
                    }
                }

                return true;
            }

            throw new NotSupportedException("Only InterNetworkV6 or InterNetwork address families are supported.");
        }

Grafana

After having gone through all the trouble we’ve finally got ourselves nice dashboards:

Monitoring SQL Server: index maintenance

Now that we’ve got basic SQL queries going, we want to collect some actionable intel. A good measure of any SQL database performance is how well indexes are utilised.

This isn’t new

All queries we’re about to share are hardly a revelation. What we however want to achieve is to have an idea of how that information changes over time

Index utilisation

input {
	############################################################# Index Maintenance ############################################################################
	jdbc {
		id => "master_index_maintenance"
		jdbc_driver_library => "path\to\jdbc\lib\mssql-jdbc-7.2.1.jre8.jar"
		jdbc_driver_class => "com.microsoft.sqlserver.jdbc.SQLServerDriver"
		jdbc_connection_string => "jdbc:sqlserver://<your connection string>"
		jdbc_user => nil
		schedule => "*/15 * * * *"
		statement => "	DECLARE @db_id INT
						DECLARE @db_name NVARCHAR(120)
						DECLARE @index_data TABLE(
												[database_name] NVARCHAR(128) NOT NULL,
												table_name sysname NOT NULL, 
												index_name sysname NULL, 
												index_type TINYINT NOT NULL, 
												user_seeks bigint NOT NULL, 
												user_scans bigint NOT NULL, 
												user_lookups bigint NOT NULL, 
												user_updates bigint NOT NULL
											   );

						SET NOCOUNT ON
						DECLARE @dbs table ([db_name] sysname)
						DECLARE @db_query_sql nvarchar(4000)
						SET @db_query_sql='select ''?'' as [db_name] from [?].sys.tables t WHERE t.name = ''Users''';

						INSERT INTO @dbs ([db_name]) EXEC sp_msforeachdb @db_query_sql
						SET NOCOUNT OFF

						DECLARE db_id_cursor CURSOR FOR SELECT DB_ID([db_name]), [db_name] FROM @dbs FOR READ ONLY 
						OPEN db_id_cursor  
						FETCH NEXT FROM db_id_cursor INTO @db_id, @db_name;  
						WHILE @@FETCH_STATUS = 0  
						BEGIN
							DECLARE @sql NVARCHAR(MAX) = CAST(N'USE [' + @db_name + '];
							
							SELECT ''' + @db_name + ''', t.[name], ix.[name], ix.[type], us.user_seeks, us.user_scans, us.user_lookups, us.user_updates
							FROM sys.dm_db_index_usage_stats us
							INNER JOIN sys.indexes ix ON us.object_id = ix.object_id and ix.index_id = us.index_id
							INNER JOIN sys.tables t ON ix.object_id = t.object_id	
							WHERE us.database_id = ' + CAST(@db_id AS NVARCHAR(10)) AS NVARCHAR(MAX));

							INSERT INTO @index_data EXEC sys.sp_executesql @sql;
							FETCH NEXT FROM db_id_cursor INTO @db_id, @db_name;  
						END
						CLOSE db_id_cursor
						DEALLOCATE db_id_cursor
						SELECT * FROM @index_data"
		add_field => {
						"sql_instance" => "SQL2"
					}
	}
	################################################################################################################################################################	
}
output {
	elasticsearch {
		hosts => "elasticsearch:9200"
		index => "sql-index-stats-%{+YYYY.MM}"            
	}
}

Missing indexes

This one is a bit more interesing in a sense that we will get actual index hints. It is not a silver bullet though – this still needs to be analysed by humans for best results.

input {
	############################################################# Master sp_WhoIsActive ############################################################################
	jdbc {
		id => "master_missing_indexes"
		jdbc_driver_library => "path\to\jdbc\lib\mssql-jdbc-7.2.1.jre8.jar"
		jdbc_driver_class => "com.microsoft.sqlserver.jdbc.SQLServerDriver"
		jdbc_connection_string => "jdbc:sqlserver://<your connection string>"
		jdbc_user => nil
		schedule => "1 */2 * * *"
		statement => "	DECLARE @db_id INT
						DECLARE @db_name NVARCHAR(120)
						DECLARE @index_data TABLE(
												[database_name] NVARCHAR(128) NOT NULL,						
												table_name sysname NOT NULL, 
												unique_compiles bigint NOT NULL,
												user_seeks bigint NOT NULL, 
												user_scans bigint NOT NULL, 
												avg_total_user_cost float NULL, 
												avg_user_impact float NULL, 
												overall_impact float NOT NULL,
												sql_code NVARCHAR(MAX) NOT NULL
											   );

						SET NOCOUNT ON
						DECLARE @dbs table ([db_name] sysname)
						DECLARE @db_query_sql nvarchar(4000)
						SET @db_query_sql='select ''?'' as [db_name] from [?].sys.tables t WHERE t.name = ''Users''';

						INSERT INTO @dbs ([db_name]) EXEC sp_msforeachdb @db_query_sql
						SET NOCOUNT OFF

						DECLARE db_id_cursor CURSOR FOR SELECT DB_ID([db_name]), [db_name] FROM @dbs FOR READ ONLY 
						OPEN db_id_cursor  
						FETCH NEXT FROM db_id_cursor INTO @db_id, @db_name;  
						WHILE @@FETCH_STATUS = 0  
						BEGIN
							DECLARE @sql NVARCHAR(MAX) = CAST(N'USE [' + @db_name + '];
							
							WITH index_definitions (table_name, unique_compiles, user_seeks, users_scans, avg_total_user_cost, 
							avg_user_impact, [overall_impact], column_names, included_columns) 
							AS (
								SELECT TOP(600) object_name(c.object_id) AS table_name, 
								a.unique_compiles, a.user_seeks, a.user_scans, a.avg_total_user_cost, a.avg_user_impact, 
								a.avg_total_user_cost * a.avg_user_impact * (a.user_seeks + a.user_scans) AS [overall_impact],
								case when c.equality_columns is not null and c.inequality_columns is not null then c.equality_columns + '', '' + c.inequality_columns
								when c.equality_columns is not null and c.inequality_columns is null then c.equality_columns
								when c.inequality_columns is not null then c.inequality_columns
								END AS column_names,
								c.included_columns
								FROM sys.dm_db_missing_index_group_stats a
								inner join sys.dm_db_missing_index_groups b ON a.group_handle = b.index_group_handle
								inner join sys.dm_db_missing_index_details c on c.index_handle = b.index_handle
								where database_id = ' + CAST(@db_id AS NVARCHAR(10)) + '
								and equality_columns is not null
							)
							SELECT '''+ @db_name +''' AS [database], table_name, unique_compiles, user_seeks, users_scans, avg_total_user_cost, avg_user_impact, overall_impact, ''CREATE NONCLUSTERED INDEX IX_'' + REPLACE(REPLACE(REPLACE(index_definitions.column_names, ''], ['', ''_''), ''['', ''''), '']'', '''') 
									+ '' on '' + index_definitions.table_name + '' ('' + index_definitions.column_names + '') INCLUDE ('' + index_definitions.included_columns + '')''  AS [sql_code]
							FROM index_definitions WHERE index_definitions.included_columns IS NOT NULL
							UNION 
							SELECT '''+ @db_name +''' AS [database], table_name, unique_compiles, user_seeks, users_scans, avg_total_user_cost, avg_user_impact, overall_impact, ''CREATE NONCLUSTERED INDEX IX_'' + REPLACE(REPLACE(REPLACE(index_definitions.column_names, ''], ['', ''_''), ''['', ''''), '']'', '''') 
									+ '' on '' + index_definitions.table_name + '' ('' + index_definitions.column_names + '')'' AS [sql_code]
							FROM index_definitions WHERE index_definitions.included_columns IS NULL' AS NVARCHAR(MAX));

							INSERT INTO @index_data EXEC sys.sp_executesql @sql;
							FETCH NEXT FROM db_id_cursor INTO @db_id, @db_name;  
						END
						CLOSE db_id_cursor
						DEALLOCATE db_id_cursor
						SELECT * FROM @index_data;"
		add_field => {
						"sql_instance" => "SQL2"
					}
	}
	################################################################################################################################################################	
}
output {
	elasticsearch {
		hosts => "elasticsearch:9200"
		index => "sql-missing-indexes-%{+YYYY.MM}"            
	}
}

With data collection out of the way it’s time to move on to plotting graphs. Grafana is quite easy to pick up so instead of repeating official documentation we’d share a few dashboards we have in place

{
  "annotations": {
    "list": [
      {
        "builtIn": 1,
        "datasource": "-- Grafana --",
        "enable": true,
        "hide": true,
        "iconColor": "rgba(0, 211, 255, 1)",
        "name": "Annotations & Alerts",
        "type": "dashboard"
      }
    ]
  },
  "editable": true,
  "gnetId": null,
  "graphTooltip": 0,
  "id": 33,
  "iteration": 1584410507264,
  "links": [],
  "panels": [
    {
      "collapsed": false,
      "datasource": null,
      "gridPos": {
        "h": 1,
        "w": 24,
        "x": 0,
        "y": 0
      },
      "id": 74,
      "panels": [],
      "title": "Index usage",
      "type": "row"
    },
    {
      "columns": [],
      "datasource": "Elasticsearch [index-stats]",
      "fontSize": "100%",
      "gridPos": {
        "h": 5,
        "w": 24,
        "x": 0,
        "y": 1
      },
      "id": 72,
      "interval": "1h",
      "links": [],
      "maxPerRow": null,
      "options": {},
      "pageSize": null,
      "repeat": "Database",
      "repeatDirection": "v",
      "scopedVars": {
        "Database": {
          "selected": true,
          "text": "All",
          "value": "All"
        }
      },
      "scroll": true,
      "showHeader": true,
      "sort": {
        "col": 4,
        "desc": true
      },
      "styles": [
        {
          "alias": "Table",
          "align": "auto",
          "dateFormat": "YYYY-MM-DD HH:mm:ss",
          "pattern": "table_name.keyword",
          "type": "string"
        },
        {
          "alias": "Index",
          "align": "auto",
          "colorMode": "value",
          "colors": [
            "rgba(50, 172, 45, 0.97)",
            "rgba(237, 129, 40, 0.89)",
            "rgba(245, 54, 54, 0.9)"
          ],
          "decimals": 2,
          "pattern": "index_name.keyword",
          "thresholds": [
            "60",
            "80",
            "90"
          ],
          "type": "string",
          "unit": "percent"
        },
        {
          "alias": "Scans",
          "align": "auto",
          "colorMode": null,
          "colors": [
            "rgba(245, 54, 54, 0.9)",
            "rgba(237, 129, 40, 0.89)",
            "rgba(50, 172, 45, 0.97)"
          ],
          "dateFormat": "YYYY-MM-DD HH:mm:ss",
          "decimals": 0,
          "mappingType": 1,
          "pattern": "Average user_scans",
          "sanitize": false,
          "thresholds": [],
          "type": "number",
          "unit": "none"
        },
        {
          "alias": "Seeks",
          "align": "auto",
          "colorMode": null,
          "colors": [
            "rgba(245, 54, 54, 0.9)",
            "rgba(237, 129, 40, 0.89)",
            "rgba(50, 172, 45, 0.97)"
          ],
          "dateFormat": "YYYY-MM-DD HH:mm:ss",
          "decimals": 0,
          "mappingType": 1,
          "pattern": "Average user_seeks",
          "thresholds": [],
          "type": "number",
          "unit": "none"
        },
        {
          "alias": "Lookups",
          "align": "auto",
          "colorMode": null,
          "colors": [
            "rgba(245, 54, 54, 0.9)",
            "rgba(237, 129, 40, 0.89)",
            "rgba(50, 172, 45, 0.97)"
          ],
          "dateFormat": "YYYY-MM-DD HH:mm:ss",
          "decimals": 0,
          "mappingType": 1,
          "pattern": "Average user_lookups",
          "thresholds": [],
          "type": "number",
          "unit": "none"
        },
        {
          "alias": "Updates",
          "align": "auto",
          "colorMode": null,
          "colors": [
            "rgba(245, 54, 54, 0.9)",
            "rgba(237, 129, 40, 0.89)",
            "rgba(50, 172, 45, 0.97)"
          ],
          "dateFormat": "YYYY-MM-DD HH:mm:ss",
          "decimals": 0,
          "mappingType": 1,
          "pattern": "Average user_updates",
          "thresholds": [],
          "type": "number",
          "unit": "none"
        },
        {
          "alias": "Total index usages",
          "align": "auto",
          "colorMode": null,
          "colors": [
            "rgba(245, 54, 54, 0.9)",
            "rgba(237, 129, 40, 0.89)",
            "rgba(50, 172, 45, 0.97)"
          ],
          "dateFormat": "YYYY-MM-DD HH:mm:ss",
          "decimals": 0,
          "mappingType": 1,
          "pattern": "bucket",
          "thresholds": [],
          "type": "number",
          "unit": "none"
        }
      ],
      "targets": [
        {
          "alias": "",
          "bucketAggs": [
            {
              "fake": true,
              "field": "table_name.keyword",
              "id": "4",
              "settings": {
                "min_doc_count": 1,
                "order": "desc",
                "orderBy": "9",
                "size": "0"
              },
              "type": "terms"
            },
            {
              "fake": true,
              "field": "index_name.keyword",
              "id": "3",
              "settings": {
                "min_doc_count": 1,
                "order": "desc",
                "orderBy": "9",
                "size": "0"
              },
              "type": "terms"
            }
          ],
          "hide": true,
          "metrics": [
            {
              "field": "user_scans",
              "id": "5",
              "meta": {},
              "settings": {},
              "type": "avg"
            },
            {
              "field": "user_seeks",
              "id": "6",
              "meta": {},
              "settings": {},
              "type": "avg"
            },
            {
              "field": "user_lookups",
              "id": "7",
              "meta": {},
              "settings": {},
              "type": "avg"
            },
            {
              "field": "user_updates",
              "id": "8",
              "meta": {},
              "settings": {},
              "type": "avg"
            },
            {
              "field": "select field",
              "id": "9",
              "meta": {},
              "pipelineVariables": [
                {
                  "name": "var1",
                  "pipelineAgg": "5"
                },
                {
                  "name": "var2",
                  "pipelineAgg": "6"
                },
                {
                  "name": "var3",
                  "pipelineAgg": "7"
                }
              ],
              "settings": {
                "script": "params.var1+params.var2+params.var3"
              },
              "type": "bucket_script"
            }
          ],
          "query": "database_name.keyword:$Database",
          "refId": "A",
          "timeField": "@timestamp"
        },
        {
          "bucketAggs": [
            {
              "fake": true,
              "field": "table_name.keyword",
              "id": "3",
              "settings": {
                "min_doc_count": 1,
                "order": "desc",
                "orderBy": "4",
                "size": "0"
              },
              "type": "terms"
            },
            {
              "fake": true,
              "field": "index_name.keyword",
              "id": "9",
              "settings": {
                "min_doc_count": 1,
                "order": "desc",
                "orderBy": "4",
                "size": "10"
              },
              "type": "terms"
            }
          ],
          "metrics": [
            {
              "field": "user_scans",
              "id": "4",
              "meta": {},
              "settings": {},
              "type": "avg"
            },
            {
              "field": "user_seeks",
              "id": "5",
              "meta": {},
              "settings": {},
              "type": "avg"
            },
            {
              "field": "user_lookups",
              "id": "6",
              "meta": {},
              "settings": {},
              "type": "avg"
            },
            {
              "field": "user_updates",
              "id": "7",
              "meta": {},
              "settings": {},
              "type": "avg"
            }
          ],
          "query": "database_name.keyword:$Database",
          "refId": "B",
          "timeField": "@timestamp"
        }
      ],
      "timeFrom": null,
      "timeShift": null,
      "title": "Index usage - $Database",
      "transform": "table",
      "type": "table"
    },
    {
      "collapsed": false,
      "datasource": null,
      "gridPos": {
        "h": 1,
        "w": 24,
        "x": 0,
        "y": 6
      },
      "id": 76,
      "panels": [],
      "title": "Underutilised indexes",
      "type": "row"
    },
    {
      "cacheTimeout": null,
      "columns": [],
      "datasource": "Elasticsearch [index-stats]",
      "fontSize": "100%",
      "gridPos": {
        "h": 7,
        "w": 24,
        "x": 0,
        "y": 7
      },
      "id": 58,
      "links": [],
      "options": {},
      "pageSize": null,
      "repeat": "Database",
      "repeatDirection": "h",
      "scopedVars": {
        "Database": {
          "selected": true,
          "text": "",
          "value": ""
        }
      },
      "showHeader": true,
      "sort": {
        "col": 3,
        "desc": true
      },
      "styles": [
        {
          "alias": "Table",
          "align": "auto",
          "dateFormat": "YYYY-MM-DD HH:mm:ss",
          "pattern": "table_name.keyword",
          "type": "string"
        },
        {
          "alias": "Index",
          "align": "auto",
          "colorMode": null,
          "colors": [
            "rgba(245, 54, 54, 0.9)",
            "rgba(237, 129, 40, 0.89)",
            "rgba(50, 172, 45, 0.97)"
          ],
          "decimals": 2,
          "pattern": "index_name.keyword",
          "thresholds": [],
          "type": "string",
          "unit": "short"
        },
        {
          "alias": "Lookups",
          "align": "auto",
          "colorMode": null,
          "colors": [
            "rgba(245, 54, 54, 0.9)",
            "rgba(237, 129, 40, 0.89)",
            "rgba(50, 172, 45, 0.97)"
          ],
          "dateFormat": "YYYY-MM-DD HH:mm:ss",
          "decimals": 0,
          "mappingType": 1,
          "pattern": "Average user_lookups",
          "thresholds": [],
          "type": "number",
          "unit": "short"
        },
        {
          "alias": "Scans",
          "align": "auto",
          "colorMode": null,
          "colors": [
            "rgba(245, 54, 54, 0.9)",
            "rgba(237, 129, 40, 0.89)",
            "rgba(50, 172, 45, 0.97)"
          ],
          "dateFormat": "YYYY-MM-DD HH:mm:ss",
          "decimals": 0,
          "mappingType": 1,
          "pattern": "Average user_scans",
          "thresholds": [],
          "type": "number",
          "unit": "short"
        },
        {
          "alias": "Seeks",
          "align": "auto",
          "colorMode": null,
          "colors": [
            "rgba(245, 54, 54, 0.9)",
            "rgba(237, 129, 40, 0.89)",
            "rgba(50, 172, 45, 0.97)"
          ],
          "dateFormat": "YYYY-MM-DD HH:mm:ss",
          "decimals": 0,
          "mappingType": 1,
          "pattern": "Average user_seeks",
          "thresholds": [],
          "type": "number",
          "unit": "short"
        },
        {
          "alias": "Total Usage",
          "align": "auto",
          "colorMode": null,
          "colors": [
            "rgba(245, 54, 54, 0.9)",
            "rgba(237, 129, 40, 0.89)",
            "rgba(50, 172, 45, 0.97)"
          ],
          "dateFormat": "YYYY-MM-DD HH:mm:ss",
          "decimals": 3,
          "mappingType": 1,
          "pattern": "Bucket Script",
          "thresholds": [],
          "type": "number",
          "unit": "short"
        },
        {
          "alias": "Updates",
          "align": "auto",
          "colorMode": null,
          "colors": [
            "rgba(245, 54, 54, 0.9)",
            "rgba(237, 129, 40, 0.89)",
            "rgba(50, 172, 45, 0.97)"
          ],
          "dateFormat": "YYYY-MM-DD HH:mm:ss",
          "decimals": 0,
          "mappingType": 1,
          "pattern": "Average user_updates",
          "thresholds": [],
          "type": "number",
          "unit": "short"
        }
      ],
      "targets": [
        {
          "alias": "{{table_name.keyword}}.{{index_name.keyword}}",
          "bucketAggs": [
            {
              "fake": true,
              "field": "table_name.keyword",
              "id": "5",
              "settings": {
                "min_doc_count": 1,
                "order": "desc",
                "orderBy": "6",
                "size": "5"
              },
              "type": "terms"
            },
            {
              "fake": true,
              "field": "index_name.keyword",
              "id": "3",
              "settings": {
                "min_doc_count": 1,
                "order": "desc",
                "orderBy": "6",
                "size": "10"
              },
              "type": "terms"
            }
          ],
          "metrics": [
            {
              "field": "user_lookups",
              "hide": true,
              "id": "1",
              "meta": {},
              "settings": {},
              "type": "avg"
            },
            {
              "field": "user_scans",
              "hide": true,
              "id": "6",
              "meta": {},
              "settings": {},
              "type": "avg"
            },
            {
              "field": "user_seeks",
              "hide": true,
              "id": "7",
              "meta": {},
              "settings": {},
              "type": "avg"
            },
            {
              "field": "select field",
              "id": "8",
              "meta": {},
              "pipelineVariables": [
                {
                  "name": "var1",
                  "pipelineAgg": "1"
                },
                {
                  "name": "var2",
                  "pipelineAgg": "6"
                },
                {
                  "name": "var3",
                  "pipelineAgg": "7"
                },
                {
                  "name": "var4",
                  "pipelineAgg": "9"
                }
              ],
              "settings": {
                "script": "params.var4/(params.var1+params.var2+params.var3)"
              },
              "type": "bucket_script"
            },
            {
              "field": "user_updates",
              "hide": true,
              "id": "9",
              "meta": {},
              "settings": {},
              "type": "avg"
            }
          ],
          "query": "database_name.keyword:$Database AND user_lookups:[0 TO 100] AND user_scans:[0 TO 100] AND user_seeks:[0 TO 100]",
          "refId": "A",
          "timeField": "@timestamp"
        }
      ],
      "timeFrom": null,
      "timeShift": null,
      "title": "$Database - Underutilised indexes",
      "transform": "table",
      "type": "table"
    },
    {
      "collapsed": false,
      "datasource": null,
      "gridPos": {
        "h": 1,
        "w": 24,
        "x": 0,
        "y": 14
      },
      "id": 91,
      "panels": [],
      "title": "Missing Indexes",
      "type": "row"
    },
    {
      "columns": [],
      "datasource": "Elasticsearch [missing-indexes]",
      "fontSize": "100%",
      "gridPos": {
        "h": 5,
        "w": 24,
        "x": 0,
        "y": 15
      },
      "id": 89,
      "interval": "30m",
      "links": [],
      "maxPerRow": 2,
      "options": {},
      "pageSize": null,
      "repeat": "Database",
      "repeatDirection": "v",
      "scopedVars": {
        "Database": {
          "selected": true,
          "text": "",
          "value": ""
        }
      },
      "scroll": true,
      "showHeader": true,
      "sort": {
        "col": 6,
        "desc": true
      },
      "styles": [
        {
          "alias": "Table",
          "align": "auto",
          "dateFormat": "YYYY-MM-DD HH:mm:ss",
          "pattern": "table_name.keyword",
          "type": "string"
        },
        {
          "alias": "Index",
          "align": "auto",
          "colorMode": "value",
          "colors": [
            "rgba(50, 172, 45, 0.97)",
            "rgba(237, 129, 40, 0.89)",
            "rgba(245, 54, 54, 0.9)"
          ],
          "decimals": 2,
          "pattern": "sql_code.keyword",
          "thresholds": [
            ""
          ],
          "type": "string",
          "unit": "percent"
        },
        {
          "alias": "Impact",
          "align": "auto",
          "colorMode": null,
          "colors": [
            "rgba(245, 54, 54, 0.9)",
            "rgba(237, 129, 40, 0.89)",
            "rgba(50, 172, 45, 0.97)"
          ],
          "dateFormat": "YYYY-MM-DD HH:mm:ss",
          "decimals": 2,
          "mappingType": 1,
          "pattern": "Average",
          "sanitize": false,
          "thresholds": [],
          "type": "number",
          "unit": "none"
        },
        {
          "alias": "Seeks",
          "align": "auto",
          "colorMode": null,
          "colors": [
            "rgba(245, 54, 54, 0.9)",
            "rgba(237, 129, 40, 0.89)",
            "rgba(50, 172, 45, 0.97)"
          ],
          "dateFormat": "YYYY-MM-DD HH:mm:ss",
          "decimals": 0,
          "mappingType": 1,
          "pattern": "Average user_seeks",
          "thresholds": [],
          "type": "number",
          "unit": "none"
        }
      ],
      "targets": [
        {
          "alias": "",
          "bucketAggs": [
            {
              "fake": true,
              "field": "table_name.keyword",
              "id": "11",
              "settings": {
                "min_doc_count": 1,
                "order": "desc",
                "orderBy": "10",
                "size": "0"
              },
              "type": "terms"
            },
            {
              "fake": true,
              "field": "sql_code.keyword",
              "id": "4",
              "settings": {
                "min_doc_count": 1,
                "order": "desc",
                "orderBy": "10",
                "size": "0"
              },
              "type": "terms"
            }
          ],
          "metrics": [
            {
              "field": "overall_impact",
              "id": "10",
              "meta": {},
              "settings": {},
              "type": "avg"
            }
          ],
          "query": "database_name.keyword:$Database",
          "refId": "A",
          "timeField": "@timestamp"
        }
      ],
      "timeFrom": null,
      "timeShift": null,
      "title": "Missing indexes - $Database",
      "transform": "table",
      "type": "table"
    }
  ],
  "refresh": false,
  "schemaVersion": 22,
  "style": "dark",
  "tags": [],
  "templating": {
    "list": [
      {
        "allValue": null,
        "current": {
          "text": "",
          "value": [
            ""
          ]
        },
        "datasource": "Elasticsearch [index-stats]",
        "definition": "{\"find\": \"terms\",\"field\": \"database_name.keyword\"}",
        "hide": 0,
        "includeAll": true,
        "label": null,
        "multi": true,
        "name": "Database",
        "options": [],
        "query": "{\"find\": \"terms\",\"field\": \"database_name.keyword\"}",
        "refresh": 2,
        "regex": "",
        "skipUrlSync": false,
        "sort": 0,
        "tagValuesQuery": "",
        "tags": [],
        "tagsQuery": "",
        "type": "query",
        "useTags": false
      }
    ]
  },
  "time": {
    "from": "now-24h",
    "to": "now"
  },
  "timepicker": {
    "refresh_intervals": [
      "5s",
      "10s",
      "30s",
      "1m",
      "5m",
      "15m",
      "30m",
      "1h",
      "2h",
      "1d"
    ],
    "time_options": [
      "5m",
      "15m",
      "1h",
      "6h",
      "12h",
      "24h",
      "2d",
      "7d",
      "30d"
    ]
  },
  "timezone": "",
  "title": "Index Maintenance",
  "uid": "OQVK9BSWk",
  "version": 27
}

Monitoring SQL Server: fiddling with sp_whoisactive output

Building up on from basic telegraf inputs we now want some actual SQL data. And this time around it’s going to be not just numeric telemetry, we’d gain some insights into actual queries that get run.

Our choice of Elastic

Up to this point our choice of data back-end has been unjustified. Telegraf is intended to work with time series DBs like Influx. These are heavily optimised based on assumption what kind of data they store and query (i.e. timed series of events). Elastic is however a full text search. So it’s about the time we started to use it for what it was designed for, right?

Finding active sessions with queries and wait reasons

Taking a step back, we must acknowledge work of Adam Machanic and call out sp_whoisactive as probably the most useful tool every DBA should be aware of. This script provides great overview of what the system is doing right now, how long the session have been running for and what they are waiting on. It even allows for dumping data into predefined tables out of the box! Again, if we were looking to utilise SEL server for monitoring we could have easily gone this way and with a bit of SQL Agent scheduling we’d be done in no time.

There’s one slight issue with sp_whoisactive – it’s not meant to work with anything else but SQL table as data store. For our case it is not ideal . It also does not really know anything about SQL Server Query Store which we think is a nice to have.

It’s a known fact we can’t join results of a stored proc with other tables unless we store results in an interim temp table first. So we’d use sp_whoisactive‘s own temp table scripting facility to store results while we loop through query stores on all our databases to grab some extra bits of data we think might be useful:

DECLARE @table_schema NVARCHAR(MAX);
DECLARE @temp_table NVARCHAR(MAX);
DECLARE @result_table NVARCHAR(MAX);
SET NOCOUNT ON;
/*EXEC sp_WhoIsActive @get_outer_command = 1,
					@find_block_leaders = 1,
					@return_schema = 1,
					@format_output = 0,
					@schema = @table_schema OUTPUT;*/
SET @table_schema = N'CREATE TABLE <table_name> ( [session_id] smallint NOT NULL,[sql_text] nvarchar(max) NULL,[sql_command] nvarchar(max) NULL,[login_name] nvarchar(128) NOT NULL,[wait_info] nvarchar(4000) NULL,[CPU] int NULL,[tempdb_allocations] bigint NULL,[tempdb_current] bigint NULL,[blocking_session_id] smallint NULL,[blocked_session_count] smallint NULL,[reads] bigint NULL,[writes] bigint NULL,[physical_reads] bigint NULL,[used_memory] bigint NOT NULL,[status] varchar(30) NOT NULL,[open_tran_count] smallint NULL,[percent_complete] real NULL,[host_name] nvarchar(128) NULL,[database_name] nvarchar(128) NULL,[program_name] nvarchar(128) NULL,[start_time] datetime NOT NULL,[login_time] datetime NULL,[request_id] int NULL,[collection_time] datetime NOT NULL)';
SET @temp_table = REPLACE(@table_schema, '<table_name>', '#whoisactive_monitoring');
SET @result_table = REPLACE(@table_schema, '<table_name>', '#result');
SET @result_table = REPLACE(@result_table, 'NOT NULL)', 'NOT NULL,[duration] bigint NOT NULL DEFAULT 0)'); -- add placeholder for duration column 
SET @result_table = REPLACE(@result_table, 'DEFAULT 0)', 'DEFAULT 0,[query_text_id] bigint NULL)'); -- add placeholder for query_text_id from Query Store
---
DECLARE @db_name NVARCHAR(120);
DECLARE @enhance_sql NVARCHAR(MAX);

SET NOCOUNT ON
-- here we just build a list of databases on the server using known table Users as an anchor. Your databases will have other well known tables
DECLARE @dbs table ([db_name] sysname)
DECLARE @db_query_sql nvarchar(4000)
SET @db_query_sql='select ''?'' as [db_name] from [?].sys.tables t WHERE t.name = ''Users'''; 
INSERT INTO @dbs ([db_name]) EXEC sp_msforeachdb @db_query_sql
SET NOCOUNT OFF

DECLARE db_name_cursor CURSOR FOR
SELECT [db_name]
FROM @dbs
FOR READ ONLY;
OPEN db_name_cursor;
FETCH NEXT FROM db_name_cursor
INTO @db_name;
WHILE @@FETCH_STATUS = 0
BEGIN
	SET @enhance_sql
		= CONCAT(
					@enhance_sql,
					CHAR(13),
					CHAR(10),
					CAST(N'USE [' + @db_name + ']
						IF EXISTS(SELECT 1 FROM sys.database_query_store_options WHERE actual_state_desc <> ''OFF'')
						UPDATE #result 
						SET query_text_id = qsqt.query_text_id 
						FROM #result wm 
						LEFT JOIN sys.query_store_query_text qsqt ON wm.sql_text COLLATE DATABASE_DEFAULT = qsqt.query_sql_text COLLATE DATABASE_DEFAULT 
						WHERE wm.database_name = ''' + @db_name + ''';' AS NVARCHAR(MAX))
				);
	FETCH NEXT FROM db_name_cursor
	INTO @db_name;
END;
CLOSE db_name_cursor;
DEALLOCATE db_name_cursor;
---
DECLARE @main_script NVARCHAR(MAX);
SET @main_script
	= CAST(CONCAT(
						@temp_table, N';',
						CHAR(13),
						CHAR(10),
						'USE [master]; EXEC [master].[dbo].sp_WhoIsActive @get_outer_command=1, @find_block_leaders=1,@format_output=0, @destination_table=''#whoisactive_monitoring'';',
						CHAR(13),
						CHAR(10),
						@result_table, ';',
						CHAR(13),
						CHAR(10),
						N'INSERT INTO #result SELECT	wm.*, 
													CASE
														WHEN DATEDIFF(hour, wm.start_time, wm.collection_time) > 576 THEN
															DATEDIFF(second, wm.collection_time, wm.start_time)
														ELSE DATEDIFF(ms, wm.start_time, wm.collection_time)
													END AS duration, 
													NULL 
											FROM #whoisactive_monitoring wm;',
						CHAR(13),
						CHAR(10),
						@enhance_sql,
						CHAR(13),
						CHAR(10),
						N'DROP TABLE #whoisactive_monitoring;',
						CHAR(13),
						CHAR(10),
						N'SELECT * FROM #result;DROP TABLE #result;'
					) AS NVARCHAR(MAX));
--PRINT @main_script
EXEC (@main_script);

Now that we’ve got the data

We want Logstash to push the output of this query into Elastic. This way we will always have some idea what’s been going on the DB server. One way to run queries against SQL is JDBC driver – download .jre8.jar, unpack and remember the location.

Defining pipeline

After we’ve got the driver, everything else is easy: the pipeline is literally one input and one output:

input {
	############################################################# Master sp_WhoIsActive ############################################################################
	jdbc {
		id => "master_spWhoIsActive"
		jdbc_driver_library => "path\to\jdbc\lib\mssql-jdbc-7.2.1.jre8.jar"
		jdbc_driver_class => "com.microsoft.sqlserver.jdbc.SQLServerDriver"
		jdbc_connection_string => "jdbc:sqlserver://<your connection string>"
		jdbc_user => nil
		schedule => "* * * * *" ## runs each minute
		statement => "-- literally paste sql code from above inside quotes here"
		add_field => {
						"database" => "master" ## optionally add more fields to distinguish where data comes from
						"instance" => "MY-SQL-Server"
					}
	}
	################################################################################################################################################################	
}
output {
	elasticsearch {
		hosts => "elasticsearch:9200"
		index => "whoisactive-%{+YYYY.MM}"            
	}
}

Restart Logstash and watch Kibana updating with new events!