Monitoring SQL Server: fiddling with sp_whoisactive output

Building up on from basic telegraf inputs we now want some actual SQL data. And this time around it’s going to be not just numeric telemetry, we’d gain some insights into actual queries that get run.

Our choice of Elastic

Up to this point our choice of data back-end has been unjustified. Telegraf is intended to work with time series DBs like Influx. These are heavily optimised based on assumption what kind of data they store and query (i.e. timed series of events). Elastic is however a full text search. So it’s about the time we started to use it for what it was designed for, right?

Finding active sessions with queries and wait reasons

Taking a step back, we must acknowledge work of Adam Machanic and call out sp_whoisactive as probably the most useful tool every DBA should be aware of. This script provides great overview of what the system is doing right now, how long the session have been running for and what they are waiting on. It even allows for dumping data into predefined tables out of the box! Again, if we were looking to utilise SEL server for monitoring we could have easily gone this way and with a bit of SQL Agent scheduling we’d be done in no time.

sp_whoisactive output showing active SQL Server sessions with query text and wait info

There’s one slight issue with sp_whoisactive – it’s not meant to work with anything else but SQL table as data store. For our case it is not ideal . It also does not really know anything about SQL Server Query Store which we think is a nice to have.

It’s a known fact we can’t join results of a stored proc with other tables unless we store results in an interim temp table first. So we’d use sp_whoisactive‘s own temp table scripting facility to store results while we loop through query stores on all our databases to grab some extra bits of data we think might be useful:

DECLARE @table_schema NVARCHAR(MAX);
DECLARE @temp_table NVARCHAR(MAX);
DECLARE @result_table NVARCHAR(MAX);
SET NOCOUNT ON;
/*EXEC sp_WhoIsActive @get_outer_command = 1,
					@find_block_leaders = 1,
					@return_schema = 1,
					@format_output = 0,
					@schema = @table_schema OUTPUT;*/
SET @table_schema = N'CREATE TABLE <table_name> ( [session_id] smallint NOT NULL,[sql_text] nvarchar(max) NULL,[sql_command] nvarchar(max) NULL,[login_name] nvarchar(128) NOT NULL,[wait_info] nvarchar(4000) NULL,[CPU] int NULL,[tempdb_allocations] bigint NULL,[tempdb_current] bigint NULL,[blocking_session_id] smallint NULL,[blocked_session_count] smallint NULL,[reads] bigint NULL,[writes] bigint NULL,[physical_reads] bigint NULL,[used_memory] bigint NOT NULL,[status] varchar(30) NOT NULL,[open_tran_count] smallint NULL,[percent_complete] real NULL,[host_name] nvarchar(128) NULL,[database_name] nvarchar(128) NULL,[program_name] nvarchar(128) NULL,[start_time] datetime NOT NULL,[login_time] datetime NULL,[request_id] int NULL,[collection_time] datetime NOT NULL)';
SET @temp_table = REPLACE(@table_schema, '<table_name>', '#whoisactive_monitoring');
SET @result_table = REPLACE(@table_schema, '<table_name>', '#result');
SET @result_table = REPLACE(@result_table, 'NOT NULL)', 'NOT NULL,[duration] bigint NOT NULL DEFAULT 0)'); -- add placeholder for duration column 
SET @result_table = REPLACE(@result_table, 'DEFAULT 0)', 'DEFAULT 0,[query_text_id] bigint NULL)'); -- add placeholder for query_text_id from Query Store
---
DECLARE @db_name NVARCHAR(120);
DECLARE @enhance_sql NVARCHAR(MAX);

SET NOCOUNT ON
-- here we just build a list of databases on the server using known table Users as an anchor. Your databases will have other well known tables
DECLARE @dbs table ([db_name] sysname)
DECLARE @db_query_sql nvarchar(4000)
SET @db_query_sql='select ''?'' as [db_name] from [?].sys.tables t WHERE t.name = ''Users'''; 
INSERT INTO @dbs ([db_name]) EXEC sp_msforeachdb @db_query_sql
SET NOCOUNT OFF

DECLARE db_name_cursor CURSOR FOR
SELECT [db_name]
FROM @dbs
FOR READ ONLY;
OPEN db_name_cursor;
FETCH NEXT FROM db_name_cursor
INTO @db_name;
WHILE @@FETCH_STATUS = 0
BEGIN
	SET @enhance_sql
		= CONCAT(
					@enhance_sql,
					CHAR(13),
					CHAR(10),
					CAST(N'USE [' + @db_name + ']
						IF EXISTS(SELECT 1 FROM sys.database_query_store_options WHERE actual_state_desc <> ''OFF'')
						UPDATE #result 
						SET query_text_id = qsqt.query_text_id 
						FROM #result wm 
						LEFT JOIN sys.query_store_query_text qsqt ON wm.sql_text COLLATE DATABASE_DEFAULT = qsqt.query_sql_text COLLATE DATABASE_DEFAULT 
						WHERE wm.database_name = ''' + @db_name + ''';' AS NVARCHAR(MAX))
				);
	FETCH NEXT FROM db_name_cursor
	INTO @db_name;
END;
CLOSE db_name_cursor;
DEALLOCATE db_name_cursor;
---
DECLARE @main_script NVARCHAR(MAX);
SET @main_script
	= CAST(CONCAT(
						@temp_table, N';',
						CHAR(13),
						CHAR(10),
						'USE [master]; EXEC [master].[dbo].sp_WhoIsActive @get_outer_command=1, @find_block_leaders=1,@format_output=0, @destination_table=''#whoisactive_monitoring'';',
						CHAR(13),
						CHAR(10),
						@result_table, ';',
						CHAR(13),
						CHAR(10),
						N'INSERT INTO #result SELECT	wm.*, 
													CASE
														WHEN DATEDIFF(hour, wm.start_time, wm.collection_time) > 576 THEN
															DATEDIFF(second, wm.collection_time, wm.start_time)
														ELSE DATEDIFF(ms, wm.start_time, wm.collection_time)
													END AS duration, 
													NULL 
											FROM #whoisactive_monitoring wm;',
						CHAR(13),
						CHAR(10),
						@enhance_sql,
						CHAR(13),
						CHAR(10),
						N'DROP TABLE #whoisactive_monitoring;',
						CHAR(13),
						CHAR(10),
						N'SELECT * FROM #result;DROP TABLE #result;'
					) AS NVARCHAR(MAX));
--PRINT @main_script
EXEC (@main_script);

Now that we’ve got the data

We want Logstash to push the output of this query into Elastic. This way we will always have some idea what’s been going on the DB server. One way to run queries against SQL is JDBC driver – download .jre8.jar, unpack and remember the location.

Defining pipeline

After we’ve got the driver, everything else is easy: the pipeline is literally one input and one output:

input {
	############################################################# Master sp_WhoIsActive ############################################################################
	jdbc {
		id => "master_spWhoIsActive"
		jdbc_driver_library => "path\to\jdbc\lib\mssql-jdbc-7.2.1.jre8.jar"
		jdbc_driver_class => "com.microsoft.sqlserver.jdbc.SQLServerDriver"
		jdbc_connection_string => "jdbc:sqlserver://<your connection string>"
		jdbc_user => nil
		schedule => "* * * * *" ## runs each minute
		statement => "-- literally paste sql code from above inside quotes here"
		add_field => {
						"database" => "master" ## optionally add more fields to distinguish where data comes from
						"instance" => "MY-SQL-Server"
					}
	}
	################################################################################################################################################################	
}
output {
	elasticsearch {
		hosts => "elasticsearch:9200"
		index => "whoisactive-%{+YYYY.MM}"            
	}
}

Restart Logstash and watch Kibana updating with new events!

Monitoring SQL Server: setting up Logstash with telegraf

Having set up basic ELK stack we probably should get on to ingesting some data.

Wait, what about schema?

Theoretically, we could play it very strict and define our expected documetns similar to the way we define SQL tables. But it’s not fun and besides, Elastic comes with sensible defaults. It has done a decent enough job for us to have never needed to correct it. Not bad at all.

Setting up the pipeline

Logstash operates in terms of pipelines. If I were to reduce it to the bare basics, each pipeline allows us to define where data comes from (input), how Logstash should pick events to process it (filter) and finally, where to send it to (output). The last bit is kinda obvious and easy – we want to send processed events to Elastic. And by virtue of running doker we happen to know that it’s hosted at http://elasticsearch:9200. Too easy.

Input

Being quite modular, Logstash operates in terms of plugins. And there are HEAPS to chose from! Apart from obvious bits you get to see in the documentation, one thing that might not be quite obvious – you can have multiple plugins running for one input stream!

Our telemetry agent of choice, telegraf supports sending metrics via Influx Line Protocol, but Logstash does not have a plugin for that – so we will need to craft something bit more complex. Suppose we want basic metrics and sql-stats

input {
	tcp {
		port => 8098
		type => "telegraf"
	} # opens up a TCP listener for events sent by telegraf (which happens to support Influx Line Protocol)
	tcp {
		port => 8099
		type => "sql-stats"
	} # opens up a TCP listener for events coded in Influx Line protocol
}

Filter

This bit is optional, but sometimes is very helpful. And again, we’ve got quite a variety of plugins to suit. Here we’ve got a chance to inspect incoming events and either transform or outright reject some. For example here we parse Influx Line protocol sent by telegraf and do some basic enhancements on data coming from SQL-stats poller:

filter {
	if [type] == "telegraf" {		
		dissect {
			mapping => {
				"message" => "%{measurement}.%{metric};%{tags} %{value} %{ts}"
			}
		}
		kv {
			source => "tags"
			field_split => ";"
		}	
		date {
			match => ["ts", "UNIX"]
		}
		mutate {
			convert => {
				"value" => "float"
			}
			remove_field => ["tags", "message", "ts", "port"]
		}
	}
	if [type] == "sql-stats" {		
		grok {
			match => {
				"message" => "%{WORD:measurement}(\.%{WORD:metric})?;%{GREEDYDATA:tags} (%{BASE10NUM:value}) %{NUMBER:ts}"
			}
		}
		kv {
			source => "tags"
			field_split => ";"
		}	
		date {
			match => ["ts", "UNIX"]
		}
		mutate {
			convert => {
				"value" => "float"
			}
			rename => {
				"counter" => "metric"
				"clerk_type" => "metric"
			}
			remove_field => ["tags", "message", "ts", "port", "host"]
		}
	}
}

All that’s left

To store the events, we invoke output like so:

output {
	if [type] == "telegraf" {		
		elasticsearch {
			hosts => "elasticsearch:9200"
			index => "telegraf-%{+YYYY.MM}"
		}
	}
	if [type] == "sql-stats" {		
		elasticsearch {
			hosts => "elasticsearch:9200"
			index => "sql-stats-%{+YYYY.MM}"
		}
	}
}

Monitoring SQL Server: setting up ELK+G

In 9 cases out of 10 our clients have some sort of database that they want to interface with. And 8 out of those 9 cases the database is going to be SQL Server. Yes, this is us being biased, but you know what?

It does not matter

The important bit is – out clients like to know how the database is doing. Some are happy to pay for commercial APMs, others either have very specific needs or love the challenge to DIY.

We are here to help

One way to get better picture of what’s happening with the DB would be to keep grabbing vitals over time and plotting them on a graph of some sort. Grafana is a fantastic way to achieve that. It supports a whole bunch of backends (including SQL server) and allows insane amount of customisations.

Diversify

It is possible to store SQL telemetry in another SQL database on the same server (you could even set up SQL Agent jobs to do the polling – all nicely packaged). We however thought it might be a good idea to not store all data on the same machine. We’d like to not overstrain the main database in time of pinch and completely decouple analytics from critical business processes.

ELK G stack

One of many ways to approach this is to introduce a (somewhat) free and open source ElasticSearch into the mix. And mightly Logstash for data ingestion. This is where we’d normally go on to Kibana for dashboards and nice UI (and we did end up running it), but the main focus of this exercise will still fall onto Grafana.

Setting it up

Theres no point repeating official documentation for respective products, let’s instead write up a docker-compose file:

version: '3'
services:
    elasticsearch:
        image: docker.elastic.co/elasticsearch/elasticsearch:7.6.1
        environment:
            - node.name=elastic01
            - discovery.type=single-node  
            - bootstrap.memory_lock=true
            - "ES_JAVA_OPTS=-Xms512m -Xmx512m"        
        volumes:
            - ./elastic:/usr/share/elasticsearch/data
    logstash:
        image: docker.elastic.co/logstash/logstash:7.6.1
        volumes: 
            - ./logstash-pipeline:/usr/share/logstash/pipeline/
            - ./logstash-config/usr/share/logstash/config/
        depends_on:
          - elasticsearch
    kibana:
        image: docker.elastic.co/kibana/kibana:7.6.1
        environment:
          - ELASTICSEARCH_HOSTS=http://elasticsearch:9200
        ports:
          - 5601:5601
        depends_on:
          - elasticsearch
    grafana:
        image: grafana/grafana
        ports:
          - 3000:3000
        depends_on:
          - elasticsearch

All that’s left to do is docker-compose up -d and run. Stay tuned for next posts in the series.