Some Aspects of MS SQL Server Monitoring. Part 2.

Written by jordan-sanders | Published 2019/06/27
Tech Story Tags: ms-sql-server-monitoring | sql-server | database | programming | mssql

TLDRvia the TL;DR App

In this part of the article, we’ll look at the main system counters used to monitor MS SQL Server’s state, review the process of tempdb system database monitoring and learn more about the SQL Server performance monitor feature in dbForge Studio for SQL Server.

The Main System Counters Used for MS SQL Server Monitoring

The following system counters are commonly used to monitor the state of MS SQL Server:

1. Memory:

Pages/sec — Shows the rate at which pages are read from or written to disk to resolve hard page faults. This value represents the sum of Pages Input/sec and Pages Output/sec values, while also including pages retrieved to satisfy faults in the file system cache (usually requested by applications) and noncached mapped memory files. This is the main counter you should keep track of in times of high memory usage load and excessive page swapping. This counter represents the intensity of page swapping, so its non-peak value should be close to zero. If the swapping value is growing, then there is a need to increase the amount of RAM or to limit the number of applications the server is running.

Page Faults/sec — This value represents the number of page faults. A page fault occurs when a program requests an address on a page that is not in the current set of memory resident pages. This counter includes page faults that require disk requests, as well as faults caused by pages being not currently present in the set of memory resident pages. Most CPUs can process the latter fault type with no considerable latency. However, page faults of the former type can lead to significant latency when being processed.

Available Bytes — Tracks the memory (in bytes) available for the execution of various processes. Low values are a sign of a memory deficit. The solution here would be to increase the amount of memory. In most cases, this counter should be higher than 5000 Kb.It may prove beneficial to set the Available Mbytes threshold manually based on the following considerations:

  • 50% available memory = Excellent
  • 25% = Requires attention
  • 10% = Possible issues
  • Less than 5% available memory = Critical state, needs intervention.

2. Processor (Total): % Processor Time

This counter shows the percentage of time spent by the processor on performing various operations for non-idle threads. This value can be seen as a representation of the time spent on productive work. Each processor can be assigned to an idle thread which uses unproductive processor cycles which other threads don’t use. It’s quite natural for this counter to have short-lived actvity peaks often reaching 100%. However, if there are long stretches of time during which the processor load is greater than 80%, your system will benefit from using more CPU cores.

3. MS SQL Server: Access Methods

The Access Methods object in SQL Server provides counters that help with tracking the access to logic data in a database. Physical access to database pages on the disk can be tracked with buffer manager counters. Tracking the data access methods in a particular database will allow you to decide which way of increasing the query performance will be the best: adding or changing indices, sections, files or groups of files, index defragmentation or changing the query text. Additionally, you can use the object’s Access Methods counters to track data size, indices and available space in the database by controlling the size and fragmentation of each server instance. Excessive index fragmentation can lead to significant reductions in performance.

Page Splits/sec — The number of page splits per second that are performed as a result of index page overflow. If this value is high, it could mean that SQL Server has to execute a lot of resource-heavy operations to split pages and move a part of an existing page to a new location when inserting and updating data. Such operations should be avoided if this is possible. We can try to resolve this using one of the following methods:

  • create a cluster index for auto-increment columns. In this case, new records are going to be sequentially placed on new pages rather than on pages that are already occupied with data;
  • rebuild indices by increasing the Fillfactor parameter’s value. This parameter allows you to reserve space in index pages, which can then be used to place new data with no actual need for page split operations.

Full Scans/sec — The amount of unrestricted full scan operations per second. Such operations include scans of the main table and full index scans. If the value is steadily rising, it may be a sign of the system’s degradation (lack of indices, heavy index fragmentation, the existence of unused indices). But please keep in mind that full scans are not always unwanted when used in small tables — if a table can be fully fit into the system’s RAM, it’s would be quicker to perform a full scan. But in most cases, the stable growth of this counter’s value is going to signify the system’s degradation. All this information is ony applicable to OLTP systems. In OLAP systems, on the other hand, full scans are considered completely normal.

4. MS SQL Server: Buffer Manager

The Buffer Manager object allows you to monitor how SQL Server uses different resources by providing the following functionality:

  • tracking the memory for data page storage;
  • counters for tracking the physical input-output when SQL Server reads and writes database pages;
  • extending the buffer pool and the buffer cache by using fast non-volatile memory, i.e., SSD;
  • tracking SQL Server’s memory and counter usage. This helps with retrieving the following information:

a) existence of bottlenecks caused by physical memory deficits. If frequently-used data cannot be cached, SQL Server needs to read it directly from the disk;

b) the possibility of improving query execution performance by increasing the total memory volume or by dedicating an additional amount of memory to data caching or storing SQL Server’s internal structures;

c) how often SQL Server reads data directly from the disk. Compared to other operations (memory access, for example), physical input-output is considerably more time-consuming.

Decreasing the input-output volume can lead to improved query execution performance.

Buffer Cache hit radio — Shows how completely SQL Server can fit data into the cache buffer. The higher this value is, the better — data pages should be placed into the cache buffer and physical input-output operations should be absent for SQL Server to access these pages effectively. If this counter’s value is constantly reducing, you should consider adding more RAM to your system. This counter should always be greater than 90% for OLTP systems and greater than 50% for OLAP systems.

Page life expectancy — Shows for how long the page will be contained in the memory scope in its current state. If this value is constantly dropping, then the system overuses the buffer pool. This way, working memory can cause issues leading to performance reduction. It is important to note that there is no universal value of Page Life Expectancy which will signify the system’s buffer pool overuse (the 300s value is no longer relevant since MS SQL Server 2012).

5. MS SQL Server: General Statistics

The General Statistics object provides counters for tracking the server’s overall activity. For example, you can look at the number of concurrent connections and users per second on the computer that runs the SQL Server instance. These values can be used in large OLTP systems where large amounts of clients connect to and disconnect from the SQL Server instance.

Process blocked — The number of currently blocked processes.

User Connections — The number of users currently connected to the SQL Server instance.

6. MS SQL Server: Locks

The Locks object provides information about locks for different resource types in SQL Server. Locks are issued against resources such as rows read or updated by a transaction to avoid resources being used by multiple transactions at once. For example, when a transaction issues an exclusive (X) lock against a row in a table, no other transaction can update this row — until the lock is released. Reducing the usage of locks can positively affect parallelism, which, in turn, increases overall performance. A few instances of the Locks object can be tracked, each of them representing a lock issued against a specific type of resource.

Average Wait Time (ms) — Average wait time of all lock requests that required waiting, in ms. This counter shows how much, on average, a user process needs to wait before it can issue a lock against a resource. There is no way to define a universal maximum value for this counter, as it fully depends on what your current task is. High values of this counter can point at issues with blocks in your database.

Lock Wait Time (ms) — Total amount of lock wait time in the span of the last second, displayed in ms.

Lock Waits/sec — Total amount of times a thread had to wait due to a lock request in the span of the last second.

Lock Timeouts/sec — Amount of times a lock could not be achieved through cyclic requests. The value of the Spin Counter configuration parameter sets the maximum amount of times (spins) before a timeout happens and the thread goes into an idle state.

Lock Requests/sec — Total amount of lock requests of the specified type per second.

Lock Number of Deadlocks/sec — Amount of mutual blocks (deadlocks) per second. The presence of deadlocks signals of improperly constructed queries which lead to mutually requested resources being locked.

7. MS SQL Server: Memory Manager

The Memory Manager object provides counters for tracking memory usage on the entire server. Such type of monitoring and assessing user actions and resource usage can help identify performance issues. SQL Server memory control allows you to check:

  • whether there are physical memory deficits for storing frequently-used data in the cache. If there is not enough memory, SQL Server will need to retrieve the data directly from the disk;
  • if the query’s performance can be improved by adding memory or increasing the size of available memory to cache data or internal SQL Server structures.

Memory Grants Outstanding — Counts the total number of processes that have successfully received dedicated working area memory. If this value is steadily falling, the amount of RAM should be increased.

Memory Grants Pending — Shows the total number of processes that are currently waiting to receive a dedicated area of working memory. If this value is steadily growing, the amount of RAM should be increased.

8. MS SQL Server: Statistics

The Statistics object assures the work of counters tracking the compilation and types of requests sent to the SQL Server instance. By monitoring the number of compilations and repeated compilations of queries and the amount of packages received by SQL Server, you can get a general picture of how quickly user requests are being executed by SQL Server and how effectively they are being processed by the query optimizer.

Batch Requests/sec — The number of Transact-SQL command packages received per second. This value is affected by any limitations such as input-output, the number of users, cache size, query complexity, etc. A high amount of package requests signifies high throughput capabilities.

9. Logical Disk:

Avg Disk sec/Read — Shows the average time spent on reading files from the disk, in seconds. This counter’s average value should not exceed 2 ms, and its maximum value should not be greater than 10 ms. In some cases, the peak values of 25–50 ms can be accepted. But the specifics depend on your system’s performance requirements.

Avg Disk sec/Write — Shows the average time spent on writing data to the disk, in seconds. This counter’s average value should not exceed 5 ms, and its maximum value should not be higher than 25 ms.

In addition to these main performance counters, supplemental ones can also be set up based on your specific environment, if a need for this arises in the process.

Monitoring the Performance of the tempdb System Database

With time, the amount of files dedicated to the tempdb database can become insufficient. To check this, we can use the following query:

<a href="https://medium.com/media/38f87cb001f8e44efbdf671f4554cbda/href">https://medium.com/media/38f87cb001f8e44efbdf671f4554cbda/href</a>

With the help of this query, we are trying to find a latch for the PFS, GAM, and SGAM pages in the tempdb database.

If this query does not return anything or only returns ‘Is Not PFS, GAM, or SGAM page’ rows, it most probably means that there is no need in increasing tempdb files in the current circumstances.

It’s also important to assess wait time for the tempdb system database files, which can be done by using the following query:

SELECT files.physical_name, files.name,

stats.num_of_writes, (1.0 * stats.io_stall_write_ms / stats.num_of_writes) AS avg_write_stall_ms,

stats.num_of_reads, (1.0 * stats.io_stall_read_ms / stats.num_of_reads) AS avg_read_stall_ms

FROM sys.dm_io_virtual_file_stats(2, NULL) as stats

INNER JOIN master.sys.master_files AS files

ON stats.database_id = files.database_id

AND stats.file_id = files.file_id

WHERE files.type_desc = 'ROWS'

The acceptable values depend on your specific system and its performance requirements. But, usually, the following values can serve as a good reference point:

  1. avg_read_stall_ms no greater than 1 ms
  2. avg_write_stall_ms no greater than 5 ms

Getting the General Information about the MS SQL Server Instance

It’s important to have a full picture of your particular MS SQL Server instance by getting the most important basic information. This can be achieved by using the following query:

<a href="https://medium.com/media/8d243353e68b4ac88dd34c8354d4cfc9/href">https://medium.com/media/8d243353e68b4ac88dd34c8354d4cfc9/href</a>

With this, we acquire the following information:

1) Server\Instance — the name of the server our instance of MS SQL Server is installed on

2) SQLServerVersion compound MS SQL Server information (version, CPU architecture, build date, edition, OS version)

3) ServiceInstance — the name of the MS SQL Server instance (for an unnamed instance, ‘MSSQLSERVER’ is returned)

4) CurrentDB_Name the name of the database in which all relevant information is stored

5) CurrentLogin — name if the current login specified by the system

6) CurrentUser current user’s name

After this, some more information about the MS SQL Server instance is shown. You can learn more about these properties by following this link:

https://docs.microsoft.com/sql/t-sql/functions/serverproperty-transact-sql?view=sql-server-2017

At the end of the query, these fields are displayed:

1) ServerStarted — date and time of MS SQL Server instance’s last launch

2) DaysRunning — total number of days the MS SQL Server instance has been working since the last launch

3) SecondsRunning — total number of seconds the MS SQL Server instance has been working since the last launch

These three fields are retrieved from the properties of tempdb system database. This is done due to the fact that the tempdb system database is created on each reload of the MS SQL Server instance. This is why it’s possible to retrieve the date and time of the last MS SQL Server instance launch and its total working time since the last launch from the tempdb database.

SQL Server Monitoring Statistics in dbForge Studio For SQL Server

dbForge SQL Monitor shows the main SQL Server performance statistics with a dedicated Overview tab mentioned in the first part of this article. In the Overview tab, you can view key sql activity stats like CPU utilization, memory utilization, disc activity, and IO latency.

If you would like to delve into real-time statistics and track the info about databases, the most resource-heavy queries, currently active processes, wait for stats and other aspects, you can use other SQL Monitor tabs in dbForge Studio for SQL Server.

1. Data IO — real-time file read and write statistics:

Fig.1. Data IO tab

Here, latency is displayed in ms.

This info can be retrieved by executing the following query:

<a href="https://medium.com/media/ea41540199aa8007885c0ac2090b4620/href">https://medium.com/media/ea41540199aa8007885c0ac2090b4620/href</a>

2. Databases — real-time info about databases:

Fig.2. Databases tab in SQL Monitor

This data can be acquired by using the following query:

<a href="https://medium.com/media/3acf4088080d6d1b4f3e8c2209063bc1/href">https://medium.com/media/3acf4088080d6d1b4f3e8c2209063bc1/href</a>

3. Wait Stats — real-time wait type statistics:

Fig.3. Wait Stats tab

4. Top 10 most resource-heavy queries displayed in real time.

The Elapsed Time parameter helps with locating the queries that affect performance the most:

Fig.4. Top Queries tab

This tab can also display the detailed query text and execution plan

You can get this info by executing the following:

<a href="https://medium.com/media/878cadba7d29aaebbb55c9cd715b0704/href">https://medium.com/media/878cadba7d29aaebbb55c9cd715b0704/href</a>

5. Sessions — processes currently active in the MS SQL Server instance:

Fig.5. Sessions tab of SQL Monitor tool

This data can be acquired with the following query:

<a href="https://medium.com/media/8a96f3adb1022fee1598b279265da307/href">https://medium.com/media/8a96f3adb1022fee1598b279265da307/href</a>

6. Backups — information about the latest database backups:

Fig.6. Backups tab of SQL Monitor

You can retrieve this info by executing the following:

<a href="https://medium.com/media/9e1f0df89b05f870d083ef6ad14f455d/href">https://medium.com/media/9e1f0df89b05f870d083ef6ad14f455d/href</a>

Summary

In this article, we looked at the ways of MS SQL Server activity monitoring which allow us to detect a lack of RAM and free threads, as well as other less obvious issues.

We have also seen how dbForge Studio for SQL Server by Devart can help with the MS SQL Server process.


Written by jordan-sanders | PR Manager at Devart since 2010 | Obsessed with the promotion of database development optimization
Published by HackerNoon on 2019/06/27