Keeping an Eye on MySQL with Health Checks

NOV 15, 2023 • WRITTEN BY ROMAN AGABEKOV
Databases are at the heart of many modern applications, powering everything from e-commerce websites to mobile apps. As a result, it's critical to ensure that they are running smoothly and efficiently at all times. Any downtime or performance issues can have a significant impact on business operations, causing lost revenue and frustrated users.

Health checks play a crucial role in ensuring the smooth operation of databases. These checks provide insights into a database's health, allowing administrators to monitor performance, diagnose issues, and take proactive measures to resolve them. By performing regular health checks, administrators can identify potential problems before they become critical, helping to minimize downtime and maintain optimal performance.

Database Health Categories

Database engineers should be looking at various indicators of database health to ensure that databases are running smoothly and efficiently. Some of the most important categories to consider include:

Cache Performance Metrics

These metrics focus on the efficiency of various cache systems within the database, helping to identify potential bottlenecks and areas for optimization. They measure the hit rate and fragmentation of different cache types, such as thread, table, MyISAM, and InnoDB caches, to ensure that frequently accessed data is readily available and cache usage is optimized.

  • Thread Cache Hit Rate
  • Thread Cache Ratio
  • MyISAM Cache Hit Rate
  • InnoDB Cache Hit Rate
  • Table Cache Hit Rate
  • QCache Fragmentation

Database Efficiency Metrics

This category encompasses metrics that monitor the overall efficiency and performance of the database system. They track key write ratios, log file sizes, and the number of sort merge passes to determine how well the database is managing data writes, storage, and sorting operations. These metrics can help identify areas where optimization efforts can be targeted to improve database performance.

  • MyISAM Key Write Ratio
  • InnoDB Log File Size
  • Sort Merge Passes Ratio
  • Flushing Logs

Resource Utilization Metrics

These metrics focus on measuring the usage of system resources like memory, disk space, and CPU by the database. By monitoring resource utilization, you can identify potential bottlenecks or capacity issues and make informed decisions about resource allocation or scaling. High resource utilization may indicate the need for hardware upgrades, resource reallocation, or query optimizations to ensure optimal database performance.

  • Database Connection Utilization
  • Memory Utilization
  • Disk Space Usage
  • CPU Utilization

Temporary Data Metrics

This category tracks the creation and management of temporary data during database operations. Monitoring temporary disk data can help identify issues with the temporary storage system or inefficiencies in query execution that lead to excessive temporary data creation. By optimizing these aspects, you can improve overall database performance and reduce the impact of temporary data operations on system resources.

  • Temporary Disk Data

Health Checks

Regular health checks are essential for ensuring the smooth operation of databases. By monitoring key indicators, engineers and administrators can detect and address issues before they result in slow performance, application crashes, or data corruption.

To maintain database health, it is important to regularly monitor key indicators such as CPU usage, memory usage, disk space, network latency, query response time, and database backups. If any of these indicators fall outside of the expected bounds, it can indicate a potential problem with the database that needs to be addressed.

To calculate these health checks, administrators use various tools and techniques, including system monitoring software, performance counters, and diagnostic queries. Some databases also have built-in tools that administrators can use to monitor database health and performance.

1. Thread Cache Hit Rate

Thread cache hit rate is a metric that indicates the percentage of connections for which the thread cache was able to provide a reusable thread instead of creating a new one. A high thread cache hit rate, over 50%, is desirable as it indicates that the server is efficiently reusing existing threads and reducing the overhead of creating new threads.
Gather the Required Data
To calculate Thread Cache Hit Rate, you will need to gather the following information:
  • Threads_created - Number of threads created by the server. This can be obtained from the Threads_created variable in the Server Status report.
  • Connections - Total number of connections made to the server. This can be obtained from the Connections variable in the Server Status report.
Calculate Thread Cache Hit Rate
Once you have gathered this information, you can plug the values into the following formula
100 - Threads_created * 100.0 / Connections
Interpret the Results
The result will be a percentage value between 0 and 100. A high thread cache hit rate (close to 100%) indicates that the server is efficiently reusing existing threads, while a low thread cache hit rate (close to 0%) indicates that the server is creating a lot of new threads, which can be inefficient. Ideally, the thread cache hit rate should be at least 50%.
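
The formula above can be sketched in Python as follows. This is a minimal illustration, not part of any MySQL tooling: the function name and the sample values are hypothetical, and returning 0.0 when no connections have been recorded is an assumption made here to avoid a division by zero.

```python
def thread_cache_hit_rate(threads_created: int, connections: int) -> float:
    """Return 100 - Threads_created * 100.0 / Connections as a percentage."""
    if connections <= 0:
        # Assumption: with no recorded connections, report 0.0 instead of failing.
        return 0.0
    return 100 - threads_created * 100.0 / connections


# Hypothetical sample values, not taken from a real server.
rate = thread_cache_hit_rate(threads_created=50, connections=1000)
print(f"Thread cache hit rate: {rate:.1f}%")
print("OK" if rate >= 50 else "Below the 50% guideline")
```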

2. Thread Cache Ratio

The thread cache ratio is a performance metric that indicates how efficiently the thread cache is being utilized. A high thread cache ratio (close to 100%) indicates that the thread cache is effective.
Gather the Required Data
To calculate Thread Cache Ratio, you will need to gather the following information:
  • Threads_cached - Number of threads currently cached.
  • Threads_created - Number of threads created by the server. This can be obtained from the Threads_created variable in the Server Status report.
Calculate Thread Cache Ratio
Once you have gathered this information, you can plug the values into the following formula
Threads_cached * 100.0 / Threads_created 
The resulting value is the Thread Cache Ratio, expressed as a percentage.

For example, if the status report indicates that Threads_cached is 200 and Threads_created is 500, the Thread Cache Ratio would be calculated as follows:

(200 * 100.0) / 500 = 40.0

Therefore, the Thread Cache Ratio is 40.0%.
Interpret the Results
A good threshold for this health check is Thread Cache Ratio > 10%. This threshold is commonly used to indicate that a thread cache is functioning effectively and helping to reduce the overhead of creating and destroying threads.

Interpreting the results of the Thread Cache Ratio calculation using this threshold involves comparing the calculated ratio to the threshold value of 10%. If the calculated ratio is greater than 10%, then the thread cache is considered to be functioning effectively. If the calculated ratio is less than 10%, then the thread cache may not be providing significant performance benefits.

It is important to note that the ideal Thread Cache Ratio may vary depending on the specific application or system environment. Therefore, it is recommended to monitor the Thread Cache Ratio over time and adjust the thread cache configuration as necessary to achieve optimal performance.
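
As a rough sketch, the calculation and the 10% threshold check could look like this in Python. The function name is hypothetical, and treating a server with no created threads as 0.0 is an assumption made only for this example.

```python
def thread_cache_ratio(threads_cached: int, threads_created: int) -> float:
    """Return Threads_cached * 100.0 / Threads_created as a percentage."""
    if threads_created <= 0:
        # Assumption: no threads created yet, so report 0.0.
        return 0.0
    return threads_cached * 100.0 / threads_created


# Values from the worked example above.
ratio = thread_cache_ratio(threads_cached=200, threads_created=500)
print(f"Thread Cache Ratio: {ratio:.1f}%")
print("effective" if ratio > 10 else "may not be providing significant benefit")
```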

3. MyISAM Cache Hit Rate

Key Cache Hit Rate is a performance metric that measures the effectiveness of the key cache in MyISAM. It indicates the percentage of requests for index blocks that are satisfied by the key cache instead of having to be read from the disk. A high value indicates that the key cache is effectively reducing disk I/O and improving query performance.
Gather the Required Data
To calculate the MyISAM Cache Hit Rate, you'll need to log into the MySQL console and enter the SHOW STATUS; command to display a list of different status variables for the server:

  • Key_read_requests - The total number of requests made to read a key from the MyISAM cache.
  • Key_reads - The total number of times that a key was not found in the MyISAM cache and had to be read from disk.
Calculate the MyISAM Cache Hit Rate
Now that you have the values for "Key_read_requests" and "Key_reads," plug in the values for the following formula:
100 - Key_reads / Key_read_requests * 100
For example, if the value of "Key_read_requests" is 1000 and the value of "Key_reads" is 100, the formula would be:

100 - ((100 / 1000) * 100) = 90

This means that the MyISAM Cache Hit Rate is 90%.
Interpret the Results
A higher MyISAM Cache Hit Rate indicates that most of the requested data is being served from the cache, which is faster than reading data from the disk. This means that a higher MyISAM Cache Hit Rate is desirable, and values above 95% are generally considered good.

On the other hand, a low MyISAM Cache Hit Rate indicates that a large amount of data is being read from disk, which can significantly impact the performance of your MySQL server. In this case, you may want to consider optimizing your queries or increasing the size of your cache to improve performance.
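
The steps above can be sketched in Python. The example assumes the rows returned by SHOW STATUS arrive as (name, value) string pairs, which is how many database drivers return them. The helper names and the sample rows are hypothetical. The formula is written as Key_reads * 100.0 / Key_read_requests, which is algebraically the same as the one above.

```python
def status_to_dict(rows):
    """Convert (Variable_name, Value) rows into a dict of integers.

    Non-numeric values are skipped, since SHOW STATUS also returns strings.
    """
    status = {}
    for name, value in rows:
        try:
            status[name] = int(value)
        except (TypeError, ValueError):
            continue
    return status


def myisam_cache_hit_rate(status):
    """Return 100 - Key_reads / Key_read_requests * 100, or None if undefined."""
    requests = status.get("Key_read_requests", 0)
    if requests == 0:
        return None  # no key read requests yet, so the rate is undefined
    return 100 - status.get("Key_reads", 0) * 100.0 / requests


# Hypothetical rows shaped like the output of SHOW STATUS.
rows = [("Key_read_requests", "1000"), ("Key_reads", "100"), ("Ssl_cipher", "")]
print(myisam_cache_hit_rate(status_to_dict(rows)))
```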

4. MyISAM Key Write Ratio

The MyISAM Key Write Ratio is the ratio of the number of physical writes of a key block to disk to the number of requests to write a key block to the cache, expressed as a percentage.
Gather the Required Data
To calculate MyISAM Key Write Ratio, you will need to gather information using the SHOW GLOBAL STATUS LIKE 'Key_write%'; command:

  • Key_write_requests - The number of requests to write a key block to the MyISAM key cache.
  • Key_writes - The number of physical writes of a key block from the MyISAM key cache to disk.
Calculate the MyISAM Key Write Ratio
Once you have gathered this information, you can plug the values into the following formula
Key_writes * 100.0 / Key_write_requests
For example, if the value of "Key_writes" is 1000 and the value of "Key_write_requests" is 5000, the calculation would be:

1000 * 100.0 / 5000 = 20.0

This means that the MyISAM Key Write Ratio is 20.0%.
Interpret the Results
The MyISAM Key Write Ratio is an indicator of the efficiency of the key cache. For good MySQL server performance, the Key Write Efficiency value should be 90 percent or above. If it is lower, you can increase the size of the key cache to improve performance.
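
A minimal Python sketch of the ratio itself is shown below. The function name is hypothetical, and returning None when there have been no write requests is an assumption of this example.

```python
def myisam_key_write_ratio(key_writes: int, key_write_requests: int):
    """Return Key_writes * 100.0 / Key_write_requests, or None if undefined."""
    if key_write_requests <= 0:
        return None  # no write requests recorded, so the ratio is undefined
    return key_writes * 100.0 / key_write_requests


# Values from the worked example above.
print(myisam_key_write_ratio(key_writes=1000, key_write_requests=5000))
```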

5. InnoDB Cache Hit Rate

InnoDB Buffer Cache Hit Rate indicates the percentage of read requests that are satisfied by the buffer pool instead of having to be read from disk. A high hit rate indicates that the buffer pool is effectively reducing disk I/O and improving query performance, while a low hit rate suggests that the buffer pool is not sized appropriately.

Gather the Required Data
To calculate the InnoDB Cache Hit Rate, use the SHOW GLOBAL STATUS; command to display a list of status variables and their current values. You'll need to look for the following variables:

  • Innodb_buffer_pool_read_requests – This variable tracks the number of read requests made to the InnoDB buffer pool.
  • Innodb_buffer_pool_reads – This variable tracks the number of read requests that required a physical disk read to satisfy.
Calculate the InnoDB Cache Hit Rate
Once you have the values for Innodb_buffer_pool_read_requests and Innodb_buffer_pool_reads, you can use the following formula to calculate the InnoDB cache hit rate:
(Innodb_buffer_pool_read_requests - Innodb_buffer_pool_reads) * 100.0 / Innodb_buffer_pool_read_requests
Interpret the Results
The resulting value is the InnoDB cache hit rate. This value represents the percentage of requests that were satisfied from the InnoDB buffer pool cache without requiring a physical disk read. A higher hit rate indicates that more queries are being served from memory, which is faster than accessing the disk. Ideally, an InnoDB Cache Hit Rate should be greater than 90%.
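
The calculation can be sketched in Python as follows. The function name and the sample counter values are hypothetical and only illustrate the formula and the 90% guideline.

```python
def innodb_cache_hit_rate(read_requests: int, reads: int):
    """Return the share of buffer pool read requests served without a disk read."""
    if read_requests <= 0:
        return None  # no read requests yet, so the rate is undefined
    return (read_requests - reads) * 100.0 / read_requests


# Hypothetical values for Innodb_buffer_pool_read_requests and Innodb_buffer_pool_reads.
rate = innodb_cache_hit_rate(read_requests=10_000, reads=500)
print(f"InnoDB cache hit rate: {rate:.1f}%")
print("OK" if rate > 90 else "Below the 90% guideline")
```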

6. InnoDB Log File Size

InnoDB is a storage engine used in MySQL for managing tables with transactions. The InnoDB Log File is an important component of this engine, responsible for recording all changes made to a database.

The InnoDB Log File Size for this health check represents the number of minutes between flushes from the InnoDB Buffer Pool to disk.
Check MySQL Version
The first step is to check the MySQL version being used in the database. If the version is greater than 8.0.30, the calculation will be based on the innodb_redo_log_capacity variable.

If the version is less than or equal to 8.0.30, the calculation will be based on the innodb_log_file_size and innodb_log_files_in_group variables.

Gather the Required Data
Depending on your MySQL version, different variables are needed to compute the InnoDB Log File Size. Obtain the following:

  • Uptime – The time in seconds since the server started.
  • innodb_redo_log_capacity – For MySQL versions > 8.0.30. This variable represents the total size of the redo log files in bytes.
  • innodb_log_files_in_group and innodb_log_file_size – For MySQL versions <= 8.0.30. These variables represent the number of log files and the size of each InnoDB log file, respectively.
  • Innodb_os_log_written – The amount of data written to the InnoDB redo log.
Calculate InnoDB Log File Size
Find the correct formula based on your MySQL version to calculate innodb_log_size_pct:

If MySQLVersion > 8.0.30:
(Uptime / 60) * innodb_redo_log_capacity / Innodb_os_log_written
If MySQLVersion <= 8.0.30:
 (Uptime / 60) * innodb_log_files_in_group * innodb_log_file_size / Innodb_os_log_written
Interpret the Results
Ideally, redo log files should be large enough to hold one hour's worth of write activities. This recommendation is about finding a balance between performance and recovery time. Observe innodb_os_log_written throughout a day and ensure that innodb_log_file_size * innodb_log_files_in_group (or innodb_redo_log_capacity for versions > 8.0.30) is big enough for the peak observed hour. From this, you can essentially size the redo log to handle high write workloads without needing to frequently flush data from the InnoDB Buffer Pool and write to disk.

The result of the formula provides the number of minutes between flushes from the InnoDB Buffer Pool to disk. To fall within the one-hour recommendation, it should fall between 45 and 75 (>=45 and <=75). This indicates that the redo log files' size is optimally configured to handle write-heavy workloads without excessive disk I/O. It also suggests that the system is well-equipped to maintain data consistency in case of unforeseen interruptions.
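
The version-dependent formula can be sketched in Python. This is an illustration under stated assumptions: the version is passed as a tuple such as (8, 0, 34), the system variables are passed as a plain dict, and the function name and sample values are hypothetical.

```python
def redo_log_minutes(version, uptime_seconds, os_log_written, variables):
    """Estimate how many minutes of redo writes the configured redo log can hold.

    version: tuple such as (8, 0, 34)
    variables: dict of MySQL system variables (sizes in bytes)
    """
    if os_log_written <= 0:
        return None  # nothing written to the redo log yet
    if version > (8, 0, 30):
        capacity = variables["innodb_redo_log_capacity"]
    else:
        capacity = (variables["innodb_log_files_in_group"]
                    * variables["innodb_log_file_size"])
    return (uptime_seconds / 60) * capacity / os_log_written


GIB = 1024 ** 3
# Hypothetical server: up for one day, 24 GiB written to the redo log, 1 GiB capacity.
minutes = redo_log_minutes((8, 0, 34), 86_400, 24 * GIB,
                           {"innodb_redo_log_capacity": GIB})
print(f"{minutes:.0f} minutes")
print("within 45-75" if 45 <= minutes <= 75 else "outside 45-75")
```

Passing the variables as a dict keeps the function independent of how you query the server, so the same code works with any driver or with values copied by hand.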

7. Database Connection Utilization

Database Connection Utilization is an important metric that measures the percentage of maximum available database connections that are currently being used. This metric can help identify if the database server is being utilized efficiently or if there are potential issues with connection limits. A value of 0 means the database is idle, while over 85% indicates inefficient connection management.
Gather the Required Data
Before you can calculate Database Connection Utilization, you must gather all the necessary information:

  • max_used_connections - The highest number of concurrent connections utilized since the server started. This value can be obtained with the SHOW STATUS LIKE 'max_used_connections'; command.
  • max_connections - The maximum permitted number of simultaneous client connections. This value can be obtained with the SHOW VARIABLES LIKE 'max_connections'; command.
Calculate Database Connection Utilization
Once you have gathered this information, you can plug the values into the following formula
max_used_connections * 100.0 / max_connections
For example, if max_used_connections is 100 and max_connections is 500, then the Database Connection Utilization would be:
100 * 100.0 / 500 = 20%

This indicates that, at the busiest point since the server started, 20% of the maximum available database connections were in use.
Interpret the Results
The value of Database Connection Utilization will be between 0% and 100%. Here are some general guidelines to interpret the results:

  • 0-70% – If the utilization is in this range, it indicates that the database server is operating efficiently and there is ample headroom for additional connections. There are no concerns regarding the maximum number of concurrent connections, and the server should be able to handle an additional load without performance degradation.

  • 70-85% – If the utilization is in this range, it indicates that the database server is approaching its maximum capacity. This is a warning sign that the number of concurrent connections may need to be increased to ensure that the server can handle an additional load without performance degradation.

  • 85-100% – If the utilization is in this range, it indicates that the database server is operating near or at its maximum capacity. This is a critical warning sign that the maximum number of concurrent connections has been reached, and additional connections will not be able to be established. This may cause the application or website to slow down or even crash. In this scenario, immediate action should be taken to optimize the database, increase the number of available connections, or scale up the server resources.
Related MySQL variables
max_connections - The max_connections variable sets the maximum number of concurrent client connections.
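
The calculation and the three ranges above can be sketched in Python. The function names and labels are hypothetical. Because the ranges share their boundary values, this sketch assumes that exactly 70% counts as a warning and exactly 85% as critical.

```python
def connection_utilization(max_used_connections: int, max_connections: int) -> float:
    """Return max_used_connections * 100.0 / max_connections as a percentage."""
    if max_connections <= 0:
        raise ValueError("max_connections must be positive")
    return max_used_connections * 100.0 / max_connections


def classify_utilization(pct: float) -> str:
    """Map a utilization percentage to one of the three ranges described above."""
    if pct < 70:
        return "ok"
    if pct < 85:
        return "warning"
    return "critical"


# Values from the worked example above.
pct = connection_utilization(max_used_connections=100, max_connections=500)
print(f"{pct:.0f}% -> {classify_utilization(pct)}")
```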

8. Table Cache Hit Rate

Table Cache hit rate is a performance metric that measures how efficiently the table cache is being used by the MySQL server. A high hit rate indicates that the table cache is effectively reducing disk I/O and improving performance.
Gather the Required Data
Before you can calculate Table Cache Hit Rate, you must gather all the necessary information by executing the SHOW STATUS; command to view the status variables.

Look for the 'Table_open_cache_hits', 'Table_open_cache_misses', 'Open_tables', and 'Opened_tables' variables in the output.

  • Table_open_cache_hits - The number of hits for open tables cache lookups.
  • Table_open_cache_misses - The number of misses for open tables cache lookups.
  • Open_tables - The number of tables that are open.
  • Opened_tables - The number of tables that have been opened.
Calculate Table Cache Hit Rate
If both 'Table_open_cache_hits' and 'Table_open_cache_misses' variables exist, execute the following query to calculate the Table Cache Hit Rate:
Table_open_cache_hits * 100.0 / (Table_open_cache_hits + Table_open_cache_misses)
If the 'Table_open_cache_hits' and 'Table_open_cache_misses' variables do not exist, execute the following query to calculate the Table Cache Hit Rate:
Open_tables * 100.0 / Opened_tables
This query will return the Table Cache Hit Rate as a percentage.
Interpret the Results
If the Table Cache Hit Rate is high, it indicates that MySQL can usually reuse tables that are already open in the table cache instead of opening them again, which avoids extra file operations and speeds up query execution.

On the other hand, if the Table Cache Hit Rate is too low, it means that MySQL frequently has to open tables because they are not in the cache. This could indicate that you need to increase the size of the table cache (table_open_cache), optimize your queries, or consider other performance-tuning strategies to improve performance.

It's important to monitor the Table Cache Hit Rate regularly to ensure that your MySQL server is performing optimally. A good rule of thumb is to aim for a Table Cache Hit Rate of 90% or higher.
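
The choice between the two formulas can be sketched in Python. The example assumes the status variables have already been loaded into a dict of integers. The function name and sample values are hypothetical.

```python
def table_cache_hit_rate(status: dict):
    """Use the hits/misses counters when present, else fall back to Open/Opened."""
    if "Table_open_cache_hits" in status and "Table_open_cache_misses" in status:
        hits = status["Table_open_cache_hits"]
        lookups = hits + status["Table_open_cache_misses"]
        return hits * 100.0 / lookups if lookups else None
    opened = status.get("Opened_tables", 0)
    if opened == 0:
        return None  # nothing opened yet, so the rate is undefined
    return status.get("Open_tables", 0) * 100.0 / opened


# Hypothetical values for a server that exposes the hits/misses counters.
print(table_cache_hit_rate({"Table_open_cache_hits": 900,
                            "Table_open_cache_misses": 100}))
# Hypothetical values for a server that only exposes Open_tables/Opened_tables.
print(table_cache_hit_rate({"Open_tables": 50, "Opened_tables": 200}))
```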

9. Table Definition Cache Hit Rate

This metric measures the efficiency of the table definition cache in handling table metadata requests. A high hit rate indicates that the cache frequently provides the required table definitions, reducing the need for disk I/O and thus improving overall server performance.
Gather the Required Data
To calculate the Table Definition Cache hit rate, you need to collect two pieces of information from the MySQL status variables:

  • Open Table Definitions (Open_table_definitions) – This is the count of table definitions currently held in the cache.
  • Opened Table Definitions (Opened_table_definitions) – This refers to the total number of table definitions that have been cached since the server started.
Calculate the Hit Rate
The hit rate is calculated using the following formula:
(Open Table Definitions / Opened Table Definitions) x 100
For instance, if there are 80 open table definitions and 100 opened table definitions, the hit rate is calculated as:
(80 / 100) x 100 = 80

Therefore, the Table Definition Cache hit rate is 80%.
Interpret the Results
The interpretation of the Table Definition Cache hit rate is straightforward:

  • A hit rate above 75% is generally considered good, indicating efficient cache use.
  • A lower hit rate suggests that the cache is less effective, leading to more frequent disk I/O operations for table definition retrieval. This can be due to a cache size that is too small or an unusually diverse set of tables being accessed.

It's important to monitor this metric over time to understand the typical behavior for your workload. If the hit rate consistently falls below the desired threshold, you may want to consider increasing the size of the table definition cache.
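
A short Python sketch of this check is shown below. The function name is hypothetical, and returning None when no definitions have been opened is an assumption of the example.

```python
def table_definition_cache_hit_rate(open_defs: int, opened_defs: int):
    """Return (open / opened) x 100, or None when nothing has been opened yet."""
    if opened_defs <= 0:
        return None
    return open_defs * 100.0 / opened_defs


# Values from the worked example above.
rate = table_definition_cache_hit_rate(open_defs=80, opened_defs=100)
print(f"{rate:.0f}%", "good" if rate > 75 else "consider a larger cache")
```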

10. Sort Merge Passes Ratio

The Sort Merge Passes Ratio represents the ratio of sort merge passes to the total number of sorts (scans and ranges) performed during query execution. A high Sort Merge Passes Ratio indicates that sorting operations are taking longer than necessary and may be a performance bottleneck.
Gather the Required Data
Before you can calculate Sort Merge Passes Ratio, you must gather all the necessary information from SHOW STATUS:

  • sort_merge_passes - The total number of merge passes executed by the sorting algorithm.
  • sort_scan - The number of sorts that were done by scanning the table.
  • sort_range - The number of sorts that were done using ranges.

The output will provide the number of sort merge passes, sort scans, and sort ranges that have occurred since the MySQL server was last restarted.
Calculate Sort Merge Passes Ratio
Once you have gathered this information, you can plug the values into the following formula
sort_merge_passes * 100.0 / (sort_scan + sort_range)
Interpret the Results
Finally, the result obtained from the calculation will be the Sort Merge Passes Ratio. It is expressed as a percentage and represents the proportion of sorting operations that required a merge pass in relation to the total number of sorting operations (scans and ranges).

A higher ratio indicates that the MySQL server is experiencing a high number of merge passes and may need tuning to improve sorting performance. Ideally, the ratio should be less than 10%.
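
The formula can be sketched in Python as follows. The function name and sample counters are hypothetical. Reporting 0.0 when no sorts have run is an assumption made for this example.

```python
def sort_merge_passes_ratio(sort_merge_passes: int, sort_scan: int, sort_range: int) -> float:
    """Return sort_merge_passes * 100.0 / (sort_scan + sort_range)."""
    total_sorts = sort_scan + sort_range
    if total_sorts == 0:
        # Assumption: no sorts have run, so there is nothing to report.
        return 0.0
    return sort_merge_passes * 100.0 / total_sorts


# Hypothetical counters read from SHOW STATUS.
ratio = sort_merge_passes_ratio(sort_merge_passes=5, sort_scan=80, sort_range=20)
print(f"{ratio:.1f}%", "OK" if ratio < 10 else "sorting may need tuning")
```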

11. Temporary Disk Data

Temporary disk tables are created in a database when a query requires more memory than the server has available. This can happen when the query needs to sort or group large amounts of data. When a temporary disk table is created, the data is written to a file on the disk, which can slow down query performance.

Temporary disk space is used when there is insufficient memory available to store temporary data, such as when sorting large amounts of data or performing complex query operations.
Gather the Required Data
To calculate temporary disk data, you need to know the values of Created_tmp_disk_tables and Created_tmp_tables. These values can be obtained by running the SHOW GLOBAL STATUS LIKE 'Created_tmp%'; command.

This command returns one row per matching status variable, including Created_tmp_disk_tables and Created_tmp_tables. Note down the values for each of these two variables.

  • created_tmp_tables - The number of internal temporary tables created by the server while executing statements.
  • created_tmp_disk_tables - The number of internal on-disk temporary tables created by the server while executing statements.
Calculate Temporary Disk Data
Once you have gathered this information, you can plug the values into the following formula
created_tmp_disk_tables * 100.0 / created_tmp_tables
Interpret the Results
Interpreting the results of the temporary disk data calculation can help you understand the performance of your database queries and identify any potential issues. The result of the calculation is the percentage of internal temporary tables that were created on disk rather than in memory.

A high value for temporary disk data, greater than 25%, indicates that a significant share of temporary tables is being written to disk. This can slow down query performance and may indicate that there are inefficiencies in your database schema or queries.

A low value for temporary disk data, less than or equal to 25%, indicates that most temporary tables are being created in memory, which is faster than writing them to disk. This is generally desirable for optimal query performance.

It's important to note that the interpretation of the result may vary depending on the specific context. For example, if you are running queries on a large dataset that cannot fit into memory, it may be expected to see a higher value for temporary disk data.
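
The calculation and the 25% threshold can be sketched in Python. The function name and sample values are hypothetical, and returning None when no temporary tables were created is an assumption of this example.

```python
def temporary_disk_data(created_tmp_disk_tables: int, created_tmp_tables: int):
    """Return created_tmp_disk_tables * 100.0 / created_tmp_tables, or None."""
    if created_tmp_tables <= 0:
        return None  # no temporary tables created yet
    return created_tmp_disk_tables * 100.0 / created_tmp_tables


# Hypothetical counters from SHOW GLOBAL STATUS LIKE 'Created_tmp%'.
pct = temporary_disk_data(created_tmp_disk_tables=30, created_tmp_tables=200)
print(f"{pct:.1f}%", "high" if pct > 25 else "low")
```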

12. Flushing Logs

Flushing Logs is an important metric that measures the efficiency of the InnoDB engine in MySQL databases. It shows the percentage of log writes that had to wait due to the log buffer being full. It is calculated by dividing the number of InnoDB log waits by the number of InnoDB log writes. The higher the value of Flushing Logs, the more inefficient the database engine is, which can lead to poor performance.
Gather the Required Data
Before you can calculate Flushing Logs, you must gather all the necessary information by executing the SHOW GLOBAL STATUS; command.

  • Innodb_log_writes - The number of physical writes to the InnoDB redo log file.
  • Innodb_log_waits - The number of times the log buffer was too small and a wait was required for it to be flushed before continuing.
Calculate Flushing Logs
Plug these values into the following formula:
Innodb_log_waits * 100 / Innodb_log_writes
Let's say the value of "Innodb_log_writes" is 100, and the value of "Innodb_log_waits" is 10. The Flushing Logs value would be calculated as follows:

Flushing Logs = Innodb_log_waits * 100 / Innodb_log_writes
10 * 100 / 100 = 10%
In this example, the Flushing Logs value is 10%, which means that one in ten log writes had to wait for space in the log buffer. This falls within the acceptable range of 5 to 20 rather than the good range of less than 5.
Interpret the Results
The Flushing Logs value indicates the efficiency of the InnoDB engine in MySQL databases. A higher value indicates that the engine is less efficient, and there are more waits for space in the log buffer, which can lead to poor performance. On the other hand, a lower value indicates that the engine is more efficient, and there are fewer waits for space in the log buffer, which can result in better performance.

As a general guideline, a Flushing Logs value of less than 5 is considered good, while a value between 5 and 20 is acceptable. A value above 20 may indicate potential performance issues that need to be addressed.

It is important to note that the Flushing Logs value should be interpreted in conjunction with other performance metrics, such as CPU usage, memory usage, and disk I/O, to get a complete picture of the database performance. A high Flushing Logs value may not necessarily indicate a problem if the other performance metrics are within acceptable limits.
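
The formula and the guideline ranges can be sketched in Python. The function names and labels are hypothetical. This sketch assumes that a value of exactly 5 or exactly 20 counts as acceptable.

```python
def flushing_logs(log_waits: int, log_writes: int):
    """Return Innodb_log_waits * 100 / Innodb_log_writes, or None if undefined."""
    if log_writes <= 0:
        return None  # no log writes recorded yet
    return log_waits * 100 / log_writes


def classify_flushing_logs(value: float) -> str:
    """Apply the general guideline: < 5 good, 5 to 20 acceptable, > 20 a concern."""
    if value < 5:
        return "good"
    if value <= 20:
        return "acceptable"
    return "potential issue"


# Values from the worked example above.
value = flushing_logs(log_waits=10, log_writes=100)
print(value, classify_flushing_logs(value))
```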

13. QCache Fragmentation

QCache Fragmentation is a measure of how well the Query Cache in MySQL is performing. The Query Cache stores the results of SELECT queries so that if the same query is run again, the results can be returned quickly without having to execute the query again. Over time, the Query Cache can become fragmented, which can reduce its effectiveness. Note that the Query Cache was removed in MySQL 8.0, so this check applies only to earlier versions.
Gather the Required Data
Before you can calculate QCache Fragmentation, you need to run the SHOW STATUS command.

Record the values of the following variables:

  • qcache_free_blocks - The number of free memory blocks in the query cache.
  • qcache_total_blocks - The total number of blocks in the query cache.
  • qcache_lowmem_prunes - The number of queries deleted from the query cache because of low memory.
  • qcache_inserts - The number of queries added to the query cache.
Calculate QCache Fragmentation
To calculate QCache Fragmentation, plug these variables into the following formula:
qcache_free_blocks * 100.0 / qcache_total_blocks
To calculate QcacheDeleteRate, plug the applicable variables into the following formula:
qcache_lowmem_prunes * 100.0 / qcache_inserts
Interpret the Results
Once you have calculated the QCache Fragmentation and QcacheDeleteRate, you need to interpret the results. Ideally, the QCache Fragmentation should be less than 10, and the QcacheDeleteRate should be less than 20.

If the QCache Fragmentation is high, you may need to adjust the size of the Query Cache to reduce fragmentation. If the QcacheDeleteRate is high, you may need to increase the size of the Query Cache or optimize your queries to reduce the number of results inserted into the cache.
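
Both values can be computed with one small helper, sketched here in Python. The function name and sample counters are hypothetical, and the helper returns None when the denominator is zero, which is an assumption of this example.

```python
def percentage(part: int, whole: int):
    """Return part * 100.0 / whole, or None when whole is zero."""
    if whole == 0:
        return None
    return part * 100.0 / whole


# Hypothetical Query Cache counters read from SHOW STATUS.
status = {
    "qcache_free_blocks": 5,
    "qcache_total_blocks": 100,
    "qcache_lowmem_prunes": 10,
    "qcache_inserts": 100,
}

fragmentation = percentage(status["qcache_free_blocks"], status["qcache_total_blocks"])
delete_rate = percentage(status["qcache_lowmem_prunes"], status["qcache_inserts"])
print(f"QCache Fragmentation: {fragmentation:.1f} (target < 10)")
print(f"QcacheDeleteRate: {delete_rate:.1f} (target < 20)")
```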

14. CPU Utilization

CPU utilization, also known as CPU usage, is a measure of how much of the CPU's processing power is being used at a given time. It represents the percentage of time that the CPU is busy executing instructions compared to the total time available.

CPU utilization is an important metric for monitoring system performance as it indicates how much processing power is being used by various processes and applications. High CPU utilization may indicate that the system is overloaded or that certain processes are consuming too many CPU resources, which can lead to performance issues such as slow response times, system crashes, or even complete system failure.
Gather the Required Data
Before you can calculate CPU Utilization, you need to collect the following information:

  • cpu_avg_last_day - The first step is to retrieve the average value metrics for the last day. You can obtain this information from a monitoring tool.
  • cpu_counts - Next, you need to obtain the "CPU" counts. This refers to the number of CPUs or cores that are available on the system where MySQL is running. You can obtain this information from the system specifications or by running the command "lscpu" on Linux systems.
Calculate CPU Utilization
Once you have gathered this information, you can plug the values into the following formula
cpu_avg_last_day  * 100 / cpu_counts
For example, if the average CPU usage for MySQL in the last day is 5 and the system has 4 CPUs, the CPU utilization for MySQL would be calculated as follows:

5 * 100 / 4 = 125%

Therefore, the CPU utilization for MySQL would be 125% in this case.
Interpret the Results
The interpretation of the CPU utilization results will depend on the context of the system and the workload that MySQL is handling. In general, a high CPU utilization value can indicate that the system is working hard to process the workload, and it may be close to the limits of its capacity. However, a CPU utilization value above 100% may indicate that the system is experiencing contention or overload, which can lead to performance degradation and potentially even system crashes. In general, CPU utilization should fall under 80%.

To properly interpret the results, it is important to understand the baseline CPU utilization of the system and to monitor it over time to detect any trends or changes. Additionally, it is recommended to correlate the CPU utilization with other performance metrics such as disk I/O, memory usage, and network traffic to gain a more comprehensive view of the system's performance.

In general, a sustained high CPU utilization may indicate that the system is under-provisioned, and additional resources, such as more CPUs or higher capacity hardware, may be needed to handle the workload effectively. On the other hand, a sudden spike in CPU utilization may indicate a performance issue, such as a poorly optimized query or a sudden increase in workload.
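
The formula can be sketched in Python. The function name is hypothetical, and the average value is assumed to come from your monitoring tool. If the script runs on the database host itself, Python's os.cpu_count() is one way to obtain the CPU count.

```python
def cpu_utilization(cpu_avg_last_day: float, cpu_counts: int) -> float:
    """Return cpu_avg_last_day * 100 / cpu_counts."""
    if cpu_counts <= 0:
        raise ValueError("cpu_counts must be positive")
    return cpu_avg_last_day * 100 / cpu_counts


# Values from the worked example above.
pct = cpu_utilization(cpu_avg_last_day=5, cpu_counts=4)
print(f"{pct:.0f}%", "OK" if pct < 80 else "above the 80% guideline")
```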

15. Memory Utilization

Memory utilization is the percentage of physical memory currently in use by the operating system and applications.
Gather the Required Data
Before you can calculate Memory Utilization, you need the values of "PhysicalMemoryUsed" and "PhysicalMemory", both of which can be obtained with the following command: free -m

PhysicalMemoryUsed - the amount of memory used by the operating system and applications.
PhysicalMemory - the total amount of physical memory on the server hosting the MySQL instance.
AvgValueMetricsLastDay(PhysicalMemoryUsed) - the average amount of memory used by the operating system and applications over the last day.
Calculate Memory Utilization
Once you have gathered this information, you can plug the values into the following formula:
AvgValueMetricsLastDay(PhysicalMemoryUsed) * 100 / PhysicalMemory
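As a rough sketch, the formula can be applied to a series of used-memory samples collected over the day. The function name memory_utilization and the sample values are hypothetical; the samples are assumed to be the "used" figure reported by free -m (in MiB) at regular intervals.

```python
from statistics import mean
from typing import Sequence


def memory_utilization(used_samples_mb: Sequence[float],
                       physical_memory_mb: float) -> float:
    """Average the used-memory samples and express the result as a
    percentage of total physical memory."""
    if not used_samples_mb:
        raise ValueError("at least one used-memory sample is required")
    if physical_memory_mb <= 0:
        raise ValueError("physical_memory_mb must be positive")
    avg_used = mean(used_samples_mb)  # AvgValueMetricsLastDay(PhysicalMemoryUsed)
    return avg_used * 100 / physical_memory_mb


# Hypothetical samples on a server with 8000 MiB of physical memory.
samples = [6000, 6400, 6800]
print(memory_utilization(samples, 8000))  # 80.0
```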
Interpret the Results
The result of the memory utilization calculation using the provided formula represents the percentage of physical memory being used in relation to the total physical memory available on the server.

The ideal memory utilization depends on factors such as the size of the database, the number of connections, and the server configuration, so it is recommended to monitor it regularly and adjust the configuration as needed. As a general threshold, memory utilization should stay below 80%.

If the memory utilization consistently exceeds this threshold, it may lead to performance issues such as slow queries or even crashes. Therefore, it is important to set up alerts or notifications to proactively address any memory utilization issues before they impact the performance of the MySQL instance.

16. Disk Space Usage

As databases grow, they consume more disk space. A lack of disk space can cause issues like slow performance, application crashes, and data corruption.
Gather the Required Data
To calculate Disk Space Usage, you will need to gather the following data:

  • disk_total – The total amount of disk space.
  • disk_used – The amount of used disk space.
Calculate Disk Space Usage
Once you have gathered this information, you can plug the values into the following formula:
disk_used / disk_total
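The following is a minimal sketch of this calculation in Python. The function name disk_space_usage is hypothetical, and the path "/" is only a placeholder; in practice you would point it at the filesystem that holds the MySQL data directory. shutil.disk_usage is part of the Python standard library and reports total, used, and free space in bytes.

```python
import shutil


def disk_space_usage(disk_used: int, disk_total: int) -> float:
    """Return the fraction of disk space in use (0.0 to 1.0)."""
    if disk_total <= 0:
        raise ValueError("disk_total must be positive")
    return disk_used / disk_total


# "/" is a placeholder path; substitute the MySQL data directory.
usage = shutil.disk_usage("/")
ratio = disk_space_usage(usage.used, usage.total)
print(f"{ratio:.2f} ({ratio * 100:.1f}%)")
```

Recording this value at regular intervals makes it possible to see whether usage is trending upward or downward.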
Interpret the Results
The result of the formula is the fraction of disk space currently in use (multiply it by 100 to express it as a percentage). Tracking this value over time shows the disk space usage trend for the database and helps you identify potential disk space issues before they become critical.

If the result is increasing, the database is using more disk space over time, and you may need to allocate more disk space to the database or optimize the database to free up disk space.

If the result is decreasing, the database is using less disk space over time, for example because unnecessary data was deleted or the database was optimized, and no immediate action is needed.

17. Open Files Utilization

Open Files Utilization is a system metric that reflects how much of the operating system's limit on the maximum number of open files is being used.

Monitoring this metric is essential, as reaching or exceeding the open files limit can lead to potential system performance issues or even failures. It is an indicator of how the system is handling file operations and whether any optimizations or adjustments are needed to prevent problems.
Gather the Required Data
To calculate the Open Files Utilization, you need to collect the following information:

  • open_files – The number of files currently open.
  • open_files_limit – The maximum number of files that the operating system allows to be open simultaneously.
Calculate Open Files Utilization
Utilize the gathered information in the following formula:
open_files * 100 / open_files_limit
Interpret the Results
The result of the formula will provide the percentage of the open files limit currently being utilized by the system.

If Open Files Utilization is less than 85%, this indicates that the system is within a safe range and there is sufficient headroom before reaching the open files limit. This is the desired condition to ensure smooth operation.

If the value approaches or exceeds 85%, it implies that a significant portion of the open files limit has been utilized. Action may be needed, such as optimizing file operations, closing unnecessary files, or even increasing the system's open files limit.
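The formula and the 85% threshold can be sketched as below. The function names and sample numbers are hypothetical. For a MySQL server, comparable values are exposed as the Open_files status variable and the open_files_limit system variable; treating those as the inputs is an assumption here.

```python
def open_files_utilization(open_files: int, open_files_limit: int) -> float:
    """Return the percentage of the open files limit currently in use."""
    if open_files_limit <= 0:
        raise ValueError("open_files_limit must be positive")
    return open_files * 100 / open_files_limit


def is_within_safe_range(utilization: float, threshold: float = 85.0) -> bool:
    """True when utilization is below the threshold described above."""
    return utilization < threshold


# Hypothetical values: 4250 open files against a limit of 5000.
utilization = open_files_utilization(4250, 5000)
print(utilization, is_within_safe_range(utilization))  # 85.0 False
```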

18. Table Locking Efficiency

Table Locking Efficiency measures the efficiency of table lock requests in a database. It is expressed as the percentage of table lock requests that were granted immediately without requiring the requesting thread to wait. High efficiency in table locking is crucial for optimal database performance, especially in environments with high transaction rates.
Gather the Required Data
To evaluate the Table Locking Efficiency, you need to obtain the following data from your database's status variables:

  • Table_locks_immediate – The number of times that a request for a table lock could be granted immediately.
  • Table_locks_waited – The number of times that a request for a table lock could not be granted immediately and a wait was needed.
Calculate Table Locking Efficiency
Use the collected data in the formula below to calculate the Table Locking Efficiency:
Table_locks_immediate * 100 / (Table_locks_waited + Table_locks_immediate)
Interpret the Results
The result will give you the percentage of table lock requests that were met immediately.

If table_locking_efficiency is greater than 95%, this is indicative of high efficiency in handling table lock requests. The majority of lock requests are being met immediately, suggesting that the database is performing well in this aspect.

If the value is less than 95%, it indicates that a substantial number of lock requests are having to wait. This can be a performance concern, as waiting for table locks can lead to increased response times and decreased throughput.
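A small Python sketch of the calculation follows. The function name and the sample counter values are hypothetical; returning 100% when no lock requests have been recorded is an assumption made here to avoid dividing by zero.

```python
def table_locking_efficiency(table_locks_immediate: int,
                             table_locks_waited: int) -> float:
    """Return the percentage of table lock requests granted immediately."""
    total = table_locks_immediate + table_locks_waited
    if total == 0:
        # No lock requests recorded yet; treated as fully efficient (assumption).
        return 100.0
    return table_locks_immediate * 100 / total


# Hypothetical counters read from the server's status variables.
print(table_locking_efficiency(9800, 200))  # 98.0, above the 95% threshold
print(table_locking_efficiency(900, 100))   # 90.0, below the 95% threshold
```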

To improve Table Locking Efficiency, optimize your queries to reduce lock contention and, if you are not already doing so, consider using the InnoDB storage engine, which provides row-level locking rather than table-level locking. This reduces the chance of lock contention, helping to maintain high Table Locking Efficiency even under heavy load.

19. InnoDB Dirty Pages Ratio

InnoDB Dirty Pages Ratio is a crucial metric for the InnoDB storage engine in MySQL, reflecting the proportion of "dirty" pages in the InnoDB buffer pool. A "dirty" page is a page that has been modified in memory but not yet written back to disk.

Monitoring this ratio is vital as it provides insights into the database's behavior, especially during write-heavy workloads.
Gather the Required Data
To calculate the InnoDB Dirty Pages Ratio, you need to collect the following status variables:

  • innodb_buffer_pool_pages_dirty – The number of pages in the InnoDB buffer pool that are currently marked as dirty.
  • innodb_buffer_pool_pages_total – The total number of pages in the InnoDB buffer pool.
Calculate InnoDB Dirty Pages Ratio
Use the collected data in the formula below:
innodb_buffer_pool_pages_dirty / innodb_buffer_pool_pages_total * 100
Interpret the Results
The result from the formula will give you the percentage of dirty pages in the InnoDB buffer pool.

If innodb_dirty_pages_ratio is less than 75%, this indicates that the proportion of dirty pages is within a healthy range. It suggests that the system is effectively managing the flushing of dirty pages back to disk, preventing excessive accumulation.

If the value is greater than or equal to 75%, it indicates a high ratio of dirty pages, which might be normal during write-heavy loads but still requires attention. High values might suggest that the system is struggling to flush dirty pages back to disk promptly, potentially leading to performance issues.
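The ratio and the 75% threshold can be expressed in a few lines of Python. This is a sketch under stated assumptions: the function name and the page counts are hypothetical, and the two arguments stand for the status variables listed above.

```python
def innodb_dirty_pages_ratio(pages_dirty: int, pages_total: int) -> float:
    """Return the percentage of buffer pool pages that are dirty."""
    if pages_total <= 0:
        raise ValueError("pages_total must be positive")
    return pages_dirty / pages_total * 100


# Hypothetical values: 2048 dirty pages in a buffer pool of 8192 pages.
ratio = innodb_dirty_pages_ratio(2048, 8192)
print(ratio)       # 25.0
print(ratio < 75)  # True, within the healthy range described above
```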

Regularly monitoring the InnoDB Dirty Pages Ratio and understanding its behavior under different workloads can help in tuning the MySQL configuration for optimal performance, such as adjusting the innodb_flush_method and innodb_io_capacity settings.

Releem Introduces Health Checks

We have recently launched a new version of Releem that includes health checks. By using Releem, users can gain real-time insights into the health of their database and receive recommendations on how to resolve any issues.
By automatically monitoring these indicators, Releem saves database engineers tons of time on manual calculations, freeing them up to focus on more important tasks, such as optimizing queries or tuning the database configuration.