
system.asynchronous_metrics

Querying in ClickHouse Cloud

The data in this system table is held locally on each node in ClickHouse Cloud, so obtaining a complete view of all data requires the clusterAllReplicas function. See the system tables overview for further details.

Description

Contains metrics that are calculated periodically in the background, for example, the amount of RAM in use.

Columns

  • metric (String) — Metric name.
  • value (Float64) — Metric value.
  • description (String) — Metric description.

Example

SELECT * FROM system.asynchronous_metrics LIMIT 10
┌─metric──────────────────────────────────┬──────value─┬─description────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────┐
│ AsynchronousMetricsCalculationTimeSpent │ 0.00179053 │ Time in seconds spent for calculation of asynchronous metrics (this is the overhead of asynchronous metrics).                                                                                                                                              │
│ NumberOfDetachedByUserParts             │          0 │ The total number of parts detached from MergeTree tables by users with the `ALTER TABLE DETACH` query (as opposed to unexpected, broken or ignored parts). The server does not care about detached parts and they can be removed.                          │
│ NumberOfDetachedParts                   │          0 │ The total number of parts detached from MergeTree tables. A part can be detached by a user with the `ALTER TABLE DETACH` query or by the server itself it the part is broken, unexpected or unneeded. The server does not care about detached parts and they can be removed. │
│ TotalRowsOfMergeTreeTables              │    2781309 │ Total amount of rows (records) stored in all tables of MergeTree family.                                                                                                                                                                                   │
│ TotalBytesOfMergeTreeTables             │    7741926 │ Total amount of bytes (compressed, including data and indices) stored in all tables of MergeTree family.                                                                                                                                                   │
│ NumberOfTables                          │         93 │ Total number of tables summed across the databases on the server, excluding the databases that cannot contain MergeTree tables. The excluded database engines are those who generate the set of tables on the fly, like `Lazy`, `MySQL`, `PostgreSQL`, `SQlite`. │
│ NumberOfDatabases                       │          6 │ Total number of databases on the server.                                                                                                                                                                                                                   │
│ MaxPartCountForPartition                │          6 │ Maximum number of parts per partition across all partitions of all tables of MergeTree family. Values larger than 300 indicates misconfiguration, overload, or massive data loading.                                                                       │
│ ReplicasSumMergesInQueue                │          0 │ Sum of merge operations in the queue (still to be applied) across Replicated tables.                                                                                                                                                                       │
│ ReplicasSumInsertsInQueue               │          0 │ Sum of INSERT operations in the queue (still to be replicated) across Replicated tables.                                                                                                                                                                   │
└─────────────────────────────────────────┴────────────┴────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────┘

Metric descriptions

The descriptions below are generated from the C++ source by utils/generate-async-metrics-docs. The single source of truth is the string literal next to each metric registration in src/Common/AsynchronousMetrics.cpp, src/Interpreters/ServerAsynchronousMetrics.cpp, and src/Coordination/KeeperAsynchronousMetrics.cpp. Metric names that include a variable part (per-disk, per-CPU, per-interface, ...) are shown with a placeholder such as name, core_id, interface_name, or i in place of that part; the running server reports them with the concrete value substituted in.

AsynchronousHeavyMetricsCalculationTimeSpent

Time in seconds spent calculating the asynchronous heavy (table-related) metrics. This is the overhead of asynchronous metrics.

AsynchronousHeavyMetricsUpdateInterval

Update interval of the heavy (table-related) metrics.

AsynchronousMetricsCalculationTimeSpent

Time in seconds spent calculating the asynchronous metrics. This is the overhead of asynchronous metrics.

AsynchronousMetricsUpdateInterval

Update interval of the asynchronous metrics.

AsyncLoggingmetric_firstQueueSize

Number of asynchronous log messages queued and pending in this logging channel.

BlockActiveTime_name

Time in seconds the block device had I/O requests queued. This is a system-wide metric: it includes all processes on the host machine, not just clickhouse-server. Source: /sys/block. See https://www.kernel.org/doc/Documentation/block/stat.txt
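
The Block* metrics are derived from the per-device counters in /sys/block/name/stat. A minimal Python sketch of reading those counters, assuming the field order listed in the kernel's stat.txt and its 512-byte sector unit; parse_block_stat, read_block_stat, and the field names are hypothetical helpers for illustration, not ClickHouse code:

```python
from pathlib import Path
from typing import Dict

# Field order of /sys/block/<name>/stat as listed in the kernel's
# Documentation/block/stat.txt. Older kernels expose only the first 11
# fields; newer kernels append discard and then flush counters.
STAT_FIELDS = [
    "read_ios", "read_merges", "read_sectors", "read_ticks_ms",
    "write_ios", "write_merges", "write_sectors", "write_ticks_ms",
    "in_flight", "io_ticks_ms", "time_in_queue_ms",
    "discard_ios", "discard_merges", "discard_sectors", "discard_ticks_ms",
    "flush_ios", "flush_ticks_ms",
]

SECTOR_SIZE = 512  # stat.txt counts sectors in 512-byte units


def parse_block_stat(line: str) -> Dict[str, int]:
    """Map the whitespace-separated counters of one stat line to field names."""
    values = [int(v) for v in line.split()]
    # zip() stops at the shorter sequence, so lines from older kernels
    # simply produce fewer keys.
    return dict(zip(STAT_FIELDS, values))


def read_block_stat(name: str) -> Dict[str, int]:
    """Read the counters for one block device, e.g. read_block_stat("sda")."""
    return parse_block_stat(Path(f"/sys/block/{name}/stat").read_text())


sample = parse_block_stat("100 5 2048 30 50 2 1024 40 1 60 70")
print(sample["read_sectors"] * SECTOR_SIZE)  # prints 1048576 (bytes read)
```

Apart from in_flight, these counters are cumulative since boot, so activity over an interval comes from subtracting two samples.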

BlockActiveTimePerOp_name

Similar to the BlockActiveTime metrics, but the value is divided by the number of I/O operations to give the per-operation time.
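
A per-operation value divides the growth of a time counter by the number of operations completed in the same interval; the same division applies to BlockQueueTimePerOp. A minimal sketch, assuming two samples given as dicts of kernel stat counters (time fields in milliseconds) and assuming, for illustration only, that the operation count is reads + writes + discards; ClickHouse's exact choice of counters may differ, and per_op_time_seconds is a hypothetical helper:

```python
from typing import Dict


def per_op_time_seconds(prev: Dict[str, int], curr: Dict[str, int],
                        time_field: str) -> float:
    """Growth of a millisecond time counter per I/O operation, in seconds.

    Assumption (illustrative only): the operation count is the sum of read,
    write and discard operations between the two samples.
    """
    def delta(field: str) -> int:
        # Missing fields (e.g. discard counters on older kernels) count as 0.
        return curr.get(field, 0) - prev.get(field, 0)

    ops = delta("read_ios") + delta("write_ios") + delta("discard_ios")
    if ops <= 0:
        return 0.0  # no I/O in the interval: avoid dividing by zero
    return delta(time_field) / ops / 1000.0  # milliseconds -> seconds


prev = {"read_ios": 10, "write_ios": 10, "io_ticks_ms": 1000}
curr = {"read_ios": 20, "write_ios": 30, "io_ticks_ms": 1600}
print(per_op_time_seconds(prev, curr, "io_ticks_ms"))  # prints 0.02 (600 ms / 30 ops)
```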

BlockDiscardBytes_name

Number of discarded bytes on the block device. These operations are relevant for SSDs. Discard operations are not used by ClickHouse, but can be used by other processes on the system. This is a system-wide metric: it includes all processes on the host machine, not just clickhouse-server. Source: /sys/block. See https://www.kernel.org/doc/Documentation/block/stat.txt

BlockDiscardMerges_name

Number of discard operations requested from the block device and merged together by the OS I/O scheduler. These operations are relevant for SSDs. Discard operations are not used by ClickHouse, but can be used by other processes on the system. This is a system-wide metric: it includes all processes on the host machine, not just clickhouse-server. Source: /sys/block. See https://www.kernel.org/doc/Documentation/block/stat.txt

BlockDiscardOps_name

Number of discard operations requested from the block device. These operations are relevant for SSDs. Discard operations are not used by ClickHouse, but can be used by other processes on the system. This is a system-wide metric: it includes all processes on the host machine, not just clickhouse-server. Source: /sys/block. See https://www.kernel.org/doc/Documentation/block/stat.txt

BlockDiscardTime_name

Time in seconds spent in discard operations requested from the block device, summed across all operations. These operations are relevant for SSDs. Discard operations are not used by ClickHouse, but can be used by other processes on the system. This is a system-wide metric: it includes all processes on the host machine, not just clickhouse-server. Source: /sys/block. See https://www.kernel.org/doc/Documentation/block/stat.txt

BlockInFlightOps_name

This value counts the number of I/O requests that have been issued to the device driver but have not yet completed. It does not include I/O requests that are in the queue but not yet issued to the device driver. This is a system-wide metric: it includes all processes on the host machine, not just clickhouse-server. Source: /sys/block. See https://www.kernel.org/doc/Documentation/block/stat.txt

BlockQueueTime_name

This value counts the number of milliseconds that I/O requests have waited on this block device. If multiple I/O requests are waiting, this value increases by the number of milliseconds multiplied by the number of requests waiting. This is a system-wide metric: it includes all processes on the host machine, not just clickhouse-server. Source: /sys/block. See https://www.kernel.org/doc/Documentation/block/stat.txt
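
Because each waiting request contributes for every millisecond it waits, the counter grows by duration multiplied by the number of waiting requests. A small worked sketch of that accumulation with made-up intervals; time_in_queue_ms is a hypothetical helper, not part of ClickHouse or the kernel:

```python
from typing import Iterable, Tuple


def time_in_queue_ms(intervals: Iterable[Tuple[int, int]]) -> int:
    """Accumulate a time_in_queue-style counter.

    intervals: (duration_ms, requests_waiting) pairs. Each interval adds
    duration_ms * requests_waiting to the counter.
    """
    return sum(duration * waiting for duration, waiting in intervals)


# 3 requests waiting for 10 ms, then 1 request waiting for 5 ms.
print(time_in_queue_ms([(10, 3), (5, 1)]))  # prints 35
```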

BlockQueueTimePerOp_name

Similar to the BlockQueueTime metrics, but the value is divided by the number of I/O operations to give the per-operation time.

BlockReadBytes_name

Number of bytes read from the block device. It can be lower than the number of bytes read from the filesystem because the OS page cache saves I/O. This is a system-wide metric: it includes all processes on the host machine, not just clickhouse-server. Source: /sys/block. See https://www.kernel.org/doc/Documentation/block/stat.txt

BlockReadMerges_name

Number of read operations requested from the block device and merged together by the OS I/O scheduler. This is a system-wide metric: it includes all processes on the host machine, not just clickhouse-server. Source: /sys/block. See https://www.kernel.org/doc/Documentation/block/stat.txt

BlockReadOps_name

Number of read operations requested from the block device. This is a system-wide metric: it includes all processes on the host machine, not just clickhouse-server. Source: /sys/block. See https://www.kernel.org/doc/Documentation/block/stat.txt

BlockReadTime_name

Time in seconds spent in read operations requested from the block device, summed across all operations. This is a system-wide metric: it includes all processes on the host machine, not just clickhouse-server. Source: /sys/block. See https://www.kernel.org/doc/Documentation/block/stat.txt

BlockWriteBytes_name

Number of bytes written to the block device. It can be lower than the number of bytes written to the filesystem because the OS page cache saves I/O. A write to the block device may happen later than the corresponding write to the filesystem due to write-back caching. This is a system-wide metric: it includes all processes on the host machine, not just clickhouse-server. Source: /sys/block. See https://www.kernel.org/doc/Documentation/block/stat.txt

BlockWriteMerges_name

Number of write operations requested from the block device and merged together by the OS I/O scheduler. This is a system-wide metric: it includes all processes on the host machine, not just clickhouse-server. Source: /sys/block. See https://www.kernel.org/doc/Documentation/block/stat.txt

BlockWriteOps_name

Number of write operations requested from the block device. This is a system-wide metric: it includes all processes on the host machine, not just clickhouse-server. Source: /sys/block. See https://www.kernel.org/doc/Documentation/block/stat.txt

BlockWriteTime_name

Time in seconds spent in write operations requested from the block device, summed across all operations. This is a system-wide metric: it includes all processes on the host machine, not just clickhouse-server. Source: /sys/block. See https://www.kernel.org/doc/Documentation/block/stat.txt

CGroupMaxCPU

The maximum number of CPU cores according to CGroups.

CGroupMemoryTotal

The total amount of memory in the cgroup, in bytes. If zero, the limit is the same as OSMemoryTotal.

CGroupMemoryUsed

The amount of memory used in cgroup, in bytes. On cgroup v2 this is anon + sock + non-reclaimable kernel memory; on cgroup v1 this is RSS. In both cases the kernel OS page cache (file-backed cache) is excluded.

CGroupMemoryUsedWithoutPageCache

The amount of memory used in cgroup, in bytes, excluding the ClickHouse userspace page cache. This is CGroupMemoryUsed minus the userspace page cache size. When userspace page cache is disabled, this value equals CGroupMemoryUsed.
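
The relationship is a plain subtraction. A minimal sketch with made-up byte counts; the function name is hypothetical, and None stands for a disabled userspace page cache:

```python
from typing import Optional


def cgroup_memory_used_without_page_cache(
        cgroup_memory_used: int,
        userspace_page_cache_bytes: Optional[int]) -> int:
    """CGroupMemoryUsed minus the ClickHouse userspace page cache size."""
    if userspace_page_cache_bytes is None:
        # Userspace page cache disabled: the value equals CGroupMemoryUsed.
        return cgroup_memory_used
    return cgroup_memory_used - userspace_page_cache_bytes


print(cgroup_memory_used_without_page_cache(8 * 2**30, 2 * 2**30))  # prints 6442450944 (6 GiB)
```

MemoryResidentWithoutPageCache relates to MemoryResident in the same way.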

CGroupSystemTime

The ratio of time the CPU core was running OS kernel (system) code.

CGroupSystemTimeNormalized

The value is similar to CGroupSystemTime but divided by the number of available CPU cores, so that it falls in the [0..1] interval regardless of the number of cores. This allows you to average the values of this metric across multiple servers in a cluster even if the number of cores is non-uniform, and still get the average resource utilization metric. If a cgroup CPU quota is set, the quota divided by its period can be used instead of the actual number of CPU cores; in that case the value of this metric may exceed 1 at some moments.

CGroupUserTime

The ratio of time the CPU core was running userspace code. This also includes time when the CPU was under-utilized for reasons internal to the CPU (memory loads, pipeline stalls, branch mispredictions, running another SMT core).

CGroupUserTimeNormalized

The value is similar to CGroupUserTime but divided by the number of available CPU cores, so that it falls in the [0..1] interval regardless of the number of cores. This allows you to average the values of this metric across multiple servers in a cluster even if the number of cores is non-uniform, and still get the average resource utilization metric. If a cgroup CPU quota is set, the quota divided by its period can be used instead of the actual number of CPU cores; in that case the value of this metric may exceed 1 at some moments.
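
Normalization divides the ratio by the core count, or by quota / period when a cgroup CPU quota applies, which is why the result can exceed 1. A minimal sketch with a hypothetical function name and made-up inputs, assuming the non-normalized ratio is summed over all cores (so a fully busy 8-core host would report 8.0):

```python
from statistics import mean
from typing import Optional


def normalized_cpu_time(ratio: float, num_cores: int,
                        quota_us: Optional[int] = None,
                        period_us: Optional[int] = None) -> float:
    """Scale a CPU-time ratio summed over cores into the [0..1] range.

    When a cgroup CPU quota and period are given, quota / period replaces
    the physical core count, so bursts above the quota can exceed 1.
    """
    if quota_us and period_us:
        divisor = quota_us / period_us
    else:
        divisor = num_cores
    return ratio / divisor


# Servers with different core counts become comparable after normalization.
servers = [(4.0, 8), (8.0, 16)]  # (ratio summed over cores, cores)
print(mean(normalized_cpu_time(r, c) for r, c in servers))  # prints 0.5
```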

CPUFrequencyMHz_core_id

The current frequency of the CPU, in MHz. Most modern CPUs adjust the frequency dynamically for power saving and Turbo Boost.

DictionaryMaxUpdateDelay

The maximum delay (in seconds) of dictionary updates.

DictionaryTotalFailedUpdates

Number of errors since the last successful load, across all dictionaries.

DiskAvailable_name

Available bytes on the disk (virtual filesystem). Remote filesystems may not provide this information and can show a large value like 16 EiB.

DiskGetObjectThrottlerAvailable_name

Number of GetObject requests that can currently be issued on the disk (virtual filesystem) without hitting the throttling limit. Local filesystems may not provide this information.

DiskGetObjectThrottlerRPS_name

GetObject request throttling limit on the disk (virtual filesystem), in requests per second. Local filesystems may not provide this information.

DiskPutObjectThrottlerAvailable_name

Number of PutObject requests that can currently be issued on the disk (virtual filesystem) without hitting the throttling limit. Local filesystems may not provide this information.

DiskPutObjectThrottlerRPS_name

PutObject request throttling limit on the disk (virtual filesystem), in requests per second. Local filesystems may not provide this information.

DiskTotal_name

The total size in bytes of the disk (virtual filesystem). Remote filesystems may not provide this information and can show a large value like 16 EiB.

DiskUnreserved_name

Available bytes on the disk (virtual filesystem) without the reservations for merges, fetches, and moves. Remote filesystems may not provide this information and can show a large value like 16 EiB.

DiskUsed_name

Used bytes on the disk (virtual filesystem). Remote filesystems do not always provide this information.

EDACi_Correctable

The number of correctable ECC memory errors. A high value indicates bad RAM that has to be replaced immediately: in the presence of many corrected errors, silent errors may occur as well, leading to data corruption. Source: /sys/devices/system/edac/mc/

EDACi_Uncorrectable

The number of uncorrectable ECC memory errors. A non-zero value indicates bad RAM that has to be replaced immediately, because it signals potential data corruption. Source: /sys/devices/system/edac/mc/

FilesystemCacheBytes

Total bytes in the cache virtual filesystem. This cache is held on disk.

FilesystemCacheCapacity

Total capacity of the cache virtual filesystem. This cache is held on disk.

FilesystemCacheFiles

Total number of cached file segments in the cache virtual filesystem. This cache is held on disk.

FilesystemLogsPathAvailableBytes

Available bytes on the volume where the ClickHouse logs path is mounted. If this value approaches zero, you should tune log rotation in the configuration file.

FilesystemLogsPathAvailableINodes

The number of available inodes on the volume where the ClickHouse logs path is mounted.

FilesystemLogsPathTotalBytes

The size of the volume where the ClickHouse logs path is mounted, in bytes. It's recommended to have at least 10 GB for logs.

FilesystemLogsPathTotalINodes

The total number of inodes on the volume where the ClickHouse logs path is mounted.

FilesystemLogsPathUsedBytes

Used bytes on the volume where the ClickHouse logs path is mounted.

FilesystemLogsPathUsedINodes

The number of used inodes on the volume where the ClickHouse logs path is mounted.

FilesystemMainPathAvailableBytes

Available bytes on the volume where the main ClickHouse path is mounted.

FilesystemMainPathAvailableINodes

The number of available inodes on the volume where the main ClickHouse path is mounted. If it is close to zero, it indicates a misconfiguration, and you will get 'no space left on device' even when the disk is not full.

FilesystemMainPathTotalBytes

The size of the volume where the main ClickHouse path is mounted, in bytes.

FilesystemMainPathTotalINodes

The total number of inodes on the volume where the main ClickHouse path is mounted. If it is less than 25 million, it indicates a misconfiguration.
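
The two inode descriptions above translate into simple checks. A sketch, assuming values read from FilesystemMainPathTotalINodes and FilesystemMainPathAvailableINodes; the 25 million figure comes from the description, while the 1% "close to zero" cutoff and the function name are hypothetical choices:

```python
from typing import List

MIN_TOTAL_INODES = 25_000_000   # from the FilesystemMainPathTotalINodes description
LOW_AVAILABLE_FRACTION = 0.01   # hypothetical "close to zero" cutoff


def main_path_inode_warnings(total_inodes: int, available_inodes: int) -> List[str]:
    """Return human-readable warnings for the main ClickHouse path volume."""
    warnings = []
    if total_inodes < MIN_TOTAL_INODES:
        warnings.append("fewer than 25 million inodes: likely misconfiguration")
    if total_inodes > 0 and available_inodes / total_inodes < LOW_AVAILABLE_FRACTION:
        warnings.append("available inodes near zero: expect 'no space left on device'")
    return warnings


print(main_path_inode_warnings(10_000_000, 1_000))  # prints both warnings
```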

FilesystemMainPathUsedBytes

Used bytes on the volume where the main ClickHouse path is mounted.

FilesystemMainPathUsedINodes

The number of used inodes on the volume where the main ClickHouse path is mounted. This value mostly corresponds to the number of files.

GRPCRejectedConnections

Number of rejected connections for the GRPC protocol.

GRPCThreads

Number of threads in the server of the GRPC protocol.

HashTableStatsCacheEntries

The number of entries in the cache of hash table sizes. The cache for hash table sizes is used for predictive optimization of GROUP BY.

HashTableStatsCacheHits

The number of times the prediction of a hash table size was correct.

HashTableStatsCacheMisses

The number of times the prediction of a hash table size was incorrect.
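
The hit and miss counters can be combined into a prediction accuracy ratio. A minimal sketch; the derived ratio is not itself a ClickHouse metric, and the function name is hypothetical:

```python
from typing import Optional


def hash_table_stats_hit_ratio(hits: float, misses: float) -> Optional[float]:
    """Fraction of hash table size predictions that were correct."""
    total = hits + misses
    if total == 0:
        return None  # no predictions recorded yet
    return hits / total


print(hash_table_stats_hit_ratio(90, 10))  # prints 0.9
```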

HTTPConnectionPoolgroup_nameTCPRcvBufTotalBytes

Total kernel TCP receive buffer memory (sk_rmem_alloc) across all HTTP connection pool sockets.

HTTPConnectionPoolgroup_nameTCPSndBufTotalBytes

Total kernel TCP transmit buffer memory (sk_wmem_alloc) across all HTTP connection pool sockets.

HTTPRejectedConnections

Number of rejected connections for the HTTP interface (without TLS).

HTTPSecureRejectedConnections

Number of rejected connections for the HTTPS interface.

HTTPSecureThreads

Number of threads in the server of the HTTPS interface.

HTTPThreads

Number of threads in the server of the HTTP interface (without TLS).

InterserverRejectedConnections

Number of rejected connections for the replicas communication protocol (without TLS).

InterserverSecureRejectedConnections

Number of rejected connections for the replicas communication protocol (with TLS).

InterserverSecureThreads

Number of threads in the server of the replicas communication protocol (with TLS).

InterserverThreads

Number of threads in the server of the replicas communication protocol (without TLS).

jemalloc.active

An internal metric of the low-level memory allocator (jemalloc). See https://jemalloc.net/jemalloc.3.html

jemalloc.allocated

An internal metric of the low-level memory allocator (jemalloc). See https://jemalloc.net/jemalloc.3.html

jemalloc.arenas.all.dirty_purged

An internal metric of the low-level memory allocator (jemalloc). See https://jemalloc.net/jemalloc.3.html

jemalloc.arenas.all.muzzy_purged

An internal metric of the low-level memory allocator (jemalloc). See https://jemalloc.net/jemalloc.3.html

jemalloc.arenas.all.pactive

An internal metric of the low-level memory allocator (jemalloc). See https://jemalloc.net/jemalloc.3.html

jemalloc.arenas.all.pdirty

An internal metric of the low-level memory allocator (jemalloc). See https://jemalloc.net/jemalloc.3.html

jemalloc.arenas.all.pmuzzy

An internal metric of the low-level memory allocator (jemalloc). See https://jemalloc.net/jemalloc.3.html

jemalloc.arenas.dirty_decay_ms

An internal metric of the low-level memory allocator (jemalloc). See https://jemalloc.net/jemalloc.3.html

jemalloc.background_thread.num_runs

An internal metric of the low-level memory allocator (jemalloc). See https://jemalloc.net/jemalloc.3.html

jemalloc.background_thread.num_threads

An internal metric of the low-level memory allocator (jemalloc). See https://jemalloc.net/jemalloc.3.html

jemalloc.background_thread.run_intervals

An internal metric of the low-level memory allocator (jemalloc). See https://jemalloc.net/jemalloc.3.html

jemalloc.cache_arena.pactive

An internal metric of the low-level memory allocator (jemalloc). See https://jemalloc.net/jemalloc.3.html

jemalloc.cache_arena.pdirty

An internal metric of the low-level memory allocator (jemalloc). See https://jemalloc.net/jemalloc.3.html

jemalloc.epoch

An internal incremental update number of the statistics of jemalloc (Jason Evans' memory allocator), used in all other jemalloc metrics.

jemalloc.mapped

An internal metric of the low-level memory allocator (jemalloc). See https://jemalloc.net/jemalloc.3.html

jemalloc.mergetree_arena.active_bytes

Active bytes in the dedicated jemalloc MergeTree arena. The arena holds long-lived MergeTree heap state: per-part metadata (NamesAndTypesList, SerializationInfoByName, the serializations map, column_name_to_position, MergeTreeDataPartChecksums tree, the Poco::LRUCache<String, ColumnSize> delegates inside each IMergeTreeDataPart, the per-part ColumnSize/IndexSize maps, MinMaxIndex, VersionMetadataOnDisk, and the MergeTreeDataPart{Compact,Wide} object itself) plus per-table metadata (StorageInMemoryMetadata / ColumnsDescription / VirtualColumnsDescription clones set up by setProperties, the serialization_hints aggregation, and the columns_descriptions_cache). Both active parts and outdated parts pending cleanup contribute. The arena is disjoint from the cache arena and the JIT arena. The per-part columns system.parts.primary_key_bytes_in_memory[_allocated] and system.parts.index_granularity_bytes_in_memory[_allocated] are subsets of this metric when their values are non-zero; that memory can instead live in PrimaryIndexCacheBytes, which is in the cache arena and is not counted here.

jemalloc.mergetree_arena.dirty_bytes

Dirty bytes in the MergeTree arena that are eligible for purging back to the OS.

jemalloc.mergetree_arena.pactive

An internal metric of the low-level memory allocator (jemalloc). See https://jemalloc.net/jemalloc.3.html

jemalloc.mergetree_arena.pdirty

An internal metric of the low-level memory allocator (jemalloc). See https://jemalloc.net/jemalloc.3.html

jemalloc.metadata

An internal metric of the low-level memory allocator (jemalloc). See https://jemalloc.net/jemalloc.3.html

jemalloc.metadata_thp

An internal metric of the low-level memory allocator (jemalloc). See https://jemalloc.net/jemalloc.3.html

jemalloc.prof.active

An internal metric of the low-level memory allocator (jemalloc). See https://jemalloc.net/jemalloc.3.html

jemalloc.prof.lg_sample

An internal metric of the low-level memory allocator (jemalloc). See https://jemalloc.net/jemalloc.3.html

jemalloc.prof.thread_active_init

An internal metric of the low-level memory allocator (jemalloc). See https://jemalloc.net/jemalloc.3.html

jemalloc.resident

An internal metric of the low-level memory allocator (jemalloc). See https://jemalloc.net/jemalloc.3.html

jemalloc.retained

An internal metric of the low-level memory allocator (jemalloc). See https://jemalloc.net/jemalloc.3.html

Jitter

The difference between the time the thread that calculates asynchronous metrics was scheduled to wake up and the time it actually woke up. A proxy indicator of overall system latency and responsiveness.
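
A sketch of the idea, not ClickHouse's implementation: record when a thread should wake up, sleep, and compare with the time it actually resumed. measure_wakeup_jitter is hypothetical; the clock and sleep parameters exist only so the function can be tested with a fake clock:

```python
import time
from typing import Callable


def measure_wakeup_jitter(interval_s: float,
                          clock: Callable[[], float] = time.monotonic,
                          sleep: Callable[[float], None] = time.sleep) -> float:
    """Return how many seconds later than scheduled the thread woke up."""
    scheduled_wakeup = clock() + interval_s
    sleep(max(0.0, scheduled_wakeup - clock()))
    # On a busy or overloaded host the thread tends to resume later,
    # so this difference grows.
    return clock() - scheduled_wakeup


print(measure_wakeup_jitter(0.01))  # typically a small positive number
```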

KeeperApproximateDataSize

The approximate data size of ClickHouse Keeper, in bytes.

KeeperAvgLatency

Average request latency of ClickHouse Keeper.

KeeperCommitLogsCacheEntries

Number of entries stored in the in-memory cache for the next logs to be committed.

KeeperCommitLogsCacheSize

Total size of the in-memory cache for the next logs to be committed.

KeeperEphemeralsCount

The number of ephemeral nodes in ClickHouse Keeper.

KeeperFollowers

The number of followers of ClickHouse Keeper.

KeeperIsExceedingMemorySoftLimitHit

1 if ClickHouse Keeper is exceeding the memory soft limit, 0 otherwise.

KeeperIsFollower

1 if ClickHouse Keeper is a follower, 0 otherwise.

KeeperIsLeader

1 if ClickHouse Keeper is a leader, 0 otherwise.

KeeperIsObserver

1 if ClickHouse Keeper is an observer, 0 otherwise.

KeeperIsStandalone

1 if ClickHouse Keeper is in standalone mode, 0 otherwise.

KeeperKeyArenaSize

The size in bytes of the memory arena for keys in ClickHouse Keeper.

KeeperLastCommittedLogIdx

Index of the last committed log in ClickHouse Keeper.

KeeperLastLogIdx

Index of the last log stored in ClickHouse Keeper.

KeeperLastLogTerm

Raft term of the last log stored in ClickHouse Keeper.

KeeperLastSnapshotIdx

Index of the last log present in the last created snapshot.

KeeperLatestLogsCacheEntries

Number of entries stored in the in-memory cache for the latest logs.

KeeperLatestLogsCacheSize

Total size of the in-memory cache for the latest logs.

KeeperLatestSnapshotSize

The uncompressed size in bytes of the latest snapshot created by ClickHouse Keeper.

KeeperMaxFileDescriptorCount

The maximum number of open file descriptors in ClickHouse Keeper.

KeeperMaxLatency

Maximum request latency of ClickHouse Keeper.

KeeperMinLatency

Minimum request latency of ClickHouse Keeper.

KeeperOpenFileDescriptorCount

The number of open file descriptors in ClickHouse Keeper.

KeeperPacketsReceived

Number of packets received by ClickHouse Keeper.

KeeperPacketsSent

Number of packets sent by ClickHouse Keeper.

KeeperPathsWatched

The number of different paths watched by the clients of ClickHouse Keeper.

KeeperSessionWithWatches

The number of client sessions of ClickHouse Keeper having watches.

KeeperSyncedFollowers

The number of followers of ClickHouse Keeper that are also in sync.

KeeperTargetCommitLogIdx

Index until which logs can be committed in ClickHouse Keeper.

KeeperTCPRejectedConnections

Number of rejected connections for the Keeper TCP protocol (without TLS).

KeeperTCPSecureRejectedConnections

Number of rejected connections for the Keeper TCP protocol (with TLS).

KeeperTCPSecureThreads

Number of threads in the server of the Keeper TCP protocol (with TLS).

KeeperTCPThreads

Number of threads in the server of the Keeper TCP protocol (without TLS).

KeeperWatchCount

The number of watches in ClickHouse Keeper.

KeeperZnodeCount

The number of nodes (data entries) in ClickHouse Keeper.

KeeperZxid

The current transaction id number (zxid) in ClickHouse Keeper.

LoadAverage1

The whole system load, averaged with exponential smoothing over 1 minute. The load represents the number of threads across all processes (the scheduling entities of the OS kernel) that are currently running on a CPU, waiting for I/O, or ready to run but not scheduled at this point in time. This number includes all processes, not only clickhouse-server. It can be greater than the number of CPU cores if the system is overloaded and many processes are ready to run but waiting for CPU or I/O.
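
The exponential smoothing can be sketched as follows. This floating-point version is modeled on the Linux kernel's approach, which samples the number of running and uninterruptible (typically I/O-waiting) tasks every 5 seconds and decays the previous average by exp(-5/60) for the 1-minute value (exp(-5/300) and exp(-5/900) for 5 and 15 minutes); the kernel itself uses fixed-point arithmetic, and update_load_average is a hypothetical name:

```python
import math

SAMPLE_INTERVAL_S = 5.0  # the Linux kernel samples the task count every 5 seconds


def update_load_average(previous: float, active_tasks: int, window_s: float) -> float:
    """One exponential-smoothing step for a load average over window_s seconds."""
    decay = math.exp(-SAMPLE_INTERVAL_S / window_s)
    return previous * decay + active_tasks * (1.0 - decay)


# A host that suddenly has 4 busy threads: the 1-minute average reacts faster.
load1 = load15 = 0.0
for _ in range(12):  # 12 samples x 5 s = 1 minute
    load1 = update_load_average(load1, 4, 60.0)
    load15 = update_load_average(load15, 4, 900.0)
print(round(load1, 2), round(load15, 2))  # prints 2.53 0.26
```

This is why LoadAverage1 reacts quickly to bursts while LoadAverage15 reflects sustained load.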

LoadAverage15

The whole system load, averaged with exponential smoothing over 15 minutes. The load represents the number of threads across all processes (the scheduling entities of the OS kernel) that are currently running on a CPU, waiting for I/O, or ready to run but not scheduled at this point in time. This number includes all processes, not only clickhouse-server. It can be greater than the number of CPU cores if the system is overloaded and many processes are ready to run but waiting for CPU or I/O.

LoadAverage5

The whole system load, averaged with exponential smoothing over 5 minutes. The load represents the number of threads across all processes (the scheduling entities of the OS kernel) that are currently running on a CPU, waiting for I/O, or ready to run but not scheduled at this point in time. This number includes all processes, not only clickhouse-server. It can be greater than the number of CPU cores if the system is overloaded and many processes are ready to run but waiting for CPU or I/O.

LongestRunningMerge

Elapsed time in seconds of the longest currently running background merge.

MaxPartCountForPartition

Maximum number of parts per partition across all partitions of all tables of the MergeTree family. Values larger than 300 indicate misconfiguration, overload, or massive data loading.
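
The 300-part guideline can be turned into a simple check over the rows of this table. A minimal sketch, assuming the rows arrive as (metric, value) pairs, such as the result of SELECT metric, value FROM system.asynchronous_metrics fetched with whatever client you use; parts_per_partition_alert is a hypothetical name:

```python
from typing import Iterable, Tuple

MAX_PARTS_PER_PARTITION = 300  # guideline from the metric description


def parts_per_partition_alert(rows: Iterable[Tuple[str, float]]) -> bool:
    """True if MaxPartCountForPartition is above the documented guideline."""
    metrics = dict(rows)
    return metrics.get("MaxPartCountForPartition", 0.0) > MAX_PARTS_PER_PARTITION


rows = [("NumberOfTables", 93.0), ("MaxPartCountForPartition", 6.0)]
print(parts_per_partition_alert(rows))  # prints False
```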

MemoryCode

The amount of virtual memory mapped for the pages of machine code of the server process, in bytes.

MemoryDataAndStack

The amount of virtual memory mapped for the stack and for allocated memory, in bytes. It is unspecified whether it includes the per-thread stacks and most of the allocated memory, which is allocated with the 'mmap' system call. This metric exists only for completeness. Use the MemoryResident metric for monitoring.

MemoryResident

The amount of physical memory used by the server process, in bytes.

MemoryResidentMax

Maximum amount of physical memory used by the server process, in bytes.

MemoryResidentWithoutPageCache

The amount of physical memory used by the server process, excluding userspace page cache, in bytes. This provides a more accurate view of actual memory usage when userspace page cache is utilized. When userspace page cache is disabled, this value equals MemoryResident.

MemoryShared

The amount of memory used by the server process that is also shared with other processes, in bytes. ClickHouse does not use shared memory, but the OS can label some memory as shared for its own reasons. This metric is not very useful to watch and exists only for completeness.

MemoryVirtual

The size of the virtual address space allocated by the server process, in bytes. The size of the virtual address space is usually much greater than the physical memory consumption and should not be used as an estimate of memory consumption. Large values of this metric are normal and have only technical meaning.

MySQLRejectedConnections

Number of rejected connections for the MySQL compatibility protocol.

MySQLThreads

Number of threads in the server of the MySQL compatibility protocol.

NetworkReceiveBytes_interface_name

Number of bytes received via the network interface. This is a system-wide metric: it includes all processes on the host machine, not just clickhouse-server.

NetworkReceiveDrop_interface_name

Number of times a packet was dropped while being received via the network interface. This is a system-wide metric: it includes all processes on the host machine, not just clickhouse-server.

NetworkReceiveErrors_interface_name

Number of times an error occurred while receiving via the network interface. This is a system-wide metric: it includes all processes on the host machine, not just clickhouse-server.

NetworkReceivePackets_interface_name

Number of network packets received via the network interface. This is a system-wide metric: it includes all processes on the host machine, not just clickhouse-server.

NetworkSendBytes_interface_name

Number of bytes sent via the network interface. This is a system-wide metric: it includes all processes on the host machine, not just clickhouse-server.

NetworkSendDrop_interface_name

Number of times a packet was dropped while sending via the network interface. This is a system-wide metric: it includes all processes on the host machine, not just clickhouse-server.

NetworkSendErrors_interface_name

Number of times an error (e.g. a TCP retransmit) occurred while sending via the network interface. This is a system-wide metric: it includes all processes on the host machine, not just clickhouse-server.

NetworkSendPackets_interface_name

Number of network packets sent via the network interface. This is a system-wide metric: it includes all processes on the host machine, not just clickhouse-server.

NetworkTCPReceiveQueue

Total size of receive queues of network sockets used on the server across TCPv4 and TCPv6.

NetworkTCPSocketRemoteAddresses

Total number of unique remote addresses of network sockets used on the server across TCPv4 and TCPv6.

NetworkTCPSockets

Total number of network sockets used on the server across TCPv4 and TCPv6, in all states.

NetworkTCPSockets_description

Total number of network sockets in the specific state on the server across TCPv4 and TCPv6.

NetworkTCPTransmitQueue

Total size of transmit queues of network sockets used on the server across TCPv4 and TCPv6.

NetworkTCPUnrecoveredRetransmits

Total size of current retransmits (unrecovered at this moment) of network sockets used on the server across TCPv4 and TCPv6.

NumberOfDatabases

Total number of databases on the server.

NumberOfDetachedByUserParts

The total number of parts detached from MergeTree tables by users with the ALTER TABLE DETACH query (as opposed to unexpected, broken or ignored parts). The server does not use detached parts, and they can be removed.

NumberOfDetachedParts

The total number of parts detached from MergeTree tables. A part can be detached by a user with the ALTER TABLE DETACH query or by the server itself if the part is broken, unexpected or unneeded. The server does not use detached parts, and they can be removed.
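Because NumberOfDetachedParts counts parts detached both by users and by the server, subtracting NumberOfDetachedByUserParts gives the number the server detached on its own. A minimal Python sketch of that arithmetic, assuming the metrics were fetched (for example with SELECT metric, value FROM system.asynchronous_metrics) into a dict; the function name is hypothetical.

```python
def detached_parts_breakdown(metrics: dict) -> dict:
    """Split the detached-part count into user- and server-detached parts.

    `metrics` maps metric name -> value, for example loaded from
    SELECT metric, value FROM system.asynchronous_metrics.
    """
    total = metrics["NumberOfDetachedParts"]
    by_user = metrics["NumberOfDetachedByUserParts"]
    return {
        "by_user": by_user,
        # Whatever the user did not detach was detached by the server itself
        # (broken, unexpected or unneeded parts).
        "by_server": total - by_user,
    }


snapshot = {"NumberOfDetachedParts": 5.0, "NumberOfDetachedByUserParts": 2.0}
print(detached_parts_breakdown(snapshot))  # {'by_user': 2.0, 'by_server': 3.0}
```

A non-zero by_server value is the one worth investigating, since user-detached parts are expected.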

NumberOfPendingMutations

The total number of pending mutations, i.e. mutations that still have data parts left to be mutated.

NumberOfPendingMutationsOverExecutionTime

The total number of pending mutations that have data parts left to be mutated and have been running longer than the max_pending_mutations_execution_time_to_warn setting.

NumberOfTables

Total number of tables summed across the databases on the server, excluding the databases that cannot contain MergeTree tables. The excluded database engines are those that generate the set of tables on the fly, such as Lazy, MySQL, PostgreSQL, and SQLite.

NumberOfTablesSystem

Total number of MergeTree-family tables in the system database on the server.

OSContextSwitches

The number of context switches that the system underwent on the host machine. This is a system-wide metric: it includes all processes on the host machine, not just clickhouse-server.

OSCPUOverload

Relative CPU deficit, calculated as the number of threads waiting for CPU relative to the number of threads using CPU. If it is greater than zero, the server would benefit from more CPU. If it is significantly greater than zero, the server could become unresponsive. The metric is accumulated between updates of asynchronous metrics.

OSGuestNiceTimecpu_suffix

The ratio of time spent running a virtual CPU for guest operating systems under the control of the Linux kernel, when a guest was set to a higher priority (see man procfs). This is a system-wide metric: it includes all processes on the host machine, not just clickhouse-server. This metric is irrelevant for ClickHouse but still exists for completeness. The value for a single CPU core is in the interval [0..1]. The value for all CPU cores is calculated as the sum across them, in the interval [0..num cores].

OSGuestNiceTimeNormalized

The value is similar to OSGuestNiceTime but divided by the number of CPU cores, so that it falls in the [0..1] interval regardless of the number of cores. This allows you to average the values of this metric across multiple servers in a cluster even if the number of cores is non-uniform, and still get the average resource utilization metric. If specified, the Cgroup CPU quota divided by its period can be used instead of the actual number of CPU cores, and in that case the value of this metric may exceed 1 at some moments.
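The relationship between the per-core, all-cores and Normalized variants can be sketched in Python. This assumes you already hold the per-core readings of one metric as a list of floats; the function name and the cgroup quota/period parameters are hypothetical, and ClickHouse computes these values itself. The same arithmetic applies to every metric in this table whose name ends in Normalized.

```python
from typing import List, Optional, Tuple


def cpu_ratio_total_and_normalized(
    per_core: List[float],
    cgroup_quota_us: Optional[float] = None,
    cgroup_period_us: Optional[float] = None,
) -> Tuple[float, float]:
    """Combine per-core ratios (each in [0..1]) into the all-cores value
    ([0..num cores]) and the normalized value (nominally [0..1])."""
    total = sum(per_core)
    if cgroup_quota_us and cgroup_period_us:
        # With a cgroup CPU quota, quota / period replaces the core count,
        # so the normalized value can exceed 1 at some moments.
        divisor = cgroup_quota_us / cgroup_period_us
    else:
        divisor = len(per_core)
    return total, total / divisor


cores = [0.5, 0.25, 0.0, 0.25]  # four per-core readings of one metric
print(cpu_ratio_total_and_normalized(cores))                    # (1.0, 0.25)
print(cpu_ratio_total_and_normalized(cores, 200_000, 100_000))  # (1.0, 0.5)
```

Dividing by the core count is what makes the normalized values comparable across servers with different numbers of cores.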

OSGuestTimecpu_suffix

The ratio of time spent running a virtual CPU for guest operating systems under the control of the Linux kernel (see man procfs). This is a system-wide metric: it includes all processes on the host machine, not just clickhouse-server. This metric is irrelevant for ClickHouse but still exists for completeness. The value for a single CPU core is in the interval [0..1]. The value for all CPU cores is calculated as the sum across them, in the interval [0..num cores].

OSGuestTimeNormalized

The value is similar to OSGuestTime but divided by the number of CPU cores, so that it falls in the [0..1] interval regardless of the number of cores. This allows you to average the values of this metric across multiple servers in a cluster even if the number of cores is non-uniform, and still get the average resource utilization metric. If specified, the Cgroup CPU quota divided by its period can be used instead of the actual number of CPU cores, and in that case the value of this metric may exceed 1 at some moments.

OSIdleTimecpu_suffix

The ratio of time the CPU core was idle (not even ready to run a process waiting for IO) from the OS kernel standpoint. This is a system-wide metric: it includes all processes on the host machine, not just clickhouse-server. This does not include the time when the CPU was under-utilized for reasons internal to the CPU (memory loads, pipeline stalls, branch mispredictions, running another SMT core). The value for a single CPU core is in the interval [0..1]. The value for all CPU cores is calculated as the sum across them, in the interval [0..num cores].

OSIdleTimeNormalized

The value is similar to OSIdleTime but divided by the number of CPU cores, so that it falls in the [0..1] interval regardless of the number of cores. This allows you to average the values of this metric across multiple servers in a cluster even if the number of cores is non-uniform, and still get the average resource utilization metric. If specified, the Cgroup CPU quota divided by its period can be used instead of the actual number of CPU cores, and in that case the value of this metric may exceed 1 at some moments.

OSInterrupts

The number of interrupts on the host machine. This is a system-wide metric: it includes all processes on the host machine, not just clickhouse-server.

OSIOWaitTimecpu_suffix

The ratio of time the CPU core was not running any code because the OS kernel had no other process to run on this CPU while processes were waiting for IO. This is a system-wide metric: it includes all processes on the host machine, not just clickhouse-server. The value for a single CPU core is in the interval [0..1]. The value for all CPU cores is calculated as the sum across them, in the interval [0..num cores].

OSIOWaitTimeNormalized

The value is similar to OSIOWaitTime but divided by the number of CPU cores, so that it falls in the [0..1] interval regardless of the number of cores. This allows you to average the values of this metric across multiple servers in a cluster even if the number of cores is non-uniform, and still get the average resource utilization metric. If specified, the Cgroup CPU quota divided by its period can be used instead of the actual number of CPU cores, and in that case the value of this metric may exceed 1 at some moments.

OSIrqTimecpu_suffix

The ratio of time spent running hardware interrupt requests on the CPU. This is a system-wide metric: it includes all processes on the host machine, not just clickhouse-server. A high value of this metric may indicate hardware misconfiguration or a very high network load. The value for a single CPU core is in the interval [0..1]. The value for all CPU cores is calculated as the sum across them, in the interval [0..num cores].

OSIrqTimeNormalized

The value is similar to OSIrqTime but divided by the number of CPU cores, so that it falls in the [0..1] interval regardless of the number of cores. This allows you to average the values of this metric across multiple servers in a cluster even if the number of cores is non-uniform, and still get the average resource utilization metric. If specified, the Cgroup CPU quota divided by its period can be used instead of the actual number of CPU cores, and in that case the value of this metric may exceed 1 at some moments.

OSMemoryAvailable

The amount of memory available to be used by programs, in bytes. This is very similar to the OSMemoryFreePlusCached metric. This is a system-wide metric: it includes all processes on the host machine, not just clickhouse-server.

OSMemoryBuffers

The amount of memory used by OS kernel buffers, in bytes. This should typically be small; large values may indicate a misconfiguration of the OS. This is a system-wide metric: it includes all processes on the host machine, not just clickhouse-server.

OSMemoryCached

The amount of memory used by the OS page cache, in bytes. Typically, almost all available memory is used by the OS page cache, so high values of this metric are normal and expected. This is a system-wide metric: it includes all processes on the host machine, not just clickhouse-server.

OSMemoryFreePlusCached

The amount of free memory plus OS page cache memory on the host system, in bytes. This memory is available to be used by programs. The value should be very similar to OSMemoryAvailable. This is a system-wide metric: it includes all processes on the host machine, not just clickhouse-server.

OSMemoryFreeWithoutCached

The amount of free memory on the host system, in bytes. This does not include the memory used by the OS page cache. Because page cache memory is also available for use by programs, the value of this metric can be confusing; see the OSMemoryAvailable metric instead. For convenience, we also provide the OSMemoryFreePlusCached metric, which should be somewhat similar to OSMemoryAvailable. See also https://www.linuxatemyram.com/. This is a system-wide metric: it includes all processes on the host machine, not just clickhouse-server.
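To see why OSMemoryAvailable is the better basis, here is a minimal Python sketch that computes the fraction of host memory in use from OSMemoryTotal and either OSMemoryAvailable or OSMemoryFreeWithoutCached. The snapshot numbers and the function name are hypothetical.

```python
def memory_used_fraction(metrics: dict, use_available: bool = True) -> float:
    """Fraction of host memory in use, computed from asynchronous metrics."""
    total = metrics["OSMemoryTotal"]
    if use_available:
        free = metrics["OSMemoryAvailable"]
    else:
        # Misleading: ignores page cache memory that programs can reclaim.
        free = metrics["OSMemoryFreeWithoutCached"]
    return 1.0 - free / total


GiB = 1024 ** 3
snapshot = {
    "OSMemoryTotal": 16 * GiB,
    "OSMemoryAvailable": 4 * GiB,
    "OSMemoryFreeWithoutCached": 1 * GiB,
}
print(memory_used_fraction(snapshot))                       # 0.75
print(memory_used_fraction(snapshot, use_available=False))  # 0.9375
```

The second figure overstates memory pressure because it counts reclaimable page cache as used.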

OSMemorySwapCached

The amount of memory in swap that was also loaded in RAM. Swap should be disabled on production systems; a large value of this metric indicates a misconfiguration. This is a system-wide metric: it includes all processes on the host machine, not just clickhouse-server.

OSMemoryTotal

The total amount of memory on the host system, in bytes.

OSNiceTimecpu_suffix

The ratio of time the CPU core was running userspace code with higher priority. This is a system-wide metric: it includes all processes on the host machine, not just clickhouse-server. The value for a single CPU core is in the interval [0..1]. The value for all CPU cores is calculated as the sum across them, in the interval [0..num cores].

OSNiceTimeNormalized

The value is similar to OSNiceTime but divided by the number of CPU cores, so that it falls in the [0..1] interval regardless of the number of cores. This allows you to average the values of this metric across multiple servers in a cluster even if the number of cores is non-uniform, and still get the average resource utilization metric. If specified, the Cgroup CPU quota divided by its period can be used instead of the actual number of CPU cores, and in that case the value of this metric may exceed 1 at some moments.

OSOpenFiles

The total number of open files on the host machine. This is a system-wide metric: it includes all processes on the host machine, not just clickhouse-server.

OSProcessesBlocked

Number of threads blocked waiting for I/O to complete (see man procfs). This is a system-wide metric: it includes all processes on the host machine, not just clickhouse-server.

OSProcessesCreated

The number of processes created. This is a system-wide metric: it includes all processes on the host machine, not just clickhouse-server.

OSProcessesRunning

The number of runnable (running or ready to run) threads, as seen by the operating system. This is a system-wide metric: it includes all processes on the host machine, not just clickhouse-server.

OSSoftIrqTimecpu_suffix

The ratio of time spent running software interrupt requests on the CPU. This is a system-wide metric: it includes all processes on the host machine, not just clickhouse-server. A high value of this metric may indicate inefficient software running on the system. The value for a single CPU core is in the interval [0..1]. The value for all CPU cores is calculated as the sum across them, in the interval [0..num cores].

OSSoftIrqTimeNormalized

The value is similar to OSSoftIrqTime but divided by the number of CPU cores, so that it falls in the [0..1] interval regardless of the number of cores. This allows you to average the values of this metric across multiple servers in a cluster even if the number of cores is non-uniform, and still get the average resource utilization metric. If specified, the Cgroup CPU quota divided by its period can be used instead of the actual number of CPU cores, and in that case the value of this metric may exceed 1 at some moments.

OSStealTimecpu_suffix

The ratio of time spent in other operating systems by the CPU when running in a virtualized environment. This is a system-wide metric: it includes all processes on the host machine, not just clickhouse-server. Not every virtualized environment provides this metric, and most of them don't. The value for a single CPU core is in the interval [0..1]. The value for all CPU cores is calculated as the sum across them, in the interval [0..num cores].

OSStealTimeNormalized

The value is similar to OSStealTime but divided by the number of CPU cores, so that it falls in the [0..1] interval regardless of the number of cores. This allows you to average the values of this metric across multiple servers in a cluster even if the number of cores is non-uniform, and still get the average resource utilization metric. If specified, the Cgroup CPU quota divided by its period can be used instead of the actual number of CPU cores, and in that case the value of this metric may exceed 1 at some moments.

OSSystemTimecpu_suffix

The ratio of time the CPU core was running OS kernel (system) code. This is a system-wide metric: it includes all processes on the host machine, not just clickhouse-server. The value for a single CPU core is in the interval [0..1]. The value for all CPU cores is calculated as the sum across them, in the interval [0..num cores].

OSSystemTimeNormalized

The value is similar to OSSystemTime but divided by the number of CPU cores, so that it falls in the [0..1] interval regardless of the number of cores. This allows you to average the values of this metric across multiple servers in a cluster even if the number of cores is non-uniform, and still get the average resource utilization metric. If specified, the Cgroup CPU quota divided by its period can be used instead of the actual number of CPU cores, and in that case the value of this metric may exceed 1 at some moments.

OSThreadsRunnable

The total number of 'runnable' threads, as seen by the OS kernel scheduler.

OSThreadsTotal

The total number of threads, as seen by the OS kernel scheduler.

OSUptime

The uptime of the host server (the machine where ClickHouse is running), in seconds.

OSUserTimecpu_suffix

The ratio of time the CPU core was running userspace code. This is a system-wide metric: it includes all processes on the host machine, not just clickhouse-server. This also includes the time when the CPU was under-utilized for reasons internal to the CPU (memory loads, pipeline stalls, branch mispredictions, running another SMT core). The value for a single CPU core is in the interval [0..1]. The value for all CPU cores is calculated as the sum across them, in the interval [0..num cores].

OSUserTimeNormalized

The value is similar to OSUserTime but divided by the number of CPU cores, so that it falls in the [0..1] interval regardless of the number of cores. This allows you to average the values of this metric across multiple servers in a cluster even if the number of cores is non-uniform, and still get the average resource utilization metric. If specified, the Cgroup CPU quota divided by its period can be used instead of the actual number of CPU cores, and in that case the value of this metric may exceed 1 at some moments.

PageCacheMaxBytes

Current limit on the size of userspace page cache, in bytes.

PostgreSQLRejectedConnections

Number of rejected connections for the PostgreSQL compatibility protocol.

PostgreSQLThreads

Number of server threads handling the PostgreSQL compatibility protocol.

ProcessSignalQueueLimit

Total limit of the signal queue. Once ProcessSignalQueueSize reaches this limit, you may get CANNOT_CREATE_TIMER errors.

ProcessSignalQueueSize

Size of the signal queue (pending signals and timers for query profiling).
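Since errors appear once the queue size reaches the limit, a monitoring check can compare the two metrics. A minimal Python sketch, assuming both values were fetched into a dict; the function name and the 0.8 warning fraction are hypothetical choices, not ClickHouse defaults.

```python
def signal_queue_status(metrics: dict, warn_fraction: float = 0.8) -> dict:
    """Compare ProcessSignalQueueSize against ProcessSignalQueueLimit."""
    size = metrics["ProcessSignalQueueSize"]
    limit = metrics["ProcessSignalQueueLimit"]
    used = size / limit
    return {
        "headroom": limit - size,
        "used_fraction": used,
        # Near the limit, timer creation may fail with CANNOT_CREATE_TIMER.
        "warn": used >= warn_fraction,
    }


print(signal_queue_status({"ProcessSignalQueueSize": 950.0, "ProcessSignalQueueLimit": 1000.0}))
# {'headroom': 50.0, 'used_fraction': 0.95, 'warn': True}
```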

PrometheusRejectedConnections

Number of rejected connections for the Prometheus endpoint. Note: Prometheus endpoints can also be served via the usual HTTP/HTTPS ports.

PrometheusThreads

Number of server threads handling the Prometheus endpoint. Note: Prometheus endpoints can also be served via the usual HTTP/HTTPS ports.

PSI_type_stall_type

Microseconds of stall time since the last measurement. See the upstream kernel documentation at https://docs.kernel.org/accounting/psi.html for a description of these metrics and how to interpret them.
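Dividing the stall time by the length of the measurement interval gives the fraction of time spent stalled. A minimal Python sketch, assuming the interval equals the asynchronous metrics update period (the asynchronous_metrics_update_period_s server setting, assumed here to be 1 second); since the real interval can drift slightly, treat the result as approximate. The function name is hypothetical.

```python
def psi_stall_fraction(stall_us: float, update_period_s: float = 1.0) -> float:
    """Convert a PSI stall reading (microseconds since the last measurement)
    into the fraction of wall-clock time spent stalled."""
    return stall_us / (update_period_s * 1_000_000)


print(psi_stall_fraction(250_000))       # 0.25: stalled a quarter of the time
print(psi_stall_fraction(500_000, 2.0))  # 0.25
```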

QueriesMemoryUsage

Total memory currently used by all running queries on the server, in bytes. Useful for attributing memory pressure to the concurrent query load.

QueriesPeakMemoryUsage

Sum of per-user query memory peaks across all users tracked in ProcessList, in bytes. Each user's peak is the high-water mark of that user's memory tracker, which is reset when the user has no running queries. This is therefore an aggregate of currently-tracked per-user peaks, not a single server-wide peak of all queries since startup.

ReplicasMaxAbsoluteDelay

Maximum difference in seconds between the freshest replicated part and the freshest data part still to be replicated, across Replicated tables. A very high value indicates a replica with no data.

ReplicasMaxInsertsInQueue

Maximum number of INSERT operations in the queue (still to be replicated) across Replicated tables.

ReplicasMaxMergesInQueue

Maximum number of merge operations in the queue (still to be applied) across Replicated tables.

ReplicasMaxQueueSize

Maximum queue size (in number of operations such as get and merge) across Replicated tables.

ReplicasMaxRelativeDelay

Maximum difference between the replica delay and the delay of the most up-to-date replica of the same table, across Replicated tables.
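The relative delay definition can be illustrated in Python: for each table, subtract the delay of the most up-to-date replica from this replica's delay, then take the maximum across tables. The input shape and function name are hypothetical; this illustrates the definition, not how ClickHouse computes it internally.

```python
from typing import Dict, List, Tuple


def max_relative_delay(tables: Dict[str, Tuple[float, List[float]]]) -> float:
    """tables maps a table name to (this replica's delay in seconds,
    delays of all replicas of that table, including this one)."""
    worst = 0.0
    for own_delay, all_delays in tables.values():
        # Relative delay: how far this replica lags the most up-to-date one.
        worst = max(worst, own_delay - min(all_delays))
    return worst


delays = {
    "db.events": (30.0, [30.0, 5.0, 10.0]),
    "db.users": (2.0, [2.0, 2.0]),
}
print(max_relative_delay(delays))  # 25.0
```

Unlike the absolute delay, this value stays near zero when all replicas lag equally, so it isolates replicas that fall behind their peers.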

ReplicasSumInsertsInQueue

Sum of INSERT operations in the queue (still to be replicated) across Replicated tables.

ReplicasSumMergesInQueue

Sum of merge operations in the queue (still to be applied) across Replicated tables.

ReplicasSumQueueSize

Sum of queue sizes (in number of operations such as get and merge) across Replicated tables.

TCPRejectedConnections

Number of rejected connections for the TCP protocol (without TLS).

TCPSecureRejectedConnections

Number of rejected connections for the TCP protocol (with TLS).

TCPSecureThreads

Number of server threads handling the TCP protocol (with TLS).

TCPThreads

Number of server threads handling the TCP protocol (without TLS).

Temperaturei

The temperature of the corresponding device in ℃. A sensor can return an unrealistic value. Source: /sys/class/thermal

Temperature_hwmon_name

The temperature reported by the corresponding hardware monitor in ℃. A sensor can return an unrealistic value. Source: /sys/class/hwmon

Temperature_hwmon_name_sensor_name

The temperature reported by the corresponding hardware monitor and the corresponding sensor in ℃. A sensor can return an unrealistic value. Source: /sys/class/hwmon

TotalBytesOfMergeTreeTables

Total amount of bytes (compressed, including data and indices) stored in all tables of MergeTree family.

TotalBytesOfMergeTreeTablesSystem

Total amount of bytes (compressed, including data and indices) stored in tables of MergeTree family in the system database.

TotalIndexGranularityBytesInMemory

The total amount of memory (in bytes) used by index granules (only takes active parts into account).

TotalIndexGranularityBytesInMemoryAllocated

The total amount of memory (in bytes) reserved for index granules (only takes active parts into account).

TotalPartsOfMergeTreeTables

Total number of data parts in all tables of the MergeTree family. Values larger than 10 000 will negatively affect server startup time and may indicate a poor choice of partition key.
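A minimal Python sketch of a check against the 10 000 figure quoted above, assuming the metric was fetched into a dict. The function name and message wording are hypothetical.

```python
PARTS_WARNING_THRESHOLD = 10_000  # figure quoted in the description above


def check_parts(metrics: dict) -> str:
    """Flag a part count that may slow server startup."""
    parts = metrics["TotalPartsOfMergeTreeTables"]
    if parts > PARTS_WARNING_THRESHOLD:
        return f"{parts:.0f} parts: expect slower startup; review partition keys"
    return f"{parts:.0f} parts: OK"


print(check_parts({"TotalPartsOfMergeTreeTables": 12_500.0}))
# 12500 parts: expect slower startup; review partition keys
```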

TotalPartsOfMergeTreeTablesSystem

Total number of data parts in tables of the MergeTree family in the system database.

TotalPrimaryKeyBytesInMemory

The total amount of memory (in bytes) used by primary key values (only takes active parts into account).

TotalPrimaryKeyBytesInMemoryAllocated

The total amount of memory (in bytes) reserved for primary key values (only takes active parts into account).

TotalProjectionIndexGranularityBytesInMemory

The total amount of memory (in bytes) used by projection index granularity (only takes active parts into account).

TotalProjectionIndexGranularityBytesInMemoryAllocated

The total amount of memory (in bytes) reserved for projection index granularity (only takes active parts into account).

TotalProjectionPrimaryKeyBytesInMemory

The total amount of memory (in bytes) used by projection primary key values (only takes active parts into account).

TotalProjectionPrimaryKeyBytesInMemoryAllocated

The total amount of memory (in bytes) reserved for projection primary key values (only takes active parts into account).
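Each "...BytesInMemory" metric above has a matching "...Allocated" metric, so their difference shows how much memory is reserved but not currently used. A minimal Python sketch, assuming the metrics were fetched into a dict; the function name and sample values are hypothetical.

```python
BASE_METRICS = [
    "TotalIndexGranularityBytesInMemory",
    "TotalPrimaryKeyBytesInMemory",
    "TotalProjectionIndexGranularityBytesInMemory",
    "TotalProjectionPrimaryKeyBytesInMemory",
]


def reserved_but_unused(metrics: dict) -> dict:
    """Bytes reserved but not used, per in-memory index structure."""
    result = {}
    for name in BASE_METRICS:
        allocated = name + "Allocated"
        # Skip pairs that are missing from the snapshot.
        if name in metrics and allocated in metrics:
            result[name] = metrics[allocated] - metrics[name]
    return result


snapshot = {
    "TotalPrimaryKeyBytesInMemory": 1_000_000.0,
    "TotalPrimaryKeyBytesInMemoryAllocated": 1_250_000.0,
}
print(reserved_but_unused(snapshot))  # {'TotalPrimaryKeyBytesInMemory': 250000.0}
```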

TotalRowsOfMergeTreeTables

Total amount of rows (records) stored in all tables of MergeTree family.

TotalRowsOfMergeTreeTablesSystem

Total amount of rows (records) stored in tables of MergeTree family in the system database.

TrackedMemory

Memory tracked by ClickHouse (should be equal to MemoryTracking metric), in bytes.

Uptime

The server uptime in seconds. It includes the time spent on server initialization before accepting connections.

VMMaxMapCount

The maximum number of memory mappings a process may have (/proc/sys/vm/max_map_count).

VMNumMaps

The current number of memory mappings of the process (/proc/self/maps). If it is close to the maximum (VMMaxMapCount), you should increase the limit for vm.max_map_count in /etc/sysctl.conf.
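A minimal Python sketch of a "close to the maximum" check that compares VMNumMaps with VMMaxMapCount. It assumes the metrics were fetched into a dict; the 0.9 warning fraction and the function name are hypothetical. The sample limit of 65530 is a common Linux default for vm.max_map_count.

```python
def map_count_usage(metrics: dict, warn_fraction: float = 0.9):
    """Return (fraction of the mapping limit in use, whether it is near the limit)."""
    used = metrics["VMNumMaps"]
    limit = metrics["VMMaxMapCount"]
    fraction = used / limit
    return fraction, fraction >= warn_fraction


fraction, near_limit = map_count_usage({"VMNumMaps": 60_000.0, "VMMaxMapCount": 65_530.0})
print(f"{fraction:.1%}", near_limit)  # 91.6% True
```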

ZooKeeperClientLastZXIDSeen

The last ZXID seen by the current ZooKeeper client session. This value increases monotonically as the client observes transactions from ZooKeeper.

See Also

  • Monitoring — Base concepts of ClickHouse monitoring.
  • system.metrics — Contains instantly calculated metrics.
  • system.events — Contains a number of events that have occurred.
  • system.metric_log — Contains a history of metrics values from tables system.metrics and system.events.