Manage Query Workloads with Query Monitoring Rules in Amazon Redshift | AWS Big Data Blog
Data warehousing workloads are known for high variability due to seasonality, potentially expensive exploratory queries, and the varying skill levels of SQL developers.
To obtain high performance in the face of highly variable workloads, Amazon Redshift workload management (WLM) enables you to flexibly manage priorities and resource usage. With WLM, short, fast-running queries don’t get stuck in queues behind long-running queries. In spite of this, a query can sometimes corner a disproportionate share of resources, penalizing other queries in the system. Such queries are commonly known as rogue or runaway queries.
While WLM provides a method to restrict memory use and moving queries to other queues using a timeout, many times granular control is desirable. You can now use query monitoring rules to create resource usage rules for queries, monitor a query’s resource use, and then perform actions if a query violates a rule.
Workload management concurrency and query monitoring rules
In an Amazon Redshift environment, there are a maximum of 500 simultaneous connections to a single cluster. Throughput is usually expressed as queries per hour to maximize performance, while row databases like MySQL use concurrent connections to scale. In Amazon Redshift, workload management (WLM) maximizes throughput irrespective of concurrency. There are two main parts to WLM: queues and concurrency. Queues allow you to allocate memory at a user group or a query group level. Concurrency or memory slots is how you further subdivide and allocate memory to a query.
For example, assume that you have one queue (100% memory allocation) with a concurrency of 10. This means that each query gets a maximum of 10% memory. If the majority of your queries need 20% memory, then these queries are swapping to disk, causing a lower throughput. However, if you lower the concurrency to 5, each query is assigned 20% memory and the net result is higher throughput and overall faster response time to SQL clients. When switching from a row database to column-oriented, it is a common pitfall to assume that higher concurrency leads to better performance.
Now that you understand concurrency, here are more details about query monitoring rules. You define a rule based on resource usage and a corresponding action to take if a query violates that rule. Twelve different resource usage metrics are available, such as a query’s use of CPU, query execution time, rows scanned, rows returned, nested loop join, and so on.
Each rule includes up to three conditions, or predicates, and one action. A predicate consists of a metric, a comparison condition (=, <, or > ), and a value. If all of the predicates for any rule are met, that rule’s action is triggered. Possible rule actions are log, hop, and abort.
This allows you to catch a rogue or runaway query long before it causes severe problems. The rule triggers an action to free up the queue, and in turn improve throughput and responsiveness.
For example, for a queue that’s dedicated to short-running queries, you might create a rule aborting queries that run for more than 60 seconds. To track poorly designed queries, you might have another rule logging queries that contain nested loops. There are predefined rule templates in the Amazon Redshift console to get you started.