The Distributed SQL Blog

Thoughts on distributed databases, open source, and cloud native

A SQL Stopwatch Utility for YugabyteDB or PostgreSQL as an Alternative for “\timing on”

Technical Product Manager

This blog post shows you a few ways to use SQL to implement a timer for any multi-statement operation for use with ysqlsh with YugabyteDB (or psql with PostgreSQL). These tools are the overwhelmingly common choice for all sorts of ad hoc tasks as well as for critical tasks like running installation scripts. The native “\timing on” feature times only individual SQL statements. Moreover, its output doesn’t go to the spool file that you define with “\o”. The tools provide no support for end-to-end timing, and you can’t easily get hold of the per-statement times from “\timing on” in a way that lets you add them up.

Any end-to-end timing scheme needs to read the wall clock at the start and the finish of the to-be-timed operations. Of course, the start-time needs to be recorded so that it can be used to compute the elapsed time. This blog post shows you how to read the wall clock with SQL and demonstrates a few ways to record the start time so that you can use a simple SQL statement to return the elapsed time as the difference between the start and finish clock readings.

The SQL approach also has the huge advantage that it lets you record the measured time in a spool file along with other query results.

The problem

Database developers, advocates, and documentation writers produce all sorts of SQL scripts. A script that installs (or later patches) the entire set of artifacts that jointly implement the database component of an application lies at the most critical end of the spectrum. But even the typically much smaller scripted test case that sets up the appropriate preconditions in the database and then provokes what seems to be a bug has the same character. Many such scripts (and almost invariably those that deal with the database component of a critical application) will be broken down into subscripts, sub-subscripts, and so on. And often the overall scripting will re-connect several times as different database users. You can see an example in the YugabyteDB documentation in the section that describes the date-time data types. It’s linked from the Download and install the date-time utilities code page.

It’s always interesting to know the end-to-end time that scripted operations take. Moreover, it’s useful to be able to record this time in the spool file(s) that are produced.

The standard practice, with YugabyteDB, is to run install scripts and the like using ysqlsh. This is inherited directly from PostgreSQL’s psql and faithfully supports the same set of metacommands. The “\timing on” metacommand nominally meets the present purpose. However, it has drawbacks that render it simply unfit for this purpose:

  • When you issue the “\timing on” metacommand, you set a mode that lasts until you eventually exit the mode with “\timing off”. And while the mode is on, its effect is to report the time for each individual SQL statement that ysqlsh issues. The installation script that I referred to above that supports the date-time utilities code executes about 220 SQL statements. It would be crazy to have to add up that many individual short times to get the overall end-to-end time.
  • On Unix-like operating systems, the results that SQL queries report go to stdout. In contrast, the reported timings after setting “\timing on” go to stderr and not to stdout. The “\o” metacommand spools only what goes to stdout, and so the timing information is not saved in the spool file.

The challenge and some solutions

The solution depends on using a procedure to start a stopwatch and then, later, using a function to read it. The stopwatch is started simply by using the clock_timestamp() built-in function together with the extract(epoch from timestamptz_value) operator to read the number of seconds, s0, since the start of the epoch (00:00 on 1-Jan-1970 UTC) like this:

select extract(epoch from clock_timestamp()) as s0;

The finish time, s1, is read in the same way. The procedure records s0 (as will be explained presently). And the function retrieves s0, gets s1, and returns the difference. The return data type of clock_timestamp() is the timezone-sensitive timestamptz. You might fear, therefore, that you’d have to think carefully about how this might affect your timings. However, the extracted number of seconds from the start of the epoch, as a double precision value, is not timezone-sensitive—just as you’d hope. This is explained and demonstrated in the YSQL documentation section The plain timestamp and timestamptz data types.
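You can convince yourself of this timezone-insensitivity with a quick sanity check (a sketch to run in ysqlsh or psql; the chosen timezones are arbitrary):

```sql
-- The extracted epoch value doesn't jump by the timezone offset when
-- the session's timezone changes; any difference between the two
-- readings is just the brief gap between the two calls.
set timezone = 'UTC';
select extract(epoch from clock_timestamp())::bigint as s_utc;

set timezone = 'Asia/Tokyo';
select extract(epoch from clock_timestamp())::bigint as s_tokyo;

-- "s_tokyo" differs from "s_utc" by at most a second or so—not by the
-- nine-hour UTC offset of Asia/Tokyo.
```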

Though using SQL to get the start and finish times for a sequence of individual database calls is easy, finding a way to record the start time so that you can read it back at the finish proves to be a challenge. To save words from now on, I’ll call the place where you record the start time a memo—short for memorandum, literally “a note of a thing to be remembered”. So the challenge is to find a suitable memo implementation to record s0—especially when the memo must survive across session boundaries.

You might consider using an ordinary table to implement the memo. But this has the possible drawback, only for short measured times, that the time taken to record and read the memo could be a significant fraction of the duration of the to-be-timed operations. Using a table also brings challenges like these: it has to exist already so that ad hoc timing tests can be done without fuss; and different concurrent users mustn’t interfere with each other’s concurrent use of the table.

At the other end of the spectrum, in a scenario where you want to time a sequence of SQL statements that are executed in the same session, you can encapsulate the statements in a table function declared with “returns table(report_line text)” and format “success” messages and query results as report_line values when you need to. With this encapsulation, you can use an ordinary PL/pgSQL variable as the memo for recording the start time. However, the installation script use case that motivates this post is not readily amenable to encapsulation in a function for these reasons:

  • The encapsulation would make the spellings of the execute statements for DDLs like create procedure awkward.
  • It would make the standard paradigm of decomposing an install script into several individual scripts awkward.

You’ll appreciate this when you read the section “Implement the start-time memo with an ordinary PL/pgSQL variable”.

Another way to implement the memo, again usable only when all the operations are executed in the same session, is to use the set SQL statement to record it as the value of a user-defined run-time parameter and then, later, to use the current_setting() built-in function to read it.
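Stripped of any encapsulation, the bare mechanics of this approach look like this (a sketch; stopwatch.start_time is the same two-part parameter name that the stopwatch code in this post uses, and the value must be a text literal):

```sql
-- Record a value as a user-defined run-time parameter...
set stopwatch.start_time = '42';

-- ...and read it back later in the same session.
select current_setting('stopwatch.start_time');
```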

Yet another way to implement the memo is to use a ysqlsh (or psql) metacommand to populate a client-side variable and later to access it using the :my_variable notation. Such a variable, and the value that it currently holds, survives across session boundaries.
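The bare mechanics of that approach look like this (a sketch; my_database, user_1, and user_2 are placeholder names):

```sql
\c my_database user_1
-- "\set" populates a variable that lives in the client, not the server...
\set start_memo 42

\c my_database user_2
-- ...so it survives the re-connect and can still be dereferenced here.
select :start_memo as still_there;
```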

The following sections show you how you can use these schemes to add end-to-end timing to your SQL scripts in different ways for different use cases.

Download the code-kit

This blog post is based on the dedicated subsection Case study—implementing a stopwatch with SQL in the YSQL documentation’s major section Date and time data types. All of the code that it explains and presents for copy-and-paste, and that is referenced here, is also bundled in a single downloadable .zip file. Follow the link that’s given at the end of the case study page. Of course, this is decomposed as a one-touch master-script that calls several subscripts! These scripts include timing tests.

The duration_as_text() formatting function

For large elapsed time values, you also need a utility function to render the number of seconds as a nicely formatted text value by using larger units appropriately, according to the size of the duration—just as the display after setting “\timing on” does. The code to do this isn’t rocket science. But it’s fairly voluminous. You also need to test it carefully. This blog post’s purpose would be obscured by presenting the implementation of the formatting function, and the code that tests it, for copy-and-paste here. The function duration_as_text(numeric) is available for copy-and-paste in the case-study in the YSQL documentation.

Install duration_as_text() now in the database where you’re trying this post’s code examples because later examples depend on it. You might like to test it too using duration_as_text_test_report(). Here’s the test output:

  s (in seconds)   duration_as_text(s)
  --------------   -------------------
        0.019999 : less than ~20 ms
 
        0.020000 : 20 ms
        4.999999 : 5000 ms
 
        5.000000 : 5.00 ss
        9.999999 : 10.00 ss
 
       10.000000 : 10.0 ss
       59.900000 : 59.9 ss
       59.999999 : 60.0 ss
 
       60.000000 : 01:00 mi:ss
     3599.000000 : 59:59 mi:ss
     3599.999999 : 60:00 mi:ss
 
     3600.000000 : 01:00 hh:mi
    86340.000000 : 23:59 hh:mi
    86399.999999 : 24:00 hh:mi
 
    86400.000000 : 1 days 00:00 hh:mi
   202608.000000 : 2 days 08:17 hh:mi

Implement the start-time memo with a user-defined run-time parameter

Note: Readers who are familiar with Oracle Database and who are relatively new to PostgreSQL or to YugabyteDB might wonder why the stopwatch’s “start” procedure and “read” function aren’t encapsulated in a dedicated package that implements the start-time memo as a package level variable in the package’s body. The answer is simple: PostgreSQL’s stored procedure language, PL/pgSQL, has no analogous construct to the package concept that Oracle Database’s PL/SQL brings. PL/pgSQL supports only free-standing procedures and functions whose state doesn’t survive beyond the single execution. PostgreSQL’s user-defined run-time parameter is a reasonable alternative to Oracle Database’s package state in many scenarios and not just the stopwatch use case that this blog post addresses.

Create a procedure to start the stopwatch by recording the current time in the memo. (The value of a run-time parameter must be text. And it must be set using a manifest text constant.)

drop procedure if exists start_stopwatch() cascade;
create procedure start_stopwatch()
  language plpgsql
as $body$
declare
  start_time constant text not null := extract(epoch from clock_timestamp())::text;
begin
  execute 'set stopwatch.start_time = '''||start_time||'''';
end;
$body$;

Create a function to read the current wall-clock time, read the start time from the user-defined run-time parameter, and return the elapsed time as a double precision value:

drop function if exists stopwatch_reading_as_dp() cascade;
create function stopwatch_reading_as_dp()
  returns double precision
  -- It's critical to use "volatile" because "clock_timestamp()" is volatile.
  -- "volatile" is the default. Spelled out here for self-doc.
  volatile
  language plpgsql
as $body$
declare
  start_time constant double precision not null :=
    current_setting('stopwatch.start_time');
  curr_time constant double precision not null :=
    extract(epoch from clock_timestamp());
  diff constant double precision not null :=
    curr_time - start_time;
begin
  return diff;
end;
$body$;

Create a simple wrapper for stopwatch_reading_as_dp() that returns the elapsed time since the stopwatch was started as a text value by applying the function duration_as_text() to the stopwatch_reading_as_dp() return value:

drop function if exists stopwatch_reading() cascade;
create function stopwatch_reading()
  returns text
  -- It's critical to use "volatile" because "stopwatch_reading_as_dp()" is volatile.
  volatile
  language plpgsql
as $body$
declare
  t constant text not null :=
    duration_as_text(stopwatch_reading_as_dp()::numeric);
begin
  return t;
end;
$body$;

Test it using pg_sleep() and compare the result with what you get by using “\timing on”:

call start_stopwatch();
\timing on
select pg_sleep(1.5)||'slept for 1.5 sec';
\timing off
select stopwatch_reading();

Here is a typical result. This is the output from “\timing on” to stderr after “select pg_sleep(1.5)”:

1502.809 ms (00:01.503)

And this is the result of selecting stopwatch_reading():

 1504 ms

The presentation of the “\timing on” output with microseconds precision gives a false sense of accuracy. You can see this from the variability if you repeat the test a few times.

Implement the start-time memo with a ysqlsh variable

First, create a new double precision overload for the function stopwatch_reading() to return the elapsed time like this:

drop function if exists stopwatch_reading(double precision) cascade;
create function stopwatch_reading(start_time in double precision)
  returns text
  volatile
  language plpgsql
as $body$
declare
  curr_time constant double precision not null :=
    extract(epoch from clock_timestamp());
  diff constant double precision not null :=
    curr_time - start_time;
begin
  return duration_as_text(diff::numeric);
end;
$body$;

Now start the stopwatch, do the to-be-timed operations, and display the time that they took. The “\gset” metacommand comes to the rescue here:

-- Notice that the "select" statement isn't terminated by the usual semicolon.
-- Rather, the "\gset" metacommand acts both as the directive to store
-- the value of the select list item "s0" as the variable :stopwatch_s0
-- and as the terminator for the "select" statement.
select extract(epoch from clock_timestamp())::text as s0
\gset stopwatch_
do $body$
begin
  perform pg_sleep(93.5);
end;
$body$;
select stopwatch_reading(:stopwatch_s0::double precision);

Here is a typical result:

 01:34 mi:ss

By design, the elapsed time is rounded to the nearest second—in other words to a precision of about 1%. You might argue that what you need to time is more precisely reproducible. But with real world tests of SQL statements, especially when a distributed SQL database like YugabyteDB is used, such reproducibility is very rare.

To make the ad hoc use of this approach easy, define these two shortcuts (using client-side variables, but here for a different purpose):

\set start_stopwatch 'select extract(epoch from clock_timestamp())::text as s0 \\gset stopwatch_'
\set stopwatch_reading 'select stopwatch_reading(:stopwatch_s0);'

Notice that when you copy and paste this into psql or ysqlsh, you see that it’s just two lines (as long as your terminal window is wide enough). Notoriously, psql has no mechanism to continue a metacommand over two or several lines—and ysqlsh, of course, inherits this limitation. However, as long as you simply accept this and let long lines wrap (as they do here), you don’t lose any functionality.

You can define these shortcuts in the psqlrc file (in the postgres/etc directory under the directory where you’ve installed the YugabyteDB client code).

You can now do a cross-session end-to-end test like this:

\o my_spool.txt
\c my_database user_1
:start_stopwatch
-- do something
\c my_database user_2
-- do something else
:stopwatch_reading
\o

Of course, this depends on having first installed the stopwatch code centrally, for example with owner utils in schema utils_s, on having granted the usage and execute privileges appropriately to user_1 and user_2, and on having established utils_s on the search path for user_1 and user_2 like this:

set search_path = "$user", 'utils_s';
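For completeness, here’s a hedged sketch of the kind of one-time central setup that this relies on. It follows the naming used just above (owner utils, schema utils_s, users user_1 and user_2); adapt it to your own conventions:

```sql
-- Run as a suitably privileged user. The stopwatch code (start_stopwatch(),
-- stopwatch_reading(), stopwatch_reading_as_dp(), and duration_as_text())
-- is assumed to have been created in schema utils_s, owned by role utils.
create schema if not exists utils_s authorization utils;

grant usage on schema utils_s to user_1, user_2;

-- "all routines" covers both the functions and the procedure.
grant execute on all routines in schema utils_s to user_1, user_2;
```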

Implement the start-time memo with an ordinary PL/pgSQL variable

Here’s a simulation of an ordinary installation script where a master script invokes several child scripts.

-- In script cr-table-t.sql
drop table if exists t cascade;
create table t(k serial primary key, v text not null);

-- In script "cr-procedure-p.sql"
drop procedure if exists p(t.v%type) cascade;
create procedure p(v_val in t.v%type)
  language plpgsql
as $body$
begin
  insert into t(v) values(v_val);
end;
$body$;

-- In script "install-all.sql"
\ir cr-table-t.sql
\ir cr-procedure-p.sql
call p('cat');
call p('dog');
select v from t order by k;

And here is a rewrite using a table function encapsulation.

-- In script cr-table-t.sql
drop procedure if exists cr_table_t() cascade;
create procedure cr_table_t()
  language plpgsql
as $body$
begin
  execute 'drop table if exists t cascade';
  execute 'create table t(k serial primary key, v text not null)';
end;
$body$;

-- In script "cr-procedure-p.sql"
drop procedure if exists cr_procedure_p() cascade;
create procedure cr_procedure_p()
  language plpgsql
as $outer_body$
begin
  execute 'drop procedure if exists p(t.v%type) cascade';
  execute '
      create procedure p(v_val in t.v%type)
        language plpgsql
      as $body$
      begin
        insert into t(v) values(v_val);
      end;
      $body$
    ';
end;
$outer_body$;

-- In script "install-all.sql"
drop function if exists installation_report() cascade;
create function installation_report()
  returns table(z text)
  volatile
  language plpgsql
as $outer_body$
declare
  t_start   constant double precision not null :=
    extract(epoch from clock_timestamp());
  t_finish           double precision not null := 0;
begin
  call cr_table_t();
  z := '"table t" created.';
  return next;
  call cr_procedure_p();
  z := '"procedure p()" created.';
  return next;
  z := '';
  return next;
  -- The next statements (two "call" statements and one "select") compile OK,
  -- even though "t" doesn't yet exist.
  call p('cat');
  call p('dog');
  z := '  v  ';
  return next;
  z := '---';
  return next;
  for z in (select v from t order by k) loop
    return next;
  end loop;
  perform pg_sleep(60);
  t_finish := extract(epoch from clock_timestamp());
  z := '';
  return next;
  z := 'Total elapsed time: '||
       duration_as_text((t_finish - t_start)::numeric);
  return next;
end;
$outer_body$;

\t on
select z from installation_report();
\t off

This executes without error, even though table “t” doesn’t yet exist when installation_report() is compiled. This is a consequence of the fact that PL/pgSQL compilation does little more than syntax checking. Name resolution and further semantic checking are deferred to run-time.
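You can demonstrate this late-binding behavior in isolation like this (a sketch; f() and no_such_table are hypothetical names):

```sql
-- This "create function" succeeds even though "no_such_table" doesn't
-- exist: PL/pgSQL compilation does little more than check the syntax.
create function f()
  returns bigint
  language plpgsql
as $body$
begin
  return (select count(*) from no_such_table);
end;
$body$;

-- Only now, at run-time, does name resolution fail:
-- ERROR: relation "no_such_table" does not exist
select f();
```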

Here is a typical result:

 "table t" created.
 "procedure p()" created.
 
   v  
 ---
 cat
 dog
 
 Total elapsed time: 01:00 mi:ss

Because DDLs are used, you’ll see a lot of variability. The point is, rather, just to show that if you work hard enough you can indeed use an ordinary PL/pgSQL variable to implement the memo for recording the start time—provided that all the to-be-timed operations are to be done in the same session. This approach is, however, undoubtedly cumbersome.

Comparing the nominally least noisy approach with the nominally noisiest approach

Obviously, the most noise-free approach is to invoke clock_timestamp() in the body of the same stored procedure (or function or anonymous block) that implements the to-be-timed actions. This is because, with this approach, client-server round trips don’t contribute to the measured elapsed time and because the memo implementation, an ordinary PL/pgSQL variable, is the most lightweight that’s possible. At the other end of the spectrum, using a table (and not even a temporary table) to implement the memo, populating it using one dedicated client-server round trip before executing the to-be-timed operations, and then reading the memo in another dedicated client-server round trip after the to-be-timed operations complete, introduces the most noise. The fixed costs of the round trips to write and read the memo will matter least when the actual to-be-timed duration is relatively long. Here’s a trivial test that allows you to experiment:

drop table if exists start_time_memo cascade;
create table start_time_memo(s0 double precision primary key);

drop function if exists to_be_timed(double precision) cascade;
create function to_be_timed(sleep_seconds in double precision)
  returns text
  language plpgsql
  volatile
as $body$
declare
  s0 constant double precision not null :=
    extract(epoch from clock_timestamp());
begin
  perform pg_sleep(sleep_seconds);
  declare
    s constant double precision not null :=
      extract(epoch from clock_timestamp()) - s0;
  begin
    return duration_as_text(s::numeric);
  end;
end;
$body$;

-- Warm-up execution.
delete from start_time_memo;
insert into start_time_memo(s0) values(extract(epoch from clock_timestamp()));
select to_be_timed(5.0);
select duration_as_text((extract(epoch from clock_timestamp()) - s0)::numeric)
from start_time_memo;

-- Realistic test.
delete from start_time_memo;
insert into start_time_memo(s0) values(extract(epoch from clock_timestamp()));
select to_be_timed(120.0);
select duration_as_text((extract(epoch from clock_timestamp()) - s0)::numeric)
from start_time_memo;

The times that the warm-up execution reports aren’t interesting. Here are typical results from the realistic test:

  • Reported by the function to_be_timed() itself: 02:00 mi:ss
  • Reported by accessing the start_time_memo table in dedicated round-trips: 02:00 mi:ss

With total times this long, where a one second timing granularity is appropriate, the noise contribution brought by using a table to implement the memo is insignificant.

How to choose the best method for a particular use case

It would seem that the “\timing on” approach was designed for making ad hoc measurements without requiring any preparation. Overwhelmingly the best replacement for this, for the ad hoc end-to-end timing of several SQL statements, is to establish the procedure start_stopwatch() and the function stopwatch_reading() (by all means using shortened names if you prefer), together with the code that they depend on, in a central utils_s schema, possibly with a dedicated utils owner, as was explained in the section “Implement the start-time memo with a ysqlsh variable”.

With this regime in place, ad hoc end-to-end timing, whenever you need it, is as easy as using “\timing on”.

If you need to do end-to-end timing of operations which use two or more newly started sessions, then you have no choice but to use a memo scheme that survives over session boundaries (and probably, therefore, connecting as different users in turn). Using a table would be cumbersome because it could only be a permanent table and because, in a shared database, you’d need to design a regime to ensure that concurrent timing experiments could be conducted without interfering with each other. The design would need, somehow, to solve a tricky circular challenge: how to model the “handle” that each of several concurrent experiments would need to access each its own start time. This handle would need to be memoized—hence the circularity. The obvious solution is to record the handle in a client-side memo—but if you have to do this, then you might just as well use the client-side memo directly to record the start time as was shown in the section “Implement the start-time memo with a ysqlsh variable”. That section explained how you can encapsulate the use of client-side constructs like the “\gset” metacommand, and the colon prefix for accessing a client-side variable, by defining variables to act as shortcuts in the psqlrc file.

If you need to compare different implementation approaches for use within a PL/pgSQL procedure or function then, of course, your best approach is to use an ordinary PL/pgSQL variable to implement the memo. If you want to keep the to-be-timed code free of timing code, then you can simply use a dedicated PL/pgSQL unit that implements the timing code and that invokes the to-be-timed unit.
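Here’s a hedged sketch of such a dedicated timing wrapper. The procedure to_be_timed_unit() stands for whatever unit you want to measure; the wrapper keeps it entirely free of timing code:

```sql
drop function if exists timed_run() cascade;
create function timed_run()
  returns text
  -- "volatile" because "clock_timestamp()" is volatile.
  volatile
  language plpgsql
as $body$
declare
  t0 constant double precision not null :=
    extract(epoch from clock_timestamp());
begin
  -- The unit under test stays free of timing code.
  call to_be_timed_unit();
  return duration_as_text(
    (extract(epoch from clock_timestamp()) - t0)::numeric);
end;
$body$;
```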

Conclusion

I’ve shown you the core technique for all elapsed time measurements for database operations: get the start time and the finish time as seconds from the start of the epoch, by using extract(epoch from clock_timestamp()). And I’ve shown you various schemes for memoizing the start time so that you can choose the most appropriate for your present use case. I’ve also shown you how to get some re-usable code for reporting short or long times using sensible units and with sensible precision from the YugabyteDB documentation.

YugabyteDB’s YSQL subsystem uses PostgreSQL’s SQL processing C code as is. Therefore all the techniques and code that I’ve explained work the same in both environments—as this post’s title implies. If you’re a PostgreSQL developer and you haven’t yet tried YugabyteDB, then do try it now and use this post’s code in the two environments side by side. The simplest way to do this is to create a free YugabyteDB cluster by going to cloud.yugabyte.com.

I strongly recommend that you make a one-time effort and install the procedure start_stopwatch() and the function stopwatch_reading() centrally in any database of interest so that any user can use it immediately without preparation. I’ve done this myself; and hardly a day goes by when I don’t use it.
