The Distributed SQL Blog

Thoughts on distributed databases, open source and cloud native

Four Compelling Use Cases for PostgreSQL Table Functions

Developer Advocate

Welcome to part three of a three-part series of posts on PostgreSQL’s table functions. These functions can be easily leveraged in a distributed SQL database like YugabyteDB, which is PostgreSQL compatible.

In this final post, I’ll discuss four realistic use cases that call for user-defined table functions and I’ll explain how each case is met using complete, self-contained code examples.

If you missed the two previous posts, make sure to check out:

Use Case #1: Parameterized View

In my blog post “Using Stored Procedures in Distributed SQL Databases”, I described an experiment to measure the speeds of transactions with a variable number of insert statements per transaction when these are implemented respectively as a sequence of top-level database calls or as a single call to a stored procedure that has the same effect. I computed the procedure-to-top-level-call speed ratios, as a function of the number of inserts per transaction. I did this by recording the result for each test in a table and by then running code like I describe in this section.

This table is a stylized version of the table that I used:
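Something like this captures its shape (the column names method, test_no, and elapsed_ms are my illustrative choices, not necessarily the originals):

    -- Stylized sketch of the results table; column names are illustrative.
    create table results(
      method      text    not null,
      test_no     int     not null,
      elapsed_ms  numeric not null,
      constraint results_pk primary key(method, test_no));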

Use this to insert some data to illustrate the behavior of the code that I describe in this section:
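For example (these values are invented purely to illustrate the code; they are not measured results):

    -- Invented, illustrative values -- not measurements.
    insert into results(method, test_no, elapsed_ms) values
      ('Method A', 1, 120), ('Method A', 2, 240), ('Method A', 3, 360),
      ('Method B', 1, 100), ('Method B', 2, 200), ('Method B', 3, 300),
      ('Method C', 1,  63), ('Method C', 2, 126), ('Method C', 3, 190);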

This table function computes the speed ratios for two specified methods for all the test_no values:
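A minimal sketch, written against the stylized table above (the real function may differ in detail):

    create or replace function speed_ratios(method_1 in text, method_2 in text)
      returns table(test_no int, ratio numeric)
      language plpgsql
    as $body$
    begin
      return query
      select r1.test_no,
             round(r1.elapsed_ms/r2.elapsed_ms, 2)
      from results r1
      inner join results r2 on r1.test_no = r2.test_no
      where r1.method = method_1
      and   r2.method = method_2
      order by r1.test_no;
    end;
    $body$;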

Test it like this:
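For example, using the illustrative method names from above:

    select test_no, ratio
    from speed_ratios('Method A', 'Method B')
    order by test_no;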

It produces this output:

Of course, you could generate a report like this with a client-side program. But doing so would lose a major benefit. The table function approach lets you run arbitrary queries to answer arbitrary questions, like this:
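For example, this (invented) question asks for the tests where "Method B" beats "Method A" by at least 1.15x:

    select test_no, ratio
    from speed_ratios('Method A', 'Method B')
    where ratio >= 1.15
    order by test_no;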

Use Case #2: Pretty Printed Ad Hoc Reports for Administrators

In “Use Case #1”, there are three distinct values for results.method in the sample data that I use. That means that there are three distinct comparison tables to be looked at: “Method A” vs “Method B”, “Method B” vs “Method C”, and “Method A” vs “Method C”. Rather than producing these by executing a .sql script with three explicit “select… from speed_ratios(…)” statements, interspersed with appropriate \echo meta-commands, it would be nicer to execute a single stored procedure. However, the only way to get output from an ordinary PL/pgSQL procedure is to use raise info (or raise notice). When you run such a program from a script, every output line looks like this:
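Something like this (the program name, file name, and line number shown here are illustrative):

    ysqlsh:speed_ratios_report.sql:42: INFO:  Speed ratios: Method A vs Method B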

Moreover, the line numbers will, in general, all be different, so you can't simply remove this awful clutter with a single "search and replace all" in an ordinary text editor. This is especially frustrating for developers and administrators who, on other databases, have assembled a kit-bag of scripts and programs to generate useful, and usably readable, reports that, just like the present use case, assemble the output from several separate queries. Here's where a table function comes to the rescue. This is the basic idea:
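Instead of a procedure that raises info messages, write a table function that returns each report line as a row of text. Here is a minimal sketch of the skeleton (the names and the formatting details are my own choices):

    create or replace function pretty_report_speed_ratios()
      returns table(t text)
      language plpgsql
    as $body$
    begin
      -- Each assignment to "t" followed by "return next" emits one report line.
      t := 'Speed ratio report';  return next;
      t := '==================';  return next;
      -- ...then one block of lines per pair of methods (see below)...
    end;
    $body$;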

Run the report like this:
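For example, in ysqlsh:

    \t on
    select t from pretty_report_speed_ratios();
    \t off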

The loop over the method combinations starts like this:
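Here is my sketch of that fragment, assuming a record variable c and the stylized results table (the complete function is in the appendix):

    -- Outer loop: every distinct ordered pair of methods.
    for c in (
      select r1.method as method_1, r2.method as method_2
      from (select distinct method from results) as r1
      cross join (select distinct method from results) as r2
      where r1.method < r2.method
      order by 1, 2)
    loop
      t := '';                                                        return next;
      t := format('Speed ratios: %s vs %s', c.method_1, c.method_2);  return next;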

Here’s the final missing piece:
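Presumably this is the inner loop that formats one line per test_no for the current pair of methods. Here is my sketch (r is a record variable; again, the complete function is in the appendix):

      -- Inner loop: one formatted line per test_no for the current pair.
      for r in (
        select x.test_no, x.ratio
        from speed_ratios(c.method_1, c.method_2) as x)
      loop
        t := format('  test_no %s: %s', r.test_no, to_char(r.ratio, '90.99'));
        return next;
      end loop;
    end loop;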

And here’s the output produced by pretty_report_speed_ratios():

We can see at a glance that “Method B” is about 1.2x faster than “Method A”, that “Method C” is about 1.6x faster than “Method B”, and that, of course, “Method C” is therefore about 1.9x faster than “Method A”.

The full source text of the pretty_report_speed_ratios() table function is presented in the appendix so that you can simply copy and paste it into ysqlsh and then run it.

Use Case #3: Dynamic IN List

Assume that we have table t with this content:

The code to create this is shown in the “Built-in SQL table functions” section under “generate_series()”. Developers and administrators usually type an explicit in list in an ad hoc query like this:
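For example (I'm assuming here that t has an int column v and a primary key column k; see the previous post for t's actual definition):

    -- Columns k and v are assumed, not necessarily t's real column names.
    select k, v
    from t
    where v in (1, 2, 3, 5, 8)
    order by k;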

But what if this is to be issued from a PL/pgSQL procedure and the in list members aren’t known until run-time? The naïve programmer tends to think that this calls for dynamic SQL where the in list is constructed programmatically. But there’s a much better, and simpler to write, way:
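Pass the values as an array and turn it into a row set with the built-in unnest() table function right inside the subquery. A sketch:

    -- "array[1, 2, 3, 5, 8]" stands in for an array variable or parameter
    -- whose value is known only at run-time.
    select k, v
    from t
    where v in (select a.val from unnest(array[1, 2, 3, 5, 8]) as a(val))
    order by k;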

The functionality of the unnest() built-in SQL table function can be implemented as a user-defined table function thus:
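Here is a minimal sketch for int[] (the built-in is polymorphic; this hand-written version handles just the one datatype):

    create or replace function my_unnest(arr in int[])
      returns table(val int)
      language plpgsql
    as $body$
    begin
      -- "foreach ... in array" is the ordinary way to iterate over an array's elements.
      foreach val in array arr loop
        return next;
      end loop;
    end;
    $body$;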

Use it like this:
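Like this (again with an illustrative array literal standing in for a run-time value):

    select k, v
    from t
    where v in (select val from my_unnest(array[1, 2, 3, 5, 8]))
    order by k;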

Of course, it’s pointless just to re-implement the built-in unnest(). Moreover, the built-in can accept an actual argument of any array datatype. I illustrated a few possibilities in the section that described it. You’d have to implement each overload that you needed explicitly if you programmed it in PL/pgSQL.

My purpose in showing you my_unnest() is this:

  • firstly, to show the ordinary way to iterate over the elements of an array in PL/pgSQL;
  • and secondly, to show the overall structure of a table function that computes values that you can then use as an in list in a subquery.

Who knows what arbitrary computation you might want to use to produce the in list values—and, of course, the built-in unnest() simply has its fixed functionality.

Use Case #4: Compact Syntax for Bulk Insert

Consider the procedure created thus:
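Here is a sketch; I'm assuming a target table called t2, created next, with a uuid primary key column k and an int column v:

    create or replace procedure insert_rows(vs in int[])
      language plpgsql
    as $body$
    declare
      val int;
    begin
      -- One explicit single-row insert per array element.
      foreach val in array vs loop
        insert into t2(v) values(val);
      end loop;
    end;
    $body$;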

If we have a table created thus:
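Something like this (the table and column names are my illustrative choices):

    create table t2(
      k uuid default gen_random_uuid() primary key,
      v int not null);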

then we can populate it thus:
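For example, with a handful of illustrative values:

    call insert_rows(array[10, 20, 30, 40, 50]);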

I used the same “default gen_random_uuid()” approach to populate the table’s primary key column in my code example in my “Using Stored Procedures in Distributed SQL Databases” post. I pointed out there that you must make gen_random_uuid() available by installing the PostgreSQL extension like this:
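That is, the pgcrypto extension, which provides gen_random_uuid():

    create extension if not exists pgcrypto;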

as explained in the YugabyteDB documentation here.

Notice that the procedure simply loops over the elements of the input array and executes an explicit single-row insert for each value. This looks inefficient; but PL/pgSQL has no bulk-binding mechanism for insert, update, or delete statements.

Here’s an alternative implementation of insert_rows() that uses unnest():
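A sketch of that alternative, replacing the row-by-row loop with a single set-based insert:

    create or replace procedure insert_rows(vs in int[])
      language plpgsql
    as $body$
    begin
      insert into t2(v)
      select u.v from unnest(vs) as u(v);
    end;
    $body$;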

Of course, you invoke this second implementation of insert_rows() in exactly the same way as the first one that I showed. It’s certainly shorter than the implementation that uses this:
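That is, the explicit row-by-row loop from my sketch of the first version:

    foreach val in array vs loop
      insert into t2(v) values(val);
    end loop;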

It’s therefore easier to understand and review. However, my tests showed that the performance of the two implementations, and of a third that uses my_unnest(), was the same within the confidence limits of my measurements.

Conclusion

When developers have a rich set of tools, they’re much better placed to implement an optimal approach for a new use case than when they’re limited by an impoverished toolkit. This three-part series of blog posts has presented several use cases where table functions showed their value. User-defined table functions are part of the larger picture of stored procedures, so to use them you need a distributed SQL database that supports stored procedures! YugabyteDB inherits PostgreSQL’s support, and in this post series I’ve described examples implemented both in SQL and in PL/pgSQL. The examples have used a very wide range of PostgreSQL’s SQL and stored procedure functionality. The fact that all the code that I’ve shown works identically in YugabyteDB and vanilla PostgreSQL is a testament to the success of our strategy to use the upper half of the PostgreSQL query layer simply “as is” on top of our own unique storage layer. This architecture makes YugabyteDB unique among distributed SQL databases.

What’s Next?

  • Compare YugabyteDB in depth to databases like CockroachDB, Google Cloud Spanner and MongoDB.
  • Get started with YugabyteDB on macOS, Linux, Docker, and Kubernetes.
  • Contact us to learn more about licensing, pricing or to schedule a technical overview.

Appendix

Here is the complete source code for the use case “Pretty printed ad hoc reports for administrators”:
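What follows is my reconstruction, assembled from the sketches in the body of the post (the original may differ in its formatting details):

    create or replace function pretty_report_speed_ratios()
      returns table(t text)
      language plpgsql
    as $body$
    declare
      c record;
      r record;
    begin
      t := 'Speed ratio report';  return next;
      t := '==================';  return next;

      -- Outer loop: every distinct ordered pair of methods.
      for c in (
        select r1.method as method_1, r2.method as method_2
        from (select distinct method from results) as r1
        cross join (select distinct method from results) as r2
        where r1.method < r2.method
        order by 1, 2)
      loop
        t := '';                                                        return next;
        t := format('Speed ratios: %s vs %s', c.method_1, c.method_2);  return next;

        -- Inner loop: one formatted line per test_no for the current pair.
        for r in (
          select x.test_no, x.ratio
          from speed_ratios(c.method_1, c.method_2) as x)
        loop
          t := format('  test_no %s: %s', r.test_no, to_char(r.ratio, '90.99'));
          return next;
        end loop;
      end loop;
    end;
    $body$;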

Run it like this:
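In ysqlsh, for example:

    \t on
    select t from pretty_report_speed_ratios();
    \t off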

The \t on meta-command turns on “tuples only” output, suppressing the clutter of the column header (the name of the column, t), its underline, and the row count footer.
