Distributed SQL Tips and Tricks – August 3, 2020

August 3, 2020

Welcome to this week’s tips and tricks blog where we recap some distributed SQL questions from around the Internet. We’ll also review upcoming events, new documentation, and blogs that have been published since the last post. Got questions? Make sure to ask them on our YugabyteDB Slack channel, Forum, GitHub, or Stackoverflow. Ok, let’s dive right in:

Is there a way to get “updated_at” timestamps from YCQL without additional columns?

On each INSERT/UPDATE/DELETE YugabyteDB also stores by default the timestamp when the operation occurred. This can be easily looked up by using the WriteTime function. WriteTime returns the timestamp in microseconds when the value was written:

ycqlsh:yb_demo> create table users(id bigint PRIMARY KEY, name TEXT);
ycqlsh:yb_demo> INSERT INTO users(id,name) VALUES (1, 'Billy Johnes');
ycqlsh:yb_demo> SELECT id,name,writetime(name) from users;

 id | name     	| writetime(name)
----+--------------+------------------
  1 | Billy Johnes | 1596097449565229

(1 rows)

How to check the per-row on-disk space usage in YugabyteDB?

Data is saved in --fs_data_dirs director(ies), which contains both the transaction logs (WAL) and SSTable files (though you can store WAL files separately using --fs_wal_dirs).

After inserting data, quite likely many of the rows are still in WAL and memtables and not yet flushed to SSTable format on disk. And the WAL files are not in compressed format. To get a better indication, we can either load a lot more data (such that the WAL portion is insignificant) or force a manual flush of the table, and then inspect the size of just the data directories.

After filling a sample table with 1M rows:

CREATE TABLE test(a bigint, b bigint, c bigint, d bigint, PRIMARY KEY((a), b);

We can see the on disk size of the directory where the SSTable files are stored is very small:

$ du -hs yb-data/tserver/data
60K 	yb-data/tserver/data

This confirms that most of the data is still in WAL and memtables.

After forcing a flush of data to disk using:

./bin/yb-admin --master_addresses 127.0.0.1 flush_table <keyspace> <table>

The data directory size is as follows:

$ du -hs yb-data/tserver/data
48M 	yb-data/tserver/data

So approximate on-disk size per row size for the given schema is about 48 bytes. This includes the overheads for the metadata (indexes, bloom filters, internal timestamps, etc.).

How to check if a jsonb field contains a property name in YCQL?

Assuming we have a table with rows below:

CREATE TABLE testjsonb (id text PRIMARY KEY,fields jsonb);
INSERT INTO testjsonb(id,fields)values('3','{"property":"value3"}');
INSERT INTO testjsonb(id,fields)values('1','{"property3":"value3"}');

We can use IF NOT fields->>'property' = null like below:

ycqlsh:yb_demo> select * from testjsonb IF NOT fields->>'property' = null;

 id | fields
----+-----------------------
  3 | {"property":"value3"}

(1 rows)

How to enable query logging in YSQL?

Query logging can be enabled using the --ysql_log_statement gflag in yb-tserver.

Valid values are: none (off), ddl (only data definition queries, such as create/alter/drop), mod (all modifying/write statements, includes DDLs plus insert/update/delete/trunctate, etc), and all (all statements).

The query logs will show in postgres*.log files in the logs directory.

As an example, we’ll start a cluster with yb-ctl and write test queries:

$ ./bin/yb-ctl start --tserver_flags="ysql_log_statement=all"
$ ./bin/ysqlsh
yugabyte=# create extension if not exists pg_stat_statements;
yugabyte=# SELECT * FROM pg_stat_statements;

And then we can check in the logs directory:

~/yugabyte-data/node-1/disk-1/yb-data/tserver/logs$ ls
postgresql-2020-07-30_163500.log
yb-tserver.guru-predator.guru.log.INFO.20200730-163500.31158
yb-tserver.guru-predator.guru.log.WARNING.20200730-163500.31158
yb-tserver.INFO
yb-tserver.WARNING

And check the recorded queries: (log trimmed below)

~/yugabyte-data/node-1/disk-1/yb-data/tserver/logs$ cat postgresql-2020-07-30_163500.log
...
I0730 16:35:46.708248 31619 async_initializer.cc:80] Starting to init ybclient
I0730 16:35:46.708577 31619 client-internal.cc:1977] New master addresses: [127.0.0.1:7100]
I0730 16:35:46.709975 31619 async_initializer.cc:84] Successfully built ybclient
2020-07-30 16:35:56.470 UTC [31612] LOG:  statement: SELECT n.nspname as "Schema",
      c.relname as "Name",
      CASE c.relkind WHEN 'r' THEN 'table' WHEN 'v' THEN 'view' WHEN 'm' THEN 'materialized view' WHEN 'i' THEN 'index' WHEN 'S' THEN 'sequence' WHEN 's' THEN 'special' WHEN 'f' THEN 'foreign table' WHEN 'p' THEN 'table' WHEN 'I' THEN 'index' END as "Type",
      pg_catalog.pg_get_userbyid(c.relowner) as "Owner"
    FROM pg_catalog.pg_class c
     	LEFT JOIN pg_catalog.pg_namespace n ON n.oid = c.relnamespace
    WHERE c.relkind IN ('r','p','v','m','S','f','')
      	AND n.nspname <> 'pg_catalog'
      	AND n.nspname <> 'information_schema'
      	AND n.nspname !~ '^pg_toast'
      AND pg_catalog.pg_table_is_visible(c.oid)
    ORDER BY 1,2;
2020-07-30 16:36:19.483 UTC [31612] LOG:  statement: create extension if not exists pg_stat_statements;
I0730 16:36:19.483259 31612 thread_pool.cc:166] Starting thread pool { name: TransactionManager queue_limit: 150 max_workers: 50 }
2020-07-30 16:36:21.787 UTC [31612] LOG:  statement: SELECT * FROM pg_stat_statements;
...