Merged
8 changes: 8 additions & 0 deletions docs/configuration/pgdog.toml/mirroring.md
@@ -41,3 +41,11 @@ Default: **none** (optional)
The percentage of transactions to mirror, specified as a floating point number between 0.0 and 1.0. See [mirroring](../../features/mirroring.md) for more details. This overrides the [`mirror_exposure`](./general.md#mirror_exposure) setting.

Default: **none** (optional)

### `level`

The type of statements to mirror. Available options are:

- `ddl`
- `dml`
- `all` (default)
53 changes: 48 additions & 5 deletions docs/features/mirroring.md
@@ -14,7 +14,7 @@ Mirroring in PgDog is asynchronous and should have minimal impact on production
<img src="/images/mirroring.png" width="80%" height="auto" alt="Mirroring">
</center>

-## Configuration
+### Configuration

To use mirroring, first configure both the mirror and the production database in [`pgdog.toml`](../configuration/pgdog.toml/databases.md). Once both databases are running, add a `[[mirroring]]` section:

@@ -43,7 +43,7 @@ Each client connected to the main database has its own queue, so concurrency sca

You can have as many mirror databases as you like. Queries will be sent to each one of them, in parallel. More mirrors will require more CPU and network resources, so make sure to allocate enough compute to PgDog in production.

-## Mirror queue
+### Mirror queue

If the mirror database(s) can't keep up with production traffic, queries will back up in the queue. To make sure it doesn't overflow and cause out-of-memory errors, the size of the queue is limited:

@@ -61,15 +61,15 @@ If the mirror database(s) can't keep up with production traffic, queries will ba
[[mirroring]]
source_db = "source"
destination_db = "dest"
-queue_depth = 500
+queue_length = 500
```

If the queue gets full, all subsequent mirrored transactions will be dropped until there is space in the queue again.
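The drop-on-overflow behavior can be sketched with a bounded queue. This is a hypothetical Python illustration of the semantics, not PgDog's actual (Rust) internals:

```python
import queue

# Tiny capacity so the overflow is easy to see; a real queue_length is larger.
mirror_queue = queue.Queue(maxsize=3)

def enqueue_for_mirror(txn: str) -> bool:
    """Queue a transaction for asynchronous replay; drop it when full."""
    try:
        mirror_queue.put_nowait(txn)
        return True
    except queue.Full:
        # The queue overflowed: this transaction is silently dropped.
        return False

accepted = [enqueue_for_mirror(f"txn-{i}") for i in range(5)]
print(accepted)  # [True, True, True, False, False]
```

Dropped transactions are never retried; space frees up only as the mirror consumes queued transactions.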

!!! note "Replication"
Since mirror queues can drop queries, mirroring is not a replacement for Postgres replication and should be used for testing & benchmarking purposes only.

-## Exposure
+### Exposure

It's possible to limit how much traffic mirror databases receive. This is useful when warming up databases from a snapshot or if the mirror databases are smaller than production and can't handle as many transactions.

@@ -96,8 +96,51 @@ Acceptable values are between **0.0** (0%) and **1.0** (100%).

This is changeable at runtime, without restarting PgDog. When adding a mirror, it's a good idea to start slow, e.g., with only 1% exposure (`mirror_exposure = 0.01`), and gradually increase it over time.
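Mechanically, exposure is a per-transaction sampling probability. A minimal sketch, assuming uniform random sampling (PgDog's exact mechanism may differ):

```python
import random

def should_mirror(exposure: float, rng: random.Random) -> bool:
    """Mirror a transaction with probability `exposure` (0.0 to 1.0)."""
    return rng.random() < exposure

rng = random.Random(42)  # seeded so the run is reproducible
mirrored = sum(should_mirror(0.1, rng) for _ in range(10_000))
print(mirrored)  # close to 1,000 of 10,000 transactions
```

With `exposure = 1.0` every transaction is mirrored; with `0.0`, none are.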

-## Realism
+### Realism

We try to make mirrored traffic as realistic as possible. For each statement inside a transaction, we record the timing between that statement and the next one.

When replaying traffic against a mirror, we pause between statements for the same amount of time. This helps reproduce lock contention experienced by production databases, on the mirrors.
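A sketch of the record-and-replay pacing, with hypothetical gap values (PgDog records these timings automatically):

```python
import time

statements = ["BEGIN", "UPDATE accounts SET balance = balance - 1", "COMMIT"]
recorded_gaps = [0.0, 0.02, 0.01]  # seconds observed before each statement

def replay(statements, gaps, execute):
    """Replay statements, sleeping for each recorded gap first.

    The sleeps reproduce production pacing, so locks are held on the
    mirror for roughly as long as they were held on the source."""
    for stmt, gap in zip(statements, gaps):
        time.sleep(gap)
        execute(stmt)

replayed = []
start = time.monotonic()
replay(statements, recorded_gaps, replayed.append)
elapsed = time.monotonic() - start
print(replayed, round(elapsed, 3))
```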

### Filtering

It's possible to filter which kinds of statements mirrors receive using configuration, for example:

=== "pgdog.toml"
```toml
[[mirroring]]
source_db = "source"
destination_db = "dest"
level = "ddl"
```
=== "Helm chart"
```yaml
mirroring:
- sourceDb: source
destinationDb: dest
level: ddl
```

The `level` setting supports the following arguments:

| Argument | Description |
|-|-|
| `ddl` | Mirror only DDL statements like `CREATE`, `DROP`, etc. |
| `dml` | Mirror all statements except DDL, e.g. `INSERT`, `UPDATE`, etc. |
| `all` | Mirror all statements. This is the default. |

DDL-only mirroring is useful when maintaining long-running logical replicas, since the logical replication protocol doesn't support synchronizing schema changes.
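A minimal sketch of keyword-based classification (the keyword set here is an assumption for illustration; PgDog classifies statements with its SQL parser):

```python
# Assumed DDL keyword set, for illustration only.
DDL_KEYWORDS = {"CREATE", "ALTER", "DROP", "TRUNCATE"}

def statement_kind(sql: str) -> str:
    """Classify a statement as 'ddl' or 'dml' by its first keyword."""
    first_word = sql.lstrip().split(None, 1)[0].upper()
    return "ddl" if first_word in DDL_KEYWORDS else "dml"

def should_mirror(sql: str, level: str) -> bool:
    return level == "all" or statement_kind(sql) == level

print(should_mirror("CREATE INDEX idx ON users (id)", "ddl"))  # True
print(should_mirror("INSERT INTO users VALUES (1)", "ddl"))    # False
print(should_mirror("INSERT INTO users VALUES (1)", "all"))    # True
```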

#### Query parser

Filtering specific statements requires parsing queries. If your database setup doesn't have replicas or sharding, the query parser is typically disabled. Before using this feature, make sure to enable it in [`pgdog.toml`](../configuration/pgdog.toml/general.md#query_parser):

=== "pgdog.toml"
```toml
[general]
query_parser = "on"
```
=== "Helm chart"
```yaml
queryParser: "on"
```
4 changes: 2 additions & 2 deletions docs/features/sharding/manual-routing.md
@@ -30,7 +30,7 @@ The PostgreSQL query language supports adding inline comments to queries. They a
The following query will be sent to shard number zero:

```postgresql
-/* pgdog_shard: 0 */ CREATE INDEX CONCURRENTLY users_id_idx USING btree(id);
+/* pgdog_shard: 0 */ CREATE INDEX CONCURRENTLY users_id_idx ON users USING btree(id);
```
=== "Sharding key"
This query will be sent to whichever shard maps to the key `"us-east-1"`:
@@ -70,7 +70,7 @@ The `SET` command comes from the PostgreSQL query language and is used to change
```postgresql
BEGIN;
SET LOCAL pgdog.shard TO 0;
-CREATE INDEX users_id_idx USING btree(id);
+CREATE INDEX users_id_idx ON users USING btree(id);
COMMIT;
```
=== "Sharding key"
1 change: 1 addition & 0 deletions docs/features/sharding/schema_management/migrations.md
@@ -1,6 +1,7 @@
---
icon: material/arrow-u-left-bottom
---

# Schema migrations

PgDog expects that all shards have, roughly, the same tables. A notable exception to this rule is partitioned tables,
149 changes: 149 additions & 0 deletions docs/features/sharding/sequences.md
@@ -0,0 +1,149 @@
---
icon: material/numeric
---

# Sharded sequences

!!! note "Unique IDs"
Sharded sequences require a bit more configuration to get working. If you're looking
for an easy way to generate cross-shard unique 64-bit integers, consider [Unique IDs](unique-ids.md).

!!! note "Experimental feature"
This feature is new and experimental. Please report any issues you may run into and test it
before deploying to production.

Sharded sequences are a way to generate monotonically increasing, globally unique 64-bit integers, without large gaps between numbers
or using a timestamp-based approach that produces very large numbers.

They can be used for producing cross-shard unique primary keys in [sharded](query-routing.md#sharding-configuration) tables, directly inside the database.

## How it works

Sharded sequences combine two Postgres primitives:

1. A normal sequence (created with `CREATE SEQUENCE`)
2. A hashing function, `satisfies_hash_partition`, used for number selection

The two are combined inside a PL/pgSQL function that fetches numbers from the sequence until `satisfies_hash_partition` returns `true`, given the total number of shards in the cluster and the shard number the function is executing on:

```postgresql
-- Simplified sketch; the actual function installed by PgDog may differ.
CREATE FUNCTION sharded_nextval() RETURNS BIGINT AS $$
DECLARE
    val BIGINT;
BEGIN
    LOOP
        SELECT nextval('normal_seq'::regclass) INTO val;

        -- Keep drawing values until one hashes to this shard.
        IF satisfies_hash_partition(/* ... */, val) THEN
            RETURN val;
        END IF;
    END LOOP;
END;
$$ LANGUAGE plpgsql;
```

Since fetching values from a sequence is very quick, we are able to find the correct number without introducing significant latency to row creation. The Postgres hash function is also good at producing uniform outputs, so all shards will have similar, small gaps between generated numbers.
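The selection loop can be simulated outside Postgres. This sketch substitutes a simple multiplicative hash for `satisfies_hash_partition`, purely for illustration:

```python
import itertools

NUM_SHARDS = 2

def hash_remainder(val: int) -> int:
    # Stand-in for Postgres' partition hashing; the real
    # satisfies_hash_partition() uses Postgres' internal hash functions.
    return (val * 2654435761) % (2**32) % NUM_SHARDS

def ids_for_shard(shard: int, n: int):
    """Draw from this shard's sequence, keeping values that hash to it."""
    out = []
    for val in itertools.count(1):  # stand-in for nextval('normal_seq')
        if hash_remainder(val) == shard:
            out.append(val)
            if len(out) == n:
                return out

shard0, shard1 = ids_for_shard(0, 8), ids_for_shard(1, 8)
print(shard0)
print(shard1)
```

Because the hash deterministically assigns each value to exactly one shard, the ID sets produced on different shards never overlap, which is what makes the generated primary keys globally unique.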

### Configuration

Sharded sequences can only be used to generate primary keys for _sharded_ tables. [Omnisharded](omnishards.md) tables cannot use database sequences since they aren't guaranteed to produce the same number on all shards.

To make sure this constraint is enforced, PgDog can be configured to inject [unique IDs](unique-ids.md) only into `INSERT` queries that target omnisharded tables:

=== "pgdog.toml"
```toml
[rewrite]
primary_key = "rewrite_omni"
```
=== "Helm chart"
```yaml
rewrite:
primaryKey: rewrite_omni
```

This configuration setting is required to use sharded sequences, so make sure to set it before proceeding.
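As a rough sketch of what such a rewrite does (naive string manipulation for illustration only; PgDog operates on the parsed query, and the injected value comes from its unique ID generator):

```python
def inject_primary_key(sql: str, id_value: int) -> str:
    """Naively inject an `id` column into a simple single-row INSERT.
    Illustration only; real rewriting works on the parsed statement."""
    head, values = sql.split("VALUES", 1)
    head = head.replace("(", "(id, ", 1)
    values = values.replace("(", f"({id_value}, ", 1)
    return head + "VALUES" + values

rewritten = inject_primary_key(
    "INSERT INTO users (email) VALUES ('user@example.com')", 9001
)
print(rewritten)  # INSERT INTO users (id, email) VALUES (9001, 'user@example.com')
```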

### Installation

To install and use sharded sequences, configure [rewrites](#configuration) to target omnisharded tables only, add all the shards to [`pgdog.toml`](../../configuration/pgdog.toml/databases.md) `[[databases]]` section, and run the following [admin database](../../administration/index.md) command:

=== "Admin database"
```
SETUP SCHEMA;
```
=== "CLI"
Since PgDog is also a CLI application, you can run the same command as follows:

```
$ pgdog setup --database <name>
```

| Option | Description |
|-|-|
| `database` | Database `name` in `pgdog.toml`. |

This command will perform the following steps:

1. Install the [schema manager](schema_management/index.md) into all database shards along with the necessary PL/pgSQL functions
2. Find all tables that contain `BIGINT PRIMARY KEY` columns (incl. `BIGSERIAL`) and change their default values to call the sharded sequence function

Once done, all subsequent `INSERT` statements that don't specify the primary key will automatically use the sharded sequence for their respective tables, for example:

=== "Queries"
```postgresql
-- Using DEFAULT explicitly.
INSERT INTO users
(id, email, tenant_id)
VALUES
(DEFAULT, 'admin@example.com', 5) RETURNING id;

-- Omitting the primary key.
INSERT INTO users
(email, tenant_id)
VALUES
('user@example.com', 5) RETURNING id;
```
=== "Output"
```
id
----
1
(1 row)

id
----
5
(1 row)
```

The returned `id` will be globally unique and monotonically increasing.

### Migrations

The schema manager only installs the sharded sequence in tables present in the database at the time it runs. When adding new tables or primary keys, make sure to call the following PL/pgSQL function as well:

```postgresql
SELECT pgdog.install_sharded_sequence('schema_name', 'table_name', 'column_name');
```

| Argument | Description |
|-|-|
| Schema name | The name of the schema where the table is being created. This is commonly the `public` schema, but can be any other as well. |
| Table name | The name of the new or existing table with the primary key. |
| Column name | The name of the primary key column. |

#### Example

The entire migration can be executed inside the same transaction:

```postgresql
BEGIN;

CREATE TABLE public.users (
id BIGINT PRIMARY KEY,
email VARCHAR NOT NULL,
created_at TIMESTAMPTZ
);

SELECT pgdog.install_sharded_sequence('public', 'users', 'id');

COMMIT;
```
9 changes: 9 additions & 0 deletions docs/features/sharding/sharding-functions.md
@@ -51,18 +51,27 @@ All queries referencing the `user_id` column will be automatically sent to the m
Different integer types are treated the same by the query router. If you're using `BIGINT`, `INTEGER` or `SMALLINT` as your sharding key, you can specify `bigint` in the configuration:

```toml
[[sharded_tables]]
database = "prod"
column = "user_id"
data_type = "bigint"
```
=== "Text"
!!! note "Text types"
`VARCHAR`, `VARCHAR(n)`, and `TEXT` use the same encoding and are treated the same by the query router. For either one, you can specify `varchar` in the configuration:
```toml
[[sharded_tables]]
database = "prod"
column = "serial_number"
data_type = "varchar"
```
=== "UUID"
!!! note "UUID types"
Only UUIDv4 is currently supported for sharding in the query router.
```toml
[[sharded_tables]]
database = "prod"
column = "unique_id"
data_type = "uuid"
```

35 changes: 26 additions & 9 deletions docs/features/sharding/unique-ids.md
@@ -3,15 +3,9 @@ icon: material/identifier
---
# Unique IDs

-To generate unique identifiers, regular PostgreSQL databases use [sequences](https://www.postgresql.org/docs/current/sql-createsequence.html). For example, `BIGSERIAL` and `SERIAL` columns get their values by calling:
+To generate unique identifiers, regular PostgreSQL databases use [sequences](https://www.postgresql.org/docs/current/sql-createsequence.html). For example, `BIGSERIAL` and `SERIAL` columns get their values by calling `SELECT nextval('users_id_seq')`.
-
-```postgresql
-SELECT nextval('sequence_name');
-```
-
-This guarantees that these columns contain unique and monotonically increasing integers.
-
-If your database is sharded, however, using sequences will create identical IDs for different rows on different shards. To address this, PgDog can generate unique 64-bit signed identifiers internally, based on the system clock.
+This guarantees that these columns contain unique and monotonically increasing integers. If your database is sharded, however, using regular sequences will create identical IDs for different rows on different shards. To address this, PgDog can generate unique 64-bit signed identifiers internally, based on the system clock.

## How it works

@@ -95,7 +89,7 @@ If you're migrating data from an existing database, you can ensure that all IDs
unique_id_min = 5_000_000
```

-When set, all generated IDs are guaranteed to be larger than this value.
+When set, all generated IDs are guaranteed to be larger than this value. However, this feature is normally not needed, since generated IDs are already very large.

## Limitations

@@ -117,3 +111,26 @@ ID range is **69.73 years**, set to overflow on **August 3, 2095**. We expect da
Since the identifiers are time-based, to ensure uniqueness, PgDog limits how many IDs can be generated per unit of time. This limit is currently **4,096** IDs per millisecond.

When it's reached, PgDog will pause ID generation until the clock ticks to the next millisecond. This gives it an effective ID generation rate of _4,096,000 / second / node_, which should be sufficient for most deployments.
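The pause-until-the-next-millisecond behavior can be sketched as follows. The 12-bit sequence matches the documented 4,096 IDs per millisecond; the epoch value and node-ID width are assumptions for illustration:

```python
import time

EPOCH_MS = 1_700_000_000_000  # assumed custom epoch, not PgDog's actual one
NODE_BITS, SEQ_BITS = 10, 12  # 2**12 = 4096 IDs/ms; node width is assumed

class UniqueIdGenerator:
    def __init__(self, node: int):
        self.node = node
        self.last_ms = -1
        self.seq = 0

    def next_id(self) -> int:
        now = int(time.time() * 1000) - EPOCH_MS
        if now == self.last_ms:
            self.seq += 1
            if self.seq >= 1 << SEQ_BITS:
                # 4,096 IDs already issued this millisecond: pause until
                # the clock ticks before generating more.
                while now == self.last_ms:
                    now = int(time.time() * 1000) - EPOCH_MS
                self.seq = 0
        else:
            self.seq = 0
        self.last_ms = now
        # Timestamp in the high bits keeps IDs monotonically increasing.
        return (now << (NODE_BITS + SEQ_BITS)) | (self.node << SEQ_BITS) | self.seq

gen = UniqueIdGenerator(node=1)
ids = [gen.next_id() for _ in range(10_000)]
print(len(set(ids)))  # 10000: all unique
```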

## Compact IDs

The unique ID algorithm generates very large 64-bit integers because the timestamp portion is shifted 22 bits to the left, into the high bits of the number. Some applications pass these IDs directly to JavaScript frontends, which cannot represent them accurately: JavaScript numbers lose integer precision above 2^53 - 1 (`Number.MAX_SAFE_INTEGER`).

For this reason, we added a more "compact", 53-bit unique ID generator function. It can be used by enabling it in [`pgdog.toml`](../../configuration/pgdog.toml/general.md):

```toml
[general]
unique_id_function = "compact"
```

!!! warning "Switching to the compact generator"
If you're currently using the `"standard"` unique ID generator (the default), be careful when switching: the compact generator produces considerably smaller IDs, which breaks the monotonic guarantee and can cause unique index constraint errors.

### Limitations

Since the bit space in this function is smaller, and the timestamp granularity had to remain the same (ms), the space allocated to the node identifier
and the internal sequence has been reduced accordingly.

For this reason, the compact function only has a generation rate of _64,000 / second / node_ and supports up to 64 total nodes
in the same deployment.
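Working backwards from the stated limits (64 nodes, 64,000 IDs/second/node, millisecond timestamps), a plausible 53-bit layout is 41 timestamp bits, 6 node bits, and 6 sequence bits. This layout is an inference from the documented numbers, not PgDog's confirmed bit allocation:

```python
TS_BITS, NODE_BITS, SEQ_BITS = 41, 6, 6  # inferred widths, summing to 53

def compact_id(ts_ms: int, node: int, seq: int) -> int:
    assert node < (1 << NODE_BITS) and seq < (1 << SEQ_BITS)
    return (ts_ms << (NODE_BITS + SEQ_BITS)) | (node << SEQ_BITS) | seq

# The largest possible compact ID still fits JavaScript's safe integers.
max_id = compact_id((1 << TS_BITS) - 1, 63, 63)
print(max_id == 2**53 - 1)  # True

# 41 bits of milliseconds covers about 69.73 years, matching the
# documented ID range of the standard generator.
years = (1 << TS_BITS) / (1000 * 60 * 60 * 24 * 365)
print(round(years, 2))  # 69.73
```

With 6 sequence bits, 64 IDs fit per millisecond per node, which is exactly the documented rate of 64,000 per second per node.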
4 changes: 2 additions & 2 deletions docs/migrating-to-pgdog/from-pgbouncer.md
@@ -58,8 +58,8 @@ Both PgBouncer and PgDog can override the user's password used to connect to Pos
name = "prod"
host = "10.0.0.1"
port = 5432
-server_user = "postgres"
-server_password = "hunter2"
+user = "postgres"
+password = "hunter2"
pooler_mode = "transaction"
```
