admin管理员组

文章数量:1432566

Imagine a table like this

create table public.my_table (
    id          uuid primary key,
    category_id uuid not null,
    name        text not null,
    is_active   bool not null,
    created_at timestamp with time zone not null default now(),
    updated_at timestamp with time zone not null default now()
);

There is a rule: a category_id must have only one record with is_active=true

I can come up with 2 ways of adding records:

#1 using CTE insert to add a new active record + update to deactivate previous:

WITH new_record AS (
    INSERT INTO my_table (id, category_id, name, is_active) VALUES (?, ?, ?, true)
    RETURNING id, category_id
)
UPDATE my_table
SET is_active  = false,
    updated_at = now()
FROM new_record
WHERE my_table.category_id = new_record.category_id
  AND my_table.id <> new_record.id
  AND my_table.is_active = true

#2 making separate insert & update queries in a transaction:

BEGIN TRANSACTION;

INSERT INTO my_table (id, category_id, name, is_active) VALUES (?, ?, ?, true);

UPDATE my_table
   SET is_active  = false,
       updated_at = now()
   WHERE my_table.category_id = ?
         AND my_table.id <> ?
         AND my_table.is_active = true;

COMMIT;

I like the second way for its simplicity.

Does the first way have any benefits compared to the second?

Imagine a table like this

create table public.my_table (
    id          uuid primary key,
    category_id uuid not null,
    name        text not null,
    is_active   bool not null,
    created_at timestamp with time zone not null default now(),
    updated_at timestamp with time zone not null default now()
);

There is a rule: a category_id must have only one record with is_active=true

I can come up with 2 ways of adding records:

#1 using CTE insert to add a new active record + update to deactivate previous:

WITH new_record AS (
    INSERT INTO my_table (id, category_id, name, is_active) VALUES (?, ?, ?, true)
    RETURNING id, category_id
)
UPDATE my_table
SET is_active  = false,
    updated_at = now()
FROM new_record
WHERE my_table.category_id = new_record.category_id
  AND my_table.id <> new_record.id
  AND my_table.is_active = true

#2 making separate insert & update queries in a transaction:

BEGIN TRANSACTION;

INSERT INTO my_table (id, category_id, name, is_active) VALUES (?, ?, ?, true);

UPDATE my_table
   SET is_active  = false,
       updated_at = now()
   WHERE my_table.category_id = ?
         AND my_table.id <> ?
         AND my_table.is_active = true;

COMMIT;

I like the second way for its simplicity.

Does the first way have any benefits compared to the second?

Share Improve this question edited Nov 19, 2024 at 10:49 Mark Rotteveel 110k230 gold badges156 silver badges225 bronze badges asked Nov 18, 2024 at 18:20 Peter FencePeter Fence 636 bronze badges 5
  • 1 I suggest reading WITH data modifying statements. – Adrian Klaver Commented Nov 18, 2024 at 18:43
  • Are you aware that this structure will become slower every INSERT? The first time there are 0 records to update. The second time there will we 1 update, the third time 2 updates, etc. – Frank Heikens Commented Nov 18, 2024 at 19:25
  • @FrankHeikens I don't think I follow. The first time there's nothing to update. The second and all following inserts require just a single update, of the row that was previously is_active for that category_id. The search would be faster if they moved is_active to the category table (if they have one and my_table.category_id points at its pk) but the insert would result in the exact same amount of updates: just the one that was active before the new insert arrived. – Zegarek Commented Nov 18, 2024 at 19:54
  • "There is a rule" - but there is no constraint enforcing this? – Bergi Commented Nov 18, 2024 at 22:26
  • In your first solution ("with ... as ( insert ... ) update ..."), the update runs on the same snapshot as the insert (see the answer from @Zegarek), and does not see the row which was newly inserted. You don't have to exclude the new row from the update, the part "AND my_table.id <> new_record.id" is superfluous. – hobgoblin Commented Nov 25, 2024 at 17:05
Add a comment  | 

1 Answer 1

Reset to default 4

does the first way have any pros?

It does, it's one atomic operation.

By default you operate in read committed transaction isolation mode which means there's a chance that in your second example one of your workers will
demo at db<>fiddle

  1. Open the transaction, insert the new row and promote it to active.
  2. Meanwhile, some other worker will insert an even newer active row and commit it.
  3. The first worker will proceed to run the update to make sure all other rows of the category are deacativated. At that point it'll notice the newer row and deactivate it. As a result, once it commits, your active row might not be the latest one, because the slower worker overwrote the work from the faster, most recent messenger.
  4. If the other worker didn't commit before the first one issued the update, they will both deactivate some older row, and after that they'll both commit their active row. As a result, you'll have two active rows in a category. It can go as high as the number of concurrent clients.

Your first example does the whole insert-one-update-another operation in one go, on one snapshot that doesn't allow a window for anyone to slip in any changes. The doc immediately recommended by @Adrian Klaver puts it well:

when using data-modifying statements in WITH, the order in which the specified updates actually happen is unpredictable. All the statements are executed with the same snapshot (see Chapter 13), so they cannot “see” one another's effects on the target tables. This alleviates the effects of the unpredictability of the actual order of row updates

If my_table.category_id is really a foreign key pointing at the primary key of category.id, it's better to move is_active to category table. PK means it's unique, so it'll be smaller and faster to search for the category and update a pointer at the currently active my_table.id:

create table category(
  id uuid primary key
 ,active_my_table_id uuid references my_table(id)
);

To populate it, rather than taking all where is_active, grab a distinct on, in case there are inactive categories:

insert into category
select distinct on(category_id)category_id
     , id
from my_table
order by category_id
        ,is_active desc
        ,created_at desc
        ,updated_at desc;

alter table public.my_table 
  drop column is_active cascade--drops the exclusion constraint too
 ,add foreign key(category_id)references category(id);

Then, inserting a new active row for a category:

with new_record as (
    insert into my_table (id, category_id, name) 
    values ( get_row_uuid(row_)
            ,get_category_uuid(category_)
            ,row_)
    returning id, category_id
)
insert into category
select category_id, id
from new_record
on conflict(id)
do update set active_my_table_id = excluded.active_my_table_id

From technical standpoint, expressing your business rule as an actual SQL-level constraint is optional. But if you define it, it automatically rejects any violation attempts, throwing a clear error message and logging the event, preventing the table from entering invalid state with multiple active records for you to track down and fix. Plus, in the config above, you can use it to run insert..on conflict..do update.


That being said, even if you have the separate category table, adding, maintaining and manipulating a 128-bit uuid fk in there will obviously take more space than the 8-bit boolean in my_table. If you wish to keep it that way, you can follow @Bergi's suggestion below:
demo at db<>fiddle

alter table my_table 
  add constraint one_active_per_category
  exclude (category_id with =)
  where (is_active)
  deferrable initially deferred;

It keeps the is_active untouched and guards the table against inserting more than a single active row per category.

Making the constraint deferrable initially deferred means it'll only be checked at the end of transaction so you can deactivate then activate or the other way around, as long as in the end you commit a state that respects the rule. Otherwise, you need to deactivate first, then activate to avoid a window of time when the old and the new entry for the category have both is_active=true.
If you do it that way, a regular unique is simpler and works faster:

create unique index on my_table(category_id)
  where (is_active);

Unique and exclusion constraints are backed by indexes, so setting up just an index instead of a constraint will have the same effect. You could normally tie the index to the table so that it becomes a proper constraint, but that's not supported for partial indices.

If multiple concurrent workers try the operation, all but one will hit a lock and get rejected, the exact point and error message depending on whether they try that atomically or in transactions with multiple steps, and if it's deferred or not. The advantage of marking what's active in the separate category table and doing that in one step is they'll instead get queued up and wait - first come, first served.

本文标签: