Postgres anti join

8/16/2023

SELECT username FROM sign_ins_big GROUP BY username

Here is the query plan for the EXCEPT version: EXPLAIN ANALYZE SELECT username FROM sign_ups_big Was there a difference with 2,700 sign-ins against 100,000 sign-ups? Yes! SELECT 'User ' || generate_series(1,900) as username SELECT 'User ' || generate_series(1,100000) as username, '' as date I didn’t think there was any performace difference between the two solutions.īut then I tried the queries on larger tables: Initially, I picked the example data for this post to be readable, so it was easy to reason about missing rows. You can see the query plan by running EXPLAIN ANALYZE. However, if you have two different queries and you really want to know whether the query engine will treat them differently, in Postgres you can just ask. Most of the time this is a fine arrangement. SQL is considered a “declarative” programming language, which basically means you tell the query engine what you want and you don’t care how it gets it for you. When we go back and add WHERE si.username IS NULL to our LEFT OUTER JOIN, we are explictly asking for just the null rows. We have 5 users who have signed up, but only 3 who have signed in, meaning 3 rows from the sign-ins table will have username values and 2 rows from the sign-ins table, which show up here in the sign_in_username column, will have null values. You can see more clearly where the missing data is now: Running the previous query without the filter, just a regular LEFT OUTER JOIN: Our final result is the same, but it will be easier to visualize what is going on if we project the full set of values from both tables, before the special WHERE filter is applied. Here we’re saying WHERE si.username IS NULL which means we are asking for anything in the sign-ups table that does not have a matching row in the sign-ins table. SELECT username FROM sign_ins GROUP BY username We use the LEFT OUTER JOIN on our sign-ups table – which means we want to potentially include all rows from the sign-ups table, but then we do something tricky with the WHERE clause: The second way to get the missing data combines a LEFT OUTER JOIN and a WHERE clause into something called an “Anti Join.” In this case, the EXCEPT query is pretty readable, and with our sample data, pretty fast.īut there is an alternate way to solve this problem that will be powerful to know and often more performant. There is a way to write this query that almost looks exactly how you’d phrase it in plain English, “Give me everything from table 1, except what’s in table 2”: SELECT username FROM sign_ups We know our result should be that users 4 and 5 never signed in. SELECT username, date FROM sign_ins ORDER BY username, date SELECT 'User ' || generate_series(1,3) as username SELECT generate_series('', '', '1 day'::interval) as date Now we’ll do a series of sign-ins over the subsequent days, but only for users 1,2, and 3. I stole that string concatenation techique from this post which has some other cool data generation ideas in it too. SELECT 'User ' || generate_series(1,5) as username, '' as date

Postgres makes it easy to mock up some sample data so we can work through this use case. You might be interested in knowing which users have signed up, but never signed in.

Let’s say you have an application that tracks user sign-ups separately for user sign-ins. The other rows in the result are the same as the inner join.Whenever you have two sets of data and you need to find the entries that are in your first set, but not in your second set, use this pattern. In a left join, these rows are included in the result set with a NULL in the Quantity column. In our example database, there are two products - oranges and tomatoes - on the 'left' ( Prices table) that do not have a corresponding entry on the 'right' (Quantities table). Sometimes nulls will be produced in this process as some data is shared while other data is not.Ī left outer join will return all the data in Table 1 and all the shared data (so, the inner part of the Venn diagram example), but only corresponding data from Table 2, which is the right join. An outer join returns a set of records (or rows) that include what an inner join would return but also includes other rows for which no corresponding match is found in the other table.Įach of these outer joins refers to the part of the data that is being compared, combined, and returned.

0 Comments

Postgres anti join

Leave a Reply.

Author

Archives

Categories