Detecting duplicate records is a common maintenance task in Phoenix applications. In SQL, this is often solved with a grouped subquery joined back to the base table. You can express the same idea cleanly and safely using Ecto.
Assume a table called `tags` with the fields `id`, `name`, and a generic scoping column `scope_id`.
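For context, a minimal migration that creates such a table might look like the following sketch (the module name and column types are assumptions, not part of the original setup):

```elixir
defmodule MyApp.Repo.Migrations.CreateTags do
  use Ecto.Migration

  def change do
    create table(:tags) do
      # The grouping columns used throughout this post.
      add :name, :string
      # scope_id stands in for whatever scoping concept your app uses
      # (an organisation, a user, a project, ...).
      add :scope_id, :integer

      timestamps()
    end
  end
end
```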
## The problem in SQL terms
Conceptually, the approach is:
- Group records by `name` and `scope_id`
- Keep only groups with more than one row
- Join those groups back to the original table to list all duplicates
## Translating this to Ecto
The key is to model the grouped subquery first, then join it in a second query.
```elixir
import Ecto.Query

duplicates_query =
  from t in "tags",
    where: not is_nil(t.scope_id),
    group_by: [t.name, t.scope_id],
    having: count(t.id) > 1,
    select: %{
      name: t.name,
      scope_id: t.scope_id
    }
```
This subquery identifies (name, scope_id) combinations that occur more than once.
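Run on its own, the subquery returns one map per duplicated combination. For example (values purely illustrative):

```elixir
Repo.all(duplicates_query)
#=> [%{name: "elixir", scope_id: 1}]
```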
Next, join it back to the tags table to fetch the full records:
```elixir
query =
  from t in "tags",
    join: d in subquery(duplicates_query),
    on: t.name == d.name and t.scope_id == d.scope_id,
    order_by: [t.scope_id, t.name, t.id],
    select: %{
      id: t.id,
      name: t.name,
      scope_id: t.scope_id
    }
```
Running this query will return all rows that are part of a duplicate group, ordered in a predictable way.
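For instance, if two tags named "elixir" exist under the same scope, the result might look like this (ids and values are illustrative):

```elixir
Repo.all(query)
#=> [
#=>   %{id: 11, name: "elixir", scope_id: 1},
#=>   %{id: 42, name: "elixir", scope_id: 1}
#=> ]
```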
## Using schemas instead of table names
If you have a schema module (recommended), the same query becomes more expressive:
```elixir
duplicates_query =
  from t in Tag,
    where: not is_nil(t.scope_id),
    group_by: [t.name, t.scope_id],
    having: count(t.id) > 1,
    select: %{
      name: t.name,
      scope_id: t.scope_id
    }

query =
  from t in Tag,
    join: d in subquery(duplicates_query),
    on: t.name == d.name and t.scope_id == d.scope_id,
    order_by: [t.scope_id, t.name, t.id]
```
You can then execute it with:
```elixir
Repo.all(query)
```
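For reference, a minimal `Tag` schema compatible with these queries might look like this sketch (the module name and field types are assumptions; in a real app `scope_id` might instead be a `belongs_to` association):

```elixir
defmodule MyApp.Tag do
  use Ecto.Schema

  schema "tags" do
    field :name, :string
    field :scope_id, :integer

    timestamps()
  end
end
```

With an `alias MyApp.Tag` in place, the queries above work unchanged.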
This pattern is useful not only for reporting duplicates, but also as a foundation for cleanup tasks, background jobs, or admin tooling in Phoenix applications.
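As a sketch of such a cleanup (building on the schema-based query above, and assuming you want to keep the row with the lowest `id` in each duplicate group), you could collect the ids to remove and delete them in one pass:

```elixir
ids_to_delete =
  query
  |> Repo.all()
  |> Enum.group_by(&{&1.name, &1.scope_id})
  |> Enum.flat_map(fn {_key, tags} ->
    # Keep the tag with the lowest id in each group, drop the rest.
    tags
    |> Enum.sort_by(& &1.id)
    |> Enum.drop(1)
    |> Enum.map(& &1.id)
  end)

Repo.delete_all(from t in Tag, where: t.id in ^ids_to_delete)
```

Depending on data volume, you may want to run this inside a transaction or in batches.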
If this post was enjoyable or useful for you, please share it! If you have comments, questions, or feedback, you can reach me at my personal email. To get new posts, subscribe via the RSS feed.