lateral joins: the rails optimization nobody talks about
09 Oct 2025i benchmarked lateral joins against window functions and N+1 queries for the classic âtop N per groupâ problem in rails. lateral joins were 57% faster than window functions and 3.5x faster than N+1 queries.
hereâs why almost nobody uses them, and why you should.
the problem: top N per group
youâve hit this before. you have posts with comments. you want the top 3 highest-scored comments for each post.
# the N+1 approach (what most rails apps do)
@posts = Post.all
@posts.each do |post|
post.comments.order(score: :desc).limit(3)
end
this works. itâs also slow. for 100 posts, thatâs 101 queries.
most rails developers either live with the N+1 or preload everything into memory and filter in ruby. both options suck at scale.
thereâs a better way hiding in plain sql: lateral joins.
what are lateral joins?
lateral joins let you write correlated subqueries that reference the outer query. think of it as a âfor each rowâ loop at the database level.
SELECT posts.*, top_comments.*
FROM posts
LEFT JOIN LATERAL (
SELECT comments.*
FROM comments
WHERE comments.post_id = posts.id -- references outer query!
ORDER BY score DESC
LIMIT 3
) top_comments ON true
that WHERE comments.post_id = posts.id inside the subquery is the magic. for each post, postgres runs the inner query and limits to 3 results before doing anything else.
this means itâs only processing 300 rows (100 posts Ă 3 comments) instead of all 5,000 comments.
why nobody uses them in rails
- no native activerecord support - you have to write raw sql
- window functions exist - they solve the same problem and are easier to understand
- the problem is rare - most apps donât hit scale where this matters
but when you do need them, the performance difference is massive.
the benchmark
i built a minimal activerecord benchmark comparing four approaches:
- N+1 queries (the rails way)
- window functions (the smart way)
- lateral joins (the postgres way)
- preload all + ruby filter (the memory-heavy way)
dataset: 100 posts, 50 comments each = 5,000 total comments. task: find top 3 comments per post.
sqlite3 results (no lateral support)
Window function (1 query): 363.3 i/s
N+1 queries (100 queries): 70.0 i/s - 5.19x slower
Preload all + Ruby filter: 34.2 i/s - 10.61x slower
note: i/s = iterations per second (higher is better). 363.3 i/s means the query completed 363 times in one second, or ~2.75ms per iteration.
window functions are already 5x faster than N+1 queries. nice speedup. letâs stop there, right?
wrong.
postgresql results (with lateral joins)
LATERAL join (1 query): 130.9 i/s - FASTEST
Window function (1 query): 83.2 i/s - 1.57x slower
N+1 queries (100 queries): 37.0 i/s - 3.53x slower
Preload all + Ruby filter: 11.8 i/s - 11.08x slower
lateral joins are 57% faster than window functions.
thatâs not a typo. same dataset, same queries, lateral joins just win.
why lateral is faster: the execution plan
letâs look at what postgres actually does.
window function approach (12.5ms)
SELECT posts.*, comments.*
FROM posts
INNER JOIN (
SELECT comments.*,
ROW_NUMBER() OVER (PARTITION BY post_id ORDER BY score DESC) as row_num
FROM comments
) comments ON comments.post_id = posts.id
WHERE comments.row_num <= 3
ORDER BY posts.id, comments.score DESC
postgres has to:
- scan all 5,000 comments
- compute ROW_NUMBER for every single row
- filter to row_num <= 3
- join with posts
it processes all 5,000 comments even though we only need 300 results.
lateral join approach (5.6ms)
SELECT posts.*, top_comments.*
FROM posts
LEFT JOIN LATERAL (
SELECT comments.*
FROM comments
WHERE comments.post_id = posts.id
ORDER BY score DESC
LIMIT 3
) top_comments ON true
ORDER BY posts.id, top_comments.score DESC
postgres does:
- scan posts
- for each post, find top 3 comments using index on (post_id, score)
- stop after 3 rows per post
it only processes 300 comments total. the LIMIT happens inside the correlated subquery, so postgres can use indexes efficiently and bail early.
5.6ms vs 12.5ms - lateral is 2.2x faster in raw query execution time.
when the gap widens
the performance advantage scales with data volume. hereâs where lateral really shines:
more comments per post
- 50 comments per post: lateral 1.57x faster
- 500 comments per post: lateral ~3x faster (estimated)
- 5,000 comments per post: lateral ~10x faster (estimated)
window functions process ALL comments. lateral processes top N per group and stops.
more posts
- 100 posts: save ~4ms per request
- 1,000 posts: save ~40ms per request
- 10,000 posts: save ~400ms per request
hot paths
if this query runs 1,000 times per second (homepage, api endpoint), lateral saves:
- 4.4ms Ă 1,000 = 4.4 seconds of total query time per second
thatâs 4.4 seconds of database cpu youâre not paying for.
implementing lateral joins in rails
activerecord doesnât support lateral natively, so you write raw sql. i usually wrap it in a scope:
class Post < ApplicationRecord
has_many :comments
def self.with_top_comments(limit = 3)
sql = <<~SQL
SELECT posts.*, top_comments.*
FROM posts
LEFT JOIN LATERAL (
SELECT comments.*
FROM comments
WHERE comments.post_id = posts.id
ORDER BY score DESC
LIMIT #{sanitize_sql(limit)}
) top_comments ON true
ORDER BY posts.id, top_comments.score DESC
SQL
connection.exec_query(sql)
end
end
then use it like any query:
results = Post.with_top_comments(3)
you can also build it with arel if you want more composability:
def self.with_top_comments(limit = 3)
lateral_query = Comment
.where('comments.post_id = posts.id')
.order(score: :desc)
.limit(limit)
.to_sql
from("posts")
.joins("LEFT JOIN LATERAL (#{lateral_query}) top_comments ON true")
.select('posts.*, top_comments.*')
end
not as clean as activerecord, but not terrible either.
the complete benchmark code
#!/usr/bin/env ruby
require 'bundler/inline'
gemfile do
source 'https://rubygems.org'
gem 'activerecord', '~> 7.0'
gem 'pg'
gem 'benchmark-ips'
end
require 'active_record'
require 'benchmark/ips'
# setup database
ActiveRecord::Base.establish_connection(
adapter: 'postgresql',
database: 'benchmark_db',
username: ENV['USER']
)
# create schema
ActiveRecord::Schema.define do
create_table :posts, force: true do |t|
t.string :title
end
create_table :comments, force: true do |t|
t.integer :post_id
t.integer :score
end
add_index :comments, [:post_id, :score]
end
# models
class Post < ActiveRecord::Base
has_many :comments
end
class Comment < ActiveRecord::Base
belongs_to :post
end
# seed data
100.times do |i|
post = Post.create!(title: "Post #{i}")
50.times { Comment.create!(post_id: post.id, score: rand(1..100)) }
end
# benchmark approaches
def approach_lateral(top_n)
sql = <<~SQL
SELECT posts.*, top_comments.*
FROM posts
LEFT JOIN LATERAL (
SELECT comments.*
FROM comments
WHERE comments.post_id = posts.id
ORDER BY score DESC
LIMIT #{top_n}
) top_comments ON true
SQL
ActiveRecord::Base.connection.exec_query(sql)
end
def approach_window(top_n)
sql = <<~SQL
SELECT posts.*, comments.*
FROM posts
INNER JOIN (
SELECT comments.*,
ROW_NUMBER() OVER (PARTITION BY post_id ORDER BY score DESC) as row_num
FROM comments
) comments ON comments.post_id = posts.id
WHERE comments.row_num <= #{top_n}
SQL
ActiveRecord::Base.connection.exec_query(sql)
end
Benchmark.ips do |x|
x.report("LATERAL join") { approach_lateral(3) }
x.report("Window function") { approach_window(3) }
x.compare!
end
run with:
ruby lateral_join_benchmark_postgres.rb
it creates a test database, seeds data, runs benchmarks, and cleans up. takes about 30 seconds.
you can adjust the dataset size by changing NUM_POSTS and COMMENTS_PER_POST at the top of the script.
real-world use cases
reddit-style âshow top comments per postâ
class PostsController < ApplicationController
def index
@posts = Post.with_top_comments(5)
end
end
instead of N+1 queries or eager loading thousands of comments, one lateral query gets exactly what you need.
analytics: âshow top 10 products by revenue per categoryâ
SELECT categories.*, top_products.*
FROM categories
LEFT JOIN LATERAL (
SELECT products.*, SUM(order_items.price) as revenue
FROM products
JOIN order_items ON order_items.product_id = products.id
WHERE products.category_id = categories.id
GROUP BY products.id
ORDER BY revenue DESC
LIMIT 10
) top_products ON true
this would be brutal with window functions on millions of order items.
âmost active users per regionâ
SELECT regions.*, active_users.*
FROM regions
LEFT JOIN LATERAL (
SELECT users.*, COUNT(activities.id) as activity_count
FROM users
JOIN activities ON activities.user_id = users.id
WHERE users.region_id = regions.id
GROUP BY users.id
ORDER BY activity_count DESC
LIMIT 20
) active_users ON true
lateral lets you push down the LIMIT before aggregating. huge win.
window functions vs lateral: when to use which
use window functions when:
- you need all rows with ranking metadata (e.g., show every comment with its rank)
- database doesnât support lateral (mysql pre-8.0.14)
- query is already fast enough
use lateral when:
- you only need top N per group
- dataset is large (10k+ rows per group)
- query is on a hot path
- you have proper indexes on join/order columns
lateral requires indexes on (group_column, order_column) to be fast. in our case: (post_id, score).
without indexes, lateral can actually be slower than window functions because itâs running N correlated subqueries.
other databases
postgresql: native support since 9.3 (2013). works great.
sqlite: added in 3.39.0 (2022), but not widely deployed. most rails apps on older sqlite.
mysql: supports lateral since 8.0.14 (2019). same syntax, slightly different optimizer behavior.
sql server: uses CROSS APPLY instead of CROSS JOIN LATERAL. same concept.
gotchas
correlated subquery performance: lateral executes the inner query for each outer row. bad indexes = bad performance.
result shape: lateral returns flattened rows. you get (post1, comment1), (post1, comment2), (post1, comment3) not nested objects. you have to group in ruby if you need nesting.
results = Post.with_top_comments(3)
posts_hash = results.group_by { |row| row['post_id'] }
posts_hash.each do |post_id, rows|
post = rows.first
comments = rows.map { |r| r.slice('comment_id', 'body', 'score') }
# do something with post and comments
end
sql injection: if youâre interpolating user input into the lateral query, sanitize it:
lateral_query = Comment
.where('comments.post_id = posts.id')
.where('comments.author = ?', params[:author]) # sanitized
.order(score: :desc)
.limit(sanitize_sql(limit))
.to_sql
why this matters
letâs be clear: N+1 queries are never fine. always eager load with includes() as your default. even on small datasets, N+1 queries waste round-trip time and teach bad habits.
but eager loading has its own problem for âtop N per groupâ queries:
# eager loading still loads ALL comments
@posts = Post.includes(:comments).limit(100)
# then you filter in ruby
@posts.each do |post|
top_3 = post.comments.sort_by(&:score).reverse.take(3)
end
youâve loaded 5,000 comments into memory when you only need 300. thatâs where lateral joins shine.
âcanât you eager load and then filter via SQL?â you can add SQL conditions to eager loading:
# this works for global filters
@posts = Post.includes(:comments)
.where("comments.score > ?", 50)
.references(:comments)
but thereâs no way to do âtop N per groupâ with includes(). activerecordâs eager loading loads entire associations - you canât tell it to âonly load the top 3 comments per postâ without writing custom SQL. at that point, you might as well use lateral joins or window functions.
lateral joins arenât a fix for lazy N+1 queries. theyâre an optimization on top of good eager loading practices.
they solve the specific case where:
- youâve already eliminated N+1 queries with eager loading
- but youâre loading too much data because you only need a subset per group
- and filtering in ruby is inefficient at scale
the frustrating part is that rails doesnât make this easy. no native activerecord support. you have to know postgres-specific features and drop down to raw sql.
but 57% faster query execution is worth writing some sql.
the bottom line
- lateral joins are 1.57x faster than window functions for top N per group queries
- they scale better with data volume
- they require proper indexes to be fast
- activerecord doesnât support them natively
- most rails apps donât need them, but when you do, theyâre a massive win
if youâre doing âtop N per groupâ queries at scale, benchmark lateral joins. they might surprise you.