03 Oct 2025
i built a rails dashboard to analyze millions of records. first pass: painfully slow. adding materialized views with the scenic gem: absurdly fast.
here’s what i learned benchmarking against 100k users, 1M orders, and 5M user activities.
the problem
dashboard queries were hitting multiple tables with joins and aggregations. every page load meant scanning millions of rows, grouping, sorting… you know the drill.
user engagement query: 7.1 seconds per request. category revenue: 3.4 seconds. this is fine for batch reports but unusable for a dashboard people actually look at.
materialized views in 30 seconds
regular views are just saved queries. postgres re-runs them every time.
materialized views are snapshots. postgres runs the query once, stores the results as a real table. subsequent reads? just SELECT from that cached table.
trade-off: data gets stale until you refresh. for dashboards where 5-60 minute staleness is fine, this works great.
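to make the snapshot-and-staleness idea concrete, here's a toy model in plain ruby. it's only an analogy: the Snapshot class and the sample orders are made up for illustration, and nothing here touches postgres.

```ruby
# Toy model of a materialized view: the query result is computed once,
# stored, and served as-is until refresh is called.
class Snapshot
  def initialize(&query)
    @query = query
    @rows = nil
  end

  # Re-run the underlying query and replace the stored result.
  def refresh
    @rows = @query.call
    self
  end

  # Reads never touch the source data, only the stored result.
  def rows
    @rows || refresh.rows
  end
end

# Hypothetical source data standing in for the orders table.
orders = [{ day: "2025-10-01", amount: 20 }, { day: "2025-10-01", amount: 30 }]

daily_totals = Snapshot.new do
  orders.group_by { |o| o[:day] }.transform_values { |os| os.sum { |o| o[:amount] } }
end

before = daily_totals.rows          # computed on first read
orders << { day: "2025-10-01", amount: 50 }
stale  = daily_totals.rows          # still the old snapshot
fresh  = daily_totals.refresh.rows  # picks up the new order
```

the stale read is the trade-off in miniature: cheap, but blind to anything written since the last refresh.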
scenic gem setup
scenic wraps postgres materialized views in rails migrations. feels native.
rails generate scenic:view daily_sales
creates two files:
- migration file to create the view
- sql file for your query
here’s the daily sales view:
SELECT
DATE(orders.order_date) AS sale_date,
COUNT(DISTINCT orders.id) AS total_orders,
COUNT(DISTINCT orders.user_id) AS unique_customers,
SUM(orders.total_amount) AS total_revenue,
AVG(orders.total_amount) AS average_order_value,
SUM(CASE WHEN orders.status = 'completed' THEN 1 ELSE 0 END) AS completed_orders,
SUM(CASE WHEN orders.status = 'cancelled' THEN 1 ELSE 0 END) AS cancelled_orders
FROM orders
GROUP BY DATE(orders.order_date)
ORDER BY sale_date DESC
normal complex aggregation. but instead of running this every time, we materialize it:
class CreateDailySales < ActiveRecord::Migration[8.0]
  def change
    create_view :daily_sales, materialized: true
    add_index :daily_sales, :sale_date, unique: true
  end
end
now you can query it like any rails model:
class DailySale < ApplicationRecord
  def readonly?
    true
  end

  def self.refresh
    Scenic.database.refresh_materialized_view(table_name, concurrently: false, cascade: false)
  end
end
# in your controller
@daily_sales = DailySale.order(sale_date: :desc).limit(30)
postgres reads from a pre-computed table instead of scanning orders every time.
the benchmarks
i built 4 materialized views:
- daily_sales - revenue metrics by day
- top_products - product performance
- user_engagements - customer lifetime value
- category_revenues - category breakdowns
then benchmarked raw queries vs materialized views using benchmark-ips.
results
daily sales summary
- raw query: 6.25 iterations/sec (160ms per query)
- materialized view: 2,191 iterations/sec (456 microseconds per query)
- 350x faster
top products by revenue
- raw query: 0.69 iterations/sec (1.44 seconds per query)
- materialized view: 438 iterations/sec (2.28ms per query)
- 633x faster
user engagement metrics
- raw query: 0.14 iterations/sec (7.12 seconds per query)
- materialized view: 135 iterations/sec (7.39ms per query)
- 963x faster
category revenue analysis
- raw query: 0.29 iterations/sec (3.41 seconds per query)
- materialized view: 2,715 iterations/sec (368 microseconds per query)
- 9,252x faster
the user engagement query went from 7 seconds to 7 milliseconds. category revenue from 3.4 seconds to 368 microseconds.
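the per-query times and multipliers follow directly from the iterations/sec figures. a small sketch of that arithmetic, using the rounded numbers listed above (so the recomputed ratios land close to, but not exactly on, the reported ones, presumably because those came from unrounded measurements):

```ruby
# iterations/sec figures copied from the results above
RESULTS = {
  "daily sales"      => { raw_ips: 6.25, view_ips: 2191.0 },
  "top products"     => { raw_ips: 0.69, view_ips: 438.0 },
  "user engagement"  => { raw_ips: 0.14, view_ips: 135.0 },
  "category revenue" => { raw_ips: 0.29, view_ips: 2715.0 }
}.freeze

# Milliseconds per query is the inverse of iterations per second.
def ms_per_query(ips)
  1000.0 / ips
end

# Speedup is the ratio of the two throughputs.
def speedup(raw_ips, view_ips)
  view_ips / raw_ips
end

RESULTS.each do |name, r|
  puts format("%-17s raw %9.2f ms  view %6.3f ms  ~%.0fx",
              name, ms_per_query(r[:raw_ips]), ms_per_query(r[:view_ips]),
              speedup(r[:raw_ips], r[:view_ips]))
end
```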
how the queries work
let’s look at the user engagement view since it had the biggest pain:
SELECT
users.id AS user_id,
users.email,
users.name,
COUNT(DISTINCT orders.id) AS total_orders,
SUM(orders.total_amount) AS lifetime_value,
AVG(orders.total_amount) AS avg_order_value,
COUNT(DISTINCT user_activities.id) AS total_activities,
COUNT(DISTINCT CASE WHEN user_activities.activity_type = 'page_view' THEN user_activities.id END) AS page_views,
MAX(orders.order_date) AS last_order_date,
MAX(user_activities.occurred_at) AS last_activity_date,
DATE_PART('day', NOW() - MAX(user_activities.occurred_at)) AS days_since_last_activity
FROM users
LEFT JOIN orders ON users.id = orders.user_id
LEFT JOIN user_activities ON users.id = user_activities.user_id
GROUP BY users.id, users.email, users.name
ORDER BY lifetime_value DESC NULLS LAST
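two left joins across 100k users, 1M orders, and 5M activities. grouping, aggregating, sorting. every single time someone loads the dashboard. one caveat: joining orders and activities to the same user fans out (roughly 10 orders × 50 activities per user, which is where the ~50 million intermediate rows in the execution plan further down come from), so SUM(orders.total_amount) counts each order once per activity row. the COUNT(DISTINCT ...) columns are unaffected; aggregating each table in its own subquery before joining would fix the sum and shrink the join.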
materialized it? 100k rows pre-computed. SELECT with a simple ORDER BY and LIMIT.
the indexes matter too:
add_index :user_engagements, :user_id, unique: true
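postgres can use the index for lookups by user_id, and a unique index is also what a concurrent refresh needs. filtering by high-value customers is still fast on 100k pre-computed rows, though postgres would need an index on lifetime_value to avoid a scan and sort.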
why materialized views are faster: database internals
ran EXPLAIN ANALYZE on both approaches to see what postgres is actually doing. the difference is wild.
raw query execution (7.1 seconds)
Limit (cost=1666565.79..1666566.04 rows=100)
Buffers: shared hit=383450 read=135233 written=1559
-> Sort (top-N heapsort)
-> GroupAggregate (rows=100000)
-> Merge Left Join (rows=50455739) ← 50 MILLION intermediate rows
-> Gather Merge (parallel workers: 2)
-> Incremental Sort
-> Merge Left Join (users + orders)
-> Materialize (user_activities, 5M rows)
what’s happening:
- joins 100k users + 1M orders + 5M activities
- creates 50 million intermediate rows
- groups all 100k users
- sorts by lifetime value
- reads 135,233 disk blocks from storage
- takes top 100
the query is scanning millions of rows, doing complex joins, aggregating, then sorting. postgres is working hard.
materialized view execution (7.4ms)
Limit (cost=0.29..8.87 rows=100)
Buffers: shared hit=103
-> Index Scan using index_user_engagements_on_user_id
Order By: lifetime_value DESC
what’s happening:
- uses index to read rows sorted by lifetime_value
- reads 103 blocks (all from cache)
- stops after 100 rows
no joins. no aggregation. no sorting. just reading pre-computed results.
buffer analysis: cache hits matter
postgres tracks how often data is read from RAM (cache hits) vs disk:
base tables getting hammered by raw queries:
order_items: 5.2M disk reads, 76% cache hit ❌
user_activities: 1.3M disk reads, 91% cache hit ❌
orders: 817K disk reads, 95% cache hit ⚠️
materialized views:
daily_sales: 37 disk reads, 99.88% cache hit ✅
user_engagements: 9,612 disk reads, 99.71% cache hit ✅
top_products: 1,168 disk reads, 99.87% cache hit ✅
disk reads are ~1000x slower than RAM. materialized views stay in cache because they’re small and accessed frequently.
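the hit ratio itself is just hits divided by total buffer accesses. a minimal sketch using the Buffers lines from the two execution plans shown earlier (cache_hit_ratio is a hypothetical helper; note that a "read" in postgres terms means the block wasn't in shared_buffers, so it may still have come from the OS page cache rather than physical disk):

```ruby
# Share of buffer accesses served from shared_buffers rather than read in.
def cache_hit_ratio(hit:, read:)
  total = hit + read
  return 0.0 if total.zero?

  (hit.to_f / total * 100).round(2)
end

raw_query = cache_hit_ratio(hit: 383_450, read: 135_233) # raw user engagement plan
view      = cache_hit_ratio(hit: 103, read: 0)           # materialized view plan

puts "raw query: #{raw_query}%  view: #{view}%"
```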
query cost comparison
postgres estimates query cost before execution:
| query | raw cost | view cost | ratio |
| --- | --- | --- | --- |
| daily sales | 101,503 | 0.96 | 105,628x |
| user engagement | 763,318 | 2.86 | 266,860x |
| top products | 101,996 | 2.54 | 40,156x |
these aren’t execution times, they’re the planner’s abstract cost units, estimated from disk page fetches and per-row CPU work. lower is better.
raw query for user engagement costs 763,318 units. materialized view: 2.86 units.
the memory problem: external sorts
raw daily sales query execution plan shows this:
Sort Method: external merge Disk: 14208kB
Worker 0: Disk: 12200kB
Worker 1: Disk: 13736kB
sorting 1M rows doesn’t fit in work_mem, so postgres spills to disk. writes ~40MB of temporary files across the leader process and 2 parallel workers.
disk I/O during sorting kills performance.
materialized views? no sorting needed. data is already sorted via indexes.
sequential scans vs index scans
checked how often postgres uses indexes vs scanning entire tables:
base tables:
orders: 5.5M index scans (99.99% index usage) ✅
users: 6.2M index scans (100% index usage) ✅
products: 7.0M index scans (100% index usage) ✅
every raw query hits these tables with index lookups. millions of operations putting load on the database.
materialized views:
category_revenues: 38,642 sequential scans (0 index scans) ✅
top_products: 5,988 sequential scans (0 index scans) ✅
daily_sales: 5 seq scans, 29,578 index scans ✅
materialized views are small. sequential scans are actually faster than indexes for small tables (no index overhead).
real I/O impact
ran rails sql:analysis to get detailed buffer statistics:
raw user engagement query:
- 135,233 disk blocks read
- 383,450 cache blocks read
- 1,559 blocks written (temp data)
- 9.26 seconds execution
materialized view:
- 0 disk blocks read
- 103 cache blocks read
- 0 blocks written
- 0.0074 seconds execution
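the raw query reads about 1,300x more blocks from disk than the view touches in total (135,233 vs 103), and about 5,000x more once you count cache reads too. that’s why it’s slow.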
added comprehensive SQL analysis tools to the repo:
# full analysis report
rails sql:analysis
# shows: execution plans, buffer usage, cache hit ratios,
# index usage, query costs, table statistics
# analyze specific query
rails sql:analyze_query QUERY="SELECT * FROM orders WHERE status = 'completed'"
# compare raw vs materialized views
rails benchmark:compare
the EXPLAIN ANALYZE output shows exactly what postgres is doing: parallel workers, sort methods, join types, buffer usage, actual row counts.
check out PERFORMANCE_ANALYSIS.md in the repo for the complete breakdown with execution plans and statistics.
refreshing the views
views get stale. you need to refresh them.
i use a background job with solid queue:
class RefreshMaterializedViewsJob < ApplicationJob
  queue_as :default

  def perform
    DailySale.refresh
    TopProduct.refresh
    UserEngagement.refresh
    CategoryRevenue.refresh
  end
end
scheduled hourly in production:
# config/recurring.yml
production:
  refresh_materialized_views:
    class: RefreshMaterializedViewsJob
    queue: default
    schedule: every hour
refreshing all 4 views takes about 27 seconds with my dataset. once an hour is negligible overhead for 350-9000x query speedups.
for larger views or high-traffic sites, use CONCURRENTLY:
def self.refresh
Scenic.database.refresh_materialized_view(table_name, concurrently: true)
end
requires a unique index on the view, but reads aren’t blocked while it refreshes. users can keep querying during refresh.
when this makes sense
materialized views work when:
- you have complex aggregations that run often
- data staleness of 5-60 minutes is acceptable
- reads massively outnumber writes
- the underlying query is expensive (>500ms)
don’t use them for:
- real-time data requirements
- simple queries already fast with indexes
- write-heavy tables that change constantly
my dashboard checks all the boxes. analytics data where hour-old numbers are fine. users hitting the same queries hundreds of times per day.
the full setup
i open sourced the complete case study. includes:
- production-ready schema (users, products, orders, activities)
- 4 materialized views with sql
- seed script that generates millions of records
- benchmark rake tasks
- dashboard ui
- automated refresh jobs
you can clone it and run benchmarks yourself:
git clone https://github.com/sngeth/scenic-materialized-views-demo
cd scenic-materialized-views-demo
bundle install
rails db:create db:migrate
rails db:seed
rails benchmark:refresh
rails benchmark:compare
customize data volume with env vars:
USERS_COUNT=50000 PRODUCTS_COUNT=5000 rails db:seed
some specifics on scenic
scenic handles view versioning like migrations. to update a view, run the generator again with the same name:
rails generate scenic:view daily_sales --materialized
creates daily_sales_v02.sql. modify the query, run migrations, scenic handles the swap.
you can also drop down to raw sql when needed:
ActiveRecord::Base.connection.execute("REFRESH MATERIALIZED VIEW CONCURRENTLY daily_sales")
scenic mostly stays out of your way. it’s a thin wrapper that makes postgres materialized views feel like rails.
track how long refreshes take:
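def perform
  Rails.logger.info "Starting materialized views refresh..."
  # monotonic clock: unaffected by system clock adjustments
  start_time = Process.clock_gettime(Process::CLOCK_MONOTONIC)
  DailySale.refresh
  Rails.logger.info " ✓ DailySale refreshed"
  # ... other views
  elapsed_time = Process.clock_gettime(Process::CLOCK_MONOTONIC) - start_time
  Rails.logger.info "Completed in #{elapsed_time.round(2)}s"
end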
watch for degradation as data grows. if refreshes start taking too long, consider:
- refreshing views separately with different schedules
- using incremental refresh patterns
- partitioning underlying tables
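if you're deciding which views deserve their own schedule, per-view timings help. a minimal sketch, assuming each refresh can be wrapped in a lambda; timed_refresh is a hypothetical helper, and the sleeps stand in for real refreshes so it runs without a database:

```ruby
# Time each refresh separately so a single slow view is easy to spot.
# `refreshers` maps a label to anything that responds to #call.
def timed_refresh(refreshers, logger: $stdout)
  refreshers.each_with_object({}) do |(name, refresher), timings|
    started = Process.clock_gettime(Process::CLOCK_MONOTONIC)
    refresher.call
    elapsed = Process.clock_gettime(Process::CLOCK_MONOTONIC) - started
    timings[name] = elapsed
    logger.puts format("%s refreshed in %.2fs", name, elapsed)
  end
end

# Stand-ins for the real refresh calls (assumption: sleeps instead of SQL).
timings = timed_refresh({
  "daily_sales"      => -> { sleep 0.01 },
  "user_engagements" => -> { sleep 0.05 }
})

slowest = timings.max_by { |_, seconds| seconds }.first
puts "slowest view: #{slowest}"
```

in the actual job, the lambdas would wrap calls like DailySale.refresh, and the logger would be Rails.logger with info instead of puts.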
practical example: the dashboard controller
here’s how simple the controller gets:
class DashboardController < ApplicationController
  def index
    @daily_sales = DailySale.order(sale_date: :desc).limit(30)
    @top_products = TopProduct.order(total_revenue: :desc).limit(10)
    @category_revenues = CategoryRevenue.order(total_revenue: :desc)
    @top_users = UserEngagement.order(lifetime_value: :desc).limit(10)
  end
end
four simple queries. no joins, no aggregations, no complexity. just reading pre-computed data.
response time? 50-100ms total including rendering. used to be 10+ seconds with raw queries.
the views handle all the heavy lifting in the background refresh job.
cost analysis
refreshing 4 views takes 27 seconds every hour = 648 seconds per day.
without materialized views, if the dashboard gets hit 1000 times per day (conservative):
- 1000 requests × 4 queries × 3 seconds average = 12,000 seconds of query time
- plus database load, connection pool pressure, etc.
the math checks out. background refresh overhead is tiny compared to saved query time.
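the same arithmetic as a runnable sketch, plus the break-even point it implies. the inputs are the figures above; the break-even ignores the few milliseconds the view queries themselves take, so treat it as a rough estimate:

```ruby
REFRESH_SECONDS     = 27    # one refresh of all 4 views
REFRESHES_PER_DAY   = 24    # hourly schedule
QUERIES_PER_REQUEST = 4
AVG_RAW_SECONDS     = 3.0   # rough average of the raw queries
REQUESTS_PER_DAY    = 1000

refresh_cost = REFRESH_SECONDS * REFRESHES_PER_DAY
raw_cost     = REQUESTS_PER_DAY * QUERIES_PER_REQUEST * AVG_RAW_SECONDS
break_even   = refresh_cost / (QUERIES_PER_REQUEST * AVG_RAW_SECONDS)

puts "refresh: #{refresh_cost}s/day, raw queries: #{raw_cost}s/day"
puts "break-even at about #{break_even.ceil} dashboard loads per day"
```

so under these assumptions the hourly refresh pays for itself once the dashboard sees more than about 54 loads a day.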
edge cases
locking during refresh: a plain REFRESH blocks reads on the view until it finishes. use CONCURRENTLY to avoid that, but it requires a unique index and takes longer.
view dependencies: if views reference other views, refresh order matters. scenic handles this with cascade options.
schema changes: changing underlying tables requires updating and versioning the views. scenic makes this manageable with version files.
storage: materialized views duplicate data. monitor disk usage. my 4 views add maybe 50mb on top of 2gb of base tables. negligible.
wrapping up
350x to 9000x faster queries. 27 seconds of refresh time per hour. hour-old data that’s perfectly acceptable for analytics.
materialized views aren’t magic. they’re cached query results. but for dashboards on millions of rows, they transform unusable into instant.
the scenic gem makes them feel native to rails. write sql, run migrations, query like models.
check out the full repo if you want to try it. includes all the benchmarks, views, and a working dashboard you can load with test data.
22 Sep 2025
another year down, and i’m trying to remember what i actually built this year. honestly? it’s all a blur.
you know the feeling. you’ve been shipping code consistently, fixing bugs, building features, but when someone asks “what did you accomplish this year?” your brain just goes blank. was that auth refactor in march or july? did i ship the analytics dashboard before or after the mobile redesign?
the problem with developer memory
we’re constantly context switching. one day you’re debugging a race condition in the payment flow, the next you’re building a new onboarding experience, then suddenly you’re optimizing database queries because the dashboard is slow. each task feels important in the moment, but they all blend together over months.
whether it’s performance reviews, job interviews, or just internal reflection, people want concrete examples of your impact. “tell us about a complex technical challenge you solved” or “describe how you improved system performance.” but when everything feels like just another tuesday, it’s hard to remember which wins were actually significant.
your git history is your accomplishment log
every commit you make is a timestamp of progress. your git history contains:
- exact dates of when you shipped features
- the complexity and scope of changes
- how many bugs you fixed vs features you built
- patterns in your work (are you always fixing the same types of issues?)
- collaboration evidence (co-authored commits, code reviews)
the trick is turning that raw commit data into a coherent story of growth and impact.
the magic command
here’s what i fed into an LLM to generate my yearly summary:
# get a year's worth of commits (hash, date, subject), capped at the 50 most recent
git log --author="$(git config user.email)" \
--since="2024-01-01" \
--until="2024-12-31" \
--pretty=format:"%h|%ad|%s" \
--date=short \
--all | head -50
then prompt your favorite LLM with:
“analyze these git commits and create a technical accomplishments summary. group by major themes like features, bug fixes, performance improvements, and security. highlight the business impact and technical complexity. include specific metrics where possible.”
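before handing anything to an LLM, a quick local pass can pre-group the commits so you can sanity-check the summary. a rough sketch, assuming the hash|date|subject format from the command above; the keyword buckets and the sample commits are made up for illustration:

```ruby
require "date"

Commit = Struct.new(:sha, :date, :subject)

# Parse lines in the "%h|%ad|%s" format; the subject may itself contain "|".
def parse_commits(log_output)
  log_output.each_line.map(&:strip).reject(&:empty?).map do |line|
    sha, date, subject = line.split("|", 3)
    Commit.new(sha, Date.parse(date), subject)
  end
end

# Keyword buckets are illustrative guesses, not a real taxonomy.
THEMES = {
  "bug fixes"   => /\b(fix|bug|hotfix)/i,
  "performance" => /\b(perf|optimi[sz]e|n\+1|index)/i,
  "security"    => /\b(security|csrf|password|auth)/i
}.freeze

# First matching bucket wins; anything unmatched counts as a feature.
def theme_for(subject)
  THEMES.each { |name, pattern| return name if subject.match?(pattern) }
  "features"
end

# Month => { theme => commit count }
def summarize(commits)
  commits.group_by { |c| c.date.strftime("%Y-%m") }
         .transform_values { |cs| cs.map { |c| theme_for(c.subject) }.tally }
end

# Hypothetical sample input.
sample = <<~LOG
  a1b2c3d|2025-06-03|Strengthen password requirements
  d4e5f6a|2025-06-10|Fix N+1 query in candidate search
  b7c8d9e|2025-09-02|Add waitlist for sold-out events
LOG

summary = summarize(parse_commits(sample))
pp summary
```

keyword matching is crude (the second sample commit is arguably performance work but lands in bug fixes because "fix" matches first), which is exactly the kind of judgment call the LLM pass is better at.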
automating with a script
i’ve been using this technique for weekly standups too. here’s a script that automates the whole process:
#!/bin/bash
# git-standup.sh - Generate AI-powered standup reports from git commits
set -e
DAYS=${1:-7} # Default to last 7 days
AUTHOR=${2:-$(git config user.email)}
ENV_FILE=${3:-~/.env}
# Source environment variables
source "$ENV_FILE"
# Get git commits
COMMITS=$(git log --author="$AUTHOR" \
--since="$DAYS days ago" \
--pretty=format:"%h|%ad|%s" \
--date=short \
--all \
--no-merges | head -50)
# Prepare the prompt
PROMPT="Analyze these git commits and create a concise standup update. Focus on:
- What was accomplished (group similar work)
- Any blockers or challenges implied by the commits
- Key technical wins or improvements
- Format as: Completed, In Progress, Blockers, Notes
Commits (format: hash|date|message):
$COMMITS"
# Use OpenAI API
if [[ -n "$OPENAI_API_KEY" ]]; then
ESCAPED_PROMPT=$(echo "$PROMPT" | jq -Rs .)
curl -s https://api.openai.com/v1/chat/completions \
-H "Content-Type: application/json" \
-H "Authorization: Bearer $OPENAI_API_KEY" \
-d "{
\"model\": \"gpt-4o-mini\",
\"messages\": [{
\"role\": \"user\",
\"content\": $ESCAPED_PROMPT
}],
\"max_tokens\": 1000
}" | jq -r '.choices[0].message.content'
else
  echo "OPENAI_API_KEY is not set (checked $ENV_FILE)" >&2
  exit 1
fi
just add your OPENAI_API_KEY to ~/.env and run:
./git-standup.sh # last 7 days
./git-standup.sh 3 # last 3 days
./git-standup.sh 14 you@example.com # custom timeframe/author
what the analysis revealed
looking at my own year through this lens was… honestly pretty shocking. here’s what git actually tracked:
major feature developments:
- webinar platform (2025)
- enhanced VCF platform to support webinars including:
- post-registration functionality and user workflows
- custom marketing capabilities for webinar events
- early start/late end time configuration system
- private webinar filtering for staff interfaces
- technical impact: enabled virtual employment workshops, expanding platform capabilities beyond traditional job fairs
- virtual career fair (VCF) enhancements
- developed VCF featured jobs system - complete job highlighting and promotion feature
- built pre-event search functionality - allowing candidates to discover opportunities before events
- enhanced chat welcome message formatting with WYSIWYG input
- improved message template positioning and dropdown functionality
- created mobile-responsive interfaces for exhibitor lists and candidate interactions
- event management & analytics
- built sold-out event handling system with automatic waitlist functionality
- created comprehensive messaging analytics with campaign details and performance tracking
- enhanced control center real-time statistics display
security & performance contributions:
security hardening
- strengthened password requirements and implemented secure reset flows
- prevented user enumeration attacks in authentication systems
- replaced insecure staff password generation with secure reset links
- added CSRF protection and input sanitization improvements
performance optimization
- resolved N+1 query issues in candidate searches and exhibitor displays
- optimized database queries and added proper indexing
- implemented efficient search filtering with elasticsearch integration
- added query optimization for large dataset operations
technical problem solving:
mobile & responsive design
- fixed critical mobile responsiveness issues across VCF interfaces
- resolved viewport and layout problems for exhibitor schedules
- implemented x-teleport solutions for dropdown menu clipping issues
- enhanced mobile chat functionality with proper input visibility
data export & reporting
- built comprehensive CSV export systems for:
- staff organization users
- client job postings with missing columns
- candidate applications with enhanced filtering
- event folder candidate downloads with rep attendance data
UI/UX improvements
- implemented advanced filtering systems with sidebar interfaces
- created sortable lists for completed one-on-one meetings
business impact contributions:
event operations
- enhanced attendee tracking with view confirmation systems
- implemented booth management with presence and broadcasting fixes
- created candidate folder filtering for improved exhibitor experience
- built time zone handling for multi-region events
client tools
- developed job deletion workflows with automatic credit refunds
- enhanced job application search with advanced filtering options
- created messaging template systems with positioning improvements
- implemented draft job management capabilities
quality assurance & testing
- fixed flaky test issues with proper mocking and stubbing
- implemented integration specs for complex workflows
- added test helpers for consistent testing patterns
recent high-impact work (2024-2025):
- september 2025: sold-out event handling and waitlist system
- august 2025: VCF enhancements and messaging analytics
- july 2025: staff candidate tracking and view confirmation
- june 2025: security hardening and password requirements
technical skills demonstrated:
- full-stack ruby on rails development
- javascript/stimulus frontend frameworks
- elasticsearch implementation and optimization
- database design and query optimization
- real-time features with turbo streams
- mobile-responsive design patterns
- security best practices implementation
beyond the basic stats
the real value isn’t just counting commits. it’s seeing patterns:
what types of problems do you gravitate toward? my commits showed i spend a lot of time on integration challenges, mobile responsiveness, and real-time features.
when are you most productive? my commit timestamps revealed patterns i never noticed. heavy feature work in morning sprints, bug fixes and optimization in the afternoon.
what’s your technical growth path? the progression from simple bug fixes early in the year to building complete subsystems later shows clear skill development. commits touching multiple systems prove comfort with complex, cross-cutting changes.
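the time-of-day pattern is easy to check without an LLM. a small sketch, assuming timestamps exported with git log --pretty=format:%aI (strict ISO 8601 author dates); commits_by_hour is a hypothetical helper and the sample timestamps are invented:

```ruby
require "time"

# Count commits per hour of day, using the hour in the author's own UTC offset.
def commits_by_hour(timestamps)
  timestamps.map { |ts| Time.iso8601(ts).hour }.tally.sort.to_h
end

# Hypothetical sample input.
sample = %w[
  2025-06-03T09:15:00-04:00
  2025-06-03T09:48:00-04:00
  2025-06-03T14:05:00-04:00
]

histogram = commits_by_hour(sample)
histogram.each do |hour, count|
  puts format("%02d:00  %s", hour, "#" * count)
end
```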
staying on top of it
keep a running note of the big wins as they happen. git gives you the data, but you need to capture the context: why was this hard? what would’ve happened if you didn’t fix it? how many users did this impact?
your commits are proof you’ve been busy. turning them into a story of impact? that’s the difference between “i wrote code” and “i moved the business forward.”
14 Sep 2025
so cloudflare had this massive outage recently. their tenant service api went down, taking the dashboard and a bunch of other apis with it. the root cause? a react useEffect dependency array bug that made their dashboard hammer the api with unnecessary requests.
here’s what went wrong…
the setup
they had a react component that needed to fetch data from their tenant service api. pretty standard stuff - throw it in a useEffect, call it a day:
useEffect(() => {
fetchTenantData(config);
}, [config]);
looks fine, right? except config was an object that got recreated on every render.
why objects break dependency arrays
react’s dependency array uses Object.is() to check if dependencies changed (verified in react’s source - see packages/shared/objectIs.js). for primitives like strings and numbers, this works great:
Object.is('hello', 'hello') // true
Object.is(42, 42) // true
but for objects and arrays? different story:
Object.is({a: 1}, {a: 1}) // false!
Object.is([1, 2], [1, 2]) // false!
even if the contents are identical, they’re different object references. so when you do this:
function Dashboard() {
const config = { endpoint: '/api/tenant' }; // new object every render!
useEffect(() => {
fetchData(config);
}, [config]); // this runs every single render
}
that effect runs on every render. every state update. every prop change. everything.
the cascade failure
here’s where it gets interesting. the dashboard wasn’t just making one extra call - it was making dozens. why? because the api call itself was probably updating state:
- component renders → creates new config object
- useEffect sees “new” dependency → calls api
- api response updates state → triggers re-render
- go to step 1
add multiple components doing this, users refreshing the page, and a recent service update that made the tenant service less stable… boom. you’ve got an outage.
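for readers more at home in ruby, the same reference-vs-value behaviour can be modelled in a few lines. this is only an analogy: same_dep? is a rough stand-in for Object.is (it ignores edge cases like NaN), and effect_runs is a hypothetical helper, not anything from react.

```ruby
# Rough stand-in for Object.is: scalars compare by value,
# hashes and arrays compare by identity (same object in memory).
def same_dep?(a, b)
  case a
  when String, Numeric, Symbol, NilClass, TrueClass, FalseClass then a == b
  else a.equal?(b)
  end
end

# Count how many times an effect would fire over a series of renders,
# given the dependency value each render produces.
def effect_runs(deps_per_render)
  previous = nil
  deps_per_render.each_with_index.count do |dep, i|
    changed = i.zero? || !same_dep?(previous, dep)
    previous = dep
    changed
  end
end

fresh_each_render = Array.new(5) { { endpoint: "/api/tenant" } } # new hash per render
memoized          = Array.new(5, { endpoint: "/api/tenant" })    # one shared hash
primitive         = Array.new(5) { "/api/tenant" }               # equal strings

puts effect_runs(fresh_each_render) # 5
puts effect_runs(memoized)          # 1
puts effect_runs(primitive)         # 1
```

five renders with a freshly built hash means five effect runs; the shared hash and the plain string each fire once, on the first render.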
how to fix it
few options here:
option 1: useMemo
memoize the object so it keeps the same reference:
const config = useMemo(() => ({
endpoint: '/api/tenant'
}), []); // only create once
useEffect(() => {
fetchData(config);
}, [config]); // now this only runs once
option 2: primitive dependencies
instead of passing the whole object, use primitive values:
const endpoint = '/api/tenant';
useEffect(() => {
fetchData({ endpoint });
}, [endpoint]); // strings compare by value
option 3: move it outside
if the config never changes, define it outside the component:
const CONFIG = { endpoint: '/api/tenant' };
function Dashboard() {
useEffect(() => {
fetchData(CONFIG);
}, []); // no dependency needed
}
how eslint might have made it worse
here’s the ironic part: the exhaustive-deps rule might have actually caused this bug!
{
"rules": {
"react-hooks/exhaustive-deps": "error"
}
}
imagine you start with this:
function Dashboard() {
const config = { endpoint: '/api/tenant' };
useEffect(() => {
fetchData(config);
}, []); // eslint error: missing dependency 'config'
}
the linter complains that config is used but not in the deps array. so you “fix” it:
useEffect(() => {
fetchData(config);
}, [config]); // linter happy, performance dead
now your effect runs on every render because config is a new object each time. the linter pushed you into the bug!
the real fix is understanding why the warning exists and addressing the root cause (memoizing the object, using primitives, or moving it outside the component) rather than just making the linter happy.
13 Sep 2025
a complete guide to containerizing a rails application and deploying it to aws ecs fargate with proper alb health check configuration.
overview
this guide walks through deploying a rails 8 application to aws using:
- ecs fargate for serverless container orchestration
- application load balancer (alb) for traffic routing and health checks
- ecr for container image storage
- secrets manager for secure configuration management
- cloudwatch for logging
important security note: replace all placeholder values like [APP-NAME] and [ACCOUNT-ID] with your actual values. never commit these actual values to version control.
why ecs fargate over traditional deployment?
benefits of fargate:
- no server management - aws handles os patches, scaling, security
- pay-per-use pricing model
- built-in integration with alb and other aws services
- automatic scaling and load balancing
- perfect for microservices and containerized applications
vs. traditional ec2:
- no ssh access needed
- no ami management
- scales to zero for cost savings
- simpler operations and ci/cd
prerequisites
- aws cli configured with appropriate permissions
- docker installed locally
- rails application with health check endpoint
step 1: containerizing the rails application
1.1 create dockerfile
rails 8 generates an excellent production-ready dockerfile. key components:
# multi-stage build for smaller final image
ARG RUBY_VERSION=3.2.9
FROM ruby:$RUBY_VERSION-slim AS base
# production environment configuration
ENV RAILS_ENV="production" \
BUNDLE_DEPLOYMENT="1" \
BUNDLE_PATH="/usr/local/bundle"
# thruster configuration for http proxy (recommended)
ENV TARGET_PORT=3000
ENV HTTP_PORT=80
EXPOSE 80
# use thruster to proxy port 80 → rails on port 3000
CMD ["./bin/thrust", "./bin/rails", "server", "-b", "0.0.0.0", "-p", "3000"]
1.2 health check endpoint
create a robust health check endpoint:
# app/controllers/health_controller.rb
class HealthController < ApplicationController
def check
render json: {
status: "ok",
timestamp: Time.current.iso8601,
rails_version: Rails.version,
environment: Rails.env
}, status: :ok
end
end
# config/routes.rb
Rails.application.routes.draw do
get "health/check"
# other routes...
end
1.3 docker entrypoint
simplify the entrypoint for containerized deployment:
#!/bin/bash -e
# bin/docker-entrypoint
# enable jemalloc for reduced memory usage
if [ -z "${LD_PRELOAD+x}" ]; then
LD_PRELOAD=$(find /usr/lib -name libjemalloc.so.2 -print -quit)
export LD_PRELOAD
fi
echo "starting rails server without database setup..."
exec "${@}"
add linux platforms to gemfile.lock for cross-platform builds:
bundle lock --add-platform x86_64-linux aarch64-linux
step 2: aws infrastructure setup
2.1 create ecr repository
security note: use unique repository names to avoid conflicts with existing resources.
aws ecr create-repository --repository-name [APP-NAME] --region us-east-1
2.2 build and push docker image
# build for production architecture
docker buildx build --platform linux/amd64 -t [APP-NAME]:latest .
# tag and push to ecr
aws ecr get-login-password --region us-east-1 | \
docker login --username AWS --password-stdin [ACCOUNT-ID].dkr.ecr.us-east-1.amazonaws.com
docker tag [APP-NAME]:latest [ACCOUNT-ID].dkr.ecr.us-east-1.amazonaws.com/[APP-NAME]:latest
docker push [ACCOUNT-ID].dkr.ecr.us-east-1.amazonaws.com/[APP-NAME]:latest
2.3 vpc and networking setup
security consideration: this creates a new vpc. if you have existing infrastructure, consider using existing vpcs and subnets instead.
# create vpc
VPC_ID=$(aws ec2 create-vpc --cidr-block 10.0.0.0/16 --region us-east-1 --query Vpc.VpcId --output text)
# create public subnets in different azs
SUBNET1=$(aws ec2 create-subnet --vpc-id $VPC_ID --cidr-block 10.0.1.0/24 --availability-zone us-east-1a --query Subnet.SubnetId --output text)
SUBNET2=$(aws ec2 create-subnet --vpc-id $VPC_ID --cidr-block 10.0.2.0/24 --availability-zone us-east-1b --query Subnet.SubnetId --output text)
# internet gateway and routing
IGW_ID=$(aws ec2 create-internet-gateway --query InternetGateway.InternetGatewayId --output text)
aws ec2 attach-internet-gateway --vpc-id $VPC_ID --internet-gateway-id $IGW_ID
# route table configuration
RT_ID=$(aws ec2 create-route-table --vpc-id $VPC_ID --query RouteTable.RouteTableId --output text)
aws ec2 create-route --route-table-id $RT_ID --destination-cidr-block 0.0.0.0/0 --gateway-id $IGW_ID
aws ec2 associate-route-table --subnet-id $SUBNET1 --route-table-id $RT_ID
aws ec2 associate-route-table --subnet-id $SUBNET2 --route-table-id $RT_ID
# enable auto-assign public ips
aws ec2 modify-subnet-attribute --subnet-id $SUBNET1 --map-public-ip-on-launch
aws ec2 modify-subnet-attribute --subnet-id $SUBNET2 --map-public-ip-on-launch
step 3: application load balancer configuration
3.1 security groups
security note: the alb security group allows traffic from the entire internet (0.0.0.0/0). this is appropriate for public web applications but consider restricting if needed.
# alb security group
ALB_SG=$(aws ec2 create-security-group \
--group-name [APP-NAME]-alb-sg \
--description "security group for alb" \
--vpc-id $VPC_ID \
--query GroupId --output text)
aws ec2 authorize-security-group-ingress \
--group-id $ALB_SG \
--protocol tcp --port 80 --cidr 0.0.0.0/0
# ecs security group
ECS_SG=$(aws ec2 create-security-group \
--group-name [APP-NAME]-ecs-sg \
--description "security group for ecs tasks" \
--vpc-id $VPC_ID \
--query GroupId --output text)
aws ec2 authorize-security-group-ingress \
--group-id $ECS_SG \
--protocol tcp --port 80 --source-group $ALB_SG
3.2 create application load balancer
# create alb
ALB_ARN=$(aws elbv2 create-load-balancer \
--name [APP-NAME]-alb \
--subnets $SUBNET1 $SUBNET2 \
--security-groups $ALB_SG \
--query 'LoadBalancers[0].LoadBalancerArn' --output text)
# create target group with health check configuration
TG_ARN=$(aws elbv2 create-target-group \
--name [APP-NAME]-targets \
--protocol HTTP --port 80 \
--vpc-id $VPC_ID \
--target-type ip \
--health-check-path /health/check \
--health-check-protocol HTTP \
--health-check-interval-seconds 30 \
--health-check-timeout-seconds 5 \
--healthy-threshold-count 2 \
--unhealthy-threshold-count 3 \
--matcher HttpCode=200 \
--query 'TargetGroups[0].TargetGroupArn' --output text)
# create listener
aws elbv2 create-listener \
--load-balancer-arn $ALB_ARN \
--protocol HTTP --port 80 \
--default-actions Type=forward,TargetGroupArn=$TG_ARN
3.3 health check configuration details
the alb health check configuration is critical for proper operation:
- path: /health/check - your rails endpoint
- success codes: 200 - http ok status
- interval: 30 seconds - check frequency
- timeout: 5 seconds - request timeout
- healthy threshold: 2 - consecutive successful checks to mark healthy
- unhealthy threshold: 3 - consecutive failed checks to mark unhealthy
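to get a feel for what these numbers mean in practice, here is a rough sketch in go that multiplies the interval by each threshold. the struct and method names are made up for illustration, and the results are estimates only: they ignore the 5 second request timeout and where in an interval a failure begins.

```go
package main

import "fmt"

// healthCheck mirrors the target group settings used above (hypothetical type).
type healthCheck struct {
	intervalSec        int
	healthyThreshold   int
	unhealthyThreshold int
}

// secondsToHealthy estimates how long a new target waits before it receives traffic.
func (h healthCheck) secondsToHealthy() int { return h.intervalSec * h.healthyThreshold }

// secondsToUnhealthy estimates how long a failing target keeps receiving traffic.
func (h healthCheck) secondsToUnhealthy() int { return h.intervalSec * h.unhealthyThreshold }

func main() {
	hc := healthCheck{intervalSec: 30, healthyThreshold: 2, unhealthyThreshold: 3}
	fmt.Printf("healthy after ~%ds, unhealthy after ~%ds\n",
		hc.secondsToHealthy(), hc.secondsToUnhealthy())
}
```

with the values above, a new task takes about a minute to start serving and a broken one about a minute and a half to be pulled, which is worth keeping in mind when tuning deployment speed.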
step 4: ecs configuration
4.1 create ecs cluster
aws ecs create-cluster --cluster-name [APP-NAME]-cluster
4.2 iam role for task execution
security note: check whether ecsTaskExecutionRole already exists in your account before creating it; create-role fails if a role with that name is already there.
# create execution role (skip if it already exists)
aws iam create-role \
--role-name ecsTaskExecutionRole \
--assume-role-policy-document '{
"Version":"2012-10-17",
"Statement":[{
"Effect":"Allow",
"Principal":{"Service":"ecs-tasks.amazonaws.com"},
"Action":"sts:AssumeRole"
}]
}'
# attach required policies
aws iam attach-role-policy \
--role-name ecsTaskExecutionRole \
--policy-arn arn:aws:iam::aws:policy/service-role/AmazonECSTaskExecutionRolePolicy
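# note: SecretsManagerReadWrite is much broader than this role needs. the execution role only has to
# read the secret (secretsmanager:GetSecretValue), so prefer a policy scoped to that secret in production
aws iam attach-role-policy \
--role-name ecsTaskExecutionRole \
--policy-arn arn:aws:iam::aws:policy/SecretsManagerReadWrite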
4.3 secrets management
store sensitive configuration in aws secrets manager:
# store rails master key
aws secretsmanager create-secret \
--name [APP-NAME]/rails_master_key \
--secret-string "$(cat config/master.key)"
4.4 ecs task definition
{
"family": "[APP-NAME]-task",
"networkMode": "awsvpc",
"requiresCompatibilities": ["FARGATE"],
"cpu": "256",
"memory": "512",
"executionRoleArn": "arn:aws:iam::[ACCOUNT-ID]:role/ecsTaskExecutionRole",
"containerDefinitions": [
{
"name": "[APP-NAME]",
"image": "[ACCOUNT-ID].dkr.ecr.us-east-1.amazonaws.com/[APP-NAME]:latest",
"essential": true,
"portMappings": [
{
"containerPort": 80,
"protocol": "tcp"
}
],
"environment": [
{
"name": "RAILS_ENV",
"value": "production"
},
{
"name": "RAILS_LOG_TO_STDOUT",
"value": "true"
}
],
"secrets": [
{
"name": "RAILS_MASTER_KEY",
"valueFrom": "arn:aws:secretsmanager:us-east-1:[ACCOUNT-ID]:secret:[APP-NAME]/rails_master_key"
}
],
"healthCheck": {
"command": ["CMD-SHELL", "curl -f http://localhost/health/check || exit 1"],
"interval": 30,
"timeout": 5,
"retries": 3,
"startPeriod": 60
},
"logConfiguration": {
"logDriver": "awslogs",
"options": {
"awslogs-group": "/ecs/[APP-NAME]",
"awslogs-region": "us-east-1",
"awslogs-stream-prefix": "ecs"
}
}
}
]
}
4.5 register task definition and create service
# create cloudwatch log group
aws logs create-log-group --log-group-name /ecs/[APP-NAME]
# register task definition
TASK_DEF_ARN=$(aws ecs register-task-definition \
--cli-input-json file://ecs-task-definition.json \
--query 'taskDefinition.taskDefinitionArn' --output text)
# create ecs service
aws ecs create-service \
--cluster [APP-NAME]-cluster \
--service-name [APP-NAME]-service \
--task-definition $TASK_DEF_ARN \
--desired-count 2 \
--launch-type FARGATE \
--network-configuration "awsvpcConfiguration={
subnets=[$SUBNET1,$SUBNET2],
securityGroups=[$ECS_SG],
assignPublicIp=ENABLED
}" \
--load-balancers "targetGroupArn=$TG_ARN,containerName=[APP-NAME],containerPort=80"
step 5: deployment and testing
5.1 monitor deployment
# check service status
aws ecs describe-services --cluster [APP-NAME]-cluster --services [APP-NAME]-service --region us-east-1
# check target health
aws elbv2 describe-target-health --target-group-arn $TG_ARN --region us-east-1
# view logs
aws logs get-log-events --log-group-name /ecs/[APP-NAME] --log-stream-name [LOG-STREAM] --region us-east-1
5.2 test health checks
# get alb dns name
ALB_DNS=$(aws elbv2 describe-load-balancers \
--load-balancer-arns $ALB_ARN \
--query 'LoadBalancers[0].DNSName' --output text)
# test health check endpoint
curl http://$ALB_DNS/health/check
expected response:
{
  "status": "ok",
  "timestamp": "2025-09-13T21:28:14Z",
  "rails_version": "8.0.2.1",
  "environment": "production"
}
common issues and solutions
container permission errors
issue: permission denied - bind(2) for "0.0.0.0" port 80
solution options:
option a: use non-privileged port (recommended for security)
# run as non-root user on port 3000
USER rails:rails
EXPOSE 3000
CMD ["./bin/rails", "server", "-b", "0.0.0.0", "-p", "3000"]
# update alb target group to port 3000
# update ecs security group to allow port 3000 from alb
option b: use thruster proxy (better performance)
# run as root to bind privileged port, but thruster drops privileges
ENV TARGET_PORT=3000
ENV HTTP_PORT=80
EXPOSE 80
CMD ["./bin/thrust", "./bin/rails", "server", "-b", "0.0.0.0", "-p", "3000"]
# benefits: http/2, compression, static file serving, caching
# security: thruster runs as root but rails process runs as rails user
thruster benefits you lose with option a:
- http/2 support
- automatic compression (gzip/brotli)
- static file serving optimizations
- built-in caching
- x-sendfile support for efficient file downloads
health check failures
issue: alb showing 502/503 errors
solutions:
- verify health check path matches your rails route
- ensure container is listening on the correct port
- check security group allows alb → ecs communication
- review container logs for startup errors
issue: exec format error in container logs
solution: build for correct architecture:
docker buildx build --platform linux/amd64 -t [APP-NAME]:latest .
security considerations
best practices implemented
- secrets management: sensitive data stored in aws secrets manager
- network security: security groups restrict access between components
- least privilege: iam roles with minimal required permissions
- container security: multi-stage builds reduce attack surface
security group rules
with option a (port 3000):
- alb sg: allow http (80) from internet
- ecs sg: allow http (3000) only from alb sg
- alb handles port 80 → 3000 mapping
with option b (thruster):
- alb sg: allow http (80) from internet
- ecs sg: allow http (80) only from alb sg
- thruster handles http optimizations
security trade-offs
option a (non-privileged port):
- ✅ better: no root processes
- ✅ better: principle of least privilege
- ❌ worse: no http/2, compression, caching
- ❌ worse: higher resource usage for static files
option b (thruster):
- ✅ better: http/2, compression, optimizations
- ✅ better: rails process still runs as non-root
- ⚠️ acceptable: thruster proxy runs as root (industry standard)
- ⚠️ acceptable: container isolation provides security boundary
recommendation: use thruster (option b) unless you have strict security requirements that prohibit any root processes.
cost optimization
fargate pricing factors
- cpu allocation: 256 cpu units (0.25 vcpu)
- memory allocation: 512 mb ram
- running time: pay per second, minimum 1 minute
cost-saving tips
- right-size resources: start small, monitor, and adjust
- use spot pricing: for non-critical workloads
- scale to zero: during low-traffic periods
- monitor usage: cloudwatch metrics for optimization
monitoring and logging
cloudwatch integration
- container logs: automatically streamed to cloudwatch
- metrics: cpu, memory, network utilization
- alarms: set up alerts for health check failures
health check monitoring
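# create cloudwatch alarm for unhealthy targets
# cloudwatch dimensions take the arn suffixes, not the full arns,
# and UnHealthyHostCount is reported per target group + load balancer pair
TG_DIM=${TG_ARN##*:}                # targetgroup/<name>/<id>
ALB_DIM=${ALB_ARN#*:loadbalancer/}  # app/<name>/<id>
aws cloudwatch put-metric-alarm \
--alarm-name "[APP-NAME]-unhealthy-targets" \
--alarm-description "alb has unhealthy targets" \
--metric-name UnHealthyHostCount \
--namespace AWS/ApplicationELB \
--statistic Average \
--period 300 \
--evaluation-periods 2 \
--threshold 0 \
--comparison-operator GreaterThanThreshold \
--dimensions Name=TargetGroup,Value=$TG_DIM Name=LoadBalancer,Value=$ALB_DIM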
deployment commands summary
here’s the complete sequence of commands to deploy your rails app:
# 1. build and push image
docker buildx build --platform linux/amd64 -t [APP-NAME]:latest .
aws ecr get-login-password --region us-east-1 | docker login --username AWS --password-stdin [ACCOUNT-ID].dkr.ecr.us-east-1.amazonaws.com
docker tag [APP-NAME]:latest [ACCOUNT-ID].dkr.ecr.us-east-1.amazonaws.com/[APP-NAME]:latest
docker push [ACCOUNT-ID].dkr.ecr.us-east-1.amazonaws.com/[APP-NAME]:latest
# 2. create infrastructure
aws ecs create-cluster --cluster-name [APP-NAME]-cluster --region us-east-1
aws logs create-log-group --log-group-name /ecs/[APP-NAME] --region us-east-1
# 3. register task definition and deploy
aws ecs register-task-definition --cli-input-json file://ecs-task-definition.json --region us-east-1
aws ecs create-service --cluster [APP-NAME]-cluster --service-name [APP-NAME]-service --task-definition [APP-NAME]-task:1 --desired-count 2 --launch-type FARGATE --network-configuration "awsvpcConfiguration={subnets=[subnet-ids],securityGroups=[ecs-sg-id],assignPublicIp=ENABLED}" --load-balancers "targetGroupArn=[tg-arn],containerName=[APP-NAME],containerPort=80" --region us-east-1
# 4. test deployment
curl http://[alb-dns]/health/check
high availability and reliability patterns
current availability with 2 containers
our basic deployment with desired-count: 2 provides:
- basic redundancy: if one container fails, traffic routes to the healthy container
- rolling updates: ecs can update one container at a time without downtime
- automatic recovery: failed containers are automatically restarted
- estimated availability: ~99.5% (basic level)
achieving higher availability (99.9%+)
for production applications requiring maximum uptime, implement these patterns:
1. multi-az deployment with increased capacity
{
"serviceName": "[APP-NAME]-service-ha",
"desiredCount": 4,
"deploymentConfiguration": {
"maximumPercent": 200,
"minimumHealthyPercent": 50,
"deploymentCircuitBreaker": {
"enable": true,
"rollback": true
}
},
"networkConfiguration": {
"awsvpcConfiguration": {
"subnets": ["subnet-1a", "subnet-1b", "subnet-1c"],
"securityGroups": ["sg-ecs"],
"assignPublicIp": "ENABLED"
}
}
}
benefits:
- 4 containers across 3 availability zones
- can lose entire az and maintain service
- circuit breaker automatically rolls back failed deployments
- deployment flexibility allows 100% capacity increase during deployments
2. auto scaling configuration
# create auto scaling target
aws application-autoscaling register-scalable-target \
--service-namespace ecs \
--resource-id service/[APP-NAME]-cluster/[APP-NAME]-service \
--scalable-dimension ecs:service:DesiredCount \
--min-capacity 4 \
--max-capacity 20
# cpu-based scaling policy
aws application-autoscaling put-scaling-policy \
--service-namespace ecs \
--resource-id service/[APP-NAME]-cluster/[APP-NAME]-service \
--scalable-dimension ecs:service:DesiredCount \
--policy-name cpu-scaling \
--policy-type TargetTrackingScaling \
--target-tracking-scaling-policy-configuration '{
"TargetValue": 70.0,
"PredefinedMetricSpecification": {
"PredefinedMetricType": "ECSServiceAverageCPUUtilization"
},
"ScaleOutCooldown": 300,
"ScaleInCooldown": 300
}'
3. enhanced health checks
extend your health controller for comprehensive monitoring:
# app/controllers/health_controller.rb
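class HealthController < ApplicationController
  # captured once at boot so uptime reflects this process, not the host
  BOOTED_AT = Process.clock_gettime(Process::CLOCK_MONOTONIC)

  def check
    health_data = {
      status: "ok",
      timestamp: Time.current.iso8601,
      rails_version: Rails.version,
      environment: Rails.env,
      uptime: uptime_seconds,
      memory: memory_usage,
      checks: {
        # add redis and storage checks here once you define them;
        # each should return a hash with a :status key, like database_check
        database: database_check
      }
    }
    healthy = health_data[:checks].values.all? { |check| check[:status] == "ok" }
    # don't report "ok" in the body while returning a 503
    health_data[:status] = "error" unless healthy
    render json: health_data, status: healthy ? :ok : :service_unavailable
  end

  private

  def database_check
    started = Process.clock_gettime(Process::CLOCK_MONOTONIC)
    ActiveRecord::Base.connection.execute("SELECT 1")
    elapsed_ms = ((Process.clock_gettime(Process::CLOCK_MONOTONIC) - started) * 1000).round(1)
    { status: "ok", response_time_ms: elapsed_ms }
  rescue => e
    { status: "error", message: e.message }
  end

  def memory_usage
    {
      # needs the ps binary in the image (procps on debian-based images)
      rss_mb: `ps -o rss= -p #{Process.pid}`.strip.to_i / 1024,
      gc_count: GC.count,
      heap_slots: GC.stat[:heap_live_slots]
    }
  end

  def uptime_seconds
    # Process::CLOCK_UPTIME is not defined on linux, so measure from process boot instead
    (Process.clock_gettime(Process::CLOCK_MONOTONIC) - BOOTED_AT).to_i
  end
end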
4. monitoring and alerting setup
# create comprehensive alarms
aws cloudwatch put-metric-alarm \
--alarm-name "[APP-NAME]-high-cpu" \
--alarm-description "high cpu utilization" \
--metric-name CPUUtilization \
--namespace AWS/ECS \
--statistic Average \
--period 300 \
--evaluation-periods 2 \
--threshold 80 \
--comparison-operator GreaterThanThreshold \
--dimensions Name=ServiceName,Value=[APP-NAME]-service Name=ClusterName,Value=[APP-NAME]-cluster
aws cloudwatch put-metric-alarm \
--alarm-name "[APP-NAME]-response-time" \
--alarm-description "high response time" \
--metric-name TargetResponseTime \
--namespace AWS/ApplicationELB \
--statistic Average \
--period 300 \
--evaluation-periods 3 \
--threshold 2.0 \
--comparison-operator GreaterThanThreshold \
--dimensions Name=LoadBalancer,Value=[ALB-FULL-NAME]
5. graceful shutdown handling
rails applications handle sigterm gracefully by default with puma. configure ecs task definition for proper shutdown timing:
{
"containerDefinitions": [{
"stopTimeout": 30,
"healthCheck": {
"command": ["CMD-SHELL", "curl -f http://localhost/health/check || exit 1"],
"interval": 15,
"timeout": 5,
"retries": 3,
"startPeriod": 45
}
}]
}
availability comparison
| pattern | containers | azs | estimated availability | recovery time |
| basic | 2 | 2 | 99.5% | 2-3 minutes |
| enhanced | 4 | 3 | 99.9% | 30 seconds |
| enterprise | 6+ | 3+ | 99.95%+ | 10 seconds |
cost vs availability trade-offs
basic deployment (2 containers):
- cost: ~$30/month for small workloads
- availability: sufficient for internal tools, staging
- recovery: manual intervention may be needed
high availability (4+ containers):
- cost: ~$60-120/month depending on scale
- availability: production-ready for business applications
- recovery: automatic with circuit breakers
enterprise (6+ containers + auto-scaling):
- cost: variable, $100-500+/month based on traffic
- availability: mission-critical applications
- recovery: instant failover across multiple zones
deployment pipeline for ha
# 1. build and test
docker buildx build --platform linux/amd64 -t [APP-NAME]:latest .
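# run detached with a name so the smoke-test container can be stopped afterwards
# (passing the master key assumes the app needs it to boot in production, as in the task definition)
docker run -d --rm --name [APP-NAME]-smoke -p 3000:80 -e RAILS_MASTER_KEY="$(cat config/master.key)" [APP-NAME]:latest
sleep 10
curl -f http://localhost:3000/health/check || { docker stop [APP-NAME]-smoke; exit 1; }
docker stop [APP-NAME]-smoke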
# 2. push to ecr
aws ecr get-login-password --region us-east-1 | docker login --username AWS --password-stdin [ACCOUNT-ID].dkr.ecr.us-east-1.amazonaws.com
docker tag [APP-NAME]:latest [ACCOUNT-ID].dkr.ecr.us-east-1.amazonaws.com/[APP-NAME]:$(git rev-parse --short HEAD)
docker push [ACCOUNT-ID].dkr.ecr.us-east-1.amazonaws.com/[APP-NAME]:$(git rev-parse --short HEAD)
# 3. update task definition with new image
GIT_SHA=$(git rev-parse --short HEAD)
sed "s/:latest/:$GIT_SHA/g" ecs-task-definition.json > ecs-task-definition-$GIT_SHA.json
# capture the new revision's arn so step 4 deploys exactly what was registered
TASK_DEF_ARN=$(aws ecs register-task-definition \
--cli-input-json file://ecs-task-definition-$GIT_SHA.json \
--query 'taskDefinition.taskDefinitionArn' --output text)
# 4. update service (ecs handles rolling deployment)
aws ecs update-service \
--cluster [APP-NAME]-cluster \
--service [APP-NAME]-service \
--task-definition $TASK_DEF_ARN
# 5. wait for deployment to complete
aws ecs wait services-stable --cluster [APP-NAME]-cluster --services [APP-NAME]-service
recommended ha configuration
for most production applications, this configuration provides excellent availability:
{
"desiredCount": 4,
"deploymentConfiguration": {
"maximumPercent": 150,
"minimumHealthyPercent": 75,
"deploymentCircuitBreaker": {
"enable": true,
"rollback": true
}
},
"healthCheckGracePeriodSeconds": 60
}
key benefits:
- 4 containers provide redundancy across az failures
- 75% minimum ensures 3 containers always running during deployments
- circuit breaker prevents bad deployments from taking down service
- reasonable costs while maintaining high availability
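the percentages translate into task counts like this. a small go sketch, assuming (as the ecs documentation describes) that the minimum healthy count is rounded up and the maximum count is rounded down; deploymentBounds is a hypothetical helper, not part of any aws sdk.

```go
package main

import "fmt"

// deploymentBounds returns the fewest and most tasks a rolling update may run.
// Assumption: minimumHealthyPercent rounds up, maximumPercent rounds down.
func deploymentBounds(desired, minHealthyPct, maxPct int) (minTasks, maxTasks int) {
	minTasks = (desired*minHealthyPct + 99) / 100 // integer ceiling
	maxTasks = desired * maxPct / 100             // integer division rounds down
	return minTasks, maxTasks
}

func main() {
	lo, hi := deploymentBounds(4, 75, 150)
	fmt.Printf("recommended config: between %d and %d tasks during a deploy\n", lo, hi)

	lo, hi = deploymentBounds(4, 50, 200)
	fmt.Printf("multi-az example:   between %d and %d tasks during a deploy\n", lo, hi)
}
```

so the 75/150 settings keep at least 3 of 4 tasks serving and allow up to 6 in total, while the earlier 50/200 example can drop to 2 and burst to 8.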
conclusion
this deployment approach provides:
- scalable architecture that grows with your application
- high availability across multiple azs with configurable redundancy levels
- proper health monitoring with comprehensive alb and container health checks
- security best practices with secrets management
- cost-effective operations with serverless containers
- reliability patterns including auto-scaling, circuit breakers, and graceful shutdowns
the combination of alb health checks, ecs service management, and proper application health endpoints creates a robust production deployment that can achieve 99.9%+ availability for business-critical applications.
for production environments, consider adding:
- database integration (rds with multi-az)
- ssl/tls termination at alb
- cdn (cloudfront) for global performance
- comprehensive monitoring and alerting
- backup and disaster recovery strategies
- blue/green or canary deployments
repository structure
├── dockerfile # container definition
├── docker-compose.yml # local development
├── ecs-task-definition.json # ecs configuration
├── app/
│ └── controllers/
│ └── health_controller.rb
├── config/
│ └── routes.rb
└── bin/
└── docker-entrypoint
this guide demonstrates a complete production-ready rails deployment on aws using modern containerization and infrastructure practices.
17 Aug 2025
When I decided to build a modern IRC client for the terminal, I wanted something more sophisticated than the typical ncurses-based applications. Enter Bubble Tea, Charm’s powerful framework for building terminal user interfaces in Go. In this post, I’ll walk through how Bubble Tea works and how I used it to create a feature-rich IRC client.
What is Bubble Tea?
Bubble Tea is based on The Elm Architecture, bringing functional programming concepts to terminal UIs. It follows a simple pattern:
- Model: Your application state
- Update: A function that modifies state based on messages
- View: A function that renders the current state
This architecture makes applications predictable, testable, and easy to reason about.
The Elm Architecture in Bubble Tea
According to the Bubble Tea repository, it’s “based on the functional design paradigms of The Elm Architecture”. Here’s how it works:
The Four Pillars
Every Bubble Tea program consists of:
- Model: A struct that holds your entire application state
- Init(): Returns any startup commands (the initial model itself is the value you pass to tea.NewProgram)
- Update(msg tea.Msg): Receives messages and returns an updated model plus an optional command
- View(): Takes the model and returns a string representation
Here’s the minimal interface every Bubble Tea program must implement:
type Model interface {
Init() Cmd
Update(Msg) (Model, Cmd)
View() string
}
How It Works
A key concept: You implement these methods, but you never call them. The framework calls your code:
// What you write:
func (m IRCModel) Update(msg tea.Msg) (tea.Model, tea.Cmd) {
switch msg := msg.(type) {
case tea.KeyMsg:
// Handle user input
case msgConnected:
// Handle IRC connection
}
return m, nil
}
func main() {
	p := tea.NewProgram(InitialModel())
	// You call Run once, then Bubble Tea takes over until the program exits
	if _, err := p.Run(); err != nil {
		log.Fatal(err) // needs the "log" import
	}
}
Inside p.Run(), Bubble Tea’s event loop calls your methods:
// What Bubble Tea does (you never write this):
for {
select {
case msg := <-p.msgs:
model, cmd = model.Update(msg) // Framework calls YOUR Update
handleCommand(cmd) // Framework handles returned command
render(model.View()) // Framework calls YOUR View
}
}
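To make the inversion of control concrete, here is a standard-library-only sketch of the same loop. It does not import Bubble Tea: the msg, cmd, model, and run names are stand-ins I made up to mirror tea.Msg, tea.Cmd, a model, and the framework's loop, so treat it as an illustration of the pattern rather than the library's implementation.

```go
package main

import (
	"fmt"
	"strings"
)

// msg and cmd are stand-ins for tea.Msg and tea.Cmd.
type msg any
type cmd func() msg

type keyMsg string
type quitMsg struct{}

// model is the whole application state: the keys typed so far.
type model struct{ typed []string }

// update returns a new model instead of mutating the old one.
func (m model) update(in msg) (model, cmd) {
	switch in := in.(type) {
	case keyMsg:
		next := append(append([]string{}, m.typed...), string(in))
		return model{typed: next}, nil
	}
	return m, nil
}

// view renders the state as a string and has no side effects.
func (m model) view() string { return "input: " + strings.Join(m.typed, "") }

// run plays the role of the framework: it owns the loop and calls update and view.
func run(m model, msgs <-chan msg) (model, []string) {
	var frames []string
	for in := range msgs {
		if _, ok := in.(quitMsg); ok {
			break
		}
		var c cmd
		m, c = m.update(in)
		_ = c // a real framework would execute the command in the background
		frames = append(frames, m.view())
	}
	return m, frames
}

func main() {
	msgs := make(chan msg, 3)
	msgs <- keyMsg("h")
	msgs <- keyMsg("i")
	msgs <- quitMsg{}
	close(msgs)

	_, frames := run(model{}, msgs)
	for _, f := range frames {
		fmt.Println(f)
	}
}
```

The application code (update and view) never touches the loop, which is why the same functions can be driven by a test, a replay log, or a real terminal.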
The Message Flow
The genius of this architecture is its unidirectional data flow:
┌─────────────────┐
│ │
│ Model │◄─────────────┐
│ │ │
└────────┬────────┘ │
│ │
▼ │
┌─────────────────┐ │
│ │ │
│ View │ │
│ │ │
└────────┬────────┘ │
│ │
▼ │
Terminal │
Display │
│ │
User Input │
│ │
▼ │
┌─────────────────┐ │
│ │ │
│ Update │──────────────┘
│ │
└─────────────────┘
Messages flow in one direction: User input → Update → Model → View → Display.
Why this matters: In traditional UI programming, different parts of your app can modify state directly, leading to chaos:
// Traditional approach - multiple places changing state
func onKeyPress() {
sidebar.addChannel("#golang")
chatArea.updateUserCount(42)
statusBar.setConnected(true)
// Who changed what? When? In what order?
}
func onNetworkEvent() {
sidebar.removeUser("bob")
chatArea.addMessage("bob left")
// Now sidebar and chat area might be out of sync!
}
With Bubble Tea’s unidirectional flow, only one place can change state:
// Bubble Tea approach - all changes go through Update
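func (m IRCModel) Update(msg tea.Msg) (tea.Model, tea.Cmd) {
	switch msg := msg.(type) {
	case UserLeftMsg:
		// Remove from users list. channelUsers holds slices, so filter instead of
		// calling delete (slices.DeleteFunc needs Go 1.21+ and the "slices" import)
		m.channelUsers[msg.channel] = slices.DeleteFunc(
			m.channelUsers[msg.channel],
			func(u string) bool { return u == msg.user },
		)
		// Add to message history
		m.addMessage(msg.channel, fmt.Sprintf("%s left", msg.user))
		// State is always consistent!
	}
	return m, nil
}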
This guarantees your UI state is always consistent because there’s only one path for changes.
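A side benefit of the single update path is that it can be checked without a terminal. The sketch below uses only the standard library; the state type and applyUserLeft function are hypothetical stand-ins for the model and the UserLeftMsg branch, not code from the client.

```go
package main

import (
	"fmt"
	"slices"
)

type userLeftMsg struct{ channel, user string }

// state holds just the two fields the user-left handling touches.
type state struct {
	channelUsers map[string][]string
	allMessages  map[string][]string
}

// applyUserLeft updates the user list and the message history in one step,
// so there is no moment where one reflects the departure and the other does not.
func applyUserLeft(s state, m userLeftMsg) state {
	users := slices.Clone(s.channelUsers[m.channel])
	users = slices.DeleteFunc(users, func(u string) bool { return u == m.user })
	s.channelUsers[m.channel] = users
	s.allMessages[m.channel] = append(s.allMessages[m.channel], fmt.Sprintf("%s left", m.user))
	return s
}

func main() {
	s := state{
		channelUsers: map[string][]string{"#golang": {"alice", "bob"}},
		allMessages:  map[string][]string{},
	}
	s = applyUserLeft(s, userLeftMsg{channel: "#golang", user: "bob"})
	fmt.Println(s.channelUsers["#golang"], s.allMessages["#golang"]) // [alice] [bob left]
}
```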
Why This Matters for Terminal UIs
Traditional terminal UI libraries like ncurses use imperative updates:
// ncurses - imperative, stateful
mvprintw(10, 20, "Status: ");
if (connected) {
attron(COLOR_PAIR(GREEN));
printw("Connected");
} else {
attron(COLOR_PAIR(RED));
printw("Disconnected");
}
refresh();
With Bubble Tea’s Elm Architecture:
// Bubble Tea - declarative, functional
func (m Model) View() string {
status := "Disconnected"
if m.connected {
status = "Connected"
}
return fmt.Sprintf("Status: %s", status)
}
The framework handles all the diffing, rendering, and optimization. You just describe what you want to see.
The Core Architecture
Here’s how I structured the IRC client using Bubble Tea:
type IRCModel struct {
// UI components
viewport viewport.Model
sidebarViewport viewport.Model
textarea textarea.Model
// Application state
allMessages map[string][]string
channels map[string]bool
channelUsers map[string][]string
activeChannel string
sidebarFocused bool
connected bool
// Layout
width int
height int
sidebarWidth int
}
The model contains both UI components (viewports, textarea) and application state (channels, messages, users). This separation allows for clean state management while leveraging Bubble Tea’s built-in components.
The Update Loop
The heart of any Bubble Tea application is the Update function, which handles all events:
func (m IRCModel) Update(msg tea.Msg) (tea.Model, tea.Cmd) {
	// Commands returned by the child components
	var tiCmd, vpCmd, svpCmd tea.Cmd

	// Route input based on focus
	if !m.sidebarFocused {
		m.textarea, tiCmd = m.textarea.Update(msg)
		m.viewport, vpCmd = m.viewport.Update(msg)
	} else {
		m.sidebarViewport, svpCmd = m.sidebarViewport.Update(msg)
	}

	switch msg := msg.(type) {
	case tea.WindowSizeMsg:
		m.handleResize(msg)
	case tea.KeyMsg:
		return m.handleKeypress(msg)
	case msgConnected:
		return m.handleConnection(msg)
	case msgReceived:
		return m.handleIRCMessage(msg)
	}
	return m, tea.Batch(tiCmd, vpCmd, svpCmd)
}
Notice how different message types are handled separately. This pattern makes it easy to add new features without breaking existing functionality.
Custom Message Types
One powerful feature of Bubble Tea is custom message types. For IRC, I created specific messages for different network events:
type msgConnected struct {
conn net.Conn
}
type msgReceived struct {
text string
}
type errMsg error
These messages are sent through commands, which are functions that return messages:
func connectToIRC(server, nickname string) tea.Cmd {
return func() tea.Msg {
conn, err := net.Dial("tcp", server)
if err != nil {
return errMsg(err)
}
// Send IRC registration
writer := bufio.NewWriter(conn)
writer.WriteString(fmt.Sprintf("NICK %s\r\n", nickname))
writer.WriteString(fmt.Sprintf("USER %s 0 * :%s\r\n", nickname, nickname))
writer.Flush()
return msgConnected{conn: conn}
}
}
This approach keeps the UI responsive while handling network operations in the background.
Layout with Golden Ratio
For the visual design, I implemented a golden ratio layout to create pleasing proportions:
goldenRatio := 1.618
m.sidebarWidth = int(float64(msg.Width) / (goldenRatio + 1.0))
// Ensure reasonable bounds
if m.sidebarWidth < 15 {
m.sidebarWidth = 15
}
if m.sidebarWidth > 25 {
m.sidebarWidth = 25
}
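This creates a sidebar that’s approximately 38% of the screen width (1 / 2.618), following the golden ratio principle for visual harmony. The bounds then clamp it to 15-25 columns, so the ratio only takes effect on terminals roughly 40 to 65 columns wide.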
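To see how the ratio and the bounds interact, here is the same calculation pulled into a standalone function and run for a few terminal widths. The sidebarWidth name is mine; the arithmetic is copied from the snippet above.

```go
package main

import "fmt"

const goldenRatio = 1.618

// sidebarWidth applies the golden-ratio split, then clamps to 15..25 columns.
func sidebarWidth(termWidth int) int {
	w := int(float64(termWidth) / (goldenRatio + 1.0))
	if w < 15 {
		w = 15
	}
	if w > 25 {
		w = 25
	}
	return w
}

func main() {
	for _, width := range []int{30, 40, 60, 80, 120} {
		fmt.Printf("terminal %3d cols -> sidebar %d cols\n", width, sidebarWidth(width))
	}
}
```

For 30, 40, 60, 80, and 120 columns this gives 15, 15, 22, 25, and 25: on a typical 80-column or wider terminal the upper bound decides the width, not the ratio.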
One challenge was implementing independent scrolling for the sidebar and main chat area. I solved this with a focus system:
case tea.KeyTab:
m.sidebarFocused = !m.sidebarFocused
if m.sidebarFocused {
m.textarea.Blur()
} else {
m.textarea.Focus()
}
When the sidebar is focused, arrow keys scroll through channels and users. When the chat is focused, they scroll through message history. This gives users full control over both areas independently.
Real-time Updates
IRC requires real-time message handling. I set up a continuous message loop:
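// The scanner is created once per connection (for example when msgConnected arrives)
// and reused: bufio.Scanner reads ahead, so a fresh scanner on every call would
// throw away lines it had already buffered.
func waitForMessage(scanner *bufio.Scanner) tea.Cmd {
	return func() tea.Msg {
		if scanner.Scan() {
			return msgReceived{text: scanner.Text()}
		}
		if err := scanner.Err(); err != nil {
			return errMsg(err)
		}
		return nil // connection closed cleanly (EOF)
	}
}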
Each time a message is received, it triggers an update, parses the IRC protocol, and updates the appropriate channel or user list.
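The parsing step is not shown in this post, so here is a minimal sketch of what it might look like, using only the standard library. The ircLine type and parseLine function are hypothetical, and this handles just the basic `[:prefix] COMMAND params [:trailing]` shape, not message tags or other extensions.

```go
package main

import (
	"fmt"
	"strings"
)

// ircLine is a parsed protocol line (hypothetical type for illustration).
type ircLine struct {
	Prefix  string   // sender, e.g. "bob!bob@example.org"; empty if absent
	Command string   // e.g. "PRIVMSG", "PING", "JOIN"
	Params  []string // middle params, with the trailing param last
}

// parseLine splits one raw line into prefix, command, and params.
func parseLine(raw string) (ircLine, bool) {
	raw = strings.TrimRight(raw, "\r\n")
	var l ircLine
	if strings.HasPrefix(raw, ":") {
		prefix, rest, ok := strings.Cut(raw[1:], " ")
		if !ok {
			return l, false
		}
		l.Prefix, raw = prefix, rest
	}
	// Everything after " :" is a single trailing parameter that may contain spaces.
	head, trailing, hasTrailing := strings.Cut(raw, " :")
	fields := strings.Fields(head)
	if len(fields) == 0 {
		return l, false
	}
	l.Command = fields[0]
	l.Params = append(l.Params, fields[1:]...)
	if hasTrailing {
		l.Params = append(l.Params, trailing)
	}
	return l, true
}

func main() {
	line, ok := parseLine(":bob!bob@example.org PRIVMSG #golang :hello world\r\n")
	fmt.Println(ok, line.Command, line.Params)

	nick, _, _ := strings.Cut(line.Prefix, "!")
	fmt.Println("from:", nick)
}
```

A handler for msgReceived could switch on Command to decide whether to append to a channel's history (PRIVMSG), update the user list (JOIN, PART, QUIT), or reply to the server (PING).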
Styling with Lipgloss
Bubble Tea integrates beautifully with Lipgloss for styling. I created adaptive styles that work in both light and dark terminals:
var (
titleStyle = lipgloss.NewStyle().
Foreground(lipgloss.AdaptiveColor{Light: "#FFFFFF", Dark: "#FFFDF5"}).
Background(lipgloss.AdaptiveColor{Light: "#0969DA", Dark: "#25A065"}).
Padding(0, 1)
userStyle = lipgloss.NewStyle().
Foreground(lipgloss.AdaptiveColor{Light: "#1A7F37", Dark: "#7EE787"})
)
This ensures the client looks great regardless of the terminal’s color scheme.
Under the Hood: How Bubble Tea Prevents UI Blocking
Looking at the Bubble Tea source code reveals elegant concurrency patterns that keep the UI responsive. Here’s how it actually works:
The Message Channel Architecture
Bubble Tea uses a central message channel (p.msgs) as the communication hub:
func (p *Program) Send(msg Msg) {
select {
case <-p.ctx.Done():
case p.msgs <- msg:
}
}
This channel allows background goroutines to safely send messages back to the main event loop without blocking.
Command Execution in Goroutines
When you return a tea.Cmd, Bubble Tea spawns a goroutine to execute it:
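// Simplified from the Bubble Tea source
func (p *Program) handleCommands(cmds chan Cmd) chan struct{} {
	ch := make(chan struct{})
	go func() {
		defer close(ch) // signals that command handling has stopped
		for {
			select {
			case <-p.ctx.Done():
				return
			case cmd := <-cmds:
				if cmd == nil {
					continue
				}
				go func() {
					// Each command runs in its own goroutine
					msg := cmd()
					p.Send(msg) // Send result back to main loop
				}()
			}
		}
	}()
	return ch
}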
Key benefits:
- Non-blocking execution - Long-running operations don’t freeze the UI
- Panic handling - Recent versions catch a panic inside a command so the terminal can be restored before the program exits
- Graceful cleanup - Context cancellation stops all goroutines on exit
The Event Loop
The main event loop processes messages sequentially, ensuring thread safety:
func (p *Program) eventLoop(model Model, cmds chan Cmd) (Model, error) {
for {
select {
case msg := <-p.msgs:
// Update model (always on the event-loop goroutine)
var cmd Cmd
model, cmd = model.Update(msg)
// Send new commands for background execution
select {
case cmds <- cmd:
case <-p.ctx.Done():
return model, nil
}
// Render immediately with updated model
p.renderer.write(model.View())
}
}
}
Why This Design Matters
In the IRC client, when connectToIRC() makes a network call:
- Network operation runs in background goroutine (doesn’t block UI)
- User can still type, scroll, resize (UI remains responsive)
- When connection completes, sends msgConnected (thread-safe communication)
- Main loop processes message and updates model (sequential, no race conditions)
- UI re-renders with new state (immediate visual feedback)
This is why you can have dozens of ongoing network operations (IRC reads, user lookups, etc.) without any UI lag or complex synchronization code.
Source Code
You can check out the complete IRC client source code at github.com/sngeth/chat.