Sid Ngeth's Blog A blog about anything (but mostly development)

ai-powered goal setting with smart caching

built a goal-setting app that transforms problems into actionable smart goals, then provides conversational ai guidance for each task. the challenge? making ai interactions fast, cost-effective, and genuinely helpful. the solution? a multi-layer caching approach with semantic similarity matching.

the problem

traditional goal-setting apps are static lists. when users get stuck on “start a 4-day upper lower split” or “learn spanish,” they’re on their own. adding ai help seems obvious, but creates new problems:

  • cost escalation: every “how do I do this?” question hits expensive openai apis
  • response inconsistency: same question gets different answers due to ai randomness
  • context loss: follow-up questions lack memory of the original task
  • user frustration: waiting 2+ seconds for common questions like “what about sets and reps?”

how it works

implemented a simple flow: users describe problems, get structured goals, then ask follow-up questions about how to complete them.

layer 1: problem to smart goals

transforms user problems into specific, measurable, achievable, relevant, and time-bound goals:

// user input: "How do I quit smoking?"
const response = await fetch('/api/generate-goals', {
  method: 'POST',
  headers: {
    'Authorization': `Bearer ${token}`,
    'Content-Type': 'application/json' // body is JSON, so say so
  },
  body: JSON.stringify({ problem: userInput })
});

// generates structured output:
{
  "today": ["throw away all cigarettes and smoking accessories"],
  "month": ["complete 30-day nicotine replacement therapy program"],
  "year": ["maintain smoke-free lifestyle for 365 consecutive days"]
}

uses embedding-based caching - similar problems get similar goals without redundant api calls.

layer 2: contextual how-to guides

each goal gets an interactive ❓ button for ai-powered guidance:

async function showHowTo(goalSetId, listType, taskId) {
  // look up the goal set once; it is needed for both the task and the context
  const goalSet = goalSets[goalSetId];
  const task = goalSet[listType].find(t => t.id === taskId);
  
  const response = await fetch('/api/how-to', {
    method: 'POST',
    headers: { 'Content-Type': 'application/json' },
    body: JSON.stringify({
      taskText: task.text,
      goalContext: {
        goalSetName: goalSet.name,
        timeframe: listType // today, month, year
      }
    })
  });
  
  return response.json(); // structured guide with steps, tips, timing
}

returns structured guidance:

  • overview: brief explanation of the task
  • steps: actionable numbered instructions
  • proTip: expert advice or common pitfalls
  • timeNeeded: realistic estimates
  • difficulty: complexity assessment
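
a client can sanity-check that shape before rendering it. this is a minimal sketch, not the app's code: isValidGuide and the sample values are hypothetical, and the field types are assumed, since only the field names are listed above.

```javascript
// Hypothetical validator: field names come from the list above,
// the types (string vs. array of strings) are assumptions.
function isValidGuide(guide) {
  if (guide === null || typeof guide !== 'object') return false;

  const stringFields = ['overview', 'proTip', 'timeNeeded', 'difficulty'];
  const hasStrings = stringFields.every(
    (field) => typeof guide[field] === 'string' && guide[field].trim() !== ''
  );

  const hasSteps =
    Array.isArray(guide.steps) &&
    guide.steps.length > 0 &&
    guide.steps.every((step) => typeof step === 'string');

  return hasStrings && hasSteps;
}

// Made-up sample data for illustration only
const sampleGuide = {
  overview: 'split training across four sessions per week',
  steps: ['pick four training days', 'alternate upper and lower sessions'],
  proTip: 'leave a rest day between repeated muscle groups',
  timeNeeded: '45-60 minutes per session',
  difficulty: 'intermediate'
};

console.log(isValidGuide(sampleGuide));       // true
console.log(isValidGuide({ overview: 'x' })); // false
```

rejecting malformed guides early also keeps them out of any cache that sits behind the endpoint.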

layer 3: conversational follow-ups

users can ask clarifying questions that maintain full context:

// conversation state preserved across questions
currentConversation = {
  taskId: "start_4_day_split_123",
  goalSetId: "fitness_goals", 
  listType: "today",
  messages: [
    { type: 'assistant', content: originalGuide },
    { type: 'user', content: "What about sets and reps?" },
    { type: 'assistant_followup', content: "For upper body days..." }
  ]
};
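
one way to keep that state manageable is to treat it as immutable and build a new object per follow-up. the helpers below are a sketch under that assumption: createConversation and appendFollowUp are hypothetical names, and only the message shape is taken from the object above.

```javascript
// Hypothetical helpers; the message shape mirrors the state object above.
function createConversation(taskId, goalSetId, listType, originalGuide) {
  return {
    taskId,
    goalSetId,
    listType,
    messages: [{ type: 'assistant', content: originalGuide }]
  };
}

// Returns a new conversation with the question/answer pair appended,
// leaving the previous state object untouched.
function appendFollowUp(conversation, question, answer) {
  return {
    ...conversation,
    messages: [
      ...conversation.messages,
      { type: 'user', content: question },
      { type: 'assistant_followup', content: answer }
    ]
  };
}

const started = createConversation('start_4_day_split_123', 'fitness_goals', 'today', 'guide text');
const updated = appendFollowUp(started, 'What about sets and reps?', 'For upper body days...');

console.log(updated.messages.length); // 3
console.log(started.messages.length); // 1 (original state is not mutated)
```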

understanding openai embeddings

before diving into caching, it’s crucial to understand what embeddings are and why they’re revolutionary for text similarity.

what are embeddings?

embeddings convert text into high-dimensional vectors that capture semantic meaning. think of them as “fingerprints” for concepts - similar ideas get similar fingerprints.

openai’s text-embedding-3-small model transforms any text into a 1536-dimensional array of floating-point numbers:

async function getEmbedding(text, apiKey) {
  const response = await fetch('https://api.openai.com/v1/embeddings', {
    method: 'POST',
    headers: {
      'Authorization': `Bearer ${apiKey}`,
      'Content-Type': 'application/json'
    },
    body: JSON.stringify({
      model: 'text-embedding-3-small',
      input: text.toLowerCase().trim(),
      encoding_format: 'float'
    })
  });
  
  const data = await response.json();
  return data.data[0].embedding; // array of 1536 numbers
}

// "start a 4 day upper lower split" becomes:
// [0.12, -0.05, 0.73, 0.41, -0.19, 0.84, ...]  // 1536 floating-point numbers

the magic of semantic similarity

the breakthrough insight: semantically similar text produces similar vectors, even with completely different words.

// these all get embedded into nearby points in 1536-dimensional space:
"start a 4 day upper lower split"     // [0.12, -0.05, 0.73, 0.41, ...]
"begin 4-day upper/lower routine"     // [0.11, -0.04, 0.74, 0.42, ...]  
"initiate four day upper-lower workout" // [0.13, -0.06, 0.72, 0.40, ...]

// while completely different concepts are far apart:
"bake chocolate chip cookies"         // [0.89, 0.34, -0.12, -0.67, ...]

the ai model has learned through massive training that these phrases represent the same underlying concept, despite using different words.

measuring similarity with cosine similarity

cosine similarity measures the angle between two vectors in high-dimensional space. mathematically the score ranges from -1.0 to 1.0; for text embeddings like these it almost always lands between 0.0 and 1.0:

function cosineSimilarity(vecA, vecB) {
  // calculate dot product (how aligned the vectors are)
  const dotProduct = vecA.reduce((sum, a, i) => sum + a * vecB[i], 0);
  
  // calculate magnitudes (lengths of the vectors)
  const magnitudeA = Math.sqrt(vecA.reduce((sum, a) => sum + a * a, 0));
  const magnitudeB = Math.sqrt(vecB.reduce((sum, b) => sum + b * b, 0));
  
  // cosine similarity = dot product / (magnitude_a * magnitude_b)
  return dotProduct / (magnitudeA * magnitudeB);
}

// similarity score meanings:
// 0.95-1.0: nearly identical meaning ("start workout" vs "begin workout")
// 0.8-0.95:  very similar concepts ("sets and reps" vs "repetitions")  
// 0.6-0.8:   related but different ("workout" vs "exercise routine")
// 0.0-0.6:   unrelated concepts ("workout" vs "baking cookies")
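
to make the formula concrete, here is the same function run on toy 2-dimensional vectors. the vectors are made up for illustration; note that the function can return negative values for opposing directions, even though real text embeddings rarely score that low.

```javascript
function cosineSimilarity(vecA, vecB) {
  const dot = vecA.reduce((sum, a, i) => sum + a * vecB[i], 0);
  const magA = Math.sqrt(vecA.reduce((sum, a) => sum + a * a, 0));
  const magB = Math.sqrt(vecB.reduce((sum, b) => sum + b * b, 0));
  return dot / (magA * magB);
}

console.log(cosineSimilarity([1, 0], [2, 0]));  // 1  (same direction, length ignored)
console.log(cosineSimilarity([1, 0], [0, 3]));  // 0  (perpendicular)
console.log(cosineSimilarity([1, 0], [-1, 0])); // -1 (opposite direction)
```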

real similarity examples from the app

here are actual similarity scores measured during development:

// high task similarity (0.92) - correctly matched
cosineSimilarity(
  embedding("start a 4 day upper lower split"),
  embedding("begin 4-day upper/lower routine")
) // returns 0.92

// high question similarity (0.87) - correctly matched
cosineSimilarity(
  embedding("what about sets and reps?"),
  embedding("how many sets and repetitions?")
) // returns 0.87

// low task similarity (0.23) - correctly rejected  
cosineSimilarity(
  embedding("start a 4 day upper lower split"),
  embedding("bake chocolate chip cookies")
) // returns 0.23

// medium question similarity (0.64) - correctly rejected
cosineSimilarity(
  embedding("what about sets and reps?"),
  embedding("what about rest periods?")
) // returns 0.64 (related but different question)

why embeddings beat traditional approaches

traditional string matching would completely fail on natural language variations:

// string matching approach (brittle and inflexible)
function isSetRepsQuestion(question) {
  const q = question.toLowerCase();
  return q.includes("sets") && q.includes("reps");
}

isSetRepsQuestion("what about sets and reps?");     // ✅ true
isSetRepsQuestion("how many repetitions per set?"); // ❌ false (missed!)
isSetRepsQuestion("what's the rep and set scheme?"); // ❌ false (missed!)

// embedding approach (semantic understanding)
async function isSetRepsQuestion(question, apiKey) {
  // getEmbedding is async and needs the api key, so both calls are awaited
  const questionEmbedding = await getEmbedding(question, apiKey);
  const referenceEmbedding = await getEmbedding("sets and reps", apiKey);
  return cosineSimilarity(questionEmbedding, referenceEmbedding) >= 0.8;
}

await isSetRepsQuestion("what about sets and reps?", apiKey);     // ✅ true (0.95)
await isSetRepsQuestion("how many repetitions per set?", apiKey); // ✅ true (0.84)  
await isSetRepsQuestion("what's the rep and set scheme?", apiKey); // ✅ true (0.81)

semantic caching approach

armed with understanding embeddings, we can now implement intelligent caching using dual similarity matching:

dual similarity matching

the breakthrough insight: we need both the task context and question intent to match for a valid cache hit.

asking “what about sets and reps?” means completely different things for:

  • “start a 4 day upper lower split” → workout programming advice
  • “bake chocolate chip cookies” → nonsensical question

async function findSimilarFollowUp(taskText, question, env) {
  // convert both text inputs to semantic vectors  
  const taskEmbedding = await getEmbedding(taskText, env.OPENAI_API_KEY);
  const questionEmbedding = await getEmbedding(question, env.OPENAI_API_KEY);
  
  // search all cached responses
  const list = await env.GOAL_CACHE.list({ prefix: 'followup_' });
  
  for (const item of list.keys) {
    const cached = await env.GOAL_CACHE.get(item.name, 'json');
    
    // calculate semantic similarity for both dimensions
    const taskSimilarity = cosineSimilarity(taskEmbedding, cached.taskEmbedding);
    const questionSimilarity = cosineSimilarity(questionEmbedding, cached.questionEmbedding);
    
    // both thresholds must be exceeded for cache hit
    if (taskSimilarity >= 0.85 && questionSimilarity >= 0.8) {
      console.log(`Cache hit: task=${taskSimilarity.toFixed(3)}, question=${questionSimilarity.toFixed(3)}`);
      return cached.answer;
    }
  }
  
  return null; // no semantic match found
}

threshold tuning through experimentation

the similarity thresholds were determined through empirical testing:

  • task similarity: 0.85 - tasks must be very similar
    • fitness tasks vs cooking tasks score ~0.2 (correctly rejected)
    • “upper lower split” vs “upper/lower routine” score ~0.92 (correctly matched)
  • question similarity: 0.8 - questions can vary more linguistically
    • “sets and reps” vs “repetitions” score ~0.87 (correctly matched)
    • “sets and reps” vs “rest periods” score ~0.64 (correctly rejected)

too low and you get wrong answers for the wrong context, too high and you miss valid linguistic variations.
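
the dual-threshold rule can be checked in isolation. the sketch below pairs up similarity scores quoted in this post; the pairings themselves are combined for illustration, and isCacheHit is a hypothetical helper name.

```javascript
const TASK_THRESHOLD = 0.85;
const QUESTION_THRESHOLD = 0.8;

// A cache hit requires BOTH similarities to clear their thresholds
function isCacheHit(taskSimilarity, questionSimilarity) {
  return taskSimilarity >= TASK_THRESHOLD && questionSimilarity >= QUESTION_THRESHOLD;
}

// Scores quoted in this post, paired up for illustration
const cases = [
  { label: 'same task, same question', task: 0.92, question: 0.87, expected: true },
  { label: 'same task, different question', task: 0.92, question: 0.64, expected: false },
  { label: 'different task, same question', task: 0.23, question: 0.87, expected: false }
];

for (const c of cases) {
  const hit = isCacheHit(c.task, c.question);
  console.log(`${c.label}: ${hit ? 'hit' : 'miss'}`);
}
```

keeping a small table of labeled pairs like this makes it easy to re-run the check whenever a threshold changes.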

cache structure evolution: from duplication to referential integrity

the initial cache structure stored complete task embeddings with every follow-up question:

// inefficient: task embeddings duplicated across follow-ups
"followup_1703512345_abc123" → {
  taskText: "start a 4 day Upper Lower Split",
  taskEmbedding: [0.1, 0.2, 0.3, ...], // 1536 dimensions - DUPLICATED
  question: "What about sets and reps?",
  questionEmbedding: [0.7, 0.8, 0.9, ...],
  answer: "For upper body days, aim for 3-4 sets of 8-12 reps...",
  hitCount: 23,
  timestamp: "2024-01-15T10:30:00Z"
}

the optimized structure uses a relational approach with task hash references:

// task embeddings stored once and referenced by hash
"task_embedding_abc123" → {
  taskText: "start a 4 day Upper Lower Split", 
  taskEmbedding: [0.1, 0.2, 0.3, ...], // 1536 dimensions - STORED ONCE
  taskHash: "abc123",
  usageCount: 15,
  timestamp: "2024-01-15T10:30:00Z"
}

// follow-up entries reference the task by hash
"followup_v2_abc123_def456_1703512345" → {
  taskHash: "abc123", // FOREIGN KEY REFERENCE
  question: "What about sets and reps?",
  questionEmbedding: [0.7, 0.8, 0.9, ...],
  answer: "For upper body days, aim for 3-4 sets of 8-12 reps...",
  hitCount: 23,
  timestamp: "2024-01-15T10:30:00Z"
}
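
a sketch of how such keys could be derived. the post does not say which hash function the app uses, so the 32-bit FNV-1a below is a stand-in, and normalizeTask, taskEmbeddingKey and followupKey are hypothetical helpers. treating the second key segment as a question hash is also an assumption.

```javascript
// Stand-in hash (32-bit FNV-1a); the app's real hash function is not stated.
function fnv1a(text) {
  let hash = 0x811c9dc5;
  for (let i = 0; i < text.length; i++) {
    hash ^= text.charCodeAt(i);
    hash = Math.imul(hash, 0x01000193) >>> 0; // keep it an unsigned 32-bit value
  }
  return hash.toString(16).padStart(8, '0');
}

// Normalize so trivial differences (case, extra spaces) share one key
function normalizeTask(text) {
  return text.toLowerCase().trim().replace(/\s+/g, ' ');
}

function taskEmbeddingKey(taskText) {
  return `task_embedding_${fnv1a(normalizeTask(taskText))}`;
}

function followupKey(taskText, question, timestamp) {
  const taskHash = fnv1a(normalizeTask(taskText));
  const questionHash = fnv1a(normalizeTask(question));
  return `followup_v2_${taskHash}_${questionHash}_${timestamp}`;
}

const keyA = taskEmbeddingKey('Start a 4 day  Upper Lower Split ');
const keyB = taskEmbeddingKey('start a 4 day upper lower split');
console.log(keyA === keyB); // true: both normalize to the same text
```

a text hash only deduplicates tasks whose normalized text is identical; recognizing reworded tasks still needs the embedding comparison.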

task embedding lookup mechanism

when a cached follow-up is found, the system performs a two-step lookup to verify task similarity:

// step 1: find potential follow-up cache matches
const followupList = await env.GOAL_CACHE.list({ prefix: 'followup_v2_' });

for (const item of followupList.keys) {
  const cached = await env.GOAL_CACHE.get(item.name, 'json');
  
  if (cached && cached.taskHash && cached.questionEmbedding) {
    // step 2: lookup task embedding by hash reference
    const cachedTaskEmbedding = await getTaskEmbedding(cached.taskHash, env);
    
    if (cachedTaskEmbedding) {
      // step 3: verify both task and question similarity
      const taskSimilarity = cosineSimilarity(currentTaskEmbedding, cachedTaskEmbedding);
      const questionSimilarity = cosineSimilarity(currentQuestionEmbedding, cached.questionEmbedding);
      
      if (taskSimilarity >= 0.85 && questionSimilarity >= 0.8) {
        return cached.answer; // cache hit!
      }
    }
  }
}

the getTaskEmbedding() function resolves the hash reference:

async function getTaskEmbedding(taskHash, env) {
  const taskCacheKey = `task_embedding_${taskHash}`;
  const cachedTask = await env.GOAL_CACHE.get(taskCacheKey, 'json');
  
  if (cachedTask && cachedTask.taskEmbedding) {
    return cachedTask.taskEmbedding; // return the 1536-dimensional vector
  }
  
  return null; // task embedding not found (shouldn't happen in normal operation)
}

the reality: javascript does all the work

important distinction: cloudflare kv is a simple key-value store with no query capabilities. it only supports:

  • get(key) - retrieve value by exact key match
  • put(key, value) - store value at key
  • list({ prefix }) - list keys with a given prefix
  • delete(key) - remove key

kv does not have:

  • sql queries
  • indexing beyond key prefixes
  • semantic search capabilities
  • similarity functions
  • relational joins

this means all semantic matching happens in javascript:

// what actually happens during cache lookup:

// 1. javascript fetches ALL follow-up cache entries (brute force)
const followupList = await env.GOAL_CACHE.list({ prefix: 'followup_v2_', limit: 50 });

// 2. javascript loops through each entry one by one
for (const item of followupList.keys) {
  const cached = await env.GOAL_CACHE.get(item.name, 'json'); // 🔥 KV API CALL
  
  // 3. javascript makes ANOTHER kv call to get the task embedding
  const cachedTaskEmbedding = await getTaskEmbedding(cached.taskHash, env); // 🔥 ANOTHER KV API CALL
  
  // 4. javascript calculates cosine similarity in memory
  const taskSimilarity = cosineSimilarity(currentTaskEmbedding, cachedTaskEmbedding);
  const questionSimilarity = cosineSimilarity(currentQuestionEmbedding, cached.questionEmbedding);
  
  // 5. javascript evaluates thresholds
  if (taskSimilarity >= 0.85 && questionSimilarity >= 0.8) {
    return cached.answer; // found match!
  }
}

performance implications:

  • searching 50 follow-up entries = 50 get() calls to kv
  • each entry requires another get() call for task embedding = 50 more calls
  • total: 100 kv api calls for a single cache lookup
  • each call has ~5-15ms latency from edge to kv storage
  • semantic similarity calculations happen in cloudflare’s v8 javascript runtime

this is why we limit searches (limit: 50) and use early return on first match - the “database” is actually just a distributed hashtable with javascript doing all the intelligent work.

why O(1) lookup is impossible with embeddings

the fundamental problem: you cannot create deterministic cache keys from semantic similarity.

// two questions that mean the same thing to humans
const question1 = "what about sets and reps?";
const question2 = "how many repetitions?";

// but their embeddings are similar, not identical, number arrays
embedding(question1) → [0.123, 0.456, 0.789, ...]  // 1536 numbers
embedding(question2) → [0.187, 0.423, 0.801, ...]  // 1536 different numbers

// to create cache keys, we hash the text (not the embeddings)
hash(question1) → "abc123"  // deterministic based on exact text
hash(question2) → "xyz789"  // different text = different hash

// KV can only find exact key matches
await env.GOAL_CACHE.get("followup_task1_abc123"); // ✅ finds cached answer for "sets and reps"
await env.GOAL_CACHE.get("followup_task1_xyz789"); // ❌ cache miss for "repetitions"

// even though humans know these questions are asking the same thing!

what if we hashed the embeddings instead?

// you could hash the embedding arrays...
embedding(question1) → [0.123, 0.456, 0.789, ...]
hash([0.123, 0.456, 0.789, ...]) → "def456"

embedding(question2) → [0.187, 0.423, 0.801, ...]
hash([0.187, 0.423, 0.801, ...]) → "ghi789"

// but you still get different hashes for similar meanings!
// hashing doesn't make semantically similar vectors produce similar hashes

the problem: general-purpose hash functions are designed so that even a tiny input change produces a completely different output. this is the opposite of what we want for semantic similarity.

// tiny difference in embeddings = completely different hash
embedding("sets and reps")  → hash → "abc123"
embedding("sets and reps!") → hash → "xyz789"  // just added "!"
embedding("reps and sets")  → hash → "def456"  // just swapped order

// semantic similarity is about finding vectors that are *close* in high-dimensional space
// but hash functions are designed to make similar inputs produce *distant* outputs

to find semantic matches, you must compare embeddings:

// the only way to know if two questions are similar:
const similarity = cosineSimilarity(
  embedding("how many repetitions?"),     // user's question
  embedding("what about sets and reps?") // cached question
); // returns 0.87 - they ARE similar!

// but you can't know this without calculating similarity for every cached question

attempted workarounds and why they fail:

// ❌ canonical mapping: requires manual maintenance, misses variations
const canonicalMap = {
  "sets_and_reps": ["sets and reps", "repetitions", "how many reps"],
  // what about "rep count"? "set/rep scheme"? "lifting numbers"?
};

// ❌ embedding bucketing: complex, approximate, still requires similarity search
function bucketEmbedding(embedding) {
  return embedding.slice(0, 10).map(x => Math.round(x * 100)).join('_');
}
const bucket = bucketEmbedding(questionEmbedding); // still need O(k) search in bucket

// ❌ locality-sensitive hashing: difficult to implement correctly, approximate results

the harsh reality: if you need semantic similarity, you need either:

  1. O(n) search through all candidates (what we built)
  2. specialized vector database with optimized similarity algorithms

key-value stores excel at exact lookups, but semantic similarity requires mathematical comparison of high-dimensional vectors - a fundamentally different operation.
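
the bucketing workaround above shows the problem in miniature: two vectors can be nearly identical and still round into different buckets. the sketch reuses bucketEmbedding from the snippet above; the 2-dimensional vectors are made up for illustration.

```javascript
function cosineSimilarity(vecA, vecB) {
  const dot = vecA.reduce((sum, a, i) => sum + a * vecB[i], 0);
  const magA = Math.sqrt(vecA.reduce((sum, a) => sum + a * a, 0));
  const magB = Math.sqrt(vecB.reduce((sum, b) => sum + b * b, 0));
  return dot / (magA * magB);
}

function bucketEmbedding(embedding) {
  return embedding.slice(0, 10).map((x) => Math.round(x * 100)).join('_');
}

// Two made-up "embeddings" that differ by 0.002 in one coordinate
const a = [0.124, 0.5];
const b = [0.126, 0.5];

console.log(cosineSimilarity(a, b) > 0.999); // true: nearly the same direction
console.log(bucketEmbedding(a));             // 12_50
console.log(bucketEmbedding(b));             // 13_50 (different bucket, so an exact-key lookup misses)
```

any scheme that turns vectors into exact keys has boundaries like this, which is why neighbors still have to be compared directly.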

performance reality check

this approach has significant scalability issues:

cache size     kv api calls    lookup latency    verdict
10 entries     ~20 calls       ~100-300ms        acceptable
50 entries     ~100 calls      ~500-1500ms       slower than openai
100 entries    ~200 calls      ~1000-3000ms      unusable
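
those figures follow from simple arithmetic. the sketch below assumes two sequential kv reads per cached entry and 5-15ms per read, as described earlier; estimateLookup is a hypothetical helper.

```javascript
// Assumes 2 KV reads per cached entry, executed one after another
function estimateLookup(entries, minMsPerCall = 5, maxMsPerCall = 15) {
  const calls = entries * 2; // one get() for the follow-up, one for its task embedding
  return { entries, calls, minMs: calls * minMsPerCall, maxMs: calls * maxMsPerCall };
}

for (const size of [10, 50, 100]) {
  const { calls, minMs, maxMs } = estimateLookup(size);
  console.log(`${size} entries: ${calls} calls, ${minMs}-${maxMs}ms`);
}
// 10 entries: 20 calls, 100-300ms
// 50 entries: 100 calls, 500-1500ms
// 100 entries: 200 calls, 1000-3000ms
```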

why this can be slower than calling openai directly:

  • openai api: 1 call, ~2000ms response time
  • our cache lookup: 100+ kv calls, potentially 1500ms+ just for network overhead
  • plus javascript cpu time for 50+ cosine similarity calculations

architectural limitations:

  • O(n) search complexity - performance degrades linearly with cache size
  • api call explosion - each cache lookup requires dozens of network requests
  • no indexing - cloudflare kv provides no query optimization beyond key prefixes
  • cpu intensive - 1536-dimensional vector math in javascript runtime

better approaches for production scale

1. cloudflare vectorize (purpose-built for this):

// cloudflare's native vector database - perfect fit
const results = await env.VECTORIZE_INDEX.query(questionEmbedding, {
  filter: { taskHash: { $eq: "abc123" } }, // filter by task first
  topK: 3,
  returnMetadata: "all"
}); // single api call, sub-100ms response, same infrastructure

// would store vectors like:
await env.VECTORIZE_INDEX.upsert([{
  id: "followup_abc123_def456",
  values: questionEmbedding, // 1536 dimensions
  metadata: {
    taskHash: "abc123",
    question: "what about sets and reps?",
    answer: "For upper body days, aim for 3-4 sets...",
    taskSimilarity: 0.92 // pre-computed for filtering
  }
}]);

2. third-party vector databases:

// pinecone, weaviate, or similar if you need features vectorize lacks
const results = await vectorDB.query({
  vector: questionEmbedding,
  filter: { taskSimilarity: { $gte: 0.85 } },
  topK: 1
}); // single api call, sub-100ms response

3. pre-computed similarity indices:

// compute similarities at write-time, not read-time
"task_questions_abc123" → {
  taskHash: "abc123",
  questions: [
    { text: "sets and reps?", embedding: [...], answers: ["key1", "key2"] },
    { text: "how often?", embedding: [...], answers: ["key3"] }
  ]
} // one kv call gets all questions for a task
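
reading that single value back could look like the sketch below. findBestQuestion is a hypothetical helper, the 0.8 default is the question threshold used earlier, and toy 3-dimensional vectors stand in for real 1536-dimensional embeddings.

```javascript
function cosineSimilarity(vecA, vecB) {
  const dot = vecA.reduce((sum, a, i) => sum + a * vecB[i], 0);
  const magA = Math.sqrt(vecA.reduce((sum, a) => sum + a * a, 0));
  const magB = Math.sqrt(vecB.reduce((sum, b) => sum + b * b, 0));
  return dot / (magA * magB);
}

// Hypothetical lookup over one "task_questions_<hash>" value already read from KV
function findBestQuestion(taskIndex, questionEmbedding, threshold = 0.8) {
  let best = null;
  for (const entry of taskIndex.questions) {
    const score = cosineSimilarity(questionEmbedding, entry.embedding);
    if (score >= threshold && (best === null || score > best.score)) {
      best = { text: entry.text, answers: entry.answers, score };
    }
  }
  return best; // null when nothing clears the threshold
}

// Toy data for illustration only
const taskIndex = {
  taskHash: 'abc123',
  questions: [
    { text: 'sets and reps?', embedding: [0.9, 0.1, 0.0], answers: ['key1', 'key2'] },
    { text: 'how often?', embedding: [0.0, 0.2, 0.9], answers: ['key3'] }
  ]
};

const match = findBestQuestion(taskIndex, [0.8, 0.2, 0.1]);
console.log(match ? match.text : 'no match'); // sets and reps?
```

the comparison is still linear in the number of questions, but it runs in memory after one read instead of two reads per entry.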

4. hybrid approach with similarity caching:

// cache the similarity calculations themselves
"question_matches_def456" → {
  questionHash: "def456",
  matches: [
    { followupKey: "followup_v2_abc123_ghi789", similarity: 0.87 },
    { followupKey: "followup_v2_xyz789_mno123", similarity: 0.82 }
  ],
  timestamp: "2024-01-15T10:30:00Z"
} // amortize expensive similarity calculations

the vectorize migration

after running the kv-based system in production and validating user demand, we migrated to cloudflare vectorize for true semantic caching.

migration results

performance improvements:

  • cache lookup: 500-1500ms → <100ms
  • api calls: 100+ kv operations → 1 vectorize query
  • cache matching: brute-force cosine similarity in javascript → native vector search

real-world impact:

// before: brute force kv iteration
const followupList = await env.GOAL_CACHE.list({ prefix: 'followup_v2_', limit: 50 });
for (const item of followupList.keys) {
  const cached = await env.GOAL_CACHE.get(item.name, 'json'); // 50+ api calls
  // ... similarity calculations in javascript
}

// after: native vectorize query  
const matches = await env.SEMANTIC_CACHE.query(questionEmbedding, {
  filter: { 
    type: { $eq: "followup" },
    taskHash: { $eq: taskHash }
  },
  topK: 3,
  returnMetadata: "all"
}); // single api call with hardware-accelerated similarity search

unified semantic cache setup

single vectorize index handles all cache types:

// goal generation cache
{ 
  id: "goal_abc123_timestamp",
  values: problemEmbedding,
  metadata: { 
    type: "goal_generation", 
    problem: "How do I learn Spanish?", 
    goals: "{...}" 
  }
}

// how-to guide cache
{
  id: "guide_def456_timestamp", 
  values: taskEmbedding,
  metadata: { 
    type: "how_to_guide", 
    taskText: "start 4 day upper lower split", 
    guide: "{...}" 
  }
}

// follow-up question cache
{
  id: "followup_ghi789_timestamp",
  values: questionEmbedding, 
  metadata: { 
    type: "followup", 
    taskHash: "def456", 
    question: "what about sets and reps?", 
    answer: "..."
  }
}

production migration approach

phase 1: dual system with fallbacks

async function findSimilarFollowUpVectorize(taskText, question, env) {
  if (!env.SEMANTIC_CACHE) {
    // fallback to legacy kv caching if vectorize unavailable
    return await findSimilarFollowUp(taskText, question, env);
  }
  
  try {
    // embed the question before querying (getEmbedding is defined earlier)
    const questionEmbedding = await getEmbedding(question, env.OPENAI_API_KEY);
    // vectorize query; results arrive under the `matches` property
    const result = await env.SEMANTIC_CACHE.query(questionEmbedding, {...});
    return result.matches.length > 0 ? result.matches[0].metadata.answer : null;
  } catch (error) {
    console.error('vectorize error:', error);
    // graceful fallback to kv on errors
    return await findSimilarFollowUp(taskText, question, env);
  }
}

phase 2: monitoring and validation

  • cache headers distinguish systems: X-Cache-Status: HIT-VECTORIZE vs HIT
  • performance monitoring shows dramatic improvements
  • error rates remain low with reliable fallbacks

why the migration made sense:

  • architectural mismatch resolved - we were forcing kv (key-value store) to do similarity search (vector database job)
  • performance unpredictability - 100+ api calls per cache lookup could be slower than openai depending on network conditions
  • complexity reduction - eliminated 200+ lines of similarity calculation code
  • timing alignment - vectorize became available when we needed it

engineering reality: we built a working but inefficient system, then migrated when better infrastructure became available. the kv approach taught us the requirements and validated user demand before committing to specialized tooling.

why this matters: the taskHash acts as a foreign key that enables semantic deduplication while maintaining referential integrity. multiple follow-up questions about the same task concept share a single task embedding, but each has its own question embedding and cached answer.

this creates a many-to-one relationship where:

  • one task embedding (e.g., “start 4 day upper lower split”)
  • supports multiple follow-up caches (e.g., “sets and reps?”, “how often?”, “what weight?”)
  • without duplicating the expensive 1536-dimensional task vector

cross-user cache benefits

the magic happens when multiple users ask similar questions:

user a: “start 4 day upper lower split” + “what about sets and reps?” → cache miss → ai generates → cached

user b: “begin 4-day upper/lower routine” + “sets and repetitions?” → cache hit → instant response

the semantic matching catches variations:

  • “What about sets and reps?” ≈ “How many sets and repetitions?”
  • “start 4 day upper lower split” ≈ “begin upper/lower 4-day routine”
  • but rejects wrong contexts: “sets and reps” + “bake cookies” → no match

performance characteristics

two distinct response patterns:

scenario               network            openai cost    latency    hit rate
cache miss + openai    openai api         ~$0.005        ~6.5s      0%
vectorize cache hit    vectorize query    $0             ~2.1s      40-60%

vectorize caching implementation

the system now implements unified vectorize caching with semantic similarity search:

single index, three cache types

all caching flows through one vectorize index with metadata-based filtering:

// goal generation vectors
{
  id: "goal_776b24e8b76a40ad_1753846452714",
  values: problemEmbedding, // 1536 dimensions
  metadata: {
    type: "goal_generation",
    problemHash: "776b24e8b76a40ad", 
    problem: "get jacked",
    goals: "{\"today\":[...], \"month\":[...], \"year\":[...]}"
  }
}

// how-to guide vectors  
{
  id: "guide_3933f9b7ab9b0f6f_1753847607690",
  values: taskEmbedding, // 1536 dimensions
  metadata: {
    type: "how_to_guide",
    taskHash: "3933f9b7ab9b0f6f",
    taskText: "complete a 30-minute strength training workout", 
    guide: "{\"overview\":\"...\", \"steps\":[...], \"proTip\":\"...\"}"
  }
}

// follow-up question vectors
{
  id: "followup_3933f9b7ab9b0f6f_5d67953f891f8d41_1753847633760", 
  values: questionEmbedding, // 1536 dimensions
  metadata: {
    type: "followup",
    taskHash: "3933f9b7ab9b0f6f", // links to how-to guide
    question: "how to do a lunge",
    answer: "To perform a lunge: 1. Stand with your feet hip-width apart..."
  }
}

vectorize query patterns

goal generation lookup:

const matches = await env.SEMANTIC_CACHE.query(problemEmbedding, {
  filter: { type: { $eq: "goal_generation" } },
  topK: 3,
  returnMetadata: "all"
});

how-to guide lookup:

const matches = await env.SEMANTIC_CACHE.query(taskEmbedding, {
  filter: { type: { $eq: "how_to_guide" } },
  topK: 3, 
  returnMetadata: "all"
});

follow-up question lookup:

const matches = await env.SEMANTIC_CACHE.query(questionEmbedding, {
  filter: { 
    type: { $eq: "followup" },
    taskHash: { $eq: taskHash }
  },
  topK: 3,
  returnMetadata: "all"
});

performance: native vector similarity search with hardware acceleration, ~2.1s cache hits vs ~6.5s cache misses in production
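
a query returns the nearest vectors whether or not they are close enough to reuse, so the caller still has to apply a score threshold. the sketch below assumes the result shape { matches: [{ id, score, metadata }] } used elsewhere in this post; pickCachedAnswer is a hypothetical helper and the sample matches are made up.

```javascript
// Assumes result.matches is an array of { id, score, metadata }.
// The 0.8 default mirrors the question threshold used earlier.
function pickCachedAnswer(result, threshold = 0.8) {
  const candidates = (result && result.matches) || [];
  let best = null;
  for (const match of candidates) {
    if (match.score >= threshold && (best === null || match.score > best.score)) {
      best = match;
    }
  }
  return best ? best.metadata.answer : null;
}

// Made-up query result for illustration
const sampleResult = {
  matches: [
    { id: 'followup_a', score: 0.64, metadata: { answer: 'rest 60-90 seconds' } },
    { id: 'followup_b', score: 0.87, metadata: { answer: '3-4 sets of 8-12 reps' } }
  ]
};

console.log(pickCachedAnswer(sampleResult));      // 3-4 sets of 8-12 reps
console.log(pickCachedAnswer(sampleResult, 0.9)); // null
console.log(pickCachedAnswer({ matches: [] }));   // null
```

returning null on a weak match lets the caller fall through to openai instead of serving a loosely related answer.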

cost analysis

cost breakdown for 1000 users on similar fitness tasks:

without any caching:

  • initial guides: 1000 × $0.005 = $5.00
  • follow-up questions: 1000 × $0.005 = $5.00
  • task embeddings: 2000 × $0.0001 = $0.20
  • total: $10.20

with complete semantic caching:

  • initial guides: 1 × $0.005 = $0.005
  • follow-up questions: varies by uniqueness ≈ $0.05
  • task embeddings: 10 × $0.0001 = $0.001
  • total: $0.056 (99.5% savings)

the first user asking about any topic creates a “knowledge seed” that benefits all future users with similar needs.
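
the breakdown can be recomputed from the per-call prices quoted above. this is a sketch of the arithmetic only; the function names are hypothetical and the inputs are the post's own line items.

```javascript
const GENERATION_COST = 0.005; // per OpenAI completion, as quoted above
const EMBEDDING_COST = 0.0001; // per embedding, as quoted above

// Every user pays for a guide, a follow-up, and two embeddings
function uncachedCost(users) {
  const guides = users * GENERATION_COST;
  const followUps = users * GENERATION_COST;
  const embeddings = users * 2 * EMBEDDING_COST;
  return guides + followUps + embeddings;
}

// Only unique work is paid for once the cache is warm
function cachedCost({ uniqueGuides, followUpSpend, uniqueEmbeddings }) {
  return uniqueGuides * GENERATION_COST + followUpSpend + uniqueEmbeddings * EMBEDDING_COST;
}

const without = uncachedCost(1000);
const withCache = cachedCost({ uniqueGuides: 1, followUpSpend: 0.05, uniqueEmbeddings: 10 });
const savings = (1 - withCache / without) * 100;

console.log(without.toFixed(2));   // 10.20
console.log(withCache.toFixed(3)); // 0.056
console.log(savings.toFixed(1));   // 99.5
```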

implementation details

rate limiting with graceful degradation

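if (response.status === 429) {
  // parse the error body first; retryAfter is in seconds (defaults to 60)
  const errorData = await response.json().catch(() => ({}));
  const retryAfter = errorData.retryAfter || 60;
  const minutes = Math.ceil(retryAfter / 60);
  throw new Error(
    `Too many questions! Please wait ${minutes} minute${minutes !== 1 ? 's' : ''} before asking another.`
  );
}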

cache warming effects

the vectorize system creates organic cache warming through semantic similarity:

goal generation caching:

  • “get jacked” → instant responses for “build muscle”, “gain strength”, “get buff”
  • “learn spanish” → instant responses for “study spanish”, “spanish fluency”
  • each problem type builds a knowledge base that benefits similar requests

how-to guide caching:

  • “start 4 day upper lower split” → instant for “begin upper/lower routine”, “4-day workout plan”
  • “bake chocolate chip cookies” → instant for “make chocolate cookies”, “cookie baking”
  • semantic matching catches variations without exact string matches

follow-up question caching:

  • “what about sets and reps?” cached once, serves “how many repetitions?”, “sets and repetitions?”
  • “what temperature?” cached once, serves “baking temperature?”, “oven temp?”
  • context-aware caching ensures answers match the original task domain

monitoring and debugging

cache monitoring tracks vectorize performance:

// cache status headers distinguish between systems
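return new Response(JSON.stringify(response), {
  headers: {
    'Content-Type': 'application/json',
    'X-Cache-Status': cached ? 'HIT-VECTORIZE' : 'MISS-VECTORIZE',
    // header values must be strings, so fall back explicitly on a miss
    'X-Cache-Similarity-Score': bestMatch?.score?.toFixed(3) ?? 'n/a',
    'X-Cache-Vector-Count': String(matches.matches?.length ?? 0)
  }
});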

// console logging shows vectorize query results
console.log('=== GOAL GENERATION REQUEST START ===');
console.log('SEMANTIC_CACHE available:', !!env.SEMANTIC_CACHE);
console.log('Goal generation raw Vectorize response:', JSON.stringify(matches, null, 2));
console.log(`Goal generation search found ${matches.matches?.length || 0} potential matches`);

vectorize vector id structure enables easy debugging:

  • goal_{problemHash}_{timestamp} - goal generation vectors
  • guide_{taskHash}_{timestamp} - how-to guide vectors
  • followup_{taskHash}_{questionHash}_{timestamp} - follow-up q&a vectors
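
because the ids follow a fixed pattern, they can be built and parsed mechanically. the helpers below are a hypothetical sketch that assumes hashes never contain underscores (true for the hex hashes shown in this post).

```javascript
// Build an id like guide_<taskHash>_<timestamp>
function buildVectorId(kind, hashes, timestamp = Date.now()) {
  return [kind, ...hashes, timestamp].join('_');
}

// Split an id back into its parts; returns null for unrecognized shapes
function parseVectorId(id) {
  const parts = id.split('_');
  const kind = parts[0];
  const timestamp = Number(parts[parts.length - 1]);
  const hashes = parts.slice(1, -1);

  if (kind === 'followup' && hashes.length === 2) {
    return { kind, taskHash: hashes[0], questionHash: hashes[1], timestamp };
  }
  if (kind === 'guide' && hashes.length === 1) {
    return { kind, taskHash: hashes[0], timestamp };
  }
  if (kind === 'goal' && hashes.length === 1) {
    return { kind, problemHash: hashes[0], timestamp };
  }
  return null;
}

const parsed = parseVectorId('followup_3933f9b7ab9b0f6f_5d67953f891f8d41_1753847633760');
console.log(parsed.taskHash);     // 3933f9b7ab9b0f6f
console.log(parsed.questionHash); // 5d67953f891f8d41
```

when a filtered query returns something unexpected, parsing the id shows which task and question produced the vector without fetching its metadata.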

debug cli queries for troubleshooting:

# check metadata indexes
npx wrangler vectorize list-metadata-index semantic-cache

# query by type filter  
npx wrangler vectorize query semantic-cache --vector [...] --filter '{"type": "goal_generation"}'

# check vector count
npx wrangler vectorize info semantic-cache

can monitor hit rates in browser devtools and server logs, plus use cli tools to debug vectorize filtering issues.

alternatives considered

  • simple string matching: too brittle, misses semantic variations
  • single embedding per question: loses task context, wrong answers
  • llm-based similarity: too expensive for cache lookup
  • hash-based caching: can’t handle natural language variations
  • redis/database: unnecessary infrastructure complexity

lessons learned

migration insights

  1. start simple, upgrade strategically - kv validation → vectorize optimization
  2. reliable fallbacks enable confidence - dual systems during migration prevent downtime
  3. semantic caching compounds with usage - each user benefits from all previous interactions
  4. infrastructure timing matters - vectorize ga made the migration viable
  5. monitoring distinguishes systems - cache headers enable performance comparison

technical learnings

  1. metadata filtering is crucial - type and taskHash filters prevent wrong context matches
  2. similarity thresholds matter - 0.8+ cosine similarity for reliable semantic matching
  3. cache warming is organic - no need to pre-populate, users do it naturally
  4. conversation state is fragile - clear on modal close to prevent bugs
  5. graceful degradation - if caching fails, still call openai
  6. vectorize beats kv - native similarity search vs javascript iterations

production results

performance improvements measured:

  • cache hit latency: 6.5s → 2.1s (~3x faster)
  • api call reduction: 100+ kv operations → 1 vectorize query
  • cost reduction: 99.5% for popular interactions
  • cache matching: native vector search replaces brute-force similarity loops in javascript

user experience impact:

  • instant responses for cached questions
  • better cross-user knowledge sharing
  • more consistent ai guidance
  • reduced rate limiting due to cache hits

metadata indexing gotcha: vectors vs indexes timing

after deploying the vectorize migration, we discovered a critical timing issue that broke filtering for goal generation and follow-up caching.

the problem: retroactive indexing doesn’t work

what happened:

  1. vectors were inserted into vectorize with metadata like type: "goal_generation"
  2. metadata indexes were created later via wrangler vectorize create-metadata-index
  3. filtered queries returned 0 results despite vectors having the correct metadata

// vectors inserted BEFORE metadata index creation
// ❌ cannot be found by filtered queries
const matches = await env.SEMANTIC_CACHE.query(embedding, {
  filter: { type: { $eq: "goal_generation" } }
}); // returns 0 results

// ✅ but can be found by unfiltered queries  
const allMatches = await env.SEMANTIC_CACHE.query(embedding, {
  topK: 10
}); // returns vectors with metadata intact

the discovery process

debugging revealed the issue:

# unfiltered query found 10 vectors including goals
npx wrangler vectorize query semantic-cache --vector [...] --top-k 10

# filtered query found only 2 newer vectors
npx wrangler vectorize query semantic-cache --vector [...] --filter '{"type": "goal_generation"}'

key insight: cloudflare vectorize metadata indexes only apply to vectors inserted after the index creation. existing vectors become invisible to filtered queries.

the solution: nuclear option

since the site had few users, we chose the clean slate approach:

# delete entire index
npx wrangler vectorize delete semantic-cache

# recreate with metadata indexes first
npx wrangler vectorize create semantic-cache --dimensions 1536 --metric cosine
npx wrangler vectorize create-metadata-index semantic-cache --propertyName=type --type=string
npx wrangler vectorize create-metadata-index semantic-cache --propertyName=taskHash --type=string

result: all new vectors are properly indexed and findable by filtered queries.

lessons for production systems

1. create metadata indexes before inserting vectors

# ✅ correct order for new vectorize setup

# step 1: create vectorize index
npx wrangler vectorize create semantic-cache --dimensions 1536 --metric cosine

# step 2: create metadata indexes BEFORE inserting any vectors
npx wrangler vectorize create-metadata-index semantic-cache --propertyName=type --type=string
npx wrangler vectorize create-metadata-index semantic-cache --propertyName=taskHash --type=string

# step 3: verify indexes are ready
npx wrangler vectorize list-metadata-index semantic-cache

# step 4: NOW safe to insert vectors with metadata
# vectors inserted after this point will be findable by filtered queries

in your application code:

// this will work correctly because metadata indexes exist
await env.SEMANTIC_CACHE.upsert([{
  id: "goal_abc123_timestamp",
  values: problemEmbedding,
  metadata: {
    type: "goal_generation",  // ✅ indexed
    problemHash: "abc123"     // ⚠️ filterable only with its own metadata index (only type and taskHash are created above)
  }
}]);

// filtered queries will find the vector
const matches = await env.SEMANTIC_CACHE.query(embedding, {
  filter: { type: { $eq: "goal_generation" } }  // ✅ works
});

2. vectorize lacks reindexing capabilities

unlike elasticsearch or other databases, vectorize doesn’t offer:

  • reindex command to rebuild metadata indexes
  • bulk export/import for data migration
  • retroactive index application

3. migration strategies for production data

  • small datasets: nuclear option (delete/recreate)
  • large datasets: build custom export/import pipeline
  • critical systems: implement dual-write during transition
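
the dual-write option could look roughly like this. it is a sketch under assumptions: `dualWriteUpsert` is a hypothetical helper, and `oldIndex` / `newIndex` stand for two vectorize bindings (or anything else exposing an `upsert` method):

```javascript
// hypothetical helper: write the same vectors to both indexes while migrating
async function dualWriteUpsert(oldIndex, newIndex, vectors) {
  // wrap each call so a synchronous throw also becomes a rejected promise
  const attempt = (index) => Promise.resolve().then(() => index.upsert(vectors));

  const [oldResult, newResult] = await Promise.allSettled([
    attempt(oldIndex),
    attempt(newIndex),
  ]);

  // only fail the request when neither index accepted the write
  if (oldResult.status === 'rejected' && newResult.status === 'rejected') {
    throw new Error('dual write failed on both indexes');
  }

  return {
    oldOk: oldResult.status === 'fulfilled',
    newOk: newResult.status === 'fulfilled',
  };
}
```

logging the returned flags shows how often the two indexes drift apart before reads are switched over.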

4. monitoring metadata filtering

add debug logging to detect filtering issues:

const filteredMatches = await env.SEMANTIC_CACHE.query(embedding, {
  filter: { type: { $eq: "goal_generation" } }
});

if (filteredMatches.matches.length === 0) {
  console.warn('Filtered query returned 0 results - check metadata indexes');
}
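
a zero-result warning can't tell "nothing cached yet" from "vectors exist but aren't indexed". a sketch of a stricter check compares the filtered query against an unfiltered one. `findUnindexedMatches` is a hypothetical helper, and the `returnMetadata: 'all'` option is an assumption about the vectorize version in use:

```javascript
// hypothetical diagnostic: ids whose metadata matches the type but which
// the filtered query fails to return (a sign of missing metadata indexing)
async function findUnindexedMatches(index, embedding, type, topK = 10) {
  const filtered = await index.query(embedding, {
    topK,
    filter: { type: { $eq: type } },
  });
  const unfiltered = await index.query(embedding, {
    topK,
    returnMetadata: 'all', // assumption: needed so metadata comes back on matches
  });

  const seen = new Set(filtered.matches.map((m) => m.id));

  // anything of this type in the overall top-k should also be in the
  // filtered top-k, so a gap here points at the indexing problem
  return unfiltered.matches
    .filter((m) => m.metadata && m.metadata.type === type && !seen.has(m.id))
    .map((m) => m.id);
}
```

a non-empty result means vectors with the right metadata are being skipped by the filter, which is exactly the symptom described above.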

this metadata indexing gotcha cost us a few hours of debugging but taught valuable lessons about vectorize operational characteristics that aren’t well documented.

conclusion

semantic caching with vector databases transforms ai applications from expensive, slow interactions into fast, cost-effective systems that improve with every user.

key insights for ai application developers:

  • use the right tool for the job - vector databases excel at similarity search, traditional databases at exact lookups
  • semantic caching compounds with usage - each user interaction creates value for all future similar requests
  • ai responses don’t need to be unique - they need to be contextually appropriate and fast
  • metadata filtering is crucial - combine semantic similarity with structured filters for precise results
  • create metadata indexes before inserting vectors - cloudflare vectorize does not apply them retroactively to existing vectors

when to use vector databases for caching:

  • ✅ user queries have natural language variations (“sets and reps” vs “repetitions”)
  • ✅ exact string matching misses too many valid cache hits
  • ✅ content generation is expensive (time or cost)
  • ✅ semantic similarity matters more than exact matches
  • ❌ simple key-value lookups work fine
  • ❌ transactional consistency is required

bottom line: if you’re building ai applications with expensive generation costs, semantic caching with vector databases can deliver 3x performance improvements and 99%+ cost reductions for popular content. the infrastructure investment pays for itself quickly through improved user experience and reduced ai api costs.

try the live system: actuallydostuff.com - click the ❓ on any goal to experience ~2s cached responses vs ~6s generated responses, powered by cloudflare vectorize.

dual-layer caching with cloudflare kv and localstorage

built a movie recommendation app that uses ai to analyze content themes. the challenge? each openai api call costs money and takes time. the solution? a dual-layer caching strategy that maximizes performance while minimizing costs.

the problem

calling openai’s api for every movie analysis is expensive and slow:

  • cost: $0.002 per 1k tokens with gpt-4o-mini
  • latency: 500-2000ms per request
  • scale: same movies analyzed repeatedly by different users

simple client-side caching helps individual users, but doesn’t solve the broader cost problem when multiple users analyze the same popular movies.

dual-layer approach

implemented two complementary caching layers:

layer 1: localstorage (client-side)

function getCachedAnalysis(movieTitle, year) {
  const cache = JSON.parse(localStorage.getItem('movieAnalysisCache') || '{}');
  const key = `${movieTitle}_${year}`;
  const cached = cache[key];
  
  if (cached && cached.timestamp) {
    const ageInDays = (Date.now() - cached.timestamp) / (1000 * 60 * 60 * 24);
    if (ageInDays < 30) {
      return cached.analysis;
    }
  }
  return null;
}

layer 2: cloudflare kv (server-side)

export async function onRequest(context) {
  const { request, env } = context;
  const { title, synopsis, year } = await request.json();
  
  const cacheKey = `analysis:${title}_${year}`.replace(/[^a-zA-Z0-9_-]/g, '_');
  
  // check kv cache first
  const cached = await env.KV.get(cacheKey, { type: 'json' });
  if (cached) {
    return new Response(JSON.stringify(cached), {
      headers: { 'X-Cache': 'HIT' }
    });
  }
  
  // cache miss - call openai
  const analysis = await callOpenAI(title, synopsis, year);
  
  // store in kv for 30 days
  await env.KV.put(cacheKey, JSON.stringify(analysis), {
    expirationTtl: 2592000
  });
  
  return new Response(JSON.stringify(analysis), {
    headers: { 'X-Cache': 'MISS' }
  });
}

how they work together

the caching cascade works like this:

  1. client checks localstorage - if hit, no network request needed
  2. if miss, calls api - server function gets invoked
  3. server checks kv cache - if hit, returns cached result (no openai cost)
  4. if miss, calls openai - makes expensive api call
  5. server stores in kv - future users benefit from cache
  6. client stores in localstorage - future requests from same user skip network

// client-side flow
const cachedAnalysis = getCachedAnalysis(movie.title, movie.year);
if (cachedAnalysis) {
  // best case: instant local cache hit
  displayAnalysis(cachedAnalysis);
  return;
}

// cache miss - make api call
const response = await fetch('/api/analyze-content', {
  method: 'POST',
  body: JSON.stringify({ title: movie.title, synopsis: movie.synopsis, year: movie.year })
});

const analysis = await response.json();
setCachedAnalysis(movie.title, movie.year, analysis); // store locally
displayAnalysis(analysis);
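
the flow above calls `setCachedAnalysis`, which isn't shown. a plausible counterpart to `getCachedAnalysis`, assuming the same `movieAnalysisCache` key and entry shape. the optional `storage` parameter and the boolean return value are additions for illustration:

```javascript
// sketch of the write side of the localstorage layer
function setCachedAnalysis(movieTitle, year, analysis, storage = localStorage) {
  const key = `${movieTitle}_${year}`;
  try {
    const cache = JSON.parse(storage.getItem('movieAnalysisCache') || '{}');
    // the timestamp is what getCachedAnalysis uses for the 30-day check
    cache[key] = { analysis, timestamp: Date.now() };
    storage.setItem('movieAnalysisCache', JSON.stringify(cache));
    return true;
  } catch (err) {
    // quota exceeded, storage disabled, or corrupted json: skip caching
    console.warn('could not write analysis cache', err);
    return false;
  }
}
```

a failed write only costs a future cache hit, so the function swallows the error instead of breaking the page.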

performance characteristics

this creates three distinct performance scenarios:

scenario         | network  | openai cost | latency
localstorage hit | none     | $0          | ~1ms
kv hit           | api call | $0          | ~100ms
cache miss       | api call | ~$0.01      | ~1500ms

cost analysis

for a popular movie analyzed by 1000 users:

  • without caching: 1000 × $0.01 = $10.00
  • with localstorage only: 1000 × $0.01 = $10.00 (no sharing)
  • with dual-layer: 1 × $0.01 = $0.01 (99.9% savings)

the kv cache effectively amortizes the openai cost across all users.
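
the same arithmetic as a small sketch. it assumes each user analyzes the movie once, that every request lands inside the 30-day ttl, and that a call costs a flat 1 cent. `openAiCalls` and `costSummary` are hypothetical names:

```javascript
// number of openai calls needed when `users` people analyze the same movie
function openAiCalls(users, strategy) {
  // 'none' and 'local' both pay once per user (localstorage is not shared);
  // 'dual' pays once in total because the kv layer is shared
  return strategy === 'dual' ? Math.min(users, 1) : users;
}

// cost in dollars plus savings relative to calling openai for every user
function costSummary(users, costPerCallCents, strategy) {
  const calls = openAiCalls(users, strategy);
  const baselineCents = users * costPerCallCents;
  const totalCents = calls * costPerCallCents;
  return {
    calls,
    dollars: totalCents / 100,
    savingsPercent:
      baselineCents === 0 ? 0 : (1 - totalCents / baselineCents) * 100,
  };
}
```

calling `costSummary(1000, 1, 'dual')` reproduces the single paid call from the last bullet, while `'local'` stays at 1000 calls.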

cloudflare kv setup

cloudflare pages automatically injects kv bindings into function context through the dashboard:

  1. create kv namespace in cloudflare dashboard
  2. bind it to your pages project via dashboard ui (variable name: KV)
  3. the binding appears as env.KV in your functions automatically

no configuration files needed - it’s all done through the cloudflare dashboard interface.

cache invalidation

both layers use 30-day expiration:

  • localstorage: timestamp-based expiration check
  • kv: cloudflare’s native ttl handling

// localstorage expiration
const ageInDays = (Date.now() - cached.timestamp) / (1000 * 60 * 60 * 24);
if (ageInDays < 30) {
  return cached.analysis;
}

// kv expiration (automatic)
await env.KV.put(cacheKey, data, {
  expirationTtl: 2592000 // 30 days in seconds
});

debugging cache behavior

added cache status headers to track hit/miss patterns:

return new Response(JSON.stringify(analysis), {
  headers: {
    'Content-Type': 'application/json',
    'X-Cache': cached ? 'HIT' : 'MISS'
  }
});

can monitor in browser devtools to verify caching is working.
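
for something more systematic than eyeballing devtools, a small client-side tally of the `X-Cache` header could work. this is a sketch, and `createCacheStats` is a hypothetical helper, not part of the app:

```javascript
// hypothetical helper: count HIT/MISS values seen on api responses
function createCacheStats() {
  const counts = { HIT: 0, MISS: 0, UNKNOWN: 0 };
  return {
    // pass in the fetch Response (anything with headers.get works)
    record(response) {
      const status = response.headers.get('X-Cache');
      const key = status === 'HIT' || status === 'MISS' ? status : 'UNKNOWN';
      counts[key] += 1;
      return key;
    },
    // share of server-side hits among responses that carried the header
    hitRate() {
      const total = counts.HIT + counts.MISS;
      return total === 0 ? 0 : counts.HIT / total;
    },
    counts() {
      return { ...counts };
    },
  };
}
```

calling `stats.record(response)` right after each `fetch('/api/analyze-content', ...)` gives a running kv hit rate. localstorage hits never reach the server, so they are not counted here.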

alternatives considered

  • redis: would work but adds infrastructure complexity
  • cdn caching: doesn’t work for dynamic post requests
  • database caching: overkill for simple key-value needs
  • memory caching: doesn’t persist across deployments

cloudflare kv hits the sweet spot - globally distributed, zero infrastructure, tight pages integration.

trade-offs

pros:

  • massive cost reduction for popular content
  • improved user experience (faster responses)
  • zero infrastructure management
  • global edge caching

cons:

  • slightly more complex than single-layer
  • potential for stale data (30-day window)
  • kv has eventual consistency (rare edge case)

implementation tips

  1. cache key normalization - strip special characters to avoid kv key issues
  2. graceful degradation - if kv fails, still call openai
  3. cache warming - consider pre-caching popular movies
  4. monitoring - track hit rates and costs
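
tip 2 isn't reflected in the handler shown earlier, where a kv error would fail the whole request. a sketch of the degraded path, assuming `kv` behaves like the `env.KV` binding and `generate` wraps the openai call. `getAnalysisWithFallback` is a hypothetical name:

```javascript
// sketch: treat the cache as optional so kv outages never block a response
async function getAnalysisWithFallback(kv, cacheKey, generate) {
  try {
    const cached = await kv.get(cacheKey, { type: 'json' });
    if (cached) return { analysis: cached, cache: 'HIT' };
  } catch (err) {
    console.warn('kv read failed, falling back to generation', err);
  }

  // cache miss or kv unavailable: pay for the openai call
  const analysis = await generate();

  try {
    // 30 days in seconds, matching the ttl used elsewhere in the post
    await kv.put(cacheKey, JSON.stringify(analysis), { expirationTtl: 2592000 });
  } catch (err) {
    console.warn('kv write failed, returning uncached result', err);
  }

  return { analysis, cache: 'MISS' };
}
```

the returned `cache` value can feed the `X-Cache` header directly.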

conclusion

dual-layer caching with localstorage and cloudflare kv provides the best of both worlds - instant local performance and shared cost benefits. for api-heavy applications, this pattern can dramatically reduce costs while improving user experience.

the key insight is that not all caching needs to be shared - combining individual and collective caching layers creates optimal performance characteristics for different use cases.

adding fuzzy search to a jekyll blog

wanted to add search to my blog without any server-side complexity or external services. turns out jekyll’s liquid templating makes this surprisingly elegant.

the liquid magic

jekyll can generate json files during build time. here’s the key insight - we can loop through all posts and create a searchable index:

---
layout: null
---
[
  {% for post in site.posts %}
    {
      "title": {{ post.title | jsonify }},
      "excerpt": {{ post.excerpt | strip_html | strip_newlines | truncatewords: 50 | jsonify }},
      "url": "{{ post.url }}",
      "date": "{{ post.date | date: '%B %d, %Y' }}",
      "categories": {{ post.categories | jsonify }}
    }{% unless forloop.last %},{% endunless %}
  {% endfor %}
]

the layout: null tells jekyll to output raw json without any html wrapper. the {% unless forloop.last %} handles the trailing comma problem that would break json parsing.

automatic reindexing

this is the beautiful part - no rake tasks or manual reindexing needed. every time you run jekyll build or jekyll serve, the search.json gets regenerated with your latest posts. jekyll’s build process handles the entire search index automatically.

fuse.js integration

for the actual search, fuse.js does the heavy lifting. it’s 6kb gzipped and handles fuzzy matching really well:

fetch('/search.json')
  .then(response => response.json())
  .then(data => {
    fuse = new Fuse(data, {
      keys: ['title', 'excerpt', 'categories'],
      threshold: 0.3,
      includeScore: true
    });
  });

the threshold: 0.3 is the sweet spot - strict enough to avoid nonsense results but loose enough to catch typos and partial matches.

why this approach works

  • no server required - everything happens client-side
  • no build complexity - uses jekyll’s existing templating
  • always current - updates with every build
  • fast - json loads once, search happens locally
  • lightweight - fuse.js is tiny, no dependencies

search ui

added a simple input to the sidebar that shows results as you type:

searchInput.addEventListener('input', function(e) {
  const query = e.target.value.trim();

  if (query.length < 2) {
    searchResults.innerHTML = '';
    return;
  }

  const results = fuse.search(query);
  displayResults(results);
});

only triggers after 2 characters to avoid noise. shows up to 5 results with title, excerpt, and date.
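
`displayResults` isn't shown above. one way to build its markup, as a sketch: `renderResults` and `escapeHtml` are hypothetical helpers, the list markup and the "no results" text are made up, and each result is assumed to have the `{ item, score }` shape fuse.js returns with `includeScore: true`:

```javascript
// escape text before interpolating it into html
function escapeHtml(text) {
  return String(text)
    .replace(/&/g, '&amp;')
    .replace(/</g, '&lt;')
    .replace(/>/g, '&gt;')
    .replace(/"/g, '&quot;')
    .replace(/'/g, '&#39;');
}

// build markup for at most `limit` results: title, excerpt, and date
function renderResults(results, limit = 5) {
  if (results.length === 0) return '<p>no results</p>';
  const items = results.slice(0, limit).map(({ item }) =>
    '<li>' +
    `<a href="${escapeHtml(item.url)}">${escapeHtml(item.title)}</a>` +
    `<p>${escapeHtml(item.excerpt)}</p>` +
    `<small>${escapeHtml(item.date)}</small>` +
    '</li>'
  );
  return `<ul>${items.join('')}</ul>`;
}
```

inside `displayResults` this would become `searchResults.innerHTML = renderResults(results);`. the escaping matters because the index holds raw post text.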

liquid templating gotchas

few things to watch out for:

  • json escaping - use | jsonify filter instead of | escape for proper json encoding
  • strip html and newlines - excerpts need | strip_html | strip_newlines to avoid json breaks
  • arrays - jekyll’s | jsonify filter handles arrays and escaping automatically
  • trailing commas - the {% unless forloop.last %} pattern prevents json errors
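
the first gotcha is easy to reproduce outside jekyll. the sketch below is a javascript analogy, not liquid: `naiveJsonField` stands in for dropping raw text between hand-written quotes, and `encodedJsonField` for a json-aware filter like `jsonify`. html escaping alone leaves backslashes and newlines untouched, which is why it isn't enough:

```javascript
// a title containing a backslash and double quotes
const title = 'regex \\d+ tips for "lazy" devs';

// naive: wrap the raw text in quotes, as plain string interpolation would
function naiveJsonField(text) {
  return `{"title": "${text}"}`;
}

// json-aware: let the encoder add the quotes and escape the contents
function encodedJsonField(text) {
  return `{"title": ${JSON.stringify(text)}}`;
}

// true when the string is valid json
function parses(json) {
  try {
    JSON.parse(json);
    return true;
  } catch (err) {
    return false;
  }
}

console.log(parses(naiveJsonField(title)));
console.log(parses(encodedJsonField(title)));
```

only the encoded version parses and round-trips the original title.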

csp considerations

if you’re using content security policy, you’ll need to allow the fuse.js cdn and local fetch requests:

script-src 'self' https://cdn.jsdelivr.net 'unsafe-inline'
connect-src 'self' https://disqus.com

or download fuse.js locally to avoid external dependencies entirely.

parse don't validate in ruby: building safer applications

Dynamic languages like Ruby give you flexibility, but they also put the burden of data safety on you. Without compile-time type checking, how do you ensure your application doesn’t crash when it receives unexpected data?

The answer is the “parse don’t validate” pattern - a technique popularized by Alexis King’s influential 2019 blog post that transforms unknown input into well-defined, validated objects before it reaches your business logic.

the problem with validation-only approaches

Traditional validation approaches check if data is correct, but then continue working with the original, unstructured data:

# Don't do this - validation without transformation
def create_user(params)
  if params[:email].present? && params[:name].present?
    User.create(params) # Still working with unstructured hash
    # What if params has unexpected keys?
    # What if email is nil despite the check?
    # What if someone changes the validation logic?
  end
end

This leaves you vulnerable to runtime errors when the raw data doesn’t match your assumptions.

parse don’t validate: transform input into structured objects

Instead of just checking validity, transform unknown data into known, typed structures:

class UserForm
  include ActiveModel::Model
  include ActiveModel::Attributes

  attribute :name, :string
  attribute :email, :string
  attribute :id, :integer

  validates :name, presence: true
  validates :email, presence: true, format: { with: URI::MailTo::EMAIL_REGEXP }
  validates :id, presence: true, numericality: { greater_than: 0 }

  def self.parse(data)
    form = new(data)
    raise ArgumentError, form.errors.full_messages.join(', ') unless form.valid?
    form
  end
end

# Usage
user_form = UserForm.parse(params) # Returns UserForm or raises
User.create!(user_form.attributes)

why this pattern matters

  1. Explicit contracts - Clear what each component expects and returns
  2. Fail fast - Catch invalid data at system boundaries, not deep in business logic
  3. Self-documenting - Code clearly shows what data flows through the system
  4. Centralized validation - All validation rules in one place per data type
  5. Better error messages - Specific, actionable feedback about what’s wrong

controllers: transform filtered params into validated objects

Strong parameters handle security (preventing mass assignment), but they still return unvalidated hashes. Add a parsing layer for data integrity:

class UsersController < ApplicationController
  def create
    # Strong parameters filter, then parse for validation
    user_form = UserForm.parse(user_params)
    @user = UserCreationService.call(user_form)
    render json: UserSerializer.new(@user).to_h
  rescue ArgumentError => e
    render json: { error: e.message }, status: 422
  end

  private

  def user_params
    params.require(:user).permit(:name, :email, :id)
  end
end

services: accept structured objects

Services should work with validated, structured data rather than raw hashes:

class UserCreationService
  def self.call(user_form) # Explicit contract, not random hash
    user = User.create!(user_form.attributes)
    NotificationMailer.welcome_email(user).deliver_later
    user
  end
end

external api integration

Transform external responses into internal objects to maintain consistent data contracts:

class StripeChargeResult
  include ActiveModel::Model
  include ActiveModel::Attributes

  attribute :charge_id, :string
  attribute :amount_cents, :integer
  attribute :status, :string
  attribute :created_at, :datetime

  def self.from_stripe_response(response)
    new(
      charge_id: response[:id],
      amount_cents: response[:amount],
      status: response[:status],
      created_at: Time.at(response[:created])
    )
  end

  def successful? = status == 'succeeded'
  def amount_dollars = amount_cents / 100.0
end

# Usage
stripe_response = stripe_client.charges.create(charge_params)
charge_result = StripeChargeResult.from_stripe_response(stripe_response)

if charge_result.successful?
  record_payment(charge_result)
end

background jobs: structured arguments

Instead of working with argument hashes, parse job parameters into validated objects:

class EmailJobParams
  include ActiveModel::Model
  include ActiveModel::Attributes

  attribute :user_id, :integer
  attribute :template, :string, default: 'welcome'
  attribute :delay_minutes, :integer, default: 0

  validates :user_id, presence: true
  validates :template, inclusion: { in: %w[welcome premium reminder] }

  def self.parse(args)
    params = new(args)
    raise ArgumentError, params.errors.full_messages.join(', ') unless params.valid?
    params
  end

  def user
    @user ||= User.find(user_id)
  end
end

class WelcomeEmailJob < ApplicationJob
  def perform(raw_args)
    job_params = EmailJobParams.parse(raw_args)
    WelcomeMailer.send_email(job_params.user, job_params.template).deliver_now
  end
end

alternative: result objects

For applications that prefer explicit success/failure handling over exceptions:

require 'dry/monads'

class UserForm
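  include Dry::Monads[:result]
  extend Dry::Monads[:result] # safe_parse is a class method, so Success/Failure must exist at class level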
  include ActiveModel::Model
  include ActiveModel::Attributes

  attribute :name, :string
  attribute :email, :string
  attribute :id, :integer

  validates :name, presence: true
  validates :email, format: { with: URI::MailTo::EMAIL_REGEXP }

  def self.safe_parse(data)
    form = new(data)
    return Failure(form.errors.full_messages) unless form.valid?
    Success(form)
  end
end

# Usage
case UserForm.safe_parse(params)
in Success(user_form)
  User.create!(user_form.attributes)
in Failure(errors)
  render json: { errors: errors }, status: 422
end

libraries to consider

  • dry-validation - Advanced validation with detailed error handling
  • dry-monads - Result objects and functional patterns
  • dry-struct - Immutable value objects with type coercion
  • reform - Form objects that integrate seamlessly with Rails

implementation strategy

  1. Start with new features - Apply parsing pattern to all new controllers and services
  2. Focus on boundaries - Prioritize user input, external APIs, and background jobs
  3. Refactor incrementally - Convert existing code one component at a time

the outcome

By parsing unknown data into known structures at every boundary, you eliminate a whole class of runtime errors. Your code becomes more predictable, easier to debug, and self-documenting.

In dynamic languages, explicit data contracts aren’t just good practice - they’re essential for building reliable applications that handle real-world data gracefully.

form validation with jitter effects

Ever notice how form validation can feel… boring? You click submit, some fields turn red, maybe you get an error message. Functional, sure. But what if we could make invalid fields literally shake their heads at you?

try it out

Jitter Effect Preview

Fill out this form and try the validation buttons to see different jitter animations on required empty fields.

the problem with standard validation

HTML5 gives us built-in form validation with the required attribute and input types like email. But the default browser behavior is pretty bland. And if you add novalidate to your form (which many of us do for custom validation), you’re on your own for feedback.

<form id="testForm" novalidate>
  <input type="email" required>
</form>

That novalidate attribute tells the browser “thanks but no thanks, I’ll handle validation myself.” Which opens the door for more creative feedback…

enter jitter animations

Instead of just turning fields red, what if they physically reacted to being empty? Here’s a collection of CSS animations that give form fields personality:

@keyframes jitter-shake {
  0%, 100% { transform: translateX(0); }
  10%, 30%, 50%, 70%, 90% { transform: translateX(-4px); }
  20%, 40%, 60%, 80% { transform: translateX(4px); }
}

@keyframes jitter-bounce {
  0%, 100% { transform: translateY(0); }
  25% { transform: translateY(-8px); }
  50% { transform: translateY(-4px); }
  75% { transform: translateY(-2px); }
}

@keyframes jitter-pulse {
  0%, 100% { transform: scale(1); }
  50% { transform: scale(1.05); }
}

@keyframes jitter-wobble {
  0%, 100% { transform: rotate(0deg); }
  25% { transform: rotate(-2deg); }
  50% { transform: rotate(2deg); }
  75% { transform: rotate(-1deg); }
}

Each animation has its own personality and color:

  • shake: the classic “nope” head shake (red #f7768e)
  • bounce: a gentle hop like “hey, over here!” (orange #ff9e64)
  • pulse: subtle breathing effect for a softer touch (purple #bb9af7)
  • wobble: playful rotation that feels less aggressive (pink #f7768e)

The colors are applied along with the animation class:

.jitter-shake {
    animation: jitter-shake 0.6s ease-in-out;
    border-color: #f7768e !important;
}

This means when an input gets the jitter effect, it not only animates but also changes its border color to match the animation’s personality. The !important ensures the color override takes precedence during the animation.

the validation logic

Here’s where it gets interesting. Instead of validating fields one by one, we collect all empty required fields and animate them simultaneously. The function gets called from the test buttons, each passing a different animation type:

<button type="button" onclick="validateForm('shake')">🔸 Test Shake</button>
<button type="button" onclick="validateForm('bounce')">🔹 Test Bounce</button>
<button type="button" onclick="validateForm('pulse')">🔸 Test Pulse</button>
<button type="button" onclick="validateForm('wobble')">🔹 Test Wobble</button>

And here’s the validation function itself:

function validateForm(animationType = 'shake') {
  const form = document.getElementById('testForm');
  const requiredInputs = form.querySelectorAll('input[required], textarea[required], select[required]');

  let emptyFields = [];
  let emptyInputs = [];

  // first pass: identify all empty fields
  requiredInputs.forEach(input => {
    if (isEmpty(input)) {
      emptyFields.push(input.name || input.id);
      emptyInputs.push(input);
    }
  });

  // second pass: jitter all empty fields simultaneously
  if (emptyInputs.length > 0) {
    emptyInputs.forEach(input => {
      jitterElement(input.id, animationType);
    });

    statusDiv.className = 'status error';
    statusDiv.textContent = `❌ Please fill in: ${emptyFields.join(', ')}`;
    return false;
  }

  return true;
}

The two-pass approach means all invalid fields react at once, creating a unified “these need attention” moment rather than a sequential cascade.
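
The `isEmpty` helper used in the first pass isn't shown. A minimal sketch of what it might look like, assuming whitespace-only text should count as empty and that unchecked checkboxes and radios count as empty too (both are assumptions, not the demo's confirmed behavior):

```javascript
// Hypothetical version of the isEmpty helper used by validateForm.
function isEmpty(input) {
  // Checkboxes and radios carry a value even when unticked, so look at checked.
  if (input.type === 'checkbox' || input.type === 'radio') {
    return !input.checked;
  }
  // Text inputs, textareas, and selects: trim so "   " counts as empty.
  return String(input.value ?? '').trim() === '';
}
```

Because it only reads `type`, `value`, and `checked`, the same function works for inputs, textareas, and selects.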

applying the effect

The jitter function handles the animation lifecycle cleanly:

function jitterElement(elementId, animationType = 'shake') {
  const element = document.getElementById(elementId);
  const className = `jitter-${animationType}`;

  // only remove classes if this element doesn't already have the target class
  if (!element.classList.contains(className)) {
    // remove any existing jitter classes
    element.classList.remove('jitter-shake', 'jitter-bounce', 'jitter-pulse', 'jitter-wobble');
  }

  // add the new jitter class
  element.classList.add(className);

  // remove after animation completes
  setTimeout(() => {
    element.classList.remove(className);
  }, 600);
}

subtle touches

You can also add jitter on blur for immediate feedback:

document.querySelectorAll('input[required]').forEach(input => {
  input.addEventListener('blur', function() {
    if (isEmpty(this)) {
      setTimeout(() => jitterElement(this.id, 'shake'), 100);
    }
  });
});

That 100ms delay prevents the animation from feeling too aggressive when users are just tabbing through fields.

The form submission also triggers validation:

document.getElementById('testForm').addEventListener('submit', function(e) {
  e.preventDefault();

  if (validateForm('shake')) {
    // form is valid, show success message
  }
});

So validation happens in three scenarios:

  1. When clicking any of the test buttons (with their specific animation)
  2. When blurring out of a required field that’s empty (always uses shake)
  3. When submitting the form (uses shake by default)

when to use what

Different animations work better for different contexts:

  • use shake for critical errors or final form submission
  • use bounce for friendly reminders
  • use pulse for subtle hints in longer forms
  • use wobble for playful interfaces or less serious applications

The key is matching the animation personality to your form’s context. A medical form probably wants subtle pulse effects. A game signup might embrace the full wobble.

performance notes

CSS transforms are GPU-accelerated, so these animations won’t cause layout thrashing. The 600ms duration is long enough to be noticeable but short enough to not feel sluggish. And since we’re using classes rather than inline styles, the browser can optimize the animations.

Form validation doesn’t have to be boring. Sometimes a little shake is all you need to make the experience memorable without being annoying.