Sid Ngeth's Blog A blog about anything (but mostly development)

Using Longest Increasing Subsequence to Analyze Training Block Effectiveness

When designing strength programs, we often organize training into 4-week blocks. By running the Longest Increasing Subsequence (LIS) algorithm over the estimated one-rep max (E1RM) at the end of each block, we can identify which combinations of volume, intensity, and exercise variations lead to the best E1RM progressions.

Understanding the Core Algorithm

First, let’s look at the elegant algorithm that powers our analysis - the LIS implementation using Patience Sorting:

# These functions are shown outside a module for brevity; place them inside
# a module (for example BlockAnalyzer, below) to compile them.
def find_progression(nums, min_improvement \\ 2.5) do
  # tails: entry k is {value, index} for the smallest value that can end
  #        a valid subsequence of length k + 1
  # prev:  maps an index in nums to the index of its predecessor
  {tails, prev} =
    nums
    |> Enum.with_index()
    |> Enum.reduce({[], %{}}, fn {num, i}, {tails, prev} ->
      # Find position where this number belongs
      pos = find_position(tails, num, min_improvement)

      # Track the predecessor for reconstruction
      prev =
        if pos > 0 do
          {_value, prev_index} = Enum.at(tails, pos - 1)
          Map.put(prev, i, prev_index)
        else
          prev
        end

      # Extend the list, or keep the smaller ending value at this position
      tails =
        cond do
          pos == length(tails) -> tails ++ [{num, i}]
          num < elem(Enum.at(tails, pos), 0) -> List.replace_at(tails, pos, {num, i})
          true -> tails
        end

      {tails, prev}
    end)

  # Reconstruct the sequence by walking predecessors back from the last tail
  sequence =
    case List.last(tails) do
      nil -> []
      {_value, last_index} -> reconstruct(nums, prev, last_index, [])
    end

  {length(tails), sequence}
end

defp find_position(tails, target, min_improvement) do
  do_binary_search(tails, target, min_improvement, 0, length(tails))
end

# Returns the first position whose value is not at least min_improvement below target
defp do_binary_search(tails, target, min_improvement, left, right) when left < right do
  mid = div(left + right, 2)
  {mid_val, _index} = Enum.at(tails, mid)

  if target - mid_val >= min_improvement do
    do_binary_search(tails, target, min_improvement, mid + 1, right)
  else
    do_binary_search(tails, target, min_improvement, left, mid)
  end
end

defp do_binary_search(_, _, _, left, _), do: left

defp reconstruct(nums, prev, index, acc) do
  acc = [Enum.at(nums, index) | acc]

  case Map.get(prev, index) do
    nil -> acc
    prev_index -> reconstruct(nums, prev, prev_index, acc)
  end
end

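This algorithm finds the longest sequence of E1RMs where each value is at least min_improvement greater than the previous. It needs O(n log n) comparisons. Note that Enum.at/2, List.replace_at/3, and ++ are all linear on Elixir lists, so a list-based version only reaches O(n log n) running time if the subsequence endings are kept in a structure with faster indexed access, such as a map keyed by position.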
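
As a cross-check on the behavior described above, here is a minimal Python sketch of the same idea. It is written in Python purely for illustration; the function name mirrors the Elixir version, but the `tails` and `prev` bookkeeping are assumptions of this sketch rather than the post's exact implementation.

```python
def find_progression(nums, min_improvement=2.5):
    """Longest subsequence where each value is >= previous + min_improvement."""
    tails = []  # tails[k] = (value, index): smallest value ending a chain of length k + 1
    prev = {}   # prev[i] = index of the element before nums[i] in its chain, or None

    for i, num in enumerate(nums):
        # Binary search for the first tail that is NOT at least
        # min_improvement below num.
        lo, hi = 0, len(tails)
        while lo < hi:
            mid = (lo + hi) // 2
            if num - tails[mid][0] >= min_improvement:
                lo = mid + 1
            else:
                hi = mid

        prev[i] = tails[lo - 1][1] if lo > 0 else None

        if lo == len(tails):
            tails.append((num, i))      # num extends the longest chain so far
        elif num < tails[lo][0]:
            tails[lo] = (num, i)        # num is a smaller ending for this length

    # Walk the predecessor links back from the end of the longest chain.
    sequence = []
    i = tails[-1][1] if tails else None
    while i is not None:
        sequence.append(nums[i])
        i = prev[i]
    sequence.reverse()
    return len(tails), sequence


print(find_progression([100, 102.5, 101, 105, 104, 108]))
# (4, [100, 102.5, 105, 108])
```

The one detail that differs from textbook LIS is the replacement step: a tail is only overwritten when the new value is smaller, because with a minimum gap the binary search can land on a position whose current value is already lower than the candidate.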

Structuring Training Data

defmodule TrainingBlock do
  defstruct [
    :block_number,
    :start_date,
    :end_date,
    :primary_movement,     # e.g., "Comp Squat", "Paused Bench"
    :volume_per_session,   # sets * reps
    :intensity_range,      # % of E1RM
    :frequency_per_week,
    :variations_used,      # e.g., ["Paused", "Tempo", "Close Grip"]
    :sets,                 # List of actual training sets
    :starting_e1rm,
    :ending_e1rm
  ]
end

defmodule TrainingSet do
  defstruct [:date, :weight, :reps, :rpe, :e1rm]

  def calculate_e1rm(weight, reps) do
    weight * (36 / (37 - reps))  # Brzycki formula
  end
end
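
To make the formula concrete, here is a small Python sketch of the same Brzycki calculation with two worked values. The range check is an assumption added for this sketch: the formula divides by `37 - reps`, so it is undefined at 37 reps, and estimates are generally treated as less reliable as rep counts climb.

```python
def calculate_e1rm(weight, reps):
    """Brzycki estimate: weight * 36 / (37 - reps)."""
    if not 1 <= reps < 37:
        # Guard added for this sketch; the formula breaks down at 37 reps.
        raise ValueError("Brzycki formula needs 1 <= reps < 37")
    return weight * 36 / (37 - reps)


print(calculate_e1rm(100, 1))  # 100.0 (a single is its own 1RM)
print(calculate_e1rm(100, 5))  # 112.5 (100 * 36 / 32)
```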

Block Analysis System

defmodule BlockAnalyzer do
  def analyze_progression_patterns(blocks, min_improvement \\ 2.5) do
    # Group blocks by exercise
    blocks
    |> Enum.group_by(& &1.primary_movement)
    |> Map.new(fn {exercise, exercise_blocks} ->
      # LIS depends on order, so put the blocks in chronological order first
      ordered_blocks = Enum.sort_by(exercise_blocks, & &1.block_number)

      # Get E1RMs
      e1rms = Enum.map(ordered_blocks, & &1.ending_e1rm)

      # Find progression using LIS
      {_length, progression} = find_progression(e1rms, min_improvement)

      # Map back to block characteristics
      # (matching by value assumes ending E1RMs are unique within an exercise)
      successful_blocks = Enum.filter(ordered_blocks, &(&1.ending_e1rm in progression))

      {exercise,
       %{
         progression: progression,
         characteristics: analyze_characteristics(successful_blocks)
       }}
    end)
  end

  defp analyze_characteristics(blocks) do
    %{
      avg_volume_per_session: average_volume(blocks),
      most_successful_intensity: common_intensity_range(blocks),
      optimal_frequency: most_common_frequency(blocks),
      effective_variations: most_effective_variations(blocks),
      block_sequence: extract_block_sequence(blocks)
    }
  end
end
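
The helper functions called by analyze_characteristics (average_volume, common_intensity_range, and so on) are not shown in this post. The Python sketch below is one possible reading of them, assuming each block is a dict that uses the TrainingBlock field names; the aggregation choices (mean volume, most frequent intensity and frequency, variations ranked by how often they appear) are assumptions of this sketch.

```python
from collections import Counter
from statistics import mean


def analyze_characteristics(blocks):
    """Summarize the blocks that made it into the progression."""
    if not blocks:
        return {}

    def most_common(values):
        # Most frequent value; ties go to the value seen first.
        return Counter(values).most_common(1)[0][0]

    variation_counts = Counter(v for b in blocks for v in b["variations_used"])
    return {
        "avg_volume_per_session": mean(b["volume_per_session"] for b in blocks),
        "most_successful_intensity": most_common(b["intensity_range"] for b in blocks),
        "optimal_frequency": most_common(b["frequency_per_week"] for b in blocks),
        "effective_variations": [v for v, _ in variation_counts.most_common()],
    }


blocks = [
    {"volume_per_session": 15, "intensity_range": "80-85%",
     "frequency_per_week": 2, "variations_used": ["Paused"]},
    {"volume_per_session": 24, "intensity_range": "75-80%",
     "frequency_per_week": 2, "variations_used": ["Paused", "Tempo"]},
]
summary = analyze_characteristics(blocks)
print(summary["avg_volume_per_session"])  # 19.5
```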

Example Usage and Output

Here’s how we can analyze 6 months (6 blocks) of training:

blocks = [
  %TrainingBlock{
    block_number: 1,
    primary_movement: "Competition Bench",
    volume_per_session: 15,  # 5 sets of 3
    intensity_range: "80-85%",
    frequency_per_week: 2,
    variations_used: ["Paused"],
    starting_e1rm: 100,
    ending_e1rm: 102.5
  },
  %TrainingBlock{
    block_number: 2,
    primary_movement: "Competition Bench",
    volume_per_session: 24,  # 6 sets of 4
    intensity_range: "75-80%",
    frequency_per_week: 2,
    variations_used: ["Paused", "Tempo"],
    starting_e1rm: 102.5,
    ending_e1rm: 105
  },
  # ... more blocks ...
]

analysis = BlockAnalyzer.analyze_progression_patterns(blocks)

The output shows us the successful progression patterns:

%{
  "Competition Bench" => %{
    progression: [102.5, 105.0, 108.5, 112.0],
    characteristics: %{
      avg_volume_per_session: 20,
      most_successful_intensity: "75-80%",
      optimal_frequency: 2,
      effective_variations: ["Paused", "Tempo"],
      block_sequence: [
        %{volume: "moderate", intensity: "moderate"},
        %{volume: "high", intensity: "moderate"},
        %{volume: "moderate", intensity: "high"},
        %{volume: "low", intensity: "very high"}
      ]
    }
  }
}

How It Works

  1. Finding True Progression
    # Example E1RMs: [100, 102.5, 101, 105, 104, 108]
    # With min_improvement = 2.5kg:
    
    Step 1: [100]
    Step 2: [100, 102.5]            # Valid as 102.5 - 100 = 2.5kg
    Step 3: [100, 102.5]            # Skip 101: only 1kg above 100
    Step 4: [100, 102.5, 105]       # Add 105 (105 - 102.5 = 2.5kg)
    Step 5: [100, 102.5, 105]       # Skip 104: only 1.5kg above 102.5
    Step 6: [100, 102.5, 105, 108]  # Add 108 (valid 3kg improvement)
    
    # Result: [100, 102.5, 105, 108] - Four block progression
    
  2. Block Pattern Analysis
    • Each number in our progression represents a successful block
    • We analyze characteristics of these blocks
    • Look for patterns in volume, intensity, and variation

Using the Results

  1. Program Design
    def design_next_block(current_e1rm, analysis_results) do
     successful_pattern = find_matching_pattern(analysis_results)
     next_block_characteristics = predict_next_block(successful_pattern)
    
     %{
       suggested_volume: next_block_characteristics.volume,
       suggested_intensity: next_block_characteristics.intensity,
       suggested_variations: next_block_characteristics.variations,
       expected_improvement: next_block_characteristics.expected_gain
     }
    end
    
  2. Progress Prediction
    def predict_block_outcome(current_e1rm, block_characteristics) do
     similar_blocks = find_similar_blocks(block_characteristics)
     average_improvement = calculate_avg_improvement(similar_blocks)
    
     %{
       expected_improvement: average_improvement,
       confidence: calculate_confidence(similar_blocks),
       recommended_modifications: suggest_modifications(similar_blocks)
     }
    end
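
The helpers used in predict_block_outcome are not defined in this post. As a rough Python sketch of the prediction step, assume "similar" means the same intensity range and weekly frequency, and that blocks are dicts using the TrainingBlock field names. The `history` parameter and the `projected_e1rm` and `sample_size` keys are hypothetical names introduced for this example.

```python
from statistics import mean


def predict_block_outcome(current_e1rm, block_characteristics, history):
    """Estimate the gain for a planned block from similar past blocks."""
    similar = [
        b for b in history
        if b["intensity_range"] == block_characteristics["intensity_range"]
        and b["frequency_per_week"] == block_characteristics["frequency_per_week"]
    ]
    if not similar:
        return None  # nothing comparable in the training history

    gains = [b["ending_e1rm"] - b["starting_e1rm"] for b in similar]
    average_improvement = mean(gains)
    return {
        "expected_improvement": average_improvement,
        "projected_e1rm": current_e1rm + average_improvement,
        "sample_size": len(similar),  # a small sample means a weak prediction
    }


history = [
    {"intensity_range": "75-80%", "frequency_per_week": 2,
     "starting_e1rm": 102.5, "ending_e1rm": 105},
    {"intensity_range": "80-85%", "frequency_per_week": 2,
     "starting_e1rm": 100, "ending_e1rm": 102.5},
]
planned = {"intensity_range": "75-80%", "frequency_per_week": 2}
print(predict_block_outcome(105, planned, history))
```

Reporting the sample size alongside the estimate is a deliberate choice here: with only a handful of blocks per exercise, an average gain says little without knowing how many blocks it came from.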
    

Conclusion

By using the LIS algorithm with Patience Sorting, we can:

  1. Find genuine progression patterns in our training
  2. Identify block characteristics that lead to consistent progress
  3. Make data-driven decisions about program design
  4. Predict likely outcomes of different block structures

The algorithm’s efficiency (O(n log n) comparisons) makes it practical for analyzing large training histories, while its ability to find increasing sequences with a minimum improvement between steps makes it a good fit for strength training analysis.


Making a D3 Sankey Chart Responsive in React

Creating data visualizations that work well across different screen sizes can be challenging. In this post, I’ll walk through how I made a D3.js Sankey chart in a React application adapt to any screen size. The app is deployed here and the full source code is at http://github.com/sngeth/cash-flow

The Challenge

Our initial Sankey chart worked well on desktop but had several limitations on smaller screens:

  • Labels would overlap on narrow viewports
  • Node spacing was too wide for mobile screens
  • Font sizes were too large for smaller displays
  • The chart wouldn’t resize smoothly on window resize

Key Responsive Improvements

1. Dynamic SVG Dimensions

Instead of using fixed dimensions, we read the width and height from the rendered SVG element each time the chart is drawn:

export default function SankeyChart({ income, savings, billItems }: SankeyChartProps) {
  const svgRef = useRef<SVGSVGElement>(null);

  const createChart = useCallback(() => {
    if (!svgRef.current) return;

    const svg = d3.select(svgRef.current);
    const width = svg.node()!.getBoundingClientRect().width;
    const height = svg.node()!.getBoundingClientRect().height;
  ...
}

2. Adaptive Node Padding

We adjust the spacing between nodes based on screen width:

const nodePadding = width < 600 ? 10 : 20;

const sankeyGenerator = sankey<SankeyNodeExtended, SankeyLink<SankeyNodeExtended, {}>>()
  .nodeWidth(10)
  .nodePadding(nodePadding)
  .extent([[1, 1], [width - 1, height - 6]]);

This provides:

  • Comfortable spacing on desktop (20px)
  • Compact layout on mobile (10px)
  • Better use of available space across all devices

3. Responsive Text Handling

We implement dynamic font sizing based on viewport width:

const fontSize = width < 600 ? "10px" : "12px";

node.append("text")
  .attr("font-size", fontSize)
  .attr("x", d => (d.x0 ?? 0) < width / 2 ? (d.x1 ?? 0) + 6 : (d.x0 ?? 0) - 6)
  .attr("y", d => ((d.y1 ?? 0) + (d.y0 ?? 0)) / 2)
  .attr("dy", "0.35em")
  .attr("text-anchor", d => (d.x0 ?? 0) < width / 2 ? "start" : "end")
  .text(d => `${d.name}: $${d.value ?? 0}`);

Key features:

  • Smaller font on mobile devices
  • Dynamic text positioning
  • Smart text anchor points based on node position

4. Smooth Resize Handling

We redraw the chart whenever the window is resized (for heavier charts, debouncing the handler is worth considering):

useEffect(() => {
  createChart();
  const handleResize = () => {
    createChart();
  };
  window.addEventListener('resize', handleResize);
  return () => {
    window.removeEventListener('resize', handleResize);
  };
}, [createChart]);

This ensures:

  • Chart redraws on window resize
  • Clean cleanup of event listeners
  • Layout recalculated from the SVG’s current dimensions on every redraw

5. Clean Redraws

Before each redraw, we clear the previous chart:

svg.selectAll('*').remove();

Locality of Behavior vs SOLID: Finding Balance in Code Organization

Software companies often push for modular, highly-abstracted code in pursuit of flexibility and maintainability. However, this approach can inadvertently create significant cognitive overhead for developers, especially those new to a codebase. As codebases grow more complex and distributed, developers increasingly face mental fatigue from juggling numerous abstractions and navigating sprawling file structures. This raises an important question: Are our current practices truly serving us, or are they contributing to developer burnout? The resurgence of interest in locality of behavior, along with the popularity of tools like HTMX and the emergence of “anti-design patterns,” suggests a growing desire for simpler, more cognitively manageable code structures. But how do we balance these competing concerns?

Reflecting on my experience applying for a software internship in 2008, I recall being bombarded with questions about object-oriented programming (OOP), inheritance, and polymorphism. At the time, these concepts were considered essential for writing and understanding modular code. The industry’s focus on these principles stemmed from the belief that they led to more maintainable and scalable software. However, this approach raises an important question: Did the emphasis on OOP truly prepare developers for the complexities of real-world software development? While these concepts can be powerful tools, they don’t necessarily justify the cognitive overhead they introduce. Interview questions rarely addressed the critical skill of determining when such complexity is warranted or how to balance modularity with code readability and maintainability. This disconnect between interview practices and practical development needs highlights the ongoing challenge of finding the right balance in code organization and design.

Understanding Locality of Behavior

Before diving into code organization patterns, let’s understand a fundamental principle that often conflicts with traditional SOLID advice: Locality of Behavior (LoB).

The underlying idea of locality was described by Richard P. Gabriel in his patterns work: source code has locality when a programmer can understand it by looking at only a small portion of it. The name “Locality of Behaviour” was popularized more recently by Carson Gross, the creator of htmx, and related concerns about entangled code appear in talks by Rich Hickey (creator of Clojure).

The core idea is simple but powerful: code should be organized so that related behaviors are kept close together. In other words, all the code needed to understand a particular operation should be in the same place.

This principle draws on several well-known sources:

  1. Richard P. Gabriel discussed locality in “Patterns of Software: Tales from the Software Community” (1996)
  2. Rich Hickey’s “Simple Made Easy” presentation explores the complexity that comes from entangled (“complected”) code
  3. John Ousterhout’s “A Philosophy of Software Design” (2018) discusses “deep modules” that hide substantial functionality behind a simple interface

Let’s examine how this principle plays out in real code.

The Case for Keeping Things Together

First, let’s look at code with high locality of behavior:

require 'csv'
require 'json'

class FileProcessor
  def process(file)
    case file.extension
    when '.csv'
      process_csv(file)    # CSV behavior is local
    when '.json'
      process_json(file)   # JSON behavior is local
    end
  end

  private

  def process_csv(file)
    CSV.read(file.path).map { |row| row.map(&:strip) }  # The full CSV behavior is visible right here
  end

  def process_json(file)
    JSON.parse(File.read(file.path))                    # The full JSON behavior is visible right here
  end
end

Compare this with code that has low locality of behavior:

class FileProcessor
  def process(file)
    processor_for(file.extension).process(file)  # Have to look elsewhere to find the processor
  end
end

class CsvProcessor
  def process(file)
    clean_values(           # Have to look elsewhere to find what clean_values does
      read_csv(file)       # Have to look elsewhere to find what read_csv does
    )
  end
end

module ValueCleaner
  def clean_values(data)   # The actual behavior is far from where it's used
    data.map { |row| row.map(&:strip) }
  end
end

The Great SOLID Debate

Before we dive deeper, let’s address the elephant in the room: SOLID principles, particularly the Open/Closed Principle (OCP), have faced criticism in recent years. Critics argue that breaking everything into separate files and abstractions can actually make code harder to understand. They have a point – let’s look at both sides.

Different Approaches to Code Organization

The Inheritance Approach

Here’s how many developers first attempt to separate concerns:

# base_processor.rb
class BaseProcessor
  def process(file)
    raise NotImplementedError
  end

  protected

  def strip_values(data)
    data.map { |row| row.map(&:strip) }
  end
end

# csv_processor.rb
class CsvProcessor < BaseProcessor
  def process(file)
    data = CSV.read(file.path)
    strip_values(data)
  end
end

# json_processor.rb
class JsonProcessor < BaseProcessor
  def process(file)
    JSON.parse(File.read(file.path))
  end
end

# file_processor.rb
class FileProcessor
  PROCESSORS = {
    '.csv' => CsvProcessor,
    '.json' => JsonProcessor
  }

  def process(file)
    processor_class = PROCESSORS[file.extension] ||
      raise("Unsupported format: #{file.extension}")

    processor_class.new.process(file)
  end
end

Mental Model Required:

  • Understand class inheritance
  • Know to look in multiple files
  • Grasp abstract base classes
  • Learn about class registration patterns

New Developer Questions:

“Why do we need a BaseProcessor? Where are the actual processing methods? How do I find which processor handles which format? Why is strip_values in the base class?”

The Composition Approach

Here’s a composition-based approach:

# processors/csv.rb
module Processors
  class Csv
    def self.process(file)
      new(file).process
    end

    def initialize(file)
      @file = file
    end

    def process
      ValueCleaner.new(
        CsvReader.new(@file)
      ).process
    end
  end
end

# processors/components/csv_reader.rb
class CsvReader
  def initialize(file)
    @file = file
  end

  def process
    CSV.read(@file.path)
  end
end

# processors/components/value_cleaner.rb
class ValueCleaner
  def initialize(source)
    @source = source
  end

  def process
    @source.process.map { |row| row.map(&:strip) }
  end
end

# file_processor.rb
class FileProcessor
  PROCESSORS = {
    '.csv' => Processors::Csv,
    '.json' => Processors::Json
  }

  def process(file)
    processor_class = PROCESSORS[file.extension] ||
      raise("Unsupported format: #{file.extension}")

    processor_class.process(file)
  end
end

Mental Model Required:

  • Understand object composition
  • Grasp dependency injection
  • Know about component assembly
  • Navigate deeper directory structures

New Developer Questions:

“Why are there so many small classes? How do these pieces fit together? Where does the processing actually happen? How do I trace the flow?”

Finding Balance: A More Approachable Solution

Here’s a middle ground that maintains separation while being more approachable:

# file_processor.rb
class FileProcessor
  def process(file)
    processor_for(file.extension).process(file)
  end

  private

  def processor_for(extension)
    case extension
    when '.csv' then CsvProcessor.new
    when '.json' then JsonProcessor.new
    else raise "Unsupported format: #{extension}"
    end
  end
end

# processors.rb
class CsvProcessor
  def process(file)
    clean_values(
      read_csv(file)
    )
  end

  private

  def read_csv(file)
    CSV.read(file.path)
  end

  def clean_values(data)
    data.map { |row| row.map(&:strip) }
  end
end

class JsonProcessor
  def process(file)
    JSON.parse(File.read(file.path))
  end
end

Mental Model Required:

  • Basic object-oriented programming
  • Simple method delegation
  • Two files to navigate

New Developer Experience:

“I can see how processors are selected and where their logic lives. Adding a new format means adding a new processor class with a process method. The processing steps are clear within each processor.”

Key Insights for Real-World Development

  1. Cognitive Load Matters
    • Every layer of abstraction is a concept developers must hold in their head
    • More files = more context switching
    • Simpler patterns = faster onboarding
  2. The Cost of Flexibility
    • Inheritance creates rigid hierarchies that are hard to change
    • Deep composition can make code flow hard to follow
    • Not every difference needs its own abstraction
  3. Signs You Might Be Over-Separating
    • You need a diagram to explain the code structure
    • New developers frequently ask “where does X happen?”
    • Changes require touching many files
    • Test setup becomes complex
  4. When Separation Makes Sense
    • Processing logic is complex (>20-30 lines)
    • Components have different deployment/testing needs
    • Different teams own different processors
    • Performance requires lazy loading

Practical Guidelines

  1. Start Together
    • Keep code in one place until patterns emerge
    • Don’t separate based on speculation
    • Let real requirements drive design
  2. Separate Gradually
    • Move code out when it proves necessary
    • Keep related code close together
    • Document why separation was needed
  3. Optimize for Understanding
    • Could a new developer understand this in their first week?
    • Is the separation making the code clearer or just more “proper”?
    • Are you solving real problems or theoretical ones?

Benefits of Locality of Behavior

  1. Reduced cognitive load - developers don’t have to jump between files
  2. Easier debugging - the full context is visible
  3. Sometimes better performance - fewer layers of indirection to load and dispatch through, though this is rarely the deciding factor
  4. Simpler testing - fewer dependencies to mock

The principle doesn’t mean “put everything in one file” but rather “keep related behaviors together.” The challenge is determining what “related” means in your specific context.

Conclusion

The best code isn’t the most perfectly separated – it’s the code that helps your team move quickly and confidently. Sometimes that means keeping things together, even if it doesn’t satisfy every SOLID principle.

Remember: Every layer of indirection you add is a concept that must live in a developer’s mental model of the system. Choose wisely.

What’s your experience with code organization patterns? How do you balance separation with understandability? Share your thoughts in the comments below.

References

  1. Gabriel, Richard P. (1996). “Patterns of Software: Tales from the Software Community”
  2. Hickey, Rich. “Simple Made Easy” presentation
  3. Ousterhout, John. (2018). “A Philosophy of Software Design”

Database Indexes

Why This Interview Question Needs a Rethink

Software companies frequently ask about database indexing during interviews, which might seem puzzling given that complexity analysis can be easily googled. Even more puzzling: if the goal is to assess system design knowledge, why not directly ask about specific scaling challenges or data access patterns?

The truth is, this question often reveals more about the interviewer’s habits than their assessment goals. A better line of questioning might be:

  • “What read/write patterns in your current system influenced your indexing strategy?”
  • “How did you determine when to add or remove indexes in production?”
  • “What monitoring helped you identify index-related performance issues?”

These questions would better reveal an engineer’s practical experience with database performance tuning. Nevertheless, let’s explore both the theoretical foundations and real-world implications that make index knowledge crucial for day-to-day engineering decisions.

The Theoretical Foundation

Complexity Analysis

Without an index (sequential scan):

  • Time complexity: O(n) where n is the number of rows
  • Every row must be examined to find matches
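  • Often the better choice when a query touches a large share of the table (a common rule of thumb is more than roughly 15-20% of rows; the planner’s actual threshold depends on its statistics and cost settings)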

With a B-tree index:

  • Time complexity: O(log n) for lookups, inserts, and deletes
  • B-tree height typically remains 2-4 levels even with millions of rows
  • Each level costs at most one page read, and the upper levels are usually already cached in memory
  • Ideal for highly selective queries

Consider a table with 1,000,000 rows:

  • Sequential scan requires checking all 1,000,000 rows
  • B-tree index typically needs only 3-4 lookups
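
The "3-4 lookups" figure follows from the tree's fanout (how many keys fit in one node). Here is a short Python sketch of the arithmetic; the fanout values are assumptions chosen for illustration, since the real number depends on page size and key width.

```python
def btree_height(num_rows, fanout):
    """Smallest number of levels h such that fanout ** h >= num_rows."""
    height = 1
    capacity = fanout
    while capacity < num_rows:
        capacity *= fanout
        height += 1
    return height


# A binary tree (fanout 2) versus wide B-tree nodes, for 1,000,000 rows.
for fanout in (2, 100, 500):
    print(fanout, btree_height(1_000_000, fanout))
# 2 20
# 100 3
# 500 3
```

A wide node is what keeps the tree shallow: going from 1 million to 100 million rows at a fanout of 100 adds only one more level.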

Inside a B-tree Index: A Practical Example

Understanding how B-tree indexes actually work helps explain both their performance characteristics and limitations. Let’s walk through a concrete example.

B-tree Structure

Consider a table of users with an index on the age column:

CREATE TABLE users (
    id SERIAL PRIMARY KEY,
    name TEXT,
    age INTEGER,
    email TEXT
);

CREATE INDEX idx_users_age ON users(age);

The resulting B-tree structure might look like this:

Root Node (Level 0)
[20, 40, 60]
 |   |   |   |
 v   v   v   v
Level 1 Nodes
[10,15] [25,30,35] [45,50,55] [70,80,90]
 |  |  |  |  |  |  |  |  |   |  |  |
 v  v  v  v  v  v  v  v  v   v  v  v
Leaf Nodes (Level 2)
[Pointers to actual table rows...]

How Lookups Work

Let’s trace what happens when we execute:

SELECT * FROM users WHERE age = 25;

  1. Root Node Traversal
    from bisect import bisect_right

    def find_in_node(node, target):
        # Binary search within the node's sorted keys: child i holds keys
        # below keys[i]; a key equal to keys[i] is routed to child i + 1
        return node.children[bisect_right(node.keys, target)]
    
    def btree_search(root, target):
        # Descend one level at a time until we reach a leaf
        current = root
        while not current.is_leaf:
            current = find_in_node(current, target)
        return current
    
  2. Leaf Node Access
    class LeafNode:
        def __init__(self):
            self.keys = []  # The indexed values
            self.row_pointers = []  # Pointers to actual table rows
            self.next_leaf = None  # For range scans
    
    def get_row_pointers(leaf_node, target):
        matches = []
        for i, key in enumerate(leaf_node.keys):
            if key == target:
                matches.append(leaf_node.row_pointers[i])
        return matches
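
Putting the two steps together, here is a self-contained Python sketch that runs the lookup end to end. It uses a simplified two-level version of the diagram above, treating the four Level 1 nodes as leaves; the `Node` class and the `row-N` pointer strings are hypothetical stand-ins for real index pages and row locations.

```python
from bisect import bisect_right


class Node:
    def __init__(self, keys, children=None, row_pointers=None):
        self.keys = keys
        self.children = children or []
        self.row_pointers = row_pointers or []
        self.is_leaf = not self.children


def btree_search(root, target):
    """Return (matching row pointers, number of levels visited)."""
    current = root
    levels_visited = 1
    while not current.is_leaf:
        # Pick the child whose key range contains the target.
        current = current.children[bisect_right(current.keys, target)]
        levels_visited += 1
    matches = [
        pointer
        for key, pointer in zip(current.keys, current.row_pointers)
        if key == target
    ]
    return matches, levels_visited


leaves = [
    Node([10, 15], row_pointers=["row-10", "row-15"]),
    Node([25, 30, 35], row_pointers=["row-25", "row-30", "row-35"]),
    Node([45, 50, 55], row_pointers=["row-45", "row-50", "row-55"]),
    Node([70, 80, 90], row_pointers=["row-70", "row-80", "row-90"]),
]
root = Node([20, 40, 60], children=leaves)

print(btree_search(root, 25))  # (['row-25'], 2)
print(btree_search(root, 26))  # ([], 2)
```

Both the hit and the miss touch the same number of levels, which is why B-tree lookups have such predictable cost.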
    

Insert Operations

When inserting a new record:

def insert(root, key, row_pointer):
    # Find the appropriate leaf node
    leaf = find_leaf_node(root, key)

    # If leaf has space, simply insert
    if len(leaf.keys) < MAX_KEYS:
        insert_in_leaf(leaf, key, row_pointer)
        return root

    # Otherwise, split the node
    new_leaf = split_leaf(leaf, key, row_pointer)

    # Propagate the split upward if necessary
    return propagate_split(root, leaf, new_leaf)
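
The helpers in the outline above (find_leaf_node, split_leaf, propagate_split) are not defined in this post. To show the core of the leaf-level step, here is a minimal runnable Python sketch of inserting into a leaf and splitting it when it overflows. `MAX_KEYS`, the `Leaf` class, and `insert_with_split` are assumptions of this sketch, and propagating the split into parent nodes is left out.

```python
from bisect import bisect_right

MAX_KEYS = 3  # tiny on purpose so a split is easy to trigger


class Leaf:
    def __init__(self, keys=None, row_pointers=None):
        self.keys = keys or []
        self.row_pointers = row_pointers or []
        self.next_leaf = None  # sibling link used for range scans


def insert_in_leaf(leaf, key, row_pointer):
    # Keep keys sorted, with each pointer at the same position as its key.
    i = bisect_right(leaf.keys, key)
    leaf.keys.insert(i, key)
    leaf.row_pointers.insert(i, row_pointer)


def insert_with_split(leaf, key, row_pointer):
    """Insert into leaf; return (separator_key, new_leaf) on split, else None."""
    insert_in_leaf(leaf, key, row_pointer)
    if len(leaf.keys) <= MAX_KEYS:
        return None

    # Overflow: move the upper half into a new right-hand sibling.
    mid = len(leaf.keys) // 2
    new_leaf = Leaf(leaf.keys[mid:], leaf.row_pointers[mid:])
    leaf.keys = leaf.keys[:mid]
    leaf.row_pointers = leaf.row_pointers[:mid]
    new_leaf.next_leaf = leaf.next_leaf
    leaf.next_leaf = new_leaf

    # The parent would store this separator key to route future lookups.
    return new_leaf.keys[0], new_leaf


leaf = Leaf([25, 30, 35], ["r25", "r30", "r35"])
separator, right = insert_with_split(leaf, 27, "r27")
print(leaf.keys, separator, right.keys)  # [25, 27] 30 [30, 35]
```

This is also where the write cost of an index comes from: every insert has to find the leaf, shift entries, and occasionally split, and that happens once per index on the table.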

The Real-World Impact

This is where theoretical knowledge transforms into practical engineering decisions. Here’s what actually happens in production systems:

Write Performance Impact

  1. Single Record Operations (rough, workload-dependent figures)
    • Base insert without index: ~1ms
    • Each additional index adds one more B-tree update to every insert
    • Impact: 4-5 indexes can make inserts 3-5x slower
  2. Bulk Operations
    • 1M row import with no indexes: ~2-3 minutes
    • Same import with 3 indexes: ~5-8 minutes
    • With 5+ indexes: Can extend to 15+ minutes or more
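
Figures like these vary a lot with hardware, row size, and database engine, so it is worth measuring on your own setup. Here is a minimal Python sketch that times the same bulk insert with different numbers of indexes. It uses an in-memory SQLite database purely so the example is self-contained, which is an assumption of this sketch and not representative of a production server; `time_inserts` is a hypothetical helper name.

```python
import sqlite3
import time


def time_inserts(num_rows, indexed_columns):
    """Insert num_rows rows with the given indexes; return (seconds, row count)."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT, age INTEGER, email TEXT)"
    )
    for column in indexed_columns:
        conn.execute(f"CREATE INDEX idx_users_{column} ON users({column})")

    rows = [(f"user{i}", i % 90, f"user{i}@example.com") for i in range(num_rows)]

    start = time.perf_counter()
    with conn:  # one transaction, committed on exit
        conn.executemany(
            "INSERT INTO users (name, age, email) VALUES (?, ?, ?)", rows
        )
    elapsed = time.perf_counter() - start

    count = conn.execute("SELECT COUNT(*) FROM users").fetchone()[0]
    conn.close()
    return elapsed, count


for columns in ([], ["age"], ["age", "name", "email"]):
    elapsed, count = time_inserts(20_000, columns)
    print(f"{len(columns)} indexes: {count} rows in {elapsed:.3f}s")
```

The absolute timings will differ from machine to machine; the point of the exercise is the ratio between the runs.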

When It Really Hurts

The performance impact becomes particularly noticeable in:

  1. High-frequency Insert Systems
    • Logging systems
    • Real-time data pipelines
    • IoT data collection
    • High-volume transaction systems
  2. Development Pain Points
    • Adding “just one more index” suddenly making writes noticeably slower
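    • Index creation blocking production writes (in PostgreSQL, a plain CREATE INDEX locks out writes while it runs; CREATE INDEX CONCURRENTLY avoids this at the cost of a slower build)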
    • Unexpected storage growth (each index can add 20-30% to table size)

Making the Right Engineering Decisions

Understanding both theoretical and practical aspects helps engineers make better decisions:

  1. Index Strategically
    • Don’t index everything that could be queried
    • Consider query-to-write ratio for each table
    • Monitor index usage and remove unused indexes
  2. Balance Performance Tradeoffs
    • Accept slower writes for critical read performance
    • Consider partial indexes for large tables
    • Use covering indexes for crucial queries
  3. Plan for Scale
    • Anticipate growth in both data volume and query patterns
    • Consider index maintenance windows
    • Monitor index bloat and performance degradation
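
One practical way to check whether an index is earning its keep is to look at the query plan before and after adding it. The Python sketch below does this with SQLite's EXPLAIN QUERY PLAN, again assuming an in-memory SQLite database only to keep the example self-contained; PostgreSQL's EXPLAIN output looks different, and the `plan` helper is a hypothetical name.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT, age INTEGER, email TEXT)"
)
conn.executemany(
    "INSERT INTO users (name, age, email) VALUES (?, ?, ?)",
    [(f"user{i}", i % 90, f"user{i}@example.com") for i in range(1000)],
)


def plan(sql):
    """Return the human-readable plan text (the fourth column of each row)."""
    return " ".join(row[3] for row in conn.execute("EXPLAIN QUERY PLAN " + sql))


# No index on age yet: the whole table is scanned.
before = plan("SELECT * FROM users WHERE age = 25")

# With an index on age, the planner searches the index instead.
conn.execute("CREATE INDEX idx_users_age ON users(age)")
after = plan("SELECT * FROM users WHERE age = 25")

# A covering index holds every column the query needs, so the table
# itself is never touched.
conn.execute("DROP INDEX idx_users_age")
conn.execute("CREATE INDEX idx_users_age_email ON users(age, email)")
covering = plan("SELECT email FROM users WHERE age = 25")

print(before)
print(after)
print(covering)
```

The exact wording of the plan text varies between SQLite versions, so no expected output is shown; look for SCAN in the first plan, the index name in the second, and COVERING INDEX in the third.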

Conclusion

While understanding B-tree complexity is important, the real engineering value comes from:

  1. Recognizing specific access patterns in your system
  2. Understanding the concrete performance implications
  3. Making informed tradeoffs based on actual requirements

The next time you’re interviewing candidates, consider skipping the theoretical complexity question. Instead, ask about their experience with real database performance challenges and how they measured, monitored, and resolved them. These answers will tell you far more about their engineering capabilities than whether they can recite trivia knowledge.

C# Evolution: A Practical Implementation Guide (6.0 to 12.0)

C# 6.0 marked my introduction to the language. After going back to full-time professional Ruby development, it seems I’ve missed quite a bit. Features like tuples and pattern matching, which I don’t recall using in C#, are particularly fun to see arrive from other languages. The full history of changes can be found here, but I’d appreciate insights on any other important day-to-day concepts I might have overlooked.

Table of Contents

  1. C# 6.0 (2015)
  2. C# 7.0 (2017)
  3. C# 8.0 (2019)
  4. C# 9.0 (2020)
  5. C# 10.0 (2021)
  6. C# 11.0 (2022)
  7. C# 12.0 (2023)

C# 6.0 (2015)

Focus on developer productivity and code readability.

String Interpolation

// Old way
string message = string.Format("Hello {0}, you are {1} years old", name, age);

// New way
string message = $"Hello {name}, you are {age} years old";
string complex = $"Math: {2 + 2}, Method: {CalculateValue()}";

When to use:

  • Any time you need to embed values or expressions within strings

Why use it:

  • More readable than string.Format()
  • Compile-time checking of interpolated values
  • IntelliSense support for embedded expressions

Best practices:

  • Use for simple string formatting
  • Consider traditional format strings for complex formatting scenarios
  • Be careful with complex expressions - extract to variables if they become hard to read

Null Propagation (?.)

// Old way
var zipCode = customer != null
    ? customer.Address != null
        ? customer.Address.ZipCode
        : null
    : null;

// New way
var zipCode = customer?.Address?.ZipCode;
var length = customer?.Name?.Length ?? 0;

When to use:

  • Accessing properties or methods on potentially null objects
  • Chaining multiple null-checkable operations

Why use it:

  • Eliminates verbose null-checking code
  • Prevents null reference exceptions
  • Makes code more readable

Best practices:

  • Combine with ?? operator for default values
  • Don’t overuse - if you find too many null checks, consider redesigning

C# 7.0 (2017)

Introduction of tuples and pattern matching.

Tuples

// Method returning multiple values
public (string name, int age) GetPersonDetails()
{
    return ("John", 30);
}

// With deconstruction
var (name, age) = GetPersonDetails();

// Tuple usage in LINQ
var statistics = orders
    .Select(o => (o.Date, o.Total))
    .GroupBy(x => x.Date.Month)
    .Select(g => (month: g.Key, total: g.Sum(x => x.Total)));

When to use:

  • Returning multiple values from methods
  • Temporary grouping of related data
  • LINQ projections

Why use it:

  • Cleaner than out parameters
  • More structured than anonymous types
  • Better performance than small classes

Best practices:

  • Name tuple elements for clarity
  • Use for internal implementation details
  • Consider proper classes for public APIs

Pattern Matching

// Type patterns with when
switch (shape)
{
    case Circle c when c.Radius > 10:
        return $"Large circle: {c.Radius}";
    case Rectangle r when r.Width == r.Height:
        return "Square";
    case Rectangle r:
        return $"Rectangle: {r.Width}x{r.Height}";
    case null:
        throw new ArgumentNullException(nameof(shape));
    default:
        return "Unknown shape";
}

// Property patterns (added in C# 8.0; the relational pattern > 1000 needs C# 9.0)
if (order is { Status: OrderStatus.Paid, Total: > 1000 })
{
    // Process premium order
}

When to use:

  • Type checking and casting in one operation
  • Complex conditional logic
  • Object property validation

Why use it:

  • More concise than traditional type checking
  • Safer than manual casting
  • More maintainable than nested if statements

C# 8.0 (2019)

Focus on null safety and improved patterns.

Nullable Reference Types

#nullable enable

public class Customer
{
    public string Name { get; set; } = null!; // Must be initialized
    public string? MiddleName { get; set; }   // Can be null

    public string GetFullName(string? title)
    {
        return title is null
            ? Name
            : $"{title} {Name}";
    }
}

When to use:

  • New projects where null safety is important
  • Gradually in existing projects
  • APIs where null semantics matter

Why use it:

  • Catches null reference bugs at compile time
  • Makes null handling intentions clear
  • Improves code documentation

Switch Expressions

public decimal CalculateDiscount(Customer customer) =>
    customer.Type switch
    {
        CustomerType.New => 0.1m,
        CustomerType.Regular when customer.Orders.Count > 100 => 0.2m,
        CustomerType.Regular => 0.15m,
        CustomerType.VIP => 0.3m,
        _ => throw new ArgumentException($"Unknown customer type: {customer.Type}")
    };

When to use:

  • Converting one type to another based on conditions
  • Simple pattern matching scenarios
  • Replacing switch statements with expressions

Why use it:

  • More concise than switch statements
  • Compiler warns when the arms are not exhaustive
  • Better type safety

C# 9.0 (2020)

Records

// Immutable record
public record Person(string Name, int Age);

// Record with additional members
public record Employee(string Name, int Age, string Department)
{
    public bool IsManager { get; init; }
    public decimal CalculateBonus() => IsManager ? 5000m : 1000m;
}

// Inheritance
public record Manager(string Name, int Age, string Department)
    : Employee(Name, Age, Department)
{
    public int TeamSize { get; init; }
}

When to use:

  • Data-centric types
  • Domain models
  • DTOs
  • Immutable objects

Why use it:

  • Built-in value equality
  • Immutability by default
  • Concise syntax for data classes

Best practices:

  • Use for immutable data models
  • Consider inheritance hierarchy
  • Use with pattern matching

C# 10.0 (2021)

Global Using Directives

// In a central file (e.g., GlobalUsings.cs)
global using System.Collections.Generic;
global using System.Linq;
global using System.Text.Json;
global using static System.Math;

// File scoped namespaces
namespace MyApp;

public class Program { }

When to use:

  • Common imports across many files
  • Framework-specific imports
  • Large projects with consistent dependencies

Why use it:

  • Reduces code repetition
  • Centralizes dependency management
  • Cleaner source files

C# 11.0 (2022)

Raw String Literals

var json = """
    {
        "name": "John Doe",
        "age": 30,
        "addresses": [
            {
                "type": "home",
                "street": "123 Main St"
            }
        ]
    }
    """;

var sql = """
    SELECT u.Name, u.Email
    FROM Users u
    WHERE u.Status = 'Active'
        AND u.LastLoginDate >= @date
    """;

When to use:

  • JSON templates
  • SQL queries
  • HTML/XML content
  • Any multi-line string with special characters

Why use it:

  • No escape sequences needed
  • Preserves formatting
  • More readable

C# 12.0 (2023)

Primary Constructors

public class CustomerService(
    ILogger logger,
    IRepository repository,
    IValidator validator)
{
    public async Task<Customer> CreateCustomer(CustomerDto dto)
    {
        logger.Log("Creating customer");

        if (!validator.Validate(dto))
            throw new ValidationException();

        var customer = new Customer(dto);
        await repository.Save(customer);
        return customer;
    }
}

When to use:

  • Service classes with dependencies
  • Classes with simple initialization
  • When constructor parameters are used throughout the class

Why use it:

  • Reduces boilerplate
  • Clear dependency declaration
  • Improved readability

Collection Expressions

// Array initialization
int[] numbers = [1, 2, 3, 4, 5];

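// List creation with spread element
// (collection expressions need an explicit target type, so var does not compile here)
var existing = new List<int> { 1, 2, 3 };
List<int> combined = [..existing, 4, 5, 6];

// Dictionary initialization (still uses an initializer; C# 12 collection expressions do not cover dictionaries)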
var config = new Dictionary<string, int>
{
    ["MaxRetries"] = 3,
    ["Timeout"] = 1000
};

When to use:

  • Simple collection initialization
  • Combining collections
  • Creating fixed-size arrays

Why use it:

  • More concise syntax
  • Clearer intent
  • Reduced ceremony