When our team noticed API response times climbing from milliseconds to seconds, we knew we had a problem. What we didn't expect was that our caching strategy, meant to improve performance, was actually the real issue. This is the story of how we discovered, debugged, and ultimately solved complex caching issues in our .NET microservices architecture.
🔹The Initial Architecture
Our system processed financial transactions for a global payment platform (covered in a previous article), handling roughly 50,000 requests per minute during peak hours.
The architecture consisted of:
▪️ 6 microservices handling different aspects of payment processing
▪️ A mix of Redis and in-memory caching
▪️ Postgres as the primary database
▪️ Azure Service Bus for inter-service communication
The caching layer was originally designed to reduce database load and improve response times. Each service maintained its own cache, combining an in-memory layer with the distributed Redis layer:
// Initial caching implementation
public class CacheService
{
    private readonly IDistributedCache _distributedCache;
    private readonly IMemoryCache _memoryCache;

    public async Task<T> GetOrSetAsync<T>(string key, Func<Task<T>> factory, TimeSpan? expiration = null)
    {
        // First check memory cache
        if (_memoryCache.TryGetValue(key, out T value))
            return value;

        // Then check distributed cache
        var cached = await _distributedCache.GetAsync(key);
        if (cached != null)
        {
            value = JsonSerializer.Deserialize<T>(cached);

            // Set in memory cache
            _memoryCache.Set(key, value, expiration ?? TimeSpan.FromMinutes(5));
            return value;
        }

        // If not found, generate value
        value = await factory();

        // Store in both caches
        await _distributedCache.SetAsync(
            key,
            JsonSerializer.SerializeToUtf8Bytes(value),
            new DistributedCacheEntryOptions
            {
                AbsoluteExpirationRelativeToNow = expiration
            });

        _memoryCache.Set(key, value, expiration ?? TimeSpan.FromMinutes(5));
        return value;
    }
}
◾ The Issues We Ran Into
1. Cache Stampede
Our first major issue emerged during peak hours. When a cached item expired, multiple concurrent requests would trigger the same expensive database query. This “cache stampede” effect cascaded across services:
public async Task<PaymentDetails> GetPaymentDetails(string paymentId)
{
    return await _cacheService.GetOrSetAsync(
        $"payment:{paymentId}",
        async () => await _dbContext.Payments
            .Include(p => p.Customer)
            .Include(p => p.TransactionHistory)
            .FirstOrDefaultAsync(p => p.Id == paymentId),
        TimeSpan.FromMinutes(15));
}
When this cache entry expired, hundreds of concurrent requests would hit the database simultaneously, causing CPU spikes and increased response times.
2. Memory Leaks
Our memory usage showed a concerning pattern: it kept growing over time, even with cache expiration configured. The real villain? We were storing large objects in memory without proper size limits:
// Memory leak in original implementation
public class PaymentDetails
{
    public string Id { get; set; }
    public Customer Customer { get; set; }
    public List<Transaction> TransactionHistory { get; set; } // Unbounded list
    public byte[] Receipt { get; set; } // Large binary data
}
3. Inconsistent Cache Invalidation
With multiple services managing their own caches, we faced data consistency issues. When a payment was updated in one service, related caches in other services weren’t always invalidated properly:
// Inconsistent cache invalidation
public async Task UpdatePayment(Payment payment)
{
    _dbContext.Payments.Update(payment);
    await _dbContext.SaveChangesAsync();
    await _cacheService.RemoveAsync($"payment:{payment.Id}");
    // Other services' caches still had old data
}
◾ How We Solved It: A Multi-Layered Caching Strategy
1. Sliding Window Cache Lock
To prevent cache stampede, we implemented a sliding window lock pattern:
public class SlidingWindowCache
{
    private readonly SemaphoreSlim _lock = new SemaphoreSlim(1, 1);
    private readonly IDistributedCache _cache;
    private const int StaleBufferSeconds = 30;

    public async Task<T> GetOrSetAsync<T>(string key, Func<Task<T>> factory, TimeSpan expiration)
    {
        var value = await TryGetValue<T>(key);
        if (value != null) return value;

        try
        {
            await _lock.WaitAsync();

            // Double-check after acquiring lock
            value = await TryGetValue<T>(key);
            if (value != null) return value;

            // Generate new value
            value = await factory();

            // Store with stale buffer
            await _cache.SetAsync(
                key,
                JsonSerializer.SerializeToUtf8Bytes(new CacheEntry<T>
                {
                    Value = value,
                    ExpiresAt = DateTime.UtcNow.Add(expiration),
                    IsStale = false
                }),
                new DistributedCacheEntryOptions
                {
                    AbsoluteExpirationRelativeToNow = expiration.Add(TimeSpan.FromSeconds(StaleBufferSeconds))
                });

            return value;
        }
        finally
        {
            _lock.Release();
        }
    }

    // TryGetValue reads the distributed cache entry and deserializes it (omitted for brevity)

    private class CacheEntry<TValue>
    {
        public TValue Value { get; set; }
        public DateTime ExpiresAt { get; set; }
        public bool IsStale { get; set; }
    }
}
2. Memory Management
We implemented a size-aware cache with proper eviction policies:
public class SizeAwareCache
{
    private readonly MemoryCache _cache;
    private long _currentSize;
    private readonly long _sizeLimit;

    public SizeAwareCache(long sizeLimit)
    {
        _sizeLimit = sizeLimit;
        _cache = new MemoryCache(new MemoryCacheOptions
        {
            SizeLimit = sizeLimit,
            ExpirationScanFrequency = TimeSpan.FromMinutes(5)
        });
    }

    public void Set<T>(string key, T value, TimeSpan expiration)
    {
        var size = CalculateSize(value);
        var entryOptions = new MemoryCacheEntryOptions
        {
            Size = size,
            AbsoluteExpirationRelativeToNow = expiration,
            Priority = CacheItemPriority.Normal
        };

        Interlocked.Add(ref _currentSize, size); // rough running total; not decremented on eviction here
        _cache.Set(key, value, entryOptions);
    }

    private long CalculateSize<T>(T value)
    {
        // Approximate the entry size: UTF-8 length for strings, serialized length for objects,
        // plus a small fixed overhead for cache entry metadata.
        if (value is string s)
            return Encoding.UTF8.GetByteCount(s) + 64;

        return JsonSerializer.SerializeToUtf8Bytes(value).Length + 64;
    }
}
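Usage is a sketch along these lines, assuming a 100 MB budget. One detail worth noting: once SizeLimit is set on MemoryCacheOptions, every entry added to that MemoryCache must declare a Size or the set operation throws, which is exactly why Set computes a size for each value.

// Illustrative only: a 100 MB in-memory budget for payment lookups.
var cache = new SizeAwareCache(100 * 1024 * 1024);
cache.Set($"payment:{paymentId}", paymentDetails, TimeSpan.FromMinutes(15));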
3. Distributed Cache Invalidation
We implemented a pub/sub system using Azure Service Bus for coordinated cache invalidation:
public class DistributedCacheInvalidator
{
    private readonly IServiceBusClient _serviceBus;
    private readonly IDistributedCache _cache;
    private readonly string _topicName = "cache-invalidation";

    public async Task InvalidateAsync(string key, string reason)
    {
        var message = new InvalidationMessage
        {
            Key = key,
            Timestamp = DateTime.UtcNow,
            Reason = reason
        };
        await _serviceBus.SendMessageAsync(_topicName, message);
    }

    public async Task HandleInvalidationMessage(InvalidationMessage message)
    {
        await _cache.RemoveAsync(message.Key);
        // Log invalidation with reason and timestamp
    }
}
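The publisher above is only half of the picture; each service also needs a listener on the topic. Below is a sketch of the subscriber side using Azure.Messaging.ServiceBus directly rather than the IServiceBusClient abstraction above; the subscription name, the JSON deserialization, and the hosting as a BackgroundService are assumptions, not our exact wiring:

// Assumes the Azure.Messaging.ServiceBus and Microsoft.Extensions.Hosting packages.
public class CacheInvalidationListener : BackgroundService
{
    private readonly ServiceBusProcessor _processor;
    private readonly DistributedCacheInvalidator _invalidator;

    public CacheInvalidationListener(ServiceBusClient client, DistributedCacheInvalidator invalidator)
    {
        // "cache-invalidation" matches the publisher's topic; the subscription name is illustrative.
        _processor = client.CreateProcessor("cache-invalidation", "payments-service");
        _invalidator = invalidator;
    }

    protected override async Task ExecuteAsync(CancellationToken stoppingToken)
    {
        _processor.ProcessMessageAsync += async args =>
        {
            var message = args.Message.Body.ToObjectFromJson<InvalidationMessage>();
            await _invalidator.HandleInvalidationMessage(message);
            await args.CompleteMessageAsync(args.Message);
        };
        _processor.ProcessErrorAsync += args => Task.CompletedTask; // log args.Exception in real code

        await _processor.StartProcessingAsync(stoppingToken);
    }
}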
◾Production Monitoring Patterns
1. Cache Hit Rate Monitoring
We implemented detailed metrics collection:
public class CacheMetrics
{
    private readonly IMetricClient _metrics;

    public void TrackCacheOperation(string cacheType, string operation, string key, long duration)
    {
        _metrics.TrackMetric(new MetricTelemetry
        {
            Name = $"Cache.{cacheType}.{operation}",
            Value = duration,
            Properties = new Dictionary<string, string>
            {
                ["Key"] = key,
                ["Success"] = "true"
            }
        });
    }
}
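A typical call site wraps the lookup in a Stopwatch and records the outcome as a hit or a miss so the hit rate can be derived downstream; a hedged sketch, assuming an injected _cacheMetrics instance:

// Illustrative call site: time a Redis lookup and record it as a hit or a miss.
var stopwatch = Stopwatch.StartNew();
var cached = await _distributedCache.GetAsync($"payment:{paymentId}");
stopwatch.Stop();

_cacheMetrics.TrackCacheOperation(
    cacheType: "Redis",
    operation: cached != null ? "Hit" : "Miss",
    key: $"payment:{paymentId}",
    duration: stopwatch.ElapsedMilliseconds);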
2. Cache Size Monitoring
We added memory monitoring with alerts:
public class CacheHealthCheck : IHealthCheck
{
    private readonly SizeAwareCache _cache;
    private readonly ILogger _logger;

    public Task<HealthCheckResult> CheckHealthAsync(HealthCheckContext context, CancellationToken cancellationToken = default)
    {
        var metrics = _cache.GetMetrics();
        if (metrics.CurrentSize > metrics.SizeLimit * 0.9)
        {
            _logger.LogWarning("Cache size approaching limit: {CurrentSize}/{SizeLimit}",
                metrics.CurrentSize, metrics.SizeLimit);
            return Task.FromResult(HealthCheckResult.Degraded());
        }
        return Task.FromResult(HealthCheckResult.Healthy());
    }
}
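Hooking the check into the standard ASP.NET Core health endpoint is a small sketch in Program.cs; the check name and endpoint path are arbitrary:

// Illustrative registration; assumes SizeAwareCache is already registered as a singleton.
builder.Services.AddHealthChecks()
    .AddCheck<CacheHealthCheck>("cache-size");

app.MapHealthChecks("/health");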
🔹 What We Learned
After implementing these changes, we saw dramatic improvements:
- Response times dropped from 8 seconds to 200ms (95th percentile)
- CPU usage decreased by 60%
- Memory usage stabilized and became predictable
- Cache hit rates improved from 65% to 92%
Best Practices
1. Cache Entry Sizing: always implement size limits for in-memory caches; use compression for large objects; monitor memory usage patterns.
2. Expiration Strategies: use sliding expiration for frequently accessed items; implement the stale-while-revalidate pattern; consider business requirements when setting TTLs.
3. Invalidation Patterns: use pub/sub for distributed invalidation; implement versioning for cache keys; log all cache invalidations with reasons.
4. Monitoring: track cache hit/miss rates; monitor memory usage and eviction rates; set up alerts for abnormal patterns.
5. Error Handling: implement circuit breakers for cache operations; have fallback strategies for cache failures; log all cache-related errors with context (see the sketch below).
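For the error-handling point above, a circuit breaker around cache calls keeps a Redis outage from stalling request threads. A sketch using Polly; the thresholds and the LoadFromDatabaseAsync helper are illustrative, not our production values:

// Illustrative Polly (v7) circuit breaker: after 5 consecutive cache failures,
// skip the cache for 30 seconds and read straight from Postgres instead.
private static readonly AsyncCircuitBreakerPolicy _cacheBreaker = Policy
    .Handle<Exception>()
    .CircuitBreakerAsync(exceptionsAllowedBeforeBreaking: 5, durationOfBreak: TimeSpan.FromSeconds(30));

public async Task<PaymentDetails> GetPaymentDetailsSafe(string paymentId)
{
    try
    {
        return await _cacheBreaker.ExecuteAsync(() =>
            _cacheService.GetOrSetAsync($"payment:{paymentId}", LoadFromDatabaseAsync, TimeSpan.FromMinutes(15)));
    }
    catch (BrokenCircuitException)
    {
        // Circuit is open: bypass the cache and fall back to the database.
        return await LoadFromDatabaseAsync();
    }
}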
⚡When to Use Different Caching Strategies
In-Memory Cache
- Best for: Frequently accessed, small data sets
- Pros: Fastest access times, no network latency
- Cons: Limited by available memory, not shared across instances
- Use when: Data can be eventually consistent and memory is available
Distributed Cache (Redis)
- Best for: Larger datasets, shared across services
- Pros: Consistent across instances, larger capacity
- Cons: Network latency, additional infrastructure needed
- Use when: Data must be consistent across services
Hybrid Approach
- Best for: Complex systems with varying requirements
- Pros: Combines benefits of both approaches
- Cons: More complex to implement and maintain
- Use when: Performance requirements justify the complexity
Conclusion
Caching is a powerful tool for improving application performance, but it requires careful consideration of implementation details, monitoring, and maintenance. Our journey from slow response times to a highly performant system taught us valuable lessons about the importance of:
- Understanding the full impact of caching decisions
- Implementing proper monitoring from day one
- Having clear invalidation strategies
- Managing memory carefully
- Testing cache behavior under load
Remember that caching is not a “set it and forget it” solution. It requires ongoing monitoring, maintenance, and occasional rearchitecting as your system grows and requirements change.
About the Author
Vritra here….
Keywords .NET, Caching, Redis, Performance Optimization, Distributed Systems, Microservices, Memory Management, Production Monitoring, Azure, Service Bus
If you found this article helpful, follow me for more in-depth technical content about .NET, distributed systems, and performance optimization.