Activity Retry + Timeout

Lesson 8: Building Resilient Fault-Tolerant Systems

Master the art of building resilient Temporal workflows through sophisticated retry policies, timeout strategies, and failure handling patterns.


Objective

By the end of this lesson, you will understand:

  • Retry policy deep dive with exponential backoff strategies
  • Timeout configurations for different operation types
  • Failure classification - retriable vs non-retriable errors
  • Activity heartbeats for long-running operations
  • Advanced retry patterns and circuit breakers
  • Monitoring and observability for resilient systems

1. Retry Policy Deep Dive

Exponential Backoff Strategy

RetryOptions.newBuilder()
    .setInitialInterval(Duration.ofSeconds(1))      // Start with 1 second
    .setMaximumInterval(Duration.ofMinutes(5))      // Cap at 5 minutes
    .setBackoffCoefficient(2.0)                     // Double each time
    .setMaximumAttempts(10)                         // Try up to 10 times
    .build()

// Retry intervals: 1s, 2s, 4s, 8s, 16s, 32s, 64s, 128s, 256s, 300s (capped)

Exponential backoff prevents overwhelming failed services while allowing recovery


Operation-Specific Retry Strategies

class ResilientWorkflowImpl {

    // Database queries: Quick retries, many attempts
    private val databaseActivity = Workflow.newActivityStub(
        DatabaseActivity::class.java,
        ActivityOptions.newBuilder()
            .setRetryOptions(
                RetryOptions.newBuilder()
                    .setInitialInterval(Duration.ofMillis(500))
                    .setMaximumInterval(Duration.ofSeconds(30))
                    .setBackoffCoefficient(1.5)
                    .setMaximumAttempts(15)
                    .build()
            )
            .build()
    )
    // Continued on next slide...

More Retry Strategies

    // External API calls: Conservative retries
    private val apiActivity = Workflow.newActivityStub(
        ExternalApiActivity::class.java,
        ActivityOptions.newBuilder()
            .setRetryOptions(
                RetryOptions.newBuilder()
                    .setInitialInterval(Duration.ofSeconds(5))
                    .setMaximumInterval(Duration.ofMinutes(10))
                    .setBackoffCoefficient(3.0)
                    .setMaximumAttempts(5)
                    .build()
            )
            .build()
    )

    // Payment processing: Careful retries
    private val paymentActivity = Workflow.newActivityStub(
        PaymentActivity::class.java,
        ActivityOptions.newBuilder()
            .setRetryOptions(
                RetryOptions.newBuilder()
                    .setInitialInterval(Duration.ofSeconds(2))
                    .setMaximumInterval(Duration.ofMinutes(2))
                    .setBackoffCoefficient(2.0)
                    .setMaximumAttempts(3)
                    .build()
            )
            .build()
    )
}

2. Timeout Configurations

Timeout Types and Usage

ActivityOptions.newBuilder()
    // Total time allowed (including all retries)
    .setScheduleToCloseTimeout(Duration.ofMinutes(30))

    // Time for single execution attempt
    .setStartToCloseTimeout(Duration.ofMinutes(5))

    // Maximum time waiting in queue
    .setScheduleToStartTimeout(Duration.ofMinutes(2))

    // Heartbeat frequency (for long-running activities)
    .setHeartbeatTimeout(Duration.ofSeconds(30))
    .build()

Different timeout types serve different purposes in the activity lifecycle


Timeout Strategy Matrix

Operation Type Start-to-Close Schedule-to-Close Heartbeat Rationale
Quick validation 10s 30s N/A Fast operations
Database query 30s 5m N/A Network + DB time
File processing 10m 1h 30s Long operations need heartbeat
External API 2m 15m N/A Network dependencies
Email sending 1m 10m N/A External service

Match timeouts to actual operation characteristics and business requirements


3. Failure Classification

Retriable vs Non-Retriable Failures

@Component
class PaymentActivityImpl : PaymentActivity {

    override fun processPayment(paymentInfo: PaymentInfo): PaymentResult {
        try {
            return paymentGateway.charge(paymentInfo)
        } catch (e: Exception) {
            when (e) {
                // Retriable: Temporary issues
                is ConnectTimeoutException,
                is SocketTimeoutException,
                is ServiceUnavailableException -> {
                    throw ApplicationFailure.newFailure(
                        "Temporary payment service issue: ${e.message}",
                        "PAYMENT_SERVICE_TEMPORARY_ERROR"
                    )
                }
                // Continued on next slide...

More Failure Classification

                // Retriable: Rate limiting
                is RateLimitException -> {
                    // Longer delay for rate limiting
                    throw ApplicationFailure.newFailure(
                        "Rate limited by payment service",
                        "PAYMENT_RATE_LIMITED"
                    )
                }

                // Non-retriable: Client errors
                is InvalidCardException,
                is InsufficientFundsException,
                is ExpiredCardException -> {
                    throw ApplicationFailure.newNonRetryableFailure(
                        "Payment failed: ${e.message}",
                        "PAYMENT_DECLINED"
                    )
                }

                // Non-retriable: Configuration errors
                is InvalidMerchantException,
                is InvalidApiKeyException -> {
                    throw ApplicationFailure.newNonRetryableFailure(
                        "Payment configuration error: ${e.message}",
                        "PAYMENT_CONFIG_ERROR"
                    )
                }
            }
        }
    }
}

Classification Strategy

Error Types and Retry Decisions:

  • Temporary network issuesRetry aggressively
  • Rate limitingRetry with longer delays
  • Service unavailableRetry with backoff
  • Invalid inputDon't retry
  • Authentication errorsDon't retry
  • Business rule violationsDon't retry

Smart error classification prevents wasted resources and faster failure detection


💡 Key Takeaways

What You've Learned:

  • Exponential backoff prevents service overload during failures
  • Operation-specific strategies match retry patterns to operation types
  • Timeout configurations control activity lifecycle timing
  • Failure classification determines retry vs immediate failure
  • Smart retry policies balance resilience with resource efficiency

results matching ""

    No results matching ""