Chapter 3: Interprocess Communication in a Microservice Architecture

How services talk to each other — interaction styles, REST, gRPC, messaging, and strategies for availability and reliability.


3.1 Overview of Interprocess Communication

In a monolith, modules call each other through language-level method calls. In a microservice architecture, services run in separate processes or on separate machines. They must use an interprocess communication (IPC) mechanism to work together. Choosing the right IPC mechanism is one of the most important architectural decisions you will make.

Interaction Styles

Interaction styles are classified along two dimensions. The first dimension is how many services receive each request: one-to-one or one-to-many. The second dimension is timing: synchronous (the client waits for a response) or asynchronous (the client does not block).

|              | One-to-One                                          | One-to-Many                                |
|--------------|-----------------------------------------------------|--------------------------------------------|
| Synchronous  | Request/Response                                    | (none)                                     |
| Asynchronous | Asynchronous Request/Response, One-way Notification | Publish/Subscribe, Publish/Async Responses |

This taxonomy guides the IPC choice. The two main pattern categories are the Remote Procedure Invocation (RPI) pattern for synchronous communication and the Messaging pattern for asynchronous communication.

Author's recommendation: Prefer asynchronous messaging for communication between services. Use REST for external-facing APIs.

Key insight: The choice between synchronous and asynchronous IPC directly impacts availability. Synchronous calls reduce it; asynchronous messaging improves it.

Defining APIs in a Microservice Architecture

An API is a contract between a service and its clients. Getting this contract right is important because changes later can break other services.

Use an API-first design approach: define the API before you write any implementation code. Share the draft with client teams, get their feedback, and iterate. This avoids costly rework after coding starts.

The interface definition language (IDL) you use depends on the IPC mechanism: REST APIs are commonly specified with Open API (formerly Swagger), gRPC APIs with Protocol Buffers .proto files, and messaging APIs with informal documentation, since no IDL has been widely adopted for them.

Evolving APIs

APIs change over time. In a microservice architecture, you cannot force all clients to update at once because each service is deployed independently. You need a strategy to evolve APIs without breaking existing clients.

Semantic Versioning

Use Semantic Versioning to communicate the nature of changes. A version number has the form MAJOR.MINOR.PATCH:

  1. MAJOR: incremented when you make an incompatible change to the API
  2. MINOR: incremented when you make a backward-compatible enhancement
  3. PATCH: incremented when you make a backward-compatible bug fix

The Robustness Principle

The Robustness Principle says: "Be conservative in what you send, be liberal in what you accept." Services should ignore unknown fields in requests and responses. This makes backward-compatible evolution much easier.
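On the consumer side this is often implemented as a "tolerant reader." A minimal Python sketch, using a hypothetical order payload (the field names are illustrative):

```python
import json

def parse_order(payload: str) -> dict:
    """Tolerant reader: pick out only the fields this client needs and
    silently ignore everything else, so the server can add fields later
    without breaking this consumer."""
    doc = json.loads(payload)
    return {
        "orderId": doc["orderId"],
        # Be liberal: tolerate a missing field by falling back to a default.
        "state": doc.get("state", "UNKNOWN"),
    }

# A newer server adds "discountCode"; this older client still works.
v2_response = '{"orderId": "42", "state": "APPROVED", "discountCode": "SAVE5"}'
order = parse_order(v2_response)
```

The same tolerance applies in the other direction when a service reads requests.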

Handling Breaking Changes

When a breaking change is unavoidable, you must support multiple API versions simultaneously for a transition period. Two common approaches:

  1. Embed the major version in the URL path, for example /v1/orders versus /v2/orders
  2. Use HTTP content negotiation, embedding the version in the MIME type, for example Accept: application/vnd.example.resource+json; version=1

Message Formats

The format you choose for encoding messages affects efficiency, readability, and compatibility. There are two broad categories.

| Aspect        | Text (JSON / XML)                  | Binary (Protocol Buffers / Avro)                                |
|---------------|------------------------------------|-----------------------------------------------------------------|
| Readability   | Human-readable, self-describing    | Not human-readable                                              |
| Size          | Verbose                            | Compact                                                         |
| Parsing speed | Slower                             | Faster                                                          |
| API-first     | Optional                           | Forced (a schema is required)                                   |
| Evolution     | Easy with the Robustness Principle | Easy (Protocol Buffers); Avro needs the writer's schema to read |

Protocol Buffers uses tagged fields, which makes API evolution easier because the reader can skip unknown tags. Avro requires the writer's schema to read data, making evolution slightly harder.

Principle: Always use a cross-language format (like JSON or Protocol Buffers) even if your project currently uses only one language. This keeps you flexible for the future.

3.2 Synchronous Remote Procedure Invocation Pattern

In the Remote Procedure Invocation (RPI) pattern, a client sends a request to a service and waits for a response. This is the most familiar IPC style. Two popular implementations are REST and gRPC.

Pattern: Remote Procedure Invocation (RPI)

Problem: How does a client invoke a service synchronously?
Solution: The client sends a request and blocks until it receives a response.
Implementations: REST, gRPC
Requirements: Circuit Breaker and Service Discovery are needed for reliable RPI.

Using REST

REST is an IPC mechanism that uses HTTP. It models the domain as resources (e.g., /orders, /customers/123) and uses standard HTTP verbs (GET, POST, PUT, DELETE) to manipulate them.

REST Maturity Model

Leonard Richardson defined four levels of REST maturity:

  1. Level 0: clients invoke the service by POSTing to its sole URL endpoint
  2. Level 1: the service exposes distinct resources, each with its own URL
  3. Level 2: the service uses HTTP verbs (GET to retrieve, POST to create, and so on) and HTTP status codes
  4. Level 3: responses contain hypermedia links (HATEOAS) that identify the actions a client can perform next

Benefits of REST

  1. Simple and familiar
  2. Easy to test from a browser or the command line with tools such as curl
  3. Directly supports request/response-style communication
  4. HTTP is firewall friendly
  5. No intermediate broker to set up and operate

Drawbacks of REST

  1. Supports only request/response-style communication
  2. Reduced availability: client and service must both be running for the entire exchange
  3. Clients must know the URLs of service instances, or use service discovery
  4. Fetching multiple resources in a single request can be awkward
  5. Mapping some operations, such as "cancel an order," to an HTTP verb is sometimes difficult

Alternative: Technologies like GraphQL and Falcor let clients fetch exactly the data they need in a single request, which is more efficient than REST for complex data fetching.

Using gRPC

gRPC is a framework for making synchronous and streaming remote calls. It uses Protocol Buffers as its IDL and binary message format, and runs over HTTP/2.

Benefits of gRPC

  1. Straightforward to design an API with a rich set of operations, not just CRUD
  2. Efficient, compact binary message format
  3. Bidirectional streaming, enabling both RPI-style and messaging-style interaction
  4. Interoperability between clients and services written in different languages

Drawbacks of gRPC

  1. More work for JavaScript clients to consume than a REST/JSON API
  2. Older firewalls might not support HTTP/2

| Aspect                 | REST                  | gRPC                                    |
|------------------------|-----------------------|-----------------------------------------|
| Protocol               | HTTP/1.1 (typically)  | HTTP/2                                  |
| Format                 | JSON (text)           | Protocol Buffers (binary)               |
| Operations             | Limited to HTTP verbs | Any operation defined in a .proto file  |
| Streaming              | Not native            | Bidirectional streaming                 |
| Browser support        | Excellent             | Limited                                 |
| Firewall compatibility | High                  | Can be problematic                      |

Handling Partial Failure Using the Circuit Breaker Pattern

In a distributed system, a service you depend on may be slow or unavailable. This is called a partial failure. If you do nothing, the problem cascades: your service blocks waiting for the broken service, your threads get exhausted, and you become unavailable too.

Danger: Cascading failure is one of the biggest risks in synchronous microservice communication. A single slow service can bring down an entire chain of dependent services.

Three-Mechanism Defense

Use all three mechanisms together to handle partial failure:

  1. Network timeouts — never block indefinitely; always set a timeout on remote calls
  2. Limit outstanding requests — cap the number of concurrent requests to a service; reject immediately if the limit is reached
  3. Circuit Breaker — track the error rate and stop sending requests when failures exceed a threshold
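Mechanisms 1 and 2 can be sketched together in Python; the limits, names, and the stand-in remote function are all illustrative, and a real client would also pass the timeout to its HTTP library as a socket timeout:

```python
import threading

class BoundedClient:
    """Caps the number of outstanding requests to one remote service and
    carries the network timeout that every call must use."""

    def __init__(self, max_outstanding: int, timeout_s: float):
        self._slots = threading.BoundedSemaphore(max_outstanding)
        self.timeout_s = timeout_s  # mechanism 1: never block indefinitely

    def call(self, remote_fn, *args):
        # Mechanism 2: reject immediately instead of queueing forever.
        if not self._slots.acquire(blocking=False):
            raise RuntimeError("too many outstanding requests")
        try:
            # remote_fn is expected to honor self.timeout_s, e.g.
            # requests.get(url, timeout=self.timeout_s) in a real client.
            return remote_fn(*args)
        finally:
            self._slots.release()
```

Mechanism 3, the circuit breaker, wraps calls like these and is covered next.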

Pattern: Circuit Breaker

Problem: How do you prevent a failing service from causing cascading failures?
Solution: Track the error rate of calls to a remote service. When it exceeds a threshold, "trip" the breaker (open state) and reject all calls for a timeout period. After the timeout, allow a trial call. If it succeeds, close the breaker. If not, stay open.
States: Closed (normal) → Open (rejecting) → Half-Open (trial)
Technologies: Netflix Hystrix (JVM), Polly (.NET)

Analogy: A circuit breaker in software works like one in your home's electrical panel. When there is too much current (too many errors), the breaker trips and cuts the circuit. After a cooldown, you can try turning it back on.
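The state machine can be sketched in a few lines of Python; the threshold and timeout values are illustrative, and a production breaker such as Hystrix or Polly adds metrics windows, concurrency handling, and fallbacks:

```python
import time

class CircuitBreaker:
    """Minimal Closed -> Open -> Half-Open breaker. The clock is
    injectable so the behavior can be tested deterministically."""

    def __init__(self, failure_threshold=3, reset_timeout_s=30.0,
                 clock=time.monotonic):
        self._failure_threshold = failure_threshold
        self._reset_timeout_s = reset_timeout_s
        self._clock = clock
        self._failures = 0
        self._opened_at = None  # None means the breaker is not tripped

    @property
    def state(self):
        if self._opened_at is None:
            return "CLOSED"
        if self._clock() - self._opened_at >= self._reset_timeout_s:
            return "HALF_OPEN"  # timeout elapsed: allow one trial call
        return "OPEN"

    def call(self, fn):
        if self.state == "OPEN":
            raise RuntimeError("circuit open: failing fast")
        try:
            result = fn()
        except Exception:
            self._failures += 1
            if self._failures >= self._failure_threshold or self.state == "HALF_OPEN":
                self._opened_at = self._clock()  # trip (or re-trip) the breaker
            raise
        else:
            self._failures = 0
            self._opened_at = None  # a success closes the breaker
            return result
```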

Recovery Strategies

When a remote call fails or the circuit breaker is open, the service can:

  1. Return an error to its own client
  2. Return a fallback value, such as a sensible default
  3. Return a cached, possibly stale, copy of the data

Using Service Discovery

In a cloud environment, service instances are created and destroyed dynamically. Their network addresses change constantly. Service discovery is the mechanism that lets a client find the current network location of a service instance.

At the center is a Service Registry: a database of the network locations of all service instances.
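The registry idea can be sketched with an in-memory stand-in and client-side, round-robin load balancing; the names and policy are illustrative, not the API of a real registry such as Eureka:

```python
import itertools

class ServiceRegistry:
    """In-memory service registry: instances self-register, clients look
    up live instances and round-robin between them."""

    def __init__(self):
        self._instances = {}  # service name -> set of "host:port" strings
        self._counters = {}

    def register(self, service: str, address: str):
        self._instances.setdefault(service, set()).add(address)

    def deregister(self, service: str, address: str):
        self._instances.get(service, set()).discard(address)

    def lookup(self, service: str):
        return sorted(self._instances.get(service, set()))

    def choose(self, service: str) -> str:
        """Client-side load balancing: round-robin over live instances."""
        instances = self.lookup(service)
        if not instances:
            raise LookupError(f"no instances of {service}")
        counter = self._counters.setdefault(service, itertools.count())
        return instances[next(counter) % len(instances)]
```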

Application-Level Discovery

The services themselves handle registration and discovery:

  1. Self Registration: on startup, each service instance registers its network location with the service registry, typically renewing the registration with periodic heartbeats
  2. Client-side Discovery: a client queries the registry for the instances of a service, picks one using a load-balancing algorithm, and sends the request to it directly

Technologies: Netflix Eureka (registry) + Ribbon (client-side load balancing) + Spring Cloud.

Works across platforms, but requires a discovery library for each programming language you use.

Platform-Provided Discovery

The deployment platform handles registration and discovery:

  1. 3rd-party Registration: the platform, not the service, registers each instance with its built-in registry
  2. Server-side Discovery: clients send requests to a well-known DNS name or virtual IP, and the platform's router load-balances them across the available instances

No discovery code needed in your services, but it ties you to a specific platform.

| Aspect              | Application-Level              | Platform-Provided      |
|---------------------|--------------------------------|------------------------|
| Registration        | Self Registration              | 3rd-party Registration |
| Discovery           | Client-side                    | Server-side            |
| Code required       | Yes (per language)             | None                   |
| Platform dependency | Cross-platform                 | Single platform        |
| Examples            | Eureka + Ribbon + Spring Cloud | Kubernetes, Docker     |

Recommendation: Use platform-provided service discovery when possible. It requires no code changes and works automatically with your deployment infrastructure.

3.3 Asynchronous Messaging Pattern

In the Messaging pattern, services communicate by exchanging messages asynchronously. The sender does not wait for a response. Messages flow through channels, which may be managed by a message broker or handled directly between services (brokerless).

Pattern: Messaging

Problem: How do services communicate without blocking?
Solution: Services exchange messages asynchronously through message channels.
Benefits: Loose coupling, improved availability, support for all interaction styles.

Overview of Messaging

Message Structure

A message has two parts:

  1. Header: metadata such as a unique message ID and, for request/response interactions, a return address naming the reply channel
  2. Body: the payload being sent, in either text or binary format

Message Types

There are three types of messages:

  1. Document: generic data, such as the reply to a command
  2. Command: the equivalent of an RPC request, asking the receiver to perform an operation
  3. Event: a notification that something significant has happened in the sender, such as a domain object changing state

Channel Types

Messages flow over two kinds of channels:

  1. Point-to-point channel: delivers each message to exactly one consumer; used for one-to-one interaction, such as commands
  2. Publish-subscribe channel: delivers each message to all attached consumers; used for one-to-many interaction, such as events

Implementing Interaction Styles Using Messaging

Messaging is flexible enough to implement all the interaction styles from the taxonomy:

  1. Request/response and asynchronous request/response: the client sends a command message that specifies a reply channel and a correlation ID; the service sends back a reply message carrying the same correlation ID
  2. One-way notification: the client sends a command message to a point-to-point channel and expects no reply
  3. Publish/subscribe: a service publishes an event message to a publish-subscribe channel
  4. Publish/async responses: a client publishes a message specifying a reply channel, then collects the responses by correlation ID

Creating an API Specification for a Messaging-Based Service API

Unlike REST (Open API) or gRPC (Protocol Buffers), there is no widely adopted standard for documenting messaging-based APIs. You typically create informal documentation that describes:

  1. The message channels the service uses
  2. The message types and their formats
  3. For an event publisher, the events it emits

Using a Message Broker

A message broker is an intermediary through which messages flow. Services send messages to the broker, and the broker delivers them to the intended recipients.

Broker-Based Messaging

Most production systems use a broker. Popular brokers include ActiveMQ, RabbitMQ, Apache Kafka, Amazon Kinesis, and Amazon SQS.

Brokerless Messaging

In brokerless messaging (e.g., ZeroMQ), services communicate directly without an intermediary.

| Aspect              | Broker-Based                            | Brokerless                      |
|---------------------|-----------------------------------------|---------------------------------|
| Coupling            | Loose (sender and receiver decoupled)   | Tighter (need direct addresses) |
| Buffering           | Yes (receiver can be offline)           | No (both sides must be up)      |
| Latency             | Higher (extra hop through broker)       | Lower (direct connection)       |
| Guaranteed delivery | Built-in                                | Hard to achieve                 |
| Operations          | Broker must be managed                  | Simpler infrastructure          |
| Examples            | ActiveMQ, RabbitMQ, Kafka, Kinesis, SQS | ZeroMQ                          |

Competing Receivers and Message Ordering

To scale message processing, you run multiple instances of a service that all consume from the same channel. These are called competing receivers. The problem is that concurrent consumers may process messages out of order.

Problem: If two messages for the same entity (e.g., "create order" then "update order") are processed by different instances concurrently, the update may be processed before the create, causing errors.

Sharded (Partitioned) Channels

The solution is sharded channels (also called partitioned channels). Each message has a shard key (e.g., the order ID). The broker hashes the key to assign the message to a specific shard. Each shard is consumed by exactly one instance.

Kafka implements this with partitions and consumer groups: the broker assigns each topic partition to exactly one consumer in a group.

This guarantees per-shard-key ordering while still enabling horizontal scaling.
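Shard assignment can be sketched as a stable hash of the shard key. Real brokers use their own partitioners; MD5 here just provides a hash that is stable across processes, unlike Python's built-in hash(), which is randomized per run:

```python
import hashlib

def shard_for(shard_key: str, num_shards: int) -> int:
    """Map a shard key to a shard index deterministically, so every
    message for the same key lands on the same shard."""
    digest = hashlib.md5(shard_key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % num_shards

# "create order" and "update order" for order-1234 carry the same shard
# key, so a single consumer processes them in order.
assert shard_for("order-1234", 4) == shard_for("order-1234", 4)
```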

Analogy: Think of a post office with multiple counters. Instead of random assignment, each counter handles a specific ZIP code range. All letters for the same ZIP always go to the same counter, so they are processed in order.

Handling Duplicate Messages

Message brokers typically guarantee at-least-once delivery: a message will be delivered, but it might be delivered more than once if a failure occurs. Your service must handle duplicates.

Strategy 1: Idempotent Handlers

If your message handler is naturally idempotent (processing the same message twice has the same effect as processing it once), duplicates are harmless. This only works if the broker also preserves ordering on redelivery.

Strategy 2: Message Tracking

Record each processed message ID in a PROCESSED_MESSAGES table as part of the same database transaction that performs the business logic. If a duplicate arrives, the INSERT into the tracking table fails (duplicate key), and the handler skips it.
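The tracking-table strategy can be sketched with SQLite; the table, column, and message names are illustrative:

```python
import sqlite3

# One connection standing in for the service's database.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE processed_messages (message_id TEXT PRIMARY KEY)")
conn.execute("CREATE TABLE account (id TEXT PRIMARY KEY, balance INTEGER)")
conn.execute("INSERT INTO account VALUES ('a1', 100)")
conn.commit()

def handle(message_id: str, amount: int) -> bool:
    """Apply a credit exactly once; return False for a duplicate."""
    try:
        with conn:  # one transaction covers tracking AND business logic
            conn.execute("INSERT INTO processed_messages VALUES (?)",
                         (message_id,))
            conn.execute("UPDATE account SET balance = balance + ? "
                         "WHERE id = 'a1'", (amount,))
        return True
    except sqlite3.IntegrityError:
        return False  # duplicate key: message was already processed

handle("msg-1", 25)
handle("msg-1", 25)  # redelivery of the same message is skipped
```

Because the duplicate-key check and the business update commit or roll back together, a crash between them cannot leave the two out of sync.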

Best practice: When your logic is not naturally idempotent, use message tracking. It guarantees exactly-once processing as long as the tracking and business logic share the same transaction.

Transactional Messaging

A common need is to atomically update a database and publish a message. For example, when creating an order, you want to insert the order into the database and publish an "OrderCreated" event. If either operation fails alone, the system becomes inconsistent.

Problem: Distributed transactions (two-phase commit) across the database and message broker are not a viable solution for modern microservices. They are complex, slow, and many brokers do not support them.

Transactional Outbox Pattern

Pattern: Transactional Outbox

Problem: How do you atomically update the database and publish a message?
Solution: Insert the message into an OUTBOX table as part of the same local ACID database transaction that performs the business update. A separate MessageRelay process reads the OUTBOX and publishes messages to the broker.
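The write path of the pattern can be sketched with SQLite; the schema and names are illustrative:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE orders (id TEXT PRIMARY KEY, state TEXT);
    CREATE TABLE outbox (
        id INTEGER PRIMARY KEY AUTOINCREMENT,
        destination TEXT,
        event_type TEXT,
        payload TEXT,
        published INTEGER DEFAULT 0
    );
""")

def create_order(order_id: str):
    # Both inserts commit together or roll back together: the order can
    # never exist without its OrderCreated event, or vice versa.
    with conn:
        conn.execute("INSERT INTO orders VALUES (?, 'PENDING')", (order_id,))
        conn.execute(
            "INSERT INTO outbox (destination, event_type, payload) "
            "VALUES (?, ?, ?)",
            ("order-events", "OrderCreated", '{"orderId": "%s"}' % order_id),
        )

create_order("42")
```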

MessageRelay Implementations

The MessageRelay component reads messages from the OUTBOX table and publishes them to the broker. Two approaches:

Polling Publisher: The relay periodically queries the OUTBOX table for unpublished messages. Simple to implement, but the polling adds load to the database. It may not work well with NoSQL databases that lack efficient querying.
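A polling relay can be sketched with SQLite and a list standing in for the broker client; a real relay would run the query on a timer and mark rows published only after the broker acknowledges the send:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""CREATE TABLE outbox (
    id INTEGER PRIMARY KEY,
    destination TEXT,
    payload TEXT,
    published INTEGER DEFAULT 0)""")
conn.execute("INSERT INTO outbox (destination, payload) "
             "VALUES ('order-events', '{\"orderId\": \"42\"}')")
conn.commit()

sent = []  # stand-in for a real broker client's send() calls

def poll_and_publish():
    """One polling cycle: publish every unpublished row, oldest first."""
    rows = conn.execute(
        "SELECT id, destination, payload FROM outbox "
        "WHERE published = 0 ORDER BY id"
    ).fetchall()
    for row_id, destination, payload in rows:
        sent.append((destination, payload))  # broker.send(...) in reality
        conn.execute("UPDATE outbox SET published = 1 WHERE id = ?", (row_id,))
    conn.commit()

poll_and_publish()
poll_and_publish()  # nothing new: no duplicate publish
```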

Transaction Log Tailing: The relay reads the database's transaction log (also called commit log or WAL). When an INSERT into the OUTBOX is found in the log, the relay publishes the corresponding message. This is more sophisticated but very performant because it uses the database's built-in change capture.

Technologies for log tailing: Debezium, Eventuate Tram, DynamoDB Streams.

| Aspect        | Polling Publisher                | Transaction Log Tailing                    |
|---------------|----------------------------------|--------------------------------------------|
| Complexity    | Simple                           | More complex                               |
| Performance   | DB polling is expensive at scale | High performance                           |
| NoSQL support | May not work well                | Works (via native change streams)          |
| Technologies  | Custom implementation            | Debezium, Eventuate Tram, DynamoDB Streams |

Libraries and Frameworks for Messaging

Rather than using broker client libraries directly, use higher-level frameworks that abstract away the details of messaging infrastructure. These frameworks handle message serialization, channel management, and transactional outbox concerns so you can focus on business logic.

3.4 Using Asynchronous Messaging to Improve Availability

The IPC mechanism you choose has a direct effect on your system's availability. This section explains why synchronous communication hurts availability and how to fix it.

Synchronous Communication Reduces Availability

When service A calls service B synchronously, A is available only if B is also available. If B also calls service C, then A is available only if both B and C are available. The overall availability is the product of all dependent service availabilities.

Example: If handling a request requires a synchronous chain of three services, each 99.5% available, the combined availability drops to 0.995 × 0.995 × 0.995 ≈ 98.5%. With more services, it gets worse quickly.
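The arithmetic generalizes to any chain length:

```python
def chain_availability(*availabilities: float) -> float:
    """Availability of a synchronous chain: the product of the
    availabilities of every service that must be up."""
    result = 1.0
    for a in availabilities:
        result *= a
    return result

three_services = chain_availability(0.995, 0.995, 0.995)  # ~0.985
```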

Eliminating Synchronous Interaction

There are three strategies to eliminate synchronous dependencies and improve availability:

Strategy 1: Use Asynchronous Interaction Styles End-to-End

Replace synchronous request/response with asynchronous messaging throughout the system. The service that receives the initial request sends it as a message and responds immediately. Downstream processing happens asynchronously.

This is not always possible, especially for external REST APIs that clients expect to return a result immediately.

Strategy 2: Replicate Data Locally

Each service maintains a local copy of the data it needs from other services. It subscribes to events from those services and keeps its copy up to date. When it needs the data, it reads from its local copy instead of making a synchronous call.

Strategy 3: Return Response Immediately, Finish Processing Later

The service creates the entity in a PENDING state and returns a response immediately to the client. It then validates and completes the operation asynchronously using a saga. When the saga finishes, the entity moves to a final state (e.g., APPROVED or REJECTED).

Analogy: Think of placing an order at a busy restaurant. The waiter does not go to the kitchen and wait for your food before confirming your order. Instead, the waiter writes down your order (PENDING), confirms it immediately, and brings the food later when it is ready.
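The PENDING-then-finalize flow can be sketched in Python; the in-memory store and the credit-limit check standing in for a real saga are illustrative:

```python
orders = {}  # in-memory stand-in for the order database

def create_order(order_id: str, amount: int) -> dict:
    """Create the order in PENDING state and respond immediately."""
    orders[order_id] = {"id": order_id, "amount": amount, "state": "PENDING"}
    return dict(orders[order_id])  # snapshot returned to the client

def run_saga(order_id: str, credit_limit: int = 1000):
    """Stand-in for the asynchronous saga that finalizes the order."""
    order = orders[order_id]
    order["state"] = "APPROVED" if order["amount"] <= credit_limit else "REJECTED"

response = create_order("42", 250)  # the client sees PENDING right away
run_saga("42")                      # later, the saga moves it to a final state
```

The client must therefore be written to poll or be notified when the order leaves the PENDING state.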

| Strategy                   | How It Works                              | Limitation                                    |
|----------------------------|-------------------------------------------|-----------------------------------------------|
| Async end-to-end           | Replace all sync calls with messaging     | Not always possible for external APIs         |
| Replicate data locally     | Subscribe to events, keep a local copy    | Large data storage; does not help with writes |
| Respond immediately + saga | Create in PENDING state, process via saga | Client must handle the PENDING state          |

Key Takeaways