How to query data that is spread across services — from the simple API Composition pattern to the powerful CQRS pattern, view design, and a DynamoDB implementation example.
In a microservice architecture, each service owns its private database. You cannot write a single SQL query that joins data from multiple services. This is the core querying problem.
Two patterns solve this problem: API Composition and CQRS.
Guiding principle: Use API Composition whenever possible. Only use CQRS when API Composition cannot efficiently support the query you need.
Consider a findOrder() query that returns details about an order. In the FTGO application, order data is spread across multiple services: Order Service holds the order details, Kitchen Service holds the ticket status, Delivery Service holds the delivery info, and Accounting Service holds the billing status.
No single service has all the data needed to answer this query. You must gather data from several services and combine it.
Analogy: Imagine you need a full report on a student. Grades are in one office, attendance is in another, and health records are in a third. You must visit each office, collect the information, and combine it yourself.
The API Composer sends requests to each Provider Service, waits for the responses, and then merges the data into a single result. The Provider Services are the services that own the data needed for the query.
To implement findOrder(), the API Composer calls four provider services — Order Service, Kitchen Service, Delivery Service, and Accounting Service — and combines their responses into one unified order view.
There are three options for where the API Composer lives:
| Option | Description | When to use |
|---|---|---|
| Client | The frontend application queries each service directly and merges the results. | Simple cases where the client has direct access to services. |
| API Gateway | The API Gateway acts as the composer. It queries provider services and returns one combined response. | When the query is exposed as an external API endpoint. |
| Standalone Service | A dedicated service handles the composition logic. | When the composition logic is complex or used by multiple clients internally. |
To reduce latency, the API Composer should call provider services in parallel whenever possible. Use a reactive programming model: send all independent requests at once and wait for all of them to complete. Only call services sequentially when one call depends on the result of another.
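To make the parallel-then-merge idea concrete, here is a minimal sketch of a `findOrder()` composer using Python's `asyncio`. The provider functions are hypothetical stand-ins; in a real composer each would be an HTTP call to the Order, Kitchen, Delivery, and Accounting services.

```python
import asyncio

# Hypothetical provider calls -- in a real API Composer these would be
# HTTP requests to the four provider services.
async def get_order(order_id):
    return {"orderId": order_id, "lineItems": ["pizza"]}

async def get_ticket_status(order_id):
    return {"ticketStatus": "PREPARING"}

async def get_delivery_info(order_id):
    return {"deliveryStatus": "SCHEDULED"}

async def get_billing_status(order_id):
    return {"billingStatus": "AUTHORIZED"}

async def find_order(order_id):
    # The four calls are independent, so issue them concurrently
    # and wait for all responses before merging.
    order, ticket, delivery, billing = await asyncio.gather(
        get_order(order_id),
        get_ticket_status(order_id),
        get_delivery_info(order_id),
        get_billing_status(order_id),
    )
    # Merge the four partial results into one unified order view.
    return {**order, **ticket, **delivery, **billing}

result = asyncio.run(find_order("order-1"))
```

If one call depended on another's result (say, looking up a courier only after fetching the delivery record), that pair would be awaited sequentially while the rest still run in parallel.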
API Composition adds network overhead. Instead of one database query, you make multiple service calls. Each call adds latency. For simple queries, this overhead is acceptable. For complex queries with many providers, it can become a problem.
The more provider services involved, the lower the overall availability. If any provider is down, the query may fail. Two strategies can help: the API Composer can return previously cached data in place of a failed provider's response, or it can return an incomplete result that simply omits the unavailable provider's data.
The API Composer calls multiple services at different points in time. There is no transaction that spans these calls. This means the data from different services may be slightly inconsistent — one service might have processed an update that another has not yet seen.
Limitation: API Composition does not guarantee transactional consistency across provider services. The combined result may show data from different points in time.
| Benefits | Drawbacks |
|---|---|
| Simple to understand and implement | Increased network overhead (multiple calls) |
| No extra infrastructure needed | Reduced availability when many providers are involved |
| Works well for straightforward queries | No transactional consistency across providers |
| Default choice — use it first | Inefficient for queries requiring in-memory joins of large datasets |
Some queries are too complex or too inefficient for API Composition. In these cases, the CQRS pattern is the answer. CQRS stands for Command Query Responsibility Segregation — it separates the write side from the read side of your data.
Three specific problems motivate the use of CQRS:
Consider a findOrderHistory() query that searches a customer's past orders by keyword. API Composition would need to fetch all orders from the Order Service, all tickets from the Kitchen Service, and then join and filter them in memory. For large datasets, this is very slow and wasteful.
Sometimes a service stores its data in a database that does not support the type of query you need. For example, you need geospatial search (find nearby restaurants) but the service uses a database without geospatial features. API Composition cannot fix this — the provider simply cannot answer the query efficiently.
The service that owns the data is not always the best service to implement a complex, high-volume query. A read-heavy dashboard or search feature may need a dedicated service with its own optimized view of the data. Putting this logic inside the data-owning service would add unnecessary complexity to that service.
Rule of thumb: If API Composition requires fetching large datasets and joining them in memory, if the provider's database does not support the query type, or if the query logic does not belong in the data-owning service — consider CQRS.
The command side handles all create, update, and delete operations. It contains the domain model and business logic. After each change, it publishes a domain event.
The query side subscribes to these events and updates its read-optimized view. When a query arrives, it reads directly from this view — no joins, no calling other services.
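The query-side mechanics can be sketched in a few lines. This is a minimal in-memory illustration with hypothetical event names and fields; a real view would live in a datastore such as DynamoDB rather than a dict.

```python
# The read-optimized view: orderId -> denormalized order record.
view = {}

def handle_order_created(event):
    # An OrderCreated event inserts a new record into the view.
    view[event["orderId"]] = {"status": "CREATED",
                              "consumerId": event["consumerId"]}

def handle_delivery_picked_up(event):
    # A DeliveryPickedUp event updates the status of an existing record.
    view[event["orderId"]]["status"] = "PICKED_UP"

handlers = {
    "OrderCreated": handle_order_created,
    "DeliveryPickedUp": handle_delivery_picked_up,
}

def on_event(event):
    # The query side subscribes to domain events and applies each one
    # to its view; queries then read the view directly, with no joins
    # and no calls to other services.
    handlers[event["type"]](event)

on_event({"type": "OrderCreated", "orderId": "o1", "consumerId": "c1"})
on_event({"type": "DeliveryPickedUp", "orderId": "o1"})
```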
You can create standalone query services that subscribe to events from multiple services. For example, an Order History Service can subscribe to events from the Order, Kitchen, Delivery, and Accounting services and maintain a combined, queryable view of each customer's orders.
CQRS is a broader version of a pattern many developers already know: using an RDBMS as the primary database and syncing data to a text search engine (like Elasticsearch). CQRS generalizes this by allowing any combination of databases and using domain events for near-real-time sync instead of ETL jobs.
Event sourcing stores data as a sequence of events. Event stores typically only support queries by primary key (aggregate ID). CQRS solves this limitation by building read-optimized views from the event stream. This makes CQRS almost essential for applications that use event sourcing.
Analogy: Think of a library system. The cataloging department (command side) records every new book that arrives. The search terminals (query side) have their own index, organized for fast searching by title, author, or subject. The search index is updated every time a new book is cataloged.
Tradeoff: CQRS gives you powerful and efficient queries, but at the cost of additional complexity and eventual consistency. The view may be slightly behind the command side.
| Aspect | API Composition | CQRS |
|---|---|---|
| Complexity | Simple | Higher (extra services, events, views) |
| Consistency | Near real-time (but no transactions) | Eventually consistent (replication lag) |
| Query efficiency | Multiple calls + in-memory joins | Single read from optimized view |
| Database flexibility | Limited to each provider's database | Any database type for the view |
| When to use | Default choice for simple queries | Complex queries, large datasets, special DB needs |
A CQRS view module has four components: the view database that stores the read-optimized data, a data access module (DAO) that updates and queries it, event handlers that subscribe to domain events, and a query API module that serves clients.
The choice of view database depends on the type of query the view needs to support. Match the database to the query:
| Query type | Good datastore choice |
|---|---|
| Flexible queries on JSON-like documents | Document store (e.g., MongoDB) |
| Text search with keywords | Text search engine (e.g., Elasticsearch) |
| Graph or relationship queries | Graph database (e.g., Neo4j) |
| Structured queries with joins | RDBMS (e.g., PostgreSQL, MySQL) |
| High-throughput key-value lookups | Key-value / wide-column store (e.g., DynamoDB) |
Principle: Let the query requirements drive the database choice. A CQRS view can use a completely different database technology than the command side.
The Data Access Module (DAO) is responsible for all interaction with the view database. It must handle three concerns carefully:
Multiple events for the same or related entities can arrive close together. The DAO must handle concurrent updates safely. Two approaches: use the database's concurrency control (pessimistic or optimistic locking) when an update must read a record and then write it back, or design each update as a single atomic operation that the database applies safely on its own.
Events may be delivered more than once (at-least-once delivery). The DAO must handle duplicates without applying the same change twice. A common approach is to track the maximum eventId processed for each aggregate source. If an incoming event has an ID that is less than or equal to the stored maximum, the DAO skips it.
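The max-eventId check can be isolated as a small, testable piece of DAO logic. This sketch assumes event IDs increase monotonically per aggregate source, as they do when derived from an append-only event log; the class and method names are illustrative.

```python
def should_apply(event_id, max_processed):
    """True if this event has not been seen from this aggregate source."""
    return max_processed is None or event_id > max_processed

class DedupingDao:
    """Tracks the max processed eventId per aggregate source and skips
    any event at or below that watermark (a duplicate delivery)."""

    def __init__(self):
        # (aggregate_type, aggregate_id) -> max eventId processed
        self.max_event_id = {}

    def apply(self, source, event_id, update):
        if not should_apply(event_id, self.max_event_id.get(source)):
            return False  # duplicate: skip without re-applying the change
        update()
        self.max_event_id[source] = event_id
        return True

dao = DedupingDao()
applied = []
first = dao.apply(("Order", "o1"), 3, lambda: applied.append("update"))
dup = dao.apply(("Order", "o1"), 3, lambda: applied.append("update"))
```

In a real view the watermark must be stored in the view database itself, in the same atomic operation as the update, so the check survives restarts; the DynamoDB section below shows that variant.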
Because the view is eventually consistent, a client might write data on the command side and then immediately query the view before the view has been updated. Two strategies help: the command-side API can return a token identifying the published event, which the client passes to the query API so the query can wait until the view has processed that event; alternatively, the client can update its own local model directly from the command's result instead of re-querying.
Design tip: Prefer atomic field-level updates over full-record replacement. They reduce the risk of concurrent update conflicts and work well with NoSQL databases like DynamoDB.
When you create a new CQRS view, you need to populate it with historical data. The view must process all past events to reach the current state. But message brokers do not store events indefinitely.
The solution is to use archived events. Store domain events in a durable archive (for example, Amazon S3). When building a new view, replay the archived events. A tool like Apache Spark can process large archives efficiently.
Sometimes you need to rebuild a view — for example, after fixing a bug in the event handler or changing the view schema. Reprocessing all events from the beginning can be very slow as the event history grows.
The solution is incremental snapshots. Periodically save a snapshot of the view's current state. To rebuild, load the latest snapshot and then replay only the events that came after it. This is much faster than replaying the full event history.
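The rebuild procedure reduces to: load the latest snapshot, then fold in only the events recorded after it. A minimal sketch, assuming each event carries an increasing `id` and a hypothetical `key`/`value` payload:

```python
def rebuild_view(snapshot, events):
    """Rebuild a view from the latest snapshot plus later events.

    snapshot: {"state": dict, "last_event_id": int}
    events:   the full archived event stream, ordered by increasing "id".
    Only events newer than the snapshot are replayed.
    """
    state = dict(snapshot["state"])
    for event in events:
        if event["id"] <= snapshot["last_event_id"]:
            continue  # already folded into the snapshot -- skip
        state[event["key"]] = event["value"]  # illustrative apply step
    return state

snapshot = {"state": {"o1": "ACCEPTED"}, "last_event_id": 2}
events = [
    {"id": 1, "key": "o1", "value": "CREATED"},
    {"id": 2, "key": "o1", "value": "ACCEPTED"},
    {"id": 3, "key": "o1", "value": "PICKED_UP"},
    {"id": 4, "key": "o2", "value": "CREATED"},
]
state = rebuild_view(snapshot, events)
```

Here events 1 and 2 are skipped because the snapshot already reflects them; only events 3 and 4 are replayed.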
Principle: Archive domain events in durable storage (e.g., S3) for building new views. Use incremental snapshots for efficient view rebuilds.
This section walks through a concrete implementation of a CQRS view: the Order History Service. This service provides a queryable view of a customer's past orders, built with AWS DynamoDB as the view datastore.
The OrderHistoryEventHandlers module subscribes to domain events from multiple services (Order Service, Kitchen Service, Delivery Service, etc.). When an event arrives, the handler calls the DAO to update the DynamoDB view.
Each event handler translates a domain event into a specific update operation on the Order History table. For example, an OrderCreated event inserts a new item, while a DeliveryPickedUp event updates the delivery status field.
The Order History table uses orderId as the partition key (primary key). Each item stores the order details as hierarchical attributes — nested structures that hold information from multiple source services.
To support queries like "find all orders for a given customer, sorted by date," the table uses a Global Secondary Index (GSI) with consumerId as the partition key and orderCreationDate as the sort key. The GSI is necessary because, without a secondary index, DynamoDB can only query a table by its primary key.
DynamoDB does not have built-in full-text search. To support keyword filtering, the implementation tokenizes relevant text (such as restaurant name and menu item names) into a keywords set attribute. The query uses a contains() filter expression to check if the keywords set includes the search term.
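The tokenization step can be sketched as a pure function, with the `contains()` filter reduced to its essence, a set-membership check. The field names and tokenizer are illustrative, not the exact schema from the implementation.

```python
import re

def keywords_for(order):
    # Tokenize the restaurant name and menu item names into a keyword
    # set, mirroring the "keywords" set attribute stored on each item.
    words = set()
    for text in [order["restaurantName"], *order["menuItems"]]:
        words.update(re.findall(r"[a-z0-9]+", text.lower()))
    return words

def matches(order_keywords, search_term):
    # Equivalent in spirit to DynamoDB's contains(keywords, :term)
    # filter expression: exact membership, no ranking or stemming.
    return search_term.lower() in order_keywords

kw = keywords_for({"restaurantName": "Ajanta Indian Cuisine",
                   "menuItems": ["Chicken Vindaloo"]})
```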
Limitation: The contains() filter approach is less powerful than a true text search engine. It works for simple keyword matching but does not support ranking, stemming, or fuzzy matching. For advanced search, consider Elasticsearch as the view datastore instead.
DynamoDB uses opaque LastEvaluatedKey tokens for pagination. The client receives a token with each page of results and passes it back to fetch the next page. This is efficient but means you cannot jump to a specific page number — only forward through the result set.
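The client-side paging loop is the same regardless of datastore: keep passing the token back until no token is returned. This sketch abstracts the query as any callable that mimics DynamoDB's Query response shape (a dict with `Items` and, when more pages remain, `LastEvaluatedKey`), so it can be exercised without AWS.

```python
def fetch_all(query_page):
    """Drain a paginated query by following LastEvaluatedKey tokens.

    query_page takes an optional start key (passed as DynamoDB's
    ExclusiveStartKey would be) and returns one page of results.
    """
    items, start_key = [], None
    while True:
        page = query_page(start_key)
        items.extend(page["Items"])
        start_key = page.get("LastEvaluatedKey")
        if start_key is None:
            return items  # no token means this was the last page

# Stub with three pages to exercise the loop.
pages = {
    None: {"Items": [1, 2], "LastEvaluatedKey": "k1"},
    "k1": {"Items": [3], "LastEvaluatedKey": "k2"},
    "k2": {"Items": [4]},
}
items = fetch_all(lambda start_key: pages[start_key])
```

A real caller would usually return one page and its token to the client per request rather than draining everything, precisely because the tokens only move forward.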
The OrderHistoryDaoDynamoDb class is the DAO that handles all DynamoDB interactions. It implements both the update operations (called by event handlers) and the query operations (called by the query API).
For updates, the DAO uses DynamoDB's UpdateItem operation instead of PutItem. UpdateItem modifies specific fields atomically, while PutItem replaces the entire item. Using UpdateItem is safer for concurrent updates because two handlers updating different fields of the same order will not overwrite each other's changes.
To ensure idempotent event handling, the DAO uses a conditional update expression. Each item stores the maximum eventId it has processed, tracked per aggregate source (e.g., per Order aggregate, per Kitchen aggregate). The update only succeeds if the incoming event's ID is greater than the stored maximum. If the condition fails, the event is a duplicate and the DAO skips it.
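The shape of such a conditional update can be sketched by building the keyword arguments for an `UpdateItem` call. The attribute names (`deliveryStatus`, `maxEventId_<source>`) are illustrative, not the implementation's exact schema; the function is pure so it can be inspected without calling AWS.

```python
def idempotent_update_request(table, order_id, source, event_id, status):
    """Build UpdateItem kwargs that atomically set one field and
    reject duplicate events from a given aggregate source."""
    return {
        "TableName": table,
        "Key": {"orderId": {"S": order_id}},
        # Set the new status and record this event as processed,
        # in a single atomic operation.
        "UpdateExpression": "SET #status = :status, #maxId = :eventId",
        # Apply only if this event is newer than anything seen from
        # this source (or nothing has been seen from it yet); a failed
        # condition means the event is a duplicate and is skipped.
        "ConditionExpression":
            "attribute_not_exists(#maxId) OR #maxId < :eventId",
        "ExpressionAttributeNames": {
            "#status": "deliveryStatus",
            "#maxId": f"maxEventId_{source}",
        },
        "ExpressionAttributeValues": {
            ":status": {"S": status},
            ":eventId": {"N": str(event_id)},
        },
    }

req = idempotent_update_request(
    "ftgo-order-history", "o1", "Order", 7, "PICKED_UP")
```

The DAO passes this request to DynamoDB and catches the conditional-check failure as the signal to skip a duplicate, rather than treating it as an error.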
Implementation summary: Use UpdateItem for concurrency safety, conditional expressions for duplicate detection, a GSI for customer-centric queries, a keywords set for simple text search, and LastEvaluatedKey tokens for pagination.
| DynamoDB concept | Purpose in this implementation |
|---|---|
| orderId partition key | Uniquely identifies each order item in the table |
| GSI on (consumerId, orderCreationDate) | Enables querying orders by customer, sorted by date |
| Keywords set attribute | Supports keyword filtering via contains() |
| UpdateItem | Atomic field-level updates for concurrency safety |
| Conditional expression (max eventId) | Detects and skips duplicate events |
| LastEvaluatedKey | Opaque token for forward-only pagination |