Serverless Beyond the Buzzword 2nd edition: Chapter 7. Data

The first edition was published in 2020, and with the pace of change as brutal and unforgiving as it is, I started making notes for the second edition within a month of finishing the manuscript. The overall structure remains the same, but the 2nd edition goes far deeper into each topic and adds more visuals and a couple of new topics. This series of articles summarises each chapter, along with some personal afterthoughts. Serverless Beyond the Buzzword 2nd edition can be purchased here: https://link.springer.com/book/10.1007/978-1-4842-8761-3


Data is often one of the most important assets within an organisation, so there are high expectations around protecting and managing it. With the advancement of technology and the wide adoption of the cloud, specialised databases have been built for specific use cases. Instead of only the traditional relational database, we now have many other options, such as NoSQL, time series, graph, and ledger databases. Choosing the right tool makes it easier to generate value from data.

In Serverless, each microservice can have its own distinct database. This can be an exclusive relationship, where other microservices must go through the owning service to interact with that data, or the database can be shared. Instead of a single centralised database, we can use purpose-built databases for different use cases, such as a ledger database for transactions, a graph database for recommendations, and a document database for a product inventory.

Event Sourcing

In a multi-database approach, each database typically stores, logs, backs up, and restores only its own changes. This can lead to multiple sources of truth, especially when the data overlaps. The book covers one approach to addressing this: a technique called Event Sourcing. Event Sourcing uses a centralised event store containing an immutable, append-only list of the events that happened within the architecture, and the actual state of the data is derived by replaying those events.
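To make the idea of deriving state from an event log more concrete, here is a minimal in-memory sketch. It assumes a hypothetical inventory domain with two illustrative event kinds, "StockAdded" and "StockRemoved"; a real event store would be a durable, shared service rather than a Python list, and none of these names come from the book.

```python
from dataclasses import dataclass
from typing import Iterable


@dataclass(frozen=True)
class Event:
    """An immutable record of something that happened (hypothetical shape)."""
    sequence: int
    entity_id: str
    kind: str  # illustrative kinds: "StockAdded" or "StockRemoved"
    quantity: int


class EventStore:
    """Append-only in-memory event store: events are never updated or deleted."""

    def __init__(self) -> None:
        self._events: list[Event] = []

    def append(self, entity_id: str, kind: str, quantity: int) -> Event:
        event = Event(len(self._events) + 1, entity_id, kind, quantity)
        self._events.append(event)
        return event

    def events_for(self, entity_id: str) -> Iterable[Event]:
        # Events are yielded in the order they were appended.
        return (e for e in self._events if e.entity_id == entity_id)


def current_stock(store: EventStore, entity_id: str) -> int:
    """Derive the current state by replaying every event for the entity in order."""
    level = 0
    for event in store.events_for(entity_id):
        if event.kind == "StockAdded":
            level += event.quantity
        elif event.kind == "StockRemoved":
            level -= event.quantity
        else:
            raise ValueError(f"unknown event kind: {event.kind}")
    return level


store = EventStore()
store.append("sku-1", "StockAdded", 10)
store.append("sku-1", "StockRemoved", 3)
store.append("sku-2", "StockAdded", 5)
print(current_stock(store, "sku-1"))  # prints 7
```

Because the state is never stored directly, any consumer that replays the same events arrives at the same answer, which is what lets the event store act as the single source of truth.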

AWS Serverless database services

Amazon DynamoDB (DDB)

DDB is a NoSQL database with single-digit millisecond performance at any scale. It is the most popular database for building Serverless applications and offers two billing modes: provisioned and on-demand. The provisioned mode requires configuring the desired read and write capacity and bills for that configured capacity even if it is not fully utilised. The on-demand mode automatically scales capacity based on actual requests and bills only for the capacity consumed.
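The trade-off between the two billing modes is essentially arithmetic. The sketch below compares a month of each, assuming a simplified model (hourly billing for configured read/write capacity units versus a per-million price for consumed request units) and using made-up round prices that are explicitly not AWS's real figures; storage, auto scaling, and other charges are ignored.

```python
HOURS_PER_MONTH = 730  # common approximation for monthly estimates


def provisioned_monthly_cost(rcu: int, wcu: int,
                             price_per_rcu_hour: float,
                             price_per_wcu_hour: float) -> float:
    """Provisioned mode (simplified): pay for configured capacity every hour, used or not."""
    return HOURS_PER_MONTH * (rcu * price_per_rcu_hour + wcu * price_per_wcu_hour)


def on_demand_monthly_cost(read_request_units: int, write_request_units: int,
                           price_per_million_reads: float,
                           price_per_million_writes: float) -> float:
    """On-demand mode (simplified): pay only for the request units actually consumed."""
    reads = read_request_units / 1_000_000 * price_per_million_reads
    writes = write_request_units / 1_000_000 * price_per_million_writes
    return reads + writes


# Hypothetical prices for illustration only; consult AWS pricing for real numbers.
prov = provisioned_monthly_cost(rcu=100, wcu=50,
                                price_per_rcu_hour=0.0001, price_per_wcu_hour=0.0005)
ondemand = on_demand_monthly_cost(read_request_units=10_000_000,
                                  write_request_units=2_000_000,
                                  price_per_million_reads=0.2,
                                  price_per_million_writes=1.0)
print(f"provisioned: ${prov:.2f}, on-demand: ${ondemand:.2f}")
```

With these invented numbers the lightly used table is cheaper on-demand; as traffic becomes steady and keeps provisioned capacity well utilised, the comparison tends to swing the other way.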

DynamoDB does have some challenges. The table's key schema and configuration must be well designed to manage costs and meet the application's requirements, and changing them later can be difficult and costly. DynamoDB's limited query capabilities (it does not support joins, for example) also make it a poor fit for data that includes many relationships.
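Because there are no joins, related items are commonly grouped under a shared partition key so that a single query returns them together, which is why the design has to be settled up front. The sketch below is an in-memory stand-in that only mimics that idea (a composite partition key plus sort key, queried with a sort-key prefix, similar in spirit to `begins_with`); it is not the DynamoDB API, and the key names such as `CUSTOMER#42` are hypothetical.

```python
from collections import defaultdict


class InMemoryTable:
    """Tiny stand-in for a table with a composite key (partition key + sort key).

    This is NOT the DynamoDB API; it only illustrates querying one partition
    with an optional sort-key prefix.
    """

    def __init__(self) -> None:
        self._partitions: dict[str, dict[str, dict]] = defaultdict(dict)

    def put(self, pk: str, sk: str, **attributes) -> None:
        self._partitions[pk][sk] = attributes

    def query(self, pk: str, sk_prefix: str = "") -> list[tuple[str, dict]]:
        # Only one partition is read; results come back ordered by sort key.
        items = self._partitions.get(pk, {})
        matches = [(sk, attrs) for sk, attrs in items.items() if sk.startswith(sk_prefix)]
        return sorted(matches, key=lambda item: item[0])


table = InMemoryTable()
# A customer profile and its orders share one partition, so no join is needed.
table.put("CUSTOMER#42", "PROFILE", name="Ada")
table.put("CUSTOMER#42", "ORDER#2023-02-11", total=12)
table.put("CUSTOMER#42", "ORDER#2023-01-05", total=30)

orders = table.query("CUSTOMER#42", "ORDER#")        # orders only, sorted by date
customer_and_orders = table.query("CUSTOMER#42")      # profile plus orders in one call
```

The catch is that this layout serves only the access patterns it was designed for; asking a new question later, such as "all orders over a certain total across customers", may require new indexes or reshaping the data.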

Amazon Aurora Serverless V2

In April 2022, AWS announced the general availability of Aurora Serverless V2 after more than a year in preview. V2 is built for serious production relational workloads, with features that cater for high availability, quick and fine-grained scaling, and mixed configurations (Serverless and provisioned instances in the same cluster).
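One way to picture "fine-grained scaling": Aurora Serverless V2 measures capacity in Aurora Capacity Units (ACUs) and adjusts it in increments as small as 0.5 ACU, within a minimum and maximum you configure. The sketch below only illustrates that stepping and clamping; how and when AWS decides to scale is internal to the service, and `required_acu` is a hypothetical input standing in for that decision.

```python
import math

ACU_STEP = 0.5  # capacity is adjusted in 0.5 ACU increments


def target_capacity(required_acu: float, min_acu: float, max_acu: float) -> float:
    """Round a hypothetical capacity estimate up to the next 0.5 ACU step,
    then clamp it to the cluster's configured minimum and maximum."""
    if min_acu > max_acu:
        raise ValueError("min_acu must not exceed max_acu")
    stepped = math.ceil(required_acu / ACU_STEP) * ACU_STEP
    return min(max(stepped, min_acu), max_acu)


print(target_capacity(3.2, min_acu=0.5, max_acu=16))  # prints 3.5
```

Small steps mean capacity can track load closely instead of jumping in large increments, so less idle capacity is paid for.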

Amazon Timestream

Timestream is a Serverless time series database designed to store and analyse data points that are recorded in sequence and tied to specific timestamps. This is especially useful when the time order of information is critical for analysis. It is commonly used for IoT sensor data and logs.
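A typical time series operation is grouping readings into fixed time windows and aggregating each window, similar in spirit to grouping by `bin(time, 1m)` in a Timestream query. The plain-Python sketch below shows the idea on hypothetical temperature readings; it does not use Timestream itself.

```python
from collections import defaultdict
from datetime import datetime, timedelta, timezone

Reading = tuple[datetime, float]


def bin_average(readings: list[Reading], width: timedelta) -> list[tuple[datetime, float]]:
    """Group (timestamp, value) readings into fixed-width time bins and average
    each bin. Timestamps should be timezone-aware; results are in time order."""
    width_s = int(width.total_seconds())
    buckets: dict[int, list[float]] = defaultdict(list)
    for ts, value in readings:
        epoch = int(ts.timestamp())
        buckets[epoch - epoch % width_s].append(value)  # start of the bin
    return [
        (datetime.fromtimestamp(start, tz=timezone.utc), sum(vals) / len(vals))
        for start, vals in sorted(buckets.items())
    ]


# Hypothetical sensor readings, in UTC.
t0 = datetime(2023, 1, 1, 12, 0, tzinfo=timezone.utc)
readings = [
    (t0, 20.0),
    (t0 + timedelta(seconds=30), 22.0),
    (t0 + timedelta(seconds=70), 25.0),
]
bins = bin_average(readings, timedelta(minutes=1))
```

The first two readings fall in the 12:00 bin and average to 21.0; the third lands in the 12:01 bin on its own.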

Amazon Quantum Ledger Database (QLDB)

QLDB is a Serverless ledger database that provides a transparent, immutable, cryptographically verifiable transaction log owned by a central trusted authority. This service can be used when the integrity of the data is of utmost importance, and any changes to the data must be verifiable.
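The "cryptographically verifiable" part rests on hashing: each journal entry's hash depends on the previous entry's hash, so altering any past record breaks every hash after it. QLDB's real journal uses SHA-256 with Merkle-tree digests and is more involved; the sketch below is a simplified hash chain for illustration only, not QLDB's algorithm or API, and the record fields are hypothetical.

```python
import hashlib
import json

GENESIS = "0" * 64  # placeholder "previous hash" for the first entry


def entry_hash(previous_hash: str, record: dict) -> str:
    """SHA-256 over the previous hash plus a canonical JSON encoding of the record."""
    payload = json.dumps(record, sort_keys=True).encode("utf-8")
    return hashlib.sha256(previous_hash.encode("utf-8") + payload).hexdigest()


class HashChainedJournal:
    """Append-only journal in which every entry is chained to the one before it."""

    def __init__(self) -> None:
        self.entries: list[tuple[dict, str]] = []

    def append(self, record: dict) -> str:
        previous = self.entries[-1][1] if self.entries else GENESIS
        digest = entry_hash(previous, record)
        self.entries.append((dict(record), digest))  # store a copy of the record
        return digest

    def verify(self) -> bool:
        """Recompute the whole chain; any edited record produces a mismatch."""
        previous = GENESIS
        for record, stored in self.entries:
            if entry_hash(previous, record) != stored:
                return False
            previous = stored
        return True


journal = HashChainedJournal()
journal.append({"account": "A", "amount": 100})
journal.append({"account": "B", "amount": -40})
print(journal.verify())  # prints True

journal.entries[0][0]["amount"] = 1_000_000  # simulate tampering with history
print(journal.verify())  # prints False
```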

Other data services

AWS AppSync

AppSync is a Serverless GraphQL API service that integrates with several cloud services behind the scenes. AppSync handles most of the integration with data sources such as DynamoDB and Aurora, retrieving data from them and presenting it to clients through a single, standardised interface.
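In AppSync, resolvers attached to fields in a GraphQL schema decide which data source answers each part of a query. The plain-Python sketch below is only a conceptual illustration of that "one interface, many back ends" idea; it is not AppSync's API or a GraphQL implementation, and the field names and in-memory data sources are hypothetical.

```python
from typing import Any, Callable

# Hypothetical in-memory stand-ins for two different back ends.
PRODUCTS_TABLE = {"p1": {"id": "p1", "name": "Lamp"}}        # e.g. a DynamoDB table
ORDERS_ROWS = [{"id": "o1", "productId": "p1", "qty": 2}]     # e.g. an Aurora table

Resolver = Callable[[dict], Any]

# Each top-level field is mapped to the data source that can answer it.
RESOLVERS: dict[str, Resolver] = {
    "product": lambda args: PRODUCTS_TABLE.get(args["id"]),
    "ordersForProduct": lambda args: [
        row for row in ORDERS_ROWS if row["productId"] == args["productId"]
    ],
}


def execute(fields: dict[str, dict]) -> dict:
    """Resolve each requested field via its data source and return one combined
    response, shaped like a GraphQL {"data": ...} result."""
    data = {}
    for field, args in fields.items():
        resolver = RESOLVERS.get(field)
        if resolver is None:
            raise KeyError(f"no resolver for field {field!r}")
        data[field] = resolver(args)
    return {"data": data}


result = execute({"product": {"id": "p1"}, "ordersForProduct": {"productId": "p1"}})
```

The client makes one request and never needs to know that the answer was assembled from two different stores.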

Amazon S3

S3 is an object storage service that can store almost any kind of data in any format. It suits use cases such as hosting static websites and files, data lakes, streaming media, backups and archives, and logs. S3 offers different storage classes, each suited to a particular stage of the data life cycle, such as 'actively used' or 'long-term archive'. Generally speaking, the less frequently data is accessed, the cheaper the storage class, although infrequent-access and archive classes typically charge more to retrieve data.
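Moving data between storage classes as it ages is usually automated with a lifecycle rule. The sketch below builds one as a Python dict in the shape accepted by the `LifecycleConfiguration` argument of boto3's S3 `put_bucket_lifecycle_configuration`; the `logs/` prefix, the day counts, and the bucket name are hypothetical, and the consistency check is my own local sanity check rather than an S3 feature.

```python
# Hypothetical rule: logs move to cheaper classes as they age, then expire.
LIFECYCLE_CONFIGURATION = {
    "Rules": [
        {
            "ID": "archive-old-logs",
            "Filter": {"Prefix": "logs/"},
            "Status": "Enabled",
            "Transitions": [
                {"Days": 30, "StorageClass": "STANDARD_IA"},  # infrequent access
                {"Days": 90, "StorageClass": "GLACIER"},      # Glacier Flexible Retrieval
            ],
            "Expiration": {"Days": 365},
        }
    ]
}


def rule_is_consistent(rule: dict) -> bool:
    """Local sanity check: transitions occur at increasing ages, and any
    expiration comes after the last transition."""
    days = [t["Days"] for t in rule.get("Transitions", [])]
    ordered = all(a < b for a, b in zip(days, days[1:]))
    expiry = rule.get("Expiration", {}).get("Days")
    expires_last = expiry is None or not days or expiry > days[-1]
    return ordered and expires_last


# Applying it requires boto3 and AWS credentials (bucket name is hypothetical):
# import boto3
# boto3.client("s3").put_bucket_lifecycle_configuration(
#     Bucket="example-log-bucket",
#     LifecycleConfiguration=LIFECYCLE_CONFIGURATION,
# )
```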

Check out the book's mini site for more information and ordering here: https://serverlessbook.co