21 Aug 2019

Building a chat application - part 2

Understanding the database schemas and the required access patterns.

Why DynamoDB

In Part 1 of this blog series, I described my motivations for developing a chat application. If you haven’t read it yet, please go ahead! As described in that post, I decided to use AWS DynamoDB to store our data. The main reason is that I simply wanted to use it 😅 But besides that, Amazon’s own description of DynamoDB supports the decision:

Fast and flexible NoSQL database service for any scale

This pretty much sums up the reasons to use such a database for this project:

  • fast: speed is crucial when chatting. We need something fast.
  • NoSQL: enough of overcomplicating simple problems.
  • database service: I want to avoid the headaches of maintaining a database. Amazon will handle that.
  • any scale: what if this application becomes useful for many people? It’s good to be ready.

Even though AWS describes DynamoDB as flexible (no more need to define your columns, types, etc.!), I would say that it is actually very inflexible. The main reason is that in DynamoDB we need to design the schema up front so that it handles all of our access patterns well. Once this is done, it can be very difficult to adapt if the use cases change. AWS explains this better in their best practices, which anyone who works with DynamoDB should read. They point out that before thinking about the schema, we need to think about the access patterns. So this is what we’ll go through in the following sections.

Access Patterns

The access patterns of this chat can be served by two DynamoDB tables: users and chats. The former handles the users of the chat and their friendships, while the latter contains details about chat groups, messages and the members of each chat. All of this could live in a single table (which the DynamoDB team often recommends as a best practice), but to keep things simpler I have decided to go with two. So now to the access patterns.

Chats

  1. Create a private chat (only two users allowed)
  2. Add message to a chat
  3. List messages of a chat
  4. Create a group chat (unlimited users)
  5. Add users to a chat
  6. List chats a user belongs to

User

  1. Create a user.
  2. Set user last login.
  3. Add user as friend.
  4. Search for users filtering by username.
  5. Get list of friends (can also filter).

Future Access Patterns

The patterns above are just the basic ones required for a chat application to be usable. Once they are done and working, we could think about some new ideas, such as:

  • Adding a friend requires the other to accept the request.
  • User can block another one.
  • Users can leave a chat.
  • Users can set a “do not invite” preference so they cannot be added to chats.
  • Users can send a message to all friends, i.e. stories.

Chats Table Schema

Now that we have the access patterns, it’s time to design a table schema that allows easy and efficient access. For that, we start from the fact that an item in a table is uniquely identified by its primary key, which can be composed of a partition key and a sort key. So by using the sort key to organize data, it’s possible to fit multiple access patterns into a single table. The table below shows the schema for the Chats table:

| PartitionKey | SortKey | createdAt | chatName | message | username |
| <chatId> | config | <createdAt> | <groupName> | | |
| <chatId> | message_<timestamp>_<messageId> | <createdAt> | | <message> | <username> |
| <chatId> | member_<username> | <createdAt> | | | |

This means that the following example could be how the DynamoDB table looks at some point:

| PartitionKey | SortKey | createdAt | chatName | message | username |
| 952e9c2c | config | 2019-05-14 | Basic Finnish | | |
| 952e9c2c | member_janeDoe | 2019-06-15 | | | |
| 952e9c2c | member_johnDoe | 2019-06-16 | | | |
| 952e9c2c | message_2019-06-17_22a7e56b | 2019-06-17 | | Hey John! | janeDoe |
| c44a425c | config | 2019-07-03 | Advanced Italian | | |
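To make the key formats concrete, here is a minimal sketch of helpers that build the SortKey values from the schema. The function names are hypothetical, and the sketch assumes a full ISO-8601 UTC timestamp (the example table shows only a date) so that messages sent on the same day still sort in the order they were sent.

```python
from datetime import datetime, timezone
import uuid

# SortKey of the item that holds the chat details.
CONFIG_SORT_KEY = "config"


def new_id() -> str:
    """Short random id, similar in shape to the ids in the example table (assumption)."""
    return uuid.uuid4().hex[:8]


def message_sort_key(sent_at: datetime, message_id: str) -> str:
    """Build message_<timestamp>_<messageId>.

    A fixed-width ISO-8601 UTC timestamp compares as a string in the same
    order as it does in time, which is what keeps messages ordered.
    """
    ts = sent_at.astimezone(timezone.utc).strftime("%Y-%m-%dT%H:%M:%S.%fZ")
    return f"message_{ts}_{message_id}"


def member_sort_key(username: str) -> str:
    """Build member_<username>."""
    return f"member_{username}"
```

Because every message key shares the message_ prefix and a fixed-width timestamp, comparing the keys as plain strings is enough to order the messages of a chat.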

Now let’s break down how this schema supports the access patterns:

When creating a chat, we just need to write the details of the chat and set the SortKey to config. The same goes for adding a message to the chat, but here the SortKey is a bit different: it starts with message_, followed by a timestamp and finally the messageId. This is useful because the third access pattern is to list the messages of a chat, and for that we can query items where PartitionKey is the chatId and SortKey begins_with message_. There are a few more details on how to work with this query, but for now it’s just important to understand that by having the timestamp in the sort key, we can easily retrieve a chronologically ordered list of messages for that chat. The <messageId> is appended to make sure that two messages created at the same time still have unique SortKeys.
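As a rough illustration of the message listing query, the sketch below builds the request parameters in the shape of the low-level DynamoDB Query API. The table name chats comes from this post, while the function name, the default limit and the newest_first flag are assumptions made for the example.

```python
from typing import Any, Dict


def list_messages_query(chat_id: str, newest_first: bool = False, limit: int = 50) -> Dict[str, Any]:
    """Build Query parameters that list the messages of one chat."""
    return {
        "TableName": "chats",
        # Partition key equality plus a prefix match on the sort key.
        "KeyConditionExpression": "#pk = :chatId AND begins_with(#sk, :prefix)",
        "ExpressionAttributeNames": {"#pk": "PartitionKey", "#sk": "SortKey"},
        "ExpressionAttributeValues": {
            ":chatId": {"S": chat_id},
            ":prefix": {"S": "message_"},
        },
        # True returns items in ascending sort key order (oldest first).
        "ScanIndexForward": not newest_first,
        "Limit": limit,
    }
```

With boto3, for example, these parameters could be passed as client.query(**list_messages_query("952e9c2c")). Setting newest_first=True flips ScanIndexForward, so the latest messages come back first, which is usually what a chat screen needs.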

Creating a group chat works the same way as creating a private chat; the next posts will show how the two entities differ. Adding a user to a chat only requires creating a new item with a SortKey following the pattern member_<username>, which is essential for understanding how we can list the chats that a user belongs to. For that, we need to create a Global Secondary Index (GSI) that uses the SortKey field as its HASH key. Now if we want to get all chats for janeDoe, all we need to do is query the GSI with the key member_janeDoe. For now, this GSI does not have a RANGE key.
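The membership item and the GSI lookup could be sketched as follows, again as plain request parameters for the low-level DynamoDB API. The index name SortKeyIndex and the function names are hypothetical; the post does not name the index.

```python
from typing import Any, Dict, List


def add_member_put(chat_id: str, username: str, created_at: str) -> Dict[str, Any]:
    """Build PutItem parameters for a member_<username> item."""
    return {
        "TableName": "chats",
        "Item": {
            "PartitionKey": {"S": chat_id},
            "SortKey": {"S": f"member_{username}"},
            "createdAt": {"S": created_at},
        },
    }


def list_user_chats_query(username: str, index_name: str = "SortKeyIndex") -> Dict[str, Any]:
    """Build Query parameters against the GSI whose HASH key is SortKey."""
    return {
        "TableName": "chats",
        "IndexName": index_name,  # hypothetical index name
        "KeyConditionExpression": "#sk = :member",
        "ExpressionAttributeNames": {"#sk": "SortKey"},
        "ExpressionAttributeValues": {":member": {"S": f"member_{username}"}},
    }


def chat_ids_from(items: List[Dict[str, Any]]) -> List[str]:
    """Extract the chat ids from the items returned by the GSI query."""
    return [item["PartitionKey"]["S"] for item in items]
```

The table’s own key attributes are always projected into a GSI, so each item returned by the index query carries the chatId in its PartitionKey attribute.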

Users Table Schema

Similar to the Chats table, the Users table also uses different SortKeys to organize the data better. Below you can see the schema that I’ve come up with:

| PartitionKey | SortKey | createdAt | email | lastLogin |
| <username> | config | <createdAt> | <email> | <lastLogin> |
| <username> | friend_<username> | <createdAt> | <email> | <lastLogin> |

Here is an example of how the table could look:

| PartitionKey | SortKey | createdAt | email | lastLogin |
| janeDoe | config | 2019-05-14 | jane@doe.com | 2019-08-23 |
| johnDoe | config | 2019-06-15 | john@doe.com | 2019-08-21 |
| janeDoe | friend_johnDoe | 2019-07-01 | | |
| johnDoe | friend_janeDoe | 2019-07-01 | | |

To create a user or set the lastLogin, all we need to do is insert or update the item where the PartitionKey is the username of the user and the SortKey is config. To add a user as a friend, we create two entries, one for each direction of the friendship, each with a SortKey starting with friend_. If at some point we want to allow userA to be friends with userB while userB is not friends with userA, our implementation would still support that.
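One possible way to write the two friendship entries is sketched below. It wraps both writes in the request shape of DynamoDB’s TransactWriteItems so that they succeed or fail together. Using a transaction is an assumption of this sketch, not something the design above requires, and the function name is hypothetical.

```python
from typing import Any, Dict


def add_friend_transaction(user_a: str, user_b: str, created_at: str) -> Dict[str, Any]:
    """Build TransactWriteItems parameters that store a friendship in both directions."""

    def friend_put(owner: str, friend: str) -> Dict[str, Any]:
        # One item under the owner's partition, pointing at the friend.
        return {
            "Put": {
                "TableName": "users",
                "Item": {
                    "PartitionKey": {"S": owner},
                    "SortKey": {"S": f"friend_{friend}"},
                    "createdAt": {"S": created_at},
                },
            }
        }

    return {"TransactItems": [friend_put(user_a, user_b), friend_put(user_b, user_a)]}
```

A one-directional friendship would simply be a single Put of the same shape, which is why the schema also supports that case.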

Listing users is where things get a bit more exciting. In order to search for users and filter by username, we need to create a GSI that has the SortKey column as its HASH key and the PartitionKey column as its RANGE key. With the GSI ready, we just need to query that index where SortKey is config and PartitionKey begins_with the username to filter by. The results are already listed in alphabetical order, since the usernames are on the RANGE key. Listing a user’s friends is even easier, since we just query the table with the username as the PartitionKey and a SortKey starting with friend_.
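Both user queries can be sketched as request parameters for the low-level DynamoDB Query API. The index name SortKeyIndex and the function names are hypothetical, and the sketch assumes that an empty search prefix should mean “no filter”.

```python
from typing import Any, Dict


def search_users_query(username_prefix: str = "", index_name: str = "SortKeyIndex") -> Dict[str, Any]:
    """Query the GSI (HASH: SortKey, RANGE: PartitionKey) for users by username prefix."""
    condition = "#sk = :config"
    names = {"#sk": "SortKey"}
    values = {":config": {"S": "config"}}
    if username_prefix:
        # Only reference #pk when it is used; unused placeholders are rejected.
        condition += " AND begins_with(#pk, :prefix)"
        names["#pk"] = "PartitionKey"
        values[":prefix"] = {"S": username_prefix}
    return {
        "TableName": "users",
        "IndexName": index_name,  # hypothetical index name
        "KeyConditionExpression": condition,
        "ExpressionAttributeNames": names,
        "ExpressionAttributeValues": values,
    }


def list_friends_query(username: str, friend_prefix: str = "") -> Dict[str, Any]:
    """Query the base table for a user's friends, optionally filtered by prefix."""
    return {
        "TableName": "users",
        "KeyConditionExpression": "#pk = :user AND begins_with(#sk, :prefix)",
        "ExpressionAttributeNames": {"#pk": "PartitionKey", "#sk": "SortKey"},
        "ExpressionAttributeValues": {
            ":user": {"S": username},
            ":prefix": {"S": f"friend_{friend_prefix}"},
        },
    }
```

Note that begins_with is case-sensitive, so searching for “jane” would not match a username stored as “JaneDoe” unless usernames are normalized before being written.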

Next steps

Now that we have defined the tables and access patterns, it’s time to start writing the data and making the queries. This is going to be the main topic of the next blog post, where we will discuss how to define AppSync’s resolvers with DynamoDB. If you want to check out some examples from Amazon already, you can find more details here.


Looking forward to part 3!