Architext is a multiplayer virtual reality text game that allows you to explore and create worlds entirely made of words (pun intended). And because of the huge expressive power of words, there is no limit to what you can build.
- Enter and explore worlds made by other players.
- Freely build your own worlds using a simple set of commands.
- Share your worlds or play them privately with your friends.
- Talk and interact with other connected players.
- Export your worlds as a backup or to share them online.
As a creator, Architext lets you take a place from your imagination into a reality you can share with anyone. It's also great for building puzzle/escape-room games and for creating settings to run role-playing games in.
Go to http://archi-text.com and play the five-minute tutorial :-)
Then you can play The Monk's Riddle, an escape room–type game that is a bit more challenging and shows what you can do with the tools at your disposal.
In this overview I try to cover most of the important parts of the system, as well as its architectural decisions. Be aware that it is not complete and will become outdated, since I won't be updating it with every change. It should still give you an understanding of the system as a whole.
Also, I try to define some things along the way, since different people use the same words with different meanings. Let me know if you find any ambiguity.
At its core, Architext is a monolithic app written with simplicity and flexibility in mind. I refrain from optimizing prematurely, while maintaining an architecture that will let me do so when needed.
- The `Core` module is the heart of the app. It is a Command- and Query-driven module that handles all business logic, queries and use cases, while exposing a thin facade.
- The `Chatbot` module handles text messages sent by players, including interactive multiple-step commands. It is just a layer on top of the Core module, and calls on its functionality when needed.
- Both the `Core` and `Chatbot` modules are directly used by a thin SocketIO server that contains no logic other than session management and authentication.
- The web app uses an SDK that is auto-generated by a custom solution (see the py-writes-ts repository) to communicate with the server, ensuring end-to-end type safety.
```mermaid
graph TD
web["React Web App"]
types["Auto-generated SDK"]
subgraph Backend
server["SocketIO Server"]
chatbot["Chatbot Module<br/><hr/><i>+ process_message(string)</i>"]
subgraph Core
subgraph Facade
architext["Architext<br/><hr/><i>+ handle(Command)</i><br/><i>+ query(Query)</i>"]
commands["Commands"]
queries["Queries"]
end
core["Core Module's Functionality"]
end
end
web --> types
types --> server
server -.->|generates| types
server --> chatbot
server --> Facade
architext --> core
chatbot --> Facade
style types fill:#fff2cc,stroke:#a67f03,stroke-width:2px,stroke-dasharray: 5 5
```
Let's now discuss the inner workings of the Core module. The next diagram is a little more involved, but I will explain it in detail.
```mermaid
---
config:
  layout: elk
---
graph TD
subgraph CORE MODULE
subgraph FACADE
architext["Architext<br/><hr/><i>+ handle(Command)</i><br/><i>+ query(Query)</i>"]
end
subgraph APPLICATION LAYER
commandhandler["Command Handler"]
eventhandler["Event Handler"]
queries["Query Manager"]
queryhandler["Query Handler"]
bus["Message Bus"]
end
subgraph DOMAIN_LAYER
events["Events"]
entities["Entities"]
services["Domain Specific Logic"]
end
end
subgraph EXTERNAL EVENT QUEUE
externalevent["External Event"]
end
external["External System"]
architext -->|dispatches commands to| bus
bus -->|dispatches commands to the corresponding| commandhandler
events -->|published by handlers to| bus
architext -->|dispatches queries to| queries
queries -->|dispatches queries to the corresponding| queryhandler
external -->|consumes| externalevent
bus -->|dispatches events to the corresponding| eventhandler
queryhandler -->|uses| DOMAIN_LAYER
commandhandler -->|uses| DOMAIN_LAYER
eventhandler -->|uses| DOMAIN_LAYER
events -->|may be published by handlers to| externalevent
```
Let's start with command handling. Commands are Python data-only objects that describe the user's intent to modify the state of the system. The facade forwards Commands to the Message Bus, which calls the appropriate handler.
The command handler will mutate the state of the domain entities, and may trigger Events. Events are data-only objects that describe something that happened in the system.
Those Events are published to the message bus, which will call all the event handlers subscribed to that event.
Command and Event handlers can also publish Events to an external event queue, opening the door to segregate certain functionality from the main monolith as needed.
Commands must not contain references to any entity from the domain model. Entities must not be leaked outside of the Core module. The same applies to the results returned to the client code.
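As a sketch of this flow, here is a minimal message bus with one command handler and one event subscriber. This is an illustrative reconstruction, not the actual Architext code: the class names `TraverseExit` and the handler signatures are assumptions.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Type

@dataclass(frozen=True)
class Command:
    pass

@dataclass(frozen=True)
class Event:
    pass

# Illustrative command and event; names are not the real Architext definitions.
@dataclass(frozen=True)
class TraverseExit(Command):
    user_id: str
    exit_name: str

@dataclass(frozen=True)
class UserChangedRoom(Event):
    user_id: str

class MessageBus:
    """Routes each Command to its single handler, then fans the Events
    the handler raised out to every subscribed event handler."""

    def __init__(self) -> None:
        self._command_handlers: Dict[Type[Command], Callable] = {}
        self._event_handlers: Dict[Type[Event], List[Callable]] = {}

    def register_command(self, command_type: Type[Command], handler: Callable) -> None:
        self._command_handlers[command_type] = handler  # exactly one handler per command

    def subscribe(self, event_type: Type[Event], handler: Callable) -> None:
        self._event_handlers.setdefault(event_type, []).append(handler)

    def handle(self, command: Command) -> None:
        # The command handler mutates state and returns the events it raised.
        events = self._command_handlers[type(command)](command)
        for event in events:
            for handler in self._event_handlers.get(type(event), []):
                handler(event)
```

Side effects like notifications live entirely in subscribers, which keeps the command handler focused on its single use case.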
Reasons why I chose this Event Driven approach:
- Makes it easy to respect the single responsibility principle in the command handlers. Any side effects of an action that are not part of the main use case (like sending notifications to interested parties) should be handled by the event handlers.
- The system also benefits from better error resiliency by default. The notification system may fail, but the command handler will still be able to do its job.
- Events and Commands create a unified language to communicate between the different layers of the system.
Queries are data-only objects that describe the intent of the user to query the state of the system. The facade forwards queries to the Query Manager, which calls the appropriate handler and returns the query result.
Queries must not contain references to any entity from the domain model. Entities must not be leaked outside of the Core module. The same applies to the results returned to the client code.
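A hedged sketch of that dispatch, assuming a simple type-to-handler mapping (the query name and result shape are illustrative):

```python
from dataclasses import dataclass
from typing import Any, Callable, Dict, Type

@dataclass(frozen=True)
class Query:
    pass

# Illustrative query; the real queries and result shapes may differ.
@dataclass(frozen=True)
class GetRoomDescription(Query):
    room_id: str

class QueryManager:
    """Maps each Query type to the single handler that answers it.
    Handlers return plain data, never domain entities."""

    def __init__(self) -> None:
        self._handlers: Dict[Type[Query], Callable[..., Any]] = {}

    def register(self, query_type: Type[Query], handler: Callable[..., Any]) -> None:
        self._handlers[query_type] = handler

    def query(self, query: Query) -> Any:
        return self._handlers[type(query)](query)
```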
The Core module needs to drive some external systems to have an impact on the outside world:
- Store and retrieve data, aka a `Repository`.
- Send notifications to users, aka a `Notifier`.
- Publish events to external systems, aka an `External Event Publisher`.
- Handle transactions and roll back changes in case of failure, aka a `Unit of Work`.
Let's see how that is solved in the Core module.
```mermaid
---
config:
  layout: elk
---
graph RL
subgraph CORE MODULE
subgraph APPLICATION LAYER
subgraph PORTS
repository["Repository"]
notifier["Notifier"]
uow["Unit of Work"]
external["External Event Publisher"]
end
end
subgraph ADAPTERS
memoryrepo["Memory Repository"]
fakenotifier["Fake Notifier"]
fakeexternal["Fake External Event Publisher"]
fakeuow["Fake Unit of Work"]
sqlalchemyrepo["SQLAlchemy Repository"]
sionotifier["SocketIO Notifier"]
realuow["Unit of Work"]
end
end
memoryrepo --> repository
sqlalchemyrepo --> repository
fakenotifier --> notifier
sionotifier --> notifier
fakeexternal --> external
fakeuow --> uow
realuow --> uow
```
I have adopted a Ports & Adapters architecture. The Application layer defines each of its dependencies as an interface that we call a Port. Then, instead of directly importing and using the needed library or tool, the Application code expects to be initialized with objects that match those Port interfaces, and uses them as needed.

This way the Application layer does not depend on any specific implementation. All it knows about are the interfaces it defines itself.

We then create the specific implementations for each Port, which are called Adapters.
For each Port we have created two kinds of Adapters:
- `Fake` adapters, which are used for testing purposes and do not depend on any external infrastructure.
- `Real` adapters, which are used in production.
For example, the Notifier port has the following implementations:
- `Fake Notifier`, which just saves the notifications for later inspection.
- `SocketIO Notifier`, which sends the notifications as events to a SocketIO server.
- `Chatbot Notifier`, which lets the `Core` module drive the `Chatbot` module to notify users through a message in the chat.
- `Multi Notifier`, which lets you build a composed notifier, using a different implementation for each type of notification.
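A minimal sketch of the Port/Adapter pair, assuming the Notifier exposes a single `notify` method (the exact method name and signature are assumptions):

```python
from typing import List, Protocol, Tuple

class Notifier(Protocol):
    """Port: the interface the application layer depends on.
    The application never imports a concrete implementation."""

    def notify(self, user_id: str, message: str) -> None: ...

class FakeNotifier:
    """Fake adapter: records notifications so tests can inspect them,
    with no external infrastructure involved."""

    def __init__(self) -> None:
        self.sent: List[Tuple[str, str]] = []

    def notify(self, user_id: str, message: str) -> None:
        self.sent.append((user_id, message))
```

A structural `Protocol` means `FakeNotifier` (or a SocketIO-backed adapter) satisfies the Port without inheriting from it.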
Dependencies are injected into the Core module by its facade. The Architext class expects a Unit of Work in its constructor, which contains all the dependencies needed by the Core module.
The Unit of Work is a special Port: it holds a reference to every other Port used by the application code. This is done to simplify dependency passing throughout the system.
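A hedged sketch of a fake Unit of Work that bundles the other ports and marks the transaction boundary. The attribute names follow the ports diagram above, but the constructor and context-manager shape are assumptions, not the real Architext API:

```python
class FakeUnitOfWork:
    """Fake Unit of Work: bundles all other ports so handlers receive a
    single object, and commits or rolls back the transaction."""

    def __init__(self, repository=None, notifier=None, external_event_publisher=None):
        self.repository = repository
        self.notifier = notifier
        self.external_event_publisher = external_event_publisher
        self.committed = False

    def __enter__(self):
        return self

    def __exit__(self, exc_type, exc, tb):
        # Commit on success; roll back if the handler raised.
        if exc_type is None:
            self.commit()
        else:
            self.rollback()

    def commit(self) -> None:
        self.committed = True

    def rollback(self) -> None:
        self.committed = False
```

A handler can then do `with uow: uow.repository.save(...)` and get rollback-on-error for free.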
By default, all data access is done through the `Repository` interface. This poses a challenge, as some complex queries may need to be written in raw SQL to optimize performance. To solve this, we can substitute any query handler with an alternative one that uses a raw SQL query. We should do this sparingly, since it couples the query to the database, which adds maintenance overhead.
This architectural decision has some benefits:
- It allows us to defer implementing the adapters until we have much more information to make the right decisions.
- It allows us to change technologies, even using different ones for different parts of the system, without significant changes in the existing codebase.
- It encapsulates the implementation of the adapters, preventing it from being scattered around the codebase, and making it easier to change.
- It allows us to test all functionality without depending on external systems or mocks.
For example:
- Most of the functionality was first written without a database, using only the memory `Repository`, which sped up development and testing. Then, connecting the system to a SQL database was done in one coding session without touching the existing code.
- The test suite runs in 40 seconds using a SQLite database and in 2.5 seconds using the memory `Repository`, which is useful for continuous testing during development.
- Testing the notification logic is straightforward, without the need for any mocks.
- At the moment, the `External Event Publisher` is not yet implemented; the fake implementation just forwards messages to the `Message Bus`. Even so, all functionality works, and we can test a flow that spans different services in a simple test without mocking or setting up any infrastructure.
The Domain layer defines:
- Entities
- Events
- Domain-specific logic (some call it Services)
The goal of the system is to maintain a set of entities in a consistent state, mutate them according to a set of operations (commands) and look at them to get information (queries).
Entities are the "things" that the system is about. Each entity contains the data and logic relevant to that thing.
Entities are part of the Core module's inner workings, and should not be leaked outside of it. Leaking them creates coupling that may cause problems in the future, making it harder to change the domain model, or even worse, forcing it to change because of a change in the modules that depend on it.
The entities are bundled in aggregates. An aggregate is a group of entities that are related to each other and that do not make sense nor can be identified on their own.
In each kind of aggregate one entity is designated as the root. This is the only entity that any entity outside the aggregate can hold a direct reference to, and it should do so through an ID. Entities within the aggregate reference each other using direct references. An entity's methods should not use any other entity as a reference, except for the root, which can use any other entity inside the aggregate.
This kind of organization makes it easier to persist entities and avoid inconsistencies due to race conditions. Aggregates should be read and modified together, and all their entities can be locked together for each transaction.
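A minimal sketch of these rules using the Room aggregate, with shapes simplified from the actual entities (only a couple of fields are shown):

```python
from dataclasses import dataclass, field
from typing import Dict

@dataclass
class Exit:
    """Lives inside the Room aggregate; never referenced from outside it."""
    name: str
    description: str
    destination_room_id: str  # another aggregate's root: referenced by ID only

@dataclass
class Room:
    """Aggregate root: the only entity of this aggregate that outside
    code may point at, and only via its ID."""
    id: str
    name: str
    exits: Dict[str, Exit] = field(default_factory=dict)  # direct references inside the aggregate

    def add_exit(self, new_exit: Exit) -> None:
        self.exits[new_exit.name] = new_exit
```

Note how crossing an aggregate boundary (the destination room) happens through an ID, while entities inside the boundary (the exits) are held directly.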
In the following diagram, filled diamonds connect entities that are part of the same aggregate with the root entity.
Choosing whether to put entities inside the same aggregate is a decision that should take into account many factors, such as the read and write access patterns, the number of entities, and the complexity and size of the aggregate.
This diagram may become outdated, but it should still give you an idea of how the entities are organized.
```mermaid
classDiagram
%% ===== Entities =====
class User {
+id: str
+name: str
+world_id: str?
+active: bool
+email: str?
+world_visit_record: Dict~str, WorldVisitRecord~
+set_room(room_id, world_id) void
+room_id: str? <<property>>
+visited_world_ids: Set~str~ <<property>>
}
class WorldVisitRecord {
+world_id: str
+last_room_id: str
}
class World {
+id: str
+name: str
+initial_room_id: str
+owner_user_id: str?
+description: str
+visibility: "public"|"private"
+base_template_id: str?
}
class Room {
+id: str
+name: str
+world_id: str
+description: str
+exits: Dict~str, Exit~
+items: Dict~str, Item~
}
class Exit {
+name: str
+description: str
+destination_room_id: str
+visibility: Visibility
}
class Item {
+name: str
+description: str
+visibility: Visibility
}
class WorldTemplate {
+id: str
+name: str
+description: str
+world_encoded_json: str
+author_id: str?
+visibility: "public"|"private"
}
class MissionRequirement {
+complete_mission_with_id: str
}
class Mission {
+id: str
+name: str
+description: str
+requirements: List~MissionRequirement~
}
class MissionLog {
+mission_id: str
+user_id: str
+completed_at: date
}
%% ===== Relationships =====
%% User & visits / current world
User "1" *-- "0..*" WorldVisitRecord : visit records
User --> "0..1" World : current world (world_id)
WorldVisitRecord --> Room : last_room_id
%% World & Rooms
World "1" o-- "0..*" Room : rooms (by Room.world_id)
World --> Room : initial_room (initial_room_id)
World --> User : owner (owner_user_id)
WorldVisitRecord --> World : world_id
%% Room aggregate
Room "1" *-- "0..*" Exit : exits
Room "1" *-- "0..*" Item : items
Exit --> Room : destination (destination_room_id)
%% Templates
WorldTemplate --> User : author (author_id)
%% Missions
Mission "1" *-- "0..*" MissionRequirement : requirements
MissionRequirement --> Mission : complete_mission_with_id
MissionLog --> Mission : mission_id
MissionLog --> User : user_id
```
Events are data-only objects that describe something that happened in the system. Let's see an example:
```python
@dataclass
class UserChangedRoom(Event):
    user_id: str
    method: Literal["used_exit", "teleported", "changed_world"]
    room_entered_id: Optional[str] = None
    room_left_id: Optional[str] = None
    exit_used_name: Optional[str] = None
```

This event is published by multiple command handlers when a user moves from one room to another, enters, or leaves a world.
This event is consumed by handlers that, for example, notify users when another enters or leaves the room they are in.
Events could contain references to entities, but they should not, since an event may be published to the external event queue and consumed by other systems. Entities should never be leaked outside of the Core module.
Thanks to events, we just write the notification logic once and raise the event whenever a user changes room. This way we avoid the need to maintain the same notification logic in each command handler.
Apart from Entities and Events, the Domain layer contains other domain-specific logic. This includes:
- Operations that use more than one aggregate.
- Logic that does not use any domain entity, or logic shared by multiple entities and/or the application layer, such as the rules that govern how items and exits are looked up from a user-provided incomplete name.
Most of this module's organization comes from the legacy first version of Architext, where everything was handled in the chatbot module (written before I studied the software architecture concepts I like to apply nowadays).
The Chatbot module uses some concepts also applied in the Core module but not all of them, since this is a smaller and far less critical part of the system, with different requirements.
- We still have Ports & Adapters to keep the functionality isolated from infrastructure.
- We don't use repositories as the module does not need to persist any data.
- We don't have Commands or Queries; the only interface of the module is the `process_message` function, which processes a text message from the user.
- The `process_message` function finds a suitable `Verb` and lets it handle the message. `Verbs` can be stateful and take more than one user message before releasing control of the chat.
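A minimal sketch of that dispatch, assuming a `can_process`/`process` shape for `Verb`. The real class surface may differ, and the stateful multi-step behavior is omitted for brevity:

```python
from typing import List

class Verb:
    """Base class: a verb declares the command word it responds to."""
    command = ""

    def can_process(self, message: str) -> bool:
        # Match on the first word of the message.
        return message.split(" ", 1)[0] == self.command

    def process(self, message: str) -> str:
        raise NotImplementedError

class LookVerb(Verb):
    """Illustrative verb, not the actual Architext implementation."""
    command = "look"

    def process(self, message: str) -> str:
        return "You look around."

def process_message(verbs: List[Verb], message: str) -> str:
    # Hand the message to the first verb that claims it.
    for verb in verbs:
        if verb.can_process(message):
            return verb.process(message)
    return "I don't understand that."
```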
Entrypoints are the actual entry points of the server. They drive the Core and Chatbot modules and expose their functionality to the outside world.
They should be as thin as possible, and should not contain any logic that is not related to the entrypoint itself.
The responsibilities of the entrypoints should be restricted to:
- Initializing the `Core` and `Chatbot` modules with the right dependencies.
- Accepting connections from the outside world.
- Parsing user input, calling the right functionality of the app, and sending the result back to the user.
- Handling and notifying errors.
- Authentication of users (but not authorization, which resides in the application layer of the `Core` module).
The SocketIO entrypoint is (for now) the only entrypoint. It is a very thin, simple layer on top of the rest of the system, so not much to say about it.
- It uses `Clerk` as the authentication provider. I may regret this when I try to migrate to a custom solution, but for now it has saved me a lot of time.
- It uses `SocketIO` as the communication protocol. It would be better to use SocketIO only when needed and HTTP REST the rest of the time, but for now it is easier to maintain just one protocol. Switching to REST would be as easy as writing a new thin entrypoint.
- It uses the custom `@event` decorator to register events with standardized input validation, return types and error handling. It also registers the endpoint for the automatic TypeScript SDK generation done with my `py-writes-ts` library.
It is unlikely that I will ever need to scale the system beyond a single instance. Anyway, these are some strategies I could apply:
- Since the `Core` module is stateless, it can be scaled horizontally by running multiple instances of it pointing to the same database.
- The `Chatbot` module is stateful. To horizontally scale it, we would need to externalize its state, or ensure every request from any given user is handled by the same instance.
- SocketIO connections consume far more resources than HTTP REST requests. If the number of concurrent connections gets high, we should move everything we can to a REST API, so that only users actively playing in the chat are served by SocketIO.
- In case a query becomes a bottleneck, we should switch to a raw SQL implementation (just for that query).
- We could use a DB cache to speed up queries and reduce hits to the database.
- We could use DB read-only replicas to reduce the load on the main database.
- We could shard the database by world, so that each instance serves a subset of the worlds.
- We could index the database!
I am developing Architext using Test Driven Development (TDD). I tend to keep only use-case tests. They ensure that the system behaves as expected while allowing me to change the inner workings of the system without having to update the tests. And because of the Ports & Adapters architecture I can easily test whole use cases without worrying about infrastructure or mocks.
The way I write a new functionality usually follows this pattern:
1. Write tests covering the functionality you want to implement. In the case of a new use case, I would write a test that covers the entire flow of the use case.
2. Design the new pieces of code needed to implement the functionality, and write unit tests for them. Most of these unit tests will be deleted once the use-case tests pass. They are only there to help me describe the functionality and act as acceptance tests.
3. Implement the functionality until the unit tests pass. Return to step 2 until the use-case tests pass.
4. Remove most of the unit tests created in step 2. The use-case tests already check that the modules do their job, so they are not needed anymore. Only keep unit tests for complex functionality that could easily break.
I use small throwaway tests to aid myself while developing. Why not keep them? I don't want to make the system more rigid than it needs to be by maintaining tests of code that is likely to change often.
What doesn’t often change are the requirements of the use cases, so those are the tests that I keep. They ensure the system is still working as expected while letting me change most of the inner workings of the system without having to update them.
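To make the pattern concrete, here is a hedged sketch of the shape such a use-case test takes when the whole flow runs against fakes. Every name here (`FakeNotifier`, `move_user`) is an illustrative stand-in, not the real Architext test suite:

```python
class FakeNotifier:
    """Fake adapter: records notifications for later inspection."""
    def __init__(self):
        self.sent = []

    def notify(self, user_id, message):
        self.sent.append((user_id, message))

def move_user(notifier, user_id, exit_name):
    """Stand-in for a real command handler: do the work, then notify."""
    notifier.notify(user_id, f"You go through the {exit_name} exit.")

def test_user_is_notified_after_moving():
    # The whole use case runs in memory: no database, no sockets, no mocks.
    notifier = FakeNotifier()
    move_user(notifier, "alice", "north")
    assert ("alice", "You go through the north exit.") in notifier.sent
```

Because the test only touches the public surface and fake adapters, it survives refactors of everything behind the facade.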
