
Application Infrastructure

Dynamic applications — those that require maintaining a mutable state, such as a knowledge repository or a real-time messaging system — present particular challenges within the IPFS ecosystem. While IPFS allows for maintaining a shared state through the continuous publication of new CIDs and the use of mechanisms like IPNS (introduced and used in the deployment infrastructure), it is not specifically designed for the efficient updating of dynamic data in real time.

This introduces certain limitations when building interactive applications where multiple users need to read and write data concurrently. In particular, the lack of native mechanisms to manage versions, synchronize changes, or prevent conflicts between simultaneous edits makes developing mutable-state applications directly on IPFS inconvenient.

To overcome these limitations, we developed AstraDB — an infrastructure designed to facilitate the creation and management of dynamic and collaborative applications within the IPFS ecosystem. The following sections describe the components and architecture of AstraDB, how it represents entities such as articles or conversations, and how it adapts to the needs of real-time, community-driven, and decentralized applications, such as collaborative wikis and messaging systems.

AstraDB

AstraDB is an infrastructure designed to facilitate the development of community-driven, distributed, and decentralized applications that require maintaining a mutable state. Built as a higher-level layer over OrbitDB, it abstracts many of the low-level aspects required to operate databases in peer-to-peer networks.

Its design prioritizes collaboration among multiple users, allowing anyone to actively contribute to the maintenance of the application without relying on central servers or dedicated infrastructure. To achieve this, it automates fundamental tasks such as synchronization, node discovery, and data replication among collaborators.

AstraDB exposes a simple interface inspired by the key-value model, where each entity — such as a conversation or an article — is associated with a unique key. The main operations allow adding new values under a key, querying existing ones, and subscribing to real-time updates, thus facilitating the development of collaborative applications without exposing developers to the complexity of the distributed environment.
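The interface described above can be sketched as a small in-memory mock. The method names (`add`, `get`, `subscribe`) and the entry shapes are illustrative assumptions; a real instance would back each key with its own OrbitDB event log rather than a local array.

```javascript
// Minimal in-memory sketch of the AstraDB key-value interface.
class AstraDBSketch {
  constructor() {
    this.logs = new Map();      // key -> append-only array of values
    this.listeners = new Map(); // key -> array of subscriber callbacks
  }

  // Append a new value under a key; history is never overwritten.
  add(key, value) {
    if (!this.logs.has(key)) this.logs.set(key, []);
    this.logs.get(key).push(value);
    for (const cb of this.listeners.get(key) ?? []) cb(value);
  }

  // Return the full sequence of values stored under a key.
  get(key) {
    return this.logs.get(key) ?? [];
  }

  // Invoke cb for every value appended under key from now on.
  subscribe(key, cb) {
    if (!this.listeners.has(key)) this.listeners.set(key, []);
    this.listeners.get(key).push(cb);
  }
}
```

A collaborative application would call `add` to publish an edit or a message, `get` to read the accumulated history, and `subscribe` to react to edits arriving from other peers.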

Data Representation

AstraDB uses OrbitDB for data representation. OrbitDB is a peer-to-peer distributed database that relies on IPFS for storage and libp2p for synchronization between nodes. It is an eventually consistent database, designed specifically to operate in decentralized networks without depending on central servers, making it especially suitable for distributed applications.

OrbitDB offers different types of databases adapted to various models and use cases, such as document databases or sequential event structures.

To guarantee true decentralization and prevent the existence of a privileged entity with special permissions, AstraDB uses event-based databases, which follow an append-only structure. This type of database maintains an immutable history of entries, where nodes can only add new records without modifying or deleting existing ones. This ensures the integrity and preservation of information, allowing any user to contribute without compromising the system’s consistency.

This decision to work with append-only structures, however, introduces a new challenge: how to represent updatable information if the value of a key cannot be overwritten? The solution adopted by AstraDB consists of assigning each key its own event database. In this way, for example, an article can be represented as a sequence of updates stored in chronological order. Each entry in the database corresponds to a change made to that entity, and its complete history allows reconstructing its most recent version or auditing its evolution over time.
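Reconstructing an entity from its append-only history can be sketched as a fold over its entries. Modeling each entry as a partial update (`{ field: value }`) is an assumption made for illustration; the point is only that replaying the log in chronological order yields the latest version, and replaying a prefix of it audits any earlier state.

```javascript
// Rebuild an entity's latest state from its append-only history by
// replaying every change in chronological order; later entries win.
function reconstruct(entries) {
  return entries.reduce((state, change) => ({ ...state, ...change }), {});
}

// The same log doubles as an audit trail: replaying a prefix of it
// recovers the entity's state as of any earlier point in its history.
function stateAt(entries, index) {
  return reconstruct(entries.slice(0, index + 1));
}
```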

Figure: each entity in AstraDB is represented by its own event database.

This approach also enables efficient distribution: since in OrbitDB each database must be replicated locally in order to be read, AstraDB allows each node to synchronize only the keys (and therefore the databases) that are relevant to it. This significantly reduces storage and communication overhead.

To keep track of which keys exist within an AstraDB instance — for example, which articles are present in a knowledge repository — an additional event-type database is used, functioning as a global index. New keys are added to this index as they are created, allowing nodes to discover available entities without needing to know the entire system in advance.

Figure: the global index is implemented as an event database.

Identification and Data Access

Each database in AstraDB is identified through a unique address generated by OrbitDB. This address is composed of three elements: the database type, the access controller, and the name identifier. Since AstraDB always uses the same type (events) and an open access controller — which allows any user to add information — the final address is determined solely by the chosen name.

This property is fundamental for the proper functioning of the system. It allows any node, by simply knowing the name of an entity, to access its database and synchronize with the rest of the nodes already replicating it. This means that when a database with the same name is created, a new one is not generated; instead, the same existing database is returned and, therefore, shared.

Example of an OrbitDB database address:

/orbitdb/zdpuAmrcSRUhKQnRQ6p4bphs7DJWGBkqczSGFYynX6moTcDL
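The determinism described above can be sketched as a pure function from name to address. The toy hash below is a stand-in for OrbitDB's real derivation (which produces a manifest CID from the type, access controller, and name); only the property that the same name always yields the same address matters here.

```javascript
// Toy djb2-style hash; illustrative only, NOT OrbitDB's actual hashing.
function toyHash(s) {
  let h = 5381;
  for (const ch of s) h = ((h * 33) ^ ch.codePointAt(0)) >>> 0;
  return h.toString(36);
}

// With the type ("events") and access controller (open writes) fixed,
// the address depends solely on the chosen name, so two nodes opening
// a database by the same name end up sharing the same database.
function addressFor(name) {
  return `/orbitdb/${toyHash(`events|open|${name}`)}`;
}
```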

AstraDB extends this identification mechanism to build logical hierarchies of entities. For instance, the name of a database can take a structure such as wiki:article, allowing large amounts of data to be organized in a structured and meaningful way.

Figure: hierarchical structure of identifying keys.

This approach also allows multiple instances of AstraDB to coexist independently. The identity of each instance — that is, the set of keys and their respective value databases — is determined by the name identifier used to create its main database. This makes it trivial to create new applications, wikis, or collaborative systems simply by using different names, without causing collisions or interference with other instances.

Figure: representation of multiple independent AstraDB instances.

Synchronization and Eventual Consistency

In OrbitDB, synchronization between nodes occurs eventually, meaning that over time, all instances of the same database will contain the same information — though not necessarily at the same moment.

OrbitDB emits events when a database is updated, which is crucial for AstraDB, especially when synchronizing with other nodes and receiving new entries. However, due to the eventual consistency model that characterizes OrbitDB, even when using an event-based sequential database, the update event triggered by synchronization may not represent the latest chronological insertion. It is possible that, after synchronization, content may appear before or after more recent entries within the internal structure of the database.

To address this, AstraDB implements an abstraction layer on top of OrbitDB that detects all new entries regardless of their position. This abstraction keeps a record of the hash identifiers of previously seen entries and, upon receiving an update event, scans the database for unseen entries, emitting a new event for each of them.
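The abstraction can be sketched as follows: on every update event, the whole log is scanned and any entry whose hash has not been seen before is emitted, regardless of where the merge placed it. The entry shape (`{ hash, payload }`) is an assumption made for illustration.

```javascript
// Detects new entries after a sync, independently of their position
// in the merged log.
class UnseenEntryDetector {
  constructor(onNewEntry) {
    this.seen = new Set();        // hashes already emitted
    this.onNewEntry = onNewEntry; // callback invoked once per unseen entry
  }

  // Called whenever the underlying database signals an update;
  // receives the full list of entries in their current order.
  scan(allEntries) {
    for (const entry of allEntries) {
      if (!this.seen.has(entry.hash)) {
        this.seen.add(entry.hash);
        this.onNewEntry(entry);
      }
    }
  }
}
```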

This mechanism is particularly important in applications such as real-time messaging, where it is necessary to ensure that no message goes unnoticed — even if it was inserted in an intermediate position in the database as a result of merging two previously unsynchronized instances.

Figure: example of synchronization between nodes.

The example illustrates this phenomenon: two nodes maintain different versions of the same database, named Chat. After synchronization, OrbitDB merges both versions, generating a new consistent instance. However, from the perspective of node 1, the new entries may not appear exclusively at the end of the database, and the only automatic notification emitted by OrbitDB corresponds to the last appended entry. Therefore, it is essential to traverse the entire database to ensure the complete detection of entries incorporated during synchronization.

Collaborators

In AstraDB, when an OrbitDB database is opened, the node must locate other peers already replicating it in order to synchronize its content. If none are available, the database appears empty, which implies a possible loss of information if no node still preserves it.

To solve this issue, AstraDB introduces the concept of collaborators: nodes that choose to actively preserve and replicate all existing databases. In the case of the knowledge repository, for instance, this means locally maintaining every published article in the system.

The presence of at least one connected collaborator ensures that any user can access the complete content of a database. In this way, the network does not rely on a centralized infrastructure but rather on the willingness of users to actively participate in preserving the application.

Unlike other distributed approaches based on economic incentives, AstraDB relies on a community model: anyone can choose to become a collaborator without the need for special permissions or rewards. A collaborator's sole function is to provide persistence for the stored databases, helping ensure the system remains accessible.

Connectivity

For synchronization to occur between different OrbitDB databases, both peers must first be connected. Because OrbitDB follows an eventual consistency model, it does not take responsibility for maintaining connections between nodes; it simply assumes that, if two nodes are connected, their replicas will eventually synchronize. This responsibility is therefore delegated entirely to the underlying network layer, which in this case is built on LibP2P [60], through the implementation provided by Helia [40]. This means it is our responsibility to explicitly define how nodes communicate, selecting which transport protocols to use according to the environment where they run.

LibP2P is a modular library designed for building peer-to-peer networks. It does not define a single connection method but instead offers a set of interchangeable components—such as transport, encryption, multiplexing, and peer discovery—that can be combined depending on the environment’s requirements. This flexibility is essential for our infrastructure, as it allows us to adapt node behavior to the restrictions of web environments or leverage the full capabilities of independent processes.

LibP2P provides multiple transport protocols that allow nodes to maintain a persistent connection. Each node can be configured with multiple protocols, and the choice depends on where that node runs. In our infrastructure, we distinguish between:

  • Independent nodes, which run in environments such as Node.js.
  • Web nodes, which run inside a web browser.

Independent nodes use TCP as their primary protocol. It is stable, widely supported, and imposes no technical restrictions in controlled environments. This protocol allows collaborator nodes to communicate directly with one another.

On the other hand, web nodes face stricter constraints. Modern browsers do not allow direct TCP or UDP connections for security reasons, and instead restrict connections to secure protocols that comply with Secure Context policies, such as HTTPS.

To overcome this limitation, the infrastructure uses WebRTC-Direct, a transport protocol compatible with browsers that enables direct connections between an independent node and a web node. WebRTC [88] is a standard technology designed for real-time applications such as video calls and file exchange. LibP2P adapts it to achieve peer-to-peer communication between nodes without relying on central servers or certificates.

Thanks to this combination of protocols — TCP for independent nodes and WebRTC-Direct for web nodes — the system achieves a fully functional network where any node can connect with others, even within browsers, without compromising the decentralization of the architecture.
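A configuration along these lines can be sketched with js-libp2p. Package names and option keys follow recent js-libp2p releases and may differ between versions; treat this as a starting point under those assumptions, not the exact configuration used by AstraDB.

```javascript
import { createLibp2p } from 'libp2p'
import { tcp } from '@libp2p/tcp'
import { webRTCDirect } from '@libp2p/webrtc'
import { noise } from '@chainsafe/libp2p-noise'
import { yamux } from '@chainsafe/libp2p-yamux'

// An independent (Node.js) node: listens on TCP for other collaborators
// and on WebRTC-Direct so that browser nodes can dial it directly.
const node = await createLibp2p({
  addresses: {
    listen: ['/ip4/0.0.0.0/tcp/0', '/ip4/0.0.0.0/udp/0/webrtc-direct'],
  },
  transports: [tcp(), webRTCDirect()],
  connectionEncrypters: [noise()],
  streamMuxers: [yamux()],
})
```

A web node would use the same structure but drop `tcp()`, since browsers cannot open raw TCP sockets.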

Figure: connection topology between independent nodes and web nodes.

Collaborator Discovery

In AstraDB’s infrastructure, beyond defining transport protocols and how nodes connect, it is crucial to address the mechanism by which a node identifies which other nodes it should connect to in order to synchronize and obtain a database — that is, how collaborator discovery takes place.

To preserve the decentralized nature of the system, this function is implemented by leveraging the content discovery model used by IPFS. IPFS relies on a content addressing scheme [24], where data is identified and accessed through a unique identifier based on its content, known as the Content Identifier (CID), instead of its physical location.

Content localization in IPFS is performed using a Distributed Hash Table (DHT), which functions as a distributed index. Each node maintains a portion of the global index that maps each CID to the provider nodes of that content. When a node needs to access specific content, it queries the DHT to identify the active providers corresponding to that CID [29].

This mechanism is directly adopted in AstraDB for collaborator discovery. Since the CID represents the database — derived from its name identifier — and collaborators are the nodes providing that CID, an interested node queries the DHT to obtain a list of active nodes replicating the database. In this way, a fully decentralized model is established, where collaborators announce their availability and new nodes can discover and synchronize with them without relying on predefined nodes or central servers.
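The lookup can be sketched with an in-memory stand-in for the DHT: collaborators provide the CID derived from the database name, and interested nodes query for its providers. In a real deployment this role is played by libp2p's content routing over the Kademlia DHT; the class below is purely illustrative.

```javascript
// Toy stand-in for the distributed index mapping CIDs to provider nodes.
class ToyDHT {
  constructor() {
    this.providers = new Map(); // cid -> Set of peer ids
  }

  // A collaborator announces that it replicates the database.
  provide(cid, peerId) {
    if (!this.providers.has(cid)) this.providers.set(cid, new Set());
    this.providers.get(cid).add(peerId);
  }

  // An interested node asks which peers replicate the database.
  findProviders(cid) {
    return [...(this.providers.get(cid) ?? [])];
  }
}
```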

User Identity

In AstraDB, users are not identified through centralized usernames, but rather through a cryptographic scheme based on public and private key pairs, managed via OrbitDB identities.

If a node starts without a provided private key, a new identity is automatically generated. The resulting key pair is then stored and reused on subsequent executions, allowing the node’s identity to persist across sessions.

Architecture

The infrastructure of AstraDB is built upon the concepts and technologies detailed in the previous sections, integrating decentralized storage through IPFS and OrbitDB, together with peer-to-peer connection management via LibP2P.
This technical foundation enables the construction of a modular architecture that clearly separates data management from connection management.

This architecture is composed of two main components that operate in parallel and independently: the Key Repository and the Connection Manager.

Figure: AstraDB architecture.

Key Repository

The Key Repository module is responsible for the overall management of data within the infrastructure. Its main function is to maintain a record of all existing keys and handle the necessary updates to them.

At startup, this component creates the central database based on the provided name parameter and synchronizes it with the available collaborators. If no connection is established with any collaborator, it is assumed that this is a new database, and the module continues operating autonomously.

On nodes configured as collaborators, upon receiving updates about new keys from the central database, the Key Repository opens and stores the corresponding databases locally. This process ensures key persistence and allows other users to access them through synchronization.

All databases managed by this module, both the central one and those associated with each key, are represented through an abstraction called Database, which encapsulates all interaction with OrbitDB.

During the initialization of a Database instance, it must be defined whether synchronization is required. If so, the system waits to connect to a collaborator node to obtain all available updates. Once synchronized, the database becomes ready for use, including new content insertion and full data retrieval.
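This gating step can be sketched as a promise that resolves only once a first synchronization completes, or immediately when synchronization was not requested. The event wiring below is an assumption about how such a Database abstraction could be structured, not its actual implementation.

```javascript
// Sketch: a database handle whose `ready` promise gates all use.
function openDatabase({ sync }) {
  let markSynced;
  const ready = sync
    ? new Promise((resolve) => { markSynced = resolve; }) // wait for first sync
    : Promise.resolve();                                  // usable immediately
  return {
    ready,                                       // await before reading/writing
    onSynced: () => markSynced && markSynced(),  // called on first collaborator sync
  };
}
```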

Additionally, any update received in a database managed by this module automatically generates an event that can be listened to by other system components. This enables the implementation of reactive functionalities, such as real-time interfaces, facilitating the integration of additional logic based on changes in the distributed state.

Connection Manager

The Connection Manager is responsible for handling connections between nodes, including the discovery and establishment of connections with collaborators that provide the database, as well as the publication of the node as a provider when it acts as a collaborator.

At startup, the module constructs the CID corresponding to the central database. For this purpose, it uses Helia [40], the JavaScript implementation of IPFS, to compute the identifier from the provided name. This CID is shared across all AstraDB instances that use the same name, allowing nodes to identify each other and synchronize consistently.

Then, three services are initialized that run concurrently during system execution: SearchForProviders, ProvideDB, and ReconnectToProviders.

The SearchForProviders service performs continuous searches for new database providers using the IPFS DHT through LibP2P. When it finds one, it attempts to establish a connection to initiate synchronization.

The ProvideDB service, running only on collaborator nodes, announces in the DHT that the node can provide the database, facilitating its discovery by others.

Finally, ReconnectToProviders maintains a list of previously connected providers and performs periodic reconnection attempts, ensuring continuity in case of temporary disconnections.
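A single pass of such a reconnection service can be sketched as follows. The `isConnected` and `dial` callbacks are assumptions standing in for the corresponding libp2p operations; the real service would run this pass periodically on a timer.

```javascript
// One pass of a ReconnectToProviders-style loop: redial every previously
// known provider whose connection has dropped since the last pass.
function reconnectPass(knownProviders, isConnected, dial) {
  const redialed = [];
  for (const peer of knownProviders) {
    if (!isConnected(peer)) { // connection was lost
      dial(peer);             // attempt to re-establish it
      redialed.push(peer);
    }
  }
  return redialed;
}

// In the real service this would run periodically, e.g.:
// setInterval(() => reconnectPass(known, isConnected, dial), 10_000);
```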

These services allow nodes to discover each other, share information in a decentralized way, and keep the network resilient against disconnections or partial failures.

Use Case Implementation

The architecture and mechanisms of AstraDB were tested in two representative use cases: Astrawiki, a knowledge repository, and Astrachat, a real-time messaging system.

In the case of Astrawiki, keys represented wiki articles, while values corresponded to the modifications made to each article. This made it possible to reconstruct the full edit history of an article from its stored sequence of changes.

For Astrachat, each key represented a specific chat, and the values were the messages sent within that chat. The system enabled user login through the provision of a private key and supported real-time listening for updates to a given chat, thus ensuring the immediate reception of new messages.

Both use cases successfully demonstrated how, in their implementation, the underlying ecosystem (IPFS, OrbitDB, and LibP2P) could be abstracted away thanks to the mechanisms provided by AstraDB. This abstraction layer allowed developers to focus exclusively on the specific application logic, without needing to deal with the lower-level details of the distributed infrastructure.

As a result, not only were the functional requirements of both systems fulfilled, but a solid foundation was also established for extending the platform to new scenarios. As long as entities within a use case can be represented as keys and their evolution as a sequence of events, AstraDB provides a robust and reusable environment for building distributed applications in a simple and modular way.