Web 3.0 – Docker

How to Implement Decentralized Storage Using Docker Extensions

Marton Elek — Thu, 27 Oct 2022 14:00:00 +0000

This is a guest post written by Marton Elek, Principal Software Engineer at Storj.

In part one of this two-part series, we discussed the intersection of Web3 and Docker at a conceptual level. In this post, it’s time to get our hands dirty and review practical examples involving decentralized storage.

We’d like to see how we can integrate Web3 projects with Docker. At the beginning we have to choose from two options:

We can use Docker to containerize any Web3 application. We can also start an IPFS daemon or an Ethereum node inside a container. Docker resembles an infrastructure layer since we can run almost anything within containers.
What’s most interesting is integrating Docker itself with Web3 projects. That includes using Web3 to help us when we start containers or run something inside containers. In this post, we’ll focus on this portion.

The two most obvious integration points for a container engine are execution and storage. We choose storage here since more mature decentralized storage options are currently available. There are a few interesting approaches for decentralized versions of cloud container runtimes (like ankr), but they’re more likely replacements for container orchestrators like Kubernetes — not the container engine itself.

Let’s use Docker with decentralized storage. Our example uses Storj, but all of our examples apply to almost any decentralized cloud storage solution.

Storj is a decentralized cloud storage service in which node operators are compensated for hosting data. Its metadata servers, which track the location of the encrypted pieces, are federated: many interoperable central servers can work together with the storage providers.

It’s important to mention that decentralized storage almost always requires you to use a custom protocol. A traditional HTTP upload is a connection between one client and one server. Decentralization requires uploading data to multiple servers.
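
To make that difference concrete, here’s a minimal Python sketch of a multi-server upload. It assumes a toy scheme (split the data into k pieces plus one XOR parity piece, one piece per storage node) that’s purely illustrative. It isn’t Storj’s actual protocol, real networks typically rely on stronger erasure codes, and all function names are hypothetical.

```python
from functools import reduce
from typing import List, Optional


def _xor_pieces(pieces: List[bytes]) -> bytes:
    """XOR equally sized byte strings together, column by column."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*pieces))


def split_with_parity(data: bytes, k: int) -> List[bytes]:
    """Split data into k equal pieces plus one parity piece (k + 1 uploads)."""
    piece_len = -(-len(data) // k)  # ceiling division
    padded = data.ljust(piece_len * k, b"\0")
    pieces = [padded[i * piece_len:(i + 1) * piece_len] for i in range(k)]
    return pieces + [_xor_pieces(pieces)]


def recover(pieces: List[Optional[bytes]]) -> List[bytes]:
    """Rebuild at most one missing piece (None) from the remaining ones."""
    missing = [i for i, piece in enumerate(pieces) if piece is None]
    if len(missing) > 1:
        raise ValueError("this toy scheme tolerates only one lost piece")
    restored = list(pieces)
    if missing:
        restored[missing[0]] = _xor_pieces([p for p in pieces if p is not None])
    return restored


layer = b"container layer bytes"
uploads = split_with_parity(layer, k=3)          # one piece per storage node
uploads_after_outage = [None] + uploads[1:]      # pretend the first node vanished
data_pieces = recover(uploads_after_outage)[:3]  # drop the parity piece
assert b"".join(data_pieces)[:len(layer)] == layer
```

The client talks to four nodes instead of one server, and the data survives the loss of any single node. That’s the kind of logic a plain HTTP upload has no place for.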

Our goal is simple: we’d like to use docker push and docker pull commands with decentralized storage instead of a central Docker registry. In our latest DockerCon presentation, we identified multiple approaches:

We can change Docker and containerd to natively support different storage options
We can provide tools that magically download images from decentralized storage and persist them in the container engine’s storage location (in the right format, of course)
We can run a service which translates familiar Docker registry HTTP requests to a protocol specific to the decentralized cloud
- Users can manage this themselves.
- This can also be a managed service.

Leveraging native support

I believe the ideal solution would be to extend Docker (and/or the underlying containerd runtime) to support different storage options. But this is definitely a bigger challenge. Technically, it’s possible to modify every service, but massive adoption and a big user base mean that large changes require careful planning.

Currently, it’s not readily possible to extend the Docker daemon to use special push or pull targets. Check out our presentation on extending Docker if you’re interested in technical deep dives and integration challenges. The best solution might be a new container plugin type, which is being considered.

One benefit of this approach would be good usability. Users could keep using the familiar push and pull commands, and, depending on the host, the container layers would be sent to decentralized storage.

Using tool-based push and pull

Another option is to upload or download images with an external tool, which can talk directly to remote decentralized storage and save the images to the container engine’s storage directory.

One example of this approach (but with centralized storage) is the AWS ECR container resolver project. It provides a CLI tool which can pull and push images using a custom source. It also saves them as container images of the containerd daemon.

Unfortunately, this approach also has some strong limitations:

It can’t work with container orchestrators like Kubernetes, since they aren’t prepared to run custom CLI commands outside of pulling or pushing images.
It’s containerd specific. The Docker daemon – with different storage – couldn’t use it directly.
The usability is reduced since users need different CLI tools.

Using a user-managed gateway

If we can’t push or pull directly to decentralized storage, we can create a service which resembles a Docker registry and meshes with any client. But under the hood, it uploads the data using the decentralized storage’s native protocol.

This thankfully works well, and the standard Docker registry implementation is already compatible with different storage options.
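
The translation step such a gateway performs can be sketched in a few lines of Python. The request paths follow the Docker registry HTTP API (/v2/<name>/manifests/<reference> and /v2/<name>/blobs/<digest>). The object key layout is modeled on what the reference registry writes to its storage backend and should be treated as an assumption. The function names are hypothetical, and the actual upload to the decentralized network is left out.

```python
import re
from typing import NamedTuple, Optional


class RegistryRequest(NamedTuple):
    repository: str  # e.g. "library/alpine"
    kind: str        # "manifests" or "blobs"
    reference: str   # a tag or a digest


# Repository names may contain slashes, so the name part is matched greedily.
_ROUTE = re.compile(r"^/v2/(?P<repo>.+)/(?P<kind>manifests|blobs)/(?P<ref>[^/]+)$")


def parse_registry_path(path: str) -> Optional[RegistryRequest]:
    """Turn a registry URL path into a structured request, or None if unknown."""
    match = _ROUTE.match(path)
    if match is None:
        return None
    return RegistryRequest(match["repo"], match["kind"], match["ref"])


def blob_object_key(digest: str, root: str = "docker/registry/v2") -> str:
    """Map a blob digest to the object key the gateway would read or write."""
    algorithm, hex_digest = digest.split(":", 1)
    return f"{root}/blobs/{algorithm}/{hex_digest[:2]}/{hex_digest}/data"


request = parse_registry_path("/v2/library/alpine/blobs/sha256:" + "ab" * 32)
if request is not None and request.kind == "blobs":
    print(blob_object_key(request.reference))
```

Everything on the client side stays plain HTTP. Only the last step, fetching or storing the object behind that key, needs the storage network’s native protocol.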

At Storj, we already have an implementation that we use internally for test images. The nerdctl ipfs subcommand is another good example of this approach (it starts a local registry to access containers from IPFS).

We have problems here as well:

Users have to run the gateway on each host. This can be painful alongside Kubernetes or other orchestrators.
Implementation can be more complex and challenging compared to a native upload or download.

Using a hosted gateway

To make it slightly easier one can provide a hosted version of the gateway. For example, Storj is fully S3 compatible via a hosted (or self-hosted) S3 compatible HTTP gateway. With this approach, users have three options:

Use the native protocol of the decentralized storage with full end-to-end encryption and every feature.
Use the convenient gateway services and trust the operator of the hosted gateways.
Run the gateway themselves.

While each option is acceptable, a perfect solution still doesn’t exist.

Using Docker Extensions

One of the biggest concerns with using local gateways was usability. Our local registry can help push images to decentralized storage, but it requires additional technical work (configuring and running containers, etc.).

This is where Docker Extensions can help us. Extensions are a new feature of Docker Desktop. You can install them via the Docker Dashboard, and they can provide additional functionality, including new screens, menu items, and options within Docker Desktop. These are discoverable within the Extensions Marketplace.

And this is exactly what we need! A good UI can make Web3 integration more accessible for all users.

Docker Extensions are easily discoverable within the Marketplace, and you can also add them manually (usually for the development).

At Storj, we started experimenting with better user experiences by developing an extension for Docker Desktop. It’s still under development and not currently in the Marketplace, but feedback so far has convinced us that it can massively improve usability, which was our biggest concern with almost every available integration option.

Extensions themselves are Docker containers, which makes the development experience very smooth and easy. Extensions can be as simple as a metadata file in a container and static HTML/JS files. There are special JavaScript APIs that manipulate the Docker daemon state without a backend.

You can also use a specialized backend. The JavaScript part of the extension can communicate with any containerized backend via a mounted socket.

The new docker extension command can help you quickly manage extensions. For example, there’s a special docker extension dev debug subcommand that shows the Web Developer Toolbar for Docker Desktop itself.

Thanks to the provided developer tools, the challenge is not creating the Docker Desktop extension, but balancing the UI and UX.

Summary

As we discussed in our previous post, Web3 should be defined by user requirements, not by technologies (like blockchain or NFT). Web3 projects should address user concerns around privacy, data control, security, and so on. They should also be approachable and easy to use.

Usability is a core principle of containers, and one reason why Docker became so popular. We need more integration and extension points to make it easier for Web3 projects to give users what they need. Docker Extensions also provide a very powerful way to pair good integration with excellent usability.

We welcome you to try our Storj Extension for Docker (still under development). Please leave any comments and feedback via GitHub.

Clarifying Misconceptions About Web3 and Its Relevance With Docker

Marton Elek — Thu, 15 Sep 2022 14:30:00 +0000

This is a guest post written by Marton Elek, Principal Software Engineer at Storj.

This blog is the first in a two-part series. We’ll talk about the challenges of defining Web3 plus some interesting connections between Web3 and Docker.

Part two will highlight technical solutions and demonstrate how to use Docker and Web3 together.

We’ll build upon the presentation, “Docker and Web 3.0 — Using Docker to Utilize Decentralized Infrastructure & Build Decentralized Apps,” by JT Olio, Krista Spriggs, and Marton Elek from DockerCon 2022. However, you don’t have to view that session before reading this post.

What’s Web3, after all?

If you ask a group what Web3 is, you’ll likely receive a different answer from each person. The definition of Web3 causes a lot of confusion, but this lack of clarity also offers an opportunity. Since there’s no consensus, we can offer our own vision.

One problem is that many definitions are based on specific technologies, as opposed to goals:

“Web3 is an idea […] which incorporates concepts such as decentralization, blockchain technologies, and token-based economics” (Wikipedia)
“Web3 refers to a decentralized online ecosystem based on the blockchain.” (Gavin Wood)

There are three problems with defining Web3 based on technologies and not high-level goals or visions (or in addition to them). In general, these definitions unfortunately confuse the “what” with the “how.” We’ll focus our Web3 definition on the “what” — and leave the “how” for a discussion on implementation with technologies. Let’s discuss each issue in more detail.

Problem #1: it should be about “what” problems to solve instead of “how”

To start, most people aren’t really interested in “token-based economics.” But they can passionately critique the current internet (“Web2”) through many common questions:

Why’s it so hard to move between platforms and export or import our data? Why’s it so hard to own our data?
Why’s it so tricky to communicate with friends who use other social or messaging services?
Why can a service provider shut down my user without proper explanation or possibility of appeal?
Most terms of service agreements can’t help in practice. They’re long and hard to understand. Nobody reads them (just envision lengthy new terms for websites and user-data treatment, stemming from GDPR regulations). In a debate against service providers, we’re disadvantaged and less likely to win.
Why can’t we have better privacy? Full encryption for our data? Or the freedom to choose who can read or use our personal data, posts, and activities?
Why couldn’t we sell our content in a more flexible way? Are we really forced to accept high margins from central marketplaces to be successful?
How can we avoid being dependent on any one person or organization?
How can we ensure that our data and sensitive information are secured?

These are well-known problems. They’re also key usability questions — and ultimately the “what” that we need to solve. We’re not necessarily looking to require new technologies like blockchain or NFT. Instead, we want better services with improved security, privacy, control, sovereignty, economics, and so on. Blockchain technology, NFT, federation, and more, are only useful if they can help us address these issues and enjoy better services. Those are potential tools for “how” to solve the “what.”

What if we had an easier, fairer system for connecting artists with patrons and donors, to help fund their work? That’s just one example of how Web3 could help.

As a result, I believe Web3 should be defined as “the movement to improve the internet’s UX, including for — but not limited to — security, privacy, control, sovereignty, and economics.”

Problem #2: Blockchain, but not Web3?

We can use technologies in so many different ways. Blockchains can create a currency system with more sovereignty, control, and economics, but they can also support fraudulent projects. Since we’ve seen so much of that, it’s not surprising that many people are highly skeptical.

However, those comments are usually critical towards unfair or fraudulent projects that use Web3’s core technologies (e.g. blockchain) to siphon money from people. They’re not usually directed at big problems related to usability.

Healthy skepticism can save us, but we at least need some cautious optimism. Always keep inventing and looking for better solutions. Maybe better technologies are required. Or, maybe using current technologies differently could best help us achieve the “how” of Web3.

Problem #3: Web3, but not blockchain?

We can also view the previous problem from the opposite perspective. It’s not just blockchain or NFTs that can help us to solve the internet’s current challenges related to Problem #1. Some projects don’t use blockchain at all, yet qualify as Web3 due to the internet challenges they solve.

One good example is federation — one of the oldest ways of achieving decentralization. Our email system is still fairly decentralized, even if big players handle a significant proportion of email accounts. And this decentralization helped new players provide better privacy, security, or control.

Thankfully, there are newer, promising projects like Matrix, which is one of very few chat apps designed for federation from the ground up. How easy would communication be if all chat apps allowed federated message exchanges between providers?

Docker and Web3

Since we’re here to talk about Docker, how can we connect everything to containers?

While there are multiple ways to build and deploy software, containers are usually involved on some level. Wherever we use technology, containers can probably help.

But, I believe there’s a fundamental, hidden connection between Docker and Web3. These three similarities are small, but together form a very interesting, common link.

Usability as a motivation

We first defined the Web3 movement based on the need to improve user experiences (privacy, control, security, etc.). Docker containers can provide the same benefits.

Containers quickly became popular because they solved real user problems. They gave developers reproducible environments, easy distribution, and just enough isolation.

Since day one, Docker’s been based on existing, proven technologies like namespace isolation or Linux kernel cgroups. By building upon leading technologies, Docker relieved many existing pain points.

Web3 is similar. We should pick the right technologies to achieve our goals. And luckily innovations like blockchains have become mature enough to support the projects where they’re needed.

Content-addressable world

One barrier to creating a fully decentralized system is creating globally unique, decentralized identifiers for all services and items. When somebody creates a new identifier, we must ensure it’s truly one of a kind.

There’s no easy fix, but blockchains can help. After all, chains are the central source of truth (agreed on by thousands of participants in a decentralized way).

There’s another way to solve this problem. It’s very easy to choose a unique identifier if there’s only one option and the choice is obvious. For example, if any content is identified with its hash, then that’s the unique identifier. If the content is the same, the unique identifier (the hash itself) will always be the same.

One example is Git, which is made for distribution. Every commit is identified by the hash of its contents (metadata, pointers to parents, and pointers to the file trees). This makes Git decentralization-friendly. While most repositories are hosted by big companies, it’s pretty easy to shift content between providers, which is one of the problems we set out to solve earlier.
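
Here’s a small Python sketch of how Git derives such an identifier. Git hashes a short header (object type and size) followed by the object’s bytes. The sketch assumes a classic SHA-1 repository, and git_object_id is a hypothetical helper name, not part of Git.

```python
import hashlib


def git_object_id(kind: str, body: bytes) -> str:
    """Return the ID Git assigns to an object: SHA-1 over '<kind> <size>\\0<body>'."""
    header = f"{kind} {len(body)}".encode("ascii") + b"\0"
    return hashlib.sha1(header + body).hexdigest()


# The empty file has the same ID in every Git repository in the world.
print(git_object_id("blob", b""))  # e69de29bb2d1d6434b8b29ae775ad8c2e48c5391
```

Since the identifier depends only on the content, two providers hosting the same commit can never disagree about what it’s called.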

IPFS — as a decentralized content routing protocol — also pairs hashes with pieces to avoid any confusion between decentralized nodes. It also created a full ecosystem to define notation for different hashing types (multihash), or different data structures (IPLD).

We see exactly the same thing when we look at Docker containers! The digest acts as a content-based hash and can identify layers and manifests. This makes it easy to verify them and get them from different sources without confusion. Docker was designed to be decentralized from the get go.
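
As a rough illustration, the Python sketch below computes a digest in the sha256:<hex> form and uses it to accept a layer from whichever source returns the right bytes. The helper names and the sample data are made up for this example.

```python
import hashlib
import hmac
from typing import Iterable, Optional


def digest_of(content: bytes) -> str:
    """Content-based identifier in the 'sha256:<hex>' form used for image digests."""
    return "sha256:" + hashlib.sha256(content).hexdigest()


def first_verified(expected: str, candidates: Iterable[bytes]) -> Optional[bytes]:
    """Return the first candidate whose digest matches, whatever source it came from."""
    for content in candidates:
        if hmac.compare_digest(digest_of(content), expected):
            return content
    return None


layer = b"example layer contents"
wanted = digest_of(layer)
tampered = b"example layer contents!"  # what an unreliable mirror might return

assert first_verified(wanted, [tampered, layer]) == layer
```

The client doesn’t need to trust any particular registry or storage node. It only needs to trust the digest it asked for.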

Federation

Content-based digests of container layers and manifests help us, since Docker is usable with any kind of registry.

This is a type of federation. Even if Docker Hub is available, it’s very easy to start new registries. There’s no vendor lock-in, and there’s no grueling process behind being listed on one single possible marketplace. Publishing and sharing new images is as painless as possible.

As we discussed above, I believe federation is one form of decentralization, and decentralization is one approach to get what we need: better control and ownership. There are stances against federation, but I believe federation offers more benefits despite its complexity. Many hard-forks, soft-forks, and blockchain restarts prove that control (especially democratic control) is possible with federation.

But we can call it whatever we like. I believe that the freedom of using different container registries and the process of deploying containers are important factors in the success of Docker containers.

Summary

We’ve successfully defined Web3 based on end goals and user feedback — or “what” needs to be achieved. And this definition seems to be working very well. It’s mindful of “how” we achieve those goals. It also includes the use of existing “Web2” technologies and many future projects, even without using NFTs or blockchains. It even excludes the fraudulent projects which have drawn much skepticism.

We’ve also found some interesting intersections between Web3 and Docker!

Our job is to keep working and keep innovating. We should focus on the goals ahead and find the right technologies based on those goals.

Next up, we’ll discuss fields that are more technical. Join us as we explore using Docker with fully distributed storage options.

Connecting Decentralized Storage Solutions to Your Web 3.0 Applications

Tyler Charboneau — Thu, 19 May 2022 16:01:57 +0000

One thing has become increasingly clear: Web 3.0 (AKA “web3”) is coming soon, and some even expect it to fully emerge in 2022. Web 3.0 also promises to alter many of the internet’s core mechanisms. One key ingredient is decentralization. While many of today’s applications are centralized — where authorities serve and manage data through one primary server — Web 3.0 apps will leverage distributed systems.

JT Olio, Marton Elek, and Krista Spriggs analyzed these trends during their presentation, “Docker and Web 3.0 — Using Docker to Utilize Decentralized Infrastructure and Build Decentralized Apps.” Accordingly, they discussed how containerization and tooling have eased this transition.

JT discusses Web 1.0 and Web 2.0, and how priorities have changed.

We also have unique considerations for storage and usage of decentralized data. How do we tap into it? How does that approach work with decentralized nodes or federated systems? We’ll cover one excellent way to approach this, then outline another use case that’s even simpler. Let’s jump in.

Example #1: Using a Storage Bucket as a Directory

The process of deploying your decentralized application (dApp) differs slightly from traditional methods. Your users will access app data that’s distributed across multiple volunteer nodes — or even federated storage nodes. Since it’s distributed, this data doesn’t live on a central server with strictly-delegated access.

This requires something of a shared gateway bridge between your storage nodes and users themselves. As Marton shared, this shared bridge works well with local bridges and native support. You can even use this solution with Kubernetes, which Krista explained throughout Demo #1. We’ll tackle that example now, and explain how you can achieve similar results with other tooling.

Prerequisites

An active Kubernetes v1.13+ deployment
Your database (PostgreSQL, in this instance)
Your Decentralized Cloud Storage (DCS) solution
The Container Storage Interface (CSI) for S3 (by ctrox, via GitHub)

Kubernetes 1.13+ is needed for compatibility with the CSI. Using Postgres lets you both store and back up your data to other accessible locations, which can also benefit the user. Your DCS serves as the location for this backup. The best part is that this works for almost any application you’re already familiar with.

Mount the DCS Bucket Within Your Kubernetes Container

To help decentralize your storage, you’ll use CSI for S3 to point to your own gateway as opposed to an S3 endpoint. CSI’s key advantage is that it lets you “dynamically allocate buckets and mount them via a fuse mount into any container.”

First, you’ll need a Kubernetes (K8s) StorageClass YAML file. Following Krista’s example, creating a simple configuration file requires you to fill in several fields. While specifying elements like apiVersion and metadata is important, let’s zero in on some key fields:

provisioner – tells K8s which volume plugin to use for provisioning persistent volumes. In this case, you’ll specify ch.ctrox.csi.s3-driver to target CSI for S3. While Kubernetes ships with numerous internal options, you’re able to denote this external provisioner for your project, as it follows the official K8s provisioning specification.
mounter – tells K8s to mount a local, cloud, or virtual filesystem as a disk on your machine of choice. Krista advocates for rclone, so we’ll use that here. Rclone is a command-line program for managing cloud-based files, making it quite important while integrating with platforms like Amazon S3 and over 40 others. For example, you might prefer something like Google Cloud, Digital Ocean Spaces, or Microsoft Azure Blob Storage. We’ll stick with S3 in this instance, however.
bucket – tells K8s where core objects are stored. Give your bucket a unique name, which can be anything you’d like.
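
Putting those three fields together, a StorageClass might look roughly like the following. This is a sketch, not Krista’s actual file. It builds the manifest as a Python dictionary and prints JSON, which kubectl accepts alongside YAML. The metadata name and bucket name are placeholders, and the secret references the driver needs for credentials are left out.

```python
import json

storage_class = {
    "apiVersion": "storage.k8s.io/v1",
    "kind": "StorageClass",
    "metadata": {"name": "csi-s3"},            # placeholder name
    "provisioner": "ch.ctrox.csi.s3-driver",    # CSI for S3, as named above
    "parameters": {
        "mounter": "rclone",                    # mounts the bucket as a filesystem
        "bucket": "my-dcs-backups",             # placeholder bucket name
        # References to the credentials secret are omitted in this sketch.
    },
}

# Save the output to a file and pass it to kubectl apply -f.
print(json.dumps(storage_class, indent=2))
```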

You might’ve also noticed that you’ll have to pull in some secrets. This requires you to create a secret.yaml file (or something similarly named) that contains your .envs. Kubernetes’ documentation specifies the following definition formatting:

apiVersion: v1
kind: Secret
metadata:
  name: mysecret
type: Opaque
stringData:
  config.yaml: |
    apiUrl: "https://my.api.com/api/v1"
    username: <user>
    password: <password>

You can create all specified secrets using the kubectl apply -f ./secret.yaml command. Additionally, you can verify that you’ve created your secret successfully via the kubectl get secret mysecret -o yaml command. This outputs useful details related to creation time, type, namespace, and resource version. Note that mysecret will change to match the metadata name within your config file.
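
One detail can be surprising when you inspect the result: values written under stringData come back base64-encoded under the data key. The Python sketch below decodes such a value. The sample string is generated locally rather than taken from a real cluster, and decode_secret_data is a hypothetical helper.

```python
import base64
from typing import Dict


def decode_secret_data(data: Dict[str, str]) -> Dict[str, str]:
    """Decode the base64 values found under a Secret's data key."""
    return {key: base64.b64decode(value).decode("utf-8") for key, value in data.items()}


# Stand-in for the data section printed by kubectl get secret mysecret -o yaml.
plain = 'apiUrl: "https://my.api.com/api/v1"\n'
data_section = {"config.yaml": base64.b64encode(plain.encode("utf-8")).decode("ascii")}

print(decode_secret_data(data_section)["config.yaml"])
```

Keep in mind that base64 is an encoding, not encryption, so treat the command’s output as sensitive.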

Set Up Your Database Processes

Next, you’ll be using a Postgres database in this exercise. Performing a pg_dump writes all database objects into a backup file. Then use the cp (copy) command, together with your preferred directory, to copy that backup file to your targeted DCS mount. This critical step lets your application access the data even though it’s decentralized. For another project, you might opt to use MariaDB, MySQL, or any other leading database technology that you’re comfortable with.

Your application can access the data contained within the DCS volume.

Defining the CronJob

Additionally, your Postgres backup lives as an active container, and therefore runs consistently to ensure data recency. That’s where the CronJob comes in, defined in its own YAML file.

Because a CronJob is a scheduled task, it’s important to specify a frequency at which it runs. You’ll determine this frequency within the schedule field. While Krista has set her job to run at an “aggressive” once-per-minute clip, you may opt for something more conservative. Your policies, your users’ needs, and the relative importance of pushing “fresh” data will help determine this frequency.

Finally, pay special attention to your env fields. You’ll use these to specify DNS entry points for your container, and choose a mount point related to your CSI’s persistent volume claim.

Assign your PG_HOST the appropriate value which points to that DNS entry. You’ll likely have to use an expanded data query like Krista did, as this lets you effectively contact your active Postgres service.
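
For orientation, here’s a hedged sketch of what such a CronJob could contain, again built as a Python dictionary and printed as JSON for kubectl. It isn’t the manifest from the demo. The resource names, image tag, database name, DNS name, claim name, and mount path are all placeholders, and credential handling for pg_dump is omitted. The batch/v1 API version assumes a reasonably recent cluster; older ones used batch/v1beta1.

```python
import json

backup_command = (
    'pg_dump -h "$PG_HOST" -U postgres mydb > /tmp/mydb.sql '
    "&& cp /tmp/mydb.sql /backup/"
)

cron_job = {
    "apiVersion": "batch/v1",
    "kind": "CronJob",
    "metadata": {"name": "postgres-backup"},
    "spec": {
        "schedule": "* * * * *",  # once per minute; "0 * * * *" would be hourly
        "jobTemplate": {"spec": {"template": {"spec": {
            "restartPolicy": "OnFailure",
            "containers": [{
                "name": "backup",
                "image": "postgres:14",
                "command": ["/bin/sh", "-c", backup_command],
                # DNS name of the Postgres service inside the cluster (placeholder).
                "env": [{"name": "PG_HOST",
                         "value": "postgres.default.svc.cluster.local"}],
                # The DCS bucket shows up as an ordinary directory.
                "volumeMounts": [{"name": "dcs", "mountPath": "/backup"}],
            }],
            "volumes": [{"name": "dcs",
                         "persistentVolumeClaim": {"claimName": "csi-s3-pvc"}}],
        }}}},
    },
}

print(json.dumps(cron_job, indent=2))
```

The five fields of the schedule are minute, hour, day of month, month, and day of week, so loosening the frequency only means changing that one string.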

Spin Everything Up

With all dependencies in place, it’s time to run your application and verify that storage is properly connected. To kick off this process, enter the kubectl apply -f base command. This creates your base K8s resources.

Next, apply your sample application — consisting of your StorageClass.yaml file, Postgres backup shell script, and CronJob. Enter the kubectl apply -f ex command to do so.

Your interface will display an output confirming that your CronJob and CSI persistent volume claim (pvc) are created.

Lastly, there are a few more steps you can take:

1. Tail your Kubernetes events using the kubectl get events --sort-by='.metadata.creationTimestamp' --watch command. This confirms that your PVC volume is successfully provisioned, and that the backup job is starting.
2. Confirm that your containers are running and creating using the kubectl get pods command.
3. Run kubectl get pods again to confirm that both containers are running as intended.

As a final layer of confirmation, you can even inspect your logs to ensure that everything is running appropriately. Additionally, use the uplink ls --access [accessgrant] command to check on your latest database backups. You’ve successfully connected your application to decentralized storage!

Quick Tip: If you’d like to shorten your commands, then using an alias is helpful. Instead of typing kubectl each time, you can subsequently type k after entering alias k=kubectl into your CLI.

This is great — but what if you have another application reliant on native integration? Let’s hop into our second example.

Example #2: Using the Docker Registry

The Docker Registry is a scalable, stateless, server-side application that lets you store and distribute Docker images. It’s useful when you want to share images however you’d like, or tightly control image storage.

Thankfully, setting up the Registry for DCS is pretty straightforward. Within your registry configuration YAML file, specify a storage section with your DCS vendor, access grant, and bucket.

You’ll then want to launch the Registry using the registry serve cmd/registry/config-dev.yml command.

Next, pull your image of choice. For this example, use the following command to pull the latest alpine image:

docker pull alpine:latest

Alpine images are preferable due to their small size and access to complete package repositories. However, you can use another image that fits your project better, as needed. You’ll then want to tag this image with a unique name. This’ll come in handy later. Push that image to the Registry with the docker push image.dcs.localhost:5000/alpine:[tag name] command. This process occurs layer-by-layer until completion.

Now, it’s time to confirm that everything is within the Registry using the Uplink tool. Enter uplink ls --access [accessgrant] sj://registry/docker/registry/v2/repositories/alpine to jumpstart this process, which summons a list of alpine repositories.

Adding / after alpine lists additional items like layers and manifests. Tacking on _manifests/tags parses your tags directory.
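
If you find yourself typing these paths often, a tiny helper can assemble them. The Python sketch below only builds the command strings described above and doesn’t run them. The function names are hypothetical, and [accessgrant] is left as a placeholder for your own access grant.

```python
def repository_path(bucket: str, repository: str, *suffix: str) -> str:
    """Build the sj:// path under which the Registry keeps a repository."""
    parts = [bucket, "docker/registry/v2/repositories", repository, *suffix]
    return "sj://" + "/".join(parts)


def uplink_ls(path: str, access: str = "[accessgrant]") -> str:
    """Return the uplink ls command line for a path."""
    return f"uplink ls --access {access} {path}"


print(uplink_ls(repository_path("registry", "alpine")))
print(uplink_ls(repository_path("registry", "alpine") + "/"))
print(uplink_ls(repository_path("registry", "alpine", "_manifests", "tags")))
```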

Congratulations! You’ve successfully established a decentralized storage solution for your Docker Registry.

Conclusion

Uptake for Web 3.0 is getting stronger. Just last year, over 34,000 developers contributed to open source Web 3.0 projects. Numerous industries and use cases can therefore benefit from decentralized storage. This need will only grow as Web 3.0 becomes the standard. We can even tap into Docker to set up storage mechanisms more easily. Since containers and Web 3.0 decentralization overlap, we’ll see many more applications (both Docker-based and not) adopt similar approaches.

Want to host your resources more simply with no maintenance? Docker Hub provides centralized, collaborative storage for your project’s and team’s images. You can push images to Docker Hub and pull them down. Docker Hub also interfaces directly with Docker Desktop to facilitate seamless management of your deployments. If you’re planning to leverage Docker for your next dApp, Docker Desktop’s GUI simplifies the process of managing your containers and applications.