PumpkinDB: What’s Next?

Just recently, we released PumpkinDB 0.2, a nice incremental improvement over our very first release. That first release attracted quite a lot of stargazers despite a very minimal announcement (heck, it was just a link to a very minimal website; we had no other materials!). With this release, we’re hoping to attract the next wave of early adopters and contributors to help us figure out the project’s next challenges.

PumpkinDB is essentially a database programming environment, largely inspired by the core ideas behind MUMPS. Instead of M, it has a Forth-inspired, stack-based language, PumpkinScript. Instead of hierarchical keys, it has a flat key namespace and doesn’t allow values to be overwritten once they are set. The core motivation for immutability is that, as the cost of storage keeps declining, erasing data is effectively a strategic mistake.

PumpkinDB is not intended for general-purpose programming. Its main objective is to facilitate building both specialized, application-specific databases and generic ones, with a particular focus on immutability and on processing data as close to storage as possible, incurring as little communication penalty as possible.
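To make the set-once rule concrete, here is a minimal Rust sketch of a flat key namespace in which a key can be written exactly once and a second write is rejected rather than overwriting the stored value. It is an in-memory toy built on `BTreeMap`, not PumpkinDB’s lmdb-backed storage; the `SetOnceStore` type, its `insert_once`/`get` methods and the `WriteError` enum are hypothetical names chosen for illustration.

```rust
use std::collections::BTreeMap;

/// Error returned when a key that already holds a value is written again.
#[derive(Debug, PartialEq)]
pub enum WriteError {
    DuplicateKey,
}

/// Hypothetical in-memory store with a flat key namespace in which every
/// key can be assigned exactly once (illustration only, not PumpkinDB code).
#[derive(Default)]
pub struct SetOnceStore {
    data: BTreeMap<Vec<u8>, Vec<u8>>,
}

impl SetOnceStore {
    pub fn new() -> Self {
        Self::default()
    }

    /// Stores `value` under `key`, refusing to overwrite an existing value.
    pub fn insert_once(&mut self, key: &[u8], value: &[u8]) -> Result<(), WriteError> {
        if self.data.contains_key(key) {
            return Err(WriteError::DuplicateKey);
        }
        self.data.insert(key.to_vec(), value.to_vec());
        Ok(())
    }

    /// Returns the value stored under `key`, if any.
    pub fn get(&self, key: &[u8]) -> Option<&[u8]> {
        self.data.get(key).map(|v| v.as_slice())
    }
}

fn main() {
    let mut store = SetOnceStore::new();
    assert_eq!(store.insert_once(b"user/1/name", b"Alice"), Ok(()));
    // A second write to the same key is rejected; the original value survives.
    assert_eq!(
        store.insert_once(b"user/1/name", b"Bob"),
        Err(WriteError::DuplicateKey)
    );
    assert_eq!(store.get(b"user/1/name"), Some(&b"Alice"[..]));
}
```

Returning an error on the second write, rather than silently ignoring it, leaves it to the caller to decide how to record a changed fact, for example under a different key.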

In this release, the most notable external change was splitting PumpkinDB into multiple crates. Why is this important? Well, the biggest benefit is that now one can:

a) embed PumpkinDB into their Rust (and soon, perhaps, C) applications, and
b) build their own custom PumpkinDB-compatible servers with added or reduced functionality very easily.

After all, PumpkinDB is a database engine first and foremost, and this change has actually allowed us to build on top of it. For example (and I hope to have enough capacity to do this relatively soon), this functionality would be used to build a successor to es4j: a lazy event sourcing engine that can be either embedded (where feasible) or accessed remotely.

The Dispatcher API introduced in this release has enabled us to split the standard library functionality into manageable modules. Very importantly, it allows APIs to be smoothly added to, reconfigured in, and removed from the engine, so that the aforementioned custom engines can be built easily.
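The module idea can be illustrated with a small Rust sketch: instruction-handling modules implement a common trait, and a dispatcher offers each instruction to its registered modules in order until one of them accepts it. Assembling a dispatcher from a different set of modules then yields a “full” or a “reduced” engine. The `Module` trait, `Dispatcher`, `Dispatch` and the `DUP`/`DROP`/`ADD` modules below are hypothetical and do not reproduce PumpkinDB’s actual Dispatcher API.

```rust
/// Result of offering an instruction to a module or to the dispatcher.
#[derive(Debug, PartialEq)]
pub enum Dispatch {
    Handled,
    Unknown,
    Error(&'static str),
}

/// Hypothetical module interface: a module either handles an instruction
/// (possibly failing) or reports it as unknown so the next module can try.
pub trait Module {
    fn handle(&self, instruction: &str, stack: &mut Vec<i64>) -> Dispatch;
}

/// Stack-manipulation module.
struct StackOps;

impl Module for StackOps {
    fn handle(&self, instruction: &str, stack: &mut Vec<i64>) -> Dispatch {
        match instruction {
            "DUP" => match stack.last().copied() {
                Some(top) => {
                    stack.push(top);
                    Dispatch::Handled
                }
                None => Dispatch::Error("DUP on empty stack"),
            },
            "DROP" => match stack.pop() {
                Some(_) => Dispatch::Handled,
                None => Dispatch::Error("DROP on empty stack"),
            },
            _ => Dispatch::Unknown,
        }
    }
}

/// Arithmetic module.
struct Arithmetic;

impl Module for Arithmetic {
    fn handle(&self, instruction: &str, stack: &mut Vec<i64>) -> Dispatch {
        match instruction {
            "ADD" => {
                if stack.len() < 2 {
                    return Dispatch::Error("ADD needs two values");
                }
                let b = stack.pop().unwrap();
                let a = stack.pop().unwrap();
                // Wrapping keeps the sketch short; a real engine would report overflow.
                stack.push(a.wrapping_add(b));
                Dispatch::Handled
            }
            _ => Dispatch::Unknown,
        }
    }
}

/// Offers each instruction to the registered modules in registration order.
pub struct Dispatcher {
    modules: Vec<Box<dyn Module>>,
}

impl Dispatcher {
    pub fn new() -> Self {
        Dispatcher { modules: Vec::new() }
    }

    /// Registers a module; different module sets yield different engines.
    pub fn with(mut self, module: Box<dyn Module>) -> Self {
        self.modules.push(module);
        self
    }

    pub fn dispatch(&self, instruction: &str, stack: &mut Vec<i64>) -> Dispatch {
        for module in &self.modules {
            match module.handle(instruction, stack) {
                Dispatch::Unknown => continue,
                outcome => return outcome,
            }
        }
        Dispatch::Unknown
    }
}

fn main() {
    // A "full" engine and a "reduced" one assembled from the same modules.
    let full = Dispatcher::new().with(Box::new(StackOps)).with(Box::new(Arithmetic));
    let reduced = Dispatcher::new().with(Box::new(StackOps));

    let mut stack = vec![2];
    assert_eq!(full.dispatch("DUP", &mut stack), Dispatch::Handled);
    assert_eq!(full.dispatch("ADD", &mut stack), Dispatch::Handled);
    assert_eq!(stack, vec![4]);
    // Without the arithmetic module, ADD is simply not part of the engine.
    assert_eq!(reduced.dispatch("ADD", &mut stack), Dispatch::Unknown);
}
```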

While working on this release, I ventured out to start experimenting with SPDK (the Storage Performance Development Kit), a really nice library that allows communicating with NVMe SSDs while bypassing the kernel. This enables the most efficient communication with the controller, zero-copy reads and writes, and the most efficient use of the underlying disk to reduce write amplification, and it removes the variables introduced by operating systems and file systems. It’s still very early, but the idea is to start abstracting away our underlying storage layer (lmdb is currently hard-coded) so that we can replace it with other engines to suit different use cases. On one hand, that would allow contributors to add other existing engines. On the other hand, it will be a gateway to developing a raw-SSD storage engine that gets the most benefit out of that type of hardware.

There is a lot to figure out there. Will it be just a B+Tree optimized for sector-addressable SSDs, or will we be able to support byte-addressable devices as well? Will it be built on hard-coded assumptions, or will it be able to learn from usage patterns over time and infer the best storage techniques itself? The sky is the limit (but, of course, our real limit is how many more talented people we can attract to contribute on this front).
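One way to picture the storage abstraction is a Rust trait that the engine is generic over, so that backends can be swapped without touching engine code. In the sketch below, two toy in-memory backends stand in for lmdb or a future SPDK-based engine, and the set-once rule is enforced by the engine rather than by each backend. `Storage`, `Engine`, `MemStorage` and `LogStorage` are illustrative names under that assumption, not PumpkinDB types.

```rust
use std::collections::HashMap;

/// Hypothetical backend interface; a real one would add transactions,
/// cursors and error handling.
pub trait Storage {
    fn read(&self, key: &[u8]) -> Option<Vec<u8>>;
    fn write(&mut self, key: &[u8], value: &[u8]);
}

/// Toy backend #1: a hash map.
#[derive(Default)]
pub struct MemStorage {
    map: HashMap<Vec<u8>, Vec<u8>>,
}

impl Storage for MemStorage {
    fn read(&self, key: &[u8]) -> Option<Vec<u8>> {
        self.map.get(key).cloned()
    }
    fn write(&mut self, key: &[u8], value: &[u8]) {
        self.map.insert(key.to_vec(), value.to_vec());
    }
}

/// Toy backend #2: an append-only log of entries, scanned linearly.
#[derive(Default)]
pub struct LogStorage {
    entries: Vec<(Vec<u8>, Vec<u8>)>,
}

impl Storage for LogStorage {
    fn read(&self, key: &[u8]) -> Option<Vec<u8>> {
        // Scan from the newest entry so the latest write wins.
        self.entries
            .iter()
            .rev()
            .find(|(k, _)| k.as_slice() == key)
            .map(|(_, v)| v.clone())
    }
    fn write(&mut self, key: &[u8], value: &[u8]) {
        self.entries.push((key.to_vec(), value.to_vec()));
    }
}

/// The engine is generic over its backend and enforces the set-once rule
/// itself, so every backend gets the same semantics.
pub struct Engine<S: Storage> {
    storage: S,
}

impl<S: Storage> Engine<S> {
    pub fn new(storage: S) -> Self {
        Engine { storage }
    }

    /// Returns `false` instead of overwriting an existing key.
    pub fn set(&mut self, key: &[u8], value: &[u8]) -> bool {
        if self.storage.read(key).is_some() {
            return false;
        }
        self.storage.write(key, value);
        true
    }

    pub fn get(&self, key: &[u8]) -> Option<Vec<u8>> {
        self.storage.read(key)
    }
}

/// Runs the same engine code against any backend.
fn demo<S: Storage>(storage: S) -> (bool, bool, Option<Vec<u8>>) {
    let mut engine = Engine::new(storage);
    let first = engine.set(b"k", b"v1");
    let second = engine.set(b"k", b"v2");
    (first, second, engine.get(b"k"))
}

fn main() {
    let expected = (true, false, Some(b"v1".to_vec()));
    assert_eq!(demo(MemStorage::default()), expected);
    assert_eq!(demo(LogStorage::default()), expected);
}
```

Keeping backends this thin is one possible design choice: engine-level guarantees stay in one place, and a new backend only has to implement reads and writes.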

We understand that the lack of examples is one of the most serious roadblocks for people trying to understand PumpkinDB. To address this, we’ve published the very first draft of an example app in Java that uses a remote PumpkinDB server to manage a todo list. While it’s still a bit buggy (our Java skills have definitely atrophied since we switched to Rust) and is begging for major refactoring, it’s a starting point.

Todo List Example

Its architecture is briefly explained in the “Removing Obscuring Abstractions” post I published recently. Hopefully, it’ll clarify some of the details (bonus hint: the PumpkinScript instructions in that article link to the documentation!).

Another big change we’re dreaming about is an overhaul of the interpreter. Right now, it’s very simplistic: it never even turns the code into any sort of AST, opting instead for a completely dynamic interpretation approach. While this works reasonably well for small, I/O-bound programs, the more complex these programs become, the more apparent it is that the interpreter is rather slow (although it surprisingly outperformed LuaJIT on the Ackermann function, and I still have no idea why).

So the idea here is to find a way to compile PumpkinScript down to a higher-level, VM-style assembly while keeping in mind some of our unique properties and challenges, such as granular lifetime management for transactional data, temporary resource management (this bug is begging for it), stack abstraction, separating heap pointers from database data pointers (this would help us with zero-copy reads), pinned heap memory allocation (for SPDK-based storage engines), and others.

This idea goes beyond improving performance or memory management. It will also make it easier to add other languages to the stack. Even though it is already possible to write a transpiler to PumpkinScript, removing that step, compiling directly to the VM and reusing all of those functionality modules will certainly make this more straightforward.
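The compile-then-execute split discussed above can be sketched in Rust with a toy, Forth-like syntax (not actual PumpkinScript): words are resolved once into a `Vec<Op>` of instructions, unknown words are reported before anything runs, and the execution loop only matches on already-decoded ops instead of text. The `Op` enum and the `compile`/`run` functions are hypothetical, and the sketch deliberately ignores the harder concerns listed earlier, such as transactional lifetimes, resource management and pinned memory.

```rust
/// Instructions of a toy stack VM (hypothetical; not PumpkinDB's design).
#[derive(Debug, Clone, PartialEq)]
pub enum Op {
    Push(i64),
    Add,
    Mul,
    Dup,
    Swap,
}

/// Compile step: resolve every word once, so unknown words are reported
/// before anything runs and execution never has to parse text.
pub fn compile(source: &str) -> Result<Vec<Op>, String> {
    source
        .split_whitespace()
        .map(|word| match word {
            "ADD" => Ok(Op::Add),
            "MUL" => Ok(Op::Mul),
            "DUP" => Ok(Op::Dup),
            "SWAP" => Ok(Op::Swap),
            _ => word
                .parse::<i64>()
                .map(Op::Push)
                .map_err(|_| format!("unknown word: {}", word)),
        })
        .collect()
}

/// Execution step: a tight loop over already-decoded ops.
pub fn run(program: &[Op]) -> Result<Vec<i64>, String> {
    let mut stack: Vec<i64> = Vec::new();
    for op in program {
        match op {
            Op::Push(n) => stack.push(*n),
            Op::Dup => {
                let top = *stack.last().ok_or("DUP on empty stack")?;
                stack.push(top);
            }
            Op::Add | Op::Mul | Op::Swap => {
                let b = stack.pop().ok_or("stack underflow")?;
                let a = stack.pop().ok_or("stack underflow")?;
                match op {
                    // Wrapping arithmetic keeps the toy VM panic-free.
                    Op::Add => stack.push(a.wrapping_add(b)),
                    Op::Mul => stack.push(a.wrapping_mul(b)),
                    _ => {
                        stack.push(b);
                        stack.push(a);
                    }
                }
            }
        }
    }
    Ok(stack)
}

fn main() {
    let program = compile("2 3 ADD DUP MUL").expect("program should compile");
    assert_eq!(program, vec![Op::Push(2), Op::Push(3), Op::Add, Op::Dup, Op::Mul]);
    assert_eq!(run(&program), Ok(vec![25]));
    // Unknown words are caught at compile time, before any execution.
    assert!(compile("2 FROB").is_err());
}
```

Because the compiled form is plain data, it could in principle be cached, inspected or optimized, and a front end for another language would only need to emit `Op` values rather than source text.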

Last but not least, we need your feedback, criticism and (yay!) contributions. While PumpkinDB is definitely not for everybody, if you have a taste for something a little quirky, something that draws a bit from the past while trying to peek into the future, something that challenges a few assumptions and doesn’t have to maintain backwards compatibility for a while (ok, that’s enough!), then you might be in for a treat.

Join us on our Gitter chat or simply browse the list of “starter issues” — either way, welcome and I hope you will enjoy your part of the ride!