Journal Entry IDs
Accounting in IT
Last year I came across a fascinating project called TigerBeetle. It’s a new database built exclusively for financial transactions. The idea is to have a mission-critical, safe journal of all your financial transactions without having to build the whole core accounting logic around a general-purpose database - be it relational, key-value, or any other kind you might pick for this use case.
The only other database solution I could imagine for this is an EventStore, which could also act as an accounting journal. But here, too, you have to be careful about how you define the domain and its entities, including the logic around them.
Background Perspective
An accounting journal is basically an append-only log (AOL) or write-ahead log (WAL) in which all journal entries (sometimes called primanotas) are stored in strict chronological order. The obvious approach would be to simply insert these entries into a table in a regular SQL database - after all, every SQL database uses a WAL internally. Sounds natural, doesn’t it?
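To make that a bit more concrete, here is a minimal sketch of such an append-only journal in Go; the type and field names are my own assumptions for illustration, not a standard schema.

```go
// A minimal sketch of a journal as an append-only log.
package journal

import "time"

// Leg is a single debit or credit posting of a journal entry.
type Leg struct {
	Account string // ledger account this leg posts to
	Debit   int64  // amount in minor units (e.g. cents); 0 for a credit leg
	Credit  int64  // amount in minor units; 0 for a debit leg
}

// Entry is one immutable journal entry (primanota).
type Entry struct {
	ID       uint64    // unique, chronologically sortable identifier
	BookedAt time.Time // booking timestamp that defines the order in the journal
	Legs     []Leg
}

// Journal is the append-only log: entries are only ever added at the end,
// never updated or deleted.
type Journal struct {
	entries []Entry
}

// Append adds an entry to the end of the log.
func (j *Journal) Append(e Entry) {
	j.entries = append(j.entries, e)
}
```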
But there is more to it. In an environment with a really heavy transaction count and many parallel requests, a regular table in a SQL database becomes a bottleneck.
So let’s imagine using an EventStore as the accounting journal. We would then need to define what kinds of streams we want, which leads to the question: what is our aggregate in this domain?
The best and only answer I came up with is the journal entry itself. Using a (ledger) account isn’t a good idea, because in double-entry bookkeeping you always have multi-legged - at least two-legged - transactions in which debits and credits balance to the same amount, so money never gets lost. Splitting an entry across multiple account streams wouldn’t guarantee that, and each stream would also grow very long and never be closed, especially for a so-called counter-account.
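Building on the hypothetical Entry and Leg types sketched above, the double-entry invariant of a multi-legged entry could be checked like this:

```go
// Balanced reports whether the debit and credit legs of the entry sum to the
// same non-zero amount - the double-entry invariant that keeps money from
// getting lost. It builds on the Entry/Leg sketch above.
func (e Entry) Balanced() bool {
	var debits, credits int64
	for _, leg := range e.Legs {
		debits += leg.Debit
		credits += leg.Credit
	}
	return debits > 0 && debits == credits
}
```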
Another idea would be a ledger stream that holds all entries. But again, this would become a very long stream, even if it were split by the shortest practical accounting period, e.g. one day. And thinking through the details of accounting, you might come to the conclusion that everything around the journal could just be a read model. So the way to go would be short-lived journal-entry streams that include the accounting period they belong to.
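As a rough illustration of what such a short-lived stream could look like - the naming convention here is purely my assumption, not an EventStore requirement:

```go
// A sketch of a stream name for a short-lived journal-entry stream that
// encodes the accounting period (here: one day) it belongs to.
package streams

import (
	"fmt"
	"time"
)

// StreamName builds an identifier such as
// "journalentry-2024-03-15-287364812398592" from the accounting period and
// the unique journal entry ID.
func StreamName(period time.Time, entryID uint64) string {
	return fmt.Sprintf("journalentry-%s-%d", period.Format("2006-01-02"), entryID)
}
```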
Comparison
That’s why I thought about what kind of unique identifier for a journal entry would be useful and collision-free, especially in a distributed environment where multiple replicas of the service handling incoming entries are running.
The clear disadvantage of using a UUID is that it cannot be sorted into the correct chronological order. From a database perspective it also needs more storage, and selecting is slower even with the right index, because of its non-sequential nature. A ULID might solve the ordering issue, but it still falls short on efficiency and adds complexity.
The next obvious idea is to just use auto-increment values, especially with a SQL-based store, and let the database do the work. Unfortunately this scales very badly, and it produces predictable IDs, which can become a security issue.
So possibly the best solution is to copy the idea of so-called Snowflake IDs - a format invented by Twitter to generate unique, sequential IDs across distributed nodes. A Snowflake ID is a 64-bit ID composed of a timestamp, a node ID and a monotonically increasing sequence number. Because it can be stored as an int64 or uint64, it is very efficient as a primary key or index, and a database can select and sort it perfectly.
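Here is a minimal sketch of such a generator in Go. The bit widths follow the original Twitter layout (41-bit timestamp, 10-bit node ID, 12-bit sequence), while the custom epoch and the locking strategy are my own assumptions; clock-rollback handling is omitted.

```go
// A sketch of a Snowflake-style ID generator: 41 bits of millisecond
// timestamp, 10 bits of node ID and 12 bits of sequence, packed into one uint64.
package snowflake

import (
	"errors"
	"sync"
	"time"
)

const (
	nodeBits     = 10
	sequenceBits = 12
	maxNode      = (1 << nodeBits) - 1     // 1023
	maxSequence  = (1 << sequenceBits) - 1 // 4095
	// Custom epoch (2020-01-01T00:00:00Z, an assumption) keeps the 41-bit
	// timestamp usable for longer.
	epochMillis = int64(1577836800000)
)

// Generator produces unique, time-sortable 64-bit IDs for a single node.
type Generator struct {
	mu       sync.Mutex
	nodeID   uint64
	lastMs   int64
	sequence uint64
}

func NewGenerator(nodeID uint64) (*Generator, error) {
	if nodeID > maxNode {
		return nil, errors.New("snowflake: node ID out of range")
	}
	return &Generator{nodeID: nodeID}, nil
}

// Next returns the next ID: 41 bits timestamp | 10 bits node | 12 bits sequence.
func (g *Generator) Next() uint64 {
	g.mu.Lock()
	defer g.mu.Unlock()

	now := time.Now().UnixMilli()
	if now == g.lastMs {
		g.sequence = (g.sequence + 1) & maxSequence
		if g.sequence == 0 {
			// All 4096 sequence values used in this millisecond: wait for the next one.
			for now <= g.lastMs {
				now = time.Now().UnixMilli()
			}
		}
	} else {
		g.sequence = 0
	}
	g.lastMs = now

	return uint64(now-epochMillis)<<(nodeBits+sequenceBits) |
		g.nodeID<<sequenceBits |
		g.sequence
}
```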
Because of the fixed bit layout, the only limit is the number of IDs that can be generated within one millisecond. So what is that limit? The 12-bit sequence allows 4,096 IDs per millisecond per node, which works out to roughly 4.096 million IDs per second.
I guess almost 4.1 million journal entries per second is enough to handle a really high load of financial transactions.