"One to rule them all" - service rebuild story

V1 → V2


Kacper Walczak · 03-09-2024

Service rebuild + NoSQL migration of millions of documents on Google Cloud.

Updated: Kacper Walczak, 14 Sep 2024

Introduction

I was working on an educational project. We had a microservice there that stored Tags attached to the Tree Structure content model.

I had to rebuild the entire service as V2, because V1 was not serving our purposes: its models allowed data to be inserted into the Tree Structure incorrectly.

Main reason: data from the DB was not shown properly on screen. It was stored in the wrong places and was not retrievable.

Data was stored in Firestore, a NoSQL database on Google Cloud Platform, and even though the service was not fully open yet, it already contained a few million documents.

Requirements for migration

We needed to use Feature Flags because the system was still running
We needed to migrate millions of documents to the new model before switching the FF (Feature Flag)
We needed to rewrite the frontend view as well (it had been outsourced previously and it was easier to rewrite than to deal with it) to support the new process improvements
We wanted to switch to V2 safely, with the possibility to fall back to V1 if something was not right

Problem

Our V1 model used Firestore to store all its data, but it was not doing it right. V1 stored data in this form:

Problem: the list of shared spaces with tags inside would match ONLY on weird integration IDs... and the match was based on LIMIT(1). Because of the way Firestore generates IDs, a new Shared document could end up sorted above the existing one, which could break the LIMIT(1) ordering and match the wrong shared doc.
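To make the LIMIT(1) problem concrete, here is a minimal in-memory sketch in Python. It assumes the V1 lookup filtered by integration ID and took the first result in document ID order, which is how Firestore orders results by default. The field name integrationId, the IDs and the helper first_match are all made up for illustration and are not the real V1 code.

```python
from typing import Dict, Optional


def first_match(docs: Dict[str, dict], integration_id: str) -> Optional[str]:
    """Emulate `WHERE integrationId == x LIMIT 1` with results ordered by document ID."""
    matching_ids = sorted(
        doc_id for doc_id, doc in docs.items()
        if doc["integrationId"] == integration_id
    )
    return matching_ids[0] if matching_ids else None


# Hypothetical shared docs keyed by random-looking IDs.
shared_docs = {"mQ7xLp2Rt9": {"integrationId": "int-1", "tags": ["Tag1"]}}
before = first_match(shared_docs, "int-1")

# A newer doc whose random ID happens to sort lower now wins LIMIT(1).
shared_docs["B2kWz8Nc4Y"] = {"integrationId": "int-1", "tags": ["Tag9"]}
after = first_match(shared_docs, "int-1")

print(before, after)
```

The same query returns a different document after the insert, even though nothing about the old document changed.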

Solution

Prepare Requirements

First, we need to gather all requirements from the business. We need to check that everything is covered and written down in a document (it can be as simple as a Word doc shared via the browser).

Once we have gathered them, we can move on.

Create Architecture Proposal

The second step is to think through the pros and cons of different approaches to our goals.

Programming can solve your problems in countless ways, but each of them has its pros and cons.

Decide what your Architecture Drivers are, for example:

when should the project be finished?
what should the availability of this new product be?
do we need to test it in some specific way?
how much money (spread across the team that can do the job) do we have for it? Ask your boss if you don't know, or just ask how many people we can put on it.
what are the security rules?

Once we are sure about all the drivers, we can carry on with thinking about multiple approaches and compare whether they fit our needs.

For this problem I created this model:

With it we could simply fetch data for the tree with:

FETCH TAGS: {pathToSpace: Space1/Space2/Space3}

And the response would look like this:

{
    "Space1": ["Tag1", "Tag2"],
    "Space2": ["Tag2", "Tag3"],
    "Space3": ["Tag4"]
}
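A minimal Python sketch of such a path-based fetch could look like the following. It assumes each document is keyed by the full path from the root to its space. That layout, the STORE dict and the function name are my assumptions for illustration and not the real Firestore model.

```python
from typing import Dict, List

# Hypothetical store: one entry per space, keyed by its full path from the root.
STORE: Dict[str, List[str]] = {
    "Space1": ["Tag1", "Tag2"],
    "Space1/Space2": ["Tag2", "Tag3"],
    "Space1/Space2/Space3": ["Tag4"],
}


def fetch_tags(store: Dict[str, List[str]], path_to_space: str) -> Dict[str, List[str]]:
    """Return the tags of every space along the path, from root to leaf."""
    result: Dict[str, List[str]] = {}
    prefix: List[str] = []
    for space_id in path_to_space.split("/"):
        prefix.append(space_id)
        # Each space is looked up by its own cumulative path.
        result[space_id] = store.get("/".join(prefix), [])
    return result


response = fetch_tags(STORE, "Space1/Space2/Space3")
print(response)
```

Because every key is derived from the request itself, no search or LIMIT is needed to find the right documents.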

I even proposed a GraphDB approach (take a look at the Graph DB comparison), because with it we could drop the Space1/Space2/Space3 path and simply use:

FETCH TAGS: {id: Space1, withChildren: true}

or

FETCH TAGS: {id: Space3, withParents: true}

And the result would be the same as with the NoSQL approach.

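You can achieve the same behavior with a Linked List data structure in a regular SQL/NoSQL database, but it will likely be a bit slower than a GraphDB, because every hop to a parent or child is a separate lookup (graph databases are built for such traversals and often keep much of the graph in RAM).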
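Here is a rough Python sketch of the Linked List idea for the withParents case. Each space keeps a pointer to its parent, and the fetch walks up to the root. The PARENT and TAGS dicts stand in for database documents and are assumptions for illustration only.

```python
from typing import Dict, List, Optional

# Hypothetical documents: each space points to its parent (None for the root).
PARENT: Dict[str, Optional[str]] = {"Space1": None, "Space2": "Space1", "Space3": "Space2"}
TAGS: Dict[str, List[str]] = {
    "Space1": ["Tag1", "Tag2"],
    "Space2": ["Tag2", "Tag3"],
    "Space3": ["Tag4"],
}


def fetch_tags_with_parents(space_id: str) -> Dict[str, List[str]]:
    """Walk from the given space up to the root, then return tags root-first."""
    chain: List[str] = []
    current: Optional[str] = space_id
    while current is not None:
        chain.append(current)
        current = PARENT[current]  # in a real DB this is one read per hop
    return {sid: TAGS.get(sid, []) for sid in reversed(chain)}


with_parents = fetch_tags_with_parents("Space3")
print(with_parents)
```

The number of reads grows with the depth of the tree, which is the cost that a graph database tries to hide behind a single traversal query.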

Accept Architecture with Team/Architect/Team Lead

Our Architect decided to use the simplest possible approach to this problem, so we stayed with:

FETCH TAGS: {pathToSpace: Space1/Space2/Space3}

Inserts are similar.

Prepare PoC

I prepared 3 PoCs. One used Neo4j (I love Cypher ❤️ and the spanningTree procedure from the APOC library for Neo4j). One used TigerGraph with a Breadth-First Search algorithm. TigerGraph is really hard if you don't have a PhD: at that time its documentation pointed straight into mathematical graph theory bullshi*, but luckily they provided this function in their API/GitHub, so I took it from there and removed the unnecessary elements.

The final version was built around NoSQL Firestore because of the pricing of Neo4j and TigerGraph (Firestore was cheaper in our case).

Test if it fulfills Requirements

We put the final product in front of the business on multiple video calls.

They brought new ideas (and new requirements... sic!).

But it passed...

Create Feature Flags

A Feature Flag is simply an object that we check: if it exists (or is true), we route to V2, otherwise we route to the V1 approach.

In NoSQL Firestore you can simply create another collection of documents keyed by ID, like:

Create the collection FeatureFlags,
Append a simple document with just the ID v2_enabled_fetch_tags to the DB:

{
    "v2_enabled_fetch_tags": {}
}

In your actions you can now check whether it exists. You can always remove this FF and the code will fall back to the V1 service.
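A minimal sketch of such a flag repository in Python, using an in-memory dict in place of the FeatureFlags collection so it runs without any cloud setup. The class and method names are hypothetical. In a real project the same three operations would create, delete and check a Firestore document.

```python
from typing import Dict


class FeatureFlagRepo:
    """In-memory stand-in for the FeatureFlags collection (an assumption for this sketch)."""

    def __init__(self) -> None:
        self._docs: Dict[str, dict] = {}

    def enable(self, flag_id: str) -> None:
        # The document body stays empty: only the ID matters.
        self._docs[flag_id] = {}

    def disable(self, flag_id: str) -> None:
        # Removing a flag that is already gone is not an error.
        self._docs.pop(flag_id, None)

    def exists(self, flag_id: str) -> bool:
        return flag_id in self._docs


ff_repo = FeatureFlagRepo()
ff_repo.enable("v2_enabled_fetch_tags")
print(ff_repo.exists("v2_enabled_fetch_tags"))
```

Using existence instead of a boolean field means the fallback is a single delete, and a missing flag always means V1.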

Prepare V2

Now you need to prepare your final V2 product with all the architecture drivers covered.

Create a new project or add a V2 subfolder to the existing repo, it's up to you.

Write all the required endpoints, models, etc.

Set up Cloud deployment files like: <something like app>.yaml | Dockerfile | tf folder (Terraform to simply spawn your machine) | etc.

It will depend on which way of deploying this product you have chosen.

Bring some Testcontainers into the project or simply use Emulators to emulate the DB in Unit Tests.

Write Integration Tests to ensure the business logic is done in a valid way.

Prepare ACL (Anti Corruption Layer)

An Anti Corruption Layer can be used within your models (like: get these children if V2, else get those) or within your commands in CQRS (if version: 1, do this; if version: 2, do something else).

In this approach we simply created a top layer to reroute to the specific service (we had one repo with a v2 folder).

python_approach.py

def fetch_tags(path):
    v2_enabled = ff_repo.exists(FF.v2_enabled_fetch_tags)  # FF.v2_enabled_fetch_tags simply returns "v2_enabled_fetch_tags"
    service = v2_service if v2_enabled else v1_service
    return service.fetch_tags(path)

So as you can see, it's pretty straightforward.

Write migration scripts

I used JavaScript with the RxJS library because we needed to migrate documents to the new shape in the DB concurrently.

I used the CQS (Command Query Separation) pattern to distinguish between Read and Write (to make sure no problems arise: that's still millions of documents and I'm just a human).

Example of CQS:

const Read = (db) => ({
    // queries only: never modify data here
    fetchTags: (spacesIdArr) => { /* fetch here... return Observable */ },
    // ... rest of the methods
})

const Write = (db) => ({
    // commands only: all writes go through here
    insertTags: (path, tags) => { /* insert tags in some way */ },
    // ... rest of the methods
})

Why RxJS?

RxJS provides a great API for async operations, with operators like: merge (to increase throughput) | concatMap (to iterate in order) | pairwise (to compare previous/next elements) | etc.

example-migration.js

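const { merge } = require('rxjs')
const { map, mergeMap } = require('rxjs/operators')
// ...
/**
 * Flattened tree - each array is a path from a root to a leaf
 * @example
 *  space1 {tags: [...]}
 *    space2 leaf {tags: [...]}
 * 
 *  space3 {tags: [...]}
 *    space4 {tags: [...]}
 *      space5 leaf {tags: [...]}
*/
const spacesToMigrate = [
    ["space1", "space2"],
    ["space3", "space4", "space5"],
]
// once subscribed, each observable emits one object per path, e.g. {"space1": [Tag1,],}
const fetchTagsObservables = spacesToMigrate.map(idsInOrder => Read(DB).fetchTags(idsInOrder))
const spaceTreesWithTags$ = merge(...fetchTagsObservables)
const migration$ = spaceTreesWithTags$.pipe(
    map(res => Object.entries(res)),
    // mergeMap subscribes to the returned inserts, so they actually run (concurrently)
    mergeMap(entries => {
        const path = []
        const inserts = entries.map(([id, tags]) => {
            path.push(id)
            // copy the path so later pushes do not change this insert's argument
            return Write(DB).insertTags([...path], tags)
        })
        // assumes insertTags returns an Observable or a Promise
        return merge(...inserts)
    })
)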
 
migration$.subscribe({
    next: () => {
      // fetch and insert - will go: bum, bum, bum, bum... ;)
    },
    error: e => console.error('Problem with: ' + e),
    complete: () => console.warn('Finished migration')
})
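The path handling in this script is the easiest part to get wrong, so here is the same idea as a small synchronous Python sketch. It turns one fetched root-to-leaf result into the list of writes for the new model. Joining the path with "/" follows the pathToSpace format used earlier, and the function name and sample data are assumptions for illustration.

```python
from typing import Dict, List, Tuple


def plan_inserts(tree_with_tags: Dict[str, List[str]]) -> List[Tuple[str, List[str]]]:
    """Turn one fetched root-to-leaf result into (cumulative path, tags) writes."""
    path: List[str] = []
    writes: List[Tuple[str, List[str]]] = []
    for space_id, tags in tree_with_tags.items():  # dicts keep insertion order
        path.append(space_id)
        # join() takes a snapshot, so later appends cannot change earlier writes
        writes.append(("/".join(path), tags))
    return writes


fetched = {"space3": ["Tag1"], "space4": ["Tag2"], "space5": ["Tag3"]}
for target_path, tags in plan_inserts(fetched):
    print(target_path, tags)
```

Each space is written under the path that leads to it, so the order of the fetched entries has to match the order of the tree.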

Migrate data to the new collection

It's my favorite part: you simply run your migration scripts against the new collection in Firestore (so the old data stays untouched and a fallback remains possible).

You should ask the business whether some specific time would be better for doing it.

Now wait till it's done...

Deploy V2

As the title says: do the job with your current CI/CD pipelines, or do it manually for initial tests, or do it with a Terraform script and terraform apply [options] [plan file], etc.

It depends on how you usually deploy/host your microservices.

Switch Feature Flags

Now create a script that will instantly switch the FFs.
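Such a script can be tiny. Below is a hedged Python sketch that switches a list of flags in one go and can undo the switch. The dict stands in for the FeatureFlags collection, and the second flag name is hypothetical, added only to show that several flags can be switched together.

```python
from typing import Dict, Iterable

# "v2_enabled_insert_tags" is a hypothetical second flag, added for illustration.
V2_FLAGS = ("v2_enabled_fetch_tags", "v2_enabled_insert_tags")


def switch_to_v2(flag_docs: Dict[str, dict], flags: Iterable[str] = V2_FLAGS) -> None:
    """Create every flag document, which routes the matching actions to V2."""
    for flag_id in flags:
        flag_docs[flag_id] = {}


def fall_back_to_v1(flag_docs: Dict[str, dict], flags: Iterable[str] = V2_FLAGS) -> None:
    """Remove every flag document, which routes the matching actions back to V1."""
    for flag_id in flags:
        flag_docs.pop(flag_id, None)


feature_flags: Dict[str, dict] = {}
switch_to_v2(feature_flags)
print(sorted(feature_flags))
```

Keeping the fallback next to the switch means the way back is written and ready before anyone needs it.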

Move traffic to V2 service

This script should instantly redirect traffic to the new version of the service.

Use your app with V2 service

Now your users are fully engaged with V2 version of your rewritten service.

Clean up Feature Flags

This is the last part. We always need to clean up FFs, but it is usually done quite some time after the job is finished, because a fallback may still be needed. If you need data integrity, you can simply write to both V1 and V2 in the ACL, which will always give you a chance to go back to the previous solution.
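A minimal Python sketch of that dual-write idea: writes go to both versions, reads follow the flag. The in-memory service class and all names are assumptions for illustration. The real V1 and V2 services store data in different shapes, which this sketch ignores.

```python
from typing import Callable, Dict, List


class InMemoryTagService:
    """Stand-in for a V1 or V2 service (hypothetical, for this sketch only)."""

    def __init__(self) -> None:
        self._tags: Dict[str, List[str]] = {}

    def insert_tags(self, path: str, tags: List[str]) -> None:
        self._tags[path] = list(tags)

    def fetch_tags(self, path: str) -> List[str]:
        return self._tags.get(path, [])


class DualWriteACL:
    """Writes to both versions, reads from whichever the flag selects."""

    def __init__(self, v1: InMemoryTagService, v2: InMemoryTagService,
                 v2_enabled: Callable[[], bool]) -> None:
        self._v1, self._v2, self._v2_enabled = v1, v2, v2_enabled

    def insert_tags(self, path: str, tags: List[str]) -> None:
        # Both stores stay in sync, so a fallback loses no data.
        self._v1.insert_tags(path, tags)
        self._v2.insert_tags(path, tags)

    def fetch_tags(self, path: str) -> List[str]:
        service = self._v2 if self._v2_enabled() else self._v1
        return service.fetch_tags(path)


flags = {"v2_enabled_fetch_tags": {}}
acl = DualWriteACL(InMemoryTagService(), InMemoryTagService(),
                   lambda: "v2_enabled_fetch_tags" in flags)
acl.insert_tags("Space1/Space2", ["Tag2", "Tag3"])
print(acl.fetch_tags("Space1/Space2"))
```

The price is a double write on every insert, so it makes sense to drop it together with the flags once the fallback is no longer needed.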

In this article we have learned how to safely migrate your services to V2/V3/etc.

Check next:

Graph DB usage comparison
