Advanced Promises in Javascript (Dataloader Pattern)

 2 years ago
source link: https://www.mikealche.com/software-development/advanced-promises-in-javascript-building-a-simple-dataloader?amp%3Butm_medium=rss&%3Butm_campaign=advanced-promises-in-javascript-building-a-simple-dataloader

Hi there! In this post I want to show a possible scenario/problem that may appear in some applications and lay down the basics of the technique used to solve it.

The first scenario

The scenario we’re talking about is one in which there’s an application whose logic is so complex that some performance inefficiencies start appearing.

The reason the inefficiencies start appearing is that different parts of the code don’t make use of the work being done by other parts. For example, the same user gets fetched twice from the database instead of being fetched once and passed around.

At this point you may well interrupt and say, “Well, that gets easily solved with a caching layer.”

That’s true! So let me lay down this other possible scenario:

A second scenario

Two different parts of the application fetch two different users separately (meaning they make two separate db calls).

Well, what’s wrong with the above scenario?…. Nothing’s wrong really!

BUT there’s a simple detail that may allow room for optimization:

If those two calls to fetch two separate users are made concurrently, then maybe we can improve the performance of our app by making a single call to the database that brings back both of the users we need.

You may object: “Wait Mike, the two parts of my application that we’re talking about are pretty different from each other; I don’t want that code to be tangled together.”

Don’t worry, we won’t tangle them together!

We will create a “smart” fetching layer that will allow us to load database records in a very simple way, but that will also try to optimize the calls as much as possible by batching them together.

Finally before we begin with the actual code, I want to expose a third possible scenario:

A third scenario

At the time of writing, GraphQL is pretty popular, and while it’s really nice to work with, it has a terrible performance problem lurking inside it: the n+1 query problem.

The n+1 problem

The n+1 problem appears in GraphQL whenever we want to fetch a list of items and “join” them together with other items.

For example, we want to fetch a list of startups and join them together with their founder (for simplicity, let’s assume all of them have a single founder).

Let’s say we’re loading 25 startups, then GraphQL will do the following:

  • Load the 25 startups (1 db call)
  • For the first startup, load its founder (1 db call)
  • For the second startup, load its founder (1 db call)
  • For the third startup, load its founder (1 db call)
  • For the 25th startup, load its founder (1 db call)

In the end, we end up with 26 calls (1 for the startup list + 25 for the founders, one per startup).

If, instead of 25, we call n the number of startups we want to list, you can see why the problem is called n+1: you always end up with n+1 db calls.
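To make the count concrete, here is a minimal sketch that simulates the naive resolution against an in-memory "database". No real GraphQL library is involved: createFakeDb, loadStartups, loadFounderById and listStartupsWithFounders are hypothetical names that only emulate what a naively written set of resolvers would end up doing.

```javascript
// Hypothetical in-memory "database" with n startups and one founder each.
// It counts every simulated roundtrip in `calls`.
const createFakeDb = (n) => {
  const startups = Array.from({ length: n }, (_, i) => ({
    id: i + 1,
    founderId: 100 + i,
  }));
  const founders = startups.map((startup) => ({
    id: startup.founderId,
    name: `Founder of startup ${startup.id}`,
  }));
  const db = {
    calls: 0,
    loadStartups: async () => {
      db.calls += 1;
      return startups;
    },
    loadFounderById: async (founderId) => {
      db.calls += 1;
      return founders.find((founder) => founder.id === founderId);
    },
  };
  return db;
};

// Naive resolution: one call for the list, then one call per startup.
const listStartupsWithFounders = async (db) => {
  const startups = await db.loadStartups();
  return Promise.all(
    startups.map(async (startup) => ({
      ...startup,
      founder: await db.loadFounderById(startup.founderId),
    }))
  );
};

(async () => {
  const db = createFakeDb(25);
  const result = await listStartupsWithFounders(db);
  console.log(`${result.length} startups, ${db.calls} DB calls`);
  // 25 startups, 26 DB calls
})();
```

Note that the 25 founder lookups are all started before any of them finishes, which is the property the rest of the post relies on.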

This is pretty bad right?

Yes, it’s horrible.

The one thing that can help us solve this problem is that the 25 requests for founders get called concurrently by GraphQL.

The fact that they get called concurrently should ring a bell.

Like we talked about in the second scenario, maybe there’s a way we can build a smart fetching layer that actually batches all the 25 independent calls into a single big call that brings us the 25 founders.

Let’s make things more concrete

I’ll try to create a simple example problem for us to work on.

Although simple, if we solve this problem we would’ve basically solved all of the three scenarios above.

Say we have a function loadUserById which takes a single id and returns a promise that resolves to a user.

//Imagine this array is actually bigger....
const users = [
  { id: 1, name: "John" },
  { id: 2, name: "Paul" },
  { id: 3, name: "Mike" },
  { id: 4, name: "Albert" }, 
];

const emulateAPIThatTakesTime = async (value) =>
  new Promise((resolve) => setTimeout(resolve, 1000, value));

const loadUserById = async (userId) => {
    console.log("Doing a roundtrip to the DB")
    return emulateAPIThatTakesTime(users.find(user => user.id === userId))
}

Our codebase is filled with calls to this function.

Maybe there’s even a part which looks like this

await Promise.all([loadUserById(1), loadUserById(2), loadUserById(3),...,loadUserById(25)]);

Which ends up doing 25 roundtrips to the DB…..

NOTE:

Before you keep on reading, the example I’m portraying is an ABSTRACTION of the problem. Of course, if you had this exact line of code in real life you’d simply replace it with a single query to the DB that brings you all the users at once.

The thing is that in real life the line of code above appears in different, more disguised ways, hidden between your business logic and the perils of the framework that you’ve chosen.

Code that behaves the same as the line above is pretty common in real codebases; the only difference is that it’s written in a way that makes it hard to tell that something wrong is actually happening (e.g. look at how Apollo would resolve nested fields from a query of many items if the resolvers are implemented naively).

I hope that extensive note will save me from some “know it all” comments on Twitter 😂, let’s recap:

We have many concurrent calls to the loadUserById function and we want to do something smarter, like batching the calls to the DB, but without actually changing our code too much.

The idea is to keep our code structure with many separate calls (Since we don’t know exactly what we want yet, I’ll name the function “something”):

await Promise.all([something(1), something(2), something(3),...,something(25)]);

But do a single trip to the DB.

Before you continue reading, try to think how you’d achieve that.

Ready?

Let’s start with the basics

We know we want to batch the calls. So there’s no escaping building a function that returns a bunch of users in a single DB call.

const users = [
  { id: 1, name: "John" },
  { id: 2, name: "Paul" },
  { id: 3, name: "Mike" },
  { id: 4, name: "Albert" },
];

const emulateAPIThatTakesTime = async (value) =>
  new Promise((resolve) => setTimeout(resolve, 1000, value));

const loadUsersByIds = async (userIds) => {
  const filteredUsers = userIds.map((id) =>
    users.find((user) => user.id === id)
  );
  console.log("Doing a roundtrip to the DB");
  return emulateAPIThatTakesTime(filteredUsers);
};

The loadUsersByIds function takes an array of user IDs and returns a promise that resolves to a bunch of users.
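As a quick check of the contract this function offers, the sketch below calls it with a few IDs. It repeats the definitions above so that it can run on its own (with a shorter, arbitrary delay of 10 ms instead of one second). It illustrates two properties that matter later on: results come back in the same order as the requested IDs, and an ID with no matching user yields undefined in its position.

```javascript
const users = [
  { id: 1, name: "John" },
  { id: 2, name: "Paul" },
  { id: 3, name: "Mike" },
  { id: 4, name: "Albert" },
];

// Same idea as above, with an arbitrary 10 ms delay to keep the demo fast.
const emulateAPIThatTakesTime = (value) =>
  new Promise((resolve) => setTimeout(resolve, 10, value));

const loadUsersByIds = async (userIds) => {
  // One entry per requested ID, in the order the IDs were given.
  const filteredUsers = userIds.map((id) =>
    users.find((user) => user.id === id)
  );
  console.log("Doing a roundtrip to the DB");
  return emulateAPIThatTakesTime(filteredUsers);
};

(async () => {
  // ID 99 does not exist, so its slot in the result is undefined.
  const result = await loadUsersByIds([3, 1, 99]);
  console.log(result);
  // [ { id: 3, name: 'Mike' }, { id: 1, name: 'John' }, undefined ]
})();
```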

Let’s go back and look at the code that we need to solve:

await Promise.all([something(1), something(2), something(3),...,something(25)]);

The function called something needs to be “smart” enough to send the ID that it receives as a parameter to the loadUsersByIds function.

The problem is that something can’t actually call loadUsersByIds, because the moment it calls it, it’s game over: we’re asking for a roundtrip to the DB.

It seems like we need a way to keep track of all the IDs that are being requested by calls to something and then do them all at once.

A good way to keep track of things is with the state that Classes introduce. So let’s build a class, and let’s feed it the function that knows how to load a bunch of users.

//....
class IntelligentLoader{
   constructor(batchLoader){
       this.batchLoader = batchLoader
   }
}

const intelligentLoader = new IntelligentLoader(loadUsersByIds);

We also know that we need to create the something function which takes a single ID and resolves to a single user. A good name for that may be: “load”

Let’s create a method called load on our IntelligentLoader that will behave like that

class IntelligentLoader{
   constructor(batchLoader){
       this.batchLoader = batchLoader
   }
   load(id){
   // What do we write here?
   }
}

What we choose to put inside the body of the load method is the actual magic that we will be creating.

Let’s take little steps:

First of all, we said that we can’t directly call loadUsersByIds (or as we call it inside the class: “this.batchLoader”) because we’d already be losing by making one roundtrip to the DB for each call.

However, we do know that we need to return a promise that resolves to a single user:

class IntelligentLoader{
   constructor(batchLoader){
       this.batchLoader = batchLoader
   }

   load(id){
      return new Promise((resolve, reject) => {
          // We need to return a single user
      });
   }
}

Let’s take another small step:

We said that we wanted to keep track of all the IDs that are being requested before making the actual call, so let’s do that

class IntelligentLoader{
   constructor(batchLoader){
       this.batchLoader = batchLoader
       this.requestedIds = []; //Let's keep track of the IDs requested in an internal array

   }

   load(id){
      return new Promise((resolve, reject) => {
          this.requestedIds.push(id);
          // We need to return a single user
      });
   }
}

Now there’s no escaping: we need to call loadUsersByIds (a.k.a. this.batchLoader). However, we want to call it only once, regardless of how many times IntelligentLoader.load is called.

So let’s add a flag called this.batchLoaderHasBeenCalled as an instance property of the class.

class IntelligentLoader{
   constructor(batchLoader){
       this.batchLoader = batchLoader
       this.requestedIds = []
       this.batchLoaderHasBeenCalled = false;
   }

   load(id){
      return new Promise((resolve, reject) => {
          this.requestedIds.push(id);
          if (!this.batchLoaderHasBeenCalled) {
              this.batchRequest = this.batchLoader(this.requestedIds);
              this.batchLoaderHasBeenCalled = true;
          }
      });
   }
}

The code above doesn’t work though because we’re missing two things:

  1. Actually returning a value for each call
  2. Making sure that the this.requestedIds array gets populated with all the needed IDs before the call to this.batchLoader happens

Let’s start with point number 1, since it’s the easiest:

Actually returning a value for each call

Basically, this.batchRequest (assigned inside the if block of the above code snippet) is a promise that resolves to the bunch of users that we want.

We just need to find the single user corresponding to the id that’s been passed to the load method and then call resolve with its value.

class IntelligentLoader{
   constructor(batchLoader){
       this.batchLoader = batchLoader
       this.requestedIds = []
       this.batchLoaderHasBeenCalled = false;
   }

   load(id){
      // The executor must be async, otherwise the await below is a syntax error
      return new Promise(async (resolve, reject) => {
          this.requestedIds.push(id);
          if (!this.batchLoaderHasBeenCalled) {
              this.batchRequest = this.batchLoader(this.requestedIds);
              this.batchLoaderHasBeenCalled = true;
          }
          const allTheDBRecords = await this.batchRequest;
          const indexOfTheID = this.requestedIds.findIndex(
              (requestedId) => id === requestedId
          );
          resolve(allTheDBRecords[indexOfTheID]);
          
      });
   }
}

Easy enough! (Note: this assumes the records are returned in order)
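If the backing store does not guarantee that order (a SQL query without an ORDER BY clause, for instance, makes no promise about row order), one option is to re-align the records with the requested IDs before handing them back. The sketch below is one hedged way to do that. alignRecordsWithIds is a hypothetical helper name, and it assumes that every record carries a unique id property.

```javascript
// Hypothetical helper: returns the records in the same order as `ids`,
// with undefined in the slot of any ID that has no matching record.
const alignRecordsWithIds = (ids, records) => {
  // Index the records by id once, so each lookup is a Map access.
  const recordsById = new Map(records.map((record) => [record.id, record]));
  return ids.map((id) => recordsById.get(id));
};

// Records as an unordered store might return them for the IDs 1, 2 and 3.
const unorderedRecords = [
  { id: 3, name: "Mike" },
  { id: 1, name: "John" },
];

console.log(alignRecordsWithIds([1, 2, 3], unorderedRecords));
// [ { id: 1, name: 'John' }, undefined, { id: 3, name: 'Mike' } ]
```

A batch function could apply such a helper right before returning, so that the index-based lookup in load keeps working.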

Let’s tackle point #2 now:

Making sure that the this.requestedIds array gets populated with all the needed IDs before the call to this.batchLoader happens.

For that we will need the help of a Node.js utility: process.nextTick.

Each time the event loop makes a full trip, it’s called a tick.

process.nextTick takes a callback as an argument and calls it after the call stack has unwound but before the event loop continues.

In other words, Node.js will execute the function at the end of the current operation but before the next cycle of the event loop begins.

Let’s see some examples to better understand how it works:

(async () => {
  process.nextTick(() => {
    console.log("Hey");
  });
  console.log("There");
})();
// Logs
/**
 * There
 * Hey
 */

(async () => {
  setTimeout(() => {
    console.log("There 1");
  }, 0);
  setImmediate(() => {
    console.log("There 2");
  });
  process.nextTick(() => {
    console.log("Hey");
  });
})();
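// Logs
/**
 * Hey
 * There 1
 * There 2
 * ("Hey" always comes first; the relative order of the two "There" lines
 * is not guaranteed when this runs from the main module)
 */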

(async () => {
  let ids = [];
  const addId = (id) => ids.push(id);
  process.nextTick(() => {
    console.log("The IDs are", ids);
  });
  addId(1);
  addId(2);
  addId(3);
})();
// Logs
/**
 * The IDs are [ 1, 2, 3 ]
 */

So it seems like process.nextTick will allow us to call the this.batchLoader function immediately after all the IDs are pushed to the this.requestedIds array.

Let’s do that:

class IntelligentLoader {
  constructor(batchLoader) {
    this.batchLoader = batchLoader;
    this.requestedIds = [];
    this.batchLoaderHasBeenCalled = false;
  }

  async load(id) {
    return new Promise(async (resolve, reject) => {
      this.requestedIds.push(id);
      process.nextTick(async () => { // Wrap the call to batchLoader inside process.nextTick
        if (!this.batchLoaderHasBeenCalled) {
          this.batchRequest = this.batchLoader(this.requestedIds);
          this.batchLoaderHasBeenCalled = true;
        }
        const allTheDBRecords = await this.batchRequest;
        const indexOfTheID = this.requestedIds.findIndex(
          (requestedId) => id === requestedId
        );
        resolve(allTheDBRecords[indexOfTheID]);
      });
    });
  }
}

You can now see that we wrap the call to this.batchLoader inside a process.nextTick.

This delays calling this.batchLoader until all the different calls to IntelligentLoader.load have pushed their id into the this.requestedIds array.

Now, if many calls to IntelligentLoader.load happen concurrently, the this.requestedIds array will get populated with all the IDs before the call to this.batchLoader occurs!

Yay!! It seems like we’ve solved the problem!

Putting it all together

If we put all the pieces together, we get what is known as a super simple dataloader:

const users = [
  { id: 1, name: "John" },
  { id: 2, name: "Paul" },
  { id: 3, name: "Mike" },
  { id: 4, name: "Albert" },
];

const emulateAPIThatTakesTime = async (value) =>
  new Promise((resolve) => setTimeout(resolve, 1000, value));

const loadUsersByIds = async (userIds) => {
  const filteredUsers = userIds.map((id) =>
    users.find((user) => user.id === id)
  );
  console.log("This is a roundtrip to the DB");
  return emulateAPIThatTakesTime(filteredUsers);
};

class IntelligentLoader {
  constructor(batchLoader) {
    this.batchLoader = batchLoader;
    this.requestedIds = [];
    this.batchLoaderHasBeenCalled = false;
  }

  async load(id) {
    return new Promise(async (resolve, reject) => {
      this.requestedIds.push(id);
      process.nextTick(async () => {
        if (!this.batchLoaderHasBeenCalled) {
          this.batchRequest = this.batchLoader(this.requestedIds);
          this.batchLoaderHasBeenCalled = true;
        }
        const allTheDBRecords = await this.batchRequest;
        const indexOfTheID = this.requestedIds.findIndex(
          (requestedId) => id === requestedId
        );
        resolve(allTheDBRecords[indexOfTheID]);
      });
    });
  }
}

const intelligentLoader = new IntelligentLoader(loadUsersByIds);

(async () => {
  const users = await Promise.all([
    intelligentLoader.load(1),
    intelligentLoader.load(2),
    intelligentLoader.load(3),
  ]);
  console.log(users);
})();
/**
This is a roundtrip to the DB
[
  { id: 1, name: 'John' },
  { id: 2, name: 'Paul' },
  { id: 3, name: 'Mike' }
]
*/

As you can see in the logs, we make a single “roundtrip to the DB” although we’re fetching our users by ID separately.

In other words, although we’re making multiple calls (intelligentLoader.load(1), intelligentLoader.load(2), etc.), we manage to batch them all!
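One limitation worth noting: because batchLoaderHasBeenCalled is never reset, the class above dispatches a single batch during its lifetime, so a load call for a new ID made after that batch has been sent resolves to undefined. The following is a minimal sketch of one way to lift that restriction, not the approach of any particular library. Each batch keeps its own queue, and the queue is swapped for a fresh one when the batch is dispatched. The names ReusableLoader, queue, dispatch and batchesSent are hypothetical, and the fake loadUsersByIds uses an arbitrary 10 ms delay.

```javascript
const users = [
  { id: 1, name: "John" },
  { id: 2, name: "Paul" },
  { id: 3, name: "Mike" },
  { id: 4, name: "Albert" },
];

// Fake batch function with an arbitrary 10 ms delay.
const loadUsersByIds = async (userIds) => {
  console.log("This is a roundtrip to the DB for IDs", userIds);
  const found = userIds.map((id) => users.find((user) => user.id === id));
  return new Promise((resolve) => setTimeout(resolve, 10, found));
};

class ReusableLoader {
  constructor(batchLoader) {
    this.batchLoader = batchLoader;
    this.queue = []; // Pending { id, resolve } entries for the next batch
    this.batchesSent = 0; // Kept only to make the demo observable
  }

  load(id) {
    return new Promise((resolve) => {
      this.queue.push({ id, resolve });
      // Only the first entry of a batch schedules the dispatch.
      if (this.queue.length === 1) {
        process.nextTick(() => this.dispatch());
      }
    });
  }

  // Error handling is deliberately left out of this sketch.
  async dispatch() {
    // Take the current queue and start a fresh one for later calls.
    const batch = this.queue;
    this.queue = [];
    this.batchesSent += 1;
    const records = await this.batchLoader(batch.map((entry) => entry.id));
    // Assumes the records come back in the same order as the IDs.
    batch.forEach((entry, index) => entry.resolve(records[index]));
  }
}

(async () => {
  const loader = new ReusableLoader(loadUsersByIds);
  const firstBatch = await Promise.all([loader.load(1), loader.load(2)]);
  const secondBatch = await Promise.all([loader.load(3), loader.load(4)]);
  console.log(firstBatch, secondBatch);
  console.log("Batches sent:", loader.batchesSent); // Batches sent: 2
})();
```

Since every entry stores its own resolve function, there is no need to search the ID list by index afterwards: position in the batch is enough.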

Conclusion

In this post we explored some advanced usage of promises. We saw how we can batch many different calls into a single one.

We also scratched the surface of the process.nextTick utility that Node.js gives us.

And finally we understood a bit of how dataloaders work.

Now, there are some caveats. We haven’t dealt with the possibility of things failing, or of executing in an environment other than Node (e.g. the browser). However, I think we’ve managed to cover the most interesting part.
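As a rough illustration of how those two caveats could be approached (a sketch under assumptions, not how any specific library does it): the version below forwards a failed batch to every waiting caller through reject, and it schedules the batch with queueMicrotask, which is available in browsers and in recent Node.js versions, instead of process.nextTick. createBatchedLoader and failingBatchLoader are hypothetical names.

```javascript
// Hypothetical factory: returns a load(id) function backed by batchLoader.
const createBatchedLoader = (batchLoader) => {
  let queue = []; // Pending { id, resolve, reject } entries

  const dispatch = async () => {
    // Take the current queue and start a fresh one for later calls.
    const batch = queue;
    queue = [];
    try {
      const records = await batchLoader(batch.map((entry) => entry.id));
      // Assumes the records come back in the same order as the IDs.
      batch.forEach((entry, index) => entry.resolve(records[index]));
    } catch (error) {
      // A failed roundtrip rejects every promise in the batch.
      batch.forEach((entry) => entry.reject(error));
    }
  };

  return (id) =>
    new Promise((resolve, reject) => {
      queue.push({ id, resolve, reject });
      // queueMicrotask runs dispatch once the current synchronous code ends.
      if (queue.length === 1) queueMicrotask(dispatch);
    });
};

// Hypothetical batch function that always fails, to exercise the error path.
const failingBatchLoader = async () => {
  throw new Error("DB is down");
};

(async () => {
  const load = createBatchedLoader(failingBatchLoader);
  const results = await Promise.allSettled([load(1), load(2)]);
  console.log(results.map((result) => result.status));
  // [ 'rejected', 'rejected' ]
})();
```

One design consequence of this choice: calls made in the same synchronous stretch of code end up in one batch, while calls made later start a new one.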

If you want to see the real-world implementation of what we just did, go check out the repo

What do you think?

Did you like this post? Leave me a comment, send me an email, or message me on Twitter! 😀
