
Asynchronous Programming in PHP

source link: https://devm.io/php/php-asynchronous-programming

The first building blocks


When starting this article I wanted to write about quite a lot of things and quite a lot of concepts. However, trying to explain just the fundamental building blocks of asynchronous programming, I quickly hit my character limit and was faced with a choice: go into the details of the A’s and B’s, or give an eagle’s eye view of what is out there in the async world. I chose the former.

We will cover a very basic, naive and simplistic take on what asynchronous programming is like. However, I do believe that the example we explore will give the reader a good enough picture of the building blocks of a powerful and complex technique.

Enjoy!

A service for fetching news

Imagine we work in a startup! The startup wants to build a really cool new service where users type a topic into a search field and get a bunch of news collected from the best online news sites out there. We are the back-end engineering team, and we are tasked with building the core of this fantastic new product – the news aggregator. Luckily for us, all of the online news agencies we will be querying provide nice APIs. For each requested topic, all we need to do is call each of the APIs, collect and format the data so our front-end can read it, and send it to the client. The front-end team takes care of displaying it to the user. As with any startup, hitting the market super fast is of crucial importance, so we create the simplest possible script and release our new product. Below is the script of our engine.

<?php
$europe_news = file_get_contents('https://europe-news.example/api?topic=' . urlencode($_GET['topic']));
$asia_news = file_get_contents('https://asia-news.example/api?topic=' . urlencode($_GET['topic']));
$africa_news = file_get_contents('https://africa-news.example/api?topic=' . urlencode($_GET['topic']));

$formatted = [
    'europe_news' => format_europe($europe_news),
    'asia_news' => format_asia($asia_news),
    'africa_news' => format_africa($africa_news)
];

echo json_encode($formatted);

This is as simple as it gets! We give a big “Thank you” to the creators of PHP for making the wonderful file_get_contents() function which drives our API communications and we launch our first version.

Our product proves to be useful and the number of clients using it increases from day to day. As our business expands, so does the demand for news from the Americas and from other regions. Our engine is easy to extend, so we add the respective news services in a matter of minutes. However, with each additional news service, our aggregator gets slower and slower.

A couple of months later our first competitor appears on the market. They provide the exact same product, only it’s blazingly fast. We now have to quickly come up with a way to drastically improve our response time. We try upgrading our servers, scaling horizontally with more machines, paying for a faster Internet connection, but still we don’t get even close to the incredible performance of our competitor. We are in trouble and we need to figure out what to do!

The Synchronous nature of PHP

Most of you have probably already noticed what is going on in our “engine” and why adding more news sites makes things slower and slower. Whenever we make a call to a news service in our script, we wait for the call to complete before we make the next call. The more services we add, the more we have to wait. This is because the built-in tools that PHP provides us with are in their nature designed for a synchronous programming flow. This means that operations are done in a strict order and each operation we start must first end before the next one starts. This makes the programming experience nice, as it is really easy to follow and to reason about the flow. Also, most of the time a synchronous flow fits perfectly with our goals. However, in this particular example, the synchronous flow of our program is what in fact slows it down. Downloading data from external services is a slow operation and we have a bunch of downloads. However, nothing in our program requires the downloads to be done sequentially. If we could do the downloads concurrently, this would drastically improve the overall speed of our service.
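To make the cost concrete, here is a toy sketch (not our real engine: the service names and the 100 ms latency are made up) in which each fake service simply sleeps, the way a blocking call waits on the network:

```php
<?php
// Hypothetical stand-in for a slow news API: sleeping here plays the role of
// waiting on the network, just like a blocking file_get_contents() call would.
function fetch_service(string $name): string
{
    usleep(100000); // pretend the round trip takes 100 ms
    return "news from $name";
}

$start = microtime(true);
$responses = [];
foreach (['europe', 'asia', 'africa'] as $service) {
    $responses[$service] = fetch_service($service); // each call blocks the next
}
$elapsedMs = (microtime(true) - $start) * 1000;

// Three sequential 100 ms waits add up to roughly 300 ms in total,
// and every extra service adds its own full wait on top.
printf("%d responses in %.0f ms\n", count($responses), $elapsedMs);
```

Doubling the number of services doubles the total time, which is exactly the scaling problem our aggregator is running into.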

A little bit about I/O operations

Before we continue, let’s talk a little about what happens when we perform any input/output operation. Whether we are working with a local file, talking to a device in our computer, or communicating over a network, the flow is pretty much the same. It goes something like this.

When sending/writing data…

  1. There is some sort of memory which acts as an output buffer. It may be allocated in RAM or it may be memory on the device we are talking to. In any case, this output buffer is limited in size.
  2. We write some of the data we want to send to the output buffer.
  3. We wait for the data in the output buffer to get sent/written to the device with which we are communicating.
  4. Once this is done, we check if there is more data to send/write. If there is, we go back to step 2. If not, we return to whatever we were doing immediately before we requested the output operation.
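The send loop above can be sketched in a few lines of PHP. This is a generic illustration, not code from our engine: write_all() is a hypothetical helper, and it runs against an in-memory php://temp stream instead of a real device, so it needs no network.

```php
<?php
// Sketch of the blocking send loop: fwrite() may accept only part of the data
// when the output buffer is full, so we keep writing the remainder until done.
function write_all($fp, string $data): int
{
    $total = 0;
    while ($total < strlen($data)) {                   // more data to send?
        $written = fwrite($fp, substr($data, $total)); // write a chunk to the buffer
        if ($written === false) {
            throw new RuntimeException('write failed');
        }
        $total += $written;                            // this chunk has been sent
    }
    return $total;
}

$fp = fopen('php://temp', 'w+');
$sent = write_all($fp, str_repeat('x', 10000));
echo $sent, " bytes written\n";
fclose($fp);
```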

When we receive data, a similar process occurs:

  1. There is an input buffer. It too is limited in size.
  2. We make a request to read some data.
  3. We wait while the data is being read and placed into the input buffer.
  4. Once a chunk of data is available, we append its contents to our own memory (probably to a variable).
  5. If we expect more data to be received, we go back to step 3. Otherwise we return the read data to the procedure which requested it and carry on from where we left off.
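The receive loop can be put into code the same way. Again a generic sketch with a hypothetical read_all() helper, demonstrated on an in-memory stream:

```php
<?php
// Sketch of the blocking receive loop: read chunk by chunk into our own
// variable until the stream reports end-of-file.
function read_all($fp): string
{
    $content = '';
    while (!feof($fp)) {            // do we expect more data?
        $chunk = fread($fp, 2048);  // request and wait for a chunk
        if ($chunk === false) {
            break;
        }
        $content .= $chunk;         // append it to our own memory
    }
    return $content;
}

$fp = fopen('php://temp', 'w+');
fwrite($fp, str_repeat('abc', 2000)); // 6000 bytes, so several 2048-byte reads
rewind($fp);
$data = read_all($fp);
echo strlen($data), " bytes read\n";
fclose($fp);
```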

Notice that in each of the flows there is a point in which we wait. The waiting point is also in a loop, so we wait multiple times, accumulating waiting time. And because output and input operations are super-slow compared to the working speed of our CPU, waiting is what the CPU ends up spending most of its time doing. Needless to say, it doesn’t matter how fast our CPU or PHP engine is when all they’re doing is waiting for other slow things to finish.

Lucky for us, there is something we can do.

The above processes describe what we call blocking I/O operations. We call them blocking, because when we send or receive data the flow of the rest of the program blocks until the operation is finished. However, we are not in fact required to wait for the finish. When we write to the buffer we can just write some data and instead of waiting for it to be sent, we can just do something else and come back to write some more data later. Similarly, when we read from an input buffer, we can just get whatever data there is in it and continue doing something else. At a later point we can revisit the input buffer and get some more data if there is any available. I/O operations which allow us to do that are called non-blocking. If we start using non-blocking instead of blocking operations we can achieve the concurrency we are after.
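The difference between the two modes is easy to observe with a local socket pair. This sketch assumes a POSIX system (it uses STREAM_PF_UNIX); no network is involved:

```php
<?php
// Create two connected sockets and switch the reading end to non-blocking.
[$writer, $reader] = stream_socket_pair(
    STREAM_PF_UNIX, STREAM_SOCK_STREAM, STREAM_IPPROTO_IP
);
stream_set_blocking($reader, false);

// The input buffer is empty. A blocking read would wait here; a non-blocking
// one returns immediately with nothing (false).
$empty = fgets($reader, 2048);
var_dump($empty); // bool(false)

fwrite($writer, "hello\n");
usleep(10000); // give the kernel a moment to deliver the bytes

// Now there is data in the input buffer, so the same call returns it at once.
$line = fgets($reader, 2048);
var_dump($line);
```

Between the two reads we are free to do anything else, which is the whole point: the waiting time is ours again.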

Concurrently downloading files

At this point it is a good idea that our team looks into the existing tools for concurrent asynchronous programming with PHP like ReactPHP and AMPHP. However, our team is imaginary and is in the lead role of a Proof-of-Concept article, so they are going to take the crooked path and try to reinvent the wheel.

Now that we know what blocking and non-blocking I/O operations are, we can actually start making progress. Currently, when we fetch data from the news services, our flow looks like the following:

  • Get all the data from service 1
  • Get all the data from service 2
  • Get all the data from service 3
  • Get all the data from service n

Instead, the flow we want to have would look something like the following:

  • Get a little bit of data from service 1
  • Get a little bit of data from service 2
  • Get a little bit of data from service 3
  • Get a little bit of data from service n
  • Get a little bit of data from service 1
  • Get a little bit of data from service 3
  • Get a little bit of data from service 2
  • We have collected all the data
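The interleaved flow above can be sketched as a round-robin loop over a pool of non-blocking streams. This is our own illustrative helper (fetch_all() is hypothetical, and the demo uses in-memory streams; with real sockets the same loop would interleave the downloads):

```php
<?php
// Read a little from each stream in turn, dropping a stream from the pool
// once it has been fully consumed, until every stream is done.
function fetch_all(array $streams): array
{
    $results = array_fill_keys(array_keys($streams), '');
    foreach ($streams as $fp) {
        stream_set_blocking($fp, false);
    }
    while ($streams) {
        foreach ($streams as $key => $fp) {
            $chunk = fread($fp, 2048);   // a little bit of data from this service
            if ($chunk !== false) {
                $results[$key] .= $chunk;
            }
            if (feof($fp)) {             // this service has sent everything
                fclose($fp);
                unset($streams[$key]);
            }
        }
    }
    return $results;
}

$one = fopen('php://temp', 'w+'); fwrite($one, 'europe news'); rewind($one);
$two = fopen('php://temp', 'w+'); fwrite($two, 'asia news');   rewind($two);
$all = fetch_all(['service1' => $one, 'service2' => $two]);
print_r($all);
```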

In order to achieve this, we first need to get rid of file_get_contents().

Reimplementing file_get_contents()

The file_get_contents() function is a blocking one, so we need to replace it with a non-blocking version. We will start by re-implementing its current behavior and then gradually refactor towards our goal.

Below is our drop-in replacement for file_get_contents().

function fetchUrl(string $url) {
    $parts = parse_url($url);
    $host = $parts['host'];
    $path = $parts['path'] ?? '/';
    $fp = @stream_socket_client("tcp://$host:80", $errno, $errstr, 30);
    if (!$fp) {
        throw new Exception($errstr);
    }
    stream_set_blocking($fp, false);
    // The Host header carries the host name, and "Connection: close" tells the
    // server to close the connection when done, so feof() can detect the end.
    fwrite($fp, "GET $path HTTP/1.1\r\nHost: $host\r\nAccept: */*\r\nConnection: close\r\n\r\n");

    $content = '';
    while (!feof($fp)) {
        $bytes = fgets($fp, 2048);
        $content .= $bytes;
    }
    return $content;
}

Let’s break down what is happening:

  1. We open a TCP socket to the server we want to contact.
  2. We throw an exception if there is an error.
  3. We set the socket stream to non-blocking mode.
  4. We write an HTTP request to the socket.
  5. We define a variable $content in which to store the response.
  6. We read data from the socket and append it to the response received so far.
  7. We repeat step 6 until we reach the end of the stream.

Note the stream_set_blocking() call we make. This sets the stream to non-blocking mode. We feel the effect of this when we later call fgets(). The second parameter we pass to fgets() is the number of bytes we want to read from the input buffer (in our case – 2048). If the stream mode is blocking, then fgets() will block until it can give us 2048 bytes or until the stream is over. In a non-blocking mode, fgets() will return whatever is in the buffer (but no more than 2048 bytes) and will not wait if this is less than 2048 bytes.

Although we are now using non-blocking reads, this function still behaves like the original file_get_contents(): the while loop keeps spinning until the entire response has arrived, so the call as a whole still blocks. The difference is that the waiting now happens in our own code, which is exactly the place where we can start interleaving work from multiple downloads.


