
Understanding Java’s Project Loom

 1 year ago
source link: https://www.marcobehler.com/guides/java-project-loom

You can use this guide to understand what Java's Project Loom is all about and how its virtual threads (also called 'fibers') work under the hood.

Project Loom’s Virtual Threads

All of them showed how virtual threads (or fibers) can essentially scale to hundreds of thousands or millions, whereas good, old, OS-backed Java threads could only scale to a couple of thousand (TBD: check OS-thread hypothesis in real-world scenarios).

  1. The example the blog posts used, letting 100.000 virtual threads sleep.
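A minimal sketch of that kind of example, assuming JDK 21 or newer (where virtual threads are no longer a preview feature). The class and method names are made up for illustration; this is not the code from any of those blog posts.

```java
import java.time.Duration;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical demo class, assuming JDK 21+.
final class SleepingVirtualThreads {

    /** Starts {@code count} virtual threads that each sleep once; returns how many finished. */
    static int sleepAll(int count, Duration nap) {
        AtomicInteger finished = new AtomicInteger();
        // One new virtual thread per submitted task; close() waits for all of them.
        try (ExecutorService executor = Executors.newVirtualThreadPerTaskExecutor()) {
            for (int i = 0; i < count; i++) {
                executor.submit(() -> {
                    Thread.sleep(nap); // blocking-style call inside a virtual thread
                    return finished.incrementAndGet();
                });
            }
        }
        return finished.get();
    }

    public static void main(String[] args) {
        int done = sleepAll(100_000, Duration.ofSeconds(1));
        System.out.println(done + " virtual threads slept and woke up");
    }
}
```

Swapping the executor for a fixed pool of platform threads is the usual way to compare the two; how many platform threads your machine tolerates is exactly the open question from the paragraph above.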

Hundred-thousand sleeping virtual threads, fine. But could I now just easily execute 100.000 HTTP calls in parallel, with the help of virtual threads?

Let’s find out.

Why are some Java calls blocking?

Here is the code from our getURL method above, which opens a URL and returns its contents as a String.
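The listing did not survive in this copy of the article. As a stand-in, here is a sketch of what such a method could look like, based only on the description (open a URL, return its contents as a String). The class name, the UTF-8 decoding and the error handling are assumptions.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;
import java.net.URI;
import java.nio.charset.StandardCharsets;

// Hypothetical reconstruction, not the article's original listing.
final class UrlFetcher {

    /** Opens the URL and returns its whole body as a String, blocking style. */
    static String getURL(String url) {
        try (InputStream in = URI.create(url).toURL().openStream()) {
            byte[] bytes = in.readAllBytes(); // does not return until the stream is exhausted
            return new String(bytes, StandardCharsets.UTF_8); // assumes a UTF-8 body
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        // Pass any reachable URL as the first argument.
        if (args.length > 0) {
            System.out.println(getURL(args[0]).length() + " characters read");
        }
    }
}
```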

When you open up the JavaDoc of inputStream.readAllBytes() (or are lucky enough to remember your Java 101 class), it gets hammered into you that the call is blocking, i.e. won’t return until all the bytes are read - your current thread is blocked until then.

How come I can now supposedly execute this call a million times in parallel when running inside virtual threads, but not when running inside normal threads?

Parts of the puzzle - topics you never knew you wanted to know more about after CS 101: Sockets & Syscalls.

Sockets

When you want to make an HTTP call or rather send any sort of data to another server, you (or rather the library maintainer in a layer far, far away) will open up a Socket. And accessing sockets, by default, is blocking.

However, operating systems also allow you to put sockets into non-blocking mode, in which calls return immediately when there is no data available. And then it’s your responsibility to check back again later, to find out if there is any new data to be read.
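To make "returns immediately, check back later" concrete, here is a small sketch using Java NIO over a loopback connection. The class name, the "hello" payload and the five-second safety deadline are illustrative assumptions.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.net.InetAddress;
import java.net.InetSocketAddress;
import java.nio.ByteBuffer;
import java.nio.channels.ServerSocketChannel;
import java.nio.channels.SocketChannel;
import java.nio.charset.StandardCharsets;
import java.util.concurrent.TimeUnit;

// Hypothetical demo class; talks only to itself over the loopback interface.
final class NonBlockingReadDemo {

    /** Returns {result of the first read, bytes read after data was sent}. */
    static int[] demo() {
        try (ServerSocketChannel server = ServerSocketChannel.open()) {
            server.bind(new InetSocketAddress(InetAddress.getLoopbackAddress(), 0));
            try (SocketChannel client = SocketChannel.open(server.getLocalAddress());
                 SocketChannel peer = server.accept()) {
                client.configureBlocking(false); // switch the socket to non-blocking mode
                ByteBuffer buffer = ByteBuffer.allocate(64);

                // Nothing has been sent yet: the call returns 0 right away instead of waiting.
                int first = client.read(buffer);

                peer.write(ByteBuffer.wrap("hello".getBytes(StandardCharsets.US_ASCII)));

                // Our responsibility now: keep checking back until the data shows up.
                long deadline = System.nanoTime() + TimeUnit.SECONDS.toNanos(5);
                int total = 0;
                while (total < 5 && System.nanoTime() < deadline) {
                    int n = client.read(buffer);
                    if (n < 0) break;                  // peer closed the connection
                    if (n == 0) Thread.onSpinWait();   // still nothing, try again
                    total += n;
                }
                return new int[] {first, total};
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        int[] result = demo();
        System.out.println("first read: " + result[0] + ", later reads: " + result[1] + " bytes");
    }
}
```

The polling loop is deliberately naive. Real code would ask the operating system to report when the socket becomes readable instead of spinning.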

Syscalls

When executing the getURL() call above, Java doesn’t do the network call (open up a socket, read from it, etc) itself - it asks the underlying operating system to do the call. And here’s the trick: Whenever you are using good-old Java threads, the JVM will use a blocking system call (TBD: show OS call stack.).

When run inside a virtual thread, however, the JVM will use a different, non-blocking system call to do the network request (e.g. epoll on Linux), without you, as a Java programmer, having to write non-blocking code yourself, e.g. some clunky Java NIO code.
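For a taste of that clunky NIO code, here is a sketch of a single thread waiting on many sockets at once with a Selector (which, on Linux, is typically backed by epoll). It only illustrates the hand-written style that virtual threads spare you; it is not what the JDK does internally. The class name, the "pong" payload and the loopback setup are assumptions.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.net.InetAddress;
import java.net.InetSocketAddress;
import java.nio.ByteBuffer;
import java.nio.channels.SelectionKey;
import java.nio.channels.Selector;
import java.nio.channels.ServerSocketChannel;
import java.nio.channels.SocketChannel;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;

// Hypothetical demo class; all connections stay on the loopback interface.
final class SelectorSketch {

    /** One thread waits on {@code connections} sockets at once; returns the total bytes read. */
    static int readFromAll(int connections) {
        List<SocketChannel> open = new ArrayList<>();
        try (Selector selector = Selector.open();
             ServerSocketChannel server = ServerSocketChannel.open()) {
            server.bind(new InetSocketAddress(InetAddress.getLoopbackAddress(), 0));
            for (int i = 0; i < connections; i++) {
                SocketChannel client = SocketChannel.open(server.getLocalAddress());
                SocketChannel peer = server.accept();
                open.add(client);
                open.add(peer);
                client.configureBlocking(false);
                client.register(selector, SelectionKey.OP_READ); // "tell me when readable"
                peer.write(ByteBuffer.wrap("pong".getBytes(StandardCharsets.US_ASCII)));
            }

            int expected = connections * 4;
            int total = 0;
            ByteBuffer buffer = ByteBuffer.allocate(16);
            while (total < expected) {
                // The only place this thread waits; give up after 5 seconds of silence.
                if (selector.select(5_000) == 0) break;
                Iterator<SelectionKey> ready = selector.selectedKeys().iterator();
                while (ready.hasNext()) {
                    SocketChannel channel = (SocketChannel) ready.next().channel();
                    ready.remove();
                    buffer.clear();
                    int n = channel.read(buffer); // non-blocking read on a ready socket
                    if (n > 0) total += n;
                }
            }
            return total;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        } finally {
            for (SocketChannel channel : open) {
                try { channel.close(); } catch (IOException ignored) { }
            }
        }
    }

    public static void main(String[] args) {
        System.out.println(readFromAll(100) + " bytes read by a single thread");
    }
}
```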

To cut a long story short (and ignoring a whole lot of details), the real difference between our getURL calls inside good, old threads and inside virtual threads is that one call opens up a million blocking sockets, whereas the other call opens up a million non-blocking sockets.

Now, if you tried out this (non-sensical) example in the real world(™), you’d find that depending on your operating system, and if you are sending or receiving data, you’d run into operating system socket limits - a reminder that using virtual threads is not an automagically scaling solution without you needing to know what you are doing (isn’t that always true? :) )

Filesystem calls

While we are at it. How would virtual threads behave when working with files?

With sockets it was easy, because you could just set them to non-blocking. But with file access, there is no async IO (well, except for io_uring in new kernels).

To cut a long story short, your file access call inside the virtual thread will actually be delegated to a (... drum roll ...) good-old operating system thread, to give you the illusion of non-blocking file access.
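From the caller's side nothing changes: the file read below looks just as blocking as it would on a normal thread, and whatever the JDK does underneath is not visible in the code. A small sketch, assuming JDK 21 or newer; the class and method names are hypothetical.

```java
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

// Hypothetical demo class, assuming JDK 21+.
final class FileReadInVirtualThread {

    /** Reads the file from inside a virtual thread and hands the content back to the caller. */
    static String readInVirtualThread(Path file) {
        try (ExecutorService executor = Executors.newVirtualThreadPerTaskExecutor()) {
            return executor.submit(() -> {
                if (!Thread.currentThread().isVirtual()) {
                    throw new IllegalStateException("expected to run in a virtual thread");
                }
                return Files.readString(file); // plain, blocking-style file access
            }).get();
        } catch (ExecutionException e) {
            throw new IllegalStateException(e.getCause());
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        // Pass the path of any readable text file as the first argument.
        if (args.length > 0) {
            System.out.println(readInVirtualThread(Path.of(args[0])).length() + " characters read");
        }
    }
}
```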

How do virtual threads work?

Even though good, old Java threads and virtual threads share the name... Threads, the comparisons/online discussions feel a bit apples-to-oranges to me.

It helped me think of virtual threads as tasks, that will eventually run on a real thread(™) (called carrier thread) AND that need the underlying native calls to do the heavy non-blocking lifting.

In the case of IO-work (REST calls, database calls, queue, stream calls etc.) this will absolutely yield benefits, and at the same time it illustrates why virtual threads won’t help at all with CPU-intensive work (and might even make matters worse). So, don’t get your hopes up, thinking about mining Bitcoins in hundred-thousand virtual threads.
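One way to see the "tasks on carrier threads" picture for yourself is to count how many distinct carriers a large batch of virtual threads ends up running on. The sketch below assumes JDK 21 or newer and relies on an implementation detail: the text after the '@' in a mounted virtual thread's toString() is currently its carrier's name, which is not a specified API and may change. The class name is hypothetical.

```java
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

// Hypothetical demo class, assuming JDK 21+.
final class CarrierThreads {

    /** Runs {@code tasks} virtual threads and collects the carrier names they reported. */
    static Set<String> carriersUsedBy(int tasks) {
        Set<String> carriers = ConcurrentHashMap.newKeySet();
        try (ExecutorService executor = Executors.newVirtualThreadPerTaskExecutor()) {
            for (int i = 0; i < tasks; i++) {
                executor.submit(() -> {
                    // Implementation detail, not a contract: something like
                    // "VirtualThread[#31]/runnable@ForkJoinPool-1-worker-3".
                    String description = Thread.currentThread().toString();
                    carriers.add(description.substring(description.lastIndexOf('@') + 1));
                });
            }
        }
        return carriers;
    }

    public static void main(String[] args) {
        Set<String> carriers = carriersUsedBy(10_000);
        System.out.println("10000 virtual threads ran on " + carriers.size()
                + " carrier thread(s); available processors: "
                + Runtime.getRuntime().availableProcessors());
    }
}
```

If the carrier count stays close to the number of CPU cores, that is the whole story for CPU-intensive work: no matter how many virtual threads you start, only that many can compute at the same time.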

Hype & Promises

Almost every blog post on the first page of Google surrounding JDK 19 copied the following text, describing virtual threads, verbatim.

While I do think virtual threads are a great feature, I also feel paragraphs like the above will lead to a fair amount of scale hype-train’ism. Web servers like Jetty have long been using NIO connectors, where you have just a few threads able to keep open hundreds of thousands or even a million connections.

The problem with real applications is them doing silly things, like calling databases, working with the file system, executing REST calls or talking to some sort of queue/stream.

And yes, it’s this type of I/O work where Project Loom will potentially shine. Loom gives you, the programmer or maybe even more "just" the (HTTP/database/queue) library & framework maintainers, the benefit of essentially non-blocking code, without having to resort back to the somewhat unintuitive async programming model (think of RxJava / Project Reactor ) and all the consequences that entails (troubleshooting, debugging etc).

However, forget about automagically scaling up to a million virtual threads in real-life scenarios without knowing what you are doing. There is no free lunch.

What about the Thread.sleep example?

We started this article with making threads sleep. So, how does that work?

  • When calling Thread.sleep() on a good, old, OS-backed Java thread, you will in turn generate a native call that makes the thread sleepey-sleep for a given amount of time. For 100_000 threads that is quite costly (and a non-sensical scenario anyway).

  • In the case of Thread.sleep() inside a virtual thread, you will mark the virtual thread as sleeping and create a scheduled task on a good, old Java (OS-thread-based) ScheduledThreadPoolExecutor. That task will unpark / resume your virtual thread after the given [sleep-time]. Exercise for you: apples-to-oranges, again?
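The second bullet can be modelled in a few lines. This is a conceptual sketch only, assuming JDK 21 or newer: it is not the JDK's internal code, the names are made up, and it ignores thread interruption. A scheduled task on an OS-backed thread flips a flag and unparks the sleeper, which stays parked until then.

```java
import java.time.Duration;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicBoolean;
import java.util.concurrent.locks.LockSupport;

// Hypothetical model of "sleep = schedule a wake-up, then park"; not the JDK implementation.
final class SleepSketch {

    // A single platform (OS-backed) daemon thread whose only job is firing wake-up tasks.
    private static final ScheduledExecutorService UNPARKER =
            Executors.newSingleThreadScheduledExecutor(runnable -> {
                Thread thread = new Thread(runnable, "sketch-unparker");
                thread.setDaemon(true);
                return thread;
            });

    /** Parks the calling thread until a scheduled task wakes it up again. */
    static void sleepSketch(Duration duration) {
        Thread sleeper = Thread.currentThread();
        AtomicBoolean wokenUp = new AtomicBoolean();
        UNPARKER.schedule(() -> {
            wokenUp.set(true);
            LockSupport.unpark(sleeper); // resume the parked thread
        }, duration.toNanos(), TimeUnit.NANOSECONDS);
        while (!wokenUp.get()) {
            LockSupport.park(); // the loop guards against spurious wake-ups
        }
    }

    /** Runs the sketch inside a virtual thread and returns the elapsed milliseconds. */
    static long elapsedMillisInVirtualThread(Duration duration) {
        long start = System.nanoTime();
        Thread thread = Thread.startVirtualThread(() -> sleepSketch(duration));
        try {
            thread.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        return TimeUnit.NANOSECONDS.toMillis(System.nanoTime() - start);
    }

    public static void main(String[] args) {
        long elapsed = elapsedMillisInVirtualThread(Duration.ofMillis(200));
        System.out.println("virtual thread 'slept' for about " + elapsed + " ms");
    }
}
```

While the virtual thread is parked, it does not occupy a carrier thread, which is why a hundred thousand of them can "sleep" at once on a handful of OS threads.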

Want to see more of these short technology deep dives? Leave a comment below.

Meanwhile, check out Load Testing: An Unorthodox Guide to find out why you should worry about other things than scale.

Acknowledgements

Thanks to Tagir Valeev, Vsevolod Tolstopyatov, and Andreas Eisele for comments/corrections/discussions.


Comments

Raghavan alias
0 points
4 days ago

A well narrated article, making the perceived-to-be-complex-topic in an easy manner. Thank you Marco! I liked the stuff you mentioned about SysCalls in the right context!

Cheers, Raghavan alias Saravanan Muthu.

Anonymous
0 points
4 days ago

I enjoy your writing style and how you have presented your journey, as always, mixed with some common sense.

aydar.kh
0 points
45 hours ago

As always, everything is clear and understandable about the complex topic!

