Friday Q&A 2009-01-16

mikeash.com: just this guy, you know?

Friday Q&A 2009-01-16

Happy Friday to everyone, and welcome back to another Friday Q&A. This week I'll be taking Eren Halici's suggestion to discuss the various ways to do interprocess communication on OS X.

IPC is an interesting and sometimes complicated topic, especially on OS X, which has a veritable zoo of IPC techniques. It can be hard to decide which one to use, and sometimes hard to even know what's available.

OS X is a funny mixture of mach and UNIX so you end up with IPC mechanisms from both:

Mach ports: The fundamental IPC mechanism in mach. Fast, light-weight, extremely capable, and difficult to use. Mach ports will not only let you talk to other processes, but do things as drastic as inject code into other people's programs. The poor state of the mach documentation makes it hard to get started and easy to make mistakes with it. On the other hand, the core mach_msg function is probably the most optimized syscall in the system, so they're really fast to use, and your machine will barely blink if you decide to allocate million mach ports at once.
- CFMachPort: A very thin wrapper around mach ports. CFMachPort essentially exists to allow a mach port to be used as a runloop source. It can also help with creating and destroying the ports. It helps a little with receiving messages and not at all with sending them.
- CFMessagePort: This nice CoreFoundation wrapper around some mach functionality makes it easy to set up synchronous back-and-forth communication between two unrelated processes. You can start a server with just a few lines of code. Another program can then look up that server by name and message it. You get the speed advantages of mach without all the messy stuff going on underneath.
- NSPort/NSMachPort/NSMessagePort: Cocoa has some mach port wrappers too. They're mainly geared toward use with Distributed Objects (more on that below) but can be used on their own as well, if you're brave.
POSIX file descriptors: There are actually several kinds of these but they can all be used with the typical read and write calls once they're set up.
- Pipes: The archetypal POSIX IPC mechanism. If you've ever used the | pipe operator in a UNIX shell, you've used a pipe. Pipes get created in pairs within the same process, so they're good for communicating between parents and children (or between two children of a single, coordinating parent) but not so good for communicating between unrelated processes. Make them with the pipe call.
- FIFOs: It's like a file, but it's like a pipe! A FIFO gets an entry in your filesystem, just like a file, but writes don't go to the filesystem, instead they go to whatever process has opened the fifo for reading. You can make these with the mkfifo call. The end result is a pipe that has a filesystem entry, which can make it easy for two unrelated processes to hook up. The processes don't even have to know that they're talking to a fifo. Try it out in your shell:
```
    $ mkfifo /tmp/fifo
    $ cat /tmp/fifo
    
    # in another shell
    cat > /tmp/fifo
    type some junk here
```
- Sockets: You probably know these from working with TCP/IP, but they can also be used to communicate locally, and not just by connecting to localhost. If you create a socket in the AF_UNIX family you get a socket that's only for local communication and uses more flexible addressing than TCP/IP allows. AF_UNIX sockets can be created using a filesystem path much like a FIFO by using the socket and bind calls, but allowing multiple clients and more options for how the communication works. They can also be created anonymously using the socketpair call, giving you something much like a pipe, except bidirectional.
Shared memory: Shared memory is a magical piece of memory which appears in multiple processes at once. In other words, you write to it from process A, and read from it in process B, or vice versa. This tends to be very fast, as the data itself never touches the kernel and doesn't have to be copied around. The downside is that it's really difficult to coordinate changes to the shared memory area. You essentially get all of the disadvantages of threaded programming and most of the disadvantages of multi-process programming bundled together in one neat package. Shared memory can be created using either mach or POSIX APIs.
Miscellaneous, not really IPC: There are some techniques which don't really count as "IPC" but can be used to communicate between programs if you want to.
- ptrace: This system call exists mainly for writing debuggers, but could in theory be used to do non-debugger things too. Not recommended, included only for completeness.
- Files: Sometimes it can be useful to communicate using plain old files. This can be as simple as creating a lock file (a plain empty file that works simply by being there) for mutual exclusion, or you can transfer actual data around by writing it to a file, then having the other program read it. This tends to be inefficient since you're actually writing to the filesystem, but it's also easy and nearly universal; every application can read files!

Those are all what I would call system-level functionality, things which are either provided directly by the kernel/libSystem, or which are thin wrappers around them. OS X also provides a bunch of higher-level IPC mechanisms at the framework level:

Apple Events: Scourge of the Skies, Champion of the Ugly Contest, King Slow, Emperor Horrible. Apple Events are all of these things, but they're also tremendously useful. They're the only IPC mechanism which is universally supported by GUI applications on Mac OS X for remote control. Want to tell another application to open a file? Time for Apple Events. Want to tell another application to quit gracefully? Apple Events time. Underneath it all, Apple Events are built on mach ports but this is mostly not exposed in the API.
- AppleScript: Everything Apple Events is and worse, but still often useful, AppleScript is a scripting language built on top of Apple Events. Generally it's best to avoid AppleScript and simply send the corresponding raw Apple Events instead, either directly or through a mechanism like Scripting Bridge. AppleScript support is the standard way to allow users to script your application, although if you ever try to add AppleScript support to your application you'll find yourself wishing for a different standard.
Distributed Objects: It's like Objective-C, but it happens over there! DO gives you proxy objects that can be used (mostly) just like local objects, with the exact same syntax and everything, except that your messages fly across to the other process and get executed there. DO normally runs over mach ports but can also be used with sockets, allowing it to work between computers as well. DO is really cool technology and it's the sort of thing that tends to blow people's minds when they come to Objective-C from lesser languages such as Java or C++. Unfortunately DO is also really old and crufty and tends to be strangely unreliable. This is especially true when using it with sockets to talk to remote machines, but is even true when using it locally. DO is also completely non-modular, making it essentially impossible to swap out the IPC mechanism it uses for something custom (like if you want to encrypt the stream). It is worthy of investigation if only to learn about how it works, and despite the shortcomings can still be very useful in certain situations.
Distributed Notifications: These are simple one-way messages that essentially get broadcast out to any process in the session that's listening for them. Extremely easy to use, and available in both Cocoa and CoreFoundation flavors. (And they interoperate!) The downside is that they don't guarantee delivery and they're very resource-intensive due to potentially messaging every application on your system. They would be completely unsuitable for something like transmitting a large picture to another process, but are great for simple one-off things like "I just changed my preferences, re-read them now". Internally this is implemented by using mach ports to talk to a centralized notification server which manages the task of getting notifications to where they want to go.
Pasteboard: Probably the IPC mechanism that you've directly used the most. Every time you copy and paste something between applications, that's IPC happening! Inter-app drag and drop also uses the pasteboard, and it's possible to create custom pasteboards for passing data back and forth between applications. Like distributed notifications, pasteboards work by talking to a central pasteboard server using mach ports.

So which one is right for you? Well, it all depends on what you're doing. I've used nearly every one of these to accomplish different things over the years. You'll have to see which one fits your problem best, and I hope the above gives you a good place to get started.

That wraps things up for this week's Friday Q&A. Come back next week for another exciting installment. Did I miss your favorite IPC mechanism? Don't understand the point of one of them? Did I miss the whole point of one of them? Post your comments below.

Remember, Friday Q&A is driven by your suggestions. I'm much easier than Public Radio: I don't ask for your money, just your ideas, but your contributions make this possible. Post your ideas in the comments, or e-mail them. (Reminder: I will use your name unless you tell me not to.)

Did you enjoy this article? I'm selling whole books full of them! Volumes II and III are now out! They're available as ePub, PDF, print, and on iBooks and Kindle. Click here for more information.

Comments:

I'm curious about the unreliable behavior you described regarding DO, and why it's worthy of investigation in spite of this. Perhaps more detail on this could be a topic for a future Friday.

I think your higher-level IPC section should have included a blurb about MIG. Most of the system uses it as a high level* way to automate the mach minutia, far more than any of the other methods you listed.

* high level being a somewhat relative term

An interesting review of the possibilities. In just over a year of Cocoa coding I have found myself using all of these methods of IPC with the exception of FIFO's and shared memory. There is no getting away from IPC in modern coding.

My first port of call was DO, which looks great on paper put doesn't seem to stack up too well in practice - I would hesitate to use it in anger for large scale network communication. It doesn't win any prizes in the cross platform parade either, but then neither does the .NET remoting technology.

AppleScript et al are undeniably nasty, whether you look at them at the individual event level, at the OSA component level or at the actual script level. It is however pervasive and powerful.

MIG? might as well cover RPC's with rpcgen too :)

"Generally it's best to avoid AppleScript and simply send the corresponding raw Apple Events instead, either directly or through a mechanism like Scripting Bridge."

While I completely understand the sentiment, as a long-time application scripter I feel this is still a little unfair. It's true the language itself has many significant faults - unpredictable and ambiguous syntax being the biggest of them (even William Cook, one of the original designers, now admits this was a mistake). However, even after fifteen years the AppleScript language still provides the most elegant and reliable API for creating and sending Apple events to other applications.

(It also provides the only OSA-based language component worth a damn, although IMO that's as much the fault of the OSA API as other language developers. But I digress.)

While the format string-based AEBuild* functions are fine for performing fairly simple tasks from C/C++/ObjC, they become very verbose for more complex tasks. And while Scripting Bridge may provide a higher-level API, it heavily obfuscates the actual Apple events mechanism (far more than AppleScript does), reducing functionality, confusing and misleading users as to what's going on, and causing various incompatibilities with many scriptable applications. Even appscript, which must've been through a few thousand hours of design and testing by now, still occasionally chokes on some application commands that would work flawlessly in AppleScript.

The other thing to remember is that an awful lot of what AppleScript gets blamed for isn't actually its fault at all, but the fault of individual application developers who don't adequately design, test and/or document their Apple event APIs. In turn, some of that blame can be placed on Apple in general for not providing application developers with sufficient tools and guidance in the first place.

BTW, while I've never used DO myself, I'm disappointed to hear that it has problems too. IMO Apple should develop a new, conventional object-oriented IPC system - something simple, efficient and reliable. Perhaps something along the lines of the traditional browser DOM, which is what most developers and users seem to expect and prefer anyway. Apple event IPC - which they've never gotten to work completely right and most likely never will - could then be quietly moved to legacy status and eventually retired completely, thereby curing all its ills. Then again, if they can't get DO right either... :/

Another nice thing about Mach ports is that you can send a memory buffer as part of a message (OOL data). The buffer is remapped into the target process with copy-on-write semantics, so basically you can send over an arbitrarily large chunk of data between processes with virtually no overhead. The Mach APIs will even deal with deallocating the memory for you if you tell it. Amit Singh's book has details.

I'd also be curious to hear what problems you've had with DO.

Wow, I had to write some code on windows and it's great writing a CreateRemoteThread. After writing that code i moved over to Mac. I never looked into the mach API's but now i will give them a lookover

Erik: Thanks for the suggestion. To go over it quickly here, DO tends to be unreliable under really heavy use, with complicated objects, or when used remotely. It also tends to deal unreliably with timeouts. It's still worth investigation because if your needs are simple, those problems mostly go away.

arwyn: Yep, MIG (stands for Mach Interface Generator for those who don't know and want to find out more) is definitely worth checking out if you want to look more into the mach side of things.

has: I strongly disagree and think it's completely fair. AppleScript is so bad that I went as far as writing my own Apple Event builder library (AEVTBuilder) to avoid using it in one application. I don't know how you're getting anything to work flawlessly in AppleScript but whatever secret you have, I'd like to know what it is. My experience with AppleScript is that it's a frustrating exercise in trial and error for even the most trivial of tasks. It's useful for banging out quickie scripts to get some apps to do what you want as a user. As a developer, my opinion is that AppleScript the language is 100% useless and should simply be avoided. AppleScript support in your app can still be worthwhile but it hurts pretty hard.

Pau Garcia i Quiles: The reason I didn't mention DBUS is perfectly answered by the beginning of your second question: "It's not included with Mac OS X". To be frank, IPC mechanisms which don't ship with the OS and which require installing daemons do not interest me.

I see the asshole crowd is out now. Thanks very much, 74.79.232.49 for that delightful security test, but I'm afraid I'm going to have to delete both the attempted script injection and the very long string of nonsense now.

mikeash: "I strongly disagree and think it's completely fair. AppleScript is so bad that I went as far as writing my own Apple Event builder library (AEVTBuilder) to avoid using it in one application."

And I wrote appscript so that I could control AppleScriptable applications from Python (and later Ruby and ObjC). More precisely, I spent the first year trying to make appscript work better than AppleScript... and then I then spent the next four years just trying to make the damn thing work as well as AppleScript.

Along the way, I learnt a valuable lesson: you cannot out-AppleScript AppleScript, and thinking you can just gets your butt kicked. It's pretty much impossible to create a high-level Apple event bridge that works better than AppleScript does - even if you get the basics right (which early versions of appscript didn't, and most others never have), you still have a ton of problems to deal with: almost every application available today is developed and [hopefully] tested against AppleScript, with the result that many of AppleScript's own unique quirks are relied on by those applications. For a high-level bridge to work really well, it not only needs to speak Apple events fluently, it also needs to replicate AppleScript's own particular way of speaking them.

"I don't know how you're getting anything to work flawlessly in AppleScript but whatever secret you have, I'd like to know what it is."

No secret, alas (otherwise I'd patent it and be rich). Just lots of hard work. Although it's still less work than is needed to script applications equally well from Python/Ruby/ObjC, since at least with AppleScript you have a very well established community and lots of existing example scripts and documentation to help you. (And believe me, it took me several years and much poking from the likes of Matt Neuburg to suck it up and admit this.)

"My experience with AppleScript is that it's a frustrating exercise in trial and error for even the most trivial of tasks."

Oh, I quite agree. AppleScript syntax causes real problems: in order to provide high-level readability (i.e. anyone can look at existing AppleScript code and get a fair idea of what it does) it sacrifices low-level semantics (i.e. it's a real bear to understand how it does it). e.g. I recently discussed this at http://macscripter.net/viewtopic.php?pid=108997

However, the majority of problems in figuring out how to get any particular scriptable application to do what you want it to do are due to the schlonky implementations and crappy documentation provided by individual application developers. And these deficiencies affect all application scripters equally; so regardless of what language you use, you'll find it "a frustrating exercise in trial and error".

...

Like I say, I think Apple should replace the whole lot with a conventional, dumb-as-rocks object-oriented IPC system, as that's what most users - and most Apple engineers - seem to want anyway, but realistically I don't expect them to do so any time soon as that would be an outright admission that they screwed up the first time around.

In the meantime, sure, blame AppleScript for mistakes it is responsible for (e.g. syntax soup). However, blaming it for everybody else's screw-ups as well - crappy application scripting interfaces and documentation; faulty Apple event bridges, etc. - only serves to let the real culprits off the hook. And that ultimately doesn't serve anyone's interests.

"DO is really cool technology and it's the sort of thing that tends to blow people's minds when they come to Objective-C from lesser languages such as Java or C++."

I've never used DO, but your description of it sounds very much like Java's Remote Method Invocation or Jini. Except that Jini *does* allow you to specify the IPC mechanism to allow for encryption or whatever.

It's worth noting that anything dealing with Mach under the covers will invariably have to deal with Mach bootstraps. A naive use of CFMessagePort is okay in most cases, but on Leopard, each user has his own Mach bootstrap, and each console session (be it SSH or the GUI console) has its own Mach bootstrap underneath there. Therefore, you cannot expect, having registered a DO server in the GUI (aka Aqua) bootstrap, that a client in the SSH (aka StandardIO) bootstrap will be able to look up the name.

I'd highly recommend checking out the Daemonomicon for further details.

http://developer.apple.com/technotes/tn2005/tn2083.html

Also, if you're going to use Mach, don't bother using the mach_* APIs. Learn to love MiG. Even XNU exports its interfaces via MiG, and there are very few clients of the mach_* APIs. MiG has evolved over years and years to encompass various corner cases in its generated code that would be very difficult for just about anyone to deal with their first time through. And when using MiG, keep in mind that, to do really interesting stuff, you're going to have to delve into the implementation at some point. MiG doesn't do a very good job of maintaining a wall of separation between interface and implementation, so don't feel too dirty about it.

More on MiG...

ftp://ftp.cs.cmu.edu/project/mach/doc/unpublished/mig.ps

Also, for simple, synchronous communication over a socket, you can also check out BetterAuthorizationSample.

http://developer.apple.com/samplecode/BetterAuthorizationSample/index.html

It has a nice library for sending commands to a server over the wire as serialized CFDictionary's, making it very easy to create a rich, synchronous protocol for client/server communication. For asynchronous communication, you'd have to look elsewhere though. I had written a library to do what BetterAuthorizationSample's did using asynchronous means by way of CFRunLoop's, but I can't release it for various reasons.

It's also worth pointing out the most basic difference between Mach and Unix domain sockets. Mach is a message-based protocol, while sockets are streams. So when you get a Mach message, it's guaranteed that it's the whole message. With sockets, you have to work out things like dealing with incomplete messages, whether the kernel buffer has been filled, etc. Mach takes care of all that for you. The downside is the the loss of portability, lack of documentation and misunderstandings that come from poor terminology choices on the parts of the original Mach developers. But for local interprocess communication, Mach is (in my opinion) the better option once you have mastered it.

The deal-breaker for Mach is if you need to pass file descriptors between processes. Mac OS X does not currently support passing file descriptors via Mach messages. Nor can it pass Mach port rights over sockets. Those two worlds are separate in this respect.

I'd also like to note that IPC is particularly important on Mac OS X, especially on the daemon side of things, because it's the mechanism for expressing dependencies when writing a launchd daemon or agent.

By the way, Mike, your comment system doesn't handle umlauts very well. :)

Thanks for the comment. I'm aware that the comment system falls down with non-ASCII characters, but the python/MySQL combination has defeated my every attempt to fix the problem. Maybe I'll try again soon, in the meantime, sorry, only ASCII accepted here.

Comments here sound like xpc is the way to go for IPC

Great write-up, I am a big believer in commenting on blogs to inform the blog writers know that they’ve added something worthwhile to the world wide web!..

I can see that you are an expert at your field! I am launching a website soon, and your information will be very useful for me.. Thanks for all your help and wishing you all the success in your business.

Comments RSS feed for this page

Add your thoughts, post a comment:

Spam and off-topic posts will be deleted without notice. Culprits may be publicly humiliated at my sole discretion.

Name:Web site:The Answer to the Ultimate Question of Life, the Universe, and Everything?Comment:Formatting: <i> <b> <blockquote> <code>. URLs are automatically hyperlinked.Code syntax highlighting thanks to Pygments.

Recommend

Friday Q&A 2009-01-23

Friday Q&A 2009-01-30: Code Injection

Friday Q&A 2009-02-06: Profiling With Shark

Late Night Cocoa: NSOperationQueue Problems

Friday Q&A 2009-02-13: Operations-Based Parallelization

Friday Q&A 2009-02-20: The Good and Bad of Distributed Objects

Friday Q&A 2009-02-27: Holistic Optimization

Friday Q&A 2009-03-06: Using the Clang Static Analyzer

Friday Q&A 2009-03-13: Intro to the Objective-C Runtime

Friday Q&A 2009-03-20: Objective-C Messaging

About Joyk