

Friday Q&A 2009-07-17: Format Strings Tips and Tricks
source link: https://www.mikeash.com/pyblog/friday-qa-2009-07-17-format-strings-tips-and-tricks.html

Greetings and welcome back to Friday Q&A. This week I'm going to discuss some tips and tricks for using printf-style format strings in C, as suggested by Kevin Avila.
Introduction
Almost everyone doing C or Objective-C programming uses format strings. In C, they're used by the printf family of functions. In Cocoa, NSLog and NSString both use them. They're a powerful way to build strings, but many people only know the basics. This week I'll delve into some hidden corners so you can take full advantage of the power they offer. Note that if you don't know the basics already, this article isn't going to make a lot of sense to you, so read a good printf tutorial before continuing.
Finding the Documentation
Hopefully all my readers know this, but just in case: if you type man printf at your shell prompt, you will get a bunch of confusing stuff that does not appear relevant to C programming. That's because you're actually reading the documentation for the shell command printf, not the C function. To see documentation on the C function, you need to type man 3 printf. The Cocoa documentation also contains information on format strings, but since the only significant difference in Cocoa format strings is the addition of the %@ specifier for printing the -description of objects, I like to just use the printf documentation.
Varargs and Type Promotion
Format strings are always used with a function (or method) that takes variable arguments. This is important for several reasons.
First, the more obvious reason is that C doesn't provide any mechanism for the called function to know how many or what type of variable arguments it got. This means that your format string must exactly match the arguments you provide. Any mismatch could lead to bad output or a crash.
The less obvious reason is that C promotes types in values that get passed as variable arguments. In short, anything smaller than an int gets promoted to int, and float gets promoted to double. So when you pass in a char, you'll use a format specifier for int to print it, and likewise with passing a float and using a double specifier.
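Here is a minimal sketch of those promotions in action. The helper name format_promoted is made up for illustration, and it formats into a caller-supplied buffer with snprintf so the result is easy to inspect:

```c
#include <stdio.h>

/* Hypothetical helper: formats a char and a float using the specifiers
   for their promoted types. Returns whatever snprintf returns. */
int format_promoted(char *buf, size_t size, char c, float f)
{
    /* c is promoted to int, so %d prints its numeric value and %c its glyph.
       f is promoted to double, so %f is the matching specifier. */
    return snprintf(buf, size, "%d %c %.2f", c, c, f);
}
```

Calling format_promoted(buf, sizeof buf, 'A', 1.5f) on an ASCII system should leave "65 A 1.50" in buf.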
Types of Unknown Size
Frequently when programming in C or Cocoa you'll use a typedef whose underlying type is not guaranteed. Examples of this are size_t, socklen_t, NSInteger, and CGFloat.
For size_t it's easy: printf actually has a format specifier for size_t. Use the z modifier with one of the standard int specifiers, as in %zu.
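As a small illustration, here is a sketch that prints the size_t returned by strlen. The function name describe_length is hypothetical:

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: writes "<text> is <n> bytes long" into buf,
   printing the size_t returned by strlen with %zu. */
int describe_length(char *buf, size_t size, const char *text)
{
    size_t len = strlen(text);
    return snprintf(buf, size, "%s is %zu bytes long", text, len);
}
```

The z modifier is part of C99, so this assumes a C99-or-later C library.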
For CGFloat it's also easy: because float gets promoted to double, the same %f specifier will work with either. No need to change anything.
For socklen_t and NSInteger you need to get a little cleverer. You can't use %d because they might be bigger than an int. You can't use %ld or %lld because they might be smaller than those, and type promotion doesn't carry over. They could even be bigger than those. What you'll want to do here is explicitly cast your variable to a type you know will be large enough to hold it, and then use the specifier for that type. For example:
printf("%jd", (intmax_t)myNSInteger);
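To try this outside of Cocoa, here is a sketch in plain C. The typedef example_integer_t is a hypothetical stand-in for NSInteger or any other integer typedef whose width varies by platform:

```c
#include <stdint.h>
#include <stdio.h>

/* Hypothetical typedef standing in for a type of unknown size. */
typedef long example_integer_t;

/* Formats value by casting it to intmax_t, so that the %jd specifier
   matches the argument no matter how the typedef is defined. */
int format_integer(char *buf, size_t size, example_integer_t value)
{
    return snprintf(buf, size, "%jd", (intmax_t)value);
}
```

For unsigned types the same idea applies with a cast to uintmax_t and the %ju specifier.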
Strings of Limited Length
The %s specifier will print a C string. This is tremendously handy. However, sometimes you want to print a sequence of characters that isn't necessarily a C string. For this, you can use the . (that's a period) modifier to specify a length. For example, here is a convenient way to turn a FourCharCode into an NSString:
uint32_t valSwapped = CFSwapInt32HostToBig(fcc); // FCCs are stored backwards on Intel
NSString *str = [NSString stringWithFormat:@"%.4s", &valSwapped];
The .4 tells NSString that the string is only four characters long, which keeps it from running off the end.
Sometimes you don't know the length ahead of time. This used to happen a lot with Pascal strings, but they're getting pretty rare these days. For this, you can use * as your length, and then it will read the length as a separate argument. (Note that this separate argument must be of type int, so beware types of unknown size!)
Here's an example of that:
printf("%.*s", length, charbuffer);
And here's how you can use that to print a Pascal string, in case you ever run into one:
printf("%.*s", pstring[0], pstring + 1);
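Both cases can be sketched as small helpers. The function names here are hypothetical, and the Pascal string is assumed to be an unsigned char buffer whose first byte holds the length:

```c
#include <stdio.h>

/* Formats the first `length` bytes of a buffer that has no NUL terminator. */
int format_counted(char *out, size_t size, const char *bytes, int length)
{
    return snprintf(out, size, "%.*s", length, bytes);
}

/* Formats a Pascal string: one length byte followed by that many characters. */
int format_pascal(char *out, size_t size, const unsigned char *pstring)
{
    /* pstring[0] is an unsigned char, which is promoted to the int
       that the * length argument requires. */
    return snprintf(out, size, "%.*s", pstring[0], (const char *)(pstring + 1));
}
```

Because the precision caps how many bytes are read, neither helper looks past the end of the data it was given.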
Printing Pointers
Printing pointers is a handy thing to do but many people don't know how to do it right. You often see code like this:
printf("0x%x", pointer);
This is wrong! Not only is the output ugly (you don't get leading zeroes) but it's not guaranteed to work at all, because you're passing a pointer but specifying an int.
The correct way is easy: just use the %p specifier. You get nice hexadecimal output and the type always matches.
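A minimal sketch of %p in use follows. The helper name format_pointer is made up, and the exact text that %p produces is implementation-defined, so no particular output is promised here:

```c
#include <stdio.h>

/* Hypothetical helper: formats a pointer with %p into buf. */
int format_pointer(char *buf, size_t size, int *pointer)
{
    /* %p expects a void *, so other pointer types are cast explicitly. */
    return snprintf(buf, size, "%p", (void *)pointer);
}
```

The cast to void * is what strictly conforming code calls for, even though most platforms pass all object pointers the same way.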
Beware of NULL
This one is so commonly ignored that gcc and clang actually have a workaround just for this, but it's still interesting to know. NULL can legally just be a #define to 0, like so:
#define NULL 0
If you then try to pass NULL as a pointer argument to a vararg function like NSLog, your code is no longer conformant, because you're really passing an int! For example, this is, strictly speaking, wrong:
printf("%p", NULL);
(Note that the same goes for nil.)
This is easy to fix: if you ever need to do this sort of thing, you can just cast the NULL to a pointer type like so:
printf("%p", (void *)NULL);
Note that this problem is most commonly encountered in functions which need a NULL-terminated list of arguments, like +[NSArray arrayWithObjects:] or execl. Yes, that means all of the code out there which looks like this is, strictly speaking, wrong:
[NSArray arrayWithObjects:a, b, c, nil];
How do we get away with it? The compiler helps. As I mentioned before, gcc and clang have a workaround for this. They #define NULL to be a magic symbol which has either pointer or integer type depending on the context in which it's used, so the correct pointer value is passed into the function.
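If you'd rather not lean on the compiler, here is a sketch of a NULL-terminated variadic function written the strictly correct way. The function count_strings is hypothetical and exists only to show the sentinel being read back as a pointer:

```c
#include <stdarg.h>
#include <stddef.h>

/* Hypothetical example: counts the strings in a variable argument list
   that is terminated by a null char pointer. */
size_t count_strings(const char *first, ...)
{
    size_t count = 0;
    va_list args;
    va_start(args, first);
    /* Each argument is read back as a char *, so the terminator must
       really be passed as a pointer, not as a plain integer 0. */
    for (const char *s = first; s != NULL; s = va_arg(args, char *))
        count++;
    va_end(args);
    return count;
}
```

A strictly conforming call casts the terminator, as in count_strings("a", "b", (char *)NULL).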
Always Constant Format Strings
I see far too much code which does this:
NSLog(someString);
This works most of the time, but what if someString contains the character sequence %@, or another format specifier? Then you probably crash.
It gets worse. What if you do this with printf or similar instead, and someString comes from a source outside your control, like off the internet? Then horrible things can occur.
One of the format specifiers supported by printf (but not Cocoa) is the %n specifier. This is very different from the other specifiers, in that it actually gives you a value back instead of taking one from you. It wants an int * argument, and will write the number of characters written so far into that argument. For example:
printf("%d%n%d", a, &howmany, b);
After this executes, howmany will contain the width of the first integer being printed.
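To see %n at work, here is a small sketch. The name width_of_first is hypothetical, and the format string is a constant literal. Some hardened C libraries restrict or reject %n, so treat this as an illustration for a standard implementation rather than something to rely on everywhere:

```c
#include <stdio.h>

/* Hypothetical helper: returns how many characters the first integer
   occupied in the output, as reported by %n. */
int width_of_first(int a, int b)
{
    char buf[64];
    int howmany = 0;
    /* %n stores the count of characters produced so far into howmany. */
    snprintf(buf, sizeof buf, "%d%n%d", a, &howmany, b);
    return howmany;
}
```

With a of 12345 the first integer is five characters wide, so the function should return 5.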
If an attacker has control over the format string, then they can use the %n specifier to write an arbitrary value to a location in memory! This can then be used to take over your program. This attack is not theoretical.
In general, you should not pass anything other than a constant string as a format string. Every so often it is useful to build a format string dynamically, but before you do, think hard about whether you can accomplish your goal without it. If you do build one, take extra care to ensure that the string will always be valid.
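The usual fix is to pass the outside string as an argument to a constant "%s" format. Here is a sketch of that, with the hypothetical name format_untrusted:

```c
#include <stdio.h>

/* Hypothetical helper: copies text into buf as data. The format string
   is the constant "%s", so specifiers inside text are never interpreted. */
int format_untrusted(char *buf, size_t size, const char *text)
{
    return snprintf(buf, size, "%s", text);
}
```

The Cocoa equivalent of the same idea is NSLog(@"%@", someString).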
Random Access Arguments
Typical format string usage is straight through start to finish. The first specifier uses the first argument, the second specifier uses the second argument, etc. However, this is not mandatory! You can actually have any specifier use any argument. This is done by adding n$ to the format specifier, where n is the number of the argument to print. Arguments count from 1. For example, this prints the two arguments in reverse order:
printf("a = %2$d b = %1$d", b, a);
You can even reuse the same argument more than once. This can be handy when writing out a long string and you need to use the same variable string, for example a name, multiple times.
printf("%1$s could not be accessed, error %2$d. Try rebooting %1$s.", name, err);
Note that if you do this, you must not skip any arguments. For example, this is invalid:
printf("a = %2$d", b, a);
The reason for this is revealed in the fact that C does not tell the called function about the arguments. It has to retrieve all type information and argument counts from the format string itself. Here you're giving it incomplete information. It knows there are two arguments, but it has no idea of the type of the first argument. This means that it cannot know how to access the second argument, so the result of making this call is undefined.
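Here is a sketch of a valid use, where every specifier is numbered and every argument is referenced. The helper name format_error is made up, and the code assumes a C library that supports POSIX numbered arguments, since they are not part of standard C:

```c
#include <stdio.h>

/* Hypothetical helper: uses numbered arguments to print name twice
   and err once. Every specifier carries an argument number. */
int format_error(char *buf, size_t size, const char *name, int err)
{
    return snprintf(buf, size,
                    "%1$s could not be accessed, error %2$d. Try rebooting %1$s.",
                    name, err);
}
```

Because both argument 1 and argument 2 appear in the format, the library has the type information it needs to step through the whole argument list.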
Conclusion
That wraps up this week's Friday Q&A. There's a lot more to what format strings can do than what I discussed today. Read the man page and take a look at how you can control precision, padding, output formats, and more.
Friday Q&A will be going on hiatus for at least one week and probably two due to various things which are going to keep me busy in that time.
In the meantime, keep those suggestions coming in. The more topics I have to choose from, the better topics you'll be able to read, so send them in!
Comments:
printf("a = %$2d b = %$1d", b, a);
When in reality it should be:
printf("a = %2$d b = %1$d", b, a);
The dollar sign should be after the digit.
Cheers,
Dave
This is mainly valid as a portability concern: Some other operating system may be more free-wheeling in its headers' definition of NULL, and *then*, it's worth being careful with how you use NULL.
[NSArray arrayWithObjects:a, b, c, nil];
You are correct that using %ld will correctly print an NSInteger on all current Cocoa architectures. And two years ago, pointers were always 32-bit on all current Cocoa architectures. Four years ago, integers were always big-endian on all current Cocoa architectures. If you write your code to depend on today's assumptions, your code will break tomorrow.
Numbered argument specifiers are not part of the C standard but they are part of the POSIX standard, so unless you need your code to be portable to non-POSIX platforms you can depend on them to exist. See http://www.opengroup.org/onlinepubs/000095399/functions/printf.html
However my example which mixes numbered specifiers and non-numbered specifiers is not supported at all. It's an all-or-nothing thing.
Jean-Daniel Dupas: I don't believe you're correct that C99 defines NULL as a pointer. The C99 standard is available here: http://www.open-std.org/JTC1/SC22/wg14/www/docs/n1124.pdf
The relevant passages are this:
But I managed to find an interesting sentence in POSIX though:
3.244 Null Pointer
The value that is obtained by converting the number 0 into a pointer; for example, (void *) 0. The C language guarantees that this value does not match that of any legitimate pointer, so it is used by many functions that return pointers to indicate an error.
http://www.opengroup.org/onlinepubs/009695399/basedefs/xbd_chap03.html
Unless POSIX also defines NULL to be a "null pointer" then I'm afraid that definition isn't relevant to the question. All this definition means is that NULL is not necessarily a null pointer.
Something that's been very helpful to me when printf-debugging is using macros to print variables without ever having to mess with format strings. It turns out that 90% of the time I can just say `LOG_ID(name)` and the "name = Vincent" information is all I needed.
Details here:
http://vgable.com/blog/2008/08/05/simpler-logging-2/
Also, Dave Dribin created an excellent DDToNSString() function that can automagically convert a C-type into an NSString:
http://www.dribin.org/dave/blog/archives/2008/09/22/convert_to_nsstring/
I've been using a modified DDToNSString() in a LOG_EXPR() macro that (mostly) Just Works no matter what type it's given. Once I've worked out a few more kinks, and understand the esoteric build settings it needs, I'll write something up on it.
Well, except that a Class is a valid id. And that we use NULL all over the place (NSError **, anyone?).
However, it's not a problem for things like NSError **. The fact that NULL (and nil and Nil) can be an integer 0 is only a problem when using varargs. For explicitly typed parameters, the 0 will be converted to the null pointer.
That brings up another good Friday Q&A idea. Maybe you should cover implementing a function that uses variadic arguments, some of the pitfalls in doing so, etc. (Maybe even touch on variadic arguments in preprocessor macros.)
Also, Jean-Daniel, NULL and a null pointer are different. A null pointer is a pointer which has been assigned the value NULL. NULL itself is just 0.
Other fun Mac OS X-specific format string tidbits...
* NSString and CFString may be constructed with a format string, and you can specify "%@" to print the description of a Cocoa or CF object, respectively.
* The syslog(3) API allows you to specify "%m" to print the current errno. This does not require a corresponding argument in the argument list, so use with care.
Thanks for the article idea, I'll put it on my list.
printf("0x%08x\n", (uint32_t)ptr);
And use llx instead of x for 64-bit systems. Whenever you're printing out pointer values, 99.99% of the time you're debugging something, so you know the size of pointers on your platform. Hence, it's ok to be lazy and ditch the pointer-to-integer cast entirely.
I also strongly recommend always compiling with the -Wformat warning option (enabled with -Wall) with GCC -- it'll help you catch a lot of easy-to-miss errors often due to typos such as too many arguments, not enough arguments, mismatched format specifiers and arguments, etc.
GCC also has a nifty `format' function attribute which you can use to tag any functions you write that are wrappers around printf/scanf (such as a custom logging function), and it can then check the arguments you pass to that -- see http://gcc.gnu.org/onlinedocs/gcc/Function-Attributes.html#index-g_t_0040code_007bformat_007d-function-attribute-2291 for more info.
uint32_t valSwapped = CFSwapInt32HostToBig(fcc); // FCCs are stored backwards on Intel
NSString *str = [NSString stringWithFormat:@"%.4s", &valSwapped];
While %.Ns is a clever trick, this won't actually work in general, because OSTypes are defined to be in the MacRoman character set, whereas stringWithFormat uses the system encoding, which may be different.
Instead you need to use something like:
return [[[NSString alloc]initWithData:data encoding:NSMacOSRomanStringEncoding] autorelease];