Parse this, I dare you.

source link: http://rachelbythebay.com/w/2012/02/08/parse/

It seems I may have stumbled across another one of those problems that creates a lot of yelling and screaming. You will probably also get to hear rather bold assertions which, when challenged, yield no results.

Here is the problem. It seems easy enough.

Take a line of characters and split it into four pieces. The first two pieces are always wrapped in double quotes ("), and the last two are not. Those first two pieces may contain any printable content (see the isprint(3) man page). You can't be sure what it will be.

The only real assurance you have is that any instances of " inside the actual first two pieces will be escaped, so it will appear as \". Likewise, the escape character itself, \, will show up as \\.
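To make the escaping rule concrete, here is a minimal sketch of a scanner for one quoted piece. The name `read_quoted` is made up for illustration, and it assumes that a backslash followed by any character other than `"` or `\` simply yields that character, which the rules above do not actually specify.

```python
def read_quoted(line, pos):
    """Read one double-quoted piece starting at line[pos].

    Returns (unescaped_text, index_just_past_the_closing_quote).
    Hypothetical helper; not taken from the original post.
    """
    if pos >= len(line) or line[pos] != '"':
        raise ValueError(f"expected opening quote at offset {pos}")
    out = []
    i = pos + 1
    while i < len(line):
        ch = line[i]
        if ch == "\\":
            # A backslash escapes the next character (\" or \\).
            # Assumption: any other escaped character is kept as-is.
            if i + 1 >= len(line):
                raise ValueError("dangling backslash at end of line")
            out.append(line[i + 1])
            i += 2
        elif ch == '"':
            # An unescaped quote closes the piece.
            return "".join(out), i + 1
        else:
            out.append(ch)
            i += 1
    raise ValueError("unterminated quoted piece")
```

Walking the characters one at a time means an escaped backslash right before the closing quote (as in `"a\\"`) is handled without any lookbehind tricks.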

Oh, for what it's worth, the data won't be too long. If the whole thing goes past 1024 characters, I'd be surprised. If a line goes past 4096, I'm willing to assume it's garbage and can be ignored.

One example line might be this:

"abc def" "123 \"foo\" 456" blah blah

That should turn into four separate items:

  • abc def
  • 123 "foo" 456
  • blah
  • blah

That's it.
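For what it's worth, one possible attempt at the whole problem is a single regular expression. This is only a sketch under assumptions the problem statement leaves open: that the pieces are separated by runs of whitespace, that the two unquoted pieces contain no whitespace themselves, and that a malformed or overlong line is reported as `None`. The names `parse_line`, `unescape`, and `MAX_LINE` are invented here.

```python
import re

MAX_LINE = 4096  # lines longer than this are treated as garbage

# One quoted piece: ordinary characters, or a backslash followed by any
# character. The two alternatives never overlap, so matching stays linear.
QUOTED = r'"((?:[^"\\]|\\.)*)"'

# Assumption: whitespace separates the pieces, and the last two pieces
# contain no whitespace of their own.
LINE_RE = re.compile(rf'{QUOTED}\s+{QUOTED}\s+(\S+)\s+(\S+)')


def unescape(piece):
    """Turn \\" into " and \\\\ into \\ (any escaped character is kept)."""
    return re.sub(r'\\(.)', r'\1', piece)


def parse_line(line):
    """Split a line into its four pieces, or return None if it is malformed."""
    line = line.rstrip("\r\n")
    if len(line) > MAX_LINE:
        return None
    match = LINE_RE.fullmatch(line)
    if match is None:
        return None
    first, second, third, fourth = match.groups()
    return [unescape(first), unescape(second), third, fourth]


print(parse_line(r'"abc def" "123 \"foo\" 456" blah blah'))
# prints ['abc def', '123 "foo" 456', 'blah', 'blah']
```

Unescaping happens only after the match, so the pattern decides where each quoted piece ends and the substitution never sees the delimiting quotes.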

This particular bit of insanity was brought on by a discussion of the other things I've been doing this evening.


February 10, 2012: This post has an update.

