3

2020-08-16: I wrote the world's worst text editor (so you don't have to)

 2 years ago
source link: https://briancallahan.net/blog/20200816.html
Go to the source link to view the article. You can view the picture content, updated content and better typesetting reading experience. If the link is broken, please click the button below to view the snapshot at that time.
Dr. Brian Robert Callahan

Dr. Brian Robert Callahan

academic, developer, with an eye towards a brighter techno-social life


[prev]
[next]

2020-08-16
I wrote the world's worst text editor (so you don't have to)

Having a small game and an echo program doesn't really provide us with too much functionality. We need to have a couple of meta programs, or programs that can aid us in creating new programs. What is the minimal meta program suite? For us, it looks like we can get by with an assembler and a linker. A C compiler would be helpful. But we can't create those things without a text editor. A text editor will allow us to create any program we want. But what counts as a text editor?

According to Wikipedia, a text editor is "a type of computer program that edits plain text." So let's write the most minimal program that can edit plain text. Some text editors you may have heard of include ed, vi, emacs, and nano. We also know of editors like Visual Studio Code, Sublime Text, and Atom. Of these, all but ed are visual, or screen-oriented, text editors. Ed, besides being the standard text editor, is a line-oriented text editor.1

You can grab the finished code for this write-up here on GitHub.

What functions do we need?

I have come up with a short list of functionality that I think represents the absolute bare minimum for a program that can edit plain text:

  • Write a line of text at an arbitrary location in the text file
  • Add a newline at an arbitrary location in the text file
  • Delete a line of text at an arbitrary location in the text file
  • Shift lines down and up to deal with the addition or removal of lines
  • Save a text file to permanent storage, for later editing or further processing

Note here the emphasis is on lines, not characters. The other emphasis is on the arbitrariness of action location. A text editor should not just start at the top and continually append lines at the end but should also be able to insert lines of text in between existing lines of text or at the very top of a file.

Another important thing to note is that there is no consideration for printing the contents of the file to a terminal or other output device. We can absolutely write a text editor that does not display the text and can use another program like cat(1) to display the text instead.2 However, I think it is not only reasonable but will be better for our incredibly sparse UI to have a way of printing at least the active line to the terminal.

Setting about

First thing we need is a catchy name. I settled on planck, after the units of measurement and to signify that you can't get any smaller than this (at least, I can't; perhaps you can). At this point we should know the drill: copy over _start.s, _syscall.s, and crt.s and create a new C file called planck.c. We are only going to need five syscalls: _exit, read, write, open, and close. According to the manual page for open(2), it takes a variable number of arguments. Since we intend to potentially create brand new files, we are going to have a signature of int open(const char *path, int flags, int mode) so we can handle the creation of new files with a proper 000644 mode. The other four syscalls we will implement as declared in their manual pages. We also want to keep our strlen function.

Further constraints

There are some further contraints for us to deal with. For one, we have no malloc function, so we will stick to a statically sized buffer for our text file.3 Perhaps we should write one in a future blog post; dynamic memory allocation would be a nice feature to have. But that's for another day. For today, let's pick a fixed buffer size that is still small but reasonable. If I'm thinking in terms of planck being used as a text editor for code, I try to keep things with 80 columns of text. It's a style I like. So let's give ourselves lines of 128 characters (it'll really be 126 characters, since each will end with a newline and a NUL character for reasons that will become clear later). And let's say we can have text files of up to 1024 lines. When it comes to C code, we can always have multiple text files. We don't need everything in just one file. I think 1024 is reasonable. If not, well, it's all open source software we can simply make it larger and recompile. We will call this array linecol and make it a two-dimensional array: char linecol[1024][128];

main, part 1

Since we know how to pass argc and argv to our program and we are doing so, we can have an optional parameter: if the user specifies a file in the command invocation, we will attempt to open that file and use it. That will let us do some work on our text file, save it, and reopen it later to work further. That seems like an absolute requirement. If for some reason we can't open the file, probably because it doesn't exist, let's presume that's the name of the new file the user wants to create. If a file isn't provided on the command line, we can ask the user to provide a file name when they go to save the file. We will also need a buffer to store the file name, a buffer to store the line currently being edited, and some variables to store the current line number, letter being typed, and some loop variables. The code that sets up our variables and handles the situation where the user specifies a file on the command line looks like this.

int
main(int argc, char *argv[])
{
	char linecol[1024][128];
	char file[1024], line[128];
	char buf[5], c;
	int co = 0, fd, i, j, li = 0, save_name = 0;

	if (argc > 2) {
		dputs("usage: ", 2);
		dputs(argv[0], 2);
		dputs(" [file]\n", 2);

		_exit(1);
	}

	for (i = 0; i < 1024; i++) {
		for (j = 0; j < 128; j++)
			linecol[i][j] = '\0';
	}

	if (argc == 2) {
		for (i = 0; i < strlen(argv[1]); i++)
			file[i] = argv[1][i];
		file[i] = '\0';

		save_name = 1;

		if ((fd = open(file, 0x0000, 0)) == -1) {
			dputs("planck: error: could not open ", 2);
			dputs(file, 2);
			dputs("\n", 2);

			goto begin;
		}

		co = 0;
		li = 0;
		while (read(fd, &c, 1) > 0) {
			linecol[li][co] = c;

			if (++co > 126) {
				dputs("plank: error: line ", 2);
				dputi(li, 2);
				dputs(" is longer than 127 characters\n", 2);
			}

			if (c == '\n') {
				if (++li > 1023) {
					dputs("plank: error: ", 2);
					dputs(argv[1], 2);
					dputs(" is greater than 1024 lines\n", 2);

					_exit(1);
				}
				co = 0;
			}
		}

		if (close(fd) == -1) {
			dputs("planck: error: could not close ", 2);
			dputs(file, 2);
			dputs("\n", 2);

			_exit(1);
		}
	}

dputs

I wanted a convenience function to write strings to the terminal; dputs is that function. It follows a similar function signature to fputs but uses a file descriptor instead of a file stream.

static void
dputs(const char *s, int fd)
{

	write(fd, s, strlen(s));
}

main, part 2

Even with a sparse UI, we should give the user some information on startup. Ed gives you the number of characters in the file it opened. I'm not sure how meaningful that is for us. I think a better piece of information is the number of lines contained in the file. So let's print that, or a 0 if we are working with a new file.

begin:
	dputi(li, 1);
	dputs("\n", 1);

dputi

We also need a way to print numbers to the console. Unfortunately, it is not as simple as printing strings. But it is still not too difficult. You may remember this code from our SnakeQR game.

static void
dputi(int n, int fd)
{
	char num[5];
	int i = 0;

	do {
		num[i++] = n % 10 + '0';
	} while ((n /= 10) > 0);

	for (i--; i >= 0; i--)
		write(fd, &num[i], 1);
}

main, part 3

Now we are ready to write the main loop. Our main loop is going to look like this.

  1. Ask for the line number the user wants to work with
  2. Ask what command the user wants to execute on that line
  3. Perform the specified action on the specified line

Based on the list of functions we came up with earlier that we decided we absolutely had to implement, we have the following commands.

  • d: delete line
  • i: insert text
  • n: insert newline after specified line
  • p: print line
  • q: quit (line ignored)
  • s: save (line ignored)

So with this, let's write the beginning of the main loop.

get_command:
	dputs("line: ", 1);

	if ((li = dgeti(buf, sizeof(buf), 0)) > 1024) {
		dputs("?\n", 1);
		goto get_command;
	}

	dputs("command: ", 1);

	(void) dgets(buf, sizeof(buf), 0);

dgeti and dgets

It is not enough for us to print strings and numbers, we also need to be able to read them in. So we will need two more functions to accomplish that.

Let's start with dgets, which will let us read in a string. We will read in size-1 bytes from the file descriptor we specify into the buffer we specify. That way, if we use static buffers and sizeof(buffer) for the size, we will always be safe and never overrun our buffers. Even better, this way we will always be able to return valid strings, since we will NUL-terminate at the end of the read loop and be guaranteed that the NUL byte is within the buffer. This is, unsurprisingly, what the OpenBSD functions strlcat(3) and strlcpy(3) do. I've done my best to ensure this is a safe function, though I know that someone will come along and use it unsafely. That's the funny thing about security.

static int
dgets(char *s, int size, int fd)
{
	int i;

	for (i = 0; i < size - 1; i++) {
		if (read(fd, &s[i], 1) < 1)
			return 0;

		if (s[i] == '\n')
			break;
	}
	s[i] = '\0';

	return strlen(s);
}

Getting numbers also requires careful consideration; dgeti is the resulting function. We read in digits into the buffer, ensuring that they are digits, then do some basic arithmetic and return the result of our arithmetic. If we did not input all digits, we will return a number intentionally greater than 1024, which will signal to our main loop that an invalid line number was specified.

static int
dgeti(char *s, int size, int fd)
{
	int i, n = 0;

	for (i = 0; i < size - 1; i++) {
		if (read(fd, &s[i], 1) < 1)
			return 0;

		if (s[i] == '\n')
			break;

		if (s[i] < '0' || s[i] > '9')
			n = 1;
	}
	s[i] = '\0';

	if (s[0] == '\0')
		return 0;

	if (n == 1)
		return 1025;

	n = 0;
	for (i = 0; i < strlen(s); i++)
		n = (n * 10) + (s[i] - '0');

	return n;
}

main, part 4

Now that we have the line number we want to work on and the command we want to execute, the rest of our text editor is simply one big switch statement that contains the logic for each command.

	switch (buf[0]) {
	case 'd':
		for (i = 0; i < 128; i++)
			linecol[li - 1][i] = '\0';
		for (i = li; i < 1024; i++) {
			for (j = 0; j < 128; j++)
				linecol[i - 1][j] = linecol[i][j];
		}
		for (i = 0; i < 128; i++)
			linecol[1023][i] = '\0';
		break;
	case 'i':
		if (li == 0) {
			dputs("?\n", 1);
			break;
		}
		--li;
		if (linecol[li][0] != '\0' && linecol[li][0] != '\n') {
			dputs(linecol[li], 1);
			dputs("\n", 1);
		}
		if (dgets(line, sizeof(line) - 1, 0) == 0)
			break;
		for (i = 0; i < 128; i++)
			linecol[li][i] = '\0';
		for (i = 0; i < strlen(line); i++)
			linecol[li][i] = line[i];
		linecol[li][i] = '\n';
		for (i = 0; linecol[i][0] != '\0'; i++)
			;
		if (i < li) {
			for (j = 0; j < 128; j++)
				linecol[i][j] = linecol[li][j];
			for (j = 0; j < 128; j++)
				linecol[li][j] = '\0';
		}
		break;
	case 'n':
		if (linecol[1023][0] != '\0') {
			dputs("planck: error: cannot add line, already at limit\n", 2);
			break;
		}
		for (i = 1022; i > li - 1; i--) {
			if (linecol[i][0] != '\0') {
				for (j = 0; j < 128; j++)
					linecol[i + 1][j] = linecol[i][j];
				for (j = 0; j < 128; j++)
					linecol[i][j] = '\0';
			}
		}
		linecol[li][0] = '\n';
		break;
	case 'p':
		if (li == 0) {
			for (i = 0; linecol[i][0] != '\0' ; i++)
				dputs(linecol[i], 1);
		} else if (linecol[li - 1][0] == '\0') {
			dputs("?\n", 1);
		} else {
			dputs(linecol[li - 1], 1);
		}
		break;
	case 'q':
		goto done;
	case 's':
		if (save_name == 0) {
			dputs("File: ", 1);
			dgets(file, sizeof(file), 0);
			save_name = 1;
		}

		if ((fd = open(file, 0x0001 | 0x0200, 000644)) == -1) {
			dputs("planck: error: could not open ", 2);
			dputs(file, 2);
			dputs("\n", 2);
			break;
		}

		for (li = 0; li < 1024; li++) {
			if (linecol[li][0] == '\0')
				break;
			dputs(linecol[li], fd);
		}

		if (close(fd) == -1) {
			dputs("planck: error: could not close ", 2);
			dputs(file, 2);
			dputs("\n", 2);

			_exit(1);
		}
		break;
	default:
		dputs("?\n", 1);
	}

	goto get_command;

done:
	return 0;
}

Some notes on implementation:

  • When a line is deleted, we shift all lines below the deleted line up by one. Likewise, when a line is added, we push all lines below down (with a check to make sure we don't go past our array limits).
  • When we insert text, we print the line to be edited. Whatever the user types in will completely replace the line. So be careful! That's why we print the line out, so the user can make small tweaks like fix typos and have all the information needed.
  • If the user enters no text in insert mode, the line is not replaced. This has the side effect of not being able to make a blank line via insert mode. This is why we need the newline mode. I think this is an ok tradeoff, since that way asking for the wrong line and then going to insert mode doesn't force you to have to retype the whole line again.
  • We read in at most sizeof(buf)-1 characters in insert mode. This is so that lines can always end with a newline and a NUL character.
  • It takes two commands to create a new non-blank line of text: first, the n command on line n-1 to insert the new line as line n; second, the i command on line n. Line 0 can be used to insert a new line at the top of the file. If you want a blank line, the n command alone will suffice.
  • As a special feature, the p command on line 0 prints the entire file.
  • Saving a file is as simple as dputs each line to the file descriptor of our file, stopping once we reach the first line that begins with the NUL character, which means that we have run out of text.

And that's it. Our text editor is complete. For completeness, our final main function looks like this.

int
main(int argc, char *argv[])
{
	char linecol[1024][128];
	char file[1024], line[128];
	char buf[5], c;
	int co = 0, fd, i, j, li = 0, save_name = 0;

	if (argc > 2) {
		dputs("usage: ", 2);
		dputs(argv[0], 2);
		dputs(" [file]\n", 2);

		_exit(1);
	}

	for (i = 0; i < 1024; i++) {
		for (j = 0; j < 128; j++)
			linecol[i][j] = '\0';
	}

	if (argc == 2) {
		for (i = 0; i < strlen(argv[1]); i++)
			file[i] = argv[1][i];
		file[i] = '\0';

		save_name = 1;

		if ((fd = open(file, 0x0000, 0)) == -1) {
			dputs("planck: error: could not open ", 2);
			dputs(file, 2);
			dputs("\n", 2);

			goto begin;
		}

		co = 0;
		li = 0;
		while (read(fd, &c, 1) > 0) {
			linecol[li][co] = c;

			if (++co > 126) {
				dputs("plank: error: line ", 2);
				dputi(li, 2);
				dputs(" is longer than 127 characters\n", 2);
			}

			if (c == '\n') {
				if (++li > 1023) {
					dputs("plank: error: ", 2);
					dputs(argv[1], 2);
					dputs(" is greater than 1024 lines\n", 2);

					_exit(1);
				}
				co = 0;
			}
		}

		if (close(fd) == -1) {
			dputs("planck: error: could not close ", 2);
			dputs(file, 2);
			dputs("\n", 2);

			_exit(1);
		}
	}

begin:
	dputi(li, 1);
	dputs("\n", 1);

get_command:
	dputs("line: ", 1);

	if ((li = dgeti(buf, sizeof(buf), 0)) > 1024) {
		dputs("?\n", 1);
		goto get_command;
	}

	dputs("command: ", 1);

	(void) dgets(buf, sizeof(buf), 0);

	switch (buf[0]) {
	case 'd':
		for (i = 0; i < 128; i++)
			linecol[li - 1][i] = '\0';
		for (i = li; i < 1024; i++) {
			for (j = 0; j < 128; j++)
				linecol[i - 1][j] = linecol[i][j];
		}
		for (i = 0; i < 128; i++)
			linecol[1023][i] = '\0';
		break;
	case 'i':
		if (li == 0) {
			dputs("?\n", 1);
			break;
		}
		--li;
		if (linecol[li][0] != '\0' && linecol[li][0] != '\n') {
			dputs(linecol[li], 1);
			dputs("\n", 1);
		}
		if (dgets(line, sizeof(line) - 1, 0) == 0)
			break;
		for (i = 0; i < 128; i++)
			linecol[li][i] = '\0';
		for (i = 0; i < strlen(line); i++)
			linecol[li][i] = line[i];
		linecol[li][i] = '\n';
		for (i = 0; linecol[i][0] != '\0'; i++)
			;
		if (i < li) {
			for (j = 0; j < 128; j++)
				linecol[i][j] = linecol[li][j];
			for (j = 0; j < 128; j++)
				linecol[li][j] = '\0';
		}
		break;
	case 'n':
		if (linecol[1023][0] != '\0') {
			dputs("planck: error: cannot add line, already at limit\n", 2);
			break;
		}
		for (i = 1022; i > li - 1; i--) {
			if (linecol[i][0] != '\0') {
				for (j = 0; j < 128; j++)
					linecol[i + 1][j] = linecol[i][j];
				for (j = 0; j < 128; j++)
					linecol[i][j] = '\0';
			}
		}
		linecol[li][0] = '\n';
		break;
	case 'p':
		if (li == 0) {
			for (i = 0; linecol[i][0] != '\0' ; i++)
				dputs(linecol[i], 1);
		} else if (linecol[li - 1][0] == '\0') {
			dputs("?\n", 1);
		} else {
			dputs(linecol[li - 1], 1);
		}
		break;
	case 'q':
		goto done;
	case 's':
		if (save_name == 0) {
			dputs("File: ", 1);
			dgets(file, sizeof(file), 0);
			save_name = 1;
		}

		if ((fd = open(file, 0x0001 | 0x0200, 000644)) == -1) {
			dputs("planck: error: could not open ", 2);
			dputs(file, 2);
			dputs("\n", 2);
			break;
		}

		for (li = 0; li < 1024; li++) {
			if (linecol[li][0] == '\0')
				break;
			dputs(linecol[li], fd);
		}

		if (close(fd) == -1) {
			dputs("planck: error: could not close ", 2);
			dputs(file, 2);
			dputs("\n", 2);

			_exit(1);
		}
		break;
	default:
		dputs("?\n", 1);
	}

	goto get_command;

done:
	return 0;
}

Conclusion

So did we create a text editor? I'm honestly not sure. It is definitely a program that edits plain text, though it does not do so particularly well. And it intentionally omits some very basic functionality that may in fact be considered essential if we are to truly create a text editor: searching for text comes immediately to mind. So does replace. But in theory we could achieve that functionality with utilities like grep(1) and sed(1). So maybe not. I'm sure Hacker News will debate it (I don't have an account, and I don't read it, but I do know my blog posts are being shared there) and I am interested in what others think represents the absolute minimum for a text editor.

If we want to extend this into a more usable editor, I think the first thing we should do is implement dynamic memory and use a doubly linked list as our data structure for our text (see here and here).

Another thing is that our text editor is untested in practice; we have not written any new code with it. Fortunately, we can do the meta programming exercise of editing planck's source code in planck, and that does work. Next, we will write a new program using planck. And that will have to do as our stand-in for real-world usability (remember: planck should not be considered usable).

It continues to be fun to attempt these extreme minimalism challenges. So I'll probably keep doing it.

1 Learning ed is still a worthwhile goal today, and is still useful if you find yourself with a hosed machine. Michael W. Lucas literally wrote the book on ed. This book, and all his books, are worthy of your money. My blog is not, never has been, and never will be sponsored. I don't get, nor do I want, any compensation for this free advertising. Michael doesn't even know I'm writing this.

2 I wrote that version of cat with the text editor we are creating in this post. That will be the subject of a future blog post.

3 Yes, we could use mmap(2) and munmap(2), but we should focus on one or two things per blog post. We can always write more blog posts.


OpenBSD httpd


About Joyk


Aggregate valuable and interesting links.
Joyk means Joy of geeK