Tutorial: A Simple HTTP Server

The very first use-case for NMFU was to simplify the protocol layer of an embedded HTTP server, and so that's what the first tutorial will cover.

The goal here is to have an API that we can feed received HTTP data into as it arrives, and get out an object containing information about the request to be serviced.

Defining the parser

We start by defining what output we want our parser to generate -- for now, let's just get the requested path.

out str[32] request_path;

Observe the explicit size here: NMFU doesn't do dynamic allocation, and so all fields must have a defined length -- the size here is the length of the underlying buffer, which by default is null-terminated.
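To make the sizing concrete, here is a minimal C sketch of what a fixed 32-byte, null-terminated buffer implies: at most 31 characters fit, and an append to a full buffer has to fail rather than grow. The names fixed_str, fixed_str_clear and fixed_str_append are hypothetical and only illustrate the idea; they are not NMFU's generated code or layout.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for a str[32] output; not NMFU's generated layout. */
#define REQUEST_PATH_SIZE 32

typedef struct {
   char data[REQUEST_PATH_SIZE]; /* always kept null-terminated */
   size_t len;                   /* characters stored, excluding the terminator */
} fixed_str;

/* Reset the buffer to the empty string. */
static void fixed_str_clear(fixed_str *s) {
   s->len = 0;
   s->data[0] = '\0';
}

/* Append one character; returns false when only the terminator slot is left. */
static bool fixed_str_append(fixed_str *s, char c) {
   if (s->len + 1 >= REQUEST_PATH_SIZE) {
      return false; /* out of space: there is no allocation to fall back on */
   }
   s->data[s->len++] = c;
   s->data[s->len] = '\0';
   return true;
}
```

With this layout, the 32nd append is the first one to fail, since the last byte is reserved for the terminator.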

Next, we tell NMFU what we want our parser to parse inside the parser block:

parser {
   "GET /";
   request_path += /[\/a-zA-Z0-9.\-_?=&+]+/;
   " HTTP/1."; /\d/;

   wait "\r\n\r\n";
}

So what is this doing? An NMFU parser at its core is a sequence of matches -- here we have three different types: the direct-match which matches an exact sequence of characters, the regex-match which matches a regular expression, and the wait-match which discards all non-matching input until its argument matches.

The first line of the parser, "GET /" matches those 5 characters in order. The next line is an append-statement, which takes whatever its argument matches and appends it to a given string output. The regex shown here matches a typical URL. Notice how we've placed the first "/" in the preceding match: all URLs should start with one, and so we can save a byte of RAM by not putting it into the string.
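For readers less used to regex character classes, the following C sketch spells out which characters the class in the append-statement accepts. It is a hand-written illustration assuming ASCII input; is_path_char and path_prefix_len are hypothetical names and have nothing to do with what NMFU generates.

```c
#include <stdbool.h>
#include <string.h>

/* Hand-written equivalent of the class [\/a-zA-Z0-9.\-_?=&+]; illustrative only. */
static bool is_path_char(char c) {
   if (c >= 'a' && c <= 'z') return true;
   if (c >= 'A' && c <= 'Z') return true;
   if (c >= '0' && c <= '9') return true;
   /* The punctuation members of the class. The c != '\0' guard is needed
      because strchr also matches the string's terminator. */
   return c != '\0' && strchr("/.-_?=&+", c) != NULL;
}

/* Length of the longest prefix of s made up only of path characters. */
static size_t path_prefix_len(const char *s) {
   size_t n = 0;
   while (is_path_char(s[n])) n++;
   return n;
}
```

Given the text after "GET /" in a typical request line, such as index.html HTTP/1.1, the prefix ends at the space, which is why the next match in the parser begins with a space.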

Then, we use a combination of direct and regex matches to match the last part of the first line of the HTTP request. We could have combined the two into a single regex, but doing this avoids having to escape the slash and dot.

Finally, we wait for the end of the request, signified by an empty line (that is, two consecutive \r\n line endings with nothing between them).
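To give a feel for what a wait-match has to do when input arrives in pieces, here is a small hand-written C sketch that discards input until it has seen "\r\n\r\n", keeping its progress between calls. The names wait_state and wait_feed are hypothetical, and this is not how NMFU generates its state machine; it only illustrates the behaviour.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical incremental matcher for "\r\n\r\n"; not NMFU's generated code. */
typedef struct {
   size_t matched; /* how many terminator characters have matched so far */
} wait_state;

static const char TERMINATOR[] = "\r\n\r\n";
#define TERMINATOR_LEN (sizeof TERMINATOR - 1)

/* Consume [start, end); returns true once the full terminator has been seen.
   Progress persists across calls, so the terminator may straddle two chunks. */
static bool wait_feed(wait_state *w, const char *start, const char *end) {
   for (const char *p = start; p != end; p++) {
      if (*p == TERMINATOR[w->matched]) {
         w->matched++;
      } else {
         /* Mismatch: discard, restarting at 1 if this character could begin
            a new match. This simple fallback suffices for this pattern. */
         w->matched = (*p == TERMINATOR[0]) ? 1 : 0;
      }
      if (w->matched == TERMINATOR_LEN) return true;
   }
   return false;
}
```

The start and exclusive end pointers mirror the calling convention used later in this tutorial, so a terminator split across two reads is still found.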

So what can we do with this? Let's assume the full parser is in a file called http_server.nmfu. We can then compile it into C with

$ nmfu http_server.nmfu

which will generate two files containing our parser, http_server.c and http_server.h, in the same directory as http_server.nmfu.

Using the parser

NMFU tries to keep its generated API as simple as possible. The entire parser state is contained within the http_server_state struct, which has a helper typedef defined as http_server_state_t.

Note

All definitions generated by NMFU are based on the output filename without extension (which can be customized with the -o command line option), defaulting to the input filename.

We initialize this state object with the http_server_start function, e.g.

int main() {
   http_server_state_t parser;

   http_server_start(&parser);
}

Then, we can provide input to the parser via the http_server_feed function, which takes two pointers, the start and (exclusive) end of the data to read. For example, if we were reading from stdin, it would look something like:

int count = 0;
char buf[32];

while ((count = read(STDIN_FILENO, buf, 32)) > 0) {
   http_server_feed(&parser, buf, buf + count);
}

This, however, is not complete. We need to deal with the return from _feed, which is an enum with three possible values. Either the parser encountered an error (such as out of space in a string or no match), the parser reached the end of its program, or the parser is waiting for more input. These correspond to the results HTTP_SERVER_FAIL, HTTP_SERVER_DONE or HTTP_SERVER_OK respectively.

Updating our example, we might have something like

   while ((count = read(STDIN_FILENO, buf, 32)) > 0) {
      switch (http_server_feed(&parser, buf, buf + count)) {
         case HTTP_SERVER_OK:
            continue;
         case HTTP_SERVER_FAIL:
            fprintf(stderr, "invalid input\n");
            return;
         case HTTP_SERVER_DONE:
            goto finished;
      }
   }

   if (count < 0) {
      perror("read error");
      return;
   }
finished:
   // do something with the output

Warning

Note that if you want to try this in a terminal you'll probably want to change the \r\n (which is what the HTTP RFC specifies) into just an \n to test.

Finally, we just need to extract the data from the parser. All the output variables are placed inside the c subobject in the parser state, so we can just use

finished:
   printf("got request for /%s\n", parser.c.request_path);

(using the / since we omitted it from the string)

Putting this all together, we might have something along the lines of

#include <stdio.h>
#include <unistd.h>
#include "http_server.h"

int main() {
   http_server_state_t parser;
   int count = 0; char buf[32];

   http_server_start(&parser);

   while ((count = read(STDIN_FILENO, buf, 32)) > 0) {
      switch (http_server_feed(&parser, buf, buf + count)) {
         case HTTP_SERVER_OK:
            continue;
         case HTTP_SERVER_FAIL:
            fprintf(stderr, "invalid input\n");
            return 2;
         case HTTP_SERVER_DONE:
            goto finished;
      }
   }

   if (count < 0) {
      perror("read error");
      return 1;
   }
finished:
   printf("got request for /%s\n", parser.c.request_path);
   return 0;
}

which should read an HTTP 1.x request off of stdin and print out the path being requested.

Simple conditionals: Handling request methods

Now, this server is only barely functional. Let's provide it with slightly more functionality by getting it to recognize different request methods.

First, we'll declare another output variable, this time using the enumeration syntax.

out enum{GET, POST, UNSUPPORTED} method;

Then, we'll replace the first part of our parser with

parser {
   case {
      "GET " -> {method = GET;}
      "POST " -> {method = POST;}
      else -> {method = UNSUPPORTED; wait " ";}
   }
   "/";

We've introduced a new statement, the case-statement. This statement tries to match all of the expressions given to it simultaneously, and whichever one successfully terminates first determines the next set of statements to execute. Multiple branches can be matching at the same time; however, ambiguity as to which branch to execute is not allowed. For example,

case {
   "POST" -> {},
   "PUT" -> {}
}

will work fine, despite both matches starting with the same letter, but

case {
   /[pP]UT/ -> {},
   "PUT" -> {}
}

would not, since both conditions would match "PUT".

Note that the case statement also introduces us to the first way NMFU can deal with errors, with the else condition. If all of the conditions fail to match after a certain input character, control is immediately transferred to the body of the else condition. Specifically, if we gave GER to our parser, the first two letters GE would be consumed by the GET option, however the R would not match anything. Therefore, it gets "sent" to the body of the else condition, which in this case winds up being the wait match, which will then discard it as expected.
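The behaviour described above can be sketched by hand in C: every branch that is still alive advances in lockstep, and the character that kills the last branch is handed to the else body. This is only an illustration under our own naming (method_state, method_start and method_feed are hypothetical); NMFU's generated code is a compiled state machine and will look different.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical hand-written equivalent of the case statement above. */
typedef enum { METHOD_GET, METHOD_POST, METHOD_UNSUPPORTED } method_t;

typedef struct {
   size_t pos;      /* characters consumed so far by the live branches */
   bool get_alive;  /* "GET " can still match */
   bool post_alive; /* "POST " can still match */
   bool in_else;    /* else branch taken: now waiting for " " */
   bool done;
   method_t method;
} method_state;

static void method_start(method_state *s) {
   s->pos = 0;
   s->get_alive = s->post_alive = true;
   s->in_else = s->done = false;
   s->method = METHOD_UNSUPPORTED;
}

/* Feed one character; returns true once the method is decided and the
   separating space has been consumed. */
static bool method_feed(method_state *s, char c) {
   static const char GET_LIT[] = "GET ", POST_LIT[] = "POST ";
   if (s->done) return true;
   if (!s->in_else) {
      /* Advance every branch that is still alive. */
      s->get_alive = s->get_alive && GET_LIT[s->pos] == c;
      s->post_alive = s->post_alive && POST_LIT[s->pos] == c;
      s->pos++;
      if (s->get_alive && s->pos == sizeof GET_LIT - 1) {
         s->method = METHOD_GET;
         return s->done = true;
      }
      if (s->post_alive && s->pos == sizeof POST_LIT - 1) {
         s->method = METHOD_POST;
         return s->done = true;
      }
      if (s->get_alive || s->post_alive) return false;
      /* No branch accepted c: take the else branch and let its body see c. */
      s->method = METHOD_UNSUPPORTED;
      s->in_else = true;
   }
   /* Body of the else branch after the assignment: wait " ". */
   if (c == ' ') s->done = true;
   return s->done;
}
```

Feeding GER followed by a space walks through exactly the sequence in the text: G and E keep the GET branch alive, R kills it and falls into the else body, and the space completes the wait.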

Regardless, our parser should now be capable of differentiating between different request methods, and even give useful error information if it gets one that it doesn't recognize (perhaps to generate a 405 Method Not Allowed response).

The enumeration we defined will be exposed to C as the enum http_server_method, with another helper typedef http_server_method_t, with values HTTP_SERVER_METHOD_GET, HTTP_SERVER_METHOD_POST, etc.

We can access it from the state object with

printf("got request for /%s\n", parser.c.request_path);
switch (parser.c.method) {
   case HTTP_SERVER_METHOD_GET:
      puts("with a get request");
      break;
   case HTTP_SERVER_METHOD_POST:
      puts("with a post request");
      break;
   default:
      puts("with an unknown method");
      break;
}

Error handling: the `try` statement

Let's go back to how we read the request path. What if we wanted our server to offer up a useful error message if the request path was too long? There is a defined status code for this case: 414 URI Too Long.

We can use the try-catch functionality of NMFU to accomplish this. Let's add an output flag to indicate if this happens with

out bool uri_too_long = false;

Note

In a full server, this would probably be a "status" enumeration as opposed to individual flags, but for now this will work.

Then, we can replace the line reading request_path with

try {
   request_path += /[\/a-zA-Z0-9.\-_?=&+]+/;
}
catch (outofspace) {
   uri_too_long = true;
   wait "\r\n\r\n";
   finish;
}

The try block here adds an error handler in much the same way else does for the case statement. While there is a nomatch error which functions identically to the else branch of a case statement, here we're using the outofspace error, which fires when trying to append a character to a full output. In the handler, we set the uri_too_long output to true, wait for the end of the request, and then terminate the parser early with the finish; statement.

We could read this from C with

if (parser.c.uri_too_long) {
   // send a 414
}
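As a rough sketch of how the outputs gathered so far might drive a response, the C below picks a status code from a flag and a method. To keep it self-contained it uses a hypothetical request_outputs struct and req_method enum that merely mirror the parser outputs; in a real program the values would be read from parser.c instead, and status_for is our own illustrative helper.

```c
#include <stdbool.h>

/* Hypothetical mirror of the parser outputs; the real fields live in parser.c. */
typedef enum { REQ_METHOD_GET, REQ_METHOD_POST, REQ_METHOD_UNSUPPORTED } req_method;

typedef struct {
   bool uri_too_long;
   req_method method;
} request_outputs;

/* Pick an HTTP status code from the outputs. */
static int status_for(const request_outputs *out) {
   if (out->uri_too_long) return 414;                     /* URI Too Long */
   if (out->method == REQ_METHOD_UNSUPPORTED) return 405; /* Method Not Allowed */
   return 200;                                            /* OK */
}
```

Checking the flag first is a design choice of this sketch: when the path overflowed, the stored path is incomplete, so it is reported before anything else.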

This concludes the first part of the HTTP tutorial. The second part covers handling headers and numbers with loops.