1. Short answer

    Serialization is a process that transforms data structures into a sequence of bytes. It can be an expensive operation because your program needs to collect values that are stored internally as references and transform every data type into a common encoding.

    Longer answer

    If you’ve been learning about what happens when you send data over the network or write it to disk, you will probably have encountered the term “serialization”. Huh? To reduce confusion (perhaps?), other people have introduced other terms, like “marshalling”.

    The best place to figure out what “serialization” means is the word itself. “In serial” means one after another. This contrasts with terms such as “in tandem” or “in parallel”. So this implies that when writing objects to disk, we need to put bytes one after another.

    But, you may ask, aren’t bytes in memory stored one after another anyway? Yes and no. In a sense, this is true. RAM is shaped the same way disks are. Memory addresses start at 0, go up to 2^64 - 1, and form a sequence. But in practice, data that is stored in memory isn’t able to be read out as one contiguous run of bytes.

    There are a few reasons why objects are stored in a non-serial fashion. This post covers three of them:

    • live data contains references, serialized data does not
    • human-readable encodings are not machine-friendly
    • programming languages often store more than just the values that they’re representing, but the serialized forms only need the values

    Live data contains references, serialized data does not

    Whenever you encounter a data structure that can grow, such as a list, you will be implicitly dealing with something that manages references. They’re needed because when data structures grow, they sometimes need to be moved around in memory.

    Once a list has been written to disk, it’s stuck. That means that the first thing that will be written in full is the most deeply nested object. If you consider the following JSON object, we can’t finish writing b to disk without first writing b1 and b2:

    {
       "a": [1.0, 1.1, 1.2],
       "b": {
           "b1":  [10.0, 11.0, 12.0],
        "b2":  [20.0, 21.0, 22.0]
        }
    }
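
    To make this concrete, here is a small Python sketch (Python is my choice here purely for illustration) showing that the in-memory form is a web of references, while the serialized form is one flat run of bytes:

```python
import json

doc = {
    "a": [1.0, 1.1, 1.2],
    "b": {
        "b1": [10.0, 11.0, 12.0],
        "b2": [20.0, 21.0, 22.0],
    },
}

# In memory, the inner list is a separate object that the dict
# merely points to -- looking it up twice yields the same reference.
assert doc["b"]["b1"] is doc["b"]["b1"]

# Serialized, no references remain: the inner lists are written out
# in full, one after another, inside their parent.
encoded = json.dumps(doc)
assert '"b1": [10.0, 11.0, 12.0]' in encoded
```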
    

    Sidebar: some serialization formats, notably XML and YAML, allow references within the document that they’re writing. They’re rarely used, but available.

    In a sense, live data structures operate in tandem. Each list occupies its own portion of the address space and each can be modified without interfering with the others.

    Human-readable encodings are not machine-friendly

    Every data type has its own mapping between a sequence of zeros and ones and the values that are being represented. Human-readable formats use a different mapping. Translating between the two systems takes engineering effort.

    If you are interested in this translation process, here are some more details.

    The integer 42 is two bytes long when represented as two numerals in the UTF-8 encoding. UTF-8 is what JSON uses, so there’s a good chance that this is accurate in your case. But your CPU will probably use 8 bytes (64 bits) to represent that number. They differ because the CPU wants to use the same amount of space for every integer. So, while 42 in base 2 is 101000, your computer adds 58 leading zeros.

    Programming languages often store more than the values they’re representing

    Consider the number 42 stored as the variable answer. Here’s an example from a hypothetical programming language:

    fn meaning_of_life() {
        var answer = 42;
        return answer;
    }
    

    answer is an integer, and so should be easy to store compactly. But there might be a rule in the language that says that when answer goes out of scope, it should be deleted — unless there is a live reference to answer. To enable this rule, the programming language needs to keep track of references to answer.

    So, instead of storing just 42, the computer might be storing something like this:

    structure Variable {
        value: int,
        number_of_references: int,
    }
    

    In memory, the Variable structure needs space for two integers, even though only one is actually used for the thing we care about.
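
    This isn’t hypothetical. CPython, for example, keeps a reference count alongside every object, and you can watch it change (a sketch assuming CPython, where sys.getrefcount is available):

```python
import sys

answer = 10 ** 100           # a fresh object; small ints are cached and shared
before = sys.getrefcount(answer)

alias = answer               # a second live reference to the same object
assert sys.getrefcount(answer) == before + 1

del alias                    # dropping it returns the count to where it was
assert sys.getrefcount(answer) == before
```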

    It gets worse. Most programming languages store more than integers. So, they might need much more internal machinery. Here is a fuller example of what a dynamic language might need to do:

    structure Variable {
        value: Value,
        type: DataType,
        number_of_references: int,
    }
    
    structure Value {
        address_in_memory: int,
        length_in_bytes: int,
    }
    
    enumeration DataType {
        Integer, Float, String
    }
    

    Few of these details are necessary in a serialized form. We just care about 42.
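
    In CPython, for example, this overhead is easy to observe (a sketch assuming a 64-bit CPython build):

```python
import json
import sys

# The live object carries a type pointer, a reference count, and
# length information -- far more than the 8 bytes the value needs.
assert sys.getsizeof(42) > 8    # 28 bytes on a typical 64-bit build

# The serialized form needs only the value: the two characters "42".
assert len(json.dumps(42)) == 2
```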

    What this means for you

    Computer programmers love to introduce jargon. Don’t be afraid to ask people what terms mean — they’re normally metaphors that were useful at the time.

  2. tl;dr Can’t be done directly. You have two options: a) mock async I/O with threads, or b) redirect STDIN, STDOUT & STDERR to other file handles that support overlapping (aka non-blocking/async) I/O, such as named pipes.

    Background

    tokio-rs 0.1 is out! Yay! This is great news for networking, I wonder what life is like for file I/O?

    Its getting started examples use an echo server, but I really wanted to learn how to create an efficient worker to fit with the Hadoop Streaming API (among other use cases). That means reading from STDIN and writing to STDOUT. It turns out, tokio doesn’t have support for non-blocking I/O for stdio.

    Turns out, others have looked into this. As it happens, upstream progress on async stdio has stalled pending more research into how I/O Completion Ports work. It seems that Windows being different makes life difficult.

    This is going to require more research into how STDIN & co works w/ IOCP. I will tentatively assign this to the 1.0 milestone, but will potentially have to punt if it is tricky.

    — carllerche, Dec 2015

    Does it look possible?

    A fairly large number of projects don’t think so. Here is a quote from 2008 that is a fairly telling portent:

    Development of the library Boost.Process stopped two years ago. One of the biggest outstanding issues is adding support for asynchronous I/O to stdin/stdout/stderr.

    — Boris, asio C++ mailing list

    Let’s work our way through the MSDN documentation to figure the situation out. To start, let’s clear up a few terms so that we all know what we’re talking about.

    What is an IOCP?

    There are some significant differences between the UNIXish multiverse and Windows family when it comes to networking I/O. As well as differing APIs, there is also differing terminology.

    Unlike calling select or poll on a single file descriptor, Windows offers you the ability to wrap a file handle in an I/O completion port (IOCP). The file handle and the completion port are independent, but linked. The port takes care of dealing with the file itself.

    Its proponents believe (with good reason) that the completion port model is a good one for supporting interleaved reads and writes across multiple threads without blocking.

    Some notes on Windows terminology differences:

    • Windows uses the term “overlapped I/O” where most UNIX-esque programmers would use the term “non-blocking I/O”.
    • STDIN, STDOUT and STDERR are sometimes referred to as CONIN$, CONOUT$ and CONERR$ within Windows documentation

    With all of this in mind, creating an IOCP looks like this under the covers:

    HANDLE WINAPI CreateIoCompletionPort(  
      _In_     HANDLE    FileHandle,
      _In_opt_ HANDLE    ExistingCompletionPort,
      _In_     ULONG_PTR CompletionKey,
      _In_     DWORD     NumberOfConcurrentThreads
    );
    

    The important parameter is FileHandle, an object created by CreateFile. That handle must support overlapped I/O. Here is the relevant extract of the CreateIoCompletionPort reference:

    The handle passed in the FileHandle parameter can be any handle that supports overlapped I/O. Most commonly, this is a handle opened by the CreateFile function using the FILE_FLAG_OVERLAPPED flag (for example, files, mail slots, and pipes). Objects created by other functions such as socket can also be associated with an I/O completion port. For an example using sockets, see AcceptEx. A handle can be associated with only one I/O completion port, and after the association is made, the handle remains associated with that I/O completion port until it is closed.

    — “CreateIoCompletionPort function” MSDN

    This raises an important question: do the file handles for CONIN$, CONOUT$ & CONERR$ support FILE_FLAG_OVERLAPPED? We need to look at the documentation for CreateFile to see.

    After some browsing, one comes across the section on async I/O describing how to provide the flag. We provide it within the dwFlagsAndAttributes parameter.

    Synchronous and Asynchronous I/O Handles

    CreateFile provides for creating a file or device handle that is either synchronous or asynchronous. A synchronous handle behaves such that I/O function calls using that handle are blocked until they complete, while an asynchronous file handle makes it possible for the system to return immediately from I/O function calls, whether they completed the I/O operation or not. As stated previously, this synchronous versus asynchronous behavior is determined by specifying FILE_FLAG_OVERLAPPED within the dwFlagsAndAttributes parameter. There are several complexities and potential pitfalls when using asynchronous I/O; for more information, see Synchronous and Asynchronous I/O.

    This gets us closer, but we still don’t know. When you read the Consoles section of the same article, you discover the documentation explicitly states that the parameter is ignored.

    Consoles

    The CreateFile function can create a handle to console input (CONIN$). If the process has an open handle to it as a result of inheritance or duplication, it can also create a handle to the active screen buffer (CONOUT$).

    dwFlagsAndAttributes ignored

    So after all of that we discover that no, it’s not possible.

    Maybe I should have read that original post in a little more detail before hunting through all of the documentation myself:

    > If you look at the MSDN docs for CreateFile then you will see, under the
    > heading Consoles, that CreateFile ignores file flags when creating a
    > handle to a console buffer. I doubt that there is any way to do genuine
    > asynchronous io to a console buffer.

    — Roger Austin, asio C++ mailing list

    Other Approaches

    Clearly, many projects face similar issues. They want to write to STDOUT as fast as possible, without blocking the main thread. What have they done to create non-blocking servers that access these blocking APIs?

    There are two main options:

    • use threads
    • redirect STDOUT/etc to another file handle such as a named pipe and perform async I/O on that

    Threading

    In an article entitled “Asynchronous I/O in Windows for Unix Programmers”, Ryan Dahl (creator of node.js), provides a very good discussion of IOCP that includes file I/O, rather than just network I/O. His suggested approach for Console applications is to spawn threads that wait for events that then communicate with the main thread.

    Console/TTY

    It is (usually?) possible to poll a Unix TTY file descriptor for readability or writablity just like a TCP socket—this is very helpful and nice. In Windows the situation is worse, not only is it a completely different API but there are not overlapped versions to read and write to the TTY. Polling for readability can be accomplished by waiting in another thread with RegisterWaitForSingleObject().

    emphasis added

    This approach is taken by FastCGI within libfcgi/os_win32.c. STDIN is mocked out, but STDOUT is kept synchronous. The StdinThread function loops in a thread until shutdown:

    /*
    
    *--------------------------------------------------------------
     *
     * StdinThread--
     *
     *    This thread performs I/O on standard input.  It is needed
     *      because you can't guarantee that all applications will
     *      create standard input with sufficient access to perform
     *      asynchronous I/O.  Since we don't want to block the app
     *      reading from stdin we make it look like it's using I/O 
     *      completion ports to perform async I/O.
     *
     * Results:
     *    Data is read from stdin and posted to the io completion
     *      port.
     *
     * Side effects:
     *    None.
     *
     *--------------------------------------------------------------
     */
    static void StdinThread(LPDWORD startup){
    
        int doIo = TRUE;
        int fd;
        int bytesRead;
        POVERLAPPED_REQUEST pOv;
    
        while(doIo) {
            /*
             * Block until a request to read from stdin comes in or a
             * request to terminate the thread arrives (fd = -1).
             */
            if (!GetQueuedCompletionStatus(hStdinCompPort, &bytesRead, &fd,
            (LPOVERLAPPED *)&pOv, (DWORD)-1) && !pOv) {
                doIo = 0;
                break;
            }
    
        ASSERT((fd == STDIN_FILENO) || (fd == -1));
            if(fd == -1) {
                doIo = 0;
                break;
            }
            ASSERT(pOv->clientData1 != NULL);
    
            if(ReadFile(stdioHandles[STDIN_FILENO], pOv->clientData1, bytesRead,
                        &bytesRead, NULL)) {
                PostQueuedCompletionStatus(hIoCompPort, bytesRead, 
                                           STDIN_FILENO, (LPOVERLAPPED)pOv);
            } else {
                doIo = 0;
                break;
            }
        }
    
        ExitThread(0);
    }
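
    The same pattern is straightforward to sketch in portable Python (my own illustration, not FastCGI’s code): a helper thread performs the blocking reads and hands each completed chunk to the main thread through a queue. A pipe stands in for STDIN here so that the sketch is runnable anywhere:

```python
import os
import queue
import threading

def stdin_reader(fd, out_queue):
    # Perform the blocking reads on the helper thread, posting each
    # completed chunk back to the main thread via the queue.
    while True:
        chunk = os.read(fd, 4096)
        out_queue.put(chunk)
        if not chunk:           # an empty read means EOF: stop looping
            break

# A pipe stands in for STDIN so the sketch can run on any platform.
read_fd, write_fd = os.pipe()
incoming = queue.Queue()
worker = threading.Thread(target=stdin_reader, args=(read_fd, incoming))
worker.start()

os.write(write_fd, b"hello")
os.close(write_fd)              # closing the write end signals EOF

assert incoming.get(timeout=5) == b"hello"
assert incoming.get(timeout=5) == b""
worker.join()
os.close(read_fd)
```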
    

    Redirect to another file handle, such as a named pipe

    A very old version of Twisted seems to have implemented this approach. (From glancing at the current code, it looks like Twisted has moved back to threads :/ )

    There are bound to be more examples of this proxy-handle approach around, as it seems like quite a nifty technique. The relevant MSDN article is “Creating a Child Process with Redirected Input and Output”.

    The important takeaways seem to be:

    • set up SECURITY_ATTRIBUTES correctly
    • make sure your named pipes have unique names
    • make sure that you are reading and writing correct ends of the pipe from each process

    An extract of old Twisted code demonstrating how to proceed looks like this:

    # Counter for uniquely identifying pipes
    counter = itertools.count(1)
    
    class Process(object):  
      ...
      def __init__(...):
        ...
    
        # Set the bInheritHandle flag so pipe handles are inherited. 
        saAttr = win32security.SECURITY_ATTRIBUTES()
        saAttr.bInheritHandle = 1
    
        # Create a pipe for the child process's STDIN. This one is opened
        # in duplex mode so we can read from it too in order to detect when
        # the child closes their end of the pipe.
        self.stdinPipeName = r"\\.\pipe\twisted-iocp-stdin-%d-%d-%d" % (self.pid, counter.next(), time.time())
        self.hChildStdinWr = win32pipe.CreateNamedPipe(
                self.stdinPipeName,
                win32con.PIPE_ACCESS_DUPLEX | win32con.FILE_FLAG_OVERLAPPED, # open mode
                win32con.PIPE_TYPE_BYTE, # pipe mode
                1, # max instances
                self.pipeBufferSize, # out buffer size
                self.pipeBufferSize, # in buffer size
                0, # timeout 
                saAttr)
    
        self.hChildStdinRd = win32file.CreateFile(
                self.stdinPipeName,
                win32con.GENERIC_READ,
                win32con.FILE_SHARE_READ|win32con.FILE_SHARE_WRITE,
                saAttr,
                win32con.OPEN_EXISTING,
                win32con.FILE_FLAG_OVERLAPPED,
                0);
    
        # Duplicate the write handle to the pipe so it is not inherited.
        self.hChildStdinWrDup = win32api.DuplicateHandle(
                currentPid, self.hChildStdinWr, 
                currentPid, 0, 
                0,
                win32con.DUPLICATE_SAME_ACCESS)
        win32api.CloseHandle(self.hChildStdinWr)
        self.hChildStdinWr = self.hChildStdinWrDup
    

    Which approach to take?

    There are others who are significantly more experienced in this area than I. The conventional approach certainly seems to be threads, but using redirection does appeal to me for some reason. As it nears midnight, my suggestion to the Tokio team and others would be to go with the approach that’s easiest to maintain unless benchmarks prove compelling.

Tim McNamara