Comparing I/O in C with Java

Recently, I had a somewhat heated discussion with a friend about the Java I/O library (specifically java.io.*).  His position was that the library is unnecessarily cluttered and verbose, and that I/O in C is much simpler and more productive.  Whilst I agreed with some of that, I also argued that the Java I/O library is more powerful and flexible than its C counterpart.  Here are some of the main points we covered:

  1. Abstracting stream encodings away from stream processing.  The ability for objects in Java to delegate to an InputStream or OutputStream provides a very nice way to decouple the encoding of information from the processing of it.  An interesting example in the Whiley compiler is that of name mangling, that is, encoding the Whiley type of a function into its name.  To generate the mangling, I have an OutputStream implementation which takes a Whiley type and serialises it into binary data.  Then, a second implementation of OutputStream takes an arbitrary stream of binary data and encodes it into (roughly speaking) 7-bit ASCII.  In other words, it turns the serialised data into a form that can safely be used as part of a method name according to the JVM Spec.  And, of course, I have the mirror of this for reading the type out of a mangling.
  2. Stateful encoders/decoders are important.  One idea we discussed was that encoding could be handled if: (1) Java had support for lambdas; and (2) classes such as FileReader accepted a decoding (lambda) function.  This would cover a large number of use cases, and conceptually simplify the I/O framework.  However, we would need our decoding functions to have state to be useful (and I don’t believe that JSR 335 supports this).  State is typically necessary in situations where we can, or want to, read chunks larger than immediately needed.  For example, when performing some kind of buffering we read a large chunk and cache it for later.  Other examples of stateful stream components include more unusual things, such as the ability to insert logging into the pipeline.
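
To make the two points above concrete, here is a minimal sketch of a two-stage output pipeline. It is not the Whiley compiler's actual mangling code: the class names, the "tag plus name" stand-in for a type, and the hex encoding are all assumptions chosen for illustration. A DataOutputStream serialises the value into binary, and a custom FilterOutputStream turns that binary into identifier-safe characters while keeping a small piece of state (a byte counter).

```java
import java.io.ByteArrayOutputStream;
import java.io.DataOutputStream;
import java.io.FilterOutputStream;
import java.io.IOException;
import java.io.OutputStream;
import java.nio.charset.StandardCharsets;

class ManglingSketch {
    // Hypothetical encoder: writes every byte as two hex digits, so the result
    // contains only characters that are safe inside an identifier.
    static class HexEncodingOutputStream extends FilterOutputStream {
        private static final char[] DIGITS = "0123456789abcdef".toCharArray();
        private long bytesEncoded = 0; // state kept between calls to write()

        HexEncodingOutputStream(OutputStream out) {
            super(out);
        }

        @Override
        public void write(int b) throws IOException {
            out.write(DIGITS[(b >> 4) & 0xF]); // high nibble
            out.write(DIGITS[b & 0xF]);        // low nibble
            bytesEncoded++;
        }

        long bytesEncoded() {
            return bytesEncoded;
        }
    }

    // Serialises a made-up "type" (a tag byte plus a name) and returns the
    // encoded text. The serialiser knows nothing about the hex encoding.
    static String mangle(int tag, String name) throws IOException {
        ByteArrayOutputStream sink = new ByteArrayOutputStream();
        HexEncodingOutputStream hex = new HexEncodingOutputStream(sink);
        try (DataOutputStream data = new DataOutputStream(hex)) {
            data.writeByte(tag);
            data.writeUTF(name); // two length bytes, then the characters
        }
        return new String(sink.toByteArray(), StandardCharsets.US_ASCII);
    }

    public static void main(String[] args) throws IOException {
        System.out.println(mangle(1, "int")); // prints 010003696e74
    }
}
```

The serialising code and the encoding code never refer to each other; either stage could be swapped out without touching the other, which is the decoupling described in point 1.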

At this point, things were all making sense to me.  Having worked with C for many years (albeit some time ago now), I’m fairly familiar with how I/O is handled in C.  In particular, I/O mostly goes through the FILE* structure (although you can read/write to file descriptors directly if you like).  You have to rely on functions like fopen() and popen() to create FILE* instances because we don’t know the internal layout of FILE* and, hence, cannot construct our own instances.  So, my first attack on this structure was to say something like:

“How do you create a FILE* instance from a memory buffer?”

In Java, this is relatively easy since we can create e.g. a ByteArrayInputStream and pass that to anything accepting an InputStream. Well, it turns out you can do this in C! I had never heard of it before, but there is an fmemopen() function for exactly this use case.
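
On the Java side, the memory-buffer case looks roughly like the sketch below. The countLines() method is a hypothetical consumer invented for this example; the point is only that it is written against InputStream and cannot tell whether the bytes come from a file or from memory.

```java
import java.io.BufferedReader;
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.nio.charset.StandardCharsets;

class MemoryStreamSketch {
    // Hypothetical consumer: it only knows that it is handed an InputStream.
    static int countLines(InputStream in) throws IOException {
        BufferedReader reader =
            new BufferedReader(new InputStreamReader(in, StandardCharsets.UTF_8));
        int lines = 0;
        while (reader.readLine() != null) {
            lines++;
        }
        return lines;
    }

    public static void main(String[] args) throws IOException {
        byte[] buffer = "first\nsecond\nthird\n".getBytes(StandardCharsets.UTF_8);
        // Wrap the in-memory buffer so it can be read like any other stream.
        System.out.println(countLines(new ByteArrayInputStream(buffer))); // prints 3
    }
}
```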

Undeterred, I countered with:

“How do you create a FILE* instance which automagically encodes/decodes into a user-defined format (e.g. as per my name mangling example above)?”

Again, this is relatively easy to do in Java by providing your own implementation of e.g. InputStream. At this point, there was a long pause (in fact, overnight) in the discussion.  The next day, my friend comes back and says: “ah, you obviously haven’t heard of the fopencookie() function then!”. Nope, I hadn’t.
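
For comparison, a user-defined decoding stream in Java might be sketched as follows. The format here is an assumption made up for the example (each byte stored as two hex digits, carrying a tag byte followed by a name), not the real Whiley mangling, and the class names are hypothetical.

```java
import java.io.ByteArrayInputStream;
import java.io.DataInputStream;
import java.io.FilterInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;

class DecodingSketch {
    // Hypothetical decoder: turns pairs of hex digits back into single bytes.
    static class HexDecodingInputStream extends FilterInputStream {
        HexDecodingInputStream(InputStream in) {
            super(in);
        }

        @Override
        public int read() throws IOException {
            int hi = in.read();
            if (hi < 0) return -1; // clean end of stream
            int lo = in.read();
            if (lo < 0) throw new IOException("odd number of hex digits");
            int h = Character.digit(hi, 16);
            int l = Character.digit(lo, 16);
            if (h < 0 || l < 0) throw new IOException("not a hex digit");
            return (h << 4) | l;
        }

        // Bulk reads must also go through the decoder; the inherited version
        // would forward them to the underlying stream undecoded.
        @Override
        public int read(byte[] buf, int off, int len) throws IOException {
            if (len == 0) return 0;
            int n = 0;
            for (; n < len; n++) {
                int b = read();
                if (b < 0) break;
                buf[off + n] = (byte) b;
            }
            return n == 0 ? -1 : n;
        }

        @Override
        public boolean markSupported() {
            return false; // this sketch does not implement mark/reset
        }
    }

    // Reads a tag byte and a name back out of the encoded text. The reading
    // code is unaware that the underlying data is hex encoded.
    static String demangle(String mangled) throws IOException {
        InputStream raw =
            new ByteArrayInputStream(mangled.getBytes(StandardCharsets.US_ASCII));
        try (DataInputStream data = new DataInputStream(new HexDecodingInputStream(raw))) {
            int tag = data.readByte();
            String name = data.readUTF();
            return tag + ":" + name;
        }
    }

    public static void main(String[] args) throws IOException {
        System.out.println(demangle("010003696e74")); // prints 1:int
    }
}
```

A production version would also need to think about skip() and available(), which this sketch leaves at their inherited behaviour.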

This is an excerpt from the manpage on fopencookie():

The  fopencookie()  function allows the programmer to 
create a custom implementation for a standard I/O stream.  
This implementation can store the stream's data at a 
location of its own choosing; for example, fopencookie() is 
used to implement fmemopen(3), which provides a stream 
interface to data that is stored in a buffer in memory.

In order to create a custom stream the programmer must:

* Implement four "hook" functions that are used internally
   by the standard I/O library when performing I/O on the 
   stream.

* Define a "cookie" data type, a structure that provides
   bookkeeping information (e.g., where to store data) 
   used by the aforementioned hook functions.  The 
   standard I/O package knows nothing about  the 
   contents of this cookie (thus it is typed as void * when 
   passed to fopencookie()), but automatically supplies 
   the cookie as the first argument when calling the hook 
   functions.

* Call fopencookie() to open a new stream and associate 
   the cookie and hook functions with that stream.

...

Well, I guess you learn a new thing every day …

11 comments to Comparing I/O in C with Java

  • pjmlp

    Except you’re missing the fact that fmemopen() and fopencookie() are not portable, while the Java code will run everywhere.

  • Hi Pjmlp,

    True, but I guess that’s more about the languages themselves rather than their I/O libraries …

  • Shannen

    Correction: Java will run everywhere with enough resources for a JVM.

  • Nenad

    @Shannen
    Nowadays that’s just about anywhere ;) .

    As for I/O comparison between Java and C, you should take a look at Java NIO/NIO2. Almost sure you can achieve stateful encoders/decoders as you described with NIO.

  • Yes, Java NIO/NIO2 brings significant performance improvements; comparing C I/O with NIO/NIO2 could be another point worth covering.

  • Java is good but...

    @Nenad

    More than 90% of today’s microprocessors aren’t able to run a JVM, even a simplified one. If you want to find one, just disassemble your keyboard and you’ll find a very simple CPU that has most probably been programmed in C.

    You can also brag about Nio/2 but remember it’s mostly an interface to epoll C API.

  • Nenad

    @Java is good but…
    If I disassemble my keyboard I’ll probably find a microcontroller, not a microprocessor. That’s a whole different area of programming. In that area assembler and C are kings. As for embedded systems, take a look at your phone, tablet, TV. Most likely you’ll find a microprocessor capable of running a JVM of some sort.

    The blog post is about differences between the I/O libraries in C and Java, and points out nicely that C has some nice functions too. I can brag about NIO/2 because it’s a fine library. In the end, both the NIO and IO libs have to call the OS to execute code. Do you know any major OS that’s not written in C/C++?

  • Infernoz

    This is comparing an archaic unsafe structured language, with minimal native library support, with a modern very fleshed out OOP language with vast library support, which frankly blows C away for safety, features and productivity.

    When I first saw the flawed OOP concepts of C++ at university, after C, it was an epiphany for me after the clumsiness of C, especially when I discovered STL! Java later made C/C++ look complicated because it does not need any of the conditional complication, incestuous and conflicting defines, over complicated statement modifiers, and silly header files that C and C++ have; this was quite horrible when I later had to revisit C/C++! Java soft-linking really makes C/C++ look primitive, especially when combined with Maven for builds, rather than make.

    The OOP elements of Java like interfaces and base classes, and stable data types make coding so much easier for binary and especially Unicode character stream coding, this alone makes this discussion about brittle _static_ hook functions for obscure C functions quite laughable. Static conceits poison the C lib and makes it a joke for many 21st century programs.

    The java.io.*, java.util.*, and java.text.* packages have encapsulated, replaced and extended the whole idea of separate piped *NIX filter program processes; these are no longer constrained by the backwards static state and static hook nonsense in C. These extendable building blocks allow for much faster and more flexible coding, and can easily support multi-processing without the need for heavyweight processes.

    Re: About >90% microprocessors of today aren’t able to run a JVM, even a simplified one:
    BS, primitive embedded hardware is irrelevant, and even that often uses C++, rather than C. Many portable devices run Android now, and Android runs a variant of Java, including OpenJava libraries, for most of its applications. I’d bet the Objective-C support for I/O in Apple iOS also makes a mockery of C I/O.

    Major OS kernels are written in C++, not C, specifically because the C has no sensible multi-processing support; kernels and device drivers are the rare places where it can make sense to spend the significant extra time writing and debugging in languages like C/C++; however Microsoft Update Tuesday demonstrates just how unsafe the C/C++ can be compared to VM based OOP languages!

  • garren

    Major OS kernels are written in C++, not C, specifically because the C has no sensible multi-processing support;

    Windows continues to be primarily C/C++ (i.e., C with namespaces) as I recall. Linux/*BSD/OSX are all C. I can’t think of a single ‘Major OS’ kernel that can be said to be entirely or even mostly C++. Please correct me if I’m wrong, but all of these unarguably major kernels seem to be doing just fine in a language that has no ‘sensible multiprocessing’ support. What ‘major kernels’ are you referring to?

    Microsoft Update Tuesday demonstrates just how unsafe the C/C++ can be compared to VM based OOP languages!

    If recent Oracle JVM issues are any indication, this comparison is becoming less and less true, isn’t it? I won’t argue the safety of VM based languages in theory, but in practice and given enough time, they seem to lose a little bit of that safety. Granted, I’m referring to the VM implementation and not the language.

  • Warren P.

    The article is great. That random detail about the fopencookie slays me.

    Sadly I feel the comments are off in the woods.

    Warren

  • Glad you liked it Warren!
