
On the Internet and Object-Oriented Programming

The rise of the internet over the last, say, two decades has been pretty unstoppable (we all know that). But, is it now affecting the prominence of object-oriented programming? I’m going to try and argue in this post that: yes, it is. That is, at least for “classical” object-oriented languages (i.e. not JavaScript).

What is the ‘Net effect?

The starting point of this discussion is to think about how the internet affects programming.  It’s not easy to reduce this to some simple ideas, but, still, I think there are useful observations to make…

The internet has become a dominant feature of most programs. This is not saying much.  The fact is, most people today are writing programs that in some way interact with the internet.  There are so many kinds of programs which do this.  Of course, web applications.  Certainly, mobile apps.  Yes, tools for scraping “big” data off the web.  Don’t forget the Internet of Things devices on WiFi which (yuk) like to punch holes via UPnP.  There are so many others.  And, yes, there are still some applications being developed which don’t do any networking.  We see this in embedded systems, in standard operating system utilities, and more.

Yes, Dave, the internet is big.  So what?  We know the internet is all about moving data — bits and bytes.  Many exciting data formats have arisen for exactly this, like XML, JSON, Protocol Buffers and even, at a stretch, (yuk) Java Serialization (to name just a tiny few).  It’s all about moving data.  That’s what modern programs spend a lot of time doing.  Reading from a database.  Writing HTTP to a socket.  Waiting for JSON from a webpage.  Data comes in, it gets mashed up and spat out again.  And, in that simple process, a lot of amazing stuff happens!
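
For concreteness, here is a minimal Java sketch of that shuffle (fetch some JSON over HTTP, do something with it, spit it out again), written against a hypothetical endpoint; the URL is made up:

    import java.net.URI;
    import java.net.http.HttpClient;
    import java.net.http.HttpRequest;
    import java.net.http.HttpResponse;

    public class FetchJson {
        public static void main(String[] args) throws Exception {
            // Hypothetical endpoint; the point is only that data arrives over the wire.
            HttpRequest request = HttpRequest.newBuilder(URI.create("https://example.com/api/items"))
                    .header("Accept", "application/json")
                    .build();
            HttpClient client = HttpClient.newHttpClient();
            // Read the response body as a string of JSON...
            HttpResponse<String> response = client.send(request, HttpResponse.BodyHandlers.ofString());
            // ...mash it up (not shown) and spit it out again.
            System.out.println(response.body());
        }
    }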

Right, at this point, nothing is controversial.  It’s all pretty obvious.  So, what’s it got to do with object oriented programming?

Big Little Data

Object-oriented programming is supposedly all about data (for now, let’s stick with Smalltalk, C++, Java and C# as our definition of OOP).  Objects encapsulate data, right?  There’s even a problem — the expression problem — which suggests OOP is to data as functional programming is to functions.  Apparently, OOP is about State, Behaviour and Identity.  We want to hide our data in objects to prevent people seeing it, and in case we want a different implementation later.  Polymorphism, or dynamic dispatch, is a powerful mechanism here.  That is, you don’t know much about the object you’re dealing with.  Protection modifiers (public, private, etc.) are somewhat less good at this (think: you need ownership), but still OK.  Yup, reflection breaks this (but that’s not surprising).  But, overall, it works pretty well.  For example, collection libraries.  We want an abstract List interface that can be implemented by different algorithms (ArrayList, LinkedList, CopyOnWriteArrayList, etc.).  Great.  That’s not the problem.
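
For instance, here is a small Java sketch of what that buys you: the code below is written against the abstract List interface and neither knows nor cares which implementation it is handed.

    import java.util.ArrayList;
    import java.util.LinkedList;
    import java.util.List;

    public class Totals {
        // Works for ArrayList, LinkedList, CopyOnWriteArrayList, ...
        // The caller only ever sees the abstract List interface.
        static int total(List<Integer> items) {
            int sum = 0;
            for (int i : items) {
                sum += i;
            }
            return sum;
        }

        public static void main(String[] args) {
            System.out.println(total(new ArrayList<>(List.of(1, 2, 3))));   // 6
            System.out.println(total(new LinkedList<>(List.of(1, 2, 3))));  // 6
        }
    }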

But.  Most programs don’t usually involve writing collection libraries. We use ArrayList and HashMap and, probably, that’s about it.  So, what do programs do a lot of?  Moving data over the wire! And the thing about the wire is that, by definition, it’s not encapsulated.  The concept of encapsulation is backward here because data must be completely visible (in some sense).  Furthermore, it doesn’t make sense to think about “different implementations” of our data objects.  We know exactly what data fields our objects need because that is determined by the wire protocol.
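
To make the contrast concrete, here is a sketch of the kind of “wire object” I mean, mirroring a hypothetical JSON message (the fields are made up).  The protocol dictates the fields, so there is nothing worth hiding and no alternative implementation to leave room for.

    // Mirrors a hypothetical wire format such as:
    //   {"id": 123, "name": "widget", "price": 9.99}
    // The shape of the message, not the program, decides what this class looks like.
    public class ItemMessage {
        public long id;
        public String name;
        public double price;
    }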

That’s it.  OOP is all about hiding your data and leaving your options open.  The wire is all about giving away your data in a well defined format.  These two things are somehow in opposition.  What’s the net effect of this? Friction, of course.  The more we use the network, the more of a pain OOP will become…

13 comments to On the Internet and Object-Oriented Programming

  • Hi Dave. Interesting post… you may be onto something here, although forgive me for quibbling a bit.

    Firstly, standard bugbear: the view of objects that you articulate — data abstraction — is not really what most closely defines O-O, at least not in my view and it’s not just me (if you haven’t already, see William Cook’s “On Understanding Data Abstraction, Revisited”, and Jonathan Aldrich’s “The Power of Interoperability”). Rather than merely hiding representations, it’s more about making them irrelevant, by giving complete primacy to behaviour.

    It’s true that we transfer state over the wire a lot nowadays, but we’re not actually programming by manipulating that wire representation very much of the time — or at least, I hope not! This definitely happens more than it should, but that’s by-the-by (you can use strings for everything in any language… of course there are real practical problems leading people to work this way).

    I actually think objects are a really great abstraction for distributed applications that are “modelling” a “state of the world” — which most web applications are. There is a pretty clear mapping from Fielding’s “resources” (the REST stuff) onto object-oriented concepts, and Smalltalk-style “messaging” is pretty easy to see too. The fact that current web technology doesn’t harness this, and forces us to dig down into representations all the time is something that holds us back.

    Consider for example that in Smalltalk a message send generally has a fixed format (the argument list) but you wouldn’t say that obviates the need for the object abstraction itself. Also, many wire representations are extensible anyway. You can argue that we don’t make much use of that extensibility… which, I’d argue, is because most web applications are so simple, conceptually, that they don’t need it. It’s in the same way that you can get away with procedural coding quite comfortably up to a certain complexity of task. The web makes simple things complex, to the extent that actually complex applications are rare.

    Making state transfer work in a simple yet powerful way in distributed applications is actually, I’d argue, a good target for better meta-level abstractions — which have mostly (but not entirely) been pioneered by O-O languages… more on that another day.

  • Hey Stephen,

    although forgive me for quibbling a bit.

    All good!

    is not really what most closely defines O-O

    Well, I certainly agree that it’s not the only thing that defines O-O. Though I think it is the most important aspect, despite what Bill Cook says.

    Rather than merely hiding representations, it’s more about making them irrelevant

    I’m OK with that view — it’s just a more extreme position that I took in the post.

    I actually think objects are a really great abstraction for distributed applications that are “modelling” a “state of the world” — which most web applications are.

    So I definitely don’t agree with that. In a web application, objects are normally proxies for database rows. The format of the database dictates everything and can certainly outlive the program (i.e. behaviour) itself. Using objects as proxies is not a big drama here, but it does create friction.

    Consider for example that in Smalltalk a message send generally has a fixed format (the argument list)

    No, I don’t think this is a good analogy. There are big differences. The program on the receiving end can, for example, be written in a completely different language or, indeed, in many different languages. There is no concept of polymorphism in this context. There is no concept of information hiding (because everything is exposed on the wire). And even the notion of “object” cannot be assumed to exist for the receiver (e.g. the receiver could just be loading the data into a struct or a string, or printing it to the console).

    The key is that the protocol itself is more important than either the sender or receiver. The protocol is agreed beforehand and oftentimes is fixed in stone by others. There is no point in making this “representation” irrelevant because there are usually no other alternatives. (OK, that is not completely true because we do sometimes abstract over the transport protocol itself).

    Anyhow, I guess that’s roughly my thoughts 🙂

  • Hmm. Clearly we are not understanding each other… I don’t really get your point. Just because we’re programming over a network, why shouldn’t all the usual stuff about interfaces apply? I think it does. And if so, why is the O-O style suddenly not a good way to do your programming? To answer your points as I understand them….

    “Everything is exposed” — just to say it’s on the wire doesn’t mean it’s “exposed” in an information-hiding sense. It might be opaque to the programmer. More below on this.

    “[Protocols are] agreed beforehand… fixed in stone by others” — how is this different to programming against APIs? They’re only set in stone as much as anything in software ever is (i.e. not much). Again, APIs are like that.

    “Multiple languages” — there can always be another language on the other end of whatever interface you’re consuming. It doesn’t say anything about how my (local) half of the interaction is best expressible.

    “No polymorphism” — of what? The heterogeneity inherent in network applications means there can hardly *not* be polymorphism.

    “There’s no alternative representation” — this is confusing the messages (which might well have a fixed wire format) with the implementation of what lies behind the interface (which admits a huge space of implementations).

    “Proxying a row in a database” — if you’re doing that, you’re not following an O-O style. Of course you *can* do it, in the same way you can program procedurally even in an O-O (or functional) language.

    I think the key thing is that I see O-O concepts as having meaning at the design level, not just the language level. That’s why the “language on the other end” is such a red herring. Yes, there are fixed protocols everywhere, but there are fixed protocols in Smalltalk applications. There *is* information hiding, in that a given web service (say) can decide what to include in its messaging interface (what GETs and POSTs it understands, say) and what to keep private. As for polymorphism — like any heterogeneous, interoperable system, the web *is* polymorphic by nature (your “many languages” point is testament to that). Polymorphism in my book means multiple implementations of the same abstraction(s), interacting with the same client code either in different programs (ADT-style polymorphism) or in the same program (object-style polymorphism). The web obviously exhibits the latter.

    You might object that the architecture of the web makes a point of shipping around representations (that’s one of the big ideas of REST, after all). This means avoiding basing state at specific sites, to encourage cacheability, scalability, etc. But there’s no contradiction, because the meaning of the shipped-around bytes is allowed to be opaque… replication is primarily for availability, and is done such that the opaque stuff eventually gets posted back to a server (some instance of some resource) that can act on it. It is even common to ship around behaviour, a.k.a. code (JavaScript) to remotely interpret the opaque state… that is very O-O, conceptually. Exactly how much shipping around goes on is necessarily an application-specific concern, because it affects consistency properties… there are no hard and fast rules about how much opacity or representation transfer to have, and certainly no rule that “everything is exposed”. (Of course “exposed on the wire” and “exposed via a programmatic interface” are different. If you can wire-snoop at the appropriate place, any plaintext in any computer system is “exposed”.)

    Maybe it’s the distinction between “the web as it *must* be” (the inherent nature of distributed applications) and “the web as it currently is” (a big mess of popular technologies and practices). I’m mostly talking about the former and it seems you’re talking about the latter. I’m the first to agree that the experience of programming networked applications is a mess, in any mainstream language. What I don’t want to leave unchallenged is the idea that O-O is inherently a bad fit for web-like applications. It’s not.

  • Hey Stephen,

    I’ll keep it short as I’m currently in the ECOOP keynote this morning 🙂

    just to say it’s on the wire doesn’t mean it’s “exposed” in an information-hiding sense

    I’m saying it does. It means whatever you’re communicating with knows exactly what’s in your object, and may depend on that. We can’t assume both ends are written in the same language, or by the same people, or are part of the same project. Sometimes those things are true and we do have a little more flexibility.

    [Protocols are] agreed beforehand… fixed in stone by others” — how is this different to programming against APIs?

    Well, it’s pretty similar in fact. Once you’ve released a public API … well everyone knows the contents of that API (i.e. the functions which make it up), and arbitrarily changing that API can have bad consequences.

    this is confusing the messages (which might well have a fixed wire format) with the implementation of what lies behind the interface

    So, I don’t think we’re at odds actually. You’re focusing on an abstraction of the wire protocol which the programmer is interacting with. That’s fine, and I’m not saying such an abstraction won’t exist, or is not useful. I’m just saying that the contents of the underlying data are fully exposed by the wire.

    Consider the analogy with a database. You might abstract over different databases (which in some sense JDBC does). But, ultimately, the schema will be the same (or close to) across them. Therefore, in your program, you will have objects which proxy the rows in this schema. You may have some abstraction on top of this. I’m just arguing that those proxy objects (if you like) don’t benefit from OOP. Furthermore, that in most cases, there isn’t much of an abstraction layer. Therefore, a language which aggressively promotes abstraction (such as most OOP languages do) will introduce friction that some will want to avoid (by moving to other languages). Something like that anyway.

  • “Proxying a row in a database” — if you’re doing that, you’re not following an O-O style.

    Well, most web applications do exactly this. That’s why, for example, Rails and SQLAlchemy are so popular.

    To be honest, the only sensible thing to do is to write objects which exactly match the rows in your database. Otherwise, your application will be crippled because it cannot access information which exists. The database itself is far more important than the application. It will almost certainly outlive the application, etc.
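
    As a rough sketch of what I mean, assuming a hypothetical “users” table with id, name and email columns, and plain JDBC:

        import java.sql.Connection;
        import java.sql.PreparedStatement;
        import java.sql.ResultSet;

        // The fields mirror the hypothetical "users" table exactly; the schema,
        // not the program, decides what this class looks like.
        public class UserRow {
            public long id;
            public String name;
            public String email;

            public static UserRow load(Connection conn, long id) throws Exception {
                try (PreparedStatement stmt = conn.prepareStatement(
                        "SELECT id, name, email FROM users WHERE id = ?")) {
                    stmt.setLong(1, id);
                    try (ResultSet rs = stmt.executeQuery()) {
                        rs.next();
                        UserRow row = new UserRow();
                        row.id = rs.getLong("id");
                        row.name = rs.getString("name");
                        row.email = rs.getString("email");
                        return row;
                    }
                }
            }
        }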

  • (your “many languages” point is testament to that). Polymorphism in my book means multiple implementations of the same abstraction(s)

    Yeah, OK fair enough.

  • [Protocols are] agreed beforehand… fixed in stone by others” — how is this different to programming against APIs?

    I think this is really the key thing here. Basically, the data you ship over the wire is part of your API and is analogous to having defined it using public fields (or e.g. in a struct). That’s it. Ultimately, there are two scenarios:

    1) In addition to a load of struct-like objects in your program, there are also a lot of objects layered on top which have many possible different implementations (i.e. will benefit from abstraction).

    2) Your program is mostly made up of struct-like objects along with various functions that transform them.

    All I’m saying is that scenario (2) is way more common than you might think and is essentially the most common scenario.

    Even for me as someone who builds compilers (which is a pretty specialised domain) you might think I’m in scenario (1). There certainly are some abstractions in my compiler (e.g. for name resolution, subtype checking, etc). But, actually, it is only after a long time that I’ve realised I’m actually in scenario (2). An OOP language is hindering me, not helping me.
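
    As a rough sketch of the difference, with made-up names, scenario (2) is struct-like data plus plain functions that transform it, rather than behaviour hidden behind an interface:

        // Scenario (2): a struct-like object...
        class Point {
            int x;
            int y;
        }

        // ...plus plain functions that transform it.
        class Transforms {
            static Point translate(Point p, int dx, int dy) {
                Point q = new Point();
                q.x = p.x + dx;
                q.y = p.y + dy;
                return q;
            }
        }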

  • We can view web services (in the most general sense) as objects and requests as messages. Responses can be either dumb values (e.g. text) or references to other objects (links to other services). On this macro level we also have representation hiding, polymorphism, etc.

    On all levels we occasionally need to extract (and export) pieces of dumb data from some active entity. When we need to communicate more aspects of this entity, we can choose to either export more attributes (giving up data hiding), export the code (this gets messy with stateful code), or publish it as a service (return the reference to the object or the link to the web service).

    I think most OOP-related friction stems from two conflicting wishes. On the one hand, we desire coherent systems with a lot of shared rules and knowledge (allowing easy communication inside the system). On the other hand, we want to make systems modular and composable while promoting independence and different implementation approaches. I don’t see how different independent systems (OS processes, companies, persons) could communicate without exporting their internal knowledge into some dumb(er) common format.

  • Hey Aviar,

    I don’t see how different independent systems (OS processes, companies, persons) could communicate without exporting their internal knowledge into some dumb(er) common format.

    Yeah, I think this is roughly what I’m trying to say!

  • Surely this debate is now moot? Purescript, React, Mithril, & Angular are all aggressively functional rather than object-oriented. Duplication of state by pushing it down a wire requires that state to be immutable, because otherwise all the update anomalies we know so well from pre-relational databases come back with a vengeance. The alternative O-O model (Millerite things like E, Joe, Waterken etc) that rely on messaging back to distributed objects will always be less efficient for small data items… and we know almost all data are small. It’s simpler, quicker, and easier to just send the state itself, rather than send updates back to either a single point of failure (bad) or some kind of federated eventually consistent object store (worse).

    OO works where data is both procedural and extensible, i.e. coalgebraic. Widget sets, not even collections, are the paradigmatic example. No-one writes more collections (outside a data structures class) but people used to want more and more widgets. Now even “widgets” are HTML, just another fixed data format.

  • All I’m saying is that scenario (2) is way more common than you might think and is essentially the most common scenario. … But, actually, it is only after a long time that I’ve realised I’m actually in scenario (2). An OOP language is hindering me, not helping me.

    If you believe that, our planned curriculum (incl SWEN225) is not just misguided but actively unethical. I guess 225 (along with 324) goes to Whiley then?

  • Hey James,

    Surely this debate is now moot?

    Well, I’m not sure. Are you for or against the hypothesis here? I think you’re saying that “OO has its place”? i.e. when you say this:

    OO works where data is both procedural and extensible

    I definitely don’t disagree with that. The point of the post though is that the world is moving away from this as the default. Like you said, widgets are just HTML/CSS/JavaScript now.

  • If you believe that, our planned curriculum (incl SWEN225) is not just misguided but actively unethical.

    Well, make no mistake, I definitely do believe it. But, that doesn’t mean all of my colleagues agree with me, or that I believe I could persuade them in any sensible timeframe. Also, frankly, teaching students one of the most widely used languages is hard to argue with! I’m definitely not going to try suggesting Whiley here 🙂
