Programming is hard, and I recently had a bit of back and forth with a friend who thinks the fairly common belief amongst software people that “some folks will just never get it” is wrong. Since this is a belief I find at least fairly plausible, his arguments got me thinking. In particular, he pointed at this paper on a model of learning that seems to explain the bimodal distribution of performance in first year computing courses.
The first point is that the bimodal distribution of marks in first year computing (CS1, in the paper’s language) appears to be real and uncommon. That is, comparable courses in physics, psychology, chemistry, calculus, biology and so on don’t have the same distribution. This suggests that there is something about programming that makes it relatively easy for some and relatively hard for others.
The question is: what is this thing, and can we figure out how to make it sufficiently easy for everyone? By “sufficiently easy” I mean “as easy as reading is”, say. Almost everyone can be taught to read well enough to get by in the modern world. Can almost everyone be taught to code well enough to get by, where “get by” means, say, being able to write short programs that manipulate a little data? (I see this very commonly in my work, where otherwise capable people get stuck on simple data-processing tasks because they don’t know enough Python to get by.) I’m not talking about super-complex algorithms, just combining a couple of comma-separated files into one and maybe taking the mean of one of the columns, or similar.
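For concreteness, here is a sketch of the kind of “get by” task I mean: combine two comma-separated files with the same columns and take the mean of one column. The file contents and column names here are invented for illustration, and I use in-memory stand-ins so the example is self-contained.

```python
import csv
import io

# Stand-ins for two small CSV files; in practice these would be open() calls.
file_a = io.StringIO("name,score\nalice,80\nbob,90\n")
file_b = io.StringIO("name,score\ncarol,70\n")

# Combine the two files: each row becomes a dict keyed by the header.
rows = []
for f in (file_a, file_b):
    rows.extend(csv.DictReader(f))

# Take the mean of one of the columns.
mean_score = sum(float(r["score"]) for r in rows) / len(rows)
print(len(rows), mean_score)
```

Nothing here is past the first few weeks of any reasonable introduction to Python, which is rather the point.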
The next bit of this post is going to be a running analysis of the above-linked paper, written as I read it. I will then follow up with some ideas of my own, in particular the “barrier to entry” (BTE) hypothesis, which says coding depends on “getting” one very simple concept, that of the algorithm, which is arguably beyond a significant fraction of the population, and demonstrably alien to human ways of thinking. I will run a little model to demonstrate (or not) that this hypothesis also reproduces the bimodal mark distribution seen in CS1. Although the BTE hypothesis is not unrelated to the “learning edge momentum” (LEM) hypothesis of the paper, it has somewhat different characteristics, doesn’t depend on the same claim about programming concepts, and may or may not imply that some people will “never get it.” It should at least clarify what they don’t get, and allow an experiment to be performed that will shed some light on whether “not getting it” is innate or can be overcome.
Part 2 of the paper discusses the persistent failure to find measurable characteristics of people that correlate well with “programming ability” as measured either by job performance, marks in CS1, or some other measure (it is not always clear what the final assessment of subjects’ “programming ability” is based on.)
Curiously, this search for correlates includes completely bizarre instruments such as the “Myers-Briggs” scale, which uses categories that are little better than astrological in their specification, being neither mutually exclusive nor jointly exhaustive, and apparently based on some strange Romantic notion of feeling versus thinking.
The logic of this search seems to be that if there is an innate property of individuals that determines their programming ability, it should also determine other abilities, and therefore show up in such alternative measures. It is not clear why this should be so. An ability that depends on the smooth inter-operation of multiple, otherwise unrelated, “atomic” capabilities may not show up on such simple tests.
I am going to argue below that programming does in fact depend on a small number of abilities, but also on the interaction of those abilities in real-world environments, and it may well be that in a testing environment the important differences won’t show up. So the lack of our ability to find an alternative measure or predictor of programming ability does not in any way speak to its innateness or lack thereof.
We don’t have any good measures of artistic ability, so far as I know, yet few people would claim that everyone is equally capable of creating art. Although equally, few people would claim that an artist can become great without steady, hard application. My claim that talent exists and is important is not intended to be determinative: a less talented person may outperform a more talented one by hard work and mindful practice, but they will have to do more of that than a more talented person would.
To be even more clear about what I am saying about innate ability: there is no basis for the claim that innate ability is bimodally distributed. There is no capability of any kind that is not distributed on a continuum across humans. This means that the bimodal mark distribution in CS1 must be the result of a threshold effect of some kind. People who fall below the threshold fail; people who exceed it get high marks. With sufficient application or appropriate pedagogy, many of the people who fail should be able to pass. The question is how the amount of effort they have to put in scales with the level of “innate” ability.
Or let’s back off even further, and call this “ab initio” ability. For it is clearly and uncontroversially true that CS1 has a bimodal result distribution. Absent evil instructors who simply fail half their students regardless of performance, there must be something about these students, whether due to past history, developmental stage, genetics or magic, that distinguishes them from their more successful peers. The failure to find correlates that identify this population says nothing about the existence of differences, because clearly such differences exist, regardless of whether or not they are innate.
The apparently popular suggestion that to identify the cause of a dramatic, local, obvious weirdness in CS1 marks we must move to multi-institution, multi-national studies is risible. The effect is clearly obvious in single classes of a few hundred students taught in a single year at a single institution. It is implausible that a multi-centre study would be required to get at the cause. That this seems to be a preferred research direction, rather than stepping back and doing a natural history of educational failure in CS1 (that is, sitting down with each student and discussing their experience in depth, rather than designing blind instruments and hoping they hit the unknown target) points to one source of the problem in CS1 education. It may be that the nature of the problem has been misconstrued, and a deeper dive into the whole issue seems to be a better direction to take than extending the statistical reach of tools that have been shown to not adequately explain most of the variation in CS1 outcomes.
The author points out that the bimodal distribution of marks is a “big effect” but then suggests it ought to have a “big cause”, which is superficially similar to what I wrote immediately above, but really isn’t. In particular, my critique in the paragraph above is of the claim that tools that won’t work on a population where the effect is large should be extended to studies that cover much larger populations. Instead, I want to look at the problem more closely, and ask (as the author does) if the source of the bimodal distribution is in the population, or in the subject.
I may be anticipating the result of the paper a bit here, but consider the following analogy: getting over an 8 foot high wall in an obstacle course. This is a difficult obstacle to surmount, and any population whatsoever will immediately have a bimodal distribution of success: some will succeed, some will fail. You can’t get much more bimodal than that, because we are looking at a single task that has a crisply defined binary outcome: the program either runs correctly, or it does not… I mean, the subject either gets over the wall, or they do not.
If we go looking for a “single innate source” of this bimodal distribution of outcomes, we won’t find one. Tall people will do a bit better, but they also have to be strong, and the amount of strength they need will depend on how much they weigh. Short strong people may do better than tall heavy people. If we look at any single factor we might well find a mediocre correlation with measured outcome, as we do in CS1. This does not mean that there is no abstract surface in the space of strength/height/weight that gives us a fairly strong predictor, dividing good jumpers from bad. But there is no discontinuity in any of the underlying “simple” characteristics. It is the task that creates the bimodal distribution, not the population characteristics.
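This is easy to demonstrate with a toy simulation. Below, three continuous traits (all the distributions and the combining rule are invented for illustration) feed into a sharp binary outcome. Each single trait correlates only moderately with success, while the combined score predicts it strongly, even though nothing in the population is discontinuous.

```python
import random

random.seed(2)

def corr(xs, ys):
    # Plain Pearson correlation, no libraries required.
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    return cov / (vx * vy) ** 0.5

height, strength, weight, combined, over = [], [], [], [], []
for _ in range(5000):
    h = random.gauss(0, 1)   # standardized height
    s = random.gauss(0, 1)   # standardized strength
    w = random.gauss(0, 1)   # standardized weight
    score = h + s - w        # the "surface" combining the traits
    height.append(h); strength.append(s); weight.append(w)
    combined.append(score)
    over.append(1.0 if score > 0 else 0.0)  # binary: over the wall or not

# Each trait alone is a mediocre predictor; the combination is strong.
print(round(corr(height, over), 2), round(corr(combined, over), 2))
```

The outcome column is as bimodal as it gets (everyone either clears the wall or doesn’t), yet any single-trait study of this population would report only modest correlations.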
And yet: you cannot make a short person taller. There are some characteristics of the people involved in this task that are fixed in their nature, and such that people who are at the lower end of the scale are going to have to work harder to succeed in the task than their fellows at the higher end.
Only one characteristic among many has to have this property to create two sub-populations: those below some critical height, who are going to have to work exponentially harder as their stature drops below the threshold, and those above, who are going to find the task increasingly trivial.
Telling people in such a population that everyone is born equally able to do the task is cruel and insulting, as it implies that those who are struggling with it are simply not working hard enough.
From a learner’s point of view, one goal in exploring any new field should be to ask yourself, “Does this field require attributes rather than skills in some areas, and if so, are those attributes ones I can change relatively easily (like weight or strength) or hardly at all (like height)? And if I have identified that I am deficient in an attribute that makes success in this area easier, do I have the commitment to put in the work and develop the technique that will make up for that?”
This does not mean “some can’t”, but it does mean “some may have to work far harder to achieve the same level of success others achieve without much effort.”
The meaning of the claim “programming concepts are tightly integrated” in section 4 is curiously at odds with the learning momentum model introduced in section 3. The model in section 3 assumes that each module in a course is significantly dependent on the preceding module and nothing else. The claim in section 4 is taken to mean that all basic programming concepts depend on each other.
The linear one-way model of section 3 seems appropriate to physics or mathematics, where no bimodal distribution of first-year marks is typically observed. It certainly does not describe the everything-depends-on-everything-else claim of section 4 regarding computing.
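For readers who have not looked at the paper, here is a rough sketch of the kind of linear, momentum-style model section 3 describes: each module mostly depends on the one before it, and success or failure on a module nudges the odds of learning the next one. All the numbers below (module count, starting odds, nudge size, clamps) are my own invented parameters, not taken from the paper, but any reasonable choice produces the same qualitative result: the marks pile up at the two extremes.

```python
import random

random.seed(3)
N_STUDENTS, N_MODULES, NUDGE = 10000, 10, 0.1

marks = []
for _ in range(N_STUDENTS):
    p = 0.5                      # odds of learning the next module
    learned = 0
    for _ in range(N_MODULES):
        if random.random() < p:
            learned += 1
            p = min(0.95, p + NUDGE)   # success makes the next module easier
        else:
            p = max(0.05, p - NUDGE)   # failure makes the next module harder
    marks.append(learned)

low = sum(1 for m in marks if m <= 2)
high = sum(1 for m in marks if m >= 8)
# The tails are far fatter than a fair-coin (binomial) model would give.
print(low, high)
```

Note that this mechanism generates bimodality from pure chance plus momentum, with no individual differences at all, which is exactly the feature of the model I take issue with below.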
Nor do I believe the section 4 claim is particularly accurate. It is true that teaching a programming language may require half a dozen concepts to be grasped before the first program runs, but teaching programming is not teaching a programming language. And different languages have radically different requirements for new learners. Python, for example, has practically none at all. The simplest useful Python program is:
print("Hello world")
Conceptually, if you can type that into a text editor, save it to first_program.py and type:
python first_program.py
at a command prompt, you can write a program, and everything else you need to learn can be introduced incrementally in linear order with one-way dependencies the way we try to do in first year calculus, linear algebra or physics.
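To illustrate what such a linear ordering might look like (this particular lesson sequence is just my own invention, not a claim about any actual curriculum), each step below adds exactly one new concept that depends only on the ones before it:

```python
# Lesson 1: a program is text that the machine executes.
print("Hello world")

# Lesson 2: a name can be bound to a value (variables), building on lesson 1.
greeting = "Hello world"
print(greeting)

# Lesson 3: repetition (a loop), building only on lessons 1 and 2.
for i in range(3):
    print(greeting, i)
```

Each lesson runs on its own, and nothing in lesson 2 requires lesson 3: a one-way dependency chain, exactly as in the section 3 model.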
If you are teaching anything else as a first program, or using a language that requires any more concepts than that, you are doing it wrong.
Yet so far as I know, CS1 classes using Python as their language of choice have the same distribution as those using Java, FORTRAN, Perl or brainfuck. I’m pretty sure we would have heard about it, otherwise.
Curiously, I regard the multi-connectedness of computing concepts that the author touts as being in opposition to his hypothesis: if computing concepts really were multi-connected in the way he claims, there would be many opportunities for students to recover from mis-steps. And I believe this is actually the case in subjects like physics and math, which are more multi-connected than computing, which in my view is far more linear and hierarchical than most subjects.
In first year mechanics we teach kinematics first. That involves four concepts: position, time, velocity and acceleration, of which only the first two are independent. We then teach dynamics, which adds the concept of force, as well as two specific forces, gravity and friction. Dynamics is almost wholly dependent on kinematics, so if you don’t get kinematics, you won’t get dynamics. Motion in two dimensions and orbital motion are two common advanced topics, and again are entirely dependent on preceding concepts. This fits very well with the model introduced in section 3, yet we do not see bimodal mark distribution in first year physics.
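The kinematics chain above can even be written as a few lines of code. Position and time are the independent concepts; velocity and acceleration are defined from them, and for constant acceleration the closed forms are v = v0 + a·t and x = x0 + v0·t + a·t²/2. The tiny step-by-step integration below (my own illustration, with arbitrary initial values) rebuilds the same motion directly from those definitions:

```python
x0, v0, a = 0.0, 2.0, 9.8   # initial position, initial velocity, acceleration
t_end, dt = 3.0, 1e-4       # total time and step size

x, v = x0, v0
steps = int(round(t_end / dt))
for _ in range(steps):
    x += v * dt             # position changes at the rate given by velocity
    v += a * dt             # velocity changes at the rate given by acceleration

closed_form = x0 + v0 * t_end + 0.5 * a * t_end**2
print(round(x, 2), round(closed_form, 2))  # the two agree closely
```

Every line depends only on concepts introduced before it, which is the point: the dependency structure is a chain, not a web.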
So I a) disagree with the author’s contention that computing concepts are tightly multi-connected, and b) disagree that his characterization of computing concepts as tightly multi-connected makes them a suitable instance for his simple model and c) contend that other subjects, such as first year physics (which I have taught) are better suited to his simple model, yet do not exhibit bimodal mark distributions predicted by that model.
In particular, the claim “If we fail to acquire some concepts from a given domain we lack the structure within which to set further concepts, and learning becomes harder” seems to me to be unrelated or even counter to the (false) claim that computing concepts are highly structured with a dense web of inter-connections. Failure to learn one concept will not disrupt such a dense structure nearly so much as failure to learn one concept in a linear, one-way dependent structure like we see in physics (and computing.)
Likewise, the claim that when teaching programming “The edges of the puzzle pieces are sharply defined, there is only one correct place that each new piece can fit” seems very much at odds with the claim, with regard to teaching programming, that “There is, I suggest, no right place to start, and no correct ordering of topics, because a programming language is a domain of tightly integrated concepts, where almost every concept depends on many others.”
The analogy to the puzzle is a tempting one, but again: there is nothing unique to programming about this.
The assertion “There is no programmer gene!” is irrelevant. There is no gene for height, either, yet relatively few people would claim that height is not a more-or-less fixed attribute of individuals, and anyone who set out to improve wall-jumping prowess primarily by altering the height of learners with some suitably Procrustean technology would rightly be considered odd.
This is not a debate about whether or not there is a gene for programming. This is a debate about whether there is anything we can do for those adult human beings who enter CS1 and seem unable to get over the barrier, despite their and our best efforts to enable them to do so. If they lack a learnable skill, we can help them, or they can help themselves. If they lack a more-or-less fixed attribute, there is much less we can do, and insisting that “everyone can code” starts to look like the claim “everyone can get themselves over an eight-foot wall unaided”.
At the end of section 4, the author makes the claim that the learner population effectively bifurcates during the course of the course, and ends up as two different populations of learners. If this is the case, it is more-or-less a matter of chance who ends up in which population. But this is not consistent with the observation–reported earlier in the paper–that people with poor outcomes may invest more time in their course-work. People who are doing poorly in any course typically work hard, but in every other course they catch up. The author’s claims about the unique nature of computing do not explain in the least why, having failed to successfully learn how to type
print("hello world")
and
python first_program.py
in the first class, students are less able to pick up the ideas from their peers or other sources before learning the computing equivalent of motion under constant acceleration in the next class.
At the beginning of section 5 the author reasserts the claim that computing concepts cannot be understood except with reference to each other, and suggests ways that this claim might be tested. I believe this is worth doing, especially in comparison with other first-year topics, and expect that computing will be found to be if anything less interdependent than first year mechanics.
In short, the paper does not make a convincing case that learning outcomes in first year computing are due primarily to the subject matter rather than the properties of the learners. It is certainly the case that there is something about programming, specifically, that makes it different. But it is important in these issues to lay out in a precise way what you are arguing about, and the author doesn’t do that, to the extent that by the end of the paper they are arguing that “some can, some can’t” is actually true–but only after the course is complete (which is a tautology, I guess?)
Furthermore, the notion that no one has tried intense remedial education for students who fall behind rapidly seems unlikely. It is an obvious thing to do, and if it worked we wouldn’t still be asking these questions 40 years on.
My own belief–unencumbered by much data, mind, so not having a particularly high absolute plausibility–is that programming success depends on a fairly small number of skills and attributes, but we don’t know which ones are skills and which ones are attributes, and we do a lousy job of teaching them in any case.
- Attention to detail. I’m copy-editing a novel right now. Want to know how many mistakes I make per page (which is about 690 words, for complicated reasons)? About 3. That’s an error of some significance every 200 words. My novel would not compile.
- Ability to follow instructions precisely. I once was on a philosophy mailing list where members were solicited for art recommendations. We were asked to submit in a very specific format. Out of the hundred-odd respondents, two people followed the instructions (one was the person compiling the list; the other one was me.)
- Algorithmic orientation. Breaking down tasks into repeatable steps is utterly un-natural, almost anti-human. My evidence for this is that the industrial revolution got off the ground in 1750 CE, not 1750 BCE. That’s the whole trick that Wedgwood and others invented: replace a holistic approach with an algorithmic approach. Not natural.
- Representational intelligence. Programming depends on keeping track of what names represent. This is even more extreme in languages that allow for pointers, values and references, but even in Python you have to remember what every name represents in every context.
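A small concrete example of the name-tracking burden, even in Python: two names can refer to the same underlying object, and the programmer has to keep that mapping straight in their head at every point in the program.

```python
a = [1, 2, 3]
b = a          # b is another name for the same list, not a copy
b.append(4)
print(a)       # a has changed, because a and b name one shared object

c = a[:]       # a slice builds a new list
c.append(5)
print(a)       # a is unchanged this time; c names a different object
```

Nothing on the screen distinguishes the two assignments visually; the difference lives entirely in what the reader tracks mentally.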
That’s just a first pass at what I think is important, and again, if any of those things is a more-or-less fixed attribute rather than a learnable skill, there are some people who will always have to work much, much harder to code. In particular, I think the latter two are places to look for non-malleable individual differences. Humans have a hell of a hard time putting steps in order, and dischronia or temporal disorientation is a symptom in a number of fairly common neurological disorders, including dyslexia and various common dementias, which suggests our time-ordering ability is relatively fragile, and therefore may exhibit a relatively wide variation within a population.
This is the “Barrier To Entry” hypothesis: that a lack of one or more of these skills or attributes (most likely the ability to break tasks down into step-wise, ordered pieces, and secondarily the ability to track the referential meaning of various groups of characters on the screen) is what separates programmers from non-programmers.
And like wall-climbing, the odds of any single one of these characteristics being a more than mediocre predictor of success in CS1 are low, but some combination of them may well have more predictive power. If they do, then it is an entirely separate question whether they are skills or attributes. The two issues really have nothing to do with each other.
[2014-02-17: minor edits for clarity. As pointed out in the text, I make a lot of errors when writing.]