Generalized Concrete Nonsense, or Why We Teach Factually Wrong Things
I have been attending university for almost three years now, and during this time I have watched a worrying pattern emerge in the Hungarian educational system. It is not only at our faculty that this scary phenomenon rears its ugly head. I have read several pieces of learning material (textbooks, course handouts, presentation slides and the like) released by a couple of different, highly reputed Hungarian universities, and almost all of them seem to suffer from the same serious class of problems.
Up until now, I could put up with these problems, although I was quite annoyed. Today, however, I had the bad luck to come across a set of lecture slides of abysmal quality, an experience that made me unable to keep quiet any longer.
So, what is the problem I am talking about? It is really very simple, yet totally unexpected and shocking: a lot of university course material, at least in the field of information technology and computer science, contains a significant number of factually wrong assertions, presented as facts to unsuspecting students.
In fact, “significant” puts it very mildly. The lecture slides I encountered today contain several glaring errors, frequently two or three per slide. In order to illustrate the seriousness of these mistakes, I am quoting some actual phrases, definitions and assertions found in the slides in question (translated into English, since the original material is written in Hungarian).
I will try to rationalize some of the mistakes, look a bit at their causes and suggest corrections. One of my friends always suggests that I should just report the mistakes to the teachers, which I usually try to do, but that only works when a teacher is willing to admit the mistake and accept the correction. Unfortunately, that is not always the case; furthermore, someone who composes an entire lecture with several dozen major errors in it may have such a different worldview that it becomes impossible to discuss with him or her even the most obvious mistakes, because he or she will tend to deny all of them unconditionally. (My experience with this kind of teacher and teaching assistant tells me that they often do so in the name of authority, which is even more unfortunate.)
So, let me enumerate some of the offending pseudo-facts below.
1. The Overall Confusion
This is more of a general issue, illustrated by a specific example. The teacher responsible for this particular course is notoriously picky about using exactly correct terminology (one can fail his/her exams by calling an array-of-arrays a “two-dimensional array” in C++), yet sometimes fails to define and use terminology even consistently, let alone correctly. Take, for example, the description of the difference between the syntax and semantics of a programming language. Starting with generative, context-free grammars, the lecture slides show that in a subset of a natural language (Hungarian in this case), it is possible to derive “grammatically correct” sentences (i.e. ones that have a valid Abstract Syntax Tree), some of which are meaningful and others nonsensical. Specifically, two examples are given:
- “The naughty boy dropped the plate.” – grammatically correct (has an AST) and meaningful
- “The plate dropped the naughty boy.” – grammatically correct (has an AST) but not meaningful.
Then the slide concludes that “The solution is hard and problematic, so let’s consider artificial languages instead of natural languages!”.
This is wrong on two closely related levels. First, the problem of checking whether a sentence is semantically meaningful is entirely unrelated to it being in a natural or an artificial language. After the sentence has been matched against the grammar, exactly one thing is known about it: that it is syntactically valid. In a context-free grammar (the only type of grammar this particular slide is concerned with), syntactic validity is a necessary but not sufficient condition of semantic validity regardless of the language, so conflating this with the fact that natural languages, by the way, tend to be harder to analyze than most artificial languages is an error and a sign of confusion. Second, as a consequence of this orthogonality, artificial languages are not necessarily easier to analyze (either syntactically or semantically), since one can design an artificial language which is, for example, highly context-sensitive. In fact, some programming languages, for example C++ and Perl, are well known for being difficult to parse correctly.
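To make the same distinction concrete in an artificial language, here is a minimal C++ sketch of my own (not taken from the slides): both statements below are derivable from the C++ grammar, yet only one of them survives semantic analysis.

```cpp
// Both statements parse, i.e. they are derivable from the C++ grammar.
// Only the first one also passes semantic analysis (type checking).
int main() {
    int plate_count = 3;          // syntactically valid AND well-typed

    // int boy_count = "naughty"; // syntactically valid (a declaration with
                                  // an initializer), but uncommenting it
                                  // yields a *semantic* error, not a parse
                                  // error: a string literal cannot
                                  // initialize an int
    return plate_count;
}
```

The parser is perfectly happy with both lines; it is a later, semantic phase of the compiler that rejects the second one.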
2. The Oversimplification
Regardless of whether it is overgeneralization, oversimplification, or a mere, honest lack of understanding of real-world language processing tools that let the following assertions into the course material, they are still incorrect:
Definitions of a programming language:
- formal syntactic description: this is what compilers use
- formal semantic description: for checking consistency of meaning
This somehow implies that compilers do not use formal semantics for semantic analysis, which is blatantly false. Several functional languages, for instance, are backed by a solid theoretical foundation, and their implementations follow the formal semantics of, say, the type system very closely. Not to mention languages suited for theorem proving, which actually perform, well, rigorous theorem proving as part of their operation.
The rampage then continues with statements like:
- “lexical analysis can be specified using a regular grammar”
- “syntactic analysis can be specified using a context-free grammar”.
Not only is this trivially false in the general (and frequent) case, but it also instills a false sense of security in students (who mostly know nothing about language theory at this point; that’s why they are attending this class in the first place) that there are no hard problems in syntactic analysis. Are we still surprised that parsers for arbitrary protocols continue to be a rich source of hard-to-detect, long-lived, often security-critical bugs?
(Incidentally, these kinds of statements also make me question whether the author has ever written an actual lexer or parser…)
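As a concrete illustration of why a context-free grammar alone does not cut it for C++, consider the classic “a * b;” ambiguity; the sketch below is my own, not from the course material. The parser cannot even decide what kind of construct it is looking at without consulting the symbol table, which is semantic information.

```cpp
// The token sequence "a * b;" cannot be classified by a context-free parser
// alone: its meaning depends on what the name 'a' refers to.

struct a {};              // here 'a' names a type...

void declares() {
    a * b;                // ...so this is a declaration: 'b' is a pointer to 'a'
    (void)b;
}

void multiplies(int a, int b) {
    a * b;                // here 'a' is a variable, so the very same token
                          // sequence is an expression: a multiplication whose
                          // result is simply discarded
}

int main() { declares(); multiplies(2, 3); }
```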
3. The Hand-Wavery
Now we have reached my least favorite topic for bike-shedding: the question of “strong typing” versus “weak typing”. These two hand-wavy expressions are the source of endless, heated, meaningless arguments steeped in politics, flamewars, personal attacks and numerous other fallacies. So, in my opinion, they shouldn’t even appear in any sort of official teaching material. Academic works should only use terminology with a precise meaning, preferably terminology over whose meaning a general consensus has been reached. “Strong” and “weak” typing do not constitute such terminology.
Ironically, pickiness about terminology doesn’t mean that the aforementioned teacher does a good job of explaining ambiguous words. In fact, he/she asserts that…
A language is strongly typed, if the type of every value, object, formal parameter and function can be unambiguously determined at compilation time.
(There is no corresponding definition of “weak typing”, by the way.)
The above definition is precisely that of static typing, which he/she then (correctly) (re-)defines in the subsequent paragraph:
Typing is static, if type checking is performed at compilation time, while dynamic typing means that types are checked at run-time.
However, these correct definitions are immediately undone by having them conflated and twisted once again:
This is the reason why strong typing and static typing are entangled.
Again, this is only true if we define “strong typing == static typing”, but unless we are willing to make that step, it is simply false. I can only guess – based on my experience hearing other programmers trying to explain type systems to one another – that “strong typing” in Hungarian terminology is often mistaken for “does not allow or discourages implicit conversions”, likewise “weak typing” is conflated with “allows or encourages implicit conversions”. If we now substitute these definitions into the sentence above, it still fails to represent the truth, which is that static vs. dynamic typing and the presence of implicit type conversions are two orthogonal concepts, and there are statically-typed languages with many implicit conversions (e.g. C), just like there exist dynamically-typed languages with no or few implicit conversions (for instance, Python or Sparkling).
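To illustrate one half of that orthogonality, here is a small C++ sketch of my own: every type below is known at compile time (static typing), yet the language happily performs implicit conversions all over the place.

```cpp
#include <iostream>

int main() {
    char   c = 'A';
    double d = c + 0.5;   // char -> int -> double: implicit conversions,
                          // all resolved at compile time by a static type system
    bool   b = d;         // double -> bool: yet another implicit conversion
    int    i = 3.99;      // double -> int: implicit and value-truncating

    std::cout << d << ' ' << b << ' ' << i << '\n';   // prints "65.5 1 3"
}
```

The other half of the orthogonality is covered by the dynamically typed languages with few implicit conversions mentioned above.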
4. The Incorrect Implementation Detail
One of the practices I find most destructive in teaching is unnecessarily leaking implementation details into the description of an abstraction, effectively rendering the abstraction useless, because students won’t be able to look past the One True Implementation they have been taught. This is usually done in the name of “simplification”, which is an invalid argument, because saying (wrongly) that “abstraction X is implemented using technique Y” isn’t any simpler or less confusing than stating that “abstraction X is usually implemented using technique Y, but it can be, and sometimes is, implemented using Z instead.”
I have encountered two specific examples. The first one is “the stack”. More specifically, one slide says that:
Non-static local variables are stored on the so-called stack.
Again, this is wrong on two levels. First of all, most languages do not require that there be a stack at all. For instance, C has perfectly well-defined semantics for automatic local variables, yet the word “stack” isn’t to be found in my copy of the C99 standard. Not once. Of course, C compilers usually place some local variables on the stack, but what about variables for which registers are allocated? What about an interpreter that puts locals on the free store rather than the stack, because that is more convenient to implement? Secondly, what about languages that use closures extensively, whose compilers emit code that allocates memory dynamically for captured locals?
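Here is a small C++ sketch of my own illustrating that last point: the captured copy of a local variable outlives the invocation that created it, so it clearly cannot live in that invocation’s stack frame (and with std::function, implementations commonly heap-allocate larger captures).

```cpp
#include <functional>
#include <iostream>

// Returns a closure that captures the local 'counter' by value. The stack
// frame of make_counter is long gone by the time the closure runs, so the
// captured state has to live somewhere else: inside the closure object,
// which std::function implementations often place on the heap.
std::function<int()> make_counter() {
    int counter = 0;                                   // a non-static local variable
    return [counter]() mutable { return ++counter; };
}

int main() {
    auto next = make_counter();
    std::cout << next() << next() << next() << '\n';   // prints "123"
}
```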
Yet another memory-related implementation detail is that “static variables are always allocated in the so-called static (main) memory segment.” Apart from failing to define what is meant by “the main memory segment” (which is not universally accepted terminology as far as I know), it even explicitly states that static variables are “always” placed there. Again, this is trivially proven false by considering e.g. immutable statics that get inlined, or statics that the compiler optimizes away for some other reason.
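A quick sketch of my own for the “optimized away” case; the claim about what the compiler does is an assumption about typical optimizing compilers, and it is easy to verify by inspecting the generated assembly (e.g. with g++ -O2 -S):

```cpp
// 'answer' is a static whose address is never taken. Typical optimizing
// compilers fold it into the literal 42 at every use and emit no storage
// for it in any data segment whatsoever.
static const int answer = 42;

int scaled(int x) {
    return x * answer;   // typically compiles to "x * 42": no memory load
}

int main() {
    return scaled(1) == 42 ? 0 : 1;
}
```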
5. Simply Wrong
The penultimate mistake I am considering is a plain old misdefinition. It goes like this:
Lisp features recursion, dynamic memory allocation and deallocation (garbage collection).
This statement simply conflates dynamic memory allocation and deallocation with garbage collection! Brrrrr! This is horrible; there is simply no excuse for it, especially when an alleged teacher of computer science, or any programmer, writes down such a horrendous lie.
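The two concepts are easy to pull apart: dynamic allocation means requesting memory at run time, while garbage collection is just one particular strategy for reclaiming it. A minimal sketch of my own in C++, a language with dynamic allocation and no garbage collector at all:

```cpp
#include <iostream>
#include <memory>

int main() {
    // Dynamic allocation without any garbage collection: memory is requested
    // from the free store at run time...
    int* raw = new int(42);
    std::cout << *raw << '\n';
    delete raw;                        // ...and reclaimed explicitly.

    // Or reclaimed deterministically by a destructor; still no GC involved.
    auto owned = std::make_unique<int>(7);
    std::cout << *owned << '\n';
}   // 'owned' releases its allocation here, at a well-defined point
```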
6. Too Little Information
As a bonus, let’s see a fresh shot of ignorance:
Languages use two types of subprograms: these are called procedures and functions.
How so? There are languages in which the only kind of subprogram is the function, be it pure or side-effecting. There are other languages that employ further kinds of subprogram, the most obvious being syntactic macros: proper units of code, executed at compilation time, that generate code. Would they not count as subprograms in the vocabulary of this teacher, even though they are a means of structuring, decomposing and reusing one’s code, they are evaluated or executed by the compiler and/or the runtime, and they are textually part of the source code?
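C++ itself is a counterexample in one direction; the sketch below (my own) contains no “procedure” anywhere, only functions, one of which isn’t even restricted to run time.

```cpp
#include <iostream>

// C++ has no separate "procedure" construct: a subprogram that returns
// nothing is simply a function whose return type is void...
void greet() { std::cout << "hello\n"; }

// ...and a constexpr function can be executed by the compiler itself,
// blurring the neat two-kinds taxonomy even further.
constexpr int square(int x) { return x * x; }
static_assert(square(4) == 16, "evaluated at compilation time");

int main() { greet(); return square(0); }
```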
Now that you can see some of the things Hungarian programming and computer science students are taught, I can only ask you: how would it be possible to get rid of these mistakes, if at all? What will their effect be on the future generation of programmers? How can we educate those who will be struggling to get out of the trap of low-quality education that they have involuntarily fallen into?
The only thing that remains is hope.
Seeking 100% truth is a worthy goal. However, I think there are valid reasons for compromise.
In my first few years of school, I learned that it is not possible to subtract 5 from 3. A few years later, it became possible. Similarly, I couldn’t divide 5 by 3 for some time; initially that, too, was said to be impossible. And for a while I couldn’t take the square root of -3 either.
Clearly, these were simplifications, made deliberately to assist my understanding. Maybe the teacher could have said “for now, we will imagine that you can’t subtract 5 from 3, just so that you get the idea.” But I don’t think that would have been better.
The teacher has to judge what to say in order to stimulate the learning process. Maybe the teacher will use a metaphor, which is technically a lie or an error, but which can be effective. From some points of view, lumping dynamic allocation and garbage collection together may be suitable.
And it’s always possible that the teacher doesn’t know all of the details. I think it can be disturbing when one comes to realise how fallible teachers can be. Your university, state, country, employer, etc (just like all the others in the world) can’t afford to have the best possible teachers. There aren’t enough of them. But the best students are able to learn, even from teachers who make mistakes, or who have holes in their knowledge.
I found this area fascinating when, after completing several degrees (including PhD) in computing-related fields, I took a course in education, to help me to teach university students. I think it was the most eye-opening course I’ve ever done. My previous model of teaching was that the teacher knows information, students are empty, and the teacher’s job is to communicate information to the students. This model of education leads to suboptimal education. A more successful model is for the teacher to arrange the environment so that the student can’t help but acquire knowledge from the surrounds, including peers, teachers, books, etc.
Finally, I feel compelled to point out that you have ended your essay with a “simply wrong” statement. There is clearly a lot more than hope. You clearly have some understanding of the field, and I assume that a number of your colleagues have also learned something at university, and I assume that this learning also “remains”!
Good luck with your studies. Perhaps one day you will experience the joys and challenges of teaching!
Hi John, thanks for your answer!
I understand your first point, but my problem is that we are no longer in elementary school. We are twenty-something-year-old men and women. We have been engineering students for 3 years. We simply ought to have a minimal capacity for abstraction, so that being told the unsimplified truth doesn’t distract us from learning. Also, the thing that is wrong with blatant errors (like conflating dynamic memory management and GC) is the general tendency whereby once you have learnt something the wrong way around, and it has imprinted, it is spectacularly hard, and takes a lot of effort, to first un-learn it and then re-learn it correctly. It simply does more net harm than good.
My parents and grandparents are/were teachers – my grandmother taught in elementary school, my father and my grandfather in university, and my mother is a high school teacher. They have been practicing for several decades and the one thing all of them agreed on was that teaching misleading pseudo-facts introduces a mental debt which students will have to pay for a long, long time.
I’m glad it looks that way. I don’t personally claim much experience in the field by any means (I have only been programming for twelve years, of which only the last 6 or 7 might be called nontrivial), but even with this little experience, obvious errors, like the confusion about very basic terminology, are easy to notice (and to be upset about when they are presented as The Truth ex cathedra).
I teach CS at a university in Portugal and notice a similar pattern. I blame this state of affairs on a tacit assumption within some sectors of academia that programming should be taught as a practical skill rather than as a set of foundational principles; this shows in many ways, not least of which is the choice of programming languages (often justified by “employability”) and the accepted wisdom that mathematical methods and programming should be kept divorced.
The fact that programming is taught as a practical skill wouldn’t be much of a problem if “practicality” didn’t mean “I don’t mind if it’s outright wrong in 10% of the cases” in the vocabulary of the aforementioned teachers. Here in Hungary, there’s another extreme with regard to how we teach programming: it is sometimes taught without any sense of practicality, to the point where it’s all theorem and proof, theorem and proof, and we learn about “transitive closures” of “sequential data structures”, yet students have no idea why a linked list sucks when it comes to cache locality: after all, it’s absolutely isomorphic to an array, isn’t it?!
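(For the cache-locality point, here is a tiny sketch of my own: the two traversals compute the same sum over “isomorphic” structures, but the contiguous std::vector tends to be dramatically faster in practice than the pointer-chasing std::list, precisely because their memory layouts differ.)

```cpp
#include <iostream>
#include <list>
#include <numeric>
#include <vector>

int main() {
    const int n = 1000000;
    std::vector<int> contiguous(n, 1);   // elements laid out back to back
    std::list<int>   scattered(n, 1);    // one separately allocated node each

    // The same abstract operation, but very different memory-access patterns:
    long v = std::accumulate(contiguous.begin(), contiguous.end(), 0L);
    long l = std::accumulate(scattered.begin(), scattered.end(), 0L);

    std::cout << v << ' ' << l << '\n';  // same result, very different speed
}
```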
So, in my opinion, our education is completely schizophrenic about programming. It is either complete hand-wavery, interspersed with the presentation of factually false assertions, or overly theoretical, useless, impractical pseudo-science. I think a healthy balance should be maintained between the two: one in which we can mathematically describe what we are doing without resorting to lies, while the ultimate goal of programming remains a very practical, useful tool with which we can solve problems.
Currently, our education system provides neither.
Your argument about “programming language employability” is spot on. The one thing that I just can’t for the life of me understand is this. We have an entire one-semester course on Java programming, “because it’s Enterprise!”. We don’t have any functional programming courses. Software Engineers don’t have any low-level, performance-oriented programming courses. Bionics Engineers can take one optional course, FPGA programming, which I have completed, and it was awesome – but I think it should be available, or even obligatory, for future software engineers…
So, we are wasting an entire semester on Java, a language which adds literally no value, nor any practical or theoretical advancement, to computing; a language that any moderately experienced C++ programmer could pick up in about a week. Yet students who haven’t written a single line of Haskell or assembly in their lives will receive their degrees in Information Technology and Software Engineering. This is simply a scandal.