H2CO3's tech rants

Posts by: H2CO3


JOIN Considered Harmful

Information is ubiquitous, and so is its representation: data. The amount of data produced and necessitated by information technology has exploded in the last few years. Underlying the highly fashionable, even more vague, and, quite often, bullshitty keywords of “Big Data”, “Machine Learning”, “Hadoop” and “MapReduce” is an honest, desperate need for better data management and data processing technologies so that we can keep up with all our smart fridges’ log files, gradually piling up on terabyte-sized SSDs. Of course, one of the central challenges is still the classic one: storing that data in a sufficiently structured and efficient manner so that computations are easy to perform correctly and quickly. Databases have become increasingly sophisticated and diverse, but due to the nature of human perception of information, one piece of good old technology still plays the key role: the relational model. Humans tend to love categories, classes, groups, partitions, clusters and lumps. (And the author of this post apparently loves his thesaurus, too.) I guess it’s because it is easier for us to understand related pieces of information as coarse-grained sets of data: there are simply fewer things to remember. This, and the fact that we are usually interested not only in the raw data itself but also in the connections between various kinds of data, naturally give rise to the relational model, as it allows one to group related pieces of data into relations (tables) and express further connections between tables by means of keys. The usual implementation of the relational model in RDBMSs is based on the notion of primary keys and foreign keys. Primary keys uniquely identify a record/row within its own table, while foreign keys ought to represent connections between records across tables. Conceptually, this is the right thing, except that it is...



Untree, Or, I Will Get My Data Anyway

A biologist colleague of mine and I have recently been doing some phylogenetics. This branch of bioinformatics is, broadly speaking, the science of determining evolutionary relationships between species and arranging them in a tree structure that represents which species descended and diverged from which one. For visualizing such trees, biologists and bioinformaticians like to draw pretty plots like this:

[figure: phylogenetic tree plot]

Evaluation of our work naturally involves comparing our results to previously obtained data. Of course, when trees are big and have many nodes and leaves, it’s nowhere near feasible to compare them by visual inspection; it just has to be done computationally. Unfortunately, experience shows that many authors of relevant studies do not provide machine-readable data along with the text and images in their papers. I will not try to guess the reason behind this issue just yet, but the solution is quite clear: we will need to transform human-readable trees in papers into something machine-readable.



Thesis

I have completed my Bachelor’s Thesis, “Design and FPGA implementation of a protein structure comparison method based on alignment of backbone conformations”. You can find the text and the code on GitHub.



Let’s Stop Bashing C

This blog post is a quick reply to Let’s Stop Copying C. To begin with, I agree with most of the things Eevee wrote. However, I think she went a little too far and described some of her personal opinions as if they were facts. I think that as programmers, we need to be more honest about these kinds of things, so here goes my rebuttal of some points she made.

What’s Wrong with Integer Division?

The point the author makes about integer division is that it can confuse beginners. Sure enough, it can! But what can’t confuse beginners? After all, they are… beginners. This is consequently not a great argument against integer division, or any other language feature, for that matter. Moreover, the behavior of integer division is very useful. I would argue that in most cases, one expects it to behave just as it does. The reason for it is simple: when one is working with integers, one often has a problem related to integers in the first place. And we don’t want to involve nasty floating-point operations when, say, trying to zip a list of indices with their corresponding array elements. Performance is not even the point; floating-point numbers are much more complex (and a lot harder to use correctly even for experienced programmers) than integers, and above all, they have different semantics. I think that if one wants to escape the nice and friendly realm of closed operations, one should indicate it explicitly, via a cast – exactly how it’s done in C. Or maybe have another operator that does floating-point division, I don’t mind that either.

What’s Wrong with Increment/Decrement?

Bashing the increment and decrement operators has become quite popular since Swift sided with omitting them. After all, if Apple does...
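To make the integer-division point concrete, here is a minimal C sketch. The helper names `middle_index` and `average` are hypothetical and chosen only for illustration; the first stays within the integers, the second leaves them explicitly via a cast.

```c
#include <stddef.h>

/* Index of the middle element of an n-element array.
   Integer division truncates, so the result is itself an integer
   and can be used as an index without any rounding step. */
size_t middle_index(size_t n) {
    return n / 2;
}

/* Leaving the integers is spelled out with a cast:
   the division below is performed in floating point. */
double average(int sum, int count) {
    return (double)sum / count;
}
```

With these definitions, `middle_index(7)` stays in the integer world, while `average(7, 2)` produces a fractional result only because the cast asks for it.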



HaskLSD, a Drug-Dependently Typed Language

Let’s face it: being a Bionics Engineer has its own advantages. One such advantage is that you can poke needles in people’s brains, hook them up to an oscilloscope and make sense of whatever kind of signal comes out. Examining the human nervous system is an extremely important aspect of biology in the 21st century, and with the advent of the convergence of numerous fields of science, especially biology and computer engineering, I think it’s time we finally bring together programming languages and brains. For the last couple of months, I’ve been working on a new, pure, functional, strict and dependently typed language. But what is the point in yet another such language, you might ask. Well, the innovation lies in the human. I believe we should introduce imagination, art and freedom into programming. Enter HaskLSD. HaskLSD has a unique type system that closely integrates with its compilation model. Before you compile a HaskLSD program you wrote, you need to take a small amount of designer drug, and wear EEG electrodes. The compiler is connected to the EEG via USB, and adjusts the type system of the language on the fly, in accordance with the images and emotions you experience during hallucination. Hence, all types are dependent not only on values (in the traditional sense of dependent types), but also on the personal feelings of the programmer. This makes programming much more personal, human-like and artistic. So what does a HaskLSD program look like, after all? Here’s the infamous Hello World in its full glory:

[code listing not preserved]

This small example program gets the first 3 (essentially random) colors from the programmer’s visual cortex, uses them to color the unicorn he is imagining, and makes the unicorn say “Hello World”. Being an unsafe side-effecting operation, unicorn extraction and manipulation is...



Generalized Concrete Nonsense, or Why We Teach Factually Wrong Things

I have been attending university for almost three years now, and during this time, I have been watching a worrying pattern emerge in the Hungarian educational system. It is not exclusively our faculty where this scary phenomenon rears its ugly head. I have read several learning resources (textbooks, course handouts, presentation slides and the like), released by a couple of different, highly reputed Hungarian universities, and almost all of them seem to be suffering from a serious class of problems. Up until now, I could put up with these problems, although I was quite annoyed. Today, however, I had the bad luck to come across a set of lecture slides of abysmal quality, an experience that made me unable to keep quiet any longer. So, what is the problem I am talking about? It is really very simple, yet totally unexpected and shocking. It is that a lot of university course material, at least in the field of information technology and computer science, contains significant amounts of factually wrong assertions presented as facts to the unsuspecting students. In fact, “significant” puts it very mildly. The lecture slides I’ve encountered today contain several capital errors, frequently two or three of them per slide. In order to illustrate the seriousness of these mistakes, I am quoting some actual phrases, definitions and assertions found in the slides in question (translated to English, since the original material is written in Hungarian). I will try to rationalize some of the mistakes, look into their causes and suggest corrections. One of my friends always suggests that I should just report the mistakes to teachers, which I usually try to do, but that only works when a teacher is willing to admit the mistake and accept the correction....



Thick Skin, Thick Functions: Unification of Callables in Swift

I was recently thinking about a fairly important problem in the design and implementation of languages with closures. Consider the following piece of (pseudo)code (I am deliberately trying not to use any particular language for now):

[code listing not preserved]

That is, we have two functions, noContextRand() and contextualRand(), that take no arguments and return an integer. We define another function, increment(), which takes one such function, calls it and adds 1 to its return value, then returns the result. So far so good. But how might this get compiled? Let’s agree on a few points before digging deeper. The first thing I’d like to assume is that contextualRand() is actually a closure and it captures prngState. While prngState may be a global variable and this contextualRand() may not really need to be a closure, for the sake of argument let’s just pretend that it is. (Whether it actually is does not in fact matter – I simply want to address the issue of closures vs ‘free’ functions.) Another requirement is that the code above typechecks and compiles without errors – that is, closures and non-closure functions are treated uniformly, modulo their type signature of course. The third aspect is that I want to preserve C linkage and ABI, i.e. context-free functions should have exactly the arguments that their declaration suggests in the executable image and they should not include hidden ‘context’ or ‘environment’ parameters. The last thing I expect from this sort of code is that it can be compiled to reasonably efficient machine code. By that, I mean both running time and executable image size, the latter being an important factor in icache usage efficiency. In fact, preservation of C linkage, specifically the absence of extraneous unused parameters, helps performance too, by decreasing register pressure. So let’s consider some of...
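One way the requirements above could be met is sketched below in C. This is an illustrative lowering under stated assumptions, not necessarily the approach the full post arrives at: the `IntThunk` struct, the adapter function, the `thunk_*` helpers and the toy generator are all hypothetical. The context-free function keeps its plain C signature, and a small adapter lets it travel in the same "code pointer plus context pointer" pair as the closure.

```c
#include <stddef.h>

/* Hypothetical "thick" callable: a code pointer plus captured state. */
typedef struct {
    int (*invoke)(void *ctx);
    void *ctx;
} IntThunk;

/* Context-free function with an ordinary C signature (no hidden parameter).
   It returns a fixed value so the sketch stays deterministic. */
int noContextRand(void) {
    return 4;
}

/* Adapter a compiler could emit when noContextRand is used as a value:
   it ignores the context and forwards to the plain function. */
int noContextRand_adapter(void *ctx) {
    (void)ctx;
    return noContextRand();
}

/* Captured state of the closure (stands in for prngState). */
typedef struct {
    unsigned state;
} PrngState;

/* Body of contextualRand after closure conversion: the captured state
   arrives through the explicit context pointer. Toy generator only. */
int contextualRand_body(void *ctx) {
    PrngState *prng = ctx;
    prng->state = prng->state * 1103515245u + 12345u;
    return (int)((prng->state >> 16) & 0x7fffu);
}

/* Helpers that build the uniform representation for both kinds of callee. */
IntThunk thunk_noContextRand(void) {
    IntThunk t = { noContextRand_adapter, NULL };
    return t;
}

IntThunk thunk_contextualRand(PrngState *prng) {
    IntThunk t = { contextualRand_body, prng };
    return t;
}

/* increment treats closures and free functions uniformly. */
int increment(IntThunk f) {
    return f.invoke(f.ctx) + 1;
}
```

In this sketch the cost of uniformity is paid only at the call site inside `increment` (one indirect call) and in the tiny adapter; direct callers of `noContextRand` still see the unmodified C function.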



Infiltrating Political Correctness – Or, the Demise of Technology

I love my profession. I love being a programmer. And you know what the primary reason for that is? It’s that our field is honest and scientific. If done right, there are no bells and whistles. We do technology. We love technology. We do it passionately, we do it freely, and our goal is to make the life of the passengers of this small, blue, ball-shaped spaceship a bit easier and nicer. At least, that’s what the traditional hacker culture was about. Because we don’t want politics, and we don’t want to run in unnecessary circles. We want to leave that to all those glorious Social Justice Warriors™ who enjoy doing it. We don’t enjoy it. It gets in the way of actual work, it delays advancement and development. Ask Linus Torvalds about the topic; in fact, just watch some of his talks where he answers related questions. It’s no coincidence he despises anything remotely similar to a “Code of Conduct”, which would stop people from pointing out loudly and publicly when something is so wrong and stupid that putting it into the kernel would cause a catastrophe. But it seems that it’s happening. Political correctness is everywhere, and it has managed to infiltrate numerous important parts of the programming community. A prime example is Stack Overflow. In the past few years, Stack Overflow has lost several high-reputation contributors, either because they had been explicitly banned, or because they were no longer welcomed by the higher powers of the site (namely, the moderators and community managers). In July 2012, Stack Overflow started the “Summer of Love” campaign. The intent was good: becoming more welcoming to newcomers and beginners. But the timing and the realization couldn’t possibly have been worse. The preceding year, 2011, was when a significant decline in quality had kicked...



Locatable generics – an attempt at creating minimal-impact generic types

(and, as a side effect, a unified, quasi-monolithic compiler and build system)

Generics are an essential tool in the workbench of the typical 21st-century programmer. Their purpose is two-fold at the very least. One of their goals is to increase code reuse and eliminate the redundancy that would arise if one had to re-implement certain functions or classes for each possible set of types they operate on. Another notable feature of generic types is that they introduce a level of abstraction by delegating the task of selecting and emitting functions, operations and other behavioral entities appropriate for a specific type to the compiler – freeing the programmer from having to think about lower-level issues, thereby saving him time, reducing the chance for errors, and potentially making code clearer to the future reader. (The latter property of generics is – unfortunately – not implicit or automatically true by any means, though. Generics can be (and indeed are) abused badly, mostly due to overgeneralization, to the point where the level of abstraction generated by the immense amount of parametrized types overwhelms the mental abilities of the reader and actually ends up obfuscating the code.)

The State of the Art

Currently, most mainstream languages can be roughly fit into one of two categories depending on how they implement generics. The first family of languages, in which we can find C++ and Rust, for example, uses compile-time expansion of generics, substituting actual types for type parameters ahead of time. Effectively, in these languages, generics can, as a very crude approximation, be viewed as syntactic sugar for copying and pasting the skeleton of the code and then adjusting the types. The advantage of this approach is obviously raw speed; the code runs exactly as fast (or slow) as it would run had it been written...
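The "copy the skeleton, then adjust the types" picture of compile-time expansion can be imitated by hand in C, which has no generics of its own. The sketch below is only an analogy: the macro `DEFINE_MAX` and the names `max_int` and `max_double` are hypothetical, and real C++ templates or Rust generics do considerably more (type checking, inference, deduplication).

```c
/* Hand-rolled "generic" max: each use of the macro stamps out one
   concrete function for one concrete type, much like an ahead-of-time
   expansion of a generic definition would. */
#define DEFINE_MAX(T, NAME) \
    T NAME(T a, T b) { return a > b ? a : b; }

DEFINE_MAX(int, max_int)       /* becomes: int max_int(int a, int b) { ... } */
DEFINE_MAX(double, max_double) /* becomes: double max_double(double a, double b) { ... } */
```

Each instantiation is a separate, fully typed function, which is why such code runs at the speed of a hand-written version and also why every additional type adds to the size of the executable.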



Jailbreaking Is Useful Even For an App Store Developer

This summer I’m working as an intern at an infosec and telecommunications company. We are developing an iOS app which needs some advanced capabilities (e.g. access to the microphone of the device), for which it asks the user’s permission using iOS’ built-in facilities. Upon program launch, iOS pops up an alert view and lets the user grant or deny the appropriate permission. However, this only ever occurs once in the lifetime of an app – if you flip the corresponding switch in the Settings app, and/or uninstall and reinstall the application, the alert view won’t appear anymore. This was quite annoying because I was working on the GUI/UX part of the app, so I needed to test this facility regularly (among others). Searching the internet for this issue led me to this question on Stack Overflow. However, I found none of the answers to that question satisfactory. The accepted answer completely misses the point (I don’t even know why the OP has accepted it; in fact, he explicitly states in the question that he doesn’t want to toggle the permissions, he wants the app to ask for them again). The second answer, which is highly upvoted (and which some users suggest should be the accepted one), works, but with a caveat: it works by resetting all the permissions/privacy settings of all the installed applications. I didn’t want to do that, since I use my personal iPad as my primary testing device, and resetting all my frequently used applications would have been a real pain. So I decided to investigate a bit further and interrogate iOS to reveal how each individual capability and permission is associated with an application. Fortunately, my iPad is jailbroken (I wouldn’t ever use an iOS device without jailbreaking it), so I have had a couple of...
