Friday, June 16, 2017

Privilege and Idiocy

My sister recently posted a link to a test (How Privileged Are You) on her FaceBook page. She’d taken the test and posted her score. A number of her friends commented with their scores as well. Sadly (at least in my opinion) none of them posted a critique of the test itself, only scores that it had produced, seemingly taking it for granted that those scores were meaningful.

I was curious, so I looked at the test. It struck me as one of the most idiotic things I’d seen in quite a while (not sure if that means I'm picky or just lucky). A lot of online tests are pretty awful, but at least among those I’ve seen, this truly does take the prize for the worst of them all.

The directions for the test tell you to “Check off all the statements that apply to you.”

The first few statements


1. I am white.

This at least tells us the nature of the test—the more items you check, the more “privileged” it’s going to claim you are. Other than that, it’s pretty close to pure idiocy. Just for example, if a child is (roughly) half Caucasian and half Filipino, do they count as “white” or not? The simple reality is that trying to draw clear lines between races is difficult and error prone at best. Yes, if you’re careful enough to define what “white” means, you can find people who do and don’t qualify—but by the time you decided on a clear-cut definition to which people can give an accurate and meaningful answer, you’re going to lose the ability for it to do what’s really intended: create division and discrimination.

2. I have never been discriminated against because of my skin color.

This is essentially impossible for anybody to truly answer. There are clear-cut cases where somebody can state with certainty that they were discriminated against because of their skin color (the US in the 1950s or earlier, South Africa under Apartheid, etc.)

It's impossible, however, for anybody to know that they were *not* discriminated against because of skin color. That time in third grade when nobody wanted to choose me for their team at recess? Probably because I was skinny and nerdy--but I can't say with certainty that nobody ever thought "let's skip the pasty white guy".

3. I have never been the only person of my race in a room.

This reaches a level of excrescence I’d never previously experienced. Allow me a bit of a digression to attempt to explain why that’s true1.

I grew up in Aberdeen, South Dakota. I don’t have any official statistics on its demographics at that time, but my guess would be that its population was at least 90% western Europeans (and over 95% wouldn’t surprise me even a tiny bit). I was something like 14 years old before the first time I actually traveled outside South Dakota at all—and that was to a small town in Minnesota, where the population was about equally white (maybe even more so, though I don’t really know).
So, in the view of the people who wrote this test, at that time I was apparently almost supremely privileged. Essentially any and every time I’d been in a room full of people, not only had there been at least one other Caucasian, but in most cases, the vast majority (or all) of them were Caucasian.
Much more recently, I’ve traveled to (for one example) the Philippines. While I was there, it was quite common that I was the only Caucasian in the room, and in some cases probably the only Caucasian within a few miles.
So if we treat this test as meaningful, we believe that I’m actually less privileged now than I was as a child.

In reality, exactly the opposite is true. Going to the Philippines was a huge privilege, and having members of my wife’s family to lead me around, show me the sights, and expose me their viewpoints on life was an even greater one. At least for a child like me who was curious about the world, growing up in a white ghetto was far from a privilege, and getting to see other parts of the world far from repressive.

4. I have never been mocked for my accent.

When I was a kid in South Dakota, we were often visited by our cousins from Minnesota. Each of us thought the other had a funny accent, and yes, each group mocked the others’ accents.
Apparently in the minds of the people who wrote this test, I’d have been more privileged if my cousins had never visited. I’m left wondering what sort of lunacy would lead somebody to this conclusion. Does anybody honestly think that a little good-natured mocking actually hurt me? Would I have been better off in any way if my cousins hadn’t visited?

5. I have never been told I am attractive “for my race.”

Like pretty nearly everybody in the US close to my age, there was a period of about six months (or so) when I’d have had to live by myself under a rock in the mountains to avoid hearing that I was “Pretty Fly for a White Guy”, usually several times in fairly quick succession.

Does somebody honestly think that hearing that silly song somehow made me “less privileged”? Seriously?

More selections


I guess I’ll stop with the point by point critique here, but not without pointing out one more detail: no, I did not go through the list and cherry-pick these particular statements out of a long list that actually made more sense.

Having done that, now I really am going to cherry pick a few that I think are particularly egregious, for various reasons.

  • I have never been raped.
First, I’m utterly horrified that anybody would imply that living without being raped is a privilege. This is not a privilege—it’s a right! Anybody who even hints at the possibility that living without being raped is a privilege really needs to get a clue.

Second, looking at the definition some people use of “rape”, my initial reaction of “of course I’ve never been raped” was apparently wrong.
When I went into the military (US Air Force,
in 1984) part of the medical examination involved a doctor pushing a finger in your sphincter. To my thinking, this was simply a normal part of a medical examination, to which tens of thousands of people are subjected every year.

Looking through the post-modern definition of rape, however, it was actually rape, as some people define the term.
Was there penetration? Yeah, pretty definitely.
Was this something I really wanted to have happen? Definitely not.
Was there “coercion or economic inducement” applied to get me to consent? Well, yeah—if I hadn’t consented, I couldn’t get the job or the pay that went with it.

At least in my opinion, calling this rape is utterly idiotic. The reality is that yes, there was a level of “coercion” involved—but if I’d decided to, I could still have refused, and it wouldn’t have happened. I’m reasonably certain that many of the people who consider themselves to have been raped were in roughly similar situations. They gave consent to something with which they were at least mildly uncomfortable, but could have refused—and if they had, what they’re terming as “rape” wouldn’t have happened.
Of course, there are people who define rape even more broadly than that—in some cases, that the “inherent differences in power” between men and women are so great that even when a woman thinks she’s consenting (or even wildly enthusiastically initiating the sexual encounter) that it’s still rape.

  • I’ve never been told that I’m overweight or “too skinny.”

I’ve actually been told both of these things—and in both cases, it was clearly true. As a teenager I was underweight. I’m now rather overweight. This isn’t a result of “privilege”; it’s a result of having been too skinny at one time, and now being overweight.
Regardless of people talking about “body types” and such, the simple fact of the matter is that I’m overweight for a couple pretty simple reasons. First, I eat too much—especially snacks. Second, I don’t get enough exercise. If there’s any blame to be laid here, it’s certainly not on anybody else for having the nerve to point out the simple fact that I’m overweight—quite the contrary; instead of writing this, I should probably be going for a healthy walk right now.

I have a little more sympathy for people who are severely underweight. This can be much more difficult to deal with. If you're honestly anorexic, seek help if you aren't already getting it.

If your weight is actually reasonably normal and you're healthy, somebody saying otherwise--well, now you know at least some people to avoid. Seriously, in schools (for example) people being nasty and mean to each other has been happening for a long time. You may feel that you lack "privilege" because people pick on you--but I can pretty much guarantee that the people you think of as "privileged" are being picked on as well. Even the head cheerleader, homecoming queen, center of the basketball team, quarterback of the football team, etc., are getting picked on by somebody. There may be somebody, somewhere, sometime who honestly went through school without anybody every picking on them, and if so I guess they really are privileged--but personally, I doubt it's ever happened, even once.

  • I have never been cyber-bullied for any of my identities.

I find this egregious for a number of reasons. First of all, when I was a child, cyber bullying didn’t exist. Apparently, these people think that made everybody “privileged”. Again, the reality is quite the opposite. Having an immense treasure of knowledge like the Internet easily available is a truly massive privilege. If you choose to waste your time on some social media site where cyber-bullying happens...well, don’t do that! Seriously, I don’t mean to say people should be excused for cyber-bullying, but the simple fact is that in a case like this, self defense is utterly trivial, and you'll be better off for it anyway.

Second, the people who wrote this idiocy have (like with many other things) taken “identity”, which used to have a clear definition, and abused it horribly, to the point that it doesn’t really mean anything any more. What’s worse, they’re apparently proud of having done so—they refer to a single person as having multiple “identities” in at least a half dozen of their statements.

Now, it may be open to argument that in the specific case of dissociative identity disorder (previously known as multiple personality disorder), that a single body could be “home” to multiple personalities, each of which should be treated as having its own identity.

Dissociative identity disorder, however, is a serious and fairly rare psychiatric condition. Exactly how rare is open to some debate—the highest estimate I’ve heard is that it might be as great as 1% of the population; most estimates are substantially lower.

In any case, taking for granted that everybody who takes the test (or even most of them) have multiple identities is abusing the word to mean essentially the opposite of what it really does. What they’re apparently getting at is something like "characteristics", rather than actual identity (of which, by nature, I have exactly one—and there’s a solid case to be made that even in a case of DID, there’s still really only one identity, albeit of a personality that’s been badly damaged by abuse).

The Verdict on the Test

Worse than the individual items in this list is the basic attitude it shows in general. Essentially all of the questions really seem to revolve around one point: a belief that if people act as if they like or admire me, that’s “privilege”. It seems to me that it does a fair job of testing one thing. I’m not a psychiatrist, but I see little choice but to conclude that regardless of the actual score you get in this test, if you consider the test or score as having even the slightest bearing on reality, you should probably seek professional help immediately (no, I’m not being sarcastic at all—I’m 100% serious).

Real Privileges

I believe I am extremely privileged. Let me list a few of the factors in my life that I think have been real privileges:
1. I learned to read at an early age.
2. I had a large supply of books available at an early age (for people less ancient than myself, feel free to substitute Internet access—it not only provides much more information, but makes it dramatically easier to access as well).
I’ve lumped these together because they largely go hand in hand. They are also absolutely immense privileges—if you checked off every one of the items in the test under discussion, it still means far less than having learned to read.
Most learning comes from reading. Somebody who’s been taught to read can learn nearly anything and everything else s/he desires.
3. Other early education
I lump all the other early education together. Don’t get me wrong: learning about history, geography, mathematics, etc., is all important—but somebody who knows how to read can learn them independently, so all the other education is still minor compared to learning to read and having reading material available.
4. Having children
I think having children is the single greatest privilege of my adult life. I don’t know of a way to describe what a huge privilege this really is, at least for me. The test at hand would undoubtedly (correctly) conclude that I’m highly privileged—but everything it attempts to measure, combined, is still utterly trivial compared to the privilege of having children.
5. Being intelligent
Years ago (from 1988 to 1991) I spent three years helping teach/train developmentally disabled people to help them live as independently as possible. Just to give an example that’s obvious to me, seeing the immense difference between how quickly my children have learned and how much slower and more difficult it is for people who are even mildly handicapped has given me a tremendous appreciation for the degree to which intelligence smooths the way through modern life, and makes nearly everything easier.
6. Living in modern times
For the vast majority of human history, large parts of the population were in danger of starving to death, or being a casualty of war (or both)--often many times in their lives. Although I’ve never actually been in danger of starving to death, I’ve gone for two or three weeks at a time without an actual meal. Even that degree of hunger dwarfs the trivial nonsense discussed in this test.
Even for people whose lives weren’t directly endangered, anything approaching “fulfillment” from life was far beyond most peoples’ dreams throughout most of human history. Slaves and serfs frequently had no choice at all in their lives, and even nominally “free” people frequently had little real choice, and essentially no chance at “upward mobility’ at all.
Even compared to when I was a child growing up, the amount of information people have easily accessible is absolutely astounding. Being able to type a paper for school on a computer and correct spelling errors easily instead of writing it by hand and starting over if you made too many spelling mistakes--there's no way to meaningfully compare these.
7. Living in the western world
Even today, many people are raised in enforced ignorance. There are still people in danger of dying from starvation or diseases to which cures and/or effective vaccinations are well known. A great deal has been done relatively recently (much thanks to the Bill and Melinda Gates Foundation, among many others), but there’s seriously a lot left to do, and areas where such assistance is routinely refused as well.

Conclusions

1. The test is utter nonsense
2. There are real privileges—and they’re huge compared to the trivial nonsense covered by the test questions. They’re also very close to universal among essentially everybody with access to the test though. We, the people with easy access to the Internet are much more alike than different—and all of us are almost supremely privileged.

1 I’m trying to ignore the obvious logical problem here: anybody who’s ever been alone in a room has clearly been the only person of their race in the room, so it's nearly inevitable that at some time or other, *everybody* has been the only person of their race in a particular room.

Thursday, February 4, 2016

Raw loops vs. standard algorithms revisited

I was recently reading a Post on Meeting C++ about raw loops vs. STL algorithms.

I didn't think his solution using STL algorithms took full advantage of what the library really has to offer. He did use std::mismatch to find the point at which items mismatched, which is clearly a good thing. The rest of the code, however, looked to me like he was thinking primarily in terms of loops, and merely changed the syntax from a for loop to use std::for_each instead.

At least in my opinion, this doesn't really gain much. At least in my observation, using `std::for_each` is most often a mistake. It's still little more than a generic loop. The name of algorithm used should usually give me a good idea of what some code is doing, but `std::for_each` really doesn't--I have to look at the content of the "loop" to get any idea of what it's really doing.

In this case, the basic idea (simplifying a tiny bit) is to take the paths to two files, and find a relative path from one to the other. The two paths have already been parsed into components held as strings in a vector, so our job is to generate a path with "../" for all the leading components that are identical, then copy the components from one of the vectors into the output after that (and finally append a file name on the end).

Just going from that description, some choices seem fairly obvious, at least to me. To generate the leading part of the path, we probably want to use `std::generate`. Then to copy the remaining elements, we probably want to use `std::copy`. Although it may not be quite as obvious just from the wording, `std::fill_n` actually works out a little better than `std::generate_n`.

Our result is going to be in the form of a string. We want to append strings together to create the whole output. One obvious way to do that is to use an `std::stringstream`. So, let's start by doing a tiny bit of fudging to turn this into a standalone example (I don't think this has a significant effect on what we're doing). Rather than extract the vectors from a cache as he did, let's start with a couple of fixed vectors:


    std::vector a = { "dir1", "dir2", "dir2_1", "p" };
    std::vector b = { "dir1", "dir2", "dir2_2", "p" };

Then we do the step that (I think) he got right:


    auto pos_p = std::mismatch(a.begin(), a.end(), b.begin(), b.end());

From there, we can define our stringstream, and an iterator to give us access to it in a way that's friendly to standard algorithms:


std::ostringstream path;

std::ostream_iterator(path, "/");

Then we want to generate the leading part, then copy the trailing part of the result path:


std::fill_n(out, pos_p.first - a.begin(), "..");
std::copy(pos_p.second, b.end(), out);

At least in my opinion, this makes it much more apparent that the first part of the path is composed of instances of ".." separated by "/", and the second part is composed of elements from the second vector (again, separated by "/").

This hasn't shortened the code a lot (maybe not at all), but it has (at least in my opinion) made the actual intent much clearer.

For better or worse, I've seen a fair number of arguments/claims that using a stringstream and an ostream_iterator can produce substantially slower code that doing the job by appending data directly to a string. I haven't tried to run a benchmark to verify this, but for the moment, let's assume you were working with a compiler for which it really was true (and true to an extent that the slow-down was a serious concern).

If so, we could still use roughly the same structure--we just have to define our own iterator to handle appending things to a string instead of inserting them into a stringstream. We lose some versatility (we can only insert things that are already strings, rather than handling all sorts of conversions to strings like a stream can), but maybe we gain enough speed for that to be worthwhile. So, if we decide this is really worth doing, implementing it is pretty straightforward (moreso than most people seem to think).

Our iterator to append to a string (with a little front-end function to avoid having to pass template parameters explicitly) can look something like this:


#include 

template 
class appender_t : public std::iterator{
    String *s;
    String sep;
public:
    appender_t(String &s, String sep) : s(&s), sep(sep) { }

    appender_t operator=(String const &add) {
        *s += add + sep;
        return *this;
    }

    appender_t operator++() { return *this; }

    appender_t &operator *() { return *this; }
};

template 
appender_t appender(String &s, String sep) {
    return appender_t(s, sep);
}

Most of this is boilerplate--the only parts that really implement any logic at all are the constructor and `operator=`. The constructor is still pretty trivial: it just saves a pointer to the two strings used to initialize it. The assignment operator isn't much more complex--it just says when a string is written through the iterator, we append that string plus the specified separator to the string we stored.

Using this, the rest of the code barely changes at all. We define our output and iterator a tiny bit differently:


    std::string path;

    auto out = appender(path, "/"s);

From there, however, virtually nothing has changed (which is pretty much the point of using iterators to start with--they isolate the algorithm from the container, so algorithms can work with different kinds of containers.

So anyway, using that we can eliminate the pesky `std::stringstream` intermediary, so we're just appending directly to a string, just like we were to start with--but now doing it with algorithms that actually fit the task at hand.

A complete, working copy of the final code is available on Coliru.

I should probably add that this code is purely an attempt at producing exactly the same result as the code in the original article. As Luigi Ballabio was kind enough to point out in a comment, that's not really the right thing to do.

Saturday, August 9, 2014

Do Quora users actually have brains at all?

Right now I'm somewhat ashamed to admit it, but I have an account on Quora. Every week (or so--I don't pay very close attention) they send me an email digest of popular questions and answers. It might be an exaggeration (but if so, only a fairly small one) to say I was shocked by the question and answer that were most highly featured in their most recent email.

The question was: "Why do Americans seem to be so scared of a European/Canadian style of healthcare system?" The top-voted answer (at least as I read this) is: "In a word - fear" (yes, I cut and pasted that--if you think I must be making this up, I'd consider that entirely forgivable). Now, it's true that the author does go into more about detail about what he thinks drives that fear, but "fear" is what he gives as the actual answer. So, "Why do Americans seem to be so scared" is answered with "fear", and (at least as I write this) that has received literally thousands of up-votes.

Unfortunately, the "answer" goes downhill from there. The author (a "Dan Munro", who gives an unsupported claim of "knows some healthcare stuff"). He gives four supposed reasons for his claim of "fear":

  1. A false assumption (with big political support) that a system based on universal coverage is the same thing as a single payer system.
  2. A fear of "rationing" - which was set ablaze by Sarah Palin and her cavalier remarks about "death panels."
  3. An attitude and culture of what's loosely known as American Exceptionalism.
  4. A fierce independence that has a really dark side (which he goes on to explain as an attitude that when/if anybody "fails", it's considered their own fault).

To put it simply, I've yet to see any real support for any of these. The only one that seems to have even the slightest basis in reality is the second. It us true that a few (Sarah Palin being the most obvious) have tried to generate fear on this basis. It may even be true that it has succeeded in at least some tiny number of cases. The vast majority of people who oppose (further) government regulation of healthcare, however, seem to find it nearly laughable.

The most ridiculous, by far, is the claim of "American Exceptionalism". While frequently advanced (relative to an almost surprising range of subjects), this seems to be a pure straw-man at least with respect to this question. I have quite literally never heard anybody dismiss an argument on the basis that "it came from Canada, Europe, or outside the US in general, and we can't possibly learn anything from 'them'." At least in my experience, it simply doesn't happen. I obviously can't possibly know exactly what every single person in the US believes or feels, but I've yet to see or hear anything that gives even the slightest shred of support for this particular belief.

He then goes on to quote some of the usual statistics about how much of the US GDP is spent on healthcare, and a study about deaths from preventable medical errors in US hospitals.

Unfortunately, the numbers he quotes (210,000 to 440,000 annually) seem to be open to a great deal of question. Depending on which study you prefer to trust, the number is might be as low as 32,500 annually (and no other study I can find gives a number any higher than about 100,000 annually). Despite this, the largest number he can find is quoted as if it were an absolute fact that's not open to any question at all.

Worse, however, is a much simpler fact: he makes no attempt at comparing this result to the numbers for other countries, and (perhaps worst of all) he makes absolutely no attempt at telling us how the change(s) he apparently advocates would improve this in any way. So, even if we assume he's gotten the factual part right, we have absolutely *no* reason to believe any particular plan will improve it in any way.

Although I can't claim to speak for the US in general (or anybody else at all) in this regard, that leads directly toward a large part of the reason I have personally found it impossible to generate any enthusiasm for the plans that have been advanced to change healthcare in the US.

The usual argument seems to run something like this. The advocate starts by pointing to US citizens paying far more than others for healthcare, and having shorter average life spans. He then uses that to support his claim that we need to pass some particular health care plan he supports.

Unfortunately, nobody making such arguments seems to (even try to) "connecting the dots" to show exactly how or why *their* plan will improve the problems they ascribe to the current system. Virtually none can provide any real breakdown of US healthcare costs vs. costs elsewhere, to indicate exactly what is driving the higher costs in the US. Absolutely none (that I've seen) takes any next step toward showing how their plan will fix those problems.

When I've participated in a discussion, it usually runs like this:
Them: Our current system is broken. We need to pass bill X to fix it.
Me: What will X fix, and how will it fix it?
Them: Didn't you listen? It's really broken!
Me: Okay. What will X fix and how will it fix it?
Them: I'm telling you, it's seriously broken!
Me: Yes, I hear that. Now, can you tell me what X will fix and how it will fix it?
Them: [usually starting to get pretty loud by now] Damn it! What will it take to get it through your head that it's broken? Are you a complete idiot?

This seems to go on about as long as I'm willing to participate. I've yet to hear a single advocate of a single system actually answer a single real question about what it will fix, how it will fix it, what costs will be reduced, how much they will be reduced, etc. No matter how often you ask, even about relatively simple details of any proposed program, nobody seems to have a single answer about what they think it will accomplish, not to mention providing any reason for me to believe that it really will accomplish that.

If you'll forgive a (mildly) medically-oriented metaphor, Quora seems to be infected with a similar culture. Questions pre-suppose a given answer (in this case, that resistance to changes in health-care stems from fear rather than things like lack of information). This is certainly far from the only answer that seems to do little beyond echoing back the question, with only straw man arguments to support it, then a rather disjointed attempt at denigrating what the author dislikes, without even an attempt at claiming that his "solution" would really fix the supposed problem(s) he cites, not to mention actually providing anything similar to real evidence that of improvement.

So yes, although Quora users undoubtedly do have brains, it certainly appears to me that failure to actually put them to use is somewhere between common and rampant.

Considering questions more specifically about health care: I, for one, am fairly open to the possibility of reforming how healthcare is run in the US. So far, however, I've yet to hear anybody advance a coherent, understandable argument in favor of any proposed plan. Most simply seem convinced that the current system is *so* badly broken that absolutely any change would have to be an improvement. Even the slightest study of history, however, indicates that there is no such thing as a situation so bad that it really can't get worse.

Monday, December 2, 2013

Inheritance, templates, and database models

Some years ago (around 1991, as a matter of fact) I observed that object oriented programming, and C++ in particular, was following a path similar to that followed by databases decades before.

At the time, multiple inheritance had recently been added to C++. I noted that single inheritance corresponded closely to the hierarchical database model, and multiple inheritance corresponded closely to the network model.

As I'd guess most people reading this are well aware, most databases have long-since switched to (something at least intended to be similar to) the relational model and there are formal proofs that the relational model is strictly more powerful than either the hierarchical or the network model. That led to an obvious question: what sort of program organization would correspond to the relational model?

I would posit that C++ templates (especially function templates) correspond closely to the relational model. Rather than being restricted to situations like "X is derived from Y", we can instantiate a template over any type satisfying some relation--though in the case of a template, the relation is basically a set of operations that must be supported by the type rather than a set of values that must be stored in the database.

The major problem with templates (by themselves) is that we have very little control over how that matching is done. The matching criteria are difficult for most people to understand, and even harder to control. Even at best, the control currently available (e.g., via static_assert) is clumsy use and hard to read.

Concepts refine that capability considerably. Concepts are similar to defining foreign keys in a database. Right now, what we have is similar to a bunch of tables, with some fields (or sets of fields) in some of those tables used as foreign keys in other tables--but no declarations of what is supposed to be used as a foreign key, and therefore no way for the compiler to (a C++ equivalent of) enforce referential integrity.

As I'm sure most programmers are already well aware, a key (no pun intended) capability of a relational database is defining relationships between tables so the database manager can enforce referential integrity. That is similar to what C++ Concepts add--something at least roughly equivalent to foreign key declarations in a database, defining what relations are intended and should be enforced by the compiler.

Now don't get me wrong: I'm not trying to write a sales-pitch for Concepts, or anything like that. Rather the contrary, I think quite a few people have probably done a better job of explaining them and how they improve C++ than I've even attempted. What I find much more interesting is seeing (or maybe just imagining--others may disagree with my ideas) a fairly clear similarity between two types of programming that (I think) many probably see as quite unrelated or perhaps even directly opposed.

Sunday, August 11, 2013

When is a question about a puzzle, and when is it really about programming?

The following question was recently asked on Stack Overflow:
This is the problem: There are two players A and B. Player A starts and chooses a number in the range 1 to 10. Players take turn and in each step add a number in the range 1 to 10. The player who reaches 100 first wins.

In the program, player A is the user and player B is the computer. Besides that the computer must force a win whenever possible. For instance, if the player A reaches 88 at some point, the computer must choose 89, as this is the only way to force a win.

A user then voted to close the question, with the comment that it was about a logic or math problem rather than about programming.

Personally, I disagree with that. In fact, I think it's an unusually good example of a problem for learning to program.

Allow me to digress for a moment. The first step in learning to program is learning to take some algorithm, and transcribe it into the syntax of a chosen programming language. Now, it's certainly true that being able to express an algorithm in a chosen programming language is important. It's not just about syntax either. Just for example, expressing an algorithm well often involves knowing a great deal about the accepted idioms of that language as well. Nonetheless, the whole business of simply writing code is only the last step in a larger, usually much more involved process.

That larger, more involved process deals primarily with finding solutions to problems. Only once we've found a way to solve the problem can we write code in our chosen language to carry out that solution. Now, it's certainly true that in many cases, we want to use an iterative approach to a solution -- we don't expect the first code we write to be the final an ultimate solution to a real problem for real users. Nonetheless, when we write code, we need to write it to do something, and we need to decide on that something before we can write code to do it.

Now, the question already notes that 89 is a winning "position". That immediately leads us to two questions: 1) why is it a winning position? 2) Based on that, can we force a score of 89?

The answer to the first is simple: 89 is a winning position, because from there our opponent can't reach the goal of 100, but no matter what move he makes, in the subsequent move we can reach 100. That is so because 100-89 = 11. The smallest move the opponent can make is 1 and the largest is 10, so no matter what he chooses, the remainder will be between 1 and 10, and we can win.

That leads fairly directly to the next step: we can force a score of 89 if we can reach a score of 78 -- and we can force a score of 78 if we can reach a score of 67 (and so on, counting backwards by 11 until we reach 12 and finally 1.

Therefore, the first player can force a win if and only if he chooses 1 in his first turn. If he chooses anything larger, the second player can choose a number that gives a result of 12, guaranteeing himself the win.

This is a large part of why many of the better computer science programs have classically used (more or less) functional languages like Scheme, and emphasized recursion as a basic primitive. It's not really about recursion as a flow control primitive or anything like that. Yes, we've known for years that a for loop, while loop, etc. can be expressed via recursion (and likewise that anything we can do with recursion can also be done via other flow control primitives).

What's important isn't that recursion allows arbitrary flow control. What's important here is the way of thinking about and approaching the problem. Recursion just gives a simple and direct way to approach the problem: instead of solving for 100, if we know that 89 is a winning position we can solve for 89 instead. Note: we could just about as easily think of it mathematically. Instead of thinking of this as a case of recursion, we could think of it as an inductive proof. The steps I've outlined above are essentially the same as those for an inductive proof.

Now, it's certainly true that this approach isn't suitable for every programming problem. If you're dealing with something like: "why is this text field being computed as 134 pixels wide when it should be 135?", chances are recursion isn't going to enter into your thinking, and might not be a lot of help if it did.

Nonetheless, if you're trying to solve a problem, and can define one step toward a solution, chances are very good that an inductive/recursive approach will be useful in solving the whole problem. It's most obvious in a case like this where the problem and solution are mathematical, so we have a well-defined line from problem to solution, but it can also apply to quite a few other cases as well (including the previous problem about text width, which it's sometimes useful to think of in terms of "some arbitrary string followed by one character", and continue with shorter and shorter strings until we reach only one or two characters).

Thursday, January 24, 2013

Word frequency counting

Eric Battalio recently posted about Jumping into C++. Like many people who've written C++ at times, I found his post quite interesting and entertaining. A fair number of people posted their own ideas about how to accomplish the same task -- some concentrating on minimal source code, others on maximum speed, and so on.

After seeing his follow-up After the Jump, I decided one point could stand being addressed. In his followup, he says: "About reading a file using iostream. Don't. It is slower and difficult to manage for complicated file formats. This application was trivial but I will avoid in the future."

On this subject, I have to disagree. Iostreams do have some shortcomings, and some implementations are also (un-)fairly slow. Nonetheless, they also have some pretty major advantages if (a big if, here) you use them well. I hope he won't feel insulted if I point out that (at least in my opinion) there are better ways to use iostreams than the way he did it.

For those who haven't read his posts, the task he set was quite simple: count the unique words in a file. He decided to do a bit of parsing, however, so that something like "one,two" would not include the comma. Unfortunately, he didn't (IMO) do this very well, so that would be counted as "onetwo" instead of treating "one" and "two" as separate words.

That brings up the obvious question: can we read those correctly, without doing extra parsing to patch things up after the fact? Of course, there's not a whole lot of suspense -- pretty much anybody with an IQ higher than a mosquito's can probably figure out that I wouldn't ask the question in the first place unless the answer was yes.

The way to do this (or the way I'm going to advocate in this post, anyway) depends on part of how iostreams are designed. When you read a string from a stream, any leading whitespace characters in the stream are ignored, then non-whitespace characters are read into the string, and reading stops at the first whitespace.

About now, you're probably thinking: "Yes Jerry, we already got past the 'IQ higher than a mosquito's' question; I already knew that!" What you may not know, or at least may not have seriously considered (though a few of you probably have) is exactly how the iostream determines what is or is not whitespace. The answer to that is a little complex: the iostream has an associated locale. The locale, in turn, contains a ctype facet -- and the ctype facet is what determines classes of characters.

So, to ignore those commas (and hyphens, etc.) we simply need a ctype facet that classifies them as white space, tell the iostream to use a locale containing that ctype facet, and we can just read words. Since it seems reasonable to me that (for example) "Jerry's blog" should be read as two words, not three, I'll classify upper and lower case letters and apostrophes as "alphabetic" and everything else as "white space".

Using that, the code we get code that's fairly simple, readable, and as a bonus, even a bit faster (though that wasn't really the intent).

Friday, December 7, 2012

Heaps and heapsort

Like most programmers, quite some time ago I'd read about heaps and the heapsort. Being a dutiful student, I could quote the standard line, saying that a heap was an implicit tree where node N was less than (or greater than, depending on the type of heap) nodes 2N and 2N+1. Like (I think) most, I'd also implemented a heapsort -- but to be honest, most of really came down to a line by line translation from the description in Knuth Volume 3 into the syntax of the programming language I was using at the time (Pascal, the first time around, and later the same again in C).

For years, given that I'd written a Heap sort that worked, I considered my knowledge of heaps and the heap sort entirely adequate. To be entirely honest, however, during most of that time I never really fully understood heaps. In particular, I never really understood the 2N and 2N+1 -- why those particular numbers/nodes? What was the significance of 2N and 2N+1? Why not 3N or 1.5N+65 or something? In short, that "implicit tree" had never really gelled for me -- I'd read the words, but never truly understood what they meant or how the "tree" worked.

For some reason a few months ago my lack of understanding on that particular subject bubbled to the top of the heap (I know: puns are the death of real humor) and I decided to implement a heap (and heap sort) entirely on my own, based only on my recollection about 2N and 2N+1. Instead of going to the book and translating syntax, I sat down with a supply of paper and pencils (the most underrated programming tools ever) and re-invented the heap and the algorithms to create and maintain a heap from first principles.

So, what did I do to finally and truly understand a heap? It ultimately came down to one piece of paper:

And from there came true understanding at last. If we arranged the nodes by layers, it "just happened" that the children of every node N were nodes 2N and 2N+1. Better still, after three layers of the tree, it was pretty apparent that as/if I added more to the tree, the same relationships would hold. Only 30 years (or so) after I'd first written a heap sort, I finally understood what I was doing!

I finally also understood why people who understood heaps thought they were cool: since the "links" in the tree are all based on (relatively simple) math, it's a tree you can walk almost any direction you'd care to think about by simply doing the right math. Just for example, in a normal binary tree, the only way to get from a node to its parent is to traverse down the tree, and remember the parent on the way there so when you need it back, you can recall it from wherever you have it stored. Likewise, getting to the sibling of a node requires that you get back to the parent (see above) and then take the other branch downward to get to the sibling.

With a heap, however, both of these are simply trivial math: the children of node N are nodes 2N and 2N+1, so the parent of node M is node M/2 (using truncating/integer math). Likewise, the sibling of node N is either node N+1 or N-1 -- if N is even, the sibling is N+1, and if N is odd, the sibling is N-1.

So, for any given node, we can walk through the tree up, down, or sideways (literally) with a single, simple math operation. Better still, all the nodes are contiguous in memory, with no space used up on links, so it's also pretty cache-friendly in general.

There was one other point I ran into repeatedly as I worked at understanding what I was doing well enough to produce working code: the math all depends on 1-based indexing. Making it work right with 0-based indexing like most currently languages use was not trivial. In fact, I finally decided that for this purpose it was better to maintain the illusion of 1-based addressing for most of the code, and since I was writing this in C++, have a truly trivial class to handle the translation from 1-base to 0-based addressing (i.e., subtract one from every index I used). As simple of an operation as that seems like it should be, keeping it in one place still simplified the rest of the code tremendously (and made it much easier to see the relationship between the code and that fundamental 2N/2N+1 relationship that defines the heap).

Anyway, in case anybody cares to look at the resulting code, I've decided that trying to post significant amounts of code directly here is too painful to be worthwhile, so here is a link to it on ideone.com. I think this is long enough for now, but sometime soon I'll probably write a post going through that code in more detail, explaining how it works and why I did things the way I did.