Tag Archives: best

The posts that aren’t all that bad…

Getting the most data speed out of your cell phone

You may have noticed there have been very few posts here. There’s a reason for that. The first and foremost is that sending my rants into the void has not been as personally cathartic as I’d hoped. My other goal for the blog, which actually has been somewhat successful, was simply to provide a vehicle for putting information out on the web that I thought might be useful for people, and that I couldn’t find elsewhere. Based on the traffic stats, those posts have actually been worthwhile, and my only reason for not doing more of this kind of post is that I’ve been too busy playing with my son, finishing up my projects at MIT, and trying to get a job (in that order).

So, going forward, I’m just going to focus on the second category of posts (though I reserve the right to devolve to the first occasionally). This blog was getting too negative, anyway. In that spirit, here’s a particularly useful trick I just figured out while sitting in a coffee shop working remotely.

I recently gave up my nice window office since I was feeling guilty about taking up a nice spot but only working part time. So, I’ve been doing a lot of work remotely, usually from a coffee shop given that working at home just isn’t very productive when there’s an adorable toddler running around begging to be hugged. So, I splurged and decided to start paying the extra $20 a month to use my phone as an internet connection for my computer. This is becoming a pretty common thing, and Sprint even offers phones that will create a WiFi network on the fly (I use Bluetooth with my iPhone). I expect this will become even more common once the iPhone hits Verizon, as Apple will reportedly allow this version of their phone to create WiFi hotspots, too.

I would typically just leave my phone lying flat on the table next to my laptop. However, after giving it a minute of thought, I realized this is actually pretty dumb, for two reasons. First, having the phone so close to the laptop is probably not smart, as computers are notorious spewers of electromagnetic interference at pretty much every frequency imaginable. In theory, they should be shielded, but nothing is perfect, and between the memory data rates and the processor clock speeds, a computer pretty much has the cell phone spectrum covered directly, if not with overtones. So, keep the cell phone at least a foot or so away from the computer.

Second, and more importantly, leaving the cell phone flat on a table is a bad idea because it puts the antenna horizontal, whereas cell phone signals are polarized vertically. (What this means, if you’re not a fan of electromagnetics, is that the electrons in the cell tower’s antenna are being shaken up and down, not side to side. Radio waves are really just a way of keeping track of how electrons interact with each other. Without anything interfering, the electrons in your cell phone’s antenna will be wiggled with the same orientation and frequency as those in the cell tower antenna. However, an antenna is designed for its electrons to be wiggled in a particular direction, almost always along its long axis, and a cell phone’s antenna is oriented with the assumption that the user is holding the phone upright against their ear.) Once I realized this, I propped my phone against a nearby wall so that it was standing straight up and down (as if somebody were holding it), and my data rates nearly doubled.
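If you want a number to hang on this, the standard free-space result is that received power scales with the polarization mismatch angle between the two antennas as cos²θ, so a 45-degree tilt costs you half the power and a 90-degree tilt would, in the ideal case, null the signal entirely. In the real world, buildings and terrain scatter and partially depolarize the signal, which is presumably why laying the phone flat merely hurt my data rate rather than killing the connection outright.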

So, if you’re using your cell phone as an internet connection, keep it a bit away from the computer and prop it up so it’s vertical. Keeping it in your pocket, even vertically, probably isn’t a great idea, since your body is pretty good at blocking radio waves. If you find this helps, please let me know in the comments. Right now my experience alone isn’t very statistically significant, to say the least.

Accelerating code using GCC’s prefetch extension

I recently started playing with GCC’s prefetch builtin, which allows the programmer to explicitly tell the processor to load given memory locations into cache. You can optionally inform the compiler of the locality of the data (i.e., how much priority the CPU should give to keeping that piece of data around for later use) as well as whether or not the memory location will be written to. Remarkably, the extension is very straightforward to use (if not to use correctly) and simply requires calling the __builtin_prefetch function with a pointer to the memory location to be loaded.
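For reference, the call looks roughly like this. The little loop below is purely illustrative (the function, array names, and the 16-element lookahead are made up for this sketch, not taken from the program discussed next):

/* void __builtin_prefetch(const void *addr, ...);
 *
 * The optional arguments must be compile-time constants:
 *   rw:       0 = prefetch for a read (the default), 1 = prefetch for a write
 *   locality: 0 = no temporal locality (evict soon), up to
 *             3 = keep in all cache levels if possible (the default)
 */
void scale(double *a, const double *b, double factor, int n)
{
    for (int i = 0; i + 16 < n; i++)
    {
        __builtin_prefetch(&a[i + 16], 1);  /* will be written shortly */
        __builtin_prefetch(&b[i + 16], 0);  /* will only be read */
        a[i] = b[i] * factor;
    }
    /* (The last few elements are left unscaled just to keep the sketch short.) */
}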

It turns out that in certain situations, tremendous speed-ups of several factors can be obtained with this facility. In fact, I’m amazed that I haven’t read more about this. In particular, when memory is being loaded “out of sequence” in a memory bandwidth-constrained loop, you can often benefit a great deal from explicit prefetch instructions. For example, I am currently working on a program which has two inner loops in sequence. First, an array is traversed one way, and then it is traversed in reverse. The details of why this is done aren’t important (it’s an optical transfer matrix computation, if you’re interested), but the salient aspect of the code is that the computation at each iteration is not that great, and so memory bandwidth is the main issue. Here is the relevant section of code where the arrays are accessed in reverse:

/*
* Step backward through structure, calculating reverse matrices.
*/
for (dx = n-1; dx > 0; dx--)
{
    Trev1[dx] = Trev1[dx+1]*Tlay1[dx] + Trev2[dx+1]*conj(Tlay2[dx]);
    Trev2[dx] = Trev1[dx+1]*Tlay2[dx] + Trev2[dx+1]*conj(Tlay1[dx]);
    dTrev1[dx] = dTrev1[dx+1]*Tlay1[dx] + dTrev2[dx+1]*conj(Tlay2[dx]) +
                 Trev1[dx+1]*dTlay1[dx] + Trev2[dx+1]*conj(dTlay2[dx]);
    dTrev2[dx] = dTrev1[dx+1]*Tlay2[dx] + dTrev2[dx+1]*conj(Tlay1[dx]) +
                 Trev1[dx+1]*dTlay2[dx] + Trev2[dx+1]*conj(dTlay1[dx]);
}

Despite the forward and reverse loops having exactly the same number of operations, it turns out that the vast majority of the time was being spent in this second (reverse) loop!

Why? Well, I can’t be entirely certain, but I assume that when memory is accessed, the chip loads not just the single floating-point double being requested, but the entire cache line containing that address. Thus, the data for the next several iterations is already sitting in L1 cache when you’re iterating forward in address space. However, in the reverse loop, the chip isn’t smart enough to notice that I’m going backwards (nor should it be), and so it has to wait for the data to come from either L2 or main memory on every single iteration. By adding a few simple prefetch statements to the second loop, however, the time spent in this section of code went way down. Here is the new code for the second loop:

/*
* Step backward through structure, calculating reverse matrices.
*/
for (dx = n-1; dx > 0; dx--)
{
    Trev1[dx] = Trev1[dx+1]*Tlay1[dx] + Trev2[dx+1]*conj(Tlay2[dx]);
    Trev2[dx] = Trev1[dx+1]*Tlay2[dx] + Trev2[dx+1]*conj(Tlay1[dx]);

    /* Request the next iteration's data while this iteration's
     * arithmetic is still in flight. */
    __builtin_prefetch(Trev1+dx-1, 1);
    __builtin_prefetch(Trev2+dx-1, 1);
    __builtin_prefetch(Tlay1+dx-1);
    __builtin_prefetch(Tlay2+dx-1);

    dTrev1[dx] = dTrev1[dx+1]*Tlay1[dx] + dTrev2[dx+1]*conj(Tlay2[dx]) +
                 Trev1[dx+1]*dTlay1[dx] + Trev2[dx+1]*conj(dTlay2[dx]);
    dTrev2[dx] = dTrev1[dx+1]*Tlay2[dx] + dTrev2[dx+1]*conj(Tlay1[dx]) +
                 Trev1[dx+1]*dTlay2[dx] + Trev2[dx+1]*conj(dTlay1[dx]);
}

The prefetch instructions tell the processor to request the next iteration’s data, so that it makes its way across the memory bus in parallel with the current iteration’s computation. In this case, this section of code ran over three times as fast with the prefetch instructions! It’s about the easiest optimization you’ll ever make. (The second argument to the prefetch instruction indicates that the memory in question will be written to.)

When playing around with prefetch, you just have to experiment with how much to fetch and how far in advance you need to issue the fetch. Too far in advance and you increase overhead and run the risk of having the data drop out of cache before you need it (L1 cache is very small). Too late and the data won’t have arrived on the bus by the time you need it.
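One way to experiment without sprinkling magic numbers through the loop is to pull the lookahead distance out into a constant and guard against running off the front of the arrays. Here is a minimal sketch of the idea applied to the same loop as above (the PF_DIST name and the bounds check are my additions, not part of the original program):

#define PF_DIST 4   /* how many iterations ahead to prefetch; tune this */

for (dx = n-1; dx > 0; dx--)
{
    if (dx > PF_DIST)
    {
        __builtin_prefetch(Trev1 + dx - PF_DIST, 1);
        __builtin_prefetch(Trev2 + dx - PF_DIST, 1);
        __builtin_prefetch(Tlay1 + dx - PF_DIST);
        __builtin_prefetch(Tlay2 + dx - PF_DIST);
    }
    /* ... same four matrix updates as above ... */
}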

Why did I not prefetch the dTrev1 and dTrev2 memory locations? Well, I tried, and it didn’t help. I really have no idea why. Maybe I had already saturated the memory bandwidth, so there was no point in requesting more. I then tried issuing the fetches even earlier (two iterations ahead), and that didn’t help either; perhaps in that case the cache got overloaded. Who knows? Cache optimization is a black art. But when it works, the payoff can be significant. It’s a technique worth exploring whenever you are accessing memory in a loop, especially out of order.

Zen and the Art of Linux Maintenance

As I sat watching the Ubuntu upgrade work its way through the packages, at some point the computer became unresponsive to mouse clicks. I ended up having to do a hard shutdown in the middle. As you might imagine, this completely and utterly hosed my Linux partition.

You might wonder why I keep banging my head against the wall of Linux, despite my rantings about it. So did I. As I sat staring at the kernel panic message, however, I realized something:

As much as I complain, part of me enjoys putting up with this stupid operating system, even though it long ago exhausted its utility by costing me so much of my time that it was no longer worth any amount of avoided software cost.

As an engineer, I like to tinker and fix things, and Linux gave me the opportunity (or rather, forced me) to delve into the workings of the OS in order to manage it. Linux provided me with the illusion of feeling useful and productive on a regular basis, as it required me to put my knowledge to work fixing a never-ending litany of problems.

But as I sat looking at a hosed partition, I had the embarrassed, hollow feeling that I’d really wasted an extraordinary amount of time focused on my computer as an object of inherent interest, as opposed to an expedient for actual useful work. My Linux machine had become a reflexive endeavor, largely existing for its own purpose, like a little bonsai garden that I tended to with wearing patience.

And now what do I have to show for it? I have some profoundly uninteresting knowledge of the particulars of one operating system, and a munged disk that’s about as practically useful as a bonsai tree. (Yes, my actual work is backed up, but it’s never trivial getting everything exactly the way you had it with a new system install, no matter how much you backed up.)

This was all good, though, because it ripped from my hands something I didn’t have the good sense to throw away. Rather than huddle down with an install CD and try to fix my little Linux partition, I just let it go and started to get back to work, actual work in the outside world, using Windows.*

It feels good. I’m done with operating systems as a hobby, tired of indulging technology for its own sake. One must not get too attached to things.

*I’m not trying to insult OS X, which I think is probably better than Windows. I just don’t have a Mac at work. (I can only fight one holy war at a time.)

How to make a left-wing progressive media statement

In the interest of giving fair time to all opinions, I’ve decided to step aside and table my regularly scheduled rabid wall-punching right-wing diatribe. Instead, today’s post has been guest written by a member of the Green Party in Cambridge, on the topic of how to give a proper media statement.

How to make a left-wing progressive media statement

by Sheila Baldwin-Cooper-Oscar-Meyer

Are you planning to attend a protest against a G7 convention? Going to picket outside of an oil company? Just planning to throw a brick through some deserving corporate window? If there’s any chance that you might be interviewed by a reporter, especially on camera, you should brush up on the following official advice for progressive media statements.

  1. Make sure your voice goes up—preferably a dissonant interval like a half-tone or a diminished fifth (“The Maria”)—at the end of every sentence. Otherwise, you’ll sound offensively declarative and patriarchal. Kind of like a Republican.
  2. Shrill monotone nasal intonation! I can’t emphasize this enough. A low, calm voice does NOT get the message across. You want to aim for something between a child’s whine and a cat being ingested in a jet engine. You know who have creepy-low, calm voices? Republicans.
  3. Use the word “shocked” or “outraged” at least five times. Per sentence. If you’re not shocked, you’re probably a Republican.
  4. Use the phrase “the current administration” in a smugly mocking tone in every other sentence. Republicans!!!

Despite this advice, you may find yourself flustered in the heat of the moment. The best of us do (especially with all the great weed that one tends to find at a protest). If all else fails, chant something that rhymes. It will be hard, so fortunately the research and development wing of the progressive movement has discovered that “ho” and “go” rhyme, even if–and this is crucial–you put other words in between them. An example: “Hey hey, ho ho, lateral extraction drilling has got to go.” Does it mean anything? No. But did you actually learn anything about economics or environmental science while you were majoring in gender studies at Brown? Exactly. Stick to the playbook; it’s time tested by a generation who managed to dismantle an entire culture while higher than a roadie at an Allman Brothers concert.

And just remember: when all else fails, call somebody a “fascist”.

The Great Hudson Arc: A 250-mile-wide mystery

Annotated satellite photo of the Hudson Bay arc.

It’s nice to find out that there are still mysteries left in this world, let alone ones that are visible from space. On the southeast corner of Hudson Bay, the coastline traces a near-perfect arc, roughly concentric with another ring of islands in the bay. So, what caused it? The obvious answer, proposed in the 1950s, is that it’s the remnant of a large impact crater. Apparently, however, there is none of the usual geologic evidence for this, and over the past 50 years there has been debate over its origin. From other sites I’ve read, many geologists seem to have concluded that it is a depression caused by glacial loading during the ice age, though a recent conference paper (2006) argues that it may indeed be a crater. The current thinking is summarized nicely on this web page:

There is fairly extensive information on this in Meteorite Craters by Kathleen Mark, University Press, isbn 0-8165-1568-9 (paperback). The feature is known as the Nastapoka Arc, and has been compared to Mare Crisium on the Moon. There is “missing evidence,” which suggests that it isn’t an impact structure, however: “Negative results were . . . reached by R. S. Dietz and J. P. Barringer in 1973 in a search for evidence of impact in the region of the Hudson Bay arc. They found no shatter cones, no suevite or unusual melt rocks, no radial faults or fractures, and no metamorphic effects. They pointed out that these negative results did not disprove an impact origin for the arc, but they felt that such an origin appeared unlikely.” (p. 228)

I know next to nothing about geology, but in the spirit of the rank amateur naturalists who came before me, I won’t let that stop me from forming an opinion. In physics, whenever you see something that is symmetric about a point, you have to wonder what is so special about that point. Could it really be chance that roughly 800 miles of coastline all aim at the same center? If not, what defined that center? One explanation for how large circular formations are created is that they start as very small, point-like features that get expanded over eons by erosion; in other words, the original sink-hole that started to erode is what defines the center of the improbable circle. There are also lots of physical phenomena that make circles, such as deposition and flow of viscous materials from a starting point, assuming isotropic (spatially uniform) physical conditions everywhere. However, the planet is not isotropic. In fact, you can see plenty of arc-like features on coastlines and basins in satellite photos, and I can’t find a single one that is even close to as geometrically perfect as the Hudson Bay arc. If you overlay a perfect circle on Hudson Bay, as I’ve done in the picture, you can see just how closely the coastline follows it. How would erosion, or a glacial depression, manage to yield such perfect geometry? Is it really possible for the earth to be that homogeneous over such a large distance, and over the geologic span of time required to create it? To my untrained eye, at least, it screams single localized event.

If so, it would have been a major event, on par (at least based on size) with the impact that is credited with putting a cap on the Cretaceous Period and offing the dinosaurs. On the other hand, this only serves to heighten the mystery, as you’d think there would be global sedimentary evidence for it. Whether the arc is the result of one of the biggest catastrophic events in earth’s history, or an example of nature somehow managing to create a near-perfect circle the size of New York State through processes acting over unimaginably long spans of time, its existence is fascinating.

The Boston Symphony on a weeknight: Death is gaseous and awesome

One of the nicest things about being a student in Boston is the $25 “BSO Student Card,” which lets you attend certain Thursday night performances of the Boston Symphony Orchestra for free. Of course, Thursday night is not the big night for the Boston intelligentsia to attend the symphony, and tickets for the cheap seats are actually cheap, even if you’re not a student. Thus, it’s fair to conjecture that you get a different crowd at the Thursday night performances, to put it politely, and it’s clear that many of us “far in the back” are not taking the experience as seriously as those paying $150 for the privilege. I fear that the musicians probably think of Thursday night as riff-raff night, and regard it as a rehearsal for the weekend’s benefactor show. If they don’t, they probably will from now on.

This week the orchestra played Edward Elgar’s “The Dream of Gerontius,” which is a huge piece for full chorus and orchestra with pipe organ. It is a setting of a poem of the same name, which deals with the death of a man and his transport beside his guardian angel to His final Judgement and on to Purgatory. (Too much capitalization there? Well, better safe than sorry, I say. The grammarian version of Pascal’s wager.)

The beginning of “The Dream…” is a somber orchestral prelude, setting the mood using perhaps the quietest tone in which I’ve ever heard an orchestra play. (For the first time I’ve seen, the concert notes are printed with the admonition “Please turn the page quietly.”) The hall is hushed, and this beautiful string adagio begins to wax quietly, creating a hallowed, church-like atmosphere. But it does not last long, this being Bingo night at Symphony Hall. An older gentleman in the balcony starts to go into a comical, high-pitched coughing fit that sounds like an asthmatic cat being repeatedly gut punched. They are probably looking frantically for this guy in whatever ICU he wandered out of. Going out in public was probably a poor call, but he clearly has a health problem and can surely be forgiven, if not lauded for his thematic complement to the subject matter. Jesu, Maria–I am near to death, And Thou art calling me; I know it now, sings the tenor. But there are others for whom Judgement will not be so kind…


Studies show reading this essay will make you smarter

Recently, there was an interesting article in BusinessWeek about the flip-flop of studies on the efficacy of echinacea to cure the common cold. The article focused on the possibility of incorrectly performed studies. But, there may have been nothing wrong with any of the studies, even though they differed in their results. The statistical nature of clinical studies means there is always a small possibility that false effects will be seen. However, biases inherent to statistical research may result in a surprisingly large percentage of published studies being wrong. In fact, it has been suggested that the majority of such studies are.

First, I’ll have to briefly explain something about how statistically based studies are done. When people run such trials, they consider as “significant” any result that would only happen by chance 1 in 20 times. In the language of statistics, they design the study so that the “null” hypothesis (e.g., that echinacea has no effect on a cold) would be falsely rejected at most 5% of the time, based on the normal random variability expected in the study. In other words, they accept that 5% of the time (at most) they will erroneously see an effect where there truly isn’t any. This 5% chance of a mistake arises from unavoidable randomness, such as the normal variation in disease duration and severity; in the case of the echinacea studies, you might just happen to test your drug on a group of people whose colds were unusually mild.

In summary, to say a drug study is conducted at the 5% significance level is to say that the study was designed so that you would falsely conclude a positive effect, when there is none, only 5% of the time. In practice, scientists usually publish the p-value, which is the smallest significance level (computable only after the fact) at which the observed result would still have counted as an effect. The main point, however, is that any study significant at the 5% level is generally considered significant enough to publish.

So, being wrong at most 1 in 20 times is pretty good, right? The cost of putting a study out there that is wrong pales in comparison to the good of the 19 that actually help, right? Does it really matter if, of the 1000s of studies telling us what we should and shouldn’t do, dozens of them are wrong? In theory, there will always be an order of magnitude more studies that are truly helpful.

The problem is, this conclusion assumes a lot. Just because the average study may have a p-value of, say, 2%, it doesn’t mean only 2% of the studies out there are wrong. We have no idea how many studies are performed and not published. Another way of looking at the significance level of an experiment is to ask, “How many times does this experiment have to be repeated before I have a high probability of being able to publish the result I want?” This may sound cynical, but I’m not suggesting any dishonesty. This kind of specious statistics occurs innocently all the time due to unknown repeated efforts in the community, an effect called publication bias. Scientists rarely publish null findings, and even if they do, such results are unlikely to get much attention.

Taking 5% as the accepted norm for statistical significance, this means only 14 groups need to have independently looked at the same question, in the entire history of medicine, before it’s probable that one of them will find a falsely significant result. Perhaps more problematically, consider that many studies actually look at multitudes of variables, and it becomes clear that if you just ask enough questions on a survey, you’re virtually guaranteed to have plenty of statistically significant “effects” to publish. Perhaps this is why companies find funding statistical studies so much more gratifying than funding the physical sciences.
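If you want to check that figure yourself, the arithmetic is just one minus the probability that every group correctly fails to find an effect. Here’s a throwaway sketch (my own illustration, not part of any study) that prints the false-positive probability as the number of independent groups grows:

/* Sanity check of the "14 groups" figure: if each independent study has a
 * 5% chance of a false positive, how many studies does it take before the
 * chance that at least one of them comes up falsely significant passes 50%?
 * Compile with something like: gcc -O2 false_positives.c -lm
 */
#include <stdio.h>
#include <math.h>

int main(void)
{
    const double alpha = 0.05;   /* per-study false positive rate */

    for (int k = 1; k <= 20; k++)
    {
        double p_any = 1.0 - pow(1.0 - alpha, k);
        printf("%2d studies: P(at least one false positive) = %.3f\n",
               k, p_any);
    }
    /* The probability first exceeds 0.5 at k = 14 (1 - 0.95^14 = 0.512). */
    return 0;
}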

None of what I have said so far is likely to be considered novel to anybody involved in clinical research. However, I think there is potentially another, more insidious source of bias that I don’t believe has been mentioned before. The medical research community is basically a big hypothesis generating machine, and the weirder, the better. There is fame to be found in overturning existing belief and finding counterintuitive effects, so people are biased towards attempting studies where the null hypothesis represents existing belief. However, assuming that there is some correlation between our current state of knowledge and the truth, this implies a bias towards studies where the null hypothesis is actually correct. In classical statistics, the null hypothesis can only be refuted, not confirmed. Thus, by focusing on studies that seek to overturn existing belief, there may be an inherent bias in the medical profession to find false results. If so, it’s possible that a significant percentage of published studies are wrong, far in excess of that suggested by the published significance level of the studies.

Statistical studies are certainly appropriate when attempting to confirm a scientific theory grounded in logic and understanding of the underlying mechanism. A random question, however, is not a theory, and using statistics to blindly fish for novel correlations will always produce false results at a rate proportional to the effort applied. Furthermore, as mentioned above, this may be further exacerbated by the bias towards disproving existing knowledge as opposed to confirming it. The quality expert W. Edwards Deming (1975) once suggested that the reason students have problems understanding hypothesis tests is that they “may be trying to think.” Using statistics as a primary scientific investigative tool, as opposed to merely a confirmative one, is a recipe for the production of junk science.