The mean of a set of numbers $x_1, x_2, \ldots, x_n$ is the unique value $\overline{x}$ that minimizes $\sum (\overline{x} - x_i)^2$, the sum of squared distances between it and the numbers. Why? The mean seems very natural (for example, it's related to intuitive ideas about fairness), but the sum of squares less so. For instance, what's so special about squaring? Why not the sum of fourth powers $\sum (\overline{x} - x_i)^4$? Or the sum of unsigned distances $\sum |\overline{x} - x_i|$?

My friend Dillon asked me this question. I felt I knew the answer, due to reading this post by Eric Neyman. But though I could do the algebra proving it, I wasn't able to convince Dillon (or myself, after a few minutes) of it on an intuitive level. It still felt like there were unanswered questions. It was unsatisfying. This post is my attempt to write an explanation that you can "feel in your bones."
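A quick numerical check of the opening claim. The data set and search range are hypothetical, and a brute-force grid search stands in for calculus. Each of the three losses mentioned at the start picks out a different "center".

```python
# Brute-force check on a small hypothetical data set: scan candidate centers c
# on a fine grid and report which one minimizes each total loss sum(loss(c - x)).
from statistics import mean, median

data = [1.0, 2.0, 3.0, 4.0, 10.0]  # hypothetical; the outlier 10 pulls mean and median apart

def best_center(loss, xs, lo=0.0, hi=12.0, steps=12_000):
    """Grid point c in [lo, hi] minimizing sum(loss(c - x) for x in xs)."""
    candidates = (lo + (hi - lo) * i / steps for i in range(steps + 1))
    return min(candidates, key=lambda c: sum(loss(c - x) for x in xs))

squared = best_center(lambda d: d ** 2, data)
absolute = best_center(abs, data)
fourth = best_center(lambda d: d ** 4, data)

print("mean:", mean(data), "argmin of squares:", round(squared, 3))
print("median:", median(data), "argmin of |distances|:", round(absolute, 3))
print("argmin of fourth powers:", round(fourth, 3))
```

The squared loss lands on the mean (4.0) and the absolute loss on the median (3.0); the fourth-power loss lands above the mean, pulled further toward the outlier.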

For the uninitiated, Magic: The Gathering is a trading card game where cards belong to one or more of five colors: white, blue, black, red, and green. Each color has its own aesthetic and style of play.

Consider a matrix acting on a vector, $\mathbf{W}\mathbf{x}$. You can see this as a function $f: \mathbb{R}^n \rightarrow \mathbb{R}^m$, $f(\mathbf{x}) = \mathbf{W}\mathbf{x}$, where $\mathbf{W}$ is an $m \times n$ matrix. What is the derivative of $f$?

I covered the framework for taking derivatives in higher dimensions in a previous post, and I'll use the notation I introduced there. $D(f)$ is the total derivative of a function $f$, and $D(f)(\mathbf{x})$ is the total derivative of $f$ evaluated at a point $\mathbf{x}$. I'll also use the concept of a Jacobian matrix from that post.

The analogue of the derivative for functions whose inputs and outputs are vectors is called the total derivative. The total derivative of a function $f: \mathbb{R}^n \rightarrow \mathbb{R}^m$ is an object that gives you a linear function $\mathbb{R}^n \rightarrow \mathbb{R}^m$ for each point in $\mathbb{R}^n$. In other words, it is a function $\mathbb{R}^n \rightarrow \mathbb{R}^n \rightarrow \mathbb{R}^m$.

I'll write $D(f)$ for the total derivative of $f$, the function $\mathbb{R}^n \rightarrow \mathbb{R}^n \rightarrow \mathbb{R}^m$. $D(f)(\mathbf{x})$ is what you get when you put in $\mathbf{x}$: a particular function $\mathbb{R}^n \rightarrow \mathbb{R}^m$.
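A finite-difference sketch of this curried notation for $f(\mathbf{x}) = \mathbf{W}\mathbf{x}$. The matrix `W`, the step `eps`, and the helper names are hypothetical; `D(g)(x)` returns a function of a direction `h`, mirroring $D(f)(\mathbf{x})$, and the Jacobian is read off column by column from basis vectors.

```python
# Finite-difference sketch: for f(x) = W x, every D(f)(x) should be the same
# linear map, whose Jacobian matrix is W. The matrix W below is hypothetical.
from typing import Callable, List

Vector = List[float]
Matrix = List[List[float]]
LinearMap = Callable[[Vector], Vector]

W: Matrix = [[1.0, 2.0, 0.0],
             [0.0, -1.0, 3.0]]  # m = 2 rows, n = 3 columns

def f(x: Vector) -> Vector:
    """f : R^3 -> R^2, x |-> W x."""
    return [sum(w * xj for w, xj in zip(row, x)) for row in W]

def D(g: Callable[[Vector], Vector], eps: float = 1e-6) -> Callable[[Vector], LinearMap]:
    """Curried total derivative: D(g)(x) is a function h |-> (approx.) J_g(x) h,
    estimated with a central difference of step eps."""
    def at(x: Vector) -> LinearMap:
        def apply(h: Vector) -> Vector:
            plus = g([xi + eps * hi for xi, hi in zip(x, h)])
            minus = g([xi - eps * hi for xi, hi in zip(x, h)])
            return [(p - q) / (2 * eps) for p, q in zip(plus, minus)]
        return apply
    return at

def jacobian(g: Callable[[Vector], Vector], x: Vector, n: int) -> Matrix:
    """Column j of the Jacobian is D(g)(x) applied to the j-th basis vector e_j."""
    columns = [D(g)(x)([1.0 if k == j else 0.0 for k in range(n)]) for j in range(n)]
    return [list(row) for row in zip(*columns)]  # transpose columns into rows

for point in ([0.5, -2.0, 4.0], [10.0, 3.0, -7.0]):
    J = jacobian(f, point, n=3)
    print(point, [[round(v, 6) for v in row] for row in J])
```

Both points give (up to rounding) the same matrix $\mathbf{W}$, consistent with $D(f)(\mathbf{x})$ being the linear map $\mathbf{h} \mapsto \mathbf{W}\mathbf{h}$ at every $\mathbf{x}$.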

[This post is likely only to be of interest to you if you've read the paper under discussion.]

The paper Risks from Learned Optimization in Advanced Machine Learning Systems is written in natural language. It discusses the possibility of two functions, the objective function of the base optimizer and the objective function of the mesa optimizer, being equal or unequal. As Samuel Marks pointed out during a presentation, in this informal setting it's not clear that these functions even have the same type. Here are two formalizations where it's clearer what we mean when we discuss equality between these two functions.

Why do less dense things rise? Recently I was asked this as a puzzle. I had always taken it as a basic truth, but we should be able to explain it in terms of forces.
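For reference, the textbook force balance (not necessarily the route this post takes) goes through pressure. A sketch, assuming a fully submerged box of height $h$, horizontal face area $A$ (so volume $V = hA$), and density $\rho_o$, in a fluid of uniform density $\rho_f$ under gravitational acceleration $g$. Pressure grows with depth $d$ as $p(d) = p_0 + \rho_f g d$, the pressure forces on the side faces cancel, and the bottom face is pushed up harder than the top face is pushed down:

$$
F_{\text{buoyant}} = \big(p(d + h) - p(d)\big) A = \rho_f g h A = \rho_f V g,
\qquad
F_{\text{net}} = F_{\text{buoyant}} - \rho_o V g = (\rho_f - \rho_o) V g.
$$

The net force points upward exactly when $\rho_o < \rho_f$, that is, when the object is less dense than the surrounding fluid.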

This is a brain teaser I came across recently (original source):

A king has 100 bottles of wine, exactly one of which is poisoned. He decides to figure out which it is by feeding the wines to some of his servants, and seeing which ones drop dead. He wants to find out before the poisoner has a chance to get away, and so he doesn't have enough time to do this sequentially; instead, he plans to give each servant some combination of the wines tonight, and see which are still alive tomorrow morning.

a) How many servants does he need?

b) Suppose he had 100 servants - then how many wines could he test?
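Spoiler: one standard strategy (an assumption; the original source may present it differently) is binary labeling. Number the wines from 0, give wine $i$ to servant $k$ exactly when bit $k$ of $i$ is 1, and read the set of dead servants back as the poisoned wine's index. The helper names below are hypothetical.

```python
# Binary-labeling strategy: wine i goes to servant k exactly when bit k of i is 1.
# Wine 0 is given to nobody; if no one dies, wine 0 is the poisoned one.

def servants_needed(num_wines: int) -> int:
    """Smallest s with 2**s >= num_wines (s servants distinguish 2**s wines)."""
    return (num_wines - 1).bit_length()

def servants_for_wine(wine: int, num_servants: int) -> set[int]:
    """Servants who drink from this wine: those whose bit is set in the wine's index."""
    return {k for k in range(num_servants) if (wine >> k) & 1}

def identify_poison(dead: set[int]) -> int:
    """Interpret the dead servants as the binary digits of the poisoned wine's index."""
    return sum(1 << k for k in dead)

num_wines = 100
s = servants_needed(num_wines)
print(s)  # 7, since 2**6 = 64 < 100 <= 128 = 2**7

# Every possible poisoned wine is recovered from the set of servants who die.
for poisoned in range(num_wines):
    assert identify_poison(servants_for_wine(poisoned, s)) == poisoned

# Part b): with 100 servants the same scheme distinguishes 2**100 wines.
print(servants_needed(2 ** 100))  # 100
```

The count is tight for this setup: each servant's fate gives one bit (alive or dead), so $s$ servants can produce at most $2^s$ distinguishable outcomes.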

When I was a kid, I remember hearing about the various damages of the world. War, corruption, the mismanagement of countries, families starving below the poverty line. Like a lot of kids, I was disturbed at this news and concocted various kid schemes to make everything better. I designed a utopia in fifth grade where everything was solar-powered, so the environment wouldn't get damaged. (I really wanted blaster guns, so I also made solar-powered blaster guns. But I remembered that it was supposed to be a utopia without war. I compromised by deciding the blasters were used only for mining.)

Later I still felt uneasy. I was overwhelmed by the scale of the wrongness, like one is overwhelmed by the number of stars in the universe. It was a relief to mentally assign responsibility for these problems to people of similar scale: presidents, secretaries-general, diplomats. I imagined people infinitely wiser than myself in these positions. (In fairness, that was correct, because I was twelve.) They were the competent authorities.

Undone is a series about schizophrenia. It treats this topic without holding any of its characters at arm's length. We are (nearly) always able to understand why each character's actions make complete sense to them, with the deep understanding that is almost the same as feeling.

I saw Thor: Ragnarok recently. It's trying to be an imaginative, comedic sci-fi adventure movie with a touch of soul, and it pretty much is. However, it seemed to me and my friend that we could strictly improve the movie with some small tweaks. (That is, these tweaks would have done better something the movie was trying to do already, without affecting much else.)

I know the Coriolis Effect from 9th-grade earth science as what causes hurricanes to spin: counterclockwise in the northern hemisphere, and clockwise in the southern. The idea is this: imagine teleporting a section of air from the Equator to the North Pole. An exceptionally violent wind would result, because the rotation of the earth means that air around the equator is moving at about 1,000 miles per hour, while air around the North Pole is hardly moving at all.

Now generalize. As air moves north in the Northern Hemisphere, it finds itself moving eastward (the Earth rotates from west to east) faster than the air surrounding it. Hence it ends up moving in a northeasterly direction (relative to that surrounding air), rather than just north. For the same reason, air moving south in the Northern Hemisphere is pushed to the west. Things are reversed in the Southern Hemisphere. The overall effect is to add a lateral component to north/south motion on any rotating sphere.
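The speed mismatch this argument relies on is easy to tabulate. A sketch, assuming a spherical Earth with a mean radius of about 3,959 miles and one rotation per sidereal day (about 23.934 hours); it computes only the ground's eastward speed at each latitude, not the Coriolis acceleration itself.

```python
# Eastward speed of the ground (and of air at rest relative to it) due to
# Earth's rotation, assuming a perfect sphere. Constants are approximate.
import math

EARTH_RADIUS_MILES = 3959.0   # mean radius, approximate
SIDEREAL_DAY_HOURS = 23.934   # time for one full rotation, approximate

def eastward_speed_mph(latitude_deg: float) -> float:
    """Circumference of the latitude circle divided by the rotation period."""
    circle_radius = EARTH_RADIUS_MILES * math.cos(math.radians(latitude_deg))
    return 2 * math.pi * circle_radius / SIDEREAL_DAY_HOURS

for lat in (0, 30, 45, 60, 90):
    print(f"{lat:>2} deg N: {eastward_speed_mph(lat):7.1f} mph")
```

The speed falls from roughly 1,040 mph at the equator to zero at the pole, so a parcel carrying its equatorial eastward speed northward outruns the ground beneath it.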

[Epistemic status: providing suggestions for how to think about something (or how to justify how you already think about something) based on my own experience.]

When deciding between various options under uncertainty, one attractive framework is to calculate the expected value of each option, then choose the option with the highest expected value.
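A minimal sketch of this framework. The option names, probabilities, and payoffs are all hypothetical, made up for illustration.

```python
# Choose the option with the highest expected value. All options below are
# hypothetical; each is a list of (probability, payoff) outcomes.
from typing import Dict, List, Tuple

Outcome = Tuple[float, float]  # (probability, payoff)

def expected_value(outcomes: List[Outcome]) -> float:
    """Probability-weighted average payoff; probabilities must sum to 1."""
    total_p = sum(p for p, _ in outcomes)
    if abs(total_p - 1.0) > 1e-9:
        raise ValueError(f"probabilities sum to {total_p}, not 1")
    return sum(p * v for p, v in outcomes)

def best_option(options: Dict[str, List[Outcome]]) -> str:
    """Name of the option whose expected value is largest."""
    return max(options, key=lambda name: expected_value(options[name]))

options = {
    "safe bet": [(1.0, 50.0)],
    "long shot": [(0.1, 1000.0), (0.9, 0.0)],
    "coin flip": [(0.5, 120.0), (0.5, -10.0)],
}
for name, outcomes in options.items():
    print(name, expected_value(outcomes))
print("choose:", best_option(options))
```

Here the long shot has the highest expected value (100 versus 50 and 55), so the framework picks it even though it pays nothing 90% of the time.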

[Epistemic status: clarifying language in the hope of helping people who were confused like me, not stating anything about what is or isn't the case in the world.]

When you imagine how the world should be, imagine also what work needs to be done to get it there. But do not yet imagine yourself doing that work. To do so at this stage would be to invite excess complication. First imagine what someone will need to do. Then, in place of "someone" put "myself".

This is an experiment where I narrate my thought process as I solve a math problem. My motivation: writeups of solutions to math problems usually present a polished, streamlined version of the solver's thought process, omitting errors, wrong turns, and heuristics. I wondered what it would be like to represent the thought process "warts and all."

Most programming languages have a way to represent "no value" or "nothing". Python has None, Ruby has nil, Java and friends have null. JavaScript has two ways to represent this concept: undefined and null.

I bought a bottle of cranberry juice which said "100% juice" on the label. But, later, I found that the label said it included apple juice. I became confused and suspicious about fruit juice labels, a state which lasted many years.

But it turns out that fruit juice labels are pretty comprehensible. I learned this from user rumtscho's lovely post on this topic at cooking.stackexchange.com. The following is just a restatement of that post.

One way to convey digital information across distances is through copper wire (Ethernet cable). Here we just vary the voltage in the wire between two states A and B. When we are at A, we are sending a 0, and when we are at B we are sending a 1.

What is voltage? Voltage is the difference in electric potential between two points.

The states A and B are called symbols.

The number of symbols sent per second is called the symbol rate, and its unit is the baud. If your symbol rate is 1 symbol per second, you are signaling at 1 baud.
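A small sketch of the two-symbol scheme, using the text's symbol names A and B; the message and the rates are hypothetical. It maps each bit to one symbol and computes transmission time from the symbol rate.

```python
# Two-symbol line coding: '0' -> A, '1' -> B (symbol names from the text).
# The message and symbol rates below are hypothetical.

def encode(bits: str) -> list[str]:
    """Map each bit to one symbol."""
    table = {"0": "A", "1": "B"}
    return [table[b] for b in bits]

def transmission_seconds(num_symbols: int, baud: float) -> float:
    """Baud is symbols per second, so time = number of symbols / baud."""
    return num_symbols / baud

message = "01101000"  # one byte
symbols = encode(message)
print("".join(symbols))                               # ABBABAAA
print(transmission_seconds(len(symbols), baud=1))     # 8.0 seconds at 1 baud
print(transmission_seconds(len(symbols), baud=1000))  # 0.008 seconds at 1000 baud
```

With only two symbols, each symbol carries one bit, so the bit rate equals the baud. Schemes with more than two voltage levels carry $\log_2 M$ bits per symbol (for $M$ levels), which is why bit rate and baud can differ.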

[Epistemic Status: I'm just writing this to help myself remember. YMMV.]

Question: we can write down a general formula for the roots of a quadratic, cubic, or quartic polynomial in terms of the coefficients. Why can't we do it for a quintic polynomial?

How easy is it to do a lot of good? Imagine that opportunities to improve the world were tangible and visible. Say they look like purple jelly beans. When you pick one up, bam! Someone's life is a little better.

In world A, jelly beans are plentiful. Maybe they rain down from the sky every week. Everywhere in the country, this is true. The sidewalks are covered in jelly beans. Everyone picks up a few just going to work every day.

The Feynman Lectures on Physics are calmly beautiful and I am reading them right now instead of going to bed, but what the hell is with this paragraph in the middle of chapter 3?

One of the most impressive discoveries was the origin of the energy of the stars, that makes them continue to burn. One of the men who discovered this was out with his girlfriend the night after he realized that nuclear reactions must be going on in the stars in order to make them shine. She said “Look at how pretty the stars shine!” He said “Yes, and right now I am the only man in the world who knows why they shine.” She merely laughed at him. She was not impressed with being out with the only man who, at that moment, knew why stars shine. Well, it is sad to be alone, but that is the way it is in this world.

Some subgroups of a group $G$ are normal, and some are not. It's not easy to get an intuition for what that means. A week or two after encountering the concept in my abstract algebra class, I knew the following:

Normal subgroups $N$ of $G$ are the only ones you can quotient by; i.e. you can write $G/N$ and it's a valid group.

Each one is the kernel of some homomorphism $G \rightarrow H$, for some group $H$. The kernel is the set of elements that get mapped to the identity (the subgroup is "killed").
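To see these facts in a concrete case, here is a brute-force sketch in the symmetric group $S_3$ (permutations of $\{0, 1, 2\}$), using the standard conjugation test for normality ($gng^{-1} \in N$ for all $g \in G$, $n \in N$). The two subgroups checked are illustrative choices, not from the original text: the even permutations, which form the kernel of the sign homomorphism $S_3 \rightarrow \{+1, -1\}$, and the two-element subgroup generated by a single swap.

```python
# Brute-force normality check in S3. Permutations are tuples p with p[i] = image of i.
from itertools import permutations

Perm = tuple[int, ...]

def compose(p: Perm, q: Perm) -> Perm:
    """(p o q)(i) = p(q(i)): apply q first, then p."""
    return tuple(p[q[i]] for i in range(len(q)))

def inverse(p: Perm) -> Perm:
    inv = [0] * len(p)
    for i, pi in enumerate(p):
        inv[pi] = i
    return tuple(inv)

def is_normal(N: set[Perm], G: list[Perm]) -> bool:
    """N is normal in G iff g n g^-1 lies in N for every g in G and n in N."""
    return all(compose(compose(g, n), inverse(g)) in N for g in G for n in N)

def sign(p: Perm) -> int:
    """Homomorphism S3 -> {+1, -1}: parity of the number of inversions."""
    inversions = sum(1 for i in range(len(p)) for j in range(i + 1, len(p)) if p[i] > p[j])
    return -1 if inversions % 2 else 1

G = list(permutations(range(3)))
identity = (0, 1, 2)
A3 = {p for p in G if sign(p) == 1}  # kernel of sign: the even permutations
swap01 = {identity, (1, 0, 2)}       # subgroup generated by one transposition

print(is_normal(A3, G))      # True: a kernel is always normal
print(is_normal(swap01, G))  # False: conjugating the swap gives a different swap
```

Since $A_3$ is normal, the quotient $S_3 / A_3$ is a valid group: it has two cosets, matching the two values of the sign.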