
Pandemic blog 20: R_0, random graphs, and the L_2 norm

People are talking about R_0. It’s the number that you wish were below 1. Namely: it is the number of people, on average, that a carrier of SARS-CoV-2 infects during the course of their illness. All the interventions we’re undertaking are designed to shrink that number. Because if it’s bigger than 1, the epidemic grows exponentially until it infects a substantial chunk of the population, and if it’s smaller, the epidemic dies out.

But not everybody has the same R_0! According to some of the epidemiologists writing about COVID, this matters. It matters, for instance, to the question of how far into the population the infection gets before it starts to burn itself out for lack of new susceptible hosts (“herd immunity”) and to the question of how much of the population eventually falls ill.

Here’s an easy way to see that heterogeneity can make a difference. Suppose your population consists of two towns of the same size with no intermunicipal intercourse whatsoever. The first town has an R_0 of 1.5 and the second an R_0 of 0.3. Then the mean R_0 of the whole population is 0.9. But this epidemic doesn’t die out; it spreads to cover much of the population of Contagiousville.

I learned an interesting way to think about this heuristically from Tim Gowers on Twitter:

You can read Tim’s interesting thread yourself, but here’s the main idea. Say your population has size n. You make a graph out of the pandemic by placing an edge between vertices i and j if one of the corresponding people infects the other. (Probably better to set this up in a directed way, but I didn’t.) Or maybe slightly better to say: you place an edge if person i and person j interact in a manner such that, were either to enter the interaction infected, both would leave that way. If one person in this graph gets infected, the reach of the infection is the connected component of the corresponding vertex. So how big is that component?

The simplest way to set this up is to connect each pair of vertices with probability c/n, all such choices made independently. This is an Erdos-Renyi random graph. And the component structure of this graph has a beautiful well-known theory; if c > 1, there is a giant component which makes up a positive proportion of the vertices, and all other components are very small. The size of this component is nx, where x is the unique positive number such that

x = 1 - e^{-cx}.

If c < 1, on the other hand, there is no big component, so the pandemic is unlikely to reach much of the population. (Correspondingly, the equation above has no nonzero solution.)

It is fair to be skeptical of this model, which is completely static and doesn’t do anything fancy, but let me just say this — the most basic dynamic model of epidemic spread, the SIR model, has an endstate where the proportion of the population that’s been infected is the unique positive x such that

x = 1 - e^{-R_0x}.

Which looks pretty familiar!
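
If you want to see this formula in action, here’s a quick numerical sketch (mine, nothing official): iterate x -> 1 - e^{-R_0 x} and watch where it lands.

from math import exp

def attack_rate(R0, iters=500):
    # fixed-point iteration for x = 1 - e^{-R_0 x}
    x = 0.5
    for _ in range(iters):
        x = 1 - exp(-R0 * x)
    return x

for R0 in (0.9, 1.5, 2.5):
    print(R0, attack_rate(R0))
# R_0 = 0.9 collapses to 0 (no outbreak); R_0 = 1.5 gives about 0.58, R_0 = 2.5 about 0.89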

Now what happens if you want to take into account that the population isn’t actually an undifferentiated mass? Let’s say, for instance, that your county has a red town and a blue town, each with population n/2. Two people in the red town have a probability of 2/n of being connected, while two people in the blue town have a probability of just 1/n of being connected, and a red-blue pair is connected with probability 1/n. (This kind of random graph is called a stochastic block model, if you want to go look at papers about it.) So the typical red-town person is going to infect 1 fellow red-towner and 0.5 blue-towners, for an R_0 of 1.5, while the blue-towner is going to have an R_0 of 1.

Here’s the heuristic for figuring out the size of the big component. Suppose x is the proportion of the red town in the big component of the graph, and y is the proportion of the blue town in the big component. Take a random red person; what’s the chance they’re in the big component? Well, the chance they’re not connected to any of the xn/2 red-towners in the big component is

(1-2/n)^{xn/2} = e^{-x}

(oh yeah did I mention that n was infinity?) and the chance that they’re not connected to any of the blue-towners in the big component is

(1-1/n)^{yn/2} = e^{-(1/2)y}

so all in all you get

x = 1 - e^{-(x + (1/2)y)}

and by the same token you would get

y = 1-e^{-((1/2)x + (1/2)y)}

and now you have two equations that you can solve for x and y! In fact, you find x = 47% and y = 33%. So just as you might expect, the disease gets farther in the spreadier town.
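
If you want to check those numbers, here’s a minimal sketch (mine, not anything from the papers): write the two equations as a vector fixed-point problem and iterate.

import numpy as np

# A[i][j] = expected number of people in town j infected by a carrier in town i
# (row 0 = red town, row 1 = blue town)
A = np.array([[1.0, 0.5],
              [0.5, 0.5]])
v = np.array([0.5, 0.5])
for _ in range(200):
    v = 1 - np.exp(-A @ v)
print(v)   # roughly (0.47, 0.33), matching the numbers above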

And you might notice that what we’re doing is just matrix algebra! If you think of (x,y) as a vector v, we are solving

v = 1 - e^{-Av}

where “exponentiation” of a vector is interpreted coordinatewise. You can think of this as finding a fixed point of a nonlinear operator on vectors.

When does the outbreak spread to cover a positive proportion of the population? There’s a beautiful theorem of Bollobas, Janson, and Riordan that tells you: you get a big component exactly when the largest eigenvalue λ of A, the so-called Perron-Frobenius eigenvalue, is larger than 1. In the case of the matrix studied above, the two eigenvalues are about 1.31 and 0.19. You might also note that in the early stages of the epidemic, when almost everyone in the network is susceptible, the spread in each town will be governed by repeated multiplication of a small vector by A, and the exponential rate of growth is thus also going to be given by λ.

It would be cool if the big eigenvalue also told you what proportion of the vertices are in the giant component, but that’s too much to ask for. For instance, we could just replace A with a diagonal matrix with 1.31 and 0.19 on the diagonal; then the first town gets 43% infected, and the second town, completely disconnected from the first, gets 0.

What is the relationship between the Perron-Frobenius eigenvalue and the usual “mean R_0” definition? The eigenvalue can be thought of as

\max_{v} v^T A v / v^T v

while the average R_0 is exactly

\mathbf{1}^T A \mathbf{1} / n

where 1 is the all-ones vector. So we see immediately that λ is bounded below by the average R_0, but it really can be bigger; indeed, this is just what we see in the two-separated-towns example we started with, where R_0 is smaller than 1 but λ is larger.
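
Here’s a quick check of that comparison (again just a sketch): the red/blue matrix from above, and a matrix for the two separated towns from the top of the post.

import numpy as np

A_redblue = np.array([[1.0, 0.5], [0.5, 0.5]])
A_separated = np.array([[1.5, 0.0], [0.0, 0.3]])   # the two towns with no contact, from the top of the post

for A in (A_redblue, A_separated):
    lam = np.linalg.eigvalsh(A).max()          # Perron-Frobenius eigenvalue
    mean_R0 = np.ones(2) @ A @ np.ones(2) / 2  # 1^T A 1 / n
    print(lam, mean_R0)
# red/blue: lambda is about 1.31 while mean R_0 is 1.25;
# separated towns: lambda is 1.5 while mean R_0 is only 0.9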

I don’t see how to work out any concise description of the size of the giant component in terms of the symmetric matrix, even in the simplest cases. As we’ve seen, it’s not just a function of λ. The very simplest case might be that where A has rank 1; in other words, you have some division of the population into equal-sized boxes, and each box has its own R_0, and then the graph is constructed in a way that is “Erdos-Renyi but with a constraint on degrees” — I think there are various ways to do this but the upshot is that the matrix A is rank 1 and its (i,j) entry is R_0(i) R_0(j) / C where C is the sum of the R_0 over all the boxes. The eigenvalues of A are all zero except for the big one λ, which is equal to the trace, which is

\mathbf{E} R_0^2 / \mathbf{E} R_0

or, if you like, mean(R_0) + variance(R_0)/mean(R_0); so if the average R_0 is held fixed, this gets bigger the more R_0 varies among the population.

And if you look back at that Wikipedia page about the giant component, you’ll see that this is the exact threshold they give for random graphs with specified degree distribution, citing a 2000 paper of Molloy and Reed. Or if you look at Lauren Meyers’s 2005 paper on epidemic spread in networks, you will find the same threshold for epidemic outbreak in section 2. (The math here is descended from work of Molloy-Reed and this much-cited paper of Newman, Strogatz, and Watts.) Are typical models of “random graphs with specified degree distribution” built to have rank 1 in this sense? I think so — see e.g. this sentence in Newman-Strogatz-Watts: “Another quantity that will be important to us is the distribution of the degree of the vertices that we arrive at by following a randomly chosen edge. Such an edge arrives at a vertex with probability proportional to the degree of that vertex.”

At any rate, even in this rank 1 case, even for 2×2 matrices, it’s not clear to me how to express the size of the giant component except by saying it’s a nonzero solution of v = 1 - e^{-Av}. Does the vector v have anything to do with the Perron-Frobenius eigenvector? Challenge for the readers: work this out!

I did try a bunch of randomly chosen 6×6 matrices and plot the overall size of the giant component against λ, and this is what I got:

The blue line shows the proportion of the vertices that would get infected if the graph were homogeneous with parameter λ. Which makes me think that thinking of λ as a good proxy for R_0 is not a terrible idea; it seems like a summary statistic of A which is pretty informative about the giant component. (This graph maybe suggests that among graphs with a given λ, the homogeneous one actually has the biggest giant component? Worth checking.)
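
For what it’s worth, here’s roughly how an experiment like that could be set up (a sketch along these lines, not my original code):

import numpy as np
import matplotlib.pyplot as plt

rng = np.random.default_rng(0)

def giant_fraction(A, iters=2000):
    # largest solution of v = 1 - e^{-Av}: iterate downward from the all-ones vector
    v = np.ones(A.shape[0])
    for _ in range(iters):
        v = 1 - np.exp(-A @ v)
    return v.mean()

lams, fracs = [], []
for _ in range(300):
    B = rng.uniform(0, 1, size=(6, 6)) * rng.uniform(0.0, 0.7)
    A = (B + B.T) / 2                          # random symmetric nonnegative matrix
    lams.append(np.linalg.eigvalsh(A).max())   # its Perron-Frobenius eigenvalue
    fracs.append(giant_fraction(A))

grid = np.linspace(0.01, max(lams), 200)
homog = [giant_fraction(np.array([[lam]])) for lam in grid]   # homogeneous graph with parameter lambda
plt.scatter(lams, fracs, s=8)
plt.plot(grid, homog)
plt.xlabel("largest eigenvalue of A")
plt.ylabel("size of giant component (as a fraction)")
plt.show()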

I should hasten to say that there’s a lot of interest in the superspreader phenomenon, where a very small (probability -> 0) set of vertices has a very large (superlinear in n) number of contacts. Meyers works out a bunch of cases like this and I think they are not well modeled by what I’m talking about here.

A more technical note: the result of Bollobas et al is much more general; there’s no reason the vertices have to be drawn from finitely many towns of equal size; you can instead have the types of vertices drawn from whatever probability space M you like, and then have the probability of an edge between a vertex x and a vertex y be W(x,y) for some symmetric function W on M^2; nowadays this is called the “graphon” point of view. Now the matrix is replaced by an operator on functions:

A_Wf(x) = \int_M f(y)W(x,y),

the probability g(x) that a vertex of type x is in the giant component is a solution of the integral equation

g = 1-e^{-A_W g}

and a giant component exists just when the operator norm ||A_W||_2 is greater than 1. This is the kind of analysis you’d want to use if you wanted to really start taking geography into account. For instance, take the vertices to be random points in a disc and let W(x,y) be a decreasing function of |x-y|, modeling a network where infection likelihood is a decreasing function of distance. What is the norm of the operator A_W in this case? I’ll bet my harmonic analyst friends know, if any of them are reading this. But I do not.

Update: Now I know, because my harmonic analyst friend Brian Street told me it’s just the integral of W(x,y) over y, which is the same for all x (well, at least it is if we’re on the whole of R^d.) Call that number V. He gave a very nice Fourier-theoretic argument but now that I know the answer I’m gonna explain it in terms of the only part of math I actually understand, 2×2 matrices. Here’s how it goes. In this model, each vertex has the same expected number of neighbors, namely that integral V above. But to say every vertex has the same expected number of neighbors is to say that 1 is an eigenvector for A. If 1 were any eigenvector other than Perron-Frobenius, it would be orthogonal to Perron-Frobenius, which it can’t be because both have positive entries, so it is Perron-Frobenius, so λ = V.
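
A tiny sanity check of that argument (my sketch): take a symmetric nonnegative matrix whose rows all sum to 2; its top eigenvalue is 2, and the fixed-point equation gives the same answer as a homogeneous graph with R_0 = 2.

import numpy as np

# symmetric, nonnegative, and every row sums to 2
A = np.array([[0.5, 1.0, 0.5],
              [1.0, 0.2, 0.8],
              [0.5, 0.8, 0.7]])
print(np.linalg.eigvalsh(A).max())   # 2.0, the common row sum

v = np.ones(3)
for _ in range(500):
    v = 1 - np.exp(-A @ v)
print(v)   # every coordinate is about 0.797, the homogeneous answer for R_0 = 2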

In fact I had noticed this fact in one of the papers I looked at while writing this (that if the matrix had all row-sums the same, the long-term behavior didn’t depend on the matrix) but didn’t understand why until just this minute. So this is kind of cool — if the kind of heterogeneity the network exhibits doesn’t cause different kinds of vertices to have different mean degree, you can just pretend the network is homogeneous with whatever mean R_0 it has. This is a generalization of the fact that two towns with no contact which have the same R_0 can be treated as one town with the same R_0 and you don’t get anything wrong.


Pandemic blog 11: Why do curves bend?

When you plot the number of reported deaths from COVID on a log scale you get pictures that look like this one, by John Burn-Murdoch at the Financial Times:

A straight line represents exponential growth, which is what one might expect to see in the early days of a pandemic according to baby models. You’ll note that the straight line doesn’t last very long, thank goodness; in just about every country the line starts to bend. Why are COVID deaths concave? There are quite a few possible reasons.

  1. Suppression is working. When a pandemic breaks out, countries take measures to suppress transmission, and people take their own measures over and above what their governments do. (An analysis of cellphone location data by Song Gao of our geography department shows that Wisconsinites’ median distance traveled from home decreased by 50% even before the governor issued a stay-at-home order.) That should slow the rate of exponential growth — hopefully, flip it to exponential decay.
  2. Change in reporting. Maybe we’re getting better at detecting COVID deaths; if on day 1, only half of COVID deaths were reported as such, while now we’re accurately reporting them all, we’d see a spuriously high slope at the beginning of the outbreak. (The same reasoning applies to the curve for number of confirmed cases; at the beginning, the curve grows faster than the true number of infections as testing ramps up.)
  3. COVID is getting less lethal. This is the whole point of “flattening the curve” — with each week that passes, hospitals are more ready, we have more treatment options and fuller knowledge of which of the existing treatments best suits which cases.
  4. Infection has saturated the population. This is the most controversial one. The baby model (where by baby I mean SIR) tells you that the curve bends as the number of still-susceptible people starts to really drop. The consensus seems to be we’re nowhere near that yet, and almost everyone (in the United States, at least) is still susceptible. But I guess one should be open to the possibility that there are way more asymptomatic people than we think and half the population is already infected; or that for some reason a large proportion of the population carries natural immunity, so that 1% of the population infected is half the susceptible population.
  5. Heterogeneous growth rate. I came across this idea in a post by a physicist (yeah, I know, but it was a good post!) which I can’t find now — sorry, anonymous physicist! There’s not one true exponential growth rate; different places may have different slopes. Just for the sake of argument, suppose a bunch of different locales all start with the same number of deaths, and suppose the rate of exponential growth is uniformly distributed between 0 and 1; then the total deaths at time t is \int_0^1 e^{\alpha t} \, d\alpha, which is (1/t)(e^t - 1). The log of that function has positive second derivative; that is, it tends to make the curve bend up rather than down! (A quick numerical check follows this list.) That makes sense; with heterogeneous rates of exponential growth, you’ll start with some sort of average of the rates but before long the highest rate will dominate.
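
Here’s that quick numerical check of the last point (a sketch, nothing more): compute (1/t)(e^t - 1) on a grid and look at the second differences of its log.

import numpy as np

# total deaths at time t when the growth rate alpha is uniform on [0, 1]
t = np.linspace(0.1, 20, 400)
total = (np.exp(t) - 1) / t
print(np.all(np.diff(np.log(total), 2) > 0))   # True: log(total) is convex, so the curve bends upward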

I’m sure I’ve skipped some curve-bending factors; propose more in comments!


Commutativity, with fractions

Talking to AB about multiplying rational numbers. She understands the commutativity of multiplication of integers perfectly well. But I had forgotten that commutativity in the rational setting is actually conceptually harder! That four sixes is six fours you can conceptualize by thinking of a rectangular array, or something equivalent to that. But the fact that seven halves is the same thing as seven divided by two doesn’t seem as “natural” to her. (Is that even an instance of commutativity? I think of the first as 7 x 1/2 and the second as 1/2 x 7.)

A divergent sequence

Someone asked on Twitter:

Indeed this series diverges, just as the tweeter says: there’s a positive-density subset of n such that the summand exceeds n^{-1/2}.

More subtle: what about

\sum_{n=1}^{\infty} \frac{1}{n^{2 + \cos n}}?

This should still diverge. Argument: the probability that a real number x chosen uniformly from a large interval has \cos x < -1+\epsilon is on order \epsilon^{1/2}, not \epsilon; so there will be a subset of integers with density on order \epsilon^{1/2} where the summand exceeds n^{-1-\epsilon}, and summing over those integers alone should give a sum on order \epsilon^{-1/2}, which can be made as large as you like by bringing \epsilon close to 0.

What’s not so clear to me is: how does

\sum_{n=1}^{N} \frac{1}{n^{2 + \cos n}}

grow with N?
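
One can at least watch the partial sums numerically (a quick sketch, no claims attached):

import numpy as np

def partial_sum(N, chunk=10**6):
    # sum of n^{-(2 + cos n)} for n = 1, ..., N, accumulated in chunks
    total = 0.0
    for start in range(1, N + 1, chunk):
        n = np.arange(start, min(start + chunk, N + 1), dtype=float)
        total += np.sum(n ** (-(2 + np.cos(n))))
    return total

for N in (10**3, 10**4, 10**5, 10**6, 10**7):
    print(N, partial_sum(N))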


Khot, Minzer, Safra on approximate sections of sheaves

Subhash Khot is giving a talk at Current Developments in Math this year and his talk has the intriguing phrase “Grassmann graph” in it so I thought I’d look up what it is and what he and his collaborators prove about it, and indeed it’s interesting! I’m just going to jot down something I learned from “Pseudorandom Sets in Grassmann Graph have Near-Perfect Expansion,” by Khot, Dor Minzer, and Muli Safra, in a way that makes it sound like something algebraic geometers might be interested in, which, indeed, I think they might be!

Suppose you have a sheaf F on a space, and the space has a covering U_1, .. U_N. The sheaf axiom says that if we have a family of sections s_i of F(U_i) such that s_i and s_j agree on U_i \cap U_j for all i,j, then there is actually a global section s in F(X) which restricts to each s_i.

What if we only have an approximate section? That is: what if we have a family of s_i such that: if I select i and j uniformly at random, the probability that s_i and s_j agree on U_i \cap U_j is bounded below by some p > 0. Call such a family a “p-section.” (You should take the view that this is really a family of problems with X changing and N growing, so that when I say p > 0 the content is that p is bounded away from some 0 uniformly in X,N.)

The question is then: Is an approximate section approximately a section?

(This is meant to recall the principle from additive number theory that an approximate subgroup is approximately a subgroup, as in e.g. Freiman-Ruzsa.)

That is: if s_1, .. s_N form a p-section, is there some actual section s in F(X) such that, for i chosen uniformly at random,

\mathbf{Pr}(s|_{U_i} = s_i) > p' > 0

for some p’ depending only on p?

The case which turns out to be relevant to complexity theory is the Grassmann graph, which we can describe as follows: X is a k-dimensional vector space over F_2 and the U_i are the l-dimensional subspaces for some integer l. But we do something slightly weird (which is what makes it the Grassmann graph, not the Grassmann simplicial complex) and declare that the only nonempty intersections are those where U_i \cap U_j has dimension l-1. The sheaf is the one whose sections on U_i are the linear functions from U_i to F_2.
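
To make the setup concrete, here’s a small sketch (my own illustration, not anything from the paper) that builds the Grassmann graph for k = 4, l = 2:

from itertools import combinations, product

k = 4   # ambient dimension; we take the 2-dimensional subspaces of F_2^k

nonzero = [v for v in product((0, 1), repeat=k) if any(v)]

def add(u, v):
    return tuple((a + b) % 2 for a, b in zip(u, v))

# a 2-dimensional subspace is recorded as the set {u, v, u+v} of its nonzero vectors
subspaces = {frozenset((u, v, add(u, v))) for u, v in combinations(nonzero, 2)}

# U_i and U_j are adjacent exactly when dim(U_i \cap U_j) = l - 1 = 1,
# i.e. the two subspaces share exactly one nonzero vector
def adjacent(S, T):
    return S != T and len(S & T) == 1

degrees = {sum(adjacent(S, T) for T in subspaces) for S in subspaces}
print(len(subspaces), degrees)   # 35 subspaces, every vertex of degree 18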

Speculation 1.7 in the linked paper is that an approximate section is approximately a section. This turns out not to be true! Because there are large sets of U_i whose intersection with the rest of X is smaller than you might expect. This makes sense: if X is a space which is connected but which is “almost a disjoint union of X_1 and X_2,” i.e. X_1 \cup X_2 = X and X_1 \cap X_2 involves very few of the U_i, then by choosing a section of F(X_1) and a section of F(X_2) independently you can get an approximate section which is unlikely to be approximated by any actual global section.

But the good news is that, in the case at hand, that ends up being the only problem. Khot-Minzer-Safra classify the “approximately disconnected” chunks of X (they are collections of l-dimensional subspaces containing a fixed subspace of small dimension and contained in a fixed subspace of small codimension) and show that any approximate section of F is approximated by a section on some such chunk; this is all that is needed to prove the “2-to-2 games conjecture” in complexity theory, which is their ultimate goal.

So I found all this quite striking! Do questions about approximate global sections being approximated by global sections appear elsewhere? (The question as phrased here is already a bit weird from an algebraic geometry point of view, since it seems to require that you have or impose a probability measure on your set of open patches, but maybe that’s natural in some cases?)


The quarter-circle game

Start at a lattice point inside the quarter-circle x^2 + y^2 < 10^6 in the positive quadrant. You and your opponent take turns: the allowable moves are to go up, right, or both at once (i.e. add (0,1), add (1,0), or add (1,1).) First person to leave the quarter-circle wins. What does it look like if you color a starting point black for “first-player loss” and yellow for “first-player win”? It looks like this:

I like the weird zones of apparent order here. Of course you can do this for any planar domain, any finite set of moves, etc. Are games like this analyzable at all?

I guess you could go a little further and compute the nimber or Grundy value associated to each starting position. You get:

What to make of this?

Here’s some hacky code; it’s simple.


import numpy as np
import matplotlib.pyplot as plt

M = 1000

def Crossed(a, b):
    # True once (a, b) has left the quarter-circle x^2 + y^2 < M^2
    return (a**2 + b**2 >= M*M)

def Mex(L):
    # minimal excludant: the smallest nonnegative integer not in L
    # (with only three moves the Grundy value is at most 3, so range(5) is plenty)
    return min([i for i in range(5) if not (i in L)])

# L[a, b] is the Grundy value of the position (a, b); positions outside the
# quarter-circle get 0, since the player who moved there has already won
L = np.zeros((M+2, M+2))
for a in reversed(range(M+2)):
    for b in reversed(range(M+2)):
        if Crossed(a, b):
            L[a, b] = 0
        else:
            L[a, b] = Mex([L[a+1, b], L[a, b+1], L[a+1, b+1]])

plt.imshow(L, interpolation='none', origin='lower')
plt.show()

One natural question: what proportion of positions inside the quarter-circle are first-player wins? Heuristically: if you imagine the value of positions as Bernoulli variables with parameter p, the value at my current position is 0 if and only if all three of the moves available to me have value 1. So you might expect (1-p) = p^3. This has a root at about 0.68. It does look to me like the proportion of winning positions is converging, but it seems to be converging to something closer to 0.71. Why?
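
If you want to reproduce that estimate from the table above, the winning positions are exactly the ones with nonzero Grundy value (this snippet assumes the array L and the radius M from the code block above):

import numpy as np

# assumes L and M from the code above; a position is a first-player win iff its Grundy value is nonzero
a, b = np.mgrid[0:M, 0:M]
inside = a**2 + b**2 < M * M
print((L[:M, :M][inside] > 0).mean())   # proportion of first-player wins inside the quarter-circle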

By the way, the game is still interesting (but I’ll bet more directly analyzable) even if the only moves are “go up one” and “go right one”! Here’s the plot of winning and losing values in that case:


Soumya Sankar: Proportion of ordinarity in families of curves over finite fields

What’s the chance that a random curve has ordinary Jacobian? You might instinctively say “It must be probability 1” because the non-ordinary locus is a proper closed subvariety of M_g. (This is not obvious by pure thought, at least to me, and I don’t know who first proved it! I imagine you can check it by explicitly exhibiting a curve of each genus with ordinary Jacobian, but I’m not sure this is the best way.)

Anyway, the point is, this instinctive response is wrong! At least it’s wrong if you interpret the question the way I have in mind, which is to ask: given a random curve X of genus g over F_q, with g growing as q stays fixed, is there a limiting probability that X has ordinary Jacobian? And this might not be 1, in the same way that the probability that a random polynomial over F_q is squarefree is not 1, but 1-1/q.

Bryden Cais, David Zureick-Brown and I worked out some heuristic guesses for this problem several years ago, based on the idea that the Dieudonne module for a random curve might be a random Dieudonne module, and then working out in some detail what in the Sam Hill one might mean by “random Dieudonne module.” Then we did some numerical experiments which showed that our heuristic looked basically OK for plane curves of high degree, but pretty badly wrong for hyperelliptic curves of high genus. But there was no family of curves for which one could prove either that our heuristic was right or that it was wrong.

Now there is, thanks to my Ph.D. student Soumya Sankar. Unfortunately, there are still no families of curves for which our heuristics are provably right. But there are now several for which it is provably wrong!

15.7% of Artin-Schreier curves over F_2 (that is: Z/2Z-covers of P^1/F_2) are ordinary. (The heuristic proportion given in my paper with Cais and DZB is about 42%, which matches data drawn from plane curves reasonably well.) The reason Sankar can prove this is because, for Artin-Schreier curves, you can test ordinarity (or, more generally, compute the a-number) in terms of the numerical invariants of the ramification points; the a-number doesn’t care where the ramification points are, which would be a more difficult question.

On the other hand, 0% of Artin-Schreier curves over F_q are ordinary, for any finite field F_q of odd characteristic! What’s going on? It turns out that it’s only in characteristic 2 that the Artin-Schreier locus is irreducible; in larger characteristics, it turns out that the locus has irreducible components whose number grows with genus, and the ordinary curves live on only one of these components. This “explains” the rarity of ordinarity (though this fact alone doesn’t prove that the proportion of ordinarity goes to 0; Sankar does that another way.) Natural question: if you just look at the ordinary component, does the proportion of ordinary curves approach a limit? Sankar shows this proportion is bounded away from 0 in characteristic 3, but in larger characteristics the combinatorics get complicated! (All this stuff, you won’t be surprised to hear, relies on Rachel Pries’s work on the interaction of special loci in M_g with the Newton stratification.)

Sankar also treats the case of superelliptic curves y^n = f(x) in characteristic 2, which turns out to be like that of Artin-Schreier in odd characteristics; a lot of components, only one with ordinary points, probability of ordinarity going to zero.

Really nice paper which raises lots of questions! What about more refined invariants, like the shape of the Newton polygon? What about other families of curves? I’d be particularly interested to know what happens with trigonal curves which (at least in characteristic not 2 or 3, and maybe even then) feel more “generic” to me than curves with extra endomorphisms. Is there any hope for our poor suffering heuristics in a family like that?


Large-scale Pareto-optimal topologies, or: how to describe a hexahedron

I got to meet Karen Caswelch, the CEO of Madison startup SciArtSoft last week. The company is based on tech developed by my colleague Krishnan Suresh. When I looked at one of his papers about this stuff I was happy to find there was a lovely piece of classical solid geometry hidden in it!

Here’s the deal. You want to build some component out of metal, which metal is to be contained in a solid block. So you can think of the problem as: you start with a region V in R^3, and your component is going to be some subregion W in R^3. For each choice of W there’s some measure of “compliance” which you want to minimize; maybe it’s fragility, maybe it’s flexibility, I dunno, depends on the problem. (Sidenote: I think lay English speakers would want “compliance” to refer to something you’d like to maximize, but I’m told this usage is standard in engineering.) (Subsidenote: I looked into this and now I get it — compliance literally refers to flexibility; it is the inverse of stiffness, just like in the lay sense. If you’re a doctor you want your patient to comply to their medication schedule, thus bending to outside pressure, but bending to outside pressure is precisely what you do not want your metal widget to do.)

So you want to minimize compliance, but you also want to minimize the weight of your component, which means you want vol(W) to be as small as possible. These goals are in conflict. Little lacy structures are highly compliant.

It turns out you can estimate compliance by breaking W up into a bunch of little hexahedral regions, computing compliance on each one, and summing. For reasons beyond my knowledge you definitely don’t want to restrict to chopping uniformly into cubes. So a priori you have millions and millions of differently shaped hexahedra. And part of the source of Suresh’s speedup is to gather these into approximate congruence classes so you can do a compliance computation for a whole bunch of nearly congruent hexahedra at once. And here’s where the solid geometry comes in; an old theorem of Cauchy tells you that if you know what a convex polyhedron’s 1-skeleton looks like as a graph, and you know the congruence classes of all the faces, you know the polyhedron up to rigid motion. In particular, you can just triangulate each face of the hexahedron with a diagonal, and record the congruence class by 18 numbers, which you can then record in a hash table. You sort the hashes and then you can instantly see your equivalence classes of hexahedra.

(Related: the edge lengths of a tetrahedron determine its volume but the areas of the faces don’t.)
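
Here’s a toy version of that hashing step (my own sketch, with a made-up vertex numbering, certainly not Suresh’s actual code): twelve edge lengths plus one chosen diagonal per face gives the 18 numbers, which you round and use as a dictionary key.

import numpy as np
from collections import defaultdict

# assumed vertex convention: 0-3 go around the bottom face, 4-7 around the top,
# with vertex i+4 sitting directly above vertex i
EDGES = [(0,1),(1,2),(2,3),(3,0),(4,5),(5,6),(6,7),(7,4),(0,4),(1,5),(2,6),(3,7)]
DIAGONALS = [(0,2),(4,6),(0,5),(1,6),(2,7),(3,4)]   # one chosen diagonal per face

def congruence_key(verts, tol=1e-6):
    # 18 rounded lengths; by Cauchy's theorem they determine the (convex) hexahedron
    # with this combinatorics, and these face triangulations, up to rigid motion
    verts = np.asarray(verts, dtype=float)
    return tuple(round(float(np.linalg.norm(verts[i] - verts[j])) / tol)
                 for i, j in EDGES + DIAGONALS)

def congruence_classes(hexahedra):
    # bucket a list of 8x3 vertex arrays into approximate congruence classes
    classes = defaultdict(list)
    for idx, verts in enumerate(hexahedra):
        classes[congruence_key(verts)].append(idx)
    return classes
# caveat: two nearly congruent hexahedra can land on opposite sides of a rounding
# boundary; presumably the real implementation is more careful about that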


Small baseball triangles

This all started when CJ asked which three baseball stadiums formed the smallest triangle.  And we agreed it had to be the Brewers, the White Sox, and the Cubs, because Milwaukee and Chicago are really close together.

But it seems like cheating to use two teams in the same city.  The most elegant way to forbid that is to ask the question one league at a time.  Which three American League parks form the smallest triangle?  And what about the National League?

First of all, what does “smallest” mean?  There are lots of choices, but (perhaps inspired by the summer we played a lot of Ingress) we asked for the triangle with the smallest area.  Which means you don’t just want the parks to be close together, you want them to be almost collinear!

I asked on Twitter and got lots of proposed answers.  But it wasn’t obvious to me which, if any, were right, so I worked it out myself!  Seamheads has the longitude and latitude of every major league ballpark past and present in a nice .csv file.  How do you compute the area of a spherical triangle given longitudes and latitudes?  You probably already know that the area is given by the excess over pi of the sum of the angles.  But then you gotta look up a formula for the angles.  Or another way:  Distance on the sphere is standard, and then it turns out that there’s a spherical Heron formula for the area of a spherical triangle given its edge lengths!  I guess it’s clear there’s some formula like that, but it’s cool how Heron-like it looks.  Fifteen lines of Python and you’re ready to go!
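
Here’s roughly what those fifteen lines can look like (a reconstruction, not necessarily the code I actually ran): great-circle distances by the haversine formula, then l’Huilier’s theorem, which is the spherical Heron formula.

from math import radians, sin, cos, asin, atan, sqrt, tan

def gc_dist(p, q):
    # great-circle distance between (lat, lon) pairs in degrees, returned in radians
    (lat1, lon1), (lat2, lon2) = [tuple(map(radians, pt)) for pt in (p, q)]
    h = sin((lat2 - lat1) / 2)**2 + cos(lat1) * cos(lat2) * sin((lon2 - lon1) / 2)**2
    return 2 * asin(sqrt(h))

def spherical_area(p, q, r):
    # l'Huilier's theorem: spherical excess from the three side lengths
    a, b, c = gc_dist(q, r), gc_dist(p, r), gc_dist(p, q)
    s = (a + b + c) / 2
    t = tan(s / 2) * tan((s - a) / 2) * tan((s - b) / 2) * tan((s - c) / 2)
    return 4 * atan(sqrt(t))   # area in steradians; multiply by (earth radius)^2 for km^2

# feed it triples of (lat, lon) read from the Seamheads csv and take the minimum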

So what are the answers?

We were right that Brewers-White Sox-Cubs form the smallest major league triangle.  And the smallest American League triangle is not so surprising:  Red Sox, Yankees, Orioles, forming a shortish line up the Eastern Seaboard.  But for the National League, the smallest triangle isn’t what you might expect!  A good guess, following what happened in the AL, is Mets-Phillies-Nationals.  And that’s actually the second-smallest.  But the smallest National League triangle is formed by the Phillies, the Nationals, and the Atlanta Braves!  Here’s a piece of the geodesic path from SunTrust Park in Atlanta to Citizen’s Bank Park in Philly, courtesy of GPSVisualizer:

Not only does it go right through DC, it passes about a mile and a half from Nationals Park!

Another fun surprise is the second-smallest major league triangle:  you’d think it would be another triangle with two teams in the same city, but no!  It’s Baltimore-Cincinnati-St. Louis.  Here’s the geodesic path from Oriole Park at Camden Yards to Busch Stadium:

And here’s a closeup:

The geodesic path runs through the Ohio River, about 300m from the uppermost bleachers at Great American Ball Park.  Wow!

Now here’s a question:  should we find it surprising that the smallest triangles involve teams that are pretty far from each other?  If points are placed at random in a circle (which baseball teams are definitely not) do we expect the smallest-area triangles to have small diameter, or do we expect them to be long and skinny?  It’s the latter!  See this paper: “On Smallest Triangles,” by Grimmett and Janson.  Put down n points at random in the unit circle; the smallest-area triangle will typically have area on order 1/n^3, but will have diameter on order 1.  Should that have been what I expected?

PS:  the largest-area major league triangle is Boston-Miami-SF.  Until MLB expands to Mexico City, that is!


Is the Dedekind sum really a function on SL_2?

Here’s an idle thought I wanted to record before I forgot it.

The Dedekind sum comes up in a bunch of disparate places; it’s how you keep track of the way half-integral weight forms like the eta function (aka the discriminant to the one-twenty-fourth power) transform under SL_2, it shows up in the topology of modular knots, the alternating sum of continued fraction coefficients, etc.  It has a weird definition which I find hard to get a feel for.  The Dedekind sum also satisfies Rademacher reciprocity:

D(a,b;c) + D(b,c;a) + D(c,a;b) = \frac{1}{12abc}(a^2 + b^2 + c^2 - 3abc)
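
(A quick numerical check of this reciprocity, assuming D(a,b;c) denotes the usual generalized Dedekind sum \sum_{n \bmod c} ((an/c))((bn/c)):)

from fractions import Fraction
from math import floor, gcd

def saw(x):
    # the sawtooth ((x)): 0 at integers, otherwise x - floor(x) - 1/2
    return Fraction(0) if x == floor(x) else x - floor(x) - Fraction(1, 2)

def D(a, b, c):
    # generalized Dedekind sum, in the normalization assumed above
    return sum(saw(Fraction(a * n, c)) * saw(Fraction(b * n, c)) for n in range(c))

a, b, c = 2, 3, 5   # any pairwise coprime triple works
assert gcd(a, b) == gcd(b, c) == gcd(c, a) == 1
lhs = D(a, b, c) + D(b, c, a) + D(c, a, b)
rhs = Fraction(a**2 + b**2 + c**2 - 3 * a * b * c, 12 * a * b * c)
print(lhs, rhs, lhs == rhs)   # -13/90 on both sides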

If that right-hand side looks familiar, it’s because it’s the very same cubic form whose vanishing defines the Markoff numbers!  Here’s a nice way to interpret it.  Suppose A,B,C are matrices with ABC = 1 and

(1/3)Tr A = a

(1/3)Tr B = b

(1/3)Tr C = c

(Why 1/3?  See this post from last month.)

Then

a^2 + b^2 + c^2 - 3abc = (1/9)(2 + \mathbf{Tr}([A,B]))

(see e.g. this paper of Bowditch.)
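
Here’s a quick numerical sanity check of that identity (just a sketch), using random words in the standard generators of SL_2(Z):

import numpy as np

rng = np.random.default_rng(1)
S = np.array([[1, 1], [0, 1]], dtype=np.int64)
T = np.array([[1, 0], [1, 1]], dtype=np.int64)

def random_sl2(length=8):
    # a random word in the generators S, T
    M = np.eye(2, dtype=np.int64)
    for g in rng.integers(0, 2, size=length):
        M = M @ (S if g == 0 else T)
    return M

def inv2(M):
    # exact inverse of a 2x2 integer matrix of determinant 1
    return np.array([[M[1, 1], -M[0, 1]], [-M[1, 0], M[0, 0]]], dtype=np.int64)

A, B = random_sl2(), random_sl2()
a, b, c = np.trace(A) / 3, np.trace(B) / 3, np.trace(A @ B) / 3   # Tr C = Tr (AB)^{-1} = Tr(AB)
lhs = a**2 + b**2 + c**2 - 3 * a * b * c
rhs = (2 + np.trace(A @ B @ inv2(A) @ inv2(B))) / 9
print(lhs, rhs)   # these agree (up to floating-point error from dividing the traces by 3)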

The well-known invariance of the Markoff form under moves like (a,b,c) -> (a,b,ab-c) now “lifts” to the fact that (the conjugacy class of) [A,B] is unchanged by the action of the mapping class group Gamma(0,4) on the set of triples (A,B,C) with ABC=1.

The Dedekind sum can be thought of as a function on such triples:

D(A,B,C) = D((1/3)Tr A, (1/3) Tr B; (1/3) Tr C).

Is there an alternate definition or characterization of D(A,B,C) which makes Rademacher reciprocity

D(A,B,C) + D(B,C,A) + D(C,A,B) = (1/9)(2 +  \mathbf{Tr}([A,B]))

more manifest?
