I. Introduction
The five prior papers in this suite, together with Paper 6 on moments of L-functions, have treated the Riemann hypothesis from increasingly comprehensive angles: historically, structurally, strategically, prospectively, framework-theoretically, and moment-theoretically. What remains for completeness is a treatment of the parallel statistical theory that complements pair correlation: the theory of zeros of L-functions in families.
The shift in perspective is conceptually substantial. Pair correlation, as treated in Paper 3 and refined in Paper 4, studies the local statistics of zeros of a single L-function (typically ζ), averaging over heights along the critical line to extract regularities. Family statistics, by contrast, studies the statistics of zeros across a family of L-functions, averaging across the family rather than within a single L-function. The two theories are complementary: pair correlation captures statistical features that emerge from a single L-function viewed at large heights, while family statistics captures features that emerge from a collection of L-functions viewed at relatively low heights.
The thesis of this paper is that family statistics has become, over the past quarter-century, one of the most productive research areas in analytic number theory. The thesis has several components. First, the Katz–Sarnak philosophy, formulated in their 1999 monograph Random Matrices, Frobenius Eigenvalues, and Monodromy, supplies a coherent framework for predicting family statistics in terms of the symmetry types of the families. Second, the predictions have been verified in restricted ranges for many natural families and proved completely in the function field setting via the work of Katz on monodromy of geometric families. Third, the framework connects densely to other parts of number theory — to arithmetic statistics in the Bhargava sense, to ranks of elliptic curves, to vanishing of L-functions at the central point, and to the broader Langlands picture. Fourth, the framework provides indirect evidence for the Riemann hypothesis itself: family statistics are sufficiently regular that gross failures of RH would be inconsistent with what is observed.
The structure of this paper follows the conceptual development. After setting up the basic framework of families and their symmetry types, the paper treats the Katz–Sarnak conjectures, the density conjectures and their verification in restricted ranges, the function field case where Katz’s monodromy theorems make the predictions theorems, the family of quadratic twists of elliptic curves and its connection to BSD, the question of vanishing at the central point, the connections to Bhargava’s arithmetic statistics program, the Selberg-class families and orthogonality conjectures, the role of computation, and finally the implications for RH and the place of family statistics in the broader picture. The paper closes with open problems and a concluding assessment.
The treatment is necessarily selective. The literature on family statistics is vast and growing rapidly, and no single paper can cover all of it. The selection here emphasizes structural understanding over technical detail, with primary literature references for points where readers would benefit from following up.
II. The Katz–Sarnak Conjectures
The 1999 Monograph
In 1999, Nicholas Katz and Peter Sarnak published Random Matrices, Frobenius Eigenvalues, and Monodromy, a substantial monograph that did for family statistics what Montgomery’s 1973 paper had done for pair correlation: established the framework that would organize a research program for decades to come.
The monograph had two principal components. The first was a body of theorems in the function field setting: Katz had developed, over the preceding two decades, a sophisticated theory of monodromy of geometric families of L-functions over function fields, and the monograph applied this theory to derive equidistribution results for Frobenius eigenvalues that are exactly the function field analogs of family zero statistics. The second was a body of conjectures for the corresponding number field cases: the assertion that the function field theorems should have direct analogs for families of L-functions over number fields, with the same statistical predictions.
The structural insight was that the symmetry types of families — unitary, symplectic, orthogonal — should determine the statistics of low-lying zeros uniformly across number field and function field cases. The function field theorems were proof-of-concept; the number field conjectures were the working hypotheses for a research program.
The Central Philosophy
The Katz–Sarnak philosophy, in its general form, asserts: for L-functions in a natural family of symmetry type S, the local statistics of low-lying zeros, as the family parameter varies, follow the predictions of random matrix theory for the corresponding ensemble of type S.
The “low-lying zeros” are zeros at heights close to the central point s = 1/2 (or at heights of order 1 in appropriate normalization). The “natural family” is a family parameterized by some arithmetic invariant — conductor, discriminant, modulus — with the family parameter going to infinity. The “symmetry type” is determined by the structural features of the family: whether the L-functions are self-dual, what kind of automorphic origin they have, what root number sign distribution they exhibit.
The three classical symmetry types are:
Unitary: families with no special symmetry constraint. Examples include all primitive Dirichlet L-functions of conductor q as q varies, or automorphic L-functions on GL(n) for varying level. The corresponding random matrix ensemble is the unitary group U(N) with Haar measure.
Symplectic: families with a self-duality of symplectic type. The standard example is the family of quadratic Dirichlet L-functions L(s, χ_d) as d varies over fundamental discriminants, where the L-functions are self-dual (the character χ_d is real-valued) and the functional equation has the symplectic structure. The corresponding random matrix ensemble is the unitary symplectic group USp(2N).
Orthogonal: families with a self-duality of orthogonal type. The standard example is the family of L-functions L(s, E_d) of quadratic twists of a fixed elliptic curve E by fundamental discriminants d. The symmetry type is orthogonal, with the parity of the rank determining whether the family is even orthogonal SO(2N) or odd orthogonal SO(2N+1). The split between even and odd orthogonal cases is a refinement specific to the orthogonal type.
The Function Field Origin
The naming and structure of the symmetry types come from the function field setting. For a family of L-functions over a function field F_q(T), the L-functions are characteristic polynomials of Frobenius operators on cohomology groups of varieties. As the family parameter varies, the corresponding Frobenius operators trace out a subset of the relevant matrix group, and the geometric monodromy of the family (in the precise sense developed by Katz) determines which subgroup the Frobenius operators lie in.
For the families that arise naturally, the geometric monodromy is generic — it is one of the classical groups U(N), USp(2N), SO(2N), or SO(2N+1). The Deligne equidistribution theorem then says that the Frobenius operators equidistribute according to the Haar measure on the monodromy group as the family parameter goes to infinity. This is the function field theorem.
The transposition to the number field case is conjectural. There is no analog of the geometric monodromy group for arithmetic families (this is a special case of the “missing geometry” problem treated in Papers 2 and 5), but the Katz–Sarnak conjecture asserts that the same statistical predictions hold. The conjecture is supported by the function field theorems, by extensive numerical evidence, and by the structural fact that the symmetry type of an arithmetic family can be identified from the same kinds of data (functional equation type, root number sign distribution, self-duality structure) that determine the symmetry type in the function field case.
III. Symmetry Types and Their Predictions
The Unitary Type
For families of unitary symmetry type, the random matrix model is the unitary group U(N) with Haar measure. The eigenvalues e^{iθ_1}, …, e^{iθ_N} of a random unitary matrix lie on the unit circle, and their statistics in the bulk are governed by the sine kernel that produces GUE statistics in the appropriate limit.
For low-lying zeros, the relevant statistics concern the distribution of the eigenvalues nearest to the spectral edge. The “edge” in this context is θ = 0, corresponding to s = 1/2 on the L-function side. The distribution of the lowest eigenvalue, the second-lowest, the n-th lowest, and so on, can be computed explicitly in terms of determinants of certain kernel functions.
Specifically, the 1-level density for the unitary type predicts that, for a test function f,
lim_{N→∞} E[∑_{j=1}^N f(N θ_j / 2π)] = ∫ f(x) W_U(x) dx,
where W_U is the unitary 1-level kernel (the diagonal of the sine kernel, which is identically 1), and the sum is over eigenangles rescaled to have mean spacing 1.
For natural unitary families of L-functions, the conjecture is that the same density formula holds, with N replaced by the appropriate logarithm of the family parameter (the conductor, modulus, or level).
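For the 1-level statistic, the unitary kernel can be written down directly. The short derivation below is a sketch that assumes only that the one-point density of Haar-distributed eigenangles on U(N) is uniform (a consequence of rotation invariance); the angles are taken in (-π, π] and f is an integrable test function.

```latex
\mathbb{E}\Big[\sum_{j=1}^{N} f\Big(\frac{N\theta_j}{2\pi}\Big)\Big]
  = N \int_{-\pi}^{\pi} f\Big(\frac{N\theta}{2\pi}\Big)\,\frac{d\theta}{2\pi}
  = \int_{-N/2}^{N/2} f(x)\,dx
  \;\longrightarrow\; \int_{-\infty}^{\infty} f(x)\,dx
  \quad (N \to \infty),
\qquad\text{so}\qquad W_U(x) \equiv 1 .
```

The substitution x = Nθ/2π is the same rescaling to unit mean spacing used in the displayed limit, so no symmetry-specific feature appears at the 1-level for the unitary type.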
The Symplectic Type
For families of symplectic symmetry type, the random matrix model is the unitary symplectic group USp(2N). The eigenvalues come in pairs (e^{iθ_j}, e^{-iθ_j}), with the symmetry of the symplectic structure. The eigenvalues nearest the spectral edge are repelled from θ = 0, in contrast to the unitary case.
The repulsion is structural. In the symplectic case, the density of eigenangles vanishes quadratically at the origin (it behaves like const · x^2 as x → 0 in normalized units), so the lowest eigenvalue θ_1 satisfies P(θ_1 ≤ x) ~ const · x^3 as x → 0, indicating that eigenvalues near zero are rare. This is reflected on the L-function side as a reduced density of low-lying zeros near the central point compared to the unitary case.
The prediction for natural symplectic families — quadratic Dirichlet L-functions, families of self-dual representations of symplectic type — is the corresponding density formula with the symplectic kernel W_{Sp}.
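The repulsion can be made quantitative with a few lines of code. The sketch below assumes the standard scaling-limit form of the symplectic 1-level kernel, W_Sp(x) = 1 − sin(2πx)/(2πx); the function names are illustrative and not taken from any library. It compares the kernel with its leading small-x term (2π²/3)x² and integrates it to get the expected number of normalized zeros in [0, x].

```python
import math

def w_sp(x: float) -> float:
    """Assumed symplectic 1-level kernel W_Sp(x) = 1 - sin(2*pi*x)/(2*pi*x)."""
    if x == 0.0:
        return 0.0  # limiting value at the origin
    t = 2.0 * math.pi * x
    return 1.0 - math.sin(t) / t

def expected_count(kernel, x: float, steps: int = 2000) -> float:
    """Midpoint-rule integral of the kernel over [0, x]:
    the expected number of normalized zeros in that window."""
    h = x / steps
    return h * sum(kernel((i + 0.5) * h) for i in range(steps))

for x in (0.05, 0.1, 0.2):
    leading = (2.0 * math.pi ** 2 / 3.0) * x ** 2  # leading Taylor term of W_Sp
    print(f"x={x}: W_Sp={w_sp(x):.6f}  quadratic term={leading:.6f}  "
          f"count in [0,x]={expected_count(w_sp, x):.6f}  unitary count={x:.6f}")
```

For the unitary type the expected count in [0, x] is simply x, so the printed symplectic counts show directly how strongly the neighborhood of the central point is depleted.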
The Orthogonal Type and Its Subtypes
For families of orthogonal symmetry type, the random matrix model is the special orthogonal group, with two subtypes: SO(2N) (even orthogonal) and SO(2N+1) (odd orthogonal). The split between the subtypes corresponds, on the L-function side, to the parity of the order of vanishing of the L-function at the central point.
For SO(2N), the eigenvalues come in pairs (e^{iθ_j}, e^{-iθ_j}), and the lowest eigenvalue θ_1 is not repelled from θ = 0: P(θ_1 ≤ x) ~ const · x as x → 0, so the eigenvalue density stays positive at the origin. This is reflected on the L-function side as a higher density of low-lying zeros near the central point.
For SO(2N+1), the eigenvalues come in pairs plus one fixed eigenvalue at e^{i·0} = 1. The presence of this fixed eigenvalue corresponds, on the L-function side, to a forced zero of the L-function at the central point — that is, to L(1/2) = 0 with order at least 1.
The split into SO(2N) and SO(2N+1) is essential for orthogonal families. In families of L-functions of elliptic curve quadratic twists, for instance, the root number fixes the parity of the order of vanishing at the central point (which matches the parity of the rank under BSD), and this corresponds to the split between SO(2N) (even-rank twists, generically rank zero) and SO(2N+1) (odd-rank twists, generically rank one).
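The forced eigenvalue in the odd orthogonal case can be seen in the smallest example, SO(3). The sketch below builds rotations from Rodrigues' formula (a choice made here for illustration; the elements are arbitrary, not Haar-distributed) and checks that det(R − I) = 0, so that 1 is always an eigenvalue, while the remaining pair e^{±iθ} is read off from the trace 1 + 2cos θ.

```python
import math
import random

def rotation_matrix(axis, angle):
    """Rodrigues' formula: rotation by `angle` about the unit vector `axis`."""
    x, y, z = axis
    c, s = math.cos(angle), math.sin(angle)
    t = 1.0 - c
    return [
        [c + t * x * x,     t * x * y - s * z, t * x * z + s * y],
        [t * x * y + s * z, c + t * y * y,     t * y * z - s * x],
        [t * x * z - s * y, t * y * z + s * x, c + t * z * z],
    ]

def det3(m):
    """Determinant of a 3x3 matrix by cofactor expansion along the first row."""
    return (m[0][0] * (m[1][1] * m[2][2] - m[1][2] * m[2][1])
            - m[0][1] * (m[1][0] * m[2][2] - m[1][2] * m[2][0])
            + m[0][2] * (m[1][0] * m[2][1] - m[1][1] * m[2][0]))

def minus_identity(m):
    """Return the matrix m - I."""
    return [[m[i][j] - (1.0 if i == j else 0.0) for j in range(3)] for i in range(3)]

def random_rotation(rng):
    """An arbitrary element of SO(3) (not Haar-distributed) and its rotation angle."""
    v = [rng.gauss(0.0, 1.0) for _ in range(3)]
    norm = math.sqrt(sum(c * c for c in v))
    axis = [c / norm for c in v]
    angle = rng.uniform(0.0, math.pi)
    return rotation_matrix(axis, angle), angle

rng = random.Random(0)
for _ in range(5):
    r, angle = random_rotation(rng)
    trace = r[0][0] + r[1][1] + r[2][2]
    print(f"det(R)={det3(r):+.12f}  det(R-I)={det3(minus_identity(r)):+.1e}  "
          f"trace-(1+2cos)={trace - (1.0 + 2.0 * math.cos(angle)):+.1e}")
```

The vanishing of det(R − I) holds for every element of SO(2N+1), not just for typical ones, which is why the corresponding central zero is forced rather than merely likely.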
The Low-Lying Zero Densities
The 1-level density for each symmetry type is given by an explicit formula. For test functions f with Fourier transform f̂ supported in (-σ, σ) for some σ > 0, the conjecture asserts:
For unitary families: D_1(f) = f̂(0).
For symplectic families: D_1(f) = f̂(0) − (1/2) ∫_{-1}^{1} f̂(u) du.
For even orthogonal families: D_1(f) = f̂(0) + (1/2) ∫_{-1}^{1} f̂(u) du.
For odd orthogonal families: D_1(f) = f̂(0) − (1/2) ∫_{-1}^{1} f̂(u) du + f(0).
When f̂ is supported in (-1, 1), the integral ∫_{-1}^{1} f̂(u) du equals f(0), so the symplectic density becomes f̂(0) − (1/2) f(0) while both orthogonal densities become f̂(0) + (1/2) f(0). The differences among these formulas are not large in absolute terms, being of order f(0), but they are signature features that distinguish the symmetry types. A family whose 1-level density matches the symplectic formula, for instance, cannot match any of the other types; separating the even and odd orthogonal cases through the 1-level density alone requires support beyond (-1, 1).
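The random-matrix side of these densities can be checked numerically. The sketch below assumes the standard scaling-limit kernels W_U = 1, W_Sp(x) = 1 − sin(2πx)/(2πx), W_SO(even)(x) = 1 + sin(2πx)/(2πx), and W_SO(odd)(x) = 1 − sin(2πx)/(2πx) + δ_0(x), and uses the Gaussian f(x) = exp(−πx²) as test function because it equals its own Fourier transform. The Gaussian does not have compactly supported transform; that restriction matters for L-function families, not for the random-matrix identity tested here. All names are illustrative.

```python
import math

def sinc2(x: float) -> float:
    """sin(2*pi*x)/(2*pi*x), with the limiting value 1 at x = 0."""
    if x == 0.0:
        return 1.0
    t = 2.0 * math.pi * x
    return math.sin(t) / t

# Continuous parts of the assumed kernels; SO(odd) also has a point mass at 0.
KERNELS = {
    "U": lambda x: 1.0,
    "Sp": lambda x: 1.0 - sinc2(x),
    "SO(even)": lambda x: 1.0 + sinc2(x),
    "SO(odd)": lambda x: 1.0 - sinc2(x),
}
POINT_MASS = {"U": 0.0, "Sp": 0.0, "SO(even)": 0.0, "SO(odd)": 1.0}

def f(x: float) -> float:
    """Gaussian test function; it is its own Fourier transform."""
    return math.exp(-math.pi * x * x)

def density_zero_side(group: str, cutoff: float = 8.0, h: float = 1e-3) -> float:
    """Trapezoid rule for the integral of f(x) * W_G(x), plus the point mass at 0."""
    n = int(round(cutoff / h))
    total = 0.0
    for i in range(-n, n + 1):
        weight = 0.5 if abs(i) == n else 1.0
        x = i * h
        total += weight * f(x) * KERNELS[group](x)
    return h * total + POINT_MASS[group] * f(0.0)

def density_fourier_side(group: str) -> float:
    """Closed form on the Fourier side for the Gaussian:
    f_hat(0) = 1 and the integral of f_hat over [-1, 1] is erf(sqrt(pi))."""
    half_window = 0.5 * math.erf(math.sqrt(math.pi))
    return {
        "U": 1.0,
        "Sp": 1.0 - half_window,
        "SO(even)": 1.0 + half_window,
        "SO(odd)": 1.0 - half_window + f(0.0),
    }[group]

for group in KERNELS:
    print(f"{group:9s} zero side={density_zero_side(group):.10f}  "
          f"Fourier side={density_fourier_side(group):.10f}")
```

Agreement of the two columns is just the Plancherel identity applied to each kernel, but it gives a quick way to catch sign or normalization slips when the formulas are transcribed.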
IV. Verifying the Predictions in Restricted Ranges
The Iwaniec–Luo–Sarnak Theorem
The first major verification of Katz–Sarnak predictions for an arithmetic family was due to Henryk Iwaniec, Wenzhi Luo, and Peter Sarnak in 2000. They considered the family of L-functions of holomorphic modular forms of weight k and level N (with N varying), and they proved that the 1-level density matches the Katz–Sarnak prediction for the corresponding symmetry type.
The result holds in restricted ranges of test functions: specifically, for test functions with Fourier transform supported in an interval (-σ, σ) with σ < 2 (in the appropriate normalization). Outside this range, the prediction is conjectural.
The restriction on σ reflects a genuine limit of current methods, not a minor technicality. A test function with Fourier support in (-σ, σ) brings in primes up to roughly the σ-th power of the analytic conductor. Establishing the prediction for σ < 1 requires bounding the error terms from these primes directly; reaching σ < 2 requires additional control from techniques developed by Luo and others. Beyond σ = 2, the arguments break down because higher-order interactions among primes become difficult to control.
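The link between the support of f̂ and the range of primes comes from the explicit formula. The display below is a schematic version for a single primitive Dirichlet character χ modulo q, assuming GRH so that the ordinates γ_χ are real; the normalization and the size of the remainder follow one common convention and should be treated as assumptions rather than as the exact form used in the works cited above.

```latex
\sum_{\gamma_\chi} f\Big(\frac{\gamma_\chi \log q}{2\pi}\Big)
  = \hat f(0)
  - \frac{1}{\log q} \sum_{n \ge 2} \frac{\Lambda(n)}{\sqrt{n}}\,
      \hat f\Big(\frac{\log n}{\log q}\Big)
      \big(\chi(n) + \overline{\chi}(n)\big)
  + O\Big(\frac{1}{\log q}\Big).
```

If f̂ vanishes outside (-σ, σ), only prime powers n < q^σ contribute, so enlarging σ means averaging the family's coefficients over a longer range of primes, which is where the difficulty described above enters.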
The pattern — verification in a restricted range, with the full conjecture beyond reach by current methods — has become typical in the field.
The Özlük–Snyder Theorem
A closely related result, due to A. E. Özlük and C. Snyder in 1993 (predating the formal Katz–Sarnak framework), established the 1-level density for the family of quadratic Dirichlet L-functions L(s, χ_d) as d varies over fundamental discriminants. The result, which is conditional on GRH, holds for test functions with Fourier transform supported in (-2, 2), and it matches the symplectic Katz–Sarnak formula.
The Özlük–Snyder result is significant historically because it established the connection between the symmetry type of a family and the explicit form of the low-lying zero density, anticipating the broader Katz–Sarnak framework. It is significant technically because the family of quadratic Dirichlet L-functions is one of the most fundamental and well-studied families, and the result provides a benchmark against which other family results can be compared.
Subsequent Extensions
Subsequent work has extended the Katz–Sarnak verifications to many additional families.
L-functions of Hecke eigenforms: Iwaniec–Luo–Sarnak’s result was extended by various authors to refined families and to higher n-level densities (under appropriate restrictions on test function support).
L-functions of elliptic curves: The family of L-functions L(s, E_d) of quadratic twists of a fixed elliptic curve was studied by Heath-Brown, Goldfeld, and others. The 1-level density matches the orthogonal Katz–Sarnak prediction in restricted ranges.
Symmetric power L-functions: Families of symmetric power L-functions Sym^n L(s, f) for varying f were studied; the predictions match the appropriate symmetry types.
Higher-rank automorphic L-functions: Families of L-functions of automorphic representations of GL(n) for varying parameters were studied by various authors; the predictions match unitary, symplectic, or orthogonal types depending on self-duality.
In each case, the pattern is the same: verification in restricted ranges, with the full conjecture conditional or open.
What the Restricted Range Results Buy
The restricted range results have substantial structural content despite the restriction. They establish that:
- The Katz–Sarnak symmetry types correctly predict the low-lying zero statistics for a wide variety of families.
- The transition between symmetry types (unitary, symplectic, orthogonal, with the orthogonal subtypes) corresponds to identifiable structural features of the families (self-duality, root number sign distribution).
- The framework is internally consistent: predictions for related families agree in their overlapping ranges.
- Numerical verification of the predictions, made possible by extensive L-function computation (treated below), confirms the framework to high precision in essentially all cases.
The restricted range results thus establish the framework as substantially correct, with the open problem being the extension to full ranges of test functions and to additional families.
The Connection to Pair Correlation
The Katz–Sarnak framework for low-lying zeros connects naturally to the Montgomery framework for pair correlation. Pair correlation is, in the random matrix language, the bulk statistic of eigenvalues — the statistics far from the spectral edge. Low-lying zero statistics are the edge statistics — the statistics near the spectral edge.
Both bulk and edge statistics are determined by the same random matrix ensemble. For the unitary type, both are governed by the sine kernel. For the symplectic and orthogonal types, the bulk and edge statistics differ, with the edge showing the symmetry-type-specific features (repulsion for symplectic, attraction with possible forced zeros for orthogonal).
The two frameworks are thus complementary aspects of a single underlying picture. Pair correlation captures the bulk regime; family statistics capture the edge regime. Together they provide a comprehensive picture of zero statistics across the L-function landscape.
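The contrast between the two regimes can be stated in formulas. The display below is a summary under the standard scaling-limit normalization (unit mean spacing), with R_2 denoting the pair correlation density; the notation is an assumption introduced here for the comparison.

```latex
\begin{aligned}
\text{bulk (all types):}\quad
  & R_2(x) = 1 - \Big(\frac{\sin \pi x}{\pi x}\Big)^{2}, \\
\text{edge:}\quad
  & W_{U}(x) = 1, \qquad
    W_{Sp}(x) = 1 - \frac{\sin 2\pi x}{2\pi x}, \\
  & W_{SO(\mathrm{even})}(x) = 1 + \frac{\sin 2\pi x}{2\pi x}, \qquad
    W_{SO(\mathrm{odd})}(x) = 1 - \frac{\sin 2\pi x}{2\pi x} + \delta_0(x).
\end{aligned}
```

The bulk statistic is blind to the symmetry type, while the edge kernels differ by the sign of the oscillating term and by the point mass at the origin.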
V. The Function Field Case
Katz’s Monodromy Theorems
The function field case of Katz–Sarnak is, in contrast to the number field case, fully theorematic. Katz had developed, in a series of monographs over the 1980s and 1990s, a theory of monodromy of geometric families of L-functions over function fields. The theory analyzes how Frobenius operators on cohomology vary as the underlying variety varies in a family, with the variation governed by a geometric structure — the monodromy representation of the family’s parameter space.
The geometric monodromy group of a family is, roughly, the closure of the image of the monodromy representation. For natural families of L-functions, this group turns out to be one of the classical groups: U(N), USp(2N), SO(2N), or SO(2N+1), with the choice determined by the family’s structural features.
Once the geometric monodromy group is identified, the Deligne equidistribution theorem (a deep result from the étale cohomology framework treated in Paper 2) implies that the Frobenius operators equidistribute according to the Haar measure on the monodromy group as the family parameter goes to infinity.
The equidistribution is exactly the random matrix prediction. The eigenvalues of Frobenius, suitably normalized, are distributed according to the eigenvalue density of a random matrix in the appropriate ensemble. The low-lying eigenvalues — those nearest to the spectral edge — have density given by the corresponding edge formula. The Katz–Sarnak prediction holds, in this setting, as a theorem.
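In schematic form, the equidistribution statement reads as follows. The notation is an assumption made for this sketch: X is the parameter space of the family over F_q, K is a maximal compact subgroup of the monodromy group, ϑ_x is the Frobenius conjugacy class in K attached to the parameter x, and f is a continuous central function on K. The sketch also assumes that the geometric and arithmetic monodromy groups coincide after the usual normalization.

```latex
\lim_{n \to \infty}\;
  \frac{1}{\# X(\mathbb{F}_{q^n})}
  \sum_{x \in X(\mathbb{F}_{q^n})} f(\vartheta_x)
  \;=\; \int_{K} f(g)\, d\mu_{\mathrm{Haar}}(g).
```

Taking f to be a local eigenvalue statistic near the central point turns this identity into the corresponding low-lying zero density.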
Specific Computed Cases
The function field theorems have been worked out in detail for many natural families.
Hyperelliptic curves: The family of hyperelliptic curves y² = f(x) with f a polynomial of degree 2g+1 or 2g+2 over F_q, parameterized by f, has geometric monodromy USp(2g). The Katz–Sarnak prediction is the symplectic distribution, and this is a theorem.
Artin–Schreier curves: The family of Artin–Schreier curves y^p − y = f(x) over F_p has been studied; the geometric monodromy has been computed in many cases, and the corresponding Katz–Sarnak predictions are theorems.
Cyclic covers: Families of cyclic covers of P^1 have been studied extensively; the geometric monodromy depends on the cover degree and the ramification structure, with Katz–Sarnak predictions theorematic in each case.
Twists of fixed curves: The family of quadratic twists of a fixed hyperelliptic curve over F_q(T), parameterized by the twist, has geometric monodromy that has been computed. The corresponding Katz–Sarnak prediction (orthogonal type, with subtype determined by the genus parity) is a theorem.
In each case, the function field theorem provides a rigorous version of what is conjectural in the number field setting.
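The genus-one case can be explored by brute force. The sketch below is illustrative only: it takes all nonsingular curves y² = x³ + ax + b over F_p for one small prime (p = 31 is an arbitrary choice), computes the Frobenius trace by a character sum, checks the Hasse bound, and compares the second moment of the normalized trace with the value 1 predicted by Haar measure on USp(2). The function names are hypothetical.

```python
import math

def legendre(v: int, p: int) -> int:
    """Legendre symbol (v/p) for an odd prime p, with value 0 when p divides v."""
    v %= p
    if v == 0:
        return 0
    return 1 if pow(v, (p - 1) // 2, p) == 1 else -1

def frobenius_trace(a: int, b: int, p: int) -> int:
    """a_p = p + 1 - #E(F_p) for E: y^2 = x^3 + a*x + b, via a character sum."""
    return -sum(legendre(x * x * x + a * x + b, p) for x in range(p))

def family_angles(p: int):
    """Frobenius angles in [0, pi] for all nonsingular curves over F_p (p > 3)."""
    angles = []
    for a in range(p):
        for b in range(p):
            if (4 * a ** 3 + 27 * b ** 2) % p == 0:
                continue  # singular cubic, not an elliptic curve
            t = frobenius_trace(a, b, p)
            # Hasse bound |a_p| <= 2*sqrt(p): the Riemann hypothesis for E over F_p
            assert t * t <= 4 * p
            angles.append(math.acos(t / (2.0 * math.sqrt(p))))
    return angles

p = 31
angles = family_angles(p)
second_moment = sum((2.0 * math.cos(t)) ** 2 for t in angles) / len(angles)
print(f"{len(angles)} curves over F_{p}; mean of (a_p/sqrt(p))^2 = {second_moment:.4f}")
```

A histogram of the angles against the density (2/π) sin²θ would give a finer comparison; the second moment is used here only because it reduces the check to a single number.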
The Structural Lesson
The function field theorems serve the same structural role they have served throughout this suite: they provide the model for what the corresponding number field results should look like, and they identify the structural ingredients required for proof.
In the function field case, the proof requires:
- A geometric setting in which the L-functions arise as characteristic polynomials of Frobenius on cohomology.
- A monodromy representation governing how the Frobenius operators vary as the family parameter varies.
- The Deligne equidistribution theorem, providing the connection between monodromy and equidistribution.
- Computation of the geometric monodromy group for the specific family, identifying which classical group it is.
In the number field case, none of these is currently available in the form required. There is no obvious “monodromy” of an arithmetic family of L-functions in the sense Katz uses. The “missing geometry” problem manifests here as the absence of a structural framework that would support direct proofs of family statistics.
The conjectural transposition relies on extensive numerical evidence (computer verification of family statistics has been carried out for many natural families), on the structural consistency of the predictions across the function field/number field divide, and on the philosophical view that the symmetry type of a family — identifiable from analytic data — should determine its statistics regardless of the underlying setting.
VI. Quadratic Twist Families and Elliptic Curves
The Family L(s, E_d)
A particularly important orthogonal family is the family of L-functions of quadratic twists of a fixed elliptic curve. Let E be an elliptic curve over Q, and for each fundamental discriminant d, let E_d be the quadratic twist of E by d. The L-function L(s, E_d) is then defined, with functional equation and conjectured analytic properties.
The functional equation has root number ε(E_d) = ±1. For d coprime to the conductor N of E, the root number satisfies ε(E_d) = χ_d(-N) ε(E), so it is determined by sign and congruence conditions on d, and the values +1 and -1 each occur for half of such discriminants. The d for which ε(E_d) = +1 give an even orthogonal family, with conjectured rank parity 0 (mod 2). The d for which ε(E_d) = -1 give an odd orthogonal family, with conjectured rank parity 1 (mod 2).
The split between the two subfamilies is structural: it corresponds to the split between SO(2N) and SO(2N+1) in the random matrix prediction, and it has direct arithmetic content through the BSD conjecture.
The BSD Connection
The Birch and Swinnerton-Dyer conjecture asserts that the order of vanishing of L(s, E) at s = 1 (the central point in the classical normalization, corresponding to s = 1/2 in the normalization used above) equals the rank of the Mordell–Weil group E(Q). Under BSD, the rank parity equals the parity of the order of vanishing, which is congruent to (1 – ε(E))/2 modulo 2 (so root number +1 corresponds to even rank, root number -1 corresponds to odd rank).
This connects the orthogonal symmetry split directly to arithmetic. The even orthogonal family of E_d with ε(E_d) = +1 consists of twists with even rank; the odd orthogonal family consists of twists with odd rank. The Katz–Sarnak prediction for each subfamily — the orthogonal distribution of low-lying zeros — corresponds, under BSD, to a prediction about the distribution of ranks within each subfamily.
Goldfeld’s Conjecture
Dorian Goldfeld in 1979 conjectured that, for a fixed elliptic curve E over Q, the average rank of the quadratic twists E_d is exactly 1/2. More precisely:
- Among twists with ε(E_d) = +1 (the even orthogonal family), the average rank is 0, with rank ≥ 2 occurring on a density-zero subset.
- Among twists with ε(E_d) = -1 (the odd orthogonal family), the average rank is 1, with rank ≥ 3 occurring on a density-zero subset.
The conjecture is consistent with the Katz–Sarnak prediction for orthogonal families: in the random matrix model, eigenvalues at the spectral edge contribute to the order of vanishing at s = 1/2, and the predicted distribution gives an average vanishing order matching Goldfeld’s conjecture.
Goldfeld’s conjecture has been substantially supported by recent work. Most notably, Alexander Smith, in work from 2017 to 2022, showed that for elliptic curves with full rational 2-torsion satisfying certain technical hypotheses (a class that includes the congruent number curve y² = x³ – x), 100% of quadratic twists have rank at most 1, so that Goldfeld’s conjecture for these families follows from BSD. Smith’s methods are arithmetic-statistical rather than directly L-function-theoretic, but the conclusions match the Katz–Sarnak framework’s predictions.
Recent Progress on Ranks in Twist Families
Substantial progress has been made on the distribution of ranks in twist families. Bhargava’s program (treated in the next section) has established bounds on the average rank of all elliptic curves over Q, with consequences for the rank distribution in any natural family. Work by Bhargava, Skinner, Zhang, and others has established, for instance, that a positive proportion of elliptic curves over Q satisfy the rank part of BSD, with the rank distribution consistent with the Katz–Sarnak predictions.
The full Goldfeld conjecture remains open, but the body of partial results, combined with the Katz–Sarnak framework and the function field analogs, provides substantial structural support.
VII. Vanishing of L-Functions at the Central Point
The Phenomenon
A central concern in family statistics is the question: how often does an L-function in a family vanish at the central point s = 1/2? This is the question of L(1/2) vanishing, which has direct arithmetic content for many families.
For families of unitary symmetry type, L(1/2) ≠ 0 for almost all L-functions in the family — vanishing is a measure-zero event in the random matrix model and conjecturally in the L-function family.
For families of symplectic symmetry type, L(1/2) ≠ 0 for almost all L-functions in the family. In the model case of quadratic Dirichlet L-functions the root number is +1 and the central value is real; GRH implies L(1/2, χ_d) ≥ 0.
For families of orthogonal symmetry type, the situation is more interesting. The orthogonal subfamily SO(2N+1) has a forced zero at s = 1/2: every L-function in this subfamily has L(1/2) = 0. The SO(2N) subfamily has L(1/2) ≠ 0 generically, with vanishing expected only on a density-zero (though possibly infinite) subset that depends on the family.
Predictions for Orthogonal Families
For orthogonal families, the Katz–Sarnak framework predicts specific statistics for the order of vanishing.
Forced vanishing in SO(2N+1): Every L-function in the SO(2N+1) subfamily has L(1/2) = 0 with order at least 1. The order is generically 1, with order 3, 5, … occurring on density-zero subsets.
Generic non-vanishing in SO(2N): A random L-function in the SO(2N) subfamily has L(1/2) ≠ 0. The order of vanishing is 0 generically, with order 2, 4, … occurring on density-zero subsets.
Combined prediction: For a natural orthogonal family that is the union of SO(2N) and SO(2N+1) subfamilies, the average order of vanishing at s = 1/2 is exactly 1/2 (assuming equal distribution of root numbers), matching Goldfeld’s conjecture.
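The arithmetic behind the combined prediction is a two-term average. The display below assumes that root numbers +1 and -1 each occur with proportion 1/2 and that the density-zero exceptional sets contribute nothing to the average; the second assumption is an extra hypothesis, since a density-zero set could in principle carry large orders of vanishing.

```latex
\mathbb{E}\big[\operatorname{ord}_{s=1/2} L\big]
  = \underbrace{\tfrac{1}{2}}_{\varepsilon = +1} \cdot\, 0
  \;+\; \underbrace{\tfrac{1}{2}}_{\varepsilon = -1} \cdot\, 1
  \;=\; \tfrac{1}{2}.
```

If the proportion of root number -1 in a family is ρ rather than 1/2, the same computation gives an average order of vanishing equal to ρ.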
Smith’s Theorem on the Congruent Number Problem
Alexander Smith’s work on the congruent number problem, completed in 2022, provides one of the most striking verifications of the orthogonal family predictions. The congruent number problem asks which positive integers n are areas of right triangles with rational sides; equivalently, which n satisfy that the elliptic curve y² = x³ – n²x has positive rank.
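Smith proved that 100% of the quadratic twists of y² = x³ – x have rank at most 1, which implies unconditionally that 100% of squarefree n ≡ 1, 2, 3 (mod 8) are non-congruent and, assuming BSD, that 100% of squarefree n ≡ 5, 6, 7 (mod 8) are congruent. In this statistical sense half of squarefree positive integers are congruent and half are not, with the split given by an explicit congruence condition. The result is a special case of Goldfeld’s conjecture (conditional on BSD for the odd-rank half) for the elliptic curve y² = x³ – x, and it confirms the orthogonal family prediction in this case.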
Smith’s methods are not directly L-function-theoretic; they involve careful arithmetic statistics of class groups and 2-Selmer groups. But the conclusions match what the Katz–Sarnak framework predicts, and they provide the strongest verification to date of the framework’s quantitative predictions.
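The split by residue class modulo 8 can be explored numerically with Tunnell's criterion. This is a different tool from Smith's Selmer-group methods and is swapped in here only because it is easy to compute: by Tunnell's theorem a squarefree congruent number must satisfy the counting identity below, and the converse holds if BSD is assumed. The function names are illustrative.

```python
import math

def count_representations(n: int, a: int, b: int, c: int) -> int:
    """Number of integer triples (x, y, z) with a*x^2 + b*y^2 + c*z^2 = n."""
    count = 0
    x_max = math.isqrt(n // a)
    for x in range(-x_max, x_max + 1):
        rest_x = n - a * x * x
        y_max = math.isqrt(rest_x // b)
        for y in range(-y_max, y_max + 1):
            rest = rest_x - b * y * y
            if rest % c:
                continue
            z2 = rest // c
            z = math.isqrt(z2)
            if z * z == z2:
                count += 1 if z == 0 else 2  # counts both z and -z
    return count

def passes_tunnell(n: int) -> bool:
    """Tunnell's counting identity for a squarefree positive integer n.
    True is necessary for n to be congruent, and sufficient under BSD."""
    if n % 2:
        return 2 * count_representations(n, 2, 1, 32) == count_representations(n, 2, 1, 8)
    return 2 * count_representations(n, 8, 2, 64) == count_representations(n, 8, 2, 16)

def is_squarefree(n: int) -> bool:
    return all(n % (d * d) for d in range(2, math.isqrt(n) + 1))

tally = {}
for n in range(1, 201):
    if is_squarefree(n):
        hits, total = tally.get(n % 8, (0, 0))
        tally[n % 8] = (hits + int(passes_tunnell(n)), total + 1)
for residue in sorted(tally):
    hits, total = tally[residue]
    print(f"n = {residue} mod 8: {hits} of {total} squarefree n <= 200 pass")
```

Every squarefree n in the classes 5, 6, 7 passes because the root number is -1 there, while in the classes 1, 2, 3 only occasional values such as 34 pass, which is the small-scale picture behind the density statement.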
Soundararajan’s Non-Vanishing Theorem
A related result, due to Soundararajan, establishes that for the family of quadratic Dirichlet L-functions L(s, χ_d), the proportion of d for which L(1/2, χ_d) ≠ 0 is at least 7/8. The proof uses techniques from moment theory (treated in Paper 6) combined with mollifier arguments.
The result is significant because it establishes a lower bound on non-vanishing in a family of symplectic type, where the random matrix prediction is that 100% of L-functions are non-vanishing. The 7/8 bound is unconditional, while the 100% prediction remains conjectural. The gap between 7/8 and 100% reflects the difficulty of establishing exactly the random matrix prediction.
VIII. The Connection to Arithmetic Statistics
Bhargava’s Program
Manjul Bhargava, beginning in the early 2000s, has developed a program of arithmetic statistics — the systematic study of distributions of arithmetic objects when ordered by appropriate height functions. The objects studied include number fields of fixed degree, ideal class groups, elliptic curves over Q, Selmer groups, Tate–Shafarevich groups, and many others.
The program has produced theorems of remarkable depth. Bhargava’s work, often in collaboration with Arul Shankar, Christopher Skinner, and many others, has established:
Average rank of elliptic curves: The average rank of elliptic curves over Q, ordered by naive height, is at most 0.885. The random matrix prediction (combined with BSD) is that the average rank is exactly 1/2, so Bhargava’s bound is consistent with but not equal to the prediction.
Positive proportion of BSD: A positive proportion of elliptic curves over Q satisfy the rank-zero case of BSD (where L(1, E) ≠ 0 implies rank zero), and a positive proportion satisfy the rank-one case.
Distribution of ideal class groups: The Cohen–Lenstra heuristics, which predict the distribution of ideal class groups of imaginary quadratic fields, have been substantially confirmed in restricted ranges by Bhargava and collaborators.
Selmer group statistics: The 2-Selmer, 3-Selmer, and higher Selmer groups of elliptic curves have been studied with statistical methods, with results consistent with predictions from Bhargava, Kane, Lenstra, Poonen, and Rains (BKLPR heuristics).
Connections to L-Function Statistics
The arithmetic statistics results connect to L-function family statistics through the analytic class number formula, BSD, and similar conjectures. Many arithmetic statistics predictions factor through L-function statistics: predictions about ranks of elliptic curves, for instance, factor through predictions about orders of vanishing of L-functions, which factor through Katz–Sarnak predictions.
The structural unity is substantial. Arithmetic statistics, L-function family statistics, and random matrix theory are three perspectives on a single underlying picture. Predictions made in any one perspective should be consistent with predictions made in the others, and the verifications across perspectives provide cross-checks.
Bhargava–Shankar Bounds and Their Consequences
The Bhargava–Shankar bounds on average ranks of elliptic curves, when combined with the random matrix predictions, have several consequences.
First, they support the orthogonal family predictions. Bhargava–Shankar’s bound of average rank ≤ 0.885 is consistent with the Goldfeld conjecture’s prediction of 1/2; the bound does not contradict the prediction, and it rules out alternative conjectures predicting higher average ranks.
Second, they provide tools for proving partial cases of BSD. Bhargava and Skinner, with various collaborators, have used the bounds to establish BSD for specific subfamilies of elliptic curves (those with rank 0 or 1, satisfying certain technical conditions).
Third, they constrain the joint distribution of ranks and L-function vanishing. The bound concerns the algebraic rank; granting BSD, it implies that L-function vanishing of high order at s = 1 is rare, consistent with the random matrix prediction that high-order vanishing is a measure-zero event in orthogonal families.
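The rarity statement can be made quantitative in a crude way. As a sketch, write r(E) for the rank and assume only the average-rank bound quoted above, with averages and proportions taken over curves in the same ordering (a convention not restated in this section). Markov's inequality then gives:

```latex
\operatorname{Prob}\bigl(r(E) \ge k\bigr)
  \;\le\; \frac{\operatorname{Avg}\, r(E)}{k}
  \;\le\; \frac{0.885}{k},
\qquad\text{so for example}\qquad
\operatorname{Prob}\bigl(r(E) \ge 2\bigr) \le 0.4425,
\quad
\operatorname{Prob}\bigl(r(E) \ge 4\bigr) \le 0.22125 .
```

This is only the elementary consequence of the average bound and is far weaker than the random matrix prediction. It concerns the algebraic rank; the corresponding statement about the order of vanishing of L(s, E) at s = 1 requires BSD.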
The BKLPR Heuristics
The Bhargava–Kane–Lenstra–Poonen–Rains (BKLPR) heuristics provide a unified framework for predicting the distributions of Selmer groups, Tate–Shafarevich groups, and related arithmetic invariants of elliptic curves. The heuristics are explicit and computable; they predict, for instance, the distribution of |Sha(E)| (the order of the Tate–Shafarevich group, conjecturally finite) as E varies in a family.
The BKLPR heuristics are consistent with the random matrix predictions for the corresponding L-function families. The structural connection is via BSD: the BKLPR distribution of Sha is, modulo BSD, a prediction about the leading Taylor coefficient of L(s, E) at s = 1, whose distribution is in turn predicted by the random matrix model.
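For orientation, the conjectural BSD formula behind this connection is usually written as follows. The notation is standard but is not fixed elsewhere in this section: r is the rank, Ω_E the real period, Reg(E) the regulator, c_p the Tamagawa numbers, and E(Q)_tors the torsion subgroup.

```latex
\frac{L^{(r)}(E, 1)}{r!}
  \;=\;
  \frac{\Omega_E \cdot \operatorname{Reg}(E) \cdot |\mathrm{Sha}(E)| \cdot \prod_{p} c_p}
       {|E(\mathbb{Q})_{\mathrm{tors}}|^{2}} .
```

In the rank-zero case the left side is simply L(1, E), so a predicted distribution for |Sha(E)|, combined with the other factors, translates into a predicted distribution for central values.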
The verification of BKLPR in restricted ranges, by Bhargava and his collaborators, provides indirect support for the random matrix framework. Just as the function field analogs make Katz–Sarnak theorematic, the BKLPR results make the corresponding arithmetic predictions theorematic in restricted ranges.
IX. Selberg-Class Families and the Orthogonality Conjectures
Families Within the Selberg Class
The Selberg class S, treated in Paper 2, is the class of L-functions satisfying the Selberg axioms. The class is conjecturally equal to the class of automorphic L-functions (treated in Paper 5) and includes Dirichlet L-functions, Hecke L-functions, modular form L-functions, and L-functions of automorphic representations of GL(n).
Within the Selberg class, families can be defined: collections of L-functions parameterized by some structural parameter, with the L-functions in the family sharing some structural feature (degree, conductor type, or symmetry type). The Katz–Sarnak philosophy applies to such families: the symmetry type of the family should determine the low-lying zero statistics.
For Dirichlet L-function families (varying the modulus), the symmetry type is unitary or symplectic depending on whether the characters are complex or real. For modular form L-function families, the symmetry type depends on whether the forms are self-dual or not. For higher-rank automorphic L-function families, the symmetry type depends on the structural features of the family.
Selberg Orthogonality
The Selberg orthogonality conjectures, treated briefly in Paper 4 of this suite, predict that distinct primitive L-functions in the Selberg class have orthogonal Dirichlet coefficients in a precise sense:
∑_{p ≤ x} a_p(F) \overline{a_p(G)} / p = δ_{F,G} log log x + O(1),
where δ_{F,G} = 1 if F = G and 0 otherwise, and the bar denotes complex conjugation.
The orthogonality conjectures are connected to family statistics in the following way. If two L-functions F and G in a family have orthogonal Dirichlet coefficients, then their zeros are expected to be statistically independent in the appropriate sense, since the prime sums that govern correlations between the two zero sets have no main term. The family statistics emerge from the joint behavior of these independent zero distributions.
For Dirichlet L-functions of distinct primitive characters, Selberg orthogonality is a theorem: the diagonal case is Mertens' estimate, and the off-diagonal case follows from the non-vanishing of L(1, χ) for non-principal χ. For higher-rank L-functions, orthogonality is open in many cases. The proven cases support the Katz–Sarnak framework; the open cases are constraints on what the framework can be expected to deliver.
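The Dirichlet case can be checked numerically. The following is a minimal sketch, not a proof: it takes the quadratic characters modulo 3 and modulo 5 (Legendre symbols) as two distinct primitive degree-one L-functions and evaluates the partial sums at a cutoff of 10^5. The helper names and the cutoff are illustrative choices, not taken from the text.

```python
import math


def primes_up_to(n):
    """Sieve of Eratosthenes: return all primes p <= n."""
    sieve = bytearray([1]) * (n + 1)
    sieve[0:2] = b"\x00\x00"
    for i in range(2, math.isqrt(n) + 1):
        if sieve[i]:
            sieve[i * i :: i] = bytearray(len(range(i * i, n + 1, i)))
    return [i for i in range(2, n + 1) if sieve[i]]


def legendre(a, q):
    """Legendre symbol (a/q) for an odd prime q, via Euler's criterion."""
    r = pow(a % q, (q - 1) // 2, q)
    return -1 if r == q - 1 else r  # r is 0, 1, or q - 1


def orthogonality_sum(chi_f, chi_g, x):
    """Partial sum over primes p <= x of chi_f(p) * conj(chi_g(p)) / p."""
    total = 0j
    for p in primes_up_to(x):
        # Conjugation is trivial for real characters; kept for generality.
        total += chi_f(p) * complex(chi_g(p)).conjugate() / p
    return total.real


def chi3(n):
    return legendre(n, 3)


def chi5(n):
    return legendre(n, 5)


X = 10**5  # illustrative cutoff
loglog = math.log(math.log(X))
diag = orthogonality_sum(chi3, chi3, X)  # F = G: should track log log x
off = orthogonality_sum(chi3, chi5, X)   # F != G: should stay bounded

print(f"log log x      = {loglog:.4f}")
print(f"diagonal sum   = {diag:.4f}")
print(f"off-diagonal   = {off:.4f}")
```

The diagonal sum differs from log log x by a bounded amount, while the off-diagonal sum stays bounded as the cutoff grows, which is the dichotomy the conjecture asserts. A finite computation of this kind illustrates the statement but says nothing about the O(1) term uniformly in x.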
Cross-Correlations Across Families
A natural extension of family statistics is the study of cross-correlations across families. Given two families F_1 and F_2 of L-functions with possibly different symmetry types, one can ask: how do the zeros of L-functions in F_1 correlate with the zeros of L-functions in F_2?
The Katz–Sarnak framework predicts that, for natural families with different symmetry types, the zeros are essentially independent — their joint statistics are products of marginal statistics. For families with related symmetry types or shared structural features, the cross-correlations may be nontrivial.
The cross-correlations have not been studied as extensively as within-family statistics, but they are a natural research direction. The Stratified Zero–Prime Resonance Conjecture in Paper 4 of this suite is, in part, a conjecture about specific cross-correlations between ζ-zeros and Dirichlet L-function zeros, and it sits naturally within this broader framework.
X. Computational Verification
Large-Scale Computation
The verification of Katz–Sarnak predictions has been substantially supported by large-scale computation. The L-functions and Modular Forms Database (LMFDB) project, an ongoing collaborative effort by many researchers, has produced extensive databases of L-functions, modular forms, elliptic curves, and related objects, with computed zeros, special values, and statistical data.
For families of L-functions, the LMFDB and predecessor databases have allowed direct numerical verification of the Katz–Sarnak predictions in many cases. The verifications typically proceed by:
- Selecting a family and computing a substantial number of L-functions in it (often thousands to millions).
- Computing the low-lying zeros of each L-function.
- Aggregating the zero statistics across the family.
- Comparing to the Katz–Sarnak prediction for the appropriate symmetry type.
The agreement, in essentially all cases tested, has been strong. Discrepancies, when they appear, have typically been traced to incomplete identification of the symmetry type or to insufficient family size, with the Katz–Sarnak prediction for the corrected analysis matching closely.
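The aggregation and comparison steps listed above can be sketched as follows. This is an illustrative outline under stated assumptions, not the procedure used by the LMFDB or by any specific study: zeros are rescaled by log(conductor)/(2π), the densities are the standard continuous parts of the Katz–Sarnak 1-level densities, and all function names and the input layout (a list of pairs of conductor and zero ordinates) are hypothetical.

```python
import math


def sinc2pi(x):
    """sin(2*pi*x) / (2*pi*x), with the removable singularity at 0 filled in."""
    if x == 0.0:
        return 1.0
    t = 2.0 * math.pi * x
    return math.sin(t) / t


# Continuous parts of the 1-level densities for each symmetry type.
ONE_LEVEL_DENSITY = {
    "U": lambda x: 1.0,
    "Sp": lambda x: 1.0 - sinc2pi(x),
    "SO(even)": lambda x: 1.0 + sinc2pi(x),
    "SO(odd)": lambda x: 1.0 - sinc2pi(x),  # plus a point mass at x = 0
}


def normalize_zeros(gammas, conductor):
    """Rescale zero ordinates so that the mean spacing near 0 is about 1."""
    scale = math.log(conductor) / (2.0 * math.pi)
    return [g * scale for g in gammas]


def predicted_mass(symmetry, a, b, steps=2000):
    """Midpoint-rule integral of the continuous density over [a, b]."""
    w = ONE_LEVEL_DENSITY[symmetry]
    h = (b - a) / steps
    return h * sum(w(a + (k + 0.5) * h) for k in range(steps))


def empirical_mass(family, a, b):
    """Average number of normalized zeros in [a, b] per family member.

    `family` is a list of (conductor, [zero ordinates]) pairs.
    """
    total = 0
    for conductor, gammas in family:
        normalized = normalize_zeros(gammas, conductor)
        total += sum(1 for x in normalized if a <= x <= b)
    return total / len(family)
```

Given computed zeros for a family, one would compare empirical_mass(family, a, b) with predicted_mass(symmetry, a, b) over a range of bins. The densities already show the qualitative signatures: the symplectic density vanishes at the central point (repulsion), while the SO(even) density equals 2 there.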
Specific Verifications
Specific computational verifications of note include:
Quadratic Dirichlet L-functions: Massive computations by Rubinstein and others have verified the symplectic Katz–Sarnak prediction for L(s, χ_d) across a wide range of d.
L-functions of elliptic curves: Computation of L(s, E) and its low-lying zeros for many elliptic curves has confirmed the orthogonal Katz–Sarnak prediction in restricted ranges, with the split between SO(2N) and SO(2N+1) subfamilies clearly identifiable.
Modular form L-functions: Computation of L(s, f) for newforms of various weights and levels has confirmed the predicted symmetry types.
Higher-rank L-functions: For automorphic L-functions of GL(n) for n = 3 and n = 4, computational verification has been carried out in restricted ranges, with results consistent with the predicted symmetry types.
Discrepancies and Their Interpretation
In a few cases, computational results have revealed subtle features that go beyond the leading-order Katz–Sarnak prediction. These include:
Lower-order corrections: The leading-order prediction is an asymptotic statement; subleading corrections have been studied numerically and found to match refined predictions from the Conrey–Farmer–Keating–Rubinstein–Snaith framework.
Family-specific features: Some families exhibit features not captured by the symmetry type alone — for instance, “secondary” effects from auxiliary L-functions, or arithmetic-specific corrections from class number behavior. These features are typically small but identifiable.
Boundary effects: For families parameterized by conductors near the boundary of the family’s natural range, computational results may show systematic biases that wash out as the family size grows.
In each case, the discrepancies have been understandable — typically traceable to refined predictions or to known features of the family — and have not contradicted the framework. The pattern is that the Katz–Sarnak framework is correct at leading order, with subleading corrections that are themselves predicted by refinements of the framework.
The Role of Computation in Refining Conjectures
Computation has played a substantial role in refining the Katz–Sarnak conjectures. Features that emerge only from large-scale data have led to refined predictions and to identification of structural features that the original framework did not capture explicitly.
The CFKRS (Conrey–Farmer–Keating–Rubinstein–Snaith) refined moment conjecture, treated in Paper 6, is one example: it provides not only the leading constant but the full asymptotic expansion for moments, with subleading terms that have been verified numerically. The corresponding refinement for family zero statistics has been studied by Rubinstein and others, with similar agreement.
The computational program has thus served not only to verify the framework but to extend it. Refinements to the Katz–Sarnak predictions have emerged from numerical observation, and these refinements have, in turn, been justified theoretically through random matrix analysis and the hybrid Euler–Hadamard model.
XI. Implications for the Riemann Hypothesis
Indirect Evidence
The Katz–Sarnak framework provides indirect evidence for the Riemann hypothesis. The framework predicts that zeros of L-functions in families distribute themselves with statistical regularity governed by random matrix theory. If RH were grossly false — if many L-functions had zeros far from the critical line — the family statistics would be different from what the framework predicts.
The verification of the framework, in restricted ranges and to high precision numerically, is thus indirect support for RH. It indicates that L-function zeros behave with the kind of regularity that is consistent with RH, and that this regularity is not specific to any single L-function but holds across families.
Limits of the Indirect Argument
The indirect argument has limits. Family statistics can be consistent with RH while still allowing isolated failures: a single L-function with a zero far from the critical line would not, in general, disrupt the family statistics if the family is large. The framework establishes regularities in the average behavior, not in every individual L-function.
This is a real limitation. Even if all family statistics predictions are confirmed, they would not directly prove RH for any specific L-function. They would establish that RH holds on average, and that gross failures are inconsistent with the data, but they would not rule out subtle failures in specific cases.
The Structural Argument
A stronger argument is structural. The Katz–Sarnak framework predicts that L-function families have the symmetry types they do because of structural features of the families (self-duality, automorphic origin, monodromy in the function field case). If these structural features force the predicted statistics, and if the predicted statistics are inconsistent with off-line zeros, then the structural features themselves provide evidence against off-line zeros.
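In the function field setting, this argument is rigorous, although the logical order matters: Deligne's theorem places the normalized Frobenius eigenvalues on the unit circle (the function field analog of the critical line), and the geometric monodromy then forces the random matrix distribution of those eigenvalues. Deligne's proof of RH for varieties over finite fields itself rests on monodromy arguments, so the unit-circle statement and the equidistribution statement come from the same geometric source.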
In the number field setting, the analog is conjectural. The structural features of arithmetic families are real, but the rigorous derivation of zero statistics from them requires the missing geometry that has not been supplied. The structural argument is suggestive but not decisive.
Family Statistics as Constraint on Proofs
The Katz–Sarnak framework, even without proving RH, constrains what a proof of RH can look like. A proof of RH must be consistent with the family statistics — it must predict zeros lying on the critical line in a way that is statistically consistent with random matrix predictions. A proof that contradicted the family statistics would be inconsistent with extensive numerical verification and would have to explain the discrepancy.
This constraint is similar in spirit to the constraints on proofs treated in Paper 3: any successful proof must distinguish the critical line from neighboring lines, must use information specific to ζ that is absent from arbitrary L-functions, and must explain the function field success or differ from it deliberately. The Katz–Sarnak framework adds: any successful proof must produce family statistics consistent with random matrix predictions.
These accumulated constraints narrow the space of plausible proof strategies. They do not produce a proof, but they shape what a proof could be.
XII. Open Problems
Full Verification of Katz–Sarnak Predictions
The most prominent open problem is the full verification of the Katz–Sarnak predictions for natural families, without the restriction on Fourier support of test functions. Here σ denotes the support bound: the Fourier transform of the test function is supported in (−σ, σ). The restricted-range results establish the predictions for σ in a bounded interval; the full conjectures concern all σ.
For the family of quadratic Dirichlet L-functions, the prediction is verified for σ < 2 by Özlük–Snyder, assuming GRH. Extending to larger σ would require new methods for handling the longer prime sums that then appear in the explicit formula.
For the family of L-functions of holomorphic modular forms, similar restrictions apply. Iwaniec–Luo–Sarnak verified the prediction for σ in a specific bounded range; extending to larger ranges is open.
For families of L-functions of higher rank automorphic representations, the situation is more delicate. The verification has been extended to higher-rank cases by various authors, but the restrictions on σ are typically more severe than in the GL(2) and Dirichlet cases.
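The effect of the support restriction is easiest to see on the Fourier side. As a sketch in notation assumed here (W_G for the 1-level density of symmetry type G, δ_0 for the Dirac mass at 0, and η(u) equal to 1 for |u| < 1, 1/2 for |u| = 1, and 0 for |u| > 1), the standard Fourier transforms are:

```latex
\begin{aligned}
\widehat{W}_{\mathrm{U}}(u)        &= \delta_0(u), \\
\widehat{W}_{\mathrm{Sp}}(u)       &= \delta_0(u) - \tfrac{1}{2}\,\eta(u), \\
\widehat{W}_{\mathrm{SO(even)}}(u) &= \delta_0(u) + \tfrac{1}{2}\,\eta(u), \\
\widehat{W}_{\mathrm{SO(odd)}}(u)  &= \delta_0(u) - \tfrac{1}{2}\,\eta(u) + 1, \\
\widehat{W}_{\mathrm{O}}(u)        &= \delta_0(u) + \tfrac{1}{2}.
\end{aligned}
```

By Plancherel, the predicted 1-level density for a test function φ is the integral of \widehat{φ} against \widehat{W}_G. When \widehat{φ} is supported in (−1, 1), the three orthogonal types all give \widehat{φ}(0) + φ(0)/2 and cannot be told apart, while the symplectic type gives \widehat{φ}(0) − φ(0)/2. Distinguishing among the orthogonal types therefore requires σ > 1, which is one concrete reason the size of the admissible support matters.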
Higher-Rank Symmetry Types
The Katz–Sarnak philosophy has been most fully developed for families with classical symmetry types: unitary, symplectic, orthogonal. For families coming from higher-rank automorphic representations — for instance, families of L-functions of automorphic representations of GL(n) for n ≥ 3 — additional symmetry types may arise.
Specifically, for automorphic representations on classical groups (Sp(2n), SO(n), unitary groups), the symmetry types of the corresponding L-function families correspond to dual groups in the Langlands sense (treated in Paper 5). The full list of expected symmetry types includes the classical ones plus refined versions corresponding to specific dual group structures.
Working out the predictions for higher-rank automorphic families, and verifying them in restricted ranges, is an active area of research. Notable contributions include work by Sarnak, Templier, Shin, Kowalski, Michel, and many others.
The Role of the Conductor
Family statistics depend on how the family is parameterized — typically by some notion of conductor or analytic conductor. Different parameterizations of the “same” family can produce different statistics, and the appropriate parameterization is sometimes a subtle question.
For Dirichlet L-functions, the natural parameter is the modulus. For modular form L-functions, the parameter is the level or the weight, depending on which is allowed to vary. For elliptic curve L-functions, the parameter is the conductor. In each case, the parameter has both arithmetic significance and analytic significance, and the family statistics are predicted in terms of the parameter.
The role of the conductor in family statistics has been studied extensively, but a unified theoretical framework for selecting the appropriate parameter for an arbitrary family is not fully developed.
Joint Statistics Across Distinct Families
The cross-correlations between zeros of L-functions in different families are largely unstudied. The Katz–Sarnak framework predicts that, generically, families with distinct symmetry types have independent zero statistics, but the precise nature of this independence and the conditions under which it holds are open.
The Stratified Zero–Prime Resonance Conjecture in Paper 4 is, in part, a conjecture about specific cross-correlations: between ζ-zeros and Dirichlet L-function zeros. The conjecture predicts that these cross-correlations are not zero — that the structural features of L(s, χ) leave a quantifiable signature on ζ-zero statistics.
Investigating cross-correlations more broadly, both theoretically and computationally, is a natural extension of the family statistics program.
XIII. Conclusion
The Katz–Sarnak philosophy, formulated in 1999, has organized a substantial research program for a quarter-century. The framework predicts that zeros of L-functions in families distribute themselves according to symmetry-type-dependent random matrix statistics, with specific functional forms for the 1-level density and higher correlations.
The framework has been verified in restricted ranges for many natural families: quadratic Dirichlet L-functions, modular form L-functions, elliptic curve L-functions, and many others. The verifications have been substantial — they establish that the framework correctly predicts the leading-order behavior of family statistics across a wide variety of families. The full conjectures remain open in many cases, but the partial verifications are strong evidence for the framework’s correctness.
In the function field setting, Katz's monodromy theorems make the framework theorematic. The geometric monodromy of natural families is a classical group, and the Deligne equidistribution theorem implies that the Frobenius conjugacy classes equidistribute with respect to Haar measure on a maximal compact subgroup of the monodromy group. This is exactly the random matrix prediction, established as a theorem in the function field setting.
The framework connects densely to other parts of number theory. Through BSD and Goldfeld’s conjecture, it connects to ranks of elliptic curves and to arithmetic statistics in the Bhargava sense. Through Selberg orthogonality, it connects to the Selberg class and to the broader Langlands picture. Through the moments-and-zeros connection, it connects to the moment theory of Paper 6 and to the conjecture of Paper 4. The connections are not coincidental: family statistics, moment statistics, and pair correlation statistics are three perspectives on a single underlying random matrix picture, with mutual constraints and cross-checks.
For the Riemann hypothesis specifically, family statistics provide indirect evidence and structural constraints. The framework’s predictions are consistent with RH and would be inconsistent with gross failures of RH. The structural argument — that the symmetry type of a family forces the predicted statistics, which force critical-line zeros — is rigorous in the function field setting and conjectural in the number field setting. The constraint on proofs is real: any successful proof of RH must produce family statistics consistent with what is observed.
Why does family statistics deserve its own treatment? Because the framework is substantial and well-developed; because the verifications, partial as they are, are among the deepest in analytic number theory; because the function field theorems provide a model for what the number field results should look like; because the connections to arithmetic statistics, BSD, and the Langlands program are dense; because the framework is likely to be the source of substantial progress in the coming decades, with new verifications, new families, and refinements of the predictions; and because the framework, together with pair correlation and moment theory, constitutes the broad random matrix picture of L-functions that has organized analytic number theory since 2000.
The seven papers of this suite, taken together, trace the Riemann hypothesis from its historical origins (Paper 1), through the field-theoretic framework that situates it within the broader landscape of L-functions and arithmetic geometry (Paper 2), through the survey of strategies that have been developed for its proof and the structural reasons for their success or stagnation (Paper 3), through a forward-looking conjecture that aims to add quantitative structure to the conditional theory (Paper 4), through the Langlands framework that places RH as a specimen of a much larger conjectural family (Paper 5), through the moment theory that supplies technical infrastructure on which much else depends (Paper 6), and now through the family statistics that complete the random matrix picture and connect L-functions to arithmetic statistics and the broader Langlands picture (Paper 7).
The Riemann hypothesis itself remains where Riemann left it: probable, supported, central, and unproved. The framework around it has grown substantially over the past century and a half, and especially over the past quarter-century. The combined picture — Langlands, moments, families, pair correlation, missing geometry — is dense and increasingly precise. A proof of RH, when it comes, will likely emerge from this combined picture rather than from any single approach. The work of identifying which parts of the combined picture are most likely to bear on a proof, and of extending each part further, is the work of the coming decades.
The forward-looking dimension of the suite — the conjecture in Paper 4, the prospects discussed in Paper 5, the open problems identified throughout — points toward the active frontier. Family statistics, moment theory, the Langlands program, and the missing geometry programs are all advancing. None has yet produced the proof. Each contributes, in its own way, to the structural understanding that, eventually, will support the proof when it comes. In the meantime, the discipline waits, and works.
═══════════════════════════════════════════════
