Does This Harvard Economist Practice What He Preaches?
--- This is supposed to be a hit piece??!?

1. Chetty & co. have always been admirably clear that a big part of their reproduction-of-inequality story is that (in modern times) the children of the well off are, in fact, better educated. SAT scores track income so well because the rich have kids who are (as a population) better prepared for college than are the children of the poor. If the _Chronicle_ journalists had read the papers beyond the abstract, or even just the (excellent) NY _Times_ write-ups, they'd have been left in no doubt about this. You could argue that Chetty et al. are wrong about this, but that's not going to support a charge of _hypocrisy_. Which brings me to...

2. OF COURSE students who go to schools with better-prepared classmates are going to tend to get better training than equally-talented kids going to worse schools, because their teachers can target a higher level and go further. This is why I can teach our stats./math/CS undergrads stuff that'd be MS or intro-PhD material elsewhere, but if AEO tried to use a Swarthmore undergrad history syllabus here, heads would explode (and not in a good way). An undergrad stats. degree from CMU is just worth a lot more than one from Pitt, the same way that an undergrad history degree from Swarthmore is worth a lot more than one from CMU. (Unfairly, I think Swarthmore's undergrad stats program is excellent.) Pretending otherwise is not going to do anyone any favors in the end.

3. I don't even begin to see any contradiction between "asks pre-docs to work long hours" and "wants to have higher social mobility". Even if we stipulated that these hours were inhumane, I don't see what the tension was supposed to be. For an analogy: suppose that Chetty has a bad temper, swears a lot at subordinates, etc. (N.B.: this is a hypothetical I am making up out of whole cloth.) That could make him a bad boss; if it's bad enough it could even make him an _abusive_ boss, perhaps to the point where Harvard should shut down the lab. That would have nothing at all to do with whether his actions contradict his research!

4. Kevin Bryan's quoted points _against_ pre-docs, basically that (A) it's a lot easier to delay grown-up salaries if you're rich, and (B) that, to the extent it helps career success, we should expect it to get filled up by the genuinely-well-prepared children of privilege, are of course sound. You could counteract (B), somewhat, by establishing a floor of minimal competence and instituting a lottery for everyone above that threshold. But, to the extent that the people running the pre-doc program _can_ separate those who will be more useful assistants from those who will not be, the lottery option will produce less useful research for the same amount of sponsors' money and senior scholars' time. That might be worth it if it has compensating positive social effects, but that's the trade-off. (One could of course deny the premise, and argue that senior scholars cannot, in fact, make such judgments reliably, whatever they might think.) This is, of course, not at all unique to Chetty's group.

5. I guess I really don't understand how the reporters, and the editors, at _The Chronicle_ think.
im_going_to_regret_making_this_a_public_note_arent_i  academia  chetty.raj  circular_firing_squads  why_oh_why_cant_we_have_a_better_press_corps 
5 days ago
Projective, sparse and learnable latent position network models
"When modeling network data using a latent position model, it is typical to assume that the nodes’ positions are independently and identically distributed. However, this assumption implies the average node degree grows linearly with the number of nodes, which is inappropriate when the graph is thought to be sparse. We propose an alternative assumption—that the latent positions are generated according to a Poisson point process—and show that it is compatible with various levels of sparsity. Unlike other notions of sparse latent position models in the literature, our framework also defines a projective sequence of probability models, thus ensuring consistency of statistical inference across networks of different sizes. We establish conditions for consistent estimation of the latent positions, and compare our results to existing frameworks for modeling sparse networks."

--- Ungated: http://arxiv.org/abs/1709.09702
--- Comments: (2023-12) - (2017-09) = referee #2, man, referee #2
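--- A minimal simulation sketch of the key idea (the intensity, the Gaussian link kernel, and all numbers below are my own toy choices, not the paper's model): draw latent positions from a homogeneous Poisson process on a growing window, connect nodes with a probability that decays in latent distance, and watch the average degree stay roughly constant as the graph grows, i.e. the sparse regime that i.i.d. latent positions cannot give you.

```python
import numpy as np

rng = np.random.default_rng(0)

def sample_graph(intensity, side, scale):
    """Latent positions from a homogeneous Poisson process on [0, side]^2;
    edges appear independently with probability decaying in latent distance."""
    n = rng.poisson(intensity * side ** 2)
    Z = rng.uniform(0, side, size=(n, 2))
    D = np.linalg.norm(Z[:, None, :] - Z[None, :, :], axis=-1)
    P = np.exp(-(D / scale) ** 2)          # toy Gaussian link kernel (my choice)
    np.fill_diagonal(P, 0)
    U = rng.uniform(size=(n, n))
    A = np.triu(U < P, 1)
    return A | A.T

for side in (5, 10, 20, 40):               # grow the observation window
    A = sample_graph(intensity=2.0, side=side, scale=1.0)
    n = A.shape[0]
    print(f"side={side:3d}  n={n:5d}  mean degree={A.sum() / n:.2f}")
```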
in_NB  self-promotion  network_data_analysis  spencer.neil  to:blog  re:geographons_project  graphons 
5 days ago
Maybe We Already Have Runaway Machines | The New Yorker
--- Book in question: [https://wwnorton.com/books/9781631496943]
--- Publishing schedules being what they are, Runciman must've begun writing this long before our _Economist_ piece [http://bactra.org/weblog/shoggothim.html]. But also probably long after its roots [http://bactra.org/reviews/cognition-in-the-wild/], [http://bactra.org/weblog/699.html]. To be clear, I'd be surprised if Runciman actually took anything from my old blog posts...
book_reviews  artificial_intelligence  corporations  state-building  re:shoggothim  track_down_references 
7 days ago
Darwinian rational expectations: Journal of Economic Methodology: Vol 29, No 2
"The rational expectations hypothesis holds that agents should be modeled as not making systematic forecasting errors and has become a central model-building principle of modern economics. The hypothesis is often justified on the grounds that it coheres with the general methodological principle of economic rationality. In this article, I propose a novel Darwinian market justification for rational expectations which does not require either structural knowledge or statistical learning, as is commonly required in the economic literature. Rather, this Darwinian market account reconceives rationality as a market level phenomenon instead of as an individualistic property."

--- I presume they distinguish this from Alchian somehow.
--- ETA: On a quick scan, Alchian shows up at the very end.
to:NB  economics  rationality  evolutionary_game_theory  philosophy_of_science 
11 days ago
The Map/Territory Relationship in Game-Theoretic Modeling of Cultural Evolution - PhilSci-Archive
"The cultural red king effect occurs when discriminatory bargaining practices emerge because of a disparity in learning speed between members of a minority and a majority. This effect has been shown to occur in some Nash Demand Game models and has been proposed as a tool for shedding light on the origins of sexist and racist discrimination in academic collaborations. This paper argues that none of the three main strategies used in the literature to support the epistemic value of these models—structural similarity, empirical confirmation, and how-possibly explanations—provides strong support for this modeling practice in its present form."

--- Re last tag, obviously I wouldn't actually teach this, but I should read this if I'm going to teach that kind of model again
to:NB  evolutionary_game_theory  institutions  inequality  philosophy_of_science  social_science_methodology  to_teach:statistics_of_inequality_and_discrimination 
11 days ago
What would imaginary ancestors do? - PhilSci-Archive
"In this paper, I identify a novel challenge to reasoning about human cognitive evolution. Theorists engaged in producing a causal history of uniquely human psychology often implicitly or explicitly take the perspective of imaginary hominins to reason about a plausible evolutionary sequence. I argue that such speculations only appear plausible because we have employed our evolved cognitive capacities to decide what the imaginary hominin would think or do. Further, I argue that we are likely to continue making this kind of mistake, and so we must continuously contend with it, even in our best approaches to human cognitive evolution."
to:NB  philosophy_of_science  human_evolution  narrative 
11 days ago
Collective deception: toward a network model of epistemic responsibility | Synthese
"What kind of collective is responsible for the deception that follows disinformation campaigns? Jennifer Lackey argues in The Epistemology of Groups that a group agent is responsible for such deception. She analyzes this deception as a group lie, which involves a group misrepresenting its own beliefs through a jointly accepted assertion or a spokesperson. Against this view, I argue that the group responsible for disinformation campaigns is a diffuse network. This deception involves misrepresenting scientific knowledge, not a group belief. Taking tobacco industry disinformation campaigns as an example, I argue that these corporate groups needed a network of epistemically authoritative sources—including scientists, doctors, and reputable publishers—to create and spread disinformation in order to make a skeptical view of scientific knowledge appear credible. As such, I argue that a network is epistemically responsible for this deception. First, I challenge the assumption within group epistemology that assertion is the basis of epistemic responsibility and argue that credibility enhancement is the basis instead. This explains how non-testimonial forms of support and corroboration from multiple sources can bolster the apparent credibility of an implausible view. Next, I describe the roles of different corroborators to show why it is necessary to include them. Finally, I defend a network model of epistemic responsibility for deception. Understanding how enhancing the credibility of disinformation is a matter of responsibility can help us to build more trustworthy communities."
to:NB  networked_life  epistemology  deceiving_us_has_become_an_industrial_process  epidemiology_of_representations  social_networks  moral_philosophy  moral_responsibility  re:actually-dr-internet-is-the-name-of-the-monsters-creator 
11 days ago
[2109.02224] On Empirical Risk Minimization with Dependent and Heavy-Tailed Data
"In this work, we establish risk bounds for the Empirical Risk Minimization (ERM) with both dependent and heavy-tailed data-generating processes. We do so by extending the seminal works of Mendelson [Men15, Men18] on the analysis of ERM with heavy-tailed but independent and identically distributed observations, to the strictly stationary exponentially β-mixing case. Our analysis is based on explicitly controlling the multiplier process arising from the interaction between the noise and the function evaluations on inputs. It allows for the interaction to be even polynomially heavy-tailed, which covers a significantly large class of heavy-tailed models beyond what is analyzed in the learning theory literature. We illustrate our results by deriving rates of convergence for the high-dimensional linear regression problem with dependent and heavy-tailed data."

--- NeurIPS version: https://proceedings.neurips.cc/paper_files/paper/2021/hash/4afa19649ae378da31a423bcd78a97c8-Abstract.html
to:NB  to_read  learning_theory  mixing  heavy_tails 
13 days ago
[2308.06220] Nonlinear Permuted Granger Causality
"Granger causal inference is a contentious but widespread method used in fields ranging from economics to neuroscience. The original definition addresses the notion of causality in time series by establishing functional dependence conditional on a specified model. Adaptation of Granger causality to nonlinear data remains challenging, and many methods apply in-sample tests that do not incorporate out-of-sample predictability, leading to concerns of model overfitting. To allow for out-of-sample comparison, a measure of functional connectivity is explicitly defined using permutations of the covariate set. Artificial neural networks serve as featurizers of the data to approximate any arbitrary, nonlinear relationship, and consistent estimation of the variance for each permutation is shown under certain conditions on the featurization process and the model residuals. Performance of the permutation method is compared to penalized variable selection, naive replacement, and omission techniques via simulation, and it is applied to neuronal responses of acoustic stimuli in the auditory cortex of anesthetized rats. Targeted use of the Granger causal framework, when prior knowledge of the causal mechanisms in a dataset are limited, can help to reveal potential predictive relationships between sets of variables that warrant further study."
in_NB  granger_causality  functional_connectivity  time_series 
17 days ago
The Math You Need: A Comprehensive Survey of Undergraduate Mathematics | Books Gateway | MIT Press
"A comprehensive survey of undergraduate mathematics, compressing four years of study into one robust overview.
"In The Math You Need, Thomas Mack provides a singular, comprehensive survey of undergraduate mathematics, compressing four years of math curricula into one volume. Without sacrificing rigor, this book provides a go-to resource for the essentials that any academic or professional needs. Each chapter is followed by numerous exercises to provide the reader an opportunity to practice what they learned. The Math You Need is distinguished in its use of the Bourbaki style—the gold standard for concision and an approach that mathematicians will find of particular interest. As ambitious as it is compact, this text embraces mathematical abstraction throughout, avoiding ad hoc computations in favor of general results.
"Covering nine areas—group theory, commutative algebra, linear algebra, topology, real analysis, complex analysis, number theory, probability, and statistics—this thorough and highly effective overview of the undergraduate curriculum will prove to be invaluable to students and instructors alike."

--- Should see how it compares to Garrity [https://doi.org/10.1017/9781108992879]
to:NB  books:noted  mathematics 
17 days ago
Human-like systematic generalization through a meta-learning neural network | Nature
"The power of human language and thought arises from systematic compositionality—the algebraic ability to understand and produce novel combinations from known components. Fodor and Pylyshyn1 famously argued that artificial neural networks lack this capacity and are therefore not viable models of the mind. Neural networks have advanced considerably in the years since, yet the systematicity challenge persists. Here we successfully address Fodor and Pylyshyn’s challenge by providing evidence that neural networks can achieve human-like systematicity when optimized for their compositional skills. To do so, we introduce the meta-learning for compositionality (MLC) approach for guiding training through a dynamic stream of compositional tasks. To compare humans and machines, we conducted human behavioural experiments using an instruction learning paradigm. After considering seven different models, we found that, in contrast to perfectly systematic but rigid probabilistic symbolic models, and perfectly flexible but unsystematic neural networks, only MLC achieves both the systematicity and flexibility needed for human-like generalization. MLC also advances the compositional skills of machine learning systems in several systematic generalization benchmarks. Our results show how a standard neural network architecture, optimized for its compositional skills, can mimic human systematic generalization in a head-to-head comparison."

--- Last tag not because I think there's no way to get neural networks to be compositional (there'd better be!), but on general principles having to do with flashy claims in AI and the tabloids.
to:NB  neural_networks  cognitive_science  color_me_skeptical 
17 days ago
The Devil in the Data: Machine Learning & the Theory-Free Ideal - PhilSci-Archive
"Philosophers of science have argued that the widespread adoption of the methods of machine learning (ML) will entail radical changes to the variety of epistemic outputs science is capable of producing. Call this the disruption claim. This, in turn, rests on a distinctness claim, which holds ML to exist on novel epistemic footing relative to classical modelling approaches in virtue of its atheoreticity. We describe the operation of ML systems in scientific practice and reveal it to be a necessarily theory-laden exercise. This undercuts claims of epistemic distinctness and, therefore, at least one path to claims of disruption."

--- Apparently Andrews thought better of the draft title "The Immortal Science of ML". (Part of me hopes she reconsiders.)
to:NB  philosophy_of_science  data_mining 
17 days ago
The Unexpected Compression: Competition at Work in the Low Wage Labor Market | NBER
"Labor market tightness following the height of the Covid-19 pandemic led to an unexpected compression in the US wage distribution that reflects, in part, an increase in labor market competition. Rapid relative wage growth at the bottom of the distribution reduced the college wage premium and counteracted nearly 40% of the four-decade increase in aggregate 90-10 log wage inequality. Wage compression was accompanied by rapid nominal wage growth and rising job-to-job separations—especially among young non-college (high school or less) workers. Comparing across states, post-pandemic labor market tightness became strongly predictive of real wage growth among low-wage workers (wage-Phillips curve), and aggregate wage compression. Simultaneously, the wage-separation elasticity—a key measure of labor market competition—rose among young non-college workers, with wage gains concentrated among workers who changed employers. Seen through the lens of a canonical job ladder model, the pandemic increased the elasticity of labor supply to firms in the low-wage labor market, reducing employer market power and spurring rapid relative wage growth among young noncollege workers who disproportionately moved from lower-paying to higher-paying and potentially more-productive jobs."
to:NB  economics  class_struggles_in_america  inequality  coronavirus_pandemic_of_2019--  to_teach:statistics_of_inequality_and_discrimination 
17 days ago
[2310.17651] High-Dimensional Prediction for Sequential Decision Making
"We study the problem of making predictions of an adversarially chosen high-dimensional state that are unbiased subject to an arbitrary collection of conditioning events, with the goal of tailoring these events to downstream decision makers. We give efficient algorithms for solving this problem, as well as a number of applications that stem from choosing an appropriate set of conditioning events.
"For example, we can efficiently make predictions targeted at polynomially many decision makers, giving each of them optimal swap regret if they best-respond to our predictions. We generalize this to online combinatorial optimization, where the decision makers have a very large action space, to give the first algorithms offering polynomially many decision makers no regret on polynomially many subsequences that may depend on their actions and the context. We apply these results to get efficient no-subsequence-regret algorithms in extensive-form games (EFGs), yielding a new family of regret guarantees for EFGs that generalizes some existing EFG regret notions, e.g. regret to informed causal deviations, and is generally incomparable to other known such notions.
"Next, we develop a novel transparent alternative to conformal prediction for building valid online adversarial multiclass prediction sets. We produce class scores that downstream algorithms can use for producing valid-coverage prediction sets, as if these scores were the true conditional class probabilities. We show this implies strong conditional validity guarantees including set-size-conditional and multigroup-fair coverage for polynomially many downstream prediction sets. Moreover, our class scores can be guaranteed to have improved L2 loss, cross-entropy loss, and generally any Bregman loss, compared to any collection of benchmark models, yielding a high-dimensional real-valued version of omniprediction."
to:NB  prediction  decision_theory  low-regret_learning  roth.aaron  conformal_prediction 
17 days ago
[2309.13786] Distribution-Free Statistical Dispersion Control for Societal Applications
"Explicit finite-sample statistical guarantees on model performance are an important ingredient in responsible machine learning. Previous work has focused mainly on bounding either the expected loss of a predictor or the probability that an individual prediction will incur a loss value in a specified range. However, for many high-stakes applications, it is crucial to understand and control the dispersion of a loss distribution, or the extent to which different members of a population experience unequal effects of algorithmic decisions. We initiate the study of distribution-free control of statistical dispersion measures with societal implications and propose a simple yet flexible framework that allows us to handle a much richer class of statistical functionals beyond previous work. Our methods are verified through experiments in toxic comment detection, medical imaging, and film recommendation."
to:NB  learning_theory 
18 days ago
[2305.18887] How Does Information Bottleneck Help Deep Learning?
"Numerous deep learning algorithms have been inspired by and understood via the notion of information bottleneck, where unnecessary information is (often implicitly) minimized while task-relevant information is maximized. However, a rigorous argument for justifying why it is desirable to control information bottlenecks has been elusive. In this paper, we provide the first rigorous learning theory for justifying the benefit of information bottleneck in deep learning by mathematically relating information bottleneck to generalization errors. Our theory proves that controlling information bottleneck is one way to control generalization errors in deep learning, although it is not the only or necessary way. We investigate the merit of our new mathematical findings with experiments across a range of architectures and learning settings. In many cases, generalization errors are shown to correlate with the degree of information bottleneck: i.e., the amount of the unnecessary information at hidden layers. This paper provides a theoretical foundation for current and future methods through the lens of information bottleneck. Our new generalization bounds scale with the degree of information bottleneck, unlike the previous bounds that scale with the number of parameters, VC dimension, Rademacher complexity, stability or robustness."
to:NB  information_bottleneck  neural_networks  learning_theory 
18 days ago
[2311.03910] Structure of universal formulas
"By universal formulas we understand parameterized analytic expressions that have a fixed complexity, but nevertheless can approximate any continuous function on a compact set. There exist various examples of such formulas, including some in the form of neural networks. In this paper we analyze the essential structural elements of these highly expressive models. We introduce a hierarchy of expressiveness classes connecting the global approximability property to the weaker property of infinite VC dimension, and prove a series of classification results for several increasingly complex functional families. In particular, we introduce a general family of polynomially-exponentially-algebraic functions that, as we prove, is subject to polynomial constraints. As a consequence, we show that fixed-size neural networks with not more than one layer of neurons having transcendental activations (e.g., sine or standard sigmoid) cannot in general approximate functions on arbitrary finite sets. On the other hand, we give examples of functional families, including two-hidden-layer neural networks, that approximate functions on arbitrary finite sets, but fail to do that on the whole domain of definition."
to:NB  learning_theory  neural_networks  approximation 
19 days ago
[2212.13628] Functional Expansions
"Path dependence is omnipresent in many disciplines such as engineering, system theory and finance. It reflects the influence of the past on the future, often expressed through functionals. However, non-Markovian problems are often infinite-dimensional, thus challenging from a conceptual and computational perspective. In this work, we shed light on expansions of functionals. First, we treat static expansions made around paths of fixed length and propose a generalization of the Wiener series−the intrinsic value expansion (IVE). In the dynamic case, we revisit the functional Taylor expansion (FTE). The latter connects the functional Itô calculus with the signature to quantify the effect in a functional when a "perturbation" path is concatenated with the source path. In particular, the FTE elegantly separates the functional from future trajectories. The notions of real analyticity and radius of convergence are also extended to the path space. We discuss other dynamic expansions arising from Hilbert projections and the Wiener chaos, and finally show financial applications of the FTE to the pricing and hedging of exotic contingent claims."
to:NB  stochastic_processes  finance 
19 days ago
SAT and ACT predict college GPA after removing g - ScienceDirect
"This research examined whether the SAT and ACT would predict college grade point average (GPA) after removing g from the tests. SAT and ACT scores and freshman GPAs were obtained from a university sample (N = 161) and the 1997 National Longitudinal Study of Youth (N = 8984). Structural equation modeling was used to examine relationships among g, GPA, and the SAT and ACT. The g factor was estimated from commercial cognitive tests (e.g., Wonderlic and Wechsler Adult Intelligence Scale) and the computer-adaptive Armed Services Vocational Aptitude Battery. The unique variances of the SAT and ACT, obtained after removing g, were used to predict GPA. Results from both samples converged: While the SAT and ACT were highly g loaded, both tests generally predicted GPA after removing g. These results suggest that the SAT and ACT are strongly related to g, which is related to IQ and intelligence tests. They also suggest that the SAT and ACT predict GPA from non-g factors. Further research is needed to identify the non-g factors that contribute to the predictive validity of the SAT and ACT."
in_NB  iq  standardized_testing  education  re:g_paper 
19 days ago
[2309.17016] Efficient Agnostic Learning with Average Smoothness
"We study distribution-free nonparametric regression following a notion of average smoothness initiated by Ashlagi et al. (2021), which measures the "effective" smoothness of a function with respect to an arbitrary unknown underlying distribution. While the recent work of Hanneke et al. (2023) established tight uniform convergence bounds for average-smooth functions in the realizable case and provided a computationally efficient realizable learning algorithm, both of these results currently lack analogs in the general agnostic (i.e. noisy) case.
"In this work, we fully close these gaps. First, we provide a distribution-free uniform convergence bound for average-smoothness classes in the agnostic setting. Second, we match the derived sample complexity with a computationally efficient agnostic learning algorithm. Our results, which are stated in terms of the intrinsic geometry of the data and hold over any totally bounded metric space, show that the guarantees recently obtained for realizable learning of average-smooth functions transfer to the agnostic setting. At the heart of our proof, we establish the uniform convergence rate of a function class in terms of its bracketing entropy, which may be of independent interest."
in_NB  nonparametrics  learning_theory  empirical_processes  kith_and_kin  kontorovich.aryeh  hanneke.steve 
19 days ago
The Chile Project
"How Chile became home to the world’s most radical free-market experiment—and what its downfall suggests about the fate of neoliberalism around the globe
"In The Chile Project, Sebastian Edwards tells the remarkable story of how the neoliberal economic model—installed in Chile during the Pinochet dictatorship and deepened during three decades of left-of-center governments—came to an end in 2021, when Gabriel Boric, a young former student activist, was elected president, vowing that “If Chile was the cradle of neoliberalism, it will also be its grave.” More than a story about one Latin American country, The Chile Project is a behind-the-scenes history of the spread and consequences of the free-market thinking that dominated economic policymaking around the world in the second half of the twentieth century—but is now on the retreat.
"In 1955, the U.S. State Department launched the “Chile Project” to train Chilean economists at the University of Chicago, home of the libertarian Milton Friedman. After General Augusto Pinochet overthrew socialist president Salvador Allende in 1973, Chile’s “Chicago Boys” implemented the purest neoliberal model in the world for the next seventeen years, undertaking a sweeping package of privatization and deregulation, creating a modern capitalist economy, and sparking talk of a “Chilean miracle.” But under the veneer of success, a profound dissatisfaction with the vast inequalities caused by neoliberalism was growing. In 2019, protests erupted throughout the country, and in 2022 Boric began his presidency with a clear mandate: to end neoliberalismo.
"In telling the fascinating story of the Chicago Boys and Chile’s free-market revolution, The Chile Project provides an important new perspective on the history of neoliberalism and its global decline today."

--- Via [https://pinboard.in/u:cshalizi/b:036b460cc9fb]
in_NB  books:noted  downloaded  chile  neoliberalism  economists_in_politics 
19 days ago
[2207.12382] On Confidence Sequences for Bounded Random Processes via Universal Gambling Strategies
"This paper considers the problem of constructing a confidence sequence for bounded random processes. Building upon the gambling approach pioneered by Hendriks (2018) and Jun and Orabona (2019) and following the recent work of Waudby-Smith and Ramdas (2020) and Orabona and Jun (2021), this paper revisits the idea of Cover (1991)'s universal portfolio in constructing confidence sequences and demonstrates new properties, based on a natural \emph{two-horse race} perspective on the gambling approach. The main result of this paper is a new algorithm based on a mixture of lower bounds, which closely approximates the performance of Cover's universal portfolio with only constant per-round time complexity. A higher-order generalization of a lower bound in (Fan et al, 2015), which is invoked in the proposed algorithm, may be of independent interest."
to:NB  prediction  confidence_sets  universal_prediction  information_theory 
19 days ago
[2309.10140] A Geometric Framework for Neural Feature Learning
"We present a novel framework for learning system design based on neural feature extractors by exploiting geometric structures in feature spaces. First, we introduce the feature geometry, which unifies statistical dependence and features in the same functional space with geometric structures. By applying the feature geometry, we formulate each learning problem as solving the optimal feature approximation of the dependence component specified by the learning setting. We propose a nesting technique for designing learning algorithms to learn the optimal features from data samples, which can be applied to off-the-shelf network architectures and optimizers. To demonstrate the application of the nesting technique, we further discuss multivariate learning problems, including conditioned inference and multimodal learning, where we present the optimal features and reveal their connections to classical approaches."
to:NB  information_geometry  variable_selection  neural_networks  statistics 
19 days ago
[2006.06466] How Interpretable and Trustworthy are GAMs?
"Generalized additive models (GAMs) have become a leading modelclass for interpretable machine learning. However, there are many algorithms for training GAMs, and these can learn different or even contradictory models, while being equally accurate. Which GAM should we trust? In this paper, we quantitatively and qualitatively investigate a variety of GAM algorithms on real and simulated datasets. We find that GAMs with high feature sparsity (only using afew variables to make predictions) can miss patterns in the data and be unfair to rare subpopulations. Our results suggest that inductive bias plays a crucial role in what interpretable models learn and that tree-based GAMs represent the best balance of sparsity, fidelity and accuracy and thus appear to be the most trustworthy GAM."
to:NB  additive_models 
19 days ago
Clark (2023) and the Persistence of Hereditarian Fallacies | bioRxiv
"Clark (2023) considers the similarity in socioeconomic status between relatives, drawing on records spanning four centuries in England. The paper adapts a classic quantitative genetics model in order to argue the fit of the model to the data suggests that: (1) variation in socioeconomic status is largely determined by additive genetic variation; (2) contemporary English people “remain correlated in outcomes with their lineage relatives in exactly the same way as in preindustrial England”; and (3) social mobility has remained static over this time period due to strong assortative mating on a “social genotype.” These conclusions are based on a misconstrual of model parameters, which conflates genetic and non-genetic transmission (e.g. of wealth) within families. As we show, there is strong confounding of genetic and non-genetic sources of similarity in these data. Inconsistent with claims (2) and (3), we show that familial correlations in status are variable—generally decreasing—through the time period analyzed. Lastly, we find that statistical artifacts substantially bias estimates of familial correlations in the paper. Overall, Clark (2023) provides no information about the relative contribution of genetic and non-genetic factors to social status."
to:NB  to_read  human_genetics  historical_genetics  transmission_of_inequality  to_teach:statistics_of_inequality_and_discrimination 
22 days ago
Validity of the GRE without Restriction of Range - Bradley E. Huitema, Cheri R. Stein, 1993
"Restriction of range is a frequently acknowledged issue in estimating the validity of predictors of academic performance in graduate school. Data obtained from a doctoral program in a psychology department where graduate students were admitted without regard to Graduate Record Examination (GRE) scores yielded essentially identical standard deviations on this test for the 204 applicants and 138 enrolled students. The GRE-Total validity coefficients obtained on subjects in the enrolled sample ranged from .55 through .70; these values are considerably higher than those typically reported. The data are congruent with the argument that uncorrected GRE validity coefficients yield biased estimates of the unknown validity in unrestricted applicant pools."

--- I guess my position is that I think standardized tests like the GRE have evolved to be pretty good predictors of whether someone is ready for various educational programs (without major investments on the part of the program in remedial work, change in the structure of the program, etc.), which in no way implies the existence of a unitary ability being measured. I also frankly think they're going to be _less_ gameable, and less likely to reproduce mere cultural capital, than the feasible alternatives...
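--- The range-restriction point in a few lines of simulation (all numbers invented): selecting applicants on the predictor shrinks the observed validity coefficient, which is why a program that admitted without regard to the GRE gives a cleaner estimate.

```python
import numpy as np

rng = np.random.default_rng(3)
n = 100_000
rho = 0.6                                    # assumed true validity in the applicant pool
gre = rng.standard_normal(n)
gpa = rho * gre + np.sqrt(1 - rho ** 2) * rng.standard_normal(n)

corr = lambda a, b: np.corrcoef(a, b)[0, 1]
print("all applicants:             ", round(corr(gre, gpa), 3))
admitted = gre > np.quantile(gre, 0.8)       # admit only the top 20% on the GRE
print("admitted (range-restricted):", round(corr(gre[admitted], gpa[admitted]), 3))
```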
to:NB  to_read  via:?  standardized_testing  mental_testing  to_teach:statistics_of_inequality_and_discrimination  psychometrics  prediction 
22 days ago
Polygenic scoring accuracy varies across the genetic ancestry continuum | Nature
"Polygenic scores (PGSs) have limited portability across different groupings of individuals (for example, by genetic ancestries and/or social determinants of health), preventing their equitable use1,2,3. PGS portability has typically been assessed using a single aggregate population-level statistic (for example, R2)4, ignoring inter-individual variation within the population. Here, using a large and diverse Los Angeles biobank5 (ATLAS, n = 36,778) along with the UK Biobank6 (UKBB, n = 487,409), we show that PGS accuracy decreases individual-to-individual along the continuum of genetic ancestries7 in all considered populations, even within traditionally labelled ‘homogeneous’ genetic ancestries. The decreasing trend is well captured by a continuous measure of genetic distance (GD) from the PGS training data: Pearson correlation of −0.95 between GD and PGS accuracy averaged across 84 traits. When applying PGS models trained on individuals labelled as white British in the UKBB to individuals with European ancestries in ATLAS, individuals in the furthest GD decile have 14% lower accuracy relative to the closest decile; notably, the closest GD decile of individuals with Hispanic Latino American ancestries show similar PGS performance to the furthest GD decile of individuals with European ancestries. GD is significantly correlated with PGS estimates themselves for 82 of 84 traits, further emphasizing the importance of incorporating the continuum of genetic ancestries in PGS interpretation. Our results highlight the need to move away from discrete genetic ancestry clusters towards the continuum of genetic ancestries when considering PGSs."
to:NB  statistics  human_genetics 
22 days ago
Edgeworth's mathematization of social well-being - ScienceDirect
"Francis Ysidro Edgeworth's unduly neglected monograph New and Old Methods of Ethics (1877) advances a highly sophisticated and mathematized account of social well-being in the utilitarian tradition of his 19th-century contemporaries. This article illustrates how his usage of the ‘calculus of variations’ was combined with findings from empirical psychology and economic theory to construct a consequentialist axiological framework. A conclusion is drawn that Edgeworth is a methodological predecessor to several important methods, ideas, and issues that continue to be discussed in contemporary social well-being studies."
to:NB  economics  ethics  political_philosophy  calculus_of_variations  history_of_ideas 
22 days ago
[2312.00752] Mamba: Linear-Time Sequence Modeling with Selective State Spaces
"Foundation models, now powering most of the exciting applications in deep learning, are almost universally based on the Transformer architecture and its core attention module. Many subquadratic-time architectures such as linear attention, gated convolution and recurrent models, and structured state space models (SSMs) have been developed to address Transformers' computational inefficiency on long sequences, but they have not performed as well as attention on important modalities such as language. We identify that a key weakness of such models is their inability to perform content-based reasoning, and make several improvements. First, simply letting the SSM parameters be functions of the input addresses their weakness with discrete modalities, allowing the model to selectively propagate or forget information along the sequence length dimension depending on the current token. Second, even though this change prevents the use of efficient convolutions, we design a hardware-aware parallel algorithm in recurrent mode. We integrate these selective SSMs into a simplified end-to-end neural network architecture without attention or even MLP blocks (Mamba). Mamba enjoys fast inference (5× higher throughput than Transformers) and linear scaling in sequence length, and its performance improves on real data up to million-length sequences. As a general sequence model backbone, Mamba achieves state-of-the-art performance across several modalities such as language, audio, and genomics. On language modeling, our Mamba-3B model outperforms Transformers of the same size and matches Transformers twice its size, both in pretraining and downstream evaluation."

--- This sounds extremely promising!
--- Prediction before reading beyond the abstract: this will be an example of what Cox called an "observation-driven" model, a.k.a. a "chain with complete connections" [http://bactra.org/notebooks/chains-with-complete-connections.html].
--- After an initial read: It is indeed a CCC. I should re-read it carefully (the hardware bits I frankly skimmed), but this is very cool, and will repay further study.
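--- To make the "observation-driven" point concrete, here is a bare-bones selective state-space recurrence (the dimensions, projections, and crude discretization are mine, not Mamba's): the step size and the input/output maps are functions of the current input, so the dynamics depend on the observed sequence rather than being a fixed hidden chain.

```python
import numpy as np

rng = np.random.default_rng(4)
d_in, d_state = 4, 8
W_dt = rng.standard_normal((d_in, 1)) * 0.1
W_B = rng.standard_normal((d_in, d_state)) * 0.1
W_C = rng.standard_normal((d_in, d_state)) * 0.1
A = -np.exp(rng.standard_normal(d_state))        # stable diagonal continuous-time dynamics

def selective_ssm(x):
    """x: (T, d_in) -> (T,) scalar outputs. Input-dependent (dt, B, C) is the
    'selective' part; with fixed parameters this would be an ordinary linear SSM."""
    h = np.zeros(d_state)
    out = []
    for x_t in x:
        dt = np.log1p(np.exp(x_t @ W_dt))         # softplus step size, shape (1,)
        B_t, C_t = x_t @ W_B, x_t @ W_C
        h = np.exp(dt * A) * h + dt * B_t * x_t.mean()   # crude ZOH-style update
        out.append(float(C_t @ h))
    return np.array(out)

print(selective_ssm(rng.standard_normal((6, d_in))))
```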
in_NB  to_read  large_language_models_(so_called)  neural_networks  have_skimmed 
22 days ago
Modelling Scientific Communities
"This Element will overview research using models to understand scientific practice. Models are useful for reasoning about groups and processes that are complicated and distributed across time and space, i.e., those that are difficult to study using empirical methods alone. Science fits this picture. For this reason, it is no surprise that researchers have turned to models over the last few decades to study various features of science. The different sections of the element are mostly organized around different modeling approaches. The models described in this element sometimes yield take-aways that are straightforward, and at other times more nuanced. The Element ultimately argues that while these models are epistemically useful, the best way to employ most of them to understand and improve science is in combination with empirical methods and other sorts of theorizing."
in_NB  books:noted  downloaded  sociology_of_science  philosophy_of_science  science_as_a_social_process  oconnor.cailin 
22 days ago
Bias and Confidence in Not-quite Large Samples (Tukey, 1958)
--- Tukey (1958) on the jackknife turns out to be a one-paragraph abstract of a conference talk, which apparently was never published in any more fleshed-out form. I can't decide if this is very frustrating, or absolutely heroic.
to:NB  have_read  jackknife  tukey.john_w. 
27 days ago
Lie-to-children - Wikipedia
--- This alleges that the phrase originates in Ian Stewart and Jack Cohen's (good) book _The Collapse of Chaos_. It helpfully provides page numbers. The phrase does not appear on those pages. (Wikipedia even links to an archive.org scan of the pages where you can see that it doesn't, but I just checked against my old paperback copy.) The phrase does not appear in the index to the book. "Explanation" does, and none of those pages contain the phrase. Google Books search for the book doesn't show me enough to be certain but it doesn't look good.

Wikipedia also cites two articles by a literary scholar studying Stewart and Cohen's collaboration with Pratchett (_Science of Discworld_). But the one of those articles I can access _doesn't claim_ that the phrase originated with Stewart and Cohen, just that they use the expository tactic. (The other one might go into this, I can't say.)

I strongly suspect that the phrase originated _with Pratchett_, in his novels, and that Stewart and Cohen adopted it for the collaboration, as an instance of their-kind-of-thing.
lies_told_to_children  pedagogy  oh_wikipedia  rhetoric 
27 days ago
Modeling Social Behavior | Princeton University Press
"This book provides a unified, theory-driven introduction to key mathematical and agent-based models of social dynamics and cultural evolution, teaching readers how to build their own models, analyze them, and integrate them with empirical research programs. It covers a variety of modeling topics, each exemplified by one or more archetypal models, and helps readers to develop strong theoretical foundations for understanding social behavior. Modeling Social Behavior equips social, behavioral, and cognitive scientists with an essential tool kit for thinking about and studying complex social systems using mathematical and computational models."
to:NB  books:noted  coveted  smaldino.paul  agent-based_models  cultural_evolution  sociology  books:suggest_to_library 
5 weeks ago
Soft Matter | Princeton University Press
"Soft matter science is an interdisciplinary field at the interface of physics, biology, chemistry, engineering, and materials science. It encompasses colloids, polymers, and liquid crystals as well as rapidly emerging topics such as metamaterials, memory formation and learning in matter, bioactive systems, and artificial life. This textbook introduces key phenomena and concepts in soft matter from a modern perspective, marrying established knowledge with the latest developments and applications. The presentation integrates statistical mechanics, dynamical systems, and hydrodynamic approaches, emphasizing conservation laws and broken symmetries as guiding principles while paying attention to computational and machine learning advances."
to:NB  books:noted  statistical_mechanics  books:suggest_to_library 
5 weeks ago
Statistical Mechanics of Phases and Phase Transitions | Princeton University Press
"Statistical mechanics deploys a powerful set of mathematical approaches for studying the thermodynamic properties of complex physical systems. This textbook introduces students to the statistical mechanics of systems undergoing changes of state, focusing on the basic principles for classifying distinct thermodynamic phases and the critical phenomena associated with transitions between them. Uniquely designed to promote active learning, Statistical Mechanics of Phases and Phase Transitions presents some of the most beautiful and profound concepts in physics, enabling students to obtain an essential understanding of a computationally challenging subject without getting lost in the details."
to:NB  books:noted  phase_transitions  statistical_mechanics  books:suggest_to_library 
5 weeks ago
Data Science for Neuroimaging | Princeton University Press
"As neuroimaging turns toward data-intensive discovery, researchers in the field must learn to access, manage, and analyze datasets at unprecedented scales. Concerns about reproducibility and increased rigor in reporting of scientific results also demand higher standards of computational practice. This book offers neuroimaging researchers an introduction to data science, presenting methods, tools, and approaches that facilitate automated, reproducible, and scalable analysis and understanding of data. Through guided, hands-on explorations of openly available neuroimaging datasets, the book explains such elements of data science as programming, data management, visualization, and machine learning, and describes their application to neuroimaging. Readers will come away with broadly relevant data science skills that they can easily translate to their own questions."
to:NB  books:noted  fmri  neural_data_analysis  yarkoni.tal  statistics  self-recommending  books:suggest_to_library 
5 weeks ago
Projective families of distributions revisited - ScienceDirect
"The behaviour of statistical relational representations across differently sized domains has become a focal area of research from both a modelling and a complexity viewpoint. Recently, projectivity of a family of distributions emerged as a key property, ensuring that marginal probabilities are independent of the domain size. However, the formalisation used currently assumes that the domain is characterised only by its size. This contribution extends the notion of projectivity from families of distributions indexed by domain size to functors taking extensional data from a database. This makes projectivity available for the large range of applications taking structured input. We transfer key known results on projective families of distributions to the new setting. This includes a characterisation of projective fragments in different statistical relational formalisms as well as a general representation theorem for projective families of distributions. Furthermore, we prove a correspondence between projectivity and distributions on countably infinite domains, which we use to unify and generalise earlier work on statistical relational representations in infinite domains. Finally, we use the extended notion of projectivity to define a further strengthening, which we call σ-projectivity, and which allows the use of the same representation in different modes while retaining projectivity."

--- I should just get over my aversion to abstract nonsense and drink the category-theoretic kool-aid, shouldn't I?
in_NB  projectivity  re:your_favorite_ergm_sucks  category_theory 
5 weeks ago
[2310.00865] Data Science at the Singularity
"A purported `AI Singularity' has been in the public eye recently. Mass media and US national political attention focused on `AI Doom' narratives hawked by social media influencers. The European Commission is announcing initiatives to forestall `AI Extinction'. In my opinion, `AI Singularity' is the wrong narrative for what's happening now; recent happenings signal something else entirely. Something fundamental to computation-based research really changed in the last ten years. In certain fields, progress is dramatically more rapid than previously, as the fields undergo a transition to frictionless reproducibility (FR). This transition markedly changes the rate of spread of ideas and practices, affects mindsets, and erases memories of much that came before.
"The emergence of frictionless reproducibility follows from the maturation of 3 data science principles in the last decade. Those principles involve data sharing, code sharing, and competitive challenges, however implemented in the particularly strong form of frictionless open services. Empirical Machine Learning (EML) is todays leading adherent field, and its consequent rapid changes are responsible for the AI progress we see. Still, other fields can and do benefit when they adhere to the same principles.
"Many rapid changes from this maturation are misidentified. The advent of FR in EML generates a steady flow of innovations; this flow stimulates outsider intuitions that there's an emergent superpower somewhere in AI. This opens the way for PR to push worrying narratives: not only `AI Extinction', but also the supposed monopoly of big tech on AI research. The helpful narrative observes that the superpower of EML is adherence to frictionless reproducibility practices; these practices are responsible for the striking progress in AI that we see everywhere."
to:NB  donoho.david  reproducibility  computational_statistics  to_teach:statcomp  to_teach:data-mining 
5 weeks ago
Kevin J. Lande, Pictorial Syntax - PhilPapers
"It is commonly assumed that images, whether in the world or in the head, do not have a privileged analysis into constituent parts. They are thought to lack the sort of syntactic structure necessary for representing complex contents and entering into sophisticated patterns of inference. I reject this assumption. “Image grammars” are models in computer vision that articulate systematic principles governing the form and content of images. These models are empirically credible and can be construed as literal grammars for images. Images can have rich syntactic structure, though of a markedly different form than sentences in language."

--- Image grammars! Now there's something I've not thought about since the 1990s...
cognitive_science  syntax  automata_theory  to:NB 
5 weeks ago
[2211.01126] Likelihood-free hypothesis testing
"Consider the problem of binary hypothesis testing. Given Z coming from either ℙ⊗m or ℚ⊗m, to decide between the two with small probability of error it is sufficient and in most cases necessary to have m≍1/ϵ2, where ϵ measures the separation between ℙ and ℚ in total variation (𝖳𝖵). Achieving this, however, requires complete knowledge of the distributions and can be done, for example, using the Neyman-Pearson test. In this paper we consider a variation of the problem, which we call likelihood-free (or simulation-based) hypothesis testing, where access to ℙ and ℚ is given through n iid observations from each. In the case when ℙ,ℚ are assumed to belong to a non-parametric family , we demonstrate the existence of a fundamental trade-off between n and m given by nm≍n2𝖦𝗈𝖥(ϵ,), where n𝖦𝗈𝖥 is the minimax sample complexity of testing between the hypotheses H0:ℙ=ℚ vs H1:𝖳𝖵(ℙ,ℚ)≥ϵ. We show this for three families of distributions: β-smooth densities supported on [0,1]d, the Gaussian sequence model over a Sobolev ellipsoid, and the collection of distributions on alphabet [k]={1,2,…,k} with pmfs bounded by c/k for fixed c. For the larger family of all distributions on [k] we obtain a more complicated trade-off that exhibits a phase-transition. The test that we propose, based on the L2-distance statistic of Ingster, simultaneously achieves all points on the trade-off curve for the regular classes. This demonstrates the possibility of testing without fully estimating the distributions, provided m≫1/ϵ2."
to:NB  to_read  hypothesis_testing  simulation-based_inference 
5 weeks ago
Feasible Peer Effects: Experimental Evidence for Deskmate Effects on Educational Achievement and Inequality | Sociological Science
"Schools routinely employ seating charts to influence educational outcomes. Dependable evidence for the causal effects of seating charts on students’ achievement levels and inequality, however, is scarce. We executed a large pre-registered field experiment to estimate causal peer effects on students’ test scores and grades by randomizing the seating charts of 195 classrooms (N=3,365 students). We found that neither sitting next to a deskmate with higher prior achievement nor sitting next to a female deskmate affected learning outcomes on average. However, we also found that sitting next to the highest-achieving deskmates improved the educational outcomes of the lowest-achieving students; and sitting next to the lowest-achieving deskmates lowered the educational outcomes of the highest-achieving students. Therefore, compared to random seating charts, achievement-discordant seating charts would decrease inequality; whereas achievement concordant seating charts would increase inequality. We discuss policy implications."

--- Elwert is sound, but the fact that they find the effects only for the extremes seems a bit funny.
to:NB  to_read  experimental_sociology  social_influence  elwert.felix 
5 weeks ago
[2203.14223] Identifying Peer Influence in Therapeutic Communities
"We investigate if there is a peer influence or role model effect on successful graduation from Therapeutic Communities (TCs). We analyze anonymized individual-level observational data from 3 TCs that kept records of written exchanges of affirmations and corrections among residents, and their precise entry and exit dates. The affirmations allow us to form peer networks, and the entry and exit dates allow us to define a causal effect of interest. We conceptualize the causal role model effect as measuring the difference in the expected outcome of a resident (ego) who can observe one of their social contacts (e.g., peers who gave affirmations), to be successful in graduating before the ego's exit vs not successfully graduating before the ego's exit. Since peer influence is usually confounded with unobserved homophily in observational data, we model the network with a latent variable model to estimate homophily and include it in the outcome equation. We provide a theoretical guarantee that the bias of our peer influence estimator decreases with sample size. Our results indicate there is an effect of peers' graduation on the graduation of residents. The magnitude of peer influence differs based on gender, race, and the definition of the role model effect. A counterfactual exercise quantifies the potential benefits of intervention of assigning a buddy to "at-risk" individuals directly on the treated resident and indirectly on their peers through network propagation."

--- OK, maybe we should have written out the more general "assume you can estimate latent locations in an arbitrary graphon at such-and-such a rate" theorem...
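--- A crude sketch of the general recipe alluded to above (spectral embedding and OLS here are stand-ins for the paper's latent variable model and outcome equation): estimate latent positions from the network and include them as controls when regressing the ego's outcome on peer exposure, so that homophily on the latent positions is no longer confounded with influence.

```python
import numpy as np

def peer_effect_with_latent_controls(A, exposure, y, d=2):
    """A: (n, n) symmetric adjacency; exposure, y: length-n arrays.
    Returns the peer-exposure coefficient adjusted for estimated latent positions."""
    vals, vecs = np.linalg.eigh(A.astype(float))
    top = np.argsort(np.abs(vals))[-d:]                 # leading eigenpairs
    Z_hat = vecs[:, top] * np.sqrt(np.abs(vals[top]))   # adjacency spectral embedding
    X = np.column_stack([np.ones(len(y)), exposure, Z_hat])
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    return beta[1]
```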
to:NB  to_read  have_skimmed  re:community-control  network_data_analysis  homophily  social_influence 
5 weeks ago
[2310.16028] What Algorithms can Transformers Learn? A Study in Length Generalization
"Large language models exhibit surprising emergent generalization properties, yet also struggle on many simple reasoning tasks such as arithmetic and parity. This raises the question of if and when Transformer models can learn the true algorithm for solving a task. We study the scope of Transformers' abilities in the specific setting of length generalization on algorithmic tasks. Here, we propose a unifying framework to understand when and how Transformers can exhibit strong length generalization on a given task. Specifically, we leverage RASP (Weiss et al., 2021) -- a programming language designed for the computational model of a Transformer -- and introduce the RASP-Generalization Conjecture: Transformers tend to length generalize on a task if the task can be solved by a short RASP program which works for all input lengths. This simple conjecture remarkably captures most known instances of length generalization on algorithmic tasks. Moreover, we leverage our insights to drastically improve generalization performance on traditionally hard tasks (such as parity and addition). On the theoretical side, we give a simple example where the "min-degree-interpolator" model of learning from Abbe et al. (2023) does not correctly predict Transformers' out-of-distribution behavior, but our conjecture does. Overall, our work provides a novel perspective on the mechanisms of compositional generalization and the algorithmic capabilities of Transformers."
in_NB  large_language_models_(so_called) 
5 weeks ago
[2311.04378] Watermarks in the Sand: Impossibility of Strong Watermarking for Generative Models
"Watermarking generative models consists of planting a statistical signal (watermark) in a model's output so that it can be later verified that the output was generated by the given model. A strong watermarking scheme satisfies the property that a computationally bounded attacker cannot erase the watermark without causing significant quality degradation. In this paper, we study the (im)possibility of strong watermarking schemes. We prove that, under well-specified and natural assumptions, strong watermarking is impossible to achieve. This holds even in the private detection algorithm setting, where the watermark insertion and detection algorithms share a secret key, unknown to the attacker. To prove this result, we introduce a generic efficient watermark attack; the attacker is not required to know the private key of the scheme or even which scheme is used. Our attack is based on two assumptions: (1) The attacker has access to a "quality oracle" that can evaluate whether a candidate output is a high-quality response to a prompt, and (2) The attacker has access to a "perturbation oracle" which can modify an output with a nontrivial probability of maintaining quality, and which induces an efficiently mixing random walk on high-quality outputs. We argue that both assumptions can be satisfied in practice by an attacker with weaker computational capabilities than the watermarked model itself, to which the attacker has only black-box access. Furthermore, our assumptions will likely only be easier to satisfy over time as models grow in capabilities and modalities. We demonstrate the feasibility of our attack by instantiating it to attack three existing watermarking schemes for large language models: Kirchenbauer et al. (2023), Kuditipudi et al. (2023), and Zhao et al. (2023). The same attack successfully removes the watermarks planted by all three schemes, with only minor quality degradation."
in_NB  large_language_models_(so_called) 
5 weeks ago
[2311.03658] The Linear Representation Hypothesis and the Geometry of Large Language Models
"Informally, the 'linear representation hypothesis' is the idea that high-level concepts are represented linearly as directions in some representation space. In this paper, we address two closely related questions: What does "linear representation" actually mean? And, how do we make sense of geometric notions (e.g., cosine similarity or projection) in the representation space? To answer these, we use the language of counterfactuals to give two formalizations of "linear representation", one in the output (word) representation space, and one in the input (sentence) space. We then prove these connect to linear probing and model steering, respectively. To make sense of geometric notions, we use the formalization to identify a particular (non-Euclidean) inner product that respects language structure in a sense we make precise. Using this causal inner product, we show how to unify all notions of linear representation. In particular, this allows the construction of probes and steering vectors using counterfactual pairs. Experiments with LLaMA-2 demonstrate the existence of linear representations of concepts, the connection to interpretation and control, and the fundamental role of the choice of inner product."
in_NB  text_mining  large_language_models_(so_called)  veitch.victor 
5 weeks ago
[2205.05055] Data Distributional Properties Drive Emergent In-Context Learning in Transformers
"Large transformer-based models are able to perform in-context few-shot learning, without being explicitly trained for it. This observation raises the question: what aspects of the training regime lead to this emergent behavior? Here, we show that this behavior is driven by the distributions of the training data itself. In-context learning emerges when the training data exhibits particular distributional properties such as burstiness (items appear in clusters rather than being uniformly distributed over time) and having large numbers of rarely occurring classes. In-context learning also emerges more strongly when item meanings or interpretations are dynamic rather than fixed. These properties are exemplified by natural language, but are also inherent to naturalistic data in a wide range of other domains. They also depart significantly from the uniform, i.i.d. training distributions typically used for standard supervised learning. In our initial experiments, we found that in-context learning traded off against more conventional weight-based learning, and models were unable to achieve both simultaneously. However, our later experiments uncovered that the two modes of learning could co-exist in a single model when it was trained on data following a skewed Zipfian distribution -- another common property of naturalistic data, including language. In further experiments, we found that naturalistic data distributions were only able to elicit in-context learning in transformers, and not in recurrent models. In sum, our findings indicate how the transformer architecture works together with particular properties of the training data to drive the intriguing emergent in-context learning behaviour of large language models, and how future work might encourage both in-context and in-weights learning in domains beyond language."
in_NB  large_language_models_(so_called) 
5 weeks ago
[2309.01809] Are Emergent Abilities in Large Language Models just In-Context Learning?
"Large language models have exhibited emergent abilities, demonstrating exceptional performance across diverse tasks for which they were not explicitly trained, including those that require complex reasoning abilities. The emergence of such abilities carries profound implications for the future direction of research in NLP, especially as the deployment of such models becomes more prevalent. However, one key challenge is that the evaluation of these abilities is often confounded by competencies that arise in models through alternative prompting techniques, such as in-context learning and instruction following, which also emerge as the models are scaled up. In this study, we provide the first comprehensive examination of these emergent abilities while accounting for various potentially biasing factors that can influence the evaluation of models. We conduct rigorous tests on a set of 18 models, encompassing a parameter range from 60 million to 175 billion parameters, across a comprehensive set of 22 tasks. Through an extensive series of over 1,000 experiments, we provide compelling evidence that emergent abilities can primarily be ascribed to in-context learning. We find no evidence for the emergence of reasoning abilities, thus providing valuable insights into the underlying mechanisms driving the observed abilities and thus alleviating safety concerns regarding their use."
in_NB  large_language_models_(so_called) 
5 weeks ago
[2310.12397] GPT-4 Doesn't Know It's Wrong: An Analysis of Iterative Prompting for Reasoning Problems
"There has been considerable divergence of opinion on the reasoning abilities of Large Language Models (LLMs). While the initial optimism that reasoning might emerge automatically with scale has been tempered thanks to a slew of counterexamples, a wide spread belief in their iterative self-critique capabilities persists. In this paper, we set out to systematically investigate the effectiveness of iterative prompting of LLMs in the context of Graph Coloring, a canonical NP-complete reasoning problem that is related to propositional satisfiability as well as practical problems like scheduling and allocation. We present a principled empirical study of the performance of GPT4 in solving graph coloring instances or verifying the correctness of candidate colorings. In iterative modes, we experiment with the model critiquing its own answers and an external correct reasoner verifying proposed solutions. In both cases, we analyze whether the content of the criticisms actually affects bottom line performance. The study seems to indicate that (i) LLMs are bad at solving graph coloring instances (ii) they are no better at verifying a solution--and thus are not effective in iterative modes with LLMs critiquing LLM-generated solutions (iii) the correctness and content of the criticisms--whether by LLMs or external solvers--seems largely irrelevant to the performance of iterative prompting. We show that the observed increase in effectiveness is largely due to the correct solution being fortuitously present in the top-k completions of the prompt (and being recognized as such by an external verifier). Our results thus call into question claims about the self-critiquing capabilities of state of the art LLMs."
in_NB  large_language_models_(so_called)  artificial_intelligence  graph_theory 
5 weeks ago
[2310.08118] Can Large Language Models Really Improve by Self-critiquing Their Own Plans?
"There have been widespread claims about Large Language Models (LLMs) being able to successfully verify or self-critique their candidate solutions in reasoning problems in an iterative mode. Intrigued by those claims, in this paper we set out to investigate the verification/self-critiquing abilities of large language models in the context of planning. We evaluate a planning system that employs LLMs for both plan generation and verification. We assess the verifier LLM's performance against ground-truth verification, the impact of self-critiquing on plan generation, and the influence of varying feedback levels on system performance. Using GPT-4, a state-of-the-art LLM, for both generation and verification, our findings reveal that self-critiquing appears to diminish plan generation performance, especially when compared to systems with external, sound verifiers and the LLM verifiers in that system produce a notable number of false positives, compromising the system's reliability. Additionally, the nature of feedback, whether binary or detailed, showed minimal impact on plan generation. Collectively, our results cast doubt on the effectiveness of LLMs in a self-critiquing, iterative framework for planning tasks."
in_NB  large_language_models_(so_called)  artificial_intelligence 
5 weeks ago
Mapping Texts - Paperback - Dustin S. Stoltz; Marshall A. Taylor - Oxford University Press
"Learn how to conduct a robust text analysis project from start to finish--and then do it again.
"Mining is the dominant metaphor in computational text analysis. When mining texts, the implied assumption is that analysts can find kernels of truth--they just have to sift through the rubbish first. In this book, Dustin Stoltz and Marshall Taylor encourage text analysts to work with a different metaphor in mind: mapping. When mapping texts, the goal is not necessarily to find meaningful needles in the haystack, but instead to create reductions of the text to document patterns. Just like with cartographic maps, though, the type and nature of the textual map is dependent on a range of decisions on the part of the researcher. Creating reproducible workflows is therefore critical for the text analyst.
"Mapping Texts offers a practical introduction to computational text analysis with step-by-step guides on how to conduct actual text analysis workflows in the R statistical computing environment. The focus is on social science questions and applications, with data ranging from fake news and presidential campaigns to Star Trek and pop stars. The book walks the reader through all facets of a text analysis workflow--from understanding the theories of language embedded in text analysis, all the way to more advanced and cutting-edge techniques.
"The book will prove useful not only to social scientists, but anyone interested in conducting text analysis projects."
to:NB  books:noted  text_mining  books:suggest_to_library 
5 weeks ago
[2310.05921] Conformal Decision Theory: Safe Autonomous Decisions from Imperfect Predictions
"We introduce Conformal Decision Theory, a framework for producing safe autonomous decisions despite imperfect machine learning predictions. Examples of such decisions are ubiquitous, from robot planning algorithms that rely on pedestrian predictions, to calibrating autonomous manufacturing to exhibit high throughput and low error, to the choice of trusting a nominal policy versus switching to a safe backup policy at run-time. The decisions produced by our algorithms are safe in the sense that they come with provable statistical guarantees of having low risk without any assumptions on the world model whatsoever; the observations need not be I.I.D. and can even be adversarial. The theory extends results from conformal prediction to calibrate decisions directly, without requiring the construction of prediction sets. Experiments demonstrate the utility of our approach in robot motion planning around humans, automated stock trading, and robot manufacturing."
to:NB  to_read  conformal_prediction  decision_theory  jordan.michael_i.  via:rvenkat 
6 weeks ago
Less Discriminatory Algorithms by Emily Black, John Logan Koepke, Pauline Kim, Solon Barocas, Mingwei Hsu :: SSRN
"Entities that use algorithmic systems in traditional civil rights domains like housing, employment, and credit should have a duty to search for and implement less discriminatory algorithms (LDAs). Why? Work in computer science has established that, contrary to conventional wisdom, for a given prediction problem there are almost always multiple possible models with equivalent performance—a phenomenon termed model multiplicity. Critically for our purposes, different models of equivalent performance can produce different predictions for the same individual, and, in aggregate, exhibit different levels of impacts across demographic groups. As a result, when an algorithmic system displays a disparate impact, model multiplicity suggests that developers may be able to discover an alternative model that performs equally well, but has less discriminatory impact. Indeed, the promise of model multiplicity is that an equally accurate, but less discriminatory alternative algorithm almost always exists. But without dedicated exploration, it is unlikely developers will discover potential LDAs.
"Model multiplicity has profound ramifications for the legal response to discriminatory algorithms. Under disparate impact doctrine, it makes little sense to say that a given algorithmic system used by an employer, creditor, or housing provider is either “justified” or “necessary” if an equally accurate model that exhibits less disparate effect is available and possible to discover with reasonable effort. Indeed, the overarching purpose of our civil rights laws is to remove precisely these arbitrary barriers to full participation in the nation’s economic life, particularly for marginalized racial groups. As a result, the law should place a duty of a reasonable search for LDAs on entities that develop and deploy predictive models in covered civil rights domains. The law should recognize this duty in at least two specific ways. First, under disparate impact doctrine, a defendant’s burden of justifying a model with discriminatory effects should be recognized to include showing that it made a reasonable search for LDAs before implementing the model. Second, new regulatory frameworks for the governance of algorithms should include a requirement that entities search for and implement LDAs as part of the model building process."
in_NB  to_read  algorithmic_fairness  law  to_teach:statistics_of_inequality_and_discrimination  via:rvenkat 
6 weeks ago
[2310.17611] Uncovering Meanings of Embeddings via Partial Orthogonality
"Machine learning tools often rely on embedding text as vectors of real numbers. In this paper, we study how the semantic structure of language is encoded in the algebraic structure of such embeddings. Specifically, we look at a notion of ``semantic independence'' capturing the idea that, e.g., ``eggplant'' and ``tomato'' are independent given ``vegetable''. Although such examples are intuitive, it is difficult to formalize such a notion of semantic independence. The key observation here is that any sensible formalization should obey a set of so-called independence axioms, and thus any algebraic encoding of this structure should also obey these axioms. This leads us naturally to use partial orthogonality as the relevant algebraic structure. We develop theory and methods that allow us to demonstrate that partial orthogonality does indeed capture semantic independence. Complementary to this, we also introduce the concept of independence preserving embeddings where embeddings preserve the conditional independence structures of a distribution, and we prove the existence of such embeddings and approximations to them."
to:NB  text_mining  natural_language_processing  veitch.victor  aragam.bryon 
7 weeks ago
[2309.03969] Estimating the prevalence of peer effects and other spillovers
"In settings where interference between units is possible, we define the prevalence of indirect effects to be the number of units who are affected by the treatment of others. This quantity does not fully identify an indirect effect, but may be used to show whether such effects are widely prevalent. Given a randomized experiment with binary-valued outcomes, methods are presented for conservative point estimation and one-sided interval estimation. No assumptions beyond randomization of treatment are required, allowing for usage in settings where models or assumptions on interference might be questionable. To show asymptotic coverage of our intervals in settings not covered by existing results, we provide a central limit theorem that combines local dependence and sampling without replacement. Consistency and minimax properties of the point estimator are shown as well. The approach is demonstrated on an experiment in which students were treated for a highly transmissible parasitic infection, for which we find that a significant fraction of students were affected by the treatment of schools other than their own."
to:NB  causal_inference  experiments_on_networks  network_data_analysis  kith_and_kin  choi.david 
7 weeks ago
[2310.16626] Scalable Causal Structure Learning via Amortized Conditional Independence Testing
"Controlling false positives (Type I errors) through statistical hypothesis testing is a foundation of modern scientific data analysis. Existing causal structure discovery algorithms either do not provide Type I error control or cannot scale to the size of modern scientific datasets. We consider a variant of the causal discovery problem with two sets of nodes, where the only edges of interest form a bipartite causal subgraph between the sets. We develop Scalable Causal Structure Learning (SCSL), a method for causal structure discovery on bipartite subgraphs that provides Type I error control. SCSL recasts the discovery problem as a simultaneous hypothesis testing problem and uses discrete optimization over the set of possible confounders to obtain an upper bound on the test statistic for each edge. Semi-synthetic simulations demonstrate that SCSL scales to handle graphs with hundreds of nodes while maintaining error control and good power. We demonstrate the practical applicability of the method by applying it to a cancer dataset to reveal connections between somatic gene mutations and metastases to different tissues."
to:NB  hypothesis_testing  causal_inference  kith_and_kin  ramdas.aaditya 
8 weeks ago
Creativity in Large-Scale Contexts: Guiding Creative Engagem...
"Innovators and creators work in cultural, economic, and social contexts that shape their work. These contexts are large-scale, filled with overwhelming multitudes of elements and possibilities—but these contexts can be fruitfully "mined" by creative teams. Creativity in Large-Scale Contexts, by the Yale professor Jonathan S. Feinstein, introduces a groundbreaking new "network model" to describe how successful innovation can be focused, generated, and accelerated. The book will help teams and organizations innovate smarter and faster.
"Feinstein argues that in large-scale contexts creativity happens most efficiently when it is actively "guided" by a creative leader or team. Guiding creativity involves understanding, navigating, and actively using the cultural context, identifying puzzles and opportunities, and spanning these tensions to create novel connections. With thoughtful guidance, creators and creative teams can find their way through the thicket of possibilities faster, smarter, and with less waste."

--- Last tag applies because this sounds a bit vacuous, but the topic is interesting.
to:NB  downloaded  books:noted  innovation  re:democratic_cognition  color_me_skeptical 
8 weeks ago
A History of Fake Things on the Internet - Walter J. Scheire...
"As all aspects of our social and informational lives increasingly migrate online, the line between what is "real" and what is digitally fabricated grows ever thinner—and that fake content has undeniable real-world consequences. A History of Fake Things on the Internet takes the long view of how advances in technology brought us to the point where faked texts, images, and video content are nearly indistinguishable from what is authentic or true.
"Computer scientist Walter J. Scheirer takes a deep dive into the origins of fake news, conspiracy theories, reports of the paranormal, and other deviations from reality that have become part of mainstream culture, from image manipulation in the nineteenth-century darkroom to the literary stylings of large language models like ChatGPT. Scheirer investigates the origins of Internet fakes, from early hoaxes that traversed the globe via Bulletin Board Systems (BBSs), USENET, and a new messaging technology called email, to today's hyperrealistic, AI-generated Deepfakes. An expert in machine learning and recognition, Scheirer breaks down the technical advances that made new developments in digital deception possible, and shares behind-the-screens details of early Internet-era pranks that have become touchstones of hacker lore. His story introduces us to the visionaries and mischief-makers who first deployed digital fakery and continue to influence how digital manipulation works—and doesn't—today: computer hackers, digital artists, media forensics specialists, and AI researchers. Ultimately, Scheirer argues that problems associated with fake content are not intrinsic properties of the content itself, but rather stem from human behavior, demonstrating our capacity for both creativity and destruction."
to:NB  re:actually-dr-internet-is-the-name-of-the-monsters-creator  epidemiology_of_representations  networked_life  the_present_before_it_was_widely_distributed 
8 weeks ago
Functional Itô Calculus by Bruno Dupire :: SSRN
"Itô calculus deals with functions of the current state whilst we deal with functions of the current path to acknowledge the fact that often the impact of randomness is cumulative. We express the differential of the functional in terms of adequately defined partial derivatives to obtain an Itô formula. We develop an extension of the Feynman-Kac formula to the functional case and an explicit expression of the integrand in the Martingale Representation Theorem, providing an alternative to the Clark-Ocone formula from Malliavin Calculus. We establish that under certain conditions, even path dependent options prices satisfy a partial differential equation in a local sense."
in_NB  stochastic_differential_equations  re:almost_none 
8 weeks ago
[1712.03586] Fairness in Machine Learning: Lessons from Political Philosophy
"What does it mean for a machine learning model to be `fair', in terms which can be operationalised? Should fairness consist of ensuring everyone has an equal probability of obtaining some benefit, or should we aim instead to minimise the harms to the least advantaged? Can the relevant ideal be determined by reference to some alternative state of affairs in which a particular social pattern of discrimination does not exist? Various definitions proposed in recent literature make different assumptions about what terms like discrimination and fairness mean and how they can be defined in mathematical terms. Questions of discrimination, egalitarianism and justice are of significant interest to moral and political philosophers, who have expended significant efforts in formalising and defending these central concepts. It is therefore unsurprising that attempts to formalise `fairness' in machine learning contain echoes of these old philosophical debates. This paper draws on existing work in moral and political philosophy in order to elucidate emerging debates about fair machine learning."
in_NB  political_philosophy  algorithmic_fairness  to_read  via:wiggins  to_teach:data-mining  to_teach:statistics_of_inequality_and_discrimination 
9 weeks ago
The screening effect in Kriging
"When predicting the value of a stationary random field at a location x in some region in which one has a large number of observations, it may be difficult to compute the optimal predictor. One simple way to reduce the computational burden is to base the predictor only on those observations nearest to x. As long as the number of observations used in the predictor is sufficiently large, one might generally expect the best predictor based on these observations to be nearly optimal relative to the best predictor using all observations. Indeed, this phenomenon has been empirically observed in numerous circumstances and is known as the screening effect in the geostatistical literature. For linear predictors, when observations are on a regular grid, this work proves that there generally is a screening effect as the grid becomes increasingly dense. This result requires that, at high frequencies, the spectral density of the random field not decay faster than algebraically and not vary too quickly. Examples demonstrate that there may be no screening effect if these conditions on the spectral density are violated."
in_NB  have_skimmed  spatial_statistics  random_fields  statistics  to_teach:data_over_space_and_time 
11 weeks ago
Asymptotically Efficient Prediction of a Random Field with a Misspecified Covariance Function
--- Reveals to me that I have no intuition for when two Gaussian random fields are mutually absolutely continuous in the in-fill limit. (Two stationary and ergodic time-series are MAC in the time-going-to-infinity limit iff they are identical. [Otherwise, by ergodicity, each of them puts probability 1 on an event to which the other assigns probability 0.] But that doesn't apply here!)
in_NB  spatial_statistics  have_skimmed  to_teach:data_over_space_and_time  random_fields 
12 weeks ago
[2206.06421] Repro Samples Method for Finite- and Large-Sample Inferences
"This article presents a novel, general, and effective simulation-inspired approach, called {\it repro samples method}, to conduct statistical inference. The approach studies the performance of artificial samples, referred to as {\it repro samples}, obtained by mimicking the true observed sample to achieve uncertainty quantification and construct confidence sets for parameters of interest with guaranteed coverage rates. Both exact and asymptotic inferences are developed. An attractive feature of the general framework developed is that it does not rely on the large sample central limit theorem and is likelihood-free. As such, it is thus effective for complicated inference problems which we can not solve using the large sample central limit theorem. The proposed method is applicable to a wide range of problems, including many open questions where solutions were previously unavailable, for example, those involving discrete or non-numerical parameters. To reduce the large computational cost of such inference problems, we develop a unique matching scheme to obtain a data-driven candidate set. Moreover, we show the advantages of the proposed framework over the classical Neyman-Pearson framework. We demonstrate the effectiveness of the proposed approach on various models throughout the paper and provide a case study that addresses an open inference question on how to quantify the uncertainty for the unknown number of components in a normal mixture model. To evaluate the empirical performance of our repro samples method, we conduct simulations and study real data examples with comparisons to existing approaches. Although the development pertains to the settings where the large sample central limit theorem does not apply, it also has direct extensions to the cases where the central limit theorem does hold."

--- Based on the talk on Monday, I don't see how this _isn't_ just the Neyman inversion method, with a very clever idea about how to do the testing that I need to wrap my head around. But it seems very cool, and to be, potentially, very useful to me. So this needs careful attention.

--- ETA after reading carefully: It's Neyman inversion. Also, they're not actually getting valid confidence intervals for the number of mixture components, because there's no way to give an upper confidence limit for the number of mixture components. (For any distribution which really does have k components, there are others with arbitrarily many more clusters, arbitrarily close in distribution.) They _think_ they can do this because they arbitrarily limit how many clusters they consider.
Now, in the talk Xie gave a rather more convincing example of a confidence set for a discrete parameter, viz., which node on a network some process started spreading from. The difference, I think, is that in this 2nd case, we can't switch the value of the discrete parameter while making an _arbitrarily small_, and hence undetectably small, change to the distribution.
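--- For contrast, plain Neyman inversion by simulation does work for a discrete parameter when small changes in the parameter are detectable; a Python sketch for the unknown number of trials of a binomial with known success probability:
# Sketch: Neyman inversion by simulation for a discrete parameter --
# here the unknown number of trials n of a binomial, with known p.
import numpy as np

rng = np.random.default_rng(7)
p, true_n = 0.3, 40
x_obs = rng.binomial(true_n, p)              # the observed statistic

def accepted(n_candidate, alpha=0.05, reps=5000):
    sims = rng.binomial(n_candidate, p, size=reps)
    lo, hi = np.quantile(sims, [alpha / 2, 1 - alpha / 2])
    return lo <= x_obs <= hi                 # fail to reject => keep this n

confidence_set = [n for n in range(1, 101) if accepted(n)]
print("observed x:", x_obs)
print("95% confidence set for n:", confidence_set[0], "...", confidence_set[-1])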
have_read  heard_the_talk  confidence_sets  simulation-based_inference  statistics  re:codename:catherine_wheel  in_NB 
12 weeks ago
Stochastic Models of Compartmental Systems on JSTOR
"This paper reviews a stochastic approach to compartmental modeling. The need for stochasticity in the model is motivated by two examples, one concerned with vanadium depuration in marine organisms and the other with the thermal resistance of the green sunfish, Lepomis cyanellus. A stochastic formulation is suggested which includes several sources of stochasticity due to some variability among particles in a given experiment and also to some variability between replicates of an experiment. Many different stochastic models of compartmental systems may be obtained from various combinations of these sources of stochasticity, and the mean and autocovariance functions of these models are derived for some one-compartment systems. The mean value functions are shown to be generally tractable forms which may be readily fitted to data; however, even in the case of time-invariant coefficients, the mean functions are often different from the sums of exponential functions given by the deterministic solution. The causal model cannot be identified on the basis of the mean value function alone; however the covariance structure is unique for each of the proposed causal models. The paper also illustrates some basic statistical analysis with these stochastic models. The estimation of parameters by weighted nonlinear least squares is illustrated by fitting some mean functions of these models to data from the previous examples. The procedure may generate RBAN estimators for both of the examples and an asymptotic goodness-of-fit statistic is formulated. The paper concludes with a short summary of promising areas for future research."
in_NB  compartment_models  time_series  statistical_inference_for_stochastic_processes 
12 weeks ago
Decentralized dynamic processes for finding equilibrium - ScienceDirect
"This paper describes a class of decentralized dynamic processes designed to converge to equilibrium when the equilibrium equations are linear. These processes can also be viewed as distributed algorithms for solving systems of linear equations, or as learning algorithms. The class includes processes that use a message space larger by one binary digit than the space in which the equilibrium exists. However, memory and time requirements increase exponentially with the number of agents (equations)."
in_NB  distributed_systems  economics  computational_economics  re:in_soviet_union_optimization_problem_solves_you  simon.carl_p.  mechanism_design 
12 weeks ago
A lower bound on computational complexity given by revelation mechanisms | SpringerLink
"This paper establishes a lower bound on the computational complexity of smooth functions between smooth manifolds. It generalizes one for finite (Boolean) functions obtained (by Arbib and Spira [2]) by counting variables. Instead of a counting procedure, which cannot be used in the infinite case, the dimension of the message space of a certain type of revelation mechanism provides the bound. It also provides an intrinsic measure of the number of variables on which the function depends. This measure also gives a lower bound on computational costs associated with realizing or implementing the function by a decentralized mechanism, or by a game form."
in_NB  computational_complexity  mechanism_design  distributed_systems  computational_economics  re:in_soviet_union_optimization_problem_solves_you 
12 weeks ago
The WALRAS Algorithm: A Convergent Distributed Implementation of General Equilibrium Outcomes | SpringerLink
"The WALRAS algorithm calculates competitive equilibria via a distributed tatonnement-like process, in which agents submit single-good demand functions to market-clearing auctions. The algorithm is asynchronous and decentralized with respect to both agents and markets, making it suitable for distributed implementation. We present a formal description of this algorithm, and prove that it converges under the standard assumption of gross substitutability. We relate our results to the literature on general equilibrium stability and some more recent work on decentralized algorithms. We present some experimental results as well, particularly for cases where the assumptions required to guarantee convergence do not hold. Finally, we consider some extensions and generalizations to the WALRAS algorithm."
in_NB  economics  distributed_systems  computational_complexity  computational_economics  re:in_soviet_union_optimization_problem_solves_you  wellman.michael 
12 weeks ago