So we're talking about how to make good decisions, or the idea of 'bounded rationality', or what sufficiently advanced Artificial Intelligences might be like; and somebody starts dragging up the concepts of 'expected utility' or 'utility functions'.

And before we even ask what those are, we might first ask, Why?

lc
The "people are altruistic" bias is so pernicious and widespread I've never actually seen it articulated in detail or argued for. Most seem to both greatly underestimate the size of this bias, and assume opinions either way are a form of mind-projection fallacy on the part of nice/evil people. In fact, it looks to me like this skew is the deeper origin of a lot of other biases, including the just-world fallacy, and the cause of a lot of default contentment with a lot of our institutions of science, government, etc. You could call it a meta-bias that causes the Hansonian stuff to go largely unnoticed. I would be willing to pay someone to help draft a LessWrong post for me about this; I think it's important but my writing skills are lacking.
"to clean house" as implication of violence... Due to a tragic shortage of outbuildings (to be remedied in the mid term but not immediately), my living room is the garage/makerspace of my home. I cleaned as one cleans for guests last week, because a friend from way back was dropping by. I then got to enjoy a clean-enough-for-guests home for several days, which is a big part of why it is nice to be visited by friends un-intimate enough to feel like cleaning for. Then my partner-in-crafts came over, and we re-occupied every table with a combination of resin casting and miniature clay sculpting shenanigans. It's an excellent time. We also went shopping for fabric together because I plan to make a baby quilt kid-in-progress of the aforementioned friend from way back. Partner-in-crafts idly asked me when I was planning to do the quilt stuff, because historically I would be expected to launch into it immediately as soon as the fabric came out of the dryer. However, I found something new in myself: A reluctance to start a new project without a clean place to start it in. I'm not sure where this reluctance came from, as I think it seems new, but I also think I like it. So I got to tidying up the stuff that was un-tidyable last night because the resin was still sticky, but is eminently tidyable now because it cured over time, and carefully examining my reluctance-to-tidy as it tried to yell at me. In that reluctance-to-tidy, I find time travel again: We store information in the position of objects in our environment. Object location encodes memory, so moving someone else's objects has certain commonalities with the rewriting-of-memory that we call gaslighting when pathological. For better or worse, my architecture of cognition defaults to relying on empathy twice over when reasoning about moving stuff that someone else was using, or someone else's stuff. By recognizing an object's location as a person's memory of where-they-left-it, I view moving it as rewriting that memory. The double-empathy thing comes in where I reason about what moves of stuff it's ok to make. If I put the thing where the person will have an easy time finding it, if I model them well enough to guess correctly where they'll first look when they want it, then I can help them by moving it. I can move it from somewhere they'd look later to somewhere they'd look sooner, and thereby improve their life at the moment of seeking it, and that's a clearly good act. That's the first empathy layer. The second empathy layer comes of a natural tendency to anthropomorphize objects, which I've considered trying to eradicate from myself but settled on keeping because I find it quite convenient to have around in other circumstances. This is the animism of where something "wants" to go, creating a "home" for your keys by the door, and so forth. So there's 2 layers of modeling minds -- one of complex real minds who are likely to contain surprises in their expectations, and one of simple virtual "minds" that follow from the real-minds as a convenient shortcut. I guess one way to put it is that I figure stuff has/channels feelings kinda like how houseplants do -- they probably don't experience firsthand emotion in any way that would be recognizable to people, but there's a lot of secondhand emotion that's shown in how they're related to and cared for. Not sure where I'm going with all that, other than noticing how the urge to tidy up can be resisted by the same aesthetic sensibility that says it's generally bad to erase anybody's memories.
Btw less.online is happening. LW post and frontpage banner probably going up Sunday or early next week. 
I just finished listening to The Hacker and the State by Ben Buchanan, a book about cyberattacks and the surrounding geopolitics. It's a great book to start learning about the big state-related cyberattacks of the last two decades.

Some big attacks/leaks he describes in detail:
* Wire-tapping/passive listening efforts from the NSA, the "Five Eyes", and other countries
* The multi-layer backdoors the NSA implanted and used to get around encryption, and that other attackers eventually also used (the insecure "secure random number" trick + some stuff on top of that)
* The Shadow Brokers (that's a *huge* leak that went completely under my radar at the time)
* Russia's attacks on Ukraine's infrastructure
* Attacks on the private sector for political reasons
* Stuxnet
* The North Korea attack on Sony when they released a documentary criticizing their leader, and misc North Korean cybercrime (e.g. WannaCry, some bank robberies, ...)
* The leak of Hillary's emails and Russian interference in US politics
* (and more)

Main takeaways (I'm not sure how much I buy these, I just read one book):
* Don't mess with states too much, and don't think anything is secret - even if you're the NSA.
* The US has a "nobody but us" strategy, which says it's fine for the US to use vulnerabilities as long as it is the only one powerful enough to find and use them. This looks somewhat nuts and naive in hindsight. There don't seem to be strong incentives to protect the private sector.
* There are a ton of different attack vectors and vulnerabilities, more big attacks than I thought, and a lot more is publicly known than I would have expected. The author goes into great detail about ~10 big secret operations, often speaking as if he were an omniscient narrator.
* Even the biggest attacks didn't inflict that much (direct) damage (never >$10B in damage?). Unclear if it's because states are holding back, because they suck, or because it's hard. Even when attacks aim to do what some people fear the most (e.g. attack infrastructure), the effect is super underwhelming.
* The bottleneck in cyberattacks is remarkably often the will/the execution, much more than actually finding vulnerabilities/entry points into the victim's network.
* The author describes a space where most of the attacks are led by clowns without clear plans, and he often seems genuinely confused why they didn't act with more agency to get what they wanted (this does not apply to the NSA, but does apply to a bunch of Russia/Iran/North Korea-related attacks).
* Cyberattacks are not amazing tools for inflicting damage or threatening enemies if you are a state. The damage is limited, and it really sucks that (usually) once you show your capability, it reduces your capability (unlike conventional weapons). And states don't like to respond to such small threats. The main effects you can have are scaring off private actors from investing in a country / building ties with a country and its companies, and leaking secrets of political importance.
* Don't leak secrets while the US presidential election is happening if they are unrelated to the election, or nobody will care.

(The author seems to be a big skeptic of "big cyberattacks"/cyberwar, and describes cyber as something that always happens in the background and slowly shapes big decisions. He doesn't go into the estimated trillion dollars in damages from everyday cybercrime, nor the potential tail risks of cyber.)
Here is the best toy model I currently have for rational agents. Alas, it is super messy and hacky, but better than nothing. I'll call it the BAVM model; the one-sentence summary is "internal traders concurrently bet on beliefs, auction actions, vote on values, and merge minds". There's little novel here, I'm just throwing together a bunch of ideas from other people (especially Scott Garrabrant and Abram Demski).

In more detail, the three main components are:
1. A prediction market
2. An action auction
3. A value election

You also have some set of traders, who can simultaneously trade on any combination of these three. Traders earn money in two ways:
1. Making accurate predictions about future sensory experiences on the market.
2. Taking actions which lead to reward or increase the agent's expected future value.

They spend money in three ways:
1. Bidding to control the agent's actions for the next N timesteps.
2. Voting on what actions get reward and what states are assigned value.
3. Running the computations required to figure out all these trades.

Values are therefore dominated by whichever traders earn money from predictions or actions, who will disproportionately vote for values formulated in the same ontologies they use for prediction/action, since that's simpler than using different ontologies. (Note that this does require the assumption that simpler traders start off with more money.)

The last component is that it costs traders money to do computation. The way they can reduce this cost is by finding other traders who do similar computations, and then merging into a single trader. I am very interested in better understanding what a merging process like this might look like, though it seems pretty intractable in general because it will depend a lot on the internal details of the traders. (So perhaps a more principled approach here is to instead work top-down, figuring out what sub-markets or sub-auctions look like?)
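To make the moving parts concrete, here is a rough Python sketch of a single BAVM timestep under a lot of invented assumptions - the scoring rule, auction format, wealth-weighted vote, and every name below are mine, not part of the model as stated, and the merging component is left out entirely:

```python
# A minimal, illustrative sketch of one BAVM timestep: prediction market,
# action auction, value election, plus a flat compute cost.

from dataclasses import dataclass
from typing import Callable, List, Tuple

@dataclass
class Trader:
    name: str
    wealth: float
    predict: Callable[[List[int]], float]                  # P(next observation == 1)
    bid_action: Callable[[List[int]], Tuple[float, str]]   # (bid, proposed action)
    value_vote: Callable[[List[int]], float]               # value assigned to current state

def bavm_step(traders: List[Trader], history: List[int], next_obs: int,
              compute_cost: float = 0.05) -> Tuple[str, float]:
    # 1. Prediction market: pay traders according to the probability they
    #    assigned to what actually happened, relative to a coin flip.
    for t in traders:
        p = t.predict(history)
        p_actual = p if next_obs == 1 else 1.0 - p
        t.wealth += p_actual - 0.5

    # 2. Action auction: the highest bidder pays its bid and controls the action.
    bids = [(t.bid_action(history), t) for t in traders]
    (winning_bid, action), winner = max(bids, key=lambda entry: entry[0][0])
    winner.wealth -= winning_bid

    # 3. Value election: wealth-weighted vote on the value of the current state.
    total = sum(max(t.wealth, 0.0) for t in traders) or 1.0
    state_value = sum(max(t.wealth, 0.0) * t.value_vote(history) for t in traders) / total

    # 4. Computation is not free: everyone pays a flat compute cost.
    #    (Merging of similar traders is not modeled here.)
    for t in traders:
        t.wealth -= compute_cost

    history.append(next_obs)
    return action, state_value

# Example with two hard-coded traders.
optimist = Trader("optimist", 1.0, lambda h: 0.9, lambda h: (0.2, "explore"), lambda h: 1.0)
pessimist = Trader("pessimist", 1.0, lambda h: 0.1, lambda h: (0.1, "retreat"), lambda h: -1.0)
print(bavm_step([optimist, pessimist], history=[], next_obs=1))
```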

Recent Discussion

Hi guys, I’d like to share a proposal regarding AI alignment. The proposal is that training AI in the curriculum of Classical Virtue Ethics could be a promising approach to alignment: (A) because general virtues with many exemplifications can help us teach the AI what we would really want it to do, even when we can't micromanage it, and (B) because this pedagogy seems to be a good fit for AI's style of learning more generally.

 

Background i) The Pedagogy of Classical Art

In the Classical Humanist tradition, the pedagogy of learning depends on whether one studies a subject or practices an art. An art is understood as a craft or skill - for example, the art of speaking well and persuading (rhetoric), or the art of living (virtue ethics). Training in...

Thanks for your suggestions. I think having more people deeply engaged with alignment is good for our chances of getting it right.

I think this proposal falls into the category of goal crafting (a term proposed by Roko) - deciding what we want an AGI to do. Most alignment work addresses technical alignment - how we might get an AGI to reliably do anything. I think you imply the approach "just train it"; this might work for some types of AGI, and some types of training.

I think many humans trained in classical ethics are not actually ethical by their standard... (read more)

Suppose our old friends Alice and Bob decide to undertake an art project. Alice will draw a bunch of random purple and green lines on a piece of paper. That will be Alice’s picture (A). She’ll then make a copy, erase all the purple lines, and send the result as a message (M) to Bob. Bob then generates his own random purple lines, and adds them to the green lines from Alice, to create Bob’s picture (B). The two then frame their two pictures and hang them side-by-side to symbolize something something similarities and differences between humans something. Y’know, artsy bullshit.

Now, suppose Carol knows the plan and is watching all this unfold. She wants to make predictions about Bob’s picture, and doesn’t want to remember irrelevant details...
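Purely to make the setup concrete, here is a toy simulation of the generative process described above; the encoding of "pictures" as sets of (color, id) pairs is my own illustration, not anything specified in the post:

```python
# Alice draws green + purple lines, erases the purple, and sends the green
# lines (M) to Bob; Bob adds his own purple lines to make his picture (B).

import random

rng = random.Random(0)

def random_lines(color, n):
    # Float ids so distinct draws essentially never collide.
    return {(color, rng.random()) for _ in range(n)}

green = random_lines("green", 5)          # the lines Alice passes along
alice_purple = random_lines("purple", 5)  # Alice's private randomness
bob_purple = random_lines("purple", 5)    # Bob's private randomness

A = green | alice_purple                          # Alice's picture
M = {line for line in A if line[0] == "green"}    # message: purple erased
B = M | bob_purple                                # Bob's picture

# Carol's situation: to predict B, the only part of A that matters is M.
# Conditioned on the green lines, the two pictures share nothing else.
print("shared content of the two pictures is exactly M:", A & B == M)
```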

Aprillion (Peter Hozák)
(Writing before I read the rest of the article): I believe Carol would "naturally" expect that Alice and Bob share more mutual information than she does with Bob herself (even if they weren't "old friends", they both "decided to undertake an art project" while she "wanted to make predictions"), thus she would weigh the cost of remembering more than just the green lines against the expected prediction improvement, given her time constraints, lost opportunities, ... I imagine she could complete purple lines on her own, and then remember some "diff" of the most surprising differences.

Also, not all of the green lines would be equally important, so a "natural latent" would be some short messages in "tokens of remembering", not necessarily corresponding to the mathematical abstraction encoded by the 2 tokens of English "green lines" => Carol doesn't need to be able to draw the green lines from her memory if that memory was optimized to predict purple lines. If the purpose was to draw the green lines, I would be happy to call that memory "green lines" (and in that, I would assume a shared prior between me and the reader that I would describe as: "to remember green lines" usually means "to remember steps for how to draw similar lines on another paper" ... also, similarity could be judged by other humans ... also, not to be confused with a very different concept, "to remember an array of pixel coordinates", which can also be compressed into the words "green lines" - but I don't expect people will be confused about the context, so I don't have to say it now, just keep it in mind if someone squints their eyes just so, which would provoke me to clarify).
Mateusz Bagiński
In the Alice and Bob example, suppose there is a part (call it X) of the image that was initially painted green but where both Alice and Bob painted purple. Would that mean that the part of the natural latent over images A and B corresponding to X should be purple?

Only if they both predictably painted that part purple, e.g. as part of the overall plan. If they both randomly happened to paint the same part purple, then no.

I've been in many conversations where I've mentioned the idea of using neuroscience for outer alignment, and the people who I'm talking to usually seem pretty confused about why I would want to do that. Well, I'm confused about why one wouldn't want to do that, and in this post I explain why.

As I see it, there are three main strategies people have for trying to deal with AI alignment in worlds where AI alignment is hard.

  1. Value alignment
  2. Corrigibility
  3. Control/scalable alignment

In my opinion, these are all great efforts, but I personally like the idea of working on value alignment directly. Why? First some negatives of the others:

  1. Corrigibility requires moderate to extreme levels of philosophical deconfusion, an effort worth doing for some, but a very small set not
...
Joseph Bloom
  I'm not sure anyone I know in mech interp is claiming this is a non-trivial problem. 
Joseph Bloom
  I'm confused by this statement. Do we know this? Do we have enough of an understanding of either to say this? Don't get me wrong, there's some level on which I totally buy this. However, I'm just highly uncertain about what is really being claimed here. 

Does this comment I wrote clear up my claim?

A clarification about the sense in which I claim "biological and artificial neural networks are based upon the same fundamental principles":

I would not be surprised if the reasons why neural networks "work" are also exploited by the brain.

In particular, the reason I think neuroscience for value alignment is good is that we can expect the values part of the brain to be compatible with these reasons, and it won't require too many extra fundamental advances to actually implement - unlike, say, corrigibility, which will first

... (read more)
Malentropic Gizmo
I will read whichever fiction book is recommended to me first (provided I haven't already read it)! Time is of the essence! I will read anything, but if you want to recommend me something I am more likely to enjoy, here are a few things about me: I like Sci-fi, Fantasy, metaethics, computers, games, Computer Science theory, Artificial Intelligence, fitness, D&D, and edgy/shock humor.
benjamincosman
Too Like the Lightning by Ada Palmer :)

I second this recommendation. This book was amazing. It's quite unlike other scifi, and that's a good thing.

jmh
Not really memoirs, but a German documentary about WWII might be of interest to you: Der unbekannte Soldat. I watched it on Amazon Prime and you can still find the title there in a search; I'm not sure if it is only available for rent/sale now or if you can stream it with a Prime membership.
trevor
I'm not sure to what extent this is helpful, or if it's an example of the dynamic you're refuting, but Duncan Sabien recently wrote a post that intersects with this topic.

Where it connects is that if someone sees [making the world a better place] as simply selecting a better Nash Equilibrium, they absolutely will spend time exploring solutionspace/thinking through strategies similar to Goal Factoring or Babble and Prune. Lots of people throughout history have yearned for a better world in a lot of different ways, with varying awareness of the math behind Nash Equilibria, or the transhumanist and rationalist perspectives on civilization (e.g. map & territory & biases & scope insensitivity for rationalism, cryonics/anti-aging for transhumanism). But their goal here is largely steering culture away from nihilism (since culture is a Nash Equilibrium), which means steering many people away from themselves, or at least the selves that they would have been.

Maybe that's pretty minor in this case, e.g. because feeling moderate amounts of empathy and living in a better society are both fun, but either way, changing a society requires changing people, and thinking really creatively about ways to change people tears down lots of Chesterton-Schelling fences, and it's very easy to make really big damaging mistakes in the process (because you need to successfully predict and avoid all mistakes as part of the competent pruning process, and actually measurably consistently succeeding at this is thinkoomph, not just creative intelligence).

Add in conflict theory to the mistake theory I've described here, factor in unevenly distributed intelligence and wealth in addition to unevenly distributed traits like empathy, ambition, and suspicion-towards-outgroup (e.g. different combinations of all 5 variables), and you can imagine how conflict and resentment would accumulate on both sides over the course of generations. There are tons of examples in addition to Ayn Rand and Wokeness.
Adam Zerner
Yeah, I echo this. I've gone back and forth with myself about this sort of stuff. Are humans altruistic? Good? Evil? On the one hand, yes, I think lc is right about how in some situations people exhibit just an extraordinary lack of altruism and sympathy. But on the other hand, there are other situations where people do the opposite: they'll, I dunno, jump into a lake at risk to their own life to save a drowning stranger. Or risk their lives running into a burning building to save strangers (lots of volunteers did this during 9/11). I think the explanation is what Dagon is saying about how mutable and context-dependent people are. In some situations people will act extremely altruistically. In others they'll act extremely selfishly.

The way that I like to think about this is in terms of "moral weight". How many utilons to John Doe would it take for you to give up one utilon of your own? Like, would you trade 1 utilon of your own so that John Doe can get 100,000 utilons? 1,000? 100? 10? Answering these questions, you can come up with "moral weights" to assign to different types of people. But I think that people don't really assign a moral weight and then act consistently. In some situations they'll act as if their answer to my previous question is 100,000, and in other situations they'll act like it's 0.00001.

My model of utility (and the standard one, as far as I can tell) doesn't work that way.  No rational agent ever gives up a utilon - that is the thing they are maximizing.  I think of it as "how many utilons do you get from thinking about John Doe's increased satisfaction (not utilons, as you have no access to his, though you could say "inferred utilons") compared to the direct utilons you would otherwise get".

Those moral weights are "just" terms in your utility function.

And, since humans aren't actually rational, and don't have consistent utility functions, actions that imply moral weights are highly variable and contextual.
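To make "terms in your utility function" concrete, here is a minimal sketch in my own notation (nothing in the thread specifies this form):

\[
U_{\text{you}}(x) \;=\; u_{\text{self}}(x) \;+\; \sum_i w_i \,\hat{u}_i(x),
\]

where \(\hat{u}_i(x)\) is your inference about person \(i\)'s satisfaction in outcome \(x\) and \(w_i\) is their "moral weight". On this reading, the inconsistency described above is just the effective \(w_i\) swinging with context rather than being a stable constant.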

Garrett Baker
You are right, but I guess the thing I do actually care about here is the magnitude of the advancement (which is relevant for determining the sign of the action). How large an effect do you think the model merging stuff has (I'm thinking of the effect where, if you train a bunch of models and then average their weights, they do better)? It seems very likely to me that it's essentially zero, but I do admit there's a small negative tail that's greater than the positive, so the average is likely negative. As for agent interactions, all the (useful) advances there seem to be things that definitely would have been made even if nobody released any LLMs and everything was APIs.
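For concreteness, the weight-averaging idea referenced here is roughly the following (a generic sketch, sometimes described as a "model soup"; the function and variable names are mine, not anything from the comment):

```python
# Generic sketch of averaging the weights of several separately trained models.
# Framework-agnostic: each "model" is a dict mapping parameter names to arrays.

import numpy as np

def average_weights(state_dicts):
    """Elementwise mean of corresponding parameters across models."""
    keys = state_dicts[0].keys()
    return {k: np.mean([sd[k] for sd in state_dicts], axis=0) for k in keys}

# Example with three toy "models" that each consist of a single weight matrix.
models = [{"w": np.random.randn(4, 4)} for _ in range(3)]
merged = average_weights(models)
assert merged["w"].shape == (4, 4)
```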

It's true, but I don't think there's anything fundamental preventing the same sort of proliferation and advances in open-source LLMs that we've seen in Stable Diffusion (aside from the fact that LLMs aren't as useful for porn). That it has been relatively tame so far doesn't change the basic pattern of how open source affects the growth of technology.

Shankar Sivarajan
I'll believe it when I see it. The man who said it would be an open release has just been fired - sorry, "stepped down as CEO".
Matt Goldenberg
yeah, it's much less likely now

I attended Secular Solstice in Berkeley last December. 

My perspective is quite unusual: I live in a rationalist group house and work at an AI safety office, but I also am a Christian and attend church every week.[1]

I was originally not planning on going to Solstice, but a decent number of people (~5) told me in person that they would be particularly interested in my opinions of it. I realized that I was interested in learning what I would think of it too, so I went.

I took notes on my thoughts throughout the service.[2] This blog post covers my broader thoughts on the experience. I also have separate blog posts for a fun little correction to one of the songs and for my detailed notes & commentary.

Overarching Narrative

I do not agree...

Cole Wyeth
I appreciate the perspective. Personally I don't really see the point of a secular solstice. But frankly, the hostility to religion is a feature of the rationalist community, not a bug. Rejection of faith is a defining feature of the community and an unofficial litmus test for full membership. The community has a carefully cultivated culture that makes it a kind of sanctuary from the rest of the world where rationalists can exchange ideas without reestablishing foundational concepts and repeating familiar arguments (along with many other advantages).

The examples you point to do not demonstrate hostility towards religious people; they demonstrate hostility towards religion. This is as appropriate here as hostility towards factory farming is at a vegan group. Organizations (corporate, social, biological) are all defined by their boundaries. Christianity seems to be unusually open to everyone, but I think this is partially a side effect of evangelism. It makes sense to open your boundaries to the other when you are trying to eat it. Judaism, in contrast, carefully enforces the boundaries of its spaces.

LessWrong hates religion in the way that lipids hate water. We want it on the outside. I don't know about other rationalists, but I don't have a particular desire to seek it out and destroy it everywhere it exists (and I certainly wish no harm to religious people). I agree with you that too much hostility is harmful; but I don't agree that good organizations must always welcome the other.
the gears to ascension
Weird take I frequently get funny looks for, no matter where I say it, rationalist community or other places: I currently think it is accurate to say that the sun is a ball of nearly pure suffering, devoid of the conscious experience that normally might make suffering worth it. Because I hold this belief, I also hold the belief that we therefore have an obligation to starlift it. I don't claim we need to then turn it into computronium, and I'd still like warmth and lights for our planets. But starlifting the sun would likely break up the solar system, so we'd need to recoordinate the planets to do it. It would be an immense undertaking of scales not often spoken of even in science fiction. But I think we have a moral obligation to give negentropic matter the chance to become happy people as its path towards entropy.

For further understanding of how I think about this - perhaps in over-dense jargon, sorry to be over-brief here - I am very close to being a pure positive utilitarian, and my current understanding of nociception and avoidance behaviors implies that suffering in the brain may just be when brain-managed matter moves away from its path to entropy being made of patterns of intended-selfhood, e.g. because it is damaged, and the agency of returning to an intended self-form costs negentropy. Therefore, all energy spent that is not a being having its intended form is waste, and that waste is suffering because there are life forms who wish it to be otherwise. My priority right now is preserving life on earth, but once we've got that more stable I think ensuring there's not astronomical waste is a moral imperative, because wasted negentropy is unconscious suffering.
the gears to ascension
Quantum physics only adds up to normality until you learn enough about reality to find out that it really, really, really, really doesn't, and then you get to build quantum computers.

I reject the claim that faith implies the world cannot change; I would describe the agnostic-compatible interreligious part of faith as a Löbian bet - also known as wishcasting - that others will behave in ways that enact good. This does not mean the world cannot change. I agree that there is something real that could be mathematized underlying what "faith" is, and noticing that "trust" is a near-exact synonym is part of why I agree with this. I think that it would mostly add up to normality to describe it formally, and it would in fact reveal that most religious people are wrong to have faith in many of the things they do. I recognize in myself the urge to make disses about this, and claim that if it reveals religious people are not wrong, I would in fact react to that.

I went from atheist to strong agnostic. There are multiple ways I can slice the universe conceptually where I can honestly identify phenomena as alive or as people; similarly, there are multiple ways I can slice the universe where I can honestly identify phenomena as gods. Whether those gods are good is an empirical question, just as it is an empirical question for me whether another person will be kind to me.

I reject the claim that faith implies the world cannot change

Me too, which is why I didn't write this.

Alice takes a chemistry class at a university. She gets a professor who basically just reads from the textbook during lectures. She reads the textbook on her own, talks to her classmates, and finds some relevant Wikipedia articles and youtube videos.

Bob studies chemistry on his own. He buys the textbook Alice uses because it's popular, reads it on his own, talks to other people interested in chemistry, and finds some relevant Wikipedia articles and youtube videos.

Bob is an autodidact. Alice is not.

OK, I understand that, but what's the key difference? What is the essence of autodidact-ness? Is it...

  • The mere involvement of a "legitimate" institution, even if it makes no real difference to the individual's learning experience?
  • Some essential difference in the experience that Alice and Bob have while learning?
  • Something different about the personal character of Alice and Bob?

I don't think there's a clear consensus, and I don't think it describes a clear distinction, and that's why I don't normally use the word "autodidact".

interstice
This seems significantly overstated. Most subjects are not taught in school to most people, but they don't thereby degrade into nonsense.
Garrett Baker
This seems false. Often those who are rich get rich off of profitable subjects, and end up spreading awareness of those subjects. Many were never taught programming in school, yet learned to program anyway. Schools could completely neglect that subject, and still it would spread.
bhauth
The thing is, I've noticed that a lot of the curriculum in schools is blind imitation of high-status science-y people, which can end up as cargo culting. Textbooks and classes have students memorize terms for certain things because "smart people know those words, and we should make kids be more like the smart people" - but they use those words because they understand underlying concepts, and without that the words can be useless. This reminds me of Feynman talking about textbooks.

Now that we have the internet, you can simply see what kinds of technical language experts use and learn about those terms yourself. The distinctions "autodidact" is supposed to imply might have been weakened by the internet, and for that matter by libraries. Of course, you'd also have to pick the experts you're going to imitate on your own, but then, you also have to pick professors or schools or textbooks. Following a societal consensus about competence should work about equally well either way.

I remember talking with a director of a Max Planck institute. Ion transport was relevant to the conversation, and I said something about how "of course, while lithium ions are light, that doesn't mean they diffuse quickly because they're strongly bound to their solute complex, more than sodium ions". He said, "Aha, I see you're well-educated". I thought that was funny because that wasn't something I ever took any classes on or got mentoring about. The other funny thing about that conversation was that his pet project was this energy storage idea that had no chance of working because he didn't understand ion solvation well enough.

I think we probably agree on how far the existing system is from the ideal. I wanted to point at the opposite end of the scale as a reminder that we are even further away from that.

When I was in the first grade of elementary school, they tried to teach us about "sets", which mostly meant that instead of "two plus two equals four" the textbook said "the union of a set containing two elements and another set containing two elements has four elements". In hindsight I see this was a cargo-cultish version of set theory, which probably was very high-status at t... (read more)

"to clean house" as implication of violence...

Due to a tragic shortage of outbuildings (to be remedied in the mid term but not immediately), my living room is the garage/makerspace of my home. I cleaned as one cleans for guests last week, because a friend from way back was dropping by. I then got to enjoy a clean-enough-for-guests home for several days, which is a big part of why it is nice to be visited by friends un-intimate enough to feel like cleaning for.

Then my partner-in-crafts came over, and we re-occupied every table with a combination of resin ca... (read more)