So we're talking about how to make good decisions, or the idea of 'bounded rationality', or what sufficiently advanced Artificial Intelligences might be like; and somebody starts dragging up the concepts of 'expected utility' or 'utility functions'.
And before we even ask what those are, we might first ask, Why?
Hi guys, I'd like to share a proposal regarding AI alignment: training an AI on the curriculum of Classical Virtue Ethics could be a promising approach. (A) Because general virtues with many exemplifications can help us teach the AI what we would really want it to do, even when we can't micromanage it, and (B) because this pedagogy seems to be a good fit for AI's style of learning more generally.
Background i) The Pedagogy of Classical Art
In the Classical Humanist Tradition, pedagogy depends on whether one studies a subject or practices an art. An art is understood as a craft or skill: for example, the art of speaking well and persuading (rhetoric), or the art of living (virtue ethics). Training in...
Thanks for your suggestions. I think having more people deeply engaged with alignment is good for our chances of getting it right.
I think this proposal falls into the category of goal crafting (a term proposed by Roko) - deciding what we want an AGI to do. Most alignment work addresses technical alignment - how we might get an AGI to reliably do anything. I think you imply the approach "just train it"; this might work for some types of AGI, and some types of training.
I think many humans trained in classical ethics are not actually ethical by their own standards...
Suppose our old friends Alice and Bob decide to undertake an art project. Alice will draw a bunch of random purple and green lines on a piece of paper. That will be Alice’s picture (A). She’ll then make a copy, erase all the purple lines, and send the result as a message (M) to Bob. Bob then generates his own random purple lines, and adds them to the green lines from Alice, to create Bob’s picture (B). The two then frame their two pictures and hang them side-by-side to symbolize something something similarities and differences between humans something. Y’know, artsy bullshit.
Now, suppose Carol knows the plan and is watching all this unfold. She wants to make predictions about Bob’s picture, and doesn’t want to remember irrelevant details...
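To make the picture-passing setup concrete, here is a minimal toy simulation. This is my own sketch, not from the original post, and representing "lines" as randomly chosen grid cells is a simplifying assumption.

```python
import random

# Toy model (assumed representation): a "picture" is a set of (cell, color)
# pairs on a 10x10 grid, where a colored cell stands in for a line.
GRID = [(x, y) for x in range(10) for y in range(10)]

def random_lines(color, n, rng):
    """Pick n random cells and paint them the given color."""
    return {(cell, color) for cell in rng.sample(GRID, n)}

rng_alice, rng_bob = random.Random(1), random.Random(2)

# Alice's picture A: her random purple lines plus her random green lines.
green = random_lines("green", 20, rng_alice)
A = random_lines("purple", 20, rng_alice) | green

# The message M: a copy of A with all the purple erased, i.e. just the green lines.
M = {(cell, color) for (cell, color) in A if color == "green"}

# Bob's picture B: the green lines from M plus his own, independent purple lines.
B = M | random_lines("purple", 20, rng_bob)

# Everything about B that is predictable from Alice's side is carried by M:
# the green part of B equals M exactly, while Bob's purple part is fresh
# randomness that Carol cannot predict from Alice's picture at all.
assert {(cell, color) for (cell, color) in B if color == "green"} == M
```

The assert spells out why Carol only needs to remember M: the green lines are shared between the two pictures by construction, while each artist's purple lines are independent noise.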
Only if they both predictably painted that part purple, e.g. as part of the overall plan. If they both randomly happened to paint the same part purple, then no.
I've been in many conversations where I've mentioned the idea of using neuroscience for outer alignment, and the people who I'm talking to usually seem pretty confused about why I would want to do that. Well, I'm confused about why one wouldn't want to do that, and in this post I explain why.
As I see it, there are three main strategies people have for dealing with AI alignment in worlds where AI alignment is hard.
In my opinion, these are all great efforts, but I personally like the idea of working on value alignment directly. Why? First, some downsides of the others:
Does this comment I wrote clear up my claim?
...A clarification of the sense in which I claim that "biological and artificial neural networks are based upon the same fundamental principles":
I would not be surprised if the reasons why neural networks "work" are also exploited by the brain.
In particular, I think neuroscience for value alignment is promising because we can expect the values part of the brain to be compatible with these reasons and not to require too many additional fundamental advances to actually implement, unlike, say, corrigibility, which will first...
I second this recommendation. This book was amazing. It's quite unlike other scifi, and that's a good thing.
My model of utility (and the standard one, as far as I can tell) doesn't work that way. No rational agent ever gives up a utilon - that is the thing they are maximizing. I think of it as "how many utilons do you get from thinking about John Doe's increased satisfaction (not utilons, as you have no access to his, though you could say "inferred utilons") compared to the direct utilons you would otherwise get".
Those moral weights are "just" terms in your utility function.
And, since humans aren't actually rational and don't have consistent utility functions, the moral weights implied by their actions are highly variable and contextual.
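For an idealized agent, the claim in the two comments above can be written down directly. This is my own notational sketch, not the commenter's; the symbols $w_i$ and $\hat{u}_i$ are assumptions for illustration:

$$U_{\text{you}}(a) = u_{\text{direct}}(a) + \sum_i w_i \, \hat{u}_i(a)$$

Here $\hat{u}_i(a)$ is your *inferred* satisfaction for person $i$ under action $a$ (you have no access to their actual utilons), and $w_i$ is the moral weight you place on them. On this reading, choosing an action that benefits John Doe never "gives up" utilons: the action is chosen exactly when it maximizes $U_{\text{you}}$, with the moral weights just being terms inside it.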
It's true, but I don't think there's anything fundamental preventing the same sort of proliferation and advances in open-source LLMs that we've seen in Stable Diffusion (aside from the fact that LLMs aren't as useful for porn). That it has been relatively tame so far doesn't change the basic pattern of how open source affects the growth of technology.
I attended Secular Solstice in Berkeley last December.
My perspective is quite unusual: I live in a rationalist group house and work at an AI safety office, but I also am a Christian and attend church every week.[1]
I was originally not planning on going to Solstice, but a decent number of people (~5) told me in person that they would be particularly interested in my opinions of it. I realized that I was interested in learning what I would think of it too, so I went.
I took notes on my thoughts throughout the service.[2] This blog post covers my broader thoughts on the experience. I also have blog posts for a fun little correction to one of the songs and my detailed notes & commentary.
I do not agree...
I reject the claim that faith implies the world cannot change
Me too, which is why I didn't write this.
Alice takes a chemistry class at a university. She gets a professor who basically just reads from the textbook during lectures. She reads the textbook on her own, talks to her classmates, and finds some relevant Wikipedia articles and youtube videos.
Bob studies chemistry on his own. He buys the textbook Alice uses because it's popular, reads it on his own, talks to other people interested in chemistry, and finds some relevant Wikipedia articles and youtube videos.
Bob is an autodidact. Alice is not.
OK, I understand that, but what's the key difference? What is the essence of autodidact-ness? Is it...
I don't think there's a clear consensus, and I don't think it describes a clear distinction, and that's why I don't normally use the word "autodidact".
I think we probably agree on how far the existing system is from the ideal. I wanted to point at the opposite end of the scale as a reminder that we are even further away from that.
When I was in the first grade of elementary school, they tried to teach us about "sets", which mostly meant that instead of "two plus two equals four" the textbook said "the union of a set containing two elements and another set containing two elements has four elements". In hindsight I see this was a cargo-cultish version of set theory, which probably was very high-status at t...
"to clean house" as implication of violence...
Due to a tragic shortage of outbuildings (to be remedied in the mid term but not immediately), my living room is the garage/makerspace of my home. Last week I cleaned as one cleans for guests, because a friend from way back was dropping by. I then got to enjoy a clean-enough-for-guests home for several days, which is a big part of why it is nice to be visited by friends un-intimate enough to feel like cleaning for.
Then my partner-in-crafts came over, and we re-occupied every table with a combination of resin ca...