March 2016 Newsletter

 |  Rob Bensinger  |  Newsletters


Research updates

  • A new paper: “Defining Human Values for Value Learners”
  • New at IAFF: Analysis of Algorithms and Partial Algorithms; Naturalistic Logical Updates; Notes from a Conversation on Act-Based and Goal-Directed Systems; Toy Model: Convergent Instrumental Goals
  • New at AI Impacts: Global Computing Capacity
  • A revised version of “The Value Learning Problem” (pdf) has been accepted to an AAAI spring symposium.

General updates

  • MIRI and other Future of Life Institute (FLI) grantees participated in an AAAI workshop on AI safety this month.
  • MIRI researcher Eliezer Yudkowsky discusses Ray Kurzweil, the Bayesian brain hypothesis, and an eclectic mix of other topics in a new interview.
  • Alexei Andreev and Yudkowsky are seeking investors for Arbital, a new technology for explaining difficult topics in economics, mathematics, computer science, and other disciplines. As a demo, Yudkowsky has written a new and improved guide to Bayes’s Rule.

News and links

  • Should We Fear or Welcome the Singularity? (video): a conversation between Kurzweil, Stuart Russell, Max Tegmark, and Harry Shum.
  • The Code That Runs Our Lives (video): Deep learning pioneer Geoffrey Hinton expresses his concerns about smarter-than-human AI (at 10:00).
  • The State of AI (video): Russell, Ya-Qin Zhang, Matthew Grob, and Andrew Moore share their views on a range of issues at Davos, including superintelligence (at 21:09).
  • Bill Gates discusses AI timelines.
  • Paul Christiano proposes a new AI alignment approach: algorithm learning by bootstrapped approval-maximization.
  • Robert Wiblin asks the effective altruism community: If tech progress might be bad, what should we tell people about it?
  • FLI collects introductory resources on AI safety research.
  • Raising for Effective Giving, a major fundraiser for MIRI and other EA organizations, is seeking a Director of Growth.
  • Murray Shanahan answers questions about the new Leverhulme Centre for the Future of Intelligence. Leverhulme CFI is presently seeking an Executive Director.

John Horgan interviews Eliezer Yudkowsky

 |  Rob Bensinger  |  Conversations

Scientific American writer John Horgan recently interviewed MIRI’s senior researcher and co-founder, Eliezer Yudkowsky. The email interview touched on a wide range of topics, from politics and religion to existential risk and Bayesian models of rationality.

Although Eliezer isn’t speaking in an official capacity in the interview, a number of the questions discussed are likely to be interesting to people who follow MIRI’s work. We’ve reproduced the full interview below.


John Horgan: When someone at a party asks what you do, what do you tell her?


Eliezer Yudkowsky: Depending on the venue: “I’m a decision theorist”, or “I’m a cofounder of the Machine Intelligence Research Institute”, or if it wasn’t that kind of party, I’d talk about my fiction.


John: What’s your favorite AI film and why?


Eliezer: AI in film is universally awful. Ex Machina is as close to being an exception to this rule as it is realistic to ask.


John: Is college overrated?


Eliezer: It’d be very surprising if college were underrated, given the social desirability bias of endorsing college. So far as I know, there’s no reason to disbelieve the economists who say that college has mostly become a positional good, and that previous efforts to increase the volume of student loans just increased the cost of college and the burden of graduate debt.


John: Why do you write fiction?


Eliezer: To paraphrase Wondermark, “Well, first I tried not making it, but then that didn’t work.”

Beyond that, nonfiction conveys knowledge and fiction conveys experience. If you want to understand a proof of Bayes’s Rule, I can use diagrams. If I want you to feel what it is to use Bayesian reasoning, I have to write a story in which some character is doing that.


New paper: “Defining human values for value learners”

 |  Rob Bensinger  |  Papers

MIRI Research Associate Kaj Sotala recently presented a new paper, “Defining Human Values for Value Learners,” at the AAAI-16 AI, Society and Ethics workshop.

The abstract reads:

Hypothetical “value learning” AIs learn human values and then try to act according to those values. The design of such AIs, however, is hampered by the fact that there exists no satisfactory definition of what exactly human values are. After arguing that the standard concept of preference is insufficient as a definition, I draw on reinforcement learning theory, emotion research, and moral psychology to offer an alternative definition. In this definition, human values are conceptualized as mental representations that encode the brain’s value function (in the reinforcement learning sense) by being imbued with a context-sensitive affective gloss. I finish with a discussion of the implications that this hypothesis has on the design of value learners.

Economic treatments of agency standardly assume that preferences encode some consistent ordering over world-states revealed in agents’ choices. Real-world preferences, however, have structure that is not always captured in economic models. A person can have conflicting preferences about whether to study for an exam, for example, and the choice they end up making may depend on complex, context-sensitive psychological dynamics, rather than on a simple comparison of two numbers representing how much one wants to study or not study.

Sotala argues that our preferences are better understood in terms of evolutionary theory and reinforcement learning. Humans evolved to pursue activities that are likely to lead to certain outcomes — outcomes that tended to improve our ancestors’ fitness. We prefer those outcomes, even if they no longer actually maximize fitness; and we also prefer events that we have learned tend to produce such outcomes.

Affect and emotion, on Sotala’s account, psychologically mediate our preferences. We enjoy and desire states that are highly rewarding in our evolved reward function. Over time, we also learn to enjoy and desire states that seem likely to lead to high-reward states. On this view, our preferences function to group together events that lead on expectation to similarly rewarding outcomes for similar reasons; and over our lifetimes we come to inherently value states that lead to high reward, instead of just valuing such states instrumentally. Rather than directly mapping onto our rewards, our preferences map onto our expectation of rewards.
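
For readers unfamiliar with the reinforcement-learning sense of “value function” that Sotala draws on, here is a minimal sketch in Python, not taken from the paper: the state names, rewards, and learning parameters are illustrative assumptions. It shows how tabular TD(0) learning assigns high value to states that merely lead toward reward, mirroring the claim that preferences track expected reward rather than reward itself.

    # Toy sketch (not from the paper): a three-state chain in which reward arrives
    # only at the final state, yet TD(0) learning assigns high value to the earlier
    # states that reliably lead there.
    STATES = ["idle", "study", "pass_exam"]
    REWARD = {"study": 0.0, "pass_exam": 1.0}       # reward received on entering a state
    NEXT = {"idle": "study", "study": "pass_exam"}  # deterministic transitions

    def td_learn(episodes=500, alpha=0.1, gamma=0.9):
        """Tabular TD(0): V(s) += alpha * (r + gamma * V(s') - V(s))."""
        value = {s: 0.0 for s in STATES}
        for _ in range(episodes):
            state = "idle"
            while state in NEXT:
                nxt = NEXT[state]
                reward = REWARD[nxt]
                value[state] += alpha * (reward + gamma * value[nxt] - value[state])
                state = nxt
        return value

    if __name__ == "__main__":
        values = td_learn()
        # "study" acquires high value because it reliably leads to reward, even
        # though its own immediate reward is zero: the analogue of coming to value
        # instrumental states in their own right.
        print({s: round(v, 2) for s, v in values.items()})

Running the sketch, “study” ends up valued near 1.0 and “idle” near 0.9, even though neither carries any immediate reward; only the outcome they lead to does.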

Sotala proposes that value learning systems informed by this model of human psychology could more reliably reconstruct human values. On this model, for example, we can expect human preferences to change as we find new ways to move toward high-reward states. New experiences can change which states my emotions categorize as “likely to lead to reward,” and they can thereby modify which states I enjoy and desire. Value learning systems that take these facts about humans’ psychological dynamics into account may be better equipped to take our likely future preferences into account, rather than optimizing for our current preferences alone.

February 2016 Newsletter

 |  Rob Bensinger  |  Newsletters


Research updates

  • New at IAFF: Thoughts on Logical Dutch Book Arguments; Another View of Quantilizers: Avoiding Goodhart’s Law; Another Concise Open Problem

General updates

  • Fundraiser and grant successes: MIRI will be working with AI pioneer Stuart Russell and a to-be-determined postdoctoral researcher on the problem of corrigibility, thanks to a $75,000 grant from the Center for Long-Term Cybersecurity.

News and links

  • In a major break from the trend in computer Go progress, DeepMind’s AlphaGo software defeats the European Go champion 5-0. A top Go player analyzes AlphaGo’s play.
  • NYU hosted a Future of AI symposium this month, with a number of leading thinkers in AI and existential risk reduction in attendance.
  • Marvin Minsky, one of the early architects of the field of AI, has passed away.
  • Learning and Logic: Paul Christiano writes on the challenge of “pursuing symbolically defined goals” without known observational proxies.
  • OpenAI, a new Elon-Musk-backed AI research nonprofit, answers questions on Reddit. (MIRI senior researcher Eliezer Yudkowsky also chimes in.)
  • Victoria Krakovna argues that people concerned about AI safety should consider becoming AI researchers.
  • The Centre for Effective Altruism is accepting applicants through Feb. 14 to the Pareto Fellowship, a new three-month training program for ambitious altruists.

End-of-the-year fundraiser and grant successes

 |  Nate Soares  |  News

Our winter fundraising drive has concluded. Thank you all for your support!

 

Through the month of December, 175 distinct donors gave a total of $351,298. Between this fundraiser and our summer fundraiser, which brought in $630k, we’ve seen a surge in our donor base; our previous fundraisers over the past five years averaged $250k in the winter and $340k in the summer. We additionally received about $170k in 2015 grants from the Future of Life Institute, and $150k in other donations.

In all, we’ve taken in about $1.3M in grants and contributions in 2015, up from our $1M average over the previous five years. As a result, we’re entering 2016 with a team of six full-time researchers and over a year of runway.

Our next big push will be to close the gap between our new budget and our annual revenue. In order to sustain our current growth plans — which are aimed at expanding to a team of approximately ten full-time researchers — we’ll need to begin consistently taking in close to $2M per year by mid-2017.

I believe this is an achievable goal, though it will take some work. It will be even more valuable if we can overshoot this goal and begin extending our runway and further expanding our research program. On the whole, I’m very excited to see what this new year brings.


In addition to our fundraiser successes, we’ve begun seeing new grant-winning success. In collaboration with Stuart Russell at UC Berkeley, we’ve won a $75,000 grant from the Berkeley Center for Long-Term Cybersecurity. The bulk of the grant will go to funding a new postdoctoral position at UC Berkeley under Stuart Russell. The postdoc will collaborate with Russell and MIRI Research Fellow Patrick LaVictoire on the problem of AI corrigibility, as described in the grant proposal:

Consider a system capable of building accurate models of itself and its human operators. If the system is constructed to pursue some set of goals that its operators later realize will lead to undesirable behavior, then the system will by default have incentives to deceive, manipulate, or resist its operators to prevent them from altering its current goals (as that would interfere with its ability to achieve its current goals). […]

We refer to agents that have no incentives to manipulate, resist, or deceive their operators as “corrigible agents,” using the term as defined by Soares et al. (2015). We propose to study different methods for designing agents that are in fact corrigible.

This postdoctoral position has not yet been filled. Expressions of interest can be emailed to alex@intelligence.org using the subject line “UC Berkeley expression of interest.”

January 2016 Newsletter

 |  Rob Bensinger  |  Newsletters


Research updates

  • A new paper: “Proof-Producing Reflection for HOL”
  • A new analysis: Safety Engineering, Target Selection, and Alignment Theory
  • New at IAFF: What Do We Need Value Learning For?; Strict Dominance for the Modified Demski Prior; Reflective Probability Distributions and Standard Models of Arithmetic; Existence of Distributions That Are Expectation-Reflective and Know It; Concise Open Problem in Logical Uncertainty

General updates

  • Our Winter Fundraiser is over! A total of 176 people donated $351,411, including some surprise matching donors. All of you have our sincere thanks.
  • Jed McCaleb writes on why MIRI matters, while Andrew Critch writes on the need to scale MIRI’s methods.
  • We attended NIPS, which hosted a symposium on the “social impacts of machine learning” this year. Viktoriya Krakovna summarizes her impressions.
  • We’ve moved to a new, larger office with the Center for Applied Rationality (CFAR), a few floors up from our old one.
  • Our paper announcements now have their own MIRI Blog category.

News and links

  • “The 21st Century Philosophers”: AI safety research gets covered in OZY.
  • Sam Altman and Elon Musk have brought together leading AI researchers to form a new $1 billion nonprofit, OpenAI. Andrej Karpathy explains OpenAI’s plans, and Altman and Musk provide additional background.
  • Alphabet chairman Eric Schmidt and Google Ideas director Jared Cohen write on the need to “establish best practices to avoid undesirable outcomes” from AI.
  • A new Future of Humanity Institute (FHI) paper: “Learning the Preferences of Ignorant, Inconsistent Agents.”
  • Luke Muehlhauser and The Telegraph signal-boost FHI’s AI safety job postings (deadline Jan. 6). The Global Priorities Project is also seeking summer interns (deadline Jan. 10).
  • CFAR is running a matching fundraiser through the end of January.

Safety engineering, target selection, and alignment theory
