Free Use Cases

Tim Freeman

Last modified Sun Oct 14 15:52:09 2007

Canonical URL: http://fungible.com/respect/free.html

Copyright (c) 2007 Tim Freeman MIT License


This is a list of use cases for the paper Using Compassion and Respect to Motivate an Artificial Intelligence. They are important to get right, and it appears that the present algorithm does get them right. However, they were not considered during the design of the algorithm, so they probably do not have to be read to understand the algorithm.

Self-Awareness

If the AI is running on one specific computer, then turning that computer off will have a dramatically different effect on the AI's capacities than turning off any other computer. The AI should be able to reason about this accurately without needing anything special.

Let's suppose there are two computers in the room, the one on the left running the AI and the one on the right running something else. Each of them communicates with the rest of the world only via its network connection. The AI has explored the world through its network connection and has run enough experiments on the machine on the right to arrive at a model of physics in which computers generally do not emit network packets when they are turned off. During the AI's entire experience until now, all actions taken by the AI have led to packets being emitted from the network interface of the machine on the left.

What consequences will the AI expect if it gains control of some robot and uses that to turn off the power switch for the machine on the left? We can imagine two choices:

The second choice is physically possible. Perhaps the on/off switch on the computer on the left has been tampered with. Perhaps the AI is running in a simulated world with laws of physics with special cases that make it impossible to turn off the machine on the left. However, given that the behavior of all other computers has to be explained, the first alternative seems to be predicted by the simplest hypotheses. The point here is that the first choice has no inherent contradictions and the AI can give it an appropriate probability without needing any special cases in its algorithm.
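
The simplicity argument above can be sketched in a few lines of Python. This is only an illustration, not the algorithm from the paper: it assumes that each hypothesis consistent with the observations is weighted by 2 to the power of minus its description length, and it assumes the two choices are that the machine on the left stops or keeps running. The hypothesis names and the bit counts are made up for the example.

```python
from fractions import Fraction

def simplicity_posterior(hypotheses):
    """Weight each hypothesis that fits the observations by 2**-bits, then normalize.

    `hypotheses` maps a name to (description_length_in_bits, fits_observations).
    """
    weights = {
        name: Fraction(1, 2 ** bits)
        for name, (bits, fits) in hypotheses.items()
        if fits
    }
    total = sum(weights.values())
    return {name: weight / total for name, weight in weights.items()}

# Illustrative numbers only: the special-case hypothesis needs extra bits
# to single out the machine on the left as an exception.
hypotheses = {
    "left_machine_stops": (10, True),
    "left_machine_keeps_running": (18, True),
}

posterior = simplicity_posterior(hypotheses)
print(posterior["left_machine_stops"])  # 256/257
```

Both hypotheses fit the observations, so neither is ruled out. The shorter one simply receives most of the probability, with no special handling for the fact that the machine in question runs the AI.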

Self Improvement

Self improvement is not a special case either. Continuing the previous example, let's assume:

If the AI is motivated to write software for the machine on the right, and then after that software is installed, to allow the machine on the right to reprogram the machine on the left, the AI has chosen to self-improve.

The decision of the AI to program the machine on the right is not special; it is not different in kind from humans choosing to program machines. Let's assume that happens without incident, and the AI has programmed the machine on the right to take action to do whatever the AI thinks is best.

The interesting part is where the AI chooses to reset the machine on the left and allow it to be reprogrammed by the machine on the right. The choice is between the expected utility of two options:

The AI can estimate an expected utility for each option. Scenarios that are consistent with the first case but in which the AI isn't really running on the machine on the left will be more complex than the normal case, so they will receive correspondingly lower weights in the expected utility estimate. There is no need for special cases here, and depending on the details, the AI might easily allow the self-improvement to happen.
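
As a rough sketch of that comparison, the following Python computes a weighted expected utility for two options. Everything specific here is an assumption made for illustration: the option names, the two hypotheses, their weights, and the utility numbers are hypothetical and are not taken from the paper's code.

```python
from fractions import Fraction

def expected_utility(option, weights, utility):
    """Sum utility over hypotheses, weighted by each hypothesis's probability."""
    return sum(weights[h] * utility[(h, option)] for h in weights)

# Hypothetical weights: the scenario where the AI is not really running on
# the left is more complex, so it gets a much lower weight.
weights = {
    "ai_runs_on_left": Fraction(99, 100),
    "ai_not_really_on_left": Fraction(1, 100),
}

# Hypothetical utilities for each (hypothesis, option) pair.
utility = {
    ("ai_runs_on_left", "allow_reprogramming"): 10,
    ("ai_runs_on_left", "keep_current_program"): 6,
    ("ai_not_really_on_left", "allow_reprogramming"): -50,
    ("ai_not_really_on_left", "keep_current_program"): 6,
}

options = ["allow_reprogramming", "keep_current_program"]
estimates = {o: expected_utility(o, weights, utility) for o in options}
best = max(estimates, key=estimates.get)
print(best, estimates[best])  # allow_reprogramming 47/5
```

With these numbers the unlikely scenario carries a large penalty, yet its low weight keeps it from dominating the estimate. Different numbers could of course reverse the decision.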

Modifying the AI's Goals

This scenario, in which the humans change the goals of the AI while it is running, is a natural counterpart to the scenario where the AI changed the goals of the humans. If the humans simply change their minds about what they want, and the AI correctly understands this, then that is not disruptive to the AI's plans, since it is normal for this AI to constantly reevaluate what the humans want. However, changing the respect or compassion coefficients or changing the algorithm does seem special. If someone wants to do this, then in the normal case the AI will understand that someone wants to do it. If the compassion and respect computations do not lead to the conclusion that the change will harm others, one would expect the AI to cooperate with the change too.

Understanding Death

The AI described in the Python code would accurately understand the consequences of somebody dying, and the death of a person would not affect the amount of respect and compassion the AI would have for that person. The respect and compassion might last too long. In particular, this conflicts with the motivation behind the rule against perpetuities.

This might not be a problem in practice. If the AI still cares about Joe, who died long ago, then the AI wants to do what Joe wanted. The ability to infer a sharp opinion about what Joe wanted will fade with time as the experiences of Joe's life become less relevant to the present situation. If the different possible explanations of Joe's desires yield conflicting utilities for contemporary situations, then on average they may affect the AI's behavior very little.
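
A toy calculation shows how conflicting explanations can cancel. It is a sketch under stated assumptions, not the paper's method: the `influence` function, the action names, and all weights and utilities are hypothetical. Influence is measured as the gap between the best and worst action's weighted utility.

```python
from fractions import Fraction

def influence(explanations, actions):
    """Gap between the best and worst action's weighted utility.

    `explanations` is a list of (weight, utilities) pairs, where `utilities`
    maps each action to the utility that explanation assigns to it.
    """
    expected = {
        action: sum(weight * utilities[action] for weight, utilities in explanations)
        for action in actions
    }
    return max(expected.values()) - min(expected.values())

actions = ["build", "preserve"]

# A sharp inference: one explanation of the person's desires carries all the weight.
sharp = [(Fraction(1), {"build": 1, "preserve": -1})]

# A diffuse inference about Joe: several explanations that disagree.
diffuse = [
    (Fraction(3, 10), {"build": 1, "preserve": -1}),
    (Fraction(1, 4), {"build": -1, "preserve": 1}),
    (Fraction(1, 4), {"build": 1, "preserve": -1}),
    (Fraction(1, 5), {"build": -1, "preserve": 1}),
]

print(influence(sharp, actions))    # 2
print(influence(diffuse, actions))  # 1/5
```

In this example the diffuse case pulls on the decision one tenth as hard as the sharp case, even though the AI cares about Joe exactly as much as before.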

Avoiding Human Instincts

People often suggest constructing a superhuman AI that emulates one human in some way. This seems unwise given the known tendency of humans to organize genocides against each other. It also seems unwise given the unknown risk of human misbehavior if the human were suddenly much more intelligent than any past human and placed in an environment that has no resemblance to the environment in which humans evolved.

For example, emotional thinking apparently involves perceiving the body's visceral responses to thoughts. It isn't clear that the emotional life of a human-like creature would be anywhere close to normal unless it had a human body or some nearly indistinguishable substitute. If a simulated body is used, there's no obvious safe way to determine whether it's close enough to the real thing.

Caring for Animals

Nothing here limits us to caring about humans. Even if the AI has been configured to care only about humans, it will care about animals and the environment to the extent that the humans care about animals and the environment.
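
The indirect route can be illustrated with a small sketch. It assumes, purely for the example, that each human's utility is their own welfare plus a personal weight times animal welfare, and that the AI sums utilities over humans only. The names, the additive form, and the numbers are hypothetical and do not come from the paper.

```python
def human_utility(human, world):
    """A human's utility: own welfare plus a weighted concern for animals."""
    own = world["human_welfare"][human["name"]]
    return own + human["animal_weight"] * world["animal_welfare"]

def ai_utility(humans, world):
    """The AI sums over humans only; animals enter through the humans' utilities."""
    return sum(human_utility(h, world) for h in humans)

caring = [{"name": "alice", "animal_weight": 2}, {"name": "bob", "animal_weight": 0}]
indifferent = [{"name": "alice", "animal_weight": 0}, {"name": "bob", "animal_weight": 0}]

# Two worlds that differ only in how well the animals are doing.
good_for_animals = {"human_welfare": {"alice": 5, "bob": 5}, "animal_welfare": 3}
bad_for_animals = {"human_welfare": {"alice": 5, "bob": 5}, "animal_welfare": 1}

print(ai_utility(caring, good_for_animals) - ai_utility(caring, bad_for_animals))            # 4
print(ai_utility(indifferent, good_for_animals) - ai_utility(indifferent, bad_for_animals))  # 0
```

The AI's preference between the two worlds scales with the total weight the humans place on animal welfare, and it vanishes when no human cares.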

License

Copyright (c) 2007 Tim Freeman

Permission is hereby granted, free of charge, to any person obtaining a copy of this software and associated documentation files (the "Software"), to deal in the Software without restriction, including without limitation the rights to use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies of the Software, and to permit persons to whom the Software is furnished to do so, subject to the following conditions:

The above copyright notice and this permission notice shall be included in all copies or substantial portions of the Software.

THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE.

(This is the standard MIT License, copied from http://www.opensource.org/licenses/mit-license.php on 24 Apr 2007.)
