Keywords

Artificial intelligence; algorithmic probability; sequential decision theory; rational agents; value function; Solomonoff induction; Kolmogorov complexity; reinforcement learning; universal sequence prediction; strategic games; function minimization; supervised learning.

Sequential decision theory formally solves the problem of rational agents in uncertain worlds if the true environmental prior probability distribution is known. Solomonoff’s theory of universal induction formally solves the problem of sequence prediction for unknown prior distribution. We combine both ideas and get a parameter-free theory of universal Artificial Intelligence. We give strong arguments that the resulting AIXI model is the most intelligent unbiased agent possible. We outline how the AIXI model can formally solve a number of problem classes, including sequence prediction, strategic games, function minimization, reinforcement and supervised learning. The major drawback of the AIXI model is that it is uncomputable. To overcome this problem, we construct a modified algorithm AIXI$tl$ that is still effectively more intelligent than any other time $t$ and length $l$ bounded agent. The computation time of AIXI$tl$ is of the order $t \cdot 2^l$. The discussion includes formal definitions of intelligence order relations, the horizon problem, and relations of the AIXI theory to other AI approaches.

###### Contents

- 1 Introduction
- 2 Agents in Known Probabilistic Environments
- 3 Universal Sequence Prediction
- 4 The Universal Algorithmic Agent AIXI
- 5 Important Problem Classes
- 6 Time-Bounded AIXI Model
- 7 Discussion
- Introductory textbooks
- Algorithmic information theory
- Sequential decision theory
- The author’s contributions
- Acknowledgements

## 1 Introduction

This article gives an introduction to a mathematical theory for intelligence. We present the AIXI model, a parameter-free optimal reinforcement learning agent embedded in an arbitrary unknown environment.

The science of Artificial Intelligence (AI) may be defined as the construction of intelligent systems and their analysis. A natural definition of a system is anything that has an input and an output stream. Intelligence is more complicated. It can have many faces like creativity, solving problems, pattern recognition, classification, learning, induction, deduction, building analogies, optimization, surviving in an environment, language processing, knowledge and many more. A formal definition incorporating every aspect of intelligence, however, seems difficult. Most, if not all, known facets of intelligence can be formulated as goal-driven or, more precisely, as maximizing some utility function. It is, therefore, sufficient to study goal-driven AI; e.g. the (biological) goal of animals and humans is to survive and spread. The goal of AI systems should be to be useful to humans. The problem is that, except for special cases, we know in advance neither the utility function nor the environment in which the agent will operate. The mathematical theory, coined AIXI, is supposed to solve these problems.

Assume the availability of unlimited computational resources. The first important observation is that this does not make the AI problem trivial. Playing chess optimally or solving NP-complete problems become trivial, but driving a car or surviving in nature don’t. This is because it is a challenge itself to well-define the latter problems, not to mention presenting an algorithm. In other words: The AI problem has not yet been well defined. One may view AIXI as a suggestion for such a mathematical definition of AI.

AIXI is a universal theory of sequential decision making akin to Solomonoff’s celebrated universal theory of induction. Solomonoff derived an optimal way of predicting future data, given previous perceptions, provided the data is sampled from a computable probability distribution. AIXI extends this approach to an optimal decision making agent embedded in an unknown environment. The main idea is to replace the unknown environmental distribution $\mu$ in the Bellman equations by a suitably generalized universal Solomonoff distribution $\xi$. The state space is the space of complete histories. AIXI is a universal theory without adjustable parameters, making no assumptions about the environment except that it is sampled from a computable distribution. From an algorithmic complexity perspective, the AIXI model generalizes optimal passive universal induction to the case of active agents. From a decision-theoretic perspective, AIXI is a suggestion of a new (implicit) “learning” algorithm, which may overcome all (except computational) problems of previous reinforcement learning algorithms.

There are strong arguments that AIXI is the most intelligent unbiased agent possible. We outline for a number of problem classes, including sequence prediction, strategic games, function minimization, reinforcement and supervised learning, how the AIXI model can formally solve them. The major drawback of the AIXI model is that it is incomputable. To overcome this problem, we construct a modified algorithm AIXI$tl$ that is still effectively more intelligent than any other time $t$ and length $l$ bounded agent. The computation time of AIXI$tl$ is of the order $t \cdot 2^l$. Other discussed topics are a formal definition of an intelligence order relation, the horizon problem and relations of the AIXI theory to other AI approaches.

The article is meant to be a gentle introduction to and discussion of the AIXI model. For a mathematically rigorous treatment, many subtleties, and proofs see the references to the author’s works in the annotated bibliography section at the end of this article, and in particular the book [Hut04]. This section also provides references to introductory textbooks and original publications on algorithmic information theory and sequential decision theory.

Chapter 2 presents the theory of sequential decisions in a very general form (called AI$\mu$ model) in which actions and perceptions may depend on arbitrary past events. We clarify the connection to the Bellman equations and discuss minor parameters, including (the size of) the I/O spaces and the lifetime of the agent, and their universal choice which we have in mind. Optimality of AI$\mu$ is obvious by construction.

Chapter 3: How and in which sense induction is possible at all has been subject to long philosophical controversies. Highlights are Epicurus’ principle of multiple explanations, Occam’s razor, and probability theory. Solomonoff elegantly unified all these aspects into one formal theory of inductive inference based on a universal probability distribution $\xi$, which is closely related to Kolmogorov complexity $K(x)$, the length of the shortest program computing $x$. Rapid convergence of $\xi$ to the unknown true environmental distribution $\mu$ and tight loss bounds for arbitrary bounded loss functions and finite alphabet can be shown. Pareto optimality of $\xi$ in the sense that there is no other predictor that performs better or equal in all environments and strictly better in at least one can also be shown. In view of these results it is fair to say that the problem of sequence prediction possesses a universally optimal solution.
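Written out (a reconstruction from the standard definitions, with $U$ a universal monotone Turing machine and $\ell(p)$ the length of program $p$), the universal prior and Kolmogorov complexity read:

```latex
\xi(x) \;=\; \sum_{p\,:\,U(p)=x*} 2^{-\ell(p)},
\qquad
K(x) \;=\; \min\{\ell(p) : U(p)=x\},
```

where the sum ranges over all programs for which $U$ outputs a string starting with $x$. The dominant contribution comes from the shortest such program, which is the sense in which $\xi(x)\approx 2^{-K(x)}$.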

Chapter 4: In the active case, reinforcement learning algorithms are usually used if $\mu$ is unknown. They can succeed if the state space is either small or has effectively been made small by generalization techniques. The algorithms work only in restricted (e.g. Markovian) domains, have problems with optimally trading off exploration versus exploitation, have nonoptimal learning rate, are prone to diverge, or are otherwise ad hoc. The formal solution proposed here is to generalize Solomonoff’s universal prior $\xi$ to include actions as conditions and replace $\mu$ by $\xi$ in the AI$\mu$ model, resulting in the AI$\xi$ model, also called AIXI, which we claim to be universally optimal. We investigate what we can expect from a universally optimal agent and clarify the meanings of universal, optimal, etc. Other discussed topics are formal definitions of an intelligence order relation, the horizon problem, and Pareto optimality of AIXI.
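Schematically (a reconstruction of the standard expectimax expression, with horizon $m$ and $\ell(q)$ the length of chronological program $q$), AIXI selects its action in cycle $k$ by maximizing the $\xi$-expected future reward:

```latex
\dot y_k \;=\; \arg\max_{y_k}\sum_{x_k}\;\cdots\;\max_{y_m}\sum_{x_m}
\big(r_k + \cdots + r_m\big)
\sum_{q\,:\,U(q,\,y_{1:m})=x_{1:m}} 2^{-\ell(q)} .
```

The inner program sum plays the role of the universal mixture $\xi$; substituting the true environment $\mu$ for it recovers the sequential decision theory of Chapter 2.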

Chapter 5: We show how a number of AI problem classes fit into the general AIXI model. They include sequence prediction, strategic games, function minimization, and supervised learning. We first formulate each problem class in its natural way (for known $\mu$) and then construct a formulation within the AI$\mu$ model and show their equivalence. We then consider the consequences of replacing $\mu$ by $\xi$. The main goal is to understand in which sense the problems are solved by AIXI.
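As a toy illustration of the sequence-prediction case, the sketch below replaces the incomputable mixture $\xi$ by a finite class of deterministic “programs” (periodic bit patterns) weighted by $2^{-\text{length}}$ as a stand-in for $2^{-K}$. All names and the hypothesis class are illustrative, not from the text:

```python
# Finite Bayesian mixture predictor in the spirit of xi: each hypothesis is a
# periodic bit pattern, with prior weight 2^(-len(pattern)).
HYPOTHESES = {
    "0": 2 ** -1,
    "1": 2 ** -1,
    "01": 2 ** -2,
    "001": 2 ** -3,
}

def predicts(pattern, history):
    """True if the periodic pattern reproduces the observed history."""
    return all(pattern[i % len(pattern)] == b for i, b in enumerate(history))

def xi_next(history):
    """Mixture (posterior) probability that the next bit is '1'."""
    alive = {h: w for h, w in HYPOTHESES.items() if predicts(h, history)}
    total = sum(alive.values())
    ones = sum(w for h, w in alive.items() if h[len(history) % len(h)] == "1")
    return ones / total

# After observing "0101" only the pattern "01" survives, so the mixture is
# certain the next bit is 0.
print(xi_next("0101"))  # → 0.0
```

The toy class makes the mechanism visible: hypotheses inconsistent with the data drop out, and shorter (higher-weight) survivors dominate the prediction, mirroring how $\xi$ converges to the true environment.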

Chapter 6: The major drawback of AIXI is that it is incomputable, or more precisely, only asymptotically computable, which makes an implementation impossible. To overcome this problem, we construct a modified model AIXI$tl$, which is still superior to any other time $t$ and length $l$ bounded algorithm. The computation time of AIXI$tl$ is of the order $t \cdot 2^l$. The solution requires an implementation of first-order logic, the definition of a universal Turing machine within it, and a proof theory system.

Chapter 7: Finally we discuss and remark on some otherwise unmentioned topics of general interest. We remark on various topics, including concurrent actions and perceptions, the choice of the I/O spaces, treatment of encrypted information, and peculiarities of mortal embodied agents. We continue with an outlook on further research, including optimality, down-scaling, implementation, approximation, elegance, extra knowledge, and training of/for AIXI($tl$). We also include some (personal) remarks on non-computable physics, the number of wisdom $\Omega$, and consciousness.

An annotated bibliography and other references conclude this work.

## 2 Agents in Known Probabilistic Environments

The general framework for AI might be viewed as the design and study of intelligent agents [RN03]. An agent is a cybernetic system with some internal state, which acts with output $y_k$ on some environment in cycle $k$, perceives some input $x_k$ from the environment, and updates its internal state. Then the next cycle follows. We split the input $x_k$ into a regular part and a reward $r_k$, often called reinforcement feedback. From time to time the environment provides nonzero reward to the agent. The task of the agent is to maximize its utility, defined as the sum of future rewards. A probabilistic environment can be described by the conditional probability $\mu(x_{1:k}|y_{1:k})$ for the inputs $x_1 \ldots x_k$ to the agent under the condition that the agent outputs $y_1 \ldots y_k$. Most, if not all, environments are of this type. We give formal expressions for the outputs of the agent, which maximize the total $\mu$-expected reward sum, called value. This model is called the AI$\mu$ model. As every AI problem can be brought into this form, the problem of maximizing utility is hence formally solved, if $\mu$ is known. Furthermore, we study some special aspects of the AI$\mu$ model. We introduce factorizable probability distributions describing environments with independent episodes. They occur in several problem classes studied in Section 5 and are a special case of more general separable probability distributions defined in Section 4.3. We also clarify the connection to the Bellman equations of sequential decision theory and discuss similarities and differences. We discuss minor parameters of our model, including (the size of) the input and output spaces and the lifetime of the agent, and their universal choice, which we have in mind. There is nothing remarkable in this section; it is the essence of sequential decision theory [NM44, Bel57, BT96, SB98], presented in a new form. Notation and formulas needed in later sections are simply developed.
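As a toy illustration of value maximization for known $\mu$, the following sketch computes the $\mu$-expected reward sum by expectimax (Bellman) recursion over a finite horizon. The environment, names, and horizon are all illustrative, not from the text:

```python
# Toy known environment mu and expectimax value computation.
# Actions and percepts are 0/1; the reward simply equals the percept.
ACTIONS = (0, 1)
PERCEPTS = (0, 1)
HORIZON = 3

def mu(history, y, x):
    """Illustrative true environment: the percept repeats the action w.p. 0.8."""
    return 0.8 if x == y else 0.2

def value(history=(), k=1):
    """Maximal mu-expected sum of future rewards from cycle k onward:
    maximize over the action, average over percepts, recurse on the history."""
    if k > HORIZON:
        return 0.0
    best = float("-inf")
    for y in ACTIONS:
        expected = sum(mu(history, y, x) * (x + value(history + ((y, x),), k + 1))
                       for x in PERCEPTS)
        best = max(best, expected)
    return best

print(round(value(), 4))  # → 2.4  (0.8 expected reward per cycle over 3 cycles)
```

Because this toy $\mu$ is history-independent, the optimal policy is the constant action 1 and the value is simply $0.8 \times 3$; for genuinely history-dependent environments the recursion over complete histories is essential, which is exactly the state space the AIXI model uses.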
There are two major remaining problems: the problem of the unknown true probability distribution , which is solved in Section 4, and computational aspects, which are addressed in Section 6.

### 2.1 The Cybernetic Agent Model

A good way to start thinking about intelligent systems is to consider more generally cybernetic systems, in AI usually called agents. This avoids having to struggle with the meaning of intelligence from the very beginning. A cybernetic system is a control circuit with input and output and an internal state. From an external input and the internal state the agent calculates deterministically or stochastically an output. This output (action) modifies the environment and leads to a new input (perception). This continues ad infinitum or for a finite number of cycles.

###### Definition 1 (The Agent Model)

An agent is a system that interacts with an environment in cycles $k=1,2,3,\ldots$. In cycle $k$ the action (output) $y_k \in \mathcal{Y}$ of the agent is determined by a policy $p$ that depends on the I/O-history $y_1 x_1 \ldots y_{k-1} x_{k-1}$. The environment reacts to this action and leads to a new perception (input) $x_k \in \mathcal{X}$ determined by a deterministic function $q$ or probability distribution $\mu$, which depends on the history $y_1 x_1 \ldots y_{k-1} x_{k-1} y_k$. Then the next cycle $k+1$ starts.
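One realization of this cycle can be sketched in a few lines. Policy, environment, and the toy instances below are illustrative placeholders, not constructions from the text:

```python
def run(policy, environment, cycles):
    """Cybernetic interaction loop of Definition 1: in each cycle the policy
    maps the I/O-history to an action y, the environment maps the history plus
    y to a perception x = (standard input, reward); rewards are accumulated."""
    history = []
    total_reward = 0.0
    for _ in range(cycles):
        y = policy(history)           # depends on y1 x1 ... y_{k-1} x_{k-1}
        x = environment(history, y)   # depends on the history and y_k
        history.append((y, x))
        total_reward += x[1]          # second component of x is the reward
    return total_reward

# Illustrative toy instances: a constant policy in an environment that
# rewards the action 1 in every cycle.
always_one = lambda history: 1
reward_ones = lambda history, y: (y, float(y == 1))
print(run(always_one, reward_ones, 10))  # → 10.0
```

The loop is deliberately policy- and environment-agnostic: everything specific to a problem class lives in the two function arguments, matching the generality of the definition.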

As explained in the last section, we need some reward assignment to the cybernetic system. The input $x_k$ is divided into two parts, the standard input and some reward input $r_k$. If input and output are represented by strings, a deterministic cybernetic system can be modeled by a Turing machine $p$, where $p$ is called the policy of the agent, which determines the (re)action to a perception. If the environment is also computable, it might be modeled by a Turing machine $q$ as well. The interaction of the agent with the environment can be illustrated as follows: