# Copyright (c) 2009 Tim Freeman # # Permission is hereby granted, free of charge, to any person obtaining a copy # of this software and associated documentation files (the "Software"), to deal # in the Software without restriction, including without limitation the rights # to use, copy, modify, merge, publish, distribute, sublicense, and/or sell # copies of the Software, and to permit persons to whom the Software is # furnished to do so, subject to the following conditions: # # The above copyright notice and this permission notice shall be included in # all copies or substantial portions of the Software. # # THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR # IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, # FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE # AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER # LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, # OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN # THE SOFTWARE. # # (This is the standard MIT License, copied from # http://www.opensource.org/licenses/mit-license.php on 24 Apr 2007.) #desc A simple decision procedure that uses inductive inference to #desc answer questions. # Goals: # * Make something simple and understandable, so people will have better # intuitions when reading the main CoD paper. # * Demonstrate to people who haven't read the main CoD paper that treating # inductive inference as a programming language primitive is useful. # * Make something that doesn't have the novice philosopher bug. # * Play around with the liar paradox # (http://en.wikipedia.org/wiki/Liar_paradox). # For simplicity, this question answerer answers one question. It # does not take a counterfactual version of its own output as an # input, so if you ask questions about the future, and then react to # the answer before it has been proven true or false, you can invent # situations where it's always wrong. # Time is represented as nonnegative integers: 0, 1, 2, 3, ... # First we have a procedure "predict_next" that looks at a sequence of # observations and predicts what will happen next. # # "observations" is a list of the observations made so far. It has # some length, say t. The element at position p in observations # indicates what was observed at time p. # # "eps" (short for epsilon) is the maximum probability of the # hypotheses that are not explored. If we're going to have a decision # procedure, we can't explore the entire space because it is infinite. # If all hypotheses we do not look at are consistent with the # observations, then the total probability of the unexplored # explanations is at most eps. However, most explanations are # inconsistent with any actual set of observations, so the total # probability of any of the unexplored explanations will generally be # much less than eps. # # We will generally explore somewhat more than the simplest portion of # the hypothesis space with total probability (1-eps), since our # procedure for generating hypotheses gives them to us in lumps. # The problem is basically to derive the ending step number for the # algorithm AS on page 7 of: # ftp://ftp.idsia.ch/pub/juergen/coltspeed.pdf # except we're doing it slightly different because we want a bound on # the error in the probability estimate, not the ratio between the # error in the measure and the measure. As an aside, I'd like to # point out that I found Schmidhuber's notation to be remarkably # opaque. The fact that we're penalizing long programs twice for # being long, instead of just once, is not apparent at all. # # Below we will define S(x) to be the speed prior of x. It's a # measure; we'll need to divide by the total measure to get the # probability. # Hypotheses h consist of a runtime bound maxrun(h) and a program # prog(h) with a size l(prog(h)). maxrun is the log of the runtime # bound, so: # # h explains x is defined to mean if we run prog(h) for 2**maxrun(h) # steps, we get x. # # h is a prefix of h' if: # prog(h) is a prefix of prog(h') and maxrun(h) <= maxrun(h') # (That is, if a hypothesis predicts a result, then letting it run # longer or adding bits to it doesn't help.) # h is a proper prefix of h' if h is a prefix of h' and h != h'. # # The simplicity of h is 2*len(prog(h)) + maxrun(h). Write as simplicity(h). # (We need to multiply the length by two so we can compute # approximations to the speed prior as detailed below. This # definition of S(x) is equivalent to Schmidhuber's, but the factor of # two was difficult for me to see because I was confused by his # notation.) # If h explains x, and # no proper prefix of h explains x, # then h contributes 1/2**simplicity(h) to S(x) # Define hset(x) to be the set of hypotheses that contribute to S(x). # That is: # hset(x) = {h | h explains x and no proper prefix of h explains x} # Then we define S(x) to be the sum for h in hset(x) of 1/2**(simplicity(h)) # We need to truncate this sequence to get a computable approximation # to S(x). Define: # the size of h is len(prog(h)) + maxrun(h). Write it as size(h). # Note that simplicity(h) = size(h) + len(prog(h)). # If we order the h by their size, we will show that # discarding h with size >= i decreases S by at most 1/2**(i-1). # Here's the proof: # Define hsettrunc(i, x) to be {h in hset(x) such that size(h) >= i} # # and define Stail(i, x) = sum for h in hsettrunc(i, x) of 1/2**simplicity(h) # # So we need to show that Stail(i, x) <= 1/2**(i-1) # We'll use the Kraft inequality, which states that # if you have a set P of binary strings, and no string in P is a # prefix of any other string in P, then: # # sum for p in P of 1/2**len(p) <= 1 # We need to assume that adding unused information from a program # cannot speed it up. This has been false for some real models of # computation. For example, if you want to compile fast code for an # 80486 processor or better, it is useful to know exactly how the # instruction cache operates and to position the instructions of the # program to keep them from competing with each other for a position # in the cache. In this case, adding unused bytes to the compiled # code makes it run faster because it positions the instructions in # the right places. However, the assumption is true for the Turing # machine used in this paper. # Lemma: Suppose h1 and h2 explain x and prog(h1) is a prefix of # prog(h2). Then there is an h3 that is a prefix of both h1 and h2 # such that h3 explains x. # Proof: If maxrun(h1) <= maxrun(h2), then h1 is a prefix of h2 # and we can simply take h3 = h1. # In the more interesting case where # maxrun(h1) > maxrun(h2), let prog(h3) = prog(h1), and # let maxrun(h3) = maxrun(h2). By construction, h3 is a # prefix of both h1 and h2. # Since h1 explains x, prog(h1) computes x. # Since h2 explains x, prog(h2) computes computes x in 2**maxrun(h2) # steps. Since prog(h1) computes x, the extra information in prog(h2) # that is not in prog(h1) must not be used. Adding unused information # to a program cannot speed it up, so prog(h1) must compute x # in 2**maxrun(h2) steps or less. prog(h3) = prog(h1) and maxrun(h3) = # maxrun(h2), so h3 explains x. QED. # By construction, no hypothesis in hset(x) is a prefix of any other. # The preceeding lemma implies that {prog(h) | h in hset(x)} satisfies # the premise of the Kraft inequality. # Stail(i, x) = sum for h in hsettrunc(i, x) of 1/2**simplicity(h) # = sum for h in hset(x) such that size(h) >= i # of 1/2**(len(prog(h)) + size(h)) # = sum for j = i to infinity of # sum for h in hset(x) such that size(h) = j of # 1/2**(len(prog(h)) + j) # = sum for j = i to infinity of (factoring out 1/2**j) # 1/2**j * sum for h in hset(x) such that size(h) = j of # 1/2**(len(prog(h))) # <= sum for j = i to infinity of 1/2**j (by the Kraft inequality) # = 1/2**(i-1) (summing the series) # Define: # Stotal = sum over all y of S(y) # Stotal will generally be less than 1. # Therefore, the Speed Prior is a measure, not a probability estimate. # To get a probability estimate, we have to # divide by the total measure. That is, if we define # Sprob(x) = S(x) / Stotal # then Sprob(x) is the a-priori probability of x in the speed prior. # Our goal is, given an epsilon > 0, to be able to effectively compute # some Sprobest(x, epsilon), to be defined, with the property: # | Sprob(x) - Sprobest(x, epsilon) | < epsilon # Define Strunc(i, x) = S(x) - Stail(i, x), or equivalently: # Strunc(i, x) = sum for h in hset(x) where size(h) < i of 1/2**simplicity(h) # We can compute Strunc(i, x). # Because any terms present in S(x) but not Stail(i, x) are positive, # Strunc(i, x) < S(x). Thus: # |S(x) - Strunc(i, x)| = S(x) - Strunc(i, x) = Stail(i, x) <= 1/2**(i-1), # We don't have to iterate over y to compute Stotal. # We can compute it by simply dropping the parts of the definition of # S(x) that restrict it to just computing the total measure of x. # That is, # Stotal = # sum over all h # such that h explains anything # and no proper prefix of h explains anything # of 1/2**simplicity(h) # Here "h explains anything" means that prog(h) computes some value # when run for <= 2**maxrun(h) steps. # The argument that Stail(i, x) <= 1/2**(i-1) made no use of x, so we # can show analogously that truncating the sum for Stotal is subject # to the same error. In other words, if we define # Stotaltail(i) = sum over all h # such that h explains anything # and no proper prefix of h explains anything # and size(h) >= i # of 1/2**simplicity(h) # then the same argument given before implies that Stotaltail(i) <= 1/2**(i-1). # Define Stotaltrunc(i) = Stotal - Stotaltail(i) # or equivalently # Stotaltrunc(i) = sum over all h # such that h explains anything # and no proper prefix of h explains anything # and size(h) < i # of 1/2**simplicity(h) # # So Stotaltrunc(i) is computable for any i, and # 0 < Stotal - Stotaltrunc(i) = Stotaltail(i) <= 1/2**(i-1). # Our estimate of Sprob(x) will be Strunc(i, x) / Stotaltrunc(i) for # some i. We know that 0 <= Strunc(i, x) <= Stotaltrunc(i), and we # want to choose i such that # |Sprob(x) - (Strunc(i, x) / Stotaltrunc(i))| < epsilon # Define: # Stotalterm(j) = # sum over all h # such that h explains anything # and no proper prefix of h explains anything # and size(h) = j # of 1/2**simplicity(h) # Clearly Stotaltrun(i) = sum for j = 0 to i-1 of Stotalterm(j). # # Let j be the least nonnegative integer such that Stotalterm(j) # is nonzero. # # Let hs be the h with the smallest value of len(prog(h)) considered # when computing Stotalterm(j). # # Let lhs be len(prog(hs)). # # Thus # Stotalterm(j) >= 1/2**simplicity(hs) # = 1/2**(size(hs) + len(prog(hs))) # = 1/2**(j + lhs) # # The long names for these variables are unwieldy for the analysis # that follows. Let: # a = S(x) # b = Stotal # c = Strunc(i, x) # d = Stotaltrunc(i) # e = epsilon # f = 1/2**(j + lhs) # So c, d, e, and f are known. a and b are unknown. # 0 <= c <= a <= 1 # 0 <= a - c <= 1/2**(i-1) # 0 <= b - d <= 1/2**(i-1) # 0 < f <= d <= b <= 1 # Regard x as constant. c, d, and f are implicitly functions of i. Our # goal is to find an i such that: # | a/b - c/d | < e # Now # | a/b - c/d | = # | (ad/b)/d - c/d | = # | ad/b - c | / d <= # | ad/b - c | / f # so it suffices to show that | ad/b - c | / f < e. # This is equivalent to # | ad/b - c | < ef # and to -ef < ad/b - c < ef. We'll find an i that satisfies each of # these inequalities separately. # First we'll find an i such that ad/b - c < ef. # d <= b implies d/b <= 1 which implies ad/b <= a and # ad/b - c <= a - c. # Thus it suffices to show that a - c < ef. # a - c <= 1/2**(i-1), so it suffices to show that 1/2**(i-1) < ef. # Taking the logarithm (all logarithms are to the base 2), we get # 1 - i < log e + log f. # Since f = 1/2**(j + lhs), this is equivalent to # 1 - i < log e - j - lhs # and i > j + lhs - log e + 1. This is the i we wanted, so we can # move on to the other inequality. # Next we'll find an i such that -ef < ad/b - c, which is equivalent # to ef > c - ad/b. # We need a lower bound for d/b and we get that by finding an upper # bound for d/b. # b - d <= 1/2**(i-1), so b <= d + 1/2**(i-1). Dividing by d gives: # b/d <= 1 + 1/d*2**(i-1) = (d*2**(i-1) + 1) / d*2**(i-1). # Inverting, # d/b <= d*2**(i-1) / (d*2**(i-1) + 1) # The right hand side is (d*2**(i-1) + 1 - 1)/ (d*2**(i-1) + 1) # which in turn is 1 - 1/(d*2**(i-1) + 1). Thus it suffices to show that # ef > c - a (1 - 1/(d*2**(i-1) + 1)) = c - a + a/(d*2**(i-1) + 1). # c - a <= 0, so it suffices to show that # ef > a/(d*2**(i-1) + 1) # a <= 1, so it suffices to show that # ef > 1/(d*2**(i-1) + 1) # which is equivalent to (d*2**(i - 1) + 1)*e*f > 1 and: # d*2**(i-1)*e*f + ef > 1 # ef is positive, so it sufices to show # d*2**(i-1)*e*f > 1 # Taking logarithms, this is equivalent to # log d + i - 1 + log e + log f > 0 # Having log d in there is inconvenient because d is a function of i # and our goal is to find a good value for i. Since d >= f, # log d >= log f, so it suffices to show # i - 1 + log e + 2 * log f > 0 # Unfolding the definition of f and pushing everything but i to the # right, this is equivalent to: # i > 1 - log e + 2*j + 2*lhs # This is what we wanted to know about i. This a tighter constraint # on i than the previous result for the other side of the inequality, # so our final result is that # # | Sprob(x) - Strunc(i, x) / Stotaltrunc(i) | < epsilon # # if: # # i > 1 - log epsilon + 2*(j + lhs) # # That is, we should choose i = 2 - log epsilon + 2*(j + lhs). # Use case to worry about: We guess a physical law that leads to # cycling states of the universe. But the training questions and # answers are not consistent with the inferred cycles. So we'll be in # state X and question A has answer B1, and later when we're in state # X, question A has answer B2. There is no correct mapping from # questions * states -> answers to learn, so we'd loop forever looking # for one. To fix this, give each question a unique question number. # That would allow it to at least find a non-informative # interpretation of English, but at least it would find something. from bits import log2, Doesnt_match, many_bits from speed_prior import Speed_prior, prefix_in from machine import empty_tape from turing_tape import tape_nth # mpmath saves me from having to pay attention to floating point # underflow issues. mpf(3) behaves similarly to the Python floating # point number 3.0, except the exponent can get arbitrarily large. # 1.0/(1L << 100000) gets an error. # mpf(1.0)/(1L << 100000) prints a plausible result. from mpmath import mpf, ceil, log # "predict_next" takes: # "predicts": A function that takes a hypothesis and # returns whatever we wanted to infer from the hypothesis, or # throws Doesnt_match if the hypothesis is bad. # If the hypothesis has to predict the past, it's up to "predicts" to # know which past we're talking about. # The result from predicts will be used as a key for a hash table. # "epsilon": The error we're willing to tolerate in our returned # probability estimates. The total error in all of the returned # probabilities is less than epsilon. # "generator": Takes a hypothesis size and returns all hypotheses # with that size. Unit tests pass in a non-default value. # predict_next returns a dict where the keys are the predictions and # the values are the estimated probabilities of the predictions. def predict_next(epsilon, predicts, generator = Speed_prior.generator): imax = None i = 0 # result maps predictions to their measure and eventually their probability. result = {} totalmeasure = 0 # matched_hypotheses has all hypotheses we saw that explained # anything. We need this for prefix checking. matched_hypotheses = [] while imax is None or i <= imax: # minlen is the least program length of any # correct hypothesis of size i, or None if we haven't found # any yet. We only use this to determine imax once. minlen = None for h in generator(i): if not prefix_in(matched_hypotheses, h): try: prediction = predicts(h) # If we get here without raising an exception, # h correctly predicts the past and it predicts some # future. matched_hypotheses.append(h) if minlen is None or minlen > h.program_length(): minlen = h.program_length() totalmeasure += h.measure() result[prediction] = result.get(prediction, 0) + h.measure() except Doesnt_match: pass if imax is None and minlen is not None: imax = ceil(2 - log(epsilon, 2) + 2 * (i + minlen)) i += 1 # Convert measure estimates to probability estimates. for p in result: result[p] /= totalmeasure return result # next_video_frame is designed to be passed in as the # "predicts" argument to predict_next. It says whether the given # hypothesis is consistent with the given past when we're doing the # video prediction exercise. # # "h" is a proposed set of laws of physics. # "past" is the sequence of observations we've made of the past. # Think of these as frames of video. # Return the predicted next frame of video if the hypothesis is consistent the # observations and the hypothesis predicts a frame. If the hypothesis # gives the wrong past or it makes no prediction, raise Doesnt_match. def next_video_frame(physics, past): state = empty_tape for observation in past: # The next statement can raise Doesnt_match if the hypothesis # runs too long or gets an error. pair = physics.run(state) if physics.nth(pair, 0) != observation: raise Doesnt_match("Doesn't predict the past") state = physics.nth(pair, 1) pair = physics.run(state) # Make sure it predicts a future, that is, you do get a # pair back when you ask it for the next step. If there # are syntactic requirements on the predicted future, check them # here and raise Doesnt_match when the requirements are # not satisfied. # For example, if an observation is a frame of video, and you # only want predictions that are frames of video, then check # here that physics. nth(pair, 0) is a frame of video. return physics.nth(pair, 0) def predict_video(past, epsilon, generator = Speed_prior.generator): def predicts(h): return next_video_frame(h, past) return predict_next(epsilon, predicts = predicts, generator = generator) # This is what happened at one moment in the past. # video is a turing tape representing a frame of video that was # observed then. # qa is a list of zero or more QuestionAnswer objects that are # training data. The question was asked at the given time and the # answer is presumably the correct answer. class PastMoment(object): __slots__ = ["video", "qa"] def __init__(self, video, qa=[]): self.video = video self.qa = qa # A QuestionAnswer represents a unit of training data for answering questions class QuestionAnswer(object): __slots__ = ["question", "answer"] def __init__(self, question, answer): self.question = question self.answer = answer # For the question answerer with the novice philosopher bug, # we have two hypotheses: # One that acts like the one for video prediction that came earlier. # That is, it takes a state as an input and produces a pair of a # predicted video frame and a new state. # The other takes a state and a question and returns an answer. # "hypotheses" is a pair with the two speed prior hypothesis objects. def predicts_npbug(hypotheses, pastmoments, question): (physicsh, qah) = hypotheses.explanations state = empty_tape for pm in pastmoments: pair = physicsh.run(state) state = physicsh.nth(pair, 1) # Exercise physicsh. if physicsh.nth(pair, 0) != pm.video: raise Doesnt_match("Bad prediction of the video.") # Exercise qah. for qa in pm.qa: answer = qah.run(qah.build_tuple(qa.question, state)) if answer != qa.answer: raise Doesnt_match("Bad answer to a question") return qah.run(qah.build_tuple(question, state)) def two_hypothesis_generator(totalsize): return many_bits(totalsize, [Speed_prior.generator, Speed_prior.generator]) # Question-answerer, with novice philosopher bug. # The past is a sequence of PastMoment objects. # epsilon is an error bound on the probability estimates. # question is the question being asked at the present moment, which is # just after all of the given past. # The result is a mapping from possible answers to their probabilities. # Easy, but wrong, because of the novice philosopher bug. # If the questions are hard to answer, that shouldn't make the physics # less likely. def answer_questions_npbug(past, question, epsilon, generator=two_hypothesis_generator): def predicts(hypotheses): return predicts_npbug(hypotheses, past, question) return predict_next(epsilon, predicts = predicts, generator = generator) def dict_to_tuple(d): result = [] for k in d: result.append((k, d[k])) return tuple(result) def sequence_to_dict(l): result = {} for k, v in l: result[k] = v return result # qah is going to map (questionnumber, question, state) triples to answers. def predict_qna(qah, pastmoments, question, physicsh): state = empty_tape qn = 0 def runqah(question): return qah.run(qah.build_tuple(qah.num_to_value(qn), question, state)) for pm in pastmoments: pair = physicsh.run(state) state = physicsh.nth(pair, 1) for qa in pm.qa: if runqah(qa.question) != qa.answer: raise Doesnt_match("Bad answer for a question") qn += 1 return runqah(question) # We take a question_generator to use for hypotheses about answering # questions so we can use a stub generator during unit tests. def predicts_nonpbug(physicsh, pastmoments, question, epsilon, question_generator=Speed_prior.generator): def predicts(qah): return predict_qna(qah, pastmoments, question, physicsh) state = empty_tape for pm in pastmoments: pair = physicsh.run(state) if physicsh.nth(pair, 0) != pm.video: raise Doesnt_match("Bad prediction of the video.") state = physicsh.nth(pair, 1) # If we get here, then physicsh is apparently a good set of laws # of physics. Time to try to understand how to answer questions. answers = predict_next(epsilon, predicts, generator=question_generator) # The keys of answers are possible answers to the questions and # the values are the estimated probability that those answer are # correct, assuming that phsycish is the true laws of physics. return dict_to_tuple(answers) # Now let's try it without the novice philosopher bug. # answer_questions_nonpbug takes the same arguments as # answer_questions_npbug. # # Epsilon is our error budget. Dividing it in half and passing half # to the outer call to predict_next and the other half indirectly to # the inner recursive call to predict_next seems naive, but it is # correct. Proving this requires the triangle inequality. Here's the # proof: # # In general, we'll show that if e1 is the error bound passed to the # outer call to predict_next, and e2 is the error bound passed to the # inner call, then the total error in the probability of the matching # answers is e1 + e2. Setting e1 to epsilon/2 and e2 to epsilon/2 is # the special case of this that we're using here. # # Suppose x is a typical key in the result from the outer call to # predict_next, and y is a typical answer. # Let Sp(x) be the correct probability for x given the speed prior and # the given observations, and Spest(x) be the value for the key x in # the dict returned by the outer call to predict_next. Define # e1(x) = | Sp(x) - Spest(x) | # The error bound on predict_next gives: # sum over all x of e1(x) < e1 # Let Sp'(x, y) be the correct probability for y given the speed # prior, the given observations, and x. Let Spest'(x, y) be the value # for the key y in the dict returned by the inner call to predict_next # when we're assuming x. Define # e2(x, y) = | Sp'(x, y) - Spest'(x, y) | # The error bound on predict_next gives: # for all x, sum over all y of e2(x, y) < e2 # Because Sp(x), Spest(x), Sp'(x, y), and Spest'(x, y) are all either # probabilities or probability estimates, we also have the following # dreary but useful list of inequalities: # sum over all x of Sp(x) = 1 # sum over all x of Spest(x) = 1 # for all x, sum over all y of Sp'(x, y) = 1 # for all x, sum over all y of Spest'(x, y) = 1 # for all x, 0 <= Sp(x) <= 1 # for all x, 0 <= Spest(x) <= 1 # for all x and y, 0 <= Sp'(x, y) <= 1 # for all x and y, 0 <= Spest'(x, y) <= 1 # We want to show that the total error in the results from # answer_questions_nonpbug is no more than e1 + e2, or in other words: # sum for all y of # | sum for all x of Sp(x)Sp'(x, y) - Spest(x)Spest'(x, y) | < e1 + e2 # The triangle inequality states that for all a, b, and c: # |a - b| <= |a - c| + |c - b| # The error in # Spest(x)S'pest(x, y) # is # | Sp(x)Sp'(x, y) - Spest(x)Spest'(x, y) | # and then we can add and then subtract Sp(x)Spest'(x, y) to get: # | Sp(x)Sp'(x, y) - Sp(x)Spest'(x, y) # + Sp(x)Spest'(x, y) - Spest(x)Spest'(x, y) | # By the triangle inequality, this is less than or equal to: # | Sp(x)Sp'(x, y) - Sp(x)Spest'(x, y) | # + | Sp(x)Spest'(x, y) - Spest(x)Spest'(x, y) | # Factoring, this is equal to # Sp(x) | Sp'(x, y) - Spest'(x, y) | + | Sp(x) - Spest(x) | Spest'(x, y) # which equals: # Sp(x)e2(x,y) + e1(x)Spest'(x, y) # Our total error # sum for all y of # | sum for all x of Sp(x)Sp'(x, y) - Spest(x)Spest'(x, y) | # is less than or equal to # sum for all y of sum for all x of | Sp(x)Sp'(x, y) - Spest(x)Spest'(x, y) | # and by the above, this is less than or equal to # sum for all y of sum for all x of (Sp(x)e2(x,y) + e1(x)Spest'(x, y)) # which equals # sum for all x of sum for all y of (Sp(x)e2(x,y) + e1(x)Spest'(x, y)) # which equals # sum for all x of (Sp(x) * sum for all y of e2(x,y)) # + sum for all x of sum for all y of e1(x)Spest'(x, y) # which is less than or equal to # sum for all x of Sp(x) * e2 # + sum for all x of e1(x) * sum for all y of Spest'(x, y) # which is less than or equal to # e2 + sum for all x of e1(x) # which is less than e2 + e1, which is what we wanted to show. def answer_questions_nonpbug(past, question, epsilon, question_generator=Speed_prior.generator, physics_generator=Speed_prior.generator): def predicts(physicsh): return predicts_nonpbug(physicsh, past, question, epsilon/2, question_generator=question_generator) answerstable = predict_next(epsilon/2, predicts=predicts, generator=physics_generator) result = {} for answerslist in answerstable: answersdict = sequence_to_dict(answerslist) physprobability = answerstable[answerslist] for answer in answersdict: result[answer] = result.get(answer, 0) + \ physprobability * answersdict[answer] return result