21

When should a robot stop thinking and starting acting?

 4 years ago
source link: https://towardsdatascience.com/how-can-a-robot-decide-when-to-stop-thinking-and-start-acting-d65d3ed8dce2?gi=9c6c31aef5ff
Go to the source link to view the article. You can view the picture content, updated content and better typesetting reading experience. If the link is broken, please click the button below to view the snapshot at that time.

People solve many problems every day. Maybe we’re figuring out what we should wear to some party or maybe we’re coming up with the best way to drive to some place. Sometimes we’re even trying to figure out who we should marry or where we should go to college.

When a person solves a difficult problem involving many different factors, they make a tradeoff without even realizing it. If we spend too little time solving the problem, we end up with a solution that just isn’t good enough. However, if we spend too much time solving the problem, we’ll get a great solution, but it’ll take way too long. In other words, when we solve complex problems, we unknowingly but effectively answer the following question:

When should we stop thinking about a problem and just start acting on the best solution that we’ve come up with so far?

Let’s consider an example. Suppose Sally is coming up with the best way to drive from her house to a ski resort. Take a look at a few options. Which one should she prefer?

  1. Should she think for a long time (30 minutes) to get a great route that takes 35 minutes?
  2. Should she think for almost no time at all (1 second) to get a terrible route that takes 2 hours?
  3. Should she think for a decent amount of time (2 minutes) to get a decent route that takes 40 minutes?

Although the first option results in the quickest route, she’d spend way too much time thinking. Similarly, while the second option requires her to think very little, she’d spend way too much driving. The third option, however, offers a nice trade-off between thinking time and driving time.

My first paper in grad school tried to answer this question for a robot: when should a robot stop thinking about its problem and start acting on the best solution that it’s come up with so far? That’s what we’ll talk about today.

vaQb2e6.jpg!web

Photo by Bruno /Germany on Pixabay

What do we mean by thinking?

Let’s first formalize what we mean by a robot thinking . We’re not talking about the hard problem of consciousness or the Chinese room argument for the philosophy geeks like me in the room. Instead, we’re talking about a class of algorithms called an anytime algorithm . Simply put, an anytime algorithm is just an algorithm that gradually improves a solution over time and can be interrupted at any time for that solution. For example, if we’re trying to come up with a route from the grocery story to the hospital, the anytime algorithm would continually produce routes that get better and better with more and more time. Basically, when we say a robot is thinking , what we really mean is that the robot is executing an anytime algorithm that produces solutions that improve over time.

An anytime algorithm usually has a couple nice properties. First, an anytime algorithm exhibits monotonicity : it guarantees that solution quality increases or stays the same but never gets worse over time. Next, an anytime algorithm exhibits diminishing returns : the improvement in solution quality is high at the early stages of the computation and low at later stages.

To illustrate the behavior of an anytime algorithm, take a look at this photo. In this photo, as computation time increases, solution quality increases as well. It turns out that this behavior is pretty typical of an anytime algorithm.

qUfumyJ.png!web

Trading solution quality for computation time

To determine when a robot should stop thinking and start acting, we need to quantify two things: how much the quality of a solution is worth to us and how much time is worth to us. Cue time-dependent utility . Time-dependent utility represents the utility of a solution computed by an anytime algorithm. Intuitively, in any real-time decision-making tasks, a solution of some quality computed in a second has higher utility than a solution of the same quality computed in an hour. This means that the utility of a solution is a function of both quality and computation time. Here’s a formal way of saying this:

Luckily, it’s usually possible to simplify a time-dependent utility function by expressing it as the difference between two functions called the intrinsic value function and the cost of time . First, the intrinsic value function represents the utility of a solution if we consider only the quality of that solution, ignoring the cost of computation time. Second, the cost of time represents the utility of a solution if we take into account only the time needed to compute that solution, disregarding the value of solution quality. Let’s define this simplification a little more formally below:

jMF7v22.png!web

What’s the point of this stuff? That’s a great question. Given a time-dependent utility function separated into an intrinsic value function and a cost of time, the problem of deciding when to interrupt an anytime algorithm and act on the current solution becomes well-defined. For example, take a look the example below. We have three functions: the intrinsic value function, the cost of time, and the time-dependent utility function. The intrinsic value function gradually increases over time, the cost of time exponentially decreases over time, and the time-dependent utility function creates a convex function for us. We should stop thinking and start acting at the top of the time-dependent utility function.

QRbqymj.png!web

Sounds easy, right? Well, there’s a problem. While it’s easy to point to the orange circle and say stop here, we don’t actually know the performance of the anytime algorithm in advance. Deciding when to interrupt the anytime algorithm and act on the current solution requires predicting the performance of the anytime algorithm in the future.

Predicting the performance of anytime algorithm

Before we get into how we can predict the performance of an anytime algorithm, let’s define a few more things. We’re going to define a pair of vectors that jointly represent the performance of an anytime algorithm. The first vector describes the past performance of the anytime algorithm as it solves an instance of a problem. We can define past performance as a vector of solution qualities observed from the initial solution to the current solution. In other words, the performance history is a sequence of solution qualities observed from the start time step to the current time step of the anytime algorithm like this:

RJb6R3U.png!web

The second vector represents the future performance of an anytime algorithm as it solves an instance of a problem. We can define future performance as a vector of solution qualities projected over the remaining time of the algorithm after the current solution to the final solution. That is, a performance projection is a sequence of solution qualities projected after the current time step to the final time step of the algorithm like this:

To predict the future performance of an anytime algorithm, we can use its past performance on the instance of the problem being solved. Generally, this is just a function that computes a performance projection from a performance history . Without committing to a specific implementation, let’s define this function broadly like this:

It’s important to highlight that this function can be implemented in many ways. In most cases, a simple method, such as nonlinear regression , can compute a decent performance projection from a performance history. In fact, that’s what I did in the paper. However, for all of those deep learning experts out there, it could also be done using a sophisticated model, like a neural network, that includes features of the algorithm.

Here’s an intuitive depiction of a performance predictor. Ideally, it computes performance projections that approach the true performance of an anytime algorithm as the size of the performance history increases. For instance, at the i th time step, the i th performance projection doesn’t closely approximate the true performance p* . In fact, it’s way too optimistic. However, at the (i + 1) th time step, the next performance projection (i + 1) th performance projection gets closer to the true performance p* . Intuitively, as the performance predictor exploits more solution qualities in the performance history, the performance projections approach the true performance of an algorithm.

eaQRzm3.png!web

When should a robot stop thinking and start acting?

Now that we’ve defined a bunch of stuff, we can finally see how a robot decides when to stop thinking and start acting. Basically, the robot will monitor its anytime algorithm at fixed intervals. At each time step, the robot will generate a performance projection from a performance history using a performance predictor. Depending on how that performance projection looks, the robot will either interrupt the anytime algorithm or let it continue to run. At a deeper level, the basic algorithm works like this:

  1. Initialize the time step and the performance history.
  2. Start the anytime algorithm.
  3. While the anytime algorithm is running, do the following operations:
  4. Get the current solution of the anytime algorithm.
  5. Calculate the quality of the current solution.
  6. Add the quality of the current solution to the performance history.
  7. Compute a performance projection from the performance history using the performance predictor.
  8. If the performance projection has met the stopping condition:
  9. — — Interrupt the anytime algorithm.
  10. — — Return the current solution.
  11. Otherwise , increment the time step and sleep for a while.

But what’s a stopping condition? Simply put, we use a stopping condition to determine whether or not an anytime algorithm should be interrupted at a given time step. If the stopping condition is true, we interrupt the algorithm. Otherwise, we let the algorithm continue to run.

And what stopping condition should we use? We use a stopping condition that lets the anytime algorithm run as long as the difference between the utility of the projected best solution and the utility of the current solution is greater than zero . More formally, we let the anytime algorithm run when the following value is greater than zero:

7feieyR.png!web

Without further ado, for those who want to see more technical details, here’s what the method for deciding when an intelligent system should stop thinking and start acting looks like:

yAVjaaI.png!web


About Joyk


Aggregate valuable and interesting links.
Joyk means Joy of geeK