Answered What happens when I use KNITRO?

  • 12 Mei 2011 16:26
     
     

    Hi,

    I am now using solver foundation with the KNITRO plugin and am getting very good performance for a non linear problem. Just out of curiousity, I wonder what is actually happening under the hood.

    I imagine that solver foundation is taking my expression tree for the objective function, analysis it and then provides KNITRO with callbacks for an evaluation of the objective, the gradient and the hessian? Is it using automatic differentiation to compute the gradient and hessian in this case? Is solver foundation using something like reflection emit to create the callback functions (which I assume would result in very high perf), or is it based on some data structures in memory that represent the expressions for the objective function and then analysing those data structures in these callback functions?

    Thanks!

    Best,

    David

Semua Balasan

  • 13 Mei 2011 0:41
    Pemilik
     
     Jawab

    You have it basically right. We start with the expression tree and convert it to a bytecode-like form that is easier to analyze. We then apply symbolic differentiation to generate the gradient and hessian, and optimize to evaluate constants, eliminate common subexpressions, and so on. The callbacks passed to the solver then interpret that bytecode. We experimented with runtime code generation but it turns out that interpreting is faster for reasonably sized models.

    Ross

  • 13 Mei 2011 16:27
     
      Memiliki Kode
    Thanks, and again I have to follow up questions:
    1. Is it really symbolic differentiation? I was under the impression that something like automatic differentiation (as described here http://en.wikipedia.org/wiki/Automatic_differentiation) is used?
    2. I am curios about the performance of bytecode interpretation vs runtime code generation. Let me first desribe how I use solver foundation right now. Essentially I am using solver foundation inside a larger numerical project where I do something like function iteration. In the end I have a function f(x,a), where x is a vector of variables and a is a vector of parameters, and then the optimization is in a loop that is called VERY often, so my code looks something like this:
    // SETUP PART
    // Create problem
    
    f(x,a) = bla bla // Construct the expression trees for objective etc
    
    // SOLVE PART
    // A is a set of vectors for a that is very large
    foreach(a in A)
    {
      // THIS IS THE CALL TO SOLVER FOUNDATION SOLVE
      solution = max arg f(x,a)
    }
    
    This loop will be executed a very large number of times, so the call to the solve method in solver foundation will happen many thousand times within that loop. Each time I set the parameter vector a to different values, but everything else stays the same (in particular the objective expression).
    So, at the end of the day, I don't care about the time spent in the setup part, anything that cuts down on the time spent in the actual solve routine will easily pay off.
    Do you believe that bytecode interpretation still wins in this case? I just have a hard time believing so (but obviously, might be wrong). The overhad in generating code seems to be in the "compile" step from the expression tree to reflection emit, then in the compile step from intermediate bytecode to machine, and I can easily see that this might take quite a while. But once that is done, wouldn't the callbacks you have at that point always be faster than a callback that interprets a data structure?
    The whole situation to me looks a bit like regular expressions: if you use a regular expression once or twice (or a few times), I am sure compiling it introduces an overhad that is not worth the payoff in you gain later. But if you know a regular expression will be reused millions of times, the proper strategy might be to pre-compile. The regular expression class solved this by giving us a choice, maybe something similiar would make sense for solver foundation as well? If, of course, my data free speculation is correct and there are cases like my function iteration code where code emit could really have a large payoff.
    Best,
    David
  • 13 Mei 2011 20:07
    Pemilik
     
     

    1. It's really symbolic differentiation. We even have an internal tool that can differentiate arbitrary expressions and print the result.

    2. Unfortunately, almost all the work happens in solve. The solver foundation services does the whole process I described every time you call solve, even if the only thing that's changed is the parameters. Doing reflection emit at each call to solve is too expensive. It would be nice if we could cache the results and reuse them, but doing that turns out to be much harder than it looks. (ForEachWhere is a big culprit here - you can use it to make large changes to the structure of your model based on parameter values.)

    Ross

  • 13 Mei 2011 21:58
     
     

    On 1: Do you have any plans to change over to automatic differentiation? My reading of the literature is that symbolic differntiation is accurate but slow, whereas automatic differentiation is accurate and fast. Automatic differentiation really seems to be the "state of the art" method of computing derivatives, right?

    On 2: Ah, that is interesting. I guess I would put in a strong vote to try to make the repeated-solve-with-different-parameters as fast as possible :) I think there are a great number of interesting numerical algorithms that would have the kind of structure I discussed above that would really benefit from this. Maybe one could cache nevertheless, and detect whether a particular change to parameter values changes the problem structure too much and in that case invalidate the cached stuff? That might still speed up cases in which the ForEachWhere issue doesn't come up.

    Best,

    David

  • 14 Mei 2011 16:25
    Pemilik
     
     

    Hi David,

    During the implementation of Solver Foundation 3.0 we evaluated a number of different approaches for differentiation, and chose the approach that gave us the best total system performance and accuracy. There are a number of subtle implementation issues (some of which Ross mentioned in an earlier reply) and the relative costs change depending on the modeling environment and implementation language.

    That said, both of your suggestions are very sensible, and are things that we will take into consideration for future releases. Repeated solve with different parameters is an important scenario across the board, not just for nonlinear optimization. We do want to minimize the amount of "wasted" work when there are multiple solves, while at the same time not wasting system resources when there is only one solve.

    Nathan

  • 16 Mei 2011 16:49
     
     

    Hi Nathan,

    the reason I was under the impression you used autodiff (in the usual sense, i.e. specifically not symbolic difff) was that it was mentioned here

    http://blogs.msdn.com/b/solverfoundation/archive/2010/11/10/microsoft-solver-foundation-v3-0-released.aspx

    including details like "forward and backward". I understand what forward and backward mean in the autodiff context, but in the symbolic case I am a bit confused. Was the info in that blog just not accurate?

    I found a automatic diff library for .Net on codeplex (http://autodiff.codeplex.com/) so I am almost tempted to play with that a bit... My sense is that they will at some point support compilation of value and gradient evaluation routines via expression compile, at which point that might be the architecture I initially had in mind. If I were to use that library to create my value, gradient and hessian evaluators, would it be fairly striaghtforward to use the KNITRO plugin class directly without using the service layer (i.e. like I can use the CompactQuasiNewtonSolver class directly)? Are there any other optimizations that the service layer performes (other than the differentiation) that I would miss out when going to the solver directly?

    Also, the more I work with these math things on .Net the more apparent some basic framework support for all of this becomes to me. Right now the situation is really quite fragmented, each and every library comes with its own matrix support, all symbolic packages bring on their own classes for expression terms etc, making any interopability really quite painful. Would be great if MS provided basic data structures for matrices, vectors, symbolic expressions etc that could then be shared by libraries like autodiff. I know, quite a project, but it would be really cool.

    Best,

    David

  • 16 Mei 2011 17:00
    Pemilik
     
     

    Hi David, the informtion in the blog is accurate. The discussion has moved to internal implementation details of Solver Foundation, and while that is interesting, we don't want to go through all of the specific implementation choices and techniques in this forum because some of them are novel.

    Over time, it would be great if Solver Foundation components such as autodiff could be used outside the context of solving NLPs, and conversely if "plug-in" support moved beyond simply *solver* plug-in support. But like any software project we limited resources and we've got to focus our attention on the things that customers tell us are most important, as well as the things that we will believe push the technology forward in the best way possible.

    Nathan

  • 16 Mei 2011 19:04
     
     

    Hi Nathan,

    I understand to some extend that you don't want to share all implementation details. But I do think that if you use well defined technical terms to describe what solver foundation is doing it should be correct and clear :) "Symbolic differentation" and "automatic differentation" are clearly defined terms in the community. I am still not clear whether solver foundation is using "automatic differentation" as commonly understood in the math community (like this http://en.wikipedia.org/wiki/Automatic_differentiation) or whether you are using a novel MS technique (that might be close to automatic differentation). I do take it from your response now that Ross' original answer that you are using symbolic differentation is incorrect?

    Let me also point out why I care about this: personally I don't really care, as long as I am reasonably confident that the answer is correct and perf is good enough. But I work in research and want to publish papers. If I want to use solver foundation in that context there is simply a higher standard for transparency. I think in most cases it is fine to not have source code, but a description of what kind of algorithm one uses. If I use solver foundation and can say in my papers that it uses automatic differentation (in the sense that many papers have been published on) and KNITRO (where there are ample papers published on iths methods) that is perfectly ok. If it uses some novel MS algorithm, I need at least to know that that is the case. Ideally I would hope you guys publish a paper on your method that I could reference, because there will be instances where journals will not accept "it is some cool new stuff from MS that no more info is known about" :)

    Also, let me point out again that I think solver foundation is incredibly cool and that I understand perfectly well that you have resource contraints. Take all my comments as feedback from a customer of what he would like to see in the next version of an already very great product.

    Cheers,

    David

  • 16 Mei 2011 19:10
     
     

    You have it basically right. We start with the expression tree and convert it to a bytecode-like form that is easier to analyze. We then apply symbolic differentiation to generate the gradient and hessian, and optimize to evaluate constants, eliminate common subexpressions, and so on. The callbacks passed to the solver then interpret that bytecode. We experimented with runtime code generation but it turns out that interpreting is faster for reasonably sized models.

    Ross


    I looked at this again and am confused about the hessian. Are you really computing the hessian? From digging in a bit more I get the sense that you are actually not handing a callback to KNTIRO that would give back the hessian. The KnitroNativeInterface.HessianOption only has the BFGS, LBFGS and SR1 option implemented, so the options where solver foundation would compute the hessian for KNITRO can't even be called?

    Best,

    David

  • 16 Mei 2011 19:12
    Pemilik
     
     
    You are correct - Solver Foundation does not compute the Hessian, the previous post is in error.
  • 16 Mei 2011 19:22
    Pemilik
     
     

    Forward and reverse accumulation techniques can be applied in the context of symbolic differentiation. The following paper is not the basis of the Solver Foundation implementation, but it is similar:

    http://research.microsoft.com/en-us/um/people/bguenter/docs/symbolicDifferentiation.pdf

    I hope this clears up some of the confusion and helps to explain why both Ross's initial post and my blog post are not self-contradictory. We appreciate the feedback and we will try to provide more documentation about how the differentation stuff works.

    Thanks again - Nathan

  • 20 Mei 2011 18:02
     
     

    Great, that looks very cool, thanks!

    I've got a further question :) When I create a report using solution.GetReport (still using KNITRO), there is a number "Solve Time (ms):" and one "Total Time (ms):". What do these two numbers stand for? What is the difference between the two? Is solve time roughly the time from once solver foundation hands the problem off to KNITRO to once KNTIRO comes back with a solution? Does it includes the time spend in evaluating the objective and gradient? Is the difference between solve time and total time roughly the time spent on analyzing my expresions, creating the data structures for the gradient eval etc?

    Total time and solve time really diverge in my case by now (solve time is about 1/35 of total time), and I am trying to figure out whether it might be worth the hassle in my tight loop scenario to use D* to pre-compile evaluation functions for the objective and gradient and then use the KnitroSolver directly as in the CompactQuasiNewton example, without going to the service layer at all...

    Thanks!

    Best,

    David


  • 05 Juni 2011 22:28
    Pemilik
     
     

    The Solve Time is the elapsed time between the core solver (in this case KNITRO) is invoked and when it finishes. Therefore the Solve Time includes the time calling back into the objective and gradient functions (which are generated by Solver Foundation). The Total Time is the elapsed time for the Solver Foundation Solve call, which includes the time to translate the model into the form required by the solver AND the time to do the differentiation.

    My recommendation would be to see how the solution time differs if you call the KnitroSolver directly. The API is very similar to CompactQuasiNewtonSolver so hopefully it is not too hard to try out.

    If you are able to share the model with us, we can do some additional profiling and help further.

    Best regards, Nathan

  • 24 Juni 2011 0:09
     
     

    Finally found time to continue with this.

    I am now trying to use D* (and later I also want to try AutoDiff) to build my own function and gradient eval stuff and then call KnitroSolver plugin directly.

    I managed to call KnitroSolver directly by taking the CompactQuasiNewtonSolver example from your blog, replacing class names with the corresponding Knitro plugin names and then I also had to change the vids (one solver seems 0 based, the other one 1 based). That all works. What I don't understand is how I can pass my constraints on to Knitro via the direct interface. Any help with that would be much appreciated.

    Also, I'd be happy to share code via email. Just drop me a mail to my username on this board at hotmail.com.

    Best,

    David

  • 24 Juni 2011 18:48
    Pemilik
     
     

    Hi David,

     

    The basic idea is to create rows for the constraints, set the bounds on them, and then change the FunctionEvaluator and GradientEvaluator to check to see which row the value/gradient is being requested for. Here is a short example:

     

        static void ConstrainedNonlinear() {

          INonlinearSolver solver = // blah blah blah
     

          // Add variables x and y.

          int x, y;

          solver.AddVariable("x", out x);

          solver.AddVariable("y", out y);

     

          // Add a goal row: 3 x + 4 y

          int g;

          solver.AddRow("g", out g);

          solver.AddGoal(g, 0, true);

     

          // Add a constraint row: x^2 + y^2 == 1

          int c;

          solver.AddRow("c", out c);

          solver.SetBounds(c, 1, 1);

     

          solver.FunctionEvaluator = (model, vid, values, isNew) =>

          {

            if (vid == g) {

              return 3 * values[x] + 4 * values[y];

            }

            else {

              return values[x] * values[x] + values[y] * values[y];

            }

          };

          // set the gradient evaluator...

        }

     

     

    Hope this helps, Nathan
  • 18 Nopember 2011 20:45
     
     

    Hi,

     

    Maybe this should be in a separate thread, but I cant find any information of how to setup Knitro connection for Solver Foundation. How did you set it up?

     

    I'm having a test problem I want to see the response time of Knitro before jumping into separable programming with head and feet ...

     

    Rgds, Haavard

  • 21 Nopember 2011 14:56
     
     
    1. The Knitro plug-in dll is included with the Solver Foundation install.
    2. You will need to obtain the Knitro solver separately.
    3. Configure your application (.Net or Excel) to tell Solver Foundation to use Knitro. There are some articles on how to do this on the MSF blog: http://blogs.msdn.com/b/solverfoundation/

    Nate

  • 06 Mei 2012 17:58
     
     
    Can someone send me the Knitro.dll file please...
  • 08 Mei 2012 19:01
     
     
    You need to get that from Ziena. That is the Knitro solver - not included with MSF.