{ "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "# Solving the Rosenbrock Problem\n", "\n", "## Introduction\n", "In this notebook we want to minimize the \"Rosenbrock fuction\", regarding [Wikipedia](https://en.wikipedia.org/wiki/Rosenbrock_function), the general form is given by\n", "\\begin{align*}\n", "f : \\mathbb{R}^2 &\\to \\mathbb{R}\\\\\n", "(x, y) &\\mapsto (a-x)^2 + b(y-x^2)^2\n", "\\end{align*}\n", "for $a,b \\in \\mathbb{R}$. This function has a global minimum at $x^* = (a, a^2)$ with $f(x^*) = 0$. In our case we\n", "set $a=1$ and $b=100$, so the global minimum is in $(1,1)$. The function looks like this:" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "import numpy as np\n", "import matplotlib.pyplot as plt\n", "# import rosenbrock function and its derivatives\n", "from oppy.tests.costfunctions import rosenbrock, gradient_rosenbrock, hessian_rosenbrock\n", "# import additional plotting modules\n", "from mpl_toolkits.mplot3d import Axes3D\n", "from matplotlib import cm\n", "# ==================\n", "# surface plot\n", "# ==================\n", "xDef = np.linspace(-2, 2, 100)\n", "yDef = np.linspace(-0.5, 3, 100)\n", "X,Y = np.meshgrid(xDef, yDef)\n", "\n", "rosenbrock_vec = lambda x,y: rosenbrock(np.array([x,y]))\n", "Z = rosenbrock_vec(X,Y)\n", "\n", "fig = plt.figure(1)\n", "ax = fig.add_subplot(111, projection='3d')\n", "surf = ax.plot_surface(X,Y,Z,cmap=cm.gist_rainbow)\n", "plt.title(\"Rosenbrock function: $a=1, b=100$\")\n", "plt.show()\n", "\n", "# ==================\n", "# contour plot\n", "# ==================\n", "xDef = np.linspace(-0.5, 2, 100)\n", "yDef = np.linspace(-0.5, 2, 100)\n", "X,Y = np.meshgrid(xDef, yDef)\n", "\n", "rosenbrock_vec = lambda x,y: rosenbrock(np.array([x,y]))\n", "Z = rosenbrock_vec(X,Y)\n", "\n", "plt.figure(2)\n", "cp = plt.contour(X,Y,Z,np.logspace(-0.5,3.5,7,base=10),cmap=cm.coolwarm)\n", "plt.clabel(cp, inline=True, fontsize=12)\n", "plt.plot(1.0,1.0, \"rx\", label=\"x* = (1,1)\")\n", "plt.title(\"Rosenbrock function: $a=1, b=100$\")\n", "plt.legend(loc=\"upper right\")\n", "plt.show()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "This function is pretty interesting to test our optimization algorithms, since the global minimum is inside a long, \n", "narrow, parabolic shaped flat valley. Most of the minimization methods can easily get inside this valley, but converging to the global minimum is usually pretty hard." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Minimization of the Rosenbrock function via oppy" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Descent method using armijo and wolfe line search\n", "First we need to import the missing modules:" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# import line searches\n", "from oppy.unconOpt import armijo, wolfe\n", "\n", "# import descent method\n", "from oppy.unconOpt import gradmethod\n", "# import options class\n", "from oppy.options import Options" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "In the first step we just want to optimize the Rosenbrock function with the descent method without a lot of specified options to get a first guess for the problem and see how it works." 
] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "#starting point:\n", "x0 = np.array([1,-0.5])\n", "\n", "# show intermediate steps while the algorithm is executed\n", "disp = False\n", "# option use=scale is default, so in the gradients method the descent direction is always normalized\n", "\n", "# set options for line search\n", "options_grad = Options(disp=disp)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Now we want to test our descent method:" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "result_dict = {}\n", "result_dict[\"res_grad\"] = gradmethod(rosenbrock, gradient_rosenbrock, x0, options_grad)\n", "# print result\n", "print(\"current point x = {}\".format(result_dict[\"res_grad\"].x))\n", "print(\"norm gradient = {}\".format(result_dict[\"res_grad\"].res[-1]))" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "We notice that the result is not good and in this case it is what we expected, due to the complex function. In the next step we want to work with some options for the program to get a better result. We also want to use the armijo line search. Therefore we have to set some initial parameters for the minimization methods:\n", "$x_0, c_1, \\beta, \\max_{\\mathrm{iter}}$. We parameterize the functions with these options using the `oppy.options.Options` class. All of the oppy function take this class as a parameter to specify the options, but the structures and keywords can differ.\n", "\n", "We will try to minimize the Rosenbrock function using gradient method with armijo line search." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "initstep = 1\n", "c1 = 1e-2\n", "# reduction parameter to reduce the stepwidth\n", "beta = 0.5\n", "# maximum number of iterations\n", "nmax_line = 30\n", "# define the options for the linesearch method\n", "options_armijo = Options(initstep=initstep, c1=c1, nmax=nmax_line, beta=beta)\n", "\n", "# define linesearch\n", "def linesearch_armijo(x, d):\n", " t, flag = armijo(rosenbrock, gradient_rosenbrock, x, d, options_armijo)\n", " return t, flag" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "In a next step we define the options dictonary which is given to respective optimization task." 
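] }, { "cell_type": "markdown", "metadata": {}, "source": [ "As a quick standalone illustration (a sketch; it assumes, as in the wrapper above, that `armijo` returns a step size and a success flag), we can invoke the Armijo line search directly at the starting point with the steepest-descent direction $d = -\\nabla f(x)$:" ] },
{ "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# sketch: call the Armijo backtracking line search on its own\n", "x_test = np.array([1.0, -0.5])\n", "d = -gradient_rosenbrock(x_test)  # steepest-descent direction\n", "t, flag = armijo(rosenbrock, gradient_rosenbrock, x_test, d, options_armijo)\n", "print(\"step size t = {}, flag = {}\".format(t, flag))\n", "# the accepted step should decrease the function value\n", "print(\"f(x) = {}, f(x + t*d) = {}\".format(rosenbrock(x_test), rosenbrock(x_test + t * d)))"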
] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "#starting point:\n", "x0 = np.array([1,-0.5])\n", "\n", "#maximum number of iterations for the descent method:\n", "nmax = 100\n", "\n", "# show intermediate steps while the algorithm is executed\n", "disp = False\n", "# option use=scale is default, so in the gradients method the descent direction is always normalized\n", "\n", "# set options for line search\n", "options_grad_armijo = Options(nmax=nmax, linesearch=linesearch_armijo, disp=disp)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Now we want to test our descent method:" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "result_dict[\"res_grad_armijo\"] = gradmethod(rosenbrock, gradient_rosenbrock, x0, options_grad_armijo)\n", "# print result\n", "print(\"current point x = {}\".format(result_dict[\"res_grad_armijo\"].x))\n", "print(\"norm gradient = {}\".format(result_dict[\"res_grad_armijo\"].res[-1]))" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "We can see that the result is better than in the first try but at least not good, caused by our theoretical observations from above. Maybe we have to\n", "change our line search method to wolfe line search:" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "c1 = 1e-4\n", "c2 = 9/10\n", "options_wolfe = Options(initstep=initstep, c1=c1, c2=c2, beta=beta, nmax=nmax_line)\n", "\n", "# set options for wolfe line search\n", "def linesearch_wolfe(x, d):\n", " t, flag = wolfe(rosenbrock, gradient_rosenbrock, x, d, options_wolfe)\n", " return t, flag\n", "\n", "# define options for gradient method using wolfe line search\n", "options_grad_wolfe = Options(nmax=nmax, disp=disp, linesearch=linesearch_wolfe)\n", "\n", "result_dict[\"res_grad_wolfe\"] = gradmethod(rosenbrock, gradient_rosenbrock, x0, options_grad_wolfe)\n", "print(\"current point x = {}\".format(result_dict[\"res_grad_wolfe\"].x))\n", "print(\"norm gradient = {}\".format(result_dict[\"res_grad_wolfe\"].res[-1]))\n", " " ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Although the error got slightly smaller, the result is still not really promising. Maybe we have to change some \n", "options regarding our line search. 
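] }, { "cell_type": "markdown", "metadata": {}, "source": [ "The Wolfe line search enforces the sufficient decrease condition $f(x+td) \\le f(x) + c_1 t \\, \\nabla f(x)^T d$ together with the curvature condition $\\nabla f(x+td)^T d \\ge c_2 \\, \\nabla f(x)^T d$. As a small sketch (reusing the wrapper and the parameters $c_1, c_2$ defined above), we can check both conditions for the step returned at the starting point:" ] },
{ "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# sketch: verify the two Wolfe conditions for the returned step size\n", "x_test = np.array([1.0, -0.5])\n", "d = -gradient_rosenbrock(x_test)\n", "t, flag = linesearch_wolfe(x_test, d)\n", "# directional derivative at x in direction d\n", "slope = gradient_rosenbrock(x_test) @ d\n", "print(\"sufficient decrease: {}\".format(\n", "    rosenbrock(x_test + t * d) <= rosenbrock(x_test) + c1 * t * slope))\n", "print(\"curvature condition: {}\".format(\n", "    gradient_rosenbrock(x_test + t * d) @ d >= c2 * slope))"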
] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Although the error got slightly smaller, the result is still not really promising. Maybe we have to change some\n", "options of our line search. We will try it again with Wolfe and increase the maximum number of\n", "iterations for the **line search to 200** and for the **algorithm itself to 1000**:" ] },
{ "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "initstep = 1\n", "# reduction parameter for the step width\n", "beta = 0.5\n", "# new maximum number of iterations for the line search\n", "nmax_line_new = 200\n", "options_wolfe_new = Options(initstep=initstep, c1=c1, c2=c2, beta=beta, nmax=nmax_line_new)\n", "# set options for the Wolfe line search\n", "def linesearch_wolfe_new(x, d):\n", "    t, flag = wolfe(rosenbrock, gradient_rosenbrock, x, d, options_wolfe_new)\n", "    return t, flag\n", "# new maximum number of iterations for the descent method\n", "nmax_new = 1000\n", "\n", "# define options for the gradient method using the new Wolfe line search\n", "options_grad_wolfe_new = Options(nmax=nmax_new, disp=disp, linesearch=linesearch_wolfe_new)\n", "\n", "results_grad_wolfe_new = gradmethod(rosenbrock, gradient_rosenbrock, x0, options_grad_wolfe_new)\n", "print(\"current point x = {}\".format(results_grad_wolfe_new.x))\n", "print(\"norm gradient = {}\".format(results_grad_wolfe_new.res[-1]))" ] },
{ "cell_type": "markdown", "metadata": {}, "source": [ "Although the result got better, in general we do not want such a high limit on the maximum number of\n", "iterations, because the algorithm then consumes a lot of time. So we will try a more suitable optimization method:\n", "**Newton's method**." ] },
{ "cell_type": "markdown", "metadata": {}, "source": [ "### Newton's method using Armijo and Wolfe line search\n", "We have to load a few more modules. To compare the error history, and thus the convergence rate, we keep the\n", "main options such as tolerance, maximum number of iterations, etc. the same." ] },
{ "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "from oppy.unconOpt import newton\n", "# define the options using Armijo and Wolfe line search\n", "options_newton_armijo = Options(nmax=nmax, disp=disp, linesearch=linesearch_armijo)\n", "options_newton_wolfe = Options(nmax=nmax, disp=disp, linesearch=linesearch_wolfe)\n" ] },
{ "cell_type": "markdown", "metadata": {}, "source": [ "Let us see how Newton's method performs:" ] },
{ "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "result_dict[\"res_newton_armijo\"] = newton(rosenbrock, gradient_rosenbrock, hessian_rosenbrock,\n", "                                          x0, options_newton_armijo)\n", "print(\"Newton with Armijo\")\n", "print(\"current point x = {}\".format(result_dict[\"res_newton_armijo\"].x))\n", "print(\"norm gradient = {}\".format(result_dict[\"res_newton_armijo\"].res[-1]))\n", "\n", "result_dict[\"res_newton_wolfe\"] = newton(rosenbrock, gradient_rosenbrock, hessian_rosenbrock,\n", "                                         x0, options_newton_wolfe)\n", "print(\"\\nNewton with Wolfe\")\n", "print(\"current point x = {}\".format(result_dict[\"res_newton_wolfe\"].x))\n", "print(\"norm gradient = {}\".format(result_dict[\"res_newton_wolfe\"].res[-1]))" ] },
{ "cell_type": "markdown", "metadata": {}, "source": [ "We can clearly see that Newton's method converges with both the Armijo and the Wolfe line search. The error is\n", "(numerically) zero and we obtain the theoretical result."
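] }, { "cell_type": "markdown", "metadata": {}, "source": [ "To illustrate where this fast convergence comes from, here is a minimal sketch of a damped Newton iteration in plain NumPy (not oppy's implementation): in every step the search direction $d$ solves the linear system $H(x)\\, d = -\\nabla f(x)$, and the step is damped with the Armijo wrapper from above. It assumes that `hessian_rosenbrock` returns the $2 \\times 2$ Hessian matrix and that the Hessian is invertible along the iterates." ] },
{ "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# sketch of a damped Newton iteration (not oppy's implementation)\n", "x_k = np.array([1,-0.5])\n", "for k in range(5):\n", "    g = gradient_rosenbrock(x_k)\n", "    H = hessian_rosenbrock(x_k)\n", "    # Newton direction: solve H d = -g (assumes H is invertible here)\n", "    d = np.linalg.solve(H, -g)\n", "    # damping via the Armijo line search wrapper defined above\n", "    t, flag = linesearch_armijo(x_k, d)\n", "    x_k = x_k + t * d\n", "print(\"after 5 sketched Newton steps: x = {}\".format(x_k))\n", "print(\"norm gradient = {}\".format(np.linalg.norm(gradient_rosenbrock(x_k))))"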
] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Compare different optimization methods using the Wolfe line search\n", "Now we want to test some other optimization methods on the Rosenbrock function and compare them with each other. To keep the comparison clear we will only use the Wolfe line search and omit the Armijo line search. We will keep the same options as above." ] },
{ "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "from oppy.unconOpt import nonlinear_cg, quasinewton\n", "# options for the nonlinear CG method\n", "options_fr_wolfe = Options(nmax=nmax, disp=disp, linesearch=linesearch_wolfe)\n", "\n", "# options for the quasi-Newton method\n", "options_quasi_wolfe = Options(nmax=nmax, linesearch=linesearch_wolfe, disp=disp)\n", "\n", "# Fletcher-Reeves/Polak-Ribiere (nonlinear CG) method\n", "result_dict[\"res_nonlinear_cg_wolfe\"] = nonlinear_cg(rosenbrock, gradient_rosenbrock, x0, options_fr_wolfe)\n", "\n", "# BFGS (quasi-Newton) method\n", "result_dict[\"res_quasi_wolfe\"] = quasinewton(rosenbrock, gradient_rosenbrock, x0,\n", "                                             options_quasi_wolfe)" ] },
{ "cell_type": "markdown", "metadata": {}, "source": [ "## Conclusion: Comparison of convergence rates\n", "\n", "In the cell below, all results obtained with the Wolfe line search are plotted in one graph." ] },
{ "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# =============================================================\n", "# plot the norm of the gradient for all schemes\n", "# =============================================================\n", "\n", "plt.figure(figsize=(15,10))\n", "\n", "plt.semilogy(result_dict[\"res_grad_wolfe\"].res, label=\"res_grad_wolfe\")\n", "\n", "plt.semilogy(result_dict[\"res_newton_wolfe\"].res, label=\"res_newton_wolfe\")\n", "\n", "plt.semilogy(result_dict[\"res_quasi_wolfe\"].res, label=\"res_quasi_wolfe\")\n", "\n", "plt.semilogy(result_dict[\"res_nonlinear_cg_wolfe\"].res, label=\"res_nonlinear_cg_wolfe\")\n", "\n", "plt.legend(loc=\"lower right\")\n", "plt.title(\"Convergence results for different methods\")\n", "plt.ylabel(\"Norm of the gradient\")\n", "plt.xlabel(\"Number of iterations\")\n", "plt.show()" ] } ], "metadata": { "kernelspec": { "display_name": "Python 3 (ipykernel)", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.9.7" }, "latex_envs": { "LaTeX_envs_menu_present": true, "autoclose": false, "autocomplete": true, "bibliofile": "biblio.bib", "cite_by": "apalike", "current_citInitial": 1, "eqLabelWithNumbers": true, "eqNumInitial": 1, "hotkeys": { "equation": "Ctrl-E", "itemize": "Ctrl-I" }, "labels_anchors": false, "latex_user_defs": false, "report_style_numbering": false, "user_envs_cfg": false }, "toc": { "base_numbering": 1, "nav_menu": {}, "number_sections": true, "sideBar": true, "skip_h1_title": false, "title_cell": "Table of Contents", "title_sidebar": "Contents", "toc_cell": false, "toc_position": { "height": "calc(100% - 180px)", "left": "10px", "top": "150px", "width": "419.105px" }, "toc_section_display": true, "toc_window_display": true } }, "nbformat": 4, "nbformat_minor": 4 }