\documentclass[11pt]{article}

\usepackage{amsmath}
\usepackage{amsfonts}
\usepackage{amssymb}
\usepackage{indentfirst}

\newcommand{\norm}[1]{\lVert #1 \rVert}
\newcommand{\real}{\mathbb{R}}
\newcommand{\set}[1]{\{\, #1 \,\}}
\newcommand{\inner}[1]{\langle #1 \rangle}
\newcommand{\opor}{\mathbin{\rm or}}
\newcommand{\opand}{\mathbin{\rm and}}

\numberwithin{equation}{section}
\numberwithin{figure}{section}

\begin{document}

\title{Stat 8054 Homework 1 Solutions}
\author{Charles J. Geyer}
\maketitle

\section{Problem 7.4 in Fletcher}

\subsection{Solve Graphically}

\begin{figure}[hbp]
\begin{center}
<<fig=TRUE>>=
par(mar = c(3, 3, 1, 1) + 0.1)
curve(sqrt((x - 1)^3), from = 1, to = 3,
    ylim = c(- sqrt(2^3), sqrt(2^3)), xlab = "", ylab = "")
curve(- sqrt((x - 1)^3), from = 1, to = 3, add = TRUE)
@
\end{center}
\caption{Locus of points satisfying $(x - 1)^3 = y^2$.}
\label{fig:1a}
\end{figure}

Figure~\ref{fig:1a} shows the feasible region, the set of points satisfying
$(x - 1)^3 = y^2$.
Clearly, the point of this set closest to the origin is where $x = 1$ and
$y = 0$.

\subsection{Eliminate $\boldsymbol{y}$.}

After eliminating $y$ the objective function is
$f(x) = x^2 + (x - 1)^3$.
The graph of this function is shown in Figure~\ref{fig:1b}.
\begin{figure}[htbp]
\begin{center}
<<fig=TRUE>>=
par(mar = c(3, 3, 1, 1) + 0.1)
curve(x^2 + (x - 1)^3, from = -1, to = 3, xlab = "", ylab = "")
@
\end{center}
\caption{Graph of $f(x)$.}
\label{fig:1b}
\end{figure}
This function has no stationary points, but more can be said.
The function makes sense for all real $x$, not just for the $x \ge 1$
required in the original problem (for $x < 1$ there is no $y$ satisfying
$(x - 1)^3 = y^2$).
Moreover, it is unbounded below, going to $- \infty$ as $x \to - \infty$.
The moral of the story: eliminating variables doesn't eliminate the
constraints; one still needs to consider them.
Here the constraint $x \ge 1$ is implicit in the original constraint
$(x - 1)^3 = y^2$, and elimination discards it, so elimination doesn't work.

\subsection{Eliminate $\boldsymbol{x}$.}

After eliminating $x$ the objective function is
$f(y) = (\sqrt[3]{y^2} + 1)^2 + y^2$.
The graph of this function is shown in Figure~\ref{fig:1c}.
\begin{figure}[htbp]
\begin{center}
<<fig=TRUE>>=
par(mar = c(3, 3, 1, 1) + 0.1)
curve(((x^2)^(1/3) + 1)^2 + x^2, from = -2, to = 2, xlab = "", ylab = "")
@
\end{center}
\caption{Graph of $f(y)$.}
\label{fig:1c}
\end{figure}
This function does achieve its minimum at $y = 0$ (strictly speaking, $y = 0$
is a cusp rather than a stationary point, because $d y^{2/3} / d y$ blows up
at $y = 0$, but it is the minimizer nonetheless).
So this time the method of elimination does find the solution.
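As a quick numerical check (mine, not part of Fletcher's problem), we can
hand the eliminated-$x$ objective to R's \texttt{optimize}; it should report
a minimizer near $y = 0$ with objective value near $1$.
<<>>=
# numerically minimize f(y) = ((y^2)^(1/3) + 1)^2 + y^2
# over an interval containing the solution y = 0
optimize(function(y) ((y^2)^(1/3) + 1)^2 + y^2, interval = c(-2, 2))
@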
\section{Problem 7.6 in Fletcher}

\subsection{Original Variables}

First we solve the problem the right way, using the original variables.
I suppose this isn't really part of the problem, since this is a Chapter~7
problem and we need material from Chapter~9 in this section of our solution,
but it is worthwhile to have a standard of comparison.

\subsubsection{First-Order Conditions} \label{sec:1st-2a}

The Lagrangian is
$$
   f(x_1, x_2) - \lambda_1 x_1 - \lambda_2 x_2,
$$
and the Kuhn-Tucker conditions are
\begin{subequations}
\begin{align}
   f_i(x) = \lambda_i, & \qquad i = 1, 2 \label{eq:kt1a} \\
   x_i \ge 0, & \qquad i = 1, 2 \\
   \lambda_i \ge 0, & \qquad i = 1, 2 \\
   x_i \lambda_i = 0, & \qquad i = 1, 2
\end{align}
\end{subequations}
where $f_i(x) = \partial f(x) / \partial x_i$.
Eliminating the Lagrange multipliers, this simplifies to
\begin{subequations}
\begin{align}
   f_i(x) \ge 0, & \qquad i = 1, 2 \\
   x_i \ge 0, & \qquad i = 1, 2 \\
   \text{$x_i = 0$ or $f_i(x) = 0$}, & \qquad i = 1, 2
\end{align}
\end{subequations}
At the point $x = 0$, the only point of interest in this problem, this
further simplifies to
\begin{equation}
   f_i(0) \ge 0, \qquad i = 1, 2
\end{equation}

\subsubsection{Second-Order Conditions} \label{sec:2nd-2a}

Now we consider the second-order conditions, only at the point $x = 0$.
From \eqref{eq:kt1a} the Lagrange multipliers are the components of the
gradient.
So the $i$-th constraint is strongly active (at $x = 0$) if and only if
$f_i(0) > 0$.
The set of their indices is
$$
   A_+ = \set{ i : f_i(0) > 0 }
$$
Define
$$
   G = \set{ s : \text{$s_i = 0$, $i \in A_+$ and $s_i \ge 0$, otherwise} }
$$
Then the second-order necessary conditions (Theorem~9.3.1 in Fletcher) are
$$
   s^T W s \ge 0, \qquad s \in G,
$$
where
$$
   W = \nabla^2 f(0)
$$
(since the constraints are linear, there is no difference between second
derivatives of the objective function and second derivatives of the
Lagrangian).
And the second-order sufficient conditions (Theorem~9.3.2 in Fletcher) are
$$
   s^T W s > 0, \qquad s \in G, \ s \neq 0.
$$

\subsection{New Variables}

Now we change the problem to finding an unconstrained minimum of the function
$$
   g(y_1, y_2) = f(y_1^2, y_2^2).
$$

\subsubsection{First-Order Conditions}

$$
   g_i(y_1, y_2) = 2 y_i \cdot f_i(y_1^2, y_2^2).
$$
Thus $y = 0$ is always a solution of the first-order necessary conditions
for this problem, whether or not $x = 0$ is a solution of the first-order
conditions in Section~\ref{sec:1st-2a}.
Thus $y = 0$ may be a ``spurious solution''.

\subsubsection{Second-Order Conditions}

By the chain rule again,
$$
   g_{i j}(y_1, y_2)
   =
   \begin{cases}
   2 f_i(y_1^2, y_2^2) + 4 y_i^2 f_{i i}(y_1^2, y_2^2) & i = j \\
   4 y_i y_j f_{i j}(y_1^2, y_2^2) & i \neq j
   \end{cases}
$$
hence
$$
   g_{i j}(0, 0)
   =
   \begin{cases}
   2 f_i(0, 0) & i = j \\
   0 & i \neq j
   \end{cases}
$$
Since this does not involve second derivatives of the original objective
function $f$, it has nothing to do with the second-order conditions derived
in Section~\ref{sec:2nd-2a}.
In fact, the second-order necessary conditions, that $\nabla^2 g(0)$ be
positive semidefinite, are equivalent to the first-order necessary
conditions for the original problem derived in Section~\ref{sec:1st-2a}.
The second-order sufficient conditions are $f_i(0) > 0$, $i = 1$,~2.
These are rather odd, because we usually don't talk about first-order
sufficient conditions, but these are first-order when put in terms of the
original problem, and they are sufficient for $x = 0$ to be a local optimum.
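To see a spurious solution in action, here is a minimal numerical sketch (my
example, not Fletcher's): take $f(x) = (x_1 - 1)^2 + (x_2 - 1)^2$, whose
constrained minimizer is $x = (1, 1)$, not $x = 0$.
A quasi-Newton method started at $y = 0$ stops there immediately, because
$\nabla g(0) = 0$.
<<>>=
f <- function(x) (x[1] - 1)^2 + (x[2] - 1)^2
g <- function(y) f(y^2)
# gradient of g, using g_i(y) = 2 y_i f_i(y_1^2, y_2^2)
gr <- function(y) 2 * y * 2 * (y^2 - 1)
# started at the spurious point y = 0, BFGS declares convergence at once,
# although the constrained minimum is f(1, 1) = 0, attained at y = (1, 1)
optim(c(0, 0), g, gr, method = "BFGS")
@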
\section{Problem 9.2 in Fletcher}

\begin{figure}[htbp]
\begin{center}
<<fig=TRUE>>=
a <- exp(1)   # any a >= 1 gives the same picture
par(mar = c(3, 3, 1, 1) + 0.1)
curve(x^2, from = 0, to = 1, xlab = "", ylab = "")
lines(c(0, 0, 1), c(0, 1, 1))
foo <- seq(0, 1, 0.1)
segments(0 * foo, foo, 1 - foo, 1 - 0 * foo, lty = "dashed")
segments(foo, 0 * foo, 1 - 0 * foo, 1 - foo, lty = "dashed")
points(1 / 2, 1 / 4, pch = 19)
@
\end{center}
\caption{Constraint set (solid lines) and contours of the objective function
(dashed lines).  Bullet shows optimal solution.}
\label{fig:3a}
\end{figure}

Figure~\ref{fig:3a} shows the situation when $a \ge 1$.
We see that the optimal solution is somewhere along the curve $x_1^2 = x_2$
where the tangent vector to the curve is parallel to the contours of the
objective function.

\begin{figure}[htbp]
\begin{center}
<<fig=TRUE>>=
a <- 0.75
par(mar = c(3, 3, 1, 1) + 0.1)
curve(x^2, from = 0, to = a, xlab = "", ylab = "", ylim = c(0, 1))
lines(c(0, 0, a, a), c(0, 1, 1, a^2))
foo <- seq(0, 1, 0.1)
segments(0 * foo, foo, 1 - foo, 1 - 0 * foo, lty = "dashed")
segments(foo, 0 * foo, 1 - 0 * foo, 1 - foo, lty = "dashed")
points(1 / 2, 1 / 4, pch = 19)
@
\end{center}
\caption{Constraint set (solid lines) and contours of the objective function
(dashed lines).  Bullet shows optimal solution.}
\label{fig:3b}
\end{figure}

Figure~\ref{fig:3b} (page~\pageref{fig:3b}) shows the situation when $a < 1$
but still moderately large.
We see that the optimal solution is still somewhere along the curve
$x_1^2 = x_2$ where the tangent vector to the curve is parallel to the
contours of the objective function.

\begin{figure}[htbp]
\begin{center}
<<fig=TRUE>>=
a <- exp(- 1)
par(mar = c(3, 3, 1, 1) + 0.1)
curve(x^2, from = 0, to = a, xlab = "", ylab = "", ylim = c(0, 1))
lines(c(0, 0, a, a), c(0, 1, 1, a^2))
foo <- seq(0, 1, 0.1)
segments(0 * foo, foo, 1 - foo, 1 - 0 * foo, lty = "dashed")
segments(foo, 0 * foo, 1 - 0 * foo, 1 - foo, lty = "dashed")
points(a, a^2, pch = 19)
@
\end{center}
\caption{Constraint set (solid lines) and contours of the objective function
(dashed lines).  Bullet shows optimal solution.}
\label{fig:3c}
\end{figure}

Figure~\ref{fig:3c} (page~\pageref{fig:3c}) shows the situation when $a < 1$
and quite small.
We see that the optimal solution is now $(a, a^2)$.

The Lagrangian is
$$
   L(x, \lambda)
   =
   - x_1 + x_2 - \lambda_1 x_1 - \lambda_2 (a - x_1)
   - \lambda_3 x_2 - \lambda_4 (1 - x_2) - \lambda_5 (x_2 - x_1^2)
$$
and the Kuhn-Tucker conditions are
\begin{enumerate}
\item \textbf{gradient.}
\begin{align*}
   - 1 - \lambda_1 + \lambda_2 + 2 \lambda_5 x_1 & = 0 \\
   1 - \lambda_3 + \lambda_4 - \lambda_5 & = 0
\end{align*}
\item \textbf{primal feasibility.}
\begin{gather*}
   0 \le x_1 \le a \\
   0 \le x_2 \le 1 \\
   x_2 - x_1^2 \ge 0
\end{gather*}
\item \textbf{dual feasibility.} $\lambda_i \ge 0$ for all $i$.
\item \textbf{complementary slackness.}
\begin{align*}
   \lambda_1 = 0 \opor x_1 = 0 \\
   \lambda_2 = 0 \opor x_1 = a \\
   \lambda_3 = 0 \opor x_2 = 0 \\
   \lambda_4 = 0 \opor x_2 = 1 \\
   \lambda_5 = 0 \opor x_2 = x_1^2
\end{align*}
\end{enumerate}
From the figures, the only active constraint is $x_2 = x_1^2$ in the cases
shown in Figures~\ref{fig:3a} and~\ref{fig:3b}, and the only active
constraints are $x_2 = x_1^2$ and $x_1 = a$ in the case shown in
Figure~\ref{fig:3c}.
Thus only $\lambda_5$ can be nonzero in the first case, and only $\lambda_2$
and $\lambda_5$ in the second case.

(Case I: $x_2 = x_1^2$ and $\lambda_i = 0$, $i \neq 5$).
From the gradient condition we must have $\lambda_5 = 1$ and
$- 1 + 2 x_1 = 0$.
Hence $x_1 = 1 / 2$ and $x_2 = 1 / 4$.
We go back and put this point on the figures.

(Case II: $x_2 = x_1^2$ and $x_1 = a$ and $\lambda_i = 0$,
$i \in \{ 1, 3, 4 \}$).
The gradient condition becomes
\begin{align*}
   - 1 + \lambda_2 + 2 \lambda_5 x_1 & = 0 \\
   1 - \lambda_5 & = 0
\end{align*}
We are checking the point $(a, a^2)$.
The second gradient equation gives $\lambda_5 = 1$, and then the first
becomes $- 1 + \lambda_2 + 2 a = 0$, or $\lambda_2 = 1 - 2 a$.
In order for dual feasibility to hold, we must have $\lambda_2 \ge 0$, that
is, $a \le 1 / 2$.
This makes sense from the pictures, because we switch from Case I to Case II
exactly when the point $(a, a^2)$ passes $(1/2, 1/4)$.
The check that the rest of the Kuhn-Tucker conditions hold is trivial (they
were established by earlier considerations).
We go back and put this point on the figure.
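As a sanity check (mine, not part of the problem), a brute-force search over
a fine grid of the feasible region should reproduce both regimes: the
minimizer of $- x_1 + x_2$ is near $(1/2, 1/4)$ when $a = 3/4 > 1/2$ and
near $(a, a^2)$ when $a = e^{-1} < 1/2$.
<<>>=
check <- function(a) {
    grid <- expand.grid(x1 = seq(0, a, length = 201),
        x2 = seq(0, 1, length = 201))
    feas <- grid[grid$x2 >= grid$x1^2, ]  # keep only feasible points
    feas[which.min(feas$x2 - feas$x1), ]  # minimize - x1 + x2
}
check(3 / 4)     # expect approximately (1/2, 1/4)
check(exp(- 1))  # expect approximately (a, a^2)
@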
\section{Problem 9.9 in Fletcher}

\subsection{(i)}

The Lagrangian is
$$
   L(x, \lambda)
   =
   - x_2 - \lambda_1 \bigl[ (3 - x_1)^3 - (x_2 - 2) \bigr]
   - \lambda_2 \bigl[ 3 x_1 + x_2 - 9 \bigr]
$$
The Kuhn-Tucker conditions are
\begin{enumerate}
\item \textbf{gradient.}
\begin{subequations}
\begin{align}
   3 \lambda_1 (3 - x_1)^2 - 3 \lambda_2 & = 0 \label{eq:4:grad-a} \\
   - 1 + \lambda_1 - \lambda_2 & = 0 \label{eq:4:grad-b}
\end{align}
\end{subequations}
\item \textbf{primal feasibility.}
\begin{subequations}
\begin{align}
   (3 - x_1)^3 - (x_2 - 2) & \ge 0 \label{eq:4:primal-a} \\
   3 x_1 + x_2 - 9 & \ge 0 \label{eq:4:primal-b}
\end{align}
\end{subequations}
\item \textbf{dual feasibility.} $\lambda_i \ge 0$ for all $i$.
\item \textbf{complementary slackness.}
\begin{align*}
   \lambda_1 = 0 \opor (3 - x_1)^3 - (x_2 - 2) = 0 \\
   \lambda_2 = 0 \opor 3 x_1 + x_2 = 9
\end{align*}
\end{enumerate}
Since there are two constraints, there are four conceivable active sets:
$\varnothing$, $\{ 1 \}$, $\{ 2 \}$, and $\{ 1, 2 \}$.

(Case I: neither constraint is active).
Then complementary slackness implies $\lambda_1 = \lambda_2 = 0$.
But then \eqref{eq:4:grad-b} cannot hold, so this case is impossible.

(Case II: only the first constraint is active).
Then complementary slackness implies $\lambda_2 = 0$.
Then \eqref{eq:4:grad-b} implies $\lambda_1 = 1$.
Then \eqref{eq:4:grad-a} implies $x_1 = 3$.
Now since \eqref{eq:4:primal-a} holds with equality, we have $x_2 = 2$.
This gives one putative solution (satisfying the first-order necessary
conditions), $(3, 2)$.

(Case III: only the second constraint is active).
Then complementary slackness implies $\lambda_1 = 0$.
Then \eqref{eq:4:grad-b} implies $\lambda_2 = - 1$.
But that violates dual feasibility, so this case is also impossible.

(Case IV: both constraints are active).
Then $x_2 = 9 - 3 x_1$, and
$$
   (3 - x_1)^3 - (x_2 - 2) = (3 - x_1)^3 - (9 - 3 x_1 - 2) = 0
$$
We have a cubic equation to solve, which doesn't look easy.
\begin{figure}[htbp]
\begin{center}
<<fig=TRUE>>=
par(mar = c(3, 3, 1, 1) + 0.1)
curve((3 - x)^3 - (9 - 3 * x - 2), from = -1, to = 6, xlab = "", ylab = "")
abline(h = 0)
@
\end{center}
\caption{Graph of the function $x \mapsto (3 - x)^3 - (9 - 3 x - 2)$.}
\label{fig:4i}
\end{figure}
Figure~\ref{fig:4i} (page~\pageref{fig:4i}) shows the graph of this cubic.
It seems to have roots at 2 and 5 (confirmed numerically below).
Hence $x_1 = 2$ or $x_1 = 5$, and $x_2$ is given by $x_2 = 9 - 3 x_1$.
This gives two more putative solutions, $(2, 3)$ and $(5, -6)$.
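Expanding $(3 - x)^3 - (9 - 3 x - 2)$ gives the polynomial
$- x^3 + 9 x^2 - 24 x + 20$, so \texttt{polyroot} can confirm what the graph
suggests: a double root at 2 and a simple root at 5.
<<>>=
# coefficients of 20 - 24 x + 9 x^2 - x^3 in increasing order of powers
polyroot(c(20, -24, 9, -1))
@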
\subsection{(ii)}

\begin{figure}[htbp]
\begin{center}
<<fig=TRUE>>=
# (3 - x_1)^3 - (x_2 - 2) = 0
# 3 x_1 + x_2 = 9
par(mar = c(3, 3, 1, 1) + 0.1)
curve((3 - x)^3 + 2, from = 0, to = 5, xlab = "", ylab = "")
curve(9 - 3 * x, from = 0, to = 5, add = TRUE)
x <- c(2, 5)
points(x, 9 - 3 * x, pch = 19)
points(3, 2, pch = 19)
abline(h = seq(-9, 29, 2), lty = "dashed")
@
\end{center}
\caption{Constraint set (solid lines) and contours of the objective function
(dashed lines).  Constraint set is unbounded to the left, the boundaries
continuing to infinity.  Bullets show putative optimal solutions.}
\label{fig:4ii}
\end{figure}

Figure~\ref{fig:4ii} shows the feasible region and our putative solutions
(which satisfy the first-order necessary conditions).
Clearly, none of the putative solutions is even locally optimal.
In fact the argmin is empty: $x_2$ is unbounded above on the feasible
region, so the infimum of the objective function $- x_2$ over the feasible
region is $- \infty$.

\subsection{(iii)}

Now the Kuhn-Tucker conditions are even more complicated.
The Lagrangian is
$$
   L(x, \lambda)
   =
   - x_2 - \lambda_1 \bigl[ (3 - x_1)^3 - (x_2 - 2) \bigr]
   - \lambda_2 \bigl[ 3 x_1 + x_2 - 9 \bigr]
   - \lambda_3 \bigl[ 2 x_1 - 3 x_2 \bigr]
$$
The Kuhn-Tucker conditions are
\begin{enumerate}
\item \textbf{gradient.}
\begin{subequations}
\begin{align}
   3 \lambda_1 (3 - x_1)^2 - 3 \lambda_2 - 2 \lambda_3 & = 0
   \label{eq:4iii:grad-a} \\
   - 1 + \lambda_1 - \lambda_2 + 3 \lambda_3 & = 0 \label{eq:4iii:grad-b}
\end{align}
\end{subequations}
\item \textbf{primal feasibility.}
\begin{subequations}
\begin{align}
   (3 - x_1)^3 - (x_2 - 2) & \ge 0 \label{eq:4iii:primal-a} \\
   3 x_1 + x_2 - 9 & \ge 0 \label{eq:4iii:primal-b} \\
   2 x_1 - 3 x_2 & \ge 0 \label{eq:4iii:primal-c}
\end{align}
\end{subequations}
\item \textbf{dual feasibility.} $\lambda_i \ge 0$ for all $i$.
\item \textbf{complementary slackness.}
\begin{align*}
   \lambda_1 = 0 \opor (3 - x_1)^3 - (x_2 - 2) = 0 \\
   \lambda_2 = 0 \opor 3 x_1 + x_2 = 9 \\
   \lambda_3 = 0 \opor 2 x_1 - 3 x_2 = 0
\end{align*}
\end{enumerate}
Now there are 8 conceivable active sets to consider.

(Case I: empty active set).
Again \eqref{eq:4iii:grad-b} cannot be satisfied when
$\lambda_1 = \lambda_2 = \lambda_3 = 0$.
So this case is impossible.

(Case II: only \eqref{eq:4iii:primal-a} is active).
By the same analysis as before, $(3, 2)$ is a solution of the Kuhn-Tucker
conditions if it satisfies \eqref{eq:4iii:primal-c}.
It does (with equality, as it happens).

(Case III: only \eqref{eq:4iii:primal-b} is active).
By the same analysis as before, this case is impossible.

(Case IV: only \eqref{eq:4iii:primal-a} and \eqref{eq:4iii:primal-b} are
active).
By the same analysis as before, $(2, 3)$ and $(5, -6)$ are solutions of the
Kuhn-Tucker conditions if they satisfy \eqref{eq:4iii:primal-c}.
Only $(5, -6)$ does.

(Case V: only \eqref{eq:4iii:primal-c} is active).
Then $\lambda_1 = \lambda_2 = 0$.
By \eqref{eq:4iii:grad-b}, $\lambda_3 = 1 / 3$.
But now \eqref{eq:4iii:grad-a} cannot be satisfied.
So this case is impossible.

(Case VI: only \eqref{eq:4iii:primal-a} and \eqref{eq:4iii:primal-c} are
active).
Then $x_2 = 2 x_1 / 3$, and
$$
   (3 - x_1)^3 - (x_2 - 2) = (3 - x_1)^3 - (2 x_1 / 3 - 2) = 0
$$
We have another cubic equation to solve, which doesn't look easy, but it has
probably been arranged by Fletcher to have a nice solution.
Throwing it at Mathematica (or at R, as sketched below), the only real root
is $x_1 = 3$.
Again we get $(3, 2)$ as a solution of the Kuhn-Tucker conditions if it is
primal feasible, which it is.
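Rather than Mathematica, one can also hand the Case VI cubic to R (a quick
check, mine, not part of the problem): the cubic changes sign between
$x_1 = 0$ and $x_1 = 6$, so \texttt{uniroot} brackets its single real root.
<<>>=
# the Case VI cubic: (3 - x1)^3 - (2 x1 / 3 - 2) = 0
uniroot(function(x) (3 - x)^3 - (2 * x / 3 - 2), c(0, 6))$root
@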
(Case VII: only \eqref{eq:4iii:primal-b} and \eqref{eq:4iii:primal-c} are
active).
Then $x_2 = 2 x_1 / 3$, and
$$
   3 x_1 + x_2 - 9 = 3 x_1 + 2 x_1 / 3 - 9 = 0
$$
Hence $x_1 = 27 / 11$ and $x_2 = 18 / 11$ is a solution of the Kuhn-Tucker
conditions if it satisfies \eqref{eq:4iii:primal-a}, which it does, and the
gradient and dual feasibility conditions, which we now check.
We have $\lambda_1 = 0$, so we have to solve the linear equations
\begin{align*}
   - 3 \lambda_2 - 2 \lambda_3 & = 0 \\
   - 1 - \lambda_2 + 3 \lambda_3 & = 0
\end{align*}
From the latter $\lambda_2 = 3 \lambda_3 - 1$, so the former becomes
$$
   - 3 (3 \lambda_3 - 1) - 2 \lambda_3 = - 11 \lambda_3 + 3 = 0
$$
or $\lambda_3 = 3 / 11$ and $\lambda_2 = 9 / 11 - 1 < 0$.
So this point does not satisfy dual feasibility and is not a putative
solution.

(Case VIII: all three constraints are active).
Three equations in two unknowns are probably inconsistent, and indeed
they are.

(Summary).
$(3, 2)$ and $(5, -6)$ are solutions of the Kuhn-Tucker conditions.
\begin{figure}[htbp]
\begin{center}
<<fig=TRUE>>=
# (3 - x_1)^3 - (x_2 - 2) >= 0
# 3 x_1 + x_2 >= 9
# 2 x_1 - 3 x_2 >= 0
par(mar = c(3, 3, 1, 1) + 0.1)
plot(c(3, 27 / 11, 5), c(2, 18 / 11, -6), type = "l", xlab = "", ylab = "")
curve((3 - x)^3 + 2, from = 3, to = 5, add = TRUE)
points(3, 2, pch = 19)
points(5, -6, pch = 19)
abline(h = seq(-6, 2, 0.5), lty = "dashed")
@
\end{center}
\caption{Constraint set (solid lines) and contours of the objective function
(dashed lines).  Unlike in part (ii), the constraint set is now bounded.
Bullets show solutions of the Kuhn-Tucker equations.}
\label{fig:4iii}
\end{figure}
Figure~\ref{fig:4iii} (page~\pageref{fig:4iii}) shows the feasible region
and the solutions of the Kuhn-Tucker equations.
Clearly, the one at the top, $(3, 2)$, is optimal, and the one at the
bottom, $(5, -6)$, is not.

\section{OLS with Nonnegative Parameters}

\subsection{Problem Statement}

For the problem
\begin{align*}
   \text{minimize} & \quad \beta \mapsto \norm{y - M \beta}^2 \\
   \text{subject to} & \quad \beta \ge 0
\end{align*}
where $y$ is a known vector and $M$ is a known matrix and where the
constraint is interpreted coordinatewise ($\beta_i \ge 0$ for all $i$), show
that there is a unique global minimizer and that it can be found using the
method of Lagrange multipliers (Kuhn-Tucker conditions).
Cite any theorems used.

\subsection{Solution}

From the analysis in Section~\ref{sec:1st-2a} we know that the Kuhn-Tucker
conditions are
\begin{subequations}
\begin{align}
   f_i(\beta) \ge 0, & \qquad \forall i \\
   \beta_i \ge 0, & \qquad \forall i \\
   \text{$\beta_i = 0$ or $f_i(\beta) = 0$}, & \qquad \forall i
\end{align}
\end{subequations}
where
$$
   f(\beta) = \norm{y - M \beta}^2
$$
hence
$$
   \nabla f(\beta) = - 2 M^T (y - M \beta)
$$
Hence if we define the gradient vector
$$
   g = - 2 M^T (y - M \beta)
$$
the Kuhn-Tucker conditions are
$$
   (\forall i) (g_i \ge 0 \opand \beta_i \ge 0 \opand g_i \beta_i = 0)
$$

For global optimality of the solution of the Kuhn-Tucker conditions we use
the first theorem I proved in class (not in Fletcher).
Suppose $M^T M$ is nonsingular.
Then the Lagrangian is a strictly convex quadratic function for any values
of the Lagrange multipliers, so it has a unique global minimizer.
Hence any solution of the Kuhn-Tucker conditions is the unique global
minimizer of the constrained least squares problem.
By Lemma~9.2.2 and the Corollary to Lemma~9.2.4 in Fletcher, Lagrange
multipliers always exist when all constraints are linear, so the solution
can be found by solving the Kuhn-Tucker equations.
If $M^T M$ is singular, then the solution need not be unique, just as in the
unconstrained OLS problem.
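For a small problem the Kuhn-Tucker conditions can be checked by brute force
over active sets.
The following sketch (mine, with made-up data, not part of the problem)
enumerates all four active sets for a two-parameter problem; generically
exactly one of them passes all the checks.
<<>>=
set.seed(42)
M <- matrix(rnorm(20), ncol = 2)
y <- rnorm(10)
kt <- function(active) {
    beta <- rep(0, 2)
    free <- setdiff(1:2, active)
    if (length(free) > 0)  # solve the normal equations for the free betas
        beta[free] <- solve(crossprod(M[ , free, drop = FALSE]),
            t(M[ , free, drop = FALSE]) %*% y)
    g <- as.vector(- 2 * t(M) %*% (y - M %*% beta))
    # complementary slackness holds by construction (active betas are zero,
    # free gradient components are zero), so the Kuhn-Tucker conditions
    # reduce to primal and dual feasibility
    c(beta = beta, g = g, kt.ok = all(beta >= 0) && all(g >= - 1e-8))
}
t(sapply(list(integer(0), 1, 2, 1:2), kt))
@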
\section{OLS with Nonnegative Parameters: Dumb}

\subsection{Problem Statement}

For the same problem as in the preceding section, find an example (a $y$ and
$M$) for which the ``dumb method'' of constrained optimization --- solve the
unconstrained problem and set any negative betas in the unconstrained
solution to zero --- does not work.

Oops!  That isn't what I meant to call the dumb method.
That method is \emph{really dumb} and only works with orthogonal predictors.
What I meant was: guess that the active set consists of the coefficients
that come out negative, refit the model with those coefficients set to zero,
and don't check the Lagrange multipliers.
Of course this method will work if the solution just happens to satisfy
primal and dual feasibility, but it won't work in general.

\subsection{Solution}

If the problem is one-dimensional (one parameter) or if the predictors are
orthogonal so that $M^T M$ is diagonal, then the dumb method works.
Thus we try a problem with no intercept and two correlated predictors.
The problem is which way we want the correlation to go.
We may have to try both, I suppose.
Let's try highly positively correlated first.
<<>>=
n <- 30
rho <- 0.9
set.seed(42)
x1 <- rnorm(n)
x2 <- rho * x1 + sqrt(1 - rho^2) * rnorm(n)
y <- rnorm(n)
out12 <- lm(y ~ x1 + x2 + 0, x = TRUE)
coefficients(out12)
out1 <- lm(y ~ x1 + 0)
coefficients(out1)
m <- out12$x   # model matrix of the full model
# gradient of the objective at the restricted fit
# (the negative coefficient, here the one for x2, set to zero)
g <- (- 2) * t(m) %*% residuals(out1)
g
@
So the dumb method doesn't work.
But that still isn't what I really wanted.

\subsection{Problem Restatement}

Consider the following algorithm.
\begin{enumerate}
\item Set $A = \varnothing$.
\item Solve the OLS problem subject to $\beta_i = 0$, $i \in A$.
Denote the solution $\hat{\beta}$.
\item If $\hat{\beta}_i \ge 0$, $\forall i$, stop.
This is what this ``dumb method'' calls the solution.
\item Otherwise set $A = A \cup \set{ i : \hat{\beta}_i < 0 }$ and go to 2.
\end{enumerate}
Note that $A$ is monotonically increasing.
We never take an index out of the putative active set in this ``dumb
method''.
Note also that the putative solution of this dumb method does satisfy the
gradient and primal feasibility parts of the Kuhn-Tucker conditions.
But if we calculate Lagrange multipliers satisfying complementary slackness,
they need not satisfy dual feasibility.
So the restatement of this problem is to find an example where this ``dumb
method'' does not find the solution of the Kuhn-Tucker conditions.

\end{document}