\documentclass[11pt]{article}

\usepackage{amsmath}
\usepackage{amsfonts}
\usepackage{amssymb}
\usepackage{indentfirst}

\newcommand{\norm}[1]{\lVert #1 \rVert}
\newcommand{\real}{\mathbb{R}}
\newcommand{\set}[1]{\{\, #1 \,\}}
\newcommand{\inner}[1]{\langle #1 \rangle}
\newcommand{\opor}{\mathbin{\rm or}}
\newcommand{\opand}{\mathbin{\rm and}}

\numberwithin{equation}{section}
\numberwithin{figure}{section}

\begin{document}

\title{Stat 8054 Homework 1 Solutions}
\author{Charles J. Geyer}
\maketitle

\section{Problem 7.4 in Fletcher}

\subsection{Solve Graphically}

\begin{figure}[hbp]
\begin{center}
<<fig=TRUE>>=
par(mar = c(3, 3, 1, 1) + 0.1)
curve(sqrt((x - 1)^3), from = 1, to = 3,
    ylim = c(- sqrt(2^3), sqrt(2^3)), xlab = "", ylab = "")
curve(- sqrt((x - 1)^3), from = 1, to = 3, add = TRUE)
@
\end{center}
\caption{Locus of points satisfying $(x - 1)^3 = y^2$.}
\label{fig:1a}
\end{figure}

Figure~\ref{fig:1a} shows the feasible region, the set of points satisfying
$(x - 1)^3 = y^2$.
Clearly, the point of this set closest to the origin is where $x = 1$ and
$y = 0$.

\subsection{Eliminate $\boldsymbol{y}$.}

After eliminating $y$ the objective function is
$f(x) = x^2 + (x - 1)^3$.
The graph of this function is shown in Figure~\ref{fig:1b}.
\begin{figure}[htbp]
\begin{center}
<<fig=TRUE>>=
par(mar = c(3, 3, 1, 1) + 0.1)
curve(x^2 + (x - 1)^3, from = -1, to = 3, xlab = "", ylab = "")
@
\end{center}
\caption{Graph of $f(x)$.}
\label{fig:1b}
\end{figure}
This function has no stationary points, but more can be said.
The function makes sense for all real $x$, not just for the $x \ge 1$
required in the original problem (for $x < 1$ there is no $y$ satisfying
$(x - 1)^3 = y^2$).
Moreover, it is unbounded below, going to $- \infty$ as $x \to - \infty$.
The moral of the story: eliminating variables doesn't eliminate the
constraints; one still needs to consider them.
Here the constraint $x \ge 1$ is implicit in the original constraint
$(x - 1)^3 = y^2$, and elimination discards it, so elimination doesn't work.

\subsection{Eliminate $\boldsymbol{x}$.}

After eliminating $x$ the objective function is
$f(y) = (\sqrt[3]{y^2} + 1)^2 + y^2$.
The graph of this function is shown in Figure~\ref{fig:1c}.
\begin{figure}[htbp]
\begin{center}
<<fig=TRUE>>=
par(mar = c(3, 3, 1, 1) + 0.1)
curve(((x^2)^(1/3) + 1)^2 + x^2, from = -2, to = 2, xlab = "", ylab = "")
@
\end{center}
\caption{Graph of $f(y)$.}
\label{fig:1c}
\end{figure}
This function does achieve its minimum at $y = 0$ (strictly speaking, $y = 0$
is a cusp rather than a stationary point, because $d y^{2/3} / d y$ blows up
at $y = 0$, but it is the minimizer nonetheless).
So this time the method of elimination does find the solution.
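As a quick numerical check (mine, not part of Fletcher's problem), we can
hand the eliminated-$x$ objective to R's \texttt{optimize}; it should report
a minimizer near $y = 0$ with objective value near $1$.
<<>>=
# numerically minimize f(y) = ((y^2)^(1/3) + 1)^2 + y^2
# over an interval containing the solution y = 0
optimize(function(y) ((y^2)^(1/3) + 1)^2 + y^2, interval = c(-2, 2))
@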
\section{Problem 7.6 in Fletcher}

\subsection{Original Variables}

First we solve the problem the right way, using the original variables.
I suppose this isn't really part of the problem, since this is a Chapter~7
problem and we need material from Chapter~9 in this section of our solution,
but it is worthwhile to have a standard of comparison.

\subsubsection{First-Order Conditions} \label{sec:1st-2a}

The Lagrangian is
$$
   f(x_1, x_2) - \lambda_1 x_1 - \lambda_2 x_2,
$$
and the Kuhn-Tucker conditions are
\begin{subequations}
\begin{align}
   f_i(x) = \lambda_i, & \qquad i = 1, 2 \label{eq:kt1a} \\
   x_i \ge 0, & \qquad i = 1, 2 \\
   \lambda_i \ge 0, & \qquad i = 1, 2 \\
   x_i \lambda_i = 0, & \qquad i = 1, 2
\end{align}
\end{subequations}
where $f_i(x) = \partial f(x) / \partial x_i$.
Eliminating the Lagrange multipliers, this simplifies to
\begin{subequations}
\begin{align}
   f_i(x) \ge 0, & \qquad i = 1, 2 \\
   x_i \ge 0, & \qquad i = 1, 2 \\
   \text{$x_i = 0$ or $f_i(x) = 0$}, & \qquad i = 1, 2
\end{align}
\end{subequations}
At the point $x = 0$, the only point of interest in this problem, this
further simplifies to
\begin{equation}
   f_i(0) \ge 0, \qquad i = 1, 2
\end{equation}

\subsubsection{Second-Order Conditions} \label{sec:2nd-2a}

Now we consider the second-order conditions, only at the point $x = 0$.
From \eqref{eq:kt1a} the Lagrange multipliers are the components of the
gradient.
So the $i$-th constraint is strongly active (at $x = 0$) if and only if
$f_i(0) > 0$.
The set of their indices is
$$
   A_+ = \set{ i : f_i(0) > 0 }
$$
Define
$$
   G = \set{ s : \text{$s_i = 0$, $i \in A_+$ and $s_i \ge 0$, otherwise} }
$$
Then the second-order necessary conditions (Theorem~9.3.1 in Fletcher) are
$$
   s^T W s \ge 0, \qquad s \in G,
$$
where
$$
   W = \nabla^2 f(0)
$$
(since the constraints are linear, there is no difference between second
derivatives of the objective function and second derivatives of the
Lagrangian).
And the second-order sufficient conditions (Theorem~9.3.2 in Fletcher) are
$$
   s^T W s > 0, \qquad s \in G, \ s \neq 0.
$$

\subsection{New Variables}

Now we change the problem to finding an unconstrained minimum of the function
$$
   g(y_1, y_2) = f(y_1^2, y_2^2).
$$

\subsubsection{First-Order Conditions}

$$
   g_i(y_1, y_2) = 2 y_i \cdot f_i(y_1^2, y_2^2).
$$
Thus $y = 0$ is always a solution of the first-order necessary conditions
for this problem, whether or not $x = 0$ is a solution of the first-order
conditions in Section~\ref{sec:1st-2a}.
Thus $y = 0$ may be a ``spurious solution''.

\subsubsection{Second-Order Conditions}

By the chain rule again,
$$
   g_{i j}(y_1, y_2)
   =
   \begin{cases}
   2 f_i(y_1^2, y_2^2) + 4 y_i^2 f_{i i}(y_1^2, y_2^2) & i = j \\
   4 y_i y_j f_{i j}(y_1^2, y_2^2) & i \neq j
   \end{cases}
$$
hence
$$
   g_{i j}(0, 0)
   =
   \begin{cases}
   2 f_i(0, 0) & i = j \\
   0 & i \neq j
   \end{cases}
$$
Since this does not involve second derivatives of the original objective
function $f$, it has nothing to do with the second-order conditions derived
in Section~\ref{sec:2nd-2a}.
In fact, the second-order necessary conditions, that $\nabla^2 g(0)$ be
positive semidefinite, are equivalent to the first-order necessary
conditions for the original problem derived in Section~\ref{sec:1st-2a}.
The second-order sufficient conditions are $f_i(0) > 0$, $i = 1$,~2.
These are rather odd, because we usually don't talk about first-order
sufficient conditions, but these are first-order when put in terms of the
original problem, and they are sufficient for $x = 0$ to be a local optimum.
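To see a spurious solution in action, here is a minimal numerical sketch (my
example, not Fletcher's): take $f(x) = (x_1 - 1)^2 + (x_2 - 1)^2$, whose
constrained minimizer is $x = (1, 1)$, not $x = 0$.
A quasi-Newton method started at $y = 0$ stops there immediately, because
$\nabla g(0) = 0$.
<<>>=
f <- function(x) (x[1] - 1)^2 + (x[2] - 1)^2
g <- function(y) f(y^2)
# gradient of g, using g_i(y) = 2 y_i f_i(y_1^2, y_2^2)
gr <- function(y) 2 * y * 2 * (y^2 - 1)
# started at the spurious point y = 0, BFGS declares convergence at once,
# although the constrained minimum is f(1, 1) = 0, attained at y = (1, 1)
optim(c(0, 0), g, gr, method = "BFGS")
@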
\section{Problem 9.2 in Fletcher}

\begin{figure}[htbp]
\begin{center}
<<fig=TRUE>>=
a <- exp(1)   # any a >= 1 gives the same picture
par(mar = c(3, 3, 1, 1) + 0.1)
curve(x^2, from = 0, to = 1, xlab = "", ylab = "")
lines(c(0, 0, 1), c(0, 1, 1))
foo <- seq(0, 1, 0.1)
segments(0 * foo, foo, 1 - foo, 1 - 0 * foo, lty = "dashed")
segments(foo, 0 * foo, 1 - 0 * foo, 1 - foo, lty = "dashed")
points(1 / 2, 1 / 4, pch = 19)
@
\end{center}
\caption{Constraint set (solid lines) and contours of the objective function
(dashed lines).  Bullet shows optimal solution.}
\label{fig:3a}
\end{figure}

Figure~\ref{fig:3a} shows the situation when $a \ge 1$.
We see that the optimal solution is somewhere along the curve $x_1^2 = x_2$
where the tangent vector to the curve is parallel to the contours of the
objective function.

\begin{figure}[htbp]
\begin{center}
<<fig=TRUE>>=
a <- 0.75
par(mar = c(3, 3, 1, 1) + 0.1)
curve(x^2, from = 0, to = a, xlab = "", ylab = "", ylim = c(0, 1))
lines(c(0, 0, a, a), c(0, 1, 1, a^2))
foo <- seq(0, 1, 0.1)
segments(0 * foo, foo, 1 - foo, 1 - 0 * foo, lty = "dashed")
segments(foo, 0 * foo, 1 - 0 * foo, 1 - foo, lty = "dashed")
points(1 / 2, 1 / 4, pch = 19)
@
\end{center}
\caption{Constraint set (solid lines) and contours of the objective function
(dashed lines).  Bullet shows optimal solution.}
\label{fig:3b}
\end{figure}

Figure~\ref{fig:3b} (page~\pageref{fig:3b}) shows the situation when $a < 1$
but still moderately large.
We see that the optimal solution is still somewhere along the curve
$x_1^2 = x_2$ where the tangent vector to the curve is parallel to the
contours of the objective function.

\begin{figure}[htbp]
\begin{center}
<<fig=TRUE>>=
a <- exp(- 1)
par(mar = c(3, 3, 1, 1) + 0.1)
curve(x^2, from = 0, to = a, xlab = "", ylab = "", ylim = c(0, 1))
lines(c(0, 0, a, a), c(0, 1, 1, a^2))
foo <- seq(0, 1, 0.1)
segments(0 * foo, foo, 1 - foo, 1 - 0 * foo, lty = "dashed")
segments(foo, 0 * foo, 1 - 0 * foo, 1 - foo, lty = "dashed")
points(a, a^2, pch = 19)
@
\end{center}
\caption{Constraint set (solid lines) and contours of the objective function
(dashed lines).  Bullet shows optimal solution.}
\label{fig:3c}
\end{figure}

Figure~\ref{fig:3c} (page~\pageref{fig:3c}) shows the situation when $a < 1$
and quite small.
We see that the optimal solution is now $(a, a^2)$.

The Lagrangian is
$$
   L(x, \lambda)
   =
   - x_1 + x_2 - \lambda_1 x_1 - \lambda_2 (a - x_1)
   - \lambda_3 x_2 - \lambda_4 (1 - x_2) - \lambda_5 (x_2 - x_1^2)
$$
and the Kuhn-Tucker conditions are
\begin{enumerate}
\item \textbf{gradient.}
\begin{align*}
   - 1 - \lambda_1 + \lambda_2 + 2 \lambda_5 x_1 & = 0 \\
   1 - \lambda_3 + \lambda_4 - \lambda_5 & = 0
\end{align*}
\item \textbf{primal feasibility.}
\begin{gather*}
   0 \le x_1 \le a \\
   0 \le x_2 \le 1 \\
   x_2 - x_1^2 \ge 0
\end{gather*}
\item \textbf{dual feasibility.} $\lambda_i \ge 0$ for all $i$.
\item \textbf{complementary slackness.}
\begin{align*}
   \lambda_1 = 0 \opor x_1 = 0 \\
   \lambda_2 = 0 \opor x_1 = a \\
   \lambda_3 = 0 \opor x_2 = 0 \\
   \lambda_4 = 0 \opor x_2 = 1 \\
   \lambda_5 = 0 \opor x_2 = x_1^2
\end{align*}
\end{enumerate}
From the figures, the only active constraint is $x_2 = x_1^2$ in the cases
shown in Figures~\ref{fig:3a} and~\ref{fig:3b}, and the only active
constraints are $x_2 = x_1^2$ and $x_1 = a$ in the case shown in
Figure~\ref{fig:3c}.
Thus only $\lambda_5$ can be nonzero in the first case, and only $\lambda_2$
and $\lambda_5$ in the second case.

(Case I: $x_2 = x_1^2$ and $\lambda_i = 0$, $i \neq 5$).
From the gradient condition we must have $\lambda_5 = 1$ and
$- 1 + 2 x_1 = 0$.
Hence $x_1 = 1 / 2$ and $x_2 = 1 / 4$.
We go back and put this point on the figures.

(Case II: $x_2 = x_1^2$ and $x_1 = a$ and $\lambda_i = 0$,
$i \in \{ 1, 3, 4 \}$).
The gradient condition becomes
\begin{align*}
   - 1 + \lambda_2 + 2 \lambda_5 x_1 & = 0 \\
   1 - \lambda_5 & = 0
\end{align*}
We are checking the point $(a, a^2)$.
The second gradient equation gives $\lambda_5 = 1$, and then the first
becomes $- 1 + \lambda_2 + 2 a = 0$, or $\lambda_2 = 1 - 2 a$.
In order for dual feasibility to hold, we must have $\lambda_2 \ge 0$, that
is, $a \le 1 / 2$.
This makes sense from the pictures, because we switch from Case I to Case II
exactly when the point $(a, a^2)$ passes $(1/2, 1/4)$.
The check that the rest of the Kuhn-Tucker conditions hold is trivial (they
were established by earlier considerations).
We go back and put this point on the figure.
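As a sanity check (mine, not part of the problem), a brute-force search over
a fine grid of the feasible region should reproduce both regimes: the
minimizer of $- x_1 + x_2$ is near $(1/2, 1/4)$ when $a = 3/4 > 1/2$ and
near $(a, a^2)$ when $a = e^{-1} < 1/2$.
<<>>=
check <- function(a) {
    grid <- expand.grid(x1 = seq(0, a, length = 201),
        x2 = seq(0, 1, length = 201))
    feas <- grid[grid$x2 >= grid$x1^2, ]  # keep only feasible points
    feas[which.min(feas$x2 - feas$x1), ]  # minimize - x1 + x2
}
check(3 / 4)     # expect approximately (1/2, 1/4)
check(exp(- 1))  # expect approximately (a, a^2)
@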
\section{Problem 9.9 in Fletcher}

\subsection{(i)}

The Lagrangian is
$$
   L(x, \lambda)
   =
   - x_2 - \lambda_1 \bigl[ (3 - x_1)^3 - (x_2 - 2) \bigr]
   - \lambda_2 \bigl[ 3 x_1 + x_2 - 9 \bigr]
$$
The Kuhn-Tucker conditions are
\begin{enumerate}
\item \textbf{gradient.}
\begin{subequations}
\begin{align}
   3 \lambda_1 (3 - x_1)^2 - 3 \lambda_2 & = 0 \label{eq:4:grad-a} \\
   - 1 + \lambda_1 - \lambda_2 & = 0 \label{eq:4:grad-b}
\end{align}
\end{subequations}
\item \textbf{primal feasibility.}
\begin{subequations}
\begin{align}
   (3 - x_1)^3 - (x_2 - 2) & \ge 0 \label{eq:4:primal-a} \\
   3 x_1 + x_2 - 9 & \ge 0 \label{eq:4:primal-b}
\end{align}
\end{subequations}
\item \textbf{dual feasibility.} $\lambda_i \ge 0$ for all $i$.
\item \textbf{complementary slackness.}
\begin{align*}
   \lambda_1 = 0 \opor (3 - x_1)^3 - (x_2 - 2) = 0 \\
   \lambda_2 = 0 \opor 3 x_1 + x_2 = 9
\end{align*}
\end{enumerate}
Since there are two constraints, there are four conceivable active sets:
$\varnothing$, $\{ 1 \}$, $\{ 2 \}$, and $\{ 1, 2 \}$.

(Case I: neither constraint is active).
Then complementary slackness implies $\lambda_1 = \lambda_2 = 0$.
But then \eqref{eq:4:grad-b} cannot hold, so this case is impossible.

(Case II: only the first constraint is active).
Then complementary slackness implies $\lambda_2 = 0$.
Then \eqref{eq:4:grad-b} implies $\lambda_1 = 1$.
Then \eqref{eq:4:grad-a} implies $x_1 = 3$.
Now since \eqref{eq:4:primal-a} holds with equality, we have $x_2 = 2$.
This gives one putative solution (satisfying the first-order necessary
conditions), $(3, 2)$.

(Case III: only the second constraint is active).
Then complementary slackness implies $\lambda_1 = 0$.
Then \eqref{eq:4:grad-b} implies $\lambda_2 = - 1$.
But that violates dual feasibility, so this case is also impossible.

(Case IV: both constraints are active).
Then $x_2 = 9 - 3 x_1$, and
$$
   (3 - x_1)^3 - (x_2 - 2) = (3 - x_1)^3 - (9 - 3 x_1 - 2) = 0
$$
We have a cubic equation to solve, which doesn't look easy.
\begin{figure}[htbp]
\begin{center}
<<fig=TRUE>>=
par(mar = c(3, 3, 1, 1) + 0.1)
curve((3 - x)^3 - (9 - 3 * x - 2), from = -1, to = 6, xlab = "", ylab = "")
abline(h = 0)
@
\end{center}
\caption{Graph of the function $x \mapsto (3 - x)^3 - (9 - 3 x - 2)$.}
\label{fig:4i}
\end{figure}
Figure~\ref{fig:4i} (page~\pageref{fig:4i}) shows the graph of this cubic.
It seems to have roots at 2 and 5 (confirmed numerically below).
Hence $x_1 = 2$ or $x_1 = 5$, and $x_2$ is given by $x_2 = 9 - 3 x_1$.
This gives two more putative solutions, $(2, 3)$ and $(5, -6)$.
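Expanding $(3 - x)^3 - (9 - 3 x - 2)$ gives the polynomial
$- x^3 + 9 x^2 - 24 x + 20$, so \texttt{polyroot} can confirm what the graph
suggests: a double root at 2 and a simple root at 5.
<<>>=
# coefficients of 20 - 24 x + 9 x^2 - x^3 in increasing order of powers
polyroot(c(20, -24, 9, -1))
@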
\subsection{(ii)}

\begin{figure}[htbp]
\begin{center}
<<fig=TRUE>>=
# (3 - x_1)^3 - (x_2 - 2) = 0
# 3 x_1 + x_2 = 9
par(mar = c(3, 3, 1, 1) + 0.1)
curve((3 - x)^3 + 2, from = 0, to = 5, xlab = "", ylab = "")
curve(9 - 3 * x, from = 0, to = 5, add = TRUE)
x <- c(2, 5)
points(x, 9 - 3 * x, pch = 19)
points(3, 2, pch = 19)
abline(h = seq(-9, 29, 2), lty = "dashed")
@
\end{center}
\caption{Constraint set (solid lines) and contours of the objective function
(dashed lines).  Constraint set is unbounded to the left, the boundaries
continuing to infinity.  Bullets show putative optimal solutions.}
\label{fig:4ii}
\end{figure}

Figure~\ref{fig:4ii} shows the feasible region and our putative solutions
(which satisfy the first-order necessary conditions).
Clearly, none of the putative solutions is even locally optimal.
In fact the argmin is empty: $x_2$ is unbounded above on the feasible
region, so the infimum of the objective function $- x_2$ over the feasible
region is $- \infty$.

\subsection{(iii)}

Now the Kuhn-Tucker conditions are even more complicated.
The Lagrangian is
$$
   L(x, \lambda)
   =
   - x_2 - \lambda_1 \bigl[ (3 - x_1)^3 - (x_2 - 2) \bigr]
   - \lambda_2 \bigl[ 3 x_1 + x_2 - 9 \bigr]
   - \lambda_3 \bigl[ 2 x_1 - 3 x_2 \bigr]
$$
The Kuhn-Tucker conditions are
\begin{enumerate}
\item \textbf{gradient.}
\begin{subequations}
\begin{align}
   3 \lambda_1 (3 - x_1)^2 - 3 \lambda_2 - 2 \lambda_3 & = 0
   \label{eq:4iii:grad-a} \\
   - 1 + \lambda_1 - \lambda_2 + 3 \lambda_3 & = 0 \label{eq:4iii:grad-b}
\end{align}
\end{subequations}
\item \textbf{primal feasibility.}
\begin{subequations}
\begin{align}
   (3 - x_1)^3 - (x_2 - 2) & \ge 0 \label{eq:4iii:primal-a} \\
   3 x_1 + x_2 - 9 & \ge 0 \label{eq:4iii:primal-b} \\
   2 x_1 - 3 x_2 & \ge 0 \label{eq:4iii:primal-c}
\end{align}
\end{subequations}
\item \textbf{dual feasibility.} $\lambda_i \ge 0$ for all $i$.
\item \textbf{complementary slackness.}
\begin{align*}
   \lambda_1 = 0 \opor (3 - x_1)^3 - (x_2 - 2) = 0 \\
   \lambda_2 = 0 \opor 3 x_1 + x_2 = 9 \\
   \lambda_3 = 0 \opor 2 x_1 - 3 x_2 = 0
\end{align*}
\end{enumerate}
Now there are 8 conceivable active sets to consider.

(Case I: empty active set).
Again \eqref{eq:4iii:grad-b} cannot be satisfied when
$\lambda_1 = \lambda_2 = \lambda_3 = 0$.
So this case is impossible.

(Case II: only \eqref{eq:4iii:primal-a} is active).
By the same analysis as before, $(3, 2)$ is a solution of the Kuhn-Tucker
conditions if it satisfies \eqref{eq:4iii:primal-c}.
It does (with equality, as it happens).

(Case III: only \eqref{eq:4iii:primal-b} is active).
By the same analysis as before, this case is impossible.

(Case IV: only \eqref{eq:4iii:primal-a} and \eqref{eq:4iii:primal-b} are
active).
By the same analysis as before, $(2, 3)$ and $(5, -6)$ are solutions of the
Kuhn-Tucker conditions if they satisfy \eqref{eq:4iii:primal-c}.
Only $(5, -6)$ does.

(Case V: only \eqref{eq:4iii:primal-c} is active).
Then $\lambda_1 = \lambda_2 = 0$.
By \eqref{eq:4iii:grad-b}, $\lambda_3 = 1 / 3$.
But now \eqref{eq:4iii:grad-a} cannot be satisfied.
So this case is impossible.

(Case VI: only \eqref{eq:4iii:primal-a} and \eqref{eq:4iii:primal-c} are
active).
Then $x_2 = 2 x_1 / 3$, and
$$
   (3 - x_1)^3 - (x_2 - 2) = (3 - x_1)^3 - (2 x_1 / 3 - 2) = 0
$$
We have another cubic equation to solve, which doesn't look easy, but it has
probably been arranged by Fletcher to have a nice solution.
Throwing it at Mathematica (or at R, as sketched below), the only real root
is $x_1 = 3$.
Again we get $(3, 2)$ as a solution of the Kuhn-Tucker conditions if it is
primal feasible, which it is.
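Rather than Mathematica, one can also hand the Case VI cubic to R (a quick
check, mine, not part of the problem): the cubic changes sign between
$x_1 = 0$ and $x_1 = 6$, so \texttt{uniroot} brackets its single real root.
<<>>=
# the Case VI cubic: (3 - x1)^3 - (2 x1 / 3 - 2) = 0
uniroot(function(x) (3 - x)^3 - (2 * x / 3 - 2), c(0, 6))$root
@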
(Case VII: only \eqref{eq:4iii:primal-b} and \eqref{eq:4iii:primal-c} are
active).
Then $x_2 = 2 x_1 / 3$, and
$$
   3 x_1 + x_2 - 9 = 3 x_1 + 2 x_1 / 3 - 9 = 0
$$
Hence $x_1 = 27 / 11$ and $x_2 = 18 / 11$ is a solution of the Kuhn-Tucker
conditions if it satisfies \eqref{eq:4iii:primal-a}, which it does, and the
gradient and dual feasibility conditions, which we now check.
We have $\lambda_1 = 0$, so we have to solve the linear equations
\begin{align*}
   - 3 \lambda_2 - 2 \lambda_3 & = 0 \\
   - 1 - \lambda_2 + 3 \lambda_3 & = 0
\end{align*}
From the latter $\lambda_2 = 3 \lambda_3 - 1$, so the former becomes
$$
   - 3 (3 \lambda_3 - 1) - 2 \lambda_3 = - 11 \lambda_3 + 3 = 0
$$
or $\lambda_3 = 3 / 11$ and $\lambda_2 = 9 / 11 - 1 < 0$.
So this point does not satisfy dual feasibility and is not a putative
solution.

(Case VIII: all three constraints are active).
Three equations in two unknowns are probably inconsistent, and indeed
they are.

(Summary).
$(3, 2)$ and $(5, -6)$ are solutions of the Kuhn-Tucker conditions.
\begin{figure}[htbp]
\begin{center}
<<fig=TRUE>>=
# (3 - x_1)^3 - (x_2 - 2) >= 0
# 3 x_1 + x_2 >= 9
# 2 x_1 - 3 x_2 >= 0
par(mar = c(3, 3, 1, 1) + 0.1)
plot(c(3, 27 / 11, 5), c(2, 18 / 11, -6), type = "l", xlab = "", ylab = "")
curve((3 - x)^3 + 2, from = 3, to = 5, add = TRUE)
points(3, 2, pch = 19)
points(5, -6, pch = 19)
abline(h = seq(-6, 2, 0.5), lty = "dashed")
@
\end{center}
\caption{Constraint set (solid lines) and contours of the objective function
(dashed lines).  Unlike in part (ii), the constraint set is now bounded.
Bullets show solutions of the Kuhn-Tucker equations.}
\label{fig:4iii}
\end{figure}
Figure~\ref{fig:4iii} (page~\pageref{fig:4iii}) shows the feasible region
and the solutions of the Kuhn-Tucker equations.
Clearly, the one at the top, $(3, 2)$, is optimal, and the one at the
bottom, $(5, -6)$, is not.

\section{OLS with Nonnegative Parameters}

\subsection{Problem Statement}

For the problem
\begin{align*}
   \text{minimize} & \quad \beta \mapsto \norm{y - M \beta}^2 \\
   \text{subject to} & \quad \beta \ge 0
\end{align*}
where $y$ is a known vector and $M$ is a known matrix and where the
constraint is interpreted coordinatewise ($\beta_i \ge 0$ for all $i$), show
that there is a unique global minimizer and that it can be found using the
method of Lagrange multipliers (Kuhn-Tucker conditions).
Cite any theorems used.

\subsection{Solution}

From the analysis in Section~\ref{sec:1st-2a} we know that the Kuhn-Tucker
conditions are
\begin{subequations}
\begin{align}
   f_i(\beta) \ge 0, & \qquad \forall i \\
   \beta_i \ge 0, & \qquad \forall i \\
   \text{$\beta_i = 0$ or $f_i(\beta) = 0$}, & \qquad \forall i
\end{align}
\end{subequations}
where
$$
   f(\beta) = \norm{y - M \beta}^2
$$
hence
$$
   \nabla f(\beta) = - 2 M^T (y - M \beta)
$$
Hence if we define the gradient vector
$$
   g = - 2 M^T (y - M \beta)
$$
the Kuhn-Tucker conditions are
$$
   (\forall i) (g_i \ge 0 \opand \beta_i \ge 0 \opand g_i \beta_i = 0)
$$

For global optimality of the solution of the Kuhn-Tucker conditions we use
the first theorem I proved in class (not in Fletcher).
Suppose $M^T M$ is nonsingular.
Then the Lagrangian is a strictly convex quadratic function for any values
of the Lagrange multipliers, so it has a unique global minimizer.
Hence any solution of the Kuhn-Tucker conditions is the unique global
minimizer of the constrained least squares problem.
By Lemma~9.2.2 and the Corollary to Lemma~9.2.4 in Fletcher, Lagrange
multipliers always exist when all constraints are linear, so the solution
can be found by solving the Kuhn-Tucker equations.
If $M^T M$ is singular, then the solution need not be unique, just as in the
unconstrained OLS problem.
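For a small problem the Kuhn-Tucker conditions can be checked by brute force
over active sets.
The following sketch (mine, with made-up data, not part of the problem)
enumerates all four active sets for a two-parameter problem; generically
exactly one of them passes all the checks.
<<>>=
set.seed(42)
M <- matrix(rnorm(20), ncol = 2)
y <- rnorm(10)
kt <- function(active) {
    beta <- rep(0, 2)
    free <- setdiff(1:2, active)
    if (length(free) > 0)  # solve the normal equations for the free betas
        beta[free] <- solve(crossprod(M[ , free, drop = FALSE]),
            t(M[ , free, drop = FALSE]) %*% y)
    g <- as.vector(- 2 * t(M) %*% (y - M %*% beta))
    # complementary slackness holds by construction (active betas are zero,
    # free gradient components are zero), so the Kuhn-Tucker conditions
    # reduce to primal and dual feasibility
    c(beta = beta, g = g, kt.ok = all(beta >= 0) && all(g >= - 1e-8))
}
t(sapply(list(integer(0), 1, 2, 1:2), kt))
@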
\section{OLS with Nonnegative Parameters: Dumb}

\subsection{Problem Statement}

For the same problem as in the preceding section, find an example (a $y$ and
$M$) for which the ``dumb method'' of constrained optimization --- solve the
unconstrained problem and set any negative betas in the unconstrained
solution to zero --- does not work.

Oops!  That isn't what I meant to call the dumb method.
That method is \emph{really dumb} and only works with orthogonal predictors.
What I meant was: guess that the active set consists of the coefficients
that come out negative, refit the model with those coefficients set to zero,
and don't check the Lagrange multipliers.
Of course this method will work if the solution just happens to satisfy
primal and dual feasibility, but it won't work in general.

\subsection{Solution}

If the problem is one-dimensional (one parameter) or if the predictors are
orthogonal so that $M^T M$ is diagonal, then the dumb method works.
Thus we try a problem with no intercept and two correlated predictors.
The problem is which way we want the correlation to go.
We may have to try both, I suppose.
Let's try highly positively correlated first.
<<>>=
n <- 30
rho <- 0.9
set.seed(42)
x1 <- rnorm(n)
x2 <- rho * x1 + sqrt(1 - rho^2) * rnorm(n)
y <- rnorm(n)
out12 <- lm(y ~ x1 + x2 + 0, x = TRUE)
coefficients(out12)
out1 <- lm(y ~ x1 + 0)
coefficients(out1)
m <- out12$x   # model matrix of the full model
# gradient of the objective at the restricted fit
# (the negative coefficient, here the one for x2, set to zero)
g <- (- 2) * t(m) %*% residuals(out1)
g
@
So the dumb method doesn't work.
But that still isn't what I really wanted.

\subsection{Problem Restatement}

Consider the following algorithm.
\begin{enumerate}
\item Set $A = \varnothing$.
\item Solve the OLS problem subject to $\beta_i = 0$, $i \in A$.
Denote the solution $\hat{\beta}$.
\item If $\hat{\beta}_i \ge 0$, $\forall i$, stop.
This is what this ``dumb method'' calls the solution.
\item Otherwise set $A = A \cup \set{ i : \hat{\beta}_i < 0 }$ and go to 2.
\end{enumerate}
Note that $A$ is monotonically increasing.
We never take an index out of the putative active set in this ``dumb
method''.
Note also that the putative solution of this dumb method does satisfy the
gradient and primal feasibility parts of the Kuhn-Tucker conditions.
But if we calculate Lagrange multipliers satisfying complementary slackness,
they need not satisfy dual feasibility.
So the restatement of this problem is to find an example where this ``dumb
method'' does not find the solution of the Kuhn-Tucker conditions.

\end{document}