Theorem

If $(X,d)$ is an infinite metric space and $f:X\to X$ is continuous and chaotic, then $f$ sensitively depends on initial conditions.

This proof is from Adams and Franzosa (2008), who in turn got it from Banks, Brooks, Cairns, Davis, and Stacey (1992).
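The theorem's conclusion can be made concrete. Sensitive dependence on initial conditions means there is a $\delta >0$ such that for every $x\in X$ and every $\epsilon >0$ there are a point $y\in {B}_{\epsilon}\left(x\right)$ and an $n$ with $d({f}^{n}\left(x\right),{f}^{n}\left(y\right))>\delta $. The following is a minimal sketch, assuming the doubling map $x\mapsto 2x \bmod 1$ on the circle as an example system. The example is not part of the source, and the names `doubling`, `circle_dist`, and `first_separation` are illustrative. It uses exact rational arithmetic to find the first iterate at which two nearby points are more than $\delta $ apart.

```python
from fractions import Fraction


def doubling(x):
    """Doubling map on the circle R/Z, with points represented in [0, 1)."""
    return (2 * x) % 1


def circle_dist(a, b):
    """Distance between two points of the circle R/Z."""
    gap = abs(a - b) % 1
    return min(gap, 1 - gap)


def first_separation(f, dist, x, y, delta, max_iter=1000):
    """Return the smallest n with dist(f^n(x), f^n(y)) > delta, or None."""
    for n in range(max_iter + 1):
        if dist(x, y) > delta:
            return n
        x, y = f(x), f(y)
    return None


# Two points that start 2**-20 apart (assumed example values).
x = Fraction(1, 5)
y = x + Fraction(1, 2**20)
n = first_separation(doubling, circle_dist, x, y, Fraction(1, 8))
```

Using `Fraction` instead of floating point keeps every iterate exact, so the reported separation is not a rounding artifact.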

**Idea:** We find a pair of disjoint periodic orbits $O\left(q\right)$ and $O(q\prime )$, so that any $x\in X$ is at least a fixed distance away from at least one of them. We use the first property of chaos (density of periodic points) to get a periodic point $p$ near $x$ (call its period $m$), and the second property of chaos (topological transitivity) to get a point $w$ near $x$ such that $f$ eventually brings $w$ near $q$ or $q\prime $. The continuity of $f$ lets us pick $w$ so that, once its orbit arrives near $q$ or $q\prime $, the next $m$ iterates stay close to $O\left(q\right)$ or $O(q\prime )$. For the $n$ we need to demonstrate sensitive dependence on initial conditions, we pick a multiple of $m$ for which ${f}^{n}\left(w\right)$ is still near $O\left(q\right)$ or $O(q\prime )$. Then $d({f}^{n}\left(p\right),{f}^{n}\left(w\right))=d(p,{f}^{n}\left(w\right))$, which is large because $p$ is near $x$ and $x$ is far from $O\left(q\right)$ or $O(q\prime )$. By the triangle inequality, at least one of $d({f}^{n}\left(x\right),{f}^{n}\left(p\right))$ or $d({f}^{n}\left(x\right),{f}^{n}\left(w\right))$ must be large, proving the theorem.

**Proof:** First of all, observe that $X$ contains infinitely many periodic points of $f$: a finite subset of a metric space is closed, so a finite set of periodic points could not be dense in the infinite space $X$. At least two of these points, call them $q$ and $q\prime $, must have disjoint orbits. Otherwise every periodic point would lie in the single orbit $O\left(q\right)$, which is finite because $q$ is periodic, contradicting the fact that there are infinitely many periodic points. Now let ${\delta}_{0}=\min\{d(x,y):x\in O\left(q\right),y\in O(q\prime )\}$, so that ${\delta}_{0}$ is the minimum distance between the orbits of $q$ and $q\prime $. Since both orbits are finite and disjoint, this minimum exists and ${\delta}_{0}>0$.
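As an illustration of how ${\delta}_{0}$ is obtained from two disjoint periodic orbits, here is a small sketch. It assumes the doubling map on the circle as the example system and the periodic points $\frac{1}{3}$ and $\frac{1}{7}$ as $q$ and $q\prime $. Neither the map nor these points come from the source, and the helper names are hypothetical.

```python
from fractions import Fraction


def doubling(x):
    """Doubling map on the circle R/Z, with points represented in [0, 1)."""
    return (2 * x) % 1


def circle_dist(a, b):
    """Distance between two points of the circle R/Z."""
    gap = abs(a - b) % 1
    return min(gap, 1 - gap)


def orbit(f, q):
    """Orbit of a periodic point q. Loops forever if q is not periodic."""
    points = [q]
    x = f(q)
    while x != q:
        points.append(x)
        x = f(x)
    return points


orbit_q = orbit(doubling, Fraction(1, 3))        # period 2
orbit_q_prime = orbit(doubling, Fraction(1, 7))  # period 3

# delta_0 is the smallest distance between a point of one orbit
# and a point of the other.
delta_0 = min(circle_dist(a, b) for a in orbit_q for b in orbit_q_prime)
```

Because both orbits are finite, the minimum runs over finitely many pairs, which is why it is attained and positive.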

Define $\delta =\frac{{\delta}_{0}}{8}$. We'll prove that this is the $\delta $ needed for sensitive dependence on initial conditions. So pick an arbitrary $x\in X$ and $\epsilon >0$. Without loss of generality, we may force $\epsilon <\delta $. (We can shrink $\epsilon $ as much as we like because if we find a suitable $y\in {B}_{\epsilon}\left(x\right)$ for the new $\epsilon $, then of course $y\in {B}_{\epsilon}\left(x\right)$ for the original, larger $\epsilon $ as well.) Since periodic points are dense, there's a periodic point $p\in {B}_{\epsilon}\left(x\right)$. Let $m$ be the period of $p$.

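It follows from the triangle inequality (and it's obvious from the picture) that wherever $x$ is, it must be at least $\frac{{\delta}_{0}}{2}$ units away from either every point in $O\left(q\right)$ or every point in $O(q\prime )$. Indeed, if $x$ were less than $\frac{{\delta}_{0}}{2}$ away from some $a\in O\left(q\right)$ and also from some $b\in O(q\prime )$, then $d(a,b)\le d(a,x)+d(x,b)<{\delta}_{0}$, contradicting the definition of ${\delta}_{0}$. Suppose, without loss of generality, that the former is the case.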

For each $j=0,1,\dots ,m$, let ${B}_{j}={B}_{\delta}\left({f}^{j}\left(q\right)\right)$. (This means that ${B}_{0}$ is just ${B}_{\delta}\left(q\right)$. But ${B}_{m}$ need not equal ${B}_{0}$, since $m$ is the period of $p$, not $q$.) Then for each $j$, the continuity of $f$ implies that ${f}^{j}$ is continuous, so ${\left({f}^{j}\right)}^{-1}\left({B}_{j}\right)$ is open; in fact, it's a neighborhood of $q$, because ${f}^{j}\left(q\right)\in {B}_{j}$. Let $V$ be the intersection of all $m+1$ of these neighborhoods. As a finite intersection of open sets containing $q$, $V$ is itself a nonempty open set.
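Membership in $V$ can be tested straight from the definition: a point $u$ lies in $V$ exactly when ${f}^{j}\left(u\right)\in {B}_{j}$ for every $j=0,1,\dots ,m$. The sketch below checks this for the doubling map on the circle, which is an assumed example system and not part of the source. The values of `q`, `delta`, and `m` are hypothetical.

```python
from fractions import Fraction


def doubling(x):
    """Doubling map on the circle R/Z, with points represented in [0, 1)."""
    return (2 * x) % 1


def circle_dist(a, b):
    """Distance between two points of the circle R/Z."""
    gap = abs(a - b) % 1
    return min(gap, 1 - gap)


def iterate(f, x, n):
    """Apply f to x exactly n times."""
    for _ in range(n):
        x = f(x)
    return x


def in_V(f, dist, u, q, delta, m):
    """True when f^j(u) is within delta of f^j(q) for all j = 0, ..., m."""
    return all(
        dist(iterate(f, u, j), iterate(f, q, j)) < delta
        for j in range(m + 1)
    )


# Hypothetical values chosen only for illustration.
q = Fraction(1, 3)
delta = Fraction(1, 168)
m = 3
```

With these values, `in_V(doubling, circle_dist, q + Fraction(1, 4096), q, delta, m)` holds, while a point at distance $\frac{1}{200}$ from $q$ starts inside ${B}_{0}$ but leaves ${B}_{1}$ after one step, so it is not in $V$.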

Since $f$ is topologically transitive, there exist $w\in {B}_{\epsilon}\left(x\right)$ and $k\in \mathbb{N}$ such that ${f}^{k}\left(w\right)\in V$.

Now pick $h\in \mathbb{N}$ with $k\le hm\le k+m$. Suppose we knew that

$d({f}^{hm}\left(p\right),{f}^{hm}\left(w\right))>2\delta .$

(1)

The triangle inequality would then imply that $d({f}^{hm}\left(x\right),{f}^{hm}\left(p\right))>\delta $ or $d({f}^{hm}\left(x\right),{f}^{hm}\left(w\right))>\delta $, proving the theorem, since $p,w\in {B}_{\epsilon}\left(x\right)$. So all we have left to do is prove (1).
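The integer $h$ chosen above always exists, because any $m+1$ consecutive integers include a multiple of $m$. One explicit choice, sketched here under the assumption that $\mathbb{N}$ starts at 1, is the ceiling of $k/m$ (or 1 when $k=0$). The function name `choose_h` is illustrative.

```python
def choose_h(k, m):
    """Return an integer h >= 1 with k <= h * m <= k + m.

    Assumes k >= 0 and m >= 1. The expression -(-k // m) is the
    ceiling of k / m computed with integer arithmetic only.
    """
    return max(1, -(-k // m))
```

For example, with $k=10$ and $m=4$ this gives $h=3$, and indeed $10\le 12\le 14$.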

By the triangle inequality,

$d(x,{f}^{hm-k}\left(q\right))\le d(x,p)+d(p,{f}^{hm}\left(w\right))+d({f}^{hm}\left(w\right),{f}^{hm-k}\left(q\right)).$

(2)

Since $p\in {B}_{\epsilon}\left(x\right)$,

$d(x,p)<\epsilon <\delta .$

(3)

Since ${f}^{k}\left(w\right)\in V$,

${f}^{k}\left(w\right)\in {B}_{0},{f}^{k+1}\left(w\right)\in {B}_{1},\dots ,{f}^{k+m}\left(w\right)\in {B}_{m};$

in particular, since $k\le hm\le k+m$, ${f}^{hm}\left(w\right)\in {B}_{hm-k}={B}_{\delta}\left({f}^{hm-k}\left(q\right)\right)$, so

$d({f}^{hm}\left(w\right),{f}^{hm-k}\left(q\right))<\delta .$

(4)

Combining (2), (3), and (4) yields

$d(x,{f}^{hm-k}\left(q\right))<2\delta +d(p,{f}^{hm}\left(w\right)).$

But we assumed that $x$ is at least $\frac{{\delta}_{0}}{2}=4\delta $ units away from every point in $O\left(q\right)$, so

$4\delta \le d(x,{f}^{hm-k}\left(q\right))<2\delta +d(p,{f}^{hm}\left(w\right))$

and thus

$2\delta <d(p,{f}^{hm}\left(w\right)).$

This implies (1), since

${f}^{hm}\left(p\right)={f}^{m}\left({f}^{m}(\cdots {f}^{m}\left(p\right)\cdots )\right)=p,$

$m$ being the period of $p$. □
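The steps of the proof can be traced on a concrete system. The sketch below assumes the doubling map on the circle with $q=\frac{1}{3}$ and $q\prime =\frac{1}{7}$; none of this is in the source, and every name is hypothetical. It relies on two facts specific to this map: points of the form $a/({2}^{N}-1)$ are periodic, and every point has a ${2}^{k}$-th preimage within ${2}^{-k}$ of any given $x$. These stand in for the abstract density and transitivity arguments.

```python
from fractions import Fraction
from math import floor


def doubling(x):
    return (2 * x) % 1


def circle_dist(a, b):
    gap = abs(a - b) % 1
    return min(gap, 1 - gap)


def iterate(x, n):
    for _ in range(n):
        x = doubling(x)
    return x


def orbit(q):
    """Orbit of a periodic point q under the doubling map."""
    points, x = [q], doubling(q)
    while x != q:
        points.append(x)
        x = doubling(x)
    return points


def witness(x, eps):
    """Return (y, n, delta) with y within eps of x and the n-th
    iterates of x and y more than delta apart."""
    x = x % 1
    orbit_a, orbit_b = orbit(Fraction(1, 3)), orbit(Fraction(1, 7))
    delta_0 = min(circle_dist(a, b) for a in orbit_a for b in orbit_b)
    delta = delta_0 / 8
    eps = min(eps, delta / 2)  # force eps < delta

    # Work with the orbit that is farther from x; call its base point q.
    far = max((orbit_a, orbit_b),
              key=lambda o: min(circle_dist(x, z) for z in o))
    q = far[0]

    # Periodic point p within eps of x, of the form a / (2**N - 1).
    N = 1
    while Fraction(1, 2**N - 1) >= eps:
        N += 1
    p = Fraction(round(x * (2**N - 1)), 2**N - 1) % 1
    m = len(orbit(p))  # period of p

    # Point w within eps of x whose k-th iterate is exactly q.
    k = 1
    while Fraction(1, 2**k) >= eps:
        k += 1
    w = (q + floor(x * 2**k)) / 2**k

    h = max(1, -(-k // m))  # k <= h*m <= k + m
    n = h * m
    fx, fp, fw = iterate(x, n), iterate(p, n), iterate(w, n)
    # Inequality (1): f^n(p) = p is more than 2*delta from f^n(w).
    assert fp == p and circle_dist(fp, fw) > 2 * delta
    y = p if circle_dist(fx, fp) > delta else w
    return y, n, delta
```

Calling `witness(Fraction(1, 5), Fraction(1, 1000))` returns a point `y` within $\frac{1}{1000}$ of $\frac{1}{5}$ together with an iterate count `n` at which the two orbits are more than `delta` apart. Here $w$ is chosen so that ${f}^{k}\left(w\right)$ equals $q$ exactly, which is a special case of landing in $V$.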