Contribution by Lucilla
This page is based on a document submitted by my very good (non-StA) friend and mathematician Lucilla.
Have something to contribute? Check out the Contribution Guide!
Beyond the module
This page covers content which goes beyond what is required for the module. Read on, but don’t worry if you don’t recognise or understand all of it!
If you’re looking for what’s strictly covered under the module content, the following pages may be of interest to you:
Okay, so you know the definitions of the hyperbolic functions and that they’re supposed to relate to hyperbolae in the same way that trigonometric functions do to circles.
But how exactly? And where did they come from? Who thought they were a good idea? And why are they defined with these silly equations involving $e^x$ rather than some cool geometric diagram?
“Well,” you might answer, “just as parametrically plotting $(\cos t, \sin t)$ gives a circle, so does plotting $(\cosh t, \sinh t)$ give a hyperbola. Well, only the right half of it, but you get the idea.”
But so what? The unit hyperbola is the set of points $(x, y)$ satisfying $x^2 - y^2 = 1$. So any pair of functions $(f(t), g(t))$ such that $g$ spans the whole real line and $f(t)^2 - g(t)^2 = 1$ will also give a parameterisation of the unit hyperbola. For example, $\left(\sqrt{t^2 + 1},\, t\right)$. Or even $\left(\cosh(t^3), \sinh(t^3)\right)$!
Okay, okay. Parameterisations are too open-ended. But there’s another analogy we can draw. $(\cos\theta, \sin\theta)$ are the coordinates of a point that has travelled from $(1, 0)$ counterclockwise along the unit circle by an angle of $\theta$ radians. Perhaps if we just switch out the word “circle” for “hyperbola” there, we’ll get the hyperbolic functions?
Well, not quite. The unit hyperbola never really goes beyond 45 degrees: in fact, it’s asymptotic to both the lines $y = x$ and $y = -x$. So if that was a defining property of the hyperbolic functions, then the graphs of $\cosh$ and $\sinh$ would need to have a huge vertical asymptote around the $\frac{\pi}{4}$ mark. Not only that, but there’s no point on the hyperbola at all that is 60 or 90 or 120 degrees counterclockwise from $(1, 0)$: there’d be blank space on the graph until $\frac{3\pi}{4}$, when the graph enters back into view from another vertical asymptote. And, anyway, such graphs’d need to repeat every $2\pi$ units. That’s not at all what the graphs of $\cosh$ and $\sinh$ look like, so that can’t be it.
Fortunately, that analogy is almost true. We just have to use a slightly different starting point: instead of characterising $(\cos\theta, \sin\theta)$ as the point you reach when the angle you’ve travelled is $\theta$, characterise it as the point you reach when the area of the subtended sector is $\theta/2$. Now the shift to the hyperbolic world works just as you’d expect: $(\cosh\theta, \sinh\theta)$ is the point on the unit hyperbola for which the subtended sector has area $\theta/2$.
But why does that even work? Why would defining something with a weird combination of exponentials give you a function that tracks areas subtended by a hyperbola?
The path to explaining this is a long one, and it’ll lead us to a bunch of pretty discoveries along the way as we get to the bottom of What Hyperbolic Functions Truly Are. And to start, let’s go back to the very beginning: the exponential function.
Derivatives all the way down
Think back in time to when you first learned how to differentiate. You probably recall the power rule, which, along with the fact that derivatives play nicely with sums and scalar multiples, lets you differentiate any polynomial. But now you want to spice it up a little. You’re asking yourself a neat, interesting question: is there a function which is its own derivative?
Well, the simplest function we can probably imagine is the constant zero function, which just takes the value 0 everywhere. That’s a constant function, so its derivative is 0 everywhere, which is equal to the function itself, voilà! We found one.
But that’s hardly interesting; we want a function with some really funky behaviour that still manages to be equal to its own derivative everywhere. Let’s say that in order to explicitly exclude the constant zero function, we’ll additionally require our target function to take the value 1 at 0. So we’re looking for a function $f$ that satisfies $f(0) = 1$ and $f'(x) = f(x)$.
Okay, what’s the simplest function we can imagine that takes the value 1 at 0? The constant 1 function! Alright, that one doesn’t work; being constant, its derivative is still 0, which this time isn’t equal to the function itself. We need its derivative to be 1.
Let’s not give up. We can fix the discrepancy by adding a linear term $ax$, for some constant $a$: this term would vanish at 0, so it wouldn’t break $f(0) = 1$; and it’s non-constant, so it could help us set the derivative at 0 to be 1. Since $ax$ differentiates to $a$, this means $a$ needs to be 1; so now $f(x) = 1 + x$ and its derivative is 1.
But wait! While we worked on fixing the discrepancy, the goalposts themselves moved by one step! Now we need the derivative to be $1 + x$, not just 1. We can keep going and add a quadratic term whose derivative is $x$; that’d be $\frac{x^2}{2}$, if you remember your calculus, so now $f(x) = 1 + x + \frac{x^2}{2}$ and its derivative is $1 + x$. Agh, but now $f$ itself is again one step ahead: now we need the derivative to have that $\frac{x^2}{2}$ term! And it’ll keep doing that forever, because every time we want to add a term to fix the derivative, that same term will move the goalposts away by one step.
But all hope is not lost. Maybe if we do this for infinitely many steps, then we will indeed get a function whose derivative is itself: when differentiating, the constant 1 term would vanish, and the $n$-th power term would turn into the $(n-1)$-th power term; ordinarily that would leave a gap at the end, but since with infinity there is no end, everything works out perfectly, Hilbert’s Hotel style:

$$f(x) = 1 + x + \frac{x^2}{2} + \frac{x^3}{6} + \frac{x^4}{24} + \frac{x^5}{120} + \cdots$$
Question
What is the sequence of denominators here? In other words, if the $n$-th power term of $f$ is $\frac{x^n}{d_n}$, what is $d_n$?
Nice, so if we put aside the question of whether this sum converges,1 we actually did find a function which takes the value 1 at 0 and equals its own derivative! And clearly it’s doing something funkier than just constant zero.
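If you want to see this numerically, here’s a tiny Python sketch (purely my own illustration; the name `f` and the cutoff of 30 terms are arbitrary choices): it sums the truncated series and checks, with a central-difference quotient, that the result is, to good approximation, its own derivative.

```python
# Illustration: build the truncated series 1 + x + x^2/2 + x^3/6 + ...
# and check, via a central difference, that it's (approximately) its own derivative.

def f(x, terms=30):
    total, term = 0.0, 1.0
    for n in range(terms):
        total += term
        term = term * x / (n + 1)  # next term: multiply by x, divide by the next denominator
    return total

h = 1e-6
for x in [0.0, 0.5, 1.0, 2.0]:
    approx_derivative = (f(x + h) - f(x - h)) / (2 * h)
    print(x, f(x), approx_derivative)  # the last two columns should agree closely
```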
Is Its Own Derivative
So, we found a function equal to its own derivative which takes the value 1 at 0. Big deal, right? There’s probably thousands of such functions.
Well, the interesting thing is, no, actually! The function we constructed, by constantly “moving the goalposts”, is the only one.
Let’s think about it again. The constant term couldn’t have been anything other than 1, since all the higher terms will vanish at zero, and we want $f(0) = 1$. Now because $f' = f$, we also need $f'(0) = 1$; but the constant term obviously has zero derivative, and the quadratic and all higher terms will differentiate to at least linear terms, which, again, will vanish at zero. So the only term capable of controlling the value of the derivative at zero is the linear term: so that one also has to be exactly $x$.
The same argument can keep going: the value of the $n$-th derivative at 0 can be controlled only by the $n$-th power term, since all lower ones will have zero $n$-th derivative everywhere, and all higher ones will differentiate to at least something linear, and so will have zero $n$-th derivative at 0. And so, because we’ve specified the value of every order of derivative at 0 (namely, all of them must be 1), we’ve uniquely determined the value of every power term.
This means our silly infinite polynomial is actually quite an important function indeed! Let’s name it $\operatorname{IIOD}$, for “Is Its Own Derivative [and, at 0, takes the value 1]”. Again, it’s the unique function with those two properties.
Now the interesting thing is that we can generalise this a little. Since derivatives play along nicely with scalar multiplication, the derivative of $c \cdot \operatorname{IIOD}(x)$ is also $c \cdot \operatorname{IIOD}(x)$ for any real number $c$; and clearly such a function would take the value $c$ at 0. Also, by the chain rule, the derivative of $\operatorname{IIOD}(kx)$ is $k \cdot \operatorname{IIOD}(kx)$; so such a function has the property that its derivative is $k$ times itself. And in the same fashion as before, it turns out that $c \cdot \operatorname{IIOD}(kx)$ is the unique function whose derivative is $k$ times itself and which takes the value $c$ at 0.2
Keep this in mind. I’m gonna repeat it again for importance’s sake:
The only function whose derivative is $k$ times itself and which takes the value $c$ at 0 is $c \cdot \operatorname{IIOD}(kx)$.
With that in mind, we can try to figure out more about this mysterious function.
Exponents all along?
Let’s see just how much more we can find out about $\operatorname{IIOD}$ just from its characteristic property of being equal to its own derivative.
We know the product rule says that the derivative of $f(x)g(x)$ is $f'(x)g(x) + f(x)g'(x)$. Let’s see what happens when we apply the product rule to two differently scaled versions of $\operatorname{IIOD}$:

$$\frac{\mathrm{d}}{\mathrm{d}x}\Bigl[\operatorname{IIOD}(ax)\operatorname{IIOD}(bx)\Bigr] = a\operatorname{IIOD}(ax)\operatorname{IIOD}(bx) + b\operatorname{IIOD}(ax)\operatorname{IIOD}(bx) = (a + b)\operatorname{IIOD}(ax)\operatorname{IIOD}(bx)$$
Huh! So $\operatorname{IIOD}(ax)\operatorname{IIOD}(bx)$ has a derivative equal to $(a + b)$ times itself. But wait, we deduced earlier that the only such function taking the value 1 at 0 is $\operatorname{IIOD}((a + b)x)$ (and our product does take the value 1 at 0, since $\operatorname{IIOD}(0) = 1$). So they must be equal! Specifically, setting $x = 1$, we get:

$$\operatorname{IIOD}(a)\,\operatorname{IIOD}(b) = \operatorname{IIOD}(a + b)$$
How peculiar! This function has a property reminiscent of exponents.
What about the power rule? What if we try differentiating $\operatorname{IIOD}(x)^n$?

$$\frac{\mathrm{d}}{\mathrm{d}x}\Bigl[\operatorname{IIOD}(x)^n\Bigr] = n\operatorname{IIOD}(x)^{n-1} \cdot \operatorname{IIOD}'(x) = n\operatorname{IIOD}(x)^n$$
So $\operatorname{IIOD}(x)^n$ is a function whose derivative is $n$ times itself! And again, we know the only such function (with the value 1 at 0) is $\operatorname{IIOD}(nx)$, so:

$$\operatorname{IIOD}(x)^n = \operatorname{IIOD}(nx)$$
Again, just like exponents!
Perhaps you even know that the power rule works for all real exponents, not just natural-number ones. But if you don’t, we can still extend that second property of $\operatorname{IIOD}$ to at least all rational exponents.3 The property $\operatorname{IIOD}(a)\operatorname{IIOD}(b) = \operatorname{IIOD}(a + b)$ plays along very nicely with all this; it implies the identity works for negative integers, since $\operatorname{IIOD}(x)\operatorname{IIOD}(-x) = \operatorname{IIOD}(0) = 1$, so that $\operatorname{IIOD}(-x)$ must be the reciprocal of $\operatorname{IIOD}(x)$. And $\operatorname{IIOD}(x/n)$ must be the $n$-th root of $\operatorname{IIOD}(x)$, since $\operatorname{IIOD}(x/n)^n = \operatorname{IIOD}(x)$.
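Here’s a quick numerical spot-check of those exponent-like identities (again, just an illustrative sketch; the name `iiod` and the 40-term truncation are my own choices, not anything canonical):

```python
# Illustration: spot-check IIOD(a)*IIOD(b) = IIOD(a+b) and IIOD(x)**n = IIOD(n*x)
# using a truncated version of the defining series.

def iiod(x, terms=40):
    total, term = 0.0, 1.0
    for n in range(terms):
        total += term
        term = term * x / (n + 1)
    return total

a, b = 0.7, 1.3
print(iiod(a) * iiod(b), iiod(a + b))  # both roughly 7.389...

x, n = 0.5, 4
print(iiod(x) ** n, iiod(n * x))       # both roughly 7.389...
```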
Always has been
Indeed, in this identity we can set $x = 1$ as well, and get the following:

$$\operatorname{IIOD}(n) = \operatorname{IIOD}(1)^n$$
Aha! So it was exponents all along! Amazingly, starting from just the requirement of being its own derivative, we found that $\operatorname{IIOD}(x)$ is secretly just the operation of raising $\operatorname{IIOD}(1)$ to the power of $x$.
So whatever this $\operatorname{IIOD}(1)$ is, it must be a really important number! Let’s try to find out what it is.
Since we know the infinite polynomial that defines $\operatorname{IIOD}$, let’s just plug in 1 and see what we get:

$$\operatorname{IIOD}(1) = 1 + 1 + \frac{1}{2} + \frac{1}{6} + \frac{1}{24} + \frac{1}{120} + \cdots \approx 2.718\ldots$$
So it’s some number between 2 and 3, somewhere around 2.7.
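If you’d like to watch the partial sums settle down, here’s a tiny sketch (illustrative only):

```python
# Illustration: partial sums of 1 + 1 + 1/2 + 1/6 + 1/24 + ...
total, term = 0.0, 1.0
for n in range(12):
    total += term
    term = term / (n + 1)
    print(n, total)  # 1, 2, 2.5, 2.666..., 2.708..., 2.7166..., heading towards 2.71828...
```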
By this point you’re probably screaming at me. “Of course it’s $e$! We knew it all along! We knew $e^x$ was its own derivative! We knew $\operatorname{IIOD}(x)$ was just $e^x$ all along!”
Yes, I admit, I made it excessively cryptic on purpose. I did that to drive the point home: This is what the place of $e$ in calculus is all about. $e$ is not about compound interest, or the limit of $\left(1 + \frac{1}{n}\right)^n$, or, heck, even a raw dry definition of $e$ as that sum doesn’t fully do it justice. The true meaning of $e$ is that it is the value of $\operatorname{IIOD}(1)$. We were able to find the function equal to its own derivative without knowing about $e$ at all; $e$ derives from $\operatorname{IIOD}$, not the other way around.
Euler’s formula
Remember that $\operatorname{IIOD}$ is just a funky polynomial. An infinite one, true, but still just a function that involves only the four elementary operations.
And, hey, we know how to do the four elementary operations on complex numbers, too! There should be no harm in, say, plugging in $ix$ for some real number $x$:

$$\operatorname{IIOD}(ix) = 1 + ix + \frac{(ix)^2}{2} + \frac{(ix)^3}{6} + \frac{(ix)^4}{24} + \cdots$$

So the end result would be something like:

$$\operatorname{IIOD}(ix) = \left(1 - \frac{x^2}{2} + \frac{x^4}{24} - \cdots\right) + i\left(x - \frac{x^3}{6} + \frac{x^5}{120} - \cdots\right)$$

The numbers are unusual, but the principle itself isn’t strange: we’re just plugging complex numbers into a polynomial.
Here’s something entirely different, for a change: what’s the derivative of $\cos x + i\sin x$?

$$\frac{\mathrm{d}}{\mathrm{d}x}\bigl(\cos x + i\sin x\bigr) = -\sin x + i\cos x = i\,(\cos x + i\sin x)$$
Aha! It’s $i$ times itself! But what do we know about such functions? They’re just secretly scaled and stretched copies of $\operatorname{IIOD}$, aren’t they? And because $\cos 0 = 1$ and $\sin 0 = 0$, this function already has the value 1 at 0, so no scaling is necessary. So in conclusion:

$$\cos x + i\sin x = \operatorname{IIOD}(ix)$$
You’ve more than likely come across this formula at some point in your maths-curiosity-filled life:

$$e^{ix} = \cos x + i\sin x$$
It’s the same formula.
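A quick, purely illustrative way to convince yourself: plug $ix$ into a truncated version of the series and compare it with $\cos x + i\sin x$ (the 40-term cutoff is an arbitrary choice of mine).

```python
import math

# Illustration: evaluate the truncated series at ix and compare with cos x + i*sin x.
def iiod(z, terms=40):
    total, term = 0 + 0j, 1 + 0j
    for n in range(terms):
        total += term
        term = term * z / (n + 1)
    return total

for x in [0.5, 1.0, math.pi]:
    print(iiod(1j * x), complex(math.cos(x), math.sin(x)))  # each pair should match
```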
The mirrors of Euler’s formula
Perhaps you remember from physics that circular motion has the defining property that the acceleration vector is proportional to the displacement vector with a negative proportionality factor. That makes sense: in a circle, acceleration always points towards the centre, whereas displacement is, well, away from the centre. The cool thing is that this property also (nearly) characterises circular motion: such motion always follows an ellipse centred at the origin, and with the right starting position and velocity, a circle.
And, hey, circles! Sine and cosine! Maybe there’s a connection there!
The displacement in circular motion follows $(\cos t, \sin t)$, where $t$ is time. But what if we tried to find out what functions describe circular motion not through a geometric approach, but through a differential one? What if we tried to analyse circular motion through this idea of acceleration being negatively proportional to displacement?
So we have a new wishlist: we want functions $x(t)$ and $y(t)$ such that
- $x(0) = 1$, $x'(0) = 0$, and $x''(t) = -x(t)$;
- $y(0) = 0$, $y'(0) = 1$, and $y''(t) = -y(t)$.
In other words, we’re specifying how we want our circular motion to start (namely, one unit to the right of the origin, and with a velocity pointing upwards), and imposing the characteristic property: the second derivative (acceleration) is equal to the displacement with a negative sign.
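Just to see that this differential description really does trace out the circular-motion coordinates, here’s a deliberately crude numerical sketch (my own illustration, nothing more): it steps $x'' = -x$ forward in tiny time increments from the starting conditions above and compares the result with $\cos t$.

```python
import math

# Illustration (deliberately crude): step x'' = -x forward from x(0) = 1, x'(0) = 0
# in tiny increments, and compare the result with cos(t).
dt = 1e-4
x, v, t = 1.0, 0.0, 0.0
for target in [0.5, 1.0, 2.0, 3.0]:
    while t < target:
        a = -x        # acceleration is minus the displacement
        x += v * dt
        v += a * dt
        t += dt
    print(round(t, 4), round(x, 4), round(math.cos(t), 4))  # close, up to stepping error
```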
How can the second derivative of a function be equal to $-1$ times itself? Well, one easy way is for its first derivative to be $i$ times itself; then differentiating it twice will give us $i^2 = -1$ times the original back. Naturally, another way is $-i$, since every number has two square roots. Also, any sum of such functions will still have a second derivative equal to $-1$ times itself, and any scalar multiple still will.
And in fact, the theory of differential equations tells us that that’s all of them. The only functions whose second derivative is $-1$ times itself are functions of the form $a \cdot p(t) + b \cdot q(t)$, where $a$ and $b$ are constants, and $p$ and $q$ are functions whose derivative is $i$ and $-i$ times itself, respectively.4
And we already know what function differentiates to $k$ times itself: it’s $\operatorname{IIOD}(kt)$! And in our case, $k$ was $\pm i$, so $p(t)$ is $\operatorname{IIOD}(it)$ and $q(t)$ is $\operatorname{IIOD}(-it)$. So both $x$ and $y$ are some combination of scalar multiples of $\operatorname{IIOD}(it)$ and $\operatorname{IIOD}(-it)$.
Ring a bell?
If $x(t) = a \cdot \operatorname{IIOD}(it) + b \cdot \operatorname{IIOD}(-it)$, all that’s left is to solve for $a$ and $b$. We want:

$$x(0) = a + b = 1, \qquad x'(0) = ia - ib = 0$$
So $a + b = 1$ and $a - b = 0$, giving $a = \tfrac{1}{2}$ and $b = \tfrac{1}{2}$, so that:

$$x(t) = \frac{\operatorname{IIOD}(it) + \operatorname{IIOD}(-it)}{2}$$
Repeating this with $y$, we get $a + b = 0$ and $ia - ib = 1$, so $a = \tfrac{1}{2i}$ and $b = -\tfrac{1}{2i}$, and we get:

$$y(t) = \frac{\operatorname{IIOD}(it) - \operatorname{IIOD}(-it)}{2i}$$
And, of course, I cleverly picked the initial conditions so that the circular motion functions $x$ and $y$ are just $\cos$ and $\sin$: at time 0, a point tracing out $(\cos t, \sin t)$ would be one unit to the right of the origin and its velocity would be upwards. Switching out $\operatorname{IIOD}(it)$ for that scary, scary $e^{it}$ notation gives us the “mirrors” of Euler’s Formula:

$$\cos t = \frac{e^{it} + e^{-it}}{2}, \qquad \sin t = \frac{e^{it} - e^{-it}}{2i}$$
This usual notation of $\operatorname{IIOD}(x)$ as $e^x$, even when its argument is a complex number, can be crazy confusing if not outright intimidating: what’s it even mean to raise a number to an imaginary power? The answer: this isn’t about raising anything to a power at all! It’s an abuse of notation; what we mean by $e^{ix}$ is $\operatorname{IIOD}(ix)$.5
So sine and cosine can be characterised by their geometric meaning just as well as their property that their second derivatives are equal to their own negation, and the connection between the two approaches lies in how circular motion works. This means sine and cosine are not-so-distant cousins of exponentials! In fact, for a brief time before the discovery of logarithms, people used sines and cosines to turn multiplication into addition: sines and cosines were makeshift logarithms.
And, hey, bring up those formulae above one more time:

$$\cos t = \frac{e^{it} + e^{-it}}{2}, \qquad \sin t = \frac{e^{it} - e^{-it}}{2i}$$
Doesn’t this look a whole lot like the definition of $\cosh$ and $\sinh$?
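If you’d like to check the mirrors numerically before reading on, here’s an illustrative sketch using Python’s complex exponential:

```python
import cmath, math

# Illustration: check the "mirrors" with Python's complex exponential.
for x in [0.3, 1.0, 2.5]:
    from_exp_cos = (cmath.exp(1j * x) + cmath.exp(-1j * x)) / 2
    from_exp_sin = (cmath.exp(1j * x) - cmath.exp(-1j * x)) / (2j)
    print(from_exp_cos.real, math.cos(x))  # equal, up to floating point
    print(from_exp_sin.real, math.sin(x))
```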
IIOD’s slightly less outlandish cousins
What if we go back to the defining characteristic of circular motion, but get rid of that pesky negative sign this time? Instead of looking for functions whose second derivative equals minus themselves, let’s look for some whose second derivative equals… themselves, period.
Obviously, $\operatorname{IIOD}(x)$ is such a function. So is $\operatorname{IIOD}(-x)$, since differentiating it once negates it, and differentiating it a second time negates that, landing us back at the beginning. And since 1 and -1 are the only two square roots of 1, by the same reasoning as above, all such functions are the sums and scalar multiples of $\operatorname{IIOD}(x)$ and $\operatorname{IIOD}(-x)$.
Just for fun, let’s give them the same starting conditions as we did above: $x(0) = 1$, $x'(0) = 0$ and $y(0) = 0$, $y'(0) = 1$; and give the resulting functions $x$ and $y$ the names $\cosh$ and $\sinh$. Just for fun. Then we get slightly less outlandish counterparts of the mirrors of Euler’s Formula:

$$\cosh t = \frac{e^{t} + e^{-t}}{2}, \qquad \sinh t = \frac{e^{t} - e^{-t}}{2}$$
And, by adding them, a slightly less outlandish counterpart of Euler’s Formula itself:

$$e^{t} = \cosh t + \sinh t$$
That’s right, $\cosh$ and $\sinh$ are the even part and odd part of $e^x$, respectively.
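The even/odd decomposition is easy to check numerically (illustrative sketch only):

```python
import math

# Illustration: cosh and sinh as the even and odd parts of e^x.
for x in [0.5, 1.0, 2.0]:
    even = (math.exp(x) + math.exp(-x)) / 2
    odd = (math.exp(x) - math.exp(-x)) / 2
    print(even, math.cosh(x))       # the even part is cosh
    print(odd, math.sinh(x))        # the odd part is sinh
    print(even + odd, math.exp(x))  # together they rebuild e^x
    print(even**2 - odd**2)         # and cosh^2 - sinh^2 = 1
```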
And by the way, with those definitions, it’s easy to verify that $\cosh^2 t - \sinh^2 t = 1$; and because the unit hyperbola includes the point $(1, 0)$ and is tangent to a vertical line at that point, that means $(\cosh t, \sinh t)$ traces out (half of) the unit hyperbola, just like $(\cos t, \sin t)$ traces out the unit circle.6 But this isn’t what $\cosh$ and $\sinh$ are about: that’s just a consequence of the fact that they satisfy $\cosh^2 t - \sinh^2 t = 1$. Their real significance is in their differential properties. Ironically, their names are kind of a misnomer: they really aren’t all that much about hyperbolae. Like, sure, the connection is there, but it sorta distracts from the more important stuff about them, you know?
Not really about hyperbolae
Let’s go back all the way to the beginning, to how we characterised the hyperbolic functions through their connection to spanning areas in hyperbolae. Here’s that diagram again, for reference:
Now we have everything we need to actually show that this is true.
We want to find a parameterisation of the unit hyperbola with a function $f(\theta)$, which will serve as the vertical coordinate (so the horizontal coordinate will be $\sqrt{1 + f(\theta)^2}$), such that when the vertical coordinate is $f(\theta)$, the area of the subtended sector will be $\theta/2$. Our task is to determine what $f$ is.
We can find the area of this sector using calculus, if we rotate this diagram by 90 degrees:
It basically amounts to finding the integral $\int \sqrt{1 + y^2}\,\mathrm{d}y$.
This integral isn’t easy! Right off the bat, if we substituted $u = 1 + y^2$ or something, there’d be no good way to get $\mathrm{d}y$ to turn into $\mathrm{d}u$. The answer will probably not just be a simple polynomial.
Let’s try to put our hyperbolic functions (the ones with the exponential definitions) to the rescue. Since $\cosh^2 t - \sinh^2 t = 1$, let’s try to substitute $y = \sinh t$. Doing so will give us $\sqrt{1 + y^2} = \cosh t$ and $\mathrm{d}y = \cosh t\,\mathrm{d}t$, so the integral becomes $\int \cosh^2 t\,\mathrm{d}t$.
To continue, we can rewrite $\cosh^2 t$ as $\frac{\cosh(2t) + 1}{2}$ (which is pretty much just the first binomial formula, quite neatly). This gives us

$$\int \cosh^2 t\,\mathrm{d}t = \int \frac{\cosh(2t) + 1}{2}\,\mathrm{d}t = \frac{\sinh(2t)}{4} + \frac{t}{2}$$
(Plus $C$, but shush, we’ll want a definite integral in the end.)
Okay, but we’re not quite there yet; we can’t convert back to $y$ anymore. We started with $y = \sinh t$, and now we have $\sinh(2t)$. That’s okay though; we can use a similar identity to go back: $\sinh(2t) = 2\sinh t\cosh t$. (Pretty much just the third binomial formula.) Then $\frac{\sinh(2t)}{4}$ becomes $\frac{\sinh t\cosh t}{2} = \frac{y\sqrt{1 + y^2}}{2}$, and $\frac{t}{2}$ becomes $\frac{\operatorname{arsinh}(y)}{2}$, and we get our final answer:

$$\int \sqrt{1 + y^2}\,\mathrm{d}y = \frac{y\sqrt{1 + y^2}}{2} + \frac{\operatorname{arsinh}(y)}{2}$$
What a funky integral indeed! Who would’ve thought that the integral of $\sqrt{1 + y^2}$ would have inverse hyperbolic sine in it.
So that’s the area between the unit hyperbola and the $y$-axis. To get our desired sector, we need to subtract the bit subtended by a straight line directly from the origin to a point on the hyperbola, namely $\left(\sqrt{1 + y^2},\, y\right)$. That’s just a right triangle, and the area of a triangle is half base times height, where the base is the $x$-coordinate, $\sqrt{1 + y^2}$, and the height is the $y$-coordinate, $y$. Putting it all together:

$$\text{sector area} = \frac{y\sqrt{1 + y^2}}{2} + \frac{\operatorname{arsinh}(y)}{2} - \frac{y\sqrt{1 + y^2}}{2} = \frac{\operatorname{arsinh}(y)}{2}$$
And since we want it to be equal to $\theta/2$, that means our $f(\theta)$ must be $\sinh\theta$. That exponential one.
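And, as a final illustrative sketch, here’s a brute-force numerical check of the whole claim: compute the sector area subtended by the point $(\cosh a, \sinh a)$ by direct integration and compare it with $a/2$ (the step count is an arbitrary choice of mine).

```python
import math

# Illustration: the sector subtended by (cosh a, sinh a) should have area a/2.
# Sector area = (area between the hyperbola and the y-axis up to height sinh a)
#               minus (the right triangle of area cosh(a)*sinh(a)/2).
def sector_area(a, steps=100_000):
    Y = math.sinh(a)
    dy = Y / steps
    # midpoint rule for the integral of sqrt(1 + y^2) from 0 to Y
    integral = sum(math.sqrt(1 + ((k + 0.5) * dy) ** 2) for k in range(steps)) * dy
    return integral - math.cosh(a) * math.sinh(a) / 2

for a in [0.5, 1.0, 2.0]:
    print(sector_area(a), a / 2)  # the two columns should match
```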
So, yes, the “hyperbolic” functions with those weird combinations of $e^x$ and $e^{-x}$ do really characterise a hyperbola. But the fact that they do is little more than an exercise in calculus as a consequence of their real significance: that they’re their own second derivatives.
Thank you for reading.
Footnotes
1. This sum does indeed converge for every $x$, so this function is indeed well-defined for all real numbers. There’s probably several ways to verify this: here’s one I came up with. First, I gotta spoil that those denominators are factorials: the summands are $\frac{x^n}{n!}$ for $n$ from 0 to infinity. (The factorials come from repeatedly applying the power rule; you multiply by $n$ but lower the exponent to $n - 1$, so next time you’ll multiply by $n - 1$, and so on.) Note that $n!$, being $1 \cdot 2 \cdots n$, is the same as the geometric mean of all the numbers from 1 to $n$, raised to the power $n$. As $n$ increases, that geometric mean increases without bound; in particular, it always eventually exceeds any fixed $x$. In fact, it even eventually exceeds $2|x|$; from that point onwards, $\left|\frac{x^n}{n!}\right|$ is always less than $\left(\frac{1}{2}\right)^n$, so the rest of the terms from that point onwards are bounded from above by a geometric series with common ratio $\frac{1}{2}$. And we know that that series converges, so our original series also does. For large $x$ it can converge to something very big; after all, all we know is that eventually it’ll turn to something smaller than a geometric series, and that “eventually” could take a very long time. But it will always be finite: the part before the Eventually will always have finitely many terms, and the part after the Eventually is always bounded by a geometric series. ↩
2. You could prove this by either repeating the argument from above with these generalisations, or with a neat succinct indirect proof: suppose there existed another function $g$ whose derivative was $k$ times itself and which took the value $c$ at 0; then $\frac{1}{c}\,g\!\left(\frac{x}{k}\right)$ would be a function other than $\operatorname{IIOD}$ equal to its own derivative and taking the value 1 at 0, a contradiction. ↩
3. Exponentiation with a rational number can be defined relatively simply as $a^{p/q} = \sqrt[q]{a^p}$. Exponentiation with arbitrary real numbers, on the other hand, is a lot trickier to define. It basically requires proving that exponentiation is a continuous function (at every point except (0, 0)), and then “filling in the gaps”. Illustratively, if for, say, $a^{\sqrt{2}}$, we pick some sequence of rational numbers that converges to $\sqrt{2}$ (say, 1, 1.4, 1.41, 1.414, …), then the sequence $a^1, a^{1.4}, a^{1.41}, a^{1.414}, \ldots$ will always converge to the same value no matter which sequence we picked: we define $a^{\sqrt{2}}$ to be that value. ↩
4. I don’t have a handy, easy-to-understand proof for this, sadly; you’ll have to take my word for it or look it up yourself if you feel up to it. It’s basically a generalisation of what we saw above with $\operatorname{IIOD}$, though, so you’re not missing out on much. ↩
5. Heck, this only gets worse: the same abuse of notation is used to exponentiate matrices and even derivative operators. ↩
6. So maybe motion in which acceleration is proportional to displacement with a positive proportionality factor should be called “hyperbolic motion”. ↩