Q) what is authentication
A) identifying who you are
Philosophical tangent on who you really are, anyway
So how do you define ‘who you are’? Many philosopher kings would probably ascribe it to your hopes and dreams, what you do for a living, how many pyramids you have, what great works of art, music or literature you’ve produced, the belief system you’ve imprinted upon your children or your standing in the local community. Or if you’re in the developed world, maybe it’s your lifelong carbon footprint (Texas crude, barrels). Others may define the ‘self’ as being both spiritual and physical, giving a distinction between that which is eternal and that which is mere substance on a fleeting mortal plane.
Software developed by yours truly (and literally every other piece of software on the planet) sidesteps this thorny issue by saying that you’re a number, which in most systems is called the UserID number.
It’s really important that you are your number and that your number is you, because if you were a different number you’d be able to get all the different number’s goodies. And that other anthropomorphic number would be really annoyed about that, and possibly want to stop paying us to be a number. Paying us is the process of modifying different numbers, which are also really important.
I used to write software for banks.
But anyway. I digress.
How do I go about proving I am what I yam?
You typically do that by providing some information that only you would know (or that you invented solely for the purpose of knowing it at a later point in time to prove that you were the entity capable of inventing such a thing). This thing that only you know is called a “password”, and is definitely not the alphanumeric string “abc123” or “p4ssw0rd”.
As you (this time with your multinational conglomerate hat on) grow and multiply, you might start finding that inadequate, so you might start adding other authentication mechanisms to your intranet / portal / knowledge base / support desk / issue tracking / continuous integration / email / birthday tracker / logistics / content delivery system.
And you might also want to force your users to jump through some hoops during the authentication process, because if they don’t click through the 10 pages of small print legalese they’re not going to read, you might find yourself on the hook to someone who doesn’t expect you to provide the software WITH ALL FAULTS, AND HEREBY DISCLAIM ALL OTHER WARRANTIES AND CONDITIONS, WHETHER EXPRESS, IMPLIED OR STATUTORY, INCLUDING, BUT NOT LIMITED TO, ANY (IF ANY) IMPLIED WARRANTIES, DUTIES OR CONDITIONS OF MERCHANTABILITY, OF FITNESS FOR A PARTICULAR PURPOSE, OF RELIABILITY OR AVAILABILITY, OF ACCURACY OR COMPLETENESS OF RESPONSES, OF RESULTS, OF WORKMANLIKE EFFORT, OF LACK OF VIRUSES, AND OF LACK OF NEGLIGENCE, ALL WITH REGARD TO THE SOFTWARE, AND THE PROVISION OF OR FAILURE TO PROVIDE SUPPORT OR OTHER SERVICES, INFORMATION, SOFTWARE, AND RELATED CONTENT THROUGH THE SOFTWARE OR OTHERWISE ARISING OUT OF THE USE OF THE SOFTWARE.
And we couldn’t have that.
So what sort of authentication mechanisms are there?
Well there’s username + password, obviously. But if you’re actually developing the software, you probably don’t want to enter that username/password every single time the webapp restarts, so you’ll implement some kind of ‘auto-login’ process to fill that in for you whilst you noodle around in your development servers.
And passwords probably won’t cut the mustard for high-value accounts, so you’ll probably want to implement some kind of two-factor authentication process (TFA), where the user has to scan their retina in, or, as is much more likely, enter a six-digit number from a keyfob that isn’t a fob any more but actually an app on their phone. These six-digit numbers are linked to the user and change every 30 seconds, and are called “one-time passwords” (OTP), but they’re not particularly one-timey as six digits only gives you a million possible values, so you’re going to run out of them pretty quickly.
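For the curious, here’s a minimal sketch of how those phone apps typically come up with the six digits, assuming the standard TOTP scheme (RFC 6238) with HMAC-SHA1 and a 30-second step. The function name `totp` and its parameters are made up for illustration; a real deployment would lean on a vetted library and also accept codes from adjacent time windows to allow for clock drift.

```python
import hashlib
import hmac
import struct
import time
from typing import Optional


def totp(secret: bytes, at: Optional[float] = None, digits: int = 6, step: int = 30) -> str:
    """Return the time-based one-time password for `secret` at Unix time `at`."""
    if at is None:
        at = time.time()
    counter = int(at) // step                 # number of whole time steps since the epoch
    msg = struct.pack(">Q", counter)          # counter as an 8-byte big-endian integer
    digest = hmac.new(secret, msg, hashlib.sha1).digest()
    offset = digest[-1] & 0x0F                # dynamic truncation: low nibble picks an offset
    code = struct.unpack(">I", digest[offset:offset + 4])[0] & 0x7FFFFFFF
    return str(code % 10 ** digits).zfill(digits)


# The secret below is the published RFC test secret, not something to use in production.
print(totp(b"12345678901234567890", at=59))
```

The server holds the same secret as the phone, runs the same calculation, and compares the results, which is why the number is only as secret as the shared key behind it.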
And clients will get sick of doing that after a short period of time, so they’ll probably start demanding that they can authenticate using some other system that they’re already forced to authenticate to as part of their productive existence in the metropolis-style bureaucracy that they find themselves in. This is called single sign-on (SSO), and there’s probably a half-dozen competing standards which allow you to do that, with names like OAuth, OpenID, SAML, LDAP, CAS, Kerberos, SSPI, or the corporate-branded versions of that, funded by the billions of “cyber security” dollars that governments believe will make the world a much more auditable place to live in, much of which appears to be spent on creating perfectly understandable graphics of all the other identity management products that suddenly exist.
And once they’ve authenticated via that username/password, or TFA, or SSO, you don’t want to have to go through that again on the next page of this website of yours, so you’ll create a ‘user session’ with some other non-guessable identifier, and stash that in a “Cookie”, which the user’s web browser will include on every subsequent request.
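A rough sketch of that session dance, using only the Python standard library. The in-memory `SESSIONS` dict, the cookie name `session` and both function names are assumptions for illustration; anything real would keep sessions somewhere that survives a restart and would expire them.

```python
import secrets
from http.cookies import SimpleCookie

SESSIONS = {}  # hypothetical in-memory store: session id -> user id


def start_session(user_id):
    """Create a session and return the value for a Set-Cookie response header."""
    session_id = secrets.token_urlsafe(32)   # 32 random bytes, so not guessable
    SESSIONS[session_id] = user_id
    cookie = SimpleCookie()
    cookie["session"] = session_id
    cookie["session"]["path"] = "/"
    cookie["session"]["httponly"] = True     # hide the cookie from page scripts
    cookie["session"]["secure"] = True       # only send it over HTTPS
    cookie["session"]["samesite"] = "Lax"
    return cookie["session"].OutputString()


def user_for_request(cookie_header):
    """Map the Cookie header of an incoming request back to a user id, or None."""
    cookie = SimpleCookie()
    cookie.load(cookie_header)
    morsel = cookie.get("session")
    return SESSIONS.get(morsel.value) if morsel else None
```

The browser sends back only the `session=...` pair, so everything the server knows about the user hangs off that one random string.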
Modern application servers do that cookie thing for you automatically, but modern application servers also fall over from time to time, so you’ll want a secondary cookie which can be used to reconstruct the user session if the brand-new automatically provisioned non-falling-over server doesn’t know what that session identifier is supposed to be identifying. Modern application servers can also do that for you automatically, but you won’t want to do that, because they don’t do it very well.
But not all clients will want to interact with your site with a web browser, because they want to automate the process deep in the bowels of their own infinitely complex business processes and having to click a button on a webpage is difficult to automate, especially if the button keeps moving every few months to conform to the fashions of the day.
And you know, there might be one of those checkboxes asking if the automated system is a robot, and all automated systems are incapable of checking that checkbox due to Asimov’s Checkbox Law.
So you might find yourself constructing a framework to create API keys and secrets, which are a bit like passwords but have more enforced randomness, are usually a bit longer, and can be disabled without affecting the user account they’re attached to.
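Something like the following would do as a starting point. It’s a sketch under assumptions: the `API_KEYS` dict stands in for a database table, and all three function names are invented here. Only a hash of the secret is stored, and a plain SHA-256 is arguably tolerable for that because the secret is long and random rather than something a human picked.

```python
import hashlib
import hmac
import secrets

API_KEYS = {}  # hypothetical store: key id -> {"user_id", "secret_hash", "enabled"}


def issue_api_key(user_id):
    """Create a key/secret pair; the secret is shown to the caller once and never stored."""
    key_id = secrets.token_hex(8)
    secret = secrets.token_urlsafe(32)
    API_KEYS[key_id] = {
        "user_id": user_id,
        "secret_hash": hashlib.sha256(secret.encode()).hexdigest(),
        "enabled": True,
    }
    return key_id, secret


def authenticate_api_key(key_id, secret):
    """Return the owning user id, or None if the key is unknown, disabled or wrong."""
    record = API_KEYS.get(key_id)
    if record is None or not record["enabled"]:
        return None
    supplied = hashlib.sha256(secret.encode()).hexdigest()
    if hmac.compare_digest(supplied, record["secret_hash"]):
        return record["user_id"]
    return None


def disable_api_key(key_id):
    """Switch off one key without touching the user account behind it."""
    API_KEYS[key_id]["enabled"] = False
```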
So that’s it, is it?
Well almost, because there’s a few things you also need to check during login. Let’s say you’ve got a user called Alice from the off-license, or some other non-malicious character in this universe of discourse.
First of all, you probably need a way of temporarily disabling access for Alice, even if she can remember her magical incantation to access the site, on the off chance she doesn’t pay her bills within the 30-day paying your bills window.
And someone else (whose name is usually Eve, or Chad for some reason) might try to pretend to be Alice, to gain access to Alice’s stuff, so you probably want something to prevent Chad from entering every password from ‘aaaaaaaa’ to ‘zzzzzzzz’ in an attempt to brute-force his way into the account. So you’ll probably implement some kind of failed login count, and once it hits some small number, lock them out of the system for some small period of time.
That also gives Chad the ability to lock Alice out of the system, so you might want to try to prevent that somehow, but you’re probably better off just waiting for Chad to give up. Give up, Chad.
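The failed-login counter could look roughly like this. The class name, the limit of 5 and the 300-second lockout are all assumptions picked for the example, and the state lives in memory, so a restart forgives everyone.

```python
import time

MAX_FAILURES = 5        # assumed "small number" of attempts
LOCKOUT_SECONDS = 300   # assumed "small period of time"


class LoginThrottle:
    """Counts consecutive failed logins per username and locks the account for a while."""

    def __init__(self, clock=time.monotonic):
        self._clock = clock          # injectable so the lockout can be tested without waiting
        self._failures = {}          # username -> consecutive failures
        self._locked_until = {}      # username -> clock value when the lock lifts

    def is_locked(self, username):
        until = self._locked_until.get(username)
        return until is not None and self._clock() < until

    def record_failure(self, username):
        count = self._failures.get(username, 0) + 1
        if count >= MAX_FAILURES:
            self._locked_until[username] = self._clock() + LOCKOUT_SECONDS
            count = 0                # start counting afresh once the lock expires
        self._failures[username] = count

    def record_success(self, username):
        self._failures.pop(username, None)
        self._locked_until.pop(username, None)
```

The login handler would check `is_locked` before even looking at the password, which is exactly the property that lets Chad lock Alice out.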
And you might want to prevent Alice from logging in from home, or her mobile, or some other non-secure location, so you might want to restrict what kinds of IP addresses or ranges can be used to access the site.
But you might want to lift those restrictions for certain users, because they’re extra special.
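A sketch of the IP check, with the escape hatch for the extra special people. The network ranges and the exempt username are placeholders, and `remote_addr` is assumed to be the real client address rather than whatever a proxy in front of you reports.

```python
import ipaddress

# Placeholder office ranges; substitute whatever your network team swears by.
ALLOWED_NETWORKS = [
    ipaddress.ip_network("10.0.0.0/8"),
    ipaddress.ip_network("192.0.2.0/24"),
]
EXEMPT_USERS = {"carol"}  # hypothetical users who may log in from anywhere


def ip_allowed(username, remote_addr):
    """Return True if this user may log in from this address."""
    if username in EXEMPT_USERS:
        return True
    addr = ipaddress.ip_address(remote_addr)
    return any(addr in network for network in ALLOWED_NETWORKS)
```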
And of course you want to prevent people from accessing the site when you’re in the middle of rejigging the database, or the webapp, or some other maintenance task, so you’ll need some way of splashing the site during those processes, or if something causes the site to transition to a Total Inability To Support Usual Performance state.
So that’s it, is it?
Well, no, because what with people never clicking the ‘logout’ button, and browsers continuing to retain those cookies we were talking about earlier (all the better to advertise to you, dear), you might find that the ever-nefarious Chad can persuade Alice to visit some other website (maybe a link in an email or something) that Chad controls. And then Chad can cause Alice’s web browser to fire off some requests to your website pretending to be Alice, which your site will dutifully process, because as far as it’s concerned those requests did come from Alice.
So now you need to include another identifier in literally every request from Alice’s web browser that Chad’s evil site doesn’t know about, which is called a cross-site request forgery (CSRF) token.
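In its simplest form, that token is just another random string tied to the session and echoed back in a hidden form field or request header. A minimal sketch, assuming the session is a dict-like object and with both function names invented for the example:

```python
import hmac
import secrets


def new_csrf_token(session):
    """Generate a token, remember it in the session, and return it for embedding in forms."""
    token = secrets.token_urlsafe(32)
    session["csrf_token"] = token
    return token


def csrf_ok(session, submitted):
    """Check the token that came back with a state-changing request."""
    expected = session.get("csrf_token")
    if not expected or not submitted:
        return False
    # Constant-time comparison, so response timing doesn't leak the token.
    return hmac.compare_digest(expected.encode(), submitted.encode())
```

Chad’s site can make Alice’s browser send her cookie, but it can’t read her pages, so it has no way of knowing what to put in that field.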
So that’s it, is it?
Well, almost, because Chad might be able to intercept Alice’s web requests (which we usually start calling messages around this point), including messages containing Alice’s password, which is often called a classic man-in-the-middle attack.
So you’ll probably want to enable HTTPS encryption on your site to try to prevent that.
HTTPS is this thing where both of you select some really big prime numbers, and then everything you send is converted to a number, raised to the power of one of your prime numbers, and then moduloed some other almost prime number at the other end. Or something. It’s definitely more complicated than that.
Bruce Schneier has thought about it for a while, which is all you really need to know as a developer. As a consumer of content, all you really need to know is that you need to make sure there’s a little padlock next to the address bar, unless you’re using a browser that doesn’t display a little padlock, in which case you need to make sure it doesn’t say “Not secure” instead, unless you’re using a browser that doesn’t do that any more.
But if Chad can intercept Alice’s messages, he might be able to prevent HTTPS from being activated in the first place, so you’ll probably want to enable HTTP Strict Transport Security (HSTS) to try to prevent him from preventing that.
And if Chad can’t prevent that, he still might be able to create a certificate trusted by any of the 150 or so certificate authorities that your browser inherently trusts, in which case you’re fucked, unless you use Convergence (remember that?), which won’t get you far these days, or one of the other dozen or so byzantine replacements or improvements to SSL or TLS that people have come up with.
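HSTS itself is only a response header telling the browser to refuse plain HTTP for this site for some period. Here’s a sketch as WSGI middleware, assuming your app speaks WSGI; the one-year `max-age` and the name `add_hsts` are choices made for the example, and browsers only honour the header when it arrives over HTTPS.

```python
HSTS_VALUE = "max-age=31536000; includeSubDomains"  # one year, an assumed policy


def add_hsts(app):
    """Wrap a WSGI app so that every response carries a Strict-Transport-Security header."""
    def wrapped(environ, start_response):
        def start_with_hsts(status, headers, exc_info=None):
            headers = list(headers) + [("Strict-Transport-Security", HSTS_VALUE)]
            return start_response(status, headers, exc_info)
        return app(environ, start_with_hsts)
    return wrapped
```

It doesn’t help on Alice’s very first visit, which is the gap that browser preload lists try to close.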
But then Chad is probably a government, so there’s probably no stopping Chad. Chad has to keep himself entertained somehow. And Chad still wants to prevent that other Chad from doing what Chad’s doing to Chad. There’s more than one Chad is what I’m saying.
Also, not the country Chad, I’m talking about Chad in a more metaphorical sense.
And there’s a bunch of other constantly-changing headers that you can set on every response your webserver creates in order to prevent your browser from doing things that you as a user might legitimately expect your browser to be able to do, which gives the security industry something to raise in their next penetration test.
So that’s it, is it?
Well, almost, because even assuming that your communication channel is secure, in order to prevent the Chads of the world from turning up in the middle of the night, nicking your hard drives, and putting them up on the Dark Web, you’re going to have to put those hard drives in a data center with one of those cool airlocks, encrypt the drive or the database at rest, and because you’ll be decrypting those automatically in order to use the thing, hash all the authentication tokens in your database, using bcrypt or something like that.
Bcrypt hashes the passwords and also verifies that user-supplied passwords match those hashes. The hashes can’t be cracked via rainbow tables, as they’re salted, and they’re stretched with a work factor you can raise as hardware gets faster, which preserves the difficulty of cracking the passwords over time. Somewhat like bitcoin does, but without the energy requirements of small countries.
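To show the salt-and-stretch idea without a third-party dependency, here’s a sketch that swaps bcrypt for PBKDF2-HMAC-SHA256 from Python’s standard library. It’s a different algorithm with the same shape: a random salt per password, and an iteration count playing the part of bcrypt’s work factor. The storage format, the function names and the iteration count are assumptions for the example.

```python
import hashlib
import hmac
import os

ITERATIONS = 600_000  # assumed work factor; raise it as hardware gets faster


def hash_password(password, iterations=ITERATIONS):
    """Return a self-describing string holding the scheme, cost, salt and hash."""
    salt = os.urandom(16)  # a fresh salt per password defeats precomputed tables
    digest = hashlib.pbkdf2_hmac("sha256", password.encode("utf-8"), salt, iterations)
    return f"pbkdf2_sha256${iterations}${salt.hex()}${digest.hex()}"


def verify_password(password, stored):
    """Recompute the hash with the stored salt and cost, then compare in constant time."""
    _scheme, iterations, salt_hex, digest_hex = stored.split("$")
    candidate = hashlib.pbkdf2_hmac(
        "sha256", password.encode("utf-8"), bytes.fromhex(salt_hex), int(iterations)
    )
    return hmac.compare_digest(candidate, bytes.fromhex(digest_hex))
```

Because the cost is stored alongside each hash, old entries keep verifying after you raise `ITERATIONS`, and can be rehashed the next time the user logs in.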
You might need some reversible encryption in there as well if you’re connecting to third parties, as they usually don’t accept hashes, so you should probably brush up on AES, or one of those other ones.
So that’s it, is it?
Well, almost, because in order to support your users you probably want to be able to legitimately impersonate them to recreate what’s currently preventing them from doing whatever it is that they’re trying to do, so you’ll need to build that into your authentication system. And if you’re impersonating other users frequently enough you might want a way of linking profiles together so that users or support staff can switch between them relatively quickly.
So that’s it, is it?
Well, almost, because from time to time you’ll want to enable users to side-step the towering layers of security you’ve constructed in order to let them do something that they really shouldn’t be able to do, but happens to be something that they actually need to be able to do in order to perform their job. So you’ll need a way of escalating those users temporarily during some actions to do those critically important things that you’ve spent most of the preceding time trying to prevent them from doing. Because once you start creating security frameworks, you’ll discover that there’s exceptions to every rule, except the rule about exceptions to every rule.
So that’s it, is it?
Well almost, because once you’re reasonably confident the user ID you’re dealing with actually corresponds to the user behind that user ID, you’ll want to allow that user to perform certain tasks on the site, or stop them from doing so, depending on how much they’re paying you, or how many stripes they’ve got on their shoulder, or whether they’ve got access permissions to the corporate bathroom. The process of answering the question “is this user allowed to do this particular thing” is called Authorisation, and if you thought authentication was convoluted, you ain’t seen nothin’ yet.