Lecture 3: Authenticated Key Exchange

Jan 21, 2026

Disclaimer: This note is a transcript of the classroom discussion and is not yet of book chapter quality.

Okay, today we are going to start our study of TLS. At a high level, you have two parties, who first engage in authenticated key exchange and get a session key. Then they essentially do authenticated encryption back and forth.

Note: You will also see this structure in other protocols, such as the Signal protocol. The difference lies in the security properties required for the authenticated key exchange. With TLS, you need mainly authenticity: you want to know who you are talking with. But with Signal, one additional property you’d require is deniability: after the handshake, there should be no cryptographic evidence that Alice talked to Bob. We’ll talk more about that later, but basically it means that Alice should not sign on anything during this whole process, which makes authentication harder because the natural ingredient to authenticate who you’re talking with is to ask Alice to sign something.

The plan for this week is: today we talk about AKE constructions. We’re going to talk about the four variants, and the last one will look close to TLS 1.3. On Friday, we’re going to talk about concrete instantiation of AKEs with a public key encryption called El Gamal. Also, we’re going to talk about handshake in TLS, and the actual features that TLS introduced on top of basic goals of AKE.

Warm up with Key Exchange (KE)

Let’s start with adversarial model. We have Alice and Bob. They want to establish a shared key, and we have an eavesdropper in between listening to the channel, but there may also be impostors who actively pretends to be Alice or Bob, trying to confuse these parties. Dealing with impostors is where things get tricky.

We care about correctness of the exchange, meaning at the end of the protocol they should get the same key. We also care about the secrecy of the key. How do we define secrecy? For example, if we just say that the probability of the adversary learning the key is negligible, that would be too lenient of a requirement. Consider a protocol that simply leaks half of the key bits. It’s secure under this definition, because guessing the rest $n/2$ bits is hard, but it’s clearly not secure.

The second reason is that key exchange is typically paired with encryption, for which we need uniform keys. For these two reasons, we define security as follows: we ask the attacker to distinguish the result of the key exchange from a uniform string of the same length.

We consider a game that first runs an instance of the key exchange protocol between Alice and Bob; they follow the protocol, and output a key $K$. Let the transcript be the messages sent over the communication channel. Then the challenger gives the attacker the transcript, and either the actual output of the key exchange or a random bit string of the same length. It asks the attacker to guess which is the case. We demand that the attacker should win this game with a negligible probability over random guessing. That is to say, the attacker should not learn anything about the key that can help the adversary to distinguish it from a random string.

There’s a very elegant protocol that achieves this. Let’s focus on eavesdropper (no impostors) for now. The protocol is the famous Diffie-Hellman key exchange. TLS handshake diagram

It goes as follows. Alice samples a random scalar $x$ in the exponent space of the group, sends over $g^x$. Bob samples $y$ and sends over $g^y$. Alice will raise the message from Bob to the power of $x$, and then Bob will do the symmetric thing.

This protocol guarantees correctness, by the definition of group exponentiation. How about security?

If you give the transcript of the protocol, which is these two messages, to the adversary, can the adversary distinguish a random string from $g^{xy}$? So far we have only introduced the Discrete Log (DL) assumption, which says given a random group element, it’s hard to find its discrete log. The transcript consists of random group elements because the exponents are random. So indeed, because of DL, we know that it’s hard to recover $x$ or $y$. But it doesn’t seem sufficient because the adversary goal is not to recover the secret key. We need to rule out the possibility that the attacker can distinguish it from a random key, without computing it.

To formalize what we need, we introduce a new assumption to capture the actual property we need. This assumption is called the Diffie-Hellman assumption. TLS handshake diagram

It basically says exactly what we need: given the transcript, it’s hard to distinguish a random group element from $g^{xy}$. There’s a computational version, saying that it’s hard to compute $g^{xy}$. Though the decisional version is a harder problem and therefore a stronger assumption.

So the security proof of Diffie-Hellman key exchange is basically by the DDH assumption. But at this point you may ask, isn’t this cheating? When cryptographers cannot prove something, they sometimes propose a new assumption. We still gain something meaningful, because now we can separate assumptions from this construction, and we can scrutinize the assumption separately from the construction based on the assumption.

Note that the DH KE protocol outputs not a key, but a group element. You need to convert this into a uniform string. The tool for this purpose is a cryptographic hash function, or a special purpose tool called the key derivation function (KDE). It accepts high entropy secrets and outputs a uniform $n$-bit string. Note that high entropy secrets may not be uniformly distributed, but KDF essentially can smooth the distribution to a uniform distribution. The KDF can be constructed from HMAC. TLS handshake diagram

All right. So far so good. This is classical results from the 70s. As I said, a simple protocol that assumes no impostors. That’s the thing that we need to deal with today, because what makes it authenticated is to deal with impostors.

It’s easy to see why unauthenticated DH KE is not secure with an impostor in between. Here’s the man-in-the-middle attack. The basic idea is that when Alice does this protocol with Bob, it’s never checking that it is communicating with Bob. So the attacker can pretend to be Bob to Alice, and pretend to be Alice to Bob. I.e., our attacker will intercept the message from Alice, sample a new secret, and send it to Bob; do exactly the same in the other direction. At the end of the day, Alice will think she’s talking to Bob, and Bob will think she’s talking to Alice, but they are actually all talking to the adversary.

This is a very powerful attack because it’s not detectable. When Alice sends something to the attacker, the attacker can re-encrypt it and forward it to Bob. So Bob will think the message came directly from Alice. But we will see that even if you add the signatures, there can be subtle attacks.

Authenticated KE

For today’s AKE constructions, you can find the graduate applied cryptography textbook by Dan Boneh and Victor Shoup. We refer to this book as BV, and some proofs are deferred to the textbook. This book is freely available online.

Identity and PK

To talk about authenticated key exchange, we need to first have some notion of identity. Here’s our model.

Every player in the game has an identity and a public key. We denote identity with ID and public key with PK. The two are different. In TLS, identity can be your domain name. If we’re talking about digital e-signatures as in, say, DocuSign, your identity is your legal name. So the identity and the public keys are two different things. But they still need to be bonded to each other to make things work. That’s where we introduce a certificate authority in our model.

We assume there’s a trustworthy CA, to not get into how CA can be malicious. If the player wants to bind a public key to identity, she goes to the CA, presents evidence for her identity. Then CA verifies this identity document and basically signs over the identity and the public key pair to make it a binding relationship.

Have you had the experience of setting up your own TLS certificate and getting a certificate from Let’s Encrypt, for example? The evidence you prove to the CA is the evidence showing you own your domain. That can be done through DNS or HTTP challenges. Different ways to do this process, but at the end of the day you get back a signature over this pair. We assume everyone knows the public key of CA, so this is verifiable. All the certificates are public.

“CA seems like big assumption?”

It is a huge assumption, and there have been many attempts to improve the status quo. We will talk about transparency log, for example, in next few lectures.

Security goals

All right. Now with identity defined, we can talk about the goals of authenticity. Now we rename the parties P and Q instead of Alice and Bob to be consistent with the textbook. We state from the point of view of P, but same goes for Q.

We say that if P terminates, she should output two things: a key K, as before, and the identity of the party that she thinks she’s talking to. Of course, by extension, if you know who you expect to talk to, you could abort the protocol at the moment you learn that you’re talking to the wrong person. But if the protocol proceeds to the end, you output who you think you are talking to. The same holds for Q.

With this new model, we can define authenticity. If a party outputs a (key, ID) pair, we require that this key is shared with only the party with that ID. We need an additional clause to make the definition work: it’s possible that this key is shared with nobody, although P believes that the key is shared with somebody, which is also fine in our definition.

Security is exactly the same as before. But if you read the textbook, you will see that the definition for AKE becomes fairly convoluted. The security definition goes for four to five pages, and the proof is another four to five pages. The complication comes from the fact that this protocol can go on again and again. It’s not a one-off thing. It’s a recurring protocol, and the adversary may participate in multiple instances of it over time.

To show you one wrinkle, we consider the key reuse attack. The attacker’s goal is not to learn the key, not to confuse parties about the identities, but just to make it so that they use an old key that has been used before. Why is that problematic? There could be multiple reasons, but one example is a one-time pad encryption, or encryption requires a unique IV. Re-using an old key can break such encryption schemes.

Let’s try to fix Diffie-Hellman exchange with signatures. This is called Diffie-Hellman with ephemeral keys. P has a public key, secret key pair, and the same for Q, with which they issue and verify signatures. But they don’t use long-term secret keys to generate session keys. To generate a session key, Party P samples a random one-time secret $a$, sends over $g^a$, an ephemeral “public key”, along with a signature over it. In return, Q signs over the message it received from P as well as his own message, and the certificate. Why does Q sign a pair of messages? This is an attempt to link the two flows of messages together.

The question is: does this accomplish the goal of AKE?

Identity Misbinding Attack

Here’s the attack.

You will see that P will notice that nothing changed: she is still talking with Q, as if there’s no E. Q is different. Q will see no message from P, but a valid message from E. The problem is that they will associate this key with different names. P will think that this channel is with Q, and Q will think that this channel is with E. This breaks authenticity.

Note that in this entire process, E does not learn the session key. The problem is only confusion, but this can be bad. For example, let’s say P is the CFO of the company, and E is a regular employee. CFO says, “Move all the funds to my account,” as a message to Q, the accountant. Receiving this message, what will Q do? Q will move all the funds to E’s account. The context of who is “me” is ambiguous. This breaks the whole point of authenticity: you want to know who you’re talking with.

Constructions

That’s a warm-up attack against a warm-up protocol. Let’s see some more serious designs, four of them.

AKE1

The first design is pretty simple. We assume every party has a public key PK, which is actually two keys: a key for signature and a key for encryption. We’re not going to spell them out explicitly, but we will write $\mathsf{Enc}_P$, to denote encrypting under P’s public key. We will write $\mathsf{Sig}_P$, to denote signing using P’s private key.

The high-level idea is for Q to send over a key $k$ encrypted under P’s public key. P can decrypt and therefore get the key, and they now have a session key to use. Why all the other stuff besides $k$?

Let me offer some intuition.

The ciphertext $c$: its primary goal is to deliver the session key $k$. But it has a secondary goal, which is to bind these two pairs together. What I mean is that the attacker should not be able to swap this $\mathsf{id}_q$ with something else. This requires ciphertext integrity.
Signature: Signature proves the integrity of $c$. It proves that this originates from Q. Nobody has tampered with it.
$r$ is one-time randomness so that messages from different sessions cannot be stitched together.
Why echoing P’s ID? This intuitively prevents an identity misbinding attack. It lets P know that Q also thinks that he’s talking with P. If you recall the previous attack, Q thinks he’s talking to E, but P thinks he’s talking to Q. This is trying to prevent something like that.

After offering the intuition, I want to go through the exercise of removing some part of it and ask you to come up with an attack.

Let’s start.

Insecure variant number one: what if C is not signed? The attacker could replace C with the encryption of her own key.

What if R is not signed?

The attacker records the message from Q, and now P wants to talk again. The attacker will replay the same message from the past. This is a key reuse attack, because now P will think that she’s talking to Q and reuse a previous key.

Again, reusing keys is bad for several reasons. First, it may break security of encryption. Second, it enables a message replay. A contrived example: after key exchange, Q sent to P, a bank, a message “Transfer $100 to Eve.” If Eve, the adversary, with this attack, replays this message, P will transfer another $100 to Eve.

What if the identity of P is not signed?

The intuitive problem is that there’s no confirmation from Q that Q thinks that he is talking to P. This was the issue with identity misbinding attack.

When party P sends over the first message, the attacker intercepts that and reuses the message, pretty much like the misbinding attack we have seen for Diffie-Hellman with signature. Then Q will reply to A with this message. At the end of the day, P and Q think they are talking to a different person with the same key.

Another attack: what if this $id_Q$ is not encrypted here? This attack is less direct, because you would think that the signature provides protection over C. But if you look at the message in the signature, everything is public: R is public, ID is public, C is also public. Nothing prevents the attacker from producing the signature under the attacker’s certificate. That’s the problem.

Let’s say this first interaction happens. The attacker can replay the same ciphertext message, but with the attacker’s signature. It’s a combination of replay and identity misbinding. There’s a reuse of the session key, and P thinks she’s talking to A instead of Q.

Q: “Wouldn’t P reject after receiving the last message, knowing that the message came from A not Q?”

A: If P has in mind that she’s going to talk to Q, then at this point she will learn she’s talking to A and abort. But if P doesn’t have this knowledge a priori, maybe P is a bank who is just accepting any client, then P will think she’s talking to A using a session key that she used to talk to Q.

That’s our first AKE construction. Every bit of this is essential. As you can imagine, proving this whole thing to be secure is a daunting task, so I refer you to the textbook.

A note on why CCA security is required. If Enc is CPA secure, it is possible for the attacker to maul the ciphertext into an encryption of something else. This is another way to enable replay attacks. The attacker can replay the ciphertext and generate the rest. This is identical to the attack where you don’t have this identity. We will not say too much about CCA. I just want to plant the seed that some strong ciphertext integrity is needed. We’ll come back to that.

But this protocol has some limitations. So far we assume keys never get compromised, which is not the case in reality. What keys are involved here? There’s a secret encryption key and a secret signing key.

What if the secret encryption key is compromised? Can the attacker get the session key K? Yes: the attacker can simply decrypt C to get the key. If the encryption key is lost, then you can decrypt the future and the past traffic, which is bad. You want the system to be prepared for compromise. After the compromise, something is going to be visible to the attacker inevitably, but it would be nice that the session keys generated before the compromise are secure.

AKE2

In forward secrecy design, usually the trick is to use some ephemeral keys.

Our fix to this problem is called AKE2. Instead of encrypting the session key under the same key over and over again, we ask P to generate a new public key for each session. This PK replaces R in the first message. The protocol for Q is similar. The intuition is very much similar to AKE1. The forward secrecy is provided by this fresh public key. If the ephemeral key SK is lost, you cannot go back and decrypt all the session keys.

But the scheme is still quite brittle. If ephemeral SK is leaked, the attacker will be able to impersonate P at any time going forward, as often as they’d like, to any user, not only Q.

AKE3

The solution is to force the use of something that the attacker doesn’t have. For example, instead of signing over just the session key PK, we add a third message from P to Q, signing over a random fresh message from Q. This would prevent the attacker who only has ephemeral SK from finishing the protocol because the attacker doesn’t have the signing key. Here, “1” and “2” are domain separators.

In the textbook, this property is called HSM security. It basically requires that the attacker can only run the protocol if the attacker has access to the signing key, which can be protected by specialized hardware called Hardware Secure Module (HSM). That is, the attacker can only run the protocol if it has access to the HSM. The moment that the attacker is removed access to the HSM, the attacker stops being able to run the protocol.

It’s crucial that the signature not only covers a fresh message, as I promised, but also the identity of Q. What if this is removed? The problem is Q has no confirmation that P thinks that P is talking to Q. The attacker is trying to make P talk to A. What the attacker can do is to let this message go through, block the second message, replicate C with the signature from the attacker. Then P will send over this message, and the attacker will forward it to Q. So P will be talking to A, and Q will think that it’s talking to P. It’s a good exercise to think it through offline.

This is a good place to pause.