Cybersecurity & Tech

Scaling Laws: Caleb Withers on the Cybersecurity Frontier in the Age of AI

Kevin Frazier, Caleb Withers
Tuesday, December 2, 2025, 10:00 AM
How may frontier models shift the balance in favor of attackers in cyberspace?

Caleb Withers, a researcher at the Center for a New American Security, joins Kevin Frazier, the AI Innovation and Law Fellow at the University of Texas School of Law and a Senior Editor at Lawfare, to discuss how frontier models shift the balance in favor of attackers in cyberspace. The two discuss how labs and governments can take steps to address these asymmetries favoring attackers, and the future of cyber warfare driven by AI agents. Jack Mitchell, a student fellow in the AI Innovation and Law Program at the University of Texas School of Law, provided excellent research assistance on this episode.


Check out Caleb’s recent research here.

This Scaling Laws episode ran as the December 5 Lawfare Daily episode.

To receive ad-free podcasts, become a Lawfare Material Supporter at www.patreon.com/lawfare. You can also support Lawfare by making a one-time donation at https://givebutter.com/lawfare-institute.

Please note that the transcript below was auto-generated and may contain errors.

 

Transcript

[Intro]

Alan Rozenshtein: When the AI overlords take over, what are you most excited about?

Kevin Frazier: It's not crazy, it's just smart.

Alan Rozenshtein: And just this year, in the first six months, there have been something like a thousand laws.

Kevin Frazier: Who's actually building the scaffolding around how it's gonna work, how everyday folks are going to use it?

Alan Rozenshtein: AI only works if society lets it work.

Kevin Frazier: There are so many questions that have to be figured out and nobody came to my bonus class. Let's enforce the rules of the road.

[Main episode]

Kevin Frazier: Welcome back to Scaling Laws, the podcast brought to you by Lawfare and the University of Texas School of Law that explores the intersection of AI, policy and of course, the law.

I'm Kevin Frazier, the AI Innovation and Law Fellow at Texas Law and a Senior Editor at Lawfare. Today we're joined by Caleb Withers. Caleb is a research associate at the Center for a New American Security, where he focuses on frontier AI and national security.

More specifically, Caleb studies the impact of emerging AI capabilities in the biological and cyber domains. Today's conversation examines how frontier models could disrupt the balance of power in cyberspace, potentially giving malicious actors a decisive edge. We'll look at the trends fueling this shift, explore how policymakers and labs can counter this threat, and finally, consider the next era of cybersecurity.

To get in touch with us, email scalinglaws@lawfaremedia.org or follow us on X or Bluesky. And with that, we hope you enjoy the show.

Caleb, welcome to Scaling Laws.

Caleb Withers: Thanks. It's great to be here.

Kevin Frazier: So, you penned a report called Tipping the Scales: Emerging AI Capabilities and the Cyber Offense-Defense Balance. And I can't say thank you enough because for one thing, you provided me with a great reading to assign to my students on our unit on AI and cyber. But perhaps of more importance than just my class, you added some really great insights into a space that's as gray as a fall day in the Pacific Northwest.

Depending on who you ask, cybercrime costs between $1.5 trillion and $10 trillion in 2025 alone. Now, that's a crazy gulf, and we won't get into how we can have estimates that are 10 times different.

But even on the small end of that spectrum, even if it is just $1.5 trillion in costs as a result of cybercrime, that's a huge issue. And that's a huge public policy issue. And obviously it's going to become even bigger with AI, and that's really where I want to center our analysis.

But let's start with some basics. Pre-gen AI, pre this wave of AI, what was the relationship between some of the AI tools that existed at that point and cybersecurity, both offense and defense?

Caleb Withers: Yeah, sure thing. So I think machine learning and automation in software, for both cyber offense and defense as you said, is nothing new.

If I was to think back to some particular examples, you know, take spam. Initially that was somewhat of a manual, you know, process, or there wasn't a need for automation and sort of spam defense in the sense that no one was really doing it yet in the very early days of communication.

But as soon as we saw, sort of, the influx and people realized, ‘Hey, there's an opportunity here to send, you know, messages that the person receiving them might actually not be that excited to receive or not to their benefit to receive,’ then all of a sudden there's this question of, well, how well exactly can we do at having machine learning and software and algorithms to filter out, you know, the good from the bad there?

And so whether it's spam or recognizing malware or just some of the productivity-enhancing things, you know, looking at what a cyber defender might be doing and saying, ‘Hey, well this is a process that we can follow, and maybe we can speed that up a little bit by, you know, hard coding how to do that.’

Yeah, so machine learning has played a longstanding role in cybersecurity for sure.

Kevin Frazier: Yeah. And I just think it's important to situate that this, quote unquote, AI moment is new in many ways, but also we've seen AI be a core part of cybersecurity for a long time.

But your report notes that while AI has traditionally helped defenders––for example, companies looking to detect anomalous behavior, or helping you and me detect spam––now, with some of these frontier AI systems, the balance may be changing.

So let's get to that by first talking about what, what do we mean by frontier AI in a cyber context? And how is that kind of changing the calculus on offense versus defense?

Caleb Withers: Yeah, so I, I think the term can vary depending on who you're talking to, but at least the way I'm using it and see most people using it is effectively to refer to large foundation models––that is, models that are trained by ingesting large amounts of data, in particular from the internet and text and whatnot.

The sort of models that are powering ChatGPT: pretty powerful, pretty general purpose, moving pretty fast. Large language models would be the term that people would often refer to them as, although I put a little asterisk on referring to them as large language models here, given some of the training and the capabilities and the things that we can see them, you know, do. If these models can effectively use computers, or they're multimodal and they can ingest images and whatnot, that also has relevance in the cyber domain.

And so I think that's worth noting that when people think about large language models in cyber, sometimes they say, okay, well, you know, it's going to be helpful for English text and whatnot.

But you know, really these models are also increasingly using computers, making decisions, doing all sorts of things.

Kevin Frazier: Okay. So we've seen some new capabilities come about, and I wonder, from the vantage point of thinking about the net balance of 'is AI helping defenders or helping offensive cybersecurity efforts,' what are the key aspects of this frontier AI that may lend itself more to the offensive folks, the bad actors trying to infiltrate systems?

Caleb Withers: Yeah, so I'll say first and foremost that I think in general the case that AI helps defenders on net, the arguments for that, I think by and large, a lot of them still do apply and will apply to these frontier models.

Effectively, you know, people talk about the defender's dilemma, which is that if I'm a cyber attacker trying to get into your system, I only have to succeed once, and in some sense then I'm in your system, whereas you as the defender sort of have to be successful all the time and for everything.

And you know, defenders especially these days will often have massive sprawling networks. And so there's just a big scale challenge there. And so if you have software tools or AI or machine learning that help you deal with that scale or deal with that volume of attacks, then sure, you know, attackers are going to be able to use them.

But you know, things that can scale up, as software is good at doing, are great for defenders. And so the approach my report takes is to ask, is there any reason to expect that this might not hold, or that this time might be different for some of the, you know, emerging capabilities we see.

And there's a few things that come to mind for me. One is that, even as the cost of any given capability in terms of running these models is, you know, rapidly going down, the benefit to spending more on running these models longer, running more of these models, running bigger models you know, we're still seeing returns to that.

And so that breaks an assumption that was previously pretty safe, that running most machine learning models or cybersecurity software is going to have pretty low marginal cost. We're not there yet in terms of that being a big deal. But, you know, if we look into the future, I could see the cost that defenders are spending on running AI models starting to actually be a material consideration of 'can we actually afford to do XYZ thing.'

Kevin Frazier: Right. So is it right to, to really put an emphasis on the fact that these frontier AI systems essentially enable both an increase in quantity and quality of attacks, so any vector of vulnerability that you had previously, now, if a bad actor is willing to spend those inference costs of finding more and more ways of running a model to attack whatever defenses you have, well, as you're pointing out on the defensive side, you're also having to spend more and more. And you just may not be willing to put up as many of those costs as maybe a bad actor would be.

And so how are we going to see this sort of balance play out, given that it's just becoming more and more expensive to defend against more and more attacks?

Caleb Withers: Yeah. And so I think both factors are things to keep in mind here. I would say that maybe the quantity or scale side of things is more important at the moment, just given that there's sort of not much truly new under the sun when it comes to cybersecurity. A lot of the attacks or exploits you might see are echoes of things, or similar to things that have already been done.

And, you know, there was a study that the MITRE Corporation did, you know, they're active in the cybersecurity research space, looking at these things they call stubborn weaknesses: software vulnerabilities that have consistently been the most common and most severe.

And if you, you know, look back at 2007 versus more recently, these are still making up a whole bunch of the vulnerabilities that we're seeing in software.

Kevin Frazier: So, so just to pause there—because I have a lot of my own stubborn weaknesses, so I'm curious to learn about these ones—what are some of these weaknesses that really stand out both in 2007 and 2025?

Which is really unfortunate that we haven't made some improvements.

Caleb Withers: So, yeah, one, one that is ranked pretty highly, for example, is what they call not neutralizing special elements used in an SQL command. So that's effectively, you know, SQL being the language that underpins some of the databases that are used on the internet and otherwise.

And, you know, you want people to be able to put in their usernames and text and whatnot, but you don't want them to be able to actually put in commands that make your database do things and whatnot.

And so, you know, this is a classic thing of, well, we should make sure that the text that people give you when you're using SQL systems and code and whatnot is just that, not actually commands that are going to run on your system.

And we still see that today, after, you know, decades; that's a mistake developers sometimes make. And they call them stubborn in the sense that, you know, sometimes there'll be an attack and you're like, oh, you really got unlucky, that was hard for you to know was going to happen, or, you know, anyone could make that mistake.

But a lot of the mistakes we see on the coding and software side of things are ones where you probably shouldn't have done that in this day and age.
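To make the weakness concrete, here is a minimal sketch in Python, using the standard library's sqlite3 module, of the pattern being described (improper neutralization of special elements used in an SQL command, better known as SQL injection) next to the standard parameterized-query fix. The table, column, and input values are made up for illustration.

```python
import sqlite3

# Illustrative in-memory database with a hypothetical users table.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (username TEXT, is_admin INTEGER)")
conn.execute("INSERT INTO users VALUES ('alice', 0), ('bob', 1)")

attacker_input = "nobody' OR '1'='1"  # classic injection payload

# Vulnerable: user text is spliced directly into the SQL string, so the
# quote characters change the meaning of the query and it matches every row.
vulnerable = conn.execute(
    f"SELECT username FROM users WHERE username = '{attacker_input}'"
).fetchall()
print("string-built query returns:", vulnerable)

# Fixed: a parameterized query treats the input purely as data, which is
# the 'neutralization' of special elements the weakness name refers to.
safe = conn.execute(
    "SELECT username FROM users WHERE username = ?", (attacker_input,)
).fetchall()
print("parameterized query returns:", safe)  # no rows match
```

The fix has been standard practice for decades, which is exactly what makes the weakness 'stubborn' when it still shows up in new code.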

Kevin Frazier: Yeah. And it's really fascinating to get a sense of how the promise of AI in many ways of us all being able to code.

For example, everyone can create their own website. You can create your own app. Everyone's vibing, having a good time, but as you're pointing out, these stubborn weaknesses still exist and might introduce folks who aren't testing their new code, for example, against some of these attacks, to really impressive and really perhaps pervasive and troubling vulnerabilities.

And they're just not even paying attention to this, right? When you're vibe coding, it doesn't come with a warning necessarily of, ‘Hey, watch out. A bad actor might exploit this.’

Caleb Withers: Yeah, and I mean, the flip side of this is that some of these things are things that AI could be quite useful for: these sort of known, relatively straightforward categories of security vulnerabilities.

You know, this is the sort of thing that an AI model actually probably could be decent at, at least augmenting or helping coders pick up some of these. And I think one of the themes of my report is, you know, you can see it going both ways, right?

You can see that if vibe coding models and apps aren't worrying about this too much, or, you know, are empowering people who don't even know what to be thinking about when it comes to making secure code, that could be introducing all sorts of vulnerabilities.

At the same time, these tools could be really helpful in avoiding them. And so I think a point that's particularly salient here, and in general as we think about AI and cybersecurity, is that there are choices we face, be it industry or policymakers or procurers, in terms of how much do we actually prioritize this sort of stuff?

If we are looking at the benchmarks for models and apps, you know, are we just looking at code performance per se, or are we actually making it so that security is one of the more important things we want to be comparing these on?

Kevin Frazier: Yeah. So before we get to some of the policy interventions and some of your own recommendations, I want to bring out some of these more stubborn weaknesses and, more generally, the sorts of cyber vulnerabilities people should be most attentive to.

I know I share this with a lot of folks. If you write me an email that inflates my ego and says, oh, Kevin, you are the best podcast host, Alan is, you know, pretty good, but you're definitely the better one of the co-hosts. And why don't you just send me some money to help out with my new Kickstarter campaign, I may be pretty tempted, right?

I'd think, oh, clearly this is someone who listens to the pod, they care about me. These sorts of hyper-personalized, just great spear-phishing or phishing attacks, right, of being able to write a personalized email that lands in someone's inbox at the right moment and makes a really tangible request.

I've heard this is one of those up-and-coming issues that, like you said, has echoes of the attacks of the past. But to what extent do you really see AI as amplifying some of these traditional attacks, whether it's phishing scams or things like DDoS or some of these other well-known cyber vulnerabilities?

Caleb Withers: Yeah, I think phishing is a great example here, 'cause I think in terms of the question of, have we seen a true transformation of cyberattacks yet?

You know, my, my answer is 'not yet,' except for this asterisk of, as you said, phishing, you know, for exactly the dynamics you pointed at. And I think another particularly salient example here is phishing in languages other than English.

Like, it just seems totally correct to say that this has probably increased by, you know, at least an order of magnitude: if I am a Japanese business, how many, as you say, relatively high-quality phishing emails am I receiving? That's just, you know, gone through the roof. And this is a good example, too, of how I think AI on the defense will be an important aspect of countering AI on the offense.

But it also, like, isn't always going to be one for one, right? It's not always going to necessarily be an AI solution that's most promising for an AI threat.

And to make this concrete, you know, in the early days of the internet, as I suggested, if you received an email, it was probably legitimate. Like, there were just more people initially, in the early days, sending legitimate messages than there were sending illegitimate messages.

And now that's just no longer the case. You know, Gmail and these service providers, for any email that comes in, are sort of working from an assumption that it's actually probably not a genuine email or a good-faith email.

But at the same time, you know, the part of the solution here, yes, will be more sophisticated algorithms to identify these.

But at a certain point, you know, as you say, if I'm sending you this flattering email in your inbox, there's not going to be much in the content that differentiates a sufficiently sophisticated AI phishing campaign, one that has looked up what I say, has looked up what you say, et cetera, from a legitimate email.

The actual thing that's going to be differentiable is, did I, Caleb, send this?

That's where things like, you know, two-factor authentication come in, or, you know, is this email actually coming from cnas.org or not, and what is the actual CNAS website?

That sort of thing. So, you know, some of these more mundane solutions, some of these more behavioral solutions, in the sense of: just because there's an email in my inbox that says it is from so-and-so and says the things I would expect so-and-so to say.

You know, in years prior you could probably actually get away with assuming that was genuine most of the time, apart from the most critical cases. But increasingly that's not the case.

And so, you know, standards and systems around authentication, and understanding when an email says it comes from somewhere whether it actually comes from there, that's going to play as much of a role, if not more of a role, than AI defense for this sort of stuff.
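As a rough, hypothetical illustration of the 'mundane' authentication signals being described, the sketch below uses Python's standard email module to check two things a message's persuasive content cannot easily fake: whether the receiving mail server recorded SPF, DKIM, and DMARC passes in the Authentication-Results header, and whether the From address actually uses the expected domain. The sample message and sender address are invented, and real mail providers do far more than this.

```python
from email import message_from_string
from email.utils import parseaddr

# Invented example message; the headers above the blank line are what matter.
RAW_EMAIL = """\
Authentication-Results: mx.example.net; spf=pass; dkim=pass; dmarc=pass
From: Caleb Withers <example-sender@cnas.org>
Subject: Quick question

Hi Kevin, ...
"""

def looks_authenticated(raw_message: str, expected_domain: str) -> bool:
    """Crude check of sender-authentication signals in an email's headers."""
    msg = message_from_string(raw_message)

    # 1. Did the recipient's mail server record SPF/DKIM/DMARC passes?
    #    These headers are added on receipt, not written by the sender.
    auth_results = msg.get("Authentication-Results", "").lower()
    checks_pass = all(f"{check}=pass" in auth_results
                      for check in ("spf", "dkim", "dmarc"))

    # 2. Does the From address actually use the domain we expect?
    _, address = parseaddr(msg.get("From", ""))
    domain_matches = address.lower().endswith("@" + expected_domain)

    return checks_pass and domain_matches

print(looks_authenticated(RAW_EMAIL, "cnas.org"))  # True for this toy message
```

The point is the one made above: these checks key on who sent the message and through what infrastructure, not on how convincing the text is.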

Kevin Frazier: Yeah, it's fascinating the fact that we still can't get beyond the—we're very human. And you're going to have to rely on the end user, like you and me, to have some degree of cyber hygiene in place. To know those signs of when should I read that email again, when should I, you know, send a text message to Caleb to say, ‘Hey, are you really starting this new Kickstarter campaign for everyone to have a mustache that looks as beautiful as your mustache, Caleb?’

That'd be an interesting Kickstarter. I would donate to that. But this requires a lot of attention and a lot of resources from the actual end user, which I think does put the everyday person at a disadvantage of having to be even more hypervigilant about who you're interacting with, what information you're receiving.

So that does seem to put us, us being the defenders, at a bit of a disadvantage.

Are there any other things, new threats you see on the horizon that AI is going to introduce from a cyber offensive perspective that we should be aware of? Or do you think that a lot of the, for lack of a better phrase, fear-mongering of 'cyber is broken, the internet is over, bad actors are just going to use AI to get into every system' is hyperbole and perhaps way too exaggerated based off of where the tech stands today?

Caleb Withers: Yeah. I think one additional trend that comes to mind here, and again, there is some continuity here, is the time to exploit: that is, you know, once an exploit is discovered, how quickly is it actually weaponized and used against targets.

Kevin Frazier: Right. And an exploit here is just referring to some gap in the code, some ability to probe into a system that you otherwise shouldn't have.

Caleb Withers: Yeah. Yeah. And that time, on average, has been trending down over the years, you know, going from months to weeks and then sometimes days now, depending on what sort of exploit it is or some other characteristics of it.

And I think that's notable because, on the one hand, it's going down, which just highlights that the challenge of defense is getting harder and harder. At the same time, because it is usually measured in days or weeks at the moment, that actually gives you a little bit of grace as a defender.

You know, if you are updating your systems to the latest version when your software provider pushes out those updates, you'll probably be good most of the time. You know, if you're a particularly attractive target to a state actor or you're unlucky, sure, maybe that's not fast enough, but most of the time it is.

I think a trend that we're in the early stages of seeing, or that is maybe on the horizon, is just that AI really does have the promise for attackers to compress this somewhat.

You know, I'm borrowing this from Timothée Chauvin, a researcher. He points to the scenario of, you can imagine something where you have an open-source piece of software, so, you know, software that people use and the source is up there online.

Then some update gets made to it to patch some security vulnerability in the code. And you could imagine a large language model that just sort of monitors everything that goes up on GitHub, this being, you know, one of the places that open-source code goes.

And every time there's an update, sort of the prompt is, you know, it'd be more sophisticated, but in effect: is this trying to fix any security problem? If so, what is it? How can we exploit this? Who is running the software? And then let's have a go at doing that. And so, you know, the software, the models, aren't there yet in terms of the sophistication, I think, to do this for all but the most rudimentary or transparent things.
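As a loose sketch of the monitoring pipeline described here, the snippet below polls a repository's recent commits through GitHub's public REST API and flags commit messages that look like security fixes. The repository name is a placeholder, the keyword filter is a deliberately crude stand-in for the model doing the 'is this fixing a security problem?' reasoning, and the same kind of monitoring is just as useful to a defender deciding what to patch first.

```python
import requests  # assumes the third-party requests package is installed

REPO = "someorg/someproject"  # placeholder repository
API_URL = f"https://api.github.com/repos/{REPO}/commits"

# Crude keyword stand-in for the model's judgment about whether a commit
# is patching a security problem.
SECURITY_HINTS = ("security", "cve", "overflow", "injection", "sanitize", "xss")

def flag_possible_security_fixes(limit: int = 30) -> list[str]:
    """Return first lines of recent commit messages that look like security patches."""
    resp = requests.get(API_URL, params={"per_page": limit}, timeout=10)
    resp.raise_for_status()
    flagged = []
    for entry in resp.json():
        message = entry["commit"]["message"]
        if any(hint in message.lower() for hint in SECURITY_HINTS):
            flagged.append(message.splitlines()[0])
    return flagged

if __name__ == "__main__":
    for summary in flag_possible_security_fixes():
        print("possible security patch:", summary)
```

In the scenario sketched above, the keyword filter and everything downstream of it would be replaced by a model that also inspects the diff and reasons about whether the pre-patch code is exploitable, which is what compresses the time to exploit.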

But, you know, these models are getting quicker, faster. And so if I was to think about the coming years, a trend I anticipate seeing is this, starting with not very sophisticated exploits of vulnerabilities, but getting more and more sophisticated over time.

This thing of just, as soon as something is out there in the water, as soon as something is there to be discovered in terms of what can be exploited, this happening en masse and pretty fast. And, you know, that starts to put pressure on defenders, in terms of, if you can usually get away with taking days to update your systems, then okay, maybe you can do it overnight when not as many people are using it.

Or maybe you can at least, you know, run some checks and not roll it out to everyone at the same time, just to make sure we're not ending up with a situation like the CrowdStrike issues that, you know, the airlines and others had back in 2024, where you roll out some update, mess up the update, and roll it out to everyone all at once, and that can cause a lot of problems.

So, yeah, I can see this: AI, I think, will exacerbate this trend of the time to exploit things going down and the pressure on defenders to be quick going up.

Kevin Frazier: And so looking also a little bit ahead, if we think about us all becoming more reliant on agentic AI, right now we're living purportedly in the year of the AI agent.

I think all of us are waiting for an AI agent to book our travel and to plan my syllabus and so on and so forth. But soon, having autonomous AI systems that can complete tasks on your behalf will become ubiquitous. And all of us, you and me, may have dozens if not hundreds of AI agents acting on our behalf.

How is cyber shaping that space? Because as you noted, the possibility of having some of these bad actor agents that when they engage with your agent, may say, ‘Hey, what are you actually trying to do? Let me take that in a really nefarious direction.’

Is this an emerging field of cybersecurity that you think policymakers should be more attentive to, and in particular, AI developers should be more attentive to?

Caleb Withers: Yeah, I think so. I mean, this comes back to the email and phishing discussion we were having earlier; we were mentioning it might be hard enough with all the phishing emails that people are getting, and of course people are actually quite excited to offload some of the effort of managing the inbox to these AI systems.

And, you know, I think it's not controversial for me to say that these agent systems are showing some impressive but nascent capabilities, and that these capabilities are probably outpacing our ability to rely on them with any degree of confidence that, if these systems are interacting with adversaries, they're going to be reliable.

And so, as is often the case with the early days of various technological paradigms, there's a lot of opportunities and risks around thinking, how can we actually have some degree of security given this?

And so, you know, to offer some concrete examples here: at the moment, like, I have my laptop. There's a bunch of stuff, you know, some categories of emails, that I'm actually not that worried about if, like, an adversarial person was trying to mess with them. You know, if an adversary wants to modify my grocery shopping list on Instacart, that'd be annoying, but it's probably just a dozen avocados—

Kevin Frazier: Worse things.

Caleb Withers: Yeah. Yeah. But there's probably a tolerable risk. On the other hand, you know, I also have confidential information on my laptop that would be a big deal.

And so, you know, it's already the case that if I'm a software developer, I might have sort of virtualized systems for running less trusted things or mitigating the blowback if there was to be some security threat. But in terms of just, like, everything on my laptop in general, you know, consumers generally don't do that.

And so, you know, thinking about whether it's going to be the case that in the future we'll have, you know, two somewhat segmented inboxes: the one that deals with stuff that actually isn't that big a deal versus the one where, no, we shouldn't let the AI touch this.

Another example that comes to mind here is that a lot of the agentic, computer-using AI systems follow this sort of model of, you know, get a screenshot of what's on the screen and then be like, click this coordinate and do this thing.

And you know, that's a great way to have flexibility, right? Because you don't have to say, okay, well here's how you use this piece of software and that piece of software, and here's the commands you can and can't run, right?

You can try and prompt that they can only do certain things, but ultimately, if you have an agent that can use your computer and type anywhere and click anything, it can do anything, right? And it might be a little bit more annoying or harder if you're, you know, building in these guardrails and deciding what affordances you do and don't have around these computer-use systems.

But that's an example of how you can make trade-offs between doing it the easy way and investing in the slightly more secure way that might take some more effort, and, you know, whether it's users or developers or government deciding what we should procure, and where there might be a public-good case for investing in R&D in this direction.
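To make that trade-off concrete, here is a minimal, hypothetical sketch of the 'slightly more secure way': instead of letting an agent type and click anywhere, the integration exposes only an explicit allowlist of narrow actions and validates arguments before anything runs. The action names and validators are invented for illustration, not any particular vendor's API.

```python
from typing import Callable

# Hypothetical narrow actions the agent is allowed to request.
def add_grocery_item(item: str) -> str:
    return f"added '{item}' to the grocery list"

def archive_newsletter(sender_domain: str) -> str:
    return f"archived newsletters from {sender_domain}"

# Allowlist maps an action name to (handler, argument validator).
ALLOWED_ACTIONS: dict[str, tuple[Callable[[str], str], Callable[[str], bool]]] = {
    "add_grocery_item": (add_grocery_item, lambda arg: 0 < len(arg) < 100),
    "archive_newsletter": (archive_newsletter, lambda arg: "." in arg),
}

def run_agent_request(action: str, argument: str) -> str:
    """Execute an agent-proposed action only if it is allowlisted and its argument validates."""
    if action not in ALLOWED_ACTIONS:
        return f"refused: '{action}' is not an allowed action"
    handler, validator = ALLOWED_ACTIONS[action]
    if not validator(argument):
        return f"refused: argument to '{action}' failed validation"
    return handler(argument)

# A benign request succeeds; anything outside the allowlist is refused,
# no matter what the underlying model was prompted or tricked into proposing.
print(run_agent_request("add_grocery_item", "a dozen avocados"))
print(run_agent_request("wire_money", "attacker-account"))
```

The flexible screenshot-and-click approach can do anything; the allowlisted approach takes more engineering effort per capability, which is exactly the trade-off being described.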

You know, I think security and reliability stand out to me more so than just capabilities in general, because, you know, people will push the capabilities forward. We don't have to worry about that as much.

Kevin Frazier: Bad actors gonna be bad. We know, we know that we can make that bumper sticker.

But I love your point, too, that security always involves trade-offs. If you want to have a more secure email, well, then you may not want to make it as available to agentic systems, for example, that can send those sorts of high-end phishing scams. So you may subject whoever sends an email to that inbox to a higher degree of security: you want to know certain information, you want more metadata, so on and so forth.

In the same way, you may have that kind of dumb, low-barrier-to-entry inbox for all the sort of, oh, new deal at this thing, or managing your Amazon purchases. And it's really interesting to think through how we may have more opportunities to segment where we're okay with some degree of exposure and where we're not.

But I wonder how you are seeing some of the labs start to respond to those sorts of trade-offs, because what we're describing right now is kind of a market opportunity. You can be the provider that tries to offer the most secure, most reliable agentic email-checking system and so on and so forth. Is there a market that we're already seeing on the private side of folks trying to be the most cyber-forward AI companies?

Caleb Withers: Yeah, I think so. Some things come to mind here. I mean, I don't want to undersell that developers are making some effort here on some of these problems I pointed at, you know, for these computer-using agents I'm referencing, or just agents in general.

You know, you'll see when OpenAI, for example, puts out the model card, they'll say, we are running classifiers over every interaction to try and identify issues, to have sort of multiple layers of redundancy: not just the model itself identifying whether something is going on here that shouldn't be, but then another model sort of looking at the model and asking, is something going on here that shouldn't be?

And to your point about tradeoffs, an interesting one that I saw here is, you know, there's this debate about website owners: do they want AI models to be able to consume their information?

And one tradeoff I've seen here is that you can say—if you’re a website owner, you can say, I don't want an AI model to read the, you know, text.

Now a tradeoff here is that one of the things the AI model is doing is sort of asking, is this a malicious or dangerous bit of the website that the user asked for? And if it respects the website's preference, and arguably it should, I've seen this thing where you'll get a warning: you'll say 'give me the text from this website,' and then the model will say, okay, well, here's the text from the website, but, by the way, consume it at your own risk, because we respected the preference not to run the model on their website.

So, you know, tradeoffs, I think, abound. And I'm mentioning the agents and email side of things here, but we've also seen, I think, most of the frontier developers and also the key players in cybersecurity software offering all sorts of interesting things in terms of models that are, you know, fine-tuned to be particularly good for cybersecurity defense use cases and that sort of thing.

Kevin Frazier: So, you know, people are doing stuff here, right? Because clearly if you're going to make an AI tool available, let's say to any provider of mental health services or physical health services or lawyers or other key professions, they're going to be asking these questions of, how resilient is this tool to cyber attacks? And you're going to need to provide some assurances there if you want to get traction in that market.

So beyond the labs themselves, having this clear incentive, are we seeing from a policy standpoint—and let's start domestically here in the U.S., are there states, are there bills pending before Congress that are trying to address this issue?

And then looking more broadly, do we see countries around the world taking action in this regard, or is this one of the back burner AI issues for right now?

Caleb Withers: I mean, in the scheme of things, it feels relatively up there, in that I've talked to some, you know, policymakers in the U.S. and the incoming administration, and there's this whole debate about how much people are too worried about the risks and downsides of AI versus, you know, we just need to let people push innovation forward.

And obviously, firstly, of course, that's not just a binary tradeoff. But with that being said, something I'll often hear is people say, oh, I'm not worried about most of the things that people are hyping up, but there's probably something there in cyber, just recognizing that, as you said right at the start of our conversation, the stakes of cybersecurity and cyberattacks are really high.

We are seeing some interesting things that the models can do here. So I think there's a degree of attention here in terms of what to do. You know, I think there are some things that make sense to do, but probably the thing I'm most excited about is just improving evaluation of what these models can do.

And so, you know, I'm a big fan of CAISI here, for example, and I'm glad that they exist and are looking at stuff like this and working with AISIs around the world on what cyber models can do. Because, you know, I think a year or two ago, you'd see some sort of research paper and it would say, we asked the model some sort of multiple-choice cyber skill questions.

You know, that's interesting work, worth doing. But it doesn't necessarily provide information around how helpful this is for offenders versus defenders, how it compares to what a human can and can't do, or how expensive it is versus, you know, having humans do it.

So building up that responsive evaluations and monitoring ecosystem, I think, is pretty important here, and promising, though in a way that can be overstated. Like, I would not want to gaze into a crystal ball and take bets on what exactly the cyber domain is going to look like five years from now, given AI.

At the same time, there's this thing we see where, if a model can do something only some of the time, or expensively, or, you know, the closed models can do it but not the open ones, the pretty reliable bet is that at some point it becomes open, cheap, and not that hard to use, relative to ones that might require a bit of knowledge about how to best work with AI models; you do eventually see that diffuse over the coming year or two.

So, you know, you can't see all the way into the future. But I think you can actually see a little bit of the way into the future if you look at what models can do now or how people are trying to use them.

And then, you know, that probably informs or at least gives you a little bit of warning of, oh, hey, there's something that's a little bit different here that might require a little bit of a different policy response than the usual.

Kevin Frazier: Yeah, that's really interesting to put emphasis on those early capabilities that we're seeing from labs that may not get the sort of headline news of, oh, well this model can do this thing 20% of the time. If you're thinking, huh, if it can only do it 20% of the time, who really cares, let's just address other issues.

But to your point, as we see these developments and as we see more and more bad actors get AI-savvy, then whatever's possible 20% of the time today may drastically improve by just the next model release. And so having that sort of aggressive, proactive cybersecurity posture seems really important here from an 'are we taking this seriously enough' perspective.

Would you like to see more of an emphasis on cybersecurity from a policy landscape, or do you think it's the right balance so far?

Caleb Withers: I think there's a challenge here in that the demands on the cybersecurity community, be that policymakers or practitioners, are already pretty high. People are already pretty stretched thin, and so saying you have to divert a bunch of resources to focus on AI things that might be coming down the pipe, you know, there are tradeoffs there.

The thing that I would be most excited about, as I mentioned, is making sure that there really is that strong evaluative function going on, both within industry and within government. And also, and maybe this is a bit of a fuzzy thing to point at, a willingness to move fast and turn on a dime, and to sort of think about what would be worth doing if we saw certain things.

Because, I mean, at the moment, people talk about, you know, should there be some sort of federal regulation of AI systems with regard to these risks. And I think people have maybe pretty reasonably advanced arguments that it might be a bit premature; you might not end up with the exact shape of things that you want if you move too quickly in this regard.

At the same time, if the thing you are hoping happens is, okay, if and when we see something that does flip the game board of what cybersecurity looks like, given where AI systems have gone, then we'll take, you know, five years to get a legislative response, you know, that's not going to work.

So, I think it's a matter of just paying attention and thinking about what might be justified. You know, people talk about, in policymaking, this sort of, you know, piece of paper in the drawer, ready for if and when it's needed.

So I think some of that sort of thinking is important.

Kevin Frazier: I love this answer for so many reasons, not only because it aligns a lot with what I heard from the three economists we had on the podcast a few episodes ago, basically saying, look, we just need scenario planning. There's not enough information out there right now to know what is the definitive hard-law, statutory approach that we want to enshrine for the next decade to mitigate against some of these AI cybersecurity concerns. But why not start to iterate on, okay, if we see these capabilities at this point, here's how we would like to respond?

And that sort of playbook of possible scenarios isn't our usual approach to policymaking. It requires a degree of creativity and flexibility that, like you said, isn't the sort of stuff you commonly associate with DC or state capitals.

So if you were to wake up tomorrow and you had a magical wand for either a policy solution, or you could just host a convening or a round table: who would you want in the room? What would you want the agenda to be? What would that sort of immediate next step be, based off of all of your impressive research on this topic?

Caleb Withers: I think the thing that comes to mind here, and it's a bit of a cliched answer, is just getting people from the different communities, be it the AI community or the cybersecurity community, and then also the government side of things, together to, as you say, think about what sorts of scenarios are worth paying attention to.

And I think there's two important questions when it comes to looking at evaluations of AI models and whatnot. One is, 'what should we be evaluating for?' But then also, 'is anyone actually going to listen or find those evaluations compelling?'

And so, you know, I have some things I point to in the report where I'm like, I think if AI models could do this, it would be a really big deal. But if policymakers and the cybersecurity community and other experts say, well, I don't actually think it would be a big deal if AI could do this, then, you know, before we even set up the evaluations ecosystem to find these things, we want to make sure that these things are actually decision-relevant and compelling, and sort of tease out some of the assumptions that people might have of, oh, actually there should or shouldn't be some federal regulation of AI models that can do this thing.

So yeah, a bit of a cliche answer, but, you know, convening those sorts of folks seems good.

Kevin Frazier: I also think it's fascinating to think about the perhaps over-proliferation of evals in response to this, of, oh, we need more information. So then everybody says, oh, I've got a test for this cyber vulnerability, oh, I've got a different test for this other cyber vulnerability.

You start to get that mentality of 'if an eval lands in a forest, does anyone hear it,' right? Does anyone even care what that information is? But how do you get consensus around those core factors of when we have crossed a dangerous point? That's obviously something I think everyone would want to agree on if we want to avoid a world in which cybercrime costs $10 trillion per year.

Caleb Withers: Especially because the pool of people who are, you know, really good at doing evals and thinking through this is limited, and it's actually a pretty lucrative skillset: you can set up an evaluation for, is an AI model good at this thing that actually matters, something that would have some economic or strategic value if it could do it.

That rhymes a lot with, 'what would it look like to set up a good training pipeline for models to do that?' And if anyone's been paying attention to the salaries and trends around how much value is placed on people who can nicely operationalize what it would mean for a model to be good at some thing that matters, you know, there are a lot of lucrative opportunities there.

So, without going all the way to, oh, we need some sort of central planning of what the whole evals community is doing, it's a scarce skillset and a scarce bunch of people, and so thinking thoughtfully about what things are worth taking some big bets on and spending on is worthwhile, I think.

Kevin Frazier: So given that you could go start a consultancy and charge a much higher hourly rate than folks probably want to imagine, I know your time's valuable, so I don't want to steal too much more of it. But having done this research and exhaustive report, and I really encourage listeners to give it a read, what was the biggest misconception you flagged?

If there was any vibe you want to call out or any just general sort of misinformation, right? Not necessarily intended, but just something that you hear repeated nowadays that as a result of your research kind of makes you cringe. Like, oh no, that's just so inaccurate, I wish people would stop saying that.

Did anything like that emerge from your research?

Caleb Withers: Yeah. I would say the one thing that stood out is, you know, there's this big ongoing debate, and people like to point to, you know, is AI hitting a wall? Was GPT-5 a flop, or whatnot?

And you know, there's obviously, there's something to these discussions. And there's always the question of even if we've seen strong AI progress to date, to what extent can this be expected to continue into the future?

With this being said, the cyber benchmarks I looked at that seem most compelling have been reliably going up over recent years. And if I was to think of ways that I could see them continuing to go up in the months and years to come, I think there's a lot of opportunity still there for them to keep doing so.

Yeah, I mean, I'm not sure whether this is a contrarian or un-contrarian take, depending on who you are talking to, but yeah, I think that AI models are getting better at cyber stuff pretty rapidly, and I think this will continue for at least a few months to years.

Kevin Frazier: Yeah. I think the general sense of ‘don't bet against AI right now’ is a pretty good gamble on most fronts in terms of capabilities.

I have a lot of tired lines, but one that I definitely repeat frequently is always remembering that today's AI is the worst AI you'll ever use. And I think also acknowledging the fact that we're seeing the development and fine tuning of models democratize across the globe.

So there are certain models, certain capabilities we may never learn about through these more centralized evals, for example. And so that to me is something else that would keep me up at night as a cybersecurity scholar.

But I want to give you the final word here. Any other key takeaways that you want listeners to know from your report?

Caleb Withers: No, I would just double down on that: paying attention to what AI models can do in the cyber domain, and doing so in a thoughtful way, I think is pretty important.

Kevin Frazier: All right. Well Caleb, we'll have to leave it there. Thanks so much for coming on.

Caleb Withers: Thank you.

Kevin Frazier: Scaling Laws is a joint production of Lawfare and the University of Texas School of Law. You can get an ad-free version of this and other Lawfare podcasts by becoming a Lawfare Material Supporter at our website, lawfaremedia.org/support. You'll also get access to special events and other content available only to our supporters.

Please rate and review us wherever you get your podcasts. Check out our written work at lawfaremedia.org. You can also follow us on X and Bluesky.

This podcast was edited by Noam Osband of Goat Rodeo. Our music is from ALIBI. As always, thanks for listening.


Kevin Frazier is a Senior Fellow at the Abundance Institute, Director of the AI Innovation and Law Program at the University of Texas School of Law, a Senior Editor at Lawfare, and an Adjunct Research Fellow at the Cato Institute.
Caleb Withers is a research associate for the Technology and National Security Program at the Center for a New American Security.
