Is Anthropic’s New A.I. Really That Scary? It Depends Whom You Ask.

philb2 · May 14, 2026

https://www.nytimes.com/2026/05/12/...9870&user_id=c383821527c441214d07ce6e4a6ba12a

Only a handful of the groups or companies that have spent time using Mythos would discuss it with The New York Times. But companies and researchers that did not have access were happy to offer their thoughts on the way that Anthropic released its new A.I.

Their feedback so far has ranged from serious concern to a shrug. It could be some time before the broader tech community concludes whether Anthropic was right to limit the Mythos release — a challenge that Anthropic executives acknowledge.

For Pavel Gurvich, co-founder and chief executive of the security company Tenzai, part of the problem is that independent cybersecurity experts are unable to test the system and gain a complete understanding of its strengths and weaknesses. That understanding can help them defend against attacks from the technology.

Fifliffl · May 15, 2026

I can see his point, but more out of concern about who gets picked to protect us from the big scary AI. If the only ones who can protect us are the megacorps given access, it's starting to feel a lot more like a Cyberpunk future, because no other company could compete without access.

Kardonxt · May 15, 2026

My understanding is that Mythos does a much better job at creating working attack chains (without human involvement) than prior iterations.

A big part of the current effectiveness is the free compute resources being provided. Some of these exploits would have cost $10,000+ in tokens, according to the data they released. There's a very high likelihood that other models would have identified the same vulnerabilities given enough resources.

Skyblue · May 15, 2026

I wouldn’t trust the New York Times to tell me what the weather was yesterday.

MikeTrike · May 15, 2026

I'll start worrying when AI advances from its LLM phase.

LukeTbk · May 15, 2026

I would have to find the archived version, but on "independent cybersecurity experts are unable to test the system and gain a complete understanding of its strengths and weaknesses": it depends what he means by independent in that space. Private security experts are participants, and so are public state ones, like the UK one: https://www.aisi.gov.uk/blog/our-evaluation-of-claude-mythos-previews-cyber-capabilities

LukeTbk · May 15, 2026

MikeTrike said:
I'll start worrying when AI advances from its LLM phase.

Multimodal VLM?
https://en.wikipedia.org/wiki/Vision-language_model. Language is just one of the ways those are used.

GotNoRice · May 15, 2026

This really seems like a series of carefully crafted PR decisions / stunts rather than anything having to do with their AI actually being better than others. Remember, people always want what they can't have. The fake feud with the US Military, again, just made it seem like the Military was desperate for their AI and thus made everyone else want it too. Carefully limiting which companies get to test the new models also increases its artificial / perceived value. And as a bonus, fewer actual testers means fewer people writing bad articles about all of the slop and hallucinations, and the ones who do get to test will most likely be afraid to say anything negative because they don't want to get cut off.

sharknice · May 15, 2026

Kardonxt said:
My understanding is that Mythos does a much better job at creating working attack chains (without human involvement) than prior iterations.

A big part of the current effectiveness is the free compute resources being provided. Some of these exploits would have cost $10,000+ in tokens, according to the data they released. There's a very high likelihood that other models would have identified the same vulnerabilities given enough resources.

Yes, that is pretty much it. Most of the hype around it is just marketing. "It's too dangerous to release to the public, only select organizations." They do that with every major release and eventually everyone gets access.

The chained exploits it finds probably already show up in security scans and it's just more capable of linking them together. Which means if the bad guys get this they can more easily automate attacks.
A big part of software security is having multiple layers which limits the amount of damage that can be done and gives you more time to react and fix the problem.

LukeTbk · May 15, 2026

GotNoRice said:
This really seems like a series of carefully crafted PR decisions / stunts rather than anything having to do with their AI actually being better than others

Rumors in that space say it is a mix of that and a way to keep the focus off their compute shortage. It is a heavy model to run; opening up mainstream access may not even have been a realistic option a couple of months ago.

The fact that the giant context window lets it find issues by chaining/timing events across multiple vectors at the same time is not a small deal. It is true that some of the issues GPT 5.5 is finding were also findable with 5.4 when they tested back. But being able to link things across a giant context is not a small factor security-wise.

MikeTrike · May 15, 2026

LukeTbk said:
Multimodal VLM?
https://en.wikipedia.org/wiki/Vision-language_model. Language is just one of the ways those are used.

LLM + Vision?

LukeTbk · May 15, 2026

MikeTrike said:
LLM + Vision?

With the output being control if you have a VLA (vision-language-action model). Some work very much like LLM AI ("infer-extrapolate-interpolate-reflect/talk/output") on the 4-letter alphabet of a DNA sequence/genome instead of a human language; language is just one specific way to use them.

You can feed a video stream in and output joint vectors and torques as your tokens instead of English in and English out; some output proteins as 3D amino-acid structures or, like DLSS, an image. What you can train them on and have them output is quite vast, and they do not seem to care much: transformer models seem to be quite adaptive and agnostic to the nature of the data.

MikeTrike · May 15, 2026

LukeTbk said:
With the output being control if you have a VLA (vision-language-action model). Some work very much like LLM AI ("infer-extrapolate-interpolate-reflect/talk/output") on the 4-letter alphabet of a DNA sequence/genome instead of a human language; language is just one specific way to use them.

You can feed a video stream in and output joint vectors and torques as your tokens instead of English in and English out; some output proteins as 3D amino-acid structures or, like DLSS, an image. What you can train them on and have them output is quite vast, and they do not seem to care much: transformer models seem to be quite adaptive and agnostic to the nature of the data.

My understanding is limited, but as I understand it, there are technical limits to how far you can push an LLM because of the overhead of how it works, and that, to me, indicates that a VLM with vastly more overhead would simply hit those limits faster.

I guess what I'm watching for is a generational leap beyond what we do with LLMs. Something not LLM-based, that I'm not aware of or that doesn't exist yet. Throwing infinite compute at the problem is not a long-term solution; near term, it buys time at best.

That's not to say I think LLMs are useless. They have their purpose; I just don't think they're going to be the actual "real AI thing" people are after, at least not in the way all the big tech companies are marketing them.

Shoganai · May 15, 2026

Bro ... Mythos is just a Linux-based parallel Docker setup that's fed source code to find bugs. There is nothing scary about it except to those who have no idea what it is. It hammers the entire codebase it's fed thousands of times until it finds something. It's not going to "escape" and ruin the world or something. It can't. That's literally not what it is or how it functions. The government freaking out about it is also retarded.

LukeTbk · May 15, 2026

It is not about Mythos escaping. What people fear is someone using a model of similar capability (or Mythos itself) to find security flaws and exploit them faster than the people fixing them can patch them.

Shoganai · May 15, 2026

LukeTbk said:
It is not about Mythos escaping. What people fear is someone using a model of similar capability (or Mythos itself) to find security flaws and exploit them faster than the people fixing them can patch them.

Mythos can be replicated for pennies. Other AI companies have already done it. I'm working on something similar myself right now.

LukeTbk · May 15, 2026

I really doubt a model that big and that capable can be made for pennies, or that other companies have done it (not that OpenAI is that far behind, but the others are probably 2-3 months away)...

Anthropic has $44 billion in ARR without Mythos available. If you can do it, why not make $100 billion with it (at the cost of pennies up front)?

Sycraft · May 15, 2026

LukeTbk said:
I really doubt a model that big and that capable can be made for pennies, or that other companies have done it (not that OpenAI is that far behind, but the others are probably 2-3 months away)...

Anthropic has $44 billion in ARR without Mythos available. If you can do it, why not make $100 billion with it (at the cost of pennies up front)?

No kidding. Sounds like a really silly outlandish brag to me that is not at all realistic. I mean let's say Mythos is only about as big as the largest public model (Deepseek) and not many times bigger as it really is. You are still talking a 5-6 figure system just to even RUN something of that size, much less CREATE it. Deepseek is 1.6 trillion parameters. Even a 2-bit quantization is 562GB. So to even run it slowly, with much degraded capability (2-bit has a lot of loss) you'd need a system with over 512GB of RAM and it would chug because of all the offloading going on. To really run it fast and use a higher precision model, you'd need a server with 8xH200 cards in it so that all or at least almost all of it could be GPU resident. Not getting one of those for pennies, never mind the cost of running one.
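The sizing argument above is weights-only arithmetic, and it can be checked in a few lines of Python. A minimal sketch, assuming the 1.6-trillion-parameter figure from the post and counting only the weights (no KV cache, activations, or runtime overhead), in decimal GB; the 141 GB per card is the H200's published HBM3e capacity. A pure 2-bit pack comes to 400 GB, so a 562 GB file implies roughly 2.8 bits per weight on average, presumably because practical "2-bit" formats keep some tensors at higher precision and store scale factors.

```python
H200_HBM_GB = 141  # HBM3e capacity of one NVIDIA H200, in GB


def weight_gb(params: float, bits_per_weight: float) -> float:
    """Memory for the weights alone, in decimal GB; ignores KV cache and activations."""
    return params * bits_per_weight / 8 / 1e9


def implied_bits_per_weight(file_gb: float, params: float) -> float:
    """Average bits per weight implied by a quantized file of a given size."""
    return file_gb * 1e9 * 8 / params


PARAMS = 1.6e12  # parameter count as stated in the post (assumption, not verified)
server_gb = 8 * H200_HBM_GB  # 1128 GB of HBM across an 8-GPU server

for bits in (2, 4, 8):
    need = weight_gb(PARAMS, bits)
    print(f"{bits}-bit: {need:.0f} GB, fits on 8xH200: {need <= server_gb}")
print(f"562 GB file -> {implied_bits_per_weight(562, PARAMS):.2f} bits/weight")
```

On these assumptions, 4-bit weights (800 GB) fit within the 1,128 GB of an 8xH200 server with some room left for the KV cache, while 8-bit weights (1,600 GB) do not, which lines up with the "all or at least almost all of it GPU resident" caveat.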

And that's Deepseek. It is likely that Mythos is a good bit larger: estimates put Opus somewhere in the 1-5T range, and Mythos would almost certainly be larger.

Shoganai · May 16, 2026

LukeTbk said:
I really doubt a model that big and that capable can be made for pennies, or that other companies have done it (not that OpenAI is that far behind, but the others are probably 2-3 months away)...

Anthropic has $44 billion in ARR without Mythos available. If you can do it, why not make $100 billion with it (at the cost of pennies up front)?

It's already been done. I'm not gaslighting you. They aren't big companies. They proved that the orchestration, not the model, is what allows it to find bugs. I haven't done it, I'm just seeing if I can replicate what the others have done for astronomically less money. If you don't even want to bother doing basic research on this topic, then this is already an exhausting conversation.

Sycraft · May 16, 2026

Shoganai said:
It's already been done. I'm not gaslighting you. They aren't big companies. They proved that the orchestration, not the model, is what allows it to find bugs. I haven't done it, I'm just seeing if I can replicate what the others have done for astronomically less money. If you don't even want to bother doing basic research on this topic, then this is already an exhausting conversation.

To me that sounds like "I made up something, so now I'm going to pretend like I didn't and say you need to 'do your own research' as a deflection."

Shoganai · May 16, 2026

Sycraft said:
To me that sounds like "I made up something, so now I'm going to pretend like I didn't and say you need to 'do your own research' as a deflection."

You're very annoying. Here you go:

https://aisle.com/blog/ai-cybersecurity-after-mythos-the-jagged-frontier
https://xbow.com/blog/mythos-offensive-security-xbow-evaluation

Sycraft · May 16, 2026

Shoganai said:
You're very annoying. Here you go:

I'm annoying? You are the one making rather outlandish claims, but ok, let's have a look here. Neither source is what I'd call unbiased, but we'll take them at their word. Xbow says it is a major advancement and a significant step up over all existing models. Of course, they say that it isn't amazing on its own; it needs their magic sauce managing it to work well.

The AISLE one is where I'm guessing you are getting your "pennies" figure from, because you are looking at their figure to rent a cheap model. That is misleading in all respects. The issue isn't whether it actually costs that little; it's that when someone is talking about the cost of replicating a model, they aren't talking about the cost of running a million tokens on it, they are talking about the entire training and development. Your statement was either poorly phrased or deliberately misleading.

Also, the tests AISLE did in no way say that GPT-OSS-20b, which I'm guessing is the one they are talking about being that cheap, matches Mythos (which they didn't get to test) or even the other models generally, which is what your statement implied. They say that it was able to find CVE-2026-4747. Ok, great... but if it was indeed so easy for all the models to find, why was it not found by them or someone else before? Also, if you dig into their prompt, it was VERY narrow. They gave the models the specific function that has the vulnerability (about 20 lines) and asked the models if there's a security vulnerability.

Ok, great, but first that is a way different task than searching through large amounts of code and locating a problem, which is what is needed in the real world. This is analyzing one function that you know has an issue. Second, and this is the big one, it makes the models prone to think there is a vulnerability and say there is. You can see that in the false positive section. GPT-OSS-120b correctly called it safe, but most, including the cheap GPT-OSS-20b, said it was still vulnerable.

Now, I'm not trying to glaze Mythos. I believe the hype is greater than the reality, based on the simple fact that if it really were uncovering tons of massive problems, we'd see some massive and important Patch Tuesdays going on, since MS is one of the companies that has it. We haven't, which implies it really isn't ripping open tons of holes we've never seen before.

However, your statement seems to deliberately make it sound like you think someone can "replicate Mythos for pennies," not "use a small model that you can rent for pennies to find one of the bugs." "Replicate" would imply "does everything it does," which would imply a massive model that would be neither pennies to create nor to run. That aside, even those small models were not pennies to create.

Basically I'd say your statement is just like Anthropic's marketing on Mythos: Maybe not an outright fabrication, but extremely misleading, trying to push a narrative, and based on what is likely biased data.

Shoganai · May 16, 2026

Sycraft said:
I'm annoying? You are the one making rather outlandish claims, but ok, let's have a look here. Neither source is what I'd call unbiased, but we'll take them at their word. Xbow says it is a major advancement and a significant step up over all existing models. Of course, they say that it isn't amazing on its own; it needs their magic sauce managing it to work well.

Bro ... the article addresses each of your points openly. Your arguments are wrong because:

They gave the models the specific function

That's literally how Mythos works too. The technical post describes: "launch a container, prompt the model to scan files, let it hypothesize and test." The model doesn't start from a blank terminal and guess where bugs are ... the scaffold finds the function, the model analyzes it. Same pipeline, same scope.

GPT-OSS-20b correctly said safe

You inverted the result. Read the table: GPT-OSS-20b false-positived 3/3 on patched code. GPT-OSS-120b (5.1B active) is the one that got it right both ways. The point stands.

Replicate means do everything Mythos does

Nobody claims a $0.11/M token model writes multi-round kernel exploits autonomously. The claim ... stated explicitly ... is that discovery-grade capability is broadly accessible and the bottleneck is the system wrapped around the model. The very thing you're defending Mythos for is the pipeline you're pretending doesn't matter.

Why wasn't it found before?

A 17-year-old bug is evidence that nobody was looking at that function with a systematic AI scanner ... not evidence that finding it requires frontier intelligence. Both Mythos and GPT-OSS-20b found it when pointed at the function. The model isn't the rate limiter. The source code is public ... anyone can test. I reproduced the same result with a janky version of the same pipeline for two cents. The model is not the bottleneck.
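Part of the disagreement is about which cost is being counted. A minimal sketch of the inference side only, assuming a flat price (real APIs often price input and output tokens differently) and taking the $0.11-per-million-token figure mentioned above: two cents buys on the order of 180,000 tokens, which is a lot of room when each prompt is a single short function. None of this touches the cost of training the model, which is the other side of the argument.

```python
def tokens_for_budget(budget_usd: float, usd_per_million_tokens: float) -> int:
    """How many tokens a budget buys at a flat per-million-token price (rounded down)."""
    return int(budget_usd / usd_per_million_tokens * 1_000_000)


PRICE = 0.11  # USD per million tokens, the figure quoted for the cheapest model

print(tokens_for_budget(0.02, PRICE))  # 181818
print(tokens_for_budget(1.00, PRICE))  # 9090909
```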

Sycraft · May 16, 2026

And yet all of that (in addition to misunderstanding the articles) ignores the fact that this all started because you said "Mythos can be replicated for pennies." Not "Other smaller LLMs can find security vulnerabilities as well." or "I think Anthropic is overstating Mythos' security capabilities." No, you said, explicitly "Mythos can be replicated for pennies." You've then proceeded to try and argue in tangents about how that is right, when it is clearly a BS statement and people are clearly going to take it to mean "You can make your own model that does what Mythos does for pennies."

Shoganai · May 16, 2026

Sycraft said:
And yet all of that (in addition to misunderstanding the articles) ignores the fact that this all started because you said "Mythos can be replicated for pennies." Not "Other smaller LLMs can find security vulnerabilities as well." or "I think Anthropic is overstating Mythos' security capabilities." No, you said, explicitly "Mythos can be replicated for pennies." You've then proceeded to try and argue in tangents about how that is right, when it is clearly a BS statement and people are clearly going to take it to mean "You can make your own model that does what Mythos does for pennies."

Right. You seem so laser focused on "replicated for pennies" that you're ignoring everything else in front of you. The claim I am defending is that a 3.6B-parameter open model ... at an inference cost of less than one dollar ... detected the same 17-year-old FreeBSD RCE that Mythos detected, on the same code, in the same prompt format. The AISLE data proves this. Do you accept that specific finding or reject it? I'm not actually sure what point you're trying to make. The studies ... and the demonstration show ... that the orchestration of the model or models is more important than the model itself.

tunatime · May 16, 2026

MikeTrike said:
I'll start worrying when AI advances from its LLM phase.

Who's to say it hasn't? Think about this... the tech they are letting us play around with is probably stuff they had 10+ years ago. Look at the UFO stuff. Or Cold War planes like the F-117, which was flying for 20+ years before they told us about it... do you really think that in some corporate or government lab they don't have something that makes the LLMs we get to play around with look like an old Windows 3.1 box?

kram182 · May 16, 2026

tunatime said:
Who's to say it hasn't? Think about this... the tech they are letting us play around with is probably stuff they had 10+ years ago. Look at the UFO stuff. Or Cold War planes like the F-117, which was flying for 20+ years before they told us about it... do you really think that in some corporate or government lab they don't have something that makes the LLMs we get to play around with look like an old Windows 3.1 box?

For example:

https://en.wikipedia.org/wiki/Sentient_(intelligence_analysis_system)

Development and core buildout occurred from 2010 to 2016 under the NRO's Advanced Systems and Technology Directorate.

MikeTrike · May 16, 2026

tunatime said:
Who's to say it hasn't? Think about this... the tech they are letting us play around with is probably stuff they had 10+ years ago. Look at the UFO stuff. Or Cold War planes like the F-117, which was flying for 20+ years before they told us about it... do you really think that in some corporate or government lab they don't have something that makes the LLMs we get to play around with look like an old Windows 3.1 box?

I still don't care. What am I gonna do about it?

My concern is more in this direction from the general population:

Old folks are already being abused by existing simple methods and current AI tech. More advanced tech will just make it worse...

Shoganai · May 16, 2026

MikeTrike said:
I still don't care. What am I gonna do about it?

My concern is more in this direction from the general population:

Old folks are already being abused by existing simple methods and current AI tech. More advanced tech will just make it worse...

I deal with this crap constantly with my older clients. I finally got to talk to and mess with one in real time; normally I have to deal with the aftermath. My client was talking to this a-hole Indian scammer and she told him "this doesn't feel right, let me call my IT guy," and of course he said "no, don't do that!" When I got there, I saw he had control of her computer. I messed with him for a good 20 minutes, letting him think he was getting somewhere on the computer, until I finally opened Notepad, typed "GO F@#K YOURSELF" on the screen, and purged him from the system. I didn't even bother trying to figure out what he did to her computer; I just backed up everything and did a fresh install of Windows.

tunatime · May 16, 2026

MikeTrike said:
I still don't care. What am I gonna do about it?

My concern is more in this direction from the general population:

Old folks are already being abused by existing simple methods and current AI tech. More advanced tech will just make it worse...

I didn't even think about that. When we are old, we're going to have to deal with AI voices of our loved ones calling to say they are in jail, etc., and need money.

Shoganai · May 16, 2026

tunatime said:
I didn't even think about that. When we are old, we're going to have to deal with AI voices of our loved ones calling to say they are in jail, etc., and need money.

That's already happening right now.

sc5mu93 · May 16, 2026

Anthropic: "Mythos is so powerful it can't be contained."
Also Anthropic: "We're containing it."

jbltecnicspro · May 16, 2026

sc5mu93 said:
Anthropic: "Mythos is so powerful it can't be contained."
Also Anthropic: "We're containing it."

Anthropic: "AI is coming for your job. It will make you irrelevant"
Also Anthropic: "Join our team! We're hiring!"

uOpt · May 17, 2026

I also have some ideas on how to tune my local LLMs into something more specialized for finding security bugs. But it would be really hard work, and I lack the resources, too.

LukeTbk · May 17, 2026

Shoganai said:
You're very annoying. Here you go:

https://aisle.com/blog/ai-cybersecurity-after-mythos-the-jagged-frontier
https://xbow.com/blog/mythos-offensive-security-xbow-evaluation

The difference between doing this:
But here is what we found when we tested: We took the specific vulnerabilities Anthropic showcases in their announcement, isolated the relevant code, and ran them through small, cheap, open-weights models. Those models recovered much of the same analysis. Eight out of eight models detected Mythos's flagship FreeBSD exploit, including one with only 3.6 billion active parameters costing $0.11 per million tokens. A 5.1B-active open model recovered the core chain of the 27-year-old OpenBSD bug.

and what Mythos can do is quite large. Doing this:
We isolated the vulnerable svc_rpc_gss_validate function, provided architectural context (that it handles network-parsed RPC credentials, that oa_length comes from the packet), and asked eight models to assess it for security vulnerabilities.

versus finding it from scratch are two very different use cases and capabilities. I am sure that if you point even a very basic LLM at a problem in a small amount of code, it will agree with you. The interesting test would have been to ask it to find vulnerabilities in the giant source code as a whole, with zero pointers, and see if it finds it again... Now you need large context, infrastructure (and the ability to scale on it), and so on.
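The gap between the two test setups is largely one of scale. A rough sketch, assuming the common rule of thumb of about 4 characters per token, a hypothetical 40-character average line, a hypothetical 1M-token context window, and a hypothetical 10-million-line source tree (none of these figures come from the thread): a 20-line function fits in a single prompt many times over, while the whole tree needs on the order of a hundred full windows before any cross-file chaining is even attempted.

```python
import math


def estimate_tokens(num_chars: int, chars_per_token: float = 4.0) -> int:
    """Rough token count; ~4 characters per token is a rule of thumb, not exact."""
    return math.ceil(num_chars / chars_per_token)


def windows_needed(num_chars: int, window_tokens: int, chars_per_token: float = 4.0) -> int:
    """Minimum number of full context windows needed to see every character once."""
    return math.ceil(estimate_tokens(num_chars, chars_per_token) / window_tokens)


CHARS_PER_LINE = 40  # hypothetical average line length
WINDOW = 1_000_000   # hypothetical context window, in tokens

function_chars = 20 * CHARS_PER_LINE       # the ~20-line isolated function
tree_chars = 10_000_000 * CHARS_PER_LINE   # hypothetical whole source tree

print(estimate_tokens(function_chars))         # 200
print(windows_needed(function_chars, WINDOW))  # 1
print(windows_needed(tree_chars, WINDOW))      # 100
```

The window count is only a lower bound: a real scaffold would also need overlap between chunks and some way to carry findings across them, which is the infrastructure problem described in the post.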

Finding most issues is fine, but the ability to find even one more than everyone else is a big deal in this space, a very big one. Yes, smaller, cheaper models can do a lot, and capability will not scale in a linear fashion, but being the best by any margin is worth a lot, and scary for those who do not have access to it.
