iamflimflam1 48 minutes ago [-]
From reading the article. They offered their developers both Claude code and Copilot.
What they wanted was for them to use both and feedback which was better.
The developers voted with their feet and didn’t use Copilot.
What Microsoft were hoping was that the opposite would happen...
verst 36 minutes ago [-]
Most of us never had the option for work to pay for Claude Code -- some internal orgs did this. That being said I had a personal Claude Code subscription for a bit.
Honestly I find GitHub Copilot CLI (and now also the new GitHub Copilot app) quite decent. I mostly use it with Opus 4.7, or rarely with GPT-5.5. The VSCode extension is ok, but CLI or app are the better experience IMO.
cfunderburg 43 minutes ago [-]
I wish I could understand the appeal of using Claude Code inside VScode rather than Copilot. I feel like I'm missing something obvious.
rplnt 21 minutes ago [-]
Slightly related (me not understanding) is why Copilot in VS Code is essentially just a CLI interface. Why can't it use the IDE tools (search, LSP, ...)? All it ever does is try to execute grep.
mattmanser 11 minutes ago [-]
Someone mentioned here the other day that when you try and give Claude those tools through an MCP or skill it tends to go a bit loopy.
At the moment it seems like the way it's been trained has been tightly coupled with grep.
It does feel bizarre though that it doesn't use the symbol servers.
skywhopper 12 minutes ago [-]
Because it’s far far easier to make a text-generation machine generate text that has decades of how-to explanations on the Internet than to correctly work an internal editor API that changes often and isn’t as well-documented.
Especially if you want effective results.
gbro3n 21 minutes ago [-]
Same, with regard to TUIs in general. The VS code copilot chat extension has really nice integration for 'human in the loop' style agentic development. I build some tooling - https://www.agentkanban.io to integrate a taskboard and git worktrees with copilot chat
tags2k 19 minutes ago [-]
I'm with you there. I can't stand the CLI that wants to take you away from the mostly bad code it writes. Give me the structure, let me finesse it - to do that I need to actually see it no matter how much Anthropic pretends that it's perfect.
stanac 32 minutes ago [-]
I think they were comparing CLIs, not VS extensions.
mattmanser 17 minutes ago [-]
I'm a little the opposite, what's the point of using an IDE with AI? I genuinely don't get it?
These days I just use Claude Code Desktop or Claude Code in powershell. Standalone, not inside an IDE. Honestly, I'm using Desktop more and more as it gets more features.
The IDE is for me. No AI in it at all. If I want to get Claude to do something specific to a file I just @ the file.
darig 31 minutes ago [-]
[dead]
zabil 12 minutes ago [-]
I switched from Claude Code to the GitHub Copilot app recently. Since our repositories are hosted on GitHub, I find the Copilot app better integrated for the PR workflow, with PR management available in the app. I don’t think I miss any of the features of Claude Code. I never thought I would make the switch, but Copilot upped the game.
Also it became very hard to convince management to keep both Claude code and GitHub Copilot enterprise licenses.
tra3 15 hours ago [-]
There's definitely a way to use Claude code that is token conscious.
I've tried throwing unsupervised agentic software factory workflows against the wall, and they burned through my tokens like nobody's business but didn't produce much.
Supervised, human-in-the-loop process on the other hand is much more productive but doesn't consume nearly as much. Maybe that's why everyone's pushing agentic approaches so much.
jstummbillig 45 minutes ago [-]
I think it's great. People at a broad scale are getting first hand experience with resource management. It's a fairly cheap way of doing it too (in contrast to: learning this by managing humans) and we can all benefit from the skill transfer.
CoolestBeans 15 hours ago [-]
The current thinking is automated agents is what turns this from an industry in the tens of billions to a multi trillion dollar one. So yes you are right on the money, agents stimulate demand for this thing they've built.
dualvariable 9 hours ago [-]
"The bureaucracy is expanding to meet the needs of the expanding bureaucracy"
beardyw 2 hours ago [-]
I didn't know that one. Loosely said to be Oscar Wilde.
gmerc 2 hours ago [-]
Delivered in the voice of Leonard Nimoy to millions of GenX during their formative years.
hannofcart 46 minutes ago [-]
Is that a civ 4 reference? You sir, have my upvote.
joe_mamba 13 hours ago [-]
[flagged]
kridsdale1 12 hours ago [-]
There is always a quantity of lubricant that can get any machine moving. Just add so much that you create an all consuming river of lube and watch your thing sail away.
ndesaulniers 12 hours ago [-]
Good then that Amazon sells it by the 55 gal drum then.
At the enterprise level though, it's going to be hard to want to use a service in which costs are not predictable, and keeping those costs under control requires employee training.
jochem9 3 hours ago [-]
You can put a limit on token spend and provide training (and even pre-configured workflows) on how to limit token spend.
Like the other commenter said: cloud spend can also spin out of control if you don't pay attention, yet we've found ways to keep it under control (training, guardrails, limits, transparency).
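A spend limit of the kind described here can be as simple as a counter that refuses further requests once the budget is used up. This is a minimal sketch, not any provider's API: `TokenBudget`, `BudgetExceeded`, and the prices in the usage lines are all hypothetical.

```python
class BudgetExceeded(RuntimeError):
    """Raised when a request would push spend past the configured cap."""


class TokenBudget:
    """Tracks spend against a fixed monthly cap (hypothetical helper)."""

    def __init__(self, monthly_limit_usd: float) -> None:
        self.monthly_limit_usd = monthly_limit_usd
        self.spent_usd = 0.0

    def charge(self, tokens: int, usd_per_million_tokens: float) -> float:
        """Record a request and return the remaining budget in USD."""
        cost = tokens * usd_per_million_tokens / 1_000_000
        if self.spent_usd + cost > self.monthly_limit_usd:
            # Reject before recording anything, so the cap is never exceeded.
            raise BudgetExceeded(
                f"request costs ${cost:.2f}, only "
                f"${self.monthly_limit_usd - self.spent_usd:.2f} left"
            )
        self.spent_usd += cost
        return self.monthly_limit_usd - self.spent_usd


# Illustrative numbers only: a $100 cap and $5 per million tokens.
budget = TokenBudget(monthly_limit_usd=100.0)
remaining = budget.charge(tokens=10_000_000, usd_per_million_tokens=5.0)
```

A real deployment would enforce this at a proxy or gateway rather than in client code, but the accounting is the same.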
mrgoldenbrown 7 hours ago [-]
>...use a service in which costs are not predictable, and keeping those costs under control requires employee training.
Isn't this a (mildly exaggerated) description of AWS, which is a very successful service?
noodletheworld 1 hours ago [-]
Mmm… but for AWS it's pay for external use, right?
So your costs scale with the number of users you have.
That's an opex that you can explain.
For tokens for developers it's maybe closer, cost/outcome wise, to hiring an external consulting company to write your code; money paid scales with work done, no promise of delivery, arbitrary unpredictable external price changes.
It's not quite the same, though similarly lucrative for consultants.
sidewndr46 12 hours ago [-]
Am I losing my mind, aren't there multiple headlines each day about companies penalizing employees for not using AI enough?
iSnow 12 hours ago [-]
That was roughly 3 weeks ago. With the repricing of Claude 4.7 and GPT 5.5, things have become more spicy.
sidewndr46 11 hours ago [-]
use AI, don't use AI, this whole thing is getting really hard to follow
andrekandre 8 hours ago [-]
i've worked at so many places where the propaganda/marketing and reality on the ground is so disorienting/shocking i don't really expect this to be any different...
basch 11 hours ago [-]
since those headlines started ive felt it just encouraged inefficiency. "say as much as you can without saying anything." if you were accomplishing your task the need for more would end, thus there is incentive to never succeed.
layer8 13 hours ago [-]
To be fair, the cost of software development has always been fairly unpredictable. What may be different is that the cost used to be roughly proportional to man-hours spent, while now the number of agents running in parallel may be less predictable.
ilovecake1984 12 hours ago [-]
The cost per month is 100% known and always has been. What has been variable is the rate of delivery. AI is different and can be substantial in countries with lower wages.
xienze 12 hours ago [-]
> To be fair, the cost of software development has always been fairly unpredictable.
Yes, but in a "oops this is gonna take another two months to finish" kind of way, not the "oops this is the 12th time this month 8 developers have burned $2K in tokens in a single day and no one really knows how it happened" kind of way.
kridsdale1 12 hours ago [-]
We’re all being given belt-loaded machine guns and tossed on to Planet K. We used to pay for the salaries of soldiers, now we have an Ammo Budget.
dgellow 2 hours ago [-]
A belt loaded spinwheel machine gun, where there are some chances the next bullet is a dummy round, or goes in the wrong direction. And every time you reload, a new soldier is in charge of the gun
salawat 14 hours ago [-]
There's no fucking training to mitigate a slot machine.
dgellow 2 hours ago [-]
Games like Diablo are basically a whole bunch of slot machines, and there are strategies you can follow to optimize your run.
gambiting 1 hours ago [-]
Yes, because in video games there is always a chance to win so you can optimize your strategy around that chance. If you have a 1% chance to drop a legendary weapon, the question becomes how do I manufacture 100 chances for a weapon drop in the shortest possible time. With agentic coding there is no such guaranteed chance - in a way it's worse than a slot machine that is guaranteed to pay out eventually. You could spend hundreds of millions of tokens and still not get what you asked for.
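For scale, the drop-rate arithmetic above can be checked directly. A minimal sketch, assuming every attempt is independent with the same fixed chance; the function name is illustrative.

```python
def chance_of_at_least_one(p: float, attempts: int) -> float:
    """Probability of one or more successes in `attempts` independent tries."""
    return 1.0 - (1.0 - p) ** attempts


# How the odds of seeing at least one 1% drop grow with more attempts.
for attempts in (100, 200, 500):
    print(attempts, round(chance_of_at_least_one(0.01, attempts), 3))
```

Under these assumptions, 100 attempts at 1% gives roughly a 63% chance of at least one drop, so even in the game the payout is likely rather than guaranteed. The difference with agentic coding is that the per-attempt chance is unknown and may be zero for a given task.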
KronisLV 12 hours ago [-]
> There's definitely a way to use Claude code that is token conscious.
Colleague used Sonnet 4.6 on some pretty normal agentic coding tasks through AWS Bedrock to keep the data in the EU, 100 EUR usage in a single day. In comparison, the Mistral subscription costs about 20 EUR per month and we tested that for similar tasks it was okay, the usage got to around 10% of that monthly limit in a single day. Or Anthropic's own Max (5x) plan where you get way, way more tokens to do with as you please.
I feel like the sweet spot is having a monthly subscription with any of the providers (you're subsidized a bunch), but if you have to pay per tokens, now I'd just look in the direction of what tasks DeepSeek would be okay for, sadly probably not in the situation above. For a startup, though...
On the other hand, this feels a bit hypocritical:
> It was part of an effort to get project managers, designers, and other employees to experiment with coding for the first time, and sources tell me that Claude Code has proved very popular inside Microsoft over the past six months.
They're gonna say that the future is all AI... until they get the bill.
michaelbuckbee 11 hours ago [-]
I was trying to get a better sense of the time cost quality matrix of these, so I threw together a quick eval of Sonnet 4.6, Mistral's dev model, and Opus 4.7 (figuring it's what you'd use if you were on Max).
The results for a function implementation and test of levenshtein distance in js are pretty similar but Mistral is 30x cheaper than Opus 4.7 and 4x faster than Sonnet 4.6.
Levenshtein distance is not only a well-understood problem, it's small, self-contained, and extremely well-represented in the training data. The kind of problem where even small/bad models can excel. The golden standard for those tasks is just "use a library" so no wonder the beefy models are expensive: you're chartering a commercial airplane to go grocery shopping.
My personal benchmarks are software engineering tasks (ideally spanning multiple packages in a monorepo) composed of many small decisions that, compounded, make or break the implementation and long-term maintainability.
There's where even frontier models struggle, which makes comparisons meaningful.
CraigJPerry 5 minutes ago [-]
>> many small decisions
It’s making guesses not decisions, framing as decisions will lead you astray to wasted time and tokens.
It’s vaguely productive to tell them a ton of relevant info upfront attempting to minimise their need for load bearing guesses. I say vaguely because obedience is generally only around the level where it's good enough to lull you into a false sense of security, not to actually be obedient.
It’s a bit more productive to use the various loop mechanisms (hooks, /goal etc) to evaluate each end of turn against guard rails and reject with clear instruction on what’s unacceptable. Obviously if you only do this without the front load of info then you’re likely to spend more tokens to reach a satisfactory end of iteration.
KronisLV 10 hours ago [-]
The one detail I did forget to mention is that if anyone goes with the Mistral subscription (instead of paying per-token), then the Mistral Vibe tool gives you their Medium 3.5 model by default, with a 200k token context. It will probably be enough for plenty of tasks, though there's also a noticeable difference between that and up to 1M.
dgellow 2 hours ago [-]
> They're gonna say that the future is all AI... until they get the bill.
I mean, they will continue to say so, they just want to be the ones being paid for the service, not Anthropic :)
brookst 13 hours ago [-]
I get 98.6% cache hits on Claude code. Short of drastic arch changes it’s hard to imagine it getting much better.
gobdovan 13 hours ago [-]
98.6% cache hits doesn't distinguish an efficient workflow from an overly chatty linear agent repeatedly reusing the same context. Plus, it says nothing directly that the process has good useful progress per token.
kridsdale1 12 hours ago [-]
We are all going to be graded by (tickets closed / tokens burned) soon enough.
nchie 8 minutes ago [-]
I doubt it, the difference between someone slightly inefficient and someone extremely efficient isn't big enough to matter compared to how much they cost in salary.
recursive 12 hours ago [-]
Sweet. I can get that up to infinity, assuming they're using IEEE-754 division.
hedgehog 11 hours ago [-]
You pay for cache hits on every turn and even with the newest architectures longer context is slower/more energy intensive. Constructing concise turns that reuse prefix and stop when the new context is no longer useful help, as does pushing generation down into cheaper models while using stronger models for verification.
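The two points above (a high cache-hit rate says little about efficiency, and cached context is still billed every turn) can be illustrated with a toy model. This is a sketch under stated assumptions: a strictly linear session where each turn adds the same number of new tokens and re-reads the entire earlier context from cache. The per-million-token prices are placeholders, not any provider's published rates, and `linear_session_cost` is a made-up name.

```python
from dataclasses import dataclass


@dataclass
class SessionCost:
    fresh_tokens: int    # tokens processed at the full input price
    cached_tokens: int   # tokens re-read from the prompt cache
    dollars: float

    @property
    def cache_hit_rate(self) -> float:
        total = self.fresh_tokens + self.cached_tokens
        return self.cached_tokens / total if total else 0.0


def linear_session_cost(
    turns: int,
    tokens_per_turn: int,
    fresh_price_per_mtok: float = 3.00,   # assumed price, USD per 1M tokens
    cached_price_per_mtok: float = 0.30,  # assumed price, USD per 1M tokens
) -> SessionCost:
    """Cost of a session where turn i re-reads the (i - 1) earlier turns."""
    fresh = turns * tokens_per_turn
    # 0 + 1 + ... + (turns - 1) earlier turns are re-read across the session.
    cached = tokens_per_turn * turns * (turns - 1) // 2
    dollars = (
        fresh * fresh_price_per_mtok + cached * cached_price_per_mtok
    ) / 1_000_000
    return SessionCost(fresh, cached, dollars)


short = linear_session_cost(turns=20, tokens_per_turn=2_000)
long = linear_session_cost(turns=140, tokens_per_turn=2_000)
print(f"short: hit rate {short.cache_hit_rate:.1%}, ${short.dollars:.2f}")
print(f"long:  hit rate {long.cache_hit_rate:.1%}, ${long.dollars:.2f}")
```

In this model the hit rate is (turns - 1) / (turns + 1), so it climbs toward 100% simply because the session gets longer, while the cached portion of the bill grows with the square of the turn count.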
tracker1 15 hours ago [-]
My experience as well... I've only hit Anthropic's 5hr threshold a few times, and two of them were within a half hour of the window. Also, all three times I'd already accomplished a LOT.
I tend to work with the agent, and observe what's going on as well as review/test and work through results/changes. I spend a lot more time planning tasks/features than the execution, even using the agent as part of planning and pre-documentation. It works really well. I don't think people burning through the 5hr allotment in under an hour are actually reviewing/QC/QA the results of what they're doing in any meaningful way, and likely producing as much garbage as good (slop).
I'm really curious as to HOW the MS employees were using the agents as much as what they were doing.
kristjansson 14 hours ago [-]
I suspect subscription limits are quite a bit higher than the equivalent tokens their dollar cost could purchase. I similarly feel like I can get a lot done with a $20/mo Claude Pro subscription, but also can easily spend $10-20/day at API pricing with similar usage.
brookst 13 hours ago [-]
Yep. I get $6k - $8k worth of tokens (at api rates) using the $200 max subscription.
skeledrew 13 hours ago [-]
Can verify that I've gotten about $400 worth of tokens from my $20 sub.
kridsdale1 12 hours ago [-]
Now that sounds like a business I’d like to invest in! When’s that Anthropic IPO anyway?
lawn 13 hours ago [-]
I don't understand why people are using the API pricing instead of the Pro/Max subscriptions? What am I missing?
giancarlostoro 13 hours ago [-]
Enterprise customers don't get that option. But also if you want a fully custom harness, you also don't get that option.
sulam 12 hours ago [-]
Personally I prefer the API pricing because I feel like I'm not going to get rug pulled on my work. When it comes to personal stuff, I use the shit out of my sub, but it's not making me money.
kristjansson 5 hours ago [-]
I’ve made the same argument On Here. Paying the full price (should!) make you consider your usage, pick the right model, delegate to cheaper/local providers, …. It makes you use the models the way they’re going to be used after the subsidy ends.
mrgoldenbrown 7 hours ago [-]
Terms of service prohibit subscriptions for employees of companies bigger than X people. I suppose they could all sign up as individuals and try to get away with it but presumably that would look pretty obvious with a tiny bit of analytics.
HDThoreaun 12 hours ago [-]
Anthropic is forcing large enterprises onto api billing instead of subscriptions.
nurettin 2 hours ago [-]
---- Before it was:
Me: We need to do this this that.
Claude: <random stuff that approximates human output>
Me: Are you sure?
Claude: Well actually there is a bug <more random stuff that looks right this time>
----- Now it is:
Me: We need to do this this that.
Claude: <random stuff that approximates human output>
Claude: Let me consult the advisor on that.
Claude: advisor came up with some advice, adjusting according to that. <more random stuff that looks right this time>
thegreatpeter 13 hours ago [-]
yeah, by using codex
relevant_stats 11 hours ago [-]
So, snippet from the article says the following:
> I understand that Microsoft is planning to remove most of its Claude Code licenses and push many of its developers to use Copilot CLI instead. While Claude Code has been a popular addition, it has also undermined Microsoft’s new GitHub Copilot CLI coding tool — a command line version of GitHub Copilot that runs outside of development apps like Visual Studio Code.
So "Microsoft chooses to eat its own dogfood" is a more accurate title?
johnnypangs 2 hours ago [-]
I don’t think people read the article, I didn’t until I saw your comment. The article feels like clickbait tbh.
RobRivera 11 hours ago [-]
Why not both?
This message from Carlos's son
relevant_stats 11 hours ago [-]
Uh, what?
keyle 45 minutes ago [-]
The title is somewhat bait. It reads like MSFT is using less AI, while in fact it's just a force swap to Copilot.
Arguably, Copilot is GPT 5? Not sure what the CLI offers behind the covers.
golf1052 16 minutes ago [-]
Employees (at least on my team) get access to the Claude models as well when using Copilot CLI.
meowkit 35 minutes ago [-]
Copilot is the name for the harness / wrapper of MSFT products
The CLI can swap to whatever model (/models) based on your subscriptions.
The copilots on desktop or Office Apps are likely just GPT5 nano or other tiny models with cheap inference
patentlyze 37 minutes ago [-]
I disagree. As someone who just got a new Windows laptop with Copilot baked(forced) in I've tested Copilot a lot.
It. is. so. bad.
It feels like it's at least 1-2 years behind the current top models.
gbro3n 19 minutes ago [-]
But there isn't a copilot model is there? Just a harness, and the vscode copilot extension is pretty good (haven't tried the tui)
usernametaken29 14 minutes ago [-]
I switched to OpenRouter and OpenCode a while ago. It is much cheaper, much much cheaper, and A LOT more reliable. Particularly Gemini was a piece of trash when it came to uptime
proxysna 15 hours ago [-]
Feels about right.
I've launched an internal demo of Claude Code and Deepseek on the same day and we burned through our monthly allowance for Claude in just over a week, with more than a half of that budget being spent in one day. With DS people are unable to go through that same amount of money in a month, not even close.
With that Claude feels like an expensive toy, while DS is a shovel, purely because developers do not feel like they are eating into a precious resource while using it. Also it does not feel like there is much of a difference in capability between Claude and DS-pro. DS-pro and flash do feel like sonnet/opus and haiku, but flash is still very-very capable.
onlyrealcuzzo 12 hours ago [-]
I rage canceled Claude today.
After 2 weeks of Claude getting progressively worse and worse, today was the final straw.
I don't care if they have a phone app. The model is COMPLETE garbage after you subscribe long enough and they think they've "got you".
I can't code on my phone if the model literally moves in the wrong direction and does the opposite of what I tell it to. If I wanted to make my code worse, I'd just randomly commit garbage. I don't need a mobile app for that.
mmusc 10 hours ago [-]
All these tools have almost feature parity. The GitHub cli allows remote sessions and can run anthropic models anyway
couchdb_ouchdb 10 hours ago [-]
I've seen a lot of this sentiment over the previous six months from people on reddit. I have yet to experience this myself as a developer with over 20 years of experience.
solenoid0937 22 minutes ago [-]
It's the same phenomenon as when you learn a new vocabulary word you see it everywhere.
People heard "Claude is nerfed" and now they see it everywhere, they notice failures a lot more than they would have otherwise.
Doesn't matter that Claude is not, in fact, nerfed. Perception is powerful and most humans are not rational.
dgellow 2 hours ago [-]
Opus 4.7 has been a real downgrade for me. I’m back to mid 2025 when I had to catch all the completely intermediary goals/assumptions the model is creating for itself
chantepierre 37 minutes ago [-]
I felt that but find it worked way better by invoking it with `claude --effort max` only
Wowfunhappy 1 hours ago [-]
You can still use older versions of Opus if they work better for you. Just need to set the environment variable.
colechristensen 2 hours ago [-]
What it does seem like is that they're tuning some knobs up and down or releasing new versions of models or system prompts that result in the model getting dumber and smarter in waves.
Opus has been dumb this week.
Claude was having a lot of capacity problems and downtime and then this week that has been much less obvious... and the model is dumber.
It could also just be luck and my impressions are false... who knows.
Our_Benefactors 5 hours ago [-]
It’s because it’s not true, there’s no evidence for it that passes the sniff test. No lab is “shipping a worse model once they’ve got you”. People have a bad few days and blame the model providers instead of stepping back to fix their workflow.
raincole 2 hours ago [-]
When it comes to something with random results (unfortunately that's what LLMs are), people will think the odds are rigged against them.
It's a good thing that hype-chasers are cancelling though. So we can use the services with a reasonable latency.
kridsdale1 12 hours ago [-]
Considered Gemini?
operatingthetan 12 hours ago [-]
Gemini got a big reduction in usage limits this week. There was backlash and they added 3x usage for Antigravity a day later but I haven't really tried it out to get a feel for it yet.
saulpw 12 hours ago [-]
Google has burnt all of its goodwill in dev communities so no, I don't think Gemini is worth consideration.
rnxrx 14 hours ago [-]
This does kind of beg the question: If developers are being laid off because AI is better/faster/cheaper or makes all their people 10x or whatever fig leaf, what happens if the required tooling ends up being more expensive? From the investor’s point of view is the drag of employee costs better or worse than a ballooning expense item?
andrewl-hn 12 hours ago [-]
They lay people off and look good in front of investors. Then they hire people, talk about "growth", and once again look good in front of investors.
This would never fly if stock market was rational. But it never is.
dawnerd 1 hours ago [-]
And if/when companies need to scale back their ai investments they can spin it too and the stock market will eat it up.
JumpCrisscross 1 hours ago [-]
> If developers are being laid off because AI is better/faster/cheaper
This is, in my opinion, tripe. SWEs are being laid off because of post-Covid over-hiring. The only evidence for labour destruction is in junior hires. But not because anyone is being fired, but because entry-level jobs are being cannibalised.
dividedbyzero 12 hours ago [-]
I suppose if it all works out it'll end up way more expensive than the employees the models displaced ever were. These kinds of technologies usually end up as an oligopoly at best, and those players will have a wide moat by then, and the things these models build will be tweaked such that no other model or human being can realistically work on them anymore, and then they can price gouge everyone to the brink of unprofitability.
kridsdale1 12 hours ago [-]
At least the models don’t need health insurance, office space, a cafeteria, or have a threat of unionizing.
dividedbyzero 11 hours ago [-]
The model provider would be like a union, at least if unions had absolute control over their members, could take them all away at any time forever with no substantial negative consequences to itself, and spend billions on employer lock-in so switching to the competition is worse than paying the 12% model salary raise.
thewebguyd 12 hours ago [-]
Shh, that's the quiet part the investors don't want to say out loud.
user34283 53 minutes ago [-]
There's 10-15 labs near the frontier, and like 30 serious inference providers, over 70 total on OpenRouter.
With research and hardware near guaranteed to bring the efficiency way up, I'm not scared here of massive price hikes.
There is no moat.
thewebguyd 13 hours ago [-]
I suspect AI would have to get drastically more expensive before it starts looking worse than payroll. If one developer using Claude Code can effectively substitute for 2 developers, you are already coming out ahead at current API pricing assuming very heavy usage, your cost is going to be ~1.5x developer (factoring in beyond salary - benefits, PTO, the other overhead that comes with having employees).
So you're getting 2 for the price of 1.5. Scale that up to 500 devs at a big company and it's a big chunk of change saved on payroll.
Keeping your headcount or hiring humans instead, AI would have to start to cost upwards of $15k/month/developer or more before it costs more than hiring. You're looking at about 4 billion tokens per month before humans start to break even or are cheaper.
jayd16 13 hours ago [-]
You're starting from the assumption that its a 2x benefit. That's a massive leap.
thewebguyd 12 hours ago [-]
True, that was more hypothetical if it got good enough to 2x.
But even taking a more realistic 1.25x (20% time savings) gain, lets say you drop from 500 to 400 devs, you'd have to hit around $4,000/dev/month in token spend before hiring humans again would break even.
Payroll is just expensive, in most companies it's by far the biggest expense. AI still has to cost drastically more before investors would call it out as being worse than increasing headcount, from a pure dollars perspective.
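The break-even figures in this subthread can be reproduced in a few lines. A rough sketch, assuming a fully loaded cost of $16,000 per developer per month (a hypothetical number, picked because it reproduces the ~$4,000 figure; the thread does not state it) and assuming the payroll saved is spread evenly over the developers who remain. The function name is illustrative.

```python
def break_even_token_spend(
    devs_before: int,
    devs_after: int,
    loaded_cost_per_dev_month: float,
) -> float:
    """Monthly token spend per remaining developer that cancels the savings."""
    if not 0 < devs_after <= devs_before:
        raise ValueError("devs_after must be between 1 and devs_before")
    payroll_saved = (devs_before - devs_after) * loaded_cost_per_dev_month
    return payroll_saved / devs_after


# 500 -> 400 developers at an assumed $16,000/month loaded cost each.
print(break_even_token_spend(500, 400, 16_000))

# The earlier "one developer substitutes for two" case at $15,000/month.
print(break_even_token_spend(2, 1, 15_000))
```

The result scales linearly with the assumed loaded cost, so the conclusion depends heavily on local wages and on whether the productivity gain actually materializes, which is what the replies dispute.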
mrgoldenbrown 7 hours ago [-]
Also assuming that current API pricing is sustainable and not subsidized.
ilovecake1984 12 hours ago [-]
This is economy dependent. It’s really Indians who will take the brunt of AI job losses.
ngc248 2 hours ago [-]
"AI" is just a cover for laying ppl off and saving cost. But the pendulum will swing the the other way and the companies will realise that knowledgeable ppl are still required to generate and utilize the generated code. No serious company can run with vibe-coded apps generated by laymen.
ares623 13 hours ago [-]
There is no profit, expense, revenue. Those don't matter. Only thing that matters is stock price goes up, and laying off makes stock price go up. When laying off make stock price go down, then laying off stop.
stock_toaster 12 hours ago [-]
I imagine layoffs are also very much "this quarter and next quarter" with regards to investor visibility.
While LLM Opex is "some future quarter" and very easy to co-mingle with other expenses.
sreekanth850 49 minutes ago [-]
If you properly keep documents, architecture, and decision records, token consumption can be much lower. I am managing everything with two Codex Plus subs. Repo size is 300k LOC (backend).
zkmon 15 hours ago [-]
My experience is, Claude Code burns way more tokens compared to other agents, probably to ensure high levels of perceived quality, which is, most of the times not worth the bloat for the user. The bloat works for Anthropic as an advertisement at the cost of your tokens.
andrekandre 8 hours ago [-]
its kind of weird tho, jensen also said we should be burning tons of tokens as well... 'perceived quality' cant be the only reason these ceos pushing token usage so hard can it?
robertkarl 15 hours ago [-]
Cancellation effective June 30. This was a _pilot_ launched in December that accidentally consumed their 2026 yearly target spend on AI!
I expect the r/LocalLLaMA guys to be going nuts about this news.
thewebguyd 13 hours ago [-]
From the article
> It was part of an effort to get project managers, designers, and other employees to experiment with coding for the first time.
I suspect they weren't as efficient as they could be with token use either. Sounds like they were trying to encourage non-developers to vibe code stuff
xienze 12 hours ago [-]
I'd argue you have a lot more to worry about with developers as far as token usage goes because they're the ones who know how to rig up these wild workflows where tens of agents simulate an entire software development team. The non-developers are probably going to be sticking more in the realm of iterating via chat.
andrewl-hn 12 hours ago [-]
I'm surprised they even had them in the first place. Doesn't Microsoft have a deep partnership with OpenAI? Aren't all Copilot things powered by various GPT models? I would assume the two companies have barter agreements of sorts.
RevEng 11 hours ago [-]
They do have agreements, but they aren't exclusive, and Microsoft and OpenAI have had a rather public falling out over the last year.
skeledrew 13 hours ago [-]
Well, that's the inevitable outcome of token-maxxing :shrugs:
matt3210 1 hours ago [-]
Tokens aren’t that much of an issue when you’re not evaluated on the usage
tyleo 15 hours ago [-]
Lots of these places measure employee token use with managers having dashboards. It seems like performative code production rather than making anything useful.
They got DeepSeek on Azure, would cut costs by 10x … if they ran it on Huawei
dsagent 14 hours ago [-]
I think what's funny is that employees were most likely already covering the cost for these tools because they are useful. Companies didn't believe employees were using these tools and now have forced their usage and no longer have the costs subsidized.
Similarly companies seem to reward high token usage as a sign of someone willing to play ball with AI and again have forced higher costs on themselves for people reward hacking or using tokens out of spite.
QuiEgo 13 hours ago [-]
There is no world where I can put my company’s data through an external site without their express consent and security sign off. I suspect at most companies there’s zero path for people to have been paying for it themselves.
kridsdale1 12 hours ago [-]
An enormous percentage of America’s white collar work force has been doing this since 2023.
Fun fact, up until you face a consequence for crime, all crime is free! Have fun and go win the competition game against your co-workers.
cityofdelusion 12 hours ago [-]
None of the 5 places I have worked is this possible, but they are also all highly regulated industries. Firewalls block virtually everything by default.
QuiEgo 11 hours ago [-]
Fair, but I assume everything on my work laptop is key logged. Surely they would notice Claude phoning home from my company laptop? I suspect a network rule to look for that traffic is trivial?
RevEng 9 hours ago [-]
My employer doesn't specifically block this stuff, but does put up a warning when you visit it to review our AI usage policy. There isn't detection for using things in ways we shouldn't, but they have an audit trail and can review it if there is suspicion.
InsideOutSanta 12 hours ago [-]
My guess is that at most companies, employees are prohibited from doing this, but not prevented.
uniclaude 15 hours ago [-]
That's very interesting to reconcile with the fact that not too far, Amazon employees feel incentivized to use as many tokens as possible.
HDThoreaun 12 hours ago [-]
"incentivized to use as many tokens as possible" = "Upper management knows people don't like change so we are forcing them to come up with ways to use this thing". It does not mean that management will encourage wastefulness in the future, and it also doesn't mean that token usage from now won't be reviewed in the future. What's to stop them from dinging your performance in November because you wasted a hundred thousand on tokens with nothing to show for it?
boelboel 12 hours ago [-]
Makes sense why Anthropic wants to IPO as soon as possible as the growth right now comes from temporary wastefulness. Makes all the investments more risky.
andyfilms1 15 hours ago [-]
Surely a company as large as Microsoft is actively attempting to build their own models. They couldn't possibly have expected to stake the future of their software development on the conditions of a third party company?
mrweasel 14 hours ago [-]
Okay, but what if you're not Microsoft's size and don't have an R&D budget large enough to fund development of your own models and tools?
This is a warning to any company not building their own AI that AI-assisted development could become really expensive really fast and most likely won't pay off. What Microsoft is suggesting is that the current price is too high, but it's still not high enough for e.g. Anthropic to be profitable, or AI coding tools are only as good as the developers using them. So you can't meaningfully do layoffs by replacing the developers with AIs, because the cost is too high.
How does Microsoft plan to fix Copilot, so that the cost will be so much lower than Claude that budget overruns won't be a problem for their own customers?
andyfilms1 14 hours ago [-]
I expect in the next year or so, we'll stop seeing headlines like "Anthropic buys $15b of compute from SpaceX" and we'll start seeing headlines like "Uber's AI department licenses GPT 6.2 as the foundation for their internal model," or something like that.
Smaller companies will have departments that distill larger models into something more specifically manageable and useful for them. At least, that's my personal prediction :)
mrweasel 12 hours ago [-]
How would that help with pricing? The cost of hardware is already subsidized to hell and back by investors and that's not dropping costs enough. I'm not concerned about Uber, they are way too big. I'm thinking sub 1000 employees in total and maybe 50 - 100 people in the IT department. Are they just going to be cut off from AI tools, because the cost of running them would ruin the company?
I do think your prediction makes sense, because the AI really isn't the product, it needs to be baked into something and licensing the models saves you the R&D and cost of implementing your own.
kridsdale1 12 hours ago [-]
Giving your workforce Claude is like giving everyone in the USPS a Ferrari.
There may be a spot of “good enough to pay for and make a profit” that exists.
NitpickLawyer 15 hours ago [-]
> attempting to build their own models.
At one point there were rumours that they'd do that. They also have the rights to OpenAI models for a few more years still, so they could always use that, but apparently they're also compute starved (like everyone else).
kridsdale1 12 hours ago [-]
MSFT does have a frontier AI Lab. My friend works there. I don’t know what they’re doing. But MSFT is one of like 5 entities that actually have the talent and physical infrastructure to compete in model-building.
onlyrealcuzzo 12 hours ago [-]
MSFT and Apple are taking the same approach.
The frontier model space costs 1000x as much to develop as the small language models, and is only 1.5 years ahead.
Factually, the frontier models have not paid for themselves. So, if you're MSFT and Apple, you don't need to run in a race where even the winner loses massively.
You can try to train models 1.5 years behind that are highly likely to be profitable, given your market position.
The average person is lagging behind what AI is capable of by 3+ years anyway...
So you can save 1000x on training and 10x on inference and just use SOTA small models.
Why spend $5B training a model that's for sure not going to make $5B (after inference costs) when you can spend $5M building one that WILL make far more than that after inference costs?
rglover 15 hours ago [-]
Curb Your Enthusiasm theme starts playing.
andrekandre 7 hours ago [-]
i was thinking more arrested development but that works as well
killerstorm 15 hours ago [-]
The way coding agents work is fantastically wasteful. All the megabytes of code are processed over and over and over, sometimes within just one session.
There are papers describing KV cache precomputation for commonly used documents (e.g. KVLink), but, of course, it's not a priority for model providers: they'd rather sell you more tokens, also they would rather get to AGI/ASI first than optimize usage of existing models...
brookst 13 hours ago [-]
Claude code gets >98% KV cache hits. It’s not reprocessing unless you let the cache go cold (5 minutes, which is annoyingly short).
killerstorm 12 hours ago [-]
I meant caching on a bigger level. If you're an organization with 100 developers each doing 10 sessions a day, you're paying for 1000x tokens for frequently used documents even if you had 100% KV cache hits within one session. Apparently that's too costly even for companies with trillion dollar market cap...
Normally KV cache works only if your context prefix is identical, but there are papers which demonstrate documents can be cached between different contexts.
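A toy sketch of the prefix point above, not how any provider actually implements caching. `PrefixCache` and the token strings are hypothetical; the only thing it models is that a cache keyed by exact prefix gives no credit for a shared document once anything in front of it differs.

```python
from typing import List, Set, Tuple


class PrefixCache:
    """Toy stand-in for a KV cache keyed by exact token prefix (hypothetical)."""

    def __init__(self) -> None:
        self._seen: Set[Tuple[str, ...]] = set()

    def process(self, tokens: List[str]) -> int:
        """Return how many tokens had to be computed, i.e. were cache misses."""
        # Find the longest prefix of this context that was processed before.
        hit = 0
        for i in range(len(tokens), 0, -1):
            if tuple(tokens[:i]) in self._seen:
                hit = i
                break
        # Everything after the cached prefix is computed and then cached.
        for i in range(hit + 1, len(tokens) + 1):
            self._seen.add(tuple(tokens[:i]))
        return len(tokens) - hit


shared_doc = ["doc1", "doc2", "doc3"]  # the same document in every session
cache = PrefixCache()

first = cache.process(["sysA"] + shared_doc)             # cold: all 4 computed
follow_up = cache.process(["sysA"] + shared_doc + ["q"])  # same prefix: 1 computed
other = cache.process(["sysB"] + shared_doc)             # different prefix: all 4 again

print(first, follow_up, other)
```

The third call is the organization-wide case: the document is identical, but a different first token means nothing is reused. The papers mentioned above are about lifting that restriction.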
brookst 9 hours ago [-]
Ah, understood, and thanks for the clarification!
beoberha 12 hours ago [-]
I believe OP is talking about new sessions or after compaction. He’s getting at the fact that LLMs are stateless and have to rediscover your codebase on every new session.
iainmerrick 39 minutes ago [-]
To be fair, on the Monday morning after a holiday, that’s exactly what I’m like too.
dgellow 1 hours ago [-]
Are you sure that hitting the cache means you’re not paying for those tokens?
wolvoleo 10 hours ago [-]
What's the point of eating your own dog food when the only thing you are doing is reselling other people's dog food? Microsoft don't have any competing LLM.
nobodywillobsrv 2 hours ago [-]
This feels like the kind of bad incentive problem we always hear about on here ... Like bugs and vipers.
guluarte 15 hours ago [-]
I think tech companies are doing layoffs partly because they need to cover AI operating expenses.
stock_toaster 11 hours ago [-]
I think so too, otherwise why wouldn't you put that (purported) increased capacity/output into improving your existing products or creating new ones, with the headcount that you already have?
DeathArrow 2 hours ago [-]
Doesn't MS have the compute to run GPT 5.5 for all its employees?
wg0 13 hours ago [-]
Microsoft should host DeepseekV4 internally for its developers. And you're welcome.
rvz 12 hours ago [-]
This is the smartest solution: self-host the model on premises.
kridsdale1 12 hours ago [-]
And by that, you mean, in Azure, surely.
sergiomattei 12 hours ago [-]
My impression is they're being cancelled in favor of full internal adoption of Copilot CLI, which has got much better over the past few months.
Shalomboy 12 hours ago [-]
I'm also a big fan of Copilot CLI, especially after demoing it to a coworker who liked Claude Code.
o10449366 14 hours ago [-]
I switched from Anthropic to OpenAI after spending ~$40K in equivalent token costs using Claude over 3 months.
I found Opus 4.7 to be slow and wasteful with token usage. It's shocking how inefficient it is with tasks like bash tool usage and web searching, delegating them to a dozen subagents only to get stuck and never return until you esc and intervene. That, in addition to all of the broken tooling Anthropic built in to limit token usage like the broken monitoring tool made managing Claude a chore. I was happy to pay $200/month for Opus 4.5 when they had more capacity, but 4.7 felt like a huge step back and no longer worth the price and inconvenience.
I remember an OpenAI employee comment on the GPT5.5 release post about how they specifically geared it towards long-horizon tasks and it's been a breath of fresh air in that regard. I have five two-week long sessions going right now and there's been no degradation in performance or efficiency. It's much better at carrying rules/learnings forward even in long-running sessions and grounding/refreshing itself in verified facts when it loses context.
It's funny because in two weeks I've gotten way more done with GPT5.5 with way fewer tokens and way less handholding. I think this goes to show how important the tooling and the harness are and how a capable model like Opus 4.7 can be severely handicapped by bad product decisions.
gnat 13 hours ago [-]
Being able to manage context over long-running sessions is a function of the harness, not the model. Are you using Claude Code with GPT5.5? Codex? piclaw? They’ll all have different context management strategies to let you keep going when you would otherwise have filled up context and be forced to stop.
The absolute state of the Hacker News main page in 2026. Thank you for taking your time to put it all together.
ajd555 15 hours ago [-]
2nd link doesn't work.
That would be a neat tool, to find the original article and see how many levels of AI summary it has gone through, a game of AI telephone!
OnionBlender 14 hours ago [-]
I had thought about creating something like that for finding comments for articles. For a given article, display links to comments for HN, lobsters, reddit, etc. However, I feel I already waste too much time reading comments. I shouldn't make it easier and more tempting.
robertkarl 15 hours ago [-]
My bad. I had trouble finding the original source when I googled for it and grabbed a link. I was originally shown a screenshot of an x.com post.
robertkarl 15 hours ago [-]
I emailed dang to politely ask to make the link point to the Verge article since I can't update it.
Man, maybe it's time for me to give The Verge a subscription. They're the only ones actually doing any journalism here, with a bunch of AI blogs skimming off the top.
siva7 15 hours ago [-]
boy i'm leaving the internet. sun is shining. was a good time here while it lasted.
scarmig 15 hours ago [-]
The artificial centipede.
q3k 14 hours ago [-]
i swear i'm going to start an amish community and internet where we forbid any technological development past 2019
call me a luddite, i'll be wearing it as a badge of honor
sashank_1509 15 hours ago [-]
Welp, this is the future we live in now
arowthway 14 hours ago [-]
[dead]
jasondillingham 6 hours ago [-]
[flagged]
othmarodev 12 hours ago [-]
[flagged]
josefritzishere 14 hours ago [-]
AI slop ruined a story about AI? This thread is a story about itself.
thadk 15 hours ago [-]
Microsoft poorly manages token use of the most expensive models in a pilot. Then they use that failure to advertise/position their own GitHub Copilot agents to procurement teams, over the now widely validated Claude Code-based agents.
At least Codex is trying to win validation on merit.
These days I just use Claude Code Desktop or Claude Code in PowerShell. Standalone, not inside an IDE. Honestly, I'm using Desktop more and more as it gets more features.
The IDE is for me. No AI in it at all. If I want to get Claude to do something specific to a file I just @ the file.
Also it became very hard to convince management to keep both Claude code and GitHub Copilot enterprise licenses.
I've tried throwing unsupervised agentic software factory workflows against the wall, and they burned through my tokens like nobody's business but didn't produce much.
Supervised, human-in-the-loop process on the other hand is much more productive but doesn't consume nearly as much. Maybe that's why everyone's pushing agentic approaches so much.
https://www.amazon.com/Passion-Lubes-Natural-Water-Based-Lub...
> This product is out of stock
Ah, shoot, there go my weekend plans. Bummer.
Like the other commenter said: cloud spend can also spin out of control if you don't pay attention, yet we've found ways to keep it under control (training, guardrails, limits, transparency).
Isn't this a (mildly exaggerated) description of AWS, which is a very successful service?
So your costs scale with the number of users you have.
That's an opex that you can explain.
For tokens for developers it's maybe closer, cost/outcome wise, to hiring an external consulting company to write your code; money paid scales with work done, no promise of delivery, arbitrary unpredictable external price changes.
It's not quite the same, though; similarly lucrative for consultants.
Yes, but in a "oops this is gonna take another two months to finish" kind of way, not the "oops this is the 12th time this month 8 developers have burned $2K in tokens in a single day and no one really knows how it happened" kind of way.
Colleague used Sonnet 4.6 on some pretty normal agentic coding tasks through AWS Bedrock to keep the data in the EU, 100 EUR usage in a single day. In comparison, the Mistral subscription costs about 20 EUR per month and we tested that for similar tasks it was okay, the usage got to around 10% of that monthly limit in a single day. Or Anthropic's own Max (5x) plan where you get way, way more tokens to do with as you please.
I feel like the sweet spot is having a monthly subscription with any of the providers (you're subsidized a bunch), but if you have to pay per tokens, now I'd just look in the direction of what tasks DeepSeek would be okay for, sadly probably not in the situation above. For a startup, though...
On the other hand, this feels a bit hypocritical:
> It was part of an effort to get project managers, designers, and other employees to experiment with coding for the first time, and sources tell me that Claude Code has proved very popular inside Microsoft over the past six months.
They're gonna say that the future is all AI... until they get the bill.
The results for a function implementation and test of levenshtein distance in js are pretty similar but Mistral is 30x cheaper than Opus 4.7 and 4x faster than Sonnet 4.6.
https://5m6qnuhyde.evvl.io/
Levenshtein distance is not only a well-understood problem, it's small, self-contained, and extremely well-represented in the training data. The kind of problem where even small/bad models can excel. The gold standard for those tasks is just "use a library", so no wonder the beefy models are expensive: you're chartering a commercial airplane to go grocery shopping.
My personal benchmarks are software engineering tasks (ideally spanning multiple packages in a monorepo) composed of many small decisions that, compounded, make or break the implementation and long-term maintainability.
There's where even frontier models struggle, which makes comparisons meaningful.
It’s making guesses not decisions, framing as decisions will lead you astray to wasted time and tokens.
It’s vaguely productive to tell them a ton of relevant info upfront attempting to minimise their need for load bearing guesses. I say vaguely because obedience is generally only around the level where it's good enough to lull you into a false sense of security, not to actually be obedient.
It’s a bit more productive to use the various loop mechanisms (hooks, /goal etc) to evaluate each end of turn against guard rails and reject with clear instruction on what's unacceptable. Obviously if you only do this without the front load of info then you’re likely to spend more tokens to reach a satisfactory end of iteration.
I mean, they will continue to say so, they just want to be the ones being paid for the service, not Anthropic :)
I tend to work with the agent, and observe what's going on as well as review/test and work through results/changes. I spend a lot more time planning tasks/features than the execution, even using the agent as part of planning and pre-documentation. It works really well. I don't think people burning through the 5hr allotment in under an hour are actually reviewing/QC/QA the results of what they're doing in any meaningful way, and likely producing as much garbage as good (slop).
I'm really curious as to HOW the MS employees were using the agents as much as what they were doing.
Me: We need to do this this that.
Claude: <random stuff that approximates human output>
Me: Are you sure?
Claude: Well actually there is a bug <more random stuff that looks right this time>
----- Now it is:
Me: We need to do this this that.
Claude: <random stuff that approximates human output>
Claude: Let me consult the advisor on that.
Claude: advisor came up with some advice, adjusting according to that. <more random stuff that looks right this time>
> I understand that Microsoft is planning to remove most of its Claude Code licenses and push many of its developers to use Copilot CLI instead. While Claude Code has been a popular addition, it has also undermined Microsoft’s new GitHub Copilot CLI coding tool — a command line version of GitHub Copilot that runs outside of development apps like Visual Studio Code.
And people here are interpreting this as related mainly to Claude burning too many tokens too quickly and suggesting Microsoft should rather use SomeOtherLLM©?
Is this Hacker News or rather Marketing Wars?
That message from Carlos's son
Arguably, Copilot is GPT 5? Not sure what the CLI offers behind the covers.
The CLI can swap to whatever model (/models) based on your subscriptions.
The copilots on desktop or Office Apps are likely just GPT5 nano or other tiny models with cheap inference
It. is. so. bad.
It feels like it's at least 1-2 years behind the current top models.
I've launched an internal demo of Claude Code and Deepseek on the same day and we burned through our monthly allowance for Claude in just over a week, with more than a half of that budget being spent in one day. With DS people are unable to go through that same amount of money in a month, not even close.
With that Claude feels like an expensive toy, while DS is a shovel, purely because developers do not feel like they are eating into a precious resource while using it. Also it does not feel like there is much of a difference in capability between Claude and DS-pro. DS-pro and flash do feel like sonnet/opus and haiku, but flash is still very-very capable.
After 2 weeks of Claude getting progressively worse and worse, today was the final straw.
I don't care if they have a phone app. The model is COMPLETE garbage after you subscribe long enough and they think they've "got you".
I can't code on my phone if the model literally moves in the wrong direction and does the opposite of what I tell it to. If I wanted to make my code worse, I'd just randomly commit garbage. I don't need a mobile app for that.
People heard "Claude is nerfed" and now they see it everywhere, they notice failures a lot more than they would have otherwise.
Doesn't matter that Claude is not, in fact, nerfed. Perception is powerful and most humans are not rational.
Opus has been dumb this week.
Claude was having a lot of capacity problems and downtime and then this week that has been much less obvious... and the model is dumber.
It could also just be luck and my impressions are false... who knows.
It's a good thing that hype-chasers are cancelling though. So we can use the services with a reasonable latency.
This would never fly if stock market was rational. But it never is.
This is, in my opinion, tripe. SWEs are being laid off because of post-Covid over-hiring. The only evidence for labour destruction is in junior hires. But not because anyone is being fired, but because entry-level jobs are being cannibalised.
With research and hardware near guaranteed to bring the efficiency way up, I'm not scared here of massive price hikes.
There is no moat.
So you're getting 2 for the price of 1.5. Scale that up to 500 devs at a big company and it's a big chunk of change saved on payroll.
Keeping your headcount or hiring humans instead, AI would have to start to cost upwards of $15k/month/developer or more before it costs more than hiring. You're looking at about 4 billion tokens per month before humans start to break even or are cheaper.
But even taking a more realistic 1.25x (20% time savings) gain, let's say you drop from 500 to 400 devs, you'd have to hit around $4,000/dev/month in token spend before hiring humans again would break even.
Payroll is just expensive, in most companies it's by far the biggest expense. AI still has to cost drastically more before investors would call it out as being worse than increasing headcount, from a pure dollars perspective.
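A rough sketch of the break-even arithmetic above, under stated assumptions. The function name is made up, and the $16,000/month fully loaded cost per developer is not stated anywhere in the comment: it is back-solved from the 500 to 400 devs and $4,000/dev/month figures.

```python
def break_even_spend_per_dev(headcount: float, speedup: float,
                             monthly_cost_per_dev: float) -> float:
    """AI spend per remaining developer per month at which the payroll savings are used up."""
    # With a productivity multiplier of `speedup`, the same output needs fewer people.
    remaining = headcount / speedup
    # Payroll no longer paid for the developers who were cut.
    payroll_saved = (headcount - remaining) * monthly_cost_per_dev
    # Spread that saving across the developers who stay and use the tools.
    return payroll_saved / remaining


ASSUMED_COST = 16_000.0  # assumption: fully loaded $/dev/month, back-solved from the comment

modest = break_even_spend_per_dev(500, 1.25, ASSUMED_COST)     # the 500 -> 400 devs case
aggressive = break_even_spend_per_dev(500, 2.0, ASSUMED_COST)  # the 2x case

print(f"1.25x: ${modest:,.0f}/dev/month, 2x: ${aggressive:,.0f}/dev/month")
```

The expression simplifies to `(speedup - 1) * monthly_cost_per_dev`, so headcount cancels out. Under this assumed cost the 2x case lands near the $15k/month figure quoted above.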
While LLM Opex is "some future quarter" and very easy to co-mingle with other expenses.
I expect the r/LocalLLaMA guys to be going nuts about this news.
> It was part of an effort to get project managers, designers, and other employees to experiment with coding for the first time.
I suspect they weren't as efficient as they could be with token use either. Sounds like they were trying to encourage non-developers to vibe code stuff
Speed without judgement always compounds badly.
https://www.folklore.org/Negative_2000_Lines_Of_Code.html