dhorthy's comments

I don’t think anyone serious would recommend it for production systems. I respect the Ralph technique as a fascinating learning exercise in understanding LLM context windows and how to squeeze more performance (read: quality) out of today’s models

Even if the ceiling remains low in absolute terms, it’s interesting how much good context engineering raises it


How is it a “fascinating learning exercise” when the intention is to run the model in a closed loop with zero transparency? Running a black box inside a black box, to learn? What signals are you even listening to in order to determine whether your context engineering is good or whether the quality has improved, aside from a brief glimpse at the final product? So essentially every time I want to test a prompt I waste $100 on Claude and have it build an entire project for me?

I’m all for AI, and it’s evident that the future of AI is more transparency (MLOps, tracing, mech interp, AI safety), not less.


Current transparency is rubbish but people will continue to put up with it if they're getting decent output quality

there is the theoretical "how the world should be" and there is the practical "what's working today" - decry the latter and wait around for the former at your peril

there are hundreds of useful resources, including many linked in the article itself

the note about the crypto token was intended to signal “okay, this is now hype slop and it’s time to move on”

I read it. I agree this is out of touch. Not because the things it's saying are wrong, but because the things it's saying have been true for almost a year now. They are not "getting worse"; they "have been bad". I am staggered to find this article qualifies as "news".

If you're going to write about something that's been true and discussed widely online for a year+, at least have the awareness/integrity to not brand it as "this new thing is happening".


Perhaps the advertising money from the big AI money sinks is running out and we are finally seeing more AI scepticism articles.


> They are not "getting worse" they "have been bad".

The agents available in January 2025 were much, much worse than the agents available in November 2025.


Yes, and for some cases no.

The models have gotten very good, but I'd rather have an obviously broken pile of crap that I can spot immediately than something that has been deep-fried with RL to always look like a success but has subtle problems that someone will lgtm :( I guess it's not much different with human-written code, but the models seem to have weirdly inhuman failures - like, you would just skim some code, because you just can't believe that anyone could do it wrong, and it turns out they did.


That's what test cases are for, which is good for both humans and nonhumans.


Test cases are great, but not a total solution. Can you write a test case for the add_numbers(a, b) function?


Well, for some reason it doesn't let me respond to the child comments :(

The problem (which should be obvious) is that with a and b real you can't construct an exhaustive input/output set. A test case can only prove the presence of a bug, not its absence.

Another category of problems you can't just test for, and instead have to prove correct, is concurrency problems.

And so forth and so on.
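
To make that concrete, a minimal Python sketch (the buggy add_numbers is invented for illustration): every test below passes, yet the suite says nothing about the inputs nobody thought to write down.

    def add_numbers(a, b):
        # subtle bug: silently truncates fractional parts
        return int(a) + int(b)

    # these example-based tests all pass...
    assert add_numbers(1, 2) == 3
    assert add_numbers(-5, 5) == 0
    assert add_numbers(1000, 2345) == 3345

    # ...while an input nobody enumerated is quietly wrong:
    # add_numbers(0.5, 0.5) returns 0, not 1.0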


Of course you can. You can write test cases for anything.

Even an add_numbers function can have bugs, e.g. you have to ensure the inputs are numbers. Most coding agents would catch this in loosely-typed languages.
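
As a sketch of the kind of test meant here (Python for illustration; Python itself already raises on "1" + 2, but a loosely typed language like JavaScript would silently concatenate, which is why the explicit guard and test earn their keep):

    def add_numbers(a, b):
        # validate inputs up front rather than relying on runtime behavior
        if not isinstance(a, (int, float)) or not isinstance(b, (int, float)):
            raise TypeError("add_numbers expects numeric inputs")
        return a + b

    # test the input-validation path
    try:
        add_numbers("1", 2)
        raise AssertionError("expected TypeError for string input")
    except TypeError:
        pass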


I mean "have been bad" doesn't exclude "getting worse", right? :)


engineers always want to rewrite from scratch and it never works.

a tale as old as time - at my second job out of college back in like 2016, I landed at the tail end of a 3-month feature-freeze refactor project. it was pitched to the CEO as 1 month, sprawled out to 3, and still wasn't finished. Non-technical teams were pissed, technical teams were exhausted, all hope was lost. Ended up cutting a bunch of scope and slopping out a bunch of bugs anyway.


i had the privilege of working w/ some incredible eng leaders at my previous gig - they were very good at working both upwards and downwards to execute against the "50/50" rule - half of any given sprint's work is focused on new features, and half on bug fixes, chores, and things that improve team velocity.


the people yearn for refactoring


I think the key here is “if X then Y” syntax - it seems to be quite effective at piercing through the “probably ignore this” default that models apply to system messages, by highlighting WHEN a given instruction is “highly relevant”
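
For instance, such an instruction might look like this (an invented example - the file paths are hypothetical):

    IF you are editing any file under src/payments/
    THEN read docs/payments-invariants.md before making changes
    AND run the payments test suite before declaring the task done

The explicit trigger tells the model when the rule applies, instead of leaving it to guess which background instructions matter right now.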


What?


It helps when questions intended to resolve ambiguity are not themselves hopelessly ambiguous.

See also: "Help me help you" - https://en.wikipedia.org/wiki/Jerry_Maguire


agree - i've had claude one-shot this for me at least 10 times at this point cause i'm too lazy to lug whatever code around. literally made a new one this morning


For the record I do think the AI community tries to unnecessarily reinvent the wheel on crap all the time.

sure, readme.md is a great place to put content. But there are things I'd put in a readme that I'd never put in a claude.md if we want to squeeze the most out of these models.

Further, claude/agents.md have special quality-of-life mechanics in the coding agent harnesses, e.g. `injecting this file into the context window whenever an agent touches this directory, no matter whether the model wants to read it or not`
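
Concretely, that mechanic looks roughly like this (an illustrative layout - exact injection behavior varies by harness and version):

    repo/
      CLAUDE.md            # loaded at session start
      services/billing/
        CLAUDE.md          # injected when the agent touches files in here
        src/...

So the billing-specific instructions only cost context-window tokens when they're actually relevant.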

> What people often forget about LLMs is that they are largely trained on public information which means that nothing new needs to be invented.

I don't think this is relevant at all - when you're working with coding agents, the more you can finesse and manage every token that goes into your model and how it's presented, the better results you can get. And the public data that goes into the models is near useless if you're working in a complex codebase, compared to the results you can get if you invest time in how context is collected and presented to your agent.


> For the record I do think the AI community tries to unnecessarily reinvent the wheel on crap all the time.

On Reddit's LLM subreddits, people are rediscovering the very basics of software project management as massive insights daily, or at the very least weekly.

Who would've guessed that proper planning, accessible and up-to-date documentation, and splitting tasks into manageable, testable chunks produces good code? Amazing!

Then they write a massive blog post or even some MCP monstrosity for it and post it everywhere as a new discovery =)


I can totally understand where you are coming from with this comment. It does feel a bit frustrating that people are rediscovering things that were written in books 30/40/50 years ago.

However, I think this is awesome for the industry. People are rediscovering basic things, but if they didn't know about the existing literature this is a perfect opportunity to refer them to it. And if they were aware, but maybe not practicing it, this is a great time for the ideas to be reinforced.

A lot of people, myself included, never really understood which practices were important until we were forced to work on a system that was most definitely not written with any good practices in mind.

My current view of agentic coding is that it's forcing an entire generation of devs to either learn software project management or drown under the mountain of debt an LLM can produce. Previously it took much longer to feel the weight of bad decisions in a project, but an LLM lets you speed-run this process in a few weeks or months.

