AI solves every river crossing puzzle, we can go home now [content warning: botshit]

diz@awful.systems · 1 day ago

NigelFrobisher@aussie.zone · 5 hours ago

They scripted the river crossing puzzle into LLMs months ago. It’s a demo set-piece to convince users that the bot can solve any class of problem - the only issue is that it’ll often turn them into more river-crossing problems.

BlueMonday1984@awful.systems · 17 hours ago

Although that does suggest a new dunk on computer touchers, of the AI enthusiast kind, you can point at that and say that coding clearly does not require any logical reasoning.

Considering how many actual programmers fell for botshit and autoplag, it probably never did. That’s probably being a bit harsh, but I feel this bubble’s shown that your ability to puke out computational word salad does not translate into having any sort of useful skill.

HedyL@awful.systems · 17 hours ago

At the very least, many of them were probably unable to differentiate between “coding problems that have been solved a million times and are therefore in the training data” and “coding problems that are specific to a particular situation”. I’m not a software developer myself, but that’s my best guess.

HedyL@awful.systems · 20 hours ago

It is funny how, when generating the code, it suddenly appears to have “understood” what the instruction “The dog can not be left unattended” means, while that was clearly not the case for the natural language output.

YourNetworkIsHaunted@awful.systems · 11 hours ago

That’s what I was going to say. The natural language version actually claims that it leaves the dog behind unattended in every step, even though the following step continues as though it still has the dog and not whichever vegetable it brought back in the previous step.

Either it’s not actually good at natural language processing or some element of the solution isn’t surviving the shift from the river_cross() tool to natural language output. Whatever actual state it’s tracking internally doesn’t track to the output past the headline.

Kg. Madee Ⅱ.@mathstodon.xyz · 16 hours ago

@HedyL @diz I kinda wonder if this would work better if it just was worded the other way round: “must be supervised always”
If I understand correctly, LLMs have difficulties encoding negative correlations (not, un-, …)

diz@awful.systems · 12 hours ago

That is not equivalent, though; other solutions to “can not be left unattended” exist; just ask Kristi Noem.

Kg. Madee Ⅱ.@mathstodon.xyz · 11 hours ago

@diz OK, that would have prevented any escape 🙃

diz@awful.systems · edit-2 16 hours ago

Other funny thing: it only became a fully automatic plagiarism machine when it claimed that it wrote the code (referring to itself by name which is a dead giveaway that the system prompt makes it do that).

I wonder if code is where they will ultimately get nailed to the wall for willful copyright infringement. Code is too brittle for their standard approach, “we sort of blurred a lot of works together so it’s ours now, transformative use, fuck you, prove that you don’t just blur other people’s work together, huh?”.

But also for a piece of code, you can very easily test if the code has the same “meaning” - you can implement a parser that converts code to an expression graph, and then compare that. Which makes it far easier to catch output code that is functionally identical to the code they are plagiarizing, but looks very different.
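To make the “parse it and compare the graph” idea concrete, here is a rough sketch in Python using the standard `ast` module. It is only an illustration, not how any real scanning tool works: the names `fingerprint` and `same_structure` are made up for this example, and it only sees through renamed identifiers, comments and formatting. Reordered statements or genuinely rewritten logic would need a much more serious normalization pass.

```python
import ast


def _bound_names(tree: ast.AST) -> set:
    """Collect names the snippet itself defines (functions, args, assignments)."""
    bound = set()
    for node in ast.walk(tree):
        if isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef)):
            bound.add(node.name)
        elif isinstance(node, ast.arg):
            bound.add(node.arg)
        elif isinstance(node, ast.Name) and isinstance(node.ctx, ast.Store):
            bound.add(node.id)
    return bound


class _Canonicalize(ast.NodeTransformer):
    """Rename locally defined identifiers to v0, v1, ... in order of appearance."""

    def __init__(self, bound):
        self.bound = bound
        self.names = {}

    def _canon(self, name):
        if name not in self.bound:
            return name  # builtins and external names stay as written
        return self.names.setdefault(name, f"v{len(self.names)}")

    def visit_FunctionDef(self, node):
        node.name = self._canon(node.name)
        self.generic_visit(node)
        return node

    visit_AsyncFunctionDef = visit_FunctionDef

    def visit_arg(self, node):
        node.arg = self._canon(node.arg)
        return node

    def visit_Name(self, node):
        node.id = self._canon(node.id)
        return node


def fingerprint(source: str) -> str:
    """Hypothetical helper: a string form of the tree with names normalized."""
    tree = ast.parse(source)  # comments and whitespace are already gone here
    tree = _Canonicalize(_bound_names(tree)).visit(tree)
    return ast.dump(tree)  # omits line/column info by default


def same_structure(a: str, b: str) -> bool:
    """True if the two snippets differ only in naming, comments and layout."""
    return fingerprint(a) == fingerprint(b)
```

With this, a function that had every variable renamed and a comment added still compares equal to the original, while changing a single operator makes the comparison fail. That is roughly the “is it ok if you rename all the variables” case, nothing more.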

But also I estimate approximately 0% probability that the assholes working on that wouldn’t have banter between themselves about copyright laundering.

edit: Another thing is that since it can have no own conception of what “correct” behavior is for a piece of code being plagiarized, it would also plagiarize all the security exploits.

This hasn’t been a big problem for the industry, because only short snippets were being cut and pasted (how to make some stupid API call, etc), but with generative AI whole implementations are going to get plagiarized wholesale.

Unlike any other work, code comes with its own built-in, essentially irremovable “watermark” in the form of security exploits. In several thousand lines of code, there would be enough “watermark” for identification.

Architeuthis@awful.systems · 13 hours ago

I’d say that’s incredibly unlikely unless an LLM suddenly blurts out Tesla’s entire self-driving codebase.

The code itself is probably among the least behind-a-moat things in software development, that’s why so many big players are fine with open sourcing their stuff.

diz@awful.systems · edit-2 10 hours ago

Pre-LLM, I had to sit through one or two annual videos to the effect of “don’t cut and paste from open source, better yet don’t even look at GPLd code you aren’t working on” and had to do a click test with questions like “is it ok if you rename all the variables yes no”. Ohh and I had to run a scanning tool as part of the release process.

I don’t think it’s the FSD they would worry about, but GPL, especially v3. Nobody gives a shit if it steals some leetcode snippet, or cuts and pastes some calls to a stupid API.

But if you have a “coding agent” just replicating GPL code wholesale, thousands and thousands of lines, it would be very obvious. And not all companies ship shitcode. Apple is a premium product, and ages-old, long-patched CVEs from open source cropping up in there wouldn’t be exactly premium.

Architeuthis@awful.systems · 7 hours ago

I too love to reminisce over the time (like 3m ago) when the c-suite would think twice before okaying uploading whatever wherever, ostensibly on the promise that it would cut delivery time (up to) some notable percentage, but mostly because everyone else is also doing it.

Code isn’t unmoated because it’s mostly shit, it’s because there’s only so many ways to pound a nail into wood, and a big part of what makes a programming language good is that it won’t let you stray too much without good reason.

You are way overselling coding agents.

HedyL@awful.systems · 9 hours ago

And, after the end of the AI boom, do we really know what wealthy investors are going to do with the money they cannot throw at startups anymore? Can we be sure they won’t be using it to fund lawsuits over alleged copyright infringements instead?

Architeuthis@awful.systems · edit-2 7 hours ago

Fund copyright infringement lawsuits against the people they had been bankrolling the last few years? Sure, if the ROI is there, but I’m guessing they’ll likely move on to the next trendy sounding thing, like a quantum remote diddling stablecoin or whatevertheshit.

BlueMonday1984@awful.systems · 15 hours ago

Unlike any other work, code comes with its own built-in, essentially irremovable “watermark” in the form of security exploits. In several thousand lines of code, there would be enough “watermark” for identification.

To give an example, Warner Brothers got sued by Bethesda for stealing code from Fallout Shelter when making their Westworld mobile game, with Bethesda pointing to a bug that appeared in early versions of FO Shelter as evidence of stolen code.

Architeuthis@awful.systems · edit-2 12 hours ago

On the other hand they blatantly reskinned an entire existing game, and there’s a whole breach of contract aspect there since apparently they were reusing their own code that they wrote while working for Bethesda, who I doubt would’ve cared as much if this were only about an LLM-snippet length of code.

diz@awful.systems · 12 hours ago

LLM snippets are so 2024. Coding agents, baby.

Architeuthis@awful.systems · 7 hours ago

Ah yes, the supreme technological miracle of automating the ctrl+c/ctrl+v parts when applying the LLM snippet into your codebase.

diz@awful.systems · 14 hours ago

Yeah, that’s a great example.

The other thing is that unlike art, source code is already made to be consumed by a machine. It is not any more transformative to convert source code to equivalent source code, than it is to re-encode a video.

The only thing they do that is “transformative” is using source code not for compiling it but for defrauding the investors.