Did you know that your chatbot might be out to deceive you? That it might be lying to you? And it might turn you into paperclips? Huge if true! Ordinary people have been using chatbots for a couple…
AI chatbots that do bizarre and pointless things, but are clearly capable of some kind of sophistication, are exactly the warning sign that, as the technology gains new capabilities, this is a danger we need to be aware of.
hahahaha nope
Here’s a video of an expert in the field saying it more coherently and at more length than I did:
https://youtu.be/zkbPdEHEyEI
You’re free to decide that you are right and we are wrong, but I feel like that’s more likely to be from the Dunning-Kruger effect than from your having achieved a deeper understanding of the issues than he has.
For anyone who rightfully can’t be arsed to click that link, the expert is “Robert Miles AI Safety”, who I assume is an expert (a youtuber) in the made-up field of “AI safety”.
Not to be confused with the late and great dream trance producer Robert Miles whom we all love dearly.
I think he is a CS guy who is also into EA, so basically it is the same source as the research OP posted about.
who the fuck is “we”? you’re some asshole who bought the critihype so hard you think that when the chatbot does dumb computer shit that only proves it’s more human and more dangerous. you’re not in on this grift, you’re a mark.
With all due respect, a counterpoint:
https://www.youtube.com/watch?v=xvFZjo5PgG0
convincing
Okay apparently it was my turn to subject myself to this nonsense and it’s pretty obvious what the problem is. As far as citations go I’m gonna go ahead and fall back to “watching how a human toddler learns about the world” which is something I’m sure most AI researchers probably don’t have experience with as it does usually involve interacting with a woman at some point.
In the real examples that he provides, the system isn’t “picking up the wrong goal” as an agent somehow. Instead it’s seeing the wrong pattern. Learning “I get a pat on the head for getting to the bottom-right-est corner of the level” rather than “I get a pat on the head when I touch the coin.” These are totally equivalent in the training data, so it’s not surprising that it’s going with the simpler option that doesn’t require recognizing “coin” as anything relevant. This failure state is entirely within the realms of existing machine learning techniques and models because identifying patterns in large amounts of data is the kind of thing they’re known to be very good at. But there isn’t any kind of instrumental goal establishing happening here as much as the system is recognizing that it should reproduce games where it moves in certain ways.
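To make the “totally equivalent in the training data” bit concrete, here’s a toy sketch. It is not the environment from the video: the 1-D corridor, `WIDTH`, and the two hand-written policies `always_right` and `seek_coin` are all made up for illustration, and nothing is actually being trained. The only point is that when the coin always sits at the right-hand end, the reward can’t tell “go right” apart from “go to the coin”.

```python
from typing import Callable, List

# A policy maps (agent_pos, coin_pos) to a step of -1 (left) or +1 (right).
Policy = Callable[[int, int], int]

WIDTH = 9  # hypothetical corridor length; the agent starts in the middle


def always_right(agent_pos: int, coin_pos: int) -> int:
    """The 'simpler pattern': ignore the coin entirely and head right."""
    return 1


def seek_coin(agent_pos: int, coin_pos: int) -> int:
    """The intended behaviour: step toward wherever the coin is."""
    return 1 if coin_pos > agent_pos else -1


def run_episode(policy: Policy, coin_pos: int, start: int = WIDTH // 2) -> int:
    """Return 1 if the agent touches the coin within WIDTH steps, else 0."""
    pos = start
    for _ in range(WIDTH):
        if pos == coin_pos:
            return 1
        # Clamp the position so the agent stays inside the corridor.
        pos = min(WIDTH - 1, max(0, pos + policy(pos, coin_pos)))
    return int(pos == coin_pos)


def total_reward(policy: Policy, coin_positions: List[int]) -> int:
    """Sum the reward over one episode per coin position."""
    return sum(run_episode(policy, c) for c in coin_positions)


# Training levels: the coin is always at the far right end.
train_levels = [WIDTH - 1] * 5
# Test levels: the coin is somewhere to the left of the start instead.
test_levels = [0, 1, 2]

if __name__ == "__main__":
    for policy in (always_right, seek_coin):
        print(policy.__name__,
              "train:", total_reward(policy, train_levels),
              "test:", total_reward(policy, test_levels))
```

Both policies score 5 out of 5 on the training levels. On the test levels `always_right` scores 0 and `seek_coin` scores 3, which is the whole failure: the reward signal never distinguished the two patterns, and no “goal” had to be chosen by anything.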
This is also a failure state that’s common in humans learning about the world, so it’s easy to see why people think we’re on the right track. We had to teach my little one the difference between “Daddy doesn’t like music” and “Daddy doesn’t like having the Blaze and the Monster Machines theme song shout/sang at him when I’m trying to talk to Mama.” The difference comes in the fact that even as a toddler there’s enough metacognition and actual thought going on that you can help guide them in the right direction, rather than needing to feed them a whole mess of additional examples and rebuild the underlying pattern.
And the extension of this kind of pattern misrecognition into sci-fi end of the world nonsense is still unwarranted anthropomorphism. Like, we’re trying to use evidence that it’s too dumb to learn the rules of a video game as evidence that it’s going to start engaging in advanced metacognition and secrecy.
I have several family members with kids now and this is quite funny. A toddler learns how to crawl by looking at a lot of adults crawling or something.
@PhilipTheBucket
Is 13 years without completing a PhD good?