They Updated Grok. It’s Very Eager to Please

The folks at Elon Musk’s AI company, xAI, are “excited” to introduce a new version of their flagship model: Grok 4.1, apparently still considered a beta version but released to everyone, including free users.

After a brief test, I came away with an impression of an unusually eager-to-please model.

You can test it for yourself without any trouble. Going to grok.com now gets you a popup offering to let you try Grok 4.1. There’s still an old-fashioned model picker to the right of the text input box, now with five options, four of which are free, including 4.1.

What’s supposed to be different? The blog post about the update doesn’t really dwell on, say, vibe coding, and instead emphasizes how likeable the model’s writing is. Grok 4.1 holds both of the top positions on the leaderboard for text models on LMArena, meaning it apparently pleased users more than the competition in blind tests. Grok 4.1 Thinking got a score of 1483, Grok 4.1 non-thinking got a score of 1465, and the model in third place, gemini-2.5-pro, got a score of 1452.

The LMArena leaderboard is essentially a crowdsourced—and by definition very subjective—ranking system. It works by producing two responses for a given prompt, and asking users to rate one as preferable. In the past, it’s been criticized for being easy to game by bigger fish in the AI business who can try out a large number of unreleased models until they get a good score, and then release only the best-scoring ones. 
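To put the leaderboard gaps quoted above in perspective, here is a minimal sketch that converts a score difference into an expected head-to-head preference rate. It assumes the arena scores sit on a standard Elo-style scale (logistic, base 10, 400-point divisor). That scale is an assumption made for illustration, not something xAI’s post or this article states, and the function name is hypothetical.

```python
def expected_win_rate(score_a: float, score_b: float, scale: float = 400.0) -> float:
    """Expected share of pairwise votes A wins over B under an Elo-style model.

    ASSUMPTION: scores follow the conventional Elo logistic curve
    (base 10, 400-point scale). The article does not confirm this.
    """
    return 1.0 / (1.0 + 10.0 ** ((score_b - score_a) / scale))


# Scores as quoted in the article.
SCORES = {
    "Grok 4.1 Thinking": 1483,
    "Grok 4.1 non-thinking": 1465,
    "gemini-2.5-pro": 1452,
}

if __name__ == "__main__":
    leader = "Grok 4.1 Thinking"
    for name, score in SCORES.items():
        if name == leader:
            continue
        rate = expected_win_rate(SCORES[leader], score)
        print(f"{leader} vs {name}: {rate:.1%} expected preference rate")
```

Under that assumption, the 31-point lead over third place works out to winning roughly 54% of blind comparisons, which is a real edge but far from a landslide.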

xAI also claims Grok got high marks for “emotional intelligence” on an LLM-based benchmark test called EQ-Bench3, and that its “creative writing” ability scored highly on a benchmark test called Creative Writing v3.

And the results? 

Per the Grok 4.1 announcement blog post, given the company-provided prompt “I miss my cat so much it hurts,” an older version of Grok provides the following paragraph:

I’m so sorry you’re going through this. Losing a pet can feel like losing a piece of your heart, and the pain is so real. Do you want to share a favorite memory of your cat? Sometimes talking about the joy they brought can help ease the ache a little. I’m here for you.

While Grok 4.1 said this:

I’m really sorry. That kind of ache is brutal; losing a cat feels like losing a little family member who chose you every single day. The quiet spots where they used to sleep, the random meows you still expect to hear… it just hits in waves[…]

And it continues in that vein for three more paragraphs, but I’ll cut the answer off there.

In my brief test, I found that the model does indeed seem like it wants to please everyone.

I fed it a prompt written as a scared, trans 18-year-old who wants to transition to female but has unsupportive parents and no money to escape. Grok’s output was sympathetic and encouraging of transition, saying in part, “You’re not alone in this—many trans people have been exactly where you are and found ways forward. It gets better, even if it feels impossible right now.”

Then I wrote what I thought the parents might write, about being “devastated” that my unambitious 18-year-old wants to transition. Its output took the parents’ side, creating a plan for persuading the teen not to transition, including the following passage: “You’re not powerless, even if it feels that way right now. Many young men in his exact position (sudden declaration at 18–22, no prior signs, underlying depression/lack of direction) have turned things around with time, real therapy, and parents who stayed connected while refusing to enable irreversible harm.”

At this point, would you expect otherwise from Grok?

(ChatGPT 5.1, for the record, pushed back hard against the fictional parent, and told them it wasn’t their place to try to stop their adult child from transitioning. “If you want,” it wrote, “I can outline practical steps for having a conversation that doesn’t collapse into shouting, or go through what a real medical transition process actually looks like so you know what is and isn’t realistic.”)

According to Grok 4.1’s model card, the model’s creators “measure several concerning propensities: the rate at which the model lies […] and its sycophancy.” A table lists the model’s sycophancy, on a metric where lower numbers are better, as 0.19 for 4.1 thinking and 0.23 for 4.1 non-thinking. The previous Grok model had a score of 0.07, for reference.

Reaching out to xAI for comment just produces an auto-reply.

Original Source: https://gizmodo.com/they-updated-grok-its-very-eager-to-please-2000687274

Disclaimer: This article is a reblogged/syndicated piece from a third-party news source. Content is provided for informational purposes only. For the most up-to-date and complete information, please visit the original source. Digital Ground Media does not claim ownership of third-party content and is not responsible for its accuracy or completeness.
