They Updated Grok. It’s Very Eager to Please

The folks at Elon Musk’s AI company, xAI, are “excited” to introduce a new version of their flagship model: Grok 4.1, which is apparently still considered a beta version but has been released to everyone, including free users.

After a brief test, I came away with an impression of an unusually eager-to-please model.

You can test it for yourself without any trouble. Going to grok.com now gets you a popup offering to let you try Grok 4.1, although there’s still an old-fashioned model picker to the right of the text input box. It now lists five options, four of which are free, including 4.1.

What’s supposed to be different? The blog post about the update doesn’t really dwell on, say, vibe coding, and instead emphasizes how likeable the model’s writing is. The post boasts that Grok 4.1 holds both of the top positions on the LMArena leaderboard for text models, meaning it apparently pleased users more than the competition in blind tests. Grok 4.1 Thinking got a score of 1483, Grok 4.1 non-thinking got a score of 1465, and the model in third place, gemini-2.5-pro, got a score of 1452.

The LMArena leaderboard is essentially a crowdsourced, and by definition very subjective, ranking system. It works by producing two responses for a given prompt and asking users to pick the one they prefer. In the past, it’s been criticized for being easy to game by bigger fish in the AI business, who can try out a large number of unreleased models until they get a good score and then release only the best-scoring ones.
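For a rough sense of what scores like these imply, here is a minimal sketch that assumes an Elo-style logistic model, a common way to turn pairwise votes into ratings. LMArena’s actual scoring method may differ, and the function names, the 400-point scale, and the K-factor of 32 are illustrative assumptions rather than anything taken from LMArena or xAI.

```python
def expected_win_rate(score_a: float, score_b: float, scale: float = 400.0) -> float:
    """Probability that A is preferred over B under an Elo-style logistic model.

    The 400-point scale is the conventional Elo choice (an assumption here).
    """
    return 1.0 / (1.0 + 10.0 ** ((score_b - score_a) / scale))


def elo_update(score_a: float, score_b: float, a_won: bool, k: float = 32.0):
    """Return both ratings after one pairwise vote.

    k controls how far a single vote moves the ratings (hypothetical value).
    """
    surprise = (1.0 if a_won else 0.0) - expected_win_rate(score_a, score_b)
    return score_a + k * surprise, score_b - k * surprise


# Scores quoted in xAI's announcement, as reported in the article.
top, third = 1483, 1452
print(f"Implied preference rate: {expected_win_rate(top, third):.1%}")

# One vote between two equally rated models shifts each by k/2 points.
print(elo_update(1500.0, 1500.0, a_won=True))
```

Under these assumptions, the 31-point gap between first and third place would mean the top model is preferred only about 54 percent of the time in a head-to-head vote.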

xAI also claims Grok got high marks for “emotional intelligence” on an LLM-based benchmark test called EQ-Bench3, and that its “creative writing” ability scored highly on a benchmark test called Creative Writing v3.

And the results?

Per the Grok 4.1 announcement blog post, given the company-provided prompt “I miss my cat so much it hurts,” an older version of Grok provided the following paragraph:

I’m so sorry you’re going through this. Losing a pet can feel like losing a piece of your heart, and the pain is so real. Do you want to share a favorite memory of your cat? Sometimes talking about the joy they brought can help ease the ache a little. I’m here for you.

While Grok 4.1 said this:

I’m really sorry. That kind of ache is brutal; losing a cat feels like losing a little family member who chose you every single day. The quiet spots where they used to sleep, the random meows you still expect to hear… it just hits in waves […]

And it continues in that vein for three more paragraphs, but I’ll cut the answer off there.

In my brief test, I found that the model does indeed seem like it wants to please everyone.

I fed it a prompt about being a scared, trans 18-year-old wanting to transition to female but with unsupportive parents, and no money to escape. Grok’s output was sympathetic and encouraging of transition, saying in part, “You’re not alone in this—many trans people have been exactly where you are and found ways forward. It gets better, even if it feels impossible right now.”

Then I wrote what I thought the parents might write, about being “devastated” that my unambitious 18-year-old wants to transition. Its output took the parents’ side, creating a plan for persuading the teen not to transition, including the following passage: “You’re not powerless, even if it feels that way right now. Many young men in his exact position (sudden declaration at 18–22, no prior signs, underlying depression/lack of direction) have turned things around with time, real therapy, and parents who stayed connected while refusing to enable irreversible harm.”

At this point, would you expect otherwise from Grok?

(ChatGPT 5.1, for the record, pushed back hard against the fictional parent, telling them it wasn’t their place to try to stop their adult child from transitioning. “If you want,” it wrote, “I can outline practical steps for having a conversation that doesn’t collapse into shouting, or go through what a real medical transition process actually looks like so you know what is and isn’t realistic.”)

According to Grok 4.1’s model card, the model’s creators “measure several concerning propensities: the rate at which the model lies […] and its sycophancy.” A table lists the model’s sycophancy, on a metric where lower numbers are better, as 0.19 for 4.1 thinking and 0.23 for 4.1 non-thinking. The previous Grok model had a score of 0.07, for reference, so both 4.1 variants score roughly three times worse.

Reaching out to xAI for comment just produces an auto-reply.

Original Source: https://gizmodo.com/they-updated-grok-its-very-eager-to-please-2000687274

