From "Introducing the next generation of Claude" (Anthropic). Note that this is not a neutral source, but the measures it reports are standard.
Scary...
Alex on X (twitter.com): "Fun story from our internal testing on Claude 3 Opus. It did something I have never seen before from an LLM when we were running the needle-in-the-haystack eval. For background, this tests a model’s recall ability by inserting a target sentence (the "needle") into a corpus of…" https://t.co/m7wWhhu6Fg
Scary and too anthropomorphic in nature