From Feature Engineering to Superintelligence: Richard Socher on the Next Frontier of AI

Few people have shaped the trajectory of modern artificial intelligence as directly as Richard Socher. As a PhD student at Stanford, he championed the use of neural networks for natural language processing at a time when the research establishment was deeply sceptical, going on to invent the most widely used word vectors and contextual vectors, and pioneering the concept of prompt engineering that would become central to the age of large language models. With over 235,000 academic citations, a Stanford PhD thesis award, and recognition on TIME’s 100 AI list, Socher has been at the frontier of AI research for more than a decade. He served as Chief Scientist and EVP at Salesforce following its acquisition of his startup MetaMind, and later founded You.com — the first platform to integrate a large language model directly into a search engine — and AIX Ventures, a fund backing the next generation of AI-native companies.

In this conversation with Thought Economics, Socher traces the intellectual arc from his early contrarian bet on neural NLP through the unification of the field via prompt engineering, the innovator’s dilemma that let a startup outpace Google in AI-powered search, and his current work at Recursive on enabling AI systems to improve themselves with minimal human intervention. He offers a striking reframing of the superintelligence safety debate, arguing that multiple superintelligences would drive the net risk of catastrophe toward zero, and makes a compelling case for what he calls constructive optimism — the philosophical conviction that technology, applied with agency, remains humanity’s most powerful tool for progress. Previous Thought Economics conversations on AI and its implications include those with Mo Gawdat, Federico Faggin, Jaan Tallinn, and Noam Chomsky.

Q: You were pushing neural networks and natural language processing at a time when most of the field was dismissive. What gave you the conviction that this was so important?

[Richard Socher]: There’s probably a contrarian in me ever since I was a teenager, and maybe even before that. But I think ultimately it was first principles. I saw all the super smart PhD students at Stanford write these beautiful papers about sophisticated statistical models, but then when they really wanted to get their models to work, they spent the majority of their actual programming time on designing the features that were then fed into those beautiful statistical models. There was clearly a discrepancy there. When I was fortunate enough to hear that a small group of people like Andrew Ng at Stanford, and others, were starting to explore these ideas for computer vision — where everything’s already a vector and it’s easily fed into a neural network — I thought, maybe if they can learn their features in vision, we can do the same thing for natural language processing. It was one of those typical examples of people combining different ideas from different fields and finding magic at that intersection.

Q: Was that what led you to think about prompt engineering as being so significant? Because at the time, it perhaps wouldn’t have been the logical conclusion that prompt engineering would become quite the breakthrough it has.

[Richard Socher]: In the early 2010s, my goal was to take the manual feature engineering out and have it all trained end to end with a neural network, from the raw text data to the final outcomes you care about. Then I feel like we got the field to move and realise everything should be a neural network, but then everyone was doing architecture engineering. I was like, I guess we’re now at a higher level, but we still have a lot of humans involved in this supposedly artificial intelligence field. It was mostly “graduate student descent” of who’s really good at coming up with different ideas, implementing them, validating them, and so on. And so I thought, well, this means that each group will do some research and then maybe you’re lucky and you sit on top of that research in some abstract way, but we want the models to keep getting better and better and have one model. The only way we can have one model that keeps getting better is to move away from architecture engineering to having a single model that we can just keep feeding different tasks into.

That led me to say, how do we unify all of natural language processing? There are two papers — one called ‘Ask Me Anything’, which had that first-time idea of all these different tasks being just questions over some context with an output. And then the decaNLP paper where we really took the ten hardest NLP tasks and said everything you could think about a conversation could be framed this way.

The idea was that if you wanted to connect the whole field and make it so that we don’t have to manually feature- or architecture-engineer for every task, you have to unify it all. The question becomes: what is a unifying system? And the unifying natural language system was that everything is essentially some kind of input, some kind of task or question about that input, and some kind of output. That led us to prompt engineering, which in retrospect is of course a simple, trivial idea, but the way to get to it was that we had to actually train a single model for all of NLP. That model had to have a ton of attention mechanisms, a ton of different scaling issues, and so on. Of course, in comparison to nowadays the scale was still small, but that was the first time we could actually take the hardest tasks — translation, summarisation, question answering, part-of-speech tagging, semantic role labelling, and all these different tasks that natural language processing had — and we unified them within one model.
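The unification Socher describes can be sketched in a few lines: every NLP task becomes a (context, question) pair that a single model maps to a text answer. This is a minimal illustration of the idea, not the decaNLP paper’s actual data or code; the function name and example question framings are invented for illustration.

```python
# Sketch of the decaNLP-style unification described above: every NLP task
# is framed as a (context, question) pair mapped to a text answer, so one
# model signature — model(context, question) -> answer — covers all tasks.
# Task names and question phrasings here are illustrative only.

def frame_as_qa(task: str, text: str) -> tuple[str, str]:
    """Return a (context, question) pair for a single-model QA formulation."""
    questions = {
        "translation": "What is the translation from English to German?",
        "summarisation": "What is the summary?",
        "sentiment": "Is this review positive or negative?",
    }
    return text, questions[task]

# Three very different tasks now share one interface:
context, question = frame_as_qa("summarisation", "Neural networks replaced manual features ...")
print(question)  # -> "What is the summary?"
```

Framed this way, there is nothing task-specific left to architecture-engineer: adding a new task means adding a new question template, not a new model.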

The funny thing is, ultimately the reviewers said this paper’s overcrowded and it’s a silly idea to try to unify NLP. They rejected it, and it was all public. You can actually find those reviews on OpenReview for the ICLR submission, which show you how far-fetched that idea seemed to people at the time. But fortunately, some of our friends like Alec Radford and Ilya Sutskever saw the paper and were inspired, and cited it nicely in their early GPT papers. And so the idea lived on.

Q: With You.com — great domain, by the way — you were the first to put a large language model inside a search engine. You got there before Google. Talk us through that, because the way you’ve disrupted it is so powerful.

[Richard Socher]: In many ways it felt like, okay, we’re the first, but not the last — and it’s often better actually to be the last. But at the time, it felt like the problem of our information society isn’t anymore that we don’t have access to information. It’s that we have actually too much information. We had to truly use AI now to help us not get access to more, but actually summarise all of it for us. That was the goal — to use AI and large language models and generally transformers and natural language processing to essentially summarise search results for us, and bring more and more useful answers to people in a more consistent and concise way. That was the original plan.

Unfortunately, when we did the traditional thing of launching early and iterating with users, the users mostly said, “That’s great, but we want it to be a little more like Google. Can you make the links more blue? Can you make the links more prominent?” We had the LM summarise the search results on top, and the users just scrolled right through it. At some point we did user studies and asked, “Why did you skip that nice summary that says AI summary at the top?” And they said, “Oh, Google has all the ads on top, so I assumed there were ads and I just skipped it and went directly to the links.” And we’re like, no, that was the cool thing — that was all AI, there are no ads in there. We had no ads.

It was really interesting. Ultimately we just ditched the whole traditional search engine and brought LMs front and centre so you just could not avoid them anymore. But then the marketing moment from the consumer side was clearly hit by ChatGPT. So long story short, now we’re providing that search infrastructure to all the other LLMs that want to be up to date, accurate, and have citations. And that’s how the revenue has been growing nicely since.

Q: Your accuracy is more than ten per cent better than your best competitor’s, your latency is less than a fifth of your nearest competitor’s, and your precision and recall are significantly higher. How important were these metrics in designing it, and what are others not doing that you were able to do?

[Richard Socher]: It’s something that you just have to truly care about as an organisation. For better or for worse, we cared about it almost too much. A lot of other folks just said, “Oh, the technology is not quite accurate but let’s just market it more,” and maybe that was the better move in the consumer world. But in enterprise land, people actually do care about accuracy a lot. You could lose your job if you say incorrect things, and so that’s why we found our niche in the more enterprise B2B context where we provide these search answers to others.

How are we better? We really care. We focus on it. It’s our values — trust, facts, accuracy, and kindness. We have a bunch of amazing researchers and engineers working on this non-stop, constantly improving the state of the art, and then publishing those APIs and also blog posts and open-source comparisons and ways so that everyone can verify that we’re more accurate. I think in the field of APIs, you can actually often make much more objective decisions. And when subjective decisions are made, You.com wins because we’re just more accurate than everyone else. Whereas in consumer land, it’s a lot more about the vibes and hitting a nerve in your brand and your marketing and your design.

Q: Is this where we need to think about intelligence differently? Consumers see Gemini or GPT and just think “that’s AI,” whereas there are actually different forms of intelligence for different use cases. How is that linking to Recursive, which you’re building alongside it?

[Richard Socher]: In many ways there are different dimensions to intelligence, and there are different ways you can work on it and improve it. I think You.com never quite had the funding to build truly foundational frontier models. The next level now is that we all agree we should have one model, and that one model should get better and better and incorporate more and more ideas from humanity into AI. That’s great. But now people are doing manual context engineering, manual prompt engineering, manual single-model engineering of these large language models. They’re using a lot of their ideas, their intuitions to improve and have the next GPT 5.6 or 7 or 8. And each of them has a lot of manual processes associated with it.

At Recursive, we believe that it will be possible now to actually allow the AI to recursively self-improve with much, much less human intervention than any other lab is currently doing. And I believe, just like from feature engineering to architecture engineering to prompt engineering, this is the next — and maybe last — level we will need to actually get to superintelligence.

Q: What, in your view, is the line where we can identify something as fundamentally superintelligent? I remember you wrote a wonderful essay on AI safety which reframes how we as a society should think about the safety implications.

[Richard Socher]: I don’t know if there’s a single line. The way we should think about intelligence is in terms of a volume in a high-dimensional space. One axis is language intelligence, another axis is visual intelligence, another axis is motor intelligence. Then it gets harder to imagine more than 3D volumes, but a fourth dimension can be seen as social and interaction coordination intelligence. And yet another axis could be metacognition and your ability to really think about the box, think about that volume of intelligence.

The tricky bit with such a volumetric definition of intelligence — which I’m describing in my book; I haven’t published it yet, it’s going to come out this year — is that none of those dimensions is logically necessary or sufficient. You can be blind and still be intelligent, and you can have no physical instantiation and no ability to manipulate physical matter, yet still be intelligent. I think humans have very much tried to define intelligence based only on our own and maybe other biological intelligence, but clearly intelligence is a concept and an ability that transcends biology and can be recreated in different ways.

Q: In your essay, you talked about superintelligence having the capability to effectively give us the best-in-class defence, and that multiple superintelligences bring the net risk of P(doom) towards zero. Can you talk through that, because reading your essay genuinely changed my position on the matter?

[Richard Socher]: Again, maybe there’s a slight contrarian in me that can help point these things out. But I think a lot of the P(doom) crowd — they say, oh, we have the superintelligence, and then A) it immediately wants to kill everyone, which is totally unclear why more intelligence leads to more wanting to kill everyone, and B) because it’s superintelligent, it will have this just instantaneous magic ability to murder us all. Let’s unpack that.

The best sort of steelman argument would be a biological attack. Somehow the AI wants to do it, maybe humans convince it to do it, and it’s superintelligent. Even though you would assume that when it’s that intelligent, it knows that it needs humans to maintain the servers and so on, and it doesn’t make sense to kill them all. But it’s just as silly as the paperclip scenario — somehow it’s so intelligent that it could literally wipe out all humanity, but it’s not intelligent enough to know that if no one wants to buy your paperclips, you don’t need to produce any more. It’s this weird dichotomy of superintelligence and super stupidity mixed into one scenario.

And I think what’s more realistic is: okay, humans use this technology and this incredibly intelligent system to try to build a supervirus. Now of course, if the virus immediately kills anyone that touches it, in an instant, then it probably can’t spread. Viruses like the Spanish flu used to be deadly, and now it’s just the flu. We have it every season, and it becomes less and less deadly. But let’s imagine you could have this incredibly sophisticated virus that first spreads everywhere on Earth and then magically switches and kills everyone. If someone could conceive of an AI that has this near-magical, perfect programmability of such a virus that doesn’t currently exist — some magical digital switch that turns biology on in every human — if we were to have such a technology, I would argue that the resources of the crazy people who would want to do such a thing would be far smaller than the resources of all the people who would want to prevent it. And if it’s that cheap and easy and amazing to build such a supervirus, I would just build a super vaccine against all such superviruses.

It used to be that an attacker had this massive advantage because they only have to be right once, and as a defender, you have to be right all the time. But if we can make what I think are realistic assumptions — that for any system, whether human or computational, there is a fixed number of potential attack vectors to such a system, and the cost of attacking any such vector tends closer and closer to zero — then I can, as a defender, have that same superintelligent AI attack myself and inoculate my system against every single attack. That equation — the cost being so low that I can run all of these finite numbers of attacks myself and then inoculate myself against those finite attack vectors — I think actually makes me very optimistic.

Q: How should we apply constructive optimism to these technologies? It doesn’t take more than a cursory glance at the news to be told that social media is causing everyone harm and AI is going to kill us all. What does that philosophy mean to you, and how does it change our relationship with technological change?

[Richard Socher]: I think I should probably write this out in more detail and collect my thoughts. To me, it’s a philosophy that says for most problems out there, the best solution is one where we use technology to solve those problems. Whenever we can, we should have that optimism that there is a solution. And when there is a solution and you have the agency to go work on that solution, you should go forth and do it.

I think there are a lot of philosophical schools now, especially in Europe, where the thinking is that energy mostly comes from dirty sources, so we should just stop using energy as much as we can and reduce our energy consumption maximally. Basically degrowth, even “have fewer humans.” I mean, there are people who are so extreme they say it’s unethical to have a baby because babies are bad for the environment — they create more CO₂, they need to be driven around, they need concrete so they have buildings, and that’s all unethical. That is such crazy anti-human, anti-progress, stone-age thinking. It’s surprising that fairly well-educated, smart people think that, because when you actually look at the history of all of humanity, things have gotten better and better over time. It’s never the most exciting news to break — it’s never exciting to say “hey, things got a little bit better here” — but that is what we’ve seen. The reason why we don’t have that many more rags-to-riches stories is that no matter how poor you are, you don’t really have to wear rags anymore. Why? Because technology automated the production of t-shirts and other clothing.

I think that idea actually makes me surprised that even people on the left side of the spectrum don’t embrace technology more, because it is objectively making things better. In many cases we can look to technology when we try to predict the future, both in terms of actual products and in terms of the human condition. We can often look at the wealthiest people, see what goods and services they have access to that are bottlenecked on intelligence or technology in general, and then make those goods and services more accessible to everyone. It used to be impossible to have a private chauffeur — well, now you have Uber, and most people can afford a ride with a private chauffeur. A personal assistant seemed way out of reach for most people — with AI, you can now have a personal assistant. And that will be true for personal tutors, personal healthcare teams that read all the latest research papers and care just about your health — things that very wealthy people have access to, but most people don’t.

Q: Every major leap in technology removes the mundane aspects of human work and existence, meaning we can focus on what actually makes us unique. It feels like the technologies you and your peers are developing are removing the mundane so we can genuinely focus on being more human, not less.

[Richard Socher]: Exactly. Now, the interesting bit is that very few people have the luxury to contemplate those ideas at all. Most people are just thinking, “I need to have food on the table next month. I need to do my job.” And even if they hate their job, it gives them meaning because it allows them to survive and allows their families to survive, or maybe even thrive if they’re lucky. And so those kinds of folks, and the people who have a lot of empathy for them, say, “Well, you guys are in your lofty world and you’re not taking this seriously.” I do think we have to be realistic in that these phases of transition are and can be painful, and if they’re in the wrong social context, can become violent.

Look at the Luddites, who broke the weaving machines — “This weaving machine took our job, let’s go destroy it.” I think we’re going to see similar things, and are seeing them already, with AI, where it has a very direct, immediate impact and people can point to it. Someone who used to be able to charge a thousand dollars for an illustration can now be undercut — every blog now has an illustration, but the illustrators hate those kinds of outputs. And that is understandable. You need social systems that help people in those transition phases, because they’re very unpleasant. But long-term, it usually seems to be the case in history that after every wave of automation, humanity was better off. And in retrospect, no one wants to go back and say, “What’d be awesome is if we all work in the field, no matter if it’s sunny or rainy or snowy, today we work in the field” — which is what ninety-five per cent of people did a hundred and fifty years ago.

About the Author

Dr. Vikas Shah MBE DL has significant experience in founding, leading and exiting businesses to trade, private-equity and listed groups. He is currently a Non-Executive Board Member of the UK Government's Department for Energy Security & Net Zero (DESNZ). He also serves as a Non-Executive Director for the Solicitors Regulation Authority, The Institute of Directors, and Enspec Power. He is Co-Founder of leading venture lab Endgame and sits as Entrepreneur in Residence at The University of Manchester's Innovation Factory. Vikas was awarded an MBE for Services to Business and the Economy in the Queen's 2018 New Year's Honours List. In 2021, he became a Deputy Lieutenant of the Greater Manchester Lieutenancy. He holds an Honorary Professorship of Business at The Alliance Business School, University of Manchester, an Honorary Professorial Fellowship at Lancaster University Management School (LUMS), and was awarded an Honorary Doctorate in Business Administration from the University of Salford in 2022.