No, Attention Is NOT All We NEED!

Jun 14

Updated: Jul 10

Attention writes the bottom layer of reality.

The 2017 Google paper “Attention Is All You Need” did more than change artificial intelligence and kickstart the LLM race. It gave us a clean metaphor for modern life.

The core idea is simple, and it's a dark one. On the first pass, in the first draft, we do not distribute our attention based on actual importance, or on what we want, desire, or truly need. We distribute it to what scares us, what soothes us, what angers us, what pacifies us, and to the people who ignored and mistreated us that day rather than to the hundreds of others who treated us with kindness. In some cases, such as with our loved ones, we (because we have not processed past traumas) fixate on the moments when the people we love mistreat us, rather than on the many ways they show they care for us.

We can learn to overcome this, and to build a collective prefrontal cortex in order to get what we all actually want out of life. It's simple, but sadly it's not easy.

First, let's look at how modern LLMs work, and how Google's paper taught them to work. A transformer does not treat every word equally. It looks across the whole context and decides what matters most. Some tokens get high weight. Others fade into the background. But sadly, its definition of what matters is not always what really matters.
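For the technically curious, here is a minimal sketch of that weighting step, the scaled dot-product attention from the 2017 paper, written in plain Python. The token list and the two-dimensional vectors are invented for illustration; a real model learns its vectors and uses far more dimensions.

```python
import math

def softmax(scores):
    """Turn raw scores into positive weights that sum to 1."""
    peak = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def attention_weights(query, keys):
    """Scaled dot-product attention weights for one query over all keys."""
    scale = math.sqrt(len(query))
    scores = [sum(q * k for q, k in zip(query, key)) / scale for key in keys]
    return softmax(scores)

# Toy vectors, invented for illustration only.
tokens = ["the", "storm", "quietly"]
keys = [[0.1, 0.0], [2.0, 1.5], [0.3, 0.2]]
query = [1.0, 1.0]

weights = attention_weights(query, keys)
for token, w in zip(tokens, weights):
    print(f"{token:8s} {w:.2f}")
```

With these toy numbers, "storm" takes most of the weight and the other two tokens fade into the background. Nothing in the arithmetic asks whether "storm" deserves it.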

Meaning emerges from what the system attends to.

That is also how politics, media, culture, and power work now.

The thing that gets attention gets processed. The thing that gets processed gets remembered. The thing that gets remembered gets weighted. The thing that gets weighted starts shaping decisions.

Attention is not truth. It is not morality. It is not wisdom.

Attention is weighting.

And whoever controls the weights controls the first draft of reality.

Trump is a high-weight token

Trump understood this better than almost any modern politician.

Traditional politicians often move sequentially. They make arguments. They release policy papers. They speak in polished paragraphs. They assume voters are following point A to point B to point C.

Trump did something different.

He became the token every other token had to attend to.

Supporters attended to him. Opponents attended to him. News networks attended to him. Courts attended to him. Donors, comedians, late-night hosts, social media platforms, rivals, voters, and entire institutions kept computing around him.

That is power.

He didn’t need everyone to like him. He needed centrality.

In a transformer, the highest-weight token does not have to be beautiful, good, accurate, or noble. It just has to be useful for prediction. In public life, the same rule applies. The person everyone must react to becomes structurally powerful, even when half the attention is disgust.

Bad attention still trains the system.

That is the trap his opponents often missed. Every rage-share, every quote-tweet, every cable news panel, every “can you believe he said this?” kept him inside the national context window.

The model doesn’t ask, “Was this token loved?”

It asks, “Was this token useful?”

Trump turned himself into the token without which the whole sequence seemed harder to interpret.

Attention behaves like gravity

Attention behaves like gravity.

In general relativity, mass tells spacetime how to curve, and spacetime tells mass how to move.

In our media ecosystem, outrage tells attention how to curve, and attention tells culture how to move.

When a person, scandal, slogan, or movement becomes dense enough, it turns into a cultural black hole. It does not matter whether the light entering it is praise, fact-checking, mockery, or hatred. It all gets trapped in the accretion disk.

You don’t debate a gravity well.

You orbit it.

That is why bad attention is still attention. Moral disgust does not cancel the pull. In fact, once a signal reaches the amygdala, its pull, shaped by what each person perceives as their most prominent flaws, far outweighs any possible pushback. Anything that triggers the memory of a past traumatic event often does just that.

Association with past traumas, and with perceived personal weaknesses, amplifies it further. The system does not care about the moral valence of the mass. It calculates the force. And in doing so, it can manifest the outcome we least want. Dialectical Behavior Therapy techniques such as mindfulness and radical acceptance are some of the best plates of armor one can build against this weapon.

[Link to XKCD Comic About The Amount of Energy in Uranium Here] - XKCD is the first blog that proved to the western masses that the best ideas can consistently be communicated with fewer than V stick figure illustrations.

This is why Trump’s media strategy worked. He did not simply seek approval. He sought gravitational dominance.

Love him or hate him, he occupied space in people’s heads.

And in the attention economy, occupying space is half the war.

Multi-head attention explains political identity

The transformer paper introduced multi-head attention, where the model looks at the same sequence through several different lenses at once.

One attention head might track grammar. Another tracks long-range dependency. Another tracks emotional salience. Another tracks subject-object relationships.
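A rough sketch of the idea, assuming the simplest possible version in which each head only sees its own slice of the vector. Real transformers give every head its own learned projections, so treat the slicing, the function name multi_head_weights, and the toy vectors as my assumptions rather than the paper's implementation.

```python
import math

def softmax(scores):
    """Turn raw scores into positive weights that sum to 1."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def multi_head_weights(query, keys, num_heads):
    """Split each vector into num_heads equal slices and attend per slice."""
    head_dim = len(query) // num_heads
    per_head = []
    for h in range(num_heads):
        lo, hi = h * head_dim, (h + 1) * head_dim
        q = query[lo:hi]
        scores = [
            sum(a * b for a, b in zip(q, key[lo:hi])) / math.sqrt(head_dim)
            for key in keys
        ]
        per_head.append(softmax(scores))
    return per_head

# Invented 4-dimensional vectors: dims 0-1 feed head 0, dims 2-3 feed head 1.
keys = [
    [3.0, 3.0, 0.0, 0.0],  # token A: strong on the first slice only
    [0.0, 0.0, 3.0, 3.0],  # token B: strong on the second slice only
]
query = [1.0, 1.0, 1.0, 1.0]

heads = multi_head_weights(query, keys, num_heads=2)
```

Head 0 puts nearly all its weight on token A, while head 1 puts nearly all of it on token B. The sequence is the same, but each head reads it differently.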

Humans do the same thing socially.

One person may attend to Trump as a businessman. Another sees an entertainer. Another sees a fighter. Another sees a villain. Another sees a meme. Another sees revenge. Another sees danger. Another sees protection. Another sees chaos. Another sees freedom from polite institutional language.

Same token, different heads.

This is why fact-checking often fails politically.

One head hears the factual claim. Another hears dominance. Another hears humor. Another hears tribal loyalty. Another hears, “He annoys the people I dislike.” Another hears, “He says what others are afraid to say.”

The content is not just the sentence.

It is the emotional payload, the status signal, the tribe marker, the performance, and the aesthetic.

A political figure becomes powerful when different attention heads can project different meanings onto the same object.

Trump was not one message.

He was a many-headed token.

Positional encoding is timing

Transformers need positional encoding because order matters.

“Dog bites man” is not “man bites dog.”

The same words mean different things depending on where they sit in the sequence.
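The 2017 paper handles this by adding a position signal built from sines and cosines to each word vector. Here is a small sketch of that formula; the four-dimensional word vectors and the embed helper are invented for illustration.

```python
import math

def positional_encoding(position, d_model):
    """Sinusoidal encoding: sine on even dimensions, cosine on odd ones."""
    encoding = []
    for i in range(d_model):
        # Dimensions 2k and 2k+1 share one frequency.
        angle = position / (10000 ** ((2 * (i // 2)) / d_model))
        encoding.append(math.sin(angle) if i % 2 == 0 else math.cos(angle))
    return encoding

def embed(words, vectors, d_model):
    """Add the position signal to each word's vector."""
    return [
        [v + p for v, p in zip(vectors[w], positional_encoding(pos, d_model))]
        for pos, w in enumerate(words)
    ]

# Invented word vectors, for illustration only.
vectors = {
    "dog": [1.0, 0.0, 0.0, 0.0],
    "bites": [0.0, 1.0, 0.0, 0.0],
    "man": [0.0, 0.0, 1.0, 0.0],
}

a = embed(["dog", "bites", "man"], vectors, 4)
b = embed(["man", "bites", "dog"], vectors, 4)
print(a == b)  # the two sentences no longer look the same to the model
```

Without the position signal, the two sentences would be the same bag of three vectors. With it, "dog" in first position and "dog" in third position are different inputs.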

Politics works the same way.

A scandal before breakfast is not the same as a scandal after a debate. A joke in 2015 lands differently after COVID, inflation, January 6, wars, AI disruption, and institutional distrust.

Timing changes meaning.

Trump’s gift was not only grabbing attention. It was timing attention. He knew when to provoke, when to insult, when to perform victimhood, when to flood the zone, and when to force everyone else to respond.

He treated the national timeline like a stage.

That is positional encoding in politics.

The same statement can be ignored, explosive, funny, dangerous, or historic depending on when it lands.

Social media is a giant attention layer, with the internet as its current-day foundation and something akin to the metaverse as one plausible future successor (if we reach a point where we decide that is the best way to spend our attention).

Social media does not mainly sort the world by truth, beauty, civic value, or wisdom.

It sorts by predicted engagement.

The platform is always asking one question.

What should this user attend to next?

That question sounds harmless. It is not.

Because when attention becomes the main routing system, the world begins to reorganize around whatever captures it most efficiently.

Outrage captures it. Fear captures it. Sex captures it. Tribal threat captures it. Humiliation captures it. Conflict captures it. Status competition captures it.

So the system learns.

Not because it is evil in a comic-book way. Because it is optimizing for the wrong target.

Social media is like a casino dealer that knows exactly which card will keep you seated. It does not need to make you happy. It only needs to keep you playing.

And once the feed learns that your amygdala is easier to hijack than your judgment, it keeps serving the amygdala.

The cultural brain is being rewired

We can map multi-head attention and positional encoding onto neuroplasticity.

Neurons that fire together, wire together. So do people, and so do the systems built from groups of people working together as one unit, one team, one family, and optimistically one day, as one species.

Every time an algorithm serves a highly weighted piece of outrage bait, it forces a collective action potential across the population. The same pathways keep firing. The same tribal circuits get reinforced. The same fear-based reactions become faster, smoother, and more automatic.

We are building deep neural pathways in the cultural mind.

Our societal multi-head attention has overtrained the threat-detection head and the tribal-signaling head. Those pathways now have thick myelin. They fire instantly.

Meanwhile, the nuance head, the patience head, the empathy head, and the long-term-planning head are being pruned from disuse.

That is how a society becomes reactive.

Not all at once. Not through one law or one villain. It happens through repeated weighting.

Attention becomes habit. Habit becomes architecture. Architecture becomes destiny.

The behavioral sink problem

This is where the analogy gets darker.

Social media is our cognitive version of Calhoun’s Universe 25, the famous mouse experiment where animals had plenty of food and water but collapsed under pathological social density.

The mice did not die from starvation.

They broke down from social friction.

Hyper-aggressive males emerged. Others withdrew completely. Social roles collapsed. The system became too crowded, too stimulated, too unstable.

Now look at us.

We live in infinite informational density. We are exposed to the suffering, rage, beauty, stupidity, grief, war, luxury, humiliation, and success of billions of people every day.

No nervous system was built for that. We have infinite agency, but the decision maker is a procrastinating monkey that can only be reined in by a panic monster.

[Link to WaitButWhy blog post about procrastinators being controlled by a monkey and a panic monster] (Only one panic monster is shown there, but there are actually several different panic monsters we can use, as Tim Urban, the inspiration for my blog, found on his own while racing to publish his book by the time he became a dad. There are also some effective bypass cheat codes that I covered on my blog earlier.)

The result is cultural overactivation. Some people become hyper-reactive tribal combatants.

Others withdraw into private algorithmic caves, grooming their identities and avoiding the social world.

The beautiful ones now have ring lights.

The aggressive ones have podcasts.

And the feed keeps pumping.

Ideas become real when attention stacks

Things do not happen simply because they are true.

They happen when enough attention aligns.

A policy becomes real when voters, donors, journalists, staffers, courts, agencies, lobbyists, and activists all attend to it at once.

A startup becomes real when customers, investors, engineers, press, and early believers assign weight to the same possibility.

A scientific idea becomes real when funders, reviewers, journals, clinicians, patients, and industry all start treating it as worth building around.

This is almost quantum.

Before attention collapses around them, policies, startups, movements, and scientific ideas exist in superposition. They are possibilities. They float.

Then enough people look at the same time.

The wave function collapses.

The thing becomes real.

Attention is the routing system for resources.

Money follows attention. Talent follows attention. Institutions follow attention. Legitimacy follows attention. Law follows attention. Memory follows attention.

Not always fairly. Not always wisely. But predictably.

Attention hacking is power

The dark lesson is that attention can be gamed.

In LLMs, attention helps the model identify relevance.

In society, attention becomes dangerous because relevance can be manufactured.

A person can become important by becoming impossible to ignore. A bad idea can become powerful by becoming endlessly arguable. A lie can win partial victory by forcing everyone to repeat its shape.

That is the central political trick.

You don’t need people to believe the whole thing. You just need them to rehearse it. You need them to repeat the frame, argue inside the frame, emotionally react to the frame, and drag the frame into every room.

At that point, you have already won something.

You have set the context.

Attention is tempo

There is another useful analogy from gaming.

Trump played politics the way tempo decks play in popular collectible card games, including Magic: The Gathering (MTG).

A tempo deck does not necessarily have the most impressive late-game plan. It wins by forcing the opponent to spend every turn responding.

You play a cheap threat. They answer it. You play another. They answer that. You keep dictating the pace until they never get to play their real strategy.

That was Trump’s political genius.

He forced opponents, journalists, institutions, and comedians to spend their mana reacting to him. They may have had better policies, better credentials, better arguments, and more institutional approval.

But they were not setting the pace.

He was.

In attention politics, the person who controls tempo controls the game.

The current system has no prefrontal cortex

The human brain does not function by letting every impulse become action.

If it did, society would collapse by lunchtime.

The prefrontal cortex inserts a gap between stimulus and response. It does not prevent the amygdala from firing. It does not censor emotion. It holds the impulse long enough to ask, “Is this actually a tiger, or is it just a moving bush?”

Our media system has no equivalent.

We have an infinitely scrolling amygdala.

No pause. No inhibition. No working memory. No future modeling. No cost for being wrong. No penalty for emotional arson.

We do not need crude censorship. That would be a lobotomy.

We need engineered friction.

We need a cultural prefrontal cortex.

Algorithmic GABA

Right now, the internet behaves like an unchecked glutamate storm.

An excitatory signal appears (outrage, fear, disgust, humiliation) and the platform opens every channel. It spreads the signal as fast as possible.

That is how you induce a cultural seizure.

The fix is not to hide everything disturbing. The fix is to build structural inhibition.

Algorithmic GABA.

When a piece of content shows a massive emotional gradient and starts spreading exponentially, the platform should apply a velocity dampener. Not deletion. Drag.

Think of control rods in a nuclear reactor.

You do not destroy the reactor. You slow the chain reaction before it melts the system.

A high-outrage post could still travel, but through a higher-friction medium. More context. Slower sharing. Prompts before reposting. Reputation overlays. Cooling periods. Routing away from impulsive amplification.
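One way to picture a velocity dampener, as a sketch rather than a description of any real platform. The function name, both thresholds, and the emotional_intensity score (imagine it coming from some hypothetical classifier, scaled 0 to 1) are all my assumptions.

```python
def drag_multiplier(shares_prev_hour, shares_this_hour, emotional_intensity,
                    growth_threshold=2.0, intensity_threshold=0.7):
    """Return a factor in (0, 1] to apply to a post's distribution rate.

    1.0 means no drag. The post is never deleted: the factor only shrinks
    toward zero as hour-over-hour growth accelerates.
    """
    if shares_prev_hour <= 0:
        return 1.0  # no baseline yet, so no basis for damping
    growth = shares_this_hour / shares_prev_hour
    if growth <= growth_threshold or emotional_intensity < intensity_threshold:
        return 1.0  # calm content, or ordinary growth: leave it alone
    # Drag scales with how far growth overshoots the threshold.
    return growth_threshold / growth

calm_viral = drag_multiplier(100, 800, emotional_intensity=0.2)
slow_outrage = drag_multiplier(100, 150, emotional_intensity=0.9)
fast_outrage = drag_multiplier(100, 800, emotional_intensity=0.9)
```

Only the third case gets slowed. A cat video going viral and an angry post spreading at a normal pace both pass through untouched, which is the difference between drag and deletion.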

The goal is not silence.

The goal is time.

A functioning brain needs milliseconds of inhibition to avoid disaster. A functioning society needs the same thing at scale.

The algorithmic ACC

The anterior cingulate cortex, or ACC, acts like the brain’s conflict detector.

When two incompatible impulses compete, the ACC flags the conflict. It does not choose the final action. It says, “There is a routing problem here. Allocate more control.”

Social media needs an algorithmic ACC.

When two massive tribal clusters engage with the same event but produce opposite semantic interpretations, platforms currently treat that as profitable engagement. They accelerate both sides.

That is insane.

An algorithmic ACC would recognize high-variance conflict and move the topic out of the fast-path feed into an evaluation layer.

Not censorship.

Triage.

The system would say, “This topic is structurally unstable. Slow the spread. Add context. Surface cross-factional agreement. Track claims. Route carefully.”
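As a toy version of that conflict detector, assume each user's reaction to a topic has been reduced to a stance score between -1 and 1 by some hypothetical classifier. The function name, the cluster-size floor, and the gap threshold are illustrative assumptions, not tuned values.

```python
from statistics import mean

def route_topic(stances_a, stances_b, min_engaged=50, gap_threshold=1.0):
    """Send a topic to the evaluation layer when two large clusters split.

    stances_a, stances_b: per-user stance scores in [-1, 1], one list per
    cluster. Small clusters are ignored so that a handful of accounts
    cannot trigger triage on their own.
    """
    if len(stances_a) < min_engaged or len(stances_b) < min_engaged:
        return "fast_path"
    gap = abs(mean(stances_a) - mean(stances_b))
    return "evaluation_layer" if gap >= gap_threshold else "fast_path"

split_topic = route_topic([0.8] * 60, [-0.7] * 60)
shared_topic = route_topic([0.8] * 60, [0.6] * 60)
```

The detector does not decide who is right. It only notices that two big groups are reading the same event in opposite directions and asks for more control.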

Right now, platforms treat conflict like fuel.

A healthier system would treat conflict like heat.

Useful in controlled amounts. Dangerous when uncontained.

Lateral inhibition and shared reality - The key to returning to our Premodern Strengths as a Species, a Culture, and a People.

Once the brain detects conflict, it needs a way to resolve action selection. The more time you can buy to engage this process, the more subconscious and supraconscious algorithms you can use to review and proofread your candidate actions, to improve them, and to help you achieve the goal that would actually make you truly happy, fulfilled, and excited. The alternative is escape, like a book you disappear into to avoid a terrifying obstacle that you should instead work to overcome. (I binge-read all of the Harry Potter books over 36 sleepless hours, 4 days before I took my USMLE Step 1. I did so despite being more of a sci-fi guy, just because I wanted to escape into a world that does not involve answering so many tedious questions in a row in one sitting, a great filter that selects for people who have enough dopamine to spend 36 sleepless hours on call.) True story, and I would not recommend it.

But I would absolutely recommend spending 3 hours in a dark movie theater watching whatever movie catches your eye the afternoon before any major exam, obstacle, or interview, then setting aside 2 hours to review all your material one last time, and then allocating 10 full hours to lying awake in bed anxiously thinking about how everything you just read told you that literally everything in your body works better after 8 hours of deep, restful sleep. Okay, not that last part. But a definite thumbs up to a 3-hour movie, if done with a pair of headphones in your pocket. Because if you were really so exhausted from studying that sleep was essential, you would have fallen asleep during the movie itself as soon as you put the headphones on to catch a quick doze (ideally with a note taped to your shirt kindly asking whoever finds your comatose body to leave you alone to sleep for at least another 2 hours).

One mechanism is lateral inhibition. As one neural pathway wins, it suppresses competing pathways. This helps the system settle instead of firing everything at once.

Social media has almost no lateral inhibition.

Every impulse fires forever. Every claim survives as content. Every correction becomes more content. Every outrage generates counter-outrage. The system never cools.

It is like a spin glass permanently pumped with heat.

A healthier platform would reward verified consensus across historically opposed groups. If a Community Note, Pol.is statement, or prediction-market-backed claim earns agreement from people who usually disagree, the system should boost that bridge and suppress nearby low-quality noise.
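A deliberately simple sketch of bridging-based scoring: a statement's score is its lowest approval rate across groups, so it can only score high if every group rates it well. This is my own simplification for illustration, not a description of how Community Notes or Pol.is actually compute their rankings, and the group labels and votes are invented.

```python
def bridging_score(ratings_by_group):
    """Return the lowest approval rate among groups (0.0 to 1.0).

    ratings_by_group maps a group label to a list of votes,
    where 1 means "helpful" and 0 means "not helpful".
    """
    rates = []
    for votes in ratings_by_group.values():
        if not votes:
            return 0.0  # a group that has not weighed in cannot be bridged
        rates.append(sum(votes) / len(votes))
    return min(rates) if rates else 0.0

# Invented ratings for two statements.
partisan = {"group_a": [1, 1, 1, 1], "group_b": [0, 0, 0, 1]}
bridge = {"group_a": [1, 1, 0, 1], "group_b": [1, 0, 1, 1]}

print(bridging_score(partisan), bridging_score(bridge))
```

The partisan statement has more total approval from its own side, but the bridge wins because neither group rejects it. Taking the minimum is what stops one tribe from voting its own content to the top.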

Cross-factional agreement is rare and valuable.

It is the civic equivalent of finding a stable molecule in a chemical storm.

The vmPFC as a truth exchange

The ventromedial prefrontal cortex helps the brain compare unlike things.

Do I eat the donut now, or protect my A1C later?

Those are different categories. Taste, memory, glucose, identity, future health, comfort, shame, pleasure. The vmPFC converts them into a shared internal currency of expected value.

Culture needs that too.

Right now, opposing political groups often cannot even trade in the same currency. One side argues harm. Another argues freedom. One argues fairness. Another argues order. One argues identity. Another argues efficiency.

They are not merely disagreeing. They are pricing reality in different currencies.

Prediction markets offer one possible bridge.

You do not ask every faction to agree on moral meaning. You ask people to stake visibility, credibility, or money on falsifiable outcomes.

Will this policy reduce crime?

Will this claim hold up?

Will this prediction happen by a certain date?

Will this alleged scandal be verified?

Will this economic claim survive contact with data?

Reality eventually resolves the bet. That creates a shared pressure system. Not perfect. But better than endless performance.

Make being wrong expensive

The amygdala lives in the present.

The prefrontal cortex remembers the past and models the future.

Our current media system has engineered amnesia.

A pundit can predict catastrophe, be wrong, and return the next day with the same reach.

An influencer can spread false outrage, delete the post, and keep growing. A politician can flood the zone with nonsense and suffer no attention penalty.

There is no metabolic cost to being wrong.

That has to change.

If someone broadcasts claims to millions of people, their future algorithmic weight should be shaped by their track record.

Not by whether they are polite. Not by whether they are establishment-approved. Not by whether they are popular.

By whether they repeatedly map reality well.

Think of it like a credit score for public claims. Or like a supply-chain audit for information. If a factory keeps shipping poisoned food, it does not get the same distribution privileges. If an account keeps shipping poisoned claims, it should not get the same routing weight.
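To make the credit-score idea concrete, here is a sketch that uses the Brier score, a standard way to grade probability forecasts, and maps it to a routing weight. The mapping, the minimum number of claims, and the neutral default weight are my assumptions, and the forecasts are invented.

```python
def brier_score(forecasts):
    """Mean squared gap between stated probability and outcome (0 is perfect).

    forecasts: list of (probability, outcome) pairs, with outcome 1 or 0.
    """
    return sum((p - outcome) ** 2 for p, outcome in forecasts) / len(forecasts)

def routing_weight(forecasts, min_claims=5, default_weight=0.5):
    """Map a track record to a weight in [0, 1].

    Always guessing 50% earns a Brier score of 0.25, so that level is
    treated as zero skill. Accounts with too few resolved claims get a
    neutral default instead of a verdict.
    """
    if len(forecasts) < min_claims:
        return default_weight
    return max(0.0, 1.0 - brier_score(forecasts) / 0.25)

# Invented records: (stated probability, what actually happened).
careful = [(0.9, 1), (0.8, 1), (0.2, 0), (0.7, 1), (0.1, 0)]
alarmist = [(0.95, 0)] * 5

careful_weight = routing_weight(careful)
alarmist_weight = routing_weight(alarmist)
```

The careful forecaster keeps most of their reach, while the account that predicted five catastrophes at 95% confidence and missed all five drops to zero. Politeness and popularity never enter the calculation.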

Attention should be earned, not endlessly farmed.

Cross-factional synthesis

The current objective function of social media is gradient descent into tribalism.

The algorithm finds the steepest path to in-group validation and out-group hatred.

A better system would optimize for synthesis.

Community Notes on X and Pol.is in Taiwan point toward this. Their most interesting feature is not that they add context. It is that they can reward statements that earn trust across groups that usually disagree.

That is executive function.

The system detects bridges between isolated neural clusters.

A society does not become sane because everyone agrees. It becomes sane when disagreement can be routed through shared reality instead of endless tribal hallucination.

The best content layer would not ask, “What enrages one side most?”

It would ask, “What can opposing sides both recognize as real?”

That is a much better target.

A PID controller for culture

Right now, society behaves like a bad proportional controller.

It reacts wildly to the present error.

Crime rises, overcorrect. Markets crash, overcorrect. A scandal breaks, overcorrect. A new technology appears, panic. A new moral panic appears, panic harder.

A mature system would behave more like a PID controller.

It would look at the present signal, the accumulated past, and the future trajectory.

Where are we now? What pattern led us here? Where is this going if nothing changes?

That is what a cultural prefrontal cortex should do.

Not just react. Remember. Predict. Counter-steer.

A platform with this architecture would monitor not only what is trending, but where the trend is pushing the collective mood. If the system sees acceleration toward panic, violence, false certainty, or mass harassment, it applies drag before the crash.

Not after.
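For readers who have not met one, this is roughly what a PID controller looks like in code. The three terms line up with the three questions above: present error, accumulated past, and current trend. The gains and the "panic index" readings are invented for illustration, not tuned for anything.

```python
class PIDController:
    """Textbook proportional-integral-derivative controller."""

    def __init__(self, kp, ki, kd):
        self.kp, self.ki, self.kd = kp, ki, kd
        self.integral = 0.0
        self.previous_error = None

    def update(self, error, dt=1.0):
        """Return a correction for the current error reading."""
        self.integral += error * dt  # the accumulated past
        if self.previous_error is None:
            derivative = 0.0  # no trend is available on the first reading
        else:
            derivative = (error - self.previous_error) / dt  # the trajectory
        self.previous_error = error
        return self.kp * error + self.ki * self.integral + self.kd * derivative

# Hypothetical readings: how far a panic index sits above its target.
errors = [0.1, 0.2, 0.4]

pid = PIDController(kp=1.0, ki=0.1, kd=0.5)
drag = [pid.update(e) for e in errors]
```

A purely proportional controller would answer the last reading with a correction of 0.4. The PID version pushes harder because it also sees that the error has been piling up and is accelerating, so the drag arrives before the crash.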

Don’t beg incumbents to fix it

Meta and X own much of the user data and routing logic.

They are structurally bad candidates for fixing the problem because slowing transmission harms ad revenue. Asking them to voluntarily install algorithmic GABA is like asking a tapeworm to prescribe the dewormer.

So the better path is architectural.

You either modify the host organism from the outside, or you break its monopoly on the algorithm.

The retrovirus strategy

A retrovirus does not build its own cell.

It slips into the host and hijacks the machinery.

That is one way to bootstrap a cultural prefrontal cortex.

Build an epistemic ledger as an overlay, a browser extension, decentralized protocol bridge, or feed layer that sits on top of incumbent platforms.

When a pundit posts a highly charged claim, the overlay shows their historical accuracy score, related prediction markets, claim history, and cross-factional consensus signals directly in the visual field.

The host platform still pays to distribute the content.

But the overlay changes the attention weight.

You hijack the routing without needing to own the whole network.

That is how you begin.

Break the cultural default mode network

Meta and X are the default mode network of culture.

They are the loops we fall back into when we are bored, anxious, angry, lonely, or ruminating.

In the brain, the default mode network can become rigid. Psychedelic research suggests that one possible therapeutic mechanism is temporarily loosening those rigid pathways, allowing new routes between brain networks.

The software equivalent is federated architecture.

Protocols like AT Protocol matter because they separate the data layer from the algorithm layer. You no longer need to build a billion-dollar platform to compete with the feed. You can build a better routing system on top of portable identity and social graphs.

That is huge.

It means people can unplug from the incumbent attention machine without losing their entire social world.

The future is not one perfect platform.

It is many competing algorithms, with users able to choose how their attention gets routed.
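In code terms, the shift looks something like the sketch below: the posts are shared data, and the ranking function is a swappable part that the user picks. This is a generic illustration, not the AT Protocol API; the post fields, the scores, and the feed names are all hypothetical.

```python
def engagement_feed(posts):
    """Rank by predicted engagement, the incumbent default."""
    return sorted(posts, key=lambda p: p["engagement"], reverse=True)

def bridging_feed(posts):
    """Rank by cross-group agreement instead."""
    return sorted(posts, key=lambda p: p["bridging"], reverse=True)

# Registry of available algorithms; anyone could add another.
FEEDS = {"engagement": engagement_feed, "bridging": bridging_feed}

def build_timeline(posts, feed_name):
    """Apply the user's chosen ranking to the same underlying posts."""
    return [p["id"] for p in FEEDS[feed_name](posts)]

# Invented posts with hypothetical scores.
posts = [
    {"id": "rage_bait", "engagement": 0.9, "bridging": 0.1},
    {"id": "shared_fact", "engagement": 0.4, "bridging": 0.8},
]

default_timeline = build_timeline(posts, "engagement")
chosen_timeline = build_timeline(posts, "bridging")
```

The posts and the social graph stay the same in both timelines. The only thing that changes is who decides what comes first.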

Public financing and attention democracy

There is also a political version of this.

You do not fix a captured political system by asking the captured class to become noble.

That is fantasy.

If money buys attention, and attention buys viability, then privately funded politics will keep producing candidates optimized for donor gravity.

A more radical model would tax automation and use some of that money to fund universal basic income and fully public campaigns.

Ban PACs. Cap individual and corporate contributions at a tiny amount. Make candidates run on public funds, AI tools, small teams, volunteers, and ideas that actually inspire people.

Trump proved something important here.

He showed that a politician can be outspent, outstaffed, hated by major institutions, and still dominate through attention. That proves the old consulting-heavy campaign model is more fragile than it looks.

But he also showed the danger.

Negative attention can be abused by someone with no interest in civic health.

So the answer is not to reject attention politics. That will not work.

The answer is to make the attention layer healthier, more accountable, and harder to hijack.

The real objective function

The core problem is not that humans are stupid.

It is that our collective attention is being routed through a primitive objective function.

Engagement is not wisdom.

Virality is not legitimacy.

Familiarity is not truth.

Centrality is not leadership.

Outrage is not importance.

Right now, we are a giant biological LLM optimizing for the wrong target.

We have built a machine that treats attention as the highest good, then acted surprised when the loudest, angriest, most emotionally manipulative signals win.

A better society would train better weights.

Who gets repeated? What gets covered? Which problems get premium mental real estate? Which people become unavoidable? Which ideas are allowed to remain marginal? Which claims get slowed down before they infect the whole system?

Attention is like sunlight in a garden. Whatever receives it grows.

But it is also like oxygen in a fire. Feed the wrong thing and it spreads.

So the real lesson is not “ignore everything.”

The lesson is to stop giving premium attention to every explosion.

Build friction. Build memory. Build reputation. Build cross-factional synthesis. Build algorithmic inhibition. Build markets that punish false certainty. Build protocols that let people choose better feeds. Build a cultural prefrontal cortex.

Because attention is not all we need.

But without attention, almost nothing happens.