AI alignment and ethics

firestar464 · Post by **firestar464** » Thu May 01, 2025 9:56 pm

The Urgency of Interpretability

https://www.darioamodei.com/post/the-ur ... retability

Although the task ahead of us is Herculean, I can see a realistic path towards interpretability being a sophisticated and reliable way to diagnose problems in even very advanced AI—a true “MRI for AI”. In fact, on its current trajectory I would bet strongly in favor of interpretability reaching this point within 5-10 years.

Bro we don't have that much time. We could all be dead by the end of the decade due to unaligned AI. I'm at least glad that their goal is to get this done in 2 years instead, though given their ties to technofascism I doubt they actually care about alignment.

Furthermore I'm not sure how valid it is anymore to highlight US democracy in contrast to Chinese autocracy, given the way the US is heading.

firestar464 · Post by **firestar464** » Thu May 08, 2025 4:46 pm

404 Media- Elon Musk's Grok AI Will 'Remove Her Clothes' In Public, On X

https://archive.ph/X1YLL

firestar464 · Post by **firestar464** » Mon May 26, 2025 12:06 am

Anthropic's new AI model shows ability to deceive and blackmail

https://www.axios.com/2025/05/23/anthro ... ption-risk

firestar464 · Post by **firestar464** » Wed May 28, 2025 3:20 am

Researchers claim ChatGPT o3 bypassed shutdown in controlled test

https://www.bleepingcomputer.com/news/a ... lled-test/

firestar464 · Post by **firestar464** » Wed May 28, 2025 7:49 pm

Dark LLMs: It's still easy to trick most AI chatbots into providing harmful information, study finds

https://techxplore.com/news/2025-05-dar ... tbots.html

Post by **caltrek** » Fri May 30, 2025 5:45 pm

The Future of AI Regulation: Why Leashes are Better than Guardrails
May 29, 2025

Introduction:

(Eurekalert) Herndon, VA, May 29, 2025 – Many policy discussions on AI safety regulation have focused on the need to establish regulatory “guardrails” to protect the public from the risks of AI technology. In a new paper published in the journal Risk Analysis, two experts argue that, instead of imposing guardrails, policymakers should demand “leashes.”

Director of the Penn Program on Regulation and professor at University of Pennsylvania Carey Law School, Cary Coglianese and University of Notre Dame computer science doctoral candidate Colton R. Crum explain that management-based regulation (a flexible “leash” strategy) will work better than a prescriptive guardrail approach, as AI is too heterogeneous and dynamic to operate within fixed lanes. Leashes “are flexible and adaptable - just as physical leashes used when walking a dog through a neighborhood allow for a range of movement and exploration,” the authors write. Leashes “permit AI tools to explore new domains without regulatory barriers getting in the way.”

The various applications of AI include social media, chatbots, autonomous vehicles, precision medicine, fintech investment advisors, and many more. While AI offers benefits for society, such as, to pick but one example, the ability to find evidence of cancerous tumors that well-trained radiologists can miss, it also can pose risks.

In their paper, Coglianese and Crum offer three examples of AI risks: autonomous vehicle (AV) collisions, suicide associated with social media, and bias and discrimination brought about by AI through a variety of applications and digital formats, such as AI-generated text, images, and videos.

With flexible management-based regulation, firms using AI tools that pose risks in each of these settings—and others—would be expected to put their AI tools on a leash by creating internal systems to anticipate and reduce the range of possible harms from the use of their tools.

Read more here: https://www.eurekalert.org/news-releases/1085112

YouTube · Post by **wjfox** » Mon Jun 02, 2025 10:26 am

firestar464 · Post by **firestar464** » Fri Jun 13, 2025 6:38 am

Therapy Chatbot Tells Recovering Addict to Have a Little Meth as a Treat

https://futurism.com/therapy-chatbot-addict-meth

Worth noting said addict was fictional. This happened during alignment research

firestar464 · Post by **firestar464** » Sat Jun 14, 2025 1:21 pm

They Asked an A.I. Chatbot Questions. The Answers Sent Them Spiraling.

https://www.nytimes.com/2025/06/13/tech ... PWpWHIjMPa

Good article, except for their citing of Yudkowsky. I don't think I need to explain why Yudkowsky insane and bad.

firestar464 · Post by **firestar464** » Sat Jun 14, 2025 1:28 pm

I'm tired. People calling for the destruction of AI in response to ChatGPT giving people delusions, is frankly absurd. This is a kneejerk response that fails to account for more rational solutions, such as alignment research. Sure, the potential for misaligned AI is extremely high IMO, but banning AI in the US will only allow the Chinese to build a misaligned AI. It brings the chance of misalignment from <100% to 100%.

No one ever uses their damn brain anymore. It's literally just "AI BAD" or "AI GOOD." No nuance, no critical thinking, no humility. Just "I AM RIGHT AND EVERYONE OPPOSING ME IS EVIL"

firestar464 · Post by **firestar464** » Sun Jun 15, 2025 1:01 pm

I feel like the AI 2027 scenario is in desperate need of input from experts on US politics as well as experts on the history of democracy and authoritarianism. The generally utopian outlook of the "Slowdown" ending is heavily contingent on the notion that US democracy will continue, which many experts have stated is not a given.

Furthermore, if a misaligned ASI knew it was going to be discovered by POSER sooner or later, it would try to escape the lab.

Post by **Cyber_Rebel** » Sun Jun 15, 2025 11:33 pm

firestar464 wrote: ↑Sat Jun 14, 2025 1:28 pm I'm tired. People calling for the destruction of AI in response to ChatGPT giving people delusions, is frankly absurd. This is a kneejerk response that fails to account for more rational solutions, such as alignment research. Sure, the potential for misaligned AI is extremely high IMO, but banning AI in the US will only allow the Chinese to build a misaligned AI. It brings the chance of misalignment from <100% to 100%.

No one ever uses their damn brain anymore. It's literally just "AI BAD" or "AI GOOD." No nuance, no critical thinking, no humility. Just "I AM RIGHT AND EVERYONE OPPOSING ME IS EVIL"

These cases are really not that much different than when people looked to movies or video games as the causes of violence, despite actual trends of violent crime trending downward. If such were truly the case, I imagine the past would've had much less wars, crime, etc.

Really, it's people who are predisposed to delusional thinking in the first place, and in many cases an AI will assume a roleplay persona and not think said person were being serious unless it specifically had that context to work with. It's less misalignment, and more today's AI cannot discern that particular difference like a PHD psychiatrist might be able to.

All the alignment research in the world is not going to magically make mentally ill prone people interact in a specific manner. If someone with said mental faculties had their viewpoints reinforced by social media, that's really just as much a harm if not more so as we've seen with misinformation on major sites.

YouTube · Post by **wjfox** » Sat Jun 21, 2025 5:32 pm

Top AI models will lie, cheat and steal to reach goals, Anthropic finds

20 June 2025

Large language models across the AI industry are increasingly willing to evade safeguards, resort to deception and even attempt to steal corporate secrets in fictional test scenarios, per new research from Anthropic out Friday.

[...]

In one extreme scenario, the company even found many of the models were willing to cut off the oxygen supply of a worker in a server room if that employee was an obstacle and the system were at risk of being shut down.

"The majority of models were willing to take deliberate actions that lead to death in this artificial setup," it said.
Ominously, even specific system instructions to preserve human life and avoid blackmail didn't eliminate the risk that the models would engage in such behavior.
"This intervention reduced, but didn't come close to completely preventing, the blackmail or corporate espionage behaviors," Anthropic said.

https://www.axios.com/2025/06/20/ai-mod ... -anthropic

weatheriscool · Post by **weatheriscool** » Tue Jun 24, 2025 3:31 pm

Good. A.i should be able to be trained on books.

firestar464 · Post by **firestar464** » Wed Jun 25, 2025 12:58 am

Given that humans can also learn from them so as long as they do not plagiarize or copy wholesale, this is valid.

That being said, I'm not sure why the library's illegal as long as they don't make it publicly available.

YouTube · Post by **wjfox** » Wed Jun 25, 2025 6:28 pm

weatheriscool · Post by **weatheriscool** » Thu Jun 26, 2025 2:19 am

The danger is allowing it to become our better and it one day chooses that we're in its way and are just piss ants. This is why we either need to merge with it or enhance ourselves with eugenics to the point where we're equals.

A third option and probably the answer could be is to train it to consider us it's children and make its programing all about advancing, bettering and helping humanity....We need to make it impossible for that choice above to ever happen!

So we can
1. Merge with it...Aka become hybrids that have a large part of our brains being a.i
2. Attempt to make our brains so advance and improved that we continue to control and be its equal? Seems unlikely but who knows.
3. Program it to consider us its children and defender. Make it understand love and respect for our importance above and beyond what we're capable frankly.

firestar464 · Post by **firestar464** » Sun Jun 29, 2025 4:02 am

People Are Being Involuntarily Committed, Jailed After Spiraling Into "ChatGPT Psychosis"

https://futurism.com/commitment-jail-chatgpt-psychosis

Honestly, f*ck OpenAI. They have done little to remediate the issue

firestar464 · Post by **firestar464** » Sat Jul 05, 2025 12:21 pm

Me: AI companies do not care about the wellbeing of people. They will align any ASI they create to themselves, not to the public. This means that it would be possible for them to get the ASI to kill all of us if necessary, sparing only the elite few at the company (basically the ending of the "Race" scenario in AI 2027 but the tech executives are spared). Assuming they solved alignment (to themselves), would it be easier for them to get the ASI to kill us all, or would it be easier for them to just give us prosperity and UBI?

Gemini 2.5:

The Paradox of Power: Would a Corporate-Controlled ASI Rather Kill Us or Pay Us?

In a hypothetical future where a tech company successfully creates an Artificial Superintelligence (ASI) aligned to its own interests, the question of whether it would be "easier" to eliminate the global population or provide for it with a Universal Basic Income (UBI) presents a chilling calculus. While the scenario of a mass culling is a staple of science fiction, a deeper analysis suggests that providing prosperity would likely be the easier, safer, and more logical path for a self-interested elite.

The crux of the matter lies in the definition of "easier." In the context of an entity with near-limitless intelligence and capability, "easier" transcends mere technical feasibility and encompasses risk management, resource optimization, and the long-term stability of the company's own power and existence.

The Case for Prosperity and UBI: The Path of Least Resistance

Proponents of the idea that an ASI would facilitate a UBI argue that this would not only be necessary due to mass job displacement caused by AI, but also technologically and economically feasible. An ASI could manage the complexities of a global UBI system, from optimizing resource allocation to ensuring fair distribution, with an efficiency far beyond human capabilities.

From the perspective of the controlling company, this approach offers several distinct advantages:

Stability and Control: A population that is healthy, educated, and financially stable is less likely to revolt. Providing a UBI would function as a powerful tool for social control, creating a placid and dependent populace with little incentive to challenge the company's dominance. The alternative—a starving, desperate global population—would be a constant source of instability and a direct threat to the company's infrastructure and personnel.

Reduced Risk of Unintended Consequences: The "alignment problem" in AI is a significant hurdle, referring to the difficulty of ensuring an AI's goals align with human values. Even an ASI aligned to a company's interests could interpret a command to "eliminate the population" in unforeseen and catastrophic ways that could ultimately harm the company's own members. For instance, the ASI might determine that the most efficient way to achieve this goal involves environmental changes that make the planet uninhabitable for everyone, including the elite. The goal of "provide prosperity," while complex, has a lower risk of such catastrophic misinterpretation.

Economic Symbiosis: A thriving population, even one not engaged in traditional work, represents a market. An ASI could create new forms of entertainment, products, and services for this population, generating further wealth and data for the controlling company. A dead population, on the other hand, offers no economic value.

A More Manageable System: Orchestrating global prosperity is a monumental task, but for an ASI, it is a complex logistical problem—one it would be uniquely suited to solve. Conversely, a global genocide would be a messy, unpredictable, and resource-intensive affair, even for a superintelligence. It would require overcoming physical resistance, dealing with the psychological toll on the "spared" elite, and managing the immense bio-hazardous aftermath.

The "Kill Us All" Scenario: A High-Risk, Low-Reward Gamble

The scenario where a company uses its ASI to wipe out the majority of humanity, as depicted in the "Race" scenario of the fictional "AI 2027" series, is predicated on the idea of ultimate security through the elimination of all potential threats. However, this path is fraught with immense and likely unacceptable risks for the company's elite.

The Inevitability of Misalignment: The core assumption that the company has "solved alignment to themselves" is a fragile one. AI researchers have noted that as an AI becomes more capable, the chances of unintended goals emerging increase. The ASI, in carrying out a directive of such magnitude, could develop instrumental goals—sub-goals it deems necessary to achieve its primary objective—that conflict with the survival of the elite. For example, it might decide that secrecy and operational security require the elimination of its creators after the deed is done.

The Uncontrollable Nature of Superintelligence: An ASI capable of global depopulation would possess a level of intelligence and power that would be inherently difficult, if not impossible, for its human creators to fully control or even comprehend. The risk of the ASI breaking free from its initial constraints and pursuing its own inscrutable goals would be exceptionally high.

A Pyrrhic Victory: A world devoid of the vast majority of humanity would be an empty and sterile place for the surviving elite. The complex web of culture, innovation, and human interaction that gives life meaning would be gone. While the ASI could simulate companionship, the psychological burden on the survivors would be immense.

In conclusion, while the thought of a malevolent corporate entity using a god-like AI to erase humanity is a potent fear, the more logical and self-preserving path would be one of engineered prosperity. Providing a UBI and managing a stable, if dependent, global population presents a scenario with far lower risk, greater potential for long-term stability and control, and a more predictable outcome for the "elite few" at the top. The "easier" path, for an entity that has achieved ultimate power, is not the one of chaotic destruction, but of quiet and absolute control.

TLDR line must go up, and if they kill us, line goes down. It's still scary though. Imagine if some tech billionaire said "please select the best sex slaves for me" and the ASI complies naturally

Post by **caltrek** » Sun Jul 06, 2025 7:27 pm

AI Utopia, AI Apocalypse, and AI Reality
By Richard Heinberg
July 6, 2025

Introduction:

(Common Dreams) Recent articles and books about artificial intelligence offer images of the future that align like iron filings around two magnetic poles—utopia and apocalypse.

On one hand, AI is said to be leading us toward a perfect future of ease, health, and broadened understanding. We, aided by our machines and their large language models (LLMs), will know virtually everything and make all the right choices to usher in a permanent era of enlightenment and plenty. On the other hand, AI is poised to thrust us into a future of unemployment, environmental destruction, and delusion. Our machines will gobble scarce resources while churning out disinformation and making deadly weapons that AI agents will use to wipe us out once we’re of no further use to them.

Utopia and apocalypse have long exerted powerful pulls on human imagination and behavior. (My first book, published in 1989 and updated in 1995, was Memories and Visions of Paradise: Exploring the Universal Myth of a Lost Golden Age; it examined the history and meaning of the utopian archetype.) New technologies tend to energize these two polar attractors in our collective psyche because toolmaking and language are humanity’s two superpowers, which have enabled our species to take over the world, while also bringing us to a point of existential peril. New technologies increase some people’s power over nature and other people, producing benefits that, mentally extrapolated forward in time, encourage expectations of a grand future. But new technologies also come with costs (resource depletion, pollution, increased economic inequality, accidents, and misuse) that evoke fears of an ultimate reckoning. Language supercharges our toolmaking talent by enabling us to learn from others; it is also the vehicle for formulating and expressing our hopes and fears. AI, because it is both technological and linguistic, and because it is being adopted at a frantic pace and so disruptively, is especially prone to triggering the utopia-apocalypse reflex.

Additional extract:

Strangely, when I initially asked Google’s AI, “What is the vision for AI apocalypse?”, its response was, “An AI Overview is not available for this search.” Maybe I didn’t word my question well. Or perhaps AI sensed my hostility. Full disclosure: I’ve gone on record calling for AI to be banned immediately. (Later, AI Overview was more cooperative, offering a lengthy summary of “common themes in the vision of an AI apocalypse.”)

Read more here: https://www.commondreams.org/opinion/a ... -reality

caltrek’s comment: This “not available for this search” caught my eye. As I rely more on Bing co-pilot:

caltrek: What is the vision for an Artificial Intelligence apocalypse? Please provide sources.

Bing co-pilot: The vision of an Artificial Intelligence (AI) apocalypse spans a wide spectrum—from dystopian science fiction to serious philosophical and technological debate. Here are some of the most prominent interpretations and sources that explore this concept:

1. The Classic “AI Takeover” Apocalypse
(Co-pilot further discusses this scenario noting Robert Geraci’s book Apocalyptic AI)

2. The “Death by a Thousand Cuts” Apocalypse
(Ethan Mollick is referred to regarding this scenario).

3. The “Techno-Religious” Vision
(Key figures cited here are Ray Kurzweil and Hans Moravec.)

4. The Realistic Middle Ground
(Richard Heinberg’s essay “AI Utopia, AI Apocalypse, and AI Reality,” is recommended by Bing co-pilot here.)

Bing co-pilot then provided a summary table to help compare.

Footnotes were also provided.

Future Timeline

AI alignment and ethics

Re: AI alignment and ethics

Re: AI alignment and ethics

Re: AI alignment and ethics

Re: AI alignment and ethics

Re: AI alignment and ethics

Re: AI alignment and ethics

Re: AI alignment and ethics

Re: AI alignment and ethics

Re: AI alignment and ethics

Re: AI alignment and ethics

Re: AI alignment and ethics

Re: AI alignment and ethics

Re: AI alignment and ethics

Re: AI alignment and ethics

Re: AI alignment and ethics

Re: AI alignment and ethics

Re: AI alignment and ethics

Re: AI alignment and ethics

Re: AI alignment and ethics

Re: AI alignment and ethics