Should We Link an “AI Pause” to AI Interpretability?

To Pause or Not to Pause

You’ve probably heard about the controversial “AI pause” proposal. On March 22, more than 1,800 people, including Elon Musk of Tesla, cognitive scientist Gary Marcus and Apple co-founder Steve Wozniak, signed an open letter calling for a six-month pause on the development of AI systems “more powerful” than OpenAI’s GPT-4.

Many have argued against a pause, often citing our competition with China, or the need for continuing business competition and innovation. I suspect some just want to get to the “singularity” as soon as possible because they have a quasi-religious belief in their own pending cyber-ascendance and immortality.

Meanwhile, the pro-pausers are essentially saying, “Are you folks out of your minds? China’s just a nation. We’re talking about a superbrain that, if it gets out of hand, could wipe out humanity as well as the whole biosphere!”

This is, to put it mildly, not fertile soil for consensus.

Pausing to Talk About the Pause

Nobody knows if there’s going to be a pause yet, but people in the AI industry at least seem to be talking about setting better standards. Axios reported, “Prominent tech investor Ron Conway’s firm SV Angel will convene top staffers from AI companies in San Francisco … to discuss AI policy issues…. The meeting shows that as AI keeps getting hotter, top companies are realizing the importance of consistent public policy and shared standards to keep use of the technology responsible. Per the source, the group will discuss responsible AI, share best practices and discuss public policy frameworks and standards.”

I don’t know if that meeting actually happened or, if it did, what transpired, but at least all this talk about pauses and possible government regulation has gotten the attention of the biggest AI players.

The Idea of an Interpretative Pause

But what would be the purpose of a pause? Is it to let government regulators catch up? To cool the jets on innovations that could soon tip the world into an era of AGI?

Too soon to tell, but columnist Ezra Klein suggests a pause for a specific reason: to understand how today’s AI systems actually work.

The truth is that today’s most powerful AIs are basically highly reticular black boxes. That is, the companies that make them know how to build these large language models by having neural networks train themselves, but they don’t actually know, except at a general level, how the systems do what they do.

It’s sort of like when the Chinese invented gunpowder. They learned how to make it and what it could do, but this was long before humanity had modern atomic theory and the Periodic Table of Elements, which were needed to truly understand why things go kablooey.

Some organizations can now make very smart-seeming machines, but there’s no equivalent of a Periodic Table to help them understand exactly what’s happening at a deeper level.

A Pause-Worthy Argument

In an interview on the Hard Fork podcast, Klein riffed on a government policy approach:

[O]ne thing that would slow the [AI] systems down is to insist on interpretability….[I]f you look at the Blueprint for an AI Bill of Rights that the White House released, it says things like — and I’m paraphrasing — you deserve an explanation for a decision a machine learning algorithm has made about you. Now, in order to get that, we would need interpretability. We don’t know why machine learning algorithms make the decisions or correlations or inferences or predictions that they make. We cannot see into the box. We just get like an incomprehensible series of calculations.

Now, you’ll hear from the companies like this is really hard. And I believe it is hard. I’m not sure it is impossible. From what I can tell, it does not get anywhere near the resources inside these companies of let’s scale the model. Right? The companies are hugely bought in on scaling the model, and a couple of people are working on interpretability.

And when you regulate something, it is not necessarily on the regulator to prove that it is possible to make the thing safe. It is on the producer to prove the thing they are making is safe. And that is going to mean you need to change your product roadmap and change your allocation of resources and spend some of these billions and billions of dollars trying to figure out the way to answer the public’s concerns here. And that may well slow you down, but I think that will also make a better system. And so this is my point about the pause, that instead of saying no training of a model bigger than GPT 4, it is to say no training of a model bigger than GPT 4 that cannot answer for us these set of questions.

Pause Me Now or Pause Me Later

Klein also warns about how bad regulation could get if the AI firms get AI wrong. Assuming their first big mistake isn’t also their last (a real possibility if there’s a fast-takeoff AI), imagine what would happen if AI caused a catastrophe: “If you think the regulations will be bad now, imagine what happens when one of these systems comes out and causes, as happened with high speed algorithmic trading in 2010, a gigantic global stock market crash.”

What Is Interpretability?

But what does it actually mean to make these systems interpretable?

Interpretability is the degree to which an AI can be understood by humans without the help of a lot of extra techniques or aids. So, a model is “interpretable” if its internal workings can be understood by humans. A linear regression model, for example, is interpretable because your average egghead can fully grasp all its components and follow its logic.
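To make the linear-regression point concrete, here is a minimal Python sketch (using NumPy) that fits a regression to made-up house-price data. The data, the variable names and the “true” dollar amounts per square foot and per bedroom are all invented for illustration; the point is only that each fitted coefficient has a plain-English reading.

```python
import numpy as np

# Synthetic, made-up data: the "true" prices below are assumptions for illustration.
rng = np.random.default_rng(0)
n = 200
sqft = rng.uniform(800, 3000, n)
bedrooms = rng.integers(1, 6, n).astype(float)   # 1 to 5 bedrooms
price = 50_000 + 150 * sqft + 10_000 * bedrooms + rng.normal(0, 5_000, n)

# Ordinary least squares; the column of ones lets the model learn an intercept.
X = np.column_stack([np.ones(n), sqft, bedrooms])
coef, *_ = np.linalg.lstsq(X, price, rcond=None)
intercept, per_sqft, per_bedroom = coef

# Each coefficient reads directly as "dollars per unit of this feature,
# holding the other feature fixed". That is what makes the model interpretable.
print(f"baseline (intercept): about ${intercept:,.0f}")
print(f"each extra square foot adds about ${per_sqft:,.0f}")
print(f"each extra bedroom adds about ${per_bedroom:,.0f}")
```

The recovered coefficients land close to the made-up 150 and 10,000 used to generate the data, and anyone can check them against intuition. A neural network offers no such one-number-per-feature summary.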

But neural networks? Much tougher. They tend to have many hidden layers and millions, or even billions, of parameters.

Hopelessly Hard?

So, is interpretability hopeless when it comes to today’s AIs? Depends on who you ask. There are some people and companies committed to figuring out how to make these systems more understandable.

Connor Leahy, the CEO of Conjecture, suggests that interpretability is far from hopeless. On the Machine Learning Street Talk podcast, he discusses some approaches for how to make neural nets more interpretable.

Conjecture is, in fact, dedicated to AI alignment and interpretability research, with its homepage asserting, “Powerful language models such as GPT3 cannot currently be prevented from producing undesired outputs and complete fabrications to factual questions. Because we lack a fundamental understanding of the internal mechanisms of current models, we have few guarantees on what our models might do when encountering situations outside their training data, with potentially catastrophic results on a global scale.”

How Does Interpretability Work?

So, what are some techniques that can be used to make the neural networks more interpretable?

Visualization of Network Activations

First, there’s something called visualization of network activations, which helps us see which features the neural network is focusing on at each layer. We can look at the output of each layer; in convolutional networks, which are common in image recognition, these outputs are known as feature maps. Feature maps show us which parts of the input the neural network is responding to strongly and which parts it is largely ignoring.
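Here is a toy illustration of what a feature map is, written in Python with NumPy. It assumes a single convolutional layer with one hand-picked edge-detecting filter followed by a ReLU activation (real networks learn many filters per layer); the helper names `conv2d_valid` and `relu` are made up for this sketch. The resulting map lights up only where the image changes from dark to bright.

```python
import numpy as np

def conv2d_valid(image, kernel):
    """Slide `kernel` over `image` without padding and return the map of responses.

    Like most deep-learning libraries, this computes cross-correlation
    (the kernel is not flipped)."""
    kh, kw = kernel.shape
    out_h = image.shape[0] - kh + 1
    out_w = image.shape[1] - kw + 1
    out = np.zeros((out_h, out_w))
    for i in range(out_h):
        for j in range(out_w):
            out[i, j] = np.sum(image[i:i + kh, j:j + kw] * kernel)
    return out

def relu(x):
    """Keep positive responses, zero out the rest."""
    return np.maximum(x, 0.0)

# A toy 6x6 "image": dark (0) on the left half, bright (1) on the right half.
image = np.zeros((6, 6))
image[:, 3:] = 1.0

# A hand-picked vertical-edge filter (a trained network learns its own filters).
edge_filter = np.array([[-1.0, 1.0],
                        [-1.0, 1.0]])

feature_map = relu(conv2d_valid(image, edge_filter))
print(feature_map)   # non-zero only in the column where dark meets bright
```

In a real network you would capture each layer’s output as an image passes through the model (PyTorch, for example, offers forward hooks for this) and display those outputs as pictures, which is the “visualization” part.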

Feature Importance Analysis

Second, there’s feature importance analysis, which is a way of figuring out which parts of a dataset are most important in making predictions. For example, if we are trying to predict how much a house will sell for, we might use features like the number of bedrooms, the square footage, and the location of the house. Feature importance analysis helps us figure out which of these features is most important in predicting the price of the house.

There are different ways to calculate feature importance scores, but they all involve looking at how well each feature helps us make accurate predictions. Some methods involve looking at coefficients or weights assigned to each feature by the model, while others involve looking at how much the model’s accuracy changes when we remove a particular feature.

By understanding which features are most important, we can make better predictions and also identify which features we can ignore without affecting the accuracy of our model.
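The “remove a feature and see how much accuracy drops” approach is sometimes called drop-column importance. Below is a minimal sketch in Python with NumPy, assuming a plain linear model, a simple train/holdout split and entirely synthetic house data. The `location_score` and `paint_color_code` features are hypothetical, and paint color is irrelevant by construction so you can see what an unimportant feature looks like.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 500
features = {
    "bedrooms": rng.integers(1, 6, n).astype(float),
    "square_footage": rng.uniform(800, 3000, n),
    "location_score": rng.uniform(0, 10, n),                  # hypothetical 0-10 rating
    "paint_color_code": rng.integers(0, 5, n).astype(float),  # irrelevant by construction
}
# Made-up pricing rule; paint color deliberately plays no role.
price = (5_000 * features["bedrooms"]
         + 150 * features["square_footage"]
         + 20_000 * features["location_score"]
         + rng.normal(0, 10_000, n))

train_rows, holdout_rows = slice(0, 400), slice(400, n)

def holdout_r2(columns):
    """Fit ordinary least squares on `columns` (plus an intercept) using the
    training rows, then return R^2 on the held-out rows."""
    X = np.column_stack([np.ones(n)] + [features[c] for c in columns])
    coef, *_ = np.linalg.lstsq(X[train_rows], price[train_rows], rcond=None)
    residuals = price[holdout_rows] - X[holdout_rows] @ coef
    centered = price[holdout_rows] - price[holdout_rows].mean()
    return 1.0 - np.sum(residuals ** 2) / np.sum(centered ** 2)

names = list(features)
baseline = holdout_r2(names)
# Drop-column importance: how much held-out R^2 falls when a feature is left out.
importance = {name: baseline - holdout_r2([c for c in names if c != name])
              for name in names}

for name, loss in sorted(importance.items(), key=lambda kv: -kv[1]):
    print(f"{name:<18} R^2 lost when dropped: {loss:.3f}")
```

Square footage and location should dominate, while dropping paint color costs essentially nothing, which is exactly the kind of feature you could ignore. Refitting once per feature gets expensive for big models, which is one reason people often use cheaper variants such as permutation importance, which shuffles a feature’s values instead of retraining without it.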

Saliency Maps

Third are saliency maps, which highlight the most important parts of an image. They show which parts are most noticeable or eye-catching to us or to a computer program. To make a saliency map, we look at things like colors, brightness, and patterns in a picture. The parts of the picture that stand out the most are the ones that get highlighted on the map.

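A saliency map can be used for interpretability by showing which parts of the input image most influence the network’s output, or which parts activate particular layers or neurons, often by computing the gradient of that output with respect to each input pixel. This can help to analyze what features the network learns and how it processes the data.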
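A common gradient-based way to build a saliency map for a neural network is to ask, for each pixel, how much a tiny change in that pixel would change the model’s output score. The Python/NumPy sketch below does that for a hypothetical one-hidden-layer network on a 4x4 image, using random weights and deliberately disconnecting the top two rows of pixels so the map has something to reveal. The network, its size and the function names are assumptions for illustration; real tools compute these gradients with automatic differentiation rather than by hand.

```python
import numpy as np

rng = np.random.default_rng(2)
H = W = 4                                  # a tiny 4x4 grayscale "image"
n_pixels, n_hidden = H * W, 8

# Hypothetical network: score = w2 . tanh(W1 @ x), with x the flattened image.
W1 = rng.normal(size=(n_hidden, n_pixels))
W1[:, :8] = 0.0                            # top two rows of pixels are disconnected
w2 = rng.normal(size=n_hidden)

def score(x):
    """The network's scalar output for a flattened image x."""
    return w2 @ np.tanh(W1 @ x)

def saliency(x):
    """|d score / d pixel| via the chain rule, reshaped back onto the image grid."""
    h = np.tanh(W1 @ x)
    grad = W1.T @ (w2 * (1.0 - h ** 2))    # derivative of tanh(z) is 1 - tanh(z)^2
    return np.abs(grad).reshape(H, W)

x = rng.uniform(0, 1, n_pixels)
sal = saliency(x)
print(np.round(sal, 3))                    # top two rows are 0: they cannot affect the score
```

Plotted as a heat map over the original image, the bright regions are the pixels the model is most sensitive to at that particular input.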

Model Simplification

Model simplification is a technique used to make machine learning models easier to understand by reducing their complexity. This is done by removing unnecessary details, making the model smaller and easier to interpret. There are different ways to simplify models, such as using simpler models like decision trees instead of complex models like deep neural networks, or by reducing the number of layers and neurons in a neural network.

Simplifying models helps people better understand how the model works, but simplifying models too much can also cause problems, like making the model less accurate or introducing mistakes. So, it’s important to balance model simplification with other methods such as visualizations or explanations.
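One simplification approach is to train a simple “surrogate” model to imitate a complex one and then inspect the surrogate instead. The sketch below (Python with NumPy) fits the simplest possible decision tree, a single split, to the outputs of a stand-in black box and then reports how often the two agree. The black-box function and all helper names are hypothetical, chosen so the trade-off is visible: the one-line rule is easy to read but misses some of the black box’s wiggles.

```python
import numpy as np

rng = np.random.default_rng(3)
X = rng.uniform(0, 1, size=(2000, 2))

def black_box(X):
    """Stand-in for a complex model (hypothetical): a wiggly decision boundary."""
    return (X[:, 0] + 0.1 * np.sin(5 * X[:, 1]) > 0.5).astype(int)

def fit_stump(X, y):
    """Find the single split (feature, threshold) whose two sides, each
    predicting its majority label, agree with y most often."""
    n = len(y)
    best = None
    for j in range(X.shape[1]):
        order = np.argsort(X[:, j])
        xs, ys = X[order, j], y[order]
        n_left = np.arange(1, n)              # points left of each candidate split
        l1 = np.cumsum(ys)[:-1]               # label-1 count on the left
        l0 = n_left - l1
        r1 = ys.sum() - l1
        r0 = (n - n_left) - r1
        correct = np.maximum(l0, l1) + np.maximum(r0, r1)
        correct = np.where(xs[:-1] < xs[1:], correct, -1)   # skip splits between ties
        i = int(np.argmax(correct))
        if best is None or correct[i] > best[0]:
            best = (correct[i], j, (xs[i] + xs[i + 1]) / 2,
                    int(l1[i] > l0[i]), int(r1[i] > r0[i]))
    _, feature, threshold, left_label, right_label = best
    return feature, threshold, left_label, right_label

y_black_box = black_box(X)
feature, threshold, left_label, right_label = fit_stump(X, y_black_box)

def surrogate(X):
    """The simplified model: one human-readable if/else rule."""
    return np.where(X[:, feature] <= threshold, left_label, right_label)

fidelity = np.mean(surrogate(X) == y_black_box)
print(f"rule: if x{feature} <= {threshold:.2f} predict {left_label}, else {right_label}")
print(f"agrees with the black box on {fidelity:.1%} of inputs")
```

That agreement rate (often called fidelity) is one way to check whether a simplification has gone too far: a deeper tree would track the black box more closely but would be harder to read.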

Then There’s Explainability

I think of explainability as something a teacher does to help students understand a difficult concept. In the same way, explainability aims to help people understand a model’s behavior through natural language or visualizations.

It might involve techniques such as partial dependence plots or Local Interpretable Model-agnostic Explanations (LIME). These can be used to reveal how the inputs and outputs of an AI model are related, making the model more explainable.
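Below is a rough Python/NumPy sketch of both ideas applied to a hypothetical black-box model. `partial_dependence` averages the model’s predictions while sweeping one feature across a grid of values; `local_linear_explanation` is a simplified LIME-style procedure (perturb one input, weight nearby samples more heavily, fit a weighted linear model) and is not the API of the actual `lime` Python package. The model, the sample sizes and the kernel width are arbitrary assumptions.

```python
import numpy as np

rng = np.random.default_rng(4)

def black_box(X):
    """Hypothetical model: a U-shaped effect of feature 0, linear in feature 1."""
    return X[:, 0] ** 2 + 2.0 * X[:, 1]

X = rng.uniform(-2, 2, size=(1000, 2))    # made-up dataset the model is applied to

def partial_dependence(model, X, feature, grid):
    """Average prediction when `feature` is forced to each grid value,
    leaving every row's other features as they are."""
    averages = []
    for value in grid:
        X_mod = X.copy()
        X_mod[:, feature] = value
        averages.append(model(X_mod).mean())
    return np.array(averages)

def local_linear_explanation(model, x, n_samples=5000, scale=0.1,
                             kernel_width=0.2, seed=0):
    """LIME-style sketch: perturb around x, weight samples by closeness to x,
    and fit a weighted linear model whose slopes act as the local explanation."""
    local_rng = np.random.default_rng(seed)
    Z = x + local_rng.normal(0.0, scale, size=(n_samples, len(x)))
    y = model(Z)
    dist2 = np.sum((Z - x) ** 2, axis=1)
    weights = np.exp(-dist2 / kernel_width ** 2)      # closer samples count more
    A = np.column_stack([np.ones(n_samples), Z - x])  # intercept + offsets from x
    sw = np.sqrt(weights)                             # weighted least squares via scaling
    coef, *_ = np.linalg.lstsq(A * sw[:, None], y * sw, rcond=None)
    return coef[1:]                                   # one local slope per feature

grid = np.linspace(-2, 2, 5)
pd_values = partial_dependence(black_box, X, 0, grid)
slopes = local_linear_explanation(black_box, np.array([1.0, 0.5]))
print("partial dependence on feature 0:", np.round(pd_values, 2))
print("local slopes at (1.0, 0.5):", np.round(slopes, 2))
```

The partial dependence values should trace the U shape of the squared feature, and the local slopes near (1.0, 0.5) should both come out close to 2, matching how the model behaves in that neighborhood. Note the difference in scope: the first summarizes the model over the whole dataset, the second explains a single prediction.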

The Need to Use Both

Interpretability is typically harder than explainability but, in practice, they’re closely related and often intertwined. Improving one can often lead to improvements in the other. Ultimately, the goal is to balance interpretability and explainability to meet the needs of the end-users and the specific application.

How to Get to Safer (maybe!) AI

No one knows how this is all going to work out at this stage. Maybe the U.S. or other governments will consider something along the lines of what Klein proposes, though my guess is that it won’t happen that way in the short term. Too many companies have too much money at stake and so will resist an indefinite “interpretability” pause, even if that pause is in the best interest of the world.

Moreover, the worry that “China will get there first” will keep government officials from regulating AI firms as much as they might otherwise. We couldn’t stop the nuclear arms race and we probably won’t be able to stop the AI arms race either. The best we’ve ever been able to do so far is slow things down and de-escalate. Of course, the U.S. has not exactly been chummy with China lately, which probably raises the danger level for everyone.

Borrow Principles from the Food and Drug Administration

So, if we can’t follow the Klein plan, what might be more doable?

One idea is to adapt to our new problems by borrowing from existing agencies. One that comes to mind is the FDA. The United States Food and Drug Administration states that it “is responsible for protecting the public health by ensuring the safety, efficacy, and security of human and veterinary drugs, biological products, and medical devices; and by ensuring the safety of our nation’s food supply, cosmetics, and products that emit radiation.”

The principles at the heart of U.S. food and drug regulations might be boiled down to safety, efficacy, and security:

The safety principle ensures that food and drug products are safe for human consumption and don’t pose any significant health risks. This involves testing and evaluating food and drug products for potential hazards, and implementing measures to prevent contamination or other safety issues.

The efficacy principle ensures that drug products are effective in treating the conditions for which they’re intended. This involves conducting rigorous clinical trials and other studies to demonstrate the safety and efficacy of drugs before they can be approved for use.

The security principle ensures that drugs are identified and traced properly as they move through the supply chain. The FDA has issued guidance documents to help stakeholders comply with the requirements of the Drug Supply Chain Security Act (DSCSA), which aims to create a more secure and trusted drug supply chain. The agency fulfills its responsibility by ensuring the security of the food supply and by fostering development of medical products to respond to deliberate and naturally emerging public health threats.

Focus on the Safety and Security Angle

Of those three principles, efficacy will be the most easily understood when it comes to AI. We know, for example, that efficacy is not a given in light of the ability of these AIs to “hallucinate” data.

The principles of safety and security, however, are probably even more important and difficult to attain when it comes to AI. Although better interpretability might be one of the criteria for establishing safety, it probably won’t be the only one.

Security can’t be entirely separated from safety, but an emphasis on it would help the industry focus on all the nefarious ends to which AI could be used, from cyberattacks to deepfakes to autonomous weapons and more.

The Government Needs to Move More Quickly

Governments seldom move quickly, but the AI industry is now moving at Hertzian speeds, so governments are going to need to do better. At least the Biden administration has said it wants stronger measures to test the safety of AI tools such as ChatGPT before they are publicly released.

Some of the concern is motivated by a rapid increase in the number of unethical and sometimes illegal incidents being driven by AI.

But how safety can be established isn’t yet known. The U.S. Commerce Department recently said it’s going to spend the next 60 days fielding opinions on the possibility of AI audits, risk assessments and other measures. “There is a heightened level of concern now, given the pace of innovation, that it needs to happen responsibly,” said Assistant Commerce Secretary Alan Davidson, administrator of the National Telecommunications and Information Administration.

Well, yep. That’s one way to put it. And maybe a little more politically circumspect than the “literally everyone on Earth will die” message coming from folks like decision theorist Eliezer Yudkowsky.

PS – If you would like to submit a public comment on the “AI Accountability Policy Request for Comment,” please go to its page in the Federal Register. Note the “Submit a Formal Comment” button.