Trust AI: Play Sudoku

While AI (artificial intelligence) holds exciting promise for manufacturing, it is not quite ready for primetime applications. AI can often serve as a guiding tool, but we still need our workforce to steer the ship in manufacturing. One big question remains: Can we truly trust AI? One research team aims to answer this question by playing sudoku.

Researchers at the University of Colorado Boulder decided to find out if AI tools such as OpenAI’s ChatGPT or Google’s Gemini can solve your morning sudoku. Let’s dig into the results.

All about the Research and Results

The team developed almost 2,300 original sudoku puzzles. For those who might not know what sudoku is, human players enter numbers into a grid following certain rules. The game is often played on paper, but there are online versions as well.

The researchers created sudoku puzzles of varying difficulty using a six-by-six grid, then asked a series of AI models to fill them in.

The results were telling. OpenAI’s o1 model led the pack, solving roughly 65% of the sudoku puzzles correctly. Then the team asked the AI platforms to explain how they got their answers.

While some of the AI models could solve easy sudokus, even the best struggled to explain how they solved them—giving garbled, inaccurate, or even surreal descriptions of how they arrived at their answers.

Ashutosh Trivedi, a co-author of the study and associate professor of computer science at CU Boulder, says the AI explanations sometimes even made up facts, such as claiming a two couldn’t go in a square because there was already a two in the same row, when that wasn’t actually the case.

In one case, the researchers asked one of the AI tools about solving a puzzle, and the AI responded with a weather forecast. Yikes.

How Humans and AI Think Differently

These particular puzzles require a very human way of thinking and illustrate a major flaw that still exists with AI. A sudoku grid requires puzzlers to learn and follow a set of logical rules. For example, you can’t enter a two in an empty square if a two already exists in the same row or column. Most LLMs (large language models) today struggle with that kind of thinking, in large part because of how they’re trained.
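To make the rule concrete, here is a minimal sketch of the kind of check a sudoku player runs before writing in a number. It is not the researchers’ code. The function name, the sample grid, and the box shape (two rows by three columns, a common layout for six-by-six puzzles) are all assumptions made for illustration.

```python
# Assumption: a 6x6 sudoku uses boxes of 2 rows by 3 columns; 0 marks an empty square.
BOX_ROWS, BOX_COLS = 2, 3


def can_place(grid, row, col, digit):
    """Return True if digit can go at (row, col) without breaking a rule."""
    # The square must be empty.
    if grid[row][col] != 0:
        return False
    # Row rule: the digit must not already appear in this row.
    if digit in grid[row]:
        return False
    # Column rule: the digit must not already appear in this column.
    if any(grid[r][col] == digit for r in range(len(grid))):
        return False
    # Box rule: the digit must not already appear in the surrounding box.
    top = row - row % BOX_ROWS
    left = col - col % BOX_COLS
    for r in range(top, top + BOX_ROWS):
        for c in range(left, left + BOX_COLS):
            if grid[r][c] == digit:
                return False
    return True


# Hypothetical partial puzzle, made up for this example.
puzzle = [
    [0, 2, 0, 0, 0, 5],
    [0, 0, 0, 2, 0, 0],
    [0, 0, 3, 0, 0, 0],
    [0, 0, 0, 0, 4, 0],
    [0, 0, 0, 0, 0, 0],
    [6, 0, 0, 0, 0, 2],
]

print(can_place(puzzle, 0, 0, 2))  # a two is already in the top row
print(can_place(puzzle, 2, 0, 2))  # no two in this row, column, or box
```

A check like this either passes or fails, and the reason for a failure can be read straight off the grid. That is the kind of verifiable explanation the AI models in the study struggled to give.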

To build ChatGPT, for example, programmers first fed the AI almost everything that had ever been written on the internet. When ChatGPT responds to a question, it predicts the most likely response based on all that data—almost like a computer version of rote memory.
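The idea of predicting the most likely response can be sketched with a toy example. This is not how ChatGPT is built: real LLMs use large neural networks, while the sketch below simply counts which word most often follows another in a tiny made-up sentence. The function names and the sample text are assumptions for illustration.

```python
from collections import Counter, defaultdict


def train_bigrams(text):
    """Count, for each word, which words follow it and how often."""
    counts = defaultdict(Counter)
    words = text.lower().split()
    for current, following in zip(words, words[1:]):
        counts[current][following] += 1
    return counts


def predict_next(counts, word):
    """Return the word seen most often after `word`, or None if it was never seen."""
    followers = counts.get(word.lower())
    if not followers:
        return None
    return followers.most_common(1)[0][0]


# Hypothetical training text, made up for this example.
corpus = "the cat sat on the mat and the cat slept"
model = train_bigrams(corpus)

print(predict_next(model, "the"))  # "cat" follows "the" twice, "mat" only once
print(predict_next(model, "dog"))  # never seen, so there is nothing to predict
```

Nothing in this toy model knows a rule. It only repeats the pattern it has seen most often, which hints at why prediction alone is a poor fit for a puzzle that demands strict logic.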

Why the Decisionmaking Process Is Important

It is truly interesting when you think about it. The decisionmaking process is a complex one, and being able to show it matters. Think back to fourth grade, when we argued with our teacher that we got the right answer, and our teacher argued back that we didn’t show our work. Perhaps the teachers were on to something.

Coauthor Fabio Somenzi, professor in the Department of Electrical, Computer and Energy Engineering, points to an example that could have repercussions in the real world: “If you have AI prepare your taxes, you want to be able to explain to the IRS why the AI wrote what it wrote.”

To address this, researchers aim to merge those two ways of thinking, combining the memory of an LLM with a human brain’s capacity for logic. This approach is known as neurosymbolic AI.

The researchers hope to design their own AI system that can do it all: solve complex puzzles and explain how. They are starting with another type of puzzle called hitori, which, like sudoku, involves a grid of numbers. Will AI one day be able to explain how it solves puzzles? Maybe. But for now, be ready for a potential weather forecast. As we have all been saying, human problem-solving and reasoning are necessary to make innovation part of the future. It’s all a little puzzling for sure.

