Creating Safety Benchmarks for AI

Generative AI (artificial intelligence) brings great benefits to society and industry. We are seeing the rise of such technology in aviation and airlines, power and utilities, and even places like at the dentist. Data is the new oil, and if companies can capture and capitalize on it, then they have a leg up in today’s competitive, labor-constrained market. Still, there are challenges that remain.

Perhaps the biggest challenge is the safety of it all. Cybersecurity is a huge concern for many businesses, as they leverage new, emerging technologies, but digging a bit deeper there are other safety concerns to consider as well.

Misinformation and bias can be just as dangerous. Consider healthcare. Much of the data that exists in the healthcare industry today is based on those who could afford it in the past. This means lower-income families or developing nations simply don’t have the data, which skews the sample.

And then, there is misinformation that can come because of generative AI. As a journalist, I know how important fact checking is on any project because misinformation is everywhere—and I mean everywhere. Last year, USC (University of Southern California) researchers found bias exists in up to 38.6% of facts used by artificial intelligence. That is something we simply cannot ignore.

Many organizations recognize these concerns and others as it relates to safety and artificial intelligence—and some are taking steps to address it. Consider the example of MLCommons, which is an AI benchmarking organization. At the end of October, it announced the creation of the AI Safety Working Group, which will develop a platform and pool of tests from many contributors to support AI safety benchmarks for diverse use cases.

The AIS working group’s initial participation includes a multi-disciplinary group of AI experts including: Anthropic, Coactive AI, Google, Inflection, Intel, Meta, Microsoft, NVIDIA, OpenAI, Qualcomm Technologies, Inc., and academics Joaquin Vanstoren from Eindhoven University of Technology, Percy Liang from Stanford University, and Bo Li from the University of Chicago. Participation in the working group is open to academic and industry researchers and engineers, as well as domain experts from civil society and the public sector.

As an example, Intel plans to share AI safety findings and best practices and processes for responsible development such as red-teaming and safety tests. As a founding member, Intel will contribute its expertise and knowledge to help create a flexible platform for benchmarks that measure the safety and risk factors of AI tools and models.

All in all, the new platform will support defining benchmarks that select from the pool of tests and summarize the outputs into useful, comprehensible scores. This is very similar to what is standard in other industries such as automotive safety test ratings and energy star scores.

The most pressing priority here for the group in the beginning will be supporting rapid evolution of more rigorous and reliable AI safety testing technology. The AIS working group will draw upon the technical and operational expertise of its members, and the larger AI community, to help guide and create the AI safety benchmarking technologies.

One of the initial focuses will be developing safety benchmarks for LLMs (large language models), which will build on the work done by researchers at Stanford University’s Center for Research on Foundation Models and its HELM (Holistic Evaluation of Language Models).

While this is simply one example, it is a step in the right direction toward making AI safer for all, addressing many of the concerns related to misinformation and bias that exist among many industries. As the testing matures, we will have more opportunities to use AI in a way that is safe for all. The future certainly is bright.

Want to tweet about this article? Use hashtags #IoT #sustainability #AI #5G #cloud #edge #futureofwork #digitaltransformation #green #ecosystem #environmental #circularworld

What's Hot

What Can You Trust?

Build a Safer Jobsite with AI and Cameras

Women in Construction: PPE

Get your Copy Today

Creating Safety Benchmarks for AI

What Can You Trust?

Build a Safer Jobsite with AI and Cameras

Women in Construction: PPE

Success Stories: Customized Sensors for Wildfire Prevention

The Rise of Prefab

Manufacturing in an Era of Digital Product Passports

What's Hot

Get your Copy Today

Creating Safety Benchmarks for AI

Related Posts