Some of the companies working on the newest advances in artificial intelligence are focusing on quantifying the progress they've achieved and benchmarking how artificial intelligence has evolved over time. There are numerous reasons why companies pursue these types of analyses. In general, they are trying to figure out how far artificial intelligence has come, how it applies to our lives, and how it will affect markets.
Some companies monitor their artificial intelligence progress to figure out how new technologies may affect civil liberties, or how they might create new economic realities. Depending on the company's approach, these analyses may involve tracing how user data flows through systems, understanding how interfaces will work, or determining what capabilities artificial intelligence systems have and how they might use those capabilities.
When it comes to methods, companies trying to benchmark artificial intelligence may focus on distilling abstract information into concrete measures. A Wired article, for instance, cites the AI Index project, where researchers such as Ray Perrault of the nonprofit lab SRI International are assembling a detailed snapshot of what's going on in the artificial intelligence field.
“This is something that needs to be done, in part because there's so much craziness out there about where AI is going,” Perrault says in the article, commenting on the motivation for taking on this type of project.
In explaining how benchmarking artificial intelligence works, some experts note that engineers or other parties may pursue “hard testing” of artificial intelligence projects, for instance by trying to “trick” or “defeat” the systems. This kind of description goes to the heart of how companies can truly monitor and evaluate artificial intelligence. One way to think about it is to apply the same kinds of ideas that programmers once used to debug linear code systems.
Debugging linear code systems was all about finding the spots where the system would fail: where a program would crash, where it would freeze, where it would run slowly, and so on. It was about finding where logical errors would halt or confound a project, where a function wouldn't work correctly, or where some unintended user event might occur.
When you think about it, modern testing of artificial intelligence may be a similar endeavor on a very different plane. Because artificial intelligence technologies are more cognitive than linear, the testing takes a much different form, but humans are still looking for “the bugs”: ways these programs may have unintended consequences, ways they might act out and harm human institutions, and so on. With that in mind, although there are many divergent methods of creating a speedometer or benchmark for artificial intelligence progress, the kinds of hard testing described above will generally give humans unique insight into how far artificial intelligence has come, and what has to be done to keep it delivering more positives without developing more negatives.
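To make the idea of “hard testing” a little more concrete, here is a minimal sketch in Python of what hunting for “the bugs” in a model can look like: probing a decision-making system with small input perturbations to see whether its answer can be flipped. The toy linear classifier, its weights, and the perturbation budget are all illustrative assumptions for this sketch, not a description of any particular company's method or system.

```python
# A minimal sketch of "hard testing": probing a model with small input
# perturbations to see whether its decision can be flipped. The toy
# classifier, weights, and budget below are illustrative assumptions.
import random

WEIGHTS = [0.8, -0.5, 0.3]   # hypothetical learned weights
BIAS = 0.1

def predict(features):
    """Return 1 (positive class) or 0 based on a simple linear score."""
    score = sum(w * x for w, x in zip(WEIGHTS, features)) + BIAS
    return 1 if score > 0 else 0

def hard_test(features, budget=0.2, trials=1000, seed=0):
    """Search for a small perturbation (within `budget` per feature)
    that changes the model's prediction -- the AI analogue of hunting
    for the input that makes a program crash."""
    rng = random.Random(seed)
    original = predict(features)
    for _ in range(trials):
        perturbed = [x + rng.uniform(-budget, budget) for x in features]
        if perturbed != features and predict(perturbed) != original:
            return perturbed  # a "bug": a nearby input that flips the answer
    return None  # no flip found within this budget

if __name__ == "__main__":
    sample = [0.4, 0.6, -0.2]
    adversarial = hard_test(sample)
    print("original prediction:", predict(sample))
    if adversarial:
        print("adversarial input found:", adversarial)
    else:
        print("model stable within the tested budget")
```

Real evaluations are far more elaborate, but the shape is the same: define what counts as a failure, search systematically for inputs that produce it, and use what you find as a benchmark of how robust the system really is.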