Tuesday, December 24, 2024

AI Agent Assists Users with Disabilities and Revolutionizes Web Navigation

Scientists at Ohio State University have developed an AI agent, built on a language model, that can browse websites and carry out user commands. This could enable visually impaired people and those with mobility impairments to make full use of the internet. According to its developers, the solution's potential extends much further: it could be used to analyze websites, or even entire systems, in order to improve them.

“There are various technologies that help people with different disabilities access the internet, but the situation is far from ideal. For instance, someone might use a screen reader to have a webpage's contents read aloud, but this is much slower and less effective than the experience most people have. Others might scan a webpage's content and select what they want to read. However, for these assistive technologies to work well, individual website developers need to follow specific standards and best practices so that their sites are compatible with them. Unfortunately, many web designers either ignore these rules entirely or follow them only partially, so these assistive technologies work less well than they could,” Yu Su of Ohio State University told Newseria Innowacje.

To improve internet accessibility for people with disabilities, researchers at Ohio State University have started developing artificial intelligence agents that can carry out tasks on any website when given simple natural-language commands.

“We wanted to create what we call generalist web agents. These are AI agents that can access any website, one of the billions available, and perform a specified task. The AI agent understands the command, familiarizes itself with the content of a website it has never seen before, and carries out the command,” explains Yu Su.

The researchers began by creating Mind2Web, the first dataset for generalist web agents, which fully captures the complex and dynamic nature of real-world websites. The team collected over 2,000 tasks from 137 different websites and then used them to train the agent. Tasks ranged from booking one-way and round-trip international flights and following celebrity accounts on Twitter to reviewing comedy films from 1992–2017 available on Netflix. Many of these tasks were quite complicated; booking an international flight, for instance, required up to 14 actions.

“We also developed models based on large language models such as ChatGPT and GPT-4 that read a web page's HTML code in order to execute a user's command. However, the results were quite poor, and the success rate was low. We then significantly improved the tool by adding another built-in model. The agents could then not only read the HTML code as text but also see the page as it is visually rendered, the way a human sees it. It turned out that this greatly simplified matters and significantly improved the success rate, bringing it closer to practical application,” emphasizes Yu Su of Ohio State University.

As a result, the agent browses the web in much the same way a human does. As its developers point out, their model can understand the layout and functionality of various websites using only its language-processing and prediction abilities.

“These tools will be very helpful in giving internet access to people with visual impairments or with physical disabilities that make it hard to use a mouse or keyboard. Compared with traditional assistive technologies, they can make these users' online experience much more like everyone else's. At the same time, many other people can use these tools for everyday browsing. Modern websites are very complex: when we open a site, we see countless banners, and we only want to find a specific piece of hidden information. If we use an agent that can understand such a complicated website and find that information for us, we can save a lot of time,” notes Yu Su.

Although the model was designed primarily to help people use the internet, especially those who face difficulties because of disabilities, the developers emphasize that it can also be used to improve artificial intelligence solutions such as ChatGPT. It bridges the communication gap between people, who communicate in their native languages, and computers, which operate through programming languages.

“In this way, users will be able to use everyday language to communicate with the computer world without having to learn computer languages. To illustrate this point, I often use a somewhat clichéd phrase: we want machines to understand how humans think, not humans to think like machines,” says the scientist.

He adds that the new solution can also make everyday internet users more efficient by helping them find the information they need on a webpage. It also serves as a tool for democratizing AI, that is, making it accessible to a wider audience.

“Barriers to access usually amplify inequalities in society, because only organizations and individuals with significant resources can use the most advanced technologies. That does not apply to the AI technology we are working on, because it is available to everyone. By democratizing advanced AI technologies, we hope to enable everyone to use them to improve their work, enhance their quality of life, and benefit from artificial intelligence as potentially the most powerful automation technology of our time,” concludes Yu Su.

The researchers caution, however, that such tools also have a dark side and could aid people with malicious intentions. AI agents could be used for harmful purposes, such as spreading misinformation or misusing financial information.

According to PR Newswire, the global market for large language models is projected to reach nearly $41 billion by 2029, up from $10.5 billion in 2022.