AI begins creating jobs in rural India while building high-quality datasets

In India, social impact entrepreneurs have developed datasets in multiple Indian languages for training AI models and conducting research, all while generating employment opportunities.

Artificial intelligence (AI) has taken the world by storm, and in India, startups in the social impact space have created datasets in several Indian languages to train AI models and support research while creating jobs, mainly in rural areas. One such organisation, Karya, aims to revolutionise the way datasets are created in India and elsewhere, Microsoft said on Wednesday.

The group’s goal is to lift as many people out of poverty as possible while giving them the tools to thrive in the modern digital economy. At the same time, Karya is building high-quality and ethical datasets with an unconventional workforce. "Those datasets are valuable. While about 80 million people speak Marathi, it’s not well-represented in the digital world. The fact that hundreds of millions of potential customers could benefit from those technologies is why Microsoft and others are in a race to make their products available in those ‘under-resourced’ languages," the tech giant said.

Karya got its start as a Microsoft Research project in Bengaluru in 2017. The project was spun off in 2021 as an organisation independent of Microsoft. Its entire operation, including the app that workers use to record and write in their native languages, is built on Microsoft Azure and uses Azure OpenAI Service and Azure AI Cognitive Services to validate its data.

Microsoft is one of its major clients. Karya pays workers about $5 (over Rs 400) an hour and is partnering with more than 200 other nonprofits with the goal of reaching 100 million people by 2030. "We really think that rural India can be an excellent builder of AI, but also an excellent recipient of AI technologies," Karya CEO Manu Chopra said. "Let’s say the world is going to spend a trillion dollars on building AI. So over the next 20 years, what percentage of that can I bring directly into the wallets of people who need it the most?" he added.

AI tools like OpenAI’s ChatGPT and Microsoft’s Copilot work well in English because of the abundance of written and audio material on the internet in the language. "I think we want to rectify that most of the internet being in English is not a very good place to start," said Kalika Bali, a language technologist and researcher at the Microsoft Research Lab in Bengaluru.

She uses data collected by Karya for her research. "People need to be part of the growth in the digital economy that’s spreading everywhere. No one should be excluded from using technology because of their language," she added. Karya, which says it is on pace to engage with more than 100,000 workers by the end of 2024, seeks participants who need work and education the most – often women in rural areas. In addition to a premium wage, it offers training and other kinds of support when the work is done, said Microsoft.
