Recent research by Human Rights Watch (HRW) found that AI models are being trained on photos of children, even ones meant to be protected by strict privacy settings. The discovery raises serious concerns about children's privacy and safety, and about the implications of using publicly shared photos without consent.
What’s Happening & Why This Matters
Human Rights Watch researcher Hye Jung Han uncovered 170 photos of Brazilian children in the LAION-5B dataset. A second discovery followed: 190 photos of Australian children, including Indigenous kids. These images were taken from Common Crawl snapshots of the web and often included identifying details such as names and locations, posing significant privacy risks and creating the potential for harmful deepfakes.
Details of the Issue
Photos of children, including ones meant to be private, are being used to train AI models. The dataset sometimes contains URLs that reveal identifying information about children, making it easy to track them down. It also includes photos of children from Indigenous communities, which raises additional cultural sensitivity concerns. Despite platform policies against scraping, AI models continue to train on these images; even unlisted YouTube videos with strict privacy settings have been archived and used in AI datasets.
HRW’s findings support the need for stronger regulations to protect children’s data online. The upcoming draft reforms to Australia’s Privacy Act, which include a Children’s Online Privacy Code, are a step in the right direction, though their effectiveness remains to be seen. Once AI models have trained on these images, the data is effectively impossible to erase from their systems. This could lead to the creation of realistic deepfakes and other privacy breaches.
Real-World Implications
A specific example cited in HRW’s report involves a photo of two boys in Perth, Australia, which included details such as their full names, ages, and preschool name. That information was not available elsewhere on the internet, suggesting the families had taken steps to protect their children’s privacy. For First Nations children, reproducing photos of people who have died during mourning periods is culturally restricted, making the AI training particularly harmful.
AI models are notorious for leaking private information, and guardrails in image generators do not always prevent these leaks, exposing children to further risks.
TF Summary: What’s Next
The ongoing issues with AI training on children’s photos underscore the urgent need for stricter regulations and better enforcement of existing policies. Parents and guardians should not have to worry about the misuse of their children’s images online. As AI technology advances, it is crucial for regulatory bodies and tech companies to prioritize the protection of personal data, especially for vulnerable populations like children. The upcoming reforms to Australia’s Privacy Act will be monitored closely to see whether they effectively address these concerns and advance global data protection standards.