Valyu 2024 Year in Review
As 2024 comes to a close, Valyu celebrates a great year of progress. From joining Andreessen Horowitz’s CSX program earlier this year, to taking part in the Anthropic-AWS Accelerator, to shipping a product and tooling that you love, we have been busy!
We’re excited to kick off 2025 with the launch of The Valyu Platform 2.0, featuring enhancements such as advanced bot detection, consent management, improved metering, payments, and fine-grained segmentation and enrichment of content across all modalities, including text, audio, images, and video. The update also introduces a revamped suite of SDKs and tooling with better integrations into existing ML frameworks.
We are also excited to share that we will be moving to San Francisco in 2025! 🎉 🛠️
2024 Recap
Valyu's Co-founders: Hirsh Pithadia, Harvey Yorke, and Alexander Ng.
Valyu at A16Z CSX 2024
At the start of the year, we had the privilege of joining Andreessen Horowitz’s CSX Spring 2024 Cohort in London. Alongside 25 incredible early-stage startup founders, we gained invaluable guidance, resources, and insights from the a16z CSX team. We also presented The Valyu Platform at CSX Demo Day at the Outernet London, showcasing how we're tackling key challenges in training data and AI. This experience not only helped us refine our vision but also expanded our network and inspired us with stories of innovation from fellow entrepreneurs.
Hirsh presenting Valyu at CSX 24 London Demo Day at the Outernet London.
What we shipped this year
In 2024, we achieved several exciting milestones. We formed strategic partnerships with leading publishers and content platforms, and onboarded petabyte-scale datasets of video, text, images, 3D, and audio/speech. Throughout the year, we have been actively listening to your feedback about the platform and what you’d love to see there.
Our product has always been backed by leading research carried out with UCL. Over the coming year, we are excited to share some of our ongoing research in Mechanistic Interpretability for Attribution, Rerankers and Context Enrichment, and Data Contamination. Look out for it!
Partnerships, Events & Alignment
We recognise that the AI Data Licensing and Distribution space is early but growing. Rights holders and AI developers are all looking for a constructive platform to work together. To support this, we hosted events throughout 2024, including London Data Week (in collaboration with The Alan Turing Institute), Hackathons on Responsible AI (in collaboration with Holistic AI and UCL AI), and talks with Common Crawl Foundation. We also released a series of webinars focused on Data Licensing and attended key AI conferences and talks across San Francisco, London, and Singapore.
As we welcome 2025, we’re excited to share some of our upcoming initiatives. We’ll be publishing a comprehensive guide on the data licensing landscape for 2025, offering insights into what to expect in this rapidly evolving field. Additionally, we’re organising an inaugural event in spring that will bring together foundation model companies, AI developers, publishing houses, content platforms, government, and other key stakeholders. The event will focus on advancing responsible training data for AI.
There’s a lot more in store from us as we continue to build great products that we love. Stay tuned for more updates!
Alex presenting Valyu at GitHub's HQ in San Francisco.
London Data Week 2024
From left: Hendrik van der Sange (Moderator), Terence Broad, Simon Wheeler, Emre Kazim, and Hirsh Pithadia on the panel discussing content, copyright, and generative AI.
We teamed up with London Data Week, in collaboration with The Alan Turing Institute, in July and hosted an event titled “Content, Copyright, and Generative AI: Understanding the Value of Your Data as Creative Content Creators”. It focused on the challenges faced by content creators whose works are often used without consent to train AI models. The event featured a panel of industry experts across music, publishing, academia, and AI policy, followed by an interactive Q&A session. Read the event recap here.
Open Data, Research, and Web Archiving in the Age of AI and LLMs
Thom Vaughan and Pedro Ortiz Suarez of Common Crawl Foundation presenting at the event.
In partnership with Common Crawl Foundation and UCL, we hosted an event on the role of open data and web archiving in the age of AI and large language models. Speakers Thom Vaughan and Pedro Ortiz Suarez highlighted Common Crawl’s open datasets and their impact on research and AI innovation, while our Co-Founders Hirsh Pithadia and Harvey Yorke discussed the challenges of data-sharing restrictions and the AI-induced web consent crisis. Read the event recap here.
——
Latest Blogs
Keep Calm and Feed the Model: The Rise of Data Licensing in AI
We explore the licensing frameworks currently in use and their implications for AI development. We also explore distribution economics and how niche data could shape the future of AI. Read the article here.
Rights Holders vs. Gen AI: Latest Lawsuits & Licensing Developments
This article covers the latest lawsuits and licensing developments that are forming new pathways between AI companies and content creators. Read the article here.
10 Key Terms in AI and Training Data
In this blog, we've broken down 10 important terms related to AI and training data to help you better grasp how AI systems are developed and function. Read the article here.
——
Webinar: Protecting Content in the Age of Generative AI
Join us for our upcoming webinar on Tuesday, 14 January 2025, 14:30 - 15:30 GMT to discuss:
Introduction to web crawlers and their role in AI training
Ethical considerations: opt-in vs. opt-out consent and fair compensation
Tools and techniques to protect and monitor content
Licensing and partnerships with AI companies
Whether you’re a content creator, a publisher, or simply want to learn more about our evolving digital landscape, this introductory webinar will provide actionable steps for protecting content in the AI era. Register here.