The widespread adoption of Artificial Intelligence (AI) in business functions is undeniable, with a recent McKinsey study indicating that 78% of organizations leverage AI in at least one area. However, a significant challenge persists: over 80% of these organizations report a lack of tangible impact on enterprise-level Earnings Before Interest and Taxes (EBIT) from their AI deployments.
This disparity highlights a critical need for reliable and robust evaluation tools that can accurately assess AI systems in complex, real-world enterprise environments. PromptQL, a leading platform for reliable AI, today announced a strategic research collaboration with the University of California, Berkeley, to address this fundamental challenge. The partnership aims to develop the first comprehensive data agent benchmark for enterprise reliability, specifically designed to evaluate general-purpose AI data agents in practical business scenarios.
Addressing the Enterprise AI Reliability Gap:
- The “1% Problem” in Current Benchmarks:
- Existing agentic data benchmarks, such as GAIA, Spider, and FRAMES, primarily focus on testing specific AI tasks.
- These benchmarks often overlook the inherent complexity, stringent reliability demands, and “messy, siloed data” that are characteristic of real business environments.
- Professor Aditya Parameswaran of UC Berkeley aptly terms this the “1% problem,” indicating that current benchmarks cater to a small segment of tech giants, neglecting the vast majority of organizations grappling with real-world data intricacies.
- Need for Enterprise-Specific Evaluation:
- The collaboration seeks to create a framework that accurately reflects the complexities of enterprise AI deployments.
- This new data agent benchmark aims to provide organizations with the necessary evaluation tools to make confident deployment decisions, enabling a transition from proof-of-concepts to production-ready AI systems.
- McKinsey Study Reinforces the Challenge:
- The research directly addresses the findings of the McKinsey study, which revealed a significant gap between AI adoption and tangible business impact, particularly in terms of EBIT.
The Collaboration: PromptQL and UC Berkeley’s EPIC Data Lab
- Leadership and Expertise:
- The partnership is spearheaded by Aditya Parameswaran, a distinguished Professor and Co-Director of UC Berkeley’s EPIC Data Lab, along with his students.
- Professor Parameswaran is a leading authority on using AI for next-generation usable data analysis tools, with a track record of creating widely-adopted data tools (tens of millions of downloads) and receiving numerous prestigious awards.
- Leveraging Real-World Data:
- Tanmai Gopal, CEO of PromptQL, stated that customer conversations reveal a clear pattern: a readiness to move to production AI, but a lack of robust evaluation tools.
- The forthcoming data agent benchmark will incorporate representative datasets from PromptQL’s extensive work in diverse sectors such as telecom, healthcare, finance, retail, and anti-money laundering.
- This approach ensures the benchmark accurately reflects the true complexity of enterprise AI.
- Bridging Academic Rigor and Production Insights:
- Professor Parameswaran emphasizes that the data agent benchmark marks a shift towards evaluating AI based on reliability, transparency, and the practical value that enterprises genuinely require.
- This collaboration effectively merges UC Berkeley’s academic rigor and research excellence with PromptQL’s production insights derived from real-world AI deployments.
Future Outlook and Engagement:
- Beta Release Planned:
- The beta version of the comprehensive data agent benchmark is slated for release later this year.
- This phased rollout will allow for iterative improvements and feedback incorporation.
- Call for Contributions:
- Organizations interested in gaining early access to the benchmark or contributing relevant use-cases and datasets are encouraged to reach out to the research team at epic-support@eecs.berkeley.edu.
- PromptQL at AI Engineer World’s Fair:
- PromptQL will be present at the AI Engineer World’s Fair in San Francisco from June 3 to 6.
- Tanmai Gopal, PromptQL’s co-founder and CEO, is scheduled to present a session titled “AI Automation that Actually Works: $100M Impact on Messy Data with Zero Surprises” on June 4 at 11:15 a.m. PT.
- Further details and demo scheduling are available at https://hasura.io/events/ai-engineer-worlds-fair-2025.
The strategic research collaboration between PromptQL and the University of California, Berkeley, aims to develop the first comprehensive data agent benchmark for enterprise reliability. By addressing the “1% problem” and focusing on the complexities of real-world enterprise data, the initiative is intended to give organizations the evaluation tools they need to deploy AI systems with confidence and to narrow the gap between AI adoption and tangible business impact.