“The lakehouse architecture combines the best qualities of data warehouses and data lakes to provide a single solution for all major data workloads and supports use cases from streaming analytics to BI, data science, and AI.”
Hi, Jude. Welcome to our Interview Series. Please tell us a little bit about your journey in this industry and what inspired you to be part of Databricks.
With three brothers in the Marines (and a Navy SEAL brother-in-law), I care deeply about the public sector’s mission and success. Bringing the best technology to the government to support the mission has motivated my career. After having the benefit of speaking with industry thought leaders and public sector customers, I was blown away by Databricks’ capability, growth, and people.
What is the crux of your role? Could you please explain this to our readers?
As VP of Public Sector at Databricks, my first and foremost role is to bring the best capability to public sector customers. My team supports our government customers to drive vital missions from veteran suicide prevention to citizenship and immigration services to Medicare and Medicaid fraud prevention.
I am also focused on ensuring we can deliver the most secure, performant platform to our customers for driving mission outcomes based on the vast trove of data they possess, no matter the data type. And, of course, in my role, my commitment is to our Databricks team to ensure they can help our customers in their journey toward delivering better citizen outcomes with data.
What has changed in the enterprise data industry in the last 3 years? How has Databricks emerged as the go-to platform for data lakehouse?
In the past, federal, state, and local governments could only be reactive when faced with an analytics need, primarily because of the limitations of data warehouses and the inability to gain insight from unstructured and semi-structured data. When questions came up that were not part of regularly scheduled dashboards, our customers had to scramble to find ways to get an answer. Beyond that, even the reports and dashboards these organizations built focused primarily on the past, asking “what happened” after events had occurred.
Increasingly, innovators in the data space are looking to use data both to react in real time with streaming data insight and to predict the future through AI and ML. As organizations modernize their data architecture, they want to tackle a diverse range of new use cases, including those that involve data science and machine learning. This is easier said than done. To perform this function end-to-end, an agency needs to ingest data, curate and transform it, build BI and SQL analytics on top of it, employ data science and ML, and put these models into production. It also needs to catalog all of the data and govern it. In a traditional world, this looks like an integration across 5-6 different services. But organizations want to do this across multiple clouds with a minimal set of tools. Add to this the desire for public sector organizations to leverage open source and open standards to avoid vendor lock-in. Imagine collapsing all of these requirements for the data life cycle into a single platform. That’s what Databricks delivers in its Lakehouse Platform.
What’s the most contemporary definition of data lakehouse? How do you differentiate data integrity versus data integration at Databricks?
The lakehouse architecture combines the best qualities of data warehouses and data lakes to provide a single solution for all major data workloads and supports use cases from streaming analytics to BI, data science, and AI. The Databricks Lakehouse ensures data integrity with ACID transactions, supports schema enforcement on data and supports diverse data types ranging from unstructured to structured data. The storage format is open and standardized, facilitating interoperability with various tools and engines including machine-learning libraries.
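[Editor’s illustration: the two integrity properties mentioned above, schema enforcement and all-or-nothing writes, can be sketched in a few lines of plain Python. This is a toy model under stated assumptions, not the Delta Lake or Databricks API. The `TinyTable` class, the `SchemaMismatchError` exception, and the `claims` sample data are all hypothetical names invented for this example, and only the atomicity part of ACID is shown.]

```python
from dataclasses import dataclass, field


class SchemaMismatchError(ValueError):
    """Raised when a row does not match the table's declared schema."""


@dataclass
class TinyTable:
    """Hypothetical in-memory table; NOT the Delta Lake API."""
    schema: dict  # column name -> expected Python type
    rows: list = field(default_factory=list)

    def _validate(self, row: dict) -> None:
        # Schema enforcement: exact column set and matching types.
        if set(row) != set(self.schema):
            raise SchemaMismatchError(
                f"columns {sorted(row)} != {sorted(self.schema)}"
            )
        for name, expected in self.schema.items():
            if not isinstance(row[name], expected):
                raise SchemaMismatchError(
                    f"column {name!r} expects {expected.__name__}, "
                    f"got {type(row[name]).__name__}"
                )

    def append(self, batch: list) -> None:
        # Validate the whole batch before touching stored rows, so a
        # bad row rejects the entire write (all-or-nothing).
        for row in batch:
            self._validate(row)
        self.rows.extend(batch)


# Hypothetical sample data for illustration only.
claims = TinyTable(schema={"claim_id": int, "amount": float})
claims.append([{"claim_id": 1, "amount": 120.5}])

rejected = None
try:
    claims.append([
        {"claim_id": 2, "amount": 80.0},
        {"claim_id": 3, "amount": "n/a"},  # wrong type: rejects the batch
    ])
except SchemaMismatchError as err:
    rejected = str(err)

print(len(claims.rows), rejected)
```

[In this sketch the second batch is rejected as a whole, so the valid row with `claim_id` 2 is not stored either. A production lakehouse table format achieves a comparable guarantee through a transaction log rather than in-memory checks.]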
Could you highlight the expanded FedRAMP authorization for your data lakehouse and how it would benefit your customers?
The FedRAMP framework was created by the US federal government to simplify and standardize the process of risk-based authorizations of cloud services for US government agencies and commercial organizations handling US government data. Databricks recently achieved FedRAMP authorization at the Moderate impact level for its AWS SaaS service offering. This achievement will accelerate the process for federal agencies to leverage the Databricks Lakehouse Platform for their mission-critical data, analytics, and AI use cases on AWS. For those customers who prefer to leverage the Microsoft Azure platform, Azure Databricks continues to be available on Azure as a FedRAMP High and DoD IL5 authorized cloud service.
What do you offer to the Federal agencies seeking to adopt a secure and open data lakehouse to support their cloud smart goals?
Databricks has more than 7,000 customers globally, including more than 50% of the Fortune 500, spanning many industry verticals. In the Public Sector, Databricks has over 150 customers, representing some of the largest state, local, and federal entities in the United States.
Databricks’ mission is to help data teams solve the world’s toughest problems. Databricks is the only platform to qualify as a leader in the Gartner Magic Quadrants for both Cloud Database Management Systems and Data Science and Machine Learning Platforms. Databricks is the original creator of many popular open-source data projects, including Apache Spark, Delta Lake, and MLflow, and multiple exabytes of data are processed each day across millions of machines orchestrated by Databricks. As a result of the FedRAMP authorization, Federal agencies can now take advantage of this industry-leading cloud-native lakehouse platform in alignment with their stringent security and compliance requirements.
Please tell us more about your partnership ecosystem and how these initiatives centralize data operations across multiple workflows and architecture?
A thriving partner ecosystem is one primary advantage of a commitment to open-source technologies and open standards. Databricks has over 450 global partners that provide data, analytics, and AI solutions and services to our joint customers. These partners enable you to leverage Databricks to unify all your data and AI workloads for more meaningful insights. Databricks partnerships include public cloud providers, consulting and SI partners, and ISV and Technology Partners. Databricks ISV partners span multiple focus areas including Data Ingestion, Data Governance, Data Pipelines, BI and Dashboards, Machine Learning, and Data Science. What this means for our customers is that we will integrate with the ecosystems they have already invested in and drive mission value quickly, without the need to make radical changes to their architecture.
Please tell us about the importance of strong compliance and data governance for data integrity workflows and frameworks.
Data is a critical asset of any organization. Curating data is essential to create high-value data for BI and AI/ML. Organizations must eliminate data silos and avoid creating copies of data that get out of sync. Data should be treated like a product with a clear definition, schema, and lifecycle so data consumers can fully trust the data. Data assets must be actively managed and access control, auditing, and lineage tracking are key for the correct and secure use of data.
Would you like to share a case study from Databricks’ resources to highlight how data integration improved IT investments within the public sector?
Databricks has over 150 customers in the Public Sector representing some of the largest state, local, and federal entities in the United States. Our customers—including the DOD, HHS, DOJ, DHS, State of California, and State of New York—use Databricks for a broad range of data analytics and AI use cases such as cyber threat detection, geospatial analytics, disease spread modeling, fraud detection, predictive maintenance, cardiac & disease prevention, Medicare & Medicaid modernization and more.
Databricks worked with the Centers for Disease Control and Prevention (CDC) to support the launch of their Data Modernization Initiative (DMI). Originally conceived to help drive their pandemic response, DMI enabled CDC to ship more than 850 million vaccine orders and conduct more than 15 billion individual COVID-19 tests and vaccines. The ultimate goal was to move the United States away from siloed and brittle public health data systems to connected, resilient, adaptable, and sustainable ‘response-ready’ systems that could help the CDC solve problems before they happen and reduce the harm caused by the problems that do happen.
The Lakehouse has become a key component of the CDC-wide Enterprise Data Analytics and Visualizations (EDAV) platform, which touches all of the centers within the CDC. An additional major use case is standing up the CDC Data Hub for the Center of Surveillance, which is key to understanding illnesses and their spread around the US and to helping track their origin. Currently, many more use cases are adopting the Lakehouse to help people within the US understand health situations and to help Congress and the White House define the direction of health within the US.
Your take on the future of data science and AI in the data integrity domain:
While at the AMP research lab at UC Berkeley, Databricks’ founders had a front-row seat to the data and AI innovations powering some of Silicon Valley’s most disruptive technologies. They learned how companies like Google and Facebook were finding success by running simple AI algorithms on massive amounts of data. These companies spent millions on talent and infrastructure to build their own proprietary data and AI systems that would ultimately lead to much of their success. Databricks was founded to do the same for any company. On a mission to democratize AI, our co-founders set out to build a simple platform, leveraging the open-source technologies they had created to unify data, analytics and AI in one place. In 2011, entrepreneur and venture capitalist Marc Andreessen penned a now-famous article explaining that “software is eating the world”. At Databricks, we agree with Marc’s assertion and extend it with the belief that “AI will eat software”. We are just scratching the surface and anticipate this market to grow exponentially in the coming years.
Any advice for business leaders who are looking to invest in cloud computing and AIOps for data integrity?
AIOps involves jointly managing code (DevOps), data (DataOps), and models (ModelOps) in the journey toward production. The most common and painful challenge we have seen is a gap between data and ML, often split across poorly connected tools and teams. Our platform simplifies AI by defining a data-centric workflow that unifies best practices from DevOps, DataOps, and ModelOps. Machine learning pipelines are ultimately data pipelines, where data flows through the hands of several personas. Data engineers ingest and prepare data; data scientists build models from the data; ML engineers generate model metrics; and business analysts examine predictions. Databricks simplifies machine learning production by enabling these data teams to collaborate and manage this abundance of data on a single platform, instead of in silos. The Databricks approach to AIOps is built on open industry-wide standards and architected natively to maximize the benefits of cloud computing infrastructure.
Thank you, Jude! That was fun and we hope to see you back on itechnologyseries.com soon.
[To participate in our interview series, please write to us at email@example.com]
Jude Boyle is the Vice President of Public Sector at Databricks. Jude’s leadership is focused on supporting Databricks’ customers across the public sector, bringing government agencies insight from data, analytics, and AI to support their life-changing missions. Prior to Databricks, Jude was the SVP of Public Sector Sales at MuleSoft. In addition, Jude brings significant public sector expertise from previous work with technology companies like Splunk, Agilex Technologies, and Oracle. In a prior career, Jude coached college football. Jude grew up in the DC area and is married to Regina Boyle.
Databricks is the lakehouse company. More than 7,000 organizations worldwide — including Comcast, Condé Nast, and over 50% of the Fortune 500 — rely on the Databricks Lakehouse Platform to unify their data, analytics and AI. Databricks is headquartered in San Francisco, with offices around the globe. Founded by the original creators of Apache Spark™, Delta Lake and MLflow, Databricks is on a mission to help data teams solve the world’s toughest problems.