Google Cloud launches BigLake, a new cross-platform data storage engine – TechCrunch
At its Cloud Data Summit, Google today announced the preview launch of BigLake, a new data lake storage engine that makes it easier for businesses to analyze the data in their data warehouses and data lakes.
The core idea here is to take Google’s experience with running and managing its BigQuery data warehouse and extend it to data lakes on Google Cloud Storage, combining the best of data warehouses and data lakes into a single service that abstracts away the underlying storage formats and systems.
This data, notably, can reside in BigQuery or live on AWS S3 and Azure Data Lake Storage Gen2 as well. Through BigLake, developers get access to a uniform storage engine and the ability to query the underlying data stores through a single system, without the need to move or duplicate data.
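For a rough sense of what that single-system access looks like in practice, here is a minimal, hypothetical sketch using the BigQuery Python client, the SQL surface BigLake builds on. The project, dataset, bucket, and connection names are invented for illustration and are not from Google’s announcement.

```python
# Hypothetical sketch: querying data-lake files in Cloud Storage through
# BigQuery SQL, without copying the data. All resource names are illustrative.
from google.cloud import bigquery  # pip install google-cloud-bigquery

client = bigquery.Client(project="my-analytics-project")

# Define an external table over Parquet files sitting in a GCS bucket.
# A Cloud resource connection lets BigQuery enforce access control instead
# of requiring users to hold direct bucket permissions.
ddl = """
CREATE EXTERNAL TABLE IF NOT EXISTS `my-analytics-project.lake.orders`
WITH CONNECTION `my-analytics-project.us.lake-connection`
OPTIONS (
  format = 'PARQUET',
  uris = ['gs://my-data-lake/orders/*.parquet']
)
"""
client.query(ddl).result()  # wait for the DDL job to finish

# Query the lake data with plain SQL, as if it were a warehouse table.
rows = client.query(
    "SELECT customer_id, SUM(total) AS spend "
    "FROM `my-analytics-project.lake.orders` "
    "GROUP BY customer_id ORDER BY spend DESC LIMIT 10"
).result()
for row in rows:
    print(row.customer_id, row.spend)
```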
“Managing data across disparate lakes and warehouses creates silos and increases risk and cost, especially when data needs to be moved,” explained Gerrit Kazmaier, VP and GM of Databases, Data Analytics, and Business Intelligence at Google Cloud, in today’s announcement. “BigLake allows companies to unify their data warehouses and lakes to analyze data without worrying about the underlying storage format or system, which eliminates the need to duplicate or move data from a source and reduces cost and inefficiencies.”
Using policy tags, BigLake allows administrators to configure their security policies at the table, row, and column level. This includes data stored in Google Cloud Storage, as well as the two supported third-party systems, where BigQuery Omni, Google’s multi-cloud analytics service, enables these security controls. Those security controls then also ensure that only the right data flows into engines like Spark, Presto, Trino, and TensorFlow. The service also integrates with Google’s Dataplex tool to provide additional data management capabilities.
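To illustrate the row-level piece of those controls, the hedged sketch below runs standard BigQuery DDL through the Python client; column-level control works by attaching policy tags to individual columns and is omitted here. The policy, table, group, and column names are all invented for the example.

```python
# Hedged sketch: row-level security on a lake-backed table, expressed as
# BigQuery DDL. Names are illustrative, not from the announcement.
from google.cloud import bigquery

client = bigquery.Client(project="my-analytics-project")

# Analysts in this group only see rows for their own region when they query
# the table; other rows are filtered out by the engine, not the application.
client.query("""
CREATE ROW ACCESS POLICY IF NOT EXISTS us_only
ON `my-analytics-project.lake.orders`
GRANT TO ('group:us-analysts@example.com')
FILTER USING (region = 'US')
""").result()
```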
Google notes that BigLake will provide fine-grained access controls and that its API will span Google Cloud, as well as file formats like the open column-oriented Apache Parquet and open source processing engines like Apache Spark.
“The volume of valuable data that organizations have to manage and analyze is growing at an incredible rate,” explained Google Cloud software engineer Justin Levandoski and product manager Gaurav Saxena in today’s announcement. “This data is increasingly distributed across many locations, including data warehouses, data lakes, and NoSQL stores. As an organization’s data gets more complex and proliferates across disparate data environments, silos emerge, creating increased risk and cost, especially when that data needs to be moved. Our customers have made it clear; they need help.”
In addition to BigLake, Google today also announced that Spanner, its globally distributed SQL database, will soon get a new feature called “change streams.” With these, users can easily track any changes to a database in real time, be those inserts, updates or deletes. “This ensures customers always have access to the freshest data as they can easily replicate changes from Spanner to BigQuery for real-time analytics, trigger downstream application behavior using Pub/Sub, or store changes in Google Cloud Storage (GCS) for compliance,” explains Kazmaier.
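As a rough, hypothetical sketch of what defining such a stream could look like, the snippet below issues change-stream DDL through the Spanner Python client. The instance, database, table, and stream names are invented; wiring the stream into BigQuery, Pub/Sub, or GCS is a separate downstream pipeline.

```python
# Hedged sketch: creating a Spanner change stream that tracks inserts,
# updates, and deletes on one table. All resource names are illustrative.
from google.cloud import spanner  # pip install google-cloud-spanner

client = spanner.Client(project="my-analytics-project")
database = client.instance("orders-instance").database("orders-db")

# The stream records every data change to the Orders table in real time.
op = database.update_ddl(["CREATE CHANGE STREAM OrdersStream FOR Orders"])
op.result()  # wait for the schema change to complete
```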
Google Cloud today also brought Vertex AI Workbench, a tool for managing the entire lifecycle of a data science project, out of beta and into general availability, and launched Connected Sheets for Looker, as well as the ability to access Looker data models in its Data Studio BI tool.