Google Data Fusion

We're delighted to bring you the latest news from Bosotrends, our LinkedIn newsletter! Our team recently had the opportunity to visit the Google office and attend the Google Summit with Carlos de Antonio, immersing ourselves in a world of innovation and cutting-edge technology. The experience allowed us to gain valuable insights, forge new connections and expand our knowledge of the ever-evolving Google Cloud arena. We're excited to share our highlights and explore how our experience at Bosonit intersects with Google's pioneering environment and, in particular, with Data Fusion.

About Data Fusion

Data analysis poses a major challenge because data is dispersed across systems and stored in different formats. It is often necessary to perform multiple integration tasks before valuable insights can be gained. Data Fusion addresses this challenge by providing a complete solution for enterprise data integration, covering ingestion, ETL, ELT and streaming. With an execution engine optimised for SLAs and cost-effectiveness, Data Fusion simplifies the lives of ETL developers, data analysts and data engineers working in Google Cloud, hybrid cloud or multi-cloud environments. It serves as a centralised hub for all data integration activities, enabling agile and efficient data processing.

Data Fusion on Google Cloud is a powerful service that enables organisations to integrate, transform and analyse data from multiple sources in a unified and scalable way. With Data Fusion, users can create data pipelines and workflows to efficiently ingest, process and manage data, regardless of its format or location.

One of the key benefits of Data Fusion is its visual interface, which allows users to design data integration and transformation flows using a drag-and-drop method. This intuitive interface eliminates the need for complex coding and allows data engineers and analysts to collaborate effectively in the creation of data pipelines.

Data Fusion supports a wide range of data sources, including structured, semi-structured and unstructured data, allowing organisations to handle a variety of data types, including relational databases, CSV files, JSON documents and more. It also integrates seamlessly with other Google Cloud services, such as BigQuery and Cloud Storage, to efficiently store and process data.

By leveraging Data Fusion, organisations can accelerate their data integration processes, reduce development time and improve operational efficiency. The service offers integrated data quality, validation and transformation capabilities, ensuring data accuracy and consistency throughout the process. It also supports real-time data processing, enabling companies to make faster, more informed decisions based on up-to-date data.

Data integration

Data Fusion's data integration capabilities include:

  1. Optimised analysis and accelerated data transformations: Data Fusion enables efficient data integration, improving the speed and efficiency of analytics and data transformations.
  2. Wide range of connectors and formats: With support for over 200 connectors and formats, Data Fusion allows you to seamlessly extract and combine data from multiple sources, enabling you to work with a wide variety of data types.
  3. Visual development of pipelines: Data Fusion provides a visual environment for developing data pipelines, improving productivity and ease of use.
  4. Data management and collaboration: Data Fusion offers data wrangling capabilities to prepare and operationalise data, facilitating collaboration between business and IT teams.
  5. REST API for pipeline management: You can leverage the extensive REST API to design, automate, orchestrate and manage the lifecycle of pipelines, enabling optimised management and control.
  6. Support for multiple data delivery modes: Data Fusion supports batch, streaming and real-time data delivery, so a single platform can cover both batch and streaming use cases.
  7. Operational information and optimisation: Data Fusion provides operational insights to monitor data integration processes, manage SLAs and optimise integration jobs, ensuring efficient and effective data processing.
  8. Analysis and enrichment of unstructured data: Data Fusion offers capabilities to analyse and enrich unstructured data using Cloud AI, enabling tasks such as converting audio files to text, NLP sentiment analysis, extracting features from images and documents, and converting HL7 formats to FHIR.
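
The REST API mentioned in point 5 can be illustrated with a short example. Cloud Data Fusion is built on the open-source CDAP project, and each instance exposes the CDAP REST API. The following is a minimal sketch, assuming a batch pipeline has already been deployed; the endpoint URL, pipeline name and token are placeholders, and the helper name build_start_request is hypothetical. The code only builds the request and does not send it.

```python
import urllib.request
from urllib.parse import quote


def build_start_request(api_endpoint, pipeline, token, namespace="default"):
    """Build (but do not send) a POST request that starts a batch pipeline.

    Batch pipelines run as a CDAP workflow named DataPipelineWorkflow.
    """
    url = (
        f"{api_endpoint.rstrip('/')}/v3/namespaces/{quote(namespace, safe='')}"
        f"/apps/{quote(pipeline, safe='')}/workflows/DataPipelineWorkflow/start"
    )
    return urllib.request.Request(
        url,
        method="POST",
        headers={"Authorization": f"Bearer {token}"},
    )


# Placeholder values for illustration only (assumptions, not real resources).
request = build_start_request(
    "https://example-instance.invalid/api", "my_pipeline", "TOKEN"
)
print(request.get_method(), request.full_url)
# To actually start the run, pass the request to urllib.request.urlopen().
```

In practice the endpoint comes from the instance details and the token from your Google Cloud credentials; both are left as placeholders here.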

Data consistency

Data Fusion's data consistency features help businesses make decisions with confidence by ensuring that their data is reliable:

  1. Structured transformations and data quality checks: Data Fusion mitigates the risk of errors by providing structured methods for specifying transformations and performing data quality checks using the Wrangler tool. Pre-defined policies further improve data consistency.
  2. Observability of data for quality identification: With Data Fusion, you can track data profiles during the integration process, allowing you to identify and address quality issues. This data observability allows you to make informed decisions based on the health and reliability of your data.
  3. Management of data variation and change: As data formats evolve over time, Data Fusion helps manage data drift. It detects changes in data formats and offers customisation options for error handling, ensuring consistent and accurate data processing despite variations.
  4. Metadata: You can collect technical, business and operational metadata for datasets and pipelines and discover metadata easily with a search.
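
To make the idea of data drift in point 3 more concrete, here is a small conceptual sketch in plain Python. It is not how Data Fusion implements drift detection, and it does not use any Data Fusion API; the function names and the sample schema are made up for illustration. It compares each record with an expected schema and routes records that deviate to a separate error list, which is the general pattern behind configurable error handling.

```python
def detect_drift(expected, record):
    """Compare a record against an expected schema (field name -> type).

    Returns a dict listing missing fields, unexpected fields and fields
    whose value has the wrong type.
    """
    missing = sorted(set(expected) - set(record))
    unexpected = sorted(set(record) - set(expected))
    wrong_type = sorted(
        name
        for name, expected_type in expected.items()
        if name in record and not isinstance(record[name], expected_type)
    )
    return {"missing": missing, "unexpected": unexpected, "wrong_type": wrong_type}


def split_records(expected, records):
    """Route clean records to one list and drifted records to an error list."""
    clean, errors = [], []
    for record in records:
        report = detect_drift(expected, record)
        if any(report.values()):
            errors.append((record, report))
        else:
            clean.append(record)
    return clean, errors


# Hypothetical schema and sample rows.
schema = {"id": int, "email": str}
rows = [
    {"id": 1, "email": "a@example.com"},
    {"id": "2", "email": "b@example.com"},           # id arrives as a string
    {"id": 3, "email": "c@example.com", "age": 41},  # new, unexpected field
]
clean, errors = split_records(schema, rows)
print(len(clean), "clean,", len(errors), "sent to error handling")
```

Keeping the drift report next to each rejected record makes it easier to decide later whether to fix the record, update the schema or discard the data.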

Data protection

Data Fusion's data protection advantages are:

  1. Secure access to on-premises data: Data Fusion enables secure access to on-premises data over private IP connections, protecting confidentiality and data integrity during transmission.
  2. Data encryption at rest: By default, Data Fusion encrypts data at rest, providing an additional layer of security. In addition, users have the option to use customer-managed encryption keys (CMEK) to maintain control over data encryption on all supported storage systems.
  3. Protection against data breaches: Data Fusion offers protection against data exfiltration through the use of VPC Service Controls. These controls establish a security perimeter around platform resources, preventing unauthorised access and enhancing data security.
  4. Integration with Cloud Key Management Service (KMS): Sensitive information such as passwords, URLs and JDBC strings can be securely stored in Cloud KMS. Data Fusion also supports integration with external key management systems, ensuring robust key management and protection.
  5. Integration with Cloud Data Loss Prevention (DLP): Data Fusion seamlessly integrates with Cloud DLP, enabling advanced data protection capabilities. Users can leverage Cloud DLP to mask, redact and encrypt data in transit, safeguarding sensitive information from unauthorised disclosure.
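
The difference between masking and redacting, mentioned in point 5, can be shown with a short conceptual sketch. This example does not call Cloud DLP or Data Fusion; it uses a simple regular expression in plain Python, and the pattern and function names are assumptions chosen for illustration. A real DLP service detects many more kinds of sensitive data than this pattern does.

```python
import re

# Simplified email pattern, for illustration only.
EMAIL = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")


def mask_emails(text, mask_char="*"):
    """Masking: replace every character of each email with mask_char."""
    return EMAIL.sub(lambda match: mask_char * len(match.group()), text)


def redact_emails(text, placeholder="[EMAIL]"):
    """Redaction: replace each email with a fixed placeholder."""
    return EMAIL.sub(placeholder, text)


sample = "Contact jane.doe@example.com for access."
print(mask_emails(sample))
print(redact_emails(sample))
```

Masking keeps the length of the original value, which can help when downstream systems expect fixed-width fields, while redaction hides both the value and its length.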

Below you can see how to use Cloud Data Fusion.

Personally, I have embarked on a journey to prepare for Google Cloud Professional certification. As I delve deeper into the ins and outs of Google Cloud Platform, I'll be sharing my progress, study tips and resources in upcoming newsletters. Join us as we discuss the highlights of our visit to Google and my preparations for Google Cloud Professional Certification.

Stay tuned for this edition packed with information, industry trends, updates from our team's visit to Google's offices and my journey towards becoming a Google Cloud Professional.

Enrique Sola Gayoso

Big Data Consultant at Bosonit
