Data Engineering

 Articles

Implementing Data Quality Checks and Validation Using Apache Iceberg's Metadata

Data integrity is paramount for data-driven organizations. Substandard data can result in skewed insights, misguided decisions, and resource inefficiency. This article delves into leveraging Apache Iceberg's metadata capabilities to establish robust data quality checks and validation procedures.

Data Engineering

Designing Scalable Data Ingestion Architectures with Snowflake's Multi-Cluster Warehouses

In the era of data explosion, organizations face the challenge of ingesting and processing massive amounts of data efficiently. Snowflake, a cloud-native data platform, offers a powerful solution with its multi-cluster warehouses. This article explores the intricacies of designing scalable data ingestion architectures using Snowflake's multi-cluster warehouses, providing insights, best practices, and code examples to help you optimize your data pipeline.

Data Engineering
5 min read

How to implement Distributed Tracing for Postgres Queries with OpenTelemetry and Jaeger

Learn how to implement distributed tracing in a microservices-driven architecture using OpenTelemetry and Jaeger. This comprehensive guide covers setting up a Python-based FastAPI application, instrumenting Postgres queries, configuring Jaeger, and analyzing traces to optimize system performance.

Data Engineering
7 min read

How to design scalable ETL Workflows using Databricks Workflows and Delta Live Tables

This article explores the evolving landscape of ETL (Extract, Transform, Load) processes in data-driven organizations, focusing on the challenges traditional ETL approaches face in handling ever-growing volumes of data. It introduces Databricks Workflows and Delta Live Tables (DLT) as powerful tools that offer simplicity, scalability, and reliability in ETL processes.

Data Engineering
7 min read

How to optimize PostgreSQL Performance with pgBadger and Grafana

In this blog, we learn how to boost PostgreSQL performance with pgBadger and Grafana. Set up real-time monitoring, configure logs, and create custom dashboards to quickly identify and fix query issues.

Data Engineering
9 min read

How to Implement Custom Windowing Logic in Apache Spark Structured Streaming

Explore the process of implementing custom windowing logic in Apache Spark Structured Streaming to handle advanced event aggregation. This blog delves into the necessity of custom windowing, provides a step-by-step guide, and showcases various advanced aggregation scenarios.

Data Engineering
8 min read

Designing a Multi-Tier Data Warehouse Architecture with Snowflake

In this blog post, we explore the intricacies of designing a multi-tier data warehouse architecture using Snowflake, specifically tailored for the use case of heat exchanger fouling prediction. We cover the key components of the architecture, discuss best practices, and provide detailed code snippets to help you implement this solution in your own environment.

Data Engineering
8 min read

How to Build a Scalable Clinical Data Warehouse Using HL7, Kafka, Flink, and AWS Redshift

In this blog, we guide you through building a scalable clinical data warehouse using industry-standard technologies: HL7 for data exchange, Apache Kafka for real-time data streaming, Apache Flink for stream processing, and AWS Redshift for data storage and analytics.

Data Engineering
6 min read

How to Optimize Your Snowflake Data Warehouse with Smart Partitioning Strategies

In this blog, we talk about how to enhance your Snowflake data warehouse performance with smart partitioning strategies, including date-based, hash-based, and composite partitioning techniques, along with best practices and real-world examples.

Data Engineering
9 min read

How to Use Kafka Streams’ Interactive Queries for Real-Time Data Analysis in CEP Pipelines

In this blog, we demonstrate how to use Kafka Streams’ interactive queries for real-time data analysis in complex event processing (CEP) pipelines through practical code examples, and show how to implement a fraud detection use case.

Data Engineering
7 min read

Snowflake vs. Redshift: A Detailed Comparison for Data Warehousing

This blog post provides an in-depth comparison of Snowflake and Amazon Redshift, two leading cloud-based data warehousing solutions. We analyze their performance, features, scalability, security, pricing, and our hands-on experience with both platforms.

Data Engineering
8 min read

PySpark on AWS EMR: A Guide to Efficient ETL Processing

This comprehensive guide covers setting up EMR clusters, executing ETL tasks, data extraction, transformation, loading, and optimization techniques to maximize performance.

Data Engineering
7 min read

The Importance of Custom ETL Solutions: Extract, Transform, Load

This blog post emphasizes the advantages of custom ETL solutions and the need for careful consideration in each phase of the ETL process.

Data Engineering
5 min read

How to animate in CSS

A touch of CSS animation goes a long way in designing an immersive experience for visitors. The best animations can serve the content and user experience without distracting or appearing gimmicky. In this blog, we talk about how to animate in CSS with examples.

Data Engineering
8 min read

Structural Pattern Matching in Python II

In this blog, the second of a three-part series, we continue our discussion and introduce value patterns, sequence patterns, and mapping patterns.

Data Engineering
5 min read

Structural Pattern Matching in Python I

In this blog, the first of a three-part series, we talk about the structural pattern-matching support in Python.

Data Engineering
7 min read

Unit testing with Jest

In this blog, we use Jest, a JavaScript testing framework, to explain how to do unit testing with some interesting examples that draw on some of Jest's key features.

Data Engineering
6 min read

How to test JavaScript applications

Testing is an important part of any application's development process. It can help build better, more reliable apps. In this blog, we talk about different ways to test JavaScript applications.

Data Engineering
5 min read

How to install Robot Framework on Windows

Robot Framework is an open-source test automation framework that is simple to use with minimal programming. In this blog, we give step-by-step instructions on how we installed Robot Framework on Windows.

Data Engineering
5 min read

Java Project Loom

Java Project Loom is a proposed new feature for the Java platform that aims to improve the support for concurrent programming in Java. In this blog, we talk about a few examples of how Project Loom could be used in Java programs.

Data Engineering
6 min read

Django vs Frappe

Frappe is a full-stack web framework that differs somewhat from traditional ones like Django or Flask. In this blog, you will get a head-to-head comparison between Django and Frappe. We then move on to Frappe's best use cases and challenges.

Data Engineering
5 min read

GraphQL has a role beyond API Query Language - being the backbone of Application Integration. Are you ready to embrace it?

GraphQL has gained pre-eminence in the last couple of years as the API query language of choice. It has been adopted by several hundred prominent enterprises and products. But is GraphQL only an API query language?

Data Engineering
7 min read

Decade Of Artificial Intelligence: A Summary

The world has seen a boom in the field of Artificial Intelligence in the past few years, and what our AI community has achieved in the last decade has set a strong foundation for the future.

Data Engineering
16 min read

Data – A Key Enabler For Patient Centric Healthcare

Technology, specifically Big Data, Cloud, IoT, Mobile, and AI, will play a pivotal role in enabling the transformation of healthcare towards patient centricity.

Data Engineering
14 min read

Are You Measuring These 3 Important CX Metrics?

The three most important CX metrics that everyone should track to measure their product or service CX: NPS (Net Promoter Score surveys), CES (Customer Effort Score), and CSAT (Customer Satisfaction Score).

Data Engineering
20 min read
