Data Engineering


Choosing the right tool while building your Data Platform: DBT vs. Spark (By example)

Written by: Niranjan Ingale. Photo by Isaac Smith on Unsplash. Contents: What is a Data Platform, Spark & DBT; Defining the Scope; Airport Insights: Example of a common analytics use case; Comparison between Spark & DBT for developing an analytics use case; Taxi trip duration prediction: Example of developing a model for batch and real-time prediction; Comparison between Spark & DBT for developing a data science use case; Push of tech giants towards expanding the use of serverless data warehouses beyond conventional OLAP Data...

Read

Revolutionizing a Leading Automotive Giant's Digital Ecosystem through Innovation and Data-Led Strategies

In the pursuit of continuous improvement and industry transformation, we partnered with a leading automotive giant on a journey to modernise its digital ecosystem. This endeavour aimed to not only enhance the core products and technologies but also stimulate innovation, scale operations and instil data-led strategies at the heart of its operations. The purpose-built Extensible Data Platform revolutionizes the post-purchase usage lifecycle of vehicles. It facilitates real-time vehicle health and telemetry data, efficient fleet and trip management, tracking, safety alerts...

Read

Citizens of The Great Barrier Reef — A story of scale, complexity and collaboration for conservation

The Great Barrier Reef, Australia’s iconic ecological site, is important not only to the continent but to the planet as a whole. With over 3,000 individual reefs stretching over 2,000 km and home to 25% of the world’s marine species, it is the world’s largest marine ecosystem. However, it now faces a serious threat. Climate change, rising sea temperatures, pollution and crown-of-thorns starfish attacks are just some of the threats that expose the reef to bleaching, and...

Read

Passwordless Auth between Cloud provider and GitHub

In this blog, I discuss how I deployed my app to AWS Amplify through GitHub Actions. However, you can deploy to any cloud provider and service of your choice by following this method. Towards the end, I have also embedded the Terraform scripts and GitHub Actions I used to achieve this. GitHub Actions makes our lives easier by providing a marketplace of pre-existing actions that we can choose from. For example, deploying to AWS using these plugins is a cakewalk....

Read

Building secure systems with PII Data Protection Techniques — Part II

In the previous blog, we discussed the importance of securing PII and sensitive data points captured by enterprises as part of their business apps, analyzed different techniques for securing these data points, namely Encryption and Tokenization, and finally compared these techniques. Continuing this blog series, we will now cover database-side encryption of sensitive data with Spring Boot and Hibernate. Information on this topic is limited, incomplete and fragmented across different websites, documentation, and blogs. This blog...

Read

Building secure systems with PII Data Protection Techniques — Part I

Today, storing a person’s KYC information, credit or debit card information, or other similar sensitive details to charge for services availed on a platform is commonplace. This entails storing and dealing with Personally Identifiable Information (PII) and sensitive financial information. Hacking of such systems can expose businesses to the following risks: customer identity theft, which results in financial losses and damage to an individual’s credit score; loss of trust and reputation of the enterprise in the event of a data breach and exposure; legal...

Read

DIVOC — A groundbreaking solution to orchestrate programs of scale worldwide

India is a huge country. Creating solutions for the world’s second-largest population comes with its own set of challenges, which need to be dealt with through capacity building, scale operations and interoperable frameworks that work seamlessly for citizens across the economy. Developing digital infrastructure to drive innovation and to build applications for multiple use cases is fundamental for its success, acceptance and usage. The shared vision of DIVOC is to enable countries to digitally orchestrate large-scale programs with an...

Read

Locking in Databases and Isolation Mechanisms

 should become a familiar term if you are dealing with locks. In a large application accessed by thousands of users, concurrency is inevitable. Your application should be able to handle multiple requests simultaneously. When you execute operations concurrently, the results can conflict. For example, if you are reading a row while someone else is writing to it, then you are bound to get inconsistent data. If we execute these transactions sequentially, then we don’t need concurrency control. But...

Read

Helm incubator Kafka setup with SSL auth

Photo by Jukan Tateisi on Unsplash. The Helm chart for incubator Kafka is now deprecated, but I still find it very handy for a PoC setup. Ref: https://github.com/helm/charts/tree/master/incubator/kafka The incubator Kafka Helm chart supports SSL auth for brokers, but it lacks documentation for doing so. I struggled to get the setup right, as the only available guidance is the GitHub links mentioned below, which give some idea about the setup. I used Terraform for this setup. Ref: https://github.com/helm/charts/issues/3951 and https://github.com/helm/charts/pull/7693 The assumption is you already...

Read

MLOps: Building a healthy data platform

"You know that a term you coined has made it mainstream when people use it regularly in conversations and rarely understand what you meant." Martin Fowler (paraphrased from an in-person conversation). Rouan summarises DevOps culture well in his post on Martin’s bliki. It is easy for developers to lose interest in operational concerns. “It works on my machine” used to be a common phrase among developers in years past. Some operations folks can also be less concerned with development challenges. Increased collaboration can help...

Read