Introduction to a Career in Data Engineering

Written by sarthak-12 | Published 2021/11/18

TL;DR: Data Engineers create a Data Pipeline that prepares data for the task at hand. Structured Data is organized into tables with rows and columns. Semi-structured Data comes as XML, CSV, or JSON files. Structured Data is the most organized type of data and Binary Data is the least. Data is more valuable than anything in the world today, and the companies that own it exert their authority both inside and outside the world of technology. Building a pipeline is a three-step process: acquiring data, ingesting it into a Data Lake, and transforming it into a Data Warehouse. Data in a Data Lake is raw, while data in a Data Warehouse is transformed for a specific purpose.

Data is the most powerful asset in the world today.

Far-fetched as it may sound, data is more valuable than anything in the world today, and the companies that own it exert their authority both inside and outside the world of technology. That is why the job of a Data Engineer has become so enticing to the new crop of engineers coming out of college.

What is a Data Engineer?

Definition, simplified: a person who collects raw data, then transforms and maintains it in databases for task-specific use.

Anyone looking to break into the Data Engineering field should understand the different types of data, which are as follows:

  • Structured Data → Data is organized into tables with rows and columns.

  • Semi-structured Data → Data in the form of XML, CSV, or JSON files.

  • Unstructured Data → Data from emails or PDFs.

  • Binary Data → Data such as audio or image files.

These types are listed in decreasing order of organization: Structured Data is the most organized and Binary Data is the least.
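To make the four categories concrete, here is a minimal Python sketch using only the standard library. The sample records, the `users` table, and the variable names are made up for illustration; they are assumptions, not part of any particular system.

```python
import json
import sqlite3

# Structured: a table with fixed columns; every row has the same shape.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (id INTEGER, name TEXT, city TEXT)")
conn.executemany(
    "INSERT INTO users VALUES (?, ?, ?)",
    [(1, "Asha", "Pune"), (2, "Ben", "Leeds")],
)
structured_rows = conn.execute(
    "SELECT id, name, city FROM users ORDER BY id"
).fetchall()

# Semi-structured: keys describe the data, but records may differ in shape.
semi_structured = json.loads(
    '[{"id": 1, "name": "Asha", "tags": ["admin"]}, {"id": 2, "name": "Ben"}]'
)
keys_per_record = [sorted(record.keys()) for record in semi_structured]

# Unstructured: free text with no schema; meaning has to be extracted.
unstructured = "Hi team, the March invoice is attached. Thanks, Asha"
word_count = len(unstructured.split())

# Binary: raw bytes that only a format-specific decoder can interpret.
binary = bytes([0x89, 0x50, 0x4E, 0x47])  # first four bytes of a PNG file
looks_like_png = binary.startswith(b"\x89PNG")

print(structured_rows)
print(keys_per_record)
print(word_count, looks_like_png)
```

Note how the second JSON record simply omits `tags`: that flexibility is what separates semi-structured data from a table, where every row must fit the declared columns.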

Data Pipeline

Data Engineers create a Data Pipeline that prepares data for the task at hand.

Creating a data pipeline is a three-step process:

  1. Acquiring Data
  2. Data Ingestion - Ingesting Data into a Data Lake
  3. Data Transformation - Saving Data into a Data Warehouse

The data in a Data Lake is in its raw form, while the data in a Data Warehouse is transformed for a specific purpose.

A data scientist would typically prefer the Data Lake, since it holds more data and keeps it in raw form. A data analyst would typically prefer the Data Warehouse, as its data is already shaped for the job.

ETL → “Extract, Transform, and Load” is the pipeline pattern commonly used by Data Engineers.
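The three steps can be sketched in a few lines of Python. This is a toy illustration under stated assumptions, not a production design: the "lake" is just a temporary directory holding a raw JSON file, the "warehouse" is an in-memory SQLite database, and the function names (`acquire`, `ingest`, `transform_and_load`, `run_pipeline`) and sample orders are hypothetical.

```python
import json
import sqlite3
import tempfile
from pathlib import Path


def acquire():
    """Step 1: acquire raw records (hard-coded here; normally an API or file feed)."""
    return [
        {"customer": "asha", "amount": "120.50"},
        {"customer": "ben", "amount": "80"},
        {"customer": "asha", "amount": "19.50"},
    ]


def ingest(records, lake_dir):
    """Step 2: store the records unchanged in the 'lake' directory."""
    lake_dir.mkdir(parents=True, exist_ok=True)
    raw_path = lake_dir / "orders_raw.json"
    raw_path.write_text(json.dumps(records))
    return raw_path


def transform_and_load(raw_path, conn):
    """Step 3: shape the raw data for one purpose and save it in the 'warehouse'."""
    totals = {}
    for record in json.loads(raw_path.read_text()):
        # Amounts arrive as strings in the raw data; convert before summing.
        amount = float(record["amount"])
        totals[record["customer"]] = totals.get(record["customer"], 0.0) + amount
    conn.execute(
        "CREATE TABLE IF NOT EXISTS spend_per_customer "
        "(customer TEXT PRIMARY KEY, total REAL)"
    )
    conn.executemany(
        "INSERT OR REPLACE INTO spend_per_customer VALUES (?, ?)",
        sorted(totals.items()),
    )
    conn.commit()


def run_pipeline():
    """Run all three steps and return the warehouse table contents."""
    with tempfile.TemporaryDirectory() as tmp:
        raw_path = ingest(acquire(), Path(tmp) / "lake")
        conn = sqlite3.connect(":memory:")
        transform_and_load(raw_path, conn)
        return conn.execute(
            "SELECT customer, total FROM spend_per_customer ORDER BY customer"
        ).fetchall()


if __name__ == "__main__":
    print(run_pipeline())
```

The raw file keeps every order exactly as it arrived, which is what a data scientist exploring the lake would want, while the `spend_per_customer` table answers one specific question, which is what an analyst querying the warehouse would want.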

Types of Databases

The different types of Databases are as follows:

  1. Relational Databases - Postgres, MySQL

  2. NoSQL - MongoDB

  3. NewSQL Databases - VoltDB

  4. Search Databases - Elasticsearch

  5. Computation Databases (distributed processing engines rather than stores) - Apache Spark
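As a rough illustration of how the first, second, and fourth styles differ in the way data is stored and looked up, the sketch below imitates each one with the Python standard library. It does not use the real MongoDB or Elasticsearch APIs; the plain dictionary and the hand-built inverted index are stand-ins chosen for illustration, and the sample articles are invented.

```python
import sqlite3
from collections import defaultdict

articles = [
    {"id": 1, "title": "Intro to data lakes", "tags": ["lake", "storage"]},
    {"id": 2, "title": "Data warehouse basics", "tags": ["warehouse"]},
]

# Relational style: a fixed schema, queried with SQL.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE articles (id INTEGER PRIMARY KEY, title TEXT)")
conn.executemany(
    "INSERT INTO articles VALUES (?, ?)",
    [(a["id"], a["title"]) for a in articles],
)
relational_hit = conn.execute(
    "SELECT title FROM articles WHERE id = ?", (2,)
).fetchone()[0]

# Document style: each record is stored whole, nested fields included,
# and fetched by its key.
documents = {a["id"]: a for a in articles}
document_hit = documents[1]["tags"]

# Search style: an inverted index maps each word to the ids of the
# records whose title contains it.
index = defaultdict(set)
for a in articles:
    for word in a["title"].lower().split():
        index[word].add(a["id"])
search_hit = sorted(index["data"])

print(relational_hit)
print(document_hit)
print(search_hit)
```

The relational table had to leave out `tags` (or would need a second table for them), the document store kept the nested list as-is, and the index answers "which records mention this word?" without scanning every title.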

Data engineering is an increasingly enticing career option. Hopefully, anyone looking to break into the field found this story helpful.


Published by HackerNoon on 2021/11/18