
How To Become a Data Engineer

Useful articles

Talks

Algorithms & Data Structures

SQL

Programming

Databases

Distributed Systems

Books

Blogs

Tools

  • Apache Airflow is a platform to programmatically author, schedule, and monitor workflows in Python.
  • Apache Spark is a unified analytics engine for large-scale data processing.
  • Apache Kafka is a distributed streaming platform.
  • Luigi is a Python package that helps you build complex pipelines of batch jobs.
  • Dagster.io is a system for building modern data applications.
  • Prefect includes everything you need to create and run data applications.
  • Metaflow helps you build and manage real-life data science projects with ease.
  • lakeFS lets you build repeatable, atomic, and versioned data lake operations, from complex ETL jobs to data science and analytics.
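
Several of the tools above (Airflow, Luigi, Dagster, Prefect) share one core idea: a pipeline is a set of tasks plus the dependencies between them, and the tool runs each task only after its upstream tasks have finished. The sketch below illustrates that idea using only the Python standard library (Python 3.9+). It does not use the API of any of these tools, and the names `extract`, `transform`, `load`, `TASKS`, `DEPENDENCIES`, and `run_pipeline` are hypothetical, chosen only for illustration.

```python
from graphlib import TopologicalSorter


def extract():
    # Stand-in for reading rows from a source system.
    return [1, 2, 3]


def transform(rows):
    # Stand-in for a cleaning or enrichment step.
    return [row * 10 for row in rows]


def load(rows):
    # Stand-in for writing to a warehouse; returns a summary value instead.
    return sum(rows)


# Hypothetical pipeline definition: task name -> callable.
TASKS = {"extract": extract, "transform": transform, "load": load}

# Task name -> list of upstream tasks that must run first.
DEPENDENCIES = {"transform": ["extract"], "load": ["transform"]}


def run_pipeline(tasks, dependencies):
    """Run each task after its upstream tasks, passing their results along."""
    results = {}
    order = []
    # static_order() yields every task after all of its predecessors.
    for name in TopologicalSorter(dependencies).static_order():
        upstream = [results[dep] for dep in dependencies.get(name, ())]
        results[name] = tasks[name](*upstream)
        order.append(name)
    return order, results


order, results = run_pipeline(TASKS, DEPENDENCIES)
print(order)
print(results["load"])
```

Real orchestrators add scheduling, retries, logging, and distributed execution on top of this dependency ordering, which is why the toy runner here should be read as a conceptual aid rather than a substitute for any of them.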

Cloud Platforms

Communities

Data Engineering Jobs

Other

Newsletters & Digests
