When working on a project, it’s common to add files or directories to your .gitignore […]
Understanding __repr__ vs __str__ in Python – What’s the Difference?
If you’re coming from a Java background like me, you’re probably used to overriding toString() […]
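As a rough illustration of the distinction this post is about, here is a minimal sketch. The `Point` class is hypothetical and not taken from the post. It assumes the usual convention that `__repr__` targets developers (unambiguous, ideally looks like the constructor call) while `__str__` targets end users.

```python
class Point:
    """Hypothetical example class, used only to contrast the two methods."""

    def __init__(self, x, y):
        self.x = x
        self.y = y

    def __repr__(self):
        # Developer-facing: unambiguous, mirrors the constructor call.
        return f"Point(x={self.x!r}, y={self.y!r})"

    def __str__(self):
        # User-facing: short and readable.
        return f"({self.x}, {self.y})"


p = Point(1, 2)
print(str(p))   # uses __str__
print(repr(p))  # uses __repr__
print([p])      # containers show their elements with __repr__
```

If a class defines only `__repr__`, `str()` falls back to it, which is roughly how a single `toString()` override behaves in Java.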
🔥 Delta Live Tables (DLT) vs. Classic Pipelines (Delta Tables + Structured Streaming + Batch) in Databricks
When you develop a data pipeline in Databricks, you have two main options: […]
Understanding Fact Tables and Dimension Tables in a Dimensional Model
In a traditional data warehousing architecture—often guided by the Kimball methodology—data is organized around two […]
Databricks: Job Clusters vs. All-Purpose Clusters
In Databricks, clusters are distributed environments used to execute tasks or workloads. There are two […]
To convert the type of a column in Apache Spark, you use cast, not convert.
The cast function allows you to change the data type of a column in a DataFrame […]
In Apache Spark, do executors accept jobs or tasks from the driver?
In Apache Spark, executors accept and execute tasks from the driver. Here’s a breakdown of […]
Apache Spark Glossary
Slot: often associated with a CPU core. Each physical core of […]
Redshift LOCK
Overview: There are three lock modes. AccessExclusiveLock: acquired primarily during DDL operations, such as ALTER TABLE, DROP, or TRUNCATE. […]
Understanding Columnar Databases
A columnar (or column-oriented) database is a storage architecture for […]