When working on a project, it’s common to add files or directories to your .gitignore […]
Understanding __repr__ vs __str__ in Python – What’s the Difference?
If you’re coming from a Java background like me, you’re probably used to overriding toString() […]
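As a rough illustration of the distinction named in the title above, here is a minimal sketch. The `Point` class and its fields are hypothetical and not taken from the post; the only assumption is the standard Python behavior that `str()` and `print()` prefer `__str__`, while `repr()` and containers use `__repr__`.

```python
class Point:
    """Hypothetical example class, used only to illustrate the two methods."""

    def __init__(self, x, y):
        self.x = x
        self.y = y

    def __repr__(self):
        # Developer-facing form: unambiguous, ideally looks like the constructor call.
        return f"Point(x={self.x!r}, y={self.y!r})"

    def __str__(self):
        # User-facing form: short and readable.
        return f"({self.x}, {self.y})"


p = Point(1, 2)
print(str(p))   # goes through __str__
print(repr(p))  # goes through __repr__
print([p])      # a list formats its elements with __repr__
```

If a class defines only `__repr__`, `str()` falls back to it, which is the closest match to overriding a single `toString()` in Java.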
🔥 Delta Live Tables (DLT) vs. Classic Pipelines (Delta Tables + Structured Streaming + Batch) in Databricks
When you develop a data pipeline in Databricks, you have two main options: […]
Understanding Fact Tables and Dimension Tables in a Dimensional Model
In a traditional data warehousing architecture—often guided by the Kimball methodology—data is organized around two […]
To convert the type of a column in Apache Spark, you use cast, not convert.
The cast function allows you to change the data type of a column in a DataFrame […]
In Apache Spark, do executors accept jobs or tasks from the driver?
In Apache Spark, executors accept and execute tasks from the driver. Here’s a breakdown of […]
Apache Spark glossary
Slot: often associated with a CPU core. Each physical core of […]
Redshift LOCK
Overview: There are three LOCK modes. AccessExclusiveLock: acquired primarily during DDL operations, such as ALTER TABLE, DROP, or TRUNCATE. […]
Understanding Columnar Databases
A columnar database (or "colonne" in French) is a storage architecture for […]
Understanding Pivoting in SQL with the PIVOT Keyword
Pivoting in SQL is a powerful technique for transforming data from rows into […]