The cast function allows you to change the data type of a column in a DataFrame to the desired type.
Cast Function Overview
The cast function is used to change the data type of a column to a specified type. The type can be any valid Spark data type.
Example:
from pyspark.sql.functions import col
# Change the column "age" to IntegerType
df = df.withColumn("age", col("age").cast("int"))
Common Types You Can Cast To:
- StringType ("string"): Converts the column to a string.
- IntegerType ("int", "integer"): Converts the column to an integer.
- LongType ("long"): Converts the column to a long integer.
- FloatType ("float"): Converts the column to a float.
- DoubleType ("double"): Converts the column to a double.
- BooleanType ("boolean"): Converts the column to a boolean (True/False).
- DateType ("date"): Converts the column to a date.
- TimestampType ("timestamp"): Converts the column to a timestamp (date and time).
Full Example:
from pyspark.sql.functions import col
# Convert different columns to different types
df = df.withColumn("age", col("age").cast("int")) \
.withColumn("price", col("price").cast("double")) \
.withColumn("created_at", col("created_at").cast("timestamp"))
List of Valid Data Types:
Here’s a list of valid data types you can cast a column to in Spark:
- "string"
- "int" or "integer"
- "long"
- "float"
- "double"
- "boolean"
- "date"
- "timestamp"
- "binary"
You can cast between these types depending on your needs, such as casting a string to a date or a number to a string.