Quick start
This guide walks you through creating and executing your first PDAL pipeline to process point cloud data with Python.
Your first pipeline
Let’s start with a simple example that reads a LAS file and sorts it by the X dimension.
JSON pipeline approach
You can define a pipeline using a JSON string:
import pdal

json = """
{
  "pipeline": [
    "1.2-with-color.las",
    {
      "type": "filters.sort",
      "dimension": "X"
    }
  ]
}"""

pipeline = pdal.Pipeline(json)
count = pipeline.execute()
arrays = pipeline.arrays
metadata = pipeline.metadata
log = pipeline.log
print(f"Processed {count} points")
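If the pipeline definition has to be assembled from variables, one option is to build it as a Python dictionary and serialize it with the standard `json` module. The sketch below only constructs the string and does not call PDAL; the file name is the one used above, and the variable names `stages` and `pipeline_json` are illustrative.

```python
import json

# Stages in execution order: a bare string is a file to read,
# a dictionary describes a stage and its options.
stages = [
    "1.2-with-color.las",
    {"type": "filters.sort", "dimension": "X"},
]

# Serialize to the same JSON layout as the hand-written string above.
pipeline_json = json.dumps({"pipeline": stages}, indent=2)
print(pipeline_json)
```

The resulting string can then be passed to `pdal.Pipeline(pipeline_json)` in the same way as the hand-written version.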
Programmatic pipeline approach
Alternatively, you can build pipelines programmatically using Python objects and the pipe operator:
import pdal

pipeline = pdal.Reader("1.2-with-color.las") | pdal.Filter.sort(dimension="X")
count = pipeline.execute()
print(f"Processed {count} points")
Both approaches produce identical results. The programmatic approach is often more readable for complex pipelines.
Working with arrays
PDAL Python converts point cloud data into NumPy structured arrays, making it easy to work with point attributes:
import pdal

# Read point cloud data
data = "https://github.com/PDAL/PDAL/blob/master/test/data/las/1.2-with-color.las?raw=true"
pipeline = pdal.Reader.las(filename=data).pipeline()
count = pipeline.execute()
print(f"Read {count} points")  # 1065 points

# Access the array
arr = pipeline.arrays[0]
print(arr.dtype)  # Shows available dimensions: X, Y, Z, Intensity, etc.

# Filter with NumPy
intensity_filtered = arr[arr["Intensity"] > 30]
print(f"After NumPy filter: {len(intensity_filtered)} points")  # 704 points
The array is a NumPy structured array with fields for each dimension (X, Y, Z, Intensity, Classification, etc.).
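To get a feel for structured arrays without reading a file, the sketch below builds a small one by hand. The values are made up and the array is only a stand-in for `pipeline.arrays[0]`; a real array from PDAL would carry more dimensions.

```python
import numpy as np

# Made-up stand-in for pipeline.arrays[0]: one record per point,
# one named field per dimension.
dtype = [("X", "f8"), ("Y", "f8"), ("Z", "f8"), ("Intensity", "u2")]
points = np.array(
    [
        (1.0, 2.0, 10.0, 12),
        (1.5, 2.5, 20.0, 143),
        (2.0, 3.0, 30.0, 35),
    ],
    dtype=dtype,
)

print(points.dtype.names)   # names of the available fields
print(points["Z"].mean())   # a field is accessed by its dimension name

# A boolean mask on one field selects whole records.
bright = points[points["Intensity"] > 30]
print(len(bright))
```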
Combining PDAL and NumPy
You can mix PDAL operations with NumPy processing in the same workflow:
import pdal

data = "https://github.com/PDAL/PDAL/blob/master/test/data/las/1.2-with-color.las?raw=true"

# Step 1: Read data with PDAL
pipeline = pdal.Reader.las(filename=data).pipeline()
pipeline.execute()
arr = pipeline.arrays[0]

# Step 2: Filter with NumPy
intensity = arr[arr["Intensity"] > 30]
print(f"After NumPy filter: {len(intensity)} points")  # 704 points

# Step 3: Process filtered data with PDAL
pipeline = pdal.Filter.expression(
    expression="Intensity >= 100 && Intensity < 300"
).pipeline(intensity)
pipeline.execute()
clamped = pipeline.arrays[0]
print(f"After PDAL filter: {len(clamped)} points")  # 387 points
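For comparison, the range test in the PDAL expression above corresponds to combining two boolean masks in NumPy. The sketch uses a small made-up array rather than the LAS file, so its counts are unrelated to the ones shown above.

```python
import numpy as np

# Made-up intensity values standing in for arr["Intensity"].
arr = np.array(
    [(20,), (99,), (100,), (250,), (300,), (512,)],
    dtype=[("Intensity", "u2")],
)

# Same condition as "Intensity >= 100 && Intensity < 300".
# NumPy uses & with parentheses around each comparison instead of &&.
mask = (arr["Intensity"] >= 100) & (arr["Intensity"] < 300)
in_range = arr[mask]
print(in_range["Intensity"].tolist())
```

Which side does the filtering is largely a matter of convenience: a NumPy mask suits data that is already in memory, while a PDAL filter keeps the step inside the pipeline.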
Writing output
You can write processed point clouds to various formats:
import pdal

# Build a pipeline with a writer
pipeline = (
    pdal.Reader.las("input.las")
    | pdal.Filter.sort(dimension="X")
    | pdal.Writer.las(
        filename="output.las",
        offset_x="auto",
        offset_y="auto",
        offset_z="auto",
        scale_x=0.01,
        scale_y=0.01,
        scale_z=0.01,
    )
)

count = pipeline.execute()
print(f"Wrote {count} points")
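The scale and offset options matter because the LAS format stores each coordinate as an integer, which readers later multiply by the scale and shift by the offset. The arithmetic below is a plain-Python illustration of that encoding with a made-up coordinate and offset; it does not call PDAL, and the helper names `encode` and `decode` are hypothetical.

```python
def encode(value: float, scale: float, offset: float) -> int:
    """Convert a real-world coordinate to the integer stored in the file."""
    return round((value - offset) / scale)


def decode(stored: int, scale: float, offset: float) -> float:
    """Recover the coordinate from the stored integer."""
    return stored * scale + offset


x = 1234.5678                  # made-up coordinate
scale, offset = 0.01, 1000.0   # made-up offset, scale as in the writer above

stored = encode(x, scale, offset)
restored = decode(stored, scale, offset)
print(stored, restored)
```

With a scale of 0.01 the restored value is only accurate to the nearest hundredth of a unit, so the scale should match the precision the data actually needs.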
Stage types
PDAL pipelines are built from three types of stages:
Readers
Readers load point cloud data from files or URLs:
import pdal

# Explicit reader type
reader = pdal.Reader.las(filename="data.las")

# Automatic type inference from filename
reader = pdal.Reader("data.las")

# Reader with options
reader = pdal.Reader.las(
    filename="data.laz",
    spatialreference="EPSG:4326",
)
Filters
Filters transform point cloud data:
# Sort by dimension
filter1 = pdal.Filter.sort(dimension="Z")

# Filter by expression
filter2 = pdal.Filter.expression(expression="Classification == 2")

# Compute statistics
filter3 = pdal.Filter.stats()

# Chain multiple filters (reader comes from the Readers example)
pipeline = reader | filter1 | filter2 | filter3
Writers
Writers save point cloud data to files:
# LAS writer
writer1 = pdal.Writer.las(filename="output.las")

# TileDB writer
writer2 = pdal.Writer.tiledb(array_name="output_array")

# Multiple writers in one pipeline (reader and filter1 come from the examples above)
pipeline = reader | filter1 | writer1 | writer2
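In the JSON form of a pipeline, the kind of each stage is visible in the prefix of its type name (`readers.`, `filters.`, `writers.`). The sketch below groups the stages of a pipeline definition by that prefix using only the standard library; the file names and the helper `stage_kind` are illustrative and not part of the PDAL API.

```python
import json

definition = json.loads("""
{
  "pipeline": [
    {"type": "readers.las", "filename": "data.las"},
    {"type": "filters.sort", "dimension": "Z"},
    {"type": "filters.expression", "expression": "Classification == 2"},
    {"type": "writers.las", "filename": "output.las"}
  ]
}
""")


def stage_kind(stage: dict) -> str:
    """Return the prefix of a stage's type name, e.g. 'readers'."""
    return stage["type"].split(".", 1)[0]


kinds = [stage_kind(stage) for stage in definition["pipeline"]]
print(kinds)
```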
Streaming large datasets
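For large point clouds that don’t fit in memory, use streaming execution, which processes the data in chunks. Streaming only works when every stage in the pipeline supports it: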
import pdal

pipeline = (
    pdal.Reader("large-file.las")
    | pdal.Filter.expression(expression="Intensity > 80 && Intensity < 120")
)

# Process in chunks of 500 points
for array in pipeline.iterator(chunk_size=500):
    print(f"Processing chunk with {len(array)} points")
    # Process each chunk...
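A common pattern with chunked iteration is to keep only running totals, so that no more than one chunk is held at a time. The sketch below uses a hypothetical `fake_chunks` generator in place of `pipeline.iterator()` so that it runs without an input file; the data are random and the field names are chosen for illustration.

```python
import numpy as np


def fake_chunks(total: int, chunk_size: int):
    """Hypothetical stand-in for pipeline.iterator(chunk_size=...)."""
    rng = np.random.default_rng(0)
    for start in range(0, total, chunk_size):
        n = min(chunk_size, total - start)
        chunk = np.zeros(n, dtype=[("Z", "f8"), ("Intensity", "u2")])
        chunk["Z"] = rng.uniform(400.0, 500.0, n)
        chunk["Intensity"] = rng.integers(80, 120, n)
        yield chunk


count = 0
z_min, z_max = float("inf"), float("-inf")

for array in fake_chunks(total=1200, chunk_size=500):
    # Update the running statistics; the chunk itself is not kept.
    count += len(array)
    z_min = min(z_min, float(array["Z"].min()))
    z_max = max(z_max, float(array["Z"].max()))

print(count, z_min, z_max)
```

Replacing `fake_chunks(...)` with `pipeline.iterator(chunk_size=500)` would apply the same loop to real data.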
If you don’t need to access the point data (for example, when using writers), use execute_streaming() for better performance:
import pdal

pipeline = (
    pdal.Reader("input.laz")
    | pdal.Filter.expression(expression="Classification == 2")
    | pdal.Writer.las(filename="output.las")
)

# Stream processing without allocating arrays
count = pipeline.execute_streaming(chunk_size=1000000)
print(f"Processed {count} points")
Next steps
Now that you’ve created your first PDAL pipeline, explore more advanced features:
Pipeline API: Learn about all Pipeline methods and properties.
Stage objects: Explore Readers, Filters, and Writers.
Working with arrays: Deep dive into NumPy array operations.
Streaming: Process massive datasets efficiently.