Overview
Data transformation stages handle field-level operations such as copying, renaming, deleting, and type conversion. These stages are essential for reshaping document structure and managing field values.
CopyFields
Copies values from source fields to destination fields. Supports both flat field names and nested JSON paths.
Mapping of source field names to destination field names. Destination can be:
- A single string field name
- A list of field names to copy to multiple destinations
How to handle existing destination field values: overwrite, append, or skip. Cannot be used with isNested=true.
Whether to treat field names as nested JSON paths. When true:
- Field names are split on "." to create nested structures
- updateMode is ignored (always overwrites)
Example: Simple Field Copying
Example: Copy to Multiple Destinations
Example: Nested JSON Paths
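The copy semantics described above can be sketched in Python. This is an illustration only, not the stage's implementation or its configuration syntax: the dict-based document model and the names write_value, get_path, set_path, and copy_fields are hypothetical, and the behavior of append on single values is an assumption.

```python
def write_value(doc, field, value, mode="overwrite"):
    """Store value in doc[field] according to the update mode."""
    if mode not in ("overwrite", "append", "skip"):
        raise ValueError(f"unknown update mode: {mode}")
    if field not in doc or mode == "overwrite":
        doc[field] = value
    elif mode == "append":
        # Assumption: appending turns the field into a multivalued list.
        current = doc[field] if isinstance(doc[field], list) else [doc[field]]
        extra = value if isinstance(value, list) else [value]
        doc[field] = current + extra
    # mode == "skip": the existing value is left untouched


def get_path(doc, path):
    """Follow a dotted path such as 'a.b.c' through nested dicts."""
    node = doc
    for key in path.split("."):
        node = node[key]
    return node


def set_path(doc, path, value):
    """Create intermediate dicts for a dotted path and overwrite the leaf."""
    *parents, leaf = path.split(".")
    node = doc
    for key in parents:
        node = node.setdefault(key, {})
    node[leaf] = value


def copy_fields(doc, mapping, mode="overwrite", nested=False):
    """Copy each source field to one destination or a list of destinations."""
    for source, destinations in mapping.items():
        if isinstance(destinations, str):
            destinations = [destinations]
        for dest in destinations:
            if nested:
                # Nested mode always overwrites; the update mode is ignored.
                set_path(doc, dest, get_path(doc, source))
            elif source in doc:
                write_value(doc, dest, doc[source], mode)
    return doc
```

For instance, copy_fields({"a": {"b": 1}}, {"a.b": "c.d.e"}, nested=True) creates the intermediate objects c and d before writing the value.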
RenameFields
Renames fields by moving values from source to destination and removing the source field.
1-to-1 mapping of original field names to new field names. Must have at least one mapping.
How to handle existing destination field values: overwrite, append, or skip.
Example: Rename Fields
Unlike CopyFields, RenameFields removes the source field after copying.
DeleteFields
Removes specified fields from documents.
List of field names to delete. At least one field must be specified.
Example: Remove Sensitive Data
Example: Clean Temporary Fields
SetStaticValues
Sets fields to static, predefined values.
Mapping of field names to static values to assign.
How to handle existing field values: overwrite, append, or skip.
Example: Add Metadata
Example: Set Defaults
RemoveEmptyFields
Removes fields that have null or empty values.
Example: Clean Empty Fields
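A minimal Python sketch of this cleanup, assuming a document is a plain dict. What counts as "empty" here (None, an empty string, or an empty list) is an assumption, and remove_empty_fields is a hypothetical name rather than the stage's API.

```python
def remove_empty_fields(doc):
    """Return a copy of doc without null or empty fields."""
    def is_empty(value):
        # Assumption: only None, "" and [] are treated as empty.
        # Falsy values such as 0 or False are kept.
        return value is None or value == "" or value == []

    return {name: value for name, value in doc.items() if not is_empty(value)}
```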
RemoveDuplicateValues
Removes duplicate values from multivalued fields.
List of fields to deduplicate.
Example: Deduplicate Tags
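Deduplication of a multivalued field can be sketched in Python as follows. The sketch assumes multivalued fields are lists of hashable values, and that the first occurrence is kept and order is preserved, which the text above does not state. remove_duplicate_values is a hypothetical name.

```python
def remove_duplicate_values(doc, fields):
    """Deduplicate the listed multivalued fields in place."""
    for field in fields:
        values = doc.get(field)
        if isinstance(values, list):
            # dict.fromkeys keeps the first occurrence of each value, in order.
            doc[field] = list(dict.fromkeys(values))
    return doc
```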
DropValues
Removes specific values from fields.
List of fields to remove values from.
List of values to remove from the specified fields.
Example: Remove Placeholder Values
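The value-dropping behavior might look like this in Python. The dict-based document and the name drop_values are hypothetical, and deleting a field once it has no values left is an assumption, not something the description above confirms.

```python
def drop_values(doc, fields, values):
    """Remove the given values from the listed fields, in place."""
    for field in fields:
        if field not in doc:
            continue
        current = doc[field]
        if isinstance(current, list):
            kept = [v for v in current if v not in values]
            if kept:
                doc[field] = kept
            else:
                del doc[field]  # assumption: a field with no values left is removed
        elif current in values:
            del doc[field]
    return doc
```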
DropDocument
Marks documents for dropping from the pipeline based on conditions.
Drop document if any of these fields are missing.
Drop document if any of these fields are present.
Example: Drop Incomplete Documents
Example: Filter Test Data
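The two drop conditions can be expressed as a small predicate. In this Python sketch the parameter names drop_if_missing and drop_if_present are hypothetical stand-ins for the two field lists described above, and combining the conditions with a logical OR is an assumption.

```python
def should_drop(doc, drop_if_missing=(), drop_if_present=()):
    """Return True when the document should be marked as dropped."""
    any_missing = any(field not in doc for field in drop_if_missing)
    any_present = any(field in doc for field in drop_if_present)
    return any_missing or any_present
```

A pipeline loop could then keep only the documents for which should_drop returns False.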
ParseJson
Parses JSON strings and extracts fields using JsonPath expressions.
Field containing the JSON string to parse.
Mapping of destination field names to JsonPath expressions. If omitted, all JSON fields are copied to the document’s top level.
Whether the source field is base64 encoded. If true, the stage will decode before parsing.
How to handle existing destination field values.
Example: Parse All JSON Fields
Example: Extract Specific Fields
Example: Parse Base64 Encoded JSON
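The parsing flow (optional base64 decoding, JSON parsing, then extraction) can be sketched in Python. This is not the stage's implementation: the function names are hypothetical, existing destination values are always overwritten, and resolve handles only a tiny JsonPath subset ("$" followed by dotted keys) because the Python standard library has no JsonPath support.

```python
import base64
import json


def resolve(data, path):
    """Resolve a tiny JsonPath subset such as '$.user.name'."""
    node = data
    for key in path.lstrip("$.").split("."):
        if key:
            node = node[key]
    return node


def parse_json(doc, source, mapping=None, base64_encoded=False):
    """Parse the JSON string in doc[source] and copy values into doc."""
    raw = doc[source]
    if base64_encoded:
        raw = base64.b64decode(raw).decode("utf-8")
    data = json.loads(raw)
    if mapping is None:
        # No mapping: copy every top-level JSON field to the document.
        doc.update(data)
    else:
        for dest, path in mapping.items():
            doc[dest] = resolve(data, path)
    return doc
```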
ParseDate
Parses date strings into standardized date fields.
List of source fields containing date strings.
List of destination fields for parsed dates.
List of date format patterns to try parsing. Uses Java SimpleDateFormat syntax.
How to handle existing destination field values.
Example: Parse Multiple Date Formats
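Trying several patterns in order is the core idea here. The Python sketch below illustrates it with datetime.strptime, whose directives differ from Java SimpleDateFormat (for example, yyyy-MM-dd corresponds to %Y-%m-%d). parse_date is a hypothetical helper, and returning None when no pattern matches is an assumption.

```python
from datetime import datetime


def parse_date(text, formats):
    """Return the first successful parse of text, or None if no format matches."""
    for fmt in formats:
        try:
            return datetime.strptime(text, fmt)
        except ValueError:
            continue  # this pattern did not match; try the next one
    return None
```

The order of the patterns matters when a string could match more than one of them, since the first match wins.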
ParseFloats
Parses string values into floating-point numbers.
List of fields to parse as floats.
Example: Parse Numeric Fields
ParseFilePath
Parses file paths and extracts components like filename, extension, and directory.
Source field containing file path.
Destination field for filename (without extension).
Destination field for file extension.
Destination field for directory path.
Example: Extract File Components
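The three components can be illustrated with Python's pathlib. The sketch assumes POSIX-style paths and an extension stored without its leading dot; the function and parameter names are hypothetical.

```python
from pathlib import PurePosixPath


def parse_file_path(doc, source, name_field, ext_field, dir_field):
    """Split the path in doc[source] into filename, extension, and directory."""
    path = PurePosixPath(doc[source])
    doc[name_field] = path.stem               # filename without extension
    doc[ext_field] = path.suffix.lstrip(".")  # assumption: no leading dot
    doc[dir_field] = str(path.parent)
    return doc
```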
NormalizeFieldNames
Normalizes field names by converting to lowercase and replacing spaces/special characters.
Example: Standardize Field Names
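A rough Python sketch of this normalization follows. The replacement character is not stated above, so the underscore used here is an assumption, as is collapsing runs of special characters into a single underscore. normalize_field_names is a hypothetical name.

```python
import re


def normalize_field_names(doc):
    """Return a copy of doc with lowercased, sanitized field names."""
    return {
        # Assumption: anything other than a-z and 0-9 becomes one underscore.
        re.sub(r"[^a-z0-9]+", "_", name.lower()): value
        for name, value in doc.items()
    }
```

If two original names normalize to the same result, the later one wins in this sketch.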
ComputeFieldSize
Computes the size (number of values) of multivalued fields.
List of source fields to measure.
List of destination fields for size values.
Example: Count Array Elements
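Counting values per field might be sketched like this in Python. Pairing the source and destination lists by position, counting a single value as 1, and skipping missing fields are all assumptions; compute_field_size is a hypothetical name.

```python
def compute_field_size(doc, sources, destinations):
    """Write the number of values of each source field to its destination."""
    for source, dest in zip(sources, destinations):
        if source not in doc:
            continue  # assumption: missing fields are skipped
        value = doc[source]
        doc[dest] = len(value) if isinstance(value, list) else 1
    return doc
```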
Length
Computes the character length of string field values.
List of source fields to measure.
List of destination fields for length values.
Example: Measure Text Length
Timestamp
Adds a timestamp field with the current processing time.
Name of the field to store the timestamp.
Example: Add Processing Timestamp
Base64Decode
Decodes base64-encoded field values.
List of fields to decode.
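In Python, decoding the listed fields could look like the sketch below. It assumes the decoded bytes are UTF-8 text and that missing fields are skipped; base64_decode_fields is a hypothetical name.

```python
import base64


def base64_decode_fields(doc, fields):
    """Replace each listed field with its base64-decoded text, in place."""
    for field in fields:
        if field in doc:
            # Assumption: the decoded payload is UTF-8 text.
            doc[field] = base64.b64decode(doc[field]).decode("utf-8")
    return doc
```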