In most cases, a change to an application’s features also requires a change to the data it stores.

The challenge: How do we manage schema changes when old and new code need to coexist?

Key concepts:
Backward compatibility: New code can read data written by old code
Forward compatibility: Old code can read data written by new code
This chapter explores how different encoding formats handle schema evolution.
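The two compatibility directions defined above can be sketched with plain JSON records. This is only an illustration: the record layout (a `name` field, later joined by an optional `email`) and the `read_v1` / `read_v2` helpers are assumptions made for the example, not part of any particular library.

```python
import json


def read_v1(payload: str) -> dict:
    """Old reader: knows only 'name' and ignores any field it does not recognize."""
    record = json.loads(payload)
    return {"name": record["name"]}


def read_v2(payload: str) -> dict:
    """New reader: knows 'email' but falls back to None when it is absent."""
    record = json.loads(payload)
    return {"name": record["name"], "email": record.get("email")}


written_by_old = json.dumps({"name": "Alice"})
written_by_new = json.dumps({"name": "Bob", "email": "bob@example.com"})

# Backward compatibility: new code reads data written by old code.
backward = read_v2(written_by_old)

# Forward compatibility: old code reads data written by new code.
forward = read_v1(written_by_new)

print(backward)
print(forward)
```

Forward compatibility is usually the harder direction, because old code must be written in advance to tolerate fields it has never seen.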
JSON and XML are text-based formats that are human-readable and language-independent.

Advantages:
Human readable
Language independent
Widely supported
Disadvantages:
Ambiguity with numbers
No binary string support
Verbose, larger size
Schema support varies
Problems with JSON/XML:
    import base64
    import json

    # Problem: large integers can lose precision
    large_number = 9007199254740993  # 2**53 + 1
    encoded = json.dumps({'id': large_number})
    decoded = json.loads(encoded)
    print(f"Original: {large_number}")
    print(f"After JSON: {decoded['id']}")
    # Python round-trips this value exactly, but a parser that stores numbers
    # as 64-bit floats (e.g. JavaScript) would read 9007199254740992.

    # Problem: no binary data support
    binary_data = b'\x00\x01\x02\xff'
    # json.dumps({'data': binary_data})  # TypeError!

    # Workaround: Base64 encode
    encoded_binary = base64.b64encode(binary_data).decode('ascii')
    json_safe = json.dumps({'data': encoded_binary})
    # But now it's about 33% larger and not human-readable
Rule: New fields must be optional or have default values for backward compatibility.
    // Schema v1
    message Person {
      required string name = 1;
      optional int32 age = 2;
    }

    // Schema v2 - Adding email field
    message Person {
      required string name = 1;
      optional int32 age = 2;
      optional string email = 3;  // NEW - must be optional!
    }
    # Example: Backward compatibility
    # (encode_v1 and decode_v2 are placeholders for the v1 and v2 codecs)

    # Old code writes data (v1 schema)
    old_data = encode_v1(Person(name="Alice", age=30))

    # New code reads data (v2 schema)
    person = decode_v2(old_data)
    print(person.name)   # "Alice" ✓
    print(person.age)    # 30 ✓
    print(person.email)  # None (default) ✓
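Why numbered tags make this work can be sketched with a toy tag-length-value encoding. This is a deliberately simplified layout (one byte for the tag, one byte for the length), not the real Protocol Buffers wire format, and `encode_fields` / `decode_fields` are hypothetical helpers written for this example.

```python
import struct


def encode_fields(fields: dict) -> bytes:
    """Encode {tag: value_bytes} as repeated (tag, length, value) triples."""
    out = bytearray()
    for tag, value in fields.items():
        out += struct.pack("BB", tag, len(value)) + value
    return bytes(out)


def decode_fields(data: bytes, known_tags: dict) -> dict:
    """Decode triples, keeping only the tags this reader's schema knows."""
    result = {}
    pos = 0
    while pos < len(data):
        tag, length = struct.unpack_from("BB", data, pos)
        pos += 2
        value = data[pos:pos + length]
        pos += length
        if tag in known_tags:
            result[known_tags[tag]] = value
        # Unknown tag: the length prefix lets the reader skip it safely.
    return result


V1_TAGS = {1: "name", 2: "age"}
V2_TAGS = {1: "name", 2: "age", 3: "email"}

old_bytes = encode_fields({1: b"Alice", 2: b"\x1e"})
new_bytes = encode_fields({1: b"Alice", 2: b"\x1e", 3: b"alice@example.com"})

# Old reader, new data: tag 3 is skipped (forward compatibility).
seen_by_old = decode_fields(new_bytes, V1_TAGS)

# New reader, old data: tag 3 never appears, so email is simply absent
# (backward compatibility, provided the field is optional).
seen_by_new = decode_fields(old_bytes, V2_TAGS)

print(seen_by_old)
print(seen_by_new)
```

Because the bytes carry only tag numbers and never field names, a field can be renamed freely, but its tag number must keep its original meaning.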
    // Schema v1
    message Person {
      required string name = 1;
      optional int32 age = 2;  // Will remove this
      optional string email = 3;
    }

    // Schema v2 - Removing age
    message Person {
      required string name = 1;
      // optional int32 age = 2;  // REMOVED
      optional string email = 3;
      // Can never use tag 2 again!
    }
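Some type changes are possible, such as widening int32 to int64, but they are only safe in one direction. New code can read values written by old code (backward compatibility), because every 32-bit value fits in a 64-bit field. Old code reading data written by new code (forward compatibility) still uses a 32-bit variable, so a value that does not fit in 32 bits is truncated. Always test thoroughly.
Example of a widening type change: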
    // Schema v1
    message Person {
      optional int32 age = 2;  // 32-bit integer
    }

    // Schema v2
    message Person {
      optional int64 age = 2;  // 64-bit integer
    }
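The risk in widening a field from int32 to int64 shows up when code still on the 32-bit schema reads a value written with the 64-bit schema. The sketch below imitates that truncation in plain Python. `as_int32` is a hypothetical helper that assumes the old reader keeps the low 32 bits and interprets them as a signed two's-complement number, as a C-style cast would.

```python
def as_int32(value: int) -> int:
    """Keep the low 32 bits and reinterpret them as a signed 32-bit integer."""
    low = value & 0xFFFFFFFF
    return low - 0x100000000 if low >= 0x80000000 else low


small = 30           # fits in 32 bits: survives unchanged
large = 2**31 + 5    # fits in int64 but not in int32

print(as_int32(small))
print(as_int32(large))
```

Small values pass through unchanged, which is why this kind of bug tends to surface only once real data grows past the 32-bit range.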
Avro’s key insight: the reader needs both the writer’s schema and its own schema to decode data.

How Avro resolves schemas: The reader compares the writer’s schema with its own schema and maps fields by name.

Schema resolution rules:
Writer field present, reader field present: Match by name, convert if types compatible
Writer field present, reader field absent: Ignore field
Writer field absent, reader field present: Use the default value declared in the reader’s schema; if no default is declared, decoding fails
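The resolution rules above can be sketched in a few lines of plain Python. This does not use the Avro library: `resolve_record` is a hypothetical helper, the schemas are reduced to lists of field dictionaries, and type conversion between compatible types is left out.

```python
def resolve_record(writer_record: dict, reader_fields: list) -> dict:
    """Map an already-decoded writer record onto the reader's fields by name.

    reader_fields is a list of dicts with a "name" key and an optional
    "default" key, loosely modelled on the fields of a record schema.
    """
    resolved = {}
    for field in reader_fields:
        name = field["name"]
        if name in writer_record:
            # Present in both schemas: match by name.
            resolved[name] = writer_record[name]
        elif "default" in field:
            # Absent from the writer: fall back to the reader's default.
            resolved[name] = field["default"]
        else:
            raise ValueError(f"no value and no default for reader field {name!r}")
    # Writer fields the reader does not declare are never copied: ignored.
    return resolved


writer_record = {"name": "Alice", "age": 30}
reader_fields = [
    {"name": "name"},
    {"name": "email", "default": None},
]

resolved = resolve_record(writer_record, reader_fields)
print(resolved)
```

Here `age` is dropped because the reader does not declare it, and `email` takes its declared default. Without that default, the call would raise an error.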
REST example:

    GET /api/users/123 HTTP/1.1
    Host: example.com

Response:

    {
      "id": 123,
      "name": "Alice",
      "email": "alice@example.com"
    }
RPC example (gRPC):
    // Service definition
    service UserService {
      rpc GetUser(UserRequest) returns (UserResponse);
    }

    message UserRequest {
      int32 user_id = 1;
    }

    message UserResponse {
      int32 id = 1;
      string name = 2;
      string email = 3;
    }
    # Client code looks like a local function call
    response = user_service.GetUser(UserRequest(user_id=123))
    print(response.name)  # "Alice"
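RPC hides the differences between network calls and local calls, which can make debugging harder. Unlike a local call, a network call is unpredictable: the request or response may be lost, the call may time out without the client knowing whether it was executed, and a retry may then run the action twice unless the operation is idempotent.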
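Why idempotency matters for retries can be shown with a small in-process simulation. Everything here is hypothetical: `FlakyPaymentServer` stands in for a remote service that executes a request but loses the response, and the idempotency key is one common deduplication technique, not something gRPC provides automatically.

```python
import uuid


class FlakyPaymentServer:
    """Stand-in for a remote service whose first responses get lost."""

    def __init__(self, drop_first_n_responses: int):
        self.drops_left = drop_first_n_responses
        self.processed = {}  # idempotency key -> stored result
        self.charges = 0     # how many times money was actually taken

    def charge(self, idempotency_key: str, amount: int) -> str:
        # Execute the action only if this key has not been seen before.
        if idempotency_key not in self.processed:
            self.charges += 1
            self.processed[idempotency_key] = f"charged {amount}"
        # Simulate a lost response: the work is done, the client never hears.
        if self.drops_left > 0:
            self.drops_left -= 1
            raise TimeoutError("response lost")
        return self.processed[idempotency_key]


def charge_with_retry(server: FlakyPaymentServer, amount: int, max_attempts: int = 3) -> str:
    key = str(uuid.uuid4())  # the same key is reused for every retry
    for _ in range(max_attempts):
        try:
            return server.charge(key, amount)
        except TimeoutError:
            continue  # the client cannot tell whether the charge happened
    raise TimeoutError(f"gave up after {max_attempts} attempts")


server = FlakyPaymentServer(drop_first_n_responses=1)
result = charge_with_retry(server, amount=50)
print(result, "| charges executed:", server.charges)
```

Without the key check, the retry after the lost response would charge the customer a second time.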