Miscellaneous — Quick Reference


Serialization vs Deserialization

Serialization converts an in-memory object into a storable/transmittable format. Deserialization is the reverse — reconstructing the object from that format.

Object in memory          Serialized form          Object in memory
┌───────────────┐         ┌─────────────┐          ┌───────────────┐
│ User {        │──────▶  │ {"id":1,    │ ──────▶  │ User {        │
│   id: 1       │ serial  │  "name":    │ deserial │   id: 1       │
│   name: "M"   │         │  "Madhu"}   │          │   name: "M"   │
│ }             │         └─────────────┘          │ }             │
└───────────────┘                                   └───────────────┘
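The round trip in the diagram can be sketched with Python's standard `json` and `dataclasses` modules. The `User` class is illustrative and simply mirrors the diagram:

```python
import json
from dataclasses import asdict, dataclass

@dataclass
class User:
    id: int
    name: str

def serialize(user: User) -> str:
    # Object in memory -> JSON text (storable / transmittable)
    return json.dumps(asdict(user))

def deserialize(text: str) -> User:
    # JSON text -> a new object in memory with the same field values
    return User(**json.loads(text))

original = User(id=1, name="Madhu")
wire = serialize(original)
restored = deserialize(wire)
print(wire)                    # {"id": 1, "name": "Madhu"}
print(restored == original)    # True: equal values
print(restored is original)    # False: a brand-new object
```

`json` only understands basic types (dicts, lists, strings, numbers, booleans, None), which is why the dataclass goes through `asdict()` first.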

Common Serialization Formats

Format	Type	Human-readable	Speed	Size	Best for
JSON	Text	✅	Medium	Medium	REST APIs, config
XML	Text	✅	Slow	Large	Legacy enterprise, SOAP
YAML	Text	✅	Slow	Medium	Config files
CSV	Text	✅	Fast	Small	Tabular data
Protocol Buffers	Binary	❌	Very fast	Very small	gRPC, microservices
MessagePack	Binary	❌	Fast	Small	High-throughput APIs
Avro	Binary	❌	Fast	Small	Kafka schemas

Schema Evolution Problem

v1 of your class:    User { id, name }
v2 of your class:    User { id, name, email }

Serialized v1 object → deserialize with v2 code
What is email? → null? error? default value?

Solutions:
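  Protobuf: fields keyed by number; older readers skip unknown fields,
            and fields missing from old data read as type defaults
  JSON: missing keys → null or default (flexible but no enforcement)
  Avro: writer's schema stored with the data (or in a schema registry);
        the reader resolves it against its own schema using Avro's resolution rules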
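A minimal sketch of the JSON case, assuming a hypothetical `UserV2` dataclass that adds `email` with a default. The compatibility policy (defaults for missing keys, dropping unknown ones) lives entirely in this reader code, which is what "no enforcement" means in practice:

```python
import json
from dataclasses import dataclass, fields
from typing import Optional

@dataclass
class UserV2:
    id: int
    name: str
    email: Optional[str] = None   # new in v2: default used when old data lacks it

def load_user(text: str) -> UserV2:
    data = json.loads(text)
    known = {f.name for f in fields(UserV2)}
    # Drop keys this version doesn't know (written by a newer version);
    # missing keys fall back to the dataclass defaults (written by an older one).
    return UserV2(**{k: v for k, v in data.items() if k in known})

v1_payload = '{"id": 1, "name": "Madhu"}'
v3_payload = '{"id": 2, "name": "Asha", "email": "a@example.com", "phone": "555"}'
print(load_user(v1_payload))   # UserV2(id=1, name='Madhu', email=None)
print(load_user(v3_payload))   # UserV2(id=2, name='Asha', email='a@example.com')
```

A payload missing a field that has no default (say, `name`) still fails with a `TypeError`, so required fields must stay required across versions.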

Key interview distinction:

Serialization  → object to bytes/string    (write to disk, send over wire)
Deserialization → bytes/string to object   (read from disk, receive from wire)

Also called:  marshal / unmarshal  (Go, some other languages)
              pickle / unpickle   (Python)
              encode / decode     (general)

Hashing vs Encryption vs Encoding

Three completely different things that people often confuse. Encoding is not security. Hashing is one-way. Encryption is two-way.

┌───────────────┬──────────────┬──────────────┬───────────────────────────┐
│               │ Reversible?  │ Needs key?   │ Purpose                   │
├───────────────┼──────────────┼──────────────┼───────────────────────────┤
│ Encoding      │ ✅ Always    │ ❌           │ Format / representation   │
│ Hashing       │ ❌ Never     │ ❌           │ Integrity / fingerprint   │
│ Encryption    │ ✅ With key  │ ✅           │ Confidentiality / secrecy │
└───────────────┴──────────────┴──────────────┴───────────────────────────┘

Encoding

Transforms data into another representation. No security involved — anyone can reverse it.

Base64:   "Hello" → "SGVsbG8="        reverse: "SGVsbG8=" → "Hello"
URL:      "hello world" → "hello%20world"
ASCII:    'A' → 65

Use for:  Sending binary data over text channels (email, JSON)
          URL-safe string representation
NOT for:  Passwords, secrets — trivially reversible
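The three examples map directly onto Python standard-library calls. Decoding needs nothing except the encoded value itself, which is exactly why encoding is not security:

```python
import base64
from urllib.parse import quote, unquote

# Base64: bytes <-> ASCII text; anyone can reverse it, no key involved
encoded = base64.b64encode(b"Hello").decode("ascii")
decoded = base64.b64decode(encoded)
print(encoded, decoded)            # SGVsbG8= b'Hello'

# Percent-encoding for URLs
print(quote("hello world"))        # hello%20world
print(unquote("hello%20world"))    # hello world

# Characters <-> code points
print(ord("A"), chr(65))           # 65 A
```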

Hashing

One-way mathematical function. Same input always → same output. Cannot go backwards.

SHA-256("password123") → "ef92b778bafe771e89245b89ecbc08a44a4e166c06659911881f383d4473e94f"
SHA-256("password123") → "ef92b778bafe771e89245b89ecbc08a44a4e166c06659911881f383d4473e94f"  (identical)
SHA-256("password124") → a completely different 64-hex-char digest  (avalanche effect)

bcrypt("password123") → "$2a$10$N9qo8uLOickgx2ZMRZo..."  (includes salt, slow by design)
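A quick check of determinism and the avalanche effect with `hashlib`, using the same inputs as the lines above. Counting differing bits is a rough illustration, not a formal test:

```python
import hashlib

def sha256_hex(text: str) -> str:
    return hashlib.sha256(text.encode("utf-8")).hexdigest()

a1 = sha256_hex("password123")
a2 = sha256_hex("password123")
b = sha256_hex("password124")

print(a1 == a2)   # True: same input, same output
print(len(a1))    # 64 hex characters = 256 bits, regardless of input length
# Avalanche effect: one changed character flips roughly half of the 256 bits
diff_bits = bin(int(a1, 16) ^ int(b, 16)).count("1")
print(diff_bits)  # typically close to 128
```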

Algorithm	Use for	Notes
SHA-256	File integrity, checksums, digital signatures	Fast — NOT for passwords
SHA-512	Same as SHA-256	Larger (512-bit) output; also NOT for passwords
MD5	Legacy checksums only	Broken for security use
bcrypt	Passwords	Deliberately slow, built-in salt
Argon2	Passwords (modern)	Winner of Password Hashing Competition
HMAC	Message authentication	Hash + secret key = tamper detection

Never use SHA-256/MD5 for passwords — they're too fast, making brute-force trivial. Always use bcrypt, Argon2, or scrypt for password hashing.
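Python's standard library has no bcrypt or Argon2 (those need third-party packages such as `bcrypt` or `argon2-cffi`), but `hashlib.scrypt` is available when Python is built against OpenSSL 1.1 or newer. A sketch of salted password storage and verification; the cost parameters are illustrative and should be tuned for your hardware:

```python
import hashlib
import hmac
import os

# Illustrative scrypt cost parameters (about 16 MiB of memory per hash)
N, R, P = 2**14, 8, 1
SALT_LEN = 16

def hash_password(password: str) -> bytes:
    salt = os.urandom(SALT_LEN)               # unique random salt per password
    digest = hashlib.scrypt(password.encode(), salt=salt, n=N, r=R, p=P, dklen=32)
    return salt + digest                      # store the salt next to the hash

def verify_password(password: str, stored: bytes) -> bool:
    salt, expected = stored[:SALT_LEN], stored[SALT_LEN:]
    digest = hashlib.scrypt(password.encode(), salt=salt, n=N, r=R, p=P, dklen=32)
    return hmac.compare_digest(digest, expected)   # constant-time comparison

record = hash_password("password123")
print(verify_password("password123", record))   # True
print(verify_password("password124", record))   # False
```

Because of the random salt, hashing the same password twice gives different records, so precomputed lookup tables are useless against them.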

Encryption

Two-way — encrypted data can be decrypted with the right key.

Symmetric (one shared key):
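  AES-256-GCM("secret message", key, nonce) → ciphertext + auth tag
  AES-256-GCM(ciphertext + tag, same_key, same nonce) → "secret message"  (fails if tampered)
  (a nonce must never be reused with the same key)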

  Fast. Problem: how do you securely share the key?

Asymmetric (public + private key pair):
  Encrypt with public key  → only private key can decrypt
  Sign with private key    → anyone with public key can verify

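  RSA: encryption and signatures
  ECDSA, Ed25519: signatures only     ECDH, X25519: key agreement
  Used in: TLS handshake, JWT signatures (RS256, ES256), SSH

  Far slower than symmetric crypto, so TLS uses it only to authenticate the
  server and agree on a symmetric key, which then encrypts the actual data.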

The Mental Model

Encoding   → like translating English to French — everyone can translate back
Hashing    → like a fingerprint — you can't reconstruct the person from it
Encryption → like a locked box — only the key holder can open it

Parsing

Parsing converts raw text or bytes into a data structure your code can work with.

Raw input (string/bytes)    →    [Parser]    →    Structured data
'{"name":"Madhu","age":28}'       JSON              { name: "Madhu", age: 28 }
"<h1>Hello</h1>"                  HTML              DOM tree
"SELECT * FROM users"             SQL               AST (Abstract Syntax Tree)
"2024-01-15"                      Date              Date object

Stages: Lexing → Parsing

Raw text: "x = 1 + 2"

Stage 1 — Lexer (tokeniser):
  Breaks text into tokens (meaningful units)
  ["x", "=", "1", "+", "2"]
  → ["IDENTIFIER:x", "OPERATOR:=", "NUMBER:1", "OPERATOR:+", "NUMBER:2"]

Stage 2 — Parser:
  Applies grammar rules to build a tree
      Assignment
      ├── Identifier: x
      └── Addition
          ├── Number: 1
          └── Number: 2
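The two stages can be sketched as a regex-based lexer plus a recursive-descent parser. The grammar (identifiers, integers, `=`, `+`, `-`) and the tuple-based tree are assumptions chosen to match the `x = 1 + 2` example, not a general-purpose parser:

```python
from __future__ import annotations

import re

# Token patterns for this tiny language (assumed for the sketch)
TOKEN_RE = re.compile(
    r"\s*(?:(?P<NUMBER>\d+)|(?P<IDENTIFIER>[A-Za-z_]\w*)|(?P<OPERATOR>[=+-]))"
)

def lex(text: str) -> list[tuple[str, str]]:
    """Stage 1: turn raw text into (kind, value) tokens."""
    tokens, pos, text = [], 0, text.rstrip()
    while pos < len(text):
        m = TOKEN_RE.match(text, pos)
        if m is None:
            raise SyntaxError(f"unexpected input at position {pos}: {text[pos:]!r}")
        tokens.append((m.lastgroup, m.group(m.lastgroup)))
        pos = m.end()
    return tokens

class Parser:
    """Stage 2: recursive descent, one method per grammar rule.

        assignment := IDENTIFIER "=" expr
        expr       := atom (("+" | "-") atom)*
        atom       := NUMBER | IDENTIFIER
    """

    def __init__(self, tokens: list[tuple[str, str]]):
        self.tokens, self.i = tokens, 0

    def peek(self) -> tuple:
        return self.tokens[self.i] if self.i < len(self.tokens) else (None, None)

    def expect(self, kind: str, value: str | None = None) -> str:
        tok = self.peek()
        if tok[0] != kind or (value is not None and tok[1] != value):
            raise SyntaxError(f"expected {value or kind}, got {tok}")
        self.i += 1
        return tok[1]

    def assignment(self) -> tuple:
        name = self.expect("IDENTIFIER")
        self.expect("OPERATOR", "=")
        node = ("Assignment", ("Identifier", name), self.expr())
        if self.i != len(self.tokens):
            raise SyntaxError(f"trailing tokens: {self.tokens[self.i:]}")
        return node

    def expr(self) -> tuple:
        node = self.atom()
        while self.peek() in (("OPERATOR", "+"), ("OPERATOR", "-")):
            op = self.expect("OPERATOR")
            node = ("Addition" if op == "+" else "Subtraction", node, self.atom())
        return node

    def atom(self) -> tuple:
        kind, value = self.peek()
        if kind == "NUMBER":
            self.i += 1
            return ("Number", int(value))
        return ("Identifier", self.expect("IDENTIFIER"))

tokens = lex("x = 1 + 2")
print(tokens)
print(Parser(tokens).assignment())
# ('Assignment', ('Identifier', 'x'), ('Addition', ('Number', 1), ('Number', 2)))
```

Each grammar rule becomes one method, and rules call each other just as the grammar nests, which is what "matches grammar rules recursively" means in the table below.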

Types of Parsers

Type	Example	Notes
JSON parser	`JSON.parse()`, `json.loads()`	Strict format — throws on invalid JSON
HTML parser	BeautifulSoup, DOMParser	Lenient — browsers recover from bad HTML
CSV parser	csv module, Papa Parse	Handle quoting, delimiters, newlines in fields
Regex parser	Custom extraction	Fragile for complex formats — avoid for HTML
Recursive descent	Language compilers	Handwritten, matches grammar rules recursively

Common Parsing Pitfalls

Encoding issues:  file is UTF-8 but parsed as Latin-1 → garbled text
Escape sequences: "He said \"hello\"" → parser must handle escaped quotes
Delimiters in fields: CSV fields with commas or newlines inside quotes
Streaming vs batch: large files can't fit in memory → need streaming parser
Malicious input: billion laughs (XML), deeply nested JSON → stack overflow / DoS
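Several of these pitfalls, handled with Python's standard library: `csv` parsing a quoted field that contains a comma, doubled quotes and a newline; `json` unescaping quotes; and a defensive wrapper for untrusted input. The `safe_parse` helper and its 1 MB limit are hypothetical:

```python
import csv
import io
import json

# Quoted CSV fields containing a comma, an escaped quote ("") and a newline
raw_csv = 'name,quote\n"Smith, J","He said ""hi""\nthen left"\n'
rows = list(csv.reader(io.StringIO(raw_csv, newline="")))
print(rows)   # [['name', 'quote'], ['Smith, J', 'He said "hi"\nthen left']]

# Escaped quotes inside a JSON string
print(json.loads(r'"He said \"hello\""'))   # He said "hello"

def safe_parse(text: str, max_bytes: int = 1_000_000):
    """Parse untrusted JSON; return None instead of crashing on bad input."""
    if len(text.encode("utf-8")) > max_bytes:       # crude size guard against DoS
        return None
    try:
        return json.loads(text)
    except (json.JSONDecodeError, RecursionError):  # malformed or absurdly nested
        return None

print(safe_parse('{"ok": true}'))   # {'ok': True}
print(safe_parse('{"ok": tru'))     # None
```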

Parsing vs Serialization

Serialization → object to string   (you produce the string)
Parsing       → string to object   (you consume the string)

They're inverses, but "parsing" usually implies the consuming direction
and often handling arbitrary/untrusted input with error handling.

SSL vs TLS vs HTTPS

SSL is deprecated. TLS is what's actually used today. HTTPS is HTTP transported over TLS. The term "SSL certificate" is a misnomer that stuck — they're actually TLS certificates.

The Timeline

SSL 1.0 (1994)  — Never released publicly (security flaws)
SSL 2.0 (1995)  — Deprecated 2011
SSL 3.0 (1996)  — Deprecated 2015 (POODLE attack)
TLS 1.0 (1999)  — Deprecated 2021 (RFC 8996; major browsers disabled it in 2020)
TLS 1.1 (2006)  — Deprecated 2021 (RFC 8996; major browsers disabled it in 2020)
TLS 1.2 (2008)  — Still widely used ✅
TLS 1.3 (2018)  — Current standard ✅ (faster handshake, stronger ciphers)

HTTPS = HTTP  +  TLS
        (app    (transport
        layer)   security)

HTTP  → port 80, unencrypted, data visible to anyone between client and server
HTTPS → port 443, TLS-encrypted, data unreadable in transit

TLS Handshake (Simplified)

Client                              Server
  │                                  │
  │── ClientHello ─────────────────▶ │  "I support TLS 1.3, here are my cipher suites"
  │                                  │
  │◀─ ServerHello + Certificate ──── │  "Use TLS 1.3 + AES-GCM. Here's my cert."
  │                                  │
  │   [Client verifies cert against  │
  │    trusted CA list]              │
  │                                  │
  │── Key exchange ────────────────▶ │  (Diffie-Hellman — both derive same key)
  │                                  │
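  │◀──────── Encrypted from here ───▶│  All further traffic is encrypted

In TLS 1.3 the key-exchange shares already travel inside ClientHello and ServerHello,
and the certificate is sent encrypted, which is how the full handshake fits in 1 RTT.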
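The "both derive same key" step is Diffie-Hellman key agreement. A toy sketch with deliberately insecure parameters (p = 2**127 - 1, g = 3) so the arithmetic stays visible; real TLS 1.3 uses X25519 or standardized groups and derives keys with HKDF rather than the single SHA-256 call used here:

```python
import hashlib
import secrets

# Toy public parameters (NOT secure): p is the Mersenne prime 2**127 - 1
P = 2**127 - 1
G = 3

def keypair():
    private = secrets.randbelow(P - 3) + 2   # secret exponent, never sent
    public = pow(G, private, P)              # g^private mod p, sent in the clear
    return private, public

client_priv, client_pub = keypair()
server_priv, server_pub = keypair()

# Each side combines its own private value with the other side's public value
client_secret = pow(server_pub, client_priv, P)   # (g^s)^c mod p
server_secret = pow(client_pub, server_priv, P)   # (g^c)^s mod p

def derive_key(shared: int) -> bytes:
    # Simplification: real TLS feeds the shared secret into HKDF
    return hashlib.sha256(shared.to_bytes(16, "big")).digest()

print(client_secret == server_secret)                           # True
print(derive_key(client_secret) == derive_key(server_secret))   # True
```

An eavesdropper sees `P`, `G` and both public values, but recovering a private exponent from them is the discrete-logarithm problem, which is infeasible at real key sizes.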

TLS 1.3 vs TLS 1.2

Feature	TLS 1.2	TLS 1.3
Handshake round trips	2 RTT	1 RTT (faster)
0-RTT resumption	❌	✅ (reconnect with zero extra latency; early data is replayable, so only for idempotent requests)
Removed weak ciphers	No	Yes (RC4, DES, SHA-1 gone)
Forward secrecy	Optional	Mandatory

Certificates

A TLS certificate contains:
  - Domain name it's valid for (e.g., *.example.com)
  - Public key of the server
  - Issuer (Certificate Authority — Let's Encrypt, DigiCert, etc.)
  - Expiry date
  - Digital signature from the CA

The CA's signature lets the browser verify:
  "This cert was issued by a trusted authority for this domain"
  → The server is who it claims to be

Self-signed cert:  Server signs its own cert — no trusted CA
                   Browser warns: "Your connection is not private"
                   OK for internal/dev, never for production
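On the client side, Python's `ssl.create_default_context()` already performs the check described above: it loads the system's trusted CA certificates, requires a valid certificate and matches the hostname. A sketch (no network call is made; `example.com` in the comment stands for whatever host you connect to):

```python
import ssl

ctx = ssl.create_default_context()             # trusted CA store + secure defaults
ctx.minimum_version = ssl.TLSVersion.TLSv1_2   # refuse SSL 3.0 / TLS 1.0 / TLS 1.1

print(ctx.verify_mode == ssl.CERT_REQUIRED)    # True: server must present a valid cert
print(ctx.check_hostname)                      # True: cert must match the hostname

# To connect (requires network access):
#   import socket
#   with socket.create_connection(("example.com", 443)) as sock:
#       with ctx.wrap_socket(sock, server_hostname="example.com") as tls:
#           print(tls.version(), tls.getpeercert()["subject"])
```

A self-signed certificate fails this check with `ssl.SSLCertVerificationError` unless you explicitly trust it, e.g. with `ctx.load_verify_locations(cafile=...)` pointing at that certificate. That targeted opt-in is the dev/internal route; disabling verification globally is not.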

One-liner summary

SSL     → Old, broken, do not use
TLS     → The actual protocol securing internet traffic today
HTTPS   → HTTP + TLS — the encrypted web
"SSL certificate" → Really a TLS certificate. The name just stuck.

Authentication vs Authorization

Authentication (AuthN) = verifying identity — who are you? Authorization (AuthZ) = verifying permissions — what are you allowed to do?

Authentication:           Authorization:
─────────────────         ───────────────────────────────
"I'm Madhu"               "Madhu can read /api/posts"
Prove it → token          "Madhu cannot DELETE /api/users"
Identity verified ✅       Permission checked ✅

The Sequence

Request hits server
      │
      ▼
Is there a valid token/session?  ← Authentication
  No  → 401 Unauthorized  ("Who are you? Please log in")
  Yes ↓
Does this user have permission?  ← Authorization
  No  → 403 Forbidden     ("I know who you are, but you can't do this")
  Yes ↓
Handle the request ✅

The HTTP status codes 401 and 403 have misleading names:
  - 401 Unauthorized actually means Unauthenticated: "I don't know who you are"
  - 403 Forbidden actually means Unauthorized: "I know who you are, but you're not allowed"
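The request sequence above as a minimal sketch, with hypothetical in-memory token and permission tables standing in for a real session store and policy:

```python
from typing import Optional

# Hypothetical data for illustration
SESSIONS = {"tok_madhu": "madhu"}                  # token -> user
PERMISSIONS = {"madhu": {("GET", "/api/posts")}}   # user -> allowed (method, path)

def handle(method: str, path: str, token: Optional[str]) -> int:
    user = SESSIONS.get(token) if token else None
    if user is None:
        return 401   # authentication failed: "who are you?"
    if (method, path) not in PERMISSIONS.get(user, set()):
        return 403   # authenticated, but not authorized
    return 200       # handle the request

print(handle("GET", "/api/posts", None))             # 401
print(handle("GET", "/api/posts", "tok_madhu"))      # 200
print(handle("DELETE", "/api/users", "tok_madhu"))   # 403
```

Per RFC 9110, a real 401 response also carries a `WWW-Authenticate` header telling the client how to authenticate.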

Authentication Methods

Method	How	Notes
Session + Cookie	Server stores session, sends cookie with session ID	Stateful — server holds session
JWT Bearer Token	Self-contained signed token in `Authorization` header	Stateless — server verifies signature
API Key	Static secret in header (`X-API-Key`)	Simple but hard to rotate
OAuth 2.0	Delegated access: user grants an app scoped permission	Authorization framework for third-party access; OpenID Connect adds login on top
mTLS	Both client and server present certificates	Strong mutual identity; common between microservices
Passkeys / WebAuthn	Biometric + device-based	Phishing-resistant, passwordless

Authorization Models

RBAC — Role-Based Access Control:
  User has a role → role has permissions
  user → role:admin → can(DELETE /users)
  user → role:viewer → cannot(DELETE /users)
  Simple, widely used

ABAC — Attribute-Based Access Control:
  Permissions based on attributes of user, resource, environment
  "User in department=engineering AND resource.owner=user AND time<18:00"
  Flexible but complex

ReBAC — Relationship-Based Access Control:
  Permissions based on relationships in a graph
  "User can edit document if user is owner OR user is in editors list"
  Google Zanzibar model (Google Drive, YouTube); open-source: SpiceDB, OpenFGA
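The three models side by side as minimal sketches. The roles, attributes and `Document` type are hypothetical:

```python
from dataclasses import dataclass, field
from datetime import time

# RBAC: user -> role -> permissions (hypothetical roles)
ROLE_PERMISSIONS = {
    "admin": {"GET /users", "DELETE /users"},
    "viewer": {"GET /users"},
}

def rbac_allows(role: str, action: str) -> bool:
    return action in ROLE_PERMISSIONS.get(role, set())

# ABAC: a rule over user, resource and environment attributes
def abac_allows(user: dict, resource: dict, now: time) -> bool:
    return (user["department"] == "engineering"
            and resource["owner"] == user["id"]
            and now < time(18, 0))

# ReBAC: access follows relationships between user and resource
@dataclass
class Document:
    owner: str
    editors: set = field(default_factory=set)

def rebac_can_edit(user_id: str, doc: Document) -> bool:
    return user_id == doc.owner or user_id in doc.editors

alice = {"id": "alice", "department": "engineering"}
doc = Document(owner="alice", editors={"bob"})
print(rbac_allows("admin", "DELETE /users"), rbac_allows("viewer", "DELETE /users"))  # True False
print(abac_allows(alice, {"owner": "alice"}, time(17, 30)))       # True
print(abac_allows(alice, {"owner": "alice"}, time(19, 0)))        # False
print(rebac_can_edit("bob", doc), rebac_can_edit("carol", doc))   # True False
```

Zanzibar-style systems store these relationships as tuples and also resolve indirect ones (groups, folder inheritance); the flat check here only shows the core idea.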

JWT Deep Dive

Header.Payload.Signature

Payload (base64url-decoded; NOT encrypted, anyone can read it):
{
  "sub": "user_42",          ← subject (user ID)
  "role": "admin",
  "exp": 1704067200,         ← expiry (Unix timestamp)
  "iat": 1704063600          ← issued at
}

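Server verifies (HS256): base64url(HMAC_SHA256(base64url(header) + "." + base64url(payload), secret)) == signature
(RS256/ES256 tokens are checked with the issuer's public key instead of a shared secret)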
If yes → token is valid and unmodified
Check exp → is it expired?
Check role → is user authorized for this endpoint?
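A standard-library-only sketch of the HS256 checks above, using the claims from the example payload. The helper names and the secret are hypothetical; in production, use a maintained library such as PyJWT rather than hand-rolled code:

```python
import base64
import hashlib
import hmac
import json
import time
from typing import Optional

def b64url_encode(data: bytes) -> str:
    # JWTs use unpadded base64url
    return base64.urlsafe_b64encode(data).rstrip(b"=").decode("ascii")

def b64url_decode(segment: str) -> bytes:
    return base64.urlsafe_b64decode(segment + "=" * (-len(segment) % 4))

def _segment(obj: dict) -> str:
    return b64url_encode(json.dumps(obj, separators=(",", ":")).encode("utf-8"))

def sign_hs256(payload: dict, secret: bytes) -> str:
    signing_input = _segment({"alg": "HS256", "typ": "JWT"}) + "." + _segment(payload)
    sig = hmac.new(secret, signing_input.encode("ascii"), hashlib.sha256).digest()
    return signing_input + "." + b64url_encode(sig)

def verify_hs256(token: str, secret: bytes, now: Optional[float] = None) -> Optional[dict]:
    """Return the claims if alg, signature and expiry all check out, else None."""
    try:
        header_b64, payload_b64, sig_b64 = token.split(".")
        header = json.loads(b64url_decode(header_b64))
        signature = b64url_decode(sig_b64)
        signing_input = f"{header_b64}.{payload_b64}".encode("ascii")
    except ValueError:                    # wrong segment count, bad base64 or JSON
        return None
    if not isinstance(header, dict) or header.get("alg") != "HS256":
        return None                       # pin the algorithm; never trust the header's choice
    expected = hmac.new(secret, signing_input, hashlib.sha256).digest()
    if not hmac.compare_digest(expected, signature):
        return None                       # payload tampered with, or wrong secret
    claims = json.loads(b64url_decode(payload_b64))
    current = time.time() if now is None else now
    if "exp" in claims and current >= claims["exp"]:
        return None                       # expired
    return claims

secret = b"demo-secret"                   # hypothetical; use a long random key in practice
example_claims = {"sub": "user_42", "role": "admin", "exp": 1704067200, "iat": 1704063600}
token = sign_hs256(example_claims, secret)
print(verify_hs256(token, secret, now=1704063700))   # the claims dict
print(verify_hs256(token, secret, now=1704067200))   # None: expired
```

Pinning the expected `alg` matters: a verifier that trusts the header's own `alg` field can be tricked into accepting `none` or a different key type.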

Kafka vs Redis vs RabbitMQ vs AMQP

AMQP is a protocol. RabbitMQ implements it. Kafka and Redis are different tools solving overlapping but distinct problems.

At a Glance

┌─────────────────┬────────────────────────────────────────────────────────┐
│ Kafka           │ Distributed commit log / event streaming platform      │
│                 │ High throughput, durable, replayable, partitioned      │
├─────────────────┼────────────────────────────────────────────────────────┤
│ RabbitMQ        │ Traditional message broker (implements AMQP)           │
│                 │ Complex routing, push-based, messages deleted on ACK   │
├─────────────────┼────────────────────────────────────────────────────────┤
│ Redis (Pub/Sub) │ In-memory pub/sub — fire and forget, no persistence    │
│ Redis Streams   │ Persistent log in Redis, lighter-weight Kafka          │
├─────────────────┼────────────────────────────────────────────────────────┤
│ AMQP            │ Protocol (not a product) — like HTTP is to web servers │
│                 │ RabbitMQ, ActiveMQ, Azure Service Bus implement it     │
└─────────────────┴────────────────────────────────────────────────────────┘

Feature Comparison

Feature	Kafka	RabbitMQ	Redis Pub/Sub	Redis Streams
Persistence	✅ Disk	✅ Optional	❌ In-memory	✅ In-memory + RDB/AOF
Message replay	✅ Retention window	❌	❌	✅
Message ordering	Per-partition	Per-queue	❌	Per-stream
Throughput	Very high	Moderate	Very high	High
Routing	Partition key	Exchanges + bindings	Channel name	Stream key
Consumer groups	✅	✅ (competing consumers)	❌	✅
Protocol	Custom (Kafka protocol)	AMQP	Redis protocol	Redis protocol
Best for	Event streaming, audit log	Task queues, complex routing	Real-time pub/sub (e.g. live updates)	Lightweight event log
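To make "replay" and "consumer groups" concrete, a toy single-partition log in Python: reads never delete messages, and each group keeps its own offset. `CommitLog` and its methods are hypothetical names, not a Kafka client API:

```python
from collections import defaultdict

class CommitLog:
    """One partition: messages are appended and never removed on read."""

    def __init__(self):
        self.messages = []
        self.offsets = defaultdict(int)     # consumer group -> next offset to read

    def append(self, message) -> int:
        self.messages.append(message)
        return len(self.messages) - 1       # offset of the new message

    def poll(self, group: str, max_records: int = 10) -> list:
        start = self.offsets[group]
        batch = self.messages[start:start + max_records]
        self.offsets[group] += len(batch)   # "commit" after reading
        return batch

    def seek(self, group: str, offset: int) -> None:
        self.offsets[group] = offset        # rewind to replay

    def lag(self, group: str) -> int:
        return len(self.messages) - self.offsets[group]

log = CommitLog()
for event in ["created", "paid", "shipped"]:
    log.append(event)

print(log.poll("billing"))        # ['created', 'paid', 'shipped']
print(log.poll("analytics", 2))   # ['created', 'paid']  (independent offset)
print(log.lag("analytics"))       # 1
log.seek("billing", 0)
print(log.poll("billing"))        # ['created', 'paid', 'shipped']  (replay)
```

A classic RabbitMQ queue or a Redis Pub/Sub channel has no equivalent of `seek`: once a message is acknowledged (or, for Pub/Sub, delivered) it is gone.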

AMQP Routing Model (RabbitMQ)

Producer → Exchange → (routing rules) → Queue → Consumer

Exchange types:
  Direct:  route by exact routing key
  Fanout:  broadcast to ALL bound queues (ignore key)
  Topic:   route by pattern matching  (e.g. "orders.*")
  Headers: route by message headers
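Topic routing follows the AMQP 0-9-1 / RabbitMQ rules: routing keys are dot-separated words, `*` in a binding key matches exactly one word and `#` matches zero or more. A sketch of that matching logic (an illustration, not RabbitMQ's implementation):

```python
from functools import lru_cache

def topic_matches(binding_key: str, routing_key: str) -> bool:
    """'*' matches exactly one word, '#' matches zero or more words."""
    pattern = binding_key.split(".")
    words = routing_key.split(".")

    @lru_cache(maxsize=None)
    def match(i: int, j: int) -> bool:
        if i == len(pattern):
            return j == len(words)
        if pattern[i] == "#":
            # '#' absorbs zero words (move past it) or one more word (stay on it)
            return match(i + 1, j) or (j < len(words) and match(i, j + 1))
        if j == len(words):
            return False
        return pattern[i] in ("*", words[j]) and match(i + 1, j + 1)

    return match(0, 0)

print(topic_matches("orders.*", "orders.created"))      # True
print(topic_matches("orders.*", "orders.created.eu"))   # False: '*' is one word
print(topic_matches("orders.#", "orders.created.eu"))   # True
print(topic_matches("*.critical", "kern.critical"))     # True
```

A direct exchange is the wildcard-free case (exact key equality), and a fanout exchange skips matching altogether.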

When to Use What

Kafka:
  → High throughput event streaming (millions/sec)
  → Need to replay messages / audit trail
  → Multiple independent consumers reading same events
  → Event sourcing, CDC (Change Data Capture)

RabbitMQ:
  → Task queues with complex routing logic
  → Need push delivery (server pushes to consumer)
  → RPC-style request-reply pattern
  → Already in an AMQP ecosystem

Redis Pub/Sub:
  → Real-time broadcast (WebSocket presence, live notifications)
  → Loss of messages is acceptable (fire and forget)
  → Ultra-low latency, in-memory only

Redis Streams:
  → Need Kafka-like semantics but within existing Redis infra
  → Lighter workload, don't need Kafka's operational complexity

Backpressure & Piggybacking

Backpressure

Backpressure is a mechanism for a consumer to signal to its upstream producer to slow down — preventing the consumer from being overwhelmed.

Without backpressure:
  Producer (300 msg/s) ──▶ Queue ──▶ Consumer (200 msg/s)
                            │
                            Queue grows 100 msg/s → memory exhausted → crash

With backpressure:
  Producer ──▶ Queue (bounded) ──▶ Consumer (200 msg/s)
     ▲                                  │
     └──────────── "slow down" ◀────────┘
  Producer throttles from 300 to 200 msg/s → queue size stays bounded

Backpressure Strategies

Strategy	Mechanism	Trade-off
Reject / error	Return 429 / error to producer	Simple, producer must handle retry
Block	Producer call blocks until consumer ready	Easy but stalls producer thread
Drop	Silently discard excess messages	Fast, but data loss — only for non-critical
Buffer with limit	Queue up to N messages, then apply one of above	Absorbs short bursts
Rate limit	Token bucket / leaky bucket at producer	Smooth, controlled flow
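The "Rate limit" row as a token-bucket sketch: tokens refill at a fixed rate up to a burst capacity, and each message spends one. The class, its parameters and the injectable clock (used so the example is deterministic) are assumptions for illustration:

```python
import time

class TokenBucket:
    """Up to `capacity` messages in a burst, refilled at `rate` tokens per second."""

    def __init__(self, rate: float, capacity: float, clock=time.monotonic):
        self.rate, self.capacity, self.clock = rate, capacity, clock
        self.tokens = capacity          # start full
        self.last = clock()

    def try_acquire(self, cost: float = 1.0) -> bool:
        now = self.clock()
        # Refill for the time elapsed since the last call, capped at capacity
        self.tokens = min(self.capacity, self.tokens + (now - self.last) * self.rate)
        self.last = now
        if self.tokens >= cost:
            self.tokens -= cost
            return True
        return False                    # over the rate: caller rejects, waits or drops

fake_now = [0.0]
bucket = TokenBucket(rate=4, capacity=5, clock=lambda: fake_now[0])
print([bucket.try_acquire() for _ in range(6)])   # [True, True, True, True, True, False]
fake_now[0] += 0.5                                 # 0.5 s later: 4/s * 0.5 s = 2 new tokens
print(bucket.try_acquire(), bucket.try_acquire(), bucket.try_acquire())   # True True False
```

When `try_acquire` returns False, the caller picks one of the other rows: reject with 429, sleep and retry (block), or drop.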

In practice (code side)

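# Async Python: backpressure via a bounded queue
import asyncio

async def process(item) -> None:
    await asyncio.sleep(0.001)          # stand-in for slow work (hypothetical)

async def producer(queue: asyncio.Queue, items) -> None:
    for item in items:
        await queue.put(item)           # suspends while the queue is full (backpressure!)
    await queue.put(None)               # sentinel: no more items

async def consumer(queue: asyncio.Queue) -> None:
    while (item := await queue.get()) is not None:
        await process(item)
        queue.task_done()

async def main() -> None:
    queue = asyncio.Queue(maxsize=100)  # bounded: never holds more than 100 items
    await asyncio.gather(producer(queue, range(1_000)), consumer(queue))

asyncio.run(main())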

Backpressure in Kafka

Kafka doesn't push to consumers — consumers pull.
This is inherently backpressure-friendly:
  Consumer controls its own rate of consumption.
  If consumer is slow → it just pulls less frequently.
  Queue (partition) grows → that's fine, Kafka handles it durably.

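The catch: retention is time/size-based, not consumption-based. A consumer that
falls further behind than the retention window loses unread messages (and
unlimited retention just moves the problem to a full disk).
Solution: monitor consumer lag (log-end offset minus committed offset, per
partition) and scale out consumers, up to one per partition in the group.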

Piggybacking

Piggybacking attaches acknowledgements (or control data) onto outgoing data packets going in the opposite direction, rather than sending a separate ACK packet. This saves a packet (and its header overhead) per acknowledgement.

Without piggybacking:
  A ──── Data ────────────────▶ B
  A ◀─── ACK (separate packet) ─ B   ← extra packet, extra latency

With piggybacking:
  A ──── Data ────────────────▶ B
  B ──── Data + ACK ──────────▶ A   ← ACK rides along on B's next data packet
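A toy packet count for a conversation where A and B take turns sending, assuming every data packet must be acknowledged and each reply is ready immediately. `Packet` and `exchange` are illustrative names:

```python
from dataclasses import dataclass
from typing import List, Optional

@dataclass
class Packet:
    sender: str
    data: Optional[str]   # payload, if any
    ack: Optional[int]    # sequence number being acknowledged, if any

def exchange(turns: int, piggyback: bool) -> List[Packet]:
    """A and B alternate sending one data packet per turn; each must be ACKed."""
    wire: List[Packet] = []
    unacked: Optional[int] = None                 # piggyback mode: data awaiting an ACK
    for seq in range(turns):
        sender, peer = ("A", "B") if seq % 2 == 0 else ("B", "A")
        if piggyback:
            # This reply also acknowledges the peer's previous packet
            wire.append(Packet(sender, f"msg{seq}", unacked))
            unacked = seq
        else:
            wire.append(Packet(sender, f"msg{seq}", None))
            wire.append(Packet(peer, None, seq))  # separate, data-less ACK
    if piggyback and unacked is not None:
        # The final data packet has nothing to ride on, so its ACK goes alone
        wire.append(Packet("B" if turns % 2 else "A", None, unacked))
    return wire

print(len(exchange(10, piggyback=False)))   # 20
print(len(exchange(10, piggyback=True)))    # 11
```

Real TCP gains less than this, because a reply is not always ready before the delayed-ACK timer fires.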

Where Piggybacking Appears

TCP:
  With Delayed ACK, TCP holds an ACK briefly (commonly 40-200 ms depending on
  the OS; RFC 1122 caps it at 500 ms) hoping outgoing data can carry it.
  If data is sent in that window, the ACK goes with it.

HTTP/2:
  SETTINGS and PING are acknowledged with a frame carrying the ACK flag;
  these small frames can share a TCP segment with outgoing DATA frames.

Sliding Window Protocols:
  Receiver's window update (flow control) piggybacked on data frame
  heading back to sender.

Piggybacking vs Backpressure

Backpressure:  flow control — slow down the sender
Piggybacking:  efficiency — combine ACK with data to save packets

Different problems, different layers.
Piggybacking is a TCP/data-link optimisation.
Backpressure is an application/system architecture concern.

UNIX Socket vs TCP/IP Stack

Unix domain sockets communicate between processes on the same machine entirely in kernel space — no network stack involved. TCP/IP goes through the full network stack, even for loopback (127.0.0.1).

The Architecture Difference

TCP localhost (127.0.0.1):
  App A → socket syscall
        → TCP layer (segmentation, sequencing)
        → IP layer (routing, headers)
        → Loopback interface (lo)
        → IP layer
        → TCP layer
        → App B
  (full network stack — just never leaves the machine)

Unix domain socket (/tmp/app.sock):
  App A → socket syscall
        → Kernel buffer (direct memory copy)
        → App B
  (skips TCP/IP entirely)
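In code, the difference is mostly the address family: `AF_UNIX` with a filesystem path instead of `AF_INET` with an IP and port. A self-contained sketch for Unix-like systems (the socket path and message are illustrative) that echoes one message back in upper case:

```python
import os
import socket
import tempfile
import threading

def recv_all(sock: socket.socket) -> bytes:
    chunks = []
    while chunk := sock.recv(4096):      # empty bytes = peer closed its sending side
        chunks.append(chunk)
    return b"".join(chunks)

def echo_upper_once(server: socket.socket) -> None:
    conn, _ = server.accept()
    with conn:
        conn.sendall(recv_all(conn).upper())

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "app.sock")            # address is a filesystem path
    with socket.socket(socket.AF_UNIX, socket.SOCK_STREAM) as server:
        server.bind(path)                           # creates the socket file
        os.chmod(path, 0o600)                       # file permissions act as access control
        server.listen(1)
        worker = threading.Thread(target=echo_upper_once, args=(server,))
        worker.start()

        with socket.socket(socket.AF_UNIX, socket.SOCK_STREAM) as client:
            client.connect(path)                    # no IP address, no port
            client.sendall(b"hello over a unix socket")
            client.shutdown(socket.SHUT_WR)         # signal end of request
            reply = recv_all(client)
        worker.join()

print(reply)   # b'HELLO OVER A UNIX SOCKET'
```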

Performance

Benchmark (rough numbers — vary by system):
  TCP localhost:    ~40–60 μs latency, ~800 MB/s throughput
  Unix socket:      ~20–30 μs latency, ~2 GB/s throughput

Why faster?
  No TCP headers to construct/parse
  No IP routing decisions
  No checksum calculation (kernel handles integrity itself)
  No port allocation overhead
  Fewer syscalls in some implementations

Feature Comparison

Feature	Unix Socket	TCP localhost
Same machine only	✅ Required	❌ Can cross machines
Network overhead	❌ None	✅ Full stack
Speed	Faster	Slower
File system path	✅ e.g. `/run/nginx.sock`	❌ IP:port
Permission control	✅ File system ACLs	❌ IP-based only
Works across hosts	❌	✅
Port required	❌	✅

Real-World Usage

Nginx → PHP-FPM:    fastcgi_pass unix:/run/php/php8.2-fpm.sock;
Nginx → Gunicorn:   proxy_pass http://unix:/run/gunicorn.sock;
PostgreSQL (local): psql connects via /var/run/postgresql/.s.PGSQL.5432
Redis (local):      redis-cli -s /var/run/redis/redis.sock
Docker daemon:      /var/run/docker.sock

When to Use Which

Unix socket:
  ✅ Two processes always on the same host
  ✅ Maximum local throughput (e.g. web server ↔ app server)
  ✅ Want file-permission-based access control
  ✅ Containerised app where both services are in the same pod (socket file on a shared volume)

TCP/IP:
  ✅ Services may be on different machines
  ✅ Microservices across a network
  ✅ Need to switch between local and remote without code changes
  ✅ Easier firewall / load balancer integration

Redirect vs Webhook

Redirect tells the client "go look over there" — the client drives the follow-up. Webhook is the server proactively calling your URL when something happens — server drives the notification.

Redirect

Client ──── GET /old-page ────────────▶ Server A
Client ◀─── 301 Location: /new-page ── Server A
Client ──── GET /new-page ────────────▶ Server A   ← client follows automatically

The client is the one making the second request.
The server is passive — it just tells the client where to go.
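A local sketch showing that the client makes the second request: a toy `http.server` handler (hypothetical paths `/old-page` and `/new-page`) answers with a 301, and the `urllib` opener's default redirect handling issues the follow-up GET:

```python
import threading
import urllib.request
from http.server import BaseHTTPRequestHandler, HTTPServer

class Handler(BaseHTTPRequestHandler):
    def do_GET(self):
        if self.path == "/old-page":
            self.send_response(301)                   # "go look over there"
            self.send_header("Location", "/new-page")
            self.send_header("Content-Length", "0")
            self.end_headers()
        elif self.path == "/new-page":
            body = b"new content"
            self.send_response(200)
            self.send_header("Content-Length", str(len(body)))
            self.end_headers()
            self.wfile.write(body)
        else:
            self.send_error(404)

    def log_message(self, format, *args):             # silence per-request logging
        pass

server = HTTPServer(("127.0.0.1", 0), Handler)        # port 0: pick any free port
threading.Thread(target=server.serve_forever, daemon=True).start()
base = f"http://127.0.0.1:{server.server_port}"

# build_opener includes HTTPRedirectHandler; ProxyHandler({}) ignores env proxies
opener = urllib.request.build_opener(urllib.request.ProxyHandler({}))
with opener.open(base + "/old-page") as resp:         # client asks for the old URL
    status, final_url, body = resp.status, resp.geturl(), resp.read()

server.shutdown()
server.server_close()
print(status, final_url, body)   # 200 http://127.0.0.1:<port>/new-page b'new content'
```

The server never serves the new content in reply to the first request; it only sends the `Location` header, and the client decides to follow it.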

Common Redirect Use Cases

Code	Type	Example
`301`	Permanent	Old domain → new domain migration
`302`	Temporary	A/B test, maintenance page
`307`	Temporary (method preserved)	POST redirect keeping method
`308`	Permanent (method preserved)	POST endpoint permanently moved
OAuth (usually `302`)	Login flow	After auth, redirect to `/callback?code=xyz`

OAuth Redirect Flow

Your app ──── redirect user to Google OAuth ────▶ Google
User logs in at Google
Google ──── redirect back with code ────────────▶ Your app (/callback?code=abc)
Your app exchanges code for access token (server-to-server, no redirect)

Webhook

Your server registers: "call https://myapp.com/webhook when payment succeeds"

[Payment succeeds at Stripe]

Stripe ──── POST https://myapp.com/webhook ────▶ Your server
Body (simplified): { "id": "evt_abc", "type": "payment_intent.succeeded", "amount": 5000 }

Your server ──── 200 OK ────────────────────────▶ Stripe

The external service is the one initiating the call.
Your server is the passive receiver.

Webhook Best Practices

Verify signatures:
  Stripe sends: Stripe-Signature: t=timestamp,v1=HMAC_SHA256(timestamp + "." + payload, secret)
  You verify the HMAC before trusting the payload.
  Without this → anyone can POST fake events to your webhook URL.

Respond immediately (200 OK), process async:
  Webhook senders use short timeouts (on the order of 10-30 s, depending on the provider).
  If your processing takes longer → queue the job, return 200 right away.

Idempotency:
  Webhooks may be delivered more than once (retries on network failure).
  Store event IDs → skip if already processed.

Retry handling:
  If you return 5xx, the sender retries (usually with backoff).
  Design for this — don't double-charge, double-send emails, etc.
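A sketch of signature checking plus idempotency, modeled on Stripe's scheme, which signs the string `timestamp + "." + raw body` with HMAC-SHA256 and sends hex digests as `v1` values. The function names, the 300-second tolerance and the in-memory `set` are assumptions; with Stripe itself, prefer the official library's `stripe.Webhook.construct_event`:

```python
import hashlib
import hmac
import json
import time
from typing import Optional

TOLERANCE_SECONDS = 300               # assumed window; rejects stale (replayed) requests
processed_event_ids: set = set()      # in-memory for the sketch; use a DB unique key in production

def verify_signature(body: bytes, header: str, secret: bytes,
                     now: Optional[float] = None) -> bool:
    timestamp, signatures = None, []
    for item in header.split(","):
        key, _, value = item.strip().partition("=")
        if key == "t":
            timestamp = value
        elif key == "v1":
            signatures.append(value)
    if timestamp is None or not timestamp.isdecimal() or not signatures:
        return False
    current = time.time() if now is None else now
    if abs(current - int(timestamp)) > TOLERANCE_SECONDS:
        return False
    # Sign the RAW body bytes exactly as received, prefixed with "timestamp."
    signed_payload = timestamp.encode("ascii") + b"." + body
    expected = hmac.new(secret, signed_payload, hashlib.sha256).hexdigest()
    return any(hmac.compare_digest(expected.encode(), s.encode()) for s in signatures)

def handle_webhook(body: bytes, header: str, secret: bytes,
                   now: Optional[float] = None) -> int:
    if not verify_signature(body, header, secret, now):
        return 400                     # forged or stale: don't trust the payload
    event_id = json.loads(body)["id"]
    if event_id in processed_event_ids:
        return 200                     # duplicate delivery: acknowledge, skip the work
    processed_event_ids.add(event_id)
    # enqueue the real work here (hypothetical job queue), then return right away
    return 200

secret = b"whsec_demo"                 # hypothetical endpoint secret
body = b'{"id": "evt_abc", "type": "payment_intent.succeeded"}'
t = 1_700_000_000
sig = hmac.new(secret, f"{t}.".encode() + body, hashlib.sha256).hexdigest()
header = f"t={t},v1={sig}"

print(handle_webhook(body, header, secret, now=t + 5))     # 200: processed
print(handle_webhook(body, header, secret, now=t + 5))     # 200: duplicate, skipped
print(handle_webhook(body, header, b"wrong", now=t + 5))   # 400: bad signature
```

Verify against the raw request bytes: re-serializing parsed JSON can change whitespace or key order and break the signature.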

Redirect vs Webhook — Side by Side

┌─────────────────┬────────────────────────────┬────────────────────────────┐
│                 │ Redirect                   │ Webhook                    │
├─────────────────┼────────────────────────────┼────────────────────────────┤
│ Who initiates   │ Client follows the redirect│ External server calls you  │
│ Direction       │ Client → Server            │ Server → Your server       │
│ Trigger         │ HTTP response code         │ Event on the other system  │
│ Timing          │ Synchronous                │ Async / event-driven       │
│ Common use      │ URL changes, OAuth flows   │ Payment events, CI/CD      │
│                 │ www → non-www              │ triggers, notifications    │
└─────────────────┴────────────────────────────┴────────────────────────────┘