Miscellaneous — Quick Reference


Serialization vs Deserialization

Serialization converts an in-memory object into a storable/transmittable format. Deserialization is the reverse — reconstructing the object from that format.

Object in memory            Serialized form          Object in memory
┌─────────────────┐         ┌─────────────┐          ┌─────────────────┐
│ User {          │──────▶  │ {"id":1,    │ ──────▶  │ User {          │
│   id: 1         │ serial  │  "name":    │ deserial │   id: 1         │
│   name: "Madhu" │         │  "Madhu"}   │          │   name: "Madhu" │
│ }               │         └─────────────┘          │ }               │
└─────────────────┘                                  └─────────────────┘
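
A minimal Python sketch of the round trip in the diagram, assuming JSON as the serialized form. The `User` dataclass and the helper names `serialize` / `deserialize` are illustrative, not part of any particular framework.

```python
import json
from dataclasses import dataclass, asdict


@dataclass
class User:
    id: int
    name: str


def serialize(user: User) -> str:
    """Object in memory -> JSON text that can be stored or sent."""
    return json.dumps(asdict(user))


def deserialize(text: str) -> User:
    """JSON text -> a new, equal object in memory."""
    return User(**json.loads(text))


wire = serialize(User(id=1, name="Madhu"))
restored = deserialize(wire)
print(wire)       # {"id": 1, "name": "Madhu"}
print(restored)   # User(id=1, name='Madhu')
```

The restored object is equal to the original but is a separate instance: only the data crosses the wire, not the object's identity.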

Common Serialization Formats

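┌──────────────────┬────────┬────────────────┬───────────┬────────────┬─────────────────────────┐
│ Format           │ Type   │ Human-readable │ Speed     │ Size       │ Best for                │
├──────────────────┼────────┼────────────────┼───────────┼────────────┼─────────────────────────┤
│ JSON             │ Text   │ Yes            │ Medium    │ Medium     │ REST APIs, config       │
│ XML              │ Text   │ Yes            │ Slow      │ Large      │ Legacy enterprise, SOAP │
│ YAML             │ Text   │ Yes            │ Slow      │ Medium     │ Config files            │
│ CSV              │ Text   │ Yes            │ Fast      │ Small      │ Tabular data            │
│ Protocol Buffers │ Binary │ No             │ Very fast │ Very small │ gRPC, microservices     │
│ MessagePack      │ Binary │ No             │ Fast      │ Small      │ High-throughput APIs    │
│ Avro             │ Binary │ No             │ Fast      │ Small      │ Kafka schemas           │
└──────────────────┴────────┴────────────────┴───────────┴────────────┴─────────────────────────┘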

Schema Evolution Problem

v1 of your class:    User { id, name }
v2 of your class:    User { id, name, email }

Serialized v1 object → deserialize with v2 code
What is email? → null? error? default value?

Solutions:
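  Protobuf: fields are matched by number; unknown fields are skipped, missing ones take their default
  JSON: missing keys → null or default (flexible but no enforcement)
  Avro: the writer's schema travels with the data (or sits in a registry); the reader resolves it against its own schema under defined compatibility rules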
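
A small sketch of the JSON case, assuming the v2 class gives the new field a default. `UserV2` and `load_user` are hypothetical names used only for illustration.

```python
import json
from dataclasses import dataclass
from typing import Optional


@dataclass
class UserV2:
    id: int
    name: str
    email: Optional[str] = None   # new in v2; the default covers v1 payloads


def load_user(text: str) -> UserV2:
    data = json.loads(text)
    # Keep only the keys this version knows about, so payloads written by
    # a newer version (with extra keys) still load.
    known = {k: v for k, v in data.items() if k in UserV2.__dataclass_fields__}
    return UserV2(**known)


old_payload = '{"id": 1, "name": "Madhu"}'   # written by v1 code
print(load_user(old_payload))   # UserV2(id=1, name='Madhu', email=None)
```

Nothing here enforces compatibility: a required field without a default would still fail at load time, which is the "no enforcement" trade-off noted for JSON.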

Key interview distinction:

Serialization  → object to bytes/string    (write to disk, send over wire)
Deserialization → bytes/string to object   (read from disk, receive from wire)

Also called:  marshal / unmarshal  (Go, some other languages)
              pickle / unpickle   (Python)
              encode / decode     (general)

Hashing vs Encryption vs Encoding

Three completely different things that people often confuse. Encoding is not security. Hashing is one-way. Encryption is two-way.

┌───────────────┬──────────────┬──────────────┬───────────────────────────┐
│               │ Reversible?  │ Needs key?   │ Purpose                   │
├───────────────┼──────────────┼──────────────┼───────────────────────────┤
│ Encoding      │ ✅ Always    │ ❌           │ Format / representation   │
│ Hashing       │ ❌ Never     │ ❌           │ Integrity / fingerprint   │
│ Encryption    │ ✅ With key  │ ✅           │ Confidentiality / secrecy │
└───────────────┴──────────────┴──────────────┴───────────────────────────┘

Encoding

Transforms data into another representation. No security involved — anyone can reverse it.

Base64:   "Hello" → "SGVsbG8="        reverse: "SGVsbG8=" → "Hello"
URL:      "hello world" → "hello%20world"
ASCII:    'A' → 65

Use for:  Sending binary data over text channels (email, JSON)
          URL-safe string representation
NOT for:  Passwords, secrets — trivially reversible
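
The three encodings above can be reproduced with the Python standard library. This is only a sketch to show that each one reverses without any key.

```python
import base64
from urllib.parse import quote, unquote

encoded = base64.b64encode(b"Hello").decode("ascii")
print(encoded)                      # SGVsbG8=
print(base64.b64decode(encoded))    # b'Hello'

print(quote("hello world"))         # hello%20world
print(unquote("hello%20world"))     # hello world

print(ord("A"))                     # 65
print(chr(65))                      # A
```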

Hashing

One-way mathematical function. Same input always → same output. Cannot go backwards.

SHA-256("password123") → "ef92b778bafe771e89245b89ecbc08a44a4e166c06659911881f383d4473e94f"
SHA-256("password123") → "ef92b778bafe771e89245b89ecbc08a44a4e166c06659911881f383d4473e94f"  (identical)
SHA-256("password124") → a completely different digest  (avalanche effect)

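bcrypt("password123") → "$2a$10$N9qo8uLOickgx2ZMRZo..."  (includes salt, slow by design)

┌───────────┬───────────────────────────────────────────────┬────────────────────────────────────────┐
│ Algorithm │ Use for                                       │ Notes                                  │
├───────────┼───────────────────────────────────────────────┼────────────────────────────────────────┤
│ SHA-256   │ File integrity, checksums, digital signatures │ Fast; NOT for passwords                │
│ SHA-512   │ Same as SHA-256                               │ Larger output                          │
│ MD5       │ Legacy checksums only                         │ Broken for security use                │
│ bcrypt    │ Passwords                                     │ Deliberately slow, built-in salt       │
│ Argon2    │ Passwords (modern)                            │ Winner of Password Hashing Competition │
│ HMAC      │ Message authentication                        │ Hash + secret key = tamper detection   │
└───────────┴───────────────────────────────────────────────┴────────────────────────────────────────┘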

Never use plain SHA-256 or MD5 for passwords: they are fast by design, so an attacker holding a leaked hash database can test guesses at enormous rates. Use bcrypt, Argon2, or scrypt for password hashing.
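
A sketch of the difference using only the standard library. It uses scrypt because `hashlib` ships it; bcrypt and Argon2 need third-party packages. The cost parameters (`n`, `r`, `p`) and the helper names are illustrative assumptions, not tuned recommendations.

```python
import hashlib
import hmac
import os
from typing import Optional, Tuple


def sha256_hex(data: bytes) -> str:
    """Fast fingerprint: fine for checksums, wrong for passwords."""
    return hashlib.sha256(data).hexdigest()


def hash_password(password: str, salt: Optional[bytes] = None) -> Tuple[bytes, bytes]:
    """Slow, salted hash. Store both the salt and the digest."""
    salt = salt if salt is not None else os.urandom(16)
    digest = hashlib.scrypt(password.encode(), salt=salt, n=2**14, r=8, p=1, dklen=32)
    return salt, digest


def verify_password(password: str, salt: bytes, expected: bytes) -> bool:
    _, digest = hash_password(password, salt)
    return hmac.compare_digest(digest, expected)   # constant-time comparison


salt, stored = hash_password("password123")
print(sha256_hex(b"password123") == sha256_hex(b"password123"))  # True: deterministic
print(verify_password("password123", salt, stored))              # True
print(verify_password("password124", salt, stored))              # False
```

Because each password gets its own random salt, two users with the same password end up with different stored digests.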

Encryption

Two-way — encrypted data can be decrypted with the right key.

Symmetric (one shared key):
  AES-256-GCM("secret message", key) → ciphertext
  AES-256-GCM(ciphertext, same_key)  → "secret message"

  Fast. Problem: how do you securely share the key?

Asymmetric (public + private key pair):
  Encrypt with public key  → only private key can decrypt
  Sign with private key    → anyone with public key can verify

  RSA (encryption and signatures), ECDSA and Ed25519 (signatures only)
  Used in: TLS handshake, JWT signatures, SSH

The Mental Model

Encoding   → like translating English to French — everyone can translate back
Hashing    → like a fingerprint — you can't reconstruct the person from it
Encryption → like a locked box — only the key holder can open it

Parsing

Parsing converts raw text or bytes into a structured data structure your code can work with.

Raw input (string/bytes)    →    [Parser]    →    Structured data
'{"name":"Madhu","age":28}'       JSON              { name: "Madhu", age: 28 }
"<h1>Hello</h1>"                  HTML              DOM tree
"SELECT * FROM users"             SQL               AST (Abstract Syntax Tree)
"2024-01-15"                      Date              Date object

Stages: Lexing → Parsing

Raw text: "x = 1 + 2"

Stage 1 — Lexer (tokeniser):
  Breaks text into tokens (meaningful units)
  ["x", "=", "1", "+", "2"]
  → ["IDENTIFIER:x", "OPERATOR:=", "NUMBER:1", "OPERATOR:+", "NUMBER:2"]

Stage 2 — Parser:
  Applies grammar rules to build a tree
      Assignment
      ├── Identifier: x
      └── Addition
          ├── Number: 1
          └── Number: 2
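
The two stages can be sketched for this one-line grammar as follows. The token names mirror the ones above; the tuple-based tree and the function names `lex` / `parse` are assumptions made for the example, and the grammar only covers `IDENTIFIER = NUMBER (+ NUMBER)*`.

```python
import re

TOKEN_RE = re.compile(
    r"\s*(?:(?P<NUMBER>\d+)|(?P<IDENTIFIER>[A-Za-z_]\w*)|(?P<OPERATOR>[=+]))"
)


def lex(text):
    """Stage 1: split raw text into (kind, value) tokens."""
    tokens, pos = [], 0
    text = text.rstrip()
    while pos < len(text):
        match = TOKEN_RE.match(text, pos)
        if match is None:
            raise SyntaxError(f"unexpected character at position {pos}")
        tokens.append((match.lastgroup, match.group(match.lastgroup)))
        pos = match.end()
    return tokens


def parse(tokens):
    """Stage 2: apply the grammar and build a tree of nested tuples."""
    pos = 0

    def expect(kind, value=None):
        nonlocal pos
        if pos >= len(tokens):
            raise SyntaxError("unexpected end of input")
        tok_kind, tok_value = tokens[pos]
        if tok_kind != kind or (value is not None and tok_value != value):
            raise SyntaxError(f"expected {value or kind}, got {tok_value!r}")
        pos += 1
        return tok_value

    target = expect("IDENTIFIER")
    expect("OPERATOR", "=")
    node = ("Number", int(expect("NUMBER")))
    while pos < len(tokens):
        expect("OPERATOR", "+")
        node = ("Addition", node, ("Number", int(expect("NUMBER"))))
    return ("Assignment", ("Identifier", target), node)


print(lex("x = 1 + 2"))
print(parse(lex("x = 1 + 2")))
# ('Assignment', ('Identifier', 'x'), ('Addition', ('Number', 1), ('Number', 2)))
```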

Types of Parsers

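┌───────────────────┬────────────────────────────┬────────────────────────────────────────────────┐
│ Type              │ Example                    │ Notes                                          │
├───────────────────┼────────────────────────────┼────────────────────────────────────────────────┤
│ JSON parser       │ JSON.parse(), json.loads() │ Strict format; throws on invalid JSON          │
│ HTML parser       │ BeautifulSoup, DOMParser   │ Lenient; browsers recover from bad HTML        │
│ CSV parser        │ csv module, Papa Parse     │ Handle quoting, delimiters, newlines in fields │
│ Regex parser      │ Custom extraction          │ Fragile for complex formats; avoid for HTML    │
│ Recursive descent │ Language compilers         │ Handwritten, matches grammar rules recursively │
└───────────────────┴────────────────────────────┴────────────────────────────────────────────────┘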

Common Parsing Pitfalls

Encoding issues:  file is UTF-8 but parsed as Latin-1 → garbled text
Escape sequences: "He said \"hello\"" → parser must handle escaped quotes
Delimiters in fields: CSV with commas or newlines inside quoted fields
Streaming vs batch: large files can't fit in memory → need streaming parser
Malicious input: billion laughs (XML), deeply nested JSON → stack overflow / DoS
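
A short illustration of the quoting pitfalls, comparing a naive `split(",")` with Python's `csv` module. The sample data is made up for the example.

```python
import csv
import io

raw = (
    'id,comment\n'
    '1,"Hello, world"\n'            # comma inside a quoted field
    '2,"line one\nline two"\n'      # newline inside a quoted field
    '3,"He said ""hello"""\n'       # escaped quotes
)

naive = raw.splitlines()[1].split(",")
print(naive)    # ['1', '"Hello', ' world"']  <- field wrongly split in two

rows = list(csv.reader(io.StringIO(raw)))
for row in rows:
    print(row)
```

A real parser returns four rows of two fields each; the naive approach breaks on the first quoted comma and would also split record 2 across two lines.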

Parsing vs Serialization

Serialization → object to string   (you produce the string)
Parsing       → string to object   (you consume the string)

They're inverses, but "parsing" usually implies the consuming direction
and often handling arbitrary/untrusted input with error handling.

SSL vs TLS vs HTTPS

SSL is deprecated. TLS is what's actually used today. HTTPS is HTTP transported over TLS. The term "SSL certificate" is a misnomer that stuck — they're actually TLS certificates.

The Timeline

SSL 1.0 (1994): never released publicly (security flaws)
SSL 2.0 (1995): deprecated 2011
SSL 3.0 (1996): deprecated 2015 (POODLE attack)
TLS 1.0 (1999): deprecated 2021 by RFC 8996 (major browsers dropped it in 2020)
TLS 1.1 (2006): deprecated 2021 by RFC 8996 (major browsers dropped it in 2020)
TLS 1.2 (2008): still widely used ✅
TLS 1.3 (2018): current standard ✅ (faster handshake, stronger ciphers)

HTTPS = HTTP  +  TLS
        (app    (transport
        layer)   security)

HTTP  → port 80, unencrypted, data visible to anyone between client and server
HTTPS → port 443, TLS-encrypted, data unreadable in transit

TLS Handshake (Simplified)

Client                              Server
  │                                  │
  │── ClientHello ─────────────────▶ │  "I support TLS 1.3, here are my cipher suites"
  │                                  │
  │◀─ ServerHello + Certificate ──── │  "Use TLS 1.3 + AES-GCM. Here's my cert."
  │                                  │
  │   [Client verifies cert against  │
  │    trusted CA list]              │
  │                                  │
  │── Key exchange ────────────────▶ │  (Diffie-Hellman — both derive same key)
  │                                  │
  │◀──────── Encrypted from here ───▶│  All further traffic is encrypted
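
Application code rarely implements the handshake itself; it configures a TLS library and lets it run the exchange. A minimal client-side sketch with Python's `ssl` module, where the helper name `make_client_context` is an assumption for the example:

```python
import ssl


def make_client_context() -> ssl.SSLContext:
    # create_default_context() enables certificate verification and
    # hostname checking against the system's trusted CA store.
    ctx = ssl.create_default_context()
    # Refuse SSL 3.0, TLS 1.0 and TLS 1.1 during the handshake.
    ctx.minimum_version = ssl.TLSVersion.TLSv1_2
    return ctx


ctx = make_client_context()
print(ctx.verify_mode == ssl.CERT_REQUIRED)   # True
print(ctx.check_hostname)                     # True
```

To use it, pass a connected socket to `ctx.wrap_socket(sock, server_hostname="example.com")`; the handshake runs during the wrap, and `version()` on the returned socket reports the protocol that was negotiated.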

TLS 1.3 vs TLS 1.2

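┌───────────────────────┬──────────┬─────────────────────────────────────────┐
│                       │ TLS 1.2  │ TLS 1.3                                 │
├───────────────────────┼──────────┼─────────────────────────────────────────┤
│ Handshake round trips │ 2 RTT    │ 1 RTT (faster)                          │
│ 0-RTT resumption      │ No       │ Yes (reconnect with zero extra latency) │
│ Removed weak ciphers  │ No       │ Yes (RC4, DES, SHA-1 gone)              │
│ Forward secrecy       │ Optional │ Mandatory                               │
└───────────────────────┴──────────┴─────────────────────────────────────────┘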

Certificates

A TLS certificate contains:
  - Domain name it's valid for (e.g., *.example.com)
  - Public key of the server
  - Issuer (Certificate Authority — Let's Encrypt, DigiCert, etc.)
  - Expiry date
  - Digital signature from the CA

The CA's signature lets the browser verify:
  "This cert was issued by a trusted authority for this domain"
  → The server is who it claims to be

Self-signed cert:  Server signs its own cert — no trusted CA
                   Browser warns: "Your connection is not private"
                   OK for internal/dev, never for production

One-liner summary

SSL     → Old, broken, do not use
TLS     → The actual protocol securing internet traffic today
HTTPS   → HTTP + TLS — the encrypted web
"SSL certificate" → Really a TLS certificate. The name just stuck.

Authentication vs Authorization

Authentication (AuthN) = verifying identity — who are you? Authorization (AuthZ) = verifying permissions — what are you allowed to do?

Authentication:           Authorization:
─────────────────         ───────────────────────────────
"I'm Madhu"               "Madhu can read /api/posts"
Prove it → token          "Madhu cannot DELETE /api/users"
Identity verified ✅       Permission checked ✅

The Sequence

Request hits server
      │
      ▼
Is there a valid token/session?  ← Authentication
  No  → 401 Unauthorized  ("Who are you? Please log in")
  Yes ↓
Does this user have permission?  ← Authorization
  No  → 403 Forbidden     ("I know who you are, but you can't do this")
  Yes ↓
Handle the request ✅
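
The same sequence as a small function. The in-memory `SESSIONS` and `PERMISSIONS` tables are hypothetical stand-ins for a real session store and policy engine.

```python
from typing import Optional

SESSIONS = {"token-madhu": "madhu"}                  # token -> user (hypothetical)
PERMISSIONS = {"madhu": {("GET", "/api/posts")}}     # user -> allowed (method, path)


def handle(token: Optional[str], method: str, path: str) -> int:
    # Step 1, authentication: do we know who this is?
    user = SESSIONS.get(token)
    if user is None:
        return 401
    # Step 2, authorization: is this user allowed to do this?
    if (method, path) not in PERMISSIONS.get(user, set()):
        return 403
    return 200


print(handle(None, "GET", "/api/posts"))               # 401
print(handle("token-madhu", "GET", "/api/posts"))      # 200
print(handle("token-madhu", "DELETE", "/api/users"))   # 403
```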

The HTTP status codes 401 and 403 have misleading names:
  - 401 Unauthorized actually means unauthenticated: "I don't know who you are"
  - 403 Forbidden actually means not authorized: "I know you, but you're not allowed"

Authentication Methods

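┌─────────────────────┬─────────────────────────────────────────────────────┬──────────────────────────────────────┐
│ Method              │ How                                                 │ Notes                                │
├─────────────────────┼─────────────────────────────────────────────────────┼──────────────────────────────────────┤
│ Session + Cookie    │ Server stores session, sends cookie with session ID │ Stateful; server holds session       │
│ JWT Bearer Token    │ Self-contained signed token in Authorization header │ Stateless; server verifies signature │
│ API Key             │ Static secret in header (X-API-Key)                 │ Simple but hard to rotate            │
│ OAuth 2.0           │ Delegated access; user grants app permission        │ For third-party access               │
│ mTLS                │ Both client and server present certificates         │ Strongest; microservices             │
│ Passkeys / WebAuthn │ Biometric + device-based                            │ Phishing-resistant, passwordless     │
└─────────────────────┴─────────────────────────────────────────────────────┴──────────────────────────────────────┘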

Authorization Models

RBAC — Role-Based Access Control:
  User has a role → role has permissions
  user → role:admin → can(DELETE /users)
  user → role:viewer → cannot(DELETE /users)
  Simple, widely used

ABAC — Attribute-Based Access Control:
  Permissions based on attributes of user, resource, environment
  "User in department=engineering AND resource.owner=user AND time<18:00"
  Flexible but complex

ReBAC — Relationship-Based Access Control:
  Permissions based on relationships in a graph
  "User can edit document if user is owner OR user is in editors list"
  Google Zanzibar model, used by Google Drive and other Google services
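
The three models differ mainly in what the check looks at. A compact sketch, where every table, attribute name, and function name is hypothetical:

```python
# RBAC: user -> role -> permissions
ROLE_PERMISSIONS = {
    "admin": {"GET /users", "DELETE /users"},
    "viewer": {"GET /users"},
}


def rbac_allows(role: str, action: str) -> bool:
    return action in ROLE_PERMISSIONS.get(role, set())


# ABAC: a predicate over attributes of user, resource and environment
def abac_allows(user: dict, resource: dict, hour: int) -> bool:
    return (
        user["department"] == "engineering"
        and resource["owner"] == user["id"]
        and hour < 18
    )


# ReBAC: a lookup in a set of (subject, relation, object) tuples
RELATIONS = {("madhu", "owner", "doc1"), ("sam", "editor", "doc1")}


def rebac_can_edit(user: str, document: str) -> bool:
    return any((user, rel, document) in RELATIONS for rel in ("owner", "editor"))


print(rbac_allows("viewer", "DELETE /users"))   # False
print(rebac_can_edit("sam", "doc1"))            # True
```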

JWT Deep Dive

Header.Payload.Signature

Payload (base64url-decoded; NOT encrypted, anyone can read):
{
  "sub": "user_42",          ← subject (user ID)
  "role": "admin",
  "exp": 1704067200,         ← expiry (Unix timestamp)
  "iat": 1704063600          ← issued at
}

Server verifies (HS256): HMAC_SHA256(base64url(header) + "." + base64url(payload), secret) == signature
If yes → token is valid and unmodified
Check exp → is it expired?
Check role → is user authorized for this endpoint?
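
A standard-library sketch of HS256 signing and verification, to make the checks above concrete. The function names are assumptions for the example; production code would normally use a maintained JWT library and also validate the `alg` header.

```python
import base64
import hashlib
import hmac
import json


def b64url(data: bytes) -> str:
    return base64.urlsafe_b64encode(data).rstrip(b"=").decode("ascii")


def b64url_decode(text: str) -> bytes:
    return base64.urlsafe_b64decode(text + "=" * (-len(text) % 4))


def sign(payload: dict, secret: bytes) -> str:
    header = {"alg": "HS256", "typ": "JWT"}
    signing_input = (
        b64url(json.dumps(header).encode()) + "." + b64url(json.dumps(payload).encode())
    )
    signature = hmac.new(secret, signing_input.encode(), hashlib.sha256).digest()
    return signing_input + "." + b64url(signature)


def verify(token: str, secret: bytes, now: float) -> dict:
    """Return the payload if the signature is valid and the token is unexpired."""
    parts = token.split(".")
    if len(parts) != 3:
        raise ValueError("malformed token")
    header_b64, payload_b64, signature_b64 = parts
    expected = hmac.new(
        secret, f"{header_b64}.{payload_b64}".encode(), hashlib.sha256
    ).digest()
    if not hmac.compare_digest(expected, b64url_decode(signature_b64)):
        raise ValueError("bad signature")
    payload = json.loads(b64url_decode(payload_b64))
    if payload["exp"] <= now:
        raise ValueError("token expired")
    return payload


token = sign({"sub": "user_42", "role": "admin", "exp": 1704067200}, b"demo-secret")
print(verify(token, b"demo-secret", now=1704063600)["role"])   # admin
```

The payload is readable by anyone holding the token; the signature only proves it was not modified by someone without the secret.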

Kafka vs Redis vs RabbitMQ vs AMQP

AMQP is a protocol. RabbitMQ implements it. Kafka and Redis are different tools solving overlapping but distinct problems.

At a Glance

┌─────────────────┬────────────────────────────────────────────────────────┐
│ Kafka           │ Distributed commit log / event streaming platform      │
│                 │ High throughput, durable, replayable, partitioned      │
├─────────────────┼────────────────────────────────────────────────────────┤
│ RabbitMQ        │ Traditional message broker (implements AMQP)           │
│                 │ Complex routing, push-based, messages deleted on ACK   │
├─────────────────┼────────────────────────────────────────────────────────┤
│ Redis (Pub/Sub) │ In-memory pub/sub — fire and forget, no persistence    │
│ Redis Streams   │ Persistent log in Redis, lighter-weight Kafka          │
├─────────────────┼────────────────────────────────────────────────────────┤
│ AMQP            │ Protocol (not a product) — like HTTP is to web servers │
│                 │ RabbitMQ, ActiveMQ, Azure Service Bus implement it     │
└─────────────────┴────────────────────────────────────────────────────────┘

Feature Comparison

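┌──────────────────┬─────────────────────────┬───────────────────────────┬─────────────────────┬───────────────────────┐
│ Feature          │ Kafka                   │ RabbitMQ                  │ Redis Pub/Sub       │ Redis Streams         │
├──────────────────┼─────────────────────────┼───────────────────────────┼─────────────────────┼───────────────────────┤
│ Persistence      │ Yes (disk)              │ Yes (optional)            │ No (in-memory)      │ Yes (in-memory + RDB) │
│ Message replay   │ Yes (retention window)  │ No                        │ No                  │ Yes                   │
│ Message ordering │ Per-partition           │ Per-queue                 │ -                   │ Per-stream            │
│ Throughput       │ Very high               │ Moderate                  │ Very high           │ High                  │
│ Routing          │ Partition key           │ Exchanges + bindings      │ Channel name        │ Stream key            │
│ Consumer groups  │ Yes                     │ Yes (competing consumers) │ No                  │ Yes                   │
│ Protocol         │ Custom (Kafka protocol) │ AMQP                      │ Redis protocol      │ Redis protocol        │
│ Best for         │ Event streaming,        │ Task queues,              │ Real-time pub/sub   │ Lightweight event log │
│                  │ audit log               │ complex routing           │ (e.g. live updates) │                       │
└──────────────────┴─────────────────────────┴───────────────────────────┴─────────────────────┴───────────────────────┘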

AMQP Routing Model (RabbitMQ)

Producer → Exchange → (routing rules) → Queue → Consumer

Exchange types:
  Direct:  route by exact routing key
  Fanout:  broadcast to ALL bound queues (ignore key)
  Topic:   route by pattern matching  (e.g. "orders.*")
  Headers: route by message headers
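
A toy router that mimics three of the exchange types, to show how a routing key selects queues. This is a sketch of the semantics, not RabbitMQ's implementation; `route` and `topic_matches` are hypothetical names, and the headers exchange is left out. In topic patterns, `*` matches exactly one dot-separated word and `#` matches zero or more.

```python
from typing import List, Tuple


def topic_matches(pattern: str, key: str) -> bool:
    def match(p: List[str], k: List[str]) -> bool:
        if not p:
            return not k
        if p[0] == "#":
            # "#" may absorb zero or more words
            return any(match(p[1:], k[i:]) for i in range(len(k) + 1))
        if not k:
            return False
        return (p[0] == "*" or p[0] == k[0]) and match(p[1:], k[1:])

    return match(pattern.split("."), key.split("."))


def route(exchange_type: str, bindings: List[Tuple[str, str]], routing_key: str) -> List[str]:
    """bindings is a list of (queue_name, binding_key); returns the receiving queues."""
    if exchange_type == "fanout":
        return [queue for queue, _ in bindings]
    if exchange_type == "direct":
        return [queue for queue, key in bindings if key == routing_key]
    if exchange_type == "topic":
        return [queue for queue, key in bindings if topic_matches(key, routing_key)]
    raise ValueError(f"unsupported exchange type: {exchange_type}")


bindings = [("billing", "orders.*"), ("audit", "#"), ("eu-only", "orders.created.eu")]
print(route("topic", bindings, "orders.created"))      # ['billing', 'audit']
print(route("direct", bindings, "orders.created.eu"))  # ['eu-only']
print(route("fanout", bindings, "anything"))           # ['billing', 'audit', 'eu-only']
```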

When to Use What

Kafka:
  → High throughput event streaming (millions/sec)
  → Need to replay messages / audit trail
  → Multiple independent consumers reading same events
  → Event sourcing, CDC (Change Data Capture)

RabbitMQ:
  → Task queues with complex routing logic
  → Need push delivery (server pushes to consumer)
  → RPC-style request-reply pattern
  → Already in an AMQP ecosystem

Redis Pub/Sub:
  → Real-time broadcast (WebSocket presence, live notifications)
  → Loss of messages is acceptable (fire and forget)
  → Ultra-low latency, in-memory only

Redis Streams:
  → Need Kafka-like semantics but within existing Redis infra
  → Lighter workload, don't need Kafka's operational complexity

Backpressure & Piggybacking

Backpressure

Backpressure is a mechanism for a consumer to signal to its upstream producer to slow down — preventing the consumer from being overwhelmed.

Without backpressure:
  Producer (300 msg/s) ──▶ Queue ──▶ Consumer (200 msg/s)
                            │
                            Queue grows 100 msg/s → memory exhausted → crash

With backpressure:
  Producer (300 msg/s) ──▶ [FULL SIGNAL] ──▶ Producer slows to 200 msg/s
  Consumer (200 msg/s) ←── balanced ─────────────────────────────────────

Backpressure Strategies

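┌───────────────────┬─────────────────────────────────────────────────────┬─────────────────────────────────────────────────┐
│ Strategy          │ Mechanism                                           │ Trade-off                                       │
├───────────────────┼─────────────────────────────────────────────────────┼─────────────────────────────────────────────────┤
│ Reject / error    │ Return 429 / error to producer                      │ Simple, producer must handle retry              │
│ Block             │ Producer call blocks until consumer ready           │ Easy but stalls producer thread                 │
│ Drop              │ Silently discard excess messages                    │ Fast, but data loss; only for non-critical data │
│ Buffer with limit │ Queue up to N messages, then apply one of the above │ Absorbs short bursts                            │
│ Rate limit        │ Token bucket / leaky bucket at producer             │ Smooth, controlled flow                         │
└───────────────────┴─────────────────────────────────────────────────────┴─────────────────────────────────────────────────┘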
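
The rate-limit row can be sketched as a token bucket. The class below is an illustrative, single-threaded version; the injectable `clock` parameter is an assumption added so the behaviour can be tested without sleeping.

```python
import time


class TokenBucket:
    """Allows bursts up to `capacity`, then a steady `rate` tokens per second."""

    def __init__(self, rate: float, capacity: float, clock=time.monotonic):
        self.rate = rate
        self.capacity = capacity
        self.tokens = capacity
        self.clock = clock
        self.last = clock()

    def try_acquire(self, cost: float = 1.0) -> bool:
        now = self.clock()
        # Refill according to elapsed time, never above capacity.
        self.tokens = min(self.capacity, self.tokens + (now - self.last) * self.rate)
        self.last = now
        if self.tokens >= cost:
            self.tokens -= cost
            return True
        return False   # caller should wait, drop, or return 429
```

A producer would call `try_acquire()` before each send; a `False` result is the signal to apply one of the other strategies in the table (reject, block, or drop).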

In practice (code side)

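# Async Python: backpressure via a bounded queue
import asyncio

async def producer(queue, items):
    for item in items:
        await queue.put(item)      # suspends while the queue is full (backpressure)

async def consumer(queue):
    while True:
        item = await queue.get()
        await asyncio.sleep(0.001)  # stand-in for real processing of item
        queue.task_done()

async def main():
    queue = asyncio.Queue(maxsize=100)  # bounded: put() waits when full
    worker = asyncio.create_task(consumer(queue))
    await producer(queue, range(1000))
    await queue.join()             # wait until every item has been processed
    worker.cancel()

asyncio.run(main())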

Backpressure in Kafka

Kafka doesn't push to consumers — consumers pull.
This is inherently backpressure-friendly:
  Consumer controls its own rate of consumption.
  If consumer is slow → it just pulls less frequently.
  Queue (partition) grows → that's fine, Kafka handles it durably.

The problem: if partition grows unboundedly → disk fills up.
Solution: monitor lag (consumer_lag metric) + autoscale consumers.

Piggybacking

Piggybacking attaches acknowledgements (or control data) to data packets already travelling in the opposite direction, rather than sending a separate ACK packet. This saves a packet and its header overhead.

Without piggybacking:
  A ──── Data ────────────────▶ B
  A ◀─── ACK (separate packet) ─ B   ← extra packet, extra latency

With piggybacking:
  A ──── Data ────────────────▶ B
  B ──── Data + ACK ──────────▶ A   ← ACK rides along on B's next data packet

Where Piggybacking Appears

TCP:
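  TCP may delay an ACK (Delayed ACK; commonly 40-200 ms, capped at
  500 ms by RFC 1122) waiting for outgoing data to piggyback on.
  If data is sent in that window, the ACK goes with it.

HTTP/2:
  SETTINGS and PING acknowledgements are separate frames (ACK flag set),
  but they can share a TCP segment with pending DATA frames, so they
  rarely cost an extra packet.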

Sliding Window Protocols:
  Receiver's window update (flow control) piggybacked on data frame
  heading back to sender.

Piggybacking vs Backpressure

Backpressure:  flow control — slow down the sender
Piggybacking:  efficiency — combine ACK with data to save packets

Different problems, different layers.
Piggybacking is a TCP/data-link optimisation.
Backpressure is an application/system architecture concern.

UNIX Socket vs TCP/IP Stack

Unix domain sockets communicate between processes on the same machine entirely in kernel space — no network stack involved. TCP/IP goes through the full network stack, even for loopback (127.0.0.1).

The Architecture Difference

TCP localhost (127.0.0.1):
  App A → socket syscall
        → TCP layer (segmentation, sequencing)
        → IP layer (routing, headers)
        → Loopback interface (lo)
        → IP layer
        → TCP layer
        → App B
  (full network stack — just never leaves the machine)

Unix domain socket (/tmp/app.sock):
  App A → socket syscall
        → Kernel buffer (direct memory copy)
        → App B
  (skips TCP/IP entirely)
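
From application code the two look almost identical; only the address family and the address change. A sketch of a Unix domain socket round trip (Linux/macOS), where the socket path, the uppercase-echo behaviour, and the function names are assumptions for the example:

```python
import os
import socket
import tempfile
import threading


def echo_once(server: socket.socket) -> None:
    conn, _ = server.accept()
    with conn:
        conn.sendall(conn.recv(1024).upper())


def roundtrip(message: bytes) -> bytes:
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "app.sock")     # a file system path, not IP:port
        with socket.socket(socket.AF_UNIX, socket.SOCK_STREAM) as server:
            server.bind(path)
            server.listen(1)
            worker = threading.Thread(target=echo_once, args=(server,))
            worker.start()
            with socket.socket(socket.AF_UNIX, socket.SOCK_STREAM) as client:
                client.connect(path)
                client.sendall(message)
                reply = client.recv(1024)
            worker.join()
    return reply


print(roundtrip(b"ping"))   # b'PING'
```

Switching to TCP would mean `AF_INET` and an address such as `("127.0.0.1", 8000)`; the send and receive calls stay the same.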

Performance

Benchmark (rough numbers — vary by system):
  TCP localhost:    ~40–60 μs latency, ~800 MB/s throughput
  Unix socket:      ~20–30 μs latency, ~2 GB/s throughput

Why faster?
  No TCP headers to construct/parse
  No IP routing decisions
  No checksum calculation (kernel handles integrity itself)
  No port allocation overhead
  Fewer syscalls in some implementations

Feature Comparison

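┌────────────────────┬───────────────────────────┬─────────────────────────┐
│ Feature            │ Unix Socket               │ TCP localhost           │
├────────────────────┼───────────────────────────┼─────────────────────────┤
│ Same machine only  │ Yes (required)            │ No (can cross machines) │
│ Network overhead   │ None                      │ Full stack              │
│ Speed              │ Faster                    │ Slower                  │
│ File system path   │ Yes, e.g. /run/nginx.sock │ No (IP:port)            │
│ Permission control │ Yes (file system ACLs)    │ No (IP-based only)      │
│ Works across hosts │ No                        │ Yes                     │
│ Port required      │ No                        │ Yes                     │
└────────────────────┴───────────────────────────┴─────────────────────────┘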

Real-World Usage

Nginx → PHP-FPM:    fastcgi_pass unix:/run/php/php8.2-fpm.sock;
Nginx → Gunicorn:   proxy_pass http://unix:/run/gunicorn.sock;
PostgreSQL (local): psql connects via /var/run/postgresql/.s.PGSQL.5432
Redis (local):      redis-cli -s /var/run/redis/redis.sock
Docker daemon:      /var/run/docker.sock

When to Use Which

Unix socket:
  ✅ Two processes always on the same host
  ✅ Maximum local throughput (e.g. web server ↔ app server)
  ✅ Want file-permission-based access control
  ✅ Containerised app where both services are in same pod

TCP/IP:
  ✅ Services may be on different machines
  ✅ Microservices across a network
  ✅ Need to switch between local and remote without code changes
  ✅ Easier firewall / load balancer integration

Redirect vs Webhook

Redirect tells the client "go look over there" — the client drives the follow-up. Webhook is the server proactively calling your URL when something happens — server drives the notification.

Redirect

Client ──── GET /old-page ────────────▶ Server A
Client ◀─── 301 Location: /new-page ── Server A
Client ──── GET /new-page ────────────▶ Server A   ← client follows automatically

The client is the one making the second request.
The server is passive — it just tells the client where to go.
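
To make "the client drives the follow-up" concrete, here is a sketch with a throwaway local server and a client that follows the `Location` header by hand. `http.client` does not follow redirects on its own, which is why the loop is explicit; the handler, paths, and `fetch` helper are all made up for the example.

```python
import http.client
import threading
from http.server import BaseHTTPRequestHandler, HTTPServer


class Handler(BaseHTTPRequestHandler):
    def do_GET(self):
        if self.path == "/old-page":
            self.send_response(301)                   # server only points the way
            self.send_header("Location", "/new-page")
            self.send_header("Content-Length", "0")
            self.end_headers()
        else:
            body = b"new page"
            self.send_response(200)
            self.send_header("Content-Length", str(len(body)))
            self.end_headers()
            self.wfile.write(body)

    def log_message(self, *args):                     # keep the demo quiet
        pass


def fetch(host, port, path, max_redirects=5):
    hops = []
    for _ in range(max_redirects + 1):
        conn = http.client.HTTPConnection(host, port, timeout=5)
        conn.request("GET", path)
        resp = conn.getresponse()
        body = resp.read()
        conn.close()
        hops.append((resp.status, path))
        if resp.status in (301, 302, 307, 308):
            path = resp.getheader("Location")         # client makes the next request
            continue
        return hops, body
    raise RuntimeError("too many redirects")


server = HTTPServer(("127.0.0.1", 0), Handler)        # port 0: pick any free port
threading.Thread(target=server.serve_forever, daemon=True).start()
hops, body = fetch("127.0.0.1", server.server_port, "/old-page")
server.shutdown()
server.server_close()
print(hops)   # [(301, '/old-page'), (200, '/new-page')]
```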

Common Redirect Use Cases

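┌───────┬──────────────────────────────┬───────────────────────────────────┐
│ Code  │ Type                         │ Example                           │
├───────┼──────────────────────────────┼───────────────────────────────────┤
│ 301   │ Permanent                    │ Old domain → new domain migration │
│ 302   │ Temporary                    │ A/B test, maintenance page        │
│ 307   │ Temporary (method preserved) │ POST redirect keeping method      │
│ 308   │ Permanent (method preserved) │ POST endpoint permanently moved   │
│ OAuth │ Login flow                   │ After auth, redirect to ?code=xyz │
└───────┴──────────────────────────────┴───────────────────────────────────┘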

OAuth Redirect Flow

Your app ──── redirect user to Google OAuth ────▶ Google
User logs in at Google
Google ──── redirect back with code ────────────▶ Your app (/callback?code=abc)
Your app exchanges code for access token (server-to-server, no redirect)

Webhook

Your server registers: "call https://myapp.com/webhook when payment succeeds"

[Payment succeeds at Stripe]

Stripe ──── POST https://myapp.com/webhook ────▶ Your server
Body: { "event": "payment.succeeded", "amount": 5000, "id": "ch_abc" }

Your server ──── 200 OK ────────────────────────▶ Stripe

The external service is the one initiating the call.
Your server is the passive receiver.

Webhook Best Practices

Verify signatures:
  Stripe sends: Stripe-Signature: t=timestamp,v1=HMAC_SHA256(timestamp + "." + payload, secret)
  You verify the HMAC before trusting the payload.
  Without this → anyone can POST fake events to your webhook URL.

Respond immediately (200 OK), process async:
  Webhook caller has a short timeout (~30s).
  If your processing takes longer → queue the job, return 200 right away.

Idempotency:
  Webhooks may be delivered more than once (retries on network failure).
  Store event IDs → skip if already processed.

Retry handling:
  If you return 5xx, the sender retries (usually with backoff).
  Design for this — don't double-charge, double-send emails, etc.
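
A sketch that combines the practices above in one handler: verify the signature, reject stale timestamps, skip duplicate event IDs, and queue the work instead of processing inline. The header format is modelled on the `t=...,v1=...` scheme shown above; the secret, the 300-second tolerance, and the in-memory `processed_ids` / `job_queue` are assumptions for illustration.

```python
import hashlib
import hmac
import json

SECRET = b"whsec_example"    # hypothetical shared secret
processed_ids = set()        # in production: a database table or cache
job_queue = []               # stand-in for a real background queue


def sign(timestamp: int, body: bytes, secret: bytes) -> str:
    signed = str(timestamp).encode() + b"." + body
    return hmac.new(secret, signed, hashlib.sha256).hexdigest()


def handle_webhook(header: str, body: bytes, now: int, tolerance: int = 300) -> int:
    """Return the HTTP status code to send back to the webhook caller."""
    try:
        fields = dict(part.split("=", 1) for part in header.split(","))
        timestamp = int(fields["t"])
        received = fields["v1"]
    except (KeyError, ValueError):
        return 400                                   # malformed signature header
    if abs(now - timestamp) > tolerance:
        return 400                                   # too old: possible replay
    expected = sign(timestamp, body, SECRET)
    if not hmac.compare_digest(expected.encode(), received.encode()):
        return 400                                   # forged or corrupted payload
    event = json.loads(body)
    if event["id"] not in processed_ids:             # idempotency check
        processed_ids.add(event["id"])
        job_queue.append(event)                      # process later, off the request path
    return 200                                       # acknowledge quickly
```

A retried delivery of the same event still gets `200`, so the sender stops retrying, but the job is only queued once.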

Redirect vs Webhook — Side by Side

┌─────────────────┬────────────────────────────┬────────────────────────────┐
│                 │ Redirect                   │ Webhook                    │
├─────────────────┼────────────────────────────┼────────────────────────────┤
│ Who initiates   │ Client follows the redirect│ External server calls you  │
│ Direction       │ Client → Server            │ Server → Your server       │
│ Trigger         │ HTTP response code         │ Event on the other system  │
│ Real-time       │ Synchronous                │ Async / event-driven       │
│ Common use      │ URL changes, OAuth flows   │ Payment events, CI/CD      │
│                 │ www → non-www              │ triggers, notifications    │
└─────────────────┴────────────────────────────┴────────────────────────────┘