Tag: python

All the articles with the tag "python".

Migrating Apache Airflow 2.x to 3.2 — A Real Project Walkthrough
Posted on: April 19, 2026 at 10:00 AM
Airflow 3.x is a genuine architectural upgrade, not a version bump. This post walks through migrating the ECB exchange-rate ETL from 2.9 to 3.2 — new services, JWT-signed inter-service auth, provider installation changes, and every gotcha surfaced while actually running it.
Apache Airflow ETL Demo — Scheduling Real Pipelines with PostgreSQL and No Abstractions
Posted on: April 19, 2026 at 10:00 AM
A practical Apache Airflow tutorial using Docker Compose and the TaskFlow API. Builds a complete ETL pipeline that fetches ECB exchange rates from a public API, transforms them, and loads them into PostgreSQL — without Astronomer or any managed wrapper.
Apache Superset — Visualizing Your Airflow + PostgreSQL Pipeline in a Live Dashboard
Posted on: April 19, 2026 at 10:00 AM
Apache Superset sits at the top of the data platform stack — Airflow loads, PostgreSQL stores, Superset visualizes. This post adds Superset to the Docker Compose from the Airflow ETL demo and builds exchange rate dashboards over the same PostgreSQL data warehouse.
Python Time Series at Scale — Lessons from Processing 400M Financial Records
Posted on: March 15, 2026 at 10:00 AM
Real-world lessons from building a time series pipeline that processes 400 million financial data points daily. Covers memory layout, chunked processing, dtype optimization, and the specific pandas/NumPy patterns that keep memory under control at scale.
TensorFlow.js vs scikit-learn in the Browser — Two Paradigms of Client-Side ML
Posted on: November 17, 2025 at 10:00 AM
TensorFlow.js brings neural networks to the browser natively. scikit-learn runs via Pyodide WebAssembly. Same Iris dataset, same task — totally different philosophies. Train both client-side with no backend and compare accuracy, interpretability, and the cold-start gap.
Python vs JavaScript DataFrames in the Browser — Live Benchmarks with No Backend
Posted on: October 27, 2025 at 10:00 AM
Both Python (pandas via Pyodide WebAssembly) and JavaScript (arquero) can process DataFrames entirely in the browser. This post runs the same groupby, filter, and pivot benchmarks in both — live, client-side, no server needed — and measures the real tradeoffs.
Kubernetes for Backend Engineers — Pods, Deployments, and Services Without the Jargon
Posted on: September 10, 2025 at 09:00 AM
Kubernetes looks intimidating until you understand the one mental model that explains everything: declare desired state, and the control plane reconciles reality toward it continuously. A practical guide for backend engineers who deploy APIs and don't want to think about servers.
Python AsyncIO vs Node.js Event Loop — The Differences That Bite You
Posted on: September 8, 2025 at 10:00 AM
Both Python asyncio and Node.js use a single-threaded event loop for concurrency. But the implementation differences are significant: how coroutines suspend, how blocking code behaves, how thread pools integrate, and how the GIL affects async Python code.
Vectorization in Python — NumPy vs Pandas vs Polars vs Numba
Posted on: April 14, 2025 at 10:00 AM
Systematic benchmarks comparing four vectorization approaches across different dataset sizes and operation types. When to use NumPy directly, when Polars wins, and when Numba's JIT compilation is the only answer.
Polars vs Pandas — A Benchmark That Changed How I Process Data
Posted on: October 14, 2024 at 10:00 AM
Comprehensive benchmarks comparing Polars and pandas across groupby, join, filter, and window operations on datasets from 1M to 100M rows. Polars wins by 5-20x in most scenarios — here's what that means for your data pipelines.
Data Science Fundamentals — Why Choosing the Right Average Matters More Than You Think
Posted on: July 23, 2024 at 10:00 AM
A companion to my technical article on measures of central tendency: arithmetic, geometric, and harmonic means, median, mode, and when each one is correct. Understanding which average to use — and which one to distrust — is the foundation of honest data analysis.
Pandas Performance — Stop Using .iterrows() (with Benchmarks)
Posted on: March 14, 2024 at 10:00 AM
Benchmarking five approaches to row-level operations in pandas — from the naive .iterrows() to fully vectorized NumPy operations — with real timing numbers. Shows 100-1000x speedups using vectorization and explains why Python's object model makes .iterrows() so slow.
