For years the answer to “can I run pandas in the browser?” was “no, spin up a Python backend.” That answer is now wrong. Pyodide is CPython compiled to WebAssembly, packaged with much of the scientific Python stack. You can run real pandas code, with the full API, inside a browser tab.
On the JavaScript side, arquero brings a pandas-inspired table API to the browser with a 105 KB footprint. No WASM, no compilation — pure JavaScript optimised for columnar data.
This post runs three real benchmarks — groupby aggregation, filter + derive, and pivot table — in both environments, in your browser, right now.
Try it first
Pick a benchmark, click ▶ Run JS (instant), then ▶ Run Python (first run takes 25–30 seconds to bootstrap the WebAssembly runtime and install pandas — a tradeoff we’ll discuss at length).
Dataset: 100,000 rows. Sum revenue and units and average revenue per category, then sort by total revenue descending.
// arquero — pandas-inspired table API for JavaScript
const t0 = performance.now();
const result = DATA // DATA injected by runtime
  .groupby("category")
  .rollup({
    total_revenue: aq.op.sum("revenue"),
    total_units: aq.op.sum("units"),
    avg_revenue: aq.op.mean("revenue"),
  })
  .orderby(aq.desc("total_revenue"));
const ms = performance.now() - t0;
return { table: result, ms };

import pandas as pd, time, json
df = pd.DataFrame(DATA) # DATA injected by runtime
t0 = time.perf_counter()
result = (
    df.groupby("category")
    .agg(total_revenue=("revenue", "sum"),
         total_units=("units", "sum"),
         avg_revenue=("revenue", "mean"))
    .sort_values("total_revenue", ascending=False)
    .reset_index()
)
ms = (time.perf_counter() - t0) * 1000
print(result.to_string(index=False))
print(f"\n{len(result)} rows · {ms:.2f} ms")
print(f"__PYMS__{ms:.4f}__PYMS__")

What’s actually happening
JavaScript side — arquero
Arquero loads as part of the page bundle (~105 KB). When you click Run JS:
- A 100,000-row dataset is generated deterministically in memory
- The arquero ColumnTable is constructed (columnar layout, typed arrays)
- The operation runs synchronously on the main thread
- Results appear in milliseconds
The whole thing fits in a blog page without any loading state because there’s nothing to load.
Python side — Pyodide + pandas
Pyodide works differently. When you click Run Python for the first time:
- Browser downloads the Pyodide runtime (~6 MB)
- Browser downloads pandas + deps (~20 MB)
- CPython is initialised in WASM
- pandas is imported
- Your code runs

This is a 25–30 second wall-clock delay on first run. After that, Pyodide and pandas are cached in the browser’s HTTP cache and subsequent runs are fast. The execution time for the pandas operation itself, once startup is done, is comparable to native Python (Pyodide runs CPython in WebAssembly at roughly 50–70% of native speed for CPU-bound operations, depending on the workload).
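To get a feel for where the cold-start time goes, here is a back-of-the-envelope estimate. The download sizes are the ones quoted above; the bandwidth and the initialisation overhead are hypothetical placeholders, not measurements from this demo.

```python
def cold_start_seconds(download_mb: float,
                       bandwidth_mbit_s: float,
                       init_seconds: float = 0.0) -> float:
    """Rough first-run delay: transfer time plus a fixed init overhead."""
    if bandwidth_mbit_s <= 0:
        raise ValueError("bandwidth must be positive")
    # 1 MB = 8 Mbit, so transfer time is size * 8 / bandwidth.
    return download_mb * 8 / bandwidth_mbit_s + init_seconds


PYODIDE_MB = 6   # runtime size quoted in the post
PANDAS_MB = 20   # pandas + deps size quoted in the post

# Assumed values: a 10 Mbit/s connection and 5 s for CPython init + import.
estimate = cold_start_seconds(PYODIDE_MB + PANDAS_MB,
                              bandwidth_mbit_s=10,
                              init_seconds=5)
print(f"estimated cold start: {estimate:.1f} s")
```

Under these assumptions the download dominates, which is why caching makes later runs so much faster.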
Architecture of the comparison
Your browser
─────────────────────────────────────────────────────────────────
Main thread
│
├─ JavaScript (arquero)
│    • synchronous
│    • columnar typed arrays, vectorised ops
│    • result in < 50 ms for 100k rows
│
└─ Python (Pyodide)
     • loaded asynchronously; execution itself still blocks the main thread
     • CPython compiled to WASM
     • pandas API: identical to server-side code
     • first run: ~30 s (runtime download + pandas install)
     • warm runs: comparable execution speed to native
Zero servers. Zero APIs. Both runtimes run entirely in-process.

Benchmark design
All three benchmarks use a 100,000-row synthetic sales dataset with columns: id, category, region, revenue, units, month. The dataset is generated deterministically (fixed seed) so results are reproducible across languages.
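One way to get identical data in both languages is a tiny integer-only generator that can be ported line for line. The sketch below is an assumption about how such a dataset could be built, not the demo's actual generator: the category and region labels, the value ranges, and the helper names (`lcg`, `rand_below`, `make_dataset`) are all hypothetical.

```python
from typing import Dict, Iterator, List

# Hypothetical label sets; the post does not list the actual values.
CATEGORIES = ["Electronics", "Clothing", "Food", "Books", "Toys"]
REGIONS = ["North", "South", "East", "West"]


def lcg(seed: int) -> Iterator[int]:
    """Yield 32-bit pseudo-random integers from a linear congruential generator."""
    state = seed % 2**32
    while True:
        # Numerical Recipes constants. The product stays below 2**53,
        # so a JavaScript port can use plain Number arithmetic.
        state = (1664525 * state + 1013904223) % 2**32
        yield state


def rand_below(rng: Iterator[int], n: int) -> int:
    """Integer in [0, n), taken from the high 16 bits (requires n <= 65536)."""
    return (next(rng) >> 16) % n


def make_dataset(n_rows: int = 100_000, seed: int = 42) -> Dict[str, List]:
    """Build a column-oriented sales dataset; same seed, same rows."""
    rng = lcg(seed)
    columns = ("id", "category", "region", "revenue", "units", "month")
    data: Dict[str, List] = {name: [] for name in columns}
    for i in range(n_rows):
        data["id"].append(i + 1)
        data["category"].append(CATEGORIES[rand_below(rng, len(CATEGORIES))])
        data["region"].append(REGIONS[rand_below(rng, len(REGIONS))])
        data["revenue"].append(float(10 + rand_below(rng, 10_000)))
        data["units"].append(1 + rand_below(rng, 50))
        data["month"].append(1 + rand_below(rng, 12))
    return data
```

The dict-of-lists result can be passed straight to `pd.DataFrame(...)`. The high bits are used because the low bits of a power-of-two LCG cycle with very short periods.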
Benchmark 1: GroupBy + Aggregation
Sum revenue, sum units, mean revenue — grouped by category, sorted descending.
This is the most common data operation in any analytical pipeline. It tests the core columnar engine of both libraries.
Arquero stores each column as an array, which can be a typed array (e.g. Float64Array for revenue). GroupBy creates a hash table over the group keys and reduces over those arrays (the JS engine may auto-vectorize hot loops, but arquero itself doesn’t use explicit SIMD). Pandas does the same, implemented in C via numpy/Cython.
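The hash-table-plus-reduction idea can be sketched in a few lines of plain Python. This is an illustration of the general technique only, not the actual code inside arquero or pandas; the function name `groupby_sum_mean` and the toy data are made up for the example.

```python
from array import array
from typing import Dict, List, Sequence, Tuple


def groupby_sum_mean(keys: Sequence[str],
                     values: Sequence[float]) -> List[Tuple[str, float, float]]:
    """Return (key, sum, mean) per group, sorted by sum descending."""
    slot_of: Dict[str, int] = {}   # hash table: group key -> accumulator slot
    sums = array("d")              # typed accumulators, one entry per group
    counts = array("q")
    for key, value in zip(keys, values):
        slot = slot_of.get(key)
        if slot is None:           # first time this key is seen
            slot = len(sums)
            slot_of[key] = slot
            sums.append(0.0)
            counts.append(0)
        sums[slot] += value
        counts[slot] += 1
    rows = [(key, sums[s], sums[s] / counts[s]) for key, s in slot_of.items()]
    rows.sort(key=lambda row: row[1], reverse=True)
    return rows


category = ["A", "B", "A", "C", "B", "A"]
revenue = array("d", [10.0, 20.0, 30.0, 5.0, 45.0, 20.0])
result = groupby_sum_mean(category, revenue)
for key, total, mean in result:
    print(key, total, mean)
```

A single pass over the key column assigns each row to a slot, so the cost is linear in the number of rows regardless of how many groups there are.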
Benchmark 2: Filter + Derive + Top-N
Filter rows where revenue > 5000, compute a derived margin column (revenue / units), return top 10 by revenue.
This tests predicate evaluation, column derivation, and sorting — a common ETL pattern. Both libraries evaluate the filter predicate over the columnar representation without creating intermediate row objects.
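On the pandas side, the pattern might look like the sketch below. It is an assumption about how the benchmark could be written, not the demo's exact code: the function name `top_by_revenue`, the use of `nlargest`, and the five-row sample frame are illustrative.

```python
import pandas as pd


def top_by_revenue(df: pd.DataFrame,
                   threshold: float = 5000,
                   n: int = 10) -> pd.DataFrame:
    """Filter on revenue, derive margin = revenue / units, keep the top n rows."""
    return (
        df[df["revenue"] > threshold]                          # boolean-mask filter
        .assign(margin=lambda d: d["revenue"] / d["units"])    # derived column
        .nlargest(n, "revenue")                                # top-N, sorted descending
        .reset_index(drop=True)
    )


# Tiny stand-in for the 100,000-row dataset.
sample = pd.DataFrame({
    "id": [1, 2, 3, 4, 5],
    "revenue": [7200.0, 4800.0, 9100.0, 5600.0, 300.0],
    "units": [12, 8, 7, 20, 3],
})
top = top_by_revenue(sample)
print(top.to_string(index=False))
```

Filtering before deriving the column means the division only runs over rows that survive the predicate.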
Benchmark 3: Pivot Table
Aggregate total revenue grouped by region × category, then pivot categories into columns.
Pandas has a dedicated pd.pivot_table() function for this. The arquero side of this benchmark constructs the pivot manually via groupby + join. Arquero does ship a pivot() verb, so the manual version is more verbose than strictly necessary, but the contrast is still instructive: JavaScript dataframe libraries are generally less feature-complete than pandas for ad-hoc analysis.
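For reference, a region × category pivot in pandas can be as short as the sketch below. The column names come from the dataset description; the sample rows and the `fill_value=0` choice are assumptions made for illustration.

```python
import pandas as pd

# Tiny stand-in for the 100,000-row dataset.
sales = pd.DataFrame({
    "region": ["North", "North", "South", "South", "North"],
    "category": ["Books", "Toys", "Books", "Books", "Books"],
    "revenue": [100.0, 250.0, 40.0, 60.0, 10.0],
})

pivot = pd.pivot_table(
    sales,
    values="revenue",     # what to aggregate
    index="region",       # one row per region
    columns="category",   # one column per category
    aggfunc="sum",
    fill_value=0,         # region/category pairs with no rows become 0
)
print(pivot)
```

Each cell holds total revenue for one region and category pair; combinations that never occur in the data are filled with zero instead of NaN.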
What the numbers tell you
After running all three benchmarks you’ll see a comparison table. Some observations:
Execution speed is comparable on warm runs. After Pyodide is loaded, the pandas code typically runs within 2–3x of the arquero time for these operations. NumPy’s C-level loops are fast; Pyodide’s overhead is mostly at import time, not runtime.
Cold start is the decisive difference. The 25–30 second bootstrap makes Pyodide unsuitable for any interactive experience where users haven’t explicitly opted into a Python environment. Arquero is instant — it’s just JavaScript, already parsed and optimised.
Bundle size matters for page weight. An arquero-powered blog component adds ~105 KB to the page. Pyodide adds ~26 MB (the runtime plus pandas and its dependencies), fetched on first use and cached afterwards. For a blog post this is fine with a loading indicator; for a product page it would need a lazy-load gate.
Pandas is more expressive. The pivot benchmark illustrates this — pandas’ pd.pivot_table() in 8 lines vs. arquero’s manual groupby + loop-join. For exploratory analysis or data science work, the pandas API is genuinely richer.
When to choose each
| Scenario | Use |
|---|---|
| Data science notebook / interactive analysis | Python (pandas) — richer API, familiar syntax |
| Blog demo / interactive visualization | JavaScript (arquero) — instant load, no waiting |
| Users expect Python output format | Pyodide — pandas output is identical to server-side |
| Performance-critical production pipeline | Neither — run server-side pandas or Polars |
| Client-side ETL with zero backend | Arquero for JS-native apps, Pyodide if you need pandas compat |
| Teaching Python data science interactively | Pyodide — learners write real pandas code |
The WebAssembly angle
This comparison is only possible because of two converging trends:
- WebAssembly lets native code (CPython) run in the browser at near-native speed
- Columnar JS data structures (typed arrays, Arrow) bring database-grade performance to JavaScript without WASM
Five years ago “run pandas in the browser” was a hack involving Skulpt (a Python interpreter written in JS) that couldn’t handle numpy. Today Pyodide passes the numpy test suite and supports much of the scientific Python stack (NumPy, pandas, SciPy, Matplotlib, scikit-learn). The tradeoff is binary size, not capability.
Limitations of this demo
- Main thread execution: The Python code runs on the main thread. Long operations will freeze the UI. Production Pyodide deployments should use Web Workers to keep the page responsive.
- Dataset size: 100k rows is a moderate test. Pyodide starts to show GC pressure at 1M+ rows; arquero handles tens of millions.
- No file I/O: Both Pyodide and arquero can load CSV/JSON/Parquet from the network, but this demo uses in-memory generated data for simplicity.
- CDN dependency: Pyodide is loaded from jsDelivr CDN. In production you’d self-host the WASM bundle.
Related posts
- Python Time Series at Scale — Lessons from Processing 400M Financial Records — server-side pandas at ECB scale: when in-browser is not enough
- Vectorization in Python — NumPy vs Pandas vs Polars vs Numba — deep dive into the vectorization mechanisms both libraries rely on
- Polars vs Pandas — A Benchmark That Changed How I Process Data — for large-scale server-side workloads, Polars often beats pandas significantly