Python vs JavaScript DataFrames in the Browser — Live Benchmarks with No Backend

For years the answer to “can I run pandas in the browser?” was “no, spin up a Python backend.” That answer is now wrong. Pyodide is a build of CPython for WebAssembly that ships with much of the scientific Python stack, including NumPy and pandas. You can run real pandas code, with the full API, inside a browser tab.

On the JavaScript side, arquero brings a pandas-inspired table API to the browser with a 105 KB footprint. No WASM, no compilation — pure JavaScript optimised for columnar data.

This post runs three real benchmarks — groupby aggregation, filter + derive, and pivot table — in both environments, in your browser, right now.

Try it first

Pick a benchmark, click ▶ Run JS (instant), then ▶ Run Python (first run takes 25–30 seconds to bootstrap the WebAssembly runtime and install pandas — a tradeoff we’ll discuss at length).

Dataset: 100,000 rows. Sum revenue and units and average revenue per category, then sort by total revenue descending.

JavaScript — arquero

// arquero — pandas-inspired table API for JavaScript
const t0 = performance.now();

const result = DATA                        // DATA injected by runtime
  .groupby("category")
  .rollup({
    total_revenue: aq.op.sum("revenue"),
    total_units:   aq.op.sum("units"),
    avg_revenue:   aq.op.mean("revenue"),
  })
  .orderby(aq.desc("total_revenue"));

const ms = performance.now() - t0;
return { table: result, ms };

Python — pandas (Pyodide)

import time

import pandas as pd

df = pd.DataFrame(DATA)          # DATA injected by runtime
t0 = time.perf_counter()

result = (
    df.groupby("category")
      .agg(total_revenue=("revenue", "sum"),
           total_units=("units", "sum"),
           avg_revenue=("revenue", "mean"))
      .sort_values("total_revenue", ascending=False)
      .reset_index()
)

ms = (time.perf_counter() - t0) * 1000
print(result.to_string(index=False))
print(f"\n{len(result)} rows · {ms:.2f} ms")
print(f"__PYMS__{ms:.4f}__PYMS__")
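The final print wraps the elapsed time in a `__PYMS__` sentinel, presumably so that whatever captures stdout can recover the number without parsing the table above it. A minimal sketch of that extraction, written in Python for consistency with the benchmark code; `extract_ms` is a hypothetical helper name and the sample output is invented for illustration:

```python
import re
from typing import Optional

# Matches the sentinel line emitted by the benchmark: __PYMS__<float>__PYMS__
PYMS_PATTERN = re.compile(r"__PYMS__([0-9]+(?:\.[0-9]+)?)__PYMS__")

def extract_ms(stdout: str) -> Optional[float]:
    """Return the elapsed milliseconds embedded in captured stdout, or None."""
    match = PYMS_PATTERN.search(stdout)
    return float(match.group(1)) if match else None

# Illustrative captured output (values invented for this example).
sample = (
    "category  total_revenue\n"
    "       A        1234.50\n"
    "\n1 rows · 3.21 ms\n"
    "__PYMS__3.2100__PYMS__\n"
)
print(extract_ms(sample))
```

Using a sentinel keeps the human-readable table and the machine-readable timing in the same output stream.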

What’s actually happening

JavaScript side — arquero

Arquero loads as part of the page bundle (~105 KB). When you click Run JS:

A 100,000-row dataset is generated deterministically in memory
The arquero ColumnTable is constructed (columnar layout, typed arrays)
The operation runs synchronously on the main thread
Results appear in milliseconds

The whole thing fits in a blog page without any loading state because there’s nothing to load.

Python side — Pyodide + pandas

Pyodide works differently. When you click Run Python for the first time:

Browser downloads Pyodide runtime   ~6 MB
Browser downloads pandas + deps    ~20 MB
CPython initialised in WASM
pandas imported
Your code runs

This is a 25–30 second wall-clock delay on first run. After that, Pyodide and pandas are served from the browser’s HTTP cache and subsequent loads are much faster. Once startup is done, the pandas operation itself runs somewhat slower than under native Python: Pyodide runs CPython in WebAssembly at roughly 50–70% of native speed for CPU-bound operations, depending on the workload.
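To get a feel for how much of the first-run delay is network transfer, here is a back-of-the-envelope sketch using the sizes listed above. The connection speeds are hypothetical, and the calculation ignores latency, compression and CDN behaviour:

```python
def download_seconds(size_mb: float, mbit_per_s: float) -> float:
    """Transfer time for size_mb megabytes over a link of mbit_per_s megabits/s."""
    return size_mb * 8 / mbit_per_s

PYODIDE_MB = 6   # runtime size quoted above
PANDAS_MB = 20   # pandas + deps size quoted above

for link in (10, 50, 200):  # hypothetical connection speeds in Mbit/s
    secs = download_seconds(PYODIDE_MB + PANDAS_MB, link)
    print(f"{link:>4} Mbit/s -> {secs:5.1f} s of download")
```

Under these assumptions the download dominates on a slow link, while on a fast link most of the delay would have to come from the remaining steps in the list: initialising CPython and importing pandas.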

Architecture of the comparison

  Your browser
  ─────────────────────────────────────────────────────────────────
  Main thread
    │
    ├─ JavaScript (arquero)
    │    • synchronous
    │    • columnar typed arrays, vectorised ops
    │    • result in < 50 ms for 100k rows
    │
    └─ Python (Pyodide)
         • loaded asynchronously, but execution blocks the main thread in this demo
         • CPython compiled to WASM
         • pandas API: identical to server-side code
         • first-run: ~30s  (runtime download + pandas install)
         • warm runs: comparable execution speed to native

  Zero servers. Zero APIs. Both runtimes run entirely in-process.

Benchmark design

All three benchmarks use a 100,000-row synthetic sales dataset with columns: id, category, region, revenue, units, month. The dataset is generated deterministically (fixed seed) so results are reproducible across languages.
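One way to make generated data reproducible across languages is to avoid each language's built-in random module and hand-roll a small generator that behaves identically everywhere. The sketch below is an assumption about how that could be done, not the generator this post uses: `Lcg32`, `make_sales`, the category and region names, and the value ranges are all hypothetical.

```python
CATEGORIES = ["Electronics", "Clothing", "Food", "Books"]  # hypothetical values
REGIONS = ["North", "South", "East", "West"]               # hypothetical values

class Lcg32:
    """32-bit linear congruential generator (Numerical Recipes constants).

    The product 1664525 * state stays below 2**53, so the same arithmetic
    is exact with JavaScript numbers as well.
    """
    def __init__(self, seed: int) -> None:
        self.state = seed & 0xFFFFFFFF

    def next_float(self) -> float:
        self.state = (1664525 * self.state + 1013904223) & 0xFFFFFFFF
        return self.state / 2**32  # in [0, 1)

    def choice(self, items):
        return items[int(self.next_float() * len(items))]

def make_sales(n_rows: int, seed: int = 42) -> dict:
    """Build a columnar dataset (dict of lists) with the six columns above."""
    rng = Lcg32(seed)
    data = {"id": [], "category": [], "region": [],
            "revenue": [], "units": [], "month": []}
    for i in range(n_rows):
        data["id"].append(i)
        data["category"].append(rng.choice(CATEGORIES))
        data["region"].append(rng.choice(REGIONS))
        data["revenue"].append(int(rng.next_float() * 1_000_000) / 100)  # 0.00 to 9999.99
        data["units"].append(1 + int(rng.next_float() * 50))             # 1 to 50
        data["month"].append(1 + int(rng.next_float() * 12))             # 1 to 12
    return data

DATA = make_sales(100_000)
print(len(DATA["id"]), DATA["category"][:3], DATA["revenue"][:3])
```

A dict of equal-length lists like this can be passed straight to `pd.DataFrame(DATA)`.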

Benchmark 1: GroupBy + Aggregation

Sum revenue, sum units, mean revenue — grouped by category, sorted descending.

This is the most common data operation in any analytical pipeline. It tests the core columnar engine of both libraries.

Arquero internally represents each column as a typed array (e.g. Float64Array for revenue). GroupBy creates a hash table over the group keys and reduces over typed arrays (the JS engine may auto-vectorize hot loops, but arquero itself doesn’t use explicit SIMD). Pandas does the same, implemented in C via numpy/Cython.
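The hash-then-reduce idea can be sketched in a few lines of plain Python, using the standard library's `array` module as a stand-in for typed arrays. This illustrates the technique only; it is not arquero's or pandas' actual implementation, and `groupby_sum_mean` is a hypothetical name.

```python
from array import array

def groupby_sum_mean(keys, values):
    """Hash group-by: one pass over the column, one accumulator slot per key."""
    slot_of = {}            # group key -> index into the accumulator arrays
    sums = array("d")       # float64 running sums, one per group
    counts = array("q")     # int64 row counts, one per group
    for key, value in zip(keys, values):
        slot = slot_of.get(key)
        if slot is None:    # first time this key is seen: allocate a slot
            slot = slot_of[key] = len(sums)
            sums.append(0.0)
            counts.append(0)
        sums[slot] += value
        counts[slot] += 1
    return {k: (sums[s], sums[s] / counts[s]) for k, s in slot_of.items()}

category = ["A", "B", "A", "C", "B", "A"]
revenue = array("d", [10.0, 20.0, 30.0, 5.0, 30.0, 20.0])
result = groupby_sum_mean(category, revenue)

# Sort by total revenue, descending, as the benchmark does.
for key, (total, mean) in sorted(result.items(), key=lambda kv: kv[1][0], reverse=True):
    print(key, total, mean)
```

The interpreter overhead of this loop is exactly what pandas avoids by running the same pattern in compiled code.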

Benchmark 2: Filter + Derive + Top-N

Filter rows where revenue > 5000, compute a derived margin column (revenue / units), return top 10 by revenue.

This tests predicate evaluation, column derivation, and sorting — a common ETL pattern. Both libraries evaluate the filter predicate over the columnar representation without creating intermediate row objects.
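For reference, a pandas version of this pattern might look like the following. It is a sketch on a tiny invented frame rather than the demo's actual benchmark code, which is not reproduced in this post.

```python
import pandas as pd

# Tiny stand-in for the 100,000-row sales dataset (values are illustrative).
df = pd.DataFrame({
    "id": [1, 2, 3, 4, 5],
    "category": ["A", "B", "A", "C", "B"],
    "revenue": [6000.0, 4000.0, 9000.0, 5000.0, 7500.0],
    "units": [3, 2, 6, 5, 5],
})

top = (
    df[df["revenue"] > 5000]                                # boolean mask over one column
      .assign(margin=lambda d: d["revenue"] / d["units"])   # derived column
      .nlargest(10, "revenue")                              # top-N by revenue
      .reset_index(drop=True)
)
print(top)
```

Filtering before deriving means the division only runs on the rows that survive the predicate.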

Benchmark 3: Pivot Table

Aggregate total revenue grouped by region × category, then pivot categories into columns.

Pandas has a dedicated pd.pivot_table() function for this. Arquero has a pivot verb too (used together with groupby), though without pivot_table’s extras such as margins and fill values. The benchmark here builds the pivot manually via groupby + join, which is instructive: JavaScript dataframe libraries are less feature-complete than pandas for ad-hoc analysis.
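On the pandas side the whole operation fits in one call. The sketch below uses a small invented frame; the region and category values are illustrative, and this is not necessarily the exact call the demo runs.

```python
import pandas as pd

df = pd.DataFrame({
    "region":   ["North", "North", "South", "South", "North"],
    "category": ["A",     "B",     "A",     "A",     "A"],
    "revenue":  [100.0,   200.0,   50.0,    25.0,    10.0],
})

pivot = pd.pivot_table(
    df,
    index="region",       # one row per region
    columns="category",   # one column per category
    values="revenue",
    aggfunc="sum",
    fill_value=0,         # region/category pairs with no rows become 0
)
print(pivot)
```

The `fill_value` argument handles the empty South/B cell, which a manual groupby + join has to deal with explicitly.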

What the numbers tell you

After running all three benchmarks you’ll see a comparison table. Some observations:

Execution speed is comparable on warm runs. After Pyodide is loaded, the pandas code typically runs within 2–3x of the arquero time for these operations. NumPy’s C-level loops are fast; Pyodide’s overhead is mostly at import time, not runtime.
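Warm-run numbers are only meaningful if the first, cold execution is excluded and the measurement is repeated. A minimal timing harness along those lines, assuming nothing about how the demo itself measures; `time_warm` and `workload` are hypothetical names:

```python
import statistics
import time

def time_warm(fn, warmup=2, repeats=7):
    """Run fn a few times untimed, then return the median of timed runs in ms."""
    for _ in range(warmup):
        fn()
    samples = []
    for _ in range(repeats):
        t0 = time.perf_counter()
        fn()
        samples.append((time.perf_counter() - t0) * 1000)
    return statistics.median(samples)

def workload():
    # Placeholder for a benchmark body such as the groupby above.
    return sum(i * i for i in range(50_000))

print(f"median warm run: {time_warm(workload):.2f} ms")
```

The median is less sensitive than the mean to a single slow run caused by garbage collection or a busy tab.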

Cold start is the decisive difference. The 25–30 second bootstrap makes Pyodide unsuitable for any interactive experience where users haven’t explicitly opted into a Python environment. Arquero is instant — it’s just JavaScript, already parsed and optimised.

Bundle size matters for page weight. An arquero-powered blog component adds ~105 KB to the page. Adding Pyodide adds ~26 MB of WASM fetched on first use (cached after). For a blog post this is fine with a loading indicator; for a product page it would need a lazy-load gate.

Pandas is more expressive. The pivot benchmark illustrates this — pandas’ pd.pivot_table() in 8 lines vs. arquero’s manual groupby + loop-join. For exploratory analysis or data science work, the pandas API is genuinely richer.

When to choose each

Scenario	Use
Data science notebook / interactive analysis	Python (pandas) — richer API, familiar syntax
Blog demo / interactive visualization	JavaScript (arquero) — instant load, no waiting
Users expect Python output format	Pyodide — pandas output is identical to server-side
Performance-critical production pipeline	Neither — run server-side pandas or Polars
Client-side ETL with zero backend	Arquero for JS-native apps, Pyodide if you need pandas compat
Teaching Python data science interactively	Pyodide — learners write real pandas code

The WebAssembly angle

This comparison is only possible because of two converging trends:

WebAssembly lets native code (CPython) run in the browser at near-native speed
Columnar JS data structures (typed arrays, Arrow) bring database-grade performance to JavaScript without WASM

Five years ago “run pandas in the browser” was a hack involving Skulpt (a Python interpreter written in JS) that couldn’t handle numpy. Today Pyodide runs numpy, pandas and much of the rest of the scientific Python stack. The tradeoff is binary size, not capability.

Limitations of this demo

Main thread execution: The Python code runs on the main thread. Long operations will freeze the UI. Production Pyodide deployments should use Web Workers to keep the page responsive.
Dataset size: 100k rows is a moderate test. Pyodide starts to show GC pressure at 1M+ rows; arquero handles tens of millions.
No file I/O: Both Pyodide and arquero can load CSV/JSON/Parquet from the network, but this demo uses in-memory generated data for simplicity.
CDN dependency: Pyodide is loaded from jsDelivr CDN. In production you’d self-host the WASM bundle.
