TensorFlow has two first-party runtimes: the Python library you know, and TensorFlow.js — a JavaScript implementation maintained by the TensorFlow team, designed to run in the browser using WebGL or WASM backends. It’s not a stripped-down inference shim: it supports training, inference, pre-trained model loading, and transfer learning.
On the Python side, TensorFlow itself does not run in Pyodide — its complex native C++/CUDA extensions cannot be compiled to WebAssembly, and the binary dependencies are too large for browser delivery. But scikit-learn does run in Pyodide, giving you KNN, Decision Trees, Logistic Regression, Random Forests, and more — all in-browser via WASM.
These two tools represent genuinely different ML philosophies. Training them on the same problem in the same browser is an unusually direct way to see the tradeoffs.
Try it — train both models now
The Iris dataset: 150 samples, 4 features (sepal/petal length and width), 3 classes (Setosa, Versicolor, Virginica). 80/20 train/test split, standardised features. The task is identical for both models.
```js
// TensorFlow.js — MLP classifier
const model = tf.sequential({
  layers: [
    tf.layers.dense({ inputShape: [4], units: 16, activation: "relu" }),
    tf.layers.dense({ units: 16, activation: "relu" }),
    tf.layers.dense({ units: 3, activation: "softmax" }),
  ],
});
model.compile({
  optimizer: tf.train.adam(0.01),
  loss: "categoricalCrossentropy",
  metrics: ["accuracy"],
});
await model.fit(X_train, y_train, {
  epochs: 50,
  batchSize: 16,
  validationSplit: 0.1,
});
```
```python
# scikit-learn — KNN classifier
# X_TRAIN, Y_TRAIN, X_TEST, Y_TEST are injected by the sandbox.
from sklearn.neighbors import KNeighborsClassifier
from sklearn.metrics import accuracy_score, confusion_matrix
import numpy as np, json, time

X_train = np.array(X_TRAIN)
y_train = np.array(Y_TRAIN)
X_test = np.array(X_TEST)
y_test = np.array(Y_TEST)

t0 = time.perf_counter()
model = KNeighborsClassifier(n_neighbors=5)
model.fit(X_train, y_train)      # KNN "fit" just stores the training data
y_pred = model.predict(X_test)
ms = (time.perf_counter() - t0) * 1000

acc = accuracy_score(y_test, y_pred)
cm = confusion_matrix(y_test, y_pred, labels=[0, 1, 2]).tolist()
print(json.dumps({"acc": acc, "ms": ms, "cm": cm,
                  "info": "KNN k=5, train=120, test=30"}))
```
What you just ran
TensorFlow.js: Multi-Layer Perceptron
TF.js trains a small neural network entirely in-browser, with the training loop in JavaScript and the numeric kernels dispatched to a WebGL (GPU-accelerated) or CPU backend depending on your device.
- Architecture: Dense(4→16, ReLU) → Dense(16→16, ReLU) → Dense(16→3, Softmax)
- Parameters: ~400
- Optimiser: Adam (lr=0.01)
- Loss: Categorical cross-entropy

All computation happens via tf.Tensor operations — typed arrays managed by TF.js’s memory system. The training loop, backpropagation, and weight updates run on WebGL shaders or WASM kernels depending on the available backend. This is the same graph execution model as Python TensorFlow, compiled to a different target.
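The “~400” figure is plain arithmetic over the three dense layers — each has in×out weights plus one bias per output unit:

```python
# Parameter count for Dense(4→16) → Dense(16→16) → Dense(16→3):
# each dense layer contributes n_in * n_out weights + n_out biases.
layers = [(4, 16), (16, 16), (16, 3)]
params = sum(n_in * n_out + n_out for n_in, n_out in layers)
print(params)  # 403
```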
scikit-learn via Pyodide: Classical classifiers
Three algorithms available:
K-Nearest Neighbours (k=5): No “training” in the traditional sense — the model memorises all training samples. At inference time it finds the 5 nearest neighbours by Euclidean distance in feature space and majority-votes their labels. Fitting is essentially free; queries average O(d·log n) with the KD-tree index sklearn selects for low-dimensional data, degrading to O(n·d) brute force in the worst case.
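A minimal numpy sketch of that vote — the brute-force version of the neighbour search; sklearn’s tree-indexed implementation gives the same answer, just faster at scale:

```python
import numpy as np

def knn_predict(X_train, y_train, x, k=5):
    """Classify one point by majority vote of its k nearest neighbours
    (Euclidean distance) — brute force, for illustration."""
    dists = np.linalg.norm(X_train - x, axis=1)       # distance to every sample
    nearest = np.argsort(dists)[:k]                   # indices of the k closest
    labels, counts = np.unique(y_train[nearest], return_counts=True)
    return labels[np.argmax(counts)]                  # majority label

# Toy data: two well-separated clusters.
X = np.array([[0.0, 0], [0.1, 0], [0.2, 0], [5.0, 5], [5.1, 5], [5.2, 5]])
y = np.array([0, 0, 0, 1, 1, 1])
print(knn_predict(X, y, np.array([0.05, 0.0]), k=3))  # 0
```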
Decision Tree (max_depth=4): Recursively partitions feature space by asking yes/no questions about single features (petal_width > 0.8?). The tree structure is fully inspectable — you can print the rules. Feature importances tell you which features drove the splits. On Iris, petal dimensions dominate.
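That inspectability is easy to demonstrate. A sketch using sklearn’s `export_text` — the exact thresholds and importances depend on the library version and on the `max_depth=4` / `random_state=0` settings assumed here:

```python
from sklearn.datasets import load_iris
from sklearn.tree import DecisionTreeClassifier, export_text

iris = load_iris()
tree = DecisionTreeClassifier(max_depth=4, random_state=0)
tree.fit(iris.data, iris.target)

# The full rule set, human-readable:
print(export_text(tree, feature_names=list(iris.feature_names)))

# Which features drove the splits (petal dimensions dominate on Iris):
for name, imp in zip(iris.feature_names, tree.feature_importances_):
    print(f"{name}: {imp:.2f}")
```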
Logistic Regression: A linear model — it finds a hyperplane in 4D feature space that separates the classes. One-vs-rest for multiclass. Fast, stable, coefficients are directly interpretable as log-odds weights.
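The “log-odds weights” claim is just this arithmetic — the coefficient value below is hypothetical, for illustration:

```python
import math

# In logistic regression, log-odds = w·x + b, so a coefficient w_j means:
# one unit more of feature j multiplies the odds by e^{w_j}.
w_petal_width = 1.2   # hypothetical coefficient

odds_multiplier = math.exp(w_petal_width)
print(f"odds x{odds_multiplier:.2f} per unit of petal width")  # odds x3.32

# Converting a linear score to a probability with the sigmoid:
def sigmoid(z):
    return 1 / (1 + math.exp(-z))

print(sigmoid(0.0))   # 0.5 — exactly on the decision boundary
```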
All three train in milliseconds once scikit-learn is loaded. The execution time shown in the sandbox is Python-internal — measured after imports complete, so it reflects pure algorithm performance.
Why these aren’t really competitors
The sandbox shows them on the same task, but TF.js and scikit-learn operate in different regimes:
- Classical ML (scikit-learn): tabular/structured data, small-to-medium and labelled. Interpretable (tree rules, coefficients) and fast to train.
- Deep learning (TF.js): images, audio, sequences, text — plus tabular data once you pass roughly 10k rows and ~100 features.

On Iris (150 rows, 4 features), classical ML wins or ties on accuracy and is faster, more interpretable, and requires no hyperparameter search for the architecture. A decision tree achieves ~97% accuracy with fully readable split rules. A neural network can match this but needs enough epochs, the right architecture, and gives you nothing to inspect.
Where the equation flips: image classification, text processing, audio, or any domain where you can’t hand-engineer features and have thousands of training samples. On those problems, TF.js’s WebGL backend becomes valuable — GPU-accelerated matrix multiplication for convolutional layers.
The interpretability gap
Run the Decision Tree in the sandbox and look at the feature importances. You’ll see something like:
- petal length: 0.44
- petal width: 0.42
- sepal length: 0.10
- sepal width: 0.04

This is actionable. A botanist can verify it makes biological sense. A regulator can audit it. If the model makes a wrong prediction, you can trace exactly which split went wrong.
The TF.js neural network gives you nothing equivalent. It achieves similar accuracy through a composition of matrix multiplications and nonlinearities that has no human-readable interpretation. This is the fundamental tradeoff in ML: representational power vs. explainability.
For production deployments in regulated industries (finance, healthcare, insurance), this matters enormously. The EU AI Act requires “meaningful explanation” for high-risk automated decisions. Decision trees and logistic regression satisfy this. Neural networks generally don’t without additional tooling (SHAP, LIME, attention maps).
Bundle size and startup
|  | TF.js (neural net) | Pyodide + sklearn |
| --- | --- | --- |
| npm/CDN size | ~200 KB gzipped | ~23 MB (Pyodide + sklearn) |
| Cold start | < 500 ms | 15–25 s first run |
| Warm runs | instant | instant |
| GPU acceleration | Yes (WebGL backend) | No |
| Training in-browser | Yes | Yes (CPU) |

Pyodide’s cold-start delay is the dominant UX consideration. Both runtimes cache in the browser after first load. For a blog post or educational tool with a loading indicator, the 20-second wait is acceptable. For a product feature, TF.js’s sub-second startup is essential.
TF.js beyond tiny datasets
The Iris demo understates what TF.js is actually for. The compelling cases are:
Real-time inference on device: Pose estimation, object detection, face landmark detection — all running at 30fps in a browser tab via pre-trained models from the TF.js model hub. No round trip to a server, no network latency, no data leaving the device.
Transfer learning: Load a MobileNet backbone (pre-trained on ImageNet), freeze the base, add a classification head, train the head on your 200-image custom dataset in-browser. This is 10 lines of TF.js code.
Privacy-preserving ML: Medical image analysis, personal document processing — when you cannot send user data to a server, running inference in the browser is the only viable architecture.
scikit-learn via Pyodide covers different ground: exploratory data analysis, classical statistical models, pipelines built by data scientists who think in Python and need to run those pipelines without a backend for demos or educational content.
What’s not feasible in-browser (yet)
- TensorFlow Python: Too large (~500 MB with dependencies), and its native C++/CUDA stack cannot target WASM
- PyTorch: Same constraints — no browser WASM build from the PyTorch team
- Training large models: TF.js can train small models. Training a ResNet-50 from scratch in a browser tab is not practical
- Distributed training: WebAssembly is single-process; no multi-GPU, no multi-node
The browser ML ecosystem is genuinely useful for inference, small training jobs, and classical ML via Pyodide. It is not a replacement for GPU servers running model training.
Related posts
- Python vs JavaScript DataFrames in the Browser — Live Benchmarks with No Backend — the same Pyodide approach applied to data processing with pandas vs arquero
- LLM API Integration Patterns — Structured Outputs, Function Calling, Streaming — when in-browser ML isn’t enough and you need a hosted model via API
- Multi-Agent Workflows with Claude API — Architecture Patterns That Work — orchestrating ML inference as part of larger AI pipelines