TensorFlow has two first-party runtimes: the Python library you know, and TensorFlow.js — a JavaScript implementation maintained by the TensorFlow team, designed to run in the browser using WebGL or WASM backends. It’s not a stripped-down inference shim: it supports training, inference, pre-trained model loading, and transfer learning.
On the Python side, TensorFlow itself does not run in Pyodide — its complex native C++/CUDA extensions cannot be compiled to WebAssembly, and the binary dependencies are too large for browser delivery. But scikit-learn does run in Pyodide, giving you KNN, Decision Trees, Logistic Regression, Random Forests, and more — all in-browser via WASM.
These two tools represent genuinely different ML philosophies. Training them on the same problem in the same browser is an unusually direct way to see the tradeoffs.
Try it — train both models now
The Iris dataset: 150 samples, 4 features (sepal/petal length and width), 3 classes (Setosa, Versicolor, Virginica). 80/20 train/test split, standardised features. The task is identical for both models.
```js
// TensorFlow.js — MLP classifier
const model = tf.sequential({
  layers: [
    tf.layers.dense({ inputShape: [4], units: 16, activation: "relu" }),
    tf.layers.dense({ units: 16, activation: "relu" }),
    tf.layers.dense({ units: 3, activation: "softmax" }),
  ],
});
model.compile({
  optimizer: tf.train.adam(0.01),
  loss: "categoricalCrossentropy",
  metrics: ["accuracy"],
});
await model.fit(X_train, y_train, {
  epochs: 50,
  batchSize: 16,
  validationSplit: 0.1,
});
```
```python
# scikit-learn — KNN classifier
# X_TRAIN, Y_TRAIN, X_TEST, Y_TEST are injected by the sandbox.
from sklearn.neighbors import KNeighborsClassifier
from sklearn.metrics import accuracy_score, confusion_matrix
import numpy as np, json, time

X_train = np.array(X_TRAIN)
y_train = np.array(Y_TRAIN)
X_test = np.array(X_TEST)
y_test = np.array(Y_TEST)

t0 = time.perf_counter()
model = KNeighborsClassifier(n_neighbors=5)
model.fit(X_train, y_train)      # KNN "fit" just stores the training data
y_pred = model.predict(X_test)
ms = (time.perf_counter() - t0) * 1000

acc = accuracy_score(y_test, y_pred)
cm = confusion_matrix(y_test, y_pred, labels=[0, 1, 2]).tolist()
print(json.dumps({"acc": acc, "ms": ms, "cm": cm,
                  "info": "KNN k=5, train=120, test=30"}))
```
What you just ran
TensorFlow.js: Multi-Layer Perceptron
TF.js trains a small neural network entirely in-browser, with the training loop in JavaScript and the numeric kernels dispatched to a WebGL (GPU-accelerated) or CPU backend depending on your device.
- Architecture: Dense(4→16, ReLU) → Dense(16→16, ReLU) → Dense(16→3, Softmax)
- Parameters: ~400
- Optimiser: Adam (lr=0.01)
- Loss: Categorical cross-entropy

All computation happens via tf.Tensor operations — typed arrays managed by TF.js’s memory system. The training loop, backpropagation, and weight updates run on WebGL shaders or WASM kernels depending on the available backend. This is the same graph execution model as Python TensorFlow, compiled to a different target.
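The “~400” figure is plain arithmetic over the three dense layers — each has in×out weights plus one bias per output unit:

```python
# Parameter count for Dense(4→16) → Dense(16→16) → Dense(16→3):
# each dense layer contributes n_in * n_out weights + n_out biases.
layers = [(4, 16), (16, 16), (16, 3)]
params = sum(n_in * n_out + n_out for n_in, n_out in layers)
print(params)  # 403
```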
scikit-learn via Pyodide: Classical classifiers
Three algorithms available:
K-Nearest Neighbours (k=5): No “training” in the traditional sense — the model memorises all training samples. At inference time it finds the 5 nearest neighbours by Euclidean distance in feature space and majority-votes their labels. Fitting is essentially free; queries average O(d·log n) with the KD-tree index sklearn selects for low-dimensional data, degrading to O(n·d) brute force in the worst case.
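A minimal numpy sketch of that vote — the brute-force version of the neighbour search; sklearn’s tree-indexed implementation gives the same answer, just faster at scale:

```python
import numpy as np

def knn_predict(X_train, y_train, x, k=5):
    """Classify one point by majority vote of its k nearest neighbours
    (Euclidean distance) — brute force, for illustration."""
    dists = np.linalg.norm(X_train - x, axis=1)       # distance to every sample
    nearest = np.argsort(dists)[:k]                   # indices of the k closest
    labels, counts = np.unique(y_train[nearest], return_counts=True)
    return labels[np.argmax(counts)]                  # majority label

# Toy data: two well-separated clusters.
X = np.array([[0.0, 0], [0.1, 0], [0.2, 0], [5.0, 5], [5.1, 5], [5.2, 5]])
y = np.array([0, 0, 0, 1, 1, 1])
print(knn_predict(X, y, np.array([0.05, 0.0]), k=3))  # 0
```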
Decision Tree (max_depth=4): Recursively partitions feature space by asking yes/no questions about single features (petal_width > 0.8?). The tree structure is fully inspectable — you can print the rules. Feature importances tell you which features drove the splits. On Iris, petal dimensions dominate.
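That inspectability is easy to demonstrate. A sketch using sklearn’s `export_text` — the exact thresholds and importances depend on the library version and on the `max_depth=4` / `random_state=0` settings assumed here:

```python
from sklearn.datasets import load_iris
from sklearn.tree import DecisionTreeClassifier, export_text

iris = load_iris()
tree = DecisionTreeClassifier(max_depth=4, random_state=0)
tree.fit(iris.data, iris.target)

# The full rule set, human-readable:
print(export_text(tree, feature_names=list(iris.feature_names)))

# Which features drove the splits (petal dimensions dominate on Iris):
for name, imp in zip(iris.feature_names, tree.feature_importances_):
    print(f"{name}: {imp:.2f}")
```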
Logistic Regression: A linear model — it finds a hyperplane in 4D feature space that separates the classes. One-vs-rest for multiclass. Fast, stable, coefficients are directly interpretable as log-odds weights.
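The “log-odds weights” claim is just this arithmetic — the coefficient value below is hypothetical, for illustration:

```python
import math

# In logistic regression, log-odds = w·x + b, so a coefficient w_j means:
# one unit more of feature j multiplies the odds by e^{w_j}.
w_petal_width = 1.2   # hypothetical coefficient

odds_multiplier = math.exp(w_petal_width)
print(f"odds x{odds_multiplier:.2f} per unit of petal width")  # odds x3.32

# Converting a linear score to a probability with the sigmoid:
def sigmoid(z):
    return 1 / (1 + math.exp(-z))

print(sigmoid(0.0))   # 0.5 — exactly on the decision boundary
```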
All three train in milliseconds once scikit-learn is loaded. The execution time shown in the sandbox is Python-internal — measured after imports complete, so it reflects pure algorithm performance.
Why these aren’t really competitors
The sandbox shows them on the same task, but TF.js and scikit-learn operate in different regimes:
- Classical ML (scikit-learn): tabular/structured data, small-to-medium and labelled. Interpretable (tree rules, coefficients) and fast to train.
- Deep learning (TF.js): images, audio, sequences, text — plus tabular data once you pass roughly 10k rows and ~100 features.

On Iris (150 rows, 4 features), classical ML wins or ties on accuracy and is faster, more interpretable, and requires no hyperparameter search for the architecture. A decision tree achieves ~97% accuracy with fully readable split rules. A neural network can match this but needs enough epochs, the right architecture, and gives you nothing to inspect.
Where the equation flips: image classification, text processing, audio, or any domain where you can’t hand-engineer features and have thousands of training samples. On those problems, TF.js’s WebGL backend becomes valuable — GPU-accelerated matrix multiplication for convolutional layers.
The interpretability gap
Run the Decision Tree in the sandbox and look at the feature importances. You’ll see something like:
- petal length: 0.44
- petal width: 0.42
- sepal length: 0.10
- sepal width: 0.04

This is actionable. A botanist can verify it makes biological sense. A regulator can audit it. If the model makes a wrong prediction, you can trace exactly which split went wrong.
The TF.js neural network gives you nothing equivalent. It achieves similar accuracy through a composition of matrix multiplications and nonlinearities that has no human-readable interpretation. This is the fundamental tradeoff in ML: representational power vs. explainability.
For production deployments in regulated industries (finance, healthcare, insurance), this matters enormously. The EU AI Act requires “meaningful explanation” for high-risk automated decisions. Decision trees and logistic regression satisfy this. Neural networks generally don’t without additional tooling (SHAP, LIME, attention maps).
Bundle size and startup
|  | TF.js (neural net) | Pyodide + sklearn |
| --- | --- | --- |
| npm/CDN size | ~200 KB gzipped | ~23 MB (Pyodide + sklearn) |
| Cold start | < 500 ms | 15–25 s first run |
| Warm runs | instant | instant |
| GPU acceleration | Yes (WebGL backend) | No |
| Training in-browser | Yes | Yes (CPU) |

Pyodide’s cold-start delay is the dominant UX consideration. Both runtimes cache in the browser after first load. For a blog post or educational tool with a loading indicator, the 20-second wait is acceptable. For a product feature, TF.js’s sub-second startup is essential.
TF.js beyond tiny datasets
The Iris demo understates what TF.js is actually for. The compelling cases are:
Real-time inference on device: Pose estimation, object detection, face landmark detection — all running at 30fps in a browser tab via pre-trained models from the TF.js model hub. No round trip to a server, no network latency, no data leaving the device.
Transfer learning: Load a MobileNet backbone (pre-trained on ImageNet), freeze the base, add a classification head, train the head on your 200-image custom dataset in-browser. This is 10 lines of TF.js code.
Privacy-preserving ML: Medical image analysis, personal document processing — when you cannot send user data to a server, running inference in the browser is the only viable architecture.
scikit-learn via Pyodide covers different ground: exploratory data analysis, classical statistical models, pipelines built by data scientists who think in Python and need to run those pipelines without a backend for demos or educational content.
What’s not feasible in-browser (yet)
- TensorFlow Python: Too large (~500 MB with dependencies), and its native C++/CUDA stack cannot target WASM
- PyTorch: Same constraints — no browser WASM build from the PyTorch team
- Training large models: TF.js can train small models. Training a ResNet-50 from scratch in a browser tab is not practical
- Distributed training: WebAssembly is single-process; no multi-GPU, no multi-node
The browser ML ecosystem is genuinely useful for inference, small training jobs, and classical ML via Pyodide. It is not a replacement for GPU servers running model training.
Related posts
- Python vs JavaScript DataFrames in the Browser — Live Benchmarks with No Backend — the same Pyodide approach applied to data processing with pandas vs arquero
- LLM API Integration Patterns — Structured Outputs, Function Calling, Streaming — when in-browser ML isn’t enough and you need a hosted model via API
- Multi-Agent Workflows with Claude API — Architecture Patterns That Work — orchestrating ML inference as part of larger AI pipelines