← Selected Work

Student Analytics Dashboard

How I built live analyticsacross two branches without taking anyone's spreadsheet away.

Role
Sole developer — database, backend, frontend, auth, deployment
Timeline
~8 weeks, shipped Q2 2025
Stack
Python / Flask · MSAL + PyJWT · gspread · pandas · IIS + Waitress + PM2
01

The problem

A multi-branch education consultancy. Counsellors, admission officers, frontdesk staff — each of them kept their own Google Sheet, in whatever shape felt natural. One person called the column student name. Another used Name. A third had FULL NAME. Dates were a mess. Status values were a mess. Every sheet was its own universe.

Leadership wanted unified reporting. Conversion rates, counsellor performance, frontdesk KPIs — the usual. And they wanted it live, not on a Monday-morning compilation cycle.

The obvious answer was to migrate everyone onto a CRM with one clean schema. Management had tried. It failed — predictably. People had spent months building their sheets. They were fast on them. They had their own colour codes, their own notes columns, their own little habits nobody else understood. Asking them to abandon it all in favour of a standardised form felt, to them, like being asked to work with one hand tied behind their back.

So the sheets stayed. And the reporting stayed broken.

02

The insight

I sat with the counsellors for a week before writing a line of code. Watching what they did, not what they said they did. Two things stood out.

First, nobody was actually using their sheets as free-form documents. Every sheet had a fixed structure the person had settled on months earlier, and they weren't changing it. The chaos across sheets masked a lot of internal order inside each one.

Second, the formats didn't drift. Once someone had decided their column was called stat., it stayed stat. for good. Changing a working spreadsheet is expensive for the person doing the work, so they almost never do it.

“If the formats don't change, I don't need to standardise them upstream. I just need to map them once — in code — and read them forever.”

That was the pivot. Stop trying to change behaviour. Build around it.

03

The approach

A Python service that reads every counsellor's Google Sheet directly over the Sheets API, applies a per-sheet mapping (source column → standard field), joins everything into one unified pandas DataFrame server-side, and serves reports off that unified dataset. A dashboard on top, with Microsoft SSO so people log in with their existing work accounts.

Five counsellors, five formatsstudent namedobstatus_2024NamePhone #ApplicationFULL NAMECNTCTstat.applicantcoursestageA mapping layer per sheet"student name"full_name"Name"full_name"FULL NAME"full_name"applicant"full_name"status_2024"stage"Application"stage"stat."stage"stage"stageformats don't change — so the mapping stays validOne live dashboardLead conversionTotal1,284Converted312Rate24.3%Counsellor perf.Front-desk KPIlive
How messy spreadsheets become one live dataset — without taking anyone's spreadsheet away.

The mapping layer is the whole trick. Each user's sheet gets its own mapping config — a small YAML-like structure that says “this sheet's stat.column means stage, use these regex rules to normalise the values, treat blank cells as unassigned”. Writing the mapping takes fifteen minutes per new user. After that, their sheet is part of the unified dataset and they never know.

The dashboard runs on Flask, secured behind Azure AD / MSAL SSO with three-tier role-based access (admin / branch head / counsellor). A full-dataset caching layer sits between the Sheets API and the report queries — this matters because drill-downs on bar charts were hammering the Sheets API with redundant reads until I added it. Quota issues vanished. Response times dropped.

The whole thing runs on a self-hosted Windows Server 2019 stack — IIS reverse-proxy to a Waitress WSGI server, kept alive with PM2. Cloudflare Tunnel handles public access. No managed PaaS, no cold starts.

04

What it changed

Counsellors kept their spreadsheets. That was the non-negotiable and it stayed true. Nobody had to relearn a new system, nobody had to re-enter their pipeline into a new UI, nobody had their colour codes taken away.

Meanwhile, leadership got a single live dashboard — conversion by source, by branch, by counsellor. Frontdesk throughput. Admission stage distribution. All of it pulled from the same unified dataset, all of it refreshable at any moment.

The weekly reporting meeting stopped being a compilation meeting. It became an actual discussion meeting. That was the real win.

2
branches on live analytics
0
spreadsheets taken away
~15 min
to onboard a new sheet
05

Under the hood

Backend
Python 3.11, Flask, pandas for in-memory joins and aggregations, gspread for the Sheets API layer.
Authentication
Azure AD / MSAL with PyJWT token validation. Three-tier RBAC: admin gets everything, branch head sees their branch, counsellor sees their own pipeline.
Performance
Full-dataset caching layer invalidated on write. Sheets API quota usage dropped sharply once drill-downs stopped triggering redundant reads.
Deployment
Windows Server 2019, IIS reverse-proxy → Waitress WSGI, process-managed by PM2. Cloudflare Tunnel for public egress. Same server also hosts IR Connect and an existing production app — the migration away from Render + Netlify eliminated cold-start latency entirely.
What I'd do differently
Move the mapping configs from files into a small admin UI so branch heads could onboard new counsellors without needing me. That's the v2 I'm scoping now.