From Sidelined to Streamlined: Cracking the Code (Part 2)
In Part 1, I talked about the high-level "magic" happening on your wrist: how Apple uses neural networks and math to guess your VO2Max even when you’re just walking around the block. But if you're like me, "trusting papers" isn't enough. I wanted to get my hands on the raw numbers to see the "Prediction Delta" for myself.
The problem? When you hit "Export Health Data" on your iPhone, you don’t get a clean spreadsheet. You get a giant, messy folder full of scary XML files.
Here’s how I tackled the two biggest data hurdles in the export: the massive health records and the hidden workout GPS paths.
Streaming the "Heavyweight" (The CDA File)
Inside your export, there's a file called export_cda.xml. This is where the "clinical" stuff lives, for example the second-by-second Heart Rate and Blood Oxygen readings we will use to compute a VO2Max.
The catch? This file can easily be 300MB or more. If you try to open that in a normal text editor or a basic script, your computer will probably hang. To get around this, I wrote a "streaming" parser. Instead of loading the whole file at once, it reads it one tiny piece at a time, grabs what it needs, and then throws the rest away.
The "Secret Sauce" in the Code:
- LOINC Codes: I told the script to ignore everything except two specific ID numbers: 8867-4 for Heart Rate and 2710-2 for Blood Oxygen.
- Cleaning the Mess: Apple records Oxygen as a decimal (like 0.98), so I had the script multiply that by 100 to make it a readable percentage.
- Memory Management: By using elem.clear(), the script stays fast and light, no matter how many years of data you've collected.
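To make those three ideas concrete, here is a minimal sketch of a streaming parser built on Python's xml.etree.ElementTree.iterparse. It is not the exact script from the repo. The function name stream_observations, the output labels, and the assumed layout (each reading is an observation element holding a code element, a low timestamp, and a value element with a value attribute) are my own illustration, so check them against your own export before relying on it.

```python
import xml.etree.ElementTree as ET

# LOINC codes named in the post; the output labels are illustrative.
LOINC_NAMES = {"8867-4": "heart_rate", "2710-2": "blood_oxygen"}


def local(tag):
    """Drop the XML namespace, e.g. '{urn:hl7-org:v3}observation' -> 'observation'."""
    return tag.rsplit("}", 1)[-1]


def stream_observations(source):
    """Yield (metric, timestamp, reading) one observation at a time.

    `source` is a path or a binary file object. The element layout is an
    assumption about the CDA export, not something guaranteed by Apple.
    """
    for _event, elem in ET.iterparse(source, events=("end",)):
        if local(elem.tag) != "observation":
            continue
        code = timestamp = value = None
        for child in elem.iter():
            name = local(child.tag)
            if name == "code" and code is None:
                code = child.get("code")
            elif name == "low" and timestamp is None:
                timestamp = child.get("value")
            elif name == "value" and value is None:
                # Stays None for <value> elements that carry no attribute.
                value = child.get("value")
        if code in LOINC_NAMES and value is not None:
            reading = float(value)
            if code == "2710-2":
                reading = round(reading * 100, 2)  # 0.98 -> 98.0 percent
            yield LOINC_NAMES[code], timestamp, reading
        elem.clear()  # release this observation's children before moving on
```

Because it is a generator, you can loop over stream_observations("export_cda.xml") and write each row straight to a CSV without ever holding the full file in memory.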
Hunting for Your Routes (The GPX Files)
If you’ve ever looked at your workout map in the Fitness app and wondered where that data lives, it's in the workout-routes/ folder. Every time you track a walk or run, Apple generates a .gpx file.
These files are great because they contain the "Environmental Kinematics": the pace, distance, and topography that Apple uses to validate your fitness.
What my GPX Parser does:
It loops through every file in that folder and extracts more than just your GPS coordinates. It calculates:
- The "Verticality": It finds the highest and lowest points to see how much of a hill you actually climbed.
- The Pace: It pulls the speed data recorded directly by the watch sensors.
- The Duration: It calculates exactly how long you were moving by comparing the first and last timestamps.
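The three calculations above can be sketched roughly like this. Again, this is an illustration and not the repo's code. The names summarize_route and summarize_folder and the dictionary keys are made up for the example, and it assumes each trkpt track point carries an ele element, a time element in ISO 8601 format, and a speed element inside its extensions.

```python
import xml.etree.ElementTree as ET
from datetime import datetime
from pathlib import Path


def local(tag):
    """Drop the XML namespace prefix from a tag name."""
    return tag.rsplit("}", 1)[-1]


def summarize_route(source):
    """Summarize one GPX route given a path or a binary file object."""
    elevations, speeds, times = [], [], []
    for _event, elem in ET.iterparse(source, events=("end",)):
        if local(elem.tag) != "trkpt":
            continue
        for child in elem.iter():
            text = (child.text or "").strip()
            if not text:
                continue
            name = local(child.tag)
            if name == "ele":
                elevations.append(float(text))
            elif name == "speed":  # assumed to be meters per second
                speeds.append(float(text))
            elif name == "time":
                # Normalize a trailing 'Z' so older Python versions can parse it.
                times.append(datetime.fromisoformat(text.replace("Z", "+00:00")))
        elem.clear()
    if not times or not elevations:
        return None  # nothing usable in this file
    return {
        "points": len(times),
        "min_elevation_m": min(elevations),
        "max_elevation_m": max(elevations),
        "elevation_range_m": max(elevations) - min(elevations),
        "avg_speed_mps": sum(speeds) / len(speeds) if speeds else None,
        "duration_s": (times[-1] - times[0]).total_seconds(),
    }


def summarize_folder(folder):
    """Run summarize_route over every .gpx file in a folder."""
    return {
        path.name: summarize_route(str(path))
        for path in sorted(Path(folder).glob("*.gpx"))
    }
```

Calling summarize_folder("workout-routes") would give one summary per workout, ready to be written out as rows in a CSV. Note that the duration here is simply last timestamp minus first, so any pauses in the middle of a walk are counted as moving time.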
Why do we need this data?
When these two scripts finish running, that mountain of XML files is transformed into a few clean CSV files. Now, instead of staring at raw code, I have a clear spreadsheet of my rehab walks.
I can see exactly how my heart rate reacted to that one steep hill, and more importantly, I have the "Data Diet" ready to feed into my own model.
Grab the code here.
Coming up in Part 3: We finally get to the fun part. I’m taking these CSVs into a Jupyter Notebook to build my own Multi-Layer Perceptron (MLP). We’re going to see if my DIY "Watch Brain" comes up with the same VO2Max scores as Apple's. Stay tuned!
First published 4/12/26 on blog.farzon.org



