Privacy-aware Bits for the Fit...
or is it all Bullshit?

18 Jul 2024 by pet

Fully concentrate on swimming without having to count the laps myself anymore. Wouldn’t that be nice? So I thought. This initial impulse led me to buy a smartwatch with the premise of not sharing any of my data with a company. Some research beforehand made me optimistic: it seemed that there is a class of devices (wristbands) for which this is possible. It is made possible by using the free and open source Gadgetbridge app instead of the original vendor app. Gadgetbridge can communicate with the smartwatch via Bluetooth, fetching and sending data back and forth the same way the vendor app does. Privacy is preserved by the design of the app: it has no way to access the internet. So, all data fetched from the watch will stay on the smartphone. Now you may wonder:

  • What are the downsides of this approach?
  • Why does this even work? Isn’t it a big harm for the companies trying to get my data?
  • How certain can I be that no third party gets my data?
  • Are you trying to make me buy consumer electronics with a good conscience?

Some of these questions also weigh heavily upon me. So I will go ahead and turn myself inside out, trying to tackle them the best I can.

Gadgetbridge – How it works

Gadgetbridge is an app that can operate several Bluetooth gadgets, such as smartwatches and Bluetooth speakers. To interact with a device via Bluetooth, Gadgetbridge needs to know how the transferred information is structured: which message leads to which result, and where to find the data we are looking for. This knowledge is acquired through so-called “reverse engineering”. The process is needed because the protocol used is not publicly available (as it should be). Here, reverse engineering means observing the traffic generated by the vendor app during normal usage and looking for recurring patterns in it, so that the same interactions can be replayed. For example, we could observe that every interaction of type “get data from device” begins with the exact same sequence of numbers. This could indicate that the device interprets this sequence as “send the data to the phone.”
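To make the idea of a recurring opening sequence concrete, here is a minimal sketch in Python. The captured messages are invented for illustration and do not come from any real device; the helper name common_prefix is also just an assumption of this sketch, not something Gadgetbridge provides.

```python
def common_prefix(messages):
    """Return the longest byte sequence that every message starts with."""
    if not messages:
        return b""
    shortest = min(messages, key=len)
    for i in range(len(shortest)):
        # Stop at the first position where any message differs.
        if any(m[i] != shortest[i] for m in messages):
            return shortest[:i]
    return shortest


# Hypothetical captures of three "get data from device" interactions.
captured = [
    bytes.fromhex("01 05 00 a1"),
    bytes.fromhex("01 05 00 b2 ff"),
    bytes.fromhex("01 05 01"),
]

prefix = common_prefix(captured)
print(prefix.hex(" "))  # 01 05
```

A shared prefix like this is only a hint. Whether it really is the command for “send the data to the phone” still has to be confirmed by replaying it and watching what the device does.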

Before showing a little example of what this may look like, let’s talk about the downsides of this approach. Since every bit of information needs to be decoded manually by the hacker, some features covered by the vendor’s application might be missing. For example, for the device I ended up deciding on, the data of the swimming workout could not be interpreted by the app. What a pity! (Remember the first sentence of this article?) Furthermore, the process is neither intended nor supported by the vendors. The worst that could happen is the vendor replacing the protocol with a completely new one, which would reset the progress of decoding back to zero. Even in this case, though, it would still be possible to skip the firmware update on the device and stick to the old protocol.

One other thing that needs to be considered: some devices require an encryption key that can only be generated by an initial pairing with the vendor app, and for this, an account is needed. For example, you need an account for Mi Fitness (Xiaomi) to pair a device of type Mi Band [4..8]. After this, you can uninstall the Mi Fitness app and use Gadgetbridge instead to make sure that at least your fitness data remains local. The good thing about the key generation is that afterward, the data sent between the smartphone and the watch is encrypted. What this implies for security I will discuss in a moment. Before taking a closer look at security, let’s consider how reverse engineering can be viewed as an (enjoyable) puzzle game.

Decoding the hacker myth

Optional if you want to learn something about reverse engineering:

So the swimming workout type was not supported by Gadgetbridge. Since other workouts, such as walking, could be interpreted by the app, it did not seem impossible to add this missing piece. So, I decided to dig a little deeper to check what would be necessary to add this functionality to the app. It turned out that you can export a raw summary for each workout fetched from the watch through the app, and the app was only missing instructions on how to interpret this summary data. In our case, “raw” means that the obtained information consists solely of bits, or in other words, is a chain of consecutive ones and zeros. For better readability, we look at them in hexadecimal format. This is just another way to display the same information, whereby one byte (8 ones and zeros) is represented by two hexadecimal digits. Hexadecimal digits range from 0 to f. After counting up to 9, you continue along the alphabet: a for 10, b for 11, and so on up to f for 15. For example, a1 in the summary below represents the number 161 (10 * 16 + 1) or, in binary, 10100001. Don’t worry about the numerical systems too much; for our purpose, it’s enough to know that we can represent the same information in various ways, and there are easy ways to translate between these representations (just search for hex to dec, for example).
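If you prefer not to use an online converter, the translation between the three notations can be sketched in a few lines of Python using only built-in functions. The variable names are chosen just for this illustration.

```python
# The byte "a1" from the summary, shown in three notations.
value = int("a1", 16)             # parse two hex digits as a number
as_binary = format(value, "08b")  # eight binary digits, zero-padded
as_hex = format(value, "02x")     # back to two hex digits

print(value, as_binary, as_hex)   # 161 10100001 a1
```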

To interpret this data, we need to use the information available to us, primarily the statistics displayed on the gadget itself. For swimming, we can see, among other things, how many lengths were swum. So, we know that the summary for the last swimming workout should include a 32 for the number of lengths swum somewhere. If we can identify the field representing this information, we have a way to interpret it on our own and, for example, display it in the app and save it to the long-term database. To make the task a bit harder, two extra burdens are laid upon us: the fields representing the values differ in length, and we have to account for byte order.

The varying field length stems from the fact that it is possible to represent smaller numbers with fewer digits than larger ones. For example, the configured length goal is represented by one byte, which is two hex digits in the example below, because one byte can represent numbers from 0 to 255, and the length goal is restricted by the vendor firmware to 50 anyway. The distance in meters swum during the workout is represented by a 4-byte integer, which means 8 hex digits in the excerpt below, allowing for numbers up to 4294967295.
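The relation between field width and the largest representable number can be checked with a short Python sketch. It assumes unsigned values, and the function name max_unsigned is made up for this example.

```python
def max_unsigned(num_bytes: int) -> int:
    """Largest unsigned number that fits into num_bytes bytes."""
    return (1 << (8 * num_bytes)) - 1


for num_bytes in (1, 2, 4):
    hex_digits = 2 * num_bytes  # one byte is two hex digits
    print(num_bytes, "byte(s),", hex_digits, "hex digits, max", max_unsigned(num_bytes))
```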

Byte order is about the direction in which we read numbers: from left to right or from right to left. Let’s consider the number 237, which by convention we interpret as 2 * 10^2 + 3 * 10^1 + 7 * 10^0. In our notation “237”, the most significant digit (the one multiplied by 10^2) comes first and the least significant digit (which is only multiplied by 10^0) comes last, so it is big-endian notation. In another world, one could interpret the number 237 in this way: 2 * 10^0 + 3 * 10^1 + 7 * 10^2, with the least significant digit first (little-endian), which would certainly differ from the number we obtained by our convention. You might argue this is a fictitious world we do not have to consider on Earth, but the fact is, people made this a reality in the world of digital computing: many systems store the bytes of a multi-byte number with the least significant byte first, and this is what we are dealing with in the hex data here.
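To see how much the reading direction matters, here is a small Python sketch that reads the same four bytes in both byte orders. The bytes are taken from the distance field in the excerpt further down; the variable names are only illustrative.

```python
raw = bytes.fromhex("40 06 00 00")

little = int.from_bytes(raw, "little")  # least significant byte first
big = int.from_bytes(raw, "big")        # most significant byte first

print(little)  # 1600
print(big)     # 1074135040
```

Only one of the two results is a plausible swimming distance in meters, which is usually how you find out which byte order a device uses.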

With this in mind, you can go through the possibilities: do I want to consider 1, 2, or 4 pairs of hex digits? How would it look if I read the pairs from right to left? Take a look at the decoded numbers: the four pairs highlighted in the second row are 40 06 00 00, which have to be read as 00 00 06 40 (watch out for byte order), which is 1600 in decimal. In the third row, the number of strokes is f7 03, read as 03 f7, which is 1015 in decimal.

If this seems like a fun game to you, you can go on decoding some other fields. We are looking for a 32 (as a short) for the number of laps and 2385 for the active seconds of the workout. For the solution, you could either take a closer look at the commit that added support for this workout type to Gadgetbridge or write me an email.

First rows of the raw training summary in hexadecimal:

  a1 e2 62 66 08 06 a5 fe   7f c0 04 a1 e2 62 66 f7
  eb 62 66 51 09 00 00 [40  06 00 00] a5 01 58 00 00 // Integer: Distance in meters
  00 e8 00 00 00 00 00 00  [f7 03] 00 1f 20 00 6a 00 // Short: Strokes
  3b 00 32 d1 01 51 09 00   00 58 02 00 00 2c 01 88
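Going through the possibilities by hand gets tedious, so the search can be automated. The following Python sketch is not how Gadgetbridge does it; it is just one way to look for a known value in the excerpt above, trying field widths of 1, 2, and 4 bytes in both byte orders. The names DUMP and find_value are made up for this illustration.

```python
import struct

# The four rows of the excerpt above, without brackets and comments.
DUMP = bytes.fromhex(
    "a1 e2 62 66 08 06 a5 fe 7f c0 04 a1 e2 62 66 f7 "
    "eb 62 66 51 09 00 00 40 06 00 00 a5 01 58 00 00 "
    "00 e8 00 00 00 00 00 00 f7 03 00 1f 20 00 6a 00 "
    "3b 00 32 d1 01 51 09 00 00 58 02 00 00 2c 01 88"
)


def find_value(data, value, widths=(1, 2, 4)):
    """List (offset, width, byte order) for every place value could be stored."""
    hits = []
    for width in widths:
        if value >= 1 << (8 * width):
            continue  # value does not fit into a field of this width
        # For a single byte, both byte orders are identical.
        orders = ("little",) if width == 1 else ("little", "big")
        for order in orders:
            needle = value.to_bytes(width, order)
            start = data.find(needle)
            while start != -1:
                hits.append((start, width, order))
                start = data.find(needle, start + 1)
    return hits


print(find_value(DUMP, 1600))  # distance in meters
print(find_value(DUMP, 1015))  # strokes

# Once offset and width are known, struct reads the field directly
# ("<" = little-endian, "I" = 4-byte unsigned, "H" = 2-byte unsigned).
distance = struct.unpack_from("<I", DUMP, 23)[0]
strokes = struct.unpack_from("<H", DUMP, 40)[0]
print(distance, strokes)  # 1600 1015
```

A value can match at several widths or by pure coincidence, so every hit is only a candidate. Comparing the summaries of several workouts helps to rule out false positives. The same function can be used for the remaining puzzle values.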

Without internet connection no risk?

When it comes to security, we rely on the premise that the device cannot establish an internet connection on its own. This matters because the vendor firmware is still running on the device, and we have no control over this part. Taking the Xiaomi Mi Band 8 as an example, we can confirm from the specification that the only way for the device to establish a connection is the Bluetooth 5.1 BLE (Bluetooth Low Energy) protocol. In this protocol, pairing is optional, which means it could be used in a way that lets any device send data to and receive data from a peripheral such as a watch. On the Mi Band 8, pairing is required, and encryption keys are negotiated during the pairing process. So, in theory, not only is there no easy way to read from or write to the device without authorization, but the data sent back and forth is also meaningless to anyone who manages to sniff it. Therefore, we could conclude that the setup seems nice and tidy: we have a peripheral device responding in a secure way to requests from the smartphone, and the smartphone is considered secure because the app managing the data has no network connectivity.

For the majority of users, this security level is totally sufficient, and for everyone, it should definitely be better than sharing all data with the device vendor. For people with high security demands, it is important to note that, as in every digital system, there are some potential weaknesses which could lead to data breaches. Just to mention two of them: for the described device, the vendor still holds the initial pairing key, and since the firmware on the device can be considered unknown to a large extent, the authentication mechanism might be prone to bugs which could lead to unauthorized data access.

Wait a second…

While discussing all the pros and cons of hacking consumer electronics, the broader view has become obscured. Is viewing our body in numbers a good practice? Isn’t a big part of quantification about competition because numbers are easily comparable? And isn’t competition about the suppression of the “loser”? Should my sports practice be “me and my numbers”? Should I spend money on something to use it in a way it is not supposed to be used, instead of supporting alternative approaches? Does chasing numbers distract me from engaging with my environment and my social surroundings in a more meaningful way?

I’ve made myself guilty of postponing these questions for another day for the easy satisfaction of buying some new toy. However, it turns out there is a lot to contemplate. Maybe you’ll decide differently.

Conclusion

Clever people have managed to partially liberate electronics so that everyone can interact with them. However, the whole process heavily depends on closed-source firmware on the device itself, which introduces security risks, restricts the freedom of computing, and may eliminate this control over electronics altogether in the future. For now, you could continue decoding pieces of missing information, learning about protocols and how software works, and having some fun in the process. A more promising and sustainable approach seems to be either supporting projects that are free from the ground up, like Bangle.js, or focusing on finding alternative ways to make sports more enjoyable even without electronics.