bspeice.github.io/content/notebooks/2016-4-6-tick-tock....ipynb

2016-04-06 19:39:04 -04:00
{
"cells": [
{
"cell_type": "markdown",
"metadata": {},
"source": [
"If all we have is a finite number of heartbeats left, what about me?\n",
"\n",
"---\n",
"\n",
"Warning: this one is a bit creepier. But that's what you get when you come up with data science ideas as you're drifting off to sleep.\n",
"\n",
"# 2.5 Billion\n",
"\n",
"If [PBS][1] is right, that's the total number of heartbeats we get. Approximately once every second that number goes down, and down, and down again...\n",
"[1]: http://www.pbs.org/wgbh/nova/heart/heartfacts.html\n"
]
},
{
"cell_type": "code",
"execution_count": 1,
"metadata": {
"collapsed": true
},
"outputs": [],
"source": [
"total_heartbeats = 2500000000"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"I got a Fitbit this past Christmas season, mostly because I was interested in the data and trying to work on some data science projects with it. This is going to be the first project, but there will likely be more (and not nearly as morbid). My idea was: If this is the final number that I'm running up against, how far have I come, and how far am I likely to go? I've currently had about 3 months' time to estimate what my data will look like, so let's go ahead and see: given a lifetime 2.5 billion heart beats, how much time do I have left?\n",
"\n",
"# Statistical Considerations\n",
"\n",
"Since I'm starting to work with health data, there are a few considerations I think are important before I start digging through my data.\n",
"\n",
"1. The concept of 2.5 billion as an agreed-upon number is tenuous at best. I've seen anywhere from [2.21 billion][2] to [3.4 billion][3] so even if I knew exactly how many times my heart had beaten so far, the ending result is suspect at best. I'm using 2.5 billion because that seems to be about the midpoint of the estimates I've seen so far.\n",
"2. Most of the numbers I've seen so far are based on extrapolating number of heart beats from life expectancy. As life expectancy goes up, the number of expected heart beats goes up too.\n",
"3. My estimation of the number of heartbeats in my life so far is based on 3 months worth of data, and I'm extrapolating an entire lifetime based on this.\n",
"\n",
"So while the ending number is **not useful in any medical context**, it is still an interesting project to work with the data I have on hand.\n",
"\n",
"# Getting the data\n",
"\n",
"[Fitbit](https://www.fitbit.com/) has an [API available](https://dev.fitbit.com/) for people to pull their personal data off the system. It requires registering an application, authentication with OAuth, and some other complicated things. **If you're not interested in how I fetch the data, skip [here](#Wild-Extrapolations-from-Small-Data)**.\n",
"\n",
"## Registering an application\n",
"\n",
"I've already [registered a personal application](https://dev.fitbit.com/apps/new) with Fitbit, so I can go ahead and retrieve things like the client secret from a file.\n",
"\n",
"[1]: http://www.pbs.org/wgbh/nova/heart/heartfacts.html\n",
"[2]: http://gizmodo.com/5982977/how-many-heartbeats-does-each-species-get-in-a-lifetime\n",
"[3]: http://wonderopolis.org/wonder/how-many-times-does-your-heart-beat-in-a-lifetime/"
]
},
{
"cell_type": "code",
"execution_count": 2,
"metadata": {
"collapsed": false
},
"outputs": [],
"source": [
"# Import all the OAuth secret information from a local file\n",
"from secrets import CLIENT_SECRET, CLIENT_ID, CALLBACK_URL"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Handling OAuth 2\n",
"\n",
"So, all the people that know what OAuth 2 is know what's coming next. For those who don't: OAuth is how people allow applications to access other data without having to know your password. Essentially the dialog goes like this:\n",
"\n",
"```\n",
"Application: I've got a user here who wants to use my application, but I need their data.\n",
"Fitbit: OK, what data do you need access to, and for how long?\n",
"Application: I need all of these scopes, and for this amount of time.\n",
"Fitbit: OK, let me check with the user to make sure they really want to do this.\n",
"\n",
"Fitbit: User, do you really want to let this application have your data?\n",
"User: I do! And to prove it, here's my password.\n",
"Fitbit: OK, everything checks out. I'll let the application access your data.\n",
"\n",
"Fitbit: Application, you can access the user's data. Use this special value whenever you need to request data from me.\n",
"Application: Thank you, now give me all the data.\n",
"```\n",
"\n",
"Effectively, this allows an application to gain access to a user's data without ever needing to know the user's password. That way, even if the other application is hacked, the user's original data remains safe. Plus, the user can let the data service know to stop providing the application access any time they want. All in all, very secure.\n",
"\n",
"It does make handling small requests a bit challenging, but I'll go through the steps here. We'll be using the [Implicit Grant](https://dev.fitbit.com/docs/oauth2/) workflow, as it requires fewer steps in processing.\n",
"\n",
"First, we need to set up the URL the user would visit to authenticate:"
]
},
{
"cell_type": "code",
"execution_count": 3,
"metadata": {
"collapsed": false
},
"outputs": [],
"source": [
"import urllib\n",
"\n",
"FITBIT_URI = 'https://www.fitbit.com/oauth2/authorize'\n",
"params = {\n",
" # If we need more than one scope, must be a CSV string\n",
" 'scope': 'heartrate',\n",
" 'response_type': 'token',\n",
" 'expires_in': 86400, # 1 day\n",
" 'redirect_uri': CALLBACK_URL,\n",
" 'client_id': CLIENT_ID\n",
"}\n",
"\n",
"request_url = FITBIT_URI + '?' + urllib.parse.urlencode(params)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Now, here you would print out the request URL, go visit it, and get the full URL that it sends you back to. Because that is very sensitive information (specifically containing my `CLIENT_ID` that I'd really rather not share on the internet), I've skipped that step in the code here, but it happens in the background."
]
},
{
"cell_type": "code",
"execution_count": 6,
"metadata": {
"collapsed": false
},
"outputs": [],
"source": [
"# The `response_url` variable contains the full URL that\n",
"# FitBit sent back to us, but most importantly,\n",
"# contains the token we need for authorization.\n",
"access_token = dict(urllib.parse.parse_qsl(response_url))['access_token']"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Requesting the data\n",
"\n",
"Now that we've actually set up our access via the `access_token`, it's time to get the actual [heart rate data](https://dev.fitbit.com/docs/heart-rate/). I'll be using data from January 1, 2016 through March 31, 2016, and extrapolating wildly from that.\n",
"\n",
"Fitbit only lets us fetch intraday data one day at a time, so I'll create a date range using pandas and iterate through that to pull down all the data."
]
},
{
"cell_type": "code",
"execution_count": 7,
"metadata": {
"collapsed": false
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"Heartbeats from 2016-01-01 00:00:00 to 2016-03-31 23:59:00: 8139060\n"
]
}
],
"source": [
"from requests_oauthlib import OAuth2Session\n",
"import pandas as pd\n",
"from datetime import datetime\n",
"\n",
"session = OAuth2Session(token={\n",
" 'access_token': access_token,\n",
" 'token_type': 'Bearer'\n",
" })\n",
"\n",
"format_str = '%Y-%m-%d'\n",
"start_date = datetime(2016, 1, 1)\n",
"end_date = datetime(2016, 3, 31)\n",
"dr = pd.date_range(start_date, end_date)\n",
"\n",
"url = 'https://api.fitbit.com/1/user/-/activities/heart/date/{0}/1d/1min.json'\n",
"hr_responses = [session.get(url.format(d.strftime(format_str))) for d in dr]\n",
"\n",
"def record_to_df(record):\n",
" if 'activities-heart' not in record:\n",
" return None\n",
" date_str = record['activities-heart'][0]['dateTime']\n",
" df = pd.DataFrame(record['activities-heart-intraday']['dataset'])\n",
" \n",
" df.index = df['time'].apply(\n",
" lambda x: datetime.strptime(date_str + ' ' + x, '%Y-%m-%d %H:%M:%S'))\n",
" return df\n",
"\n",
"hr_dataframes = [record_to_df(record.json()) for record in hr_responses]\n",
"hr_df_concat = pd.concat(hr_dataframes)\n",
"\n",
"\n",
"# There are some minutes with missing data, so we need to correct that\n",
"full_daterange = pd.date_range(hr_df_concat.index[0],\n",
" hr_df_concat.index[-1],\n",
" freq='min')\n",
"hr_df_full = hr_df_concat.reindex(full_daterange, method='nearest')\n",
"\n",
"print(\"Heartbeats from {} to {}: {}\".format(hr_df_full.index[0],\n",
" hr_df_full.index[-1],\n",
" hr_df_full['value'].sum()))"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"And now we've retrieved all the available heart rate data for January 1<sup>st</sup> through March 31<sup>st</sup>! Let's get to the actual analysis.\n",
"\n",
"# Wild Extrapolations from Small Data\n",
"\n",
"A fundamental issue of this data is that it's pretty small. I'm using 3 months of data to make predictions about my entire life. But, purely as an exercise, I'll move forward.\n",
"\n",
"## How many heartbeats so far?\n",
"\n",
"The first step is figuring out how many of the 2.5 billion heartbeats I've used so far. We're going to try and work backward from the present day to when I was born to get that number. The easy part comes first: going back to January 1<sup>st</sup>, 1992. That's because I can generalize how many 3-month increments there were between now and then, account for leap years, and call that section done.\n",
"\n",
"Between January 1992 and January 2016 there were 96 quarters, and 6 leap days. The number we're looking for is:\n",
"\n",
"\\begin{equation}\n",
"hr_q \\cdot n - hr_d \\cdot (n-m)\n",
"\\end{equation}\n",
"\n",
"- $hr_q$: Number of heartbeats per quarter\n",
"- $hr_d$: Number of heartbeats on leap day\n",
"- $n$: Number of quarters, in this case 96\n",
"- $m$: Number of leap days, in this case 6"
]
},
{
"cell_type": "code",
"execution_count": 8,
"metadata": {
"collapsed": false
},
"outputs": [
{
"data": {
"text/plain": [
"773609400"
]
},
"execution_count": 8,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"quarterly_count = hr_df_full['value'].sum()\n",
"leap_day_count = hr_df_full[(hr_df_full.index.month == 2) &\n",
" (hr_df_full.index.day == 29)]['value'].sum()\n",
"num_quarters = 96\n",
"leap_days = 6\n",
"\n",
"jan_92_jan_16 = quarterly_count * num_quarters - leap_day_count * (num_quarters - leap_days)\n",
"jan_92_jan_16"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"So between January 1992 and January 2016 I've used $\\approx$ 774 million heartbeats. Now, I need to go back to my exact birthday. I'm going to first find on average how many heartbeats I use in a minute, and multiply that by the number of minutes between my birthday and January 1992.\n",
"\n",
"For privacy purposes I'll put the code here that I'm using, but without any identifying information:"
]
},
{
"cell_type": "code",
"execution_count": 9,
"metadata": {
"collapsed": false
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"Heartbeats so far: 775804660\n",
"Remaining heartbeats: 1724195340\n"
]
}
],
"source": [
"minute_mean = hr_df_full['value'].mean()\n",
"# Don't you wish you knew?\n",
"# birthday_minutes = ???\n",
"\n",
"birthday_heartbeats = birthday_minutes * minute_mean\n",
"\n",
"heartbeats_until_2016 = int(birthday_heartbeats + jan_92_jan_16)\n",
"remaining_2016 = total_heartbeats - heartbeats_until_2016\n",
"\n",
"print(\"Heartbeats so far: {}\".format(heartbeats_until_2016))\n",
"print(\"Remaining heartbeats: {}\".format(remaining_2016))"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"It would appear that my heart has beaten 775,804,660 times between my moment of birth and January 1<sup>st</sup> 2016, and that I have 1.72 billion left.\n",
"\n",
"## How many heartbeats longer?\n",
"\n",
"Now comes the tricky bit. I know how many heart beats I've used so far, and how many I have remaining, so I'd like to come up with a (relatively) accurate estimate of when exactly my heart should give out. We'll do this in a few steps, increasing in granularity.\n",
"\n",
"First step, how many heartbeats do I use in a 4-year period? I have data for a single quarter including leap day, so I want to know:\n",
"\n",
"\\begin{equation}\n",
"hr_q \\cdot n - hr_d \\cdot (n - m)\n",
"\\end{equation}\n",
"\n",
"- $hr_q$: Heartbeats per quarter\n",
"- $hr_d$: Heartbeats per leap day\n",
"- $n$: Number of quarters = 16\n",
"- $m$: Number of leap days = 1"
]
},
{
"cell_type": "code",
"execution_count": 10,
"metadata": {
"collapsed": false
},
"outputs": [
{
"data": {
"text/plain": [
"128934900"
]
},
"execution_count": 10,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"heartbeats_4year = quarterly_count * 16 - leap_day_count * (16 - 1)\n",
"heartbeats_4year"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Now, I can fast forward from 2016 the number of periods of 4 years I have left."
]
},
{
"cell_type": "code",
"execution_count": 11,
"metadata": {
"collapsed": false
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"Four year periods remaining: 13\n",
"Remaining heartbeats after 4 year periods: 48041640\n"
]
}
],
"source": [
"four_year_periods = remaining_2016 // heartbeats_4year\n",
"remaining_4y = remaining_2016 - four_year_periods * heartbeats_4year\n",
"\n",
"print(\"Four year periods remaining: {}\".format(four_year_periods))\n",
"print(\"Remaining heartbeats after 4 year periods: {}\".format(remaining_4y))"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Given that there are 13 four-year periods left, I can move from 2016 all the way to 2068, and find that I will have 48 million heart beats left. Let's drop down to figuring out how many quarters that is. I know that 2068 will have a leap day (unless someone finally decides to get rid of them), so I'll subtract that out first. Then, I'm left to figure out how many quarters exactly are left."
]
},
{
"cell_type": "code",
"execution_count": 12,
"metadata": {
"collapsed": false
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"Quarters left starting 2068: 8\n",
"Remaining heartbeats after that: 4760716\n"
]
}
],
"source": [
"remaining_leap = remaining_4y - leap_day_count\n",
"# Ignore leap day in the data set\n",
"heartbeats_quarter = hr_df_full[(hr_df_full.index.month != 2) &\n",
" (hr_df_full.index.day != 29)]['value'].sum()\n",
"quarters_left = remaining_leap // heartbeats_quarter\n",
"remaining_year = remaining_leap - quarters_left * heartbeats_quarter\n",
"\n",
"print(\"Quarters left starting 2068: {}\".format(quarters_left))\n",
"print(\"Remaining heartbeats after that: {}\".format(remaining_year))"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"So, that analysis gets me through until January 1<sup>st</sup> 2070. Final step, using that minute estimate to figure out how many minutes past that I'm predicted to have:"
]
},
{
"cell_type": "code",
"execution_count": 13,
"metadata": {
"collapsed": false
},
"outputs": [
{
"data": {
"text/plain": [
"datetime.datetime(2070, 2, 23, 5, 28)"
]
},
"execution_count": 13,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"from datetime import timedelta\n",
"\n",
"base = datetime(2070, 1, 1)\n",
"minutes_left = remaining_year // minute_mean\n",
"\n",
"kaput = timedelta(minutes=minutes_left)\n",
"base + kaput"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"According to this, I've got until February 23<sup>rd</sup>, 2070 at 5:28 PM in the evening before my heart gives out.\n",
"\n",
"# Summary\n",
"\n",
"Well, that's kind of a creepy date to know. As I said at the top though, **this number is totally useless in any medical context**. It ignores the rate at which we continue to get better at making people live longer, and is extrapolating from 3 months' worth of data the rest of my life. Additionally, throughout my time developing this post I made many minor mistakes. I think they're all fixed now, but it's easy to mix a number up here or there and the analysis gets thrown off by a couple years.\n",
"\n",
"Even still, I think philosophically humans have a desire to know how much time we have left in the world. [Man is but a breath](https://www.biblegateway.com/passage/?search=psalm+144&version=ESV), and it's scary to think just how quickly that date may be coming up. This analysis asks an important question though: what are you going to do with the time you have left?\n",
"\n",
"Thanks for sticking with me on this one, I promise it will be much less depressing next time!"
]
}
],
"metadata": {
"kernelspec": {
"display_name": "Python 3",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.5.1"
}
},
"nbformat": 4,
"nbformat_minor": 0
}