diff --git a/blog/2016-01-23-cloudy-in-seattle/1.png b/blog/2016-01-23-cloudy-in-seattle/1.png new file mode 100644 index 0000000..6702a9f Binary files /dev/null and b/blog/2016-01-23-cloudy-in-seattle/1.png differ diff --git a/blog/2016-01-23-cloudy-in-seattle/10.png b/blog/2016-01-23-cloudy-in-seattle/10.png new file mode 100644 index 0000000..c6270e4 Binary files /dev/null and b/blog/2016-01-23-cloudy-in-seattle/10.png differ diff --git a/blog/2016-01-23-cloudy-in-seattle/2.png b/blog/2016-01-23-cloudy-in-seattle/2.png new file mode 100644 index 0000000..51b3375 Binary files /dev/null and b/blog/2016-01-23-cloudy-in-seattle/2.png differ diff --git a/blog/2016-01-23-cloudy-in-seattle/3.png b/blog/2016-01-23-cloudy-in-seattle/3.png new file mode 100644 index 0000000..c461680 Binary files /dev/null and b/blog/2016-01-23-cloudy-in-seattle/3.png differ diff --git a/blog/2016-01-23-cloudy-in-seattle/4.png b/blog/2016-01-23-cloudy-in-seattle/4.png new file mode 100644 index 0000000..c030210 Binary files /dev/null and b/blog/2016-01-23-cloudy-in-seattle/4.png differ diff --git a/blog/2016-01-23-cloudy-in-seattle/5.png b/blog/2016-01-23-cloudy-in-seattle/5.png new file mode 100644 index 0000000..4a58b6e Binary files /dev/null and b/blog/2016-01-23-cloudy-in-seattle/5.png differ diff --git a/blog/2016-01-23-cloudy-in-seattle/6.png b/blog/2016-01-23-cloudy-in-seattle/6.png new file mode 100644 index 0000000..70d426a Binary files /dev/null and b/blog/2016-01-23-cloudy-in-seattle/6.png differ diff --git a/blog/2016-01-23-cloudy-in-seattle/7.png b/blog/2016-01-23-cloudy-in-seattle/7.png new file mode 100644 index 0000000..b9e785f Binary files /dev/null and b/blog/2016-01-23-cloudy-in-seattle/7.png differ diff --git a/blog/2016-01-23-cloudy-in-seattle/8.png b/blog/2016-01-23-cloudy-in-seattle/8.png new file mode 100644 index 0000000..d05599c Binary files /dev/null and b/blog/2016-01-23-cloudy-in-seattle/8.png differ diff --git a/blog/2016-01-23-cloudy-in-seattle/9.png 
b/blog/2016-01-23-cloudy-in-seattle/9.png new file mode 100644 index 0000000..2a85bfa Binary files /dev/null and b/blog/2016-01-23-cloudy-in-seattle/9.png differ diff --git a/blog/2016-01-23-cloudy-in-seattle/_article.md b/blog/2016-01-23-cloudy-in-seattle/_article.md new file mode 100644 index 0000000..0139426 --- /dev/null +++ b/blog/2016-01-23-cloudy-in-seattle/_article.md @@ -0,0 +1,15 @@ +Title: Cloudy in Seattle +Date: 2016-01-23 +Category: Blog +Tags: weather, data science +Authors: Bradlee Speice +Summary: Building on prior analysis, is Seattle's reputation as a depressing city actually well-earned? +[//]: <> "Modified: " + +{% notebook 2016-1-23-cloudy-in-seattle.ipynb %} + + + + diff --git a/blog/2016-01-23-cloudy-in-seattle/_notebook.ipynb b/blog/2016-01-23-cloudy-in-seattle/_notebook.ipynb new file mode 100644 index 0000000..b7e2d62 --- /dev/null +++ b/blog/2016-01-23-cloudy-in-seattle/_notebook.ipynb @@ -0,0 +1,721 @@ +{ + "cells": [ + { + "cell_type": "code", + "execution_count": 1, + "metadata": { + "collapsed": false + }, + "outputs": [ + { + "data": { + "text/html": [ + " \n", + "\n", + "\n", + " \n", + "\n", + "
\n", + " \n", + " BokehJS successfully loaded.\n", + "
" + ] + }, + "metadata": {}, + "output_type": "display_data" + } + ], + "source": [ + "import pickle\n", + "import pandas as pd\n", + "import numpy as np\n", + "from bokeh.plotting import output_notebook, figure, show\n", + "from bokeh.palettes import RdBu4 as Palette\n", + "from datetime import datetime\n", + "import warnings\n", + "\n", + "output_notebook()" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "After taking some time to explore how the weather in North Carolina stacked up over the past years, I was interested in doing the same analysis for other cities. Growing up with family from Binghamton, NY I was always told it was very cloudy there. And Seattle has a nasty reputation for being very depressing and cloudy. All said, the cities I want to examine are:\n", + "- Binghamton, NY\n", + "- Cary, NC\n", + "- Seattle, WA\n", + "- New York City, NY\n", + "\n", + "I'd be interested to try this analysis worldwide at some point - comparing London and Seattle might be an interesting analysis. For now though, we'll stick with trying out the US data.\n", + "\n", + "There will be plenty of charts. I want to know: **How has average cloud cover and precipitation chance changed over the years for each city mentioned?** This will hopefully tell us whether Seattle has actually earned its reputation for being a depressing city." 
+ ] + }, + { + "cell_type": "code", + "execution_count": 2, + "metadata": { + "collapsed": false + }, + "outputs": [], + "source": [ + "city_forecasts = pickle.load(open('city_forecasts.p', 'rb'))\n", + "forecasts_df = pd.DataFrame.from_dict(city_forecasts)" + ] + }, + { + "cell_type": "code", + "execution_count": 3, + "metadata": { + "collapsed": false + }, + "outputs": [], + "source": [ + "cities = ['binghamton', 'cary', 'nyc', 'seattle']\n", + "city_colors = {cities[i]: Palette[i] for i in range(0, 4)}\n", + "\n", + "def safe_cover(frame):\n", + " if frame and 'cloudCover' in frame:\n", + " return frame['cloudCover']\n", + " else:\n", + " return np.NaN\n", + "\n", + "def monthly_avg_cloudcover(city, year, month):\n", + " dates = pd.DatetimeIndex(start=datetime(year, month, 1, 12),\n", + " end=datetime(year, month + 1, 1, 12),\n", + " freq='D', closed='left')\n", + " cloud_cover_vals = list(map(lambda x: safe_cover(forecasts_df[city][x]['currently']), dates))\n", + " cloud_cover_samples = len(list(filter(lambda x: x is not np.NaN, cloud_cover_vals)))\n", + " # Ignore an issue with nanmean having all NaN values. We'll discuss the data issues below.\n", + " with warnings.catch_warnings():\n", + " warnings.simplefilter('ignore')\n", + " return np.nanmean(cloud_cover_vals), cloud_cover_samples" + ] + }, + { + "cell_type": "code", + "execution_count": 4, + "metadata": { + "collapsed": false + }, + "outputs": [ + { + "data": { + "text/html": [ + "\n", + "
\n", + "\n" + ] + }, + "metadata": {}, + "output_type": "display_data" + }, + { + "data": { + "text/html": [ + "\n", + "
\n", + "\n" + ] + }, + "metadata": {}, + "output_type": "display_data" + }, + { + "data": { + "text/html": [ + "\n", + "
\n", + "\n" + ] + }, + "metadata": {}, + "output_type": "display_data" + }, + { + "data": { + "text/html": [ + "\n", + "
\n", + "\n" + ] + }, + "metadata": {}, + "output_type": "display_data" + }, + { + "data": { + "text/html": [ + "\n", + "
\n", "\n" ] }, "metadata": {}, "output_type": "display_data" } ], "source": [ "years = range(1990, 2016)\n", "def city_avg_cc(city, month):\n", " return [monthly_avg_cloudcover(city, y, month) for y in years]\n", "\n", "months = [\n", " ('July', 7),\n", " ('August', 8),\n", " ('September', 9),\n", " ('October', 10),\n", " ('November', 11)\n", "]\n", "\n", "for month, month_id in months:\n", " month_averages = {city: city_avg_cc(city, month_id) for city in cities}\n", " f = figure(title=\"{} Average Cloud Cover\".format(month),\n", " x_axis_label='Year',\n", " y_axis_label='Cloud Cover Percentage')\n", " for city in cities:\n", " f.line(years, [x[0] for x in month_averages[city]],\n", " legend=city, color=city_colors[city])\n", " show(f)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Well, as it happens, there are some data issues: July's data is sporadic, and 2013 seems to be missing from most months. I think only two things can really be confirmed here:\n", "- Seattle, specifically in October and November, is in fact noticeably cloudier on average than the other cities.\n", "- Every city surveyed has seen average cloud cover decline over the years for the months studied. There are data issues, but the trend seems clear.\n", "\n", "Let's now move from cloud cover data to looking at average rainfall chance." ] }, { "cell_type": "code", "execution_count": 5, "metadata": { "collapsed": false }, "outputs": [ { "data": { "text/html": [ "\n", "
\n", + "\n" + ] + }, + "metadata": {}, + "output_type": "display_data" + }, + { + "data": { + "text/html": [ + "\n", + "
\n", + "\n" + ] + }, + "metadata": {}, + "output_type": "display_data" + }, + { + "data": { + "text/html": [ + "\n", + "
\n", + "\n" + ] + }, + "metadata": {}, + "output_type": "display_data" + }, + { + "data": { + "text/html": [ + "\n", + "
\n", + "\n" + ] + }, + "metadata": {}, + "output_type": "display_data" + }, + { + "data": { + "text/html": [ + "\n", + "
\n", "\n" ] }, "metadata": {}, "output_type": "display_data" } ], "source": [ "def safe_precip(frame):\n", " if frame and 'precipProbability' in frame:\n", " return frame['precipProbability']\n", " else:\n", " return np.NaN\n", "\n", "def monthly_avg_precip(city, year, month):\n", " dates = pd.DatetimeIndex(start=datetime(year, month, 1, 12),\n", " end=datetime(year, month + 1, 1, 12),\n", " freq='D', closed='left')\n", " precip_vals = list(map(lambda x: safe_precip(forecasts_df[city][x]['currently']), dates))\n", " precip_samples = len(list(filter(lambda x: x is not np.NaN, precip_vals)))\n", " # Ignore an issue with nanmean having all NaN values. We'll discuss the data issues below.\n", " with warnings.catch_warnings():\n", " warnings.simplefilter('ignore')\n", " return np.nanmean(precip_vals), precip_samples\n", "\n", "def city_avg_precip(city, month):\n", " return [monthly_avg_precip(city, y, month) for y in years]\n", "\n", "for month, month_id in months:\n", " month_averages = {city: city_avg_precip(city, month_id) for city in cities}\n", " f = figure(title=\"{} Average Precipitation Chance\".format(month),\n", " x_axis_label='Year',\n", " y_axis_label='Precipitation Chance Percentage')\n", " for city in cities:\n", " f.line(years, [x[0] for x in month_averages[city]],\n", " legend=city, color=city_colors[city])\n", " show(f)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "The same data caveats apply here: 2013 seems to be missing some data, and July has gaps as well. Even so, the charts seem to confirm the trends we saw with cloud cover:\n", "- Seattle, specifically in August, October, and November, has had a consistently higher chance of rain than the other cities surveyed.\n", "- Average precipitation chance, just like average cloud cover, has been trending down over time.\n", "\n", "# Conclusion\n", "\n", "I have to admit I was a bit surprised by this analysis. Seattle showed higher average cloud cover and precipitation chance than the other cities surveyed. Maybe Seattle really is an objectively more depressing city to live in.\n", "\n", "Well, that's all for weather data at the moment. It's been a great experiment, but I think this is about as far as I can go with weather data without some domain knowledge. Talk again soon!" ] } ], "metadata": { "kernelspec": { "display_name": "Python 3", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.5.1" } }, "nbformat": 4, "nbformat_minor": 0 } diff --git a/blog/2016-01-23-cloudy-in-seattle/_notebook.md b/blog/2016-01-23-cloudy-in-seattle/_notebook.md new file mode 100644 index 0000000..c091dc2 --- /dev/null +++ b/blog/2016-01-23-cloudy-in-seattle/_notebook.md @@ -0,0 +1,601 @@ +```python +import pickle +import pandas as pd +import numpy as np +from bokeh.plotting import output_notebook, figure, show +from bokeh.palettes import RdBu4 as Palette +from datetime import datetime +import warnings + +output_notebook() +``` + + + + + + + +
+ + BokehJS successfully loaded. +
+ + +After taking some time to explore how the weather in North Carolina stacked up over the past years, I was interested in doing the same analysis for other cities. Growing up with family from Binghamton, NY, I was always told it was very cloudy there. And Seattle has a nasty reputation for being very depressing and cloudy. All said, the cities I want to examine are: +- Binghamton, NY +- Cary, NC +- Seattle, WA +- New York City, NY + +I'd like to try this analysis worldwide at some point; comparing London and Seattle could be interesting. For now, though, we'll stick with the US data. + +There will be plenty of charts. I want to know: **How have average cloud cover and precipitation chance changed over the years for each city?** This will hopefully tell us whether Seattle has actually earned its reputation for being a depressing city. + + +```python +city_forecasts = pickle.load(open('city_forecasts.p', 'rb')) +forecasts_df = pd.DataFrame.from_dict(city_forecasts) +``` + + +```python +cities = ['binghamton', 'cary', 'nyc', 'seattle'] +city_colors = {cities[i]: Palette[i] for i in range(0, 4)} + +def safe_cover(frame): + if frame and 'cloudCover' in frame: + return frame['cloudCover'] + else: + return np.NaN + +def monthly_avg_cloudcover(city, year, month): + dates = pd.DatetimeIndex(start=datetime(year, month, 1, 12), + end=datetime(year, month + 1, 1, 12), + freq='D', closed='left') + cloud_cover_vals = list(map(lambda x: safe_cover(forecasts_df[city][x]['currently']), dates)) + cloud_cover_samples = len(list(filter(lambda x: x is not np.NaN, cloud_cover_vals))) + # Ignore an issue with nanmean having all NaN values. We'll discuss the data issues below.
+ with warnings.catch_warnings(): + warnings.simplefilter('ignore') + return np.nanmean(cloud_cover_vals), cloud_cover_samples +``` + + +```python +years = range(1990, 2016) +def city_avg_cc(city, month): + return [monthly_avg_cloudcover(city, y, month) for y in years] + +months = [ + ('July', 7), + ('August', 8), + ('September', 9), + ('October', 10), + ('November', 11) +] + +for month, month_id in months: + month_averages = {city: city_avg_cc(city, month_id) for city in cities} + f = figure(title="{} Average Cloud Cover".format(month), + x_axis_label='Year', + y_axis_label='Cloud Cover Percentage') + for city in cities: + f.line(years, [x[0] for x in month_averages[city]], + legend=city, color=city_colors[city]) + show(f) +``` + + + +
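The `monthly_avg_cloudcover` helper builds each month's dates from `datetime(year, month + 1, 1, 12)`. That only works because the post stops at November: for December, `month + 1` is 13 and `datetime` raises `ValueError`. Newer pandas releases also no longer accept `start=`/`end=` on the `pd.DatetimeIndex` constructor; `pd.date_range(start=..., end=..., freq='D', inclusive='left')` (pandas 1.4 or later) is the closest equivalent. Here is a minimal standard-library sketch of the same noon-of-each-day window that also handles December. `month_noon_dates` is a hypothetical helper name, not part of the original notebook.

```python
from datetime import datetime, timedelta


def month_noon_dates(year, month):
    """Return a datetime at 12:00 for every day of the given month.

    Hypothetical helper: mirrors the left-closed window
    [first of month, first of next month) used in the post,
    but rolls the year over for December.
    """
    start = datetime(year, month, 1, 12)
    if month == 12:
        end = datetime(year + 1, 1, 1, 12)
    else:
        end = datetime(year, month + 1, 1, 12)
    # (end - start).days is the number of days in the month; range() stops
    # before `end`, matching closed='left' in the original DatetimeIndex call.
    return [start + timedelta(days=i) for i in range((end - start).days)]


december = month_noon_dates(2015, 12)
print(len(december), december[0], december[-1])
# 31 2015-12-01 12:00:00 2015-12-31 12:00:00
```

Iterating over this list in place of `dates` should work, assuming the DataFrame index compares equal to plain `datetime` values. The notebook built the same days with a pandas `DatetimeIndex`.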
+ + + +Well, as it happens, there are some data issues: July's data is sporadic, and 2013 seems to be missing from most months. I think only two things can really be confirmed here: +- Seattle, specifically in October and November, is in fact noticeably cloudier on average than the other cities. +- Every city surveyed has seen average cloud cover decline over the years for the months studied. There are data issues, but the trend seems clear. + +Let's now move from cloud cover data to looking at average rainfall chance. + + +```python +def safe_precip(frame): + if frame and 'precipProbability' in frame: + return frame['precipProbability'] + else: + return np.NaN + +def monthly_avg_precip(city, year, month): + dates = pd.DatetimeIndex(start=datetime(year, month, 1, 12), + end=datetime(year, month + 1, 1, 12), + freq='D', closed='left') + precip_vals = list(map(lambda x: safe_precip(forecasts_df[city][x]['currently']), dates)) + precip_samples = len(list(filter(lambda x: x is not np.NaN, precip_vals))) + # Ignore an issue with nanmean having all NaN values. We'll discuss the data issues below. + with warnings.catch_warnings(): + warnings.simplefilter('ignore') + return np.nanmean(precip_vals), precip_samples + +def city_avg_precip(city, month): + return [monthly_avg_precip(city, y, month) for y in years] + +for month, month_id in months: + month_averages = {city: city_avg_precip(city, month_id) for city in cities} + f = figure(title="{} Average Precipitation Chance".format(month), + x_axis_label='Year', + y_axis_label='Precipitation Chance Percentage') + for city in cities: + f.line(years, [x[0] for x in month_averages[city]], + legend=city, color=city_colors[city]) + show(f) +``` + + + +
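`monthly_avg_cloudcover` and `monthly_avg_precip` are line-for-line copies that differ only in the key they read, so it is easy to wire the wrong helper into a plotting loop. Here is a minimal sketch that folds both into one function taking the field name. The names `safe_field` and `monthly_avg` are hypothetical. The input is assumed to be the list of per-day `'currently'` dicts the post pulls out of `forecasts_df`, with `None` or `{}` for missing days. The sketch also counts samples with `np.isnan` instead of the identity check `x is not np.NaN`, which only works when the NaN is that exact object.

```python
import warnings

import numpy as np


def safe_field(frame, field):
    """Return frame[field], or NaN if the frame is empty/None or lacks the key."""
    if frame and field in frame:
        return frame[field]
    return np.nan


def monthly_avg(frames, field):
    """Return (mean, sample_count) of `field` over a month of daily frames."""
    values = np.array([safe_field(f, field) for f in frames], dtype=float)
    samples = int(np.count_nonzero(~np.isnan(values)))
    with warnings.catch_warnings():
        # np.nanmean emits a RuntimeWarning and returns NaN if every value is NaN.
        warnings.simplefilter('ignore', category=RuntimeWarning)
        return np.nanmean(values), samples


# Made-up frames for illustration: two days with data, one missing, one empty.
frames = [{'cloudCover': 0.5, 'precipProbability': 0.1}, None,
          {'cloudCover': 1.0}, {}]
print(monthly_avg(frames, 'cloudCover'))         # mean 0.75 from 2 samples
print(monthly_avg(frames, 'precipProbability'))  # mean 0.1 from 1 sample
```

With this shape, the cloud cover and precipitation code paths share one implementation and differ only in the field string passed in.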
+ + + +The same data caveats apply here: 2013 seems to be missing some data, and July has gaps as well. Even so, the charts seem to confirm the trends we saw with cloud cover: +- Seattle, specifically in August, October, and November, has had a consistently higher chance of rain than the other cities surveyed. +- Average precipitation chance, just like average cloud cover, has been trending down over time. + +# Conclusion + +I have to admit I was a bit surprised by this analysis. Seattle showed higher average cloud cover and precipitation chance than the other cities surveyed. Maybe Seattle really is an objectively more depressing city to live in. + +Well, that's all for weather data at the moment. It's been a great experiment, but I think this is about as far as I can go with weather data without some domain knowledge. Talk again soon!
diff --git a/blog/2016-01-23-cloudy-in-seattle/index.mdx b/blog/2016-01-23-cloudy-in-seattle/index.mdx
new file mode 100644
index 0000000..9116d24
--- /dev/null
+++ b/blog/2016-01-23-cloudy-in-seattle/index.mdx
@@ -0,0 +1,154 @@
+---
+slug: 2016/01/cloudy-in-seattle
+title: Cloudy in Seattle
+date: 2016-01-23 12:00:00
+authors: [bspeice]
+tags: []
+---
+
+Building on prior analysis, is Seattle's reputation as a depressing city actually well-earned?
+
+
+
+```python
+import pickle
+import pandas as pd
+import numpy as np
+from bokeh.plotting import output_notebook, figure, show
+from bokeh.palettes import RdBu4 as Palette
+from datetime import datetime
+import warnings
+
+output_notebook()
+```
+
+```
+BokehJS successfully loaded.
+```
+
+## Examining other cities
+
+After taking some time to explore how the weather in North Carolina stacked up over the past years, I was interested in doing the same analysis for other cities. Growing up with family from Binghamton, NY, I was always told it was very cloudy there. And Seattle has a nasty reputation for being very depressing and cloudy. All said, the cities I want to examine are:
+- Binghamton, NY
+- Cary, NC
+- Seattle, WA
+- New York City, NY
+
+I'd like to try this analysis worldwide at some point; comparing London and Seattle could be interesting. For now, though, we'll stick with the US data.
+
+There will be plenty of charts. I want to know: **How have average cloud cover and precipitation chance changed over the years for each city?** This will hopefully tell us whether Seattle has actually earned its reputation for being a depressing city.
+
+
+```python
+city_forecasts = pickle.load(open('city_forecasts.p', 'rb'))
+forecasts_df = pd.DataFrame.from_dict(city_forecasts)
+```
+
+
+```python
+cities = ['binghamton', 'cary', 'nyc', 'seattle']
+city_colors = {cities[i]: Palette[i] for i in range(0, 4)}
+
+def safe_cover(frame):
+    if frame and 'cloudCover' in frame:
+        return frame['cloudCover']
+    else:
+        return np.NaN
+
+def monthly_avg_cloudcover(city, year, month):
+    dates = pd.DatetimeIndex(start=datetime(year, month, 1, 12),
+                             end=datetime(year, month + 1, 1, 12),
+                             freq='D', closed='left')
+    cloud_cover_vals = list(map(lambda x: safe_cover(forecasts_df[city][x]['currently']), dates))
+    cloud_cover_samples = len(list(filter(lambda x: x is not np.NaN, cloud_cover_vals)))
+    # Ignore an issue with nanmean having all NaN values. We'll discuss the data issues below.
+    with warnings.catch_warnings():
+        warnings.simplefilter('ignore')
+        return np.nanmean(cloud_cover_vals), cloud_cover_samples
+```
+
+
+```python
+years = range(1990, 2016)
+def city_avg_cc(city, month):
+    return [monthly_avg_cloudcover(city, y, month) for y in years]
+
+months = [
+    ('July', 7),
+    ('August', 8),
+    ('September', 9),
+    ('October', 10),
+    ('November', 11)
+]
+
+for month, month_id in months:
+    month_averages = {city: city_avg_cc(city, month_id) for city in cities}
+    f = figure(title="{} Average Cloud Cover".format(month),
+               x_axis_label='Year',
+               y_axis_label='Cloud Cover Percentage')
+    for city in cities:
+        f.line(years, [x[0] for x in month_averages[city]],
+               legend=city, color=city_colors[city])
+    show(f)
+```
+
+![July average cloud cover chart](./1.png)
+![August average cloud cover chart](./2.png)
+![September average cloud cover chart](./3.png)
+![October average cloud cover chart](./4.png)
+![November average cloud cover chart](./5.png)
+
+Well, as it happens, there are some data issues: July's data is sporadic, and 2013 seems to be missing from most months. I think only two things can really be confirmed here:
+- Seattle, specifically in October and November, is in fact noticeably cloudier on average than the other cities.
+- Every city surveyed has seen average cloud cover decline over the years for the months studied. There are data issues, but the trend seems clear.
+
+Let's now move from cloud cover data to looking at average rainfall chance.
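Both monthly helpers return a `(mean, sample_count)` pair, but the plots only use the mean. As a result, a month backed by two or three days of data is drawn just like a complete one. One way to make the data issues described above visible is to blank out means that have too few samples before plotting. Here is a minimal sketch, assuming a hypothetical threshold of 20 days. Bokeh's line glyph leaves a gap where it meets a NaN, so sparse years would show up as breaks instead of spikes.

```python
import numpy as np

# Hypothetical threshold: minimum days with data before a monthly mean is trusted.
MIN_SAMPLES = 20


def mask_sparse(averages, min_samples=MIN_SAMPLES):
    """Replace means backed by fewer than `min_samples` samples with NaN.

    `averages` is a list of (mean, sample_count) pairs, one per year,
    in the shape city_avg_cc returns in the post.
    """
    return [mean if count >= min_samples else np.nan for mean, count in averages]


# Made-up pairs: a full month, a sparse month, and a month with no data at all.
print(mask_sparse([(0.62, 31), (0.90, 3), (np.nan, 0)]))
# [0.62, nan, nan]
```

In the plotting loop, `mask_sparse(month_averages[city])` would take the place of `[x[0] for x in month_averages[city]]`.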
+
+
+```python
+def safe_precip(frame):
+    if frame and 'precipProbability' in frame:
+        return frame['precipProbability']
+    else:
+        return np.NaN
+
+def monthly_avg_precip(city, year, month):
+    dates = pd.DatetimeIndex(start=datetime(year, month, 1, 12),
+                             end=datetime(year, month + 1, 1, 12),
+                             freq='D', closed='left')
+    precip_vals = list(map(lambda x: safe_precip(forecasts_df[city][x]['currently']), dates))
+    precip_samples = len(list(filter(lambda x: x is not np.NaN, precip_vals)))
+    # Ignore an issue with nanmean having all NaN values. We'll discuss the data issues below.
+    with warnings.catch_warnings():
+        warnings.simplefilter('ignore')
+        return np.nanmean(precip_vals), precip_samples
+
+def city_avg_precip(city, month):
+    return [monthly_avg_precip(city, y, month) for y in years]
+
+for month, month_id in months:
+    month_averages = {city: city_avg_precip(city, month_id) for city in cities}
+    f = figure(title="{} Average Precipitation Chance".format(month),
+               x_axis_label='Year',
+               y_axis_label='Precipitation Chance Percentage')
+    for city in cities:
+        f.line(years, [x[0] for x in month_averages[city]],
+               legend=city, color=city_colors[city])
+    show(f)
+```
+
+![July average precipitation chance chart](./6.png)
+![August average precipitation chance chart](./7.png)
+![September average precipitation chance chart](./8.png)
+![October average precipitation chance chart](./9.png)
+![November average precipitation chance chart](./10.png)
+
+The same data caveats apply here: 2013 seems to be missing some data, and July has gaps as well. Even so, the charts seem to confirm the trends we saw with cloud cover:
+- Seattle, specifically in August, October, and November, has had a consistently higher chance of rain than the other cities surveyed.
+- Average precipitation chance, just like average cloud cover, has been trending down over time.
+
+## Conclusion
+
+I have to admit I was a bit surprised by this analysis. Seattle showed higher average cloud cover and precipitation chance than the other cities surveyed. Maybe Seattle really is an objectively more depressing city to live in.
+
+Well, that's all for weather data at the moment. It's been a great experiment, but I think this is about as far as I can go with weather data without some domain knowledge. Talk again soon!
\ No newline at end of file
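The "trending down" conclusions are read off the charts by eye. Here is a minimal sketch, not part of the original analysis, that puts a number on them. It fits a least-squares line to one city's yearly means with `np.polyfit`, skips years with no data, and reports the slope. `trend_per_year` is a hypothetical helper. A negative slope describes the fitted line only; it is not a significance test.

```python
import numpy as np


def trend_per_year(years, values):
    """Least-squares slope of `values` against `years`, ignoring NaN entries.

    Returns NaN when fewer than two years have data, since a line is undefined.
    """
    x = np.asarray(years, dtype=float)
    y = np.asarray(values, dtype=float)
    keep = ~np.isnan(y)
    if keep.sum() < 2:
        return float('nan')
    slope, _intercept = np.polyfit(x[keep], y[keep], 1)
    return float(slope)


# Made-up yearly means with one missing year, similar to the 2013 gap in the post's data.
slope = trend_per_year([2011, 2012, 2013, 2014], [0.80, 0.70, np.nan, 0.50])
print(round(slope, 3))  # -0.1
```

Applied to the post's data, this might look like `trend_per_year(years, [m for m, _ in month_averages['seattle']])` for each month.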