diff --git a/content/en/modules/python_visualization/index.md b/content/en/modules/python_visualization/index.md index 45eba688..cba5c811 100644 --- a/content/en/modules/python_visualization/index.md +++ b/content/en/modules/python_visualization/index.md @@ -5,7 +5,7 @@ type: "modules" # DON'T TOUCH THIS ! :) title: "Introduction to data visualization in Python" # Your project GitHub repository URL -github_repo: +github_repo: https://github.com/school-brainhack/module_python_visualization # If you are working on a project that has website, indicate the full url including "https://" below or leave it empty. website: @@ -36,64 +36,41 @@ If you have any questions regarding the module content please ask them in the re to the server and would like to join, please send us an email at school [dot] brainhack [at] gmail [dot] com. ## Resources + +This module is built around one Jupyter notebook and one set of exercises, all available in the [module repository](https://github.com/school-brainhack/module_python_visualization). + +## Original lecture This module was presented by Jacob Vogel during the QLSC 612 course in 2020, and the associated notebook is available [here](https://github.com/neurodatascience/course-materials-2020/blob/master/lectures/14-may/01-data-visualization/python_visualization_for_data.ipynb). (Note: if you did the BIDS module, the dataset to download is the same - ds000228! A few functions now throw warnings, you can ignore these, or fix them if you like.) The video of the presentation is available below (1h09): -## Tutorial - - * Download the jupyter notebook (save raw version from Github), or start a new jupyter notebook - * Watch the video and run the cells in the notebook - -## Exercice - -For this next part, we will refer to the following [notebook](https://github.com/brainhackorg/school/blob/master/content/en/modules/python_visualization/python_visualization_intro_rmv_answ.ipynb). +## Exercise -For example purposes, we will make use of a phenotypic dataset from the ABIDE II consortium. This amazing international multi-site dataset contains data from individuals diagnosed with Autism Spectrum Disorder (ASD) and healthy controls. We will use a version of the phenotypic [data](http://fcon_1000.projects.nitrc.org/indi/abide/abide_II.html) from a single site (Kennedy Krieger Institute). To download the dataset, click on the link and then 'Kennedy Krieger Institute' on the right-hand side. Then, Downloads -> Phenotypic File. You will need an NITRC account - if you don't have one, you can create one in a few minutes [here](https://www.nitrc.org/account/register.php). +In the exercise notebook (`exercise_visu_en.ipynb`), you will have to create static and interactive visualization to explore the phenotypic and fMRI data from the **ADHD200** dataset available through nilearn. -1. **Read through the notebook running all the cells** -2. **Complete the exercises in the notebook** +The exercise is structured as follows: +1. **Part 1 : Exploring phenotypic data**: you will visualize some clinical and demographic variables of the ADHD200 dataset. +2. **Part 2 : Exploring fMRI data**: you will explore the functional fMRI data from the ADHD200 dataset through visualizations. -**Exercise 1** Create a figure with a single axes and replot the second scatterplot to group by sex instead of dx_group. +**Important**: follow the parameter specifications given in each question exactly — the exercise uses automated grading that compares results to reference values. - Set the figure size to a ratio of 8 (wide) x 5 (height) - Use the colors red and gray - Set the opacity of the points to 0.5 - Label the axes - Add a legend - -**Exercise 2** Using a pairwise plot, compare the distributions of age, viq, and piq with respect to dx_group. - - Set a palette - Set style to ticks - Set context to paper - Suppress the dx_group variable from being on the plot - -**Exercise 3** Using a violin plot separate out viq as a function of sex and dx_group. - - Different dx_group should be on each half of each violin - The x-axis should reflect the different sex categories. - -**Exercise 4** Play around and make an interactive plot using plotly and your project data if you have any. - - * Follow up with your local TA(s) to validate you completed the exercises correctly. - * :tada: :tada: :tada: you completed this training module! :tada: :tada: :tada: +Follow up with your local TA(s) to validate you completed the exercises correctly. ## More resources - Other great resources to get started with plotting in python: - - [dartbrains](https://dartbrains.org/content/Introduction_to_Plotting.html) + - [dartbrains](https://dartbrains.org/Introduction_to_Plotting/) - [neurohackademy](https://github.com/neurohackademy/visualization-in-python/blob/master/visualization-in-python.ipynb) - [PythonDataScienceHandbook](https://nbviewer.jupyter.org/github/jakevdp/PythonDataScienceHandbook/blob/master/notebooks/04.00-Introduction-To-Matplotlib.ipynb) - [matplolib tutorial - rougier](https://github.com/rougier/matplotlib-tutorial) - - -- Interactive plotting +- Interactive plotting: + - [Interactive plots and the spectrum of data visualization](https://www.youtube.com/watch?v=FwM_6oZo_2g&t=1s) - [Plotly](https://plotly.com/python/) - [Bokeh](https://docs.bokeh.org/en/latest/index.html) - [Altair](https://altair-viz.github.io/) -- Gallery +- Gallery: - [seaborn gallery](https://seaborn.pydata.org/examples/index.html) + - [Python Graph Gallery](https://python-graph-gallery.com/) diff --git a/content/en/modules/python_visualization/python_visualization_intro_rmv_answ.ipynb b/content/en/modules/python_visualization/python_visualization_intro_rmv_answ.ipynb deleted file mode 100644 index 19cc8c5e..00000000 --- a/content/en/modules/python_visualization/python_visualization_intro_rmv_answ.ipynb +++ /dev/null @@ -1,1062 +0,0 @@ -{ - "cells": [ - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "# Visualization in Python\n", - "\n", - "**PS: Many pieces of this notebook have been scavenged from other visualization notebooks and galleries. But the main things are from Tal Yarkoni's [visualization-in-python notebook](https://github.com/neurohackweek/visualization-in-python).**" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "# General Overview\n", - "\n", - "In this notebook, we will cover the following python packages. Some of them are exclusively for visualization while others like `Pandas` have many other purposes:\n", - "\n", - "- [Matplotlib](https://matplotlib.org/)\n", - "- [Seaborn](http://seaborn.pydata.org/)\n", - "- [Pandas](http://pandas.pydata.org/)\n", - "- [Bokeh](https://bokeh.pydata.org)\n", - "- [Plotly](https://plot.ly/python/)\n", - "\n", - "The visualization of the first three is all based on `matplotlib` and use `static images`. While the last three create `HTML` outputs and allow much more `interactive plots`. We will talk about each one as we go along.\n", - "\n", - "**Please note**: This notebook is intended as a _showcase_ of what's possible with regard to plotting in `python`. There's a lot going on in this notebook and we don't expect (or intended that) you learn everything there is, but provide an overview and resources that should enable you to start creating plots in `python`. \n", - "\n", - "## Python-graph-gallery\n", - "Check out the very helpful and cool new homepage https://python-graph-gallery.com/ to see how you can create different kinds of graphs." - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "## Preparation\n", - "\n", - "As with most things in Python, we first load the relevant packages. Here we load three important packages:" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "%matplotlib inline\n", - "import matplotlib.pyplot as plt\n", - "import numpy as np\n", - "import pandas as pd" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "The first line in the cell above is specific to `Jupyter notebooks`. It tells the interpreter to capture `figures` and embed them in the browser. Otherwise, they would end up almost in digital ether." - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "## The Dataset\n", - "\n", - "For example purposes, we will make use of a phenotypic dataset from the [ABIDE II](http://fcon_1000.projects.nitrc.org/indi/abide/abide_II.html) consortium. This amazing international multi-site dataset contains data from individuals diagnosed with Autism Spectrum Disorder (ASD) and healthy controls. We will use a version of the [phenotypic data available here](http://fcon_1000.projects.nitrc.org/indi/abide/abide_II.html) from a single site (Kennedy Krieger Institute). Thus please download the dataset from the linked resource providing your [NITRC](https://www.nitrc.org/) credentials.\n", - "\n", - "Let's read this from the Web using Pandas. We explicitly specify that missing values are noted in the dataset as `'n/a'`." - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "df = pd.read_csv('ABIDEII-KKI_1.csv', na_values=['n/a'])" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "In the following cell we remove all columns that have missing values." - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "sub_df = df.dropna(axis=1)\n", - "sub_df.head()" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "Using the `keys` method we can look at all the column headings that are left." - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "list(sub_df.keys())" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "Lets now see how we can visualize the information in this dataset (`sub_df`). Python has quite a lot of visualization packages. Undeniably, the most famous and at the same time versatile, that additionally is the basis of most others, is [matplotlib](https://matplotlib.org/). " - ] - }, - { - "cell_type": "markdown", - "metadata": { - "slideshow": { - "slide_type": "slide" - } - }, - "source": [ - "# `matplotlib`\n", - "* The most widely-used `Python` plotting library\n", - "* Initially modeled on `MATLAB`'s plotting system\n", - "* Designed to provide complete control over a plot" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "plt.figure(figsize=(10, 5))\n", - "plt.scatter(sub_df['AGE_AT_SCAN '], sub_df.VIQ)\n", - "plt.xlabel('Age at scan')\n", - "plt.ylabel('Verbal IQ')\n", - "plt.title('Comparing Age and Verbal IQ');" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "Thinking about how plotting works with `matplotlib`, we can explore a different approach to plotting, where we at first generate our figure and then access certain parts of it, in order to modify them. Let's create a `figure` with 3 `subplots`, side by side, including a `scatterplot`, `bar plots` and one `scatter plot` with different colors per `group`:" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "# Set up a figure with 3 columns\n", - "fig, axes = plt.subplots(1, 3, figsize=(15, 4))\n", - "\n", - "# Scatter plot in left column\n", - "axes[0].scatter(sub_df['AGE_AT_SCAN '], sub_df['VIQ'])\n", - "axes[0].axis('off')\n", - "\n", - "# Mean group viq in middle column\n", - "means = sub_df.groupby('DX_GROUP')['VIQ'].mean()\n", - "axes[1].bar(np.arange(len(means))+1, means)\n", - "\n", - "# Note how **broken** this is without additional code\n", - "axes[1].set_xticks(means.index)\n", - "\n", - "# More scatter plots, breaking up by group in right column\n", - "colors = ['blue', 'green']\n", - "for i, (s, grp) in enumerate(sub_df.groupby('DX_GROUP')):\n", - " axes[2].scatter(grp['AGE_AT_SCAN '], grp['VIQ'], c=colors[i])" - ] - }, - { - "cell_type": "markdown", - "metadata": { - "solution2": "hidden", - "solution2_first": true - }, - "source": [ - "## Exercise 1\n", - "\n", - "Create a `figure` and replot the second `scatterplot` above, grouping by `sex` instead of `dx_group`. \n", - "- Set the figure size to a ratio of 8 (wide) x 5 (height)\n", - "- Use the colors `red` and `gray`\n", - "- Set the opacity of the points to 0.5\n", - "- Label the axes\n", - "- Add a legend" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "# Create solution here" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "## Showcases from the Gallery\n", - "\n", - "As this notebook is intended as s a showcase to provide you with examples of what `matplotlib` can do, we're going to use some examples directly from the [matplotlib gallery](https://matplotlib.org/gallery/index.html). Let's check different histograms:" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "# Adapted from https://matplotlib.org/gallery/statistics/histogram_multihist.html\n", - "\n", - "import numpy as np\n", - "import matplotlib.pyplot as plt\n", - "\n", - "n_bins = 10\n", - "x = np.random.randn(1000, 3)\n", - "\n", - "fig, axes = plt.subplots(nrows=2, ncols=2, figsize=(10, 8))\n", - "ax0, ax1, ax2, ax3 = axes.flatten()\n", - "\n", - "colors = ['red', 'tan', 'lime']\n", - "ax0.hist(x, n_bins, density=1, histtype='bar', color=colors, label=colors)\n", - "ax0.legend(prop={'size': 10})\n", - "ax0.set_title('bars with legend')\n", - "\n", - "ax1.hist(x, n_bins, density=1, histtype='bar', stacked=True)\n", - "ax1.set_title('stacked bar')\n", - "\n", - "ax2.hist(x, n_bins, histtype='step', stacked=True, fill=False)\n", - "ax2.set_title('stack step (unfilled)')\n", - "\n", - "# Make a multiple-histogram of data-sets with different length.\n", - "x_multi = [np.random.randn(n) for n in [10000, 5000, 2000]]\n", - "ax3.hist(x_multi, n_bins, histtype='bar')\n", - "ax3.set_title('different sample sizes')\n", - "\n", - "fig.tight_layout()\n", - "plt.show()" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "And time series:" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "# Adapted from https://matplotlib.org/gallery/lines_bars_and_markers/cohere.html\n", - "\n", - "import numpy as np\n", - "import matplotlib.pyplot as plt\n", - "\n", - "dt = 0.01\n", - "t = np.arange(0, 30, dt)\n", - "nse1 = np.random.randn(len(t)) # white noise 1\n", - "nse2 = np.random.randn(len(t)) # white noise 2\n", - "\n", - "# Two signals with a coherent part at 10Hz and a random part\n", - "s1 = np.sin(2 * np.pi * 10 * t) + nse1\n", - "s2 = np.sin(2 * np.pi * 10 * t) + nse2\n", - "\n", - "fig, axs = plt.subplots(2, 1, figsize=(10, 5))\n", - "axs[0].plot(t, s1, t, s2)\n", - "axs[0].set_xlim(0, 2)\n", - "axs[0].set_xlabel('time')\n", - "axs[0].set_ylabel('s1 and s2')\n", - "axs[0].grid(True)\n", - "\n", - "cxy, f = axs[1].cohere(s1, s2, 256, 1. / dt)\n", - "axs[1].set_ylabel('coherence')\n", - "\n", - "fig.tight_layout()\n", - "plt.show()" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "Even 3-D plots are possible:" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": { - "scrolled": true, - "slideshow": { - "slide_type": "slide" - } - }, - "outputs": [], - "source": [ - "# Adapted from http://matplotlib.org/examples/mplot3d/subplot3d_demo.html\n", - "\n", - "from mpl_toolkits.mplot3d.axes3d import Axes3D\n", - "import matplotlib.pyplot as plt\n", - "\n", - "# imports specific to the plots in this example\n", - "import numpy as np\n", - "from matplotlib import cm\n", - "from mpl_toolkits.mplot3d.axes3d import get_test_data\n", - "\n", - "# Twice as wide as it is tall.\n", - "fig = plt.figure(figsize=(15, 5))\n", - "\n", - "#---- First subplot\n", - "ax = fig.add_subplot(1, 2, 1, projection='3d')\n", - "X = np.arange(-5, 5, 0.25)\n", - "Y = np.arange(-5, 5, 0.25)\n", - "X, Y = np.meshgrid(X, Y)\n", - "R = np.sqrt(X**2 + Y**2)\n", - "Z = np.sin(R)\n", - "surf = ax.plot_surface(X, Y, Z, rstride=1, cstride=1, cmap=cm.coolwarm,\n", - " linewidth=0, antialiased=False)\n", - "ax.set_zlim3d(-1.01, 1.01)\n", - "\n", - "fig.colorbar(surf, shrink=0.5, aspect=10)\n", - "\n", - "#---- Second subplot\n", - "ax = fig.add_subplot(1, 2, 2, projection='3d')\n", - "X, Y, Z = get_test_data(0.05)\n", - "ax.plot_wireframe(X, Y, Z, rstride=10, cstride=10);" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "## Customization in matplotlib\n", - "\n", - "* matplotlib is infinitely customizable\n", - "* As in most modern plotting environments, you can do virtually anything\n", - "* You just have to be willing to spend enough time on it" - ] - }, - { - "cell_type": "markdown", - "metadata": { - "slideshow": { - "slide_type": "slide" - } - }, - "source": [ - "## `matplotlib`\n", - "\n", - "**Pros**\n", - "* Provides low-level control over virtually every element of a plot\n", - "* Completely object-oriented API; plot components can be easily modified\n", - "* Close integration with numpy\n", - "* Extremely active community\n", - "* Tons of functionality (figure compositing, layering, annotation, coordinate transformations, color mapping, etc.)\n", - "\n", - "**Cons**\n", - "* Steep learning curve\n", - "* API is extremely unpredictable--redundancy and inconsistency are common\n", - " * Some simple things are hard; some complex things are easy\n", - "* Lacks systematicity/organizing syntax--every plot is its own little world\n", - "* Simple plots often require a lot of code\n", - "* Default styles are not optimal" - ] - }, - { - "cell_type": "markdown", - "metadata": { - "slideshow": { - "slide_type": "slide" - } - }, - "source": [ - "# High-level interfaces to matplotlib\n", - "* Matplotlib is very powerful and very robust, but the API is hit-and-miss\n", - "* Many high-level interfaces to matplotlib have been written\n", - " * Abstract away many of the annoying details\n", - " * The best of both worlds: easy generation of plots, but retain `matplotlib`'s power\n", - "* [Seaborn](https://stanford.edu/~mwaskom/software/seaborn/index.html), [ggplot](https://matplotlib.org/stable/gallery/style_sheets/ggplot.html), [pandas](https://pandas.pydata.org/), etc.\n", - "* Many domain-specific visualization tools are built on `matplotlib` (e.g., [nilearn](https://nilearn.github.io/stable/plotting/index.html) in neuroimaging)" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "# `Pandas`\n", - "* Provides simple but powerful plotting tools\n", - "* DataFrame integration supports, e.g., groupby() calls for faceting\n", - "* Often the easiest approach for simple data exploration\n", - "* Arguably not as powerful, elegant, or intuitive as seaborn" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "For example, we can plot the `density` of certain variables in our `DataFrame` as easy as:" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "sub_df[['HANDEDNESS_SCORES', 'VIQ', 'PIQ']].plot(kind='kde')" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "Or create a `boxplot` of the `variables` for each `group`:" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "sub_df[['HANDEDNESS_SCORES', 'VIQ', 'PIQ', 'DX_GROUP']].groupby('DX_GROUP').boxplot(rot=45, figsize=(10,6));\n" - ] - }, - { - "cell_type": "markdown", - "metadata": { - "slideshow": { - "slide_type": "slide" - } - }, - "source": [ - "# `Seaborn`\n", - "\n", - "[Seaborn](https://seaborn.pydata.org/) abstracts away many of the complexities to deal with such minutiae and provides a high-level API for creating aesthetic plots. \n", - "\n", - "* Arguably the premier matplotlib interface for high-level plots\n", - "* Generates beautiful plots in very little code\n", - " * Beautiful styles and color palettes\n", - "* Wide range of supported plots\n", - "* Modest support for structured plotting (via grids)\n", - "* Exceptional [documentation](https://stanford.edu/~mwaskom/software/seaborn/index.html)\n", - "* Generally, the best place to start when exploring data\n", - "* Can be quite slow (e.g., with permutation)\n", - "\n", - "For example, the following command auto adjusts the setting for the figure to reflect what you are using the figure for." - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "import seaborn as sns" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "# Adjust the context of the plot\n", - "sns.set_context('poster') # http://seaborn.pydata.org/tutorial/aesthetics.html#scaling-plot-elements\n", - "sns.set_palette('pastel') # http://seaborn.pydata.org/tutorial/color_palettes.html" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "# But still use matplotlib to do the plotting\n", - "plt.figure(figsize=(10, 5))\n", - "plt.scatter(sub_df['AGE_AT_SCAN '], sub_df.VIQ)\n", - "plt.xlabel('Age at scan')\n", - "plt.ylabel('Verbal IQ')\n", - "plt.title('Comparing Age and Verbal IQ');" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "# Adjust the context of the plot\n", - "sns.set_context('paper')\n", - "sns.set_palette('colorblind')" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "# But still use matplotlib to do the plotting\n", - "plt.figure(figsize=(10, 5))\n", - "plt.scatter(sub_df['AGE_AT_SCAN '], sub_df.VIQ)\n", - "plt.xlabel('Age at scan')\n", - "plt.ylabel('Verbal IQ')\n", - "plt.title('Comparing Age and Verbal IQ');" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "Now let's redo the scatter plot in seaborn style." - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "sns.jointplot(x='AGE_AT_SCAN ', y='VIQ', data=sub_df);" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "## `Seaborn` example\n", - "\n", - "Given the dataset we are using, what would you change to provide a better understanding of the data.\n", - "\n", - "Information about:\n", - "\n", - "- Diagnosis\n", - "- Sex\n", - "\n", - "should be encoded separately.\n", - "\n", - "One way to do this with seaborn is to use a more general interface called the [FacetGrid](https://seaborn.pydata.org/generated/seaborn.FacetGrid.html#seaborn.FacetGrid).\n", - "\n", - "Let's replot the figure while learning about a few new commands. Try to understand what the function does and try to change some parameters." - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "sns.set(style=\"whitegrid\", palette=\"pastel\", color_codes=True)\n", - "sns.set_context('poster')\n", - "\n", - "kws = dict(s=100, alpha=0.75, linewidth=0.15, edgecolor=\"k\")\n", - "\n", - "g = sns.FacetGrid(sub_df, col=\"SEX\", hue=\"DX_GROUP\", palette=\"Set1\",\n", - " hue_order=[1, 2], height=5.5)\n", - "g = (g.map(plt.scatter, \"AGE_AT_SCAN \", \"VIQ\", **kws).add_legend())" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "With just a few lines of code, note how much control you have over the figure." - ] - }, - { - "cell_type": "markdown", - "metadata": { - "solution2": "hidden", - "solution2_first": true - }, - "source": [ - "## Exercise 2\n", - "\n", - "Using a [pairwise plot](http://seaborn.pydata.org/generated/seaborn.pairplot.html#seaborn.pairplot), compare the distributions of `age`, `viq`, and `piq` with respect to `dx_group`.\n", - "\n", - "- Set a palette\n", - "- Set style to `ticks`\n", - "- Set context to `paper`\n", - "- Suppress the `dx_group` variable from being on the plot" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "# Create solution here" - ] - }, - { - "cell_type": "markdown", - "metadata": { - "solution2": "hidden", - "solution2_first": true - }, - "source": [ - "## Exercise 3\n", - "\n", - "Using a [violin plot](http://seaborn.pydata.org/generated/seaborn.violinplot.html#seaborn.violinplot) separate out `viq` as a function of `sex` and `dx_group`.\n", - "\n", - "- Different `dx_group` should be on each half of each violin\n", - "- The x-axis should reflect the different `sex` categories." - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "# Create solution here" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "## Showcases from the Gallery\n", - "\n", - "As this notebook is intended as s a showcase to provide you with examples of what `seaborn` can do, we're going to use some examples directly from the [seaborn gallery](https://seaborn.pydata.org/examples/index.html). Just have a look at how easy it is to create a `scatter plot` with `fitted regression` and `distributions`:\n" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "# Adapted from http://seaborn.pydata.org/examples/regression_marginals.html\n", - "\n", - "import seaborn as sns\n", - "sns.set(style=\"darkgrid\", color_codes=True)\n", - "\n", - "tips = sns.load_dataset('tips')\n", - "g = sns.jointplot(x=\"total_bill\", y=\"tip\", data=tips, kind=\"reg\",\n", - " xlim=(0, 60), ylim=(0, 12), color=\"r\", height=7)" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "How about grouped `boxplots`: " - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": { - "slideshow": { - "slide_type": "slide" - } - }, - "outputs": [], - "source": [ - "sns.set(style=\"ticks\")\n", - "\n", - "print(tips.head())\n", - "\n", - "# Draw a nested boxplot to show bills by day and sex\n", - "sns.boxplot(x=\"day\", y=\"total_bill\", hue=\"sex\", data=tips, palette=\"PRGn\")\n", - "sns.despine(offset=10, trim=True)" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "Or a wide range of `distribution plots`?" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "sns.set(style=\"white\", palette=\"muted\", color_codes=True)\n", - "rs = np.random.RandomState(10)\n", - "\n", - "# Set up the matplotlib figure\n", - "f, axes = plt.subplots(1, 4, figsize=(12, 3), sharex=True)\n", - "sns.despine(left=True)\n", - "\n", - "# Generate a random univariate dataset\n", - "d = rs.normal(size=100)\n", - "\n", - "# Plot a simple histogram with binsize determined automatically\n", - "sns.histplot(d, kde=False, color=\"b\", ax=axes[0])\n", - "\n", - "# Plot a historgram and kernel density estimate\n", - "sns.histplot(d, kde=True, color=\"m\", ax=axes[1])\n", - "\n", - "# Plot a filled kernel density estimate\n", - "sns.kdeplot(d, fill=True, color=\"g\", ax=axes[2])\n", - "\n", - "# Plot a kernel density estimate and rug plot\n", - "sns.kdeplot(d, color=\"r\", ax=axes[3])\n", - "sns.rugplot(d, color=\"r\", ax=axes[3])\n", - "\n", - "plt.setp(axes, yticks=[])\n", - "plt.tight_layout()" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "There are also `pairplots` with `fitted regression`:" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "import seaborn as sns\n", - "sns.set(style=\"ticks\")\n", - "\n", - "iris = sns.load_dataset('iris')\n", - "g = sns.pairplot(iris, hue=\"species\", palette=\"Set2\", kind='reg',\n", - " diag_kind=\"kde\", height=2.5)" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "Plotting a large number of `facets` is also no biggie:" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "attend = sns.load_dataset('attention').query(\"subject <= 12\")\n", - "g = sns.FacetGrid(attend, col=\"subject\", col_wrap=4, height=2, ylim=(0, 10))\n", - "g.map(sns.pointplot, \"solutions\", \"score\", color=\".3\", ci=None, order=None);" - ] - }, - { - "cell_type": "markdown", - "metadata": { - "slideshow": { - "slide_type": "slide" - } - }, - "source": [ - "# Alternatives to matplotlib\n", - "* You don't *have* to use `matplotlib`\n", - "* Some good reasons to use alternatives:\n", - " * You want to output to HTML, etc.\n", - " * You want something that plays well with other specs or isn't tied to Python\n", - " * You hate matplotlib\n", - "* Good news! You have many options...\n", - " * `bokeh`, `plotly`, `HoloViews`..." - ] - }, - { - "cell_type": "markdown", - "metadata": { - "slideshow": { - "slide_type": "slide" - } - }, - "source": [ - "# `Bokeh`\n", - "* A Python visualization engine that outputs directly to the web\n", - "* Can render `matplotlib` plots to Bokeh, but not vice versa\n", - "* Lets you generate [interactive web-based visualizations](https://demo.bokeh.org/) in pure Python (!)\n", - "* You get interactivity for free, and can easily customize them\n", - "* Works seamlessly in Jupyter notebooks\n", - "* Package development is _incredibly_ fast\n", - "* Biggest drawback may be the inability to output static images\n", - "\n", - "**Please note**: if the `Bokeh` plot isn't showing up, please rerun the cell. \n", - "\n", - "#### Showcases from the Gallery\n", - "\n", - "As this notebook is intended as a showcase to provide you with examples of what `bokeh` can do, we're going to use some examples directly from the [bokeh gallery](https://docs.bokeh.org/en/latest/docs/gallery.html). " - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": { - "slideshow": { - "slide_type": "slide" - } - }, - "outputs": [], - "source": [ - "from bokeh.plotting import figure, show, output_notebook\n", - "from bokeh.sampledata.iris import flowers\n", - "\n", - "output_notebook()\n", - "\n", - "colormap = {'setosa': 'red', 'versicolor': 'green', 'virginica': 'blue'}\n", - "colors = [colormap[x] for x in flowers['species']]\n", - "\n", - "p = figure(title = \"Iris Morphology\")\n", - "p.xaxis.axis_label = 'Petal Length'\n", - "p.yaxis.axis_label = 'Petal Width'\n", - "\n", - "p.circle(flowers[\"petal_length\"], flowers[\"petal_width\"],\n", - " color=colors, fill_alpha=0.2, size=10)\n", - "\n", - "show(p)" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": { - "scrolled": true - }, - "outputs": [], - "source": [ - "import numpy as np\n", - "\n", - "from bokeh.plotting import figure, show, output_notebook\n", - "from bokeh.models import HoverTool, ColumnDataSource\n", - "from bokeh.sampledata.les_mis import data\n", - "\n", - "output_notebook()\n", - "\n", - "nodes = data['nodes']\n", - "names = [node['name'] for node in sorted(data['nodes'], key=lambda x: x['group'])]\n", - "\n", - "N = len(nodes)\n", - "counts = np.zeros((N, N))\n", - "for link in data['links']:\n", - " counts[link['source'], link['target']] = link['value']\n", - " counts[link['target'], link['source']] = link['value']\n", - "\n", - "colormap = [\"#444444\", \"#a6cee3\", \"#1f78b4\", \"#b2df8a\", \"#33a02c\", \"#fb9a99\",\n", - " \"#e31a1c\", \"#fdbf6f\", \"#ff7f00\", \"#cab2d6\", \"#6a3d9a\"]\n", - "\n", - "xname = []\n", - "yname = []\n", - "color = []\n", - "alpha = []\n", - "for i, node1 in enumerate(nodes):\n", - " for j, node2 in enumerate(nodes):\n", - " xname.append(node1['name'])\n", - " yname.append(node2['name'])\n", - "\n", - " alpha.append(min(counts[i,j]/4.0, 0.9) + 0.1)\n", - "\n", - " if node1['group'] == node2['group']:\n", - " color.append(colormap[node1['group']])\n", - " else:\n", - " color.append('lightgrey')\n", - "\n", - "source = ColumnDataSource(data=dict(xname=xname, yname=yname, colors=color,\n", - " alphas=alpha, count=counts.flatten()))\n", - "\n", - "p = figure(title=\"Les Mis Occurrences\",\n", - " x_axis_location=\"above\", tools=\"hover,save\",\n", - " x_range=list(reversed(names)), y_range=names)\n", - "\n", - "p.plot_width = 800\n", - "p.plot_height = 800\n", - "p.grid.grid_line_color = None\n", - "p.axis.axis_line_color = None\n", - "p.axis.major_tick_line_color = None\n", - "p.axis.major_label_text_font_size = \"5pt\"\n", - "p.axis.major_label_standoff = 0\n", - "p.xaxis.major_label_orientation = np.pi/3\n", - "\n", - "p.rect('xname', 'yname', 0.9, 0.9, source=source,\n", - " color='colors', alpha='alphas', line_color=None,\n", - " hover_line_color='black', hover_color='colors')\n", - "\n", - "p.select_one(HoverTool).tooltips = [('names', '@yname, @xname'),\n", - " ('count', '@count')]\n", - "\n", - "show(p) # show the plot" - ] - }, - { - "cell_type": "markdown", - "metadata": { - "slideshow": { - "slide_type": "slide" - } - }, - "source": [ - "# `Plot.ly`\n", - "* [Plot.ly](https://plot.ly/python/) fills the same niche as Bokeh - web-based visualization via other languages\n", - "* Lets you build visualizations either in native code or online" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "# Adapted from https://plot.ly/python/ipython-notebook-tutorial/\n", - "\n", - "import plotly\n", - "plotly.offline.init_notebook_mode()\n", - "import plotly.figure_factory as ff\n", - "from plotly.graph_objs import *\n", - "\n", - "import pandas as pd\n", - "\n", - "df = pd.read_csv(\"https://raw.githubusercontent.com/plotly/datasets/master/school_earnings.csv\")\n", - "table = ff.create_table(df)\n", - "\n", - "trace_women = Bar(x=df.School, y=df.Women, name='Women', marker=dict(color='#ffcdd2'))\n", - "trace_men = Bar(x=df.School, y=df.Men, name='Men', marker=dict(color='#A2D5F2'))\n", - "trace_gap = Bar(x=df.School, y=df.Gap, name='Gap', marker=dict(color='#59606D'))\n", - "\n", - "data = [trace_women, trace_men, trace_gap]\n", - "layout = Layout(title=\"Average Earnings for Graduates\",\n", - " xaxis=dict(title='School'),\n", - " yaxis=dict(title='Salary (in thousands)'))\n", - "fig = Figure(data=data, layout=layout)\n", - "\n", - "plotly.offline.iplot(fig)" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "# Adapted from https://plot.ly/python/line-and-scatter/\n", - "\n", - "import plotly\n", - "plotly.offline.init_notebook_mode()\n", - "import plotly.graph_objs as go\n", - "\n", - "# Create random data with numpy\n", - "import numpy as np\n", - "\n", - "N = 100\n", - "random_x = np.linspace(0, 1, N)\n", - "random_y0 = np.random.randn(N) + 5\n", - "random_y1 = np.random.randn(N)\n", - "random_y2 = np.random.randn(N) - 5\n", - "\n", - "# Create traces\n", - "trace0 = go.Scatter(x=random_x, y=random_y0, mode='lines', name='lines')\n", - "trace1 = go.Scatter(x=random_x, y=random_y1, mode='lines+markers', name='lines+markers')\n", - "trace2 = go.Scatter(x=random_x, y=random_y2, mode='markers', name='markers')\n", - "data = [trace0, trace1, trace2]\n", - "\n", - "plotly.offline.iplot(data, filename='line-mode')" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": { - "scrolled": false - }, - "outputs": [], - "source": [ - "# Adapted from https://plot.ly/python/continuous-error-bars/\n", - "\n", - "import plotly\n", - "plotly.offline.init_notebook_mode()\n", - "import plotly.graph_objs as go\n", - "import pandas as pd\n", - "\n", - "df = pd.read_csv('https://raw.githubusercontent.com/plotly/datasets/master/wind_speed_laurel_nebraska.csv')\n", - "\n", - "upper_bound = go.Scatter(\n", - " name='Upper Bound', x=df['Time'], y=df['10 Min Sampled Avg'] + df['10 Min Std Dev'], mode='lines',\n", - " marker=dict(color=\"#444444\"), line=dict(width=0), fillcolor='rgba(68, 68, 68, 0.3)', fill='tonexty')\n", - "\n", - "trace = go.Scatter(\n", - " name='Measurement', x=df['Time'], y=df['10 Min Sampled Avg'], mode='lines',\n", - " line=dict(color='rgb(31, 119, 180)'), fillcolor='rgba(68, 68, 68, 0.3)', fill='tonexty')\n", - "\n", - "lower_bound = go.Scatter(\n", - " name='Lower Bound', x=df['Time'], y=df['10 Min Sampled Avg']-df['10 Min Std Dev'],\n", - " marker=dict(color=\"#444444\"), line=dict(width=0), mode='lines')\n", - "\n", - "# Trace order can be important with continuous error bars\n", - "data = [lower_bound, trace, upper_bound]\n", - "\n", - "layout = go.Layout(yaxis=dict(title='Wind speed (m/s)'),\n", - " title='Continuous, variable value error bars.', showlegend = False)\n", - "\n", - "fig = go.Figure(data=data, layout=layout)\n", - "plotly.offline.iplot(fig, filename='pandas-continuous-error-bars')" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "# Adapted from https://plot.ly/python/3d-surface-plots/\n", - "\n", - "import plotly\n", - "plotly.offline.init_notebook_mode()\n", - "import plotly.graph_objs as go\n", - "import pandas as pd\n", - "\n", - "# Read data from a csv\n", - "z_data = pd.read_csv('https://raw.githubusercontent.com/plotly/datasets/master/api_docs/mt_bruno_elevation.csv')\n", - "\n", - "data = [go.Surface(z=z_data.values)]\n", - "layout = go.Layout(autosize=True, width=500, height=500, margin=dict(l=65, r=50, b=65, t=90))\n", - "fig = go.Figure(data=data, layout=layout)\n", - "plotly.offline.iplot(fig)" - ] - } - ], - "metadata": { - "anaconda-cloud": {}, - "kernelspec": { - "display_name": "Python 3", - "language": "python", - "name": "python3" - }, - "language_info": { - "codemirror_mode": { - "name": "ipython", - "version": 3 - }, - "file_extension": ".py", - "mimetype": "text/x-python", - "name": "python", - "nbconvert_exporter": "python", - "pygments_lexer": "ipython3", - "version": "3.8.5" - } - }, - "nbformat": 4, - "nbformat_minor": 1 -}