But Show Me *How* Jupyter is the New Excel

These are the slides, notes, and the resulting video from a presentation I gave at TekMountain on Tuesday, September 17th, inspired by the article mentioned below.

I read this great article just a few weeks ago called “Jupyter is the New Excel”. I loved it, was provoked by its premise, and wrote to the author to tell her so. The premise: this dominant and, for many users, intimidating part of the data science toolchain, called a Jupyter notebook, could be used for more everyday tasks. You didn’t have to do data science per se with notebooks, didn’t have to, like, crunch big data, worry about data storage, or care what generalized least squares were. Jupyter was easy and useful enough to use for front office tasks, for fantasy football, dinner party invites, what have you. You could fool around with it!

I work at IBM, where I fool around with data science as a rank amateur, and I took the article as a jumping-off point: Yes, but how? How would Jupyter replace Excel exactly? How could you use Jupyter for your email marketing and fantasy football, for your real estate office spreadsheets?

So I thought I’d create this tech talk, a presentation where we can step through examples of everyday data crunching, the things that many of us do now in Excel, and see if the article’s premise checks out.

My goal is to introduce Jupyter notebooks very, very briefly, get right into just a few super-practical, everyday tasks, to not talk about data science, to welcome and un-intimidate. If I’m successful, I’ll persuade you that Jupyter notebooks are no more complicated to use than Excel, might work better for some things, and can be the kind of ready-to-hand tool that spreadsheets are for many now. This might even be a gateway drug to some better integrations of your data, which is where Jupyter starts to really outpace all these separate, versioned, weirdo macro-laden spreadsheets, or a useful starting point for some data-sciencing of your own :-)

Resources for learning more

Jupyter and Python and a lot of the premier data science tools are open source, which means there are a TON of resources out there for learning, trying. Here are a few good ones, focusing again not on the ocean of data science but on Jupyter notebooks:

Brython, browser-based Python

And if, like a new lover or a Taylor Swiftie or a vegan, you need your favorite programming language to really be everywhere, there’s Brython, which lets you put Python code in <script> tags, where it talks to the DOM in a mostly Pythonic way. That code gets translated on the fly into JavaScript, right there in the browser.

Awkward in conception, probably pointless in production, it’s straightforward in execution. Import brython.js, call brython() on load, and you can do stuff like this:

from browser import document, alert
def echo(ev):
    alert("Hello {} !".format(document["zone"].value))

document["test"].bind("click", echo)

The gallery shows some more useful things, like doing Ajax requests in Brython and sorting tabular data.

But then, I don’t know... In the richer examples, like ones with decorators binding Brython functions to events, it starts to look a little more like a framework, like Flask or something.

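from browser import ajax, bind

def show(req):
    print(req.text)  # placeholder callback; Brython's print goes to the browser console

@bind("#get_test", "click")
def get(ev):
    # Send a GET request with foo=34 and hand the response to show()
    ajax.get("/cgi-bin/get_test.py",
             oncomplete=show,
             data={"foo": 34})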

Since Taylor Swift does country tunes, pop, politics, and emo, maybe it is time to throw out all my other music :-)

   

Nesting: Setting up your Python Environment

When you begin using Python a lot, you realize you’re creating (or re-creating!) the same functions over and over again. I don’t know how many times, for example, I have re-Googled and re-typed the recipe from the magnificent and eccentric Python library BeautifulSoup to get text out of an HTML document, as when I want to create a simple search index.

I also end up needing to find this meta-recipe for loading my own frequently-used functions and helpers into my Python environment. So I thought I’d write up, in one place, how to set up your Python environment so that your own libraries and code snippets are in scope, and ready for you to use. It’s simple and I really wanted to get it down.

Feathering the Python nest

What we want to end up with is a Python environment in which the tools and scripts we have already written are already loaded and at our fingertips. Start Python (or IPython, or some other Python-based environment) and your stuff is there.

In my environment I have created a module called brownhen in which the functions I reuse are defined. You can call yours anything you want: bobby, greatstuff, powertools, whatever. But it’s important to namespace these functions and utilities to remind yourself where these things come from and what package they’re a part of. In this tutorial, my stuff is in a module called “brownhen”.

So this recipe describes how to:

  1. Organize your useful stuff into modules—a very good idea in any case
  2. Set up your environment to find your modules when Python starts (PYTHONPATH)
  3. Tell your environment not only where your modules are, but to pre-load them when you fire Python up (PYTHONSTARTUP)

When I start up the IPython interpreter, which I keep open almost all the time and sort of live in, my environment welcomes me and lets me know that my imports are working.

When Python tells me that it has 2) found and 3) loaded utilities I have 1) organized into my own module(s), it means that I can easily reuse my own code.

PYTHONPATH

Where does Python look for the libraries and code it needs when it starts up? Where can it look for new stuff when you use import statements? It doesn’t just look everywhere on your laptop. That’s bad form and takes too long. 

The answer is that Python keeps a list of directories to search (sys.path), most of which is set up when you install Python for the first time, and it reads the system’s PYTHONPATH environment variable for any extra paths that should be added to that list. I am using the Anaconda distribution of Python right now, so the scope of Python’s searching is basically contained to the directory where Anaconda put down its executables, libraries, and tools.
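
If you want to see this for yourself, here is a minimal sketch that prints what Python is currently searching. The function name describe_search_path is made up for this example; os.environ, os.pathsep, and sys.path are standard library.

```python
import os
import sys


def describe_search_path():
    """Return the PYTHONPATH entries and the full module search path."""
    raw = os.environ.get("PYTHONPATH", "")
    # PYTHONPATH uses the platform's path separator (":" on Mac/Linux, ";" on Windows).
    from_env = [p for p in raw.split(os.pathsep) if p]
    return {"pythonpath": from_env, "sys_path": list(sys.path)}


report = describe_search_path()
print("From PYTHONPATH:", report["pythonpath"] or "(not set)")
print("Full search path:")
for entry in report["sys_path"]:
    # An empty string in sys.path stands for the current directory.
    print("  ", entry or "(current directory)")
```

What gets printed depends entirely on your own install, so expect a different list on an Anaconda setup than on a system Python.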

To add to the list of places that Python searches when you start up, so that import statements can work without a hitch, edit the system PYTHONPATH variable and tell Python where else it should look.

Add the parent directory of “brownhen”, or “bobby”, to PYTHONPATH by editing your .bash_profile file on Mac, your .bashrc file on Linux, or your system environment variables on Windows.

If you put your stuff in a directory like ~/Dropbox/Programming/packages/brownhen/, then put a line like export PYTHONPATH="$HOME/Dropbox/Programming/packages:$PYTHONPATH" into the file ~/.bash_profile (on Mac). Note that it is the packages directory, the parent, that goes on the path, not the brownhen directory itself.

This tells Python to look there when it starts up. Already, you’ve set things up so that you can reach your brownhen stuff with an import statement such as import brownhen.
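
You can try the PYTHONPATH mechanism out without touching your shell profile. The sketch below is only an illustration under some assumptions: it builds a throwaway package called brownhen_demo (a made-up name) in a temporary directory, then starts a fresh interpreter with that directory's parent on PYTHONPATH and checks that the import works. The helper names import_with_pythonpath and demo are also invented for the example.

```python
import os
import subprocess
import sys
import tempfile
from pathlib import Path


def import_with_pythonpath(parent_dir, module_name):
    """Start a fresh interpreter with parent_dir on PYTHONPATH and import module_name."""
    env = dict(os.environ)
    existing = env.get("PYTHONPATH")
    # Prepend our directory, keeping whatever was already on PYTHONPATH.
    env["PYTHONPATH"] = parent_dir if not existing else parent_dir + os.pathsep + existing
    result = subprocess.run(
        [sys.executable, "-c", f"import {module_name}; print({module_name}.__file__)"],
        env=env,
        capture_output=True,
        text=True,
    )
    return result.returncode, result.stdout.strip()


def demo():
    """Create a temporary package and import it via PYTHONPATH in a child process."""
    with tempfile.TemporaryDirectory() as parent:
        package_dir = Path(parent) / "brownhen_demo"  # hypothetical package name
        package_dir.mkdir()
        (package_dir / "__init__.py").write_text("def hello():\n    return 'hello'\n")
        return import_with_pythonpath(parent, "brownhen_demo")


exit_code, location = demo()
print("exit code:", exit_code)
print("imported from:", location)
```

An exit code of 0 means the child interpreter found the package purely because its parent directory was on PYTHONPATH, which is the same thing the line in your shell profile arranges for every session.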

PYTHONSTARTUP

Go one step further by not only telling Python where your stuff is, but asking Python to load it when it starts up. Use the PYTHONSTARTUP environment variable for this, setting it to the path of a startup script (mine is called pystart.py).

This tells Python that there’s a particular script that should be executed when an interactive Python session starts up. In this case, the script has one line, and that is the import statement that pulls the brownhen fabulousness in and makes it available. So pystart.py, in its entirety, is just: import brownhen
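
The interpreter spells the variable PYTHONSTARTUP, and it only runs the file for interactive sessions. The one-line startup file is all you need, but if you would rather not get a traceback on a machine where your module is missing, a slightly more defensive variant might look like the sketch below. The load_helpers function is my own invention for illustration, not part of any library.

```python
import importlib
import os


def load_helpers(module_name="brownhen"):
    """Try to import a helper module; return it, or None if it cannot be imported."""
    try:
        return importlib.import_module(module_name)
    except ImportError as exc:
        print(f"startup: could not import {module_name!r}: {exc}")
        return None


# Show which startup file, if any, this environment points at.
print("PYTHONSTARTUP is", repr(os.environ.get("PYTHONSTARTUP")))

# In a real startup file this name would then be available at the prompt.
brownhen = load_helpers("brownhen")
```

If the import fails, the session still starts and you get a one-line note instead of a stack trace.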

This tells Python to load the brownhen module, which can define functions and execute things right off the bat if you want. For this tutorial, my brownhen module file (see below) does three things:

  • Imports a library—BeautifulSoup
  • Prints a statement to output to let you know it’s been loaded 
  • Defines two functions, one a test (hello) and the other we want to make sure and have available in all our Python sessions (textify)
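
A module file along the lines of the three bullets above could be sketched like this. It is a guess at the shape, not my actual file: the welcome message, the function bodies, and the standard-library fallback (used only when BeautifulSoup is not installed, so the sketch still runs) are all assumptions made for the example.

```python
"""Sketch of a helper module file in the spirit of brownhen."""
from html.parser import HTMLParser

try:
    from bs4 import BeautifulSoup  # third-party: pip install beautifulsoup4
except ImportError:
    BeautifulSoup = None  # assumption: fall back to the standard library below

print("brownhen loaded")  # hypothetical welcome message


class _TextCollector(HTMLParser):
    """Fallback parser that gathers the text nodes of a document."""

    def __init__(self):
        super().__init__()
        self.chunks = []

    def handle_data(self, data):
        # Keep only non-empty text, stripped of surrounding whitespace.
        if data.strip():
            self.chunks.append(data.strip())


def hello():
    """Test function: confirms the module is importable."""
    return "hello from brownhen"


def textify(html):
    """Return the text content of an HTML string, with the tags removed."""
    if BeautifulSoup is not None:
        soup = BeautifulSoup(html, "html.parser")
        return soup.get_text(separator=" ", strip=True)
    collector = _TextCollector()
    collector.feed(html)
    collector.close()
    return " ".join(collector.chunks)


sample = "<html><body><h1>Title</h1><p>Hello <b>world</b></p></body></html>"
print(hello())
print(textify(sample))
```

With the module pre-loaded at startup, a call like textify(some_html) is available at the prompt without any import line.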

With things set up in this way, I can start Python and begin to use my textify() function right away.

It works! I’ve got my environment set up so that frequently-used functions are 1) organized into a module, which 2) Python can find, and which is 3) loaded when I start up. Power. 

But how does this work? What’s a module again?

Python modules and module loading

The Python docs on this are good, and there’s lots of online support about modules and packages in Python, but just a word about it to round things out here:

Take any directory in your path, meaning the list of directories you tell Python to search via PYTHONPATH. Python understands it to be not just a directory but a module (strictly speaking a package, which is a module that can contain other modules) when it has a Python file in it called __init__.py, with two underscores on either side of “init”.

This file __init__.py registers the directory “brownhen” as a module in Python, but it also gets executed as that module is loaded. So if in your PYTHONSTARTUP file you import brownhen, as I have, you execute this file.

In your module file, you can also set up subdirectories as subpackages and do lots of other things, but know that this module initialization file gets executed by Python when the module is loaded, and is where you can put your favorite functions and tools.
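
To see for yourself that __init__.py really runs at import time, here is a small self-contained sketch. It writes a throwaway package named brownhen_sketch (a made-up name) into a temporary directory, puts that directory on the search path, and imports it. The build_and_import helper is hypothetical and only exists for this demonstration.

```python
import importlib
import sys
import tempfile
from pathlib import Path

# Contents of the throwaway package's __init__.py.
INIT_SOURCE = '''\
print("brownhen_sketch loaded")
LOADED = True

def hello():
    return "hello from brownhen_sketch"
'''


def build_and_import(package_name="brownhen_sketch"):
    """Create a package directory with an __init__.py and import it."""
    with tempfile.TemporaryDirectory() as parent:
        package_dir = Path(parent) / package_name
        package_dir.mkdir()
        (package_dir / "__init__.py").write_text(INIT_SOURCE)
        sys.path.insert(0, parent)  # same effect as listing parent in PYTHONPATH
        try:
            importlib.invalidate_caches()
            # Importing the package executes its __init__.py, including the print.
            return importlib.import_module(package_name)
        finally:
            sys.path.remove(parent)


module = build_and_import()
print(module.hello())
```

The "loaded" line shows up during the import itself, before hello() is ever called, which is exactly the hook the welcome message in a helper module relies on.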