Saturday, May 21, 2016

An intro to Regression Analysis with Decision Trees

It has been a while since the last post on this blog, but the Glowing Python is still active and strong! I just decided to publish some of my posts on the Cambridge Coding Academy blog. Here are the links to a series of two posts about Regression Analysis with Decision Trees. In this introduction to Regression Analysis we will see how to use scikit-learn to train Decision Trees to solve a specific problem: "How to predict the number of bikes hired in a bike sharing system on a given day?"

In the first post, we will see how to train a simple Decision Tree to exploit the relation between temperature and bikes hired. This tree will be analysed to explain the result of the training process and gain insights about the data. In the second, we will see how to learn more complex Decision Trees and how to assess the accuracy of the prediction using cross validation.
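As a rough illustration of the workflow just described, the sketch below fits a shallow regression tree and scores it with cross validation. The data is synthetic and generated on the spot; it is not the bike sharing dataset used in the posts, and the names `temperature` and `bikes` are made up for this example.

```python
import numpy as np
from sklearn.tree import DecisionTreeRegressor
from sklearn.model_selection import cross_val_score

# synthetic data (an assumption for illustration, not the real dataset):
# the number of hires grows with the temperature, plus some noise
rng = np.random.RandomState(0)
temperature = rng.uniform(0, 35, size=200)
bikes = 100 * temperature + rng.normal(0, 300, size=200)

# scikit-learn expects a 2D array of features
features = temperature.reshape(-1, 1)

# a shallow tree: at most 2 levels of splits, hence at most 4 leaves
tree = DecisionTreeRegressor(max_depth=2)
tree.fit(features, bikes)

# predicted number of bikes hired on a cold and on a warm day
predictions = tree.predict(np.array([[5.0], [30.0]]))

# 5-fold cross validation; the default score for a regressor is R^2
scores = cross_val_score(DecisionTreeRegressor(max_depth=2),
                         features, bikes, cv=5)
print(predictions, scores.mean())
```

Keeping `max_depth` small makes the tree easy to read, since each leaf corresponds to a temperature interval with a single predicted value.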

Here's a sneak peek of the figures that we will generate:

Tuesday, October 13, 2015

Game of Life with Python

The game of life consists of a grid of cells where each cell can be dead or alive and the state of the cells can change at each time step. The state of a cell at the time step t depends on the state of the grid at time t-1 and it is determined with a very simple rule:

A cell is alive if it's already alive and has two live neighbours, or if it has three live neighbours.

We call the grid the universe and the live cells the population. At each time step the population evolves and we have a new generation. The evolution of the population is a fascinating process to observe because it can generate an incredible variety of patterns (and also puzzles!).
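To make the rule concrete, here is a minimal sketch of it applied to a single cell. The function name `next_state` and its arguments are chosen just for this illustration.

```python
def next_state(alive, live_neighbours):
    """State of a single cell at the next time step."""
    # an alive cell with exactly two live neighbours stays alive;
    # any cell with exactly three live neighbours is alive
    return (alive and live_neighbours == 2) or live_neighbours == 3

print(next_state(True, 2))   # True: survives
print(next_state(False, 3))  # True: is born
print(next_state(True, 4))   # False: dies
print(next_state(False, 2))  # False: stays dead
```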

Implementing the game of life in Python is quite straightforward:
import numpy as np

def life(X, steps):
    """
    Conway's Game of Life.
    - X, matrix with the initial state of the game.
    - steps, number of generations.
    """
    def roll_it(x, y):
        # rolls the matrix X in a given direction
        # x=1, y=0 on the left;  x=-1, y=0 right;
        # x=0, y=1 top; x=0, y=-1 down; x=1, y=1 top left; ...
        return np.roll(np.roll(X, y, axis=0), x, axis=1)

    for _ in range(steps):
        # count the number of neighbours 
        # the universe is considered toroidal
        Y = roll_it(1, 0) + roll_it(0, 1) + roll_it(-1, 0) \
            + roll_it(0, -1) + roll_it(1, 1) + roll_it(-1, -1) \
            + roll_it(1, -1) + roll_it(-1, 1)
        # game of life rules
        X = np.logical_or(np.logical_and(X, Y == 2), Y == 3)
        X = X.astype(int)
        yield X
The function life takes as input a matrix X which represents the universe of the game, where each cell is alive if its corresponding element has value 1 and dead if it has value 0. The function is a generator that yields the next steps generations, one at a time. At each time step the number of neighbours of each cell is counted and the rule of the game is applied. Now we can create a universe with an initial state:
X = np.zeros((40, 40)) # 40 by 40 dead cells

# R-pentomino
X[23, 22:24] = 1
X[24, 21:23] = 1
X[25, 22] = 1
This initial state is known as the R-pentomino. It consists of five living cells organized as shown here (image from Wikipedia).

It is by far the most active polyomino with fewer than six cells; all of the others stabilize in at most 10 generations. Let's create a video to visualize the evolution of the system:
from matplotlib import pyplot as plt
import matplotlib.animation as manimation

FFMpegWriter = manimation.writers['ffmpeg']
metadata = dict(title='Game of life', artist='JustGlowing')
writer = FFMpegWriter(fps=10, metadata=metadata)

fig = plt.figure()
with writer.saving(fig, "game_of_life.mp4", 200):
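    for x in life(X, 800):
        # the loop body is missing in the source; assumed reconstruction:
        # draw the current generation and store it as a video frame
        plt.imshow(x, cmap='gray', interpolation='nearest')
        plt.axis('off')
        writer.grab_frame()
        plt.clf()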
The result is as follows:

In the video we can notice a few well-known patterns such as gliders and blinkers. There is also an exploding star at 0:55!
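The behaviour of these two patterns can also be checked numerically. The sketch below uses a compact single-generation function, `step`, written only for this check (it is not the `life` generator from the post), and assumes the same toroidal universe.

```python
import numpy as np

def step(X):
    """One generation on a toroidal grid, using the rule stated earlier."""
    # sum the 8 shifted copies of the grid to count the live neighbours
    neighbours = sum(np.roll(np.roll(X, dy, axis=0), dx, axis=1)
                     for dy in (-1, 0, 1) for dx in (-1, 0, 1)
                     if (dy, dx) != (0, 0))
    return ((neighbours == 3) | ((X == 1) & (neighbours == 2))).astype(int)

# blinker: three cells in a row, it oscillates with period 2
blinker = np.zeros((5, 5), dtype=int)
blinker[2, 1:4] = 1
after_one = step(blinker)
after_two = step(after_one)
print(np.array_equal(after_two, blinker))  # True

# glider: it moves one cell diagonally every 4 generations
glider = np.zeros((8, 8), dtype=int)
glider[0, 1] = glider[1, 2] = 1
glider[2, 0:3] = 1
moved = glider
for _ in range(4):
    moved = step(moved)
shifted = np.roll(np.roll(glider, 1, axis=0), 1, axis=1)
print(np.array_equal(moved, shifted))  # True
```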

Thursday, April 9, 2015

Stacked area plots with matplotlib

In a stacked area plot, the values on the y axis are accumulated at each x position and the area between the resulting values is then filled. These plots are helpful when it comes to comparing quantities through time. For example, consider the monthly totals of the number of new cases of measles, mumps, and chicken pox for New York City during the years 1931-1971 (data that we already considered here). We can compare the number of cases of each disease month by month. First, we need to load and organize the data properly:
import numpy as np
from scipy.io import loadmat
NYCdiseases = loadmat('NYCDiseases.mat') # loading a matlab file
chickenpox = np.sum(NYCdiseases['chickenPox'],axis=0)
mumps = np.sum(NYCdiseases['mumps'],axis=0)
measles = np.sum(NYCdiseases['measles'],axis=0)
In the snippet above we read the data from a Matlab file and summed the number of cases for each month. We are now ready to visualize our values using the function stackplot:
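import matplotlib.patches as mpatches
import matplotlib.pyplot as plt

# parts of this snippet are missing in the source:
# the x values (months 1-12) and the colors are assumptions
colors = ['C0', 'C1', 'C2']
plt.stackplot(np.arange(1, 13), [chickenpox, mumps, measles], colors=colors)

# creating the legend manually
labels = ['chickenpox', 'mumps', 'measles']
plt.legend([mpatches.Patch(color=c) for c in colors], labels)
plt.show()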
The result is as follows:

We note that the highest number of cases occurs between January and July. We also see that measles cases are more common than mumps and chicken pox cases.
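The accumulation described at the beginning of this post can be sketched with a few lines of NumPy. The three series below are made-up numbers, not the NYC data, and the names `a`, `b` and `c` are placeholders.

```python
import numpy as np
import matplotlib
matplotlib.use('Agg')  # non-interactive backend, no display needed
import matplotlib.pyplot as plt

months = np.arange(1, 13)
# made-up monthly counts for three hypothetical series
a = np.array([5, 6, 8, 9, 7, 5, 3, 2, 2, 3, 4, 5])
b = np.array([2, 2, 3, 3, 3, 2, 1, 1, 1, 1, 2, 2])
c = np.array([9, 11, 14, 15, 12, 8, 4, 2, 1, 2, 4, 7])

# upper boundary of each layer: cumulative sum across the series
boundaries = np.cumsum(np.vstack([a, b, c]), axis=0)
print(boundaries[-1])  # top of the stack, equal to a + b + c

# stackplot fills the area between consecutive boundaries
# and returns one filled polygon collection per series
layers = plt.stackplot(months, a, b, c, labels=['a', 'b', 'c'])
plt.legend(loc='upper right')
```

Passing `labels` to `stackplot` is an alternative to building the legend by hand. To look at the figure, call `plt.savefig` with a file name, or `plt.show()` with an interactive backend.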