<?xml version="1.0" encoding="UTF-8"?><rss xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:atom="http://www.w3.org/2005/Atom" version="2.0" xmlns:media="http://search.yahoo.com/mrss/"><channel><title><![CDATA[Alex FM]]></title><description><![CDATA[Alex FM]]></description><link>https://alexfm.net/</link><image><url>https://alexfm.net/favicon.png</url><title>Alex FM</title><link>https://alexfm.net/</link></image><generator>Ghost 5.2</generator><lastBuildDate>Sat, 11 Apr 2026 20:44:12 GMT</lastBuildDate><atom:link href="https://alexfm.net/rss/" rel="self" type="application/rss+xml"/><ttl>60</ttl><item><title><![CDATA[The Deep Learning Cookbook, Part 1 - A Geometric Intuition of Data]]></title><description><![CDATA[A comprehensive guide on what machine learning and deep learning are, and how to understand data using a geometric intuition.]]></description><link>https://alexfm.net/deep-learning-cookbook-part-1/</link><guid isPermaLink="false">6135571f816b83196128e7d3</guid><category><![CDATA[Deep Dive]]></category><category><![CDATA[Blog]]></category><dc:creator><![CDATA[Alex]]></dc:creator><pubDate>Mon, 06 Sep 2021 00:15:51 GMT</pubDate><media:content url="https://images.unsplash.com/photo-1556117179-2bae4cba8b33?crop=entropy&amp;cs=tinysrgb&amp;fit=max&amp;fm=jpg&amp;ixid=MnwxMTc3M3wwfDF8c2VhcmNofDEyfHxzaGFwZXxlbnwwfHx8fDE2MzA4Nzg5NDI&amp;ixlib=rb-1.2.1&amp;q=80&amp;w=2000" medium="image"/><content:encoded><![CDATA[<!--kg-card-begin: markdown--><img src="https://images.unsplash.com/photo-1556117179-2bae4cba8b33?crop=entropy&amp;cs=tinysrgb&amp;fit=max&amp;fm=jpg&amp;ixid=MnwxMTc3M3wwfDF8c2VhcmNofDEyfHxzaGFwZXxlbnwwfHx8fDE2MzA4Nzg5NDI&amp;ixlib=rb-1.2.1&amp;q=80&amp;w=2000" alt="The Deep Learning Cookbook, Part 1 - A Geometric Intuition of Data"><p><strong>The Deep Learning Cookbook: Geometric Intuition of Data, Part 1</strong></p>
<ul>
<li><a href="#introduction-and-intention">Introduction and Intention</a>
<ul>
<li><a href="#a-brief-historical-tangent-in-ml-and-dl-and-ethics">A Brief Historical Tangent in ML and DL and Ethics</a></li>
<li><a href="#deep-learning-and-machine-learning-resources">Deep Learning and Machine Learning Resources</a></li>
<li><a href="#what-is-machine-learning">What is Machine Learning?</a></li>
<li><a href="#what-is-deep-learning">What is Deep Learning?</a>
<ul>
<li><a href="#transfer-learning-and-research-papers">Transfer Learning and Research Papers</a></li>
</ul>
</li>
</ul>
</li>
<li><a href="#useful-libraries-functions-and-methods">Useful Libraries, Functions, and Methods</a>
<ul>
<li><a href="#numpy">Numpy</a></li>
<li><a href="#pandas">Pandas</a></li>
<li><a href="#scikit-learn">Scikit-learn</a></li>
<li><a href="#tensorflow">TensorFlow</a></li>
<li><a href="#matplotlib">Matplotlib</a></li>
</ul>
</li>
<li><a href="#unraveling-terminology">Unraveling Terminology</a>
<ul>
<li><a href="#data">Data</a></li>
<li><a href="#model">Model</a></li>
<li><a href="#objective-functions">Objective Functions</a></li>
<li><a href="#optimizing-algorithms">Optimizing Algorithms</a></li>
</ul>
</li>
<li><a href="#process-of-implementing-and-deploying-deep-learning-models">Process of Implementing and Deploying Deep Learning Models</a></li>
<li><a href="#understanding-the-problem">Understanding the Problem</a>
<ul>
<li><a href="#tips-for-learning-deep-learning-online">Tips For Learning Deep Learning Online</a></li>
</ul>
</li>
<li><a href="#understanding-the-data-and-loading-it">Understanding the Data and Loading it</a>
<ul>
<li><a href="#geometric-intuition-of-data">Geometric Intuition of Data</a></li>
<li><a href="#scalars-vectors-matrices-and-tensors">Scalars, Vectors, Matrices, and Tensors</a></li>
<li><a href="#shape-of-data">Shape of Data</a></li>
<li><a href="#continuous-and-categorical-variables">Continuous and Categorical Variables</a></li>
<li><a href="#data-augmentation">Data Augmentation</a></li>
<li><a href="#structured-and-unstructured-data">Structured and Unstructured Data</a></li>
<li><a href="#splitting-data">Splitting Data</a></li>
<li><a href="#cross-validation">Cross-Validation</a></li>
<li><a href="#underfitting-and-overfitting">Underfitting and Overfitting</a></li>
<li><a href="#bias-and-variance">Bias and Variance</a></li>
<li><a href="#handling-missing-data-and-imbalanced-data">Handling Missing Data and Imbalanced Data</a>
<ul>
<li><a href="#handling-missing-data">Handling Missing Data</a></li>
<li><a href="#handling-imbalanced-data">Handling Imbalanced Data</a></li>
<li><a href="#datasets">Datasets</a></li>
<li><a href="#code-snippet-dataset-splitting-in-scikit-learn">Code Snippet: Dataset Splitting in Scikit-Learn</a></li>
<li><a href="#code-snippet-loading-data-tensorflow-code-example">Code Snippet: Loading Data TensorFlow Code Example</a></li>
</ul>
</li>
</ul>
</li>
<li><a href="#note">Note</a></li>
<li><a href="#references">References</a></li>
</ul>
<h1 id="introduction-and-intention">Introduction and Intention</h1>
<p>Machine learning (ML) and deep learning (DL) are fields in constant flux, evolution, and innovation, and they are at times ill-defined. Few educational institutions offer a comprehensive and standardized pedagogical approach to learning ML and DL. Furthermore, there is an array of online courses by self-professed experts that provide only a shallow introduction to ML and DL. They fail to share adequate mathematical intuitions, tend to make outrageous claims about what DL can do (e.g., &quot;predicting&quot; stock prices with an RNN based solely on historical prices), and fail to express the depth and excitement of the bleeding edge of DL.</p>
<p>My intention with this guide is to clear up some confusion about the subject, to provide a primer and a refresher on a variety of algorithms used in ML and DL in an organized fashion, and to demonstrate pragmatic approaches and perspectives to ML and DL. I view this guide as a jumping-off point, not a be-all resource. I&apos;ve distilled tidbits of information and code snippets into this guide, along with research papers and helpful third-party resources. I assume that my reader has some background knowledge; if not, then at least the aptitude to be a self-learner and the ability to look up these concepts for further clarification.</p>
<p>Terms within machine learning tend to be inconsistently defined and frequently used interchangeably. A prime example is the trio of terms machine learning, deep learning, and artificial intelligence. In this guide, I distinguish machine learning from deep learning and disregard artificial intelligence as a misnomer and an ill-adapted phrase. When we talk about machine learning, it&apos;ll be about shallow modeling techniques that are typically not layered and work well in low-dimensional and structured data environments. Deep learning is the opposite: these techniques are layered and operate well in high-dimensional data environments with a range of structures. Our ultimate goal is to build performant models within an ecosystem that inputs data and outputs something meaningful.</p>
<p><strong>Machine Learning</strong>: A set of algorithms that utilize data to learn transformations between inputs and outputs. They perform a function on the data given to them and get progressively better over time.</p>
<p><strong>Deep Learning</strong>: A subset of machine learning that consists of a series of models that learn multiple layers of transformations, where each layer offers a representation at each level. It uses a layered network that learns representations of the data, where this learning occurs iteratively and is then evaluated and applied toward problems in a similar domain.</p>
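<p>The &quot;multiple layers of transformations&quot; idea can be made concrete in a few lines of NumPy. The sketch below is purely illustrative (random, untrained weights and no learning step), but it shows how each layer maps one representation of the data into another:</p>

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy input: a batch of 4 samples, each with 8 features
x = rng.normal(size=(4, 8))

# Two "layers": each applies a linear transformation followed by a
# nonlinearity (ReLU), producing a new representation of the input
W1 = rng.normal(size=(8, 16))
W2 = rng.normal(size=(16, 3))

h = np.maximum(0, x @ W1)    # layer 1: 8-dim input -> 16-dim representation
out = np.maximum(0, h @ W2)  # layer 2: 16-dim -> 3-dim representation

print(h.shape, out.shape)    # (4, 16) (4, 3)
```

<p>Training, covered later, is the process of adjusting weights like <code>W1</code> and <code>W2</code> so that the final representation is useful for the task at hand.</p>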
<h2 id="a-brief-historical-tangent-in-ml-and-dl-and-ethics">A Brief Historical Tangent in ML and DL and Ethics</h2>
<p>In a time not too long ago, computation was expensive, and the distributed computing systems in place were insufficient to handle multi-layered deep learning networks and the data transformations and extractions they require. Some of the seminal works were theorized during a period called the AI Winter. There was an immense amount of hype in machine learning, and when these systems could not deliver in the &apos;70s and &apos;80s, there was a momentous drop-off in ML-related funding. Arguably, since the 2010s we have seen the rise of, and are currently experiencing, an AI Spring. We now have the infrastructure, the skills, and the computation to properly train and implement these multi-layered systems, along with time-saving and computationally efficient practices like transfer learning. This does not mean that we are in a period of infinite growth where success is a guarantee. We have yet to achieve breakthrough status on some of the biggest autonomous problems, like true Level Four autonomous driving in a busy city, true long-form language generation systems, and effective knowledge systems. Things could drop off precipitously, or simply grow too slowly. In DL and autonomous technologies, it&apos;s easy to fall into an infinite-growth fallacy or to be reassured by the notion of perpetually improving systems.</p>
<p>This is where <em>you</em> come in. As a machine learning engineer, I find it to be an obligation and a duty to push this field forward. To challenge assumptions, to call out insufficient research and methodologies, to praise and uphold sufficient research, and more importantly, to contribute in some way. The exciting part about this field is if it works, it works. Keep in mind, we can always do better. Our goal is to be <em>more</em> than trained technicians. We want to implement new ideas, to demonstrate a better way of doing things, to expose bias where it exists, and not to use this technology for regressive purposes.</p>
<h2 id="deep-learning-and-machine-learning-resources">Deep Learning and Machine Learning Resources</h2>
<p>I implore anyone reading this and following along to check out the resources below. It&apos;s not a comprehensive or complete list, but it is an excellent companion. All the resources below are free, and in my humble opinion, learning ought to be inexpensive and easily accessible. Be wary of online courses that could end up costing hundreds of dollars. Both paid professors and self-professed experts are perfectly capable of creating online courses that are more wasteful than productive. Before dishing out any cash, be sure that the person teaching is not an online marketer posing as a machine learning engineer. There are dozens, if not hundreds, of courses that fall into this category.</p>
<p><strong>Further Learning resources</strong>:</p>
<ol>
<li><strong><a href="https://fast.ai">Fast.ai</a></strong>: Filled with engaging tidbits, insightful implementations, and fun, down-to-earth commentary. Level up your DL skills quickly and in a friendly manner.</li>
<li><strong><a href="https://www.tensorflow.org/tutorials">TensorFlow Tutorials</a></strong>: A great way to learn DL while implementing it.
<ol>
<li>Alternatively, a TensorFlow crash-course: <strong><a href="https://developers.google.com/machine-learning/crash-course/ml-intro">ML Crash Course</a></strong></li>
</ol>
</li>
<li><strong>Stanford CS Cheatsheets</strong>: A series of cheatsheets that are well-organized, concise, and well-defined.
<ol>
<li><strong><a href="https://stanford.edu/~shervine/teaching/cs-221/">Stanford CS 221 - Artificial Intelligence</a></strong></li>
<li><strong><a href="https://stanford.edu/~shervine/teaching/cs-229/">Stanford CS 229 - Machine Learning</a></strong></li>
<li><strong><a href="https://stanford.edu/~shervine/teaching/cs-230/">Stanford CS 230 - Deep Learning</a></strong></li>
</ol>
</li>
<li><strong><a href="https://shapeofdata.wordpress.com/">Shape of Data</a></strong>: A left-field recommendation that deals mostly with understanding data as geometric objects. The explanations are on-point, but may require a re-read for the information to really sink in.</li>
<li><strong><a href="https://paperswithcode.com/sota">Papers With Code</a></strong>: A curated source of research papers with some explanations.</li>
</ol>
<h2 id="what-is-machine-learning">What is Machine Learning?</h2>
<p>Machine learning solves geometric problems, broadly leveraging linear algebra, matrix decomposition, calculus, and statistical methods to fit a mathematical representation to data and ultimately output a prediction, classification, cluster, association, or ranking given what the algorithm learned about the data. Traditional machine learning uses shallow models on data that lacks complex dimensionality; higher-dimensional data works best with a subset of machine learning called deep learning. It is tempting to disregard a traditional machine learning technique for a shiny new deep learning technique, but I recommend that you do not. Sometimes, an ensemble model can provide above-par results on a structured dataset with less computational overhead.</p>
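<p>As a quick, hedged sketch of that last point: here is an ensemble model fit to a small structured (tabular) dataset with scikit-learn. The dataset and hyperparameters are arbitrary choices for demonstration:</p>

```python
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

# A small structured (tabular) dataset bundled with scikit-learn
X, y = load_breast_cancer(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0)

# An ensemble of decision trees: fast to train, no GPU required
model = RandomForestClassifier(n_estimators=100, random_state=0)
model.fit(X_train, y_train)

print(model.score(X_test, y_test))  # held-out accuracy, typically well above 0.9
```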
<p><img src="https://alexfm.net/content/images/2021/09/ML_Supervised_Models-1.png" alt="The Deep Learning Cookbook, Part 1 - A Geometric Intuition of Data" loading="lazy"><br>
There are subsets of machine learning, one being deep learning. Machine learning and deep learning contain supervised, semi-supervised, unsupervised, and reinforcement techniques. Modern deep learning architectures are increasingly semi-supervised (a combination of supervised and unsupervised) and reinforced. When we talk about supervision in ML, we are referring to how the model learns from the data.</p>
<p><img src="https://alexfm.net/content/images/2021/09/ML_Unsupervised_Models.png" alt="The Deep Learning Cookbook, Part 1 - A Geometric Intuition of Data" loading="lazy"></p>
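<p>The supervised/unsupervised distinction is easy to see in code. In this minimal sketch (dataset and models chosen only for illustration), a supervised classifier fits on features <em>and</em> labels, while an unsupervised clusterer sees only the features:</p>

```python
from sklearn.cluster import KMeans
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression

X, y = load_iris(return_X_y=True)

# Supervised: the model learns from the labels y
clf = LogisticRegression(max_iter=1000).fit(X, y)

# Unsupervised: KMeans sees only X and discovers cluster structure itself
km = KMeans(n_clusters=3, n_init=10, random_state=0).fit(X)

print(clf.score(X, y))   # accuracy against the known labels
print(set(km.labels_))   # cluster ids are arbitrary -- not class labels
```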
<h2 id="what-is-deep-learning">What is Deep Learning?</h2>
<p>When we mention deep learning, we are talking about a subset of models that use a network containing more than one layer to transform and extract features and learn patterns from input data, where each layer offers a representation at every level.</p>
<p>Deep learning contains networks like artificial neural networks (ANN, AKA multi-layer perceptrons), recurrent neural networks (RNN), convolutional neural networks (CNN), generative adversarial networks (GAN), restricted Boltzmann machines (RBM), and transformers. Networks like GANs, RBMs, and transformers can be semi-supervised, unsupervised, or fully supervised. Examples of unsupervised networks are self-organizing maps (SOM), autoencoders (AE), deep belief networks (DBN), and deep embedding clustering (DEC). Of course, there are always exceptions to how we may train and implement these networks. When practically everything can be tweaked and redefined, these categories become slippery and ever-morphing.</p>
<p>Along with deep learning, we have reinforcement learning (RL) and deep reinforcement learning (DRL). We will not be covering RL since it is an entire topic in its own right. I encourage you to look into it. There are some incredible advancements in RL that I believe will revolutionize autonomous learning systems.</p>
<p>To understand the difference among all of them, see this StackOverflow answer:</p>
<blockquote>
<p>The goal of <strong>machine learning</strong> methods is to learn rules from data and make predictions and/or decisions based on them. The learning process can be done in a(n) supervised, semi-supervised, unsupervised, reinforcement learning fashion.</p>
</blockquote>
<blockquote>
<p>In <strong>reinforcement learning (RL)</strong>, an agent interacts with an environment and learns an optimal policy, by trial and error (using reward points for successful actions and penalties for errors). It is used in sequential decision making problems.</p>
</blockquote>
<blockquote>
<p><strong>Deep learning</strong> as a sub-field of machine learning is a mathematical framework for learning latent rules in the data or new representations of the data at hand. The term &quot;deep&quot; refer to the number of learning layers in the framework. Deep learning can be used with any of aforementioned learning strategies, i.e.,supervised, semi-supervised, unsupervised, and reinforcement learning.</p>
</blockquote>
<blockquote>
<p>A <strong>deep reinforcement learning</strong> technique is obtained when deep learning is utilized by any of the components of reinforcement learning. Note that Q-learning is a component of RL used to tell an agent that what action needs to be taken in what situation.<br>
<a href="https://stackoverflow.com/questions/50542818/whats-the-difference-between-reinforcement-learning-deep-learning-and-deep-re">StackOverflow Answer</a>, submitted by <a href="https://stackoverflow.com/users/12520013/roy">Roy</a></p>
</blockquote>
<p><img src="https://alexfm.net/content/images/2021/09/DL_LR_Models.png" alt="The Deep Learning Cookbook, Part 1 - A Geometric Intuition of Data" loading="lazy"></p>
<h3 id="transfer-learning-and-research-papers">Transfer Learning and Research Papers</h3>
<p>There have been steep advances in deep learning thanks in part to transfer learning and novel additions to existing architectures. <strong>Transfer learning</strong> is the process of applying pre-trained models, or trained weights, within your architecture, where these weights require little to no further training. Pre-trained models are trained on a large amount of data using a particular architecture. This advancement has allowed more people to use state-of-the-art models in their implementations without burdening the end-user with thousands of dollars of training costs.</p>
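<p>A minimal sketch of the transfer-learning mechanics in Keras: load a pre-trained base, freeze its weights, and train only a small new head. In real use you would pass <code>weights=&apos;imagenet&apos;</code> to download the pre-trained weights; this sketch uses <code>weights=None</code> only to stay self-contained, and the input shape and head size are arbitrary choices:</p>

```python
import tensorflow as tf

# Pre-trained convolutional base; in real transfer learning pass
# weights='imagenet' (weights=None here just avoids the download)
base = tf.keras.applications.MobileNetV2(
    input_shape=(96, 96, 3), include_top=False, weights=None)
base.trainable = False  # freeze the transferred weights

# New, trainable classification head on top of the frozen features
model = tf.keras.Sequential([
    base,
    tf.keras.layers.GlobalAveragePooling2D(),
    tf.keras.layers.Dense(10, activation='softmax'),
])
model.compile(optimizer='adam', loss='sparse_categorical_crossentropy')

# Only the head's parameters will be updated during training
print(len(model.trainable_variables))  # just the Dense layer's kernel and bias
```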
<p>Most of the papers below are some incarnation of transfer learning, novel architectures, or creative implementations of DL. These papers may be advanced, yes, but I&apos;m confident that most people will be able to glean something useful from them. Getting into the habit of reading papers will be crucial. ML is an evolving field. Improvements in existing architectures, novel implementations, and new architectures are published <em>frequently</em>. Paper recommendations will be throughout this guide.</p>
<p>Some seminal papers and architectures are down below:</p>
<p><strong>Image classification</strong>:</p>
<p><strong>ResNet</strong>: A CNN model with up to 152 layers.</p>
<ul>
<li><strong>Repository</strong>: <a href="https://github.com/KaimingHe/deep-residual-networks">https://github.com/KaimingHe/deep-residual-networks</a></li>
<li><strong>Paper</strong>: <a href="https://arxiv.org/pdf/1512.03385.pdf">https://arxiv.org/pdf/1512.03385.pdf</a></li>
<li><strong>Year</strong>: 2015</li>
</ul>
<p><strong>VGG</strong>: Deep CNN model with up to 19 layers.</p>
<ul>
<li><strong>Repository (TensorFlow)</strong>: <a href="https://github.com/machrisaa/tensorflow-vgg">https://github.com/machrisaa/tensorflow-vgg</a></li>
<li><strong>Paper</strong>: <a href="https://arxiv.org/pdf/1409.1556.pdf">https://arxiv.org/pdf/1409.1556.pdf</a></li>
<li><strong>Year</strong>: 2015</li>
</ul>
<p><strong>Inception-v4</strong>: Deep CNN model for image classification.</p>
<ul>
<li><strong>Repository</strong>: <a href="https://github.com/kentsommer/keras-inceptionV4">https://github.com/kentsommer/keras-inceptionV4</a></li>
<li><strong>Paper</strong>: <a href="https://arxiv.org/pdf/1602.07261.pdf">https://arxiv.org/pdf/1602.07261.pdf</a></li>
<li><strong>Year</strong>: 2016</li>
</ul>
<p><strong>Object detection</strong>:</p>
<p><strong>YOLOv4</strong>: A deep CNN model for real-time object detection. One-stage object detection that improves on YOLOv3.</p>
<ul>
<li><strong>Repository (Darknet)</strong>: <a href="https://github.com/pjreddie/darknet">https://github.com/pjreddie/darknet</a></li>
<li><strong>Paper</strong>: <a href="https://arxiv.org/abs/2004.10934">https://arxiv.org/abs/2004.10934</a></li>
<li><strong>Year</strong>: 2020</li>
</ul>
<p><strong>Mask R-CNN</strong>: A real-time neural network for object instance segmentation, covering 80 different classes, released by Facebook AI Research. It performs label prediction, bounding-box prediction, and mask prediction.</p>
<ul>
<li><strong>Repository (TensorFlow)</strong>: <a href="https://github.com/matterport/Mask_RCNN">https://github.com/matterport/Mask_RCNN</a></li>
<li><strong>Repository (PyTorch)</strong>: <a href="https://github.com/facebookresearch/maskrcnn-benchmark">https://github.com/facebookresearch/maskrcnn-benchmark</a></li>
<li><strong>Paper</strong>: <a href="https://arxiv.org/pdf/1703.06870.pdf">https://arxiv.org/pdf/1703.06870.pdf</a></li>
<li><strong>Year</strong>: 2018</li>
</ul>
<p><strong>Natural language</strong>:</p>
<p><strong>GPT-3</strong>: Generative Pre-trained Transformer 3. Developed by OpenAI with 175 billion parameters. Currently, the source code is closed to the public and the model is being licensed.</p>
<ul>
<li><strong>Repository</strong>: <a href="https://github.com/openai/gpt-3">https://github.com/openai/gpt-3</a></li>
<li><strong>Paper</strong>: <a href="https://arxiv.org/pdf/2005.14165.pdf">https://arxiv.org/pdf/2005.14165.pdf</a></li>
<li><strong>Year</strong>: 2020</li>
</ul>
<p><strong>BERT</strong>: Bidirectional Encoder Representations from Transformers, by the Google AI Language team. The BERT Base model is a 12-layer, 768-hidden, 12-head, 110 million parameter architecture. The BERT Large model is a 24-layer, 1024-hidden, 16-head, 340 million parameter architecture. Trained on BookCorpus (800M words) and Wikipedia (2.5B words).</p>
<ul>
<li><strong>Repository</strong>: <a href="https://github.com/google-research/bert">https://github.com/google-research/bert</a></li>
<li><strong>Repository (HuggingFace)</strong>: <a href="https://github.com/huggingface/transformers">https://github.com/huggingface/transformers</a></li>
<li><strong>Paper</strong>: <a href="https://arxiv.org/pdf/1810.04805.pdf">https://arxiv.org/pdf/1810.04805.pdf</a></li>
<li><strong>Year</strong>: 2019</li>
</ul>
<p><strong>The emotional arcs of stories are dominated by six basic shapes</strong>: This is the first research paper I reviewed and replicated, and it has a special place in my heart. For me, it&apos;s a statement that creativity and beauty are perfectly at home in computer science and machine learning, and that our ability to test a hypothesis is not limited. The authors of this paper test Kurt Vonnegut&apos;s rejected thesis stating that stories follow basic emotional arcs.</p>
<ul>
<li><strong>Repository</strong>: <a href="https://github.com/andyreagan/core-stories">https://github.com/andyreagan/core-stories</a></li>
<li><strong>Paper</strong>: <a href="https://arxiv.org/pdf/1606.07772.pdf">https://arxiv.org/pdf/1606.07772.pdf</a></li>
<li><strong>Year</strong>: 2016</li>
</ul>
<p><strong>Reinforcement learning</strong>:</p>
<p><strong>Mastering Atari, Go, Chess and Shogi by Planning with a Learned Model</strong></p>
<ul>
<li>AKA the MuZero paper</li>
<li><strong>Repository (PyTorch)</strong>: <a href="https://github.com/werner-duvaud/muzero-general">https://github.com/werner-duvaud/muzero-general</a></li>
<li><strong>Paper</strong>: <a href="https://arxiv.org/pdf/1911.08265v1.pdf">https://arxiv.org/pdf/1911.08265v1.pdf</a></li>
<li><strong>Year</strong>: 2019</li>
</ul>
<h1 id="useful-libraries-functions-and-methods">Useful Libraries, Functions, and Methods</h1>
<h2 id="numpy">Numpy</h2>
<pre><code class="language-python">import numpy as np

#### Create Arrays ####

# Create an empty numpy array
np.array([])

# Create a hard-coded numpy array 
np.array([[2, 5, 2, 1], [1, 4, 1, 6], [3, 2, 0, 6]])

# Create an array w/ all elements set to 0 and a shape of (2, 3, 4)
np.zeros((2, 3, 4))

# Do the same as zeros with ones
np.ones((2, 3, 4))

# Sample from the standard normal distribution (returns a float, not integers)
np.random.randn()

# Create array randomly sampled from a normal distribution w/ a mean of 0 and a standard deviation of 1 and shape (2, 3)
x = np.random.normal(0, 1, size=(2, 3))

# Create 1-D array w/ 1000 evenly spaced points between 0 and 10
np.linspace(0, 10, 1000)

# Create an array of 3 points evenly spaced on a log scale between 10^1 and 10^2
np.logspace(1, 2, 3)

#### Examine Arrays ####

# Check on an array shape (length along each axis)
x.shape

# Check on an array size (total # of elements)
x.size

# Array data type
x.dtype

# Number of dimensions
x.ndim

#### Reshaping Arrays ####

# Change shape of an array w/out changing the number of elements or values
X = x.reshape(3, 2)

# Add an axis to an array
a = np.array([0, 1])
new_col = a[:, np.newaxis]
print(new_col)

# Alternatively
new_col = a[:, None]

# Return a flattened copy of an array
flt = x.flatten()
flt[0] = 10
print(flt)

#### Operations ####

# Concatenate multiple arrays together
X = np.arange(12).reshape(3, 4)
Y = np.array([[1, 4, 3, 3], [2, 1, 4, 2], [3, 1, 1, 1]])
np.concatenate([X, Y], axis=0), np.concatenate([X, Y], axis=1)

# Create a binary array
X == Y

# Sum entire array
X.sum()

#### Broadcasting ####

# Broadcasting: Perform elementwise operations on arrays of differing sizes 
# 1.) expand one or both arrays by copying elements appropriately so that after this transformation, two arrays have same shape
arr1 = np.arange(3).reshape(3, 1)
arr2 = np.arange(2).reshape(1, 2)
# 2.) Carry out the elementwise operations on the resulting arrays.
arr1 + arr2

#### Indexing &amp; Slicing ####

# [-1] selects the last row and [1:3] selects the second and third rows
X[-1], X[1:3]

# Write elements of a matrix by specifying indices
X[1, 2] = 9

# Assign multiple elements the same value via slicing (here, the first two rows)
X[0:2, :] = 12

#### Allocating Memory ####

# Create a new matrix Z with the same shape as another Y, using zeros_like to allocate a block of 0 entries
Z = np.zeros_like(Y)
print(&apos;id(Z):&apos;, id(Z))
Z[:] = X + Y
print(&apos;id(Z):&apos;, id(Z))

# Binary check
before = id(X)
X += Y
id(X) == before

#### Converting to Different Object ####

# Invoke the item function to convert a size-1 array to a Python scalar
a = np.array([3.5])
a, a.item(), float(a), int(a)

</code></pre>
<h2 id="pandas">Pandas</h2>
<p>We will use Pandas for handling missing data. We generally want our data to be in a NumPy array, so after handling the missing values, we will convert the data to one.</p>
<p><strong>Series</strong>: One-dimensional labeled array that holds any data type. Axis labels are an index.</p>
<p><strong>DataFrame</strong>: Two-dimensional labeled data structure with the same or differing data types. Like a CSV or table.</p>
<pre><code class="language-python">import pandas as pd

#### Load Data ####

# Load a CSV file (data_file is a path string, e.g. &apos;data.csv&apos;)
data = pd.read_csv(data_file)
print(data)

# Load from URL
url = &apos;https://urlhere.com/data/example.csv&apos;
example = pd.read_csv(url)

#### Handle Missing Data ####

# Impute and delete
# By integer-location based indexing (iloc), we split data into inputs (all but the last column) and outputs (the last column)
# For numerical values in inputs that are missing, we replace the &#x201C;NaN&#x201D; with the mean value of the same column
ins, outs = data.iloc[:, :-1], data.iloc[:, -1]
ins = ins.fillna(ins.mean(numeric_only=True))
print(ins)

# For categorical or discrete values in inputs, we consider &#x201C;NaN&#x201D; as a category
ins = pd.get_dummies(ins, dummy_na=True)
print(ins)

#### Convert Data Into NP Array ####
import numpy as np
X, y = np.array(ins.values), np.array(outs.values)

#### Create a DataFrame ####

cols = [&apos;username&apos;, &apos;salary&apos;]
data1 = pd.DataFrame([[&apos;hello&apos;, 60000], [&apos;cool_name45&apos;, 90000]],  columns = cols)
data2 = pd.DataFrame(dict(username=[&apos;kayfabe&apos;, &apos;jk_name&apos;], salary=[95000, 100000]))

#### Combine DataFrames ####

# DataFrame.append is deprecated in recent pandas; use concat instead
com_data = pd.concat([data1, data2])

#### Join DataFrame ####

# Outer merge (union of keys)
new_data = pd.DataFrame(dict(username=[&apos;oh_hey&apos;, &apos;newness&apos;], salary=[400000, 850000]))
inter_union = pd.merge(com_data, new_data, on=&quot;username&quot;, how=&apos;outer&apos;)

#### Reshape ####

# Melt
stack = pd.melt(inter_union, id_vars=&quot;username&quot;, var_name=&quot;var&quot;, value_name=&quot;value&quot;)

# Pivot
print(stack.pivot(index=&apos;username&apos;, columns=&apos;var&apos;, values=&apos;value&apos;))

#### Summarize Data ####

inter_union.head() # Shows first 5 rows
inter_union.tail() # Shows last 5 rows
inter_union.columns # Column names
inter_union.index # Labels
inter_union.dtypes # Data types in each column
inter_union.shape # Number of rows/columns
inter_union.values # NP array
inter_union.info() # Summary

# Detailed stats
print(inter_union.describe())
print(inter_union.describe(include=&apos;all&apos;))

# Group stats
print(inter_union.groupby(&quot;username&quot;)[&quot;salary_x&quot;].mean())
print(inter_union.groupby(&quot;username&quot;).describe(include=&apos;all&apos;))

#### Column Selection ####

# Single column selection
inter_union[&apos;username&apos;]

# Multiple column selection
inter_union[[&apos;username&apos;, &apos;salary_x&apos;]]

#### Row Selection ####

df = inter_union.copy() # Create a copy
df.iloc[0] # 1st row

# Change the shape of data in specified region
for i in range(df.shape[0]):
  df.loc[i, &quot;salary_x&quot;] *= 10

#### Sort DataFrame ####

df.sort_values(by=&apos;salary_x&apos;, ascending=False)

#### Remove Duplicates ####

df = pd.concat([inter_union, inter_union.iloc[[0]]], ignore_index=True) # duplicate a row (append is deprecated)
print(df.duplicated())
df.duplicated().sum() # Dupe count
df[df.duplicated()] # Show dupes
df.duplicated([&apos;username&apos;, &apos;salary_x&apos;]).sum()
df = df.drop_duplicates() # Drop dupes

#### Find Missing Values ####

# Missing values in a Series
df.salary_x.isnull() # True if NaN
df.salary_x.notnull() # False if NaN
df[df.salary_x.notnull()] # only show rows where salary_x is not NaN
df.salary_x.isnull().sum() # count missing values

# Missing values in a DataFrame
df.isnull() # Shows as booleans
df.isnull().sum() # sum of each column

# Drop missing values
df.dropna()
df.dropna(how=&apos;all&apos;)

# Fill in values
df.salary_x.mean()
df = inter_union.copy()
df.loc[df.salary_x.isnull(), &quot;salary_x&quot;] = df[&quot;salary_x&quot;].mean()

</code></pre>
<h2 id="scikit-learn">Scikit-learn</h2>
<pre><code class="language-python">from sklearn import datasets
from sklearn.ensemble import ExtraTreesClassifier

# Load iris dataset
dataset = datasets.load_iris()
# Fit an ensemble model
model = ExtraTreesClassifier()
model.fit(dataset.data, dataset.target)
# Print the relative importance of each attribute
print(model.feature_importances_)
</code></pre>
<h2 id="tensorflow">TensorFlow</h2>
<pre><code class="language-python">import tensorflow as tf
from tensorflow.keras.models import Model
from tensorflow.keras.layers import Input, Dense, Dropout, BatchNormalization, Conv2D, Flatten, MaxPooling2D

&apos;&apos;&apos;
  Implement a CNN using TensorFlow&apos;s Functional API
&apos;&apos;&apos;
#### Load Data ####
cifar10 = tf.keras.datasets.cifar10

(x_train, y_train), (x_test, y_test) = cifar10.load_data()
x_train, x_test = x_train / 255.0, x_test / 255.0
y_train, y_test = y_train.flatten(), y_test.flatten()
print(&quot;x_train.shape:&quot;, x_train.shape)
print(&quot;y_train.shape&quot;, y_train.shape)

# Print number of classes
K = len(set(y_train))
print(&quot;number of classes:&quot;, K)

#### Build #####
i = Input(shape=x_train[0].shape)
x = Conv2D(32, (3, 3), activation=&apos;relu&apos;, padding=&apos;same&apos;)(i)
x = BatchNormalization()(x)
x = Conv2D(32, (3, 3), activation=&apos;relu&apos;, padding=&apos;same&apos;)(x)
x = BatchNormalization()(x)
x = MaxPooling2D((2, 2))(x)
x = Dropout(0.2)(x)
x = Conv2D(64, (3, 3), activation=&apos;relu&apos;, padding=&apos;same&apos;)(x)
x = BatchNormalization()(x)
x = Conv2D(64, (3, 3), activation=&apos;relu&apos;, padding=&apos;same&apos;)(x)
x = BatchNormalization()(x)
x = MaxPooling2D((2, 2))(x)
x = Dropout(0.2)(x)
x = Conv2D(128, (3, 3), activation=&apos;relu&apos;, padding=&apos;same&apos;)(x)
x = BatchNormalization()(x)
x = Conv2D(128, (3, 3), activation=&apos;relu&apos;, padding=&apos;same&apos;)(x)
x = BatchNormalization()(x)
x = MaxPooling2D((2, 2))(x)
x = Dropout(0.2)(x)

x = Flatten()(x)
x = Dropout(0.2)(x)
x = Dense(1024, activation=&apos;relu&apos;)(x)
x = Dropout(0.2)(x)
x = Dense(K, activation=&apos;softmax&apos;)(x)

model = Model(i, x)

#### Compile ####
model.compile(optimizer=&apos;adam&apos;,
              loss=&apos;sparse_categorical_crossentropy&apos;,
              metrics=[&apos;accuracy&apos;])


#### Fit ####
# Train on GPU!
f = model.fit(x_train, y_train, validation_data=(x_test, y_test), epochs=10)
</code></pre>
<h2 id="matplotlib">Matplotlib</h2>
<pre><code class="language-python">import matplotlib.pyplot as plt

#### Evaluation Plots ####

# Plot loss per epoch
plt.plot(f.history[&apos;loss&apos;], label=&apos;loss&apos;)
plt.plot(f.history[&apos;val_loss&apos;], label=&apos;val_loss&apos;)
plt.legend()
plt.show()

# Plot accuracy per epoch
plt.plot(f.history[&apos;accuracy&apos;], label=&apos;acc&apos;)
plt.plot(f.history[&apos;val_accuracy&apos;], label=&apos;val_acc&apos;)
plt.legend()
plt.show()

#### Save Figures ####

# Matplotlib infers the output format from the file extension
plt.plot(f.history[&apos;loss&apos;], label=&apos;loss&apos;)
plt.savefig(&quot;example.png&quot;)  # PNG
plt.savefig(&quot;example.svg&quot;)  # SVG
plt.savefig(&quot;example.pdf&quot;)  # PDF
plt.close()
</code></pre>
<h1 id="unraveling-terminology">Unraveling Terminology</h1>
<p>Before building a DL workflow, we want to define a few key terms: data, model, objective function, and optimizing algorithm. This breakdown is also found in the textbook <em>Dive into Deep Learning</em>, a text that I highly recommend.</p>
<h2 id="data">Data</h2>
<p>When we say <strong>data</strong>, we mean a set of points represented in a particular shape. We handle data more or less as <strong>vectors</strong>. Data comes as <strong>fixed-length vectors</strong>, where the constant length of the vectors is the dimensionality of the data, and as <strong>varying-length vectors</strong>. For varying-length vectors, imagine processing a text file where words and sentences have differing lengths. Deep learning models handle these varying-length vectors with more ease than typical machine learning models.</p>
<h2 id="model">Model</h2>
<p>When we say <strong>model</strong>, we mean the mechanism that ingests data of one type and outputs data of another type. This is generally the bones of our architecture, the way in which data are transformed, processed, and outputted. A decision tree algorithm or an ANN would be considered our model. Note that this is <em>not</em> our entire architecture; a model is just one part!</p>
<h2 id="objective-functions">Objective Functions</h2>
<p><strong>Objective functions</strong> are commonly used in optimization and minimization problems. The terminology is widely used in linear programming to define the goal of a model. Here, we want to improve the performance of a task iteratively. To improve learning a task, we need to know how well or how poorly our models perform. Therefore, we use objective functions to optimize the performance of a model.</p>
<p>We will encounter many different terms which generally mean the same thing. An objective function in the context of ML and DL is also known as <strong>loss functions</strong> or <strong>cost functions</strong>.</p>
<p>In <strong>regression</strong>, or predicting numerical values, we commonly use a loss function called <strong>squared error</strong>: the square of the difference between the predicted and the actual value. In <strong>classification</strong>, we try to minimize the <strong>error rate</strong>: the rate at which predictions differ from the actual values. To define a loss function, we want to consider the model&apos;s parameters and the dataset used.</p>
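<p>As a minimal sketch of these two measures (the predictions and labels below are made up):</p>

```python
import numpy as np

# Hypothetical regression predictions vs. actual values
y_true = np.array([3.0, 5.0, 2.5])
y_pred = np.array([2.5, 5.0, 4.0])

# Mean squared error: the average squared difference
mse = np.mean((y_true - y_pred) ** 2)

# Hypothetical classification predictions vs. actual labels
labels_true = np.array([0, 1, 1, 0])
labels_pred = np.array([0, 1, 0, 0])

# Error rate: the fraction of predictions that differ from the labels
error_rate = np.mean(labels_pred != labels_true)

print(mse, error_rate)
```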
<p>In relation to data and objective functions, we typically split a dataset into two sets: a training set to fit model parameters, and the test set (unseen data) to evaluate the model. We then see how the model performs on both sets. If the model performs well on the training set and generalizes well on unseen data, then we have a good model for our given problem.</p>
<h2 id="optimizing-algorithms">Optimizing Algorithms</h2>
<p>When the three base conditions are met (i.e. data, model, and objective function), we need to call or build an algorithm that will search for the best parameters to minimize the loss function. We typically use some variant of <strong>gradient descent</strong>. Gradient descent iteratively checks, for each parameter, how the training set loss would change if that parameter were perturbed a slight bit. It then updates the parameter in the direction that reduces the training set loss.</p>
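<p>A rough, self-contained sketch of gradient descent on a one-parameter squared-error loss (the toy data and learning rate are invented for illustration):</p>

```python
import numpy as np

# Toy data: y is roughly 2 * x
x = np.array([1.0, 2.0, 3.0, 4.0])
y = np.array([2.1, 3.9, 6.2, 7.8])

w = 0.0              # the single parameter to learn
learning_rate = 0.01

for _ in range(200):
    # Gradient of the mean squared error with respect to w
    grad = np.mean(2 * (w * x - y) * x)
    # Update the parameter in the direction that reduces training loss
    w -= learning_rate * grad

print(w)  # converges near 2
```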
<p>We will introduce <strong>stochastic gradient descent</strong> and <strong>stochastic weight averaging</strong> later in this guide.</p>
<h1 id="process-of-implementing-and-deploying-deep-learning-models">Process of Implementing and Deploying Deep Learning Models</h1>
<p>The basic structure of developing a model using any ML library will follow these steps:</p>
<ol>
<li>Understand the problem</li>
<li>Load the data</li>
<li>Instantiate/build the model</li>
<li>Train the model</li>
<li>Evaluate the model</li>
<li>Deploy the model</li>
</ol>
<p>These six steps will work in most deep learning approaches. Note that each approach will differ in steps one, two, and three. Latter steps will generally follow the same pattern for most implementations. Also, note that <em>models that do the same tasks have the same interface</em>.</p>
<p>Most generic deep learning approaches will generally follow this pseudocode:</p>
<pre><code class="language-python"># Load the data. Unsupervised algos don&apos;t use labels (no Y)
X, Y = load_data()

# Build the model
model = MakeModel()

# Train the model
model.fit(X, Y)

# Evaluate the model. Classification returns accuracy, regression returns R^2
model.score(X, Y)

# Use the model
model.predict(X)

# Save the model
model.save()
</code></pre>
<p>That&apos;s it? Kind of. In a simplistic, experimental implementation, yes, that is typically the extent of what it takes to train a single model. Granted, it will take more lines of code, but it&apos;s generally not more complicated than the example above. Conversely, in a peer-reviewed research capacity, or when working in a large codebase, it more than likely will be.</p>
<h1 id="understanding-the-problem">Understanding the Problem</h1>
<blockquote>
<p>&quot;In deep learning, we are often trying to solve optimization problems: maximize the probability assigned to observed data; minimize the distance between predictions and the ground-truth observations. Assign vector representations to items (like words, products, or news articles) such that the distance between similar items is minimized, and the distance between dissimilar items is maximized.&quot;<br>
<em>Dive Into Deep Learning</em></p>
</blockquote>
<p>Understanding a problem before implementing a solution should go without saying. If you don&apos;t understand the challenge at hand, let alone the journey ahead in your implementation, process errors can and likely will occur, resulting in a less than efficient workflow. Even so, with well-traveled problems, we still want to maintain an <strong>experimental mindset</strong> when understanding the problem. Issues arise when a poorly tested implementation is scaled. Experiment as much as you can at the start of your exploration, and in a productized business setting, avoid shipping poorly tested solutions in your release.</p>
<p>The most pertinent aspect of understanding your problem is selecting the model. When inputs are already curated and desired outputs are known, the model falls into place quickly. However, when breaking new ground, you&apos;ll generally have to decide whether it is a classification, regression, association, clustering, and/or reinforcement learning problem. One model will work in most cut-and-dried prediction problems (e.g. given age, experience, and location, suggest a salary via multivariate linear regression). Though, in most ambitious projects and established infrastructures, multiple models and algorithms are used, either in conjunction or stepwise, to achieve the desired result or a series of results.</p>
<p>There are a few handy ways to select which category of models we want to experiment with. First, we want to understand:</p>
<ul>
<li>How much data do we have?</li>
<li>Do the data contain predefined labels, are they unlabeled, or mixed?</li>
<li>Are the desired targets continuous (i.e. any value within a range) or categorical (i.e. qualitative representations)?</li>
</ul>
<p>In selecting the model, there is no clear set of rules, only guidelines. Therefore, testing and implementing new approaches is paramount. Once you decide on the type of problem you&apos;re solving, it may be helpful to explore different models within your problem scope locally and cross-validate the results to see which model is most performant.</p>
<blockquote>
<p>Thus we can decompose the task of fitting models into two key concerns: i) optimization: the process of fitting our models to observed data; ii) generalization: the mathematical principles and practitioners&apos; wisdom that guide as to how to produce models whose validity extends beyond the exact set of data examples used to train them.<br>
<em>Dive into Deep Learning</em></p>
</blockquote>
<p><img src="https://alexfm.net/content/images/2021/09/Choose_Model.png" alt="The Deep Learning Cookbook, Part 1 - A Geometric Intuition of Data" loading="lazy"></p>
<p>We may be able to simplify model selection with a more basic heuristic given the data we are working with. According to fast.ai, we can generally approach a problem in two ways:</p>
<ol>
<li>Use <strong>ensembles of decision trees</strong> (i.e., random forests and gradient boosting machines), mainly for <strong>structured data</strong></li>
<li>Use <strong>ANNs with Stochastic Gradient Descent</strong> (i.e., shallow and/or deep learning), mainly for <strong>unstructured data</strong> (like audio, images, and natural language)</li>
</ol>
<h2 id="tips-for-learning-deep-learning-online">Tips For Learning Deep Learning Online</h2>
<p>A lot of online DL resources either oversimplify and teach the bare minimum of what this technology can do or throw a bunch of models, algorithms, and architectures at you and hope you know what you are doing. To get the most out of DL, read research papers, work on codebases leveraging DL, and focus on a select few models, algorithms, and architectures.</p>
<p>Another thing to be aware of: try not to box yourself in! The beautiful thing about programming and DL is that there is more than one way to do something, and given the ever-changing nature of DL, well-established architectures are constantly improved upon. Keep an open mind to your problem, and experiment as much as possible. Try not to get bogged down in the minute details of perfecting your dataset. Your time is better spent implementing solutions and evaluating their performance. This way, we may build intuition and solve problems more effectively.</p>
<p>A decent way to learn is to find a good general resource to learn the fundamentals while actively testing your knowledge via programming solutions. Whether it is a textbook or an online resource like fast.ai, learn the fundamentals. Once you learn the fundamentals, divide and conquer. Go deeper on models, objective functions, optimization algorithms, and architectures. Find the seminal research papers and novel research papers and review their findings. Then, once you have a grasp of DL, work on your own project or contribute to an open-source project, and publish your results.</p>
<p>One last thing: the choice between TensorFlow and PyTorch is largely trivial. Use one, and switch if another framework is needed. Frameworks are typically relatively easy to pick up once you learn one well enough.</p>
<h1 id="understanding-the-data-and-loading-it">Understanding the Data and Loading it</h1>
<h2 id="geometric-intuition-of-data">Geometric Intuition of Data</h2>
<p>You may think of data as collections of uniform or non-uniform data types. A blob of numbers. A series of columns and rows containing records. I want to shift this intuition to a geometrical understanding of data. In DL, comprehending data as geometry is more apt.</p>
<p>Think of a point as zero-dimensional, a line as one-dimensional, a flat shape as two-dimensional, and a cube as three-dimensional. In two-dimensional (i.e. two-variable) and three-dimensional (i.e. three-variable) linear regression, the data is distributed closest to a subspace with one fewer dimension than the whole space: a one-dimensional line in the two-dimensional case, and a two-dimensional plane in the three-dimensional case. As we increase dimensionality, we speak of the model&apos;s <strong>co-dimensionality</strong>; the fitted model&apos;s dimensionality is the space&apos;s dimensionality minus 1. For example, a two-dimensional linear regression (i.e. two variables) fits a one-dimensional line, where x and y are related by a one-dimensional shape. To learn more about this, I highly recommend the blog Shape of Data.</p>
<p>We may transfer <em>some</em> of this intuition to <strong>tensors</strong> and generalizations of tensors in N-dimensional space. More precisely, the dimensions of a tensor are the number of axes in a given tensor. The dimensionality of an axis in a given tensor will be the length of that axis.</p>
<ul>
<li><strong>Scalar</strong>: zero-dimensional tensors with zero axes.
<ul>
<li>This is our point. Analogous to a record or a datum.</li>
</ul>
</li>
<li><strong>Vectors</strong>: one-dimensional tensors with one axis.
<ul>
<li>This is our line. Analogous to a column or a row.</li>
<li>First-order tensor.</li>
</ul>
</li>
<li><strong>Matrices</strong>: two-dimensional tensors with two axes.
<ul>
<li>This is our flat shape or plane. Analogous to a table.</li>
<li>Second-order tensor.</li>
</ul>
</li>
<li><strong>Tensors</strong>: n-dimensional arrays w/ an arbitrary number of axes.</li>
</ul>
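<p>This hierarchy can be checked directly with NumPy, where <code>ndim</code> is the number of axes and <code>shape</code> gives the length of each axis:</p>

```python
import numpy as np

scalar = np.array(5.0)              # zero axes: a point
vector = np.array([1.0, 2.0, 3.0])  # one axis: a line
matrix = np.ones((2, 3))            # two axes: a table
tensor = np.ones((2, 3, 4))         # three axes and beyond: a tensor

for t in (scalar, vector, matrix, tensor):
    print(t.ndim, t.shape)
```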
<h2 id="scalars-vectors-matrices-and-tensors">Scalars, Vectors, Matrices, and Tensors</h2>
<p>A <strong>scalar</strong> is a single numerical value with no axes. A <strong>vector</strong> is an array of scalar values; a vector <em>x</em> consists of <em>n</em> scalars. The length of a vector, i.e. the number of scalars along its single axis, is called the <strong>dimension</strong> of the vector. <strong>Matrices</strong> have <em>m</em> rows and <em>n</em> columns of scalar values.</p>
<p><img src="https://alexfm.net/content/images/2021/09/Supervised_ML_Dimensional.png" alt="The Deep Learning Cookbook, Part 1 - A Geometric Intuition of Data" loading="lazy"><br>
Image Source: <a href="https://hackernoon.com/supervised-machine-learning-dimensional-reduction-and-principal-component-analysis-614dec1f6b4c">HackerNoon</a></p>
<h2 id="shape-of-data">Shape of Data</h2>
<p>In the context of the shape of data, imagine a rectangle (HxL) and associate it with tabular data (NxD). As the dimensionality of our data increases, envision a higher-dimensional shape (e.g. a cube for time series data in the shape of NxTxD). As you might expect, it is enormously difficult to visualize higher-dimensional space, let alone process it. That is where deep learning comes in handy.</p>
<p><strong>Tabular data</strong> is of shape NxD, or the number of samples by the number of features. <strong>Time series</strong> data are NxTxD, or samples by sequence length by features. <strong>Images</strong> are typically NxHxWxC or samples by height by width by a stacked color channel (red, green, blue).</p>
<p>With some exceptions (e.g. CNNs such as VGG, which consume image tensors directly), many neural networks expect input data in the shape of NxD. Therefore, we cannot simply load data into a neural network and expect a usable result. Necessary data transformations usually need to occur before loading in your input data.</p>
<p><strong>Data Shape</strong></p>
<ul>
<li><strong>Tabular</strong>: NxD</li>
<li><strong>Images</strong>: NxHxWxC</li>
<li><strong>Time series</strong>: NxTxD</li>
</ul>
<p><img src="https://alexfm.net/content/images/2021/09/Rank_Tensors.png" alt="The Deep Learning Cookbook, Part 1 - A Geometric Intuition of Data" loading="lazy"><br>
Image Source: <a href="https://mc.ai/tensors%E2%80%8A-%E2%80%8Arepresentation-of-data-in-neural-networks/">MC.ai</a></p>
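<p>To make the three shapes concrete, here are dummy arrays with invented sizes (N=100 samples, D=8 features, T=20 time steps, 32x32 RGB images):</p>

```python
import numpy as np

N, D, T, H, W, C = 100, 8, 20, 32, 32, 3

tabular = np.zeros((N, D))        # samples x features
timeseries = np.zeros((N, T, D))  # samples x sequence length x features
images = np.zeros((N, H, W, C))   # samples x height x width x color channels

print(tabular.shape, timeseries.shape, images.shape)
```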
<h2 id="continuous-and-categorical-variables">Continuous and Categorical Variables</h2>
<p><strong>Continuous variables</strong> are numerical data that can be fed directly into a model; they are ready to be added and multiplied as-is. <strong>Categorical variables</strong> contain a number of discrete levels whose meaning is not numerically apparent (e.g. color). We usually need to encode these variables meaningfully. There are many ways to transform categorical variables; three that are generally used are:</p>
<ol>
<li><strong>Integer Encoding</strong>: Each unique label is mapped to an integer.</li>
<li><strong>One-Hot Encoding</strong>: Each label is mapped to a binary vector of 0s with a single 1 marking the category.</li>
<li><strong>Learned Embedding</strong>: A learned distributed representation of categories.</li>
</ol>
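<p>A minimal sketch of the first two encodings using pandas (the color values are made up):</p>

```python
import pandas as pd

colors = pd.Series(['red', 'green', 'blue', 'green'])

# Integer encoding: each unique label -> an integer code
int_encoded = colors.astype('category').cat.codes
print(int_encoded.tolist())  # categories are ordered alphabetically

# One-hot encoding: each label -> a binary indicator vector
onehot = pd.get_dummies(colors)
print(onehot)
```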
<h2 id="data-augmentation">Data Augmentation</h2>
<p><strong>Data augmentation</strong> is the process of creating random variations in our input data in order to create more data to train on, though the process can be expensive and time-consuming. To augment image data, we may rotate, flip, change contrast, crop, or color-shift, among other things. We typically use data augmentation in image-based tasks.</p>
<h2 id="structured-and-unstructured-data">Structured and Unstructured Data</h2>
<p><strong>Structured data</strong> is readily available, searchable, and typically formatted in a SQL database with a well-defined schema and data dictionary. Comprising fields and records, these data tend to be complete, with little to no missing values, and duplicates are generally handled.</p>
<p><strong>Unstructured data</strong> is raw data with no associated schema or format. This can be video, image, audio, or text files. These data have not been processed and may be processed further. We commonly see these data in a NoSQL database.</p>
<p><strong>Semi-structured data</strong> consists of tagged and labeled data that has some format to it. Typically we see JSON and XML files fall into this category.</p>
<h2 id="splitting-data">Splitting Data</h2>
<p>In machine learning tasks, we split our data into at least two different sets, a <strong>training set</strong> and a <strong>test set</strong>. It is also good practice to prepare a <strong>validation set (or a hold-out set)</strong>. We do this for the sake of <strong>model selection</strong>, and after model selection, we simply train and test the model that is most performant. The model selection process can be skipped sometimes, but it&apos;s good practice to keep this habit. At times, a model we think will be performant on a set of data doesn&apos;t pan out, and we need to test multiple methodologies and choose the best performing model given a set of points.</p>
<p>Furthermore, consistently poorly performing models will necessitate us to challenge our assumptions about our data. Here is a checklist:</p>
<ul>
<li>Is there enough data?</li>
<li>Am I handling anomalies sufficiently?</li>
<li>Is the set in a proper format?</li>
<li>Are there too many features? Too few?</li>
</ul>
<p>The training set is what we train our model on, the sample of data used to fit the model. The validation set is used to evaluate how our model fits the training set and to fine-tune our model&apos;s hyperparameters. A validation set does not directly impact our model; we use it as a measure to adjust the model accordingly. The test set is the sample of data we use to see how well the fitted model performs on data it hasn&apos;t seen.</p>
<blockquote>
<p><strong>Hyperparameters</strong> are the properties of the learning algorithm and include architecture-related features, the regularization parameter, number of iterations, step size, etc.<br>
Source: <a href="https://stanford.edu/~shervine/teaching/cs-221/cheatsheet-reflex-models">Stanford ML Cheatsheet</a></p>
</blockquote>
<p>There are two methods we can take to split our dataset. First, set aside a percentage of our data and split it into three sets. Alternatively, we can split our dataset into two sets (train and test), then take a percentage of the train set for our validation set. The split percentage will depend on our model and the number of data points we have. Our validation set will depend on how many hyperparameters we need to tune (i.e. more hyperparameters = larger validation set). Training set splits are largely dependent on the type of model we use. We will figure out a good ratio when we begin to talk about <strong>cross-validation</strong>. If we go with the latter method, we could try an 80/20 split in favor of training, then take a percentage of the training set for validation. In this cross-validation process, we use our training set to create multiple splits of the train and validation sets, and then train and evaluate these sets iteratively.</p>
<p><img src="https://alexfm.net/content/images/2021/09/Dataset_Splitting.png" alt="The Deep Learning Cookbook, Part 1 - A Geometric Intuition of Data" loading="lazy"><br>
Image Source: <a href="https://www.kdnuggets.com/2020/05/dataset-splitting-best-practices-python.html">KDNuggets</a></p>
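<p>Under the latter method, a sketch of carving a validation set out of the training portion with scikit-learn (the split ratios are illustrative):</p>

```python
import numpy as np
from sklearn.model_selection import train_test_split

X = np.arange(100).reshape(50, 2)  # 50 dummy samples, 2 features
y = np.arange(50)

# First split: 80/20 train/test
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=0)

# Second split: carve 25% of the training portion off as a validation set
X_train, X_val, y_train, y_val = train_test_split(
    X_train, y_train, test_size=0.25, random_state=0)

print(len(X_train), len(X_val), len(X_test))  # 30 10 10
```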
<p><strong>Training set</strong>: A sample of data used to fit the model.</p>
<ul>
<li>Used to learn and adjust the weights of a neural network.</li>
</ul>
<p><strong>Validation set</strong>: A sample of data used to evaluate the model during training that does not directly impact the model.</p>
<ul>
<li>Use results to tune hyperparameters like the number of hidden units, or learning rate.</li>
</ul>
<p><strong>Test set</strong>: A sample of data used to test how well the model fits the data.</p>
<ul>
<li>Check to see if the model is underfitting or overfitting.</li>
</ul>
<h2 id="cross-validation">Cross-Validation</h2>
<p><strong>Cross-validation</strong> is a method used for model selection that shifts the onus away from relying solely on a single initial training split. Essentially, we take a dataset, split it into several folds, and train and evaluate the model across those folds. Two methods we may try are <strong>K-Fold Cross-Validation</strong> and <strong>Leave-P-Out Cross-Validation</strong>.</p>
<p><strong>K-Fold Cross Validation</strong>: Trains on k-1 folds of data and evaluates the model on the remaining fold.</p>
<p><strong>Leave-P-Out Cross-Validation</strong>: Trains on <em>n-p</em> observations and evaluates the model on the <em>p</em> remaining ones.</p>
<p>We will typically use k-fold cross-validation. This way we may split the training data into however many folds we need to train the model on. Then we validate the model on one of the folds, all the while training the model on the other k-1 folds. During this process, we are able to derive the average error over the k folds. This is called <strong>cross-validation error</strong>.</p>
<p><img src="https://alexfm.net/content/images/2021/09/K-Folds.png" alt="The Deep Learning Cookbook, Part 1 - A Geometric Intuition of Data" loading="lazy"><br>
Image Source: <a href="https://stanford.edu/~shervine/teaching/cs-221/cheatsheet-reflex-models">Stanford ML Cheatsheet</a></p>
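<p>A sketch of k-fold cross-validation with scikit-learn, which trains on k-1 folds, validates on the remaining one, and averages the scores over the folds:</p>

```python
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

X, y = load_iris(return_X_y=True)

# 5-fold cross-validation: one accuracy score per fold
scores = cross_val_score(LogisticRegression(max_iter=1000), X, y, cv=5)

# Average score over the k folds
print(scores.mean())
```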
<h2 id="underfitting-and-overfitting">Underfitting and Overfitting</h2>
<p>This part could live in the &quot;Evaluating the Model&quot; section, but I believe understanding the relationship between data and fitting models before evaluation is important. Rest assured, I will bring this topic back into play in the &quot;Evaluating the Model&quot; section, and hopefully it will make more sense by that point.</p>
<p><strong>Underfitting</strong> is when a model misses relevant relations among features and target outputs. In other words, it misses the mark entirely. Typically, we will see the following symptoms:</p>
<ul>
<li>Training error that is generally high and close to the testing/validation error</li>
<li>High bias</li>
</ul>
<p>To remedy an <strong>underfit</strong> model, we may:</p>
<ol>
<li>Add more features to the model</li>
<li>Train on more epochs</li>
</ol>
<p><strong>Overfitting</strong> is when a model learns the training data too well. The model understands random fluctuations and noise in the training data as valid and does not generalize to new data well. In classification and regression, an overfitted model looks like a Picasso painting, attempting to fit/learn the samples in the training data verbatim. It will perform well on the training set, and poorly on the test set. Not good! Typically, we will see the following symptoms:</p>
<ul>
<li>Abnormally low training error, and high test/validation error</li>
<li>High variance</li>
</ul>
<p>To remedy an <strong>overfit</strong> model, we may:</p>
<ol>
<li>Perform regularization</li>
<li>Produce more data</li>
<li>Reduce complexity</li>
</ol>
<p><img src="https://alexfm.net/content/images/2021/09/Underfitting_Overfitting.png" alt="The Deep Learning Cookbook, Part 1 - A Geometric Intuition of Data" loading="lazy"><br>
Image Source: <a href="https://stanford.edu/~shervine/teaching/cs-229/cheatsheet-machine-learning-tips-and-tricks">Stanford ML</a></p>
<h2 id="bias-and-variance">Bias and Variance</h2>
<p><strong>Bias</strong> is a type of error that is the difference between a model&apos;s predicted values and the correct values. Meaning, it either misses the mark or fits it too closely. In biased models, assumptions about the data are made that are false or simply not useful, therefore, some parts of the model or the model itself are not suitable for the data used. <strong>Low bias</strong> indicates that the predicted values are close to the actual ones. <strong>High bias</strong> is when the predicted values are far off from the actual values, therefore, giving a high error in training and testing.</p>
<p><strong>High bias, low variance algorithms</strong> are consistent but inaccurate. The algorithm <strong>underfits the data</strong> because it misses the complexity and breadth of the data, making assumptions about the data that do not hold well. In other words, the model is too simple.</p>
<p><strong>Variance</strong> is the sensitivity to the variations in training data and demonstrates the spread of the data. <strong>High variance</strong> fits random noise and unimportant relationships in the data, therefore, it cannot generalize and make predictions on new data. <strong>Low variance</strong> fits important relationships in the data, and generalizes well. Variance in good models should be minimized when possible.</p>
<p><strong>High variance, low bias algorithms</strong> are accurate but inconsistent. <strong>Overfitting</strong> occurs when the algorithm is too complex and fits noise in the data.</p>
<p><strong>Trade-off between bias and variance</strong>: When more variables are added to a complex model, you&apos;ll lose bias but gain variance. When simplifying it, you&apos;ll lose variance but gain bias. A model can&apos;t be both simple and complex at the same time, therefore, there is a tradeoff. We typically cannot minimize both.</p>
<ul>
<li><strong>Bias-variance decomposition</strong>: Decomposes the learning error of any algorithm into the sum of the bias, the variance, and the irreducible error due to noise in the dataset.</li>
</ul>
<h2 id="handling-missing-data-and-imbalanced-data">Handling Missing Data and Imbalanced Data</h2>
<p>Handling missing or imbalanced data will be necessary in most implementations using structured tabular data or labeled data. There are a few techniques we may implement to remedy an imbalanced dataset or missing values in your dataset.</p>
<h3 id="handling-missing-data">Handling Missing Data</h3>
<p>When you handle missing data, be conscious of how much data is missing. Generally, if you&apos;re missing over 50 percent of your data, then you should get better data or explore more intensive data handling methods (e.g. remove data-impoverished columns/rows). In the <a href="#pandas">Pandas</a> coding section of this guide, you&apos;ll find how to handle missing values and, if desired, add a placeholder value or fill the missing values with the mean value of the column.</p>
<p><strong>Data Imputation</strong>: If data is missing, add a value where the data is missing. For categorical data, add a new category (e.g. missing), and for numerical add an indicator value to show it is missing (e.g. 0).</p>
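<p>A sketch of both imputation strategies with pandas (the toy table is invented):</p>

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    'age': [25.0, np.nan, 40.0, 31.0],
    'color': ['red', None, 'blue', 'red'],
})

# Numerical column: fill missing values with the column mean
df['age'] = df['age'].fillna(df['age'].mean())

# Categorical column: add a new 'missing' category
df['color'] = df['color'].fillna('missing')

print(df)
```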
<h3 id="handling-imbalanced-data">Handling Imbalanced Data</h3>
<p><strong>Balanced datasets</strong> are pertinent when we split data into classes. Typically a balanced dataset is when there is an about equal distribution of positive examples to negative examples. An <strong>imbalanced dataset</strong> is when we have either an overrepresented class or an underrepresented class in our data. Generally, imbalanced datasets are not the worst thing. They are problematic when we run into performance issues.</p>
<p><strong>Random Undersampling</strong>: Gets rid of the overrepresented class from the training data. Used when there is a lot of data in an underrepresented class.</p>
<p><strong>Random Oversampling</strong>: Takes the underrepresented class and samples with replacement data until we get the required ratio. Used when there is not a lot of data in underrepresented/smaller classes.</p>
<ul>
<li><strong>Synthetic Minority Oversampling Technique (SMOTE)</strong>: Synthesizes new data with minor changes to existing samples.</li>
</ul>
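<p>A sketch of random oversampling with scikit-learn&apos;s <code>resample</code> (the class sizes are invented):</p>

```python
import numpy as np
from sklearn.utils import resample

# Imbalanced toy data: 8 negative samples, 2 positive samples
X = np.arange(10).reshape(-1, 1)
y = np.array([0] * 8 + [1] * 2)

X_minority, y_minority = X[y == 1], y[y == 1]

# Sample the underrepresented class with replacement until classes balance
X_up, y_up = resample(X_minority, y_minority,
                      replace=True, n_samples=8, random_state=0)

X_balanced = np.vstack([X[y == 0], X_up])
y_balanced = np.concatenate([y[y == 0], y_up])
print(np.bincount(y_balanced))  # [8 8]
```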
<p><strong>Ensemble modeling</strong> tends to work well with imbalanced data since the aggregation used in some of these techniques tends to mitigate the overfitting of a specific class.</p>
<h3 id="datasets">Datasets</h3>
<p>When testing and experimenting with an implementation, you may want to train your model on a particular dataset to see how it performs on a task. I compiled a list of sites that contain useful datasets requiring varying degrees of preprocessing and web scraping. If you know what you want, check out one of the dataset aggregators or search engines. Google&apos;s dataset search or Kaggle are great places to start.</p>
<p>You may also access datasets prepackaged within TensorFlow. These sets are preprocessed and ready to be used. Most of the time, you just need to get started and focus on experimenting with model design. Data preparation will eat away at your productive hours without mercy. What&apos;s most important is that you get something running and you test and compare a few different versions.</p>
<p><strong>Dataset aggregators and search</strong></p>
<ul>
<li><a href="https://snap.stanford.edu/data/">Stanford Snap</a></li>
<li><a href="https://www.kaggle.com/datasets">Kaggle</a></li>
<li><a href="https://msropendata.com/">Microsoft Open Data</a></li>
<li><a href="https://datasetsearch.research.google.com/">Google Dataset Search</a></li>
<li><a href="https://guides.library.cmu.edu/machine-learning/datasets">Carnegie Mellon University Datasets</a></li>
<li><a href="https://www.visualdata.io/discovery">VisualData</a></li>
<li><a href="https://archive.ics.uci.edu/ml/datasets.php?format=&amp;task=&amp;att=&amp;area=&amp;numAtt=&amp;numIns=&amp;type=&amp;sort=dateDown&amp;view=table">Center for ML and Intelligent Systems</a></li>
<li><a href="https://www.mldata.io/datasets/">MLData</a></li>
</ul>
<p><strong>Visual data</strong></p>
<ul>
<li><a href="https://cocodataset.org/#overview">Coco Dataset</a></li>
<li><a href="http://image-net.org/">ImageNet</a></li>
<li><a href="http://xviewdataset.org/#dataset">xVIEW</a></li>
<li><a href="https://ai.googleblog.com/2016/09/introducing-open-images-dataset.html">Open Images Dataset</a></li>
<li><a href="https://waymo.com/open/">Waymo</a></li>
<li><a href="https://research.google.com/youtube8m/">YouTube-8M</a></li>
<li><a href="https://data.vision.ee.ethz.ch/cvl/rrothe/imdb-wiki/">IMDB-WIKI</a></li>
</ul>
<p><strong>Text data</strong></p>
<ul>
<li><a href="https://www.gutenberg.org/">Gutenberg</a></li>
<li><a href="https://snap.stanford.edu/data/web-Amazon.html">Amazon Reviews</a></li>
<li><a href="https://hotpotqa.github.io/">HotpotQA</a></li>
<li><a href="https://dumps.wikimedia.org/">Wiki Dumps</a></li>
</ul>
<h3 id="code-snippet-dataset-splitting-in-scikit-learn">Code Snippet: Dataset Splitting in Scikit-Learn</h3>
<pre><code class="language-python">from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split

# Load a small sample dataset so the snippet runs as-is
X, y = load_iris(return_X_y=True)

# Train/test set split using sklearn: 70% train, 30% test;
# random_state makes the shuffle reproducible
X_train, X_test, y_train, y_test = train_test_split(X, y, train_size=0.70, random_state=40)

print(f&quot;Train labels:\n{y_train}&quot;)
print(f&quot;Test labels:\n{y_test}&quot;)
</code></pre>
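One caveat worth adding (a hedged sketch using toy labels rather than a real dataset): when classes are imbalanced, passing `stratify=y` to `train_test_split` preserves the class proportions in both splits:

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Toy labels: 90 zeros and 10 ones, a 9:1 imbalance
y = np.array([0] * 90 + [1] * 10)
X = np.arange(100).reshape(-1, 1)

# stratify=y keeps the 9:1 class ratio in both the train and test subsets
X_train, X_test, y_train, y_test = train_test_split(
    X, y, train_size=0.70, stratify=y, random_state=40)

print(np.bincount(y_train))  # 63 zeros, 7 ones
print(np.bincount(y_test))   # 27 zeros, 3 ones
```

Without stratification, a small test set can end up with too few (or zero) minority-class samples, which skews evaluation.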
<h3 id="code-snippet-loading-data-tensorflow-code-example">Code Snippet: Loading Data TensorFlow Code Example</h3>
<pre><code class="language-python">&apos;&apos;&apos;
  This example demonstrates how to load and 
  process the MNIST dataset to be use in an ANN.

  Using the datasets provided by TensorFlow, they typically do the heavy 
  data preprocessing for you.
&apos;&apos;&apos;
import tensorflow as tf

# Load MNIST dataset
mnist = tf.keras.datasets.mnist

# Define MNIST train and test sets as tuples
(x_train, y_train), (x_test, y_test) = mnist.load_data()

# For image recognition using an ANN, normalize pixel values to [0, 1] by dividing by 255.0
x_train, x_test = x_train / 255.0, x_test / 255.0

# Prints the shape of the x_train set
print(&quot;x_train.shape:&quot;, x_train.shape)
</code></pre>
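For intuition about what the division by 255.0 does, here is a framework-agnostic sketch in pure NumPy (the random array is a stand-in for MNIST, not real data): pixel intensities are scaled from [0, 255] to [0, 1], and each 28x28 image is flattened into a 784-dimensional vector, the usual input shape for a dense ANN:

```python
import numpy as np

rng = np.random.default_rng(0)

# Stand-in for MNIST: 64 grayscale images, 28x28 uint8 pixels each
x = rng.integers(0, 256, size=(64, 28, 28)).astype(np.uint8)

# Scale pixel intensities from [0, 255] down to [0.0, 1.0]
x_scaled = x.astype(np.float32) / 255.0

# Flatten each image into a 784-dimensional vector for a dense layer
x_flat = x_scaled.reshape(len(x_scaled), -1)

print(x_flat.shape)  # (64, 784)
```

Keeping inputs in a small, consistent range like [0, 1] generally helps gradient-based training converge.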
<h1 id="note">Note</h1>
<p>The first edition of <em>The Deep Learning Cookbook</em> is part one of a multi-part series on deep learning. The next parts will include building and compiling models, training models, evaluating models, and deploying models. Thank you kindly for reading this. If you have any suggestions or corrections, please let me know via email. I originally published <em>The Deep Learning Cookbook</em> on <a href="https://alexfm.net/">Alex FM</a>.</p>
<h1 id="references">References</h1>
<ol>
<li>Dive into Deep Learning, Aston Zhang, Zachary C. Lipton, Mu Li, and Alexander J. Smola. 2020.</li>
<li>Fast.ai, <a href="https://course.fast.ai/start_colab">https://course.fast.ai/start_colab</a></li>
<li>Stanford Machine Learning, <a href="https://stanford.edu/~shervine/teaching/cs-229/cheatsheet-machine-learning-tips-and-tricks">https://stanford.edu/~shervine/teaching/cs-229/cheatsheet-machine-learning-tips-and-tricks</a></li>
<li>Shape of Data, <a href="https://shapeofdata.wordpress.com/about/">https://shapeofdata.wordpress.com/about/</a></li>
<li>TensorFlow API, <a href="https://www.tensorflow.org/api_docs/python/tf/keras/Model">https://www.tensorflow.org/api_docs/python/tf/keras/Model</a></li>
<li>The Lazy Programmer, <a href="https://github.com/lazyprogrammer">https://github.com/lazyprogrammer</a></li>
</ol>
<!--kg-card-end: markdown-->]]></content:encoded></item><item><title><![CDATA[Analysis of Denver's 2019 DPS Director District 1 Election]]></title><description><![CDATA[This report highlights Denver’s Coordinated Election on November 5th, 2019, particularly concerning Dr. Radhika Nath’s campaign for DPS Director in District 1. ]]></description><link>https://alexfm.net/analysis-of-denvers-dps-election-2019/</link><guid isPermaLink="false">61352d04816b83196128e61c</guid><category><![CDATA[Deep Dive]]></category><category><![CDATA[Blog]]></category><dc:creator><![CDATA[Alex]]></dc:creator><pubDate>Sun, 05 Sep 2021 21:43:55 GMT</pubDate><media:content url="https://alexfm.net/content/images/2021/09/DPS_Coordinated_Election_Denver_2019.jpeg" medium="image"/><content:encoded><![CDATA[<img src="https://alexfm.net/content/images/2021/09/DPS_Coordinated_Election_Denver_2019.jpeg" alt="Analysis of Denver&apos;s 2019 DPS Director District 1 Election"><p>This report highlights Denver&#x2019;s Coordinated Election on November 5th, 2019, particularly concerning Dr. Radhika Nath&#x2019;s campaign for DPS Director in District 1. I demonstrate parts of an approach to engaging voters in a targeted manner, using highly curated data and social media. This report is a hybrid: I combine in-depth analysis and personal recommendations to provide strategic insight for political analysts and campaign managers, or to satisfy the curiosity of a Denverite.</p><p>Within this report, I examine the voter demographics in DPS District 1, expand on the DPS District 1 School Board Director race in November 2019, review opponent research, and dissect our campaign strategy. I analyze the outcomes of our campaign&#x2019;s tactics and examine their general efficacy. Alongside the opponent research, I list my opponents&#x2019; contributions and major expenditures. 
Lastly, I analyze the digital efforts made by our campaign on Facebook and Instagram, and examine the results.</p><h2 id="methodology"><strong>Methodology</strong></h2><hr><p><strong>Data Preparation</strong></p><p>VoteBuilder permits basic filtering and data handling. In order to truly target based on the campaign&#x2019;s needs, I distilled the data in VoteBuilder and wrangled subsets of the data. Aggregating the voting population of DPS District 1 in Denver, I created subsets of the VoteBuilder master dataset containing 100,343 registered voters. I stratified these voters into three strata: Highly Likely Voters, Likely Voters, and Inactive Voters. I segmented each stratum by voting habits in local elections and characteristics like age and permanent residency. Targeting our resources wisely, I used the finances at my disposal on engaging Highly Likely Voters via Facebook, Instagram, and traditional canvassing.</p><h2 id="the-voting-landscape-denver-public-school-district-1"><strong>The Voting Landscape: Denver Public School District 1</strong></h2><hr><p><strong>DPS District 1 Voters</strong></p><p>DPS District 1 contains 76 precincts. Approximately 100,343 registered voters reside there. District 1 includes all of House District (HD) 6, HD 4, and a small portion of HD 7 and HD 10. At current estimates, unaffiliated voters outnumber all other political affiliations at 40,874 registered voters in DPS District 1. There are 40,262 registered Democrats, 17,527 Republicans, 339 Greens, and 1,341 Libertarians.</p>
A voter is inactive if their mail-in ballot forwards from a given address or if they have failed to vote in the last four years.</p><p>For a campaign with limited resources and, most importantly, limited time, political operations in Colorado should not focus their efforts on inactive voters or selective voters (those who vote in presidential elections only). The voter demographics to pay attention to are 1. Highly Likely Voters, 2. Likely Voters, and 3. Highly Likely Late Voters.</p><p><strong>Anticipated Voter Turnout</strong></p><p>Anticipated voter turnout for the November 5th, 2019 Coordinated Election in Denver is 35.9 percent. In DPS District 1, averaging voter turnout across all relevant local elections in the last four years yields 35,923 people. The active voter population in DPS District 1 is an estimated 87,191 people, and the voters who participated in five relevant local elections in the last four years accounted for 35,321 people, or 40.51 percent of active registered voters. 
In light of this, I created a benchmark that consists of the voter turnout from the following five Denver elections:</p><ul><li>June 4th, 2019 Municipal Run-Off</li><li>May 7th, 2019 Municipal General</li><li>November 7th, 2017 Coordinated Election</li><li>November 3rd, 2015 Coordinated Election</li><li>May 5th, 2015 Municipal General</li></ul><h3 id="insight-1">Insight #1</h3><figure class="kg-card kg-image-card"><img src="https://alexfm.net/content/images/2021/09/1_insight_1.png" class="kg-image" alt="Analysis of Denver&apos;s 2019 DPS Director District 1 Election" loading="lazy" width="973" height="191" srcset="https://alexfm.net/content/images/size/w600/2021/09/1_insight_1.png 600w, https://alexfm.net/content/images/2021/09/1_insight_1.png 973w" sizes="(min-width: 720px) 720px"></figure><h2 id="historical-voter-turnout">Historical Voter Turnout</h2><hr><p>Given the date of this report, I included the latest data from the November 5th, 2019 Coordinated Election, in which approximately 40.72 percent of all registered voters and 46 percent of all active voters participated. Average turnout across the benchmark elections is lower than the actual November 5th turnout. The data suggest that voter turnout is rising, a trend that may continue in the elections to come.</p><p>The actual turnout for the November 5th Coordinated Election is approximately five points higher than the prediction. In the upcoming Coloradan elections, I anticipate voters will vote in higher numbers than in previous local elections. 
I suspect that this increase is partly due to voters acclimating to Denver&apos;s voting system, and voters realizing the importance of participating in local elections.</p><p><strong>Historical Voter Turnout in DPS District 1 &#x2013; Total D1 Population<a href="#_ftn2">[2]</a></strong></p><figure class="kg-card kg-image-card"><img src="https://alexfm.net/content/images/2021/09/2-table-1.png" class="kg-image" alt="Analysis of Denver&apos;s 2019 DPS Director District 1 Election" loading="lazy" width="973" height="753" srcset="https://alexfm.net/content/images/size/w600/2021/09/2-table-1.png 600w, https://alexfm.net/content/images/2021/09/2-table-1.png 973w" sizes="(min-width: 720px) 720px"></figure><p><strong>Historical Voter Turnout in DPS District 1 &#x2013; Active Registered D1 Population<a href="#_ftn3">[3]</a></strong></p><figure class="kg-card kg-image-card"><img src="https://alexfm.net/content/images/2021/09/3-table-2.png" class="kg-image" alt="Analysis of Denver&apos;s 2019 DPS Director District 1 Election" loading="lazy" width="975" height="766" srcset="https://alexfm.net/content/images/size/w600/2021/09/3-table-2.png 600w, https://alexfm.net/content/images/2021/09/3-table-2.png 975w" sizes="(min-width: 720px) 720px"></figure><p><strong>Historical Voter Turnout in Denver &#x2013; Registration &amp; Turnout<a href="#_ftn4">[4]</a></strong></p><figure class="kg-card kg-image-card"><img src="https://alexfm.net/content/images/2021/09/4-table-3.png" class="kg-image" alt="Analysis of Denver&apos;s 2019 DPS Director District 1 Election" loading="lazy" width="975" height="486" srcset="https://alexfm.net/content/images/size/w600/2021/09/4-table-3.png 600w, https://alexfm.net/content/images/2021/09/4-table-3.png 975w" sizes="(min-width: 720px) 720px"></figure><h2 id="denver-coordinated-election-nov-5-2019-results">Denver Coordinated Election Nov 5, 2019 Results</h2><hr><p><strong>Actual Voter Turnout by Precinct</strong></p><p>For the Denver Coordinated Election 
held on November 5th, 2019, the ballot return rate was notably higher in middle-income and affluent areas like Wash Park, Belcaro, Wellshire, Platt Park, and Wash Park West. In Wash Park, the median household value is $846,900<a href="#_ftn5">[5]</a>, while in Belcaro, it is $1,021,000<a href="#_ftn6">[6]</a>. The lowest rate of return was in the Kennedy neighborhood at 13.99 percent, while the highest was in Hampden South at 64.29 percent. The connection between wealth and voter turnout is relatively consistent, but wealth alone is not a reliable predictor of turnout. For example, in one part of Virginia Village, there was a 58 percent return rate, while in a different section of the neighborhood, there was a 19.35 percent return rate. There are many factors worth considering when analyzing voter behavior by precinct, such as median age, median household income, number of renters versus homeowners, and educational attainment.</p><figure class="kg-card kg-image-card"><img src="https://alexfm.net/content/images/2021/09/5-table-4.png" class="kg-image" alt="Analysis of Denver&apos;s 2019 DPS Director District 1 Election" loading="lazy" width="975" height="1247" srcset="https://alexfm.net/content/images/size/w600/2021/09/5-table-4.png 600w, https://alexfm.net/content/images/2021/09/5-table-4.png 975w" sizes="(min-width: 720px) 720px"></figure><figure class="kg-card kg-image-card"><img src="https://alexfm.net/content/images/2021/09/denver_voter_turnout_5.png" class="kg-image" alt="Analysis of Denver&apos;s 2019 DPS Director District 1 Election" loading="lazy" width="975" height="761" srcset="https://alexfm.net/content/images/size/w600/2021/09/denver_voter_turnout_5.png 600w, https://alexfm.net/content/images/2021/09/denver_voter_turnout_5.png 975w" sizes="(min-width: 720px) 720px"></figure><h2 id="district-1-%E2%80%93-denver-coordinated-election-2019-results">District 1 &#x2013; Denver Coordinated Election 2019 Results</h2><hr><p>Dr. 
Radhika Nath, a Southeast Denver candidate who ran for Denver Public Schools Director in District 1, failed to attain enough votes to win; however, given the general lack of funding in her campaign, her performance was impressive. She was outspent 20 to 1 by her opponents: Scott Baldermann self-funded his campaign to the tune of $350,000, and Diana Romero Campbell accepted large donations from pro-charter school millionaires and dark money Political Action Committees. Dr. Nath&#x2019;s campaign began on June 13th and formally ended on November 5th, 2019. This report is written with the hindsight of the completed campaign. On November 12th, 2019, Denver released its unofficial election results, in which Radhika received 7,468 votes (21.71 percent). Nath&#x2019;s opponents, Diana Romero Campbell and Scott Baldermann, received 10,684 (31.06 percent) and 16,242 (47.22 percent<a href="#_ftn7">[7]</a>) of the vote, respectively.</p><h2 id="dps-district-1-%E2%80%93-demographics-and-voting-behaviors">DPS District 1 &#x2013; Demographics and Voting Behaviors</h2><hr><p><strong>Highly Likely Voter Demographic &#x2013; DPS District 1</strong></p><p>The Highly Likely Voter (HLV) demographic is also known as the &#x201C;super voter&#x201D;. These are the voters who participate in every local election; they are politically active and have an intuitive understanding of local politics. HLVs tend to be wealthy, between the ages of 55 and 80, educated, and homeowners with ample leisure time. 
To define the HLV demographic and to fine-tune it in a voter database system like VoteBuilder, the major factors to consider are:</p><ol><li>Voting History: Participation in <em>all </em>relevant local elections in the past <em>two</em> years or <em>four </em>years.</li><li>Age: Segment by age in nine-year increments.</li><li>Suppression: Suppress by good mailing and voting address.</li></ol><figure class="kg-card kg-image-card kg-card-hascaption"><img src="https://alexfm.net/content/images/2021/09/dps_denver_highly_likely_voter_6.png" class="kg-image" alt="Analysis of Denver&apos;s 2019 DPS Director District 1 Election" loading="lazy" width="975" height="505" srcset="https://alexfm.net/content/images/size/w600/2021/09/dps_denver_highly_likely_voter_6.png 600w, https://alexfm.net/content/images/2021/09/dps_denver_highly_likely_voter_6.png 975w" sizes="(min-width: 720px) 720px"><figcaption>Chart 1.1 - There are fewer HLVs than Likely Voters (LV), and the number of HLVs and LVs steadily increases with each age bracket. The highest proportion of HLVs to total population is within the 65-100 age bracket.</figcaption></figure><h2 id="highly-likely-voters-%E2%80%93-dps-district-1"><strong>Highly Likely Voters &#x2013; DPS District 1</strong></h2><hr><figure class="kg-card kg-image-card kg-card-hascaption"><img src="https://alexfm.net/content/images/2021/09/dps_highly_likely_voter_votebuilder_7.png" class="kg-image" alt="Analysis of Denver&apos;s 2019 DPS Director District 1 Election" loading="lazy" width="975" height="595" srcset="https://alexfm.net/content/images/size/w600/2021/09/dps_highly_likely_voter_votebuilder_7.png 600w, https://alexfm.net/content/images/2021/09/dps_highly_likely_voter_votebuilder_7.png 975w" sizes="(min-width: 720px) 720px"><figcaption>Image 1.1</figcaption></figure><p>Image 1.1 shows the VoteBuilder filters used to find Highly Likely Voters. VoteBuilder can only display 5,000 records at a time. 
To create datasets from VoteBuilder, first segment by age to capture a sample below 5,000, change your settings to display 999 records per page, and begin copying the data from VoteBuilder and pasting it into a spreadsheet application like Excel or Google Sheets. This is time consuming; however, the data gathered are absolutely critical in curating a list of voters to target. Ultimately, resources can be allocated both more effectively and efficiently with well-constructed and highly curated datasets.</p><h3 id="insight-2">Insight #2</h3><figure class="kg-card kg-image-card"><img src="https://alexfm.net/content/images/2021/09/insight_2_8.png" class="kg-image" alt="Analysis of Denver&apos;s 2019 DPS Director District 1 Election" loading="lazy" width="973" height="169" srcset="https://alexfm.net/content/images/size/w600/2021/09/insight_2_8.png 600w, https://alexfm.net/content/images/2021/09/insight_2_8.png 973w" sizes="(min-width: 720px) 720px"></figure><h2 id="likely-voter-demographic-%E2%80%93-dps-district-1"><strong>Likely Voter Demographic &#x2013; DPS District 1</strong></h2><hr><figure class="kg-card kg-image-card kg-card-hascaption"><img src="https://alexfm.net/content/images/2021/09/denver_voter_votebuilder_9.png" class="kg-image" alt="Analysis of Denver&apos;s 2019 DPS Director District 1 Election" loading="lazy" width="515" height="648"><figcaption>Image 2.1</figcaption></figure><p>Defining the likely voter demographic is a matter of two factors:</p><ol><li>Active voter registration.</li><li>Participation in <em>any</em> local election in the past four years.</li></ol><p>In Fall 2019, out of the 100,343 registered voters in DPS District 1, 87,196 voters have an active voter registration, while 13,153 registered voters currently have an inactive registration.</p><p>Most of the time, a campaign with limited resources should not focus their efforts on inactive voters. 
Only if a campaign has visited every door in the Highly Likely Voter and Likely Voter demographic at least three times should inactive voters be engaged.</p><p>Consider the number of inactive voters that voted in the Denver 2019 Coordinated Election. Out of 13,153 inactive voters, only 252 submitted their mail-in ballot, a tiny fraction of the total votes cast. It is a different dynamic with active voters. Of the 87,196 active voters, over 40,000 individuals voted in the 2019 Denver Coordinated Election.</p><p>As seen in Image 2.1, the VoteBuilder filters applied include registered voters and any active voter that voted in any <em>relevant</em> local Denver election in the past four years. All things considered, the likely voter base in DPS District 1 is approximately 45,767 voters.</p><h3 id="insight-3">Insight #3</h3><figure class="kg-card kg-image-card"><img src="https://alexfm.net/content/images/2021/09/insight_3_10.png" class="kg-image" alt="Analysis of Denver&apos;s 2019 DPS Director District 1 Election" loading="lazy" width="975" height="353" srcset="https://alexfm.net/content/images/size/w600/2021/09/insight_3_10.png 600w, https://alexfm.net/content/images/2021/09/insight_3_10.png 975w" sizes="(min-width: 720px) 720px"></figure><h2 id="in-person-voters-%E2%80%93-dps-district-1"><strong>In-Person Voters &#x2013; DPS District 1</strong></h2><hr><figure class="kg-card kg-image-card kg-card-hascaption"><img src="https://alexfm.net/content/images/2021/09/11_denver_inperson_vote.png" class="kg-image" alt="Analysis of Denver&apos;s 2019 DPS Director District 1 Election" loading="lazy" width="403" height="357"><figcaption>Chart 2.1</figcaption></figure><p>The number of voters that voted in person in the Denver 2019 Coordinated Election was 3,279 people in total. 
There are two voting centers located in DPS District 1: Denver Police Department District 3 (University) and Christ Community Church (Hampden South); only 439 and 239 people, respectively, voted in person at those facilities. Low in-person voter turnout suggests that Denverites are phasing out in-person voting as their preferred method of casting a ballot. Mail-in ballots provide a convenience unmatched by in-person voting.</p><p>The reality of Denver voters&#x2019; voting habits is apparent: people are voting by ballot, and campaigns ought to take this into account when planning and operating a campaign. Campaigns ought to perform all their essential outreach tasks well before ballots are mailed to voters.</p><h3 id="insight-4">Insight #4</h3><figure class="kg-card kg-image-card"><img src="https://alexfm.net/content/images/2021/09/12_insight_4.png" class="kg-image" alt="Analysis of Denver&apos;s 2019 DPS Director District 1 Election" loading="lazy" width="975" height="150" srcset="https://alexfm.net/content/images/size/w600/2021/09/12_insight_4.png 600w, https://alexfm.net/content/images/2021/09/12_insight_4.png 975w" sizes="(min-width: 720px) 720px"></figure><h2 id="ballot-returns-by-date-%E2%80%93-dps-district-1">Ballot Returns by Date &#x2013; DPS District 1</h2><hr><p>A trend that is consistent within DPS District 1 and Denver proper is late ballot drop-offs. Denver voters tend to procrastinate. This observation may seem superficial; however, there is a correlation between the big five personality traits and political affiliation. People who score low on conscientiousness tend to have a liberal disposition and are generally less punctual, while individuals who score higher on conscientiousness tend to have a conservative disposition and are generally more punctual<a href="#_ftn8">[8]</a>. 
This may play a small role in why a majority of Denverites drop off their ballots during the last three days of an election cycle, but it is not the full picture.</p><p>Chart 3.1 ought to illustrate a clear reality for any campaign manager or field organizer operating in Colorado: all canvassing, mailing, and fundraising efforts need to occur before ballots are delivered to voters. By the time mail-in ballots arrive, a campaign must make at least three attempts to contact all Highly Likely Voters (HLV). This includes mailers, postcards, and a majority of physical outreach. Without a consistent and persistent canvassing and advertising strategy, physical outreach during the last three weeks of an election cycle will likely make little difference in swinging a campaign.</p><figure class="kg-card kg-image-card kg-card-hascaption"><img src="https://alexfm.net/content/images/2021/09/13_denver_ballots_sumbitted_date.png" class="kg-image" alt="Analysis of Denver&apos;s 2019 DPS Director District 1 Election" loading="lazy" width="993" height="543" srcset="https://alexfm.net/content/images/size/w600/2021/09/13_denver_ballots_sumbitted_date.png 600w, https://alexfm.net/content/images/2021/09/13_denver_ballots_sumbitted_date.png 993w" sizes="(min-width: 720px) 720px"><figcaption>Chart 3.1</figcaption></figure><h3 id="insight-5">Insight #5</h3><figure class="kg-card kg-image-card"><img src="https://alexfm.net/content/images/2021/09/14_insight_5.png" class="kg-image" alt="Analysis of Denver&apos;s 2019 DPS Director District 1 Election" loading="lazy" width="975" height="177" srcset="https://alexfm.net/content/images/size/w600/2021/09/14_insight_5.png 600w, https://alexfm.net/content/images/2021/09/14_insight_5.png 975w" 
sizes="(min-width: 720px) 720px"></figure><h1 id="competitive-research-%E2%80%93-scott-baldermann">Competitive Research &#x2013; Scott Baldermann</h1><hr><p>Scott Baldermann ran as an anti-reform candidate and is a multi-millionaire who made his fortune by selling his software company Attolist LLC to Newforma. Scott used his wealth in unprecedented proportions to fund a vast majority of his campaign, funding it to the tune of $334,510<a href="#_ftn9">[9]</a>. His average contribution was $1,018.85, drawn from 350 individual donors.</p><p>His campaign strategy and ground strategy focused heavily on canvassing, mailers, and phone banking. Scott hired a number of individuals to manage his canvassing activities. In addition, he hired MG Community Connections for $19,980, which enabled him to reach 5,000 doors a week, for an estimated $1 to $1.75 a door. Scott gained a particular informational advantage by hiring a polling service for $20,000. His insight into the feelings and attitudes of the residents of DPS District 1 gave him an asymmetric advantage in the realm of actionable data. 
Overall, Scott spent approximately $229,331 in advertising related activities, making this the bulk of his campaign expenditure.</p><p><strong>Scott Baldermann - Campaign Expenditures</strong></p><figure class="kg-card kg-image-card kg-card-hascaption"><img src="https://alexfm.net/content/images/2021/09/15_scott_b_campaign_expenditures.png" class="kg-image" alt="Analysis of Denver&apos;s 2019 DPS Director District 1 Election" loading="lazy" width="973" height="434" srcset="https://alexfm.net/content/images/size/w600/2021/09/15_scott_b_campaign_expenditures.png 600w, https://alexfm.net/content/images/2021/09/15_scott_b_campaign_expenditures.png 973w" sizes="(min-width: 720px) 720px"><figcaption>Table 5 - Scott Baldermann: Canvassing Expenditures</figcaption></figure><figure class="kg-card kg-image-card kg-card-hascaption"><img src="https://alexfm.net/content/images/2021/09/16_scott_b_ad_expenditures.png" class="kg-image" alt="Analysis of Denver&apos;s 2019 DPS Director District 1 Election" loading="lazy" width="975" height="569" srcset="https://alexfm.net/content/images/size/w600/2021/09/16_scott_b_ad_expenditures.png 600w, https://alexfm.net/content/images/2021/09/16_scott_b_ad_expenditures.png 975w" sizes="(min-width: 720px) 720px"><figcaption>Table 6 - Scott Baldermann: Advertising Expenditures</figcaption></figure><figure class="kg-card kg-image-card kg-card-hascaption"><img src="https://alexfm.net/content/images/2021/09/17_scott_b_consulting_expenditures.png" class="kg-image" alt="Analysis of Denver&apos;s 2019 DPS Director District 1 Election" loading="lazy" width="975" height="136" srcset="https://alexfm.net/content/images/size/w600/2021/09/17_scott_b_consulting_expenditures.png 600w, https://alexfm.net/content/images/2021/09/17_scott_b_consulting_expenditures.png 975w" sizes="(min-width: 720px) 720px"><figcaption>Table 7 - Scott Baldermann: Consulting &amp; Polling Expenditures</figcaption></figure><h2 
id="competitive-research-%E2%80%93-diana-romero-campbell">Competitive Research &#x2013; Diana Romero Campbell</h2><hr><p>Diana Romero Campbell is the president of a local non-profit organization and has worked at multiple non-profits, including Colorado Children&#x2019;s Campaign, Mile High United Way, and currently Scholars Unlimited. She has spent a majority of her career in non-profits dedicated to improving the lives of children. She ran as a reform candidate, promoting charter schools and the interests of pro-charter organizations.</p><p>Diana&#x2019;s major source of funding was big dollar contributors and known charter school investors. She received over $128,739<a href="#_ftn10">[10]</a> in direct campaign expenditure support from special interests and independent expenditure committees (IECs), and approximately $144,597 in indirect support from organizations like Students for Education Reform (SFER) and Stand for Children, through its IEC, Better Schools for a Stronger Colorado. Furthermore, Diana&#x2019;s campaign accepted $79,800 in contributions over $1,000. Notable contributors include Emma Bloomberg, Bruce Benson, Phil Anschutz, Jill Anschutz, and Kent Thiry.</p><p>Diana&#x2019;s strategy relied heavily on hired canvassers and the canvassing efforts bought by her IEC donors. Generally, this was a more effective strategy than originally anticipated. Overall, her budget was a reported $105,381.31<a href="#_ftn11">[11]</a>; with the IEC expenditures, she had approximately $234,120.31. 
By the end of her campaign, she had a remaining budget of $19,560.</p><p><strong>Diana Romero Campbell &#x2013; Campaign Expenditures &amp; Contributions</strong></p><figure class="kg-card kg-image-card kg-card-hascaption"><img src="https://alexfm.net/content/images/2021/09/18_diana_consulting_expenditures.png" class="kg-image" alt="Analysis of Denver&apos;s 2019 DPS Director District 1 Election" loading="lazy" width="975" height="648" srcset="https://alexfm.net/content/images/size/w600/2021/09/18_diana_consulting_expenditures.png 600w, https://alexfm.net/content/images/2021/09/18_diana_consulting_expenditures.png 975w" sizes="(min-width: 720px) 720px"><figcaption>Table 8 - Diana Romero Campbell: Canvassing &amp; Management Expenditures</figcaption></figure><figure class="kg-card kg-image-card kg-card-hascaption"><img src="https://alexfm.net/content/images/2021/09/19_diana_ad_expenditures.png" class="kg-image" alt="Analysis of Denver&apos;s 2019 DPS Director District 1 Election" loading="lazy" width="975" height="530" srcset="https://alexfm.net/content/images/size/w600/2021/09/19_diana_ad_expenditures.png 600w, https://alexfm.net/content/images/2021/09/19_diana_ad_expenditures.png 975w" sizes="(min-width: 720px) 720px"><figcaption>Table 9 - Diana Romero Campbell: Advertising &amp; Outreach Expenditures</figcaption></figure><figure class="kg-card kg-image-card kg-card-hascaption"><img src="https://alexfm.net/content/images/2021/09/20_diana_consulting_expenditures.png" class="kg-image" alt="Analysis of Denver&apos;s 2019 DPS Director District 1 Election" loading="lazy" width="975" height="183" srcset="https://alexfm.net/content/images/size/w600/2021/09/20_diana_consulting_expenditures.png 600w, https://alexfm.net/content/images/2021/09/20_diana_consulting_expenditures.png 975w" sizes="(min-width: 720px) 720px"><figcaption>Table 10 - Diana Romero Campbell: Consulting &amp; Professional Services</figcaption></figure><figure class="kg-card kg-image-card 
kg-card-hascaption"><img src="https://alexfm.net/content/images/2021/09/21_diana_donations.png" class="kg-image" alt="Analysis of Denver&apos;s 2019 DPS Director District 1 Election" loading="lazy" width="970" height="1348" srcset="https://alexfm.net/content/images/size/w600/2021/09/21_diana_donations.png 600w, https://alexfm.net/content/images/2021/09/21_diana_donations.png 970w" sizes="(min-width: 720px) 720px"><figcaption>Table 11 - Diana Romero Campbell: Major Contributions</figcaption></figure><h2 id="radhika-nath-%E2%80%93-canvassing-results">Radhika Nath &#x2013; Canvassing Results</h2><hr><p>During the length of Dr. Nath&#x2019;s campaign, 221 people volunteered or expressed interest in volunteering in some capacity. Forty-two individuals canvassed for Dr. Nath. In total, they knocked on an estimated 21,356 doors<a href="#_ftn12">[12]</a>, with 3,404 successful contacts. Dr. Nath&#x2019;s campaign had a contact rate of 16 percent. This contact rate could have been moderately improved by increasing canvassing efforts during the weekends and implementing an after-work canvassing shift on weekdays, between 5pm and 6:30pm. Nevertheless, Nath&#x2019;s campaign performed slightly below average in terms of successful contacts, yet performed remarkably well in terms of polling.</p><h3 id="insight-6">Insight #6</h3><figure class="kg-card kg-image-card"><img src="https://alexfm.net/content/images/2021/09/22_insight_6.png" class="kg-image" alt="Analysis of Denver&apos;s 2019 DPS Director District 1 Election" loading="lazy" width="975" height="250" srcset="https://alexfm.net/content/images/size/w600/2021/09/22_insight_6.png 600w, https://alexfm.net/content/images/2021/09/22_insight_6.png 975w" sizes="(min-width: 720px) 720px"></figure><h1 id="radhika-nath-%E2%80%93-polling-results">Radhika Nath &#x2013; Polling Results</h1><hr><p>The survey conducted during Dr. 
Nath&#x2019;s campaign is a measure of relative approval taken by the canvassers and interpreted at the point of contact. It is a one-through-five rating system: a one means the contact strongly disapproves of Radhika as a candidate, and a five means the contact strongly approves. The canvasser either asks the contact directly or self-reports the contact&#x2019;s impression.</p><p>Dr. Nath polled considerably well, with 92 percent of voters having at least a neutral impression of her. The share of voters with a positive impression of Radhika, either approving or strongly approving of her as a candidate, was considerably high at 44 percent. Only 8 percent of respondents disapproved or strongly disapproved of Radhika.</p><figure class="kg-card kg-image-card kg-card-hascaption"><img src="https://alexfm.net/content/images/2021/09/23_radhika_approval_rating.png" class="kg-image" alt="Analysis of Denver&apos;s 2019 DPS Director District 1 Election" loading="lazy" width="975" height="544" srcset="https://alexfm.net/content/images/size/w600/2021/09/23_radhika_approval_rating.png 600w, https://alexfm.net/content/images/2021/09/23_radhika_approval_rating.png 975w" sizes="(min-width: 720px) 720px"><figcaption>Chart 4.1</figcaption></figure><figure class="kg-card kg-image-card kg-card-hascaption"><img src="https://alexfm.net/content/images/2021/09/24_radhika_approval_rating.png" class="kg-image" alt="Analysis of Denver&apos;s 2019 DPS Director District 1 Election" loading="lazy" width="975" height="150" srcset="https://alexfm.net/content/images/size/w600/2021/09/24_radhika_approval_rating.png 600w, https://alexfm.net/content/images/2021/09/24_radhika_approval_rating.png 975w" sizes="(min-width: 720px) 720px"><figcaption>Table 13 - 3,585 people surveyed rated their approval of Dr. 
Nath on a scale from 1 (strongly disapprove) to 5 (strongly approve).</figcaption></figure><h3 id="insight-7">Insight #7</h3><figure class="kg-card kg-image-card"><img src="https://alexfm.net/content/images/2021/09/25_insight_7.png" class="kg-image" alt="Analysis of Denver&apos;s 2019 DPS Director District 1 Election" loading="lazy" width="973" height="197" srcset="https://alexfm.net/content/images/size/w600/2021/09/25_insight_7.png 600w, https://alexfm.net/content/images/2021/09/25_insight_7.png 973w" sizes="(min-width: 720px) 720px"></figure><h1 id="radhika-nath-%E2%80%93-digital-ad-result">Radhika Nath &#x2013; Digital Ad Results</h1><hr><h3 id="launch-video-%E2%80%93-overview">Launch Video &#x2013; Overview</h3><figure class="kg-card kg-image-card"><img src="https://alexfm.net/content/images/2021/09/26_launch_video_sentiment.png" class="kg-image" alt="Analysis of Denver&apos;s 2019 DPS Director District 1 Election" loading="lazy" width="431" height="507"></figure><p>The campaign launch video was the first video of the Nath campaign. It is 2 minutes and 30 seconds in length and was completed at no cost. The launch video was an excellent way to introduce the candidate and show her personality, her values, and, most importantly, how Dr. Nath intended to fix Denver Public Schools.</p><p>The launch video post reached 19,936 people, with 1,730 post clicks and 874 reactions, comments, and shares. For any advertising campaign, the ideal outcome on Facebook is organic reach. The launch video garnered 119 shares and 4,900 minutes of watch time. What propelled the video was its organic reach, which accounted for 67 percent of the video&#x2019;s viewership; paid reach accounted for the remaining 33 percent. The launch video&#x2019;s audience was not targeted, and its advertising bounds were within Colorado.</p><p>Expanding the video&#x2019;s reach on Facebook cost the campaign $200. 
It cost $.18 per view, and the average play time was 5 seconds. On average, people who viewed the video via paid reach did not watch past the 5-second mark. This is to be expected when launching lengthy content on Facebook.</p><figure class="kg-card kg-image-card"><img src="https://alexfm.net/content/images/2021/09/27_launch_video_sentiment.png" class="kg-image" alt="Analysis of Denver&apos;s 2019 DPS Director District 1 Election" loading="lazy" width="975" height="285" srcset="https://alexfm.net/content/images/size/w600/2021/09/27_launch_video_sentiment.png 600w, https://alexfm.net/content/images/2021/09/27_launch_video_sentiment.png 975w" sizes="(min-width: 720px) 720px"></figure><h3 id="insight-8">Insight #8</h3><figure class="kg-card kg-image-card"><img src="https://alexfm.net/content/images/2021/09/28_insight_8.png" class="kg-image" alt="Analysis of Denver&apos;s 2019 DPS Director District 1 Election" loading="lazy" width="975" height="414" srcset="https://alexfm.net/content/images/size/w600/2021/09/28_insight_8.png 600w, https://alexfm.net/content/images/2021/09/28_insight_8.png 975w" sizes="(min-width: 720px) 720px"></figure><p><strong>Launch Video &#x2013; Platform Placement</strong></p><figure class="kg-card kg-image-card kg-card-hascaption"><img src="https://alexfm.net/content/images/2021/09/29_launch_video_performance.png" class="kg-image" alt="Analysis of Denver&apos;s 2019 DPS Director District 1 Election" loading="lazy" width="975" height="353" srcset="https://alexfm.net/content/images/size/w600/2021/09/29_launch_video_performance.png 600w, https://alexfm.net/content/images/2021/09/29_launch_video_performance.png 975w" sizes="(min-width: 720px) 720px"><figcaption>Chart 5.1 - Platform placement performance of Launch video on Facebook.</figcaption></figure><p>The video placement shows which platform a piece of content was hosted on. The launch video was placed predominantly on Facebook. 
Considering that it was a long-form video, this medium is ideal: the video-watching feature on Facebook is far more developed than Instagram&#x2019;s, and overall, an audience is more likely to interact with this kind of content on Facebook than on other platforms.</p><p>The graph below illuminates how many impressions (people observing the content on their screen) were made on the Facebook platform from the paid campaign. Overall, 19,430 impressions and 1,134 thruplays (watching a video for 3 seconds or more) occurred.</p><p><strong>Launch Video &#x2013; ThruPlay</strong></p><figure class="kg-card kg-image-card kg-card-hascaption"><img src="https://alexfm.net/content/images/2021/09/30_launch_video_thruplay.png" class="kg-image" alt="Analysis of Denver&apos;s 2019 DPS Director District 1 Election" loading="lazy" width="975" height="514" srcset="https://alexfm.net/content/images/size/w600/2021/09/30_launch_video_thruplay.png 600w, https://alexfm.net/content/images/2021/09/30_launch_video_thruplay.png 975w" sizes="(min-width: 720px) 720px"><figcaption>Chart 6.1 - ThruPlay performance of Launch video on Facebook. 17,775 total plays, 1,134 thruplays, and an average play time of 5 seconds.</figcaption></figure><p>The launch video was watched for an average of 5 seconds; this is not ideal, since the video is 2 minutes and 30 seconds in length. However, 25 percent of viewers watched until about the 38-second mark. This is more promising, and demonstrates that something captivated a portion of viewers. The graph below follows a typical exponential probability density function: it spikes at the beginning and decays exponentially over time.</p><p><strong>Launch Video &#x2013; Demographics</strong></p><p>Since the launch video was not targeted, it was broadly broadcast to all audiences from the ages of 18 to 65+ living in Colorado. Considering that Highly Likely Voters in DPS District 1 are 65 and older, this video could have benefited from targeting an older demographic. 
Though this was a missed opportunity, the campaign gained an interesting insight: women ages 35-44 responded particularly well to Radhika and her message.</p><figure class="kg-card kg-image-card kg-card-hascaption"><img src="https://alexfm.net/content/images/2021/09/31_launch_video_demo.png" class="kg-image" alt="Analysis of Denver&apos;s 2019 DPS Director District 1 Election" loading="lazy" width="975" height="255" srcset="https://alexfm.net/content/images/size/w600/2021/09/31_launch_video_demo.png 600w, https://alexfm.net/content/images/2021/09/31_launch_video_demo.png 975w" sizes="(min-width: 720px) 720px"><figcaption>Chart 7.1 - Consumer demographics of Launch video on Facebook.</figcaption></figure><p><strong>Imagine Video &#x2013; Overview</strong></p><figure class="kg-card kg-image-card kg-card-hascaption"><img src="https://alexfm.net/content/images/2021/09/32_imagine_video_stats.png" class="kg-image" alt="Analysis of Denver&apos;s 2019 DPS Director District 1 Election" loading="lazy" width="973" height="417" srcset="https://alexfm.net/content/images/size/w600/2021/09/32_imagine_video_stats.png 600w, https://alexfm.net/content/images/2021/09/32_imagine_video_stats.png 973w" sizes="(min-width: 720px) 720px"><figcaption>Table 15 - Imagine video&#x2019;s performance on Facebook</figcaption></figure><p>The Imagine video was the last paid video campaign, also completed at no charge. The video campaign began on 10/18 and ended on 10/30. Overall, the advertising campaign was a success, though there was one obstacle: the video was demoted due to the amount of text in the original post. How much the text impacted the video is unknown. Accounting for the demotion, the video performed relatively well. At a cost of $.18 per thruplay and a total budget of $129.44, the video reached 4,385 people, with 7,531 impressions. 
Overall, 1,862 people watched at least 3 seconds of the video, with an average watch time of 4 seconds.</p><p>In total, the Imagine video post reached 17,119 people, 5,066 people watched at least 3 seconds of the video, and there were 377 reactions, comments, and shares. People who watched 15 seconds of the video tended to watch the full video: of the 15-second watch group, 66 percent watched the entirety of the video.</p><p><strong>Imagine Video &#x2013; Average Play Time</strong></p><figure class="kg-card kg-image-card kg-card-hascaption"><img src="https://alexfm.net/content/images/2021/09/32_imagine_video_thruplay.png" class="kg-image" alt="Analysis of Denver&apos;s 2019 DPS Director District 1 Election" loading="lazy" width="975" height="581" srcset="https://alexfm.net/content/images/size/w600/2021/09/32_imagine_video_thruplay.png 600w, https://alexfm.net/content/images/2021/09/32_imagine_video_thruplay.png 975w" sizes="(min-width: 720px) 720px"><figcaption>Chart 8.1 - Imagine video&#x2019;s average play time on Facebook.</figcaption></figure><p>As evident in Chart 8.1, the distribution follows an unusual exponential probability density function, where a majority of viewers lose interest before 25 percent of the video has played. However, an interesting spike occurs at the 15-second mark, indicating that about 30 percent of thruplays occurred at that part of the video. The Imagine video attracted and retained more attention than the launch video. 
This could be due to the quality of the content, its inspirational message and music, its placement, and its length.</p><p><strong>Imagine Video &#x2013; Video Placement</strong></p><figure class="kg-card kg-image-card kg-card-hascaption"><img src="https://alexfm.net/content/images/2021/09/33_imagine_video_placement.png" class="kg-image" alt="Analysis of Denver&apos;s 2019 DPS Director District 1 Election" loading="lazy" width="973" height="358" srcset="https://alexfm.net/content/images/size/w600/2021/09/33_imagine_video_placement.png 600w, https://alexfm.net/content/images/2021/09/33_imagine_video_placement.png 973w" sizes="(min-width: 720px) 720px"><figcaption>Chart 9.1 - Imagine video&#x2019;s placement on Facebook and Instagram.</figcaption></figure><p>The Imagine video was placed on both Facebook and Instagram, with more engagement and thruplays on Instagram, though Facebook had a higher proportion of reach to thruplays. For this type of content, at 50 seconds in length, the video performed significantly better on Instagram than on Facebook. With a nearly 1:1 ratio of reach to thruplays, Instagram engaged more people than Facebook. 
The greater engagement on Instagram was due to the optimization goal: it is easier to get thruplays on Instagram than on Facebook.</p><h3 id="insight-9">Insight #9</h3><figure class="kg-card kg-image-card"><img src="https://alexfm.net/content/images/2021/09/34_insight_9.png" class="kg-image" alt="Analysis of Denver&apos;s 2019 DPS Director District 1 Election" loading="lazy" width="973" height="175" srcset="https://alexfm.net/content/images/size/w600/2021/09/34_insight_9.png 600w, https://alexfm.net/content/images/2021/09/34_insight_9.png 973w" sizes="(min-width: 720px) 720px"></figure><p><strong>Imagine Video &#x2013; Demographics</strong></p><figure class="kg-card kg-image-card kg-card-hascaption"><img src="https://alexfm.net/content/images/2021/09/35_imagine_video_demo.png" class="kg-image" alt="Analysis of Denver&apos;s 2019 DPS Director District 1 Election" loading="lazy" width="973" height="334" srcset="https://alexfm.net/content/images/size/w600/2021/09/35_imagine_video_demo.png 600w, https://alexfm.net/content/images/2021/09/35_imagine_video_demo.png 973w" sizes="(min-width: 720px) 720px"><figcaption>Chart 10.1 - Demographics of viewers for the Imagine video.</figcaption></figure><p>The Imagine video was targeted toward a defined demographic, and it was placed on both Instagram and Facebook. The defined demographic apparently was not followed as closely as it was on Facebook. The highest levels of engagement were from women 25-34. 
This is unsurprising considering that most of the engagement came from Instagram, a platform that typically has a younger demographic.</p><p><strong>Imagine Video &#x2013; Defined Audience</strong></p><figure class="kg-card kg-image-card"><img src="https://alexfm.net/content/images/2021/09/36_imagine_video_fb_demo.png" class="kg-image" alt="Analysis of Denver&apos;s 2019 DPS Director District 1 Election" loading="lazy" width="413" height="347"></figure><p>A targeted audience is crucial for any ad campaign, and the Imagine video was targeted to all the neighborhoods in South Denver that are within DPS District 1. Additionally, it was targeted toward people with educational interests and certain behaviors, such as a likelihood of engaging with liberal content, and educational attainment, like being in college, a college graduate, or in graduate school.</p><h3 id="insight-10">Insight #10</h3><figure class="kg-card kg-image-card"><img src="https://alexfm.net/content/images/2021/09/37_insight_10.png" class="kg-image" alt="Analysis of Denver&apos;s 2019 DPS Director District 1 Election" loading="lazy" width="973" height="222" srcset="https://alexfm.net/content/images/size/w600/2021/09/37_insight_10.png 600w, https://alexfm.net/content/images/2021/09/37_insight_10.png 973w" sizes="(min-width: 720px) 720px"></figure><p><strong>Sources</strong></p><p><a href="#_ftnref1">[1]</a> <strong>Data source</strong>: <a href="https://www.pewtrusts.org/en/research-and-analysis/issue-briefs/2016/03/colorado-voting-reforms-early-results">https://www.pewtrusts.org/en/research-and-analysis/issue-briefs/2016/03/colorado-voting-reforms-early-results</a></p><p><a href="#_ftnref2">[2]</a> As of 2019, there are approximately 100,343 voters in DPS District 1 with active and inactive registration. 100,343 is the base number for the averages and does not account for fluctuations in voter registration in DPS District 1. 
The data was gathered from the following sources: <a href="https://www.denvergov.org/electionresults#/results">https://www.denvergov.org/electionresults#/results</a> and <a href="https://www.votebuilder.com/">https://www.votebuilder.com/</a>. Note that these are only estimates and may fluctuate.</p><p><a href="#_ftnref3">[3]</a> As of 2019, there are approximately 87,191 voters in DPS District 1 with an active registration.</p><p><a href="#_ftnref4">[4]</a> <strong>Data source</strong>: Denver - 2019-06-13Report_ElectionTurnoutHistorical.xlsx</p><p><a href="#_ftnref5">[5]</a> <strong>Data source</strong>: <a href="https://www.zillow.com/washington-park-denver-co/home-values/">https://www.zillow.com/washington-park-denver-co/home-values/</a></p><p><a href="#_ftnref6">[6]</a> <strong>Data source</strong>: <a href="https://www.zillow.com/belcaro-denver-co/home-values/">https://www.zillow.com/belcaro-denver-co/home-values/</a></p><p><a href="#_ftnref7">[7]</a> <strong>Data source</strong>: <a href="https://www.denvergov.org/media/denverapps/electionresults/pdfs/20191105/Summary_Report_Denver_FinalUnofficialResults.pdf">https://www.denvergov.org/media/denverapps/electionresults/pdfs/20191105/Summary_Report_Denver_FinalUnofficialResults.pdf</a></p><p><a href="#_ftnref8">[8]</a> <strong>Data source</strong>: <a href="https://www.sciencedirect.com/science/article/abs/pii/S0092656605001017">https://www.sciencedirect.com/science/article/abs/pii/S0092656605001017</a></p><p><a href="#_ftnref9">[9]</a> <strong>Data source</strong>: <a href="http://tracer.sos.colorado.gov/PublicSite/SearchPages/CandidateDetail.aspx?Type=CA&amp;SeqID=44429">http://tracer.sos.colorado.gov/PublicSite/SearchPages/CandidateDetail.aspx?Type=CA&amp;SeqID=44429</a></p><p><a href="#_ftnref10">[10]</a> <strong>Data source</strong>: <a href="https://cleanslatenowaction.org/dps-district-1-1">https://cleanslatenowaction.org/dps-district-1-1</a></p><p><a href="#_ftnref11">[11]</a> <strong>Data source</strong>: 
<a href="http://tracer.sos.colorado.gov/PublicSite/SearchPages/CandidateDetail.aspx?Type=CA&amp;SeqID=44167">http://tracer.sos.colorado.gov/PublicSite/SearchPages/CandidateDetail.aspx?Type=CA&amp;SeqID=44167</a></p><p><a href="#_ftnref12">[12]</a> Total was compiled from all available weekly canvassing reports. In addition, volunteers canvassed without the use of VoteBuilder.</p><p><strong>Featured image credit</strong>: <a href="https://cdn.5280.com/2019/11/Denver-Election_November-2019_Bouchard.jpeg">https://cdn.5280.com/2019/11/Denver-Election_November-2019_Bouchard.jpeg</a></p>]]></content:encoded></item></channel></rss>