Processing Sequences Using RNNs and 1D CNNs (Ch. 15)

Sequential data — time series, text, audio, video — requires architectures that respect temporal ordering. Chapter 15 introduces recurrent neural networks (SimpleRNN, LSTM, GRU) and shows how 1D convolutional networks can also model sequences, sometimes outperforming RNNs while being faster to train. The running example is the Chicago Transit Authority daily bus and rail ridership dataset, which you’ll forecast using increasingly sophisticated models, from a naive baseline through stacked LSTMs and WaveNet-style dilated convolutions.

What you’ll learn

Forecasting sequences with baseline models and naive approaches
Time series stationarity and differencing to remove trends
Classical ARMA/SARIMA models with statsmodels
SimpleRNN: the basic recurrent cell and its limitations
LSTM (Long Short-Term Memory): cell state, gates, and long-range dependencies
GRU (Gated Recurrent Unit): the lighter alternative to LSTM
Stacking recurrent layers and using return_sequences=True
1D convolutions (Conv1D) for sequence modelling
WaveNet-style dilated causal convolutions
Multivariate time series: multiple input channels
Sequence-to-sequence models

Key concepts

Recurrent neural networks

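An RNN processes a sequence step by step, maintaining a hidden state that accumulates information about past inputs. At each time step the cell takes the current input and the previous hidden state and produces a new hidden state (and optionally an output). SimpleRNN computes this update with a single dense transformation followed by tanh, and it has no gating mechanism. During backpropagation through time the gradient is multiplied by the per-step Jacobian once for every step. Over long sequences it therefore tends to shrink toward zero (vanishing gradients), and the cell struggles to learn long-range patterns.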
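The sketch below is a minimal NumPy version of a SimpleRNN-style cell, h_t = tanh(x_t·W_x + h_{t−1}·W_h + b). It runs over a hypothetical sequence of 56 two-feature steps (think bus and rail). The weight names, the sizes, and the 0.9 spectral-norm scaling are illustrative assumptions, not values from the notebook.

The second function multiplies the per-step Jacobians ∂h_t/∂h_{t−1} = diag(1 − h_t²)·W_hᵀ to show the vanishing-gradient effect. Because tanh′ ≤ 1, if W_h has spectral norm 0.9, the norm of ∂h_T/∂h_0 is at most 0.9 raised to the number of steps.

```python
import numpy as np

def simple_rnn(inputs, W_x, W_h, b):
    """Run a SimpleRNN-style cell over `inputs` of shape [time_steps, n_inputs].

    Each step computes h_t = tanh(x_t @ W_x + h_{t-1} @ W_h + b), starting from h_0 = 0.
    Returns all hidden states, shape [time_steps, n_units].
    """
    h = np.zeros(W_h.shape[0])
    states = []
    for x_t in inputs:  # strictly sequential: step t needs h_{t-1}
        h = np.tanh(x_t @ W_x + h @ W_h + b)
        states.append(h)
    return np.stack(states)

def jacobian_last_wrt_first(states, W_h):
    """Chain rule through time: dh_T/dh_0 = J_T @ ... @ J_1, with J_t = diag(1 - h_t**2) @ W_h.T."""
    J = np.eye(W_h.shape[0])
    for h_t in states:
        J = np.diag(1.0 - h_t**2) @ W_h.T @ J
    return J

rng = np.random.default_rng(42)
n_steps, n_inputs, n_units = 56, 2, 4         # hypothetical sizes
X = rng.normal(size=(n_steps, n_inputs))      # stand-in for 56 days of [bus, rail]
W_x = rng.normal(scale=0.5, size=(n_inputs, n_units))
W_h = rng.normal(size=(n_units, n_units))
W_h *= 0.9 / np.linalg.norm(W_h, 2)           # rescale so the largest singular value is 0.9
b = np.zeros(n_units)

H = simple_rnn(X, W_x, W_h, b)
J = jacobian_last_wrt_first(H, W_h)
print(H.shape)                  # (56, 4)
print(np.linalg.norm(J, 2))     # tiny: bounded above by 0.9**56
```

With 56 steps that bound is roughly 0.003, so the earliest inputs barely influence the gradient. LSTM and GRU cells were designed to ease this problem.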

LSTM and GRU

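LSTM introduces a cell state (long-term memory) alongside the hidden state, controlled by three gates:

- the forget gate decides what to discard from the cell state;
- the input gate decides what new information to write;
- the output gate decides what part of the cell state to expose as the hidden state.

The cell state is updated additively: the old memory is scaled by the forget gate and the gated new content is added. It is not pushed through a tanh layer at every step, so gradients survive many more steps, and an LSTM can learn much longer-range dependencies than a SimpleRNN. It is not a complete cure, though: LSTMs still tend to struggle with patterns spanning 100 time steps or more.

GRU simplifies LSTM to two gates (reset and update) and merges the cell state and hidden state into a single state vector. With fewer weights per unit, GRU is usually somewhat faster to train and often performs comparably to LSTM.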
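The gate equations can be sketched in a few lines of NumPy. This is a minimal illustration, not the Keras implementation, and it rests on two assumptions:

- The weight layout (one matrix per input, with the gate blocks concatenated in the order chosen here) is our own.
- The GRU follows the original formulation, which applies the reset gate before the recurrent matrix multiplication. Keras defaults to reset_after=True, which applies it afterwards.

The last part sets the forget-gate bias high and the input-gate bias low. The LSTM cell state is then carried forward unchanged, which is the mechanism that lets information survive many steps.

```python
import numpy as np

def sigmoid(a):
    return 1.0 / (1.0 + np.exp(-a))

def lstm_step(x_t, h_prev, c_prev, W, U, b):
    """One LSTM step. W: [n_inputs, 4*units], U: [units, 4*units], b: [4*units].

    Gate blocks (our own ordering): input i, forget f, candidate g, output o.
    """
    i, f, g, o = np.split(x_t @ W + h_prev @ U + b, 4)
    i, f, o = sigmoid(i), sigmoid(f), sigmoid(o)  # gates squashed into (0, 1)
    g = np.tanh(g)                                 # candidate content to write
    c_t = f * c_prev + i * g                       # keep part of the old memory, add new content
    h_t = o * np.tanh(c_t)                         # expose a filtered view of the cell state
    return h_t, c_t

def gru_step(x_t, h_prev, W, U, b):
    """One GRU step (reset gate applied before the recurrent matmul).

    W: [n_inputs, 3*units], U: [units, 3*units], b: [3*units]; blocks: update z, reset r, candidate g.
    """
    xz, xr, xg = np.split(x_t @ W + b, 3)
    U_z, U_r, U_g = np.split(U, 3, axis=1)
    z = sigmoid(xz + h_prev @ U_z)        # update gate: how much of the old state to keep
    r = sigmoid(xr + h_prev @ U_r)        # reset gate: how much old state feeds the candidate
    g = np.tanh(xg + (r * h_prev) @ U_g)  # candidate state
    return z * h_prev + (1.0 - z) * g     # single state vector, no separate cell state

rng = np.random.default_rng(0)
n_inputs, units = 2, 8                    # hypothetical sizes
x = rng.normal(size=n_inputs)
h0, c0 = np.zeros(units), np.zeros(units)

h1, c1 = lstm_step(x, h0, c0,
                   rng.normal(size=(n_inputs, 4 * units)),
                   rng.normal(size=(units, 4 * units)),
                   np.zeros(4 * units))
h1_gru = gru_step(x, h0,
                  rng.normal(size=(n_inputs, 3 * units)),
                  rng.normal(size=(units, 3 * units)),
                  np.zeros(3 * units))

# Memory demo: zero weights, input-gate bias -50 (i ~ 0), forget-gate bias +50 (f ~ 1)
W0, U0 = np.zeros((n_inputs, 4 * units)), np.zeros((units, 4 * units))
b_keep = np.concatenate([np.full(units, -50.0), np.full(units, 50.0),
                         np.zeros(units), np.zeros(units)])
c = rng.normal(size=units)
c_start = c.copy()
h = h0
for _ in range(100):
    h, c = lstm_step(rng.normal(size=n_inputs), h, c, W0, U0, b_keep)
print(np.allclose(c, c_start))            # True: the memory was copied forward for 100 steps
```

In this sketch the GRU needs three weight blocks instead of four. For the same number of units it therefore has about 25% fewer parameters, which is one reason it tends to be faster.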

Conv1D and WaveNet

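1D convolutions slide the same small filter along the time axis, so a pattern is detected wherever it occurs in the sequence. With padding="causal", the output at step t depends only on step t and earlier steps, so the model never peeks at the future.

Stacking Conv1D layers with increasing dilation rates (1, 2, 4, 8, …) creates a WaveNet-style architecture whose receptive field grows exponentially with depth. With kernel_size=2 and dilation rates 1 through 512, ten layers see 1 + (1 + 2 + … + 512) = 1,024 time steps, even though each filter has only two weights per input channel.

No time step waits for the previous step's output, so convolutional layers process all positions in parallel during training. This is why WaveNet-style networks typically train much faster than LSTMs.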
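The receptive-field arithmetic and the causal padding can be checked with a small NumPy sketch. It assumes a single input and output channel and no bias or activation, which is far simpler than the 32-filter Keras layers in the code examples. The indexing idea is the same, though: the input is left-padded with (kernel_size − 1) × dilation zeros, so output[t] only combines x[t], x[t − d], x[t − 2d], and so on.

```python
import numpy as np

def receptive_field(kernel_size, dilation_rates):
    """Number of time steps the last output of a stack of dilated causal Conv1D layers can see."""
    return 1 + sum((kernel_size - 1) * d for d in dilation_rates)

def causal_dilated_conv1d(x, weights, dilation):
    """Single-channel causal dilated convolution (no bias, no activation).

    x: [time_steps]; weights: [kernel_size], where weights[-1] multiplies the current step.
    """
    pad = (len(weights) - 1) * dilation
    x_padded = np.concatenate([np.zeros(pad), x])  # left padding only, like padding="causal"
    out = np.empty(len(x))
    for t in range(len(x)):
        taps = x_padded[t : t + pad + 1 : dilation]  # x[t - pad], ..., x[t - dilation], x[t]
        out[t] = taps @ weights
    return out

print(receptive_field(2, [2**i for i in range(10)]))   # 1024  (dilations 1..512)
print(receptive_field(2, (1, 2, 4, 8, 16, 32)))        # 64    (the stack in the code examples)

# Impulse test: a single spike at t = 0 spreads forward through three layers (dilations 1, 2, 4)
x = np.zeros(20)
x[0] = 1.0
y = x
for d in (1, 2, 4):
    y = causal_dilated_conv1d(y, np.array([1.0, 1.0]), d)
print(np.nonzero(y)[0])   # [0 1 2 3 4 5 6 7]: 8 outputs depend on x[0], matching receptive_field(2, (1, 2, 4))
```

The impulse test confirms both properties. Nothing before the spike is affected, and the spike reaches exactly as many outputs as the receptive-field formula predicts.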

Chicago ridership dataset

The dataset records daily bus and rail boardings for the Chicago Transit Authority from 2001 to 2019. The notebook downloads it automatically, computes rolling statistics, analyzes seasonality, and builds progressively better forecasting models. You’ll see first-hand that a carefully tuned LSTM can beat classical ARIMA models on this real-world dataset.
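The starting points of that progression (rolling statistics, seasonal differencing, and a naive seasonal baseline) fit in a few lines of pandas. To keep the sketch self-contained, it replaces df["rail"] with a hypothetical noise-free series: a fixed weekly pattern plus a small linear trend, with all numbers invented for illustration.

- The naive forecast predicts each day with the value from exactly one week earlier.
- Lag-7 differencing removes the weekly pattern.
- A 7-day rolling mean averages the pattern out, leaving only the trend.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for df["rail"] (millions of boardings): a fixed weekly
# pattern (Mon-Fri busy, weekend quiet) plus a small upward trend, no noise.
dates = pd.date_range("2019-01-07", periods=8 * 7, freq="D")  # 2019-01-07 is a Monday
weekly_pattern = np.array([1.0, 1.0, 1.0, 1.0, 1.0, 0.5, 0.4])
trend = 0.001 * np.arange(len(dates))
rail = pd.Series(np.tile(weekly_pattern, 8) + trend, index=dates, name="rail")

# Naive seasonal forecast: predict each day with the value from one week earlier.
naive_forecast = rail.shift(7)
mae = (rail - naive_forecast).abs().mean()  # NaNs from the first week are skipped

# Seasonal differencing (lag 7) removes the weekly pattern; only the trend's weekly increment remains.
diff_7 = rail.diff(7).dropna()

# A 7-day rolling mean averages out the weekly pattern and exposes the trend.
rolling_mean = rail.rolling(window=7).mean().dropna()

print(f"naive MAE: {mae:.4f}")   # naive MAE: 0.0070
```

On the real data the same calls apply to df["rail"] or df[["bus", "rail"]]. The naive MAE then serves as the baseline that later models should beat.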

Code examples

Loading the Chicago ridership data

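import pandas as pd
from pathlib import Path
import tensorflow as tf

# Download the archive into ./datasets/ (cache_dir=".") and extract it there
filepath = tf.keras.utils.get_file(
    "ridership.tgz",
    "https://github.com/ageron/data/raw/main/ridership.tgz",
    cache_dir=".", extract=True)
# Assumes the archive extracts into a "ridership" folder next to itself (Keras 2 behaviour);
# newer Keras versions may return the extracted folder's path instead
ridership_path = Path(filepath).with_name("ridership")

path = ridership_path / "CTA_-_Ridership_-_Daily_Boarding_Totals.csv"
df = pd.read_csv(path, parse_dates=["service_date"])
df.columns = ["date", "day_type", "bus", "rail", "total"]  # shorter column names
df = df.sort_values("date").set_index("date")             # chronological, indexed by date
df = df.drop("total", axis=1).drop_duplicates()           # total is just bus + rail; drop duplicated rows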

Stacked LSTM model for multivariate forecasting

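import tensorflow as tf

tf.random.set_seed(42)  # reproducible weight initialization

# Two stacked LSTM layers over inputs of shape [batch, time_steps, 2] (bus and rail).
# The first layer must return its full output sequence so that the second LSTM receives a sequence.
model = tf.keras.Sequential([
    tf.keras.layers.LSTM(32, return_sequences=True, input_shape=[None, 2]),
    tf.keras.layers.LSTM(32),   # returns only its last output
    tf.keras.layers.Dense(1)    # one-step-ahead forecast
])

model.compile(optimizer="adam", loss="mse", metrics=["mae"])
# train_ds and valid_ds are assumed to exist: windowed datasets yielding
# (inputs of shape [batch, time_steps, 2], targets of shape [batch, 1])
history = model.fit(train_ds, validation_data=valid_ds, epochs=20)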
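The LSTM code assumes that train_ds and valid_ds already exist. One way to build equivalent inputs is to cut the [bus, rail] columns into fixed-length windows, each paired with the next day's value. The sketch below does this with NumPy. The 56-day window, the choice of rail as the target, and the random stand-in data are assumptions for illustration. TensorFlow also provides tf.keras.utils.timeseries_dataset_from_array, which builds batched windows directly as a tf.data.Dataset.

```python
import numpy as np

def make_windows(series, window_length, target_column):
    """Cut a [time_steps, n_features] array into one-step-ahead training pairs.

    inputs:  [n_windows, window_length, n_features], rows i .. i + window_length - 1
    targets: [n_windows, 1], column `target_column` of row i + window_length (the next day)
    """
    series = np.asarray(series, dtype=np.float32)
    n_windows = len(series) - window_length
    if n_windows <= 0:
        raise ValueError("series must be longer than window_length")
    inputs = np.stack([series[i : i + window_length] for i in range(n_windows)])
    targets = series[window_length:, target_column].reshape(-1, 1)
    return inputs, targets

# Hypothetical stand-in for df[["bus", "rail"]] / 1e6: 300 days, 2 features
rng = np.random.default_rng(42)
data = rng.uniform(0.3, 1.0, size=(300, 2)).astype(np.float32)

split = 200                                   # chronological split: never shuffle across time
X_train, y_train = make_windows(data[:split], window_length=56, target_column=1)
X_valid, y_valid = make_windows(data[split:], window_length=56, target_column=1)
print(X_train.shape, y_train.shape)           # (144, 56, 2) (144, 1)
print(X_valid.shape, y_valid.shape)           # (44, 56, 2) (44, 1)
```

Arrays with these shapes can be passed directly to the stacked LSTM, for example model.fit(X_train, y_train, validation_data=(X_valid, y_valid), epochs=20). This works because input_shape=[None, 2] accepts windows of any length. On the real data, splitting by date ranges on df's index keeps the validation period strictly after the training period.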

WaveNet-style dilated 1D convolutions

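import tensorflow as tf

model = tf.keras.Sequential()
model.add(tf.keras.layers.InputLayer(input_shape=[None, 1]))  # univariate sequences of any length

# Stack dilated causal conv layers, doubling the dilation rate each time.
# With kernel_size=2 the receptive field reaches 1 + (1 + 2 + 4 + 8 + 16 + 32) = 64 time steps.
for dilation_rate in (1, 2, 4, 8, 16, 32):
    model.add(tf.keras.layers.Conv1D(
        filters=32, kernel_size=2, padding="causal",
        activation="relu", dilation_rate=dilation_rate))

# A 1x1 convolution maps the 32 channels to one output per time step (sequence-to-sequence)
model.add(tf.keras.layers.Conv1D(filters=1, kernel_size=1))
model.compile(loss="mse", optimizer=tf.keras.optimizers.Adam(learning_rate=3e-4))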

Simple GRU baseline

model = tf.keras.Sequential([
    tf.keras.layers.GRU(32, input_shape=[None, 2]),
    tf.keras.layers.Dense(1)
])
model.compile(optimizer="adam", loss="mse")

Running this notebook

Enable a GPU

Recurrent layers can be very slow on CPU. In Colab, select Runtime → Change runtime type → GPU for GPU-accelerated cuDNN implementations of LSTM and GRU.


Install dependencies

pip install -r requirements.txt

The classical time series section requires statsmodels~=0.14.0.

Dataset download

The Chicago ridership dataset is downloaded automatically when you run the setup cell. It is approximately 108 KB compressed.

Exercises

Exercises include building an encoder-decoder architecture for sequence-to-sequence forecasting and experimenting with different window sizes. Solutions are at the end of the notebook.

The GRU and LSTM layers only use the fast cuDNN implementation when two conditions hold:

- activation, recurrent_activation, recurrent_dropout, unroll, and use_bias keep their default values (for GRU, reset_after as well);
- any masked inputs are strictly right-padded.

Changing any of these makes the layer fall back to a slower generic implementation.
