Quick Start
How to Use pwlreg
pwlreg performs piecewise linear regression. It is general enough to suit any use case, but it was developed specifically to estimate change-point building energy models. ASHRAE calls this family of models “inverse models”; they are really just regression models with two or three piecewise components.
The pwlreg library was written to be fully compatible with the scikit-learn API. It exposes two estimators: PiecewiseLinearRegression and AutoPiecewiseRegression. They differ in how they handle breakpoints (or change points): if you want to specify the breakpoints yourself, use PiecewiseLinearRegression. If you want to find the optimal breakpoints automatically, use AutoPiecewiseRegression.
Piecewise regression with fixed breakpoints
There are two primary arguments to the PiecewiseLinearRegression constructor: the breakpoint locations and the degree(s) of the fitted polynomial(s). Using the defaults yields a model identical to ordinary linear regression:
You can view the breakpoints and the model coefficients once the model has been fit.
And you can calculate model metrics however you are accustomed to:
To specify a breakpoint, the breakpoints argument must include the minimum and maximum values of the input data. Notice that when we passed breakpoints=None above, the estimator automatically set the breakpoints to 1 and 10. If you pass the breakpoints manually, you will need to provide these explicitly. For example, suppose we wanted to put a breakpoint at x = 4.
Now there are four model coefficients. In order, these are: the intercept and slope of the first (leftmost) line segment, and the intercept and slope of the last line segment.
The model fits significantly better than the single line did, as we can see both visually and with error metrics:
By default, each segment is a 1-degree polynomial (i.e. a line). In some change-point models, we want to restrict one or more segments to be constant. Do that by passing the degree argument as a list, with one degree value for each line segment.
Now there are only three model coefficients. Because the first segment is a flat line, it only has one coefficient, the constant term. The second and third coefficients are the intercept and slope of the last segment as before.
This model fits better than the single line did, but not as well as the one with two degree-one segments. Maybe we picked the wrong breakpoint?
Piecewise regression with unknown breakpoints
To use AutoPiecewiseRegression, specify the number of line segments rather than the locations of the breakpoints. You can specify the degrees in the same way as with PiecewiseLinearRegression.
This looks like a better location for the breakpoint. We can once again inspect the breakpoint, the model coefficients, and the error metrics:
This is our best-fitting model yet. The optimal breakpoint is around 4.8. We are not limited to straight lines if we think a quadratic might give us a better fit:
Advanced Use
Because pwlreg is compatible with scikit-learn, it can be used with scikit-learn's pipelines, transformers, and cross-validation utilities. Suppose we weren't sure which kind of change-point model to use for a certain dataset, and we wanted a data-driven approach to selecting the best-performing one.
We'll first simulate a larger, realistic-ish dataset.
import numpy as np

rng = np.random.default_rng(1234)
cp1, cp2 = 52, 73

x = np.concatenate(
    (
        rng.uniform(0, cp1, 300),
        rng.uniform(cp1, cp2, 400),
        rng.uniform(cp2, 100, 300),
    )
)
y = np.piecewise(
    x,
    [x < cp1, (cp1 <= x) & (x < cp2), x >= cp2],
    [
        lambda a: 10 + 0.25 * (cp1 - a),
        10,
        lambda a: 10 + 0.5 * (a - cp2),
    ],
)
sigma = np.piecewise(
    x,
    [x < cp1, (cp1 <= x) & (x < cp2), x >= cp2],
    [
        lambda a: 1 + 0.10 * (cp1 - a),
        1,
        lambda a: 1 + 0.15 * (a - cp2),
    ],
)
y += rng.normal(0, sigma, 1000)
y = np.abs(y)
Cross Validation
import pwlreg as pw
from sklearn.model_selection import GridSearchCV

m = pw.AutoPiecewiseRegression(n_segments=1)
params = [
    {
        "n_segments": [1],
        "degree": [0, 1, 2],
    },
    {
        "n_segments": [2],
        "degree": [[0, 0], [0, 1], [1, 0], [1, 1]],
    },
    {
        "n_segments": [3],
        "degree": [
            [0, 0, 0],
            [0, 0, 1],
            [0, 1, 0],
            [0, 1, 1],
            [1, 0, 0],
            [1, 0, 1],
            [1, 1, 0],
            [1, 1, 1],
        ],
    },
]
cv = GridSearchCV(
    m,
    param_grid=params,
    cv=5,
    n_jobs=-1,
    scoring="neg_root_mean_squared_error",
    verbose=2,
)
cv.fit(x, y)

results = cv.cv_results_
for i in range(1, 4):  # report the top three ranks (sklearn ranks start at 1)
    for candidate in np.flatnonzero(results["rank_test_score"] == i):
        print("Model {0}".format(i))
        print(
            "Mean RMSE: {0:.3f} (std: {1:.3f})".format(
                -1.0 * results["mean_test_score"][candidate],
                results["std_test_score"][candidate],
            )
        )
        print("Parameters: {0}".format(results["params"][candidate]))
        print("")
Continuity
If you don't want to enforce continuity of the line segments for some reason, just pass continuity=None to either estimator.
Sample weights
Weighted regression is supported by passing weights to the .fit() method. This makes little difference in this contrived example, but it could help the fit in areas of the data that are more important than others.
Future enhancements
- Standard errors and p-values for coefficients
- C2 continuity (continuity of the line segments and their derivatives)