15. In (the) aggregate

Enough transforming columns! Let's aggregate them instead.

A Polars expression is a function from a DataFrame to a Series. So, how can we possibly write an expression which produces a scalar?

Simple:

  • write an expression which returns a 1-row Series
  • when you register the expression, pass returns_scalar = True

As an example, let's compute the weighted mean of a column, where the weights are given by a second column.

Hello Python my old friend

Nothing fancy here:

def vertical_weighted_mean(values: IntoExprColumn, weights: IntoExprColumn) -> pl.Expr:
    return register_plugin_function(
        args=[values, weights],
        plugin_path=LIB,
        function_name="vertical_weighted_mean",
        is_elementwise=False,
        returns_scalar=True,
    )

Rust

To keep this example's complexity down, let's just limit it to Float64 columns.

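#[polars_expr(output_type=Float64)]
fn vertical_weighted_mean(inputs: &[Series]) -> PolarsResult<Series> {
    // Both inputs must be Float64; `.f64()` returns an error otherwise.
    let values = &inputs[0].f64()?;
    let weights = &inputs[1].f64()?;
    let mut numerator = 0.;
    let mut denominator = 0.;
    // Skip any row where either the value or the weight is null.
    values.iter().zip(weights.iter()).for_each(|(v, w)| {
        if let (Some(v), Some(w)) = (v, w) {
            numerator += v * w;
            denominator += w;
        }
    });
    // If no valid pairs remain, this is 0. / 0., i.e. NaN.
    let result = numerator / denominator;
    // Return a 1-row Series, which is what returns_scalar=True expects.
    Ok(Series::new(PlSmallStr::EMPTY, vec![result]))
}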
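
As a sanity check on the loop above, here is a minimal pure-Python sketch of the same computation. The name weighted_mean_reference is hypothetical and not part of the plugin; it assumes nulls arrive as None and, like the Rust version, skips any pair where either side is missing.

```python
from typing import Optional, Sequence


def weighted_mean_reference(
    values: Sequence[Optional[float]],
    weights: Sequence[Optional[float]],
) -> float:
    """Hypothetical helper mirroring the Rust loop; not part of the plugin."""
    numerator = 0.0
    denominator = 0.0
    for v, w in zip(values, weights):
        # Skip the pair if either the value or the weight is missing.
        if v is not None and w is not None:
            numerator += v * w
            denominator += w
    # Unlike Rust's float division (which yields NaN for 0. / 0.),
    # Python raises ZeroDivisionError if the valid weights sum to zero.
    return numerator / denominator


# The middle pair is skipped because its value is missing.
print(weighted_mean_reference([1.0, None, 2.0], [0.5, 0.3, 0.5]))  # 1.5
```

Note that the weight 0.3 is dropped along with its missing value, so the denominator is 1.0 rather than 1.3.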

Run it!

Put the following in run.py:

import polars as pl
import minimal_plugin as mp  # assumes the plugin package name used in earlier chapters

df = pl.DataFrame({
    'values': [1., 3, 2, 5, 7],
    'weights': [.5, .3, .2, .1, .9],
    'group': ['a', 'a', 'a', 'b', 'b'],
})
print(df.group_by('group').agg(weighted_mean = mp.vertical_weighted_mean('values', 'weights')))

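If you compile with maturin develop (or maturin develop --release if benchmarking) and then run python run.py, you'll see something like the following (group_by does not guarantee row order by default, so the rows may be swapped):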

shape: (2, 2)
┌───────┬───────────────┐
│ group ┆ weighted_mean │
│ ---   ┆ ---           │
│ str   ┆ f64           │
╞═══════╪═══════════════╡
│ b     ┆ 6.8           │
│ a     ┆ 1.8           │
└───────┴───────────────┘
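
One way to check the per-group numbers independently of the plugin is to work the example through with the standard library only. This is a rough sketch using the same data as run.py; the variable names are illustrative.

```python
from collections import defaultdict
import math

values = [1.0, 3.0, 2.0, 5.0, 7.0]
weights = [0.5, 0.3, 0.2, 0.1, 0.9]
groups = ["a", "a", "a", "b", "b"]

# Accumulate sum(v * w) and sum(w) separately for each group.
numerators = defaultdict(float)
denominators = defaultdict(float)
for v, w, g in zip(values, weights, groups):
    numerators[g] += v * w
    denominators[g] += w

expected = {g: numerators[g] / denominators[g] for g in numerators}

# Group a: (1*0.5 + 3*0.3 + 2*0.2) / 1.0; group b: (5*0.1 + 7*0.9) / 1.0.
assert math.isclose(expected["a"], 1.8)
assert math.isclose(expected["b"], 6.8)
for g, mean in sorted(expected.items()):
    print(g, round(mean, 6))
```

Both groups happen to have weights summing to 1, so each weighted mean equals its numerator: 1.8 for a and 6.8 for b.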

Try omitting returns_scalar=True when registering the expression - what changes?