9.1 Lists in, lists out, lists all about

Chapter 9.0 (Weighted-mean watchers) was fun. Let's do it all over again!

Or rather, let's do another list operation. We're going to start with a dataframe such as

shape: (4, 1)
│ dense        │
│ ---          │
│ list[i64]    │
│ [0, 9]       │
│ [8, 6, 0, 9] │
│ null         │
│ [3, 3]       │
and we're going to try to find, for each list, the indices of the elements which are non-zero.


You don't really need a plugin to do this, you can just do

df.with_columns(sparse_indices=pl.col('dense').list.eval(pl.arg_where(pl.element() != 0)))

But eval won't cover every need you'll ever have, so it's good to learn how to do this as a plugin so you can then customize it according to your needs.

Polars has a helper function built-in for dealing with this: apply_amortized. We can use it to apply a function to each element of a List Series. In this case, we just want to find the indices of non-zero elements, so we'll do:

fn list_idx_dtype(input_fields: &[Field]) -> PolarsResult<Field> {
    let field = Field::new(input_fields[0].name(), DataType::List(Box::new(IDX_DTYPE)));
    Ok(field)
}

#[polars_expr(output_type_func=list_idx_dtype)]
fn non_zero_indices(inputs: &[Series]) -> PolarsResult<Series> {
    let ca = inputs[0].list()?;

    let out: ListChunked = ca.apply_amortized(|s| {
        let s: &Series = s.as_ref();
        let ca: &Int64Chunked = s.i64().unwrap();
        // Pair values with positions, keep the non-zero ones, collect the positions.
        // (Assumed completion; `collect_ca`'s name argument varies by Polars version.)
        let out: IdxCa = ca
            .iter()
            .enumerate()
            .filter(|(_idx, opt_val)| opt_val != &Some(0))
            .map(|(idx, _opt_val)| Some(idx as IdxSize))
            .collect_ca("");
        out.into_series()
    });
    Ok(out.into_series())
}

apply_amortized is a bit like the apply_to_buffer function we used in How to STRING something together, in that it makes a big allocation upfront to amortize the allocation costs. Think of it as a list version of apply_values, where each element is itself a Series.
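To pin down what the closure computes for each row, here is a pure-Python sketch of the same logic. It is only a model of the semantics, not how Polars does it: the function name `non_zero_indices_reference` is made up for illustration, and it assumes that a null list stays null (as in the expected output for this example) and that a null element counts as "not zero", which is what the `opt_val != &Some(0)` comparison implies.

```python
from typing import List, Optional


def non_zero_indices_reference(
    rows: List[Optional[List[Optional[int]]]],
) -> List[Optional[List[int]]]:
    """Pure-Python model of the plugin (hypothetical helper, for illustration only)."""
    out: List[Optional[List[int]]] = []
    for row in rows:
        if row is None:
            # A null list produces a null result for that row.
            out.append(None)
            continue
        # Keep every position whose value is not exactly 0. A null element
        # (None) is kept too, mirroring `opt_val != &Some(0)` in the Rust closure.
        out.append([idx for idx, val in enumerate(row) if val != 0])
    return out


print(non_zero_indices_reference([[0, 9], [8, 6, 0, 9], None, [3, 3]]))
# [[1], [0, 1, 3], None, [0, 1]]
```

A model like this is handy as a test oracle: you can compare the plugin's output against it on small inputs before worrying about speed.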

Something new in this example is:

  • IdxSize
  • IdxCa

IdxSize is either u32 or u64, depending on how Polars was built (u32 by default, u64 when the bigidx feature is enabled), and is what Polars generally uses for counting-related operations. IdxCa is the associated ChunkedArray, and IDX_DTYPE the associated Polars dtype.

To finish this off, the Python side will be a bog-standard:

def non_zero_indices(expr: IntoExpr) -> pl.Expr:
    expr = parse_into_expr(expr)
    return expr.register_plugin(
        lib=lib, symbol="non_zero_indices", is_elementwise=True
    )

If we then make a script with

import polars as pl
import minimal_plugin as mp


df = pl.DataFrame({'dense': [[0, 9], [8, 6, 0, 9], None, [3, 3]]})
print(df.with_columns(indices=mp.non_zero_indices('dense')))

and compile with maturin develop (or maturin develop --release if you're benchmarking!) then we'll see

shape: (4, 2)
│ dense        ┆ indices   │
│ ---          ┆ ---       │
│ list[i64]    ┆ list[u32] │
│ [0, 9]       ┆ [1]       │
│ [8, 6, 0, 9] ┆ [0, 1, 3] │
│ null         ┆ null      │
│ [3, 3]       ┆ [0, 1]    │

Yay, it worked! And not only that, but it's about 1.5x as fast as the list.eval solution noted above!