Locality-sensitive hashing

biogeme.lsh module

Obtain sampling weights using locality-sensitive hashing (LSH)

author:

Nicola Ortelli

date:

Fri Aug 11 18:25:39 2023

biogeme.lsh.get_lsh_weights(df, w, a, max_weight)

Compute weights using Locality-Sensitive Hashing (LSH) on input data.

This function hashes the rows of the input data frame into LSH buckets and derives the weights from the resulting groups of data points. The optional max_weight argument caps the weight assigned to any one group.

Parameters:
  • df (pandas DataFrame) – The input data frame containing the data to compute weights for. The DataFrame should have at least one target column and one weight column.

  • w (float) – The width of the LSH buckets.

  • a (numpy.ndarray) – The LSH hash functions as a 2D array. Each row of this array represents an LSH hash function.

  • max_weight (int, optional) – The maximum weight allowed for a group of data points. If not provided, no maximum weight constraint is applied.
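
To make the roles of w and a concrete, the sketch below uses the random-projection construction that is common in the LSH literature: each row of a is drawn from a standard normal distribution, and an observation x falls into bucket floor(a·x / w) for that row. This illustrates the idea only. Whether get_lsh_weights uses exactly this formula (for instance, with or without a random offset) is not stated on this page, and the sizes, the seed and the variable names x, projections and bucket_ids are assumptions made for the example.

```python
import numpy as np

rng = np.random.default_rng(seed=0)

n_hashes, n_features = 4, 3   # assumed sizes, for illustration only
w = 2.0                       # bucket width

# One hash function per row, drawn from a standard normal distribution.
a = rng.standard_normal((n_hashes, n_features))

# Ten made-up observations with n_features columns each.
x = rng.standard_normal((10, n_features))

# Project every observation on every hash vector ...
projections = x @ a.T                      # shape (10, n_hashes)
# ... and discretise each projection into buckets of width w.
bucket_ids = np.floor(projections / w).astype(int)

print(a.shape)           # (4, 3)
print(bucket_ids.shape)  # (10, 4)
```

A larger w makes the buckets coarser, so more observations share a bucket. Adding rows to a makes a full match on all bucket indices less likely.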

Returns:

An array of weights corresponding to the input data frame.

Return type:

numpy.ndarray
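
The following is a minimal sketch of how bucket membership could be turned into an array of weights with an optional cap. It is not the biogeme implementation: the helper sketch_bucket_weights, its grouping rule (the first row of each group is kept as a representative and carries the group size, all other rows get weight zero) and the clipping to max_weight are assumptions chosen for illustration. Unlike get_lsh_weights, it takes a data frame holding feature columns only, with no target or weight column.

```python
import numpy as np
import pandas as pd


def sketch_bucket_weights(features, a, w, max_weight=None):
    """Hypothetical helper, NOT biogeme's get_lsh_weights."""
    # Bucket signature of every row: one integer per hash function.
    signatures = np.floor(features.to_numpy() @ a.T / w).astype(int)
    # Rows with identical signatures receive the same group label.
    _, group = np.unique(signatures, axis=0, return_inverse=True)
    group = group.ravel()
    sizes = np.bincount(group)
    # Index of the first row of each group, used as its representative.
    _, first_rows = np.unique(group, return_index=True)
    weights = np.zeros(len(features))
    weights[first_rows] = sizes
    # Assumed meaning of the cap: clip the weight of each group.
    if max_weight is not None:
        weights = np.minimum(weights, max_weight)
    return weights


features = pd.DataFrame(
    {"x1": [0.1, 0.2, 5.0, 5.1, 0.15], "x2": [1.0, 1.1, -3.0, -3.1, 1.05]}
)
a = np.array([[1.0, 0.0], [0.0, 1.0]])  # two axis-aligned hash functions
w = 2.0

print(sketch_bucket_weights(features, a, w))                # [3. 0. 2. 0. 0.]
print(sketch_bucket_weights(features, a, w, max_weight=2))  # [2. 0. 2. 0. 0.]
```

Rows 0, 1 and 4 share one bucket signature and rows 2 and 3 share another, so the uncapped weights are 3 and 2 on the two representatives. With max_weight=2 the first group is clipped to 2.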