Generating multiple solutions for linear programs | Andrew Wheeler
source link: https://andrewpwheeler.com/2021/10/24/generating-multiple-solutions-for-linear-programs/
I had a question the other day about my TURF analysis post: how to generate not just the single best solution for a linear programming problem, but multiple solutions. This is one area where I like the way people typically present genetic algorithms, with a leaderboard of multiple top solutions. That is nice for exploratory data analysis.
For this example, though, it is pretty easy to amend the linear program to return additional exact solutions by generating lazy constraints. (And because the program is fast, you might as well do it this way and get an exact answer.) The way it works here, we have decision variables for the products selected. You run the linear program, collect the k products selected, and then add a constraint to the model of the form:
Sum(Product_k) <= k-1
Then rerun the model with this constraint, get the next solution, and repeat. This constraint makes it so that group of decision variables cannot be selected again (it is in the forbidden set). For example, if the first solution selects products A, B, and C with k = 3, the added constraint is Pr_A + Pr_B + Pr_C <= 2. Python’s pulp makes this quite easy, and here is a function to estimate the TURF model I showed in that prior blog post, but generate this multiple-solution leaderboard:
import pandas as pd
import pulp

# Function to get solution from turf model
def get_solution(prod_dec, prod_ind):
    '''
    prod_dec - decision variables for products
    prod_ind - indices for products
    '''
    picked = []
    for j in prod_ind:
        if prod_dec[j].varValue >= 0.99:
            picked.append(j)
    return picked

# Larger function to set up turf model
# And generate multiple solutions
def turf(data, k, solutions):
    '''
    data - dataframe with 0/1 selected items
    k - integer for products selected
    solutions - integer for number of potential solutions to return
    '''
    # Indices for Dec variables
    Cust = data.index
    Prod = list(data)
    # Problem and Decision Variables
    P = pulp.LpProblem("TURF", pulp.LpMaximize)
    Cu = pulp.LpVariable.dicts("Customers Reached", [i for i in Cust], lowBound=0, upBound=1, cat=pulp.LpInteger)
    Pr = pulp.LpVariable.dicts("Products Selected", [j for j in Prod], lowBound=0, upBound=1, cat=pulp.LpInteger)
    # Objective Function (assumes weight = 1 for all rows)
    P += pulp.lpSum(Cu[i] for i in Cust)
    # Reached Constraint
    for i in Cust:
        # Get the products selected
        p_sel = data.loc[i] == 1
        sel_prod = list(p_sel.index[p_sel])
        # Set the constraint
        P += Cu[i] <= pulp.lpSum(Pr[j] for j in sel_prod)
    # Total number of products selected constraint
    P += pulp.lpSum(Pr[j] for j in Prod) == k
    # Solve the equation multiple times, add constraints
    res = []
    km1 = k - 1
    for _ in range(solutions):
        P.solve()
        pi = get_solution(Pr, Prod)
        # Lazy constraint
        P += pulp.lpSum(Pr[j] for j in pi) <= km1
        # Add in objective
        pi.append(pulp.value(P.objective))
        res.append(pi)
    # Turn results into nice dataframe
    cols = ['Prod' + str(i+1) for i in range(k)]
    res_df = pd.DataFrame(res, columns=cols + ['Reach'])
    res_df['Reach'] = res_df['Reach'].astype(int)
    return res_df
And now we can simply read in our data, turn it into binary 0/1, and then generate the multiple solutions.
#This is simulated data from XLStat
#https://help.xlstat.com/s/article/turf-analysis-in-excel-tutorial?language=en_US
prod_data = pd.read_csv('demoTURF.csv',index_col=0)
#Replacing the likert scale data with 0/1
prod_bin = 1*(prod_data > 3)
# Get Top 10 solutions
top10 = turf(data=prod_bin,k=5,solutions=10)
print(top10)
Which generates the results:
>>> print(top10)
Prod1 Prod2 Prod3 Prod4 Prod5 Reach
0 Product 4 Product 10 Product 15 Product 21 Product 25 129
1 Product 3 Product 15 Product 16 Product 25 Product 26 129
2 Product 4 Product 14 Product 15 Product 16 Product 26 129
3 Product 14 Product 15 Product 16 Product 23 Product 26 129
4 Product 3 Product 15 Product 21 Product 25 Product 26 129
5 Product 10 Product 15 Product 21 Product 23 Product 25 129
6 Product 4 Product 10 Product 15 Product 16 Product 27 129
7 Product 10 Product 15 Product 16 Product 23 Product 27 129
8 Product 3 Product 10 Product 15 Product 21 Product 25 128
9 Product 4 Product 10 Product 15 Product 21 Product 27 128
I haven’t checked too closely, but offhand these look like the same top 8 solutions (each with a reach of 129) as on the XLStat website. The last two rows are different, likely because there are further ties.
For TURF analysis, you may actually consider a lazy constraint in the form of:
Product_k = 0 for each k
This is more restrictive, but if you have a very large pool of products, it will ensure that each set of selected products is entirely different (so a unique set of 5 products in every row). Or you may consider a constraint in terms of customers reached instead of products: if solution 1 reaches 129 customers, you filter those rows out and only count newly reached people in subsequent solutions. This ensures each row of the leaderboard above reaches different people than the prior rows.
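The stricter product-exclusion variant can be sketched by banning each selected product outright after every solve, instead of the Sum <= k-1 cut. This is a minimal sketch on made-up data (the tiny reach matrix, variable names, and k = 2 are illustrative assumptions, not the XLStat demo):

```python
import pandas as pd
import pulp

# Toy 0/1 reach matrix: 6 customers x 4 products (made-up data)
data = pd.DataFrame({'A': [1, 1, 0, 0, 1, 0],
                     'B': [0, 1, 1, 0, 0, 1],
                     'C': [1, 0, 1, 1, 0, 0],
                     'D': [0, 0, 0, 1, 1, 1]})
Cust, Prod, k = data.index, list(data), 2

P = pulp.LpProblem("TURF_Disjoint", pulp.LpMaximize)
Cu = pulp.LpVariable.dicts("Cu", list(Cust), cat=pulp.LpBinary)
Pr = pulp.LpVariable.dicts("Pr", Prod, cat=pulp.LpBinary)
P += pulp.lpSum(Cu[i] for i in Cust)
for i in Cust:
    # Customer i only counts as reached if some product they select is picked
    P += Cu[i] <= pulp.lpSum(Pr[j] for j in Prod if data.loc[i, j] == 1)
P += pulp.lpSum(Pr[j] for j in Prod) == k

sets = []
for _ in range(2):
    P.solve(pulp.PULP_CBC_CMD(msg=0))
    pi = [j for j in Prod if Pr[j].varValue >= 0.99]
    sets.append(pi)
    # Stricter lazy constraint: ban each selected product outright,
    # so the next solution shares no products with this one
    for j in pi:
        P += Pr[j] == 0

print(sets)  # two disjoint sets of 2 products each
```

Since each iteration removes k products from the pool, you can generate at most len(Prod) // k fully disjoint solutions this way before the problem becomes infeasible.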