Data Envelopment Analysis (DEA) Computational Library
This repository provides a Python library for applying Data Envelopment Analysis (DEA) to real-world data. It includes several classes, each covering a DEA-related task:
- Dea()
  File: libDEA/dea_instance.py
  Solves a standard single DEA instance.
- DeaMultiprocessing()
  File: libDEA/dea_multiprocessing.py
  Solves multiple DEA instances in parallel using multiprocessing.
- DeaLargeScale()
  File: libDEA/dea_largescale.py
  Optimizes performance for large-scale cases.
- DeaProfile()
  File: libDEA/dea_profile.py
  Visualizes the efficiency surface.
The package uses the linear solver from the ortools library. It is free to use, provided you comply with the terms and conditions of ortools.
For more details, see my PhD-related paper: DEA.pdf
git clone --branch main https://github.com/aav-antonov/DEA.git
cd DEA
pip install -e .
The snippets below show how to use libDEA and explain the expected input data:
# imports
import numpy as np
from libDEA.dea_multiprocessing import DeaMultiprocessing
from libDEA.dea_largescale import DeaLargeScale
from libDEA.dea_profile import DeaProfile
# input X and Y: rows are inputs/outputs, columns are DMUs
# generate 100 random DMUs with 3 inputs (what a DMU consumes) and 2 outputs (what a DMU produces):
X = np.random.uniform(0, 10, size=(3, 100))
Y = np.random.uniform(0, 10, size=(2, 100))
# set up DeaMultiprocessing()
DEAMP = DeaMultiprocessing()
DEAMP.set_DEA(X, Y, q_type="x")
q = DEAMP.run(X, Y, q_type="x")
# q has length 100: one efficiency score (0 <= q <= 1) per DMU in (X, Y)
# set up DeaLargeScale(): more computationally efficient, can handle cases with more than 10,000 DMUs
DEALS = DeaLargeScale()
q = DEALS.run(X, Y, q_type="x")
In the DEA folder, you can benchmark and test the performance and correctness of the two DEA implementations by running:
python test_benchmark.py
This script benchmarks and tests both accuracy and computational efficiency for:
- DeaMultiprocessing: Base method that computes efficiency for each unit directly using multiprocessing.
- DeaLargeScale: Optimized version designed for large-scale data and improved performance.
Random datasets of varying sizes are generated. Both methods are executed, results are compared for accuracy, and computation time is measured.
All tests were run on a machine with 4 CPU cores.
| DMUs (m) | Inputs (fX_k) | Outputs (fY_k) | DeaMultiprocessing (s) | DeaLargeScale (s) |
|---|---|---|---|---|
| 250 | 5 | 3 | 5.6306 | 5.8992 |
| 500 | 5 | 3 | 21.8980 | 14.5271 |
| 1000 | 5 | 3 | 86.2211 | 37.7303 |
| 2000 | 5 | 3 | 351.1777 | 105.9509 |
| 4000 | 5 | 3 | 1500.0* | 240.7928 |
| 8000 | 5 | 3 | 6200.0* | 607.6985 |
- * Extrapolated values for DeaMultiprocessing (based on observed scaling from smaller dataset runs).
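The growth rates behind these timings can be read off directly. The sketch below is a rough check that uses only the measured values from the table (the extrapolated DeaMultiprocessing entries are left out). Because the number of DMUs doubles from row to row, the base-2 logarithm of the ratio of consecutive times estimates the exponent p in "time grows like (number of DMUs)^p". The helper name `doubling_exponents` is illustrative and not part of libDEA.

```python
import math

# Measured timings from the table above, in seconds.
n_dmus = [250, 500, 1000, 2000, 4000, 8000]
t_mp = [5.6306, 21.8980, 86.2211, 351.1777]  # measured rows only (250..2000)
t_ls = [5.8992, 14.5271, 37.7303, 105.9509, 240.7928, 607.6985]


def doubling_exponents(times):
    """Estimate p in time ~ n**p from consecutive runs where n doubles."""
    return [math.log2(later / earlier) for earlier, later in zip(times, times[1:])]


exp_mp = doubling_exponents(t_mp)
exp_ls = doubling_exponents(t_ls)

for n, p in zip(n_dmus[1:], exp_mp):
    print(f"DeaMultiprocessing, up to {n} DMUs: exponent {p:.2f}")
for n, p in zip(n_dmus[1:], exp_ls):
    print(f"DeaLargeScale, up to {n} DMUs: exponent {p:.2f}")
```

On these numbers the DeaMultiprocessing exponents come out close to 2 (time roughly quadruples each time the dataset doubles), while the DeaLargeScale exponents stay well below 2.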
Data Envelopment Analysis (DEA)
DEA evaluates the relative efficiency of a set of decision-making units (DMUs) by analyzing their input/output combinations. Each DMU $j$ is represented by a vector of inputs $x_j = (x_{1j}, \dots, x_{mj})$ and a vector of outputs $y_j = (y_{1j}, \dots, y_{sj})$.
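The classical input-oriented DEA efficiency score for a DMU $o$ is obtained by solving the linear program:

$$
\begin{aligned}
\min_{\theta_o,\,\lambda}\quad & \theta_o \\
\text{s.t.}\quad & \sum_{j=1}^{n} \lambda_j x_{ij} \leq \theta_o x_{io}, \quad i = 1, \dots, m, \\
& \sum_{j=1}^{n} \lambda_j y_{rj} \geq y_{ro}, \quad r = 1, \dots, s, \\
& \lambda_j \geq 0, \quad j = 1, \dots, n,
\end{aligned}
$$

where:
- $x_{ij}$: the $i$-th input for DMU $j$,
- $y_{rj}$: the $r$-th output for DMU $j$,
- $m$: number of input variables,
- $s$: number of output variables,
- $n$: number of DMUs,
- $\lambda_j$: weights for constructing a reference DMU,
- $\theta_o$: efficiency score for DMU $o$ ($\theta_o \leq 1$; $\theta_o = 1$ means efficient).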
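For a single input and a single output ($m = s = 1$), the constant-returns form of the input-oriented problem has a closed-form solution: the best reference is the DMU with the highest productivity $y_j / x_j$, and $\theta_o = (y_o / x_o) / \max_j (y_j / x_j)$. The sketch below works through that special case in plain Python. It assumes strictly positive data and no constraint on $\lambda$ beyond non-negativity; the function name `efficiency_1x1` is illustrative and not part of libDEA.

```python
def efficiency_1x1(x, y):
    """Input-oriented scores for one input and one output per DMU.

    x[j] and y[j] are the (strictly positive) input and output of DMU j.
    Each score is the DMU's productivity relative to the best productivity.
    """
    ratios = [yj / xj for xj, yj in zip(x, y)]
    best = max(ratios)
    return [r / best for r in ratios]


# Four DMUs: productivities y/x are 1.0, 1.5, 1.0 and 0.5.
x = [2.0, 4.0, 5.0, 8.0]
y = [2.0, 6.0, 5.0, 4.0]
theta = efficiency_1x1(x, y)

for j, t in enumerate(theta):
    print(f"DMU {j}: theta = {t:.3f}")
```

Here DMU 1 has the highest productivity and is the only efficient unit; DMU 3, for example, could in principle produce its output with one third of its current input.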
The solution
Complexity of the Problem
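Given input and output matrices $X$ ($m \times n$) and $Y$ ($s \times n$), scoring every DMU requires solving $n$ linear programs, each with $n + 1$ variables and $m + s$ constraints.
As a result, evaluating DEA efficiency for datasets with more than 1,000 DMUs, even with a moderate number of inputs and outputs, becomes computationally expensive.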
Improving Computational Performance
A standard way to reduce computational time in DEA is to exploit the fact that the efficiency of each DMU can be determined using only the set of efficient DMUs, rather than the full matrices $X$ and $Y$. The idea is to first identify the set of efficient DMUs (full_base), which is typically much smaller than the total number of DMUs, and then compute the efficiency of all other DMUs using only this set. The DeaLargeScale class implements this strategy to achieve significant computational improvements.
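The property that makes this work can be checked numerically. The sketch below is not the library's implementation (libDEA uses ortools); it assumes SciPy is installed and solves the constant-returns, input-oriented problem with `scipy.optimize.linprog`. The function name `ccr_input_efficiency` and its `ref` parameter are hypothetical. For illustration only, the efficient set is found here by brute force, which saves no time; the point is that scores computed against the reduced reference set match the scores computed against all DMUs.

```python
import numpy as np
from scipy.optimize import linprog


def ccr_input_efficiency(X, Y, o, ref=None):
    """Input-oriented score of DMU `o` against the reference columns `ref`.

    X has shape (inputs, DMUs), Y has shape (outputs, DMUs).
    Decision variables are [theta, lambda_1, ..., lambda_k], all >= 0
    (the default bounds of linprog).
    """
    m, n = X.shape
    s = Y.shape[0]
    ref = np.arange(n) if ref is None else np.asarray(ref)
    k = len(ref)
    c = np.zeros(k + 1)
    c[0] = 1.0  # minimize theta
    # sum_j lambda_j * x_ij - theta * x_io <= 0 for every input i
    A_in = np.hstack([-X[:, [o]], X[:, ref]])
    # -sum_j lambda_j * y_rj <= -y_ro for every output r
    A_out = np.hstack([np.zeros((s, 1)), -Y[:, ref]])
    A_ub = np.vstack([A_in, A_out])
    b_ub = np.concatenate([np.zeros(m), -Y[:, o]])
    res = linprog(c, A_ub=A_ub, b_ub=b_ub, method="highs")
    return res.fun


rng = np.random.default_rng(0)
X = rng.uniform(1, 10, size=(3, 60))
Y = rng.uniform(1, 10, size=(2, 60))
n = X.shape[1]

# Scores against all DMUs, then the efficient set, then scores against it.
theta_full = np.array([ccr_input_efficiency(X, Y, o) for o in range(n)])
base = np.flatnonzero(theta_full > 1 - 1e-7)
theta_base = np.array([ccr_input_efficiency(X, Y, o, ref=base) for o in range(n)])

print(f"{len(base)} efficient DMUs out of {n}")
print("max score difference:", np.abs(theta_full - theta_base).max())
```

Each reduced problem has `len(base) + 1` variables instead of `n + 1`, which is where the savings come from once the efficient set can be found cheaply.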
Steps in DeaLargeScale
Base Candidate Selection via Ratios
Calculate efficiency-related ratios for each column (DMU) and select preliminary candidates.
Let the candidate set be denoted as
Base Extension (Addbase)
For each DMU in
Let this refined set be denoted as
Base Refinement (Rebase)
For each DMU in
Final Compute
For each DMU in the original matrices $X$ and $Y$, compute its efficiency using only the final base of efficient DMUs.
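This README does not spell out which ratios the first step uses. One plausible reading, shown here as an assumption and not as the library's actual rule: for every output/input pair, the DMU with the largest ratio $y_r / x_i$ cannot have its inputs radially contracted by any non-negative combination of DMUs, so it scores 1 in the constant-returns model and is a safe preliminary candidate. The function name `ratio_candidates` is hypothetical.

```python
import numpy as np


def ratio_candidates(X, Y):
    """Return sorted indices of DMUs that maximize some output/input ratio.

    X has shape (inputs, DMUs) and Y has shape (outputs, DMUs), with
    strictly positive entries. Illustrative helper, not part of libDEA.
    """
    # ratios[r, i, j] = Y[r, j] / X[i, j] for every output r, input i, DMU j
    ratios = Y[:, None, :] / X[None, :, :]
    # best DMU for each (output, input) pair; np.unique sorts and deduplicates
    return np.unique(ratios.argmax(axis=2))


# Two inputs, one output, four DMUs (columns).
X = np.array([[2.0, 4.0, 5.0, 8.0],
              [3.0, 6.0, 4.0, 2.0]])
Y = np.array([[2.0, 6.0, 5.0, 4.0]])

candidates = ratio_candidates(X, Y)
print(candidates)  # DMU 1 is best on output/input 0, DMU 3 on output/input 1
```

Candidates found this way are only a starting point: they need not cover every efficient DMU, which is why extension and refinement steps follow.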
Data Envelopment Analysis (DEA) is a powerful methodology for constructing production functions based on empirical observations of the performance of Decision Making Units (DMUs). The libDEA library provides specialized tools to visualize the efficient frontier through various projections.
The efficient frontier represents the optimal performance boundary where DMUs operate at maximum efficiency. Since production processes often involve multiple inputs and outputs, libDEA offers two primary visualization approaches:
- Output-Input Visualization (y-x profile): Shows the relationship between a specific output and input.
- Input-Input Visualization (x-x profile): Compares two different inputs while holding outputs constant.
In the DEA folder, you can run the test script to see the plots produced by the DeaProfile() class:
python test_profile.py
import numpy as np
from libDEA.dea_profile import DeaProfile
# input X and Y: rows are inputs/outputs, columns are DMUs
# generate 100 random DMUs with 3 inputs (what a DMU consumes) and 2 outputs (what a DMU produces):
X = np.random.uniform(0, 10, size=(3, 100))
Y = np.random.uniform(0, 10, size=(2, 100))
# set up DeaProfile() to visualise a given DMU and the efficient frontier (production function)
DP = DeaProfile()
DP.get_base(X, Y, q_type="x")
dmu_index = 1
print(f"Selecting DMU with index {dmu_index} for profiling")
x, y = X[:, dmu_index], Y[:, dmu_index]
# Example of plotting y(x) profile
DP.get_yx_profile(x, y, file_output="plots/plot_yx.png")
# Example of plotting x(x) profile for different input pairs
DP.get_xx_profile(x, y, 0, 1, file_output="plot_xx") # see plot_xx_0_1.png
DP.get_xx_profile(x, y, 1, 2, file_output="plot_xx") # see plot_xx_1_2.png
DP.get_xx_profile(x, y, 0, 2, file_output="plot_xx") # see plot_xx_0_2.png
The output plots would look something like this:



