
Maximum Likelihood Estimation Based Map Classification #290

Open
tensor-calculus wants to merge 11 commits into pysal:main from tensor-calculus:MLE

@tensor-calculus

Closes #111 (hopefully)

I wasn't sure how to proceed with the LP solution, so I used the dynamic programming approach instead. I added a new class, MaximumLikelihood, that accepts an array of standard deviations, sigma, along with y. The implementation follows the formulation described in:

Mu, L., & Tong, D. (2019). Choropleth Mapping with Uncertainty: A Maximum Likelihood Based Classification Scheme.

I added the paper above to references.bib and added two tests to test_mapclassify.py.
The code contains many explanatory comments intended to aid review; they can be removed before merging.
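For readers new to the approach, here is a minimal sketch of a dynamic-programming classifier of this kind. It is not the code in this PR, and its objective is an assumption: each value y_i is treated as drawn from a normal distribution with its class mean and known standard deviation sigma_i, and each class mean is the inverse-variance weighted mean (its maximum likelihood estimate). Under that assumption, maximizing the likelihood reduces to minimizing the weighted within-class sum of squares, which a DP can minimize exactly over contiguous classes of the sorted values. The names ml_breaks and weighted_class_cost are hypothetical, and the exact Mu & Tong (2019) objective may differ.

```python
import numpy as np


def weighted_class_cost(cw, cwy, cwyy, i, j):
    """Weighted within-class sum of squares for sorted points i..j-1.

    cw, cwy, cwyy are prefix sums of w, w*y and w*y**2 with a leading zero,
    so each class cost is computed in O(1).
    """
    w = cw[j] - cw[i]
    wy = cwy[j] - cwy[i]
    wyy = cwyy[j] - cwyy[i]
    # sum of w*(y - mu)**2 with mu = wy / w, expanded as wyy - wy**2 / w
    return wyy - wy * wy / w


def ml_breaks(y, sigma, k):
    """Upper class bounds maximizing an ASSUMED Gaussian likelihood (sketch)."""
    y = np.asarray(y, dtype=float)
    sigma = np.asarray(sigma, dtype=float)
    if y.shape != sigma.shape or np.any(sigma <= 0):
        raise ValueError("sigma must match y in shape and be strictly positive")
    n = y.size
    if not 1 <= k <= n:
        raise ValueError("k must be between 1 and len(y)")
    order = np.argsort(y)  # sort y and keep sigma aligned with it
    ys, ws = y[order], 1.0 / sigma[order] ** 2  # inverse-variance weights
    cw = np.concatenate(([0.0], np.cumsum(ws)))
    cwy = np.concatenate(([0.0], np.cumsum(ws * ys)))
    cwyy = np.concatenate(([0.0], np.cumsum(ws * ys * ys)))

    # cost[c, j]: lowest cost of splitting the first j sorted points into c classes
    cost = np.full((k + 1, n + 1), np.inf)
    back = np.zeros((k + 1, n + 1), dtype=int)
    cost[0, 0] = 0.0
    for c in range(1, k + 1):
        for j in range(c, n + 1):
            for i in range(c - 1, j):  # last class holds points i..j-1
                cand = cost[c - 1, i] + weighted_class_cost(cw, cwy, cwyy, i, j)
                if cand < cost[c, j]:
                    cost[c, j] = cand
                    back[c, j] = i
    # Follow back-pointers from (k, n) to recover where each class ends.
    ends, j = [], n
    for c in range(k, 0, -1):
        ends.append(j)
        j = back[c, j]
    return [float(ys[e - 1]) for e in reversed(ends)]
```

For example, ml_breaks([10.0, 20.0, 100.0, 110.0, 200.0, 210.0], [1.0] * 6, 3) returns [20.0, 110.0, 210.0]. With equal sigma the objective reduces to the Fisher-Jenks sum of squared deviations, which makes a convenient sanity check; the triple loop costs O(k n^2) time.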

@codecov

codecov bot commented Mar 7, 2026

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 88.9%. Comparing base (bc27575) to head (df62f83).

Additional details and impacted files


@@           Coverage Diff           @@
##            main    #290     +/-   ##
=======================================
- Coverage   90.2%   88.9%   -1.3%     
=======================================
  Files         11      11             
  Lines       1308    1348     +40     
=======================================
+ Hits        1180    1198     +18     
- Misses       128     150     +22     
| Files with missing lines | Coverage Δ |
|---|---|
| mapclassify/__init__.py | 100.0% <ø> (ø) |
| mapclassify/classifiers.py | 86.8% <100.0%> (-1.8%) ⬇️ |

@sjsrey sjsrey self-assigned this Mar 7, 2026
@sjsrey sjsrey self-requested a review March 8, 2026 01:18

y : numpy.array
    (n, 1), values to classify.
sigma : numpy.array
Member


Is there any guidance on where sigma values are to be obtained?

Author


For example, the widely used American Community Survey (ACS) data include sampling uncertainty through published “margin of error” (MOE) estimates.

The uncertainty values were included in the dataset used in the paper. If the data are generated by a statistical model, I believe the standard deviations could be computed directly from the model's own variance estimates.

Ultimately, uncertainty values have to come from whoever collects or models the data, so I think that falls outside our scope.
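As a concrete illustration of the ACS case: the Census Bureau publishes ACS margins of error at the 90 percent confidence level, so a standard error can be recovered by dividing the MOE by 1.645. The sketch below assumes that the classifier's sigma parameter expects standard errors on the same scale as y; the helper name moe_to_sigma is hypothetical and not part of mapclassify.

```python
import numpy as np

# Two-sided 90% critical value of the standard normal distribution; ACS margins
# of error are published at this confidence level.
Z_90 = 1.645


def moe_to_sigma(moe, z=Z_90):
    """Convert margins of error into standard errors (hypothetical helper)."""
    moe = np.asarray(moe, dtype=float)
    if np.any(moe < 0):
        raise ValueError("margins of error must be non-negative")
    # A zero MOE yields sigma == 0, which a likelihood-based classifier using
    # inverse-variance weights cannot handle; such values need separate treatment.
    return moe / z
```

For instance, moe_to_sigma([16.45, 329.0]) gives approximately [10.0, 200.0]. For sources that report margins at another confidence level, pass the matching critical value as z (for example 1.96 for 95 percent).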

Comment thread mapclassify/tests/test_mapclassify.py Outdated
class TestMaximumLikelihood:
    def setup_method(self):
        # A deterministic dataset designed to clearly cluster into 3 groups
        self.y = numpy.array([10.0, 20.0, 100.0, 110.0, 200.0, 210.0])
Member


This would need to be tested on large n problems for scalability.
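One way to probe scalability is to time a classifier on synthetic data of increasing size. An exact dynamic program over sorted values needs O(k n^2) time, so doubling n should roughly quadruple the runtime once n is large. The sketch below is hypothetical: it assumes an inverse-variance weighted sum-of-squares objective as a stand-in for the PR's likelihood, and vectorizes the inner loop over candidate class starts with NumPy so that larger n stays practical; this lowers the constant factor, not the asymptotic cost.

```python
import time

import numpy as np


def ml_breaks_vectorized(y, sigma, k):
    """Exact DP for the ASSUMED inverse-variance objective, inner loop in NumPy."""
    y = np.asarray(y, dtype=float)
    sigma = np.asarray(sigma, dtype=float)
    order = np.argsort(y)
    ys, ws = y[order], 1.0 / sigma[order] ** 2
    n = ys.size
    # Prefix sums with a leading zero: the sum over points i..j-1 is s[j] - s[i].
    cw = np.concatenate(([0.0], np.cumsum(ws)))
    cwy = np.concatenate(([0.0], np.cumsum(ws * ys)))
    cwyy = np.concatenate(([0.0], np.cumsum(ws * ys * ys)))

    prev = np.full(n + 1, np.inf)  # best cost of each prefix using c - 1 classes
    prev[0] = 0.0
    back = np.zeros((k + 1, n + 1), dtype=np.int64)
    for c in range(1, k + 1):
        cur = np.full(n + 1, np.inf)
        for j in range(c, n + 1):
            i = np.arange(c - 1, j)  # every feasible start of the last class
            wy = cwy[j] - cwy[i]
            within = (cwyy[j] - cwyy[i]) - wy * wy / (cw[j] - cw[i])
            total = prev[i] + within
            best = int(np.argmin(total))
            cur[j] = total[best]
            back[c, j] = i[best]
        prev = cur

    # Walk back-pointers to find where each class ends in the sorted data.
    ends, j = [], n
    for c in range(k, 0, -1):
        ends.append(j)
        j = back[c, j]
    return [float(ys[e - 1]) for e in reversed(ends)]


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    for n in (250, 500, 1000):
        y = rng.normal(size=n)
        sigma = rng.uniform(0.5, 2.0, size=n)
        start = time.perf_counter()
        ml_breaks_vectorized(y, sigma, k=5)
        print(f"n={n}: {time.perf_counter() - start:.3f} s")
```

Memory for the back-pointers is O(k n). If timings on realistic dataset sizes show the quadratic term dominating, that would be worth noting in the class docstring.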

Author

tensor-calculus commented Mar 8, 2026


Hmm, agreed.

I used a small dataset because I could quickly compute the correct output by hand and compare it with the classifier's output. With a large dataset I would have no way of knowing whether the output is correct. Maybe I can use the same dataset as the authors and compare my results with theirs?

P.S. This particular example produced the same bins with the regular classifiers and the MLE classifier, so in a later commit I replaced it with an example that yields different bins, to highlight the difference.
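On verifying correctness beyond hand-computed examples: one common approach is to compare the objective value achieved by the classifier's breaks with an exhaustive search over every possible set of breaks, repeated on many small random inputs. Exhaustive search is only feasible for small n, but random inputs cover cases that are hard to construct by hand. This is a sketch under stated assumptions: the objective here is an inverse-variance weighted within-class sum of squares, standing in for whatever likelihood the PR actually maximizes, values are assumed distinct, and the names cost_of_bins and brute_force_best are hypothetical.

```python
from itertools import combinations

import numpy as np


def class_cost(ys, ws):
    """Weighted squared deviations from the inverse-variance weighted mean."""
    mu = np.sum(ws * ys) / np.sum(ws)
    return float(np.sum(ws * (ys - mu) ** 2))


def cost_of_bins(y, sigma, bins):
    """Objective value of a classification given upper bounds (bins[-1] >= max(y))."""
    y = np.asarray(y, dtype=float)
    ws = 1.0 / np.asarray(sigma, dtype=float) ** 2
    # Class c holds values with bins[c - 1] < y <= bins[c].
    labels = np.searchsorted(bins, y, side="left")
    return sum(class_cost(y[labels == c], ws[labels == c]) for c in np.unique(labels))


def brute_force_best(y, sigma, k):
    """Try every way to cut the sorted data into k contiguous classes."""
    y = np.asarray(y, dtype=float)
    sigma = np.asarray(sigma, dtype=float)
    order = np.argsort(y)
    ys, ws = y[order], 1.0 / sigma[order] ** 2
    n = ys.size
    best_cost, best_bins = np.inf, None
    for cuts in combinations(range(1, n), k - 1):
        edges = (0, *cuts, n)
        total = sum(class_cost(ys[a:b], ws[a:b]) for a, b in zip(edges, edges[1:]))
        if total < best_cost:
            best_cost = total
            best_bins = [float(ys[b - 1]) for b in edges[1:]]
    return best_cost, best_bins
```

In a test, one could loop over a few hundred random inputs with n between 5 and 12 and assert that np.isclose(cost_of_bins(y, sigma, bins), best_cost) holds, where bins are the upper bounds produced by the classifier under test. Comparing objective values rather than the bins themselves keeps the check robust to ties between equally good classifications.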



Development

Successfully merging this pull request may close these issues.

Maximum Likelihood–Based Classification Scheme

2 participants