Maximum Likelihood Estimation Based Map Classification #290
tensor-calculus wants to merge 11 commits into pysal:main from
Conversation
Codecov Report ✅ All modified and coverable lines are covered by tests. Additional details and impacted files:

@@ Coverage Diff @@
##            main     #290     +/-  ##
=======================================
- Coverage   90.2%    88.9%    -1.3%
=======================================
  Files         11       11
  Lines       1308     1348     +40
=======================================
+ Hits        1180     1198     +18
- Misses       128      150     +22
    y : numpy.array
        (n, 1), values to classify.
    sigma : numpy.array
Is there any guidance on where sigma values are to be obtained?
For example, the widely used American Community Survey (ACS) data include sampling uncertainty through published “margin of error” (MOE) estimates.
The uncertainty values were included in the dataset that the authors were using. If the data are generated by a statistical model, then the standard deviation could be calculated directly from the model's own variance estimates, I believe.
Ultimately, uncertainty values have to come from either the people collecting the data or the ones modeling it, so producing them falls outside our scope, I think.
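To make the ACS case above concrete: published ACS margins of error correspond to a 90% confidence interval, so a standard error usable as `sigma` can be recovered by dividing the MOE by 1.645. The estimate and MOE values below are made up purely for illustration:

```python
import numpy as np

# Hypothetical ACS-style inputs: point estimates and their published
# margins of error (MOEs). ACS MOEs are 90%-confidence half-widths,
# so the standard error is MOE / 1.645.
estimates = np.array([1200.0, 850.0, 4300.0, 975.0])
moes = np.array([150.0, 90.0, 400.0, 120.0])

sigma = moes / 1.645  # standard errors, usable as the `sigma` argument
print(sigma)
```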
class TestMaximumLikelihood:
    def setup_method(self):
        # A deterministic dataset designed to clearly cluster into 3 groups
        self.y = numpy.array([10.0, 20.0, 100.0, 110.0, 200.0, 210.0])
This would need to be tested on large n problems for scalability.
Hmm, agreed.
The reason I used a small dataset is that I could quickly calculate the correct output by hand and compare it with the classifier's output. With a large dataset I would have no way to know whether the output is correct. Maybe I can try using the same dataset the authors used and compare my results with theirs?
P.S. This particular test example was producing the same bins under both the regular classifiers and the MLE classifier, so I changed it in a later commit and added a better test example that yields different bins, to highlight the difference.
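On the scalability point, one way to get a large-n test whose expected bins are still verifiable by construction is to generate well-separated synthetic clusters. A minimal sketch (the sizes, centers, and seed are arbitrary choices, not values from this PR):

```python
import numpy as np

rng = np.random.default_rng(42)

# Three well-separated clusters with n large enough to stress the
# classifier. Membership is known by construction, so bin boundaries
# can be sanity-checked without hand computation.
n_per_group = 10_000
centers = [10.0, 100.0, 200.0]
y = np.concatenate([rng.normal(c, 2.0, n_per_group) for c in centers])
sigma = np.full(y.size, 2.0)

print(y.size)  # 30000 observations
```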
Closes #111 (hopefully).
I wasn't sure how to proceed with the LP solution, so I used the dynamic programming approach instead. Added a new class MaximumLikelihood that accepts an array of standard deviations sigma along with y; the implementation follows the formulation described in the linked paper. Added that paper to references.bib and added 2 tests to test_mapclassify.py. There are a lot of comments meant to aid understanding; they can be removed before merging.
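For reviewers unfamiliar with the approach, the dynamic programming idea can be sketched as below. This is an illustrative reimplementation under assumed notation, not the code in this PR; the function name, return convention, and cost definition are all my own choices:

```python
import numpy as np

def mle_classify(y, sigma, k):
    """Partition sorted y into k contiguous classes, choosing breaks
    that maximize the Gaussian likelihood N(y_i; mu_class, sigma_i).
    Illustrative sketch only."""
    order = np.argsort(y)
    y, sigma = y[order], sigma[order]
    n = len(y)
    w = 1.0 / sigma**2  # precision weights

    def cost(i, j):
        # Negative log-likelihood (up to additive constants) of placing
        # y[i:j] in one class at its precision-weighted mean.
        seg_y, seg_w = y[i:j], w[i:j]
        mu = np.sum(seg_w * seg_y) / np.sum(seg_w)
        return np.sum(seg_w * (seg_y - mu) ** 2)

    INF = float("inf")
    dp = np.full((k + 1, n + 1), INF)
    back = np.zeros((k + 1, n + 1), dtype=int)
    dp[0, 0] = 0.0
    # dp[m, j] = best cost of splitting the first j values into m classes.
    for m in range(1, k + 1):
        for j in range(m, n + 1):
            for i in range(m - 1, j):
                c = dp[m - 1, i] + cost(i, j)
                if c < dp[m, j]:
                    dp[m, j], back[m, j] = c, i

    # Backtrack to recover each class's end position.
    bounds, j = [], n
    for m in range(k, 0, -1):
        bounds.append(j)
        j = back[m, j]
    bounds.reverse()
    return [float(y[b - 1]) for b in bounds]  # upper bound of each class

print(mle_classify(np.array([10.0, 20.0, 100.0, 110.0, 200.0, 210.0]),
                   np.ones(6), 3))
# → [20.0, 110.0, 210.0]
```

With equal sigmas this reduces to minimizing within-class sum of squares (Fisher–Jenks-style optimal breaks); the per-observation weights are what the sigma array adds.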