Hi, developers! Thank you for working on this project. It has helped me tremendously with my work.
I have a quick question.
I used Word Embeddings visualization via PCA for a text classification model and found some outliers that were far from the other examples.
Then, I checked the PCA code here and found the following lines:
self._mean = np.mean(x_train, 0)
x_train = x_train - self._mean
As far as I understood from PR #559, the code above is a reimplementation of Scikit-learn's PCA in NumPy.
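For context, here is a minimal sketch of what a mean-centering-only PCA looks like in plain NumPy (the function name and SVD-based projection are my own illustration, not the project's actual code; scikit-learn's PCA likewise centers the data but does not scale it):

```python
import numpy as np

def pca_fit_transform(x_train, n_components=2):
    # Center only, as scikit-learn's PCA does: subtract the per-feature mean.
    mean = np.mean(x_train, 0)
    x_centered = x_train - mean
    # SVD of the centered data yields the principal directions in vt's rows.
    _, _, vt = np.linalg.svd(x_centered, full_matrices=False)
    # Project onto the top n_components directions.
    return x_centered @ vt[:n_components].T
```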
But here's the question: Why do you only apply mean-centering?
Why not add standardization to make the code look like this?
self._mean = np.mean(x_train, 0)
self._std = np.std(x_train, 0)
x_train = (x_train - self._mean) / self._std
After I changed it to that, my visualizations started to look more 'ordered'.
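For completeness, a self-contained sketch of the standardization I am proposing (the helper name is mine; the guard against zero standard deviation is my addition, since a constant feature would otherwise cause a division by zero):

```python
import numpy as np

def standardize(x_train):
    # Mean-center and scale each feature to unit variance (z-scoring).
    mean = np.mean(x_train, 0)
    std = np.std(x_train, 0)
    # Guard against constant features: a zero std would divide by zero.
    std = np.where(std == 0, 1.0, std)
    return (x_train - mean) / std
```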
Thank you in advance!