Hi, developers! Thank you for working on this project. It has helped me tremendously with my work.
I have a quick question.
I used Word Embeddings visualization via PCA for a text classification model and found some outliers that were far from the other examples.
Then, I checked the PCA code here and found the following lines:
self._mean = np.mean(x_train, 0)
x_train = x_train - self._mean
As far as I understood from PR #559, the code above is a reimplementation of Scikit-learn's PCA in NumPy.
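For context, here is a minimal sketch of what a mean-centering-only PCA looks like in plain NumPy (the function name and SVD-based projection are my own illustration, not the project's actual code; scikit-learn's PCA likewise centers the data but does not scale it):

```python
import numpy as np

def pca_fit_transform(x_train, n_components=2):
    # Center only, as scikit-learn's PCA does: subtract the per-feature mean.
    mean = np.mean(x_train, 0)
    x_centered = x_train - mean
    # SVD of the centered data yields the principal directions in vt's rows.
    _, _, vt = np.linalg.svd(x_centered, full_matrices=False)
    # Project onto the top n_components directions.
    return x_centered @ vt[:n_components].T
```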
But here's the question: Why do you only apply mean-centering?
Why not add standardization to make the code look like this?
self._mean = np.mean(x_train, 0)
self._std = np.std(x_train, 0)
x_train = (x_train - self._mean) / self._std
After I changed it to that, my visualizations started to look more 'ordered'.
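For completeness, a self-contained sketch of the standardization I am proposing (the helper name is mine; the guard against zero standard deviation is my addition, since a constant feature would otherwise cause a division by zero):

```python
import numpy as np

def standardize(x_train):
    # Mean-center and scale each feature to unit variance (z-scoring).
    mean = np.mean(x_train, 0)
    std = np.std(x_train, 0)
    # Guard against constant features: a zero std would divide by zero.
    std = np.where(std == 0, 1.0, std)
    return (x_train - mean) / std
```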
Thank you in advance!