Add Pathway Commons v14 data analysis report and plots#5
chronicgiardia wants to merge 2 commits into PathwayCommons:master
- REPORT.md: full EDA report covering interaction types, data sources, network connectivity, and data quality notes
- interaction_types.png: top 10 interaction types bar chart
- data_sources.png: top 10 data sources bar chart
- degree_distribution.png: degree distribution histogram and log-log plot

Co-Authored-By: Oz <oz-agent@warp.dev>
Pull request overview
Adds an exploratory analysis report for the Pathway Commons v14 Extended SIF dataset and includes the generated plots referenced by the report. This complements the repo’s purpose (working with PC v14 SIF data) by documenting dataset composition, source contributions, and graph connectivity characteristics.
Changes:
- Add `REPORT.md` describing interaction type distribution, data sources, degree distribution, and data quality notes for PC v14.
- Add plots for interaction types, data sources, and degree distribution.
- Reference these plots from the report for a self-contained writeup.
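As a rough illustration of how the interaction-type and data-source tallies in such a report could be reproduced, here is a minimal Python sketch. It is not the script used in this PR. It assumes the Extended SIF layout: tab-separated rows, a header starting with `PARTICIPANT_A`, the interaction type in the second column, semicolon-separated data sources in the fourth column, and a blank line before a separate node-attribute section. The function name `count_types_and_sources` and the sample rows are hypothetical.

```python
import csv
import io
from collections import Counter


def count_types_and_sources(lines):
    """Tally interaction types and data sources from Extended SIF rows.

    Assumed layout (not confirmed by the PR): tab-separated columns with
    the interaction type in column 2 and ';'-separated sources in column 4.
    """
    types, sources = Counter(), Counter()
    for row in csv.reader(lines, delimiter="\t", quoting=csv.QUOTE_NONE):
        if not row:
            # A blank line is assumed to end the interaction section.
            break
        if row[0] == "PARTICIPANT_A" or len(row) < 4:
            # Skip the header row and rows too short to parse.
            continue
        types[row[1]] += 1
        sources.update(s for s in row[3].split(";") if s)
    return types, sources


# Made-up sample rows for illustration only.
sample = io.StringIO(
    "PARTICIPANT_A\tINTERACTION_TYPE\tPARTICIPANT_B\tINTERACTION_DATA_SOURCE\n"
    "GENE1\tinteracts-with\tGENE2\tSourceA;SourceB\n"
    "GENE1\tcontrols-state-change-of\tGENE3\tSourceA\n"
    "GENE2\tinteracts-with\tGENE3\tSourceB\n"
    "\n"
    "PARTICIPANT\tPARTICIPANT_TYPE\n"
    "GENE1\tProteinReference\n"
)
types, sources = count_types_and_sources(sample)
print(types.most_common(10))
print(sources.most_common(10))
```

For the real file, the same function could be fed a text-mode handle such as `gzip.open("data.gz", "rt")`, assuming the file was saved locally under that name as the README describes.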
Reviewed changes
Copilot reviewed 1 out of 4 changed files in this pull request and generated 3 comments.
| File | Description |
|---|---|
| REPORT.md | New analysis report with dataset summary, connectivity stats, and embedded plot references. |
| interaction_types.png | Plot used by the report to visualize top interaction types. |
| data_sources.png | Plot used by the report to visualize top data sources. |
| degree_distribution.png | Plot used by the report to visualize the degree distribution. |
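For readers who want to see what a degree distribution computation might look like, the following is a small Python sketch under stated assumptions. It treats the network as an undirected simple graph, so edge direction is ignored and repeated interactions between the same pair count once. The report may define degree differently, and the function name `degree_distribution` is hypothetical.

```python
from collections import Counter


def degree_distribution(edges):
    """Return (degree per node, histogram of degrees) for an edge list.

    Assumption: the graph is undirected and duplicate edges are collapsed,
    so a node's degree is its number of distinct neighbors.
    """
    neighbors = {}
    for a, b in edges:
        neighbors.setdefault(a, set()).add(b)
        neighbors.setdefault(b, set()).add(a)
    degrees = {node: len(adjacent) for node, adjacent in neighbors.items()}
    histogram = Counter(degrees.values())
    return degrees, histogram


# Made-up edges; the duplicate and reversed pairs collapse into one edge.
edges = [("A", "B"), ("A", "C"), ("B", "A"), ("A", "B")]
degrees, histogram = degree_distribution(edges)
print(degrees)
print(sorted(histogram.items()))
```

The histogram maps each degree value to the number of nodes with that degree, which is the quantity a degree histogram or log-log plot would display.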
| # Pathway Commons v14 — Data Analysis Report | ||
| ## Dataset Overview | ||
| - **Source:** Pathway Commons v14 (`pc-hgnc.txt.gz` from `download.baderlab.org`) |
The dataset reference here (pc-hgnc.txt.gz from download.baderlab.org) doesn’t match the repo’s documented download + naming convention (README uses data.gz downloaded from pathwaycommons.org/archives/.../PC14.All.hgnc.txt.gz). To avoid confusion for users trying to reproduce the analysis, please align this source line with README (same URL/filename) or explicitly state that this file is equivalent and how it was obtained/renamed.
| - **Source:** Pathway Commons v14 (`pc-hgnc.txt.gz` from `download.baderlab.org`) | |
| - **Source:** Pathway Commons v14 (`PC14.All.hgnc.txt.gz` from `pathwaycommons.org/archives/...`, saved locally as `data.gz` per README) |
| ## Generated Artifacts | ||
| - `data.gz` — raw Pathway Commons v14 data file |
data.gz is listed under “Generated Artifacts”, but it’s an input dataset (downloaded) rather than something produced by the analysis, and it isn’t included in this PR. Consider renaming this section to distinguish inputs vs. outputs, or move data.gz to “Dataset Overview” / “Inputs” and keep this section limited to committed/generated outputs (the report + plots).
| ## Generated Artifacts | |
| - `data.gz` — raw Pathway Commons v14 data file | |
| ## Generated Outputs |
| | 17 | **NOG** | **6,087** | **Gene** | | ||
| | 18 | chebi:78510 | 6,004 | Small molecule | | ||
| | 19 | chebi:23414 | 5,986 | Small molecule | | ||
| | 20 | CHEBI:45713 | 5,808 | Small molecule | |
The ChEBI identifier casing is inconsistent (CHEBI:45713 vs chebi:... above). If these are meant to be the same identifier namespace, consider normalizing the casing in this table (or note why this one differs) so readers don’t interpret it as a different ID format.
| | 20 | CHEBI:45713 | 5,808 | Small molecule | | |
| | 20 | chebi:45713 | 5,808 | Small molecule | |
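One way the casing could be normalized before building the table is sketched below in Python. The helper name `normalize_chebi` is hypothetical, and the choice of a lowercase `chebi:` prefix simply follows the suggested change above. It is an assumption that both spellings refer to the same identifier namespace.

```python
def normalize_chebi(identifier):
    """Lowercase the prefix of ChEBI-style IDs; leave other IDs untouched.

    Assumption: 'CHEBI:' and 'chebi:' denote the same namespace.
    """
    prefix, sep, accession = identifier.partition(":")
    if sep and prefix.lower() == "chebi":
        return "chebi:" + accession
    return identifier


# Identifiers taken from the table rows quoted above.
ids = ["NOG", "chebi:78510", "chebi:23414", "CHEBI:45713"]
normalized = [normalize_chebi(i) for i in ids]
print(normalized)
```

Gene symbols such as `NOG` pass through unchanged because they contain no `chebi` prefix.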
Summary
Exploratory data analysis of the Pathway Commons v14 dataset (`pc-hgnc.txt.gz`), including:
Key Findings
Co-Authored-By: Oz <oz-agent@warp.dev>