Latest commit c249d619fa by Alexis (2025-01-03 10:56:56 +00:00): Add Zenodo badge to Readme. Repo is now archived on Zenodo: https://zenodo.org/records/14591715

Replication Package - Links Between Package Popularity, Criticality, and Security in Software Ecosystems

./dataset → A compressed snapshot of the graph dataset, and tooling to load it into Neo4J.
./analysis → Scripts used for all parts of the analysis of packages in the graph dataset.
./relationship_analysis → Raw data and spreadsheet used to find correlations between popularity, criticality, and security.
./scorecard_validation → Code used to validate the use of OSSF Scorecard as a proxy for security. Validation uses static-analysis vulnerability density as a more direct security measure.
./disc_validation → Code used to validate the use of DISC as a measure of node criticality in directed scale-free graphs.
./storage_interface → (Internal) Source files for interfacing with the graph database.
./shared_models → (Internal) Source files defining various data models.
./api_clients → (Internal) Source files supporting API interactions.

Requirements

Setup instructions for each part of this repo: first the graph database, then the analysis scripts.

Download the zipped dataset from Zenodo: https://zenodo.org/records/14577850
Move the zipped dataset into the dataset directory
cd into the dataset directory
Run sudo make load data - this unpacks the dataset snapshot and loads it into a Neo4J instance running on Docker
Run make launch - brings the Neo4J instance up and makes it accessible on port 7687
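
Once make launch has finished, one quick way to confirm the instance is reachable is to probe the port. The following is a minimal sketch using only the Python standard library. The helper name port_is_open, the host localhost, and the timeout are assumptions made for illustration. It only checks that something is listening on port 7687, not that Neo4J is ready to serve queries.

```python
import socket


def port_is_open(host: str = "localhost", port: int = 7687, timeout: float = 2.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        # create_connection raises OSError (including timeouts) on failure.
        # "localhost" is an assumption: adjust if Docker runs elsewhere.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False
```

Calling port_is_open() with no arguments probes localhost on port 7687 and returns False if nothing is listening there.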

Follow all setup steps for the graph database
Run git submodule init followed by git submodule update to initialise the git-submodule used for topology analysis
Generate a GitHub API Auth Token
Paste GitHub API Auth Token into .env file at root of this repo
Create a Python 3.8 virtual environment
Install dependencies from requirements.txt
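
How the analysis scripts read the token from .env is not shown in this README. As an illustration of a typical KEY=VALUE file shape, the sketch below parses such lines with the standard library only. The key name GITHUB_TOKEN and the helper parse_env are hypothetical; check the .env file at the root of this repo for the key name the scripts actually expect.

```python
from typing import Dict


def parse_env(text: str) -> Dict[str, str]:
    """Parse KEY=VALUE lines, skipping blank lines and # comments."""
    values: Dict[str, str] = {}
    for raw_line in text.splitlines():
        line = raw_line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        # Drop surrounding whitespace and one pair of matching quotes.
        value = value.strip()
        if len(value) >= 2 and value[0] == value[-1] and value[0] in "'\"":
            value = value[1:-1]
        values[key.strip()] = value
    return values


# GITHUB_TOKEN is a hypothetical key name used only for this example.
example = parse_env("# GitHub API token\nGITHUB_TOKEN=your-token-here\n")
assert example["GITHUB_TOKEN"] == "your-token-here"
```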

Analysis scripts are interdependent:
- degree_distrib.py -(enables)-> tail-estimation
- disc_sampling.py -(enables)-> disc_ossf_scoring.py
- popularity_sampling.py -(enables)-> popularity_ossf_scoring.py
*_ossf_scoring.py scripts take multiple hours to run due to rate limits.
For tail estimation (topology analysis), run python3 tail-estimation/Python3/tail-estimation.py --verbose 1 --delimiter comma --diagplots 1 --savedata 1 <ABSOLUTE PATH>/output/.../deg_distrib.csv <ABSOLUTE PATH>/output/.../tail_estim
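
The ordering and the tail-estimation invocation above can be sketched as a small helper. This is an illustration, not part of the repo: the names ENABLES, prerequisite_of, and tail_estimation_command are hypothetical, and the flags are copied from the command shown above. The two paths must be supplied by you as absolute paths, since the output locations are elided here.

```python
import os
from typing import Dict, List

# Prerequisites as listed above: each key must be run before its value.
ENABLES: Dict[str, str] = {
    "degree_distrib.py": "tail-estimation",
    "disc_sampling.py": "disc_ossf_scoring.py",
    "popularity_sampling.py": "popularity_ossf_scoring.py",
}


def prerequisite_of(step: str) -> str:
    """Return the script that must run before `step`, or "" if none is listed."""
    for first, second in ENABLES.items():
        if second == step:
            return first
    return ""


def tail_estimation_command(deg_distrib_csv: str, output_prefix: str) -> List[str]:
    """Build the tail-estimation argument list; both paths must be absolute."""
    for path in (deg_distrib_csv, output_prefix):
        if not os.path.isabs(path):
            raise ValueError(f"expected an absolute path, got {path!r}")
    return [
        "python3", "tail-estimation/Python3/tail-estimation.py",
        "--verbose", "1",
        "--delimiter", "comma",
        "--diagplots", "1",
        "--savedata", "1",
        deg_distrib_csv,
        output_prefix,
    ]
```

The returned list can be passed to subprocess.run(cmd, check=True) from the root of this repo, after degree_distrib.py has produced the CSV.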

Please raise any issues or questions using the built-in GitHub Issues system; Alexis will address them in due course.

BibTeX citation for the paper: pending camera-ready approval.