| author | Jules Laplace <julescarbon@gmail.com> | 2019-02-10 21:07:57 +0100 |
|---|---|---|
| committer | Jules Laplace <julescarbon@gmail.com> | 2019-02-10 21:07:57 +0100 |
| commit | 5b71f57cc419c140a12bbc8daebb0795cf0e7c68 (patch) | |
| tree | 5bfe9164beb3e3a8d71243ff037413980a772e45 /scraper/README.md | |
| parent | d213702d4baf7a8c776ef71383346c0d6402106a (diff) | |
s2 scrape script that runs the pertinent scripts
Diffstat (limited to 'scraper/README.md')
| -rw-r--r-- | scraper/README.md | 6 |
1 files changed, 5 insertions, 1 deletions
````diff
diff --git a/scraper/README.md b/scraper/README.md
index 33b2d975..ac50b761 100644
--- a/scraper/README.md
+++ b/scraper/README.md
@@ -13,6 +13,10 @@ pip install csvtool
 npm install
 ```
 
+## simplified workflow
+
+If you are just updating the scrape, run `s2-scrape.sh` to run just the scripts you need.
+
 ## workflow
 
 ```
@@ -40,7 +44,7 @@ We do a two-stage fetch process as only about 66% of their papers are in this da
 
 ### s2-search.py
 
-Loads titles from citations file and queries the S2 search API to get paper IDs, then uses the paper IDs from the search entries to query the S2 papers API to get first-degree citations, authors, etc.
+Loads titles from citations file and queries the S2 search API to get paper IDs, then uses the paper IDs from the search entries to query the S2 papers API to get first-degree citations, authors, etc. This will overwrite the `citations_lookup.csv` so maybe don't run this again.
 
 ### s2-papers.py
 
````
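The README describes `s2-search.py` as a two-stage lookup. First it searches by title to get paper IDs, then it uses each paper ID to fetch the full record with citations and authors. It also warns that the script overwrites `citations_lookup.csv`.

The sketch below illustrates that flow under stated assumptions. It is not the repository's actual code:

- The search and paper-fetch steps are injected as plain functions, so no real Semantic Scholar endpoints are assumed.
- The helper names `two_stage_lookup` and `write_lookup` are hypothetical.
- The CSV columns are hypothetical.
- Taking the top search hit is an assumption.

The overwrite guard shows one way to act on the README's warning. It is not something the commit adds.

```python
import csv
import os
from typing import Callable, Iterable, List

# Hypothetical callables standing in for the S2 API calls (not the real client):
SearchFn = Callable[[str], List[str]]  # title -> candidate paper IDs
PaperFn = Callable[[str], dict]        # paper ID -> paper record (e.g. with "citations")


def two_stage_lookup(titles: Iterable[str], search: SearchFn, fetch_paper: PaperFn) -> List[dict]:
    """Stage 1: title -> paper ID via search. Stage 2: paper ID -> full paper record."""
    rows = []
    for title in titles:
        ids = search(title)
        if not ids:
            # Title not found by search; keep a row so the miss is visible later.
            rows.append({"title": title, "paper_id": "", "citations": 0})
            continue
        paper_id = ids[0]  # assumption: the top search hit is the right paper
        paper = fetch_paper(paper_id)
        rows.append({
            "title": title,
            "paper_id": paper_id,
            # Count first-degree citations from the fetched record.
            "citations": len(paper.get("citations", [])),
        })
    return rows


def write_lookup(rows: List[dict], path: str = "citations_lookup.csv", overwrite: bool = False) -> bool:
    """Write the lookup CSV; refuse to clobber an existing file unless overwrite=True."""
    if os.path.exists(path) and not overwrite:
        return False
    with open(path, "w", newline="") as f:
        writer = csv.DictWriter(f, fieldnames=["title", "paper_id", "citations"])
        writer.writeheader()
        writer.writerows(rows)
    return True
```

Passing the two lookups in as functions keeps the control flow testable with in-memory fakes. For example, `two_stage_lookup(["Some Title"], lambda t: index.get(t, []), lambda pid: papers[pid])` uses dictionaries in place of HTTP calls. The real script would supply its own API client.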
