s2 scrape script that runs the pertinent scripts

author: Jules Laplace <julescarbon@gmail.com> 2019-02-10 21:07:57 +0100
committer: Jules Laplace <julescarbon@gmail.com> 2019-02-10 21:07:57 +0100
commit: 5b71f57cc419c140a12bbc8daebb0795cf0e7c68 (patch)
tree: 5bfe9164beb3e3a8d71243ff037413980a772e45 /scraper/README.md
parent: d213702d4baf7a8c776ef71383346c0d6402106a (diff)
1 files changed, 5 insertions, 1 deletions
diff --git a/scraper/README.md b/scraper/README.md
index 33b2d975..ac50b761 100644
--- a/scraper/README.md
+++ b/scraper/README.md
@@ -13,6 +13,10 @@ pip install csvtool
 npm install
 ```
 
+## simplified workflow
+
+If you are just updating the scrape, run `s2-scrape.sh` to run just the scripts you need.
+
 ## workflow
 
 ```
@@ -40,7 +44,7 @@ We do a two-stage fetch process as only about 66% of their papers are in this da
 
 ### s2-search.py
 
-Loads titles from citations file and queries the S2 search API to get paper IDs, then uses the paper IDs from the search entries to query the S2 papers API to get first-degree citations, authors, etc.
+Loads titles from citations file and queries the S2 search API to get paper IDs, then uses the paper IDs from the search entries to query the S2 papers API to get first-degree citations, authors, etc.  This will overwrite the `citations_lookup.csv` so maybe don't run this again.
 
 ### s2-papers.py