author Jules Laplace <julescarbon@gmail.com> 2019-02-10 21:07:57 +0100
committer Jules Laplace <julescarbon@gmail.com> 2019-02-10 21:07:57 +0100
commit 5b71f57cc419c140a12bbc8daebb0795cf0e7c68
tree 5bfe9164beb3e3a8d71243ff037413980a772e45 /scraper/README.md
parent d213702d4baf7a8c776ef71383346c0d6402106a
s2 scrape script that runs the pertinent scripts
Diffstat (limited to 'scraper/README.md')
-rw-r--r-- scraper/README.md 6
1 files changed, 5 insertions, 1 deletions
diff --git a/scraper/README.md b/scraper/README.md
index 33b2d975..ac50b761 100644
--- a/scraper/README.md
+++ b/scraper/README.md
@@ -13,6 +13,10 @@ pip install csvtool
npm install
```
+## simplified workflow
+
+If you are just updating the scrape, run `s2-scrape.sh` to run just the scripts you need.
+
## workflow
```
@@ -40,7 +44,7 @@ We do a two-stage fetch process as only about 66% of their papers are in this da
### s2-search.py
-Loads titles from citations file and queries the S2 search API to get paper IDs, then uses the paper IDs from the search entries to query the S2 papers API to get first-degree citations, authors, etc.
+Loads titles from citations file and queries the S2 search API to get paper IDs, then uses the paper IDs from the search entries to query the S2 papers API to get first-degree citations, authors, etc. This will overwrite the `citations_lookup.csv` so maybe don't run this again.
### s2-papers.py
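
On the overwrite caveat this commit adds for `s2-search.py`: one way to make a re-run less risky is to copy `citations_lookup.csv` aside first. The sketch below is only an illustration; the helper name `backup_before_overwrite` and the timestamped `.bak` naming are hypothetical and not part of this repository, and the location of the CSV relative to the scripts is assumed.

```python
import shutil
import time
from pathlib import Path


def backup_before_overwrite(path):
    """Copy `path` to a timestamped sibling file.

    Hypothetical helper, not part of the scraper. Returns the backup
    path, or None if there is nothing to back up yet.
    """
    src = Path(path)
    if not src.exists():
        return None
    stamp = time.strftime("%Y%m%d-%H%M%S")
    # e.g. citations_lookup.csv -> citations_lookup.20190210-210757.bak.csv
    dst = src.with_name(f"{src.stem}.{stamp}.bak{src.suffix}")
    shutil.copy2(src, dst)  # copies contents and file metadata
    return dst


if __name__ == "__main__":
    # Assumed location: the CSV sits in the current working directory.
    saved = backup_before_overwrite("citations_lookup.csv")
    print(f"backup written to {saved}" if saved else "no existing file to back up")
```

Running this before `s2-search.py` would leave the previous lookup table recoverable if the new search results turn out worse.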