| author | Jules Laplace <julescarbon@gmail.com> | 2018-11-09 19:48:43 +0100 |
|---|---|---|
| committer | Jules Laplace <julescarbon@gmail.com> | 2018-11-09 19:48:43 +0100 |
| commit | 34b1972124da38f6cd28c1991f190bd4bdf1fe9e (patch) | |
| tree | 3b0e65b1f59c995029479c0e04f64a2240bfc701 /README.md | |
| parent | ca626447b49c55f40ef58d97ee7ff1784f3481b0 (diff) | |
interface with the google sheet
Diffstat (limited to 'README.md')
| -rw-r--r-- | README.md | 18 |
1 files changed, 14 insertions, 4 deletions
@@ -84,7 +84,7 @@ Fetch the files listed in ieee.json and process them.
 
 Use pdfminer.six to extract the first page from the PDFs.
 
-### s2-pdf-report.py report_first_pages
+### s2-pdf-first-pages.py
 
 Perform initial extraction of university-like terms, to be geocoded.
 
@@ -115,11 +115,21 @@ After scraping these universities, we got up to 47% match rate on papers from th
 
 ### expand-uni-lookup.py
 
-At this point in the process, I had divided the task of scraping and geocoding between 4 different machines, so I reduced down the output of these scripts into the file `reports/all_institutions.csv`. I got increased accuracy from my paper classifier using just university names, so I wrote this script to group the rows using the extracted university names, and show me which address they geocode to. This file must be gone through manually. This technique geocoded around 47% of papers.
+By now I had a list of institutions in `reports/all_institutions.csv` (done by merging the results of the geocoding, as I had done this on 4 computers and thus had 4 files of institutions). This file must be gone through manually. This technique geocoded around 47% of papers.
 
-### s2-pdf-report.py report_geocoded_papers
+At this point I moved `reports/all_institutions.csv` into the Google Sheets. All further results use the CSV on Google Sheets.
 
-Perform initial extraction of university-like terms, to be geocoded.
+### s2-pdf-report.py
+
+Generates reports of things from the PDFs that were not found.
+
+### s2-geocode-spreadsheet.py
+
+To add new institutions, simply list them in the spreadsheet with the lat/lng fields empty. Then run this script and anything missing a lat/lng will get one.
+
+### s2-citation-report.py
+
+Generate the main report with maps and citation lists.
 
 ---
 
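The "fill in anything missing a lat/lng" behaviour described for `s2-geocode-spreadsheet.py` can be sketched roughly as follows. This is not the script from the commit: the row layout (`name`, `lat`, `lng` keys), the function `fill_missing_coordinates`, and the stand-in `fake_geocode` are all hypothetical names chosen for illustration, and the real script presumably reads from and writes to the Google Sheet rather than an in-memory list.

```python
from typing import Callable, Dict, List, Optional, Tuple

# Hypothetical row shape: one dict per spreadsheet row, all cells as strings.
Row = Dict[str, str]
# Hypothetical geocoder: takes an institution name, returns (lat, lng) or None.
Geocoder = Callable[[str], Optional[Tuple[float, float]]]


def fill_missing_coordinates(rows: List[Row], geocode: Geocoder) -> int:
    """Fill empty lat/lng cells in place and return how many rows changed."""
    updated = 0
    for row in rows:
        has_coords = row.get("lat", "").strip() and row.get("lng", "").strip()
        if has_coords:
            continue  # already geocoded, leave the row untouched
        result = geocode(row["name"])
        if result is None:
            continue  # lookup failed, leave the cells empty for a later run
        lat, lng = result
        row["lat"], row["lng"] = str(lat), str(lng)
        updated += 1
    return updated


def fake_geocode(name: str) -> Optional[Tuple[float, float]]:
    """Stand-in for a real geocoding service (assumption, for the demo only)."""
    known = {"Example University": (1.5, 2.5)}
    return known.get(name)


rows = [
    {"name": "Example University", "lat": "", "lng": ""},
    {"name": "Already Done Institute", "lat": "10.0", "lng": "20.0"},
    {"name": "Unknown College", "lat": "", "lng": ""},
]
updated_count = fill_missing_coordinates(rows, fake_geocode)
```

Passing the geocoder in as a callable keeps the row-filling logic testable without network access; rows whose lookup fails stay empty, so they are picked up again on the next run.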
