blob: 8810654ec0fc25481d58c02c778f6e9dc577145a (
plain)
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
|

# Overview
The VFRAME/Check image deduplication API will provide capabilities to determine if a query image matches any prior submitted query images. The service is designed to integrate with the Check workflow described below.
## Requirements
- provide matching results for at least
- Second rate: peak 1 image every 10 seconds
- Hourly rate: ≈3.6K images per hour
- Daily rate: ≈87K image requests submitted per day
- Weekly rate: ≈610K images per week
- provide an authenticated API service to match a query image to all previously submitted query images and receive a match result
- authenticated requests only to protect against misuse
- authenticated services for Check will be handled manually requesting/exchanging credentials
- provide an interactive demo page to help Check users understand threshold settings
- provide adjustable threshold settings in URI parameter, and/or provide list of similar matches with threshold
- scale to accommodate up to 1 million unique image records to compare against
- after 1M records, we will need to rescale/rebuild the architecture to accommodate
## User story
- Audience member sends image to a number on WhatsApp (or generically, user adds an image to Check). - Handled by Smooch.
- Image is ingested into Check.
- Handled by Smooch & Check.
- Image is matched against existing images in Check.
- MVP:
- detect near-identical matches that are different sizes, resolutions.
- Assess for feasibility:
- find same meme images used for different claims
- find same claims using different meme images
- find same images (not memes) with different text
- find same images + text in different physical files
- Image is automatically related to any matching images in Check.
- Analyst can confirm matches and dissociate any false matches. - Handled in Check
- Audience member receives the verification result for any matching images with existing final-status.
- Handled in Check, Smooch, and WA Business API
## Example Images
Matches that can be detected
|Query|Known Image|Match|Notes|
|---|---|---|---|
|||True|Different size|
Matches that can be detected
|Query|Known Image|Match|Notes|
|---|---|---|---|
|||True|Small grey border|
|||True|Small black
border|
|||True|Slightly desaturated|
<div class="pagebreak"></div>
Matches that can be detected
|Query|Known Image|Match|Notes|
|---|---|---|---|
|||True|Different JPG export quality|
|||True|Image slightly stretched horizontally|
|||True|Image slightly stretched vertically|
<div class="pagebreak"></div>
Similar images that can not be detected
|Query|Known Image|Match|Notes|
|---|---|---|---|
|||False|collage|
|||False|collage, and text section|
<div class="pagebreak"></div>
## Data Retention
- we will retain the posted images and store:
- the computed hash features
- timestamp
- sha256 of the file
- mysql data will be stored on a server in Frankfurt
- image data on S3 storage will be stored on a server in Amsterdam
## Out of Scope
- Interactive matching
- Video matching
- Content analysis
- Text detection, text recognition (OCR)
- User-in-the-loop machine learning for improvement of matching algorithms
## Assets Required
- we will need a sample set of images to test and calibrate with here
- and timely feedback to make any needed revisions
|