The schema of the 2020 Blob Dataset presents `AnonFunctionInvocationId` and `AnonAppName` as unique IDs.
However, some invocation IDs appear under more than one application name. For example,
full_df[full_df['AnonFunctionInvocationId'] == 1967128581]
| Timestamp | AnonRegion | AnonUserId | AnonAppName | AnonFunctionInvocationId | AnonBlobName | BlobType | AnonBlobETag | BlobBytes | Read | Write | Datetime |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| 1606814873193 | q2d | 1209884869 | 01qqaww4 | 1967128581 | 1wx5dgohq1kiwjum | BlockBlob/text/plain; charset=utf-8 | f1x5p2nqh6 | 28.0 | True | False | 2020-12-01 09:27:53.193 |
| 1607004493391 | q2d | 1209884869 | j2alqt8s | 1967128581 | 1wx5dgohq1kiwjum | BlockBlob/text/plain; charset=utf-8 | w5mohi6523 | 28.0 | True | False | 2020-12-03 14:08:13.391 |
This seems to be a recurrent pattern with this user; see, for example, the `AnonFunctionInvocationId` values 830734703, 440926898, and 900464655.
Because the pattern repeats for the same user, I believe the cause is unlikely to be an unlucky collision between prefixes of hashed IDs. Is there any way to explain this discrepancy, apart from the data being potentially unclean?
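
For anyone who wants to reproduce this, here is a minimal sketch of how the affected invocation IDs can be enumerated. It assumes the data is in a pandas DataFrame with the column names shown in the table above; the helper name `invocations_with_multiple_apps` and the tiny `sample_df` are my own stand-ins (the third row, with invocation ID 42, is made up purely for contrast), not part of the dataset or its tooling.

```python
import pandas as pd


def invocations_with_multiple_apps(df: pd.DataFrame) -> pd.Series:
    """Count distinct AnonAppName values per AnonFunctionInvocationId
    and keep only the invocation IDs that map to more than one app."""
    apps_per_invocation = (
        df.groupby("AnonFunctionInvocationId")["AnonAppName"].nunique()
    )
    return apps_per_invocation[apps_per_invocation > 1]


# Stand-in for full_df: the two rows from the table above, plus one
# hypothetical row (invocation ID 42) that has a single app name.
sample_df = pd.DataFrame(
    {
        "AnonUserId": [1209884869, 1209884869, 1209884869],
        "AnonAppName": ["01qqaww4", "j2alqt8s", "01qqaww4"],
        "AnonFunctionInvocationId": [1967128581, 1967128581, 42],
    }
)

conflicts = invocations_with_multiple_apps(sample_df)

# Users who own at least one conflicting invocation ID.
affected_users = sample_df.loc[
    sample_df["AnonFunctionInvocationId"].isin(conflicts.index), "AnonUserId"
].unique()

print(conflicts)
print(affected_users)
```

Running the same helper on the full DataFrame and grouping the result by `AnonUserId` should show whether the conflicts are concentrated in a handful of users, which is what the examples above suggest.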