Unexpected data in Blob Dataset 2020

The schema of the 2020 Blob Dataset presents `AnonFunctionInvocationId` and `AnonAppName` as unique IDs. 

However, some invocation IDs appear under more than one application name. For example:

```
# full_df: the 2020 Blob Dataset loaded into a pandas DataFrame
full_df[full_df['AnonFunctionInvocationId'] == 1967128581]
```

Timestamp | AnonRegion | AnonUserId | AnonAppName | AnonFunctionInvocationId | AnonBlobName | BlobType | AnonBlobETag | BlobBytes | Read | Write | Datetime
-- | -- | -- | -- | -- | -- | -- | -- | -- | -- | -- | --
1606814873193 | q2d | 1209884869 | 01qqaww4 | 1967128581 | 1wx5dgohq1kiwjum | BlockBlob/text/plain; charset=utf-8 | f1x5p2nqh6 | 28.0 | True | False | 2020-12-01 09:27:53.193
1607004493391 | q2d | 1209884869 | j2alqt8s | 1967128581 | 1wx5dgohq1kiwjum | BlockBlob/text/plain; charset=utf-8 | w5mohi6523 | 28.0 | True | False | 2020-12-03 14:08:13.391


This seems to be a recurring pattern for this user. For example, consider the `AnonFunctionInvocationId` values 830734703, 440926898, and 900464655.
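To make the pattern easier to reproduce, here is a minimal sketch of how one could list every invocation ID that maps to more than one application name. It assumes the dataset is loaded as a pandas DataFrame with the column names shown in the table above. The helper name `invocations_with_multiple_apps` and the small `sample_df` are only for illustration: `sample_df` contains the two rows from the table plus one made-up row that behaves as the schema describes.

```python
import pandas as pd


def invocations_with_multiple_apps(df: pd.DataFrame) -> pd.Series:
    """Count distinct AnonAppName values per AnonFunctionInvocationId and
    keep only the invocation IDs that map to more than one app name."""
    apps_per_invocation = (
        df.groupby("AnonFunctionInvocationId")["AnonAppName"].nunique()
    )
    return apps_per_invocation[apps_per_invocation > 1]


# Stand-in for full_df: the first two rows come from the table above;
# the third row (invocation ID 1) is hypothetical and added for contrast.
sample_df = pd.DataFrame(
    {
        "AnonUserId": [1209884869, 1209884869, 1209884869],
        "AnonAppName": ["01qqaww4", "j2alqt8s", "01qqaww4"],
        "AnonFunctionInvocationId": [1967128581, 1967128581, 1],
    }
)

suspicious = invocations_with_multiple_apps(sample_df)
print(suspicious)
```

Running the same function on the full DataFrame, and then grouping the flagged rows by `AnonUserId`, should show whether the affected invocation IDs are concentrated in a few users or spread across the dataset.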

This leads me to believe that the cause is unlikely to be chance collisions between prefixes of hashed IDs. Is there any explanation for this discrepancy other than the data being unclean?



