Monkey by SimonBerend · Pull Request #31 · CorrelAidxNL/forward43

SimonBerend · 2021-10-26T13:22:56Z

Some issues I didn't know how to fix:

There are some errors in the get_survey_responses function, related to 'responses_dict' (lines 39, 48, 50).
All the constituent parts work, but I haven't run the script as a whole.
I'm not sure whether I use the scrape function at the bottom correctly (line 195); it differs a bit from the scrape functions used in other scripts.
I'm not sure whether I have placed the variables in MasterpeaceScraper(search_string = "MEAL", survey_id = '297005313') (line 214) correctly.

Updating scrapers before introducing keywords funcionality

andrewsutjahjo · 2021-11-03T15:38:15Z

+        surveys = self.client.get_survey_lists()
+        if 'data' not in surveys:
+            raise ValueError(
+                f"Surveys not found for search string {search_string}, data returned: {surveys}"


I'd change this to f"Surveys not found from client. data returned: {surveys}"
as this part of the code doesn't do anything with the search_string yet

andrewsutjahjo · 2021-11-03T16:08:41Z

+
+    def get_survey_questions(self, survey_id: str) -> Dict[str, str]:
+        """Get the questions associated with the row number in a survey."""
+        pass


Why is this just a pass? Do you have the questions somewhere?

andrewsutjahjo · 2021-11-03T16:13:43Z

+        """Get id of respondent """
+        """Note: multiple projects per respondent"""


Could you make this into one note?

eg:

""" Get id of respondent. Note: multiple projects per respondent """

This'll make the Note a part of the docstring and visible for everyone who's mousing over the function
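
To illustrate the point: Python only treats the first string literal in a function body as the docstring, and a second literal is an expression statement that gets discarded. A minimal sketch (the two function names are made up for this example, not taken from the PR):

```python
def respondent_id_two_strings():
    """Get id of respondent """
    """Note: multiple projects per respondent"""


def respondent_id_one_string():
    """Get id of respondent. Note: multiple projects per respondent."""


# Only the first literal ends up in __doc__, so the note is lost in the first variant.
two_doc = respondent_id_two_strings.__doc__
one_doc = respondent_id_one_string.__doc__
print(repr(two_doc))
print(repr(one_doc))
```

Editors and help() read __doc__, so only the single-string variant shows the note on hover.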

andrewsutjahjo · 2021-11-03T16:20:14Z

+    def get_questions_dict(self, survey_details):
+        questions = {}


This could use a docstring:
"""Get questions and their answers.
Returns a Dict: {question_id : answer}
"""

andrewsutjahjo · 2021-11-03T16:31:14Z

+    def get_respondent_data(self, responses, survey_id, respondent_id, club_data_dict, entity) -> str:
+        """get data entities per respondent"""
+        for page in responses[survey_id][respondent_id]['pages']:
+            if page['id'] == '146876559':


What's this specific number? It could use a comment at least, or be turned into a variable if it could change later down the line.
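
One way to address this is a named module-level constant with a comment. This is only a sketch: the name TARGET_PAGE_ID and the helper find_page are hypothetical, and what this page actually contains is not stated in the PR.

```python
# Hypothetical name. The PR does not say what this SurveyMonkey page holds,
# so the comment here should be replaced with the real meaning.
TARGET_PAGE_ID = '146876559'


def find_page(pages, page_id=TARGET_PAGE_ID):
    """Return the first page dict whose 'id' equals page_id, or None."""
    for page in pages:
        if page.get('id') == page_id:
            return page
    return None


# Made-up sample data shaped like the 'pages' list iterated in the diff.
sample_pages = [
    {'id': '1', 'questions': []},
    {'id': '146876559', 'questions': ['q']},
]
found = find_page(sample_pages)
missing = find_page([])
```

Passing the id as a defaulted parameter also makes it easy to point the same code at a different survey page later.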

andrewsutjahjo · 2021-11-03T16:46:27Z

+                    if len(question["headings"]) > 0:
+                        project_data_ids[question['id']] = question['headings'][0].get('heading', "")


Is it always the first heading of question["headings"] that contains important data, and never the second?

andrewsutjahjo · 2021-11-03T16:49:41Z

+        return club_data
+
+    def get_project_data_ids(self, survey_details: dict):
+        """Get ids of the answers to questions on projects."""


I'm not too sure what this returns - The dict has the format

{ "question_id" : "information_from_heading" }

the docstring says Get ids of the answers to questions on projects.
Is the question_id then also the id of the answer? What's the heading?

andrewsutjahjo · 2021-11-03T16:56:06Z

+        for x in unique_values:
+            ids_per_value = [key for key, value in project_data_ids.items() if value == x]
+            ids_per_value.sort()
+            if len(ids_per_value) > 1:
+                split_project_dict[x] = ids_per_value
+        return split_project_dict


What this does is flip the keys and values in project_data_ids: for every value that occurs more than once, it adds an entry to a new dict where that value is the key and the sorted list of the original keys sharing it is the new value.
However, if no value is shared by more than one key, this just returns an empty dict.
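
The behaviour described above can be reproduced in isolation. In this sketch the function name is hypothetical, the sample ids and headings are invented, and unique_values is assumed to be the set of dict values, since the diff excerpt does not show how it is built.

```python
def split_by_shared_value(project_data_ids):
    """Group keys by value, keeping only values shared by more than one key."""
    split_project_dict = {}
    # Assumption: the PR derives unique_values from the dict's values.
    unique_values = set(project_data_ids.values())
    for x in unique_values:
        ids_per_value = [key for key, value in project_data_ids.items() if value == x]
        ids_per_value.sort()
        if len(ids_per_value) > 1:
            split_project_dict[x] = ids_per_value
    return split_project_dict


# 'Project Title' is shared by two question ids; 'Budget' appears once and is dropped.
shared = split_by_shared_value({'q3': 'Project Title', 'q1': 'Project Title', 'q2': 'Budget'})
# No value is shared, so the result is empty.
no_shared = split_by_shared_value({'q1': 'Project Title', 'q2': 'Budget'})
print(shared)
print(no_shared)
```

If single-key values should be kept too, dropping the len(ids_per_value) > 1 check would be enough.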

andrewsutjahjo · 2021-11-03T17:02:14Z

+            project_list.append({
+                'id'              : 'respondent_id'+ str(i),
+                'title'           : self.get_project_data(responses, survey_id, respondent_id, split_project_data_dict, project_number = i, entity = 'Project Title'),
+                'description'     : self.get_project_data(responses, survey_id, respondent_id, split_project_data_dict, project_number = i, entity = 'Describe your project (at least 300 words)<br><br><em>- Context (What is the dilemma that the project is trying to tackle? Why is it important for this neighbourhood/group of people/the country?</em><br><em>- Activities (What did you do?)</em><br><em>- Results (What did you achieve? What did you create, produce, accomplish? Try to include numbers, if possible).</em><br><em>- Impact (What changed in the community? What did you learn yourself or as a team? Did you meet your own expectations)?</em>'),


I'm worried that this is a very long string for an entity

andrewsutjahjo · 2021-11-03T17:05:53Z

+            except Exception as e:
+                self.logger.exception('Failed to get projects from current page')
+
+            self.write_to_file(projects, str(search_string + respondent_id))


This'll write a separate file per respondent, so sometimes 1 and sometimes 4 projects per file. I think we were using a one-file-per-scrape methodology, so I'd suggest creating a projects list at the start of the scrape() method (projects = []) and then changing the body of the try: to

resp_projects = self.process_response(responses, survey_details, survey_id, respondent_id)
projects.extend(resp_projects)
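
A rough sketch of that accumulate-then-write-once shape follows. Everything in it is a stand-in: the class SketchScraper, its stubbed process_response and write_to_file, and the choice of search_string alone as the file name are assumptions for illustration, not the PR's actual code.

```python
class SketchScraper:
    """Stand-in for the scraper; all methods are stubs for illustration."""

    def __init__(self):
        self.written = []  # records one (projects, name) tuple per write_to_file call

    def process_response(self, responses, survey_details, survey_id, respondent_id):
        # Stub: pretend each stored response already is a list of project dicts.
        return responses[survey_id][respondent_id]

    def write_to_file(self, projects, name):
        # Stub: the real method presumably writes to disk.
        self.written.append((list(projects), name))

    def scrape(self, responses, survey_details, survey_id, search_string):
        projects = []  # one list for the whole scrape
        for respondent_id in responses[survey_id]:
            try:
                resp_projects = self.process_response(
                    responses, survey_details, survey_id, respondent_id)
                projects.extend(resp_projects)
            except Exception:
                # The PR calls self.logger.exception(...) here; omitted in this stub.
                continue
        # Single write after the loop instead of one write per respondent.
        self.write_to_file(projects, search_string)


responses = {'s1': {'r1': [{'id': 'r1-0'}], 'r2': [{'id': 'r2-0'}, {'id': 'r2-1'}]}}
scraper = SketchScraper()
scraper.scrape(responses, survey_details={}, survey_id='s1', search_string='MEAL')
```

With the write moved out of the loop, a failure for one respondent is skipped without losing the projects already collected.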

akashrajkn · 2021-11-05T05:08:57Z

+            responses = self.client.get_all_pages_response(survey_id)
+            for response in responses:
+                if not response.get("data", []):
+                    raise ValueError(


This will raise an error for the first item in the list that has no "data" key (or an empty one). Is this the intended behaviour?

If not, you could do something like this:

for response in responses:
    data = response.get('data', [])
    for response_data in data:
        respondent_id = response_data.get("id", "")
        responses_dict[survey_id][respondent_id] = response_data
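
The suggested loop can be tried on its own with made-up data. In this sketch the function name collect_responses is hypothetical, and using a defaultdict for responses_dict is an assumption, since the PR excerpt does not show how that dict is initialised.

```python
from collections import defaultdict


def collect_responses(pages, survey_id):
    """Index response records by respondent id, skipping pages without data."""
    # Assumption: a nested dict keyed by survey id, then respondent id.
    responses_dict = defaultdict(dict)
    for response in pages:
        data = response.get('data', [])
        for response_data in data:
            respondent_id = response_data.get("id", "")
            responses_dict[survey_id][respondent_id] = response_data
    return responses_dict


# Invented pages: the middle one has no 'data' key and is skipped silently.
pages = [
    {'data': [{'id': 'r1', 'pages': []}]},
    {},
    {'data': [{'id': 'r2', 'pages': []}]},
]
collected = collect_responses(pages, '297005313')
print(sorted(collected['297005313']))
```

Note that records without an "id" would all land under the empty-string key and overwrite each other, so a check for that case may be worth adding.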

SimonBerend and others added 16 commits May 6, 2021 12:33

Merge pull request #1 from CorrelAidxNL/master

f128329

Updating scrapers before introducing keywords funcionality

keywords in kickstartscraper v1 + list in hparams

ada2c47

first version monkey API/scraper

82da657

Delete scraper_masterpeace.py~Stashed changes

7cee2de

pickup on MEAL work

338262f

working on process_response

c445694

first version monkey API/scraper

826c0f1

Delete scraper_masterpeace.py~Stashed changes

2994422

pickup on MEAL work

c301c13

working on process_response

442d642

merging with master

941972a

remove .idea/ from repo

aa4dd68

Merge branch 'CorrelAidxNL:master' into monkey

1efb602

first version survey monkey scraper

708154e

Delete forward43/notebooks directory

8a8466e

deleted dated survey monkey setup

a5d4d85

SimonBerend requested a review from andrewsutjahjo October 26, 2021 13:22

error fix in monkey scraper

04e5714

SimonBerend requested a review from akashrajkn October 27, 2021 16:29

andrewsutjahjo requested changes Nov 3, 2021

akashrajkn reviewed Nov 5, 2021

SimonBerend wants to merge 17 commits into CorrelAidxNL:master from SimonBerend:monkey

3 participants