Conversation
- Fix 'bistreams' -> 'bitstreams' in docstring
- Fix 'sucessfully' -> 'successfully' in log messages (2 instances)

These spelling errors were found in comments and log messages and do not affect code behavior.
add group info to resourcePolicy
Backports critical bugfixes and proxy support from upstream the-library-code/dspace-rest-python.
Bugfixes:
- Fix User model trailing commas that turned fields into tuples
- Fix get_items() using wrong embedded key ('collections' -> 'items')
- Fix InProgressSubmission step assigned from lastModified instead of step
- Fix InProgressSubmission type assigned from lastModified instead of type
- Fix EntityType type field overwriting label
- Fix parse_json to handle None response safely
Improvements:
- Add proxy support via PROXY_URL env var and proxies constructor param
- Add proxies to all HTTP methods (GET, POST, PUT, DELETE, PATCH, send)
- Add proxies to authenticate status check GET
- Add params parameter to api_patch method
- Add embedded attribute to HALResource base class
- Add ITER_PAGE_SIZE class variable (preparation for pagination)
- Add upstream_ref/ to .gitignore
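As context for the proxy items above: the plumbing amounts to resolving a requests-style proxies mapping once (explicit constructor argument first, then the `PROXY_URL` environment variable) and passing it to every HTTP call. A minimal sketch with a hypothetical `build_proxies` helper — the backport's actual wiring may differ:

```python
import os

def build_proxies(proxies=None):
    # Hypothetical helper (not the library's actual code) illustrating the
    # precedence: an explicit `proxies` argument wins, otherwise fall back
    # to the PROXY_URL environment variable, otherwise use no proxy.
    if proxies is not None:
        return proxies
    proxy_url = os.environ.get('PROXY_URL')
    if proxy_url:
        # requests expects a scheme -> proxy URL mapping
        return {'http': proxy_url, 'https': proxy_url}
    return None
```

Each HTTP method then passes the resolved mapping along, e.g. `session.get(url, headers=..., proxies=self.proxies)`.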
…roxy-support fix: upstream bugfixes and proxy support backport
invalid PR, sorry
I was looking at that while you closed it. I noticed you try to implement automatic re-login on timeouts. See #59 for the approach we are currently looking into. It basically misses a test run and then should be ready for merging. Compared to your approach, it deals with all kinds of HTTP requests, not only POST.
Pull request overview
This pull request makes several changes beyond the timeout addition mentioned in the title. The main changes include adding timeout support to POST requests, introducing proxy configuration, implementing resource policy management, fixing multiple bugs in assignment and error handling, improving logging consistency, and adding new utility methods for retrieving collections and managing metadata.
Changes:
- Added timeout parameter to `api_post` method for better request timeout control
- Introduced proxy support via `PROXY_DICT` and `proxies` parameter throughout HTTP methods
- Added `ResourcePolicy` class and related methods (`get_resourcepolicy`, `create_resourcepolicy`) for managing resource policies
- Fixed several bugs including incorrect field assignments in `InProgressSubmission` and `SubmissionForm` classes, and trailing commas in `User` class attributes
- Enhanced error handling and authentication retry logic for 401/403 responses
- Improved logging consistency by introducing `_logger` instance (though some areas still use global `logging`)
- Added new methods: `get_owningCollection`, `remove_metadata`, `verify_response`, and `last_err` property
- Made pysolr optional by moving import into try-except block
- Updated `get_item` to return Item object instead of raw response
- Fixed spelling errors (bistreams → bitstreams, sucessfully → successfully)
- Removed unnecessary f-string prefixes from static strings in example files and removed unused imports
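The 401/403 retry logic mentioned above follows a common shape: detect a CSRF-token failure, refresh the token, and replay the request exactly once. A simplified, dependency-free sketch of that pattern — function names here are illustrative, not the library's actual API:

```python
def post_with_retry(do_post, refresh_token, parse_json):
    # Sketch of single-retry logic: on 401/403 with a CSRF-related message,
    # refresh the token and replay the request exactly once.
    r = do_post(retry=False)
    if r.status_code in (401, 403):
        r_json = parse_json(r)
        if r_json and 'message' in r_json and 'CSRF token' in r_json['message']:
            refresh_token(r)
            r = do_post(retry=True)
    return r
```

The `retry` flag is what prevents an infinite loop when the refreshed token is also rejected.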
Reviewed changes
Copilot reviewed 5 out of 5 changed files in this pull request and generated 24 comments.
| File | Description |
|---|---|
| dspace_rest_client/client.py | Core client changes including timeout support, proxy configuration, new resource policy methods, improved error handling, logging improvements, bug fixes, and new utility methods |
| dspace_rest_client/models.py | Added ResourcePolicy class, fixed User class syntax bugs, corrected field assignments in InProgressSubmission and SubmissionForm, added embedded attribute to HALResource, added resourcePolicies property to DSpaceObject, fixed spelling errors, removed unused imports |
| example.py | Removed unnecessary f-string prefixes from static error messages |
| example_gets.py | Removed unused imports and unnecessary f-string prefix |
| console.py | Removed unused imports and unnecessary f-string prefix |
Comments suppressed due to low confidence (1)
dspace_rest_client/client.py:431
- Missing parameter documentation: The `details` parameter is not documented in the docstring. Consider adding documentation explaining that this is an optional dictionary that will be populated with pagination information from the search results.
```python
def search_objects(self, query=None, scope=None, filters=None, page=0, size=20, sort=None, dso_type=None, details=None):
    """
    Do a basic search with optional query, filters and dsoType params.
    @param query: query string
    @param scope: uuid to limit search scope, eg. owning collection, parent community, etc.
    @param filters: discovery filters as dict eg. {'f.entityType': 'Publication,equals', ... }
    @param page: page number (not like 'start' as this is not row number, but page number of size {size})
    @param size: size of page (aka. 'rows'), affects the page parameter above
    @param sort: sort eg. 'title,asc'
    @param dso_type: DSO type to further filter results
    @return: list of DspaceObject objects constructed from API resources
    """
```
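The `details` dict flagged in the comment above could be populated along these lines — a sketch only; the key names follow DSpace's standard HAL `page` object, but the exact shape and the helper name are assumptions:

```python
def fill_details(details, r_json):
    # Illustrative helper (not in the library): copy the API's pagination
    # block into the caller-supplied dict so callers can read totals/pages
    # alongside the returned list of objects.
    if details is not None and isinstance(r_json, dict) and 'page' in r_json:
        details.update(r_json['page'])

details = {}
fill_details(details, {'page': {'number': 0, 'size': 20, 'totalElements': 57, 'totalPages': 3}})
# details now carries the pagination info from the search response
```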
```diff
-        if r.status_code == 403:
+        if not retry and r.status_code in (401, 403):
             r_json = parse_json(r)
             if 'message' in r_json and 'CSRF token' in r_json['message']:
```
Potential error handling issue: If parse_json(r) returns None (e.g., due to a malformed response), the check 'message' in r_json on line 802 will raise a TypeError. Consider adding a null check: if r_json and 'message' in r_json and 'CSRF token' in r_json['message']:.
```diff
-            if 'message' in r_json and 'CSRF token' in r_json['message']:
+            if r_json and 'message' in r_json and 'CSRF token' in r_json['message']:
```
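The suggested guard assumes `parse_json` returns None for missing or malformed responses. A minimal sketch of such a None-safe helper — the library's actual implementation may differ:

```python
def parse_json(response=None):
    # Return the decoded JSON body, or None when the response is missing
    # or its body is not valid JSON, instead of raising.
    if response is None:
        return None
    try:
        return response.json()
    except ValueError:
        return None
```

Callers can then treat a falsy result as "no usable body" rather than wrapping every call site in its own try/except.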
```diff
@@ -918,6 +1070,24 @@ def add_metadata(self, dso, field, value, language=None, authority=None, confide
         return dso_type(api_resource=parse_json(r))
```
Missing error handling: If parse_json(r) returns None (e.g., due to an API error), then dso_type(api_resource=parse_json(r)) would be called with None, which could cause an error. Consider checking if the response is valid before constructing the object.
```diff
-            logging.error('Need a valid item')
+            _logger.error('Need a valid item')
             return None
         return Item(api_resource=parse_json(self.create_dso(url, params=params, data=item.as_dict())))
```
Missing error handling: If parse_json(self.create_dso(...)) returns None (e.g., due to an API error), then Item(api_resource=None) would be called, which could cause an error in the Item constructor. Consider checking if the parsed JSON is valid before constructing the Item object.
```diff
-        return Item(api_resource=parse_json(self.create_dso(url, params=params, data=item.as_dict())))
+        api_resource = parse_json(self.create_dso(url, params=params, data=item.as_dict()))
+        if not api_resource:
+            _logger.error('Failed to create item: invalid or empty API response')
+            return None
+        return Item(api_resource=api_resource)
```
```diff
     def delete_user(self, user):
         if not isinstance(user, User):
-            logging.error(f'Must be a valid user')
+            logging.error('Must be a valid user')
```
Inconsistent logging: This line uses the global logging module instead of the _logger instance that's used throughout the rest of this file. This should be changed to _logger.error for consistency.
```diff
-            logging.error('Must be a valid user')
+            _logger.error('Must be a valid user')
```
```diff
             logging.error('Need a DSO or a URL to delete')
             return None
         else:
             if not isinstance(dso, SimpleDSpaceObject):
-                logging.error(f'Only SimpleDSpaceObject types (eg Item, Collection, Community, EPerson) '
-                              f'are supported by generic update_dso PUT.')
+                logging.error('Only SimpleDSpaceObject types (eg Item, Collection, Community, EPerson) '
+                              'are supported by generic update_dso PUT.')
```
Inconsistent logging: These lines use the global logging module instead of the _logger instance that's used throughout the rest of this file. These should be changed to _logger.error for consistency.
```diff
@@ -187,7 +228,9 @@ def api_post(self, url, params, json, retry=False):
         @param retry: Has this method already been retried? Used if we need to refresh XSRF.
         @return: Response from API
         """
-        r = self.session.post(url, json=json, params=params, headers=self.request_headers)
+        self._last_err = None
+        r = self.session.post(url, json=json, params=params, headers=self.request_headers,
+                              proxies=self.proxies, timeout=timeout)
```
Inconsistent timeout support: The PR title mentions "Add timeout", but the timeout parameter is only added to api_post method and not to other HTTP methods like api_get, api_put, api_delete, or api_patch. For consistency and completeness, consider adding timeout support to all API methods, or clarify why only POST requests need timeout control.
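One way to address this inconsistency is a single internal dispatch helper that threads `proxies` and `timeout` through every verb uniformly. A minimal sketch with an injected session and assumed method names — not the library's actual code:

```python
class ClientSketch:
    # Minimal sketch (not the library's actual class): one helper threads
    # `proxies` and `timeout` through every HTTP verb, so extending timeout
    # support to GET/PUT/DELETE/PATCH becomes one line per method.
    def __init__(self, session, proxies=None, timeout=None):
        self.session = session  # any object with a requests-like .request()
        self.proxies = proxies
        self.timeout = timeout

    def _request(self, method, url, **kwargs):
        kwargs.setdefault('proxies', self.proxies)
        kwargs.setdefault('timeout', self.timeout)
        return self.session.request(method, url, **kwargs)

    def api_get(self, url, params=None):
        return self._request('GET', url, params=params)

    def api_post(self, url, params=None, json=None, timeout=None):
        # a per-call timeout still overrides the instance default
        return self._request('POST', url, params=params, json=json,
                             timeout=timeout if timeout is not None else self.timeout)
```

With this shape, the instance-level defaults apply everywhere, while callers that need a tighter deadline on a specific request can still pass one explicitly.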
```python
        if self.groupName is None and '_embedded' in api_resource:
            if 'group' in api_resource['_embedded']:
                self.groupName = api_resource['_embedded']['group'].get('name')
                self.groupUUID = api_resource['_embedded']['group'].get('uuid')
```
Missing blank line: There should be a blank line before the method definition to separate it from the constructor, following PEP 8 style guidelines.
```diff
                 self.groupUUID = api_resource['_embedded']['group'].get('uuid')
+
```
```python
        Remove metadata
        """
        if dso is None or field is None or not isinstance(dso, DSpaceObject):
            _logger.error('Invalid or missing DSpace object, field or value string')
```
Incorrect error message: The error message mentions "value string" but the remove_metadata function doesn't have a value parameter (unlike add_metadata). The message should be corrected to "Invalid or missing DSpace object or field string".
```diff
-            _logger.error('Invalid or missing DSpace object, field or value string')
+            _logger.error('Invalid or missing DSpace object or field string')
```
```python
        url = f'{self.API_ENDPOINT}/core/items/{item_uuid}/owningCollection'
        try:
            r = self.api_get(url, None, None)
            self.verify_response(r, f"item:{item_uuid}", True)
```
The return value of verify_response is not being checked. If the response verification fails (non-200 status or invalid JSON), the method will still proceed to parse and return data, which could lead to errors. Consider checking the return value and returning early if verification fails, e.g., if not self.verify_response(r, f"item:{item_uuid}", True): return None.
```diff
-            self.verify_response(r, f"item:{item_uuid}", True)
+            if not self.verify_response(r, f"item:{item_uuid}", True):
+                return None
```
```python
            r_json = parse_json(response=r)
            return Item(r_json)
```
Missing error handling: If the API request fails or returns invalid data, parse_json(response=r) could return None, and then Item(r_json) would be called with None, which could cause an error in the Item constructor. Consider checking if r_json is None before constructing the Item object.
|
@vidiecan it would be great if you could create a new PR with the resource policies support you added, and with the deletion of a metadata value.