feat: modernize langchain integration core tools#28
Conversation
…mline client handling and error management
…media tools for apify integration
…ms and message for empty dataset
…y tools to enforce safety constraints
…and maintability; update test cases for better formatting and error handling
…to apify_api_token
|
I added @MQ37 as a reviewer as he implemented the |
MQ37
left a comment
There was a problem hiding this comment.
Thank you for the PR - in general LGTM 👍 Please address my comments and suggestions regarding the Apify token handling, Actor run memory values and the raw request that I would change to use the Apify client method.
| url = _APIFY_API_ENDPOINT_GET_DEFAULT_BUILD.format(actor_id=actor_obj_id) | ||
| response = requests.request('GET', url, timeout=_REQUESTS_TIMEOUT_SECS) |
There was a problem hiding this comment.
nit: the Apify client nowadays allows getting the default build, we can use that instead of the raw request https://docs.apify.com/api/client/python/reference/class/ActorClient#default_build
There was a problem hiding this comment.
switching to ActorClient.default_build()
- we're pinned to apify-client = "^2.3.0", and in 2.3.0 the sync ActorClient.default_build is incorrectly declared async def, so calling it on the sync client returns a coroutine. It's fixed in 2.5.0.
Adopting the suggestion requires bumping the dep to apify-client = "^2.5.0". Let me know whether to include the bump in this PR or leave the raw request and revisit separately.
|
|
||
| apify_client: ApifyClient | ||
| """An instance of the ApifyClient class from the apify-client Python package.""" | ||
| apify_api_token: SecretStr | None = Field( |
There was a problem hiding this comment.
nit: I would rename the var to apify_token
There was a problem hiding this comment.
I'd lean toward not renaming it.. as it is a public API breaking change, which we're trying to avoid here. Existing user code like ApifyDatasetLoader(..., apify_api_token=...) would stop working.
A few options if we want to push on this:
1) Keep apify_api_token (my preference)
- no breakage, and the field still resolves both APIFY_TOKEN and APIFY_API_TOKEN env vars after the helper change, so users get the standard naming where it matters.
2) Rename to apify_token + keep apify_api_token as a deprecated alias
- cleaner public surface long-term, no immediate breakage, but adds a small maintenance cost.
3) Hard rename
- cleanest but breaks existing users.
Which would you prefer?
Summary
First PR into
feat/modernize-langchain-integration; adds the foundational tools layer and modernizes auth conventions across the package. Upcoming PRs will add search & crawling tools, social media tools, LangChain-native components, and docs to feat/modernize-langchain-integration before merging it all tomain.New code: ~880 lines - Tests: ~1000 lines
ApifyToolsClient(_client.py)ApifyClient, one method per tool operation. Accepts bothSecretStrand rawstrtokens and falls back to theAPIFY_API_TOKENenv var. Shared_list_items_or_raisehelper wraps dataset-fetch errors intoRuntimeError.BaseToolsubclassesApifyRunActorTool,ApifyGetDatasetItemsTool,ApifyRunActorAndGetItemsTool,ApifyScrapeUrlTool,ApifyRunTaskTool,ApifyRunTaskAndGetItemsTool. Exported via theAPIFY_CORE_TOOLS: list[type[BaseTool]]convenience list for selective agent binding._ApifyGenericToolbase classhandle_tool_error=True, developer-controlled safety clamping (_clamp_timeout,_clamp_memory,_clamp_items) with configurable ceilings (max_timeout_secs,max_memory_mbytes,max_items) and hardcoded floor of1to enforce API protocol minimums.document_loaders.py,wrappers.py,tools.py)get_from_dict_or_env+@model_validator(mode='before')withSecretStrfield type andsecret_from_env('APIFY_API_TOKEN', default=None)default factory, matchinglangchain-openai/langchain-anthropicconventions. Tokens are automatically redacted in logs/traces and additionally excluded frommodel_dump()/repr()viaexclude=True, repr=False. Client construction moved to@model_validator(mode='after')/model_post_init. Addedpopulate_by_name=TruetoConfigDicton loader and wrapper. The new tools reuse this same auth pattern; fixing it here avoids shipping two parallel conventions across the package.ApifyActorsTool,ApifyDatasetLoader,ApifyWrapperretain their public API; auth changes are internal.test_tools.py,test_client.py,test_document_loaders.py), integration smoke tests undertests/integration_tests/, and error-scenario coverage (missing token, run failure, network error, clamp floor/ceiling, token excluded frommodel_dump,APIFY_TOKENenv-var fallback on the loader).Review strategy
The diff is larger than a typical PR (~1.9k lines, half of which are tests). Suggested reading order to make it tractable:
_client.py: the newApifyToolsClientabstraction_ApifyGenericToolbase class intools.py, then the 6 tool classes (homogeneous, once one clicks, the rest read fast)document_loaders.py,wrappers.py, and theApifyActorsTool.__init__change intools.pyMerge strategy
This PR targets
feat/modernize-langchain-integration, notmain. The plan is to accumulate all reviewed modernization work (core tools, native tools, social tools, scraping tools, docs) on that branch, and then open a single PR fromfeat/modernize-langchain-integration->mainonce everything is complete and reviewed.