feat: modernize langchain integration core tools#28

Open
daveomri wants to merge 42 commits into feat/modernize-langchain-integration from feat/modernize-langchain-integration-core-tools

@daveomri
Collaborator

@daveomri daveomri commented Apr 21, 2026

Summary

First PR into feat/modernize-langchain-integration; adds the foundational tools layer and modernizes auth conventions across the package. Upcoming PRs will add search & crawling tools, social media tools, LangChain-native components, and docs to feat/modernize-langchain-integration before merging it all to main.

New code: ~880 lines - Tests: ~1000 lines

Note on scope: While building the new tools layer, I noticed some pre-existing issues in the legacy code (plain-string token handling, the outdated get_from_dict_or_env + mode='before' validator pattern, tokens leaking into model_dump() / repr()). Because the new tools reuse the same SecretStr-based auth, keeping two parallel conventions in the package would be confusing and short-lived, so I made the fixes in this PR. And because this branch will be the starting point for the other tool implementations, I incorporated the suggested fixes here (but can remove them if they are not desired).


  • ApifyToolsClient (_client.py)
    • Internal helper wrapping ApifyClient, one method per tool operation. Accepts both SecretStr and raw str tokens and falls back to the APIFY_API_TOKEN env var. Shared _list_items_or_raise helper wraps dataset-fetch errors into RuntimeError.
  • 6 new BaseTool subclasses
    • ApifyRunActorTool, ApifyGetDatasetItemsTool, ApifyRunActorAndGetItemsTool, ApifyScrapeUrlTool, ApifyRunTaskTool, ApifyRunTaskAndGetItemsTool. Exported via the APIFY_CORE_TOOLS: list[type[BaseTool]] convenience list for selective agent binding.
  • _ApifyGenericTool base class
    • Common client handling, handle_tool_error=True, developer-controlled safety clamping (_clamp_timeout, _clamp_memory, _clamp_items) with configurable ceilings (max_timeout_secs, max_memory_mbytes, max_items) and hardcoded floor of 1 to enforce API protocol minimums.
  • Auth pattern modernized (document_loaders.py, wrappers.py, tools.py)
    • Replaced legacy get_from_dict_or_env + @model_validator(mode='before') with SecretStr field type and secret_from_env('APIFY_API_TOKEN', default=None) default factory, matching langchain-openai / langchain-anthropic conventions. Tokens are automatically redacted in logs/traces and additionally excluded from model_dump() / repr() via exclude=True, repr=False. Client construction moved to @model_validator(mode='after') / model_post_init. Added populate_by_name=True to ConfigDict on loader and wrapper. The new tools reuse this same auth pattern; fixing it here avoids shipping two parallel conventions across the package.
  • Backward compatible
    • ApifyActorsTool, ApifyDatasetLoader, ApifyWrapper retain their public API; auth changes are internal.
  • Tests
    • Unit tests for all tools & client (~1000 lines across test_tools.py, test_client.py, test_document_loaders.py), integration smoke tests under tests/integration_tests/, and error-scenario coverage (missing token, run failure, network error, clamp floor/ceiling, token excluded from model_dump, APIFY_TOKEN env-var fallback on the loader).
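
The clamping described for _ApifyGenericTool can be pictured with the minimal sketch below. It is not the PR's code: `ClampLimits`, its method names, and its default ceilings (300 s, 4096 MB, 1000 items) are assumptions for illustration. Only the ceiling names max_timeout_secs, max_memory_mbytes, max_items and the hardcoded floor of 1 come from the description above.

```python
from dataclasses import dataclass

_FLOOR = 1  # hardcoded minimum, as stated in the PR description


@dataclass(frozen=True)
class ClampLimits:
    """Hypothetical holder for developer-controlled ceilings (default values are assumed)."""

    max_timeout_secs: int = 300
    max_memory_mbytes: int = 4096
    max_items: int = 1000

    @staticmethod
    def _clamp(value: int, ceiling: int) -> int:
        # Pull the requested value into the closed range [_FLOOR, ceiling].
        return max(_FLOOR, min(value, ceiling))

    def clamp_timeout(self, secs: int) -> int:
        return self._clamp(secs, self.max_timeout_secs)

    def clamp_memory(self, mbytes: int) -> int:
        return self._clamp(mbytes, self.max_memory_mbytes)

    def clamp_items(self, count: int) -> int:
        return self._clamp(count, self.max_items)


# The developer lowers one ceiling; the other two keep the assumed defaults.
limits = ClampLimits(max_items=50)
effective = (
    limits.clamp_timeout(10_000),  # above the ceiling, lowered to it
    limits.clamp_memory(512),      # within range, unchanged
    limits.clamp_items(0),         # below the floor, raised to 1
)
```

In this sketch a value requested by the agent can never exceed what the developer configured, and it can never drop below 1.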

Review strategy

The diff is larger than a typical PR (~1.9k lines, half of which are tests). Suggested reading order to make it tractable:

  1. _client.py: the new ApifyToolsClient abstraction
  2. _ApifyGenericTool base class in tools.py, then the 6 tool classes (they are homogeneous; once one clicks, the rest read fast)
  3. Auth diff in document_loaders.py, wrappers.py, and the ApifyActorsTool.__init__ change in tools.py
  4. Tests last: mostly linear, grouped by the module they cover
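
As orientation for steps 1 and 3, the token handling described for ApifyToolsClient (accept a SecretStr or a raw str, otherwise fall back to the APIFY_API_TOKEN env var) can be sketched with the standard library alone. `MaskedToken` and `resolve_token` are hypothetical names, and `MaskedToken` is only a stand-in for pydantic's SecretStr so the example has no third-party dependency; the PR's actual helper may differ.

```python
import os
from typing import Optional, Union


class MaskedToken:
    """Stand-in for pydantic's SecretStr: hides the value in repr()."""

    def __init__(self, value: str) -> None:
        self._value = value

    def get_secret_value(self) -> str:
        return self._value

    def __repr__(self) -> str:
        return "MaskedToken('**********')"


def resolve_token(
    token: Optional[Union[str, MaskedToken]] = None,
    env_var: str = "APIFY_API_TOKEN",
) -> str:
    """Return the plain token from an explicit argument or the environment."""
    if token is None:
        # No explicit token: fall back to the environment variable.
        plain = os.environ.get(env_var, "")
    elif hasattr(token, "get_secret_value"):
        # Secret-like wrapper: unwrap only at the point of use.
        plain = token.get_secret_value()
    else:
        plain = token
    if not plain:
        raise ValueError(f"No Apify token given and {env_var} is not set.")
    return plain
```

Unwrapping the secret only inside the resolver keeps the plain value out of reprs and dumps everywhere else, which is the point of the SecretStr-based convention.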

Merge strategy

This PR targets feat/modernize-langchain-integration, not main. The plan is to accumulate all reviewed modernization work (core tools, native tools, social tools, scraping tools, docs) on that branch, and then open a single PR from feat/modernize-langchain-integration -> main once everything is complete and reviewed.

@daveomri daveomri self-assigned this Apr 21, 2026
@daveomri daveomri changed the title Feat: modernize langchain integration core tools feat: modernize langchain integration core tools Apr 23, 2026
@drobnikj drobnikj requested a review from jirispilka April 29, 2026 13:31
@jirispilka jirispilka requested a review from MQ37 May 4, 2026 08:08
@jirispilka
Contributor

I added @MQ37 as a reviewer, since he implemented the langchain-apify package.

Collaborator

@MQ37 MQ37 left a comment

Thank you for the PR - in general LGTM 👍 Please address my comments and suggestions regarding the Apify token handling, Actor run memory values and the raw request that I would change to use the Apify client method.

Comment thread langchain_apify/tools.py Outdated
Comment thread langchain_apify/tools.py Outdated
Comment thread langchain_apify/_utils.py
Comment on lines +120 to +121
url = _APIFY_API_ENDPOINT_GET_DEFAULT_BUILD.format(actor_id=actor_obj_id)
response = requests.request('GET', url, timeout=_REQUESTS_TIMEOUT_SECS)
Collaborator

nit: the Apify client now supports getting the default build, so we can use that instead of the raw request: https://docs.apify.com/api/client/python/reference/class/ActorClient#default_build

Collaborator Author

On switching to ActorClient.default_build():

  • We're pinned to apify-client = "^2.3.0", and in 2.3.0 the sync ActorClient.default_build is incorrectly declared async def, so calling it on the sync client returns a coroutine. It's fixed in 2.5.0.

Adopting the suggestion requires bumping the dependency to apify-client = "^2.5.0". Let me know whether to include the bump in this PR or keep the raw request and revisit separately.
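
To illustrate the failure mode in isolation, here is a small stdlib-only sketch. It does not use apify-client: `ClientWithBug` and `ClientFixed` are hypothetical classes, and the dict return value is a placeholder rather than what the real method returns.

```python
import asyncio
import inspect


class ClientWithBug:
    """Hypothetical sync-looking client whose method was accidentally declared async."""

    async def default_build(self) -> dict:
        return {"id": "build-1"}


class ClientFixed:
    """Same method declared as a plain function, as a sync client should."""

    def default_build(self) -> dict:
        return {"id": "build-1"}


# Calling the async method without awaiting it runs nothing; it only
# creates a coroutine object, which is what the sync caller receives.
buggy_result = ClientWithBug().default_build()
got_coroutine = inspect.iscoroutine(buggy_result)

# Driving the coroutine on an event loop is the only way to get the value.
recovered = asyncio.run(buggy_result)

# The fixed client returns the value directly.
fixed_result = ClientFixed().default_build()
```

Synchronous code that receives the coroutine object would fail later, typically when it tries to use the result as if it were the build data.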

Comment thread langchain_apify/tools.py Outdated

apify_client: ApifyClient
"""An instance of the ApifyClient class from the apify-client Python package."""
apify_api_token: SecretStr | None = Field(
Collaborator

nit: I would rename the var to apify_token

Collaborator Author

I'd lean toward not renaming it, as that would be a breaking change to the public API, which we're trying to avoid here. Existing user code like ApifyDatasetLoader(..., apify_api_token=...) would stop working.

A few options if we want to push on this:

1) Keep apify_api_token (my preference)

  • no breakage, and the field still resolves both APIFY_TOKEN and APIFY_API_TOKEN env vars after the helper change, so users get the standard naming where it matters.

2) Rename to apify_token + keep apify_api_token as a deprecated alias

  • cleaner public surface long-term, no immediate breakage, but adds a small maintenance cost.

3) Hard rename

  • cleanest but breaks existing users.
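
For illustration, option 2 could look roughly like the stdlib-only sketch below. `LoaderSketch` is a hypothetical stand-in and not the package's ApifyDatasetLoader. In the real pydantic models the alias would more likely be wired through field aliases, so treat this as a picture of the behavior, not of the implementation.

```python
import warnings
from typing import Optional


class LoaderSketch:
    """Hypothetical class: accepts the new name and the deprecated old one."""

    def __init__(
        self,
        *,
        apify_token: Optional[str] = None,
        apify_api_token: Optional[str] = None,  # deprecated alias
    ) -> None:
        if apify_api_token is not None:
            # Old keyword still works but tells the caller to migrate.
            warnings.warn(
                "apify_api_token is deprecated; use apify_token instead.",
                DeprecationWarning,
                stacklevel=2,
            )
            if apify_token is None:
                apify_token = apify_api_token
        self.apify_token = apify_token


# Existing user code keeps working and records a deprecation warning.
with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    legacy = LoaderSketch(apify_api_token="token-from-old-kwarg")

modern = LoaderSketch(apify_token="token-from-new-kwarg")
```

When both keywords are passed, this sketch lets the new name win; that tie-break is an assumption and would need to be decided explicitly.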

Which would you prefer?

Comment thread langchain_apify/document_loaders.py Outdated
Member

@drobnikj drobnikj left a comment

LGTM
