Draft
66 commits
8cad430
feat: implement apifyclient wrapper
daveomri Apr 20, 2026
2404b9c
feat: removed redundant const file
daveomri Apr 20, 2026
b1a89a4
feat: add few more input schemas, helpers and tool classes
daveomri Apr 20, 2026
0aa9175
feat: export new tools from __init__
daveomri Apr 20, 2026
4e46d36
feat: add unit tests
daveomri Apr 20, 2026
fc6ef12
feat: implement tests and introduce tools list
daveomri Apr 21, 2026
cc5be9e
fix: lint fix
daveomri Apr 21, 2026
c2b9cb6
feat: enhance error handling and documentation for apify tools
daveomri Apr 21, 2026
3edf126
fix: iso format fix
daveomri Apr 21, 2026
8c36edc
feat: add apify run task and apify run task and get items tools with …
daveomri Apr 21, 2026
026175a
feat: introduce _ApifyGenericTool base class for Apify tools to strea…
daveomri Apr 21, 2026
110c971
feat: add _actor_tools.py file to define upcomming search and social …
daveomri Apr 21, 2026
a08f63e
fix: add try/except to match others
daveomri Apr 21, 2026
d028531
fix: update timeout constants and improve input schema descripiton in…
daveomri Apr 21, 2026
429a3ed
fix: enhance error handling for missing dataset id in run_actor and r…
daveomri Apr 21, 2026
b914e47
fix: update apifygetdatasetitemstool to return a json object with ite…
daveomri Apr 21, 2026
0f71181
feat: add integration smoke tests for generic Apify tools to validate…
daveomri Apr 21, 2026
50c52f2
feat: implement clamping for timeout, memory, and item limits in apif…
daveomri Apr 21, 2026
ba179a6
feat: clean up _actor_tools.py and tools.py for improved readibility …
daveomri Apr 22, 2026
da900ce
feat: add three new tools to _client.py
daveomri Apr 22, 2026
ff6ffeb
feat: implement apifygooglesearchtool and apifywebcrawlertool
daveomri Apr 22, 2026
6e8888c
feat: implement a apify search retrievel
daveomri Apr 22, 2026
b124ce1
feat: add apify crawl loader to document_loaders.py
daveomri Apr 22, 2026
029b9e1
feat: update __init__
daveomri Apr 22, 2026
c7ee287
feat: add unit tests
daveomri Apr 22, 2026
ec60765
feat: add actor tools unit tests
daveomri Apr 22, 2026
c077186
feat: add retrievers unit tests
daveomri Apr 22, 2026
0b4ecbb
feat: simplify apify crawl loader init and enhance unit tests
daveomri Apr 22, 2026
005294b
ref: align private scope conventions with langchain partner package s…
daveomri Apr 22, 2026
2f74c29
ref: migrate auth to SecretStr + secret_from_env pattern
daveomri Apr 23, 2026
6258b2b
fix: backward-compat fix
daveomri Apr 23, 2026
2905b67
fix: update stale doc string
daveomri Apr 23, 2026
3238c02
chore: removed redundant file
daveomri Apr 23, 2026
92df406
fix: extracted repeated code, fixed secretstr compatibility to apifyt…
daveomri Apr 23, 2026
3a0f666
fix: set min value to timeout, memory and items, add exlude and repr …
daveomri Apr 23, 2026
8614cfd
feat: added repr and exclude to apify api token
daveomri Apr 23, 2026
2bf130a
feat: add type checking to apify core tools list
daveomri Apr 23, 2026
98293d4
feat: add tests for clamped values and apify api token
daveomri Apr 23, 2026
863ed8d
fix: lint fix
daveomri Apr 23, 2026
70527e0
ref: update apify_api_token type to support SecretStr in document loa…
daveomri Apr 24, 2026
797b7f9
Merge branch 'feat/modernize-langchain-integration-core-tools' into f…
daveomri Apr 24, 2026
f005bc5
fix: turn off logger for ApifySearchRetrieval
daveomri Apr 24, 2026
dd08098
fix: fix lint errors
daveomri Apr 24, 2026
2804a5c
fix: tests fix
daveomri Apr 24, 2026
ea8b16e
chore: rename tools to match the task description
daveomri Apr 28, 2026
cd1eea1
fix: narrow except blocks in _client.py to SDK/transport errors
daveomri Apr 28, 2026
50c3583
fix: clamp memory_mbytes to Apify platform minimum (128 MB)
daveomri Apr 28, 2026
450728c
fix: narrow empty-dataset message in ApifyGetDatasetItemsTool
daveomri Apr 28, 2026
1360e92
ref: simplify ApifyToolsClient.__init__ to require explicit token
daveomri Apr 28, 2026
09b6c6e
docs: add module-level docstring to tools.py
daveomri Apr 28, 2026
a5bd7cc
ref: rename model_post_init parameter to
daveomri Apr 28, 2026
e0f15e8
Merge branch 'feat/modernize-langchain-integration-core-tools' into f…
daveomri Apr 28, 2026
23242c1
revert: restore env-fallback
daveomri Apr 28, 2026
8f9afe6
Merge branch 'feat/modernize-langchain-integration-core-tools' into f…
daveomri Apr 28, 2026
7ea3e8c
chore: drop placeholder section in _actor_tools.py
daveomri Apr 28, 2026
700e5ab
chore: align APIFY_ACTOR_TOOLS type hint with APIFY_CORE_TOOLS
daveomri Apr 28, 2026
c0dd11e
feat: constrain crawler_type to a Literal of valid Apify values
daveomri Apr 28, 2026
0189943
feat: clamp max_crawl_depth in ApifyWebCrawlerTool
daveomri Apr 28, 2026
6d2422d
feat: expose timeout_secs in ApifyGoogleSearchInput
daveomri Apr 28, 2026
2dfecd7
ref: accept SecretStr token in ApifyCrawlLoader
daveomri Apr 28, 2026
9c81785
docs: clarify ApifyCrawlLoader.lazy_load is not truly lazy
daveomri Apr 28, 2026
49dd4f0
ref: rewrite ApifySearchRetriever to use ApifyToolsClient
daveomri Apr 28, 2026
a060c14
fix: normalise locale codes to lowercase to match Apify Actor schema
daveomri Apr 28, 2026
a908467
fix: extract source URL from metadata.url for apify/rag-web-browser
daveomri Apr 28, 2026
250e1ac
fix: rename actor search group
daveomri May 5, 2026
f4cf20e
fix: test fix
daveomri May 5, 2026
58 changes: 55 additions & 3 deletions langchain_apify/__init__.py
@@ -1,19 +1,71 @@
from __future__ import annotations

from importlib import metadata
from typing import TYPE_CHECKING

from langchain_apify.document_loaders import ApifyDatasetLoader
from langchain_apify.tools import ApifyActorsTool
from langchain_apify._actor_tools import ApifyGoogleSearchTool, ApifyWebCrawlerTool
from langchain_apify.document_loaders import ApifyCrawlLoader, ApifyDatasetLoader
from langchain_apify.retrievers import ApifySearchRetriever
from langchain_apify.tools import (
    ApifyActorsTool,
    ApifyGetDatasetItemsTool,
    ApifyRunActorAndGetDatasetTool,
    ApifyRunActorTool,
    ApifyRunTaskAndGetDatasetTool,
    ApifyRunTaskTool,
    ApifyScrapeUrlTool,
)
from langchain_apify.wrappers import ApifyWrapper

if TYPE_CHECKING:
    from langchain_core.tools import BaseTool

try:
    __version__ = metadata.version(__package__)
except metadata.PackageNotFoundError:
    # Case where package metadata is not available.
    __version__ = ''
del metadata  # optional, avoids polluting the results of dir(__package__)

# Convenience lists of tool classes for selective agent binding.
# Binding every tool at once can crowd the LLM context window,
# so pick only the group(s) relevant to your use case.

APIFY_CORE_TOOLS: list[type[BaseTool]] = [
    ApifyRunActorTool,
    ApifyGetDatasetItemsTool,
    ApifyRunActorAndGetDatasetTool,
    ApifyScrapeUrlTool,
    ApifyRunTaskTool,
    ApifyRunTaskAndGetDatasetTool,
]

APIFY_SEARCH_TOOLS: list[type[BaseTool]] = [
    ApifyGoogleSearchTool,
    ApifyWebCrawlerTool,
]

__all__ = [
    # Existing components (backward-compatible)
    'ApifyActorsTool',
    'ApifyDatasetLoader',
    'ApifyWrapper',
    # Core generic tools
    'ApifyGetDatasetItemsTool',
    'ApifyRunActorAndGetDatasetTool',
    'ApifyRunActorTool',
    'ApifyRunTaskAndGetDatasetTool',
    'ApifyRunTaskTool',
    'ApifyScrapeUrlTool',
    # Actor-specific tools
    'ApifyGoogleSearchTool',
    'ApifyWebCrawlerTool',
    # Retriever
    'ApifySearchRetriever',
    # Loaders
    'ApifyCrawlLoader',
    # Tool group lists
    'APIFY_SEARCH_TOOLS',
    'APIFY_CORE_TOOLS',
    # Meta
    '__version__',
]
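
Because `APIFY_CORE_TOOLS` and `APIFY_SEARCH_TOOLS` hold tool classes rather than instances, a caller has to instantiate whichever group it wants before binding it to an agent. The following is a minimal sketch of that selection step, not code from this PR: `FakeTool` and its subclasses, the `SEARCH_TOOLS` and `CORE_TOOLS` lists, and the `build_tools` helper are all hypothetical stand-ins so the example runs without the package or an API token. With the real package, the lists would presumably be imported from `langchain_apify` instead.

```python
from __future__ import annotations


class FakeTool:
    """Hypothetical stand-in for a LangChain BaseTool subclass."""

    name: str = "fake"

    def __init__(self, apify_api_token: str | None = None) -> None:
        self.apify_api_token = apify_api_token


class FakeSearchTool(FakeTool):
    name = "fake_search"


class FakeCrawlerTool(FakeTool):
    name = "fake_crawler"


class FakeRunActorTool(FakeTool):
    name = "fake_run_actor"


# Stand-ins for APIFY_SEARCH_TOOLS and APIFY_CORE_TOOLS: lists of classes.
SEARCH_TOOLS: list[type[FakeTool]] = [FakeSearchTool, FakeCrawlerTool]
CORE_TOOLS: list[type[FakeTool]] = [FakeRunActorTool]


def build_tools(
    groups: list[list[type[FakeTool]]],
    token: str | None = None,
) -> list[FakeTool]:
    """Instantiate every class in the selected groups, skipping duplicates."""
    seen: set[type[FakeTool]] = set()
    tools: list[FakeTool] = []
    for group in groups:
        for tool_cls in group:
            if tool_cls in seen:
                continue
            seen.add(tool_cls)
            tools.append(tool_cls(apify_api_token=token))
    return tools


# Bind only the search group instead of every available tool.
search_only = build_tools([SEARCH_TOOLS])
print([tool.name for tool in search_only])
```

Keeping classes in the lists defers construction, and therefore token lookup, until the caller has decided which group it actually needs.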
158 changes: 158 additions & 0 deletions langchain_apify/_actor_tools.py
@@ -0,0 +1,158 @@
"""Actor-specific tool subclasses.

Tools in this module wrap a single Apify Actor behind a simplified,
LLM-friendly interface. They inherit from
:class:`~langchain_apify.tools._ApifyGenericTool`.
"""

from __future__ import annotations

import json
from typing import TYPE_CHECKING

from langchain_core.tools import ToolException
from pydantic import BaseModel # noqa: TCH002

from langchain_apify.tools import (
    ApifyGoogleSearchInput,
    ApifyWebCrawlerInput,
    CrawlerType,
    _ApifyGenericTool,
)

if TYPE_CHECKING:
    from langchain_core.callbacks import CallbackManagerForToolRun

# ---------------------------------------------------------------------------
# Search & Crawling tools
# ---------------------------------------------------------------------------


class ApifyGoogleSearchTool(_ApifyGenericTool):  # type: ignore[override]
    """Search Google and return structured results via Apify.

    Wraps the ``apify/google-search-scraper`` Actor behind a simplified,
    LLM-friendly interface. Returns a JSON string containing an array of
    result objects, each with ``title``, ``url``, and ``description`` keys.

    Args:
        apify_api_token: Apify API token. Falls back to the ``APIFY_API_TOKEN``
            environment variable when *None*.

    Returns:
        JSON string — an array of ``{"title", "url", "description"}`` objects.

    Example:
        .. code-block:: python

            import os
            os.environ["APIFY_API_TOKEN"] = "your-apify-api-token"

            from langchain_apify import ApifyGoogleSearchTool

            tool = ApifyGoogleSearchTool()
            results = tool.invoke({"query": "LangChain framework"})
    """

    name: str = 'apify_google_search'
    description: str = (
        'Search Google using Apify and return structured results as a JSON array.'
        ' Each result has keys: title, url, description.'
        ' Required: query (str) — the search query.'
        ' Optional: max_results (int, default 10),'
        ' country_code (str|null), language_code (str|null),'
        ' timeout_secs (int, default 300).'
    )
    args_schema: type[BaseModel] = ApifyGoogleSearchInput

    def _run(
        self,
        query: str,
        max_results: int = 10,
        country_code: str | None = None,
        language_code: str | None = None,
        timeout_secs: int = 300,
        _run_manager: CallbackManagerForToolRun | None = None,
    ) -> str:
        try:
            results = self._client.google_search(
                query,
                max_results=self._clamp_items(max_results),
                country_code=country_code,
                language_code=language_code,
                timeout_secs=self._clamp_timeout(timeout_secs),
            )
        except RuntimeError as exc:
            raise ToolException(str(exc)) from exc
        return json.dumps(results)


class ApifyWebCrawlerTool(_ApifyGenericTool):  # type: ignore[override]
    """Crawl a website and return page content as JSON via Apify.

    Wraps the ``apify/website-content-crawler`` Actor. Returns a JSON string
    containing an array of page objects, each with ``url``, ``title``, and
    ``content`` (markdown) keys.

    Args:
        apify_api_token: Apify API token. Falls back to the ``APIFY_API_TOKEN``
            environment variable when *None*.

    Returns:
        JSON string — an array of ``{"url", "title", "content"}`` objects.

    Example:
        .. code-block:: python

            import os
            os.environ["APIFY_API_TOKEN"] = "your-apify-api-token"

            from langchain_apify import ApifyWebCrawlerTool

            tool = ApifyWebCrawlerTool()
            pages = tool.invoke({
                "url": "https://docs.apify.com",
                "max_crawl_pages": 5,
            })
    """

    name: str = 'apify_web_crawler'
    description: str = (
        'Crawl a website using Apify and return page content as a JSON array.'
        ' Each page object has keys: url, title, content (markdown).'
        ' Required: url (str) — seed URL to crawl.'
        ' Optional: max_crawl_pages (int, default 10),'
        ' max_crawl_depth (int, default 1),'
        ' crawler_type (str, default "cheerio"),'
        ' timeout_secs (int, default 300).'
    )
    args_schema: type[BaseModel] = ApifyWebCrawlerInput

    def _run(
        self,
        url: str,
        max_crawl_pages: int = 10,
        max_crawl_depth: int = 1,
        crawler_type: CrawlerType = 'cheerio',
        timeout_secs: int = 300,
        _run_manager: CallbackManagerForToolRun | None = None,
    ) -> str:
        try:
            items = self._client.crawl_website(
                url,
                max_crawl_pages=self._clamp_items(max_crawl_pages),
                max_crawl_depth=self._clamp_depth(max_crawl_depth),
                crawler_type=crawler_type,
                timeout_secs=self._clamp_timeout(timeout_secs),
            )
        except RuntimeError as exc:
            raise ToolException(str(exc)) from exc
        pages = [
            {
                'url': item.get('url', ''),
                'title': item.get('metadata', {}).get('title', ''),
                'content': item.get('markdown') or item.get('text', ''),
            }
            for item in items
        ]
        return json.dumps(pages)
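
The last step of `ApifyWebCrawlerTool._run` flattens raw crawler items into `{"url", "title", "content"}` objects, preferring `markdown` over `text`. Here is a standalone sketch of that mapping, useful for checking the output shape without calling Apify. The `normalize_pages` name and the sample items are hypothetical. The sketch also deviates from the diff in one labeled way: it treats a `null` value for `metadata` or `text` as empty, a case the code in the diff does not guard against.

```python
from __future__ import annotations

import json
from typing import Any


def normalize_pages(items: list[dict[str, Any]]) -> list[dict[str, str]]:
    """Map raw crawler items to url/title/content dicts."""
    pages = []
    for item in items:
        # Assumption beyond the diff: tolerate "metadata": None.
        metadata = item.get("metadata") or {}
        pages.append(
            {
                "url": item.get("url", ""),
                "title": metadata.get("title", ""),
                # Prefer markdown, fall back to plain text, then to "".
                "content": item.get("markdown") or item.get("text") or "",
            }
        )
    return pages


# Hypothetical items, shaped like the fields the tool reads.
sample_items = [
    {"url": "https://example.com", "metadata": {"title": "Home"}, "markdown": "# Hi"},
    {"url": "https://example.com/a", "metadata": None, "text": "plain"},
    {},
]

print(json.dumps(normalize_pages(sample_items), indent=2))
```

Missing fields become empty strings rather than raising, so every element of the JSON array keeps the same three keys.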