[BUG] Xiaohongshu crawl fails #855
Description
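🔍 Issue checklist
🐛 Problem description
Both QR-code login and cookie login open a browser window. After scanning the QR code the session is logged in, but nothing happens after that.
I tried different accounts and different IPs, and none of them could crawl.
📝 Steps to reproduce
- Log in with a QR code or a cookie
- A browser window opens
- The browser window closes automatically, then the error is raised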
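💻 Environment
- OS: macOS
- Python version: Python 3.9.6
- Using an IP proxy: tried both proxy and direct connection
- Using VPN software: tried both proxy and direct connection
- Target platform (Douyin/Xiaohongshu/Weibo, etc.): Xiaohongshu
📋 Error log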
[11:35:24] [ERR] raise DataFetchError(err_msg)
[11:35:24] [ERR] media_platform.xhs.exception.DataFetchError: {"code":-1,"success":false}
[11:35:24] [DATA] The above exception was the direct cause of the following exception:
[11:35:24] [DATA] Traceback (most recent call last):
[11:35:24] [DATA] File "/Users/gm/Documents/Program/MediaCrawler/main.py", line 157, in <module>
[11:35:24] [DATA] run(main, async_cleanup, cleanup_timeout_seconds=15.0, on_first_interrupt=_force_stop)
[11:35:24] [DATA] File "/Users/gm/Documents/Program/MediaCrawler/tools/app_runner.py", line 109, in run
[11:35:24] [DATA] asyncio.run(_runner())
[11:35:24] [DATA] File "/Users/gm/.local/share/uv/python/cpython-3.11.14-macos-aarch64-none/lib/python3.11/asyncio/runners.py", line 190, in run
[11:35:24] [DATA] return runner.run(main)
[11:35:24] [DATA] ^^^^^^^^^^^^^^^^
[11:35:24] [DATA] File "/Users/gm/.local/share/uv/python/cpython-3.11.14-macos-aarch64-none/lib/python3.11/asyncio/runners.py", line 118, in run
[11:35:24] [DATA] return self._loop.run_until_complete(task)
[11:35:24] [DATA] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
[11:35:24] [DATA] File "/Users/gm/.local/share/uv/python/cpython-3.11.14-macos-aarch64-none/lib/python3.11/asyncio/base_events.py", line 654, in run_until_complete
[11:35:24] [DATA] return future.result()
[11:35:24] [DATA] ^^^^^^^^^^^^^^^
[11:35:24] [DATA] File "/Users/gm/Documents/Program/MediaCrawler/tools/app_runner.py", line 96, in _runner
[11:35:24] [DATA] await app_main()
[11:35:24] [DATA] File "/Users/gm/Documents/Program/MediaCrawler/main.py", line 110, in main
[11:35:24] [DATA] await crawler.start()
[11:35:24] [DATA] File "/Users/gm/Documents/Program/MediaCrawler/media_platform/xhs/core.py", line 119, in start
[11:35:24] [DATA] await self.get_creators_and_notes()
[11:35:24] [DATA] File "/Users/gm/Documents/Program/MediaCrawler/media_platform/xhs/core.py", line 209, in get_creators_and_notes
[11:35:24] [DATA] all_notes_list = await self.xhs_client.get_all_notes_by_creator(
[11:35:24] [DATA] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
[11:35:24] [DATA] File "/Users/gm/Documents/Program/MediaCrawler/media_platform/xhs/client.py", line 600, in get_all_notes_by_creator
[11:35:24] [DATA] notes_res = await self.get_notes_by_creator(
[11:35:24] [DATA] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
[11:35:24] [DATA] File "/Users/gm/Documents/Program/MediaCrawler/media_platform/xhs/client.py", line 574, in get_notes_by_creator
[11:35:24] [DATA] return await self.get(uri, params)
[11:35:24] [DATA] ^^^^^^^^^^^^^^^^^^^^^^^^^^^
[11:35:24] [DATA] File "/Users/gm/Documents/Program/MediaCrawler/media_platform/xhs/client.py", line 168, in get
[11:35:24] [DATA] return await self.request(
[11:35:24] [DATA] ^^^^^^^^^^^^^^^^^^^
[11:35:24] [DATA] File "/Users/gm/Documents/Program/MediaCrawler/.venv/lib/python3.11/site-packages/tenacity/_asyncio.py", line 88, in async_wrapped
[11:35:24] [DATA] return await fn(*args, **kwargs)
[11:35:24] [DATA] ^^^^^^^^^^^^^^^^^^^^^^^^^
[11:35:24] [DATA] File "/Users/gm/Documents/Program/MediaCrawler/.venv/lib/python3.11/site-packages/tenacity/_asyncio.py", line 47, in __call__
[11:35:24] [DATA] do = self.iter(retry_state=retry_state)
[11:35:24] [DATA] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
[11:35:24] [DATA] File "/Users/gm/Documents/Program/MediaCrawler/.venv/lib/python3.11/site-packages/tenacity/__init__.py", line 326, in iter
[11:35:24] [DATA] raise retry_exc from fut.exception()
[11:35:24] [ERR] tenacity.RetryError: RetryError[<Future at 0x117d23d10 state=finished raised DataFetchError>]
[11:35:24] [WARN] Crawler exited with code: 1
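For anyone reading the log above: the final RetryError only wraps the real failure, which is the DataFetchError carrying {"code":-1,"success":false}. Below is a minimal stdlib-only sketch of that chaining, assuming the retry layer gives up after a few attempts and re-raises with `from`. The classes and the `fetch_once`, `fetch_with_retries`, `root_cause` and `describe_failure` helpers are hypothetical stand-ins written for illustration; they are not MediaCrawler's or tenacity's actual code.

```python
import json


class DataFetchError(Exception):
    """Hypothetical stand-in for media_platform.xhs.exception.DataFetchError."""


class RetryError(Exception):
    """Hypothetical stand-in for tenacity.RetryError (not the real class)."""


def fetch_once():
    # Assumption: the API keeps answering with the payload seen in the log.
    payload = '{"code":-1,"success":false}'
    if not json.loads(payload)["success"]:
        raise DataFetchError(payload)
    return payload


def fetch_with_retries(attempts=3):
    last_exc = None
    for _ in range(attempts):
        try:
            return fetch_once()
        except DataFetchError as exc:
            last_exc = exc
    # "raise ... from ..." is what makes Python print
    # "The above exception was the direct cause of the following exception".
    raise RetryError(f"gave up after {attempts} attempts") from last_exc


def root_cause(exc):
    # Follow __cause__ links until the original exception is reached.
    while exc.__cause__ is not None:
        exc = exc.__cause__
    return exc


def describe_failure():
    try:
        fetch_with_retries()
    except RetryError as exc:
        cause = root_cause(exc)
        body = json.loads(str(cause))
        return type(cause).__name__, body["code"], body["success"]
    return None


if __name__ == "__main__":
    print(describe_failure())
```

Walking `__cause__` like this shows that the thing to investigate is the API response itself (code -1, success false), not the retry wrapper that reports it.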

