Using n8n AI Workflows to Power an Overseas Website and Earn US Dollars, Part 2: Website Screenshots and SEO-Friendly Descriptions

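Summary

This post walks through using an n8n automation workflow, together with Playwright screenshots and AI-generated SEO-friendly content, to automate content processing for a site aimed at overseas users, improving Google indexing and user experience.

This picks up where the previous post left off: Using n8n AI Workflows to Power an Overseas Website and Earn US Dollars, Part 1: Connecting to a Supabase Database

Early this year I built a directory site for AI coding tools, planning to run Google ads on it and earn US dollars once traffic picked up:

https://www.aicoding.help/cn

The idea is to recommend productivity tools for each stage of an AI-assisted development process: inspiration and ideas, prototyping and design, coding, database and storage, deployment, extensions, data analytics, content management, and collaboration and operations.

This is the backend processing logic for the whole site: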

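Requirements

Today we tackle the second stage:

Process the pending websites in the submit table one by one
Open each website and take a screenshot
Have AI generate SEO-friendly detail content and store it in the database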
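
The three steps above amount to a loop over the pending rows. Here is a minimal sketch of that loop, assuming each row is a dict with `url` and `status` keys and that the screenshot and writing steps are passed in as callables. All of these names are hypothetical, and in the actual setup the loop is made of n8n nodes rather than Python.

```python
from typing import Callable, Iterable


def process_pending(
    rows: Iterable[dict],
    take_screenshot: Callable[[str], str],
    write_intro: Callable[[str], str],
) -> list[dict]:
    """Turn pending submissions into records ready to store.

    The row keys ("url", "status") and the output keys are illustrative,
    not the real schema of the submit table.
    """
    results = []
    for row in rows:
        if row.get("status") != "pending":
            continue  # already handled in an earlier run
        url = row["url"]
        try:
            record = {
                "url": url,
                "thumbnail_url": take_screenshot(url),
                "detail": write_intro(url),
                "status": "done",
            }
        except Exception as exc:  # one bad site should not stop the batch
            record = {"url": url, "status": "failed", "error": str(exc)}
        results.append(record)
    return results
```

Catching the error per site, rather than around the whole loop, means a single blocked or slow website only marks its own row as failed.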

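The result looks like this:

Each website's screenshot becomes the card cover, the site's own title becomes the card title, and a short description is generated alongside.

Clicking a card opens a detail page. This is also the content I want Google to index, so it needs to be SEO-friendly.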

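Website screenshots

The core of this requirement is the website screenshot, and there are three ways to get one:

The first is to run Python deployed locally.

For example, the backend that ships with tap4ai uses pyppeteer, which drives Chrome to visit the URL and then take a screenshot. The core code: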

# Excerpt from an async method of the crawler class. It needs `import os`,
# `import random` and `from pyppeteer import launch`, and it expects `logger`,
# `global_agent_headers` and `url` to come from the surrounding code.
if self.browser is None:
    # Read a custom Chrome executable path from the environment
    chrome_executable_path = os.getenv('CHROME_EXECUTABLE_PATH')
    launch_options = {
        'headless': True,
        'ignoreDefaultArgs': ["--enable-automation"],
        'ignoreHTTPSErrors': True,
        'args': ['--no-sandbox', '--disable-dev-shm-usage', '--disable-gpu',
                 '--disable-software-rasterizer', '--disable-setuid-sandbox'],
        'handleSIGINT': False,
        'handleSIGTERM': False,
        'handleSIGHUP': False
    }

    # Use the custom Chrome path if one is set
    if chrome_executable_path:
        logger.info(f"Using custom Chrome path: {chrome_executable_path}")
        launch_options['executablePath'] = chrome_executable_path

    self.browser = await launch(**launch_options)

page = await self.browser.newPage()
# Set a randomly chosen user agent
await page.setUserAgent(random.choice(global_agent_headers))

# Set the viewport size, then visit the URL
width = 1920  # default width
height = 1080  # default height
await page.setViewport({'width': width, 'height': height})
try:
    await page.goto(url, {'timeout': 60000, 'waitUntil': ['load', 'networkidle2']})
except Exception as e:
    # A load timeout is tolerated so the later steps can still run
    logger.info(f'Page load timed out; continuing with the remaining steps: {e}')

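The second is to call someone else's ready-made API.

Of the ones I tested, urlscan currently works well and is free.

Its documentation is at https://urlscan.io/docs/api/

It can screenshot Reddit without trouble, for example: https://urlscan.io/liveshot/?width=600&height=400&url=https://www.reddit.com/

The pattern is https://urlscan.io/liveshot/?width=[width]&height=[height]&url=[URL]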
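
The link pattern above can be assembled in code. This is a small sketch using only the standard library; the function name and the defaults are my own, and it percent-encodes the target URL, which differs slightly from the hand-written example link.

```python
from urllib.parse import urlencode

LIVESHOT_ENDPOINT = "https://urlscan.io/liveshot/"


def liveshot_url(target: str, width: int = 600, height: int = 400) -> str:
    """Build a urlscan liveshot link for `target`.

    urlencode percent-encodes the target so that any query string it
    carries is not mistaken for liveshot's own parameters.
    """
    query = urlencode({"width": width, "height": height, "url": target})
    return f"{LIVESHOT_ENDPOINT}?{query}"


print(liveshot_url("https://www.reddit.com/"))
# https://urlscan.io/liveshot/?width=600&height=400&url=https%3A%2F%2Fwww.reddit.com%2F
```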

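However, on the several sites I tested, the page language always came out as German. As the screenshot above shows, parts of the page are also covered up, which is probably down to the device settings.

The other options are either blocked or paid.

For example, requests to WordPress's preview service get restricted and blocked, probably because the server is in mainland China:

https://s0.wp.com/mshots/v1/https://www.reddit.com/?w=600&h=400

Others, such as https://gugudata.com/, are paid and of unknown quality, so I passed.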

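The third is to deploy a headless browser in Docker and expose it on a port for n8n to call.

For details see: https://community.n8n.io/t/automate-screenshots-pdfs-and-more-integrating-n8n-with-self-hosted-browserless-playwright-changedetection-io/53351

Honestly, this route is quite a hassle. I gave it a try, couldn't get it working, and set it aside for now.

Overall the first method gives the best results, but workflow platforms share a common weakness: they can't run complex Python scripts.

So the solution is to take the first method, that is, the Python + Playwright screenshot feature tap4ai already has, wrap it as an API, and have n8n call it.

This is also how I handle many complex features: develop the module on its own in Cursor, package it as a FastAPI service, deploy it to the server through the BaoTa panel (宝塔面板), then open the port so n8n can call it.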

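The Cursor prompt, for reference:

Please create a new script and move the web page screenshot capability from @website_crawler.py into it as a standalone script. If it calls into other scripts, bring that code over as well.

This is the core of the web page screenshot logic; work out what else is needed:
...

Use the `sequentialthinking` MCP tool to think it through step by step, and make sure no other functionality is affected.

Next, wrap the code as a FastAPI endpoint:

Continue by wrapping @website_screenshot.py as a FastAPI endpoint: the user passes in a URL and gets back the screenshot URLs, including the cloud storage URL and the thumbnail URL.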
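
For orientation, here is a rough sketch of what such an endpoint might look like. It is not the code Cursor produced: the `/screenshot` route, the `screenshot_url` and `thumbnail_url` field names, and the `capture` callable standing in for the extracted screenshot code are all assumptions.

```python
from urllib.parse import urlparse


def validate_url(url: str) -> str:
    """Accept only absolute http(s) URLs; raise ValueError otherwise."""
    cleaned = url.strip()
    parsed = urlparse(cleaned)
    if parsed.scheme not in ("http", "https") or not parsed.netloc:
        raise ValueError(f"not an absolute http(s) URL: {url!r}")
    return cleaned


def create_app(capture):
    """Build the FastAPI app around a capture(url) -> dict callable.

    `capture` is a hypothetical stand-in for the screenshot script; it is
    assumed to return {"screenshot_url": ..., "thumbnail_url": ...}.
    """
    # Imported here so validate_url stays usable without FastAPI installed.
    from fastapi import FastAPI, HTTPException

    app = FastAPI(title="Website screenshot API")

    @app.post("/screenshot")
    def screenshot(url: str):
        # `url` arrives as a query parameter: POST /screenshot?url=...
        try:
            cleaned = validate_url(url)
        except ValueError as exc:
            raise HTTPException(status_code=400, detail=str(exc)) from exc
        return {"url": cleaned, **capture(cleaned)}

    return app
```

With something like `app = create_app(my_capture)` in a module, the service could then be started with an ASGI server such as uvicorn on whichever port is opened for n8n.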

I also had Cursor write an API test script, and only deployed after the tests passed:
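
The test script itself is not reproduced here. As an illustration only, a smoke test along these lines could do the job with the standard library; the `/screenshot` route and the two response field names are assumptions and need to match whatever the real endpoint exposes.

```python
import json
from urllib.parse import urlencode
from urllib.request import Request, urlopen

EXPECTED_KEYS = ("screenshot_url", "thumbnail_url")  # assumed response fields


def build_request(base_url: str, target: str) -> Request:
    """Prepare a POST to the assumed /screenshot route."""
    query = urlencode({"url": target})
    endpoint = f"{base_url.rstrip('/')}/screenshot?{query}"
    return Request(endpoint, data=b"", method="POST")


def find_problems(payload: dict) -> list[str]:
    """List what is wrong with a response body; an empty list means it passed."""
    problems = []
    for key in EXPECTED_KEYS:
        value = payload.get(key)
        if not isinstance(value, str) or not value.startswith(("http://", "https://")):
            problems.append(f"{key} is missing or not a URL: {value!r}")
    return problems


def smoke_test(base_url: str, target: str, timeout: float = 90.0) -> list[str]:
    """Call the running service once and report any problems found."""
    with urlopen(build_request(base_url, target), timeout=timeout) as resp:
        return find_problems(json.load(resp))
```

The generous timeout reflects that the page load alone is allowed up to 60 seconds in the screenshot code.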

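Upload to the BaoTa panel and deploy as an API

As an aside, I strongly recommend the BaoTa panel for server deployment. Doing everything through the panel saves a lot of trouble, and deploying n8n, Dify and the like is mostly smooth.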

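Create a new folder in the BaoTa panel

Upload the key files into it, including the code scripts, the .env configuration, requirements.txt, and so on

Create a new Python project

Fill in the fields at the location shown in the screenshot below.

Check the logs to confirm the service has started

Online debugging

FastAPI comes with built-in interactive API docs, which are also a convenient place to debug:

For example, I deployed on port 3333, so the docs are at http://ip:3333/docs

Find the endpoint, click Try it out, and edit the request parameters below it

Click Execute; the Curl section below it is the auto-generated request

In the Response body below we can see a 200 status and normal data

Only once that works do we go to n8n and create an HTTP Request node, confirming that it returns the site's screenshot information:

With that, the most troublesome step is solved.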

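n8n workflow

Next we can start building the workflow.

For convenience, we keep using the previous workflow: right after a site is submitted, the submitted information is used to capture the screenshot and write the description.

As shown, the upper part is last time's workflow, which parses URLs out of the user's input and submits them to the database to await crawling.

The difference is that I added a node that extracts information for multiple websites, so a user can submit several sites at once, or even paste in an article that introduces many sites and have all of them recorded in one go.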
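
How that extraction node is built is not shown here. As a rough plain-Python illustration of the idea, a regex pass is one cheap way to pull every distinct URL out of pasted text; the pattern below is deliberately simple and is my own assumption, not the node's logic.

```python
import re

# Stop at whitespace, quotes, brackets, and the full-width comma/period
# that often follow a URL in Chinese text.
URL_PATTERN = re.compile(r"https?://[^\s<>\"'()\[\]，。]+")
TRAILING_PUNCTUATION = ".,;:!?"


def extract_urls(text: str) -> list[str]:
    """Return the distinct http(s) URLs in `text`, in first-seen order."""
    seen: dict[str, None] = {}
    for match in URL_PATTERN.findall(text):
        url = match.rstrip(TRAILING_PUNCTUATION)  # drop sentence punctuation
        seen.setdefault(url, None)
    return list(seen)


article = "Try https://cursor.com, then https://supabase.com/docs. Also https://cursor.com again."
print(extract_urls(article))
# ['https://cursor.com', 'https://supabase.com/docs']
```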

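The lower part is the workflow added this time, which screenshots the submitted sites and generates SEO-friendly detail content.

For the screenshot, as covered in the previous section, the Python script is deployed as an API and called through an HTTP Request node.

Now for the node that writes the page details:

As for how to write an SEO-friendly site description, here is the prompt for reference: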

You are the good SEO Editor. I will give you a website description. Now you should write a new_content referring the template_content, the new_content should output with markdown format. Additionally, your content should also explain how this website helps users tackle challenges in AI programming and product development. The first level of markdown should be h3. When outputting, do not start with the sentence "Here is the content" . The content of the new_content  have modules including what, feature, how, price, helpful tips, Frequently Asked Questions. And you should get the keyword of the content, and generate the content about the keyword as more as you can.  The markdown title level of these modules is h3. Direct output

The base content_template is\n: What is tap4.ai?

tap4.ai is an AI-driven platform that provides access to a vast array of AI technologies for various needs, including ChatGPT, GPT-4o for text generation and image understanding, Dalle3 for image creation for document analysis

What is the main feature of tap4.ai? 
1.Collect more than 1000 AIs and 200+ categories;
2. Discover the AI tools easily; 
3. Free ai tools submission;
How to use tap4.ai?
Every user can utilize GPT-4o for free up to 20 times a day on tap4.ai. Subscribing to the platform grants additional benefits and extended access beyond the free usage limits.
Can I generate images using tap4.ai?
Yes, with Dalle3's text-to-image generation capability, users can create images, sharing credits with GPT-4o for a seamless creative experience.
How many GPTs are available on tap4.ai?\n tap4.ai offers nearly 200,000 GPT models for a wide variety of applications in work, study, and everyday life. You can freely use these GPTs without the need for a ChatGPT Plus subscription.

How can I maximize my use of tap4.ai's AI services?
By leveraging the daily free uses of GPT-4o document reading, and Dalle's image generation, users can explore a vast range of AI-powered tools to support various tasks.

Will my information be used for your training data?
We highly value user privacy, and your data will not be used for any training purposes. If needed, you can delete your account at any time, and all your data will be removed as well.

When would I need a tap4.ai subscription?
If the 20 free GPT-4o conversations per day do not meet your needs and you heavily rely on GPT-4o, we invite you to subscribe to our affordable products. Just output the markdown content!

NOTICE I am not asking you write new content about tap4.ai. This is only template for your reference. You should wait for my request.
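
Since the prompt asks for h3 headings and forbids a "Here is the content" opener, the model's output can be checked before it is stored. This is an optional sketch of such a check, not part of the workflow described here; the function name is hypothetical.

```python
import re

# An ATX heading: 1-6 '#' characters, a space or tab, then the title.
HEADING = re.compile(r"^(#{1,6})[ \t]+(.+?)[ \t]*$", re.MULTILINE)


def check_intro(markdown: str) -> list[str]:
    """Return the ways `markdown` breaks the prompt's formatting rules."""
    problems = []
    if markdown.lstrip().lower().startswith("here is the content"):
        problems.append('starts with "Here is the content"')
    headings = HEADING.findall(markdown)
    if not headings:
        problems.append("no markdown headings found")
    for hashes, title in headings:
        if len(hashes) != 3:
            problems.append(f"heading {title!r} is h{len(hashes)}, expected h3")
    return problems
```

An empty list means the text passed; anything else could be used to trigger a retry or to flag the row for manual review.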

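A test run, submitting the site https://www.reddit.com

The database has it:

The front end has it:

The detail page is written too:

If you're interested, take a look:

https://www.aicoding.help/ai/Reddit

Because my server is overseas, this setup can screenshot even Reddit, which has strict anti-bot controls, without being blocked. That is one of the reasons I went to the "extra trouble" of deploying it online.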

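👤 About the author: 饼干哥哥 & NGS

I'm 饼干哥哥, a data analyst and AI blogger. Together with friends who are experts in overseas business, I founded NGS (NextGrowthSail), a company focused on applying AI to overseas marketing. When we reviewed our automated content marketing workflow data internally last week, we found that applying the screenshot and SEO techniques from this post could raise content production efficiency by more than 30%.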
