Notes on an Open Source Contribution

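mcpo is a project from the open-webui open source team: a wrapper service around mcp-servers that standardizes how MCP tools are plugged in. This post records how I submitted a PR to it and completed an open source contribution, and walks through my debugging approach for that PR in detail.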

Preface

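While wiring various MCP tools into open-webui, I used mcpo to wrap each mcp-server behind an OpenAPI-compliant interface (in truth I only wanted its auth wrapper, even though that is just a simple static Bearer Token). When I added the charting tool mcp-server-chart, an error came up and the mcpo service would not start. I dug through the issues without finding a fix, my fingers got itchy, and I decided to submit a PR.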

Background: Collaborative Development

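In modern collaborative software development, multiple developers iterate on the artifacts of the same project. The work is usually split by module (horizontal isolation) or by version (vertical isolation) so that the artifacts people update overlap as little as possible, but in practice conflicts are hard to avoid.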

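As far as I can see, the mainstream options for version control in collaborative settings are svn and git. Notably, some pure-client projects and most game projects tend to use svn for development version control, which highlights the main differences between the two tools: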

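From a usage perspective:
- git excels at fine-grained diff tracking for code and text, has many commands and fine-grained operations, handles access control mainly at the branch level (enforced by the hosting platform rather than by git itself), and has a strong open source ecosystem;
- svn offers leaner version management for arbitrary sets of resources, and supports directory-level access control.

From an implementation perspective:
- git's version storage is distributed: every local clone also keeps the complete version history, stored compactly. For code, configuration and a small amount of text assets that is fine, but once multimedia or other binary assets pile up in the project, disk usage gets scary;
- svn's version storage is centralized: the local working copy does not keep the full history. A branch is a directory copy; it is cheap on the server, but checking one out means another full working copy on disk, so asset-heavy teams usually skip feature branches.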

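In short, git is a code version control tool: it can manage a few project assets on the side, but too many will hurt. svn is a version control tool for software assets in general: its diff tracking is less fine-grained, but the client-side burden is small and resource permissions are fine-grained.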

The GitHub open source contribution flow

Check the issues
Fork
Develop
Submit a PR
Review

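Check the issues

The open source community is huge. If a project is active enough, chances are someone has already discussed the same or a related problem in its issues; the community tends to hand a project this kind of passive crowd-testing buff.

There are two main goals when digging through issues:

Find analysis: to quickly pin down the problem scenario and its root cause. Once the pit is located, go around it if you can; if you cannot, it is at least easier to fix.
Find fixes: if you need the project urgently, the issues may turn up unreviewed PRs, unofficial release builds, or bits of fix code.

Once you have confirmed that the issues hold no usable fix, and you have one yourself and are willing to contribute it, you can move to the next step.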

Fork

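The usual way to contribute on GitHub is to fork the project into your own space, where you have full permissions, and then branch and develop there.

If you are a member of the project's core team, you may have permission to create your feature branch directly in the project itself.

For example, to fork mcpo, click the Fork button on the project's GitHub home page to open the fork creation page.

Choose whether to fork into an organization or your personal space. You can change the project name and description, and choose whether to fork only the main branch (GitHub's default primary branch).

When choosing branches, check the project's branching model and make sure the fork includes the branch you want your code merged into. In mcpo, for example, the primary branch is main and the development branch is dev: every feature branch is merged into dev after review, dev is tested as a version, and at a suitable point dev is merged into main, tagged with a version, and packaged as the corresponding release.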

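Develop

Again, my PR to mcpo serves as the example. I described the background in the PR description; roughly, it goes like this:

When mcpo wraps an mcp-server behind an OpenAPI interface, it generates API documentation automatically. Generating the input and output parameter descriptions goes through these steps:

Following MCP (Model Context Protocol), call the mcp server (there are currently three transport modes: Stdio, SSE and StreamableHTTP) and fetch the input and output schemas
Parse the input and output schemas into structured data
Generate the input and output parameter documentation from the structured data, to be used for rendering the API docs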
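
Steps two and three can be pictured with a toy version. This is a simplified sketch, not mcpo's implementation: describe_params is a made-up helper, and TOOL is a hand-trimmed copy of one tool description from mcp-server-chart (the full one is quoted further down). It only flattens the top level of the input schema into rows that a doc renderer could consume.

```python
from typing import Any

# Hand-trimmed tool description, in the shape an MCP server reports its tools.
TOOL: dict[str, Any] = {
    "name": "generate_fishbone_diagram",
    "inputSchema": {
        "type": "object",
        "properties": {
            "data": {"type": "object"},
            "theme": {"type": "string", "default": "default"},
            "width": {
                "type": "number",
                "description": "Set the width of chart, default is 600.",
                "default": 600,
            },
        },
        "required": ["data"],
    },
}


def describe_params(tool: dict[str, Any]) -> list[dict[str, Any]]:
    """Hypothetical helper: one documentation row per top-level parameter."""
    schema = tool.get("inputSchema", {})
    required = set(schema.get("required", []))
    rows = []
    for name, prop in schema.get("properties", {}).items():
        rows.append(
            {
                "name": name,
                "type": prop.get("type", "any"),
                "required": name in required,
                "default": prop.get("default"),
                "description": prop.get("description", ""),
            }
        )
    return rows


rows = describe_params(TOOL)
for row in rows:
    print(row)
```

The real function discussed below goes much further: it recurses into nested objects and arrays and produces Python type hints plus pydantic fields instead of plain dictionaries.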

My mcpo config file included mcp-server-chart, and I tried to start the mcpo service as usual:

{
  "mcpServers": {
    "mcp-server-chart": {
      "command": "npx",
      "args": [
        "-y",
        "@antv/mcp-server-chart"
      ]
    }
  }
}

It failed (I am pasting the identical error log from the issue, being too lazy to reproduce the problem and capture my own):

ERROR:      + Exception Group Traceback (most recent call last):
  |   File "/Users/ddrag/IdeaProjects/mcpo/.venv/lib/python3.11/site-packages/starlette/routing.py", line 692, in lifespan
  |     async with self.lifespan_context(app) as maybe_state:
  |   File "/opt/homebrew/Cellar/python@3.11/3.11.10/Frameworks/Python.framework/Versions/3.11/lib/python3.11/contextlib.py", line 210, in __aenter__
  |     return await anext(self.gen)
  |            ^^^^^^^^^^^^^^^^^^^^^
  |   File "/Users/ddrag/IdeaProjects/mcpo/src/mcpo/main.py", line 104, in lifespan
  |     async with stdio_client(server_params) as (reader, writer):
  |   File "/opt/homebrew/Cellar/python@3.11/3.11.10/Frameworks/Python.framework/Versions/3.11/lib/python3.11/contextlib.py", line 231, in __aexit__
  |     await self.gen.athrow(typ, value, traceback)
  |   File "/Users/ddrag/IdeaProjects/mcpo/.venv/lib/python3.11/site-packages/mcp/client/stdio/__init__.py", line 166, in stdio_client
  |     async with (
  |   File "/Users/ddrag/IdeaProjects/mcpo/.venv/lib/python3.11/site-packages/anyio/_backends/_asyncio.py", line 772, in __aexit__
  |     raise BaseExceptionGroup(
  | ExceptionGroup: unhandled errors in a TaskGroup (1 sub-exception)
  +-+---------------- 1 ----------------
    | Exception Group Traceback (most recent call last):
    |   File "/Users/ddrag/IdeaProjects/mcpo/.venv/lib/python3.11/site-packages/mcp/client/stdio/__init__.py", line 173, in stdio_client
    |     yield read_stream, write_stream
    |   File "/Users/ddrag/IdeaProjects/mcpo/src/mcpo/main.py", line 105, in lifespan
    |     async with ClientSession(reader, writer) as session:
    |   File "/Users/ddrag/IdeaProjects/mcpo/.venv/lib/python3.11/site-packages/anyio/_backends/_asyncio.py", line 772, in __aexit__
    |     raise BaseExceptionGroup(
    | ExceptionGroup: unhandled errors in a TaskGroup (1 sub-exception)
    +-+---------------- 1 ----------------
      | Traceback (most recent call last):
      |   File "/Users/ddrag/IdeaProjects/mcpo/src/mcpo/main.py", line 107, in lifespan
      |     await create_dynamic_endpoints(app, api_dependency=api_dependency)
      |   File "/Users/ddrag/IdeaProjects/mcpo/src/mcpo/main.py", line 43, in create_dynamic_endpoints
      |     form_model_fields = get_model_fields(
      |                         ^^^^^^^^^^^^^^^^^
      |   File "/Users/ddrag/IdeaProjects/mcpo/src/mcpo/utils/main.py", line 182, in get_model_fields
      |     python_type_hint, pydantic_field_info = _process_schema_property(
      |                                             ^^^^^^^^^^^^^^^^^^^^^^^^^
      |   File "/Users/ddrag/IdeaProjects/mcpo/src/mcpo/utils/main.py", line 84, in _process_schema_property
      |     type_hint, _ = _process_schema_property(
      |                    ^^^^^^^^^^^^^^^^^^^^^^^^^
      |   File "/Users/ddrag/IdeaProjects/mcpo/src/mcpo/utils/main.py", line 70, in _process_schema_property
      |     assert ref in schema_defs, "Custom field not found"
      |            ^^^^^^^^^^^^^^^^^^
      | TypeError: argument of type 'NoneType' is not iterable
      +------------------------------------

Let us look at what _process_schema_property() in mcpo/src/mcpo/utils/main.py does:

def _process_schema_property(
    _model_cache: Dict[str, Type],
    prop_schema: Dict[str, Any],
    model_name_prefix: str,
    prop_name: str,
    is_required: bool,
    schema_defs: Optional[Dict] = None,
) -> tuple[Union[Type, List, ForwardRef, Any], FieldInfo]:
    """
    Recursively processes a schema property to determine its Python type hint
    and Pydantic Field definition.

    Returns:
        A tuple containing (python_type_hint, pydantic_field).
        The pydantic_field contains default value and description.
    """
    if "$ref" in prop_schema:
        ref = prop_schema["$ref"]
        ref = ref.split("/")[-1]
        assert ref in schema_defs, "Custom field not found"
        prop_schema = schema_defs[ref]

    prop_type = prop_schema.get("type")
    prop_desc = prop_schema.get("description", "")

    default_value = ... if is_required else prop_schema.get("default", None)
    pydantic_field = Field(default=default_value, description=prop_desc)

    # Handle the case where prop_type is missing but 'anyOf' key exists
    # In this case, use data type from 'anyOf' to determine the type hint
    if "anyOf" in prop_schema:
        type_hints = []
        for i, schema_option in enumerate(prop_schema["anyOf"]):
            type_hint, _ = _process_schema_property(
                _model_cache,
                schema_option,
                f"{model_name_prefix}_{prop_name}",
                f"choice_{i}",
                False,
            )
            type_hints.append(type_hint)
        return Union[tuple(type_hints)], pydantic_field

    # Handle the case where prop_type is a list of types, e.g. ['string', 'number']
    if isinstance(prop_type, list):
        # Create a Union of all the types
        type_hints = []
        for type_option in prop_type:
            # Create a temporary schema with the single type and process it
            temp_schema = dict(prop_schema)
            temp_schema["type"] = type_option
            type_hint, _ = _process_schema_property(
                _model_cache, temp_schema, model_name_prefix, prop_name, False
            )
            type_hints.append(type_hint)

        # Return a Union of all possible types
        return Union[tuple(type_hints)], pydantic_field

    if prop_type == "object":
        nested_properties = prop_schema.get("properties", {})
        nested_required = prop_schema.get("required", [])
        nested_fields = {}

        nested_model_name = f"{model_name_prefix}_{prop_name}_model".replace(
            "__", "_"
        ).rstrip("_")

        if nested_model_name in _model_cache:
            return _model_cache[nested_model_name], pydantic_field

        for name, schema in nested_properties.items():
            is_nested_required = name in nested_required
            nested_type_hint, nested_pydantic_field = _process_schema_property(
                _model_cache,
                schema,
                nested_model_name,
                name,
                is_nested_required,
                schema_defs,
            )

            if name_needs_alias(name):
                other_names = set().union(nested_properties, nested_fields, _model_cache)
                alias_name = generate_alias_name(name, other_names)
                aliased_field = Field(
                    default=nested_pydantic_field.default,
                    description=nested_pydantic_field.description,
                    alias=name
                )
                nested_fields[alias_name] = (nested_type_hint, aliased_field)
            else:
                nested_fields[name] = (nested_type_hint, nested_pydantic_field)

        if not nested_fields:
            return Dict[str, Any], pydantic_field

        NestedModel = create_model(nested_model_name, **nested_fields)
        _model_cache[nested_model_name] = NestedModel

        return NestedModel, pydantic_field

    elif prop_type == "array":
        items_schema = prop_schema.get("items")
        if not items_schema:
            # Default to list of anything if items schema is missing
            return List[Any], pydantic_field

        # Recursively determine the type of items in the array
        item_type_hint, _ = _process_schema_property(
            _model_cache,
            items_schema,
            f"{model_name_prefix}_{prop_name}",
            "item",
            False,  # Items aren't required at this level,
            schema_defs,
        )
        list_type_hint = List[item_type_hint]
        return list_type_hint, pydantic_field

    elif prop_type == "string":
        return str, pydantic_field
    elif prop_type == "integer":
        return int, pydantic_field
    elif prop_type == "boolean":
        return bool, pydantic_field
    elif prop_type == "number":
        return float, pydantic_field
    elif prop_type == "null":
        return None, pydantic_field
    else:
        return Any, pydantic_field

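This function can be called recursively, in order to handle the recursive structure of a schema. An example schema helps here; this is the input schema of one tool, taken from the mcp-server-chart project: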

{
  "name": "generate_fishbone_diagram",
  "description": "Generate a fishbone diagram chart to uses a fish skeleton, like structure to display the causes or effects of a core problem, with the problem as the fish head and the causes/effects as the fish bones. It suits problems that can be split into multiple related factors.",
  "inputSchema": {
    "$schema": "http://json-schema.org/draft-07/schema#",
    "type": "object",
    "properties": {
      "data": {
        "type": "object",
        "properties": {
          "name": { "type": "string" },
          "children": {
            "type": "array",
            "items": {
              "properties": {
                "name": { "type": "string" },
                "children": {
                  "type": "array",
                  "items": {
                    "$ref": "#/properties/data/properties/children/items"
                  }
                }
              },
              "required": ["name"],
              "type": "object"
            }
          }
        },
        "required": ["name"],
        "description": "Data for fishbone diagram chart, such as, { name: 'main topic', children: [{ name: 'topic 1', children: [{ name: 'subtopic 1-1' }] }."
      },
      "theme": {
        "default": "default",
        "description": "Set the theme for the chart, optional, default is 'default'.",
        "enum": ["default", "academy"],
        "type": "string"
      },
      "width": {
        "type": "number",
        "description": "Set the width of chart, default is 600.",
        "default": 600
      },
      "height": {
        "type": "number",
        "description": "Set the height of chart, default is 400.",
        "default": 400
      }
    },
    "required": ["data"]
  }
}

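Let us be efficient and not spend time understanding every detail of the whole picture; the analysis centers on where the error is raised:

The assertion assert ref in schema_defs, "Custom field not found" targets the "$ref" field, whose value looks like this: "#/properties/data/properties/children/items".
properties -> data -> properties -> children -> items: this reads as a path pointing at a node inside the input schema, and the "$ref" itself sits inside that very node.
Before the assertion, the function keeps only the last segment of "$ref", here "items", and then apparently tries to find that "items" in schema_defs, the definitions hanging off the schema root; if it is not there, the assert fails. In this run schema_defs is None, so the in check itself raises the TypeError seen in the log before the assert can report anything. Yes, like you I cannot make sense of this logic, but I can guess. The if branch seems to assume one very narrow scenario and, ostrich-style, leaves everything else for when it breaks, and here we actually hit a case outside that assumption: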

if "$ref" in prop_schema:
    ref = prop_schema["$ref"]
    ref = ref.split("/")[-1]
    assert ref in schema_defs, "Custom field not found"
    prop_schema = schema_defs[ref]
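
The failure mode can be reproduced in isolation. The sketch below copies only the "$ref" branch into a hypothetical helper, resolve_ref_naively (not an mcpo function), and calls it with three different values of schema_defs, including the None default that matches the crash in the log.

```python
from typing import Any, Dict, Optional


def resolve_ref_naively(
    prop_schema: Dict[str, Any], schema_defs: Optional[Dict] = None
) -> Dict[str, Any]:
    """Hypothetical helper: only the `$ref` branch quoted above."""
    ref = prop_schema["$ref"].split("/")[-1]  # keeps just "items"
    assert ref in schema_defs, "Custom field not found"
    return schema_defs[ref]


def outcome(schema_defs: Optional[Dict]) -> str:
    """Return the name of the exception the lookup ends with, or "ok"."""
    node = {"$ref": "#/properties/data/properties/children/items"}
    try:
        resolve_ref_naively(node, schema_defs)
    except (TypeError, AssertionError) as exc:
        return type(exc).__name__
    return "ok"


print(outcome(None))  # `in` cannot search None
print(outcome({}))  # nothing named "items" is defined
print(outcome({"items": {"type": "string"}}))  # the one case the branch expects
```

Only the third call succeeds, which suggests the branch was written with a flat table of named definitions in mind rather than a path into the schema itself.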

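OK, the situation is roughly clear. Now let us see what it takes for the function to work in this scenario. The function is mostly a pile of if/else branches; in essence it does a depth-first traversal of the schema node tree, structuring the tree while assigning two things to every node:
- python_type_hint: roughly the value type that the API docs show for an input or output parameter.
- pydantic_field: a "field" object from the pydantic package, which appears to define a field's default value, description and alias.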
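Let us see where type_hint and pydantic_field come from; here is a quick translation to save everyone time. The original author basically treats the whole schema as a directed acyclic graph. When there is a "$ref", the code only makes a token check for references to first-level children of the schema root, which covers one very limited kind of cycle. It searches depth-first to the bottom, takes the type of each leaf node, then backtracks upward and assembles composite types, for example List[List[str]] or a nested model. As for pydantic_field, when a property is not required and declares no default it amounts to Field(default=None, description=""), and the final else branch returns Any as the type. So there is a fallback return value after all; this can be saved.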
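
The bottom-up composition can be shown without pydantic. The sketch below is a deliberately reduced stand-in for the quoted function (to_type_hint is a hypothetical name): it maps leaf types, recurses into arrays, and collapses every object to Dict[str, Any] where the real code would build a nested model.

```python
from typing import Any, Dict, List

# Leaf JSON Schema types and the Python types they map to.
LEAF_TYPES = {"string": str, "integer": int, "boolean": bool, "number": float}


def to_type_hint(schema: Dict[str, Any]) -> Any:
    """Hypothetical, reduced version of the depth-first type mapping."""
    schema_type = schema.get("type")
    if schema_type == "array":
        items = schema.get("items")
        # Recurse first, then wrap the result on the way back up.
        return List[to_type_hint(items)] if items else List[Any]
    if schema_type == "object":
        # Simplification: the real code creates a pydantic model here.
        return Dict[str, Any]
    return LEAF_TYPES.get(schema_type, Any)  # unknown or missing type -> Any


nested = {"type": "array", "items": {"type": "array", "items": {"type": "string"}}}
hint = to_type_hint(nested)
print(hint)
```

The last line of to_type_hint is the same kind of fallback as the final else branch in the original: anything unrecognized degrades to Any instead of failing.
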
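Let us struggle a bit and see whether "$ref" can be given proper, real field information. That would mean building a composite type for properties.data.properties.children.items, but the "$ref" creates a circular reference: the recursion never ends, and there will always be one more "$ref" whose type cannot be obtained.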
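
To see the cycle concretely, the "$ref" can be resolved as a path from the schema root. The sketch below uses a trimmed copy of the schema above and a hypothetical resolve_pointer helper (plain dictionary walking, not a full JSON Pointer implementation and not mcpo code).

```python
from typing import Any, Dict

REF = "#/properties/data/properties/children/items"

# Trimmed copy of the fishbone input schema: only the recursive part is kept.
INPUT_SCHEMA: Dict[str, Any] = {
    "type": "object",
    "properties": {
        "data": {
            "type": "object",
            "properties": {
                "name": {"type": "string"},
                "children": {
                    "type": "array",
                    "items": {
                        "type": "object",
                        "properties": {
                            "name": {"type": "string"},
                            "children": {"type": "array", "items": {"$ref": REF}},
                        },
                        "required": ["name"],
                    },
                },
            },
        }
    },
}


def resolve_pointer(root: Dict[str, Any], ref: str) -> Dict[str, Any]:
    """Hypothetical helper: walk a local '#/a/b/c' reference from the root."""
    if not ref.startswith("#/"):
        raise ValueError(f"only local references are handled here: {ref}")
    node = root
    for part in ref[2:].split("/"):
        node = node[part]
    return node


target = resolve_pointer(INPUT_SCHEMA, REF)
inner = target["properties"]["children"]["items"]
# Following the inner "$ref" lands on the node we started from.
same_node = resolve_pointer(INPUT_SCHEMA, inner["$ref"]) is target
print(inner, same_node)
```

Because the reference leads back to an ancestor of itself, expanding it inline can never finish; any resolver has to stop somewhere and return a placeholder.
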
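In that case, stop struggling: narrow the blast radius and solve the concrete case first. The case is a "$ref" field, "#/properties/data/properties/children/items", that causes a circular reference. What else do we have at hand? There is model_name_prefix, which at a glance records the depth-first node path and is used to build the keys of _model_cache, the cache that stores the generated nested models. Easy: model_name_prefix and the "$ref" field mean essentially the same thing and can be translated into each other.
Debug it: print "$ref" and model_name_prefix at the point where the assertion fails:
- "$ref": "#/properties/data/properties/children/items"
- model_name_prefix: "generate_fishbone_diagram_form_model_data_model_children_item_model_children"
Tracing the call chain, the prefix "generate_fishbone_diagram_form_model" is determined by the tool name and has nothing to do with the schema. The rest of the correspondence is fairly obvious:
- _model_ corresponds to /properties/
- _item corresponds to /items
These are fixed mappings inside the function, used for object nodes and array member nodes respectively.
Translate "$ref" into prefix style and you get "data_model_children_item"; model_name_prefix minus its fixed prefix is "data_model_children_item_model_children". That is enough: the first is a prefix of the second.
So the fix is: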

if ref.startswith("#/properties/"):
    # Remove the common prefix from both paths.
    prefix_path = model_name_prefix.split("_form_model_")[-1]
    ref_path = ref.split("#/properties/")[-1]
    # Translate $ref path to model_name_prefix style.
    ref_path = ref_path.replace("/properties/", "_model_")
    ref_path = ref_path.replace("/items", "_item")
    # If $ref path is a prefix substring of model_name_prefix path,
    # there exists a circular reference.
    # Break the loop with an early return to avoid the exception.
    if prefix_path.startswith(ref_path):
        # TODO: Find the exact type hint for the $ref.
        return Any, Field(default=None, description="")

Write the comments clearly for others, leave a TODO where something is left unfinished, and trust the wisdom of those who come after.
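
To check the translation by hand, the prefix test from the patch above can be pulled out into a standalone function. This is a sketch for illustration only: is_circular_ref is a hypothetical helper name, not something that exists in mcpo, and the first call uses the two debug values printed earlier.

```python
def is_circular_ref(ref: str, model_name_prefix: str) -> bool:
    """Hypothetical helper: the prefix test from the patch, as a predicate."""
    if not ref.startswith("#/properties/"):
        return False
    # Drop the tool-specific part of the prefix and the fixed head of the ref.
    prefix_path = model_name_prefix.split("_form_model_")[-1]
    ref_path = ref.split("#/properties/")[-1]
    # Translate the $ref path into model_name_prefix style.
    ref_path = ref_path.replace("/properties/", "_model_")
    ref_path = ref_path.replace("/items", "_item")
    # The ref points at an ancestor if its path is a prefix of the current one.
    return prefix_path.startswith(ref_path)


REF = "#/properties/data/properties/children/items"
PREFIX = "generate_fishbone_diagram_form_model_data_model_children_item_model_children"

print(is_circular_ref(REF, PREFIX))
print(is_circular_ref(REF, "generate_fishbone_diagram_form_model_theme"))
print(is_circular_ref("#/properties/data", "x_form_model_database_model"))
```

The first call reports a cycle and the second does not. The third also reports one, although "database" is merely a name that starts with "data": this is a plain string-prefix comparison, so the check is best read as a narrow stopgap for this case rather than a general cycle detector.
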
Be nice and add a unit test in the same style as the existing ones (amusingly, all the existing unit tests in this project are written for _process_schema_property):

def test_ref_to_parent_node():
    schema = {'$ref': '#/properties/data/properties/children/items'}
    result_type, result_field = _process_schema_property(
        _model_cache,
        schema,
        "generate_fishbone_diagram_form_model_data_model_children_item_model_children",
        "item",
        False,
        {}
    )

    assert result_type == Any
    assert result_field.description == ""

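pytest passes locally; time to hand in the code.

Submit a PR

If you forked the project, the GitHub project page shows a prompt to open a PR (Pull Request). If you do not see it, go to the "Pull requests" tab and click the "New pull request" button.

After choosing the source repository and branch and the target repository and branch, click "Create pull request". This creates a PR page, which works much like an issue, where you describe your PR. For how to write it, look at past PRs in the repository; mine is #174.

The PR description should make clear which problem scenario you solved and how far your code's impact on the project reaches; including your own test results makes reviewers more comfortable.

You can see that I created a new feature branch in my fork: fix-circular-schema-ref-exception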

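Review

Wait for the reviewers of your target branch to review the code and leave their comments in the PR review.

My reviewer was the repository owner, Tim Jaeryang Baek. He was very gracious and did not challenge me: after one "thanks" he merged it into dev and main in one go, and even cut a dedicated release tag for my PR.

So I have no example of being challenged and sent back for rework to show ┑(￣Д ￣)┍

The closed PR page indicates that the code has been merged; you can then delete the feature branch and the fork.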

github

#github #mcpo #svn

https://bipedalbit.net/2025/06/10/记一次开源项目贡献/

Author

Bipedal Bit

Published on

June 10, 2025
