Dev Diary 187 - Performance and Optimization


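I'm Joel, Tech Lead on the Crusader Kings III team. I am now into my second decade of working on the Crusader Kings series in various roles, having started out as a designer on Crusader Kings II.

Before we dig into the details of how we improved simulation speed, I'll let Daan give a brief summary of our results.

Straight-to-the-Point Summary

Want some quick facts and conclusions first? Let me summarize!

With the addition of the All Under Heaven expansion, the game has grown by roughly 30% to 40% in playable land area and in the number of living playable characters. We focused on reducing simulation tick duration to keep the experience close to that of the current Crusader Kings III. Our measurements on low-spec and high-spec machines show that tick speed is now close to the current live version. Specifically, the early game is slightly slower, but between 150 and 250 years into a 1066 start at speed 5, the mid and late game is on par with or slightly faster than the live version.

The graph below compares tick duration over 150 years against the current live version (higher values mean longer ticks, that is, a slower game).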

[Rough Tick Duration Graph - Red: All Under Heaven, Yellow: Current Version]

We also implemented GUI, 3D graphics, and memory-usage improvements, though they were a lower priority than simulation speed.

Caveat: Results will vary by the world you create and the world the simulation creates around you! There is no single number or graph that covers all games, hardware, and play styles, but we have aimed to deliver a playable, stable, fast experience with All Under Heaven.

That’s the short version. If you want more details, more graphs, and more insights, then please read on!

Definitions

Now let’s get back to the regular schedule and start digging into the details.

First, we need to define what we mean when we say speed or performance. Normally, when we talk about performance, we refer to two categories: rendering and simulation. When developing DLCs for Grand Strategy Games, we create more systems and objects to simulate over time. Graphical complexity will also increase with more development, but during DLC development it rarely causes new bottlenecks for us. However, increased simulation demands put more load on the CPU for calculations and data transformations due to new features and systems to keep track of. This makes the CPU our most common bottleneck when it comes to optimizing for simulation tick speed.

So our most common task when it comes to optimizing the code for our games is to look at where our CPU cycles are spent and where we can be more efficient, in order to keep down the average time-per-tick for a playthrough. This makes the time-per-tick measurement the most important metric for us to track throughout development, and the one we will be talking about the most in this Dev Diary.

Measuring Performance

So how do you get accurate measurements for tick speed when working with a complex simulation? There are many variables in play that will affect the final outcome.
Examples of variables that affect results include: graphical settings, hardware, test length, early- or late-game state, background tasks, and random variation. All of these variables matter in analyzing how the game performs, but we also want to stress that most of our performance improvements are going to have a similar effect across different hardware. For that reason, the most valuable test is early-game tick performance on fixed hardware with fixed settings and a fixed random seed. This gives us a test that we can quickly repeat, which lets us track tick speed trends across different versions of the code during development and quickly spot degradations.

With that said, there are some optimizations which don't benefit lower-core hardware as much: while the early-game simulation is generally just a throughput issue for the code, the end-game optimization will instead focus on trying to contain the growth in complexity. We do so by ensuring calculations scale well with a larger simulation in later stages of the game, for example with more characters or larger realms. For that reason it’s still important for us to do spot checks on low-spec machines and to profile endgame saves, so that we avoid focusing only on solutions that require more CPU cores to parallelize computations.

Adaptation

One saying in software engineering (and many other professions) is that “perfect is the enemy of good.” It’s a sure way to spend ages on a feature if we’re determined from the outset that it has to use the most perfect and streamlined code. This is especially true in game development, where we often start out with one design of a feature and later tweak and modify it to better fit the rest of the game and the gameplay feedback on what makes a fun player experience.
At times we know from experience where we will have future performance problems and can plan a more thought-through data architecture to mitigate them, but most of the time the most important thing is to get features up and running to verify that they are fun. After that, we can identify which systems are not performing according to our requirements and start improving them from both a coherence and efficiency perspective.

The Journey of a Thousand Miles Begins with One Step

What I’ve described so far is our usual approach to performance work for DLC projects. However, with All Under Heaven this all gets upended by the scale. Beyond having more individual features than any prior DLC, the biggest challenge for simulation tick speed is the addition of two subcontinents. Both East Asia and Southeast Asia are massive additions to the world that we need to simulate for the game. This was going to be a larger challenge than any previous expansion we had done for CK3.

In addition to the typical ~20% slowdowns we would see from unoptimized feature additions, we would now also need to deal with a 32% increase in baronies (and rulers) in the game. In our simulation, rulers are the smallest building block for moving the world forward, and their number is linearly correlated with the amount of work the CPU must perform. This means that just by putting the rest of Asia on the map, we immediately made the game slower by an amount equal to the size increase.

In the planning stages for AUH we set aside additional time and resources compared to a normal expansion, specifically for looking into how to make the game faster and offset the increased simulation scope. We also knew that we couldn’t just rely on easy wins by keeping new features in line with good practices. This time we had to look even deeper into which old systems were holding us back.
With that said, I’ll hand it over to two of our very talented Engineers, Anton and Carl-Henrik, to explain in more depth how we find underperforming components in the codebase and the methods we use to make the game perform according to the principle “gotta go fast.”

Hello, I’m Anton, one of the Senior Programmers who has been on the Crusader Kings III team for many, many years. I’m going to talk a bit about how the code is structured and how we measure and think about performance improvements.

Focusing on the Most Expensive Systems

One approach to performance work is to look at various game systems and what it takes to simulate each system every daily tick. We start with the most expensive system, because optimizing it yields the largest impact compared to how much time we spend on it. Individual systems are also more self-contained and have only a limited number of connections with the rest of the game. That makes a system easier to reason about, and makes it easier to focus on what matters right now. Our internal tools help us visualize various game features and compare the performance impact of each one.
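
As a rough illustration of the kind of per-system comparison such a tool makes, here is a minimal Python sketch. The function name `summarize_systems`, the sample timings, and the cutoff value are assumptions made for this example; it is not the team's internal tooling. It averages per-day timings for each system, folds the cheap systems into an "other" bucket, and ranks the rest by cost.

```python
from statistics import mean

def summarize_systems(samples, cutoff_ms):
    """Rank systems by average daily cost; fold cheap ones into 'other'.

    samples: dict mapping system name -> list of per-day timings in ms.
    Returns a list of (name, avg_ms, share_percent), most expensive first.
    """
    averages = {name: mean(times) for name, times in samples.items()}
    total = sum(averages.values())
    ranked, other = [], 0.0
    for name, avg in sorted(averages.items(), key=lambda kv: kv[1], reverse=True):
        if avg < cutoff_ms:
            other += avg  # too cheap to list individually
        else:
            ranked.append((name, avg, 100.0 * avg / total))
    if other > 0:
        ranked.append(("other", other, 100.0 * other / total))
    return ranked

# Hypothetical timings in milliseconds, invented for illustration.
samples = {
    "characters": [4.0, 4.2, 3.8],
    "modifiers": [2.0, 2.0, 2.0],
    "succession": [0.1, 0.1, 0.1],
    "epidemics": [0.2, 0.2, 0.2],
}
report = summarize_systems(samples, cutoff_ms=0.3)
for name, avg, share in report:
    print(f"{name:12s} {avg:6.2f} ms  {share:5.1f}%")
```

An average-only summary like this hides per-day spikes, so a real tool would also want to keep the raw per-day series for plotting.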

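[Example of serial update performance graph]

Here is an example: it aggregates 25 years of game time and shows the average time of the most expensive parts of the game. There are 63 different systems updating each day; 44 of them take less than 0.3 ms each and can be ignored. For the remaining 19 systems we show their names, their average time per day, and their percentage share.

Some game systems are always expensive. During development, the share of different systems in the graph may go up and down. We evaluate whether it is reasonable for a system to take that much time per day. At one point the succession system or the situation system topped the chart, a clear sign of excessive cost that we needed to investigate. In the past, situations were updated too often even when nothing had actually changed, and succession ran an extremely expensive script on a large number of characters. As you can see, both systems have now been folded into the [Other] category and run well overall.

Characters and modifiers usually sit at the top of the priority queue; they are the core of the game, so they will always take more time than other things. This is always a dilemma: how much faster can I make something that is inherently expensive? Is it worth spending a few more days optimizing it, or is it good enough for now, so that I should look for easier performance wins elsewhere?

Averages are not the only numbers that matter. This particular graph reveals another problem. You will notice that the light orange bars (activities) sometimes spike disproportionately, which makes some months run far slower than average. We want a low average time, and we also don't want large spikes when something important happens. This is a major advantage of this visualization technique over looking only at aggregated data. Activities will be the main target of the next round of optimizations.

Parallel vs. Serial Updates

The previous graph showed the serial part of the daily tick. The next two graphs show the parallel updates.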

[Parallel AI update]

[Parallel pre-update of game systems]

It’s very easy to reason about and develop a system with a single-threaded approach. You know things happen in order, and things are predictable and repeatable. It makes development faster, and features can be tested and balanced sooner. Usually, only after a feature's skeleton is ready and working in the game does it get connected to the rest of the systems; after that, we may take another pass and move some of the work to run in parallel.

These two graphs show parallel work during a daily tick. You can still see the total wall-clock time spent on the update every tick; it’s below 20 ms for each graph. It’s drawn as a thick red line at the bottom of each graph; notice how much more work gets done in total in parallel. This example is on a PC with about 20 logical cores.

Why not do everything in parallel? We need a deterministic order of observable changes in the game for multiplayer to work, so that all clients have the same game state as everyone else. Another important part is, again, the ability to reason about the correctness of the game. Most changes in the game have propagating side effects. A Ruler conquering a new title means that another Ruler is going to lose a title. Loss of a title means changes in income and military power. A Ruler who gets weaker is more endangered by factions and other enemies. During any of those steps additional events may get triggered. A strictly determined order of actions and cascading side effects is necessary to understand and predict game outcomes.

Keeping that in mind, we’d like to do as much work in parallel as possible. We have an internal framework that allows us to split parts of a feature into sequential and parallel steps. It was covered a long time ago in pre-release Dev Diary 36. The parallel part goes first and is called “pre-update”. During the pre-update, nothing observable changes in the game. Every parallel thread sees the same visible game state, as if it were frozen.
In parallel, various systems can do heavy lifting to independently calculate what should be changed during the next sequential step. Everyone calculates new income independently, every AI actor makes decisions independently, and heavy logic, triggers, and math that can be done in advance are pre-calculated. All this is done to minimize the final amount of sequential changes: apply already-known values instead of doing math, execute the final decision instead of running a whole decision-making process, and so on.

Even during sequential updates, we still want to utilize more than one thread. This part is extra risky and complex. If I can prove that certain modifications can be done irrespective of the order of operations, or that some actions are guaranteed to have only limited side effects, then it’s safe to do parallel work there.

One more important note here: only a small fraction of the game is updated each day. Our updates are separated into daily, monthly, and yearly updates. Only very few crucial systems need to be updated every day. AI movement of units is one such case. Most systems are updated only once a month. It’s a compromise between frequency and efficiency. Every day only 1/30th of all people, all construction, all epidemics, etc. are updated. This spreads expensive updates evenly over the entire duration of the game. The obvious drawback compared to older games is that you never know for sure when it happens to you, the player. You don’t know the date when you get your personal monthly income; it just happens every 30 days.

Applying Optimization to Specific Systems

Let’s talk about more specific changes to the game and how it all applies to individual game systems. One example mentioned earlier was an expensive succession update.
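
The pre-update and sequential-apply split described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the `Ruler` class, `compute_income`, and `daily_tick` are hypothetical names, the income formula is invented, and a thread pool stands in for the game's internal framework. The parallel phase only reads the frozen state and returns results; the sequential phase applies them in a fixed order so that every run produces the same outcome.

```python
from concurrent.futures import ThreadPoolExecutor
from dataclasses import dataclass

@dataclass
class Ruler:
    ruler_id: int
    holdings: int
    gold: float = 0.0

def compute_income(ruler):
    """Pre-update step: read-only math on the frozen state (hypothetical formula)."""
    return ruler.ruler_id, ruler.holdings * 1.5

def daily_tick(rulers):
    # Phase 1 (parallel): nothing observable changes; workers only read.
    with ThreadPoolExecutor() as pool:
        results = list(pool.map(compute_income, rulers.values()))
    # Phase 2 (sequential): apply precomputed values in a deterministic order.
    for ruler_id, income in sorted(results):
        rulers[ruler_id].gold += income

rulers = {i: Ruler(ruler_id=i, holdings=i + 1) for i in range(4)}
daily_tick(rulers)
print([r.gold for r in rulers.values()])
```

In CPython the global interpreter lock means threads will not speed up pure-Python math like this; the point of the sketch is the structure, with all mutation confined to the ordered second phase.
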
All Under Heaven introduces China, an enormous realm with unique succession mechanics; lots of people compete for counties, duchies, and kingdoms, and candidates are appointed according to their merit rank and a score based on various properties of the person, their family, and events in their life. At some point it was the most expensive system to update daily, even when you take into account that we update succession very rarely compared to other systems. That’s how expensive it was.

Many individually reasonable decisions led to this outcome: China has lots of titles to appoint, and the design was to allow almost everyone to be a candidate for any county in the entirety of China. This is a quadratic problem that quickly grows out of control. What worked somewhat okay for big Byzantium, although already too slow, was no longer suitable for China.

The first attempt was to change the order of operations: can we eliminate as many unsuitable candidates as quickly as possible, so we don’t have to run expensive logic and math on scoring? It’s much cheaper to find the best out of 100 than out of 4,000 people.

Another obvious problem was that scoring ran sequentially; calculating an appointment score doesn't modify any game state, so it can safely be done in parallel for all people at once rather than one at a time. This alone made it three times faster, but it was still not good enough. It remained the third-most-expensive system.

The third step was to go and talk with our designers about their intent. Do we really care if every courtier in China can be a valid candidate for any county? Surely we can find some compromise. It’s important that any landless ruler has this chance. Every member of a proper noble family should be given a chance to serve the realm. But what about the lowborn? Can we somehow limit their participation to only titles where they personally live?
This design step halves the number of candidates, yielding large performance gains while maintaining the same player experience. Players can still see important people all around the realm taking jobs, and the system still feels competitive and alive while taking even less time. It’s always important to keep in mind what we want to achieve with any game system: what matters to the player and what goals designers have.

The fourth step was less impactful, but still valuable. After the previous three steps, most of the succession update was spent doing scripted scoring math. I was lucky to get suspicious there and look deeper. It came down to a single line of script that calculated holding income. This very old trigger had always been fast enough, but since the introduction of governor efficiency for administrative realms, holding income is affected by governor efficiency, and incorrect caching made it slower. What was supposed to be a simple return of a precalculated value turned into a whole sequence of very expensive math on demand. With the change that made cached income always valid and available, succession became so fast that it disappeared from the graph and was included in “other”. Every other script that uses holding income also became faster.

Lessons from System-Level Optimizations

One more way to make the game faster is to do things less frequently, or not at all. It turns out that the AI for barons was trying to do lots of things that made little sense. Barons don’t have councils or court positions, yet they were still evaluating all of them; they no longer do that. If it takes about three years to build a building when you have only a single holding, there is no point in attempting new construction more often than that; barons now only attempt to start construction every three years.
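
Looking back at the succession rework, the first two steps (cheap elimination before expensive scoring, then scoring candidates in parallel) follow a common pattern that can be sketched as below. Everything here is hypothetical: the `Candidate` fields, the eligibility rule, and the scoring formula are invented for illustration and are not the game's actual logic.

```python
from concurrent.futures import ThreadPoolExecutor
from dataclasses import dataclass

@dataclass(frozen=True)
class Candidate:
    name: str
    is_adult: bool
    merit_rank: int
    prestige: float

def is_eligible(c):
    """Cheap check, run first to shrink the pool (hypothetical rule)."""
    return c.is_adult and c.merit_rank > 0

def score(c):
    """Stand-in for the expensive scripted scoring; reads but never mutates."""
    return c.merit_rank * 10 + c.prestige

def pick_appointee(candidates):
    shortlist = [c for c in candidates if is_eligible(c)]
    if not shortlist:
        return None
    # Scoring has no side effects, so it can run on all candidates at once.
    with ThreadPoolExecutor() as pool:
        scores = list(pool.map(score, shortlist))
    # Ties are broken by list order, which keeps the result deterministic.
    best_index = max(range(len(shortlist)), key=lambda i: scores[i])
    return shortlist[best_index]

pool_of_candidates = [
    Candidate("A", True, 3, 5.0),
    Candidate("B", False, 9, 50.0),  # filtered out before scoring
    Candidate("C", True, 4, 1.0),
    Candidate("D", True, 0, 99.0),   # filtered out before scoring
]
print(pick_appointee(pool_of_candidates).name)
```

The filter runs before any scoring, so the expensive function only ever sees the shortlist; restricting who counts as eligible, as in the third step, shrinks that shortlist further without touching the scoring code.
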
Lots of game decisions and interactions can never be attempted by barons, as they will fail various triggers, but what if we don’t have to run those triggers in the first place? A sweeping review of availability by ruler tier for those mechanics freed even more time, aaaand in the end the result was not that impressive: overall it saved 0.5-1 ms of total daily tick time, which can be overshadowed by random fluctuations in hardware or the current game state. The main reason is that proper rulers spend so much time doing heavy tasks that optimizing barons was barely worth it. All AI decision-making already runs in parallel as well, so any performance gains end up being spread over multiple cores and are less noticeable as a result.

That’s it from me. Thanks for letting me share my thoughts on performance. I’ll now hand over to Carl-Henrik, who has also done a lot of performance work during the development of AUH.

The Full Picture

Dear Daily Tick Enthusiasts: I am Carl-Henrik, Principal Programmer on the team, and I have mostly been working on improving the performance and memory usage of Crusader Kings III! My personal hobby involves sizecoding on 8-bit and 16-bit CPUs in assembler, so working with performance is close to my heart. I even won a couple of size/performance coding competitions! I am also fairly new here, so I rely a lot on people around me including Daan, Joel, Jimmy, and Anton, as well as the design team and team partners! (I do cast a long shadow, however, as the mentor of Johan Andersson prior to his time at Paradox.)

Generally I have found that the code I optimize works well, but since it was written we have introduced a large number of titles and characters. This means that the assumptions made at the time no longer hold, and that is the opportunity to improve performance. No code was changed because it was in any way inferior.

Loading

My initial focus was on the game's load screen.
As my personal computer at home falls below the minimum specifications for CK3, addressing this would significantly improve my ability to play the game. I was unprepared for the sheer volume of activity, and the only available performance measurement was debug logging, which was intertwined with all the other processes occurring during this phase of the game. To better analyze load time, I made a new performance-tracking tool that can keep track of the whole sequence, shown in the image below. The graphs show what each CPU is busy loading or setting up. The black portions don’t necessarily indicate that a CPU was idle; the work may have been too quick to show, or not included in the loading functions.

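[Streaming profiling tool, my current personal development time project]

While we found many things to improve, the changes were too much work to implement at the time. I plan to start improving these systems soon.

Memory

Around the release of Khans of the Steppe, we needed to save some memory. The lowest-spec PCs were struggling to run the game, and our console porting team had already looked into this. Porting Crusader Kings III in full to modern platforms is a remarkable achievement, and with the help of their research we were able to implement these memory improvements quickly.

One improvement in particular, the memory used by tooltips in the game, had surprising results. Applying just this change to the PC version saved several GB of memory! We also looked into the memory savings from newer GUI code in the Clausewitz engine, but those changes turned out to diverge too much to adopt for All Under Heaven.

Performance

After the memory work, we also needed to look at code performance. Early tests showed ticks running up to 1.5 times slower than in our previous DLC, Khans of the Steppe.

My work focused on code improvements, while other team members who specialize in design focused on in-game elements such as script and how often characters take actions.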

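[Work PC (32 cores) 100-year performance sample]

To monitor optimization progress and catch development-related issues, I run the profiler for 100 years of game time every day. I do this on both my high-spec PC (32 processors, 64 GB of RAM, Windows 11) and a low-spec machine (8 cores, 16 GB of RAM).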

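[Low-spec (8 cores) test PC performance sample over 100 years]

From these graphs we can investigate performance problems in many systems.

Once we have identified a system and traced the problem to the relevant (slow) code, we can start improving it. Most of the time I find that something is updating other things that don't all need updating.

For example, in the pre-update performance graph, scripted variables were one of the most expensive items. For saved variables/script variables, we narrowed down which variables are checked for updates to those that can time out, so instead of checking around 500,000 variables, each tick only needs to test a few hundred. This change made the category disappear from the graph entirely!

Rendering

Graphics are usually one of the most performance-demanding parts of a game, but even with the latest map update, Crusader Kings III is not significantly affected in this area. To improve graphics performance, we introduced an adaptive frame rate feature. This new setting benefits PCs with fewer than 10 processors by slightly lowering the rendering frame rate when there is little activity on screen. It is enabled by default in the low graphics preset.

Adaptive frame rate works together with the max FPS setting, which has a new default option called [Monitor Refresh Rate]. Previously, disabling VSync with no max FPS limit caused a significant performance drop, because the frame rate was uncapped by default. For the best performance and minimal rendering impact, enable adaptive frame rate and set max FPS to 30.

The updated map and other improvements increase the amount of detail the GPU has to process. This can lead to longer frame times and a lower screen update rate, which may make the interface feel less responsive. It does not affect the daily tick rate, however, because the CPU and GPU work in parallel. We plan to look into balancing this workload better, but for now this optimization has not been prioritized alongside the other planned improvements.

Complexity

Many optimizations are simple, straightforward, and easy to understand, but sometimes you find a seemingly small and harmless issue that leads you down a rabbit hole. There is a function that performs some checks and runs a script if the checks pass. The function is about 30 lines of code, but its performance problems were serious enough that fixing them took more than two weeks.

The slowest line was a call to IsScopeOK, a function that mainly checks that the script scope (the argument) matches what the trigger or effect expects. A few simple improvements had no effect, which suggested that the compiler was already doing those optimizations automatically.

The function itself is not inherently slow, but the sheer number of scripts executed every frame made it the single most expensive part of the whole game. The actual problem with this statement was that, to check the scope, every trigger generated a 128-bit flag field and compared it with a 128-bit flag field generated from the scope. Replacing this with a direct comparison of scope numbers improved performance significantly.

Unfortunately, triggers and effects can accept multiple types of scope, so the bit flag test is still needed as a fallback if the initial check fails, but this was already a big improvement.

In most cases triggers and effects return a scope number, but there is also the concept of links, which only returned scope bit flags. I ended up spending two days going through all of those classes and adding the missing functions.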
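
The idea behind that fix (try a cheap direct comparison first and fall back to the flag test only when needed) can be illustrated with a small Python sketch. The scope types, the `Trigger` class, and the function `is_scope_ok` are hypothetical stand-ins; the real code is C++ and its details are not reproduced in this diary.

```python
from enum import IntEnum

class Scope(IntEnum):
    CHARACTER = 0
    TITLE = 1
    PROVINCE = 2

class Trigger:
    def __init__(self, primary_scope, accepted_scopes):
        self.primary_scope = primary_scope
        # Bit mask of every accepted scope type, built once in the constructor
        # rather than regenerated on every check.
        self.accepted_mask = 0
        for scope in accepted_scopes:
            self.accepted_mask |= 1 << scope

def is_scope_ok(trigger, scope):
    # Fast path: one integer comparison covers the common case.
    if scope == trigger.primary_scope:
        return True
    # Fallback: a trigger may accept several scope types, so test the mask.
    return bool(trigger.accepted_mask & (1 << scope))

has_title = Trigger(Scope.CHARACTER, [Scope.CHARACTER, Scope.TITLE])
print(is_scope_ok(has_title, Scope.CHARACTER))  # takes the fast path
print(is_scope_ok(has_title, Scope.TITLE))      # takes the mask fallback
print(is_scope_ok(has_title, Scope.PROVINCE))   # not an accepted scope
```

The ordering matters more than the individual checks: the common case exits after a single comparison, and the more general test only runs for the rarer multi-scope triggers.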

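[IsScopeOk code block rewritten, with the cheaper checks placed at the top]

Triggers/effects also seemed to suffer from cache misses, and adding cache prefetching improved things further.

But that was only one line of the function. We have an in-game script profiler that is used extensively to improve script performance. The profiler is only supposed to cost performance while it is in use, so its cost here needed further investigation.

We need to record the file name and line number of the script, and the particular data type used was inefficient at storing that information; improving it helped significantly. Still, when the profiler is not in use, even this more efficient work is unnecessary. Using placement new instead of the default constructor made the performance of this line acceptable. Although time-consuming, every step of experimenting, testing, and profiling proved valuable, because the end result was a significant performance gain.

Simplicity

Sometimes making things simpler also makes them faster. Every character has many opinions of other characters, and the strength of each opinion changes over time. The system that tracks opinions cleverly cached the current opinion value and calculated the future date at which that opinion would change by 1 point.

To get the current opinion between characters, it only had to look in the cache, and every day it checked whether it was time for any opinion to change. The check ran on a sorted list, so only the opinions about to change needed to be considered. Unfortunately, after each step the new date had to be reinserted into the list, so this "improvement" became more and more expensive as we added more characters.

It turned out that the calculation is just a multiplication and an addition with a rate-of-change constant, so dropping the cache and calculating opinion on the fly is much faster than maintaining a sorted array.

Failure

Sometimes I try to optimize a function, only to find that the change is not worth it overall.

Looking up modifiers is a frequent operation during the daily update, so speeding up this code should bring significant gains. I tried two approaches, and both greatly reduced the function's execution time. However, those improvements were offset by an equal slowdown in other systems, with no overall gain in the end. Despite the considerable time spent on these code changes, they were ultimately discarded.

Performance from a Script and Design Perspective

Hi everyone, I'm Daan, Senior Programmer on CK3 and coordinator of the code department for the All Under Heaven project. I am also the design "feature owner" for the China region.

That puts me in a good position to talk about how we handle performance from a design and "script" perspective! To be clear, I did not personally do most of the script tweaking, but I provided tools and advice.

Optimizing Our Scripts

The game's script files are plain text and implement most of the game's mechanics, giving designers and modders a lot of power.

However, with that power comes some responsibility, and there is one clear drawback: our Paradox Script does not run as fast as equivalent C++ code! It is also easy to accidentally make something run slower than it needs to.

To deal with this, we have added some tools in recent versions for finding slow scripts. For example, the script profiler is now available in-game in developer mode.

We used these tools to go through nearly every script file in the game looking for performance gains. We reordered triggers, simplified structures, applied "equivalent but slightly faster" micro-optimizations, and made many other improvements. To modders who would prefer that we leave scripts as unchanged as possible, we apologize: we changed a lot of script during this optimization pass!

We also optimized the C++ side of the most frequently used triggers and effects. Sometimes we even created new triggers; a good example is the aptly named but slightly awkward [is_available_quick] trigger.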

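[Generated script documentation showing the "is_available_quick" trigger]

This new trigger lets us run several existing availability "state" checks in one go, because we found that we often need to check them together. It is logically equivalent to calling those checks separately, but under the hood it is a single "smarter" and faster block of C++ code.

Similarly, we added "shortcut" options for common filters to several script list builders, which speeds up scripts with nothing more than a text change.

Script Order Optimizer

We also researched and implemented an automatic script order optimizer, which reorders script execution based on weights so that execution can stop early, improving performance. However, we decided to keep this script optimizer disabled for now, because we found that its performance gains were not always consistent enough. This is a typical situation in performance work: one improvement can cancel out the gains of another! In this case, our other script optimizations had already claimed most of the gains this tool could deliver.

If the data shows that it is useful, we may still enable it! Either way, we will use it for future metaprogramming improvements to the script language.

If you are curious, you can enable a basic version of it by adding the command-line option ‘-script_optimizer’ when launching the game. I could write an entire dev diary about this tool and its options, but I'll save that for later.

Flexible Frequency

Sometimes we simulate things we shouldn't: things that don't matter, or things that don't change often for certain kinds of characters. Much of our script logic lacked convenient tools for intelligently adjusting how often the AI does things. For example, a baron in Iceland and the Emperor of China considered most character interactions at the same frequency.

So we changed several systems (interactions, decisions, and so on) so that AI consideration can be configured with a tier-based frequency. This not only gives us a statically defined way to determine how often a ruler of a given tier should consider something, but more importantly also lets us state when they should never consider it (a frequency of zero).

For example, we have "release_as_tributary_interaction" for nomadic, mandala, and Wanua rulers. Should a baron or count consider it? No, never; they are mechanically unable to do it. A duke? Maybe... but it isn't very important for them. Yet in our old interaction definition logic we had to run the check, and every ruler ran it every year. It is a relatively "expensive" script check.

With tier-based AI frequency definitions, we can eliminate that check and put this interaction in a category that counts and barons never check. ("Never" is where we can optimize easily!)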
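
A tier-based frequency table of this kind could look roughly like the following Python sketch. The tier names, the interval values, the ruler counts, and the helpers `should_consider` and `count_checks` are assumptions made for illustration; the game defines these in its own script format, which is not reproduced here. An interval of zero means "never", so the expensive check is skipped entirely for that tier.

```python
# Months between AI considerations per ruler tier; 0 means "never".
# Values are invented for this example.
RELEASE_AS_TRIBUTARY_INTERVAL = {
    "baron": 0,
    "count": 0,
    "duke": 60,
    "king": 12,
    "emperor": 12,
}

def should_consider(interval_by_tier, tier, months_elapsed):
    """Return True if a ruler of this tier should evaluate the interaction now."""
    interval = interval_by_tier.get(tier, 0)
    if interval == 0:
        return False  # never considered: no script check runs at all
    return months_elapsed % interval == 0

def count_checks(interval_by_tier, rulers_by_tier, months):
    """Count how many expensive checks run over the given number of months."""
    checks = 0
    for month in range(1, months + 1):
        for tier, count in rulers_by_tier.items():
            if should_consider(interval_by_tier, tier, month):
                checks += count
    return checks

# Hypothetical ruler counts per tier.
rulers_by_tier = {"baron": 5000, "count": 1500, "duke": 300, "king": 60, "emperor": 5}
yearly_for_all = sum(rulers_by_tier.values()) * 10  # old behaviour: everyone, every year
tiered = count_checks(RELEASE_AS_TRIBUTARY_INTERVAL, rulers_by_tier, months=120)
print(yearly_for_all, tiered)
```

With these made-up numbers most of the saving comes from the zero entries, since barons and counts are by far the most numerous rulers.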

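[AI frequency configuration for "release_as_tributary_interaction"]

As an important bonus, this also lets higher-tier AI rulers consider certain things more often without lower-tier characters dragging performance down, which makes the game AI more responsive where it matters.

We have applied this tier-based frequency configuration to nearly all character interactions, decisions, activities, and great projects: about 550 "things" in total that ruler AI considers doing.

Eliminating Irrelevant Nobodies

There is a commonly used mod for speeding up the game called "Population Control", which starts culling characters once the world population reaches a certain size. While we haven't done exactly that, we have been working on the underlying problem: the game accumulates a large number of characters who don't do much, and after a few hundred years there can be tens of thousands of them.

So we set out to reduce or eliminate many of the "sources" of boring, useless, random, or invisible characters. These characters linger in guest pools and on the fringes of baron and count courts, taking up space and hurting performance.

At the same time, we let more interesting characters with some backstory stick around longer and involve them in events more often, usually using them in place of randomly generated nobodies. This lets us remove the useless clutter that builds up in long-running games and reduces the load that even relatively minor non-ruler characters put on the simulation.

Conclusion

At the time of writing we are a few weeks from release, so let's see where we are. We initially measured a daily tick rate that was nearly 1.5 times slower, so we set a goal of not being much slower than the previous version.

We have greatly increased the size of the map, and with it the number of characters has grown considerably.

With only weeks to go until release, we are in the final stage of development and making daily adjustments. We have made significant performance optimizations, and the game now runs closer to the speed of the current live version than the target we had originally set. This has been confirmed on both the high-end and the low-spec PCs we use for testing.

We hope that once the dust settles on the final stretch of work, we can make the game run a little faster still.

Enjoy the game!

Updated Minimum Hardware Requirements

Next up is me, Jimmy, Technical Director at Studio Black, with some closing words about minimum hardware requirements.

First of all, I am amazed by the performance work the team has done on All Under Heaven, a technically very demanding expansion. As part of the performance evaluation for All Under Heaven, we revisited our minimum hardware requirements, in particular the CPU baseline. As mentioned in this dev diary, after months of optimization work we have resolved nearly all of the performance regressions that appeared early in development. From a large increase in daily tick duration early in development, we have reached the point where, even with the larger map and more complex simulation, the game performs on par with the current live version (with some variation in the early and late game). These tests were done on both high-end and low-end hardware.

In other words, this hardware spec update is not due to unoptimized code; it is there to ensure that all players can run the game stably. Extensive testing in our hardware lab showed that older CPUs such as the Intel i3-2120, released in 2011, behave unstably under high load, especially in systems with only 6 GB of RAM. That was our original minimum requirement in 2020, and for hardware that is nearly 15 years old it has had a respectable lifespan. But to ensure a consistent experience, we are now updating the minimum CPU recommendation to an Intel i5-750 or AMD FX-4300, paired with 8 GB of RAM. These are still modest by today's standards, but they provide the stable and predictable performance Crusader Kings needs today, five years after release.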