Restarting a process is about bringing it back to a stable, known state. From there, things can be retried. When the initialization isn’t stable, supervision is worth very little.
An initialized process should be stable no matter what happens. That way, when its siblings and cousins are started later on, they can be booted knowing that the rest of the system that came up before them is healthy.
If you don’t provide that stable state, or if you were to start the entire system asynchronously, you would get very little benefit from this structure that a try ... catch in a loop wouldn’t provide.
Supervised processes provide guarantees in their initialization phase, not a best effort.
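A minimal sketch of what such a guarantee can look like with a gen_server, assuming a purely local resource. gen_server:start_link/4 does not return until init/1 has returned, and a supervisor starts its children one at a time in the order of its child specs, so the next sibling is not started before this one is ready. The module name cache_owner and its ETS table are hypothetical stand-ins for whatever local state your process must guarantee.

```erlang
%% Hypothetical module: owns an ETS table that later siblings may rely on.
-module(cache_owner).
-behaviour(gen_server).

-export([start_link/0]).
-export([init/1, handle_call/3, handle_cast/2]).

start_link() ->
    %% Blocks until init/1 returns, so a supervisor calling this will not
    %% move on to the next child spec before the table exists.
    gen_server:start_link({local, ?MODULE}, ?MODULE, [], []).

init([]) ->
    %% Local, in-memory setup. If anything here fails, the process crashes
    %% and start_link does not return {ok, Pid}: initialization either
    %% succeeds completely or not at all.
    Tab = ets:new(?MODULE, [named_table, public, set]),
    true = ets:insert(Tab, {started_at, erlang:monotonic_time()}),
    {ok, Tab}.

handle_call(_Request, _From, Tab) ->
    {reply, ok, Tab}.

handle_cast(_Msg, Tab) ->
    {noreply, Tab}.
```

Once start_link/0 has returned {ok, Pid}, any process started afterwards can read from the cache_owner table without checking whether it exists.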
If, on the other hand, your database is on a remote host, you should expect the connection to fail. It’s just a reality of distributed systems that things go down.[8]
In this case, the only guarantee you can make in the client process is that your client will be able to handle requests, but not that it will communicate with the database. It could return {error, not_connected} on all calls during a net split, for example.
The reconnection to the database can then be done using whatever cooldown or backoff strategy you believe is optimal, without impacting the stability of the system. It can be attempted in the initialization phase as an optimization, but the process should be able to reconnect later on if anything ever disconnects.
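A minimal sketch of such a client, assuming the database is reached over a plain TCP connection with gen_tcp. The module name db_client, the request/2 function, and the backoff constants are hypothetical, and the actual database protocol is left out: requests are simply written to the socket. The connection attempt in init/1 is only an optimization, so init/1 always succeeds; while disconnected, calls return {error, not_connected}, and reconnection uses a capped exponential backoff.

```erlang
%% Hypothetical client: names and backoff bounds are illustrative only.
-module(db_client).
-behaviour(gen_server).

-export([start_link/2, request/2]).
-export([init/1, handle_call/3, handle_cast/2, handle_info/2]).

-define(MIN_BACKOFF, 100).       % ms
-define(MAX_BACKOFF, 30000).     % ms
-define(CONNECT_TIMEOUT, 1000).  % ms

-record(state, {host, port, sock = undefined, backoff = ?MIN_BACKOFF}).

start_link(Host, Port) ->
    gen_server:start_link(?MODULE, {Host, Port}, []).

%% Returns ok, or {error, not_connected} while the remote end is unreachable.
request(Pid, Payload) ->
    gen_server:call(Pid, {request, Payload}).

init({Host, Port}) ->
    %% Try to connect right away, but a failure here is not fatal:
    %% the process is considered initialized either way.
    {ok, connect(#state{host = Host, port = Port})}.

handle_call({request, _Payload}, _From, S = #state{sock = undefined}) ->
    {reply, {error, not_connected}, S};
handle_call({request, Payload}, _From, S = #state{sock = Sock}) ->
    {reply, gen_tcp:send(Sock, Payload), S}.

handle_cast(_Msg, S) ->
    {noreply, S}.

handle_info(reconnect, S) ->
    {noreply, connect(S)};
handle_info({tcp_closed, Sock}, S = #state{sock = Sock}) ->
    %% The remote end went away: drop back to the disconnected state.
    {noreply, schedule_reconnect(S#state{sock = undefined})};
handle_info({tcp, _Sock, _Data}, S) ->
    %% Responses are ignored in this sketch.
    {noreply, S};
handle_info(_Other, S) ->
    {noreply, S}.

connect(S = #state{host = Host, port = Port}) ->
    case gen_tcp:connect(Host, Port, [binary, {active, true}], ?CONNECT_TIMEOUT) of
        {ok, Sock} ->
            %% Success resets the backoff for the next disconnection.
            S#state{sock = Sock, backoff = ?MIN_BACKOFF};
        {error, _Reason} ->
            schedule_reconnect(S)
    end.

%% Retry later, doubling the delay each time up to ?MAX_BACKOFF.
schedule_reconnect(S = #state{backoff = Delay}) ->
    erlang:send_after(Delay, self(), reconnect),
    S#state{backoff = min(Delay * 2, ?MAX_BACKOFF)}.
```

Note that each connection attempt blocks the process for up to ?CONNECT_TIMEOUT; a real client might connect asynchronously instead, but the key property holds either way: the client's availability never depends on the database being up.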
If you expect failure to happen on an external service, do not make its presence a guarantee of your system. We’re dealing with the real world here, and failure of external dependencies is always an option.