Thinking about the file format of game is usually attacked after a game exists for a while. It is an after the fact activity that occurs piecemeal, some what unconsciously, as game elements are abstracted out to allow the game to be more data driven.
The abstraction usually hits a road block on behaviors. The problem is that most games are compiled, but complex behaviors are best solved with a programming language expression.
Enter scripting languages. This is the right idea, but it introduces a new set of issues. Now there are two languages, a fast compiled (and usually complex) language, and a simple interpreted language.
We still don’t have a file format that our game pipeline can load. We can’t interpret most of the code. The code is not written for the pipeline to understand.
This is a problem, and prevents powerful forms of automation. Between partial application, profiled guided compiliation/memory allocation, and lazy incremental computing I think all of the optimization that programmers apply by hand, can be automated. To do these types of optimization requires an understandable and predictable language to reason about. Object C is too big. Its to confusing. There are too many primitives.
Haskell is also too complex, but it is closer. One lesson learned from Haskell is that all the built-in data types should be used through an interface. Haskell learned this the hard way with the whole String vs Text/Bytestring saga. The specific implementations would be choosen by the compiler from profile guided optimizations. Whether to use a list for the list interface or a vector or fixed array is determined by how the object is used in the code. This is something that can at least semiautomated.
I don’t know the right language to write games in. It is whatever language is the most productive all the way through optimization. This includes the community.
The question of what language to pick is best posed like this: “If we were hiring as fast possible what rate could we aquire employees from the community, and how many games would be completed from that the community and language combo.”
So there are two reasons to have a simple language. One is for optimizations, the other is for hiring employees.
I don’t know what the answer is though.
Back to the format of a game. Its simple.
data Game = Game Code Data
I have been talking about the Code part. Its some language we can perform the optimization I talked about. Its this unicorn of a language. I am assuming that it exists for the rest of the post and it requires the automatic optimization I talk about earlier. I’ll go into more details.
So the data breaks into two things. First there is the data that is never mutated. This where the art assets go for the most part. This our readonly data with the ability to add local contexts. It is the Reader Monad. The other parts of the data are the mutable data. This is all the tweaks that producers can do. Its our state monad. So the data creates an environment that game executes in. The Writer is where all computationally irrelevant output goes like metrics. I will leave it out now.
type ReflectableEnvironment = Dictionary String ReflectableType
type ImmutableEnvironment = ReflectableEnvironment
type MutableEnvironment = ReflectableEnvironment
newtype Environment = RWST ImmutableEnvironment MutableEnvironment ()
This split makes it easy to partially apply at any point in execution by using whatever data is in the reader monad to make a new function. This would be equivalent to replacing the update loop with a new one specialized to all the available readonly data. Think you fixed sized preallocated arrays based off the data.
To make this concept efficient would require that not only the data storing the readonly be amenable to incremental updates, but that the structure for storing the Code be incrementally updateable also. Think edit and continue. Specializing the program must be able to be done incrementally.
I have no doubt that the algorithm can be done in polynomial time, I think depending how the specializer works it might be sublinear if whole program optimizations can be made incremental.
Lets assume that incremental specialization is cheap. Then you could have the client perform its own specializations. So if a users mutable state hasn’t changed in x years, or whatever threshold was optimal, we could specialize using the all of the users slow changing data. So each user would have a game that was optimal for their data based on the probability of the data mutating.
If some of the mutable data was specialized away, and was mutated after specialization, that would require a latency hit as the game would need to be recreated. It might be impossible to ever make this fast enough, to happen regularly, but if the likelihood was once a year, or whatever we set, we could just perform the operation in the background and let the user know that it is updating. The optimal threshold would need to be tweaked between the speed of the optimizations and the latency of implementing them.
There is no way you do this stuff by hand. I think partial evaluation is the future. The trick to make it efficient is to rethink all file formats that the game uses, or more the simply, the one file format of the whole game, as a list edits. The list of edits are passed to a specializer along with an evaluator. A JIT. A tracing JIT that logs long runs of user behavior and can process it in the background.
So the trip to incremental algorithms is just pass in the changes. There is another way to about it that is less concrete but makes it clearer what the process to optimize is.
There is someway to fit the data to a function. The edits are the changes of the parameters of the function. This is not what I think should be done. It is too hard (MAXEnt seems equivalent, although it goes the other direction. Creates a functional that optimizes entropy given constraints. The nonlinear solutions from MAXEnt might not be suitable for games, but maybe could work in the pipeline). Instead as a stop gap that is still pretty efficient, is to use the data type definition to generate a edit list that is optimized for general plane trees. This is the gdiff library on Hackage.
So for any data type we can generate a reasonably efficient edit list for our entire game. In this way we can stream the game. Enough for now.