## Research Problems Related to Theorema

### RISC-Linz, Austria

The Theorema system is developed at RISC-Linz by the Theorema group, under the leadership of Bruno Buchberger. This document is companion to the abstract of the tutorial Exploring Computer Mathematics with Theorema.

• #### Strategies for proving in natural style:

After some training and experimentation (as we did during the development of the Theorema system), it becomes relatively easy to detect and implement inference rules in natural style. However, combining these rules in a straightforward manner leads to inefficient provers (due to combinatorial explosion) and sometimes to unnatural proofs. The research problem is to find appropriate strategies for combining these inference rules, such that the provers are efficient and the proofs are natural and compact. The problem applies to provers in general logic (propositional, predicates of first order and higher order) as well as to specific domain provers (set theory, tuples, natural numbers, integers, limit domains). For instance, it is interesting to investigate whether results obtained by an efficient prover (e.g. resolution-based) can be used to guide the natural-style prover.

• #### Using meta-variables in natural proving

One may identify to styles of proofs: the presentation style, and the discovering style. Let us illustrate this by an example: We want to prove: "The sum of two convergent sequences f and g is also convergent." The proof will start by: "Let a, b be the limits of f, g, respectively."

A proof in presentation style will continue with: "We prove now that the limit of f+g is a+b." The prover guesses (like by magic) which is the appropriate limit, and later in this proof it will also guess good terms like e/2, max(M,N), in the same magic way. In fact, these terms can be discovered only by performing the proof by some other method. However the proof is enough to convince somebody about the truth of the sentence.

A proof in discovering style will (in contrast) continue with: "We must find a value c which is the limit of f+g." The second prover will later introduce more such "must find" variables, which are names for terms which we do not know yet, and at certain moments in the proof it will figure out which are the appropriate values of these unknowns. This proof is also convincing, although a bit more difficult to follow, however it also shows the reader how to find the appropriate terms. (The reader may later use similar techniques in other proofs.)

These temporary unknowns are usually called "meta-variables", and they occur in natural style proofs and also in tableaux methods. The research problem is to find efficient methods/strategies for combining inference rules which introduce/solve meta-variables with other natural style inference rules, such that the proofs obtained are natural and compact. The problem is relevant both for provers in general logic as well as for provers in specific domains - which suggests that it may be possible to find strategies which are valid for any set of inference rules. The experiments we did in the frame of the Theorema project show that a straightforward adaptation of the techniques used in tableaux proving leads to inefficient methods (large search space) and unnatural proofs (wrong order of inference steps). On the other hand, certain specific strategies can be both efficient and natural for certain types of problems (e.g. S-decomposition for limit theorems).

• #### Proving in inductive domains and generating natural conjectures

The induction principle of any elementary inductive domain (e.g. naturals, tuples) is simple to implement and also quite natural. However, for particular proofs specific variants of the basic induction principle may be necessary, or specific strategies to reduce the proof to this basic principle. Moreover, during more complex proofs in natural style, one encounters subgoals which contain only ground terms, but it is possible to prove them only by generalization (replace some constants by universal variables) and induction (possibly on part of the subgoal). This last operation is usually seen as conjecture generation and proving. The research problem consists in finding efficient and natural methods for finding the appropriate induction principle depending on the problem at hand, and for generating useful and compact conjectures. A significant amount of research has been done on induction in general, however there is still need to adapt and experiment with these techniques in the context of natural style proving.

• #### Equality reasoning in natural style, use of sequence variables

While very powerful tools - notably [hyper]paramodulation and Knuth-Bendix completion have been developed for this problem, they are not yet fully applied in the context of natural style proving. For instance, natural style proving is goal oriented, while critical pair completion just infers new equalities. Moreover, in many natural style proofs one needs to alternate equality rewriting steps with logical steps: thus not all the equalities are present at a certain stage in the proof. On the other hand, paramodulation produces proofs which are very difficult to understand. For instance, a proof in which many assumptions are negated equalities is neither very natural nor very readable.

Moreover, in the Theorema project we noticed that sequence variables are very useful for natural expression of many mathematical statements, especially in the domain of tuples. A sequence variable is a symbol which stands for the occurrence of an arbitrary number of terms (in contrast to a normal variable which stands for the occurrence of a single term). For instance, if x is a usual variable and X is a sequence variable, then the term [x, X] stands for [x1], for [x1, x2], for [x1, x2, x3], etc., thus for any tuple with at least one element. One may then write definitions like: "For any x, y and any sequence Z, the list [x, y, Z] is sorted if and only if x is greater than y and [y, Z] is sorted."

The research problem consists in finding efficient strategies for equality reasoning in the context of natural style proving (possibly also by adapting Knuth-Bendix completion and paramodulation), and to adapt these techniques, and/or invent others, for the formulae which contain sequence variables.

• #### Proof presentation in user-oriented style

In the Theorema system we want to present the proofs to the user such that he can understand them. However, different types of users in different situations may need different types of presentations. Sometimes the proofs have to be published on a more (e.g. printed) of less (e.g. internet) fixed media. This raises problems like: presenting the tree of the proof in a more or less linearized form, references through labels or links, hiding or showing certain details.

Even when the media is very interactive (ideally the Theorema system), sometimes the user only needs to follow the finished proof, sometimes he also needs to interact with the proof in order to develop it. For the former situation we experimented with the technique of focus windows, which shows only the relevant formulae at each step, for the later we are developing an interactive frame, which allows navigation and controlled expansion of the proof tree.

The research problem consists in finding and implementing the most appropriate presentation techniques for different concrete situations. This also includes experimentation with [numerous] concrete users, and has a certain impact on the overall design of the system: for instance, the information provided by various provers has to be standardized such that it can be used by any presentation engine.

• #### Other research problems

For the following problems I give here only the headlines, mostly because we just recently began to experiment in the Theorema system with these matters. However I plan to substantiate them during the tutorial.
• Retrieval of information in Mathematical Knowledge Bases.
• Practical program verification: functional vs. imperative.
• Program synthesis.
• Representation of proving, computing, and solving steps in the "proof" tree.
• User language for specification of inference steps and provers.

Throughout the tutorial, we will give particular emphasis to refining and explaining the research problems described above.