The Basis

Introduction


Current programming technology has a number of problems that natural-language programming in general, and Pegasus in particular, could eliminate:


Major problems of present programming technologies and computer languages

1. The mental problem: A developer has to restructure his or her thoughts to fit the world model of a specific programming language instead of expressing them directly to the computer. This causes losses. When writing a C# program, for example, we tend to think in classes, attributes and methods right from the beginning, all of them entities that do not occur in the real world, and thereby strongly restrict our creativity.


2. The programming language problem: The same algorithms must be implemented over and over again in new programming languages, although the idea behind them does not change over time. "Bubble sort", for example, was formerly written in assembly language, is now written in Java and will soon be written in yet another language, but it remains the same algorithm.


3. The documentation problem: Developers must read and write comments and documentation to explain the code they create. The code itself is not expressive enough to be self-explanatory to other programmers, possibly not even to those involved in the same project. In addition, documentation is often written in a language that is not the author's native language. This is a source of misunderstandings, mistakes and inefficiencies, for instance because the documentation is omitted completely or because wrong expressions are used.

4. The technical problem: Instead of dealing with the genuinely creative problems of a program, developers lose a lot of time on technical inconsistencies, for example choosing the correct character set, network protocol or database format.


Solutions and hypotheses

Natural-language programming with Pegasus could solve the problems mentioned above:

1. Developers could express their program ideas directly in a natural language without costly restructuring.

2. Since a natural language usually remains unchanged over long periods of time, programs written in it remain valid as well. The programs would not have to be rewritten to adapt them to the current state of technology; instead, the compilers for natural-language programming would evolve without developers having to notice.

3. Developers could write comments and documentation in their native language, and other developers could continue working with them, because they would be translated automatically into their own native language. The need for comments itself would be reduced, because the programs would already be written in natural language.

4. Pegasus admits only standard formats, for example Unicode.


Taking the above considerations into account, we base our work on the following three hypotheses, which we hope will be corroborated by the realization of the program system "Pegasus" and the implementation of numerous application examples.


1. With natural-language programming languages, applications can generally be developed more effectively, in particular by using the context dependency, referencing mechanisms and compression mechanisms inherent in natural language.

2. Programs written in natural-language programming languages are easier and faster to understand than programs written in conventional programming languages.

3. Natural-language programming languages are easier to learn than non-natural-language programming languages.



The idea language

Natural language, or more specifically human thinking, must be modeled in some way to be usable in Pegasus. The idea notation was designed for this purpose. With its help, atomic ideas (ideas that arise directly from our senses, for example "red" or "loud") can be combined into complex ideas such as "wood" or "table", and these complex ideas in turn into composite ideas, which represent thoughts like „the table is brown.“ (Table, brown). The ideas themselves stand above any specific natural language, because they can be expressed in different natural languages. Although these ideas are individual to every single person, they are nevertheless quite similar among people. This similarity is what makes communication possible in the first place, and it is also the basis for a computer system with general knowledge.


Characteristics of the natural language

Among other things, natural language differs from present computer languages in the following three points, all of which aim to make information exchange as efficient as possible:

1. Frequent use of implicit references, for example pronouns like "he" or "she", adjectives like "the last", direct references like „the string“, or partial sentence references like „Yes, this is correct.“.

2. Use of compression, both syntactic compression like „Print the matrix's lines and columns.“ and semantic compression like „Go through the set from left to right and vice versa.“.

3. Context dependency, as in „Take a list. Add the number 3 to the end. Take the name of the list. Add the number 3 to the end.“, where the sentence „Add the number 3 to the end.“ has a different meaning in each case.
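The context dependency described above can be illustrated with a small Java sketch. It is not taken from Pegasus: the class `ContextDemo` and the method `addToEnd` are hypothetical names, and the sketch only assumes that the entity currently in focus determines how the same instruction is interpreted.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

final class ContextDemo {
    // Hypothetical helper: interprets "Add the number 3 to the end."
    // relative to whatever entity is currently in focus.
    static Object addToEnd(Object focus, int number) {
        if (focus instanceof List) {
            // Focus is a list: append the number as a new element.
            List<Object> extended = new ArrayList<>((List<?>) focus);
            extended.add(number);
            return extended;
        }
        if (focus instanceof String) {
            // Focus is a name (a character string): append the digit as text.
            return (String) focus + number;
        }
        throw new IllegalArgumentException("Cannot add to the end of: " + focus);
    }

    public static void main(String[] args) {
        List<Object> list = new ArrayList<>(Arrays.asList(1, 2));
        String nameOfList = "numbers";
        // The same instruction, applied in two different contexts.
        System.out.println(addToEnd(list, 3));
        System.out.println(addToEnd(nameOfList, 3));
    }
}
```

In the first call the number becomes a new list element; in the second it is appended to the characters of the name, which is the difference in meaning the example sentence pair relies on.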

[Figure: labels Understanding, Short-term memory, Generate, Verbalize, Natural language (German, English), Programming language (Java), Read, Long-term memory, Brain]

Further research in the area of the aforementioned Story-Telling systems [039] would be desirable. If related works exist, or if works currently under research appear later, I will initiate an exchange where this could be profitable.


Altogether, there have been only a few attempts at pure natural-language programming. In our opinion, this is mainly because, at the time such systems were developed, technical prerequisites that are indispensable for a natural-language programming system were not yet available, for example efficient parsing of natural language, semantic data formats, and sufficient computing and storage capacity.


Discussion


Problems with natural-language programming could arise from the fact that natural language often offers different ways to express the same thing, for example „Write a string.“ or „Display a string.“. However, the number of such semantically equivalent expressions is strongly limited and can be enumerated. In addition, natural-language expressions are often ambiguous; this can be resolved by follow-up questions from the system, such as „With the word "it", do you mean the set or the string?“.
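Both remedies discussed here can be sketched in a few lines of Java. The sketch is only an illustration under assumptions: the class `AmbiguityDemo`, its lexicon contents and its method names are hypothetical and do not describe the actual Pegasus implementation.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;

final class AmbiguityDemo {
    // Hypothetical lexicon: semantically equivalent verbs map to one idea.
    private static final Map<String, String> VERB_TO_IDEA = new HashMap<>();
    static {
        VERB_TO_IDEA.put("write", "writing");
        VERB_TO_IDEA.put("display", "writing");
    }

    // Returns the idea for a verb, or null if the verb is unknown.
    static String ideaFor(String verb) {
        return VERB_TO_IDEA.get(verb.toLowerCase(Locale.ROOT));
    }

    // Returns a follow-up question if a word has several candidate
    // referents, or null if no question is needed.
    static String clarifyingQuestion(String word, List<String> candidates) {
        if (candidates.size() <= 1) {
            return null;
        }
        return "With the word \"" + word + "\", do you mean the "
                + String.join(" or the ", candidates) + "?";
    }

    public static void main(String[] args) {
        System.out.println(ideaFor("Write").equals(ideaFor("Display")));
        System.out.println(
                clarifyingQuestion("it", java.util.Arrays.asList("set", "string")));
    }
}
```

A finite table is enough for the synonym case precisely because the set of equivalent expressions is limited; the ambiguity case instead hands the decision back to the user.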


Formal notation is sometimes preferable to linguistic description, for example in mathematics. Hence, a natural-language computer language should additionally support formal notation, so that the most suitable means of expression is always available and users can decide for themselves when to use formal notation and when not.


Conclusion


In the author's opinion, the following can be said in conclusion: object-oriented languages have already reached the peak of what is possible with this programming paradigm. They are the result of a long development that began with assembly languages and led to programming languages that came closer and closer to human thinking. One may suppose that this development will continue in the future. Natural-language programming could play a crucial role here; at the very least, it will be of interest for further research.


Pegasus will be the first theoretically founded and practically applicable system of this kind. We therefore hope to noticeably extend scientific knowledge in the area of software technology and computer languages, and also in related areas such as computational linguistics.

The technology


Pegasus reads a natural-language text and generates a working program out of it, which implements the ideas contained in the text.


Architecture

Resolving of idea language expressions

Pegasus reads an input sentence and generates an equivalent expression in the idea language. The example „If the first element of the second row of the matrix is smaller than 3 then write "I can understand you!".“ would correspond, for example, to the following expression in the idea language:

(
   (
      ((be, action), normal mode, present tense, predicate),
      ((element, entity), (first, positive, property),
         ((row, entity), (second, positive, property),
            ((matrix, entity), single, entity reference,
               reference, normal phrase, phrase),
            single, entity reference, reference, normal phrase, phrase),
         single, entity reference, reference, normal phrase, phrase, subject),
      (smallness, comparative, property),
      ((three, entity), single, entity reference, reference,
         comparation phrase, phrase, object),
   statement clause, clause, first),
   (
      ((writing, action), normal mode, present tense, predicate),
      (direct personal reference, reference entity, subject),
      (("I can understand you!", character string symbol),
         symbol phrase, phrase, object),
   command clause, clause, second),
condition sentence, sentence)
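An expression like the one above is a nested tuple, so it can be held in a simple tree structure. The following Java sketch is an assumption about one possible representation, not the data structure Pegasus uses; the class `IdeaExpression` and its methods are hypothetical. The method `lastTag` relies on an observation from the example only, namely that the last element of each tuple names its category or role.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

final class IdeaExpression {
    private final String atom;                 // set for a leaf, otherwise null
    private final List<IdeaExpression> parts;  // children of a composite

    private IdeaExpression(String atom, List<IdeaExpression> parts) {
        this.atom = atom;
        this.parts = parts;
    }

    static IdeaExpression atom(String name) {
        return new IdeaExpression(name, Collections.<IdeaExpression>emptyList());
    }

    // Builds a composite; plain strings are wrapped as atoms.
    static IdeaExpression of(Object... items) {
        List<IdeaExpression> parts = new ArrayList<>();
        for (Object item : items) {
            parts.add(item instanceof IdeaExpression
                    ? (IdeaExpression) item
                    : atom(String.valueOf(item)));
        }
        return new IdeaExpression(null, parts);
    }

    // Assumption from the example: the last element names the category or role.
    String lastTag() {
        if (atom != null) {
            return atom;
        }
        return parts.isEmpty() ? null : parts.get(parts.size() - 1).lastTag();
    }

    // Renders the tree in the parenthesized notation used in the text.
    @Override
    public String toString() {
        if (atom != null) {
            return atom;
        }
        StringBuilder sb = new StringBuilder("(");
        for (int i = 0; i < parts.size(); i++) {
            if (i > 0) {
                sb.append(", ");
            }
            sb.append(parts.get(i));
        }
        return sb.append(")").toString();
    }

    public static void main(String[] args) {
        // The object phrase "3" from the example sentence.
        IdeaExpression three = of(of("three", "entity"), "single",
                "entity reference", "reference", "comparation phrase",
                "phrase", "object");
        System.out.println(three);
        System.out.println(three.lastTag());
    }
}
```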


The necessary lexical knowledge would be stored in a dictionary that contains all grammatical forms of the words, as well as the mapping from the words of each supported language to their ideas. An entry could look like this:


Once an input sentence has been expressed in the idea language, Pegasus tries to determine its meaning. An idea-language expression would be resolved step by step from the outside inwards and compared with the meanings stored in the internal library. The example sentence would match, for example, the following entry in the library:

This entry stores the instructions that produce exactly the intended behavior of a causal connection in the corresponding Java program, namely the if-then statement.
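The lookup step can be pictured as a table from sentence types to code templates. The Java sketch below is purely illustrative: the class `MeaningLibrary`, the template syntax and the method `generate` are hypothetical, and the sketch does not reproduce the format of a real Pegasus library entry. It assumes the condition and the command have already been resolved to Java fragments.

```java
import java.util.HashMap;
import java.util.Map;

final class MeaningLibrary {
    // Hypothetical library: outermost sentence type -> Java code template.
    private final Map<String, String> templates = new HashMap<>();

    MeaningLibrary() {
        // %1$s is the resolved condition, %2$s the resolved command.
        templates.put("condition sentence", "if (%1$s)\n   %2$s");
    }

    // Fills the template for the given sentence type with resolved parts.
    String generate(String sentenceType, String... resolvedParts) {
        String template = templates.get(sentenceType);
        if (template == null) {
            throw new IllegalArgumentException(
                    "No library entry for: " + sentenceType);
        }
        return String.format(template, (Object[]) resolvedParts);
    }

    public static void main(String[] args) {
        MeaningLibrary library = new MeaningLibrary();
        System.out.println(library.generate(
                "condition sentence",
                "matrix74[1][0] < 3",
                "System.out.println(\"I can understand you!\");"));
    }
}
```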


With the help of the short-term memory, a queue that internally stores the last seven entities used, references like "the number" or "he" would be resolved.
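A bounded queue of this kind can be sketched in Java as follows. The class `ShortTermMemory` and its method names are hypothetical, and two details are assumptions rather than statements from the text: that using an entity again moves it back to the front, and that a reference like "the number" resolves to the most recently used entity of that kind.

```java
import java.util.ArrayDeque;
import java.util.Deque;

final class ShortTermMemory {
    static final int CAPACITY = 7;  // "the last seven entities used"

    private static final class Entity {
        final String kind;  // e.g. "matrix", "number"
        final String name;  // e.g. the generated variable name

        Entity(String kind, String name) {
            this.kind = kind;
            this.name = name;
        }
    }

    // Most recently used entity is at the head of the deque.
    private final Deque<Entity> recent = new ArrayDeque<>();

    // Records that an entity was used (assumption: re-use moves it to the front).
    void use(String kind, String name) {
        recent.removeIf(e -> e.name.equals(name));
        recent.addFirst(new Entity(kind, name));
        if (recent.size() > CAPACITY) {
            recent.removeLast();  // forget the oldest entity
        }
    }

    // Resolves a pronoun like "it" to the most recently used entity.
    String resolvePronoun() {
        Entity e = recent.peekFirst();
        return e == null ? null : e.name;
    }

    // Resolves a reference like "the number" to the latest entity of that kind.
    String resolveByKind(String kind) {
        for (Entity e : recent) {
            if (e.kind.equals(kind)) {
                return e.name;
            }
        }
        return null;  // nothing of that kind is remembered any more
    }

    public static void main(String[] args) {
        ShortTermMemory memory = new ShortTermMemory();
        memory.use("matrix", "matrix74");
        memory.use("number", "number12");
        System.out.println(memory.resolvePronoun());
        System.out.println(memory.resolveByKind("matrix"));
    }
}
```

Because the capacity is fixed, an entity that has not been mentioned for a while can no longer be referenced implicitly and would have to be named again.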


Generation of the output program

Afterwards, the output program would be generated out of the resolved meanings in the library. For the short example program:


Take the matrix ([2, 2, 1], [1, 4, 3]).
Print it.
New line.
If the first element of the second row of the matrix is smaller than 3 then write "I can understand you!".


this will look as follows:


// Take the matrix ([2, 2, 1], [1, 4, 3]).
long[][] matrix74 = new long[2][3];
matrix74[0] = new long[]{2, 2, 1};
matrix74[1] = new long[]{1, 4, 3};

// Print it.
System.out.print("(");
for (int i7 = 0; i7 <= 1; i7++)
{
   System.out.print("[");
   for (int i8 = 0; i8 <= 2; i8++)
   {
      System.out.print(matrix74[i7][i8]);
      if (i8 < 2) System.out.print(", ");
   }  
   System.out.print("]");
   if (i7 < 1) System.out.print(", ");
}
System.out.print(")");

// New line.
System.out.println("");

// If the first element of the second row of the matrix is
// smaller than 3 then write "I can understand you!".
if (matrix74[1][0] < 3)
   System.out.println("I can understand you!");


Pegasus could now also express the example in German, by using German grammar to map the corresponding idea-language expression to the syntactically correct sentence:

Wenn das erste Element der zweiten Zeile der Matrix kleiner als drei ist, dann schreibe „I can understand you!“.



Related works


Since the beginnings of computer science, there have been computer languages that make extensive use of natural language. However, these are not natural-language computer languages in the true sense, because they are not based on a subset of natural language, i.e. on a limited but correct syntax and a limited but correct lexicon, but merely add a natural-language supplement to formal computer languages. COBOL, FORTRAN and BASIC, among others, belong to these natural-language-supplemented computer languages; more recent examples are KlarDeutsch and AppleScript [010]. KlarDeutsch is a commercial programming language supplemented with natural language that is used in machine control. AppleScript is a scripting language for writing simple programs for Mac OS; earlier versions of AppleScript were even available in several languages. Natural Java [045] is a language that can generate Java code with the help of natural language. However, its use of natural language is limited to a pure mapping of the Java structure, for example with commands like "I would like to create a public method that is named XYZ ...".

Since the eighties there have also been attempts at purely natural-language programming languages. The first of these is the language NLC [054, 006], which allows programming in natural English in the restricted domain of matrix operations. A more recent and interesting attempt is the Metafor prototype [039], which treats a program as a story to be told, in the style of Story-Telling systems [034], with sentences like „Pacman is a character who loves to run through a maze and eat dots.“. Besides these scientific projects there is also an initiative by private individuals: „Osmosian“ [049].

Pegasus transcends this by integrating all features of the mentioned systems: