Wednesday, September 8, 2010

Genifer: Facts and Rules Visualization

depth=3, 2D

depth=4, 3D


This interactive visualization is created by a GeniferGraph adapter class which exposes GeniferLisp's internal data to dANN.  dANN then uses the graph representation to dimensionally embed it in N-dimensional space.

Here is the source code for the GeniferGraph class.  Notice that it uses recursive descent to add all LISP symbols to a finite depth.
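As a rough sketch of that recursive descent (stand-in types only -- not the committed class or the real dANN/ABCL APIs):

    import java.util.*;

    // Hedged sketch of GeniferGraph's recursive descent; the node and edge
    // containers stand in for the real dANN graph classes.
    class GeniferGraphSketch {
        final Set<Object> nodes = new HashSet<Object>();
        final List<Object[]> edges = new ArrayList<Object[]>();  // {parent, child} pairs

        // Add a LISP expression and its sub-expressions, down to a finite depth.
        void addTerm(Object term, int depth) {
            if (depth <= 0 || !nodes.add(term)) return;  // depth limit; skip already-seen symbols
            if (term instanceof List) {                   // treat lists as cons expressions
                for (Object child : (List<?>) term) {
                    edges.add(new Object[]{term, child}); // edge: expression -> sub-expression
                    addTerm(child, depth - 1);            // recursive descent
                }
            }
        }
    }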


Wednesday, September 1, 2010

Genifer Logic Graphs Visualization

Visualized by creating a dANN directed graph from the ABCL (Armed Bear Common Lisp) data structures, and visualizing it with SpaceGraph-J2 + dANN Hyperassociative Map.




Tuesday, August 31, 2010

Self-Reflective Interactive Realities


A.I. SpaceGraph

another reason for going directly to JOGL is that i see an opportunity to make an AI scenegraph
a scenegraph is just a framework for managing what drawing commands are performed
so what i'm saying is that certain parts of dANN could form that framework
which is why i wondered if dANN had anything like a quadtree or octree
it seemed similar to that VectorSet or VectorMap we were discussing
for using GraphDrawing to find contents in a hypersphere
a scenegraph is usually a DAG
i think a more appropriate term is 'SpaceGraph'
not scene
described here http://automenta.com/spacegraph

How does space graph differ from scene graph?

scenegraph is just a term used in computer graphics
i'm making a connection between AI and 2d/3d computer rendering
the other side of it, is input sensing
the mouse cursor for example (and multitouch too) generate a sequence of events
and changes in state
the input logic too could benefit from AI

2D Prototype

just working on a 2D model right now
which would be suitable for graph drawing (and interaction)
it does zooming and panning and now i can do intersection with rectangular areas

what i'm saying is that we can do even better than ordinary ‘scenegraphs’ potentially
whether it optimizes it or replaces it entirely
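for reference, the rectangular-area intersection is just the standard AABB overlap check (illustrative names here, not the actual SpaceGraph code):

    // Illustrative axis-aligned rectangle test -- not the actual SpaceGraph API.
    class Rect {
        double x, y, w, h;
        Rect(double x, double y, double w, double h) { this.x = x; this.y = y; this.w = w; this.h = h; }

        // True when the two rectangles overlap (standard AABB test).
        boolean intersects(Rect o) {
            return x < o.x + o.w && o.x < x + w &&
                   y < o.y + o.h && o.y < y + h;
        }
    }

selecting the nodes in a region then reduces to filtering node bounds through intersects(query).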


Why focus on user-interfaces rather than AI “work”?

because the graph drawing stuff could form the basis for a new desktop UI
and as a result, apps like netbeans would be obsolete
because we could in essence program directly into it
i'm thinking deeply about it because it may hold a chance of accelerating development, in the long long term
if the system can reason about its drawing commands, its input sense, and what it is working with, it will be completely reflectively AI
then the entire system state can be persisted in graphs and shared over the network
no need for the file system, trac, etc anymore
Git might be a good versioning system to use for serialized graphs
(there is JGit pure java impl of Git which can be used to make a virtual filesystem interface)
for example, code and issues would be linked explicitly
so realizing all this, i'm trying to make a simple 2D interactive framework that dANN can hook into this way
for at minimum, graph layout, text input, and drawing connections between things
i mean a user-inputted drawn connection
like sketches or wires
i think i've found a minimum list of UI functionality necessary to achieve this
full graph reflexivity
even typing the letter 'a' would start to bring up associations
'...is the first letter of English alphabet'


Dont you think for that to happen the AI needs to reach a more mature level?

everything necessary exists right now. it's not really AI, just designing it in a way that it can be utilized by AI components like graphs, neural networks, genetic algorithms, etc
then as soon as we can construct those AI components from within the system, we can accelerate experimentation
because all of the results will be right there


Why move to JOGL instead of Java3D?

for interactive graph panning, zooming, and clicking buttons
JOGL is more direct
Java3D's scenegraph is nearly made obsolete when compared with Ardor3D or JME3
some of its terminology is really unwieldy, like 'viewplatform' etc
anyway Ardor3D and JME use JOGL as their underlying OpenGL interface
https://jogl.dev.java.net/

Saturday, August 28, 2010

Integrating Logic and Machine Learning


"We may be utilizing machine learning 'too little, too late'.  (IE, we're using too little ML and using ML not early enough).  Thus we may be already going down a slightly wrong path...  we should correct this by taking ML seriously and try to find the best way to incorporate ML in our architecture..." --YKY

interfacing with dANN has been my goal while designing Genifer's Java layer.  when it's finished, it will allow us to combine dANN's components with Genifer concepts.

dANN provides several "machine learning" (and related) components:
  • dynamic bayesian networks
  • many types of neural networks
  • dimensional embedding aka graph drawing
  • graph metrics
  • etc...
Genifer's logic can remain in LISP -- Java may not be allowed to interfere -- but the concepts will be referenceable

LISP will be able to bind formulas to Java method calls.  for example, using graph drawing:

List<Atom> getNeighbors(similarityGraph, center, radius)
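Filled out with illustrative types (guesses at what the binding would need, not a committed dANN signature), the binding target might look like:

    import java.util.List;

    // Illustrative only: Atom and Graph stand in for the corresponding Genifer/dANN types.
    interface Atom {}
    interface Graph {}

    interface SimilarityQueries {
        // return the atoms embedded within `radius` of `center` in the drawn graph
        List<Atom> getNeighbors(Graph similarityGraph, Atom center, double radius);
    }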

so now that we know this, the question becomes: how do we create integrative designs that utilize Genifer and dANN as (internal) components?

well we could create many types of integrative designs.  though we may have a chance to evolve them: we can declaratively instantiate new instances of the system inside itself.

http://picocontainer.org/introduction.html


"Constructor Injection is a Dependency Injection variant where a component gets all its dependencies via its constructor."


"The most important benefits of Constructor Injection are:

  • It makes a strong dependency contract
  • It makes testing easy, since dependencies can be passed in as Mock Objects
  • It is very succinct in terms of lines of code
  • Classes that rely on Constructor Injection are generally Good Citizens
  • A dependency may be made immutable by making the dependency reference final"


Systems like PicoContainer and the Spring Framework are used to declaratively specify "wirings" of component dependencies.  This amounts to a hierarchical property tree listing the interfaces, implementation classes, and constant parameters needed to fill a "container" aka "context".
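As a hypothetical, Genifer-flavored example of Constructor Injection (all names invented for illustration):

    // Hypothetical example of Constructor Injection -- class names are invented.
    class ReasoningService {
        private final KnowledgeBase kb;              // dependency made immutable via final
        private final InferenceStrategy strategy;

        // Dependencies arrive only through the constructor: the "strong contract".
        ReasoningService(KnowledgeBase kb, InferenceStrategy strategy) {
            this.kb = kb;
            this.strategy = strategy;
        }
    }
    interface KnowledgeBase {}
    interface InferenceStrategy {}

A container like PicoContainer is then told which implementations satisfy KnowledgeBase and InferenceStrategy, and it builds ReasoningService by filling that constructor.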

more later on this...

Tuesday, August 24, 2010

Why GUIs are Important in AI Design

what IDE do you use to develop software?  i hope it's not exclusively command-line tools.  modern IDEs provide a lot of convenience which improves the quality and speed of development: syntax highlighting, code completion, visual debugging, etc

so if you're using an IDE like NetBeans or Eclipse, then your interface to Genifer, for example, is ALREADY through a GUI (though indirectly).

if you understand this, you'll see why i think it's important to consider the GUI's we use to develop AI. 

btw some people need some visualization to think.  i consider your comment that GUIs are only important after the 'back end' is finished to be "logic-based thinking chauvinistic"

On 08/24/2010 04:52 PM, Abram Demski wrote:
I am currently using a text editor and the command line, because Lispers typically fall back on Emacs and I am not very interested in that... :P
check this out:  http://www.pawfal.org/fluxus/


I think it's sad that there are not better Lisp environments out there... I used to use Powerlisp, which was nice in many ways but too old to be stable (and no longer in development).
for the most modern LISP-like environment that i'm aware of, i suggest to look at NetBeans + Clojure

http://www.enclojure.org/screenshots

I dislike project-oriented environments, which means almost all IDEs... :P
(shocked!)

For Lisp, I'd love an environment which was closer to an "edit and save the memory" model rather than "edit and save and execute text". ("Save the memory" meaning "save a text file containing defuns and setqs to restore the current memory state".) Obviously for large software projects you'd need more than this in terms of project management, but for rapid prototyping, that'd be nice. Imho.
well, this is our opportunity to build that!

On 08/24/2010 05:30 PM, Abram Demski wrote:
The SOAR group recently did some interesting IDE work, too... basically, rather than really editing text, they now edit database entries corresponding to SOAR rules (which looks like text while you're editing it). I don't know a whole lot about it, though.
thanks for the pointer, seems interesting. here's what i could find on SOAR's IDE:
if you can send any more information about it, i can take a closer look

also check out jACT-R, a java implementation of the ACT-R cognitive model:

    http://anthonymharrison.com/wp/wp-content/uploads/2008/04/tooltip.jpg
    http://jactr.org
Fluxus looks interesting...

If we can build something like that, that would be great.
Fluxus (written in C/C++) embeds its own LISP engine which is tightly integrated with an OpenGL draw loop.

JOGL provides a native OpenGL interface for Java.


in my latest commit i added a simple browser panel:

http://code.google.com/p/genifer/source/detail?r=418f1766575b7ebda60b1929163210904325e4fc

which shows facts and rules.  it uses different colors and font sizes to show the relative probability of each fact.
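roughly, a mapping like this (an illustrative sketch, not the committed panel code):

    import java.awt.Color;
    import java.awt.Font;

    // Illustrative probability -> presentation mapping.
    class FactStyle {
        // higher-probability facts get bigger labels (10pt .. 24pt)
        static Font fontFor(double p) {
            return new Font("SansSerif", Font.PLAIN, 10 + (int) Math.round(p * 14));
        }
        // hue slides from red (p = 0) to green (p = 1)
        static Color colorFor(double p) {
            return Color.getHSBColor(0.33f * (float) p, 0.9f, 0.8f);
        }
    }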

here is a screenshot running from YKY's latest test group:


Wednesday, August 18, 2010

MicroPSI and Genifer and dANN


I've been investigating MicroPSI.  Here are some introductions:
Psi-Theory is about human action regulation, intention and behaviour. The theory describes a comprehensive model of the human brain, its cognitive processes, emotion and motivation. It is about the informational structure of an intelligent, motivated, emotional agent (called Psi) which is able to survive in arbitrary domains of reality. This agent is driven by several satisfiable needs (food, water, the avoidance of pain, certainty, competence, and affiliation). The cognitive processes are modulated by emergent emotional states, memory, and sensory perception.

Source Code

I collected a package of some old Java MicroPSI source code from 2005, including the above mentioned documentation that I found helpful.


Motivation
  • Five (5) basic needs that drive behaviour
  • Generate intentions to satisfy needs


Action Regulation
  • Intention is selected and executed
  • Action regulation is modulated by Emotions


Emotions
  • Emotions emerge from the system through modulation of the cognitive processes
  • Agents do not “have” an emotion; rather, they think and act emotionally




5 Basic Needs that Trigger Behavior
  • Preserve Existence
    • food, water, avoidance of pain
  • Preserve Species
    • sexuality and reproduction
  • Affiliate
    • need to belong to a group and engage in social interactions (“signals of legitimacy”)
  • Be Certain
    • predictability of events and consequences
  • Be Competent
    • capability of mastering problems and tasks, including satisfying one’s needs

PSI Quad Networks
 
Psi Neural Networks and Activation

PSI as part of an Open-Source AGI System

I was wondering how these systems might fit together in a modern implementation of the PSI Cognitive model:
  • PSI - action, motivation, intention, emotion
  • Genifer - logic nodes and logic processing, providing memory, descriptions of sensations, and invokable actions, geniform natural language
  • dANN - graph framework, bayesian, markov, signal processing, genetic wavelets, procedural learning
  • EnCog - neural networks and learning
  • OpenRDF - read and write semantic web data, utilize publicly available OWL ontologies to structure knowledgebases
  • MOSES - procedure learning
  • DeSTIN - machine learning, pattern recognition, concept formation
  • WordNet, FrameNet, ConceptNet, ...
  • Stanford Parser | RelEx - natural language input
  • NLGen - natural language output
  • anything else?

Thursday, August 5, 2010

My Current HOL Algorithm

The following is equivalent in strength to HOL Light. Continues this post.

  • As an extension of the notion of term (defined here), add the notation of polymorphically typed lambda calculus, an equality predicate, and the choice operator (I'll call it epsilon -- it's a function that takes predicates and forms a term that represents an arbitrary satisfier of the predicate, if one exists; otherwise, it just represents an arbitrary term).
  • Take terms with boolean type as a drop-in replacement for atomic statements in that design. (It may seem strange to use them to replace atomic statements, since they are compound statements, but bear with me. Rules will correspond to valid sequents, rather than to compound statements in the logic.)
  • Use lambda-calculus unification as a drop-in replacement for first-order unification. This includes normalization; we could either store the normal forms of statements in the KB or let the unification algorithm normalize on the spot, depending on whether space efficiency of the KB or time efficiency of the unifier is a bigger concern.
  • Implement equational reasoning. Fundamentally, we treat an equality statement which has a given truth value, T, as if it represented two rules with that truth value: one justifying inferences in each direction, substituting one term for another. These rules cannot be stated as FOL rules, however, since they represent substitution of terms within larger statements. Nonetheless, the same basic propagation scheme for the truth values will work. In addition to the 2 pseudo-rules for each equality statement, we treat the system as knowing A=A for all terms A. Knuth-Bendix completion should be implemented eventually, which allows us to ignore one of the two inference directions for each equality statement without losing anything. Ultimately, I believe this means equational reasoning can be implemented as an extension to the normalization step of the unification algorithm (but I could be wrong here).
  • Finally, we also need one hypothetical reasoning rule: if assuming A implies B with truth value TV1, and assuming B implies A with truth value TV2, then A=B with truth value (& TV1 TV2) for an appropriate &-formula. Rather than implementing this as one complicated hypothetical reasoning step, it seems to me that the most convenient way would be to implement rule-level reasoning in general (ie, a query can be a rule, rules can be combined with other rules to produce new rules, etc) and then add this as a way of recording high-probability mutual implication.
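Schematically:

    (assume A |- B : TV1)  and  (assume B |- A : TV2)   =>   A = B : (& TV1 TV2)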
That's all, folks!

PS:

In addition to the above reasoning system, for HOL Light, we also need the axiom of choice and the axiom of infinity. (The axiom of extensionality is covered by normalization.) Both of these axioms are very much "mathematical" in nature, and it isn't clear whether they are needed for more common-sense reasoning. (My suspicion is that they would emerge in other forms, ie, in coding the KB we'd encode something like these plus much more.)

Tuesday, August 3, 2010

NARS and Genifer

Per SeH's request, here is a comparison of entities in NARS and their possible counterparts in Genifer:

  • BudgetValue: no counterpart yet, maybe related to utility values
  • Concept: Term.Predicate
  • Goal: same as Term
  • Item: no counterpart yet
  • Judgement: same as Term
  • Question: no counterpart yet, or same as Term
  • Sentence: same as Term
  • ShortFloat: Term.Constant.Number
  • Stamp: no counterpart, we don't use time stamps
  • Task: no counterpart yet, maybe same as Term
  • TaskLink: no counterpart yet
  • TruthValue: TruthValue

Sunday, August 1, 2010

AGI use cases

(Some entries are just general descriptions rather than concrete examples; it may take some additional time to think of concrete examples for them.)


1. Pattern recognition

Programming related:
  • recognize a program / a program missing a curly brace
  • recognize a function declaration [maybe missing something]
  • recognize a mathematical / logical expression
  • recognize the definition of an abstract data type
  • recognize that the execution of a program is slow (this is fuzzy)
  • recognize that some code / data structure is complex (also fuzzy)
  • recognize that a text / image file is big (also fuzzy)
English related:
  • recognize some nouns, verbs, prepositions, etc
  • recognize NPs, VPs, etc
  • recognize sentences (or almost-sentences missing something)
  • recognize misspelled words
2. Natural language understanding / interpretation

Syntactic parsing into logical form:
  • translate nouns, verbs, prepositions, etc, into logical form
  • translate NPs, VPs, into logical form
  • translate sentences into logical form
Understanding semantics:
  • understanding nouns:
        eg:  chair is an object, dog is an animal, love is an abstract thing
  • understanding verbs:
        eg:  to kick is to hit with a leg;  to kiss is to make contact with lips;
              to love is to kiss, have sex, hold hands, talk, etc.
  • understanding prepositions:
        eg:  John eats spaghetti with Peter -> Peter eats spaghetti
             John eats spaghetti with meatballs -> the spaghetti has meatballs
                                                   & meatballs don't eat spaghetti
  • understanding sentences
3. Learning from examples

Once we have basic English capability, we can use English to give examples:
  • Roman numeral for '1' consists of one I
    Roman numeral for '2' consists of two I's
    Roman numeral for '3' consists of three I's
    -->  Roman numeral for n consists of n I's

4. Query answering
  • Is 13 an integer?
  • Is 13 an odd number?
  • Is 3.14 a float?
  • Must all variables be given a type when they are declared?
  • Must every C# program contain one Main method?
  • Is ++(x + 1) a valid C# statement?
5. Finding explanations
  • Why does some code result in error messages?
        eg:  why is ++(x + 1) invalid?
              what's wrong with  printf("This is a test);  ?
  • Why doesn't a program give the desired result?
        eg:  why doesn't  printf("Hello world/n");  produce a new line?
  • Why is one program slower than another similar program?
6. Performing simple tasks
  • Print all odd numbers from 1 to 100.
  • Replace "John" with "Mary" in this text file.
  • Write a function that reverses an English sentence's word order.

Saturday, July 31, 2010

Current Design

We've reached a point where we're worrying about implementation more than design for a bit, so I thought it might be good to record my understanding of the design being implemented.

Language:

statement := head <- body truthvalue
head := atomicstatement
body := [atomicstatement { & [~] atomicstatement }*]
atomicstatement := predicate(term{,term}*)
term := constant | variable | function(term{,term}*)
truthvalue := (probability, confidence)
probability is in range [0,1]
confidence is in range [0,1]
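For illustration, a made-up statement in this language:

    arrives_on_time(X) <- punctual(X) & ~dead(X)   (0.9, 0.8)

i.e., an atomic head, a body with one positive and one negated condition, and a truth value asserting probability 0.9 held at confidence 0.8.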

IE, a statement has a conclusion (which is an atomic statement) in the head, and a possibly empty list of possibly negated atomic statements which are conditions, in its body. It records a probability for the conclusion given the conditions, including a (NARS-style) confidence for that conclusion. (If the list is empty, then we are just asserting something about the head's probability; if it's nonempty, we're asserting a conditional probability.)

Functions may be Skolem functions, which is how the system expresses existential statements. Universal statements are expressed by leaving variables unbound; in this case, the probability associated with the statement is intended to be the probability that a randomly chosen instantiation is true.

A simple query for the truth value of an atomic statement proceeds in roughly the following way. We are trying to construct a bayesian network to answer the query. The query statement is the first node, and each new node is another atomic statement which is deemed relevant; the links in the network come from the conditional rules. At each step, we look for new statements which "match," meaning that they have an atomic formula whose free variables can be instantiated to match the appearance of one of the current nodes in the bayesian network.

When a new matching statement is found, it is instantiated and added to the growing network. In the case of statements with empty condition-lists, matching provides a probability estimate for one of the nodes, which allows belief propagation to occur. When two different rules match at their head, however, (conditions or no conditions) we cannot simply add them to the network; we have a rule combination problem (my prev. post).

The procedure for dealing with rule combination problems that we've decided on at present is as follows; a code sketch follows the list. (Rules are given for probability only; deciding how to propagate confidence is not hard, but we haven't done it yet.)

  1. If the two statements have conditions which are mutually exclusive, we can just add the two resulting probabilities together. With the simple statement format given above, this is easy to test for: when one rule has A in its condition list and the other has ~A, the two are mutually exclusive.
  2. Failing this, we check for primacy of one rule over another. This is equivalent to my notion of "derived rule" in my post on the combination problem, but we'll be storing it as causal information (YKY has convinced me that causal information really does encompass derivative-ness information). The general idea is to take the more causally primal rule whenever we know that one is more primal.
  3. Finally, if neither of the first two conditions hold, we do rule averaging. My reason for advocating noisy-or was that it allowed logical reasoning, but the present formalism allows logical reasoning with simple averaging, so long as we take confidence into account: a rule conveys 0 confidence when its conditions don't match, so it does not weaken other rules when averaged with them.
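Here is the sketch; Rule is a stand-in type, and the two tests are left abstract since steps 1 and 2 define them:

    // Hedged sketch of the three-step combination procedure above.
    interface Rule {}

    abstract class RuleCombination {
        abstract boolean mutuallyExclusive(Rule a, Rule b); // one body has A, the other ~A (step 1)
        abstract boolean morePrimal(Rule a, Rule b);        // causal primacy (step 2)

        double combine(Rule r1, double p1, double c1, Rule r2, double p2, double c2) {
            if (mutuallyExclusive(r1, r2)) return p1 + p2;  // step 1: exclusive conditions add
            if (morePrimal(r1, r2)) return p1;              // step 2: prefer the causally prior rule
            if (morePrimal(r2, r1)) return p2;
            double c = c1 + c2;                             // step 3: confidence-weighted average;
            return c == 0 ? 0.0 : (c1 * p1 + c2 * p2) / c;  // 0-confidence rules drop out
        }
    }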
Additional random topics.

The weights for rules can be adjusted on-line by counting instances where the conditions are met, and incrementing the confidence based on the probability that the conditions are indeed satisfied in the instance. This rule could (should) be added to the above query-answering algorithm, but it means we are no longer simply building a bayesian network.

The design right now makes totally ubiquitous independence assumptions; it would be interesting to find a way to dynamically relax these assumptions when we have more processing power. It's very expensive to relax too many, but it might be worthwhile to relax a few (and maybe save the results in a way that avoids re-doing the expensive computations).

Tuesday, July 27, 2010

Rule Combination

The rule combination problem is: if we have X->A and Y->A, and we get probabilistic information about both X and Y, how do we integrate the two resulting estimates of A?

The normal bayes net semantics would give us an interval of possible probabilities, based on the probability of X&Y (which we may not know, but perhaps we could use some naive estimate). We could then take the midpoint of this or some such. Call this combination approach "averaging".

Another approach is to use the noisy-or combination rule, which can be thought of as adding the two probabilities together (though we also subtract out the product of the two, which makes sure we don't add up to greater than 1). This is great for making highly mathematical systems (such as higher-order logic) probabilistic, because it captures the monotonic nature of logical reasoning: finding additional rules which apply can only bolster the probability of A, never decrease it. A system based on averaging will not be very good for doing this, because if several arguments don't work very well but one does, then an averaging approach will come out with a not-very-large probability; an additive approach, however, will come up with a large probability (as seems right).
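In symbols, for two probabilities p1 and p2 contributed by the two rules:

    noisy-or:   p  =  p1 + p2 - p1*p2  =  1 - (1-p1)(1-p2)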

A purely additive approach is limited, however: sometimes we want additional information to detract from the probability! The noisy-and rule does this. Noisy-and adds together the probabilities that an event won't happen that are contributed by the individual rules, in the exact way that noisy-or adds the probabilities that it *will* happen. This makes sense when we want new rules that we find to represent additional reasons that an event might not happen. This can complement noisy-or if desired, in a way that preserves the additive "mathematical" style of rule combination where it's desired, but allows subtractive combination where it's more appropriate. ("Bayesian logic programming" uses the approach of allowing specific predicates to declare themselves noisy-or predicates or noisy-and predicates; rules whose consequence is an or-predicate will get treated additively, while rules whose consequence is an and-predicate will get treated subtractively. YKY suggests an alternate approach, in which we declare individual rules to be either additive or subtractive; this seems a bit more powerful, and loses nothing.)
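Correspondingly, noisy-and performs the same addition on the complements:

    noisy-and:  1 - p  =  (1-p1) + (1-p2) - (1-p1)(1-p2),   i.e.   p  =  p1*p2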

One problem is that the probabilities given by additive and subtractive rules can no longer be learned simply by counting conditional frequencies. Learning does not become impossible (and perhaps not even very difficult), but the meaning of the numbers is no longer very local, since a probability must be calculated as a sum of many rules. (Think about it: if we learned X->A and Y->A by counting the conditional frequencies, our estimate for A in the presence of both X and Y would be way too high.) The averaging approach may allow simpler learning algorithms. Yet, it seems less powerful as a representational choice.

A problem related to rule combination is how we should reason using lifted estimates. If we have a probability for C, a formula which has some free variables, then the Genifer approach is to allow this to be instantiated and consider it a probability estimate for the instantiated formula as well. Supposing it instantiates to B, and we've also got a rule B->E, which connects it to a larger Bayes Net we're making in memory, the way to handle this is clear: it's fine, even necessary, for initial items in Bayes Nets to have probabilities like this. Suppose, though, that it instead instantiates to a formula that's in the middle of a network, Q->B->E. How do we use the info now? Again, we have an information combination problem. Similar thoughts about averaging, additive, and subtractive approaches apply.

Fortunately, these two problems are just examples of a single more general problem. Assume we have rules X->A and Y->B, where A and B unify to C (that is, instantiating either A or B can result in C). Further, treat unconditional probabilities (like A) as just conditional probabilities where the list of conditions is empty -- that is, X, Y, or both may be tautologous. This is the general form of the combination problem.

If this was all there was to it, it would boil down to that choice: averaging vs additive/subtractive combination, or perhaps some mix of the two. (We could declare rules to be either a frequency, a reason-for, or a reason-against; but we'd then need a special combination rule to tell us what to do when we had a mix of frequencies and additive/subtractive rules, and it's not clear at all what combination would be appropriate... the result would seem rather hackish.) However, there are further complications.

It seems appropriate to sometimes derive rules from one another. A->B and B->C can combine to give an abbreviated rule A->C, summarizing what knowing A does to C's probability. However, all of the combination rules so far will mess up if we allow such derivations to take place. An additive (or subtractive) rule will now take the additional rule into account as well, resulting in twice as much modification from A as is merited. An averaging rule will not do that, but if we have other knowledge about B, then an averaging rule will fallaciously discount its importance. The effect will multiply as more derivations are allowed. It seems that, essentially, we are taking the same information into account more than once. This suggests that we should avoid applying two different rules if we know that one is derived from the other. Ideally, it seems we should prefer the original rules to the derived rules; however, that's only the case when we have enough time to reconstruct all the steps which are accounted for in the derivation (to check them for the particular situation); before that point, it's less clear when we should trust which rule (a partial derivation may already take into account critical situation-specific info that would invalidate the derived rule, but on the other hand, a derived rule may summarize a long and valuable chain of reasoning which is essentially sound but which would take too long to replicate). Some heuristic for choosing between the two should be employed.

In other words: the rule combination problem is solved by first asking if one rule is derived from the other. If not, either averaging or additive/subtractive combination can be done (a choice which has to be made for the system); if so, a heuristic is employed to choose between the two rules.

This has a strong flavor of NARS revision, for those who know NARS.

YKY has pointed out an interesting example problem to use for thinking about rule combination. I will give two solutions. The problem is as follows: Sue is punctual. Punctual people arrive on time. However, Sue dies in a car crash on the way to a meeting. Dead people do not arrive on time. This is represented with "P" ("Sue is punctual"), "P->A" ("If someone is punctual she'll probably arrive on time"), "D" ("Sue is dead"), "D->~A" ("Dead people probably do not arrive on time"). Assume all the rules get properly instantiated to talk about Sue, and of course to talk about some other specific variables we're hiding (what time it is, arrive where, etc).

We get a very bad result if we just apply noisy-or: noticing that Sue is dead actually will increase her chances of arrival slightly! We get just a little better by averaging, but still not satisfactory. One good solution here is to model the rule for death as subtractive: being dead subtracts from your chances no matter what they were. This gets a good result; that is, it'll be very near zero.
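With illustrative numbers -- say the punctuality rule alone gives P(A) = 0.9 and the death rule alone gives P(A) = 0.01:

    noisy-or:     0.9 + 0.01 - 0.9*0.01  =  0.901   (death slightly *raises* the estimate)
    averaging:    (0.9 + 0.01) / 2       =  0.455   (better, but still far too high)
    subtractive:  0.9 * 0.01             =  0.009   (near zero, as desired)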

Another solution, though, can be hit upon: perhaps death does not override punctuality because it is a negative rule, but rather, because the punctuality rule must ultimately be seen as derived from a chain of reasoning which includes the assumption that the subject is alive! This seems more fundamentally right to me. It also brings up an idea: it might be the case that rules entered in the starting knowledge base may not all be "basic" in the sense of non-derived: rather, it may be that some rules should already be noted as derived from one another, even if we don't actually have all the rules necessary to replicate the derivation. Consider a system like Cyc. Cyc has different "layers" of concepts, ranging from physical concepts (hard, soft, long, short...) to social concepts (like, dislike, ...). The idea I'm putting forward is that in a probabilistic version of this sort of KB, it might be useful to tell the system that social phenomena ultimately depend on physical phenomena, even if we can't trace out the lines of dependence in a precise way. What this does is lets the system know that physical facts will override social facts when the two suggest conflicting outcomes; this will usually be the correct inference. (There are exceptions, however; biological organisms tend to systematically violate rules of thumb which work for simpler physical systems, such as dissipation of heat (through homeostasis), Newton-style physics (through self-guided motion), et cetera.)

In fact, it might be useful to be able to learn the dependence relationships of rules without actually having derived one from another... This would reflect the way that we assume, based on large amounts of evidence, that biology is just a very complicated dance of chemistry (ie, biological rules follow from combinations of chemical rules) without actually knowing how everything works. That seems like dangerous territory, though, so it'll have to be a thought for later...

Monday, July 26, 2010

Causality

Why causality is not completely captured by Bayesian networks

As [Koller & Friedman 2009], Probabilistic Graphical Models, Ch 21, explains:  We know that a Bayesian network is directed, but the directions of the arrows do not have to be meaningful.  They can even be anti-temporal.  On the other hand, it is common wisdom that a "good" BN structure should correspond to causality, in that an edge X -> Y often suggests that X "causes" Y, either directly or indirectly.  Bayesian networks with a causal structure tend to be sparser and more natural.  However, as long as the network structure is capable of representing the underlying joint distribution correctly, the answers that we obtain to probabilistic queries are the same, regardless of whether the network structure is causal or not.

It seems that causal relations cannot be captured simply by probabilistic relations but require some form of inductive algorithm to obtain, such as the IC (for "inductive causality") algorithm proposed by [Pearl 2000].

Aug 2010 Update:  a new set of slides on causality.

Tuesday, July 20, 2010

GOLOG

Abram,

GOLOG sounds similar to your idea of encoding actions in the logic.

There are 2 types of terms:  fluents and actions.

Fluents evaluate to true or false.  For example, the fluent At(room1) may be true.

Examples of actions are Go(up) and Pick(package1, bag1).

And a GOLOG program can consist of these statements:
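From the GOLOG literature (Levesque et al.), the usual statement forms are roughly the following -- this is my summary, not the list from the lecture notes:

    a ; b                  sequence: do a, then b
    p?                     test: continue only if p currently holds
    a | b                  nondeterministic choice between a and b
    pi x. a(x)             nondeterministic choice of an argument value for x
    a*                     nondeterministic iteration (repeat a zero or more times)
    if p then a else b     conditional
    while p do a           loop
    proc P(x) ... endProc  procedure definition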

These statements can be combined with regular Prolog statements to create a new type of programming, but I'm not sure if it includes automated planning. (Will update later when I finish the lecture notes, Action Programming Languages by Thielscher, 2008).

Friday, July 16, 2010

planning paradigms

I plan to unify the following planning paradigms (this is just a draft). Let me outline the scope and try to ask / answer some basic questions:

The paradigms are:
  1. DP (deductive planning) -- in which planning is reduced to inference
  2. RL (reinforcement learning)
  3. classical, state-space-search planning (eg STRIPS)
  4. partial order planning, GRAPHPLAN, HTN (hierarchical task networks)
  5. utility theory, decision theory
  6. decision networks / influence graphs
  7. MDP (Markov decision processes), POMDP (partially observable MDPs)
  8. decision-theoretic agents
  9. "action theories" -- how to specify actions in logic
Some basic questions:

DP

Should planning subsume inference, or inference subsume planning, or shall we have both?  Are there advantages (or disadvantages) if we have both mechanisms interleaved?  Or maybe it'll just create a mess -- "techniques must not be multiplied beyond necessity"?

A basic question is whether DP can mimic the mechanisms of:
  1. decomposition
  2. partial order planning
  3. hierarchical task planning
[ TO-DO:  insert architecture diagram ]

RL

My latest conclusion is:
      DP + inductive learning   subsumes   RL

RL is the learning of an optimal policy.  A policy is a set of mappings:
        states -> actions

In deductive planning, we can have backward-chaining from a goal, but also forward-chaining from the current state.  The latter approach seems equivalent to RL, especially when augmented with inductive learning, where the learner can learn rules that function like the mappings above.

However, I should add that DP + IL (inductive learning) only mimics RL functionally, but is not equivalent to RL algorithmically.  Perhaps there are some merits in the RL algorithm?

One thing can be said for sure:  RL cannot simply co-exist with another planning method without creating conflicts.  In my earlier analysis I concluded that the planner should override RL whenever it has solutions, since it is knowledge-based and more sophisticated.  RL is useful when the planner is clueless, particularly in areas where knowledge is scant.

MDPs


What is their relation to classical planning?

To be continued...

Thursday, July 15, 2010

Thoughts on Goals

These are some thoughts on what might constitute an effective goal system.

One use-case I like to keep in mind is inference guidance. The goal system would not necessarily really be called upon for guidance at every inference step (this might be too expensive), but imagining that it was, we'd like to have a system that could perform fairly well.

The basic case, then, would be to think of a query as a goal and create a system which would respond to it in a manner similar to our existing inference algorithm (utilizing some background knowledge about how to achieve this goal).

Basically, we'd want to tell the system that if we have a goal of estimating variable A, and we have a rule involving a variable that unifies with A, the other variables in that rule become subgoals and we can get a better estimate for A by improving our estimates for them and applying belief prop to calculate A. This is a fairly messy statement, because it implicitly relies on an argument of convergence. Basically, the idea is that taking more rules into account tends to produce a better estimate... but this is not always the case, of course. The idea we're employing is something like "taking all the rules into account would result in the best estimate we could come up with, and chains of reasoning with more elements will tend to get closer to this..."

The line of thought from the basic facts to the appropriate action would be highly nontrivial. As a general principle, I think it's usually infeasible to plan an action from first principles when we need to decide what to do. This means (roughly) that the framework of utility theory, which tells us that we should take the action which maximizes the expected utility, is insufficient. In principle we can do with just declarative knowledge, but in practice, we need procedural knowledge.

Genifer currently uses declarative rules, which tell her how facts follow from other facts. What I'm suggesting is that we also use procedural rules, which would tell her what to do when. Procedural rules would, put together, resemble programs.

The idea here is that rather than deciding what to do for each new problem based on a planning algorithm, the system should always be in the process of constructing an overall policy, a program which it uses to make decisions. This reinforces the theme of automatic programming in Genifer.

The basic goal is to construct a set of rules which imply the highest success rate (the highest expected utility). Updateless decision theory tells us that we should calculate the expected success rate in our earliest prior distribution, that is, refusing to update on anything we learn. This technically loses nothing, because we can create rules "If you have learned such-and-such then ...", and the method solves a few problems with conventional decision theory. Notably, though, conditional decision theory (in which we look for the policy with the highest expected value given all that we know) works just fine in most cases, and would probably be faster to reason about.

Either way, the basic paradigm for converting from declarative knowledge to procedural knowledge is to search for rules with high expected utility. Unfortunately, the utility of rules are interrelated, because we've got to know how we will behave in the future in order to know which action is best to take now. This problem has standard treatments in the reinforcement learning literature, however.

The basic idea, then: whereas a declarative rule is in the form statements -> statements, where the conclusion carries probabilities, an imperative rule is in the form statements -> actions, where the actions carry (estimated) expected utilities.
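As illustrative stand-ins (not Genifer's actual classes), the two rule types might look like:

    // Declarative: statements -> statements, conclusion carries a probability + confidence.
    class DeclarativeRule {
        final String[] conditions; final String conclusion;
        final double probability, confidence;
        DeclarativeRule(String[] c, String h, double p, double cf) {
            conditions = c; conclusion = h; probability = p; confidence = cf;
        }
    }

    // Imperative: statements -> actions, action carries an estimated expected utility.
    class ImperativeRule {
        final String[] conditions; final String action;
        final double expectedUtility;
        ImperativeRule(String[] c, String a, double eu) {
            conditions = c; action = a; expectedUtility = eu;
        }
    }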

What about goals and subgoals? It makes sense to think of a goal as a utility-bearing statement which is not under our direct control. We want to achieve this goal by our actions; an action is something which is under direct control. So if we generalize the above a bit, allowing imperative rules to be arbitrary "statements -> statements" rules, we get goals and subgoals automatically via the formula "subgoal -> goal." A top-level goal is a statement given unconditional utility (that is to say, utility not derived from any rule).

[Edit: this is a declarative rule subgoal -> goal,  which should be accompanied by an imperative rule trying(goal) -> trying(subgoal), or something to that effect.]

Top-level goals should have basic utility, which cannot be changed except by us. They may also have expected value, which is the sum of the basic utility and the additional utility caused by the future consequences of achieving that goal. Subgoals and actions will generally just have expected value. Basic utility can be thought of as the declarative, decision-theoretic measure; expected value, on the other hand, serves the purpose of our imperative measure. This is because expected value is a local measure which can be used immediately to make decisions. Essentially, the imperative program we follow emerges from our imperative rules via the rule of always taking that action which possesses the highest imperative value.

Fleshing out this solution will draw on ideas from reinforcement learning, and also may draw on ideas from influence diagrams (the Bayes Net answer to utility reasoning), causal bayesian reasoning (though causal decision theory solves fewer problems than updateless decision theory!), and other areas.

[Edit: the declarative/imperative distinction I am making is somewhat messy. I should instead distinguish 3 classes of information: information about probability, information about utility, and information about expected value. "Information" takes the form of both rules ("B->A") and naked statements ("A").]

Monday, July 12, 2010

Feature Matrix

YKY has highlighted the modules essential to the Genifer prototype.
SeH has included many more features.  Feel free to suggest more!

Sunday, July 11, 2010

Eliminating my HOL "Meta-Rules"

HOL systems, as I've mentioned before, are not usually specified just as inference rules on statements. Instead, they are specified indirectly, as inference rules on sequents. A sequent is a set of premises and a conclusion, representing a valid argument in the logic. (This is represented symbolically in the form A, B, C,... |- D where A, B, C,... is the set of premises and D is the conclusion. The |- symbol is called the turnstile.) So, specifying a system via a sequent calculus is a slightly curious practice: rather than just specifying inference rules directly, we specify a system of reasoning which comes up with valid inference rules. Historically, this move to the meta-level is what led to the field we call "proof theory".

My previous approach was to preserve this multi-level structure, calling the sequents "rules" and the sequent inference rules "meta-rules" (since they are rules for creating rules). However, it would be more efficient to eliminate the extra level if possible.

In general, this may be a complicated process. Some sequent derivation rules have special syntactical restrictions, such as "... X and Y share no free variables ...". Also consider sequent rules like "If A,B |- C, then A |- (B->C)". It isn't obvious that there is a simple way to turn a specification of a system in terms of sequent derivation rules into a direct specification consisting of inference rules on statements.

Typical systems, though, will obey a few nice properties. First, adding more premises to a valid sequent will always result in a valid sequent. Furthermore, many sequent derivation rules just propagate along the premises from the sequents they start with to the sequents they produce. For sequents which do that, the job is easy. Consider an example:

"From A, B, C... |- P and D, E, F... |- Q conclude A, B, C... D, E, F... |- (P&Q)"


In this case, we'd just translate the whole thing to the rule "From P and Q conclude (P&Q)". These sorts can be eliminated from the meta-level immediately, which is good (every little bit helps).

Unfortunately, not all cases are so easy. Many inference rules will almost pass along the premises untouched, but make some little modifications to the list. These modifications will usually be to remove specific premises, occasionally to add premises, and quite rarely to modify the internals of premises. The example above, "If A,B |- C, then A |- (B->C)", is one example of this: one premise is eliminated in the process of drawing the conclusion. Getting rid of meta-rules of this sort requires cleverness.

First-order natural deduction systems handle that particular example in a nice way: at any time during reasoning, we may make a new assumption. This can be anything. We keep the list of assumptions that have been made. Now, say we just concluded something, S. One deduction move we can now make is to remove an assumption from the list (call it A), and stick it on the beginning of S with an arrow: A->S is our new conclusion, which is true in the new context (ie, the old context minus one assumption).

This general idea could be extended to any rules which add or eliminate premises: they are seen as just modifying the list of assumptions.

A practical implementation of this sort of idea for Genifer might be to engage in hypothetical reasoning via conditional queries: we make a query with a list of assumptions attached, to be used in addition to the actual facts. However, some issues may come up with respect to the difference between the classical conditional and the probabilistic conditional. I need to consider this more. (The approach is similar to my "rule-level reasoning", but rather than reasoning about rules using conditional queries, we reason about statements...)

This leaves rules which modify the sequents internally. The primary example of this is variable instantiation, but that's handled by unification! So, in summary, it appears to me that my "meta-level" is not needed in Genifer and can be eliminated with a little patience...

All of the above discussion was done with a translation of HOL Light in mind, but could apply to IG or an IG variant equally well.

Saturday, July 10, 2010

Prolog trick again

Abram, this is your example:


This is why it becomes a problem when you look at it from the proof-tree's perspective:

First problem:  failure to explain away

When you get from the top Q to 1, you already get an observed value for 1, this is already sufficient to produce a TV for Q.

When you get down to 2, you also get a baseline P value for "heavy rain", and supposedly it also produces a TV that can be propagated (via Prolog trick, not standard belief propagation) up to Q.

When you get down to 3, it has an observed value, then you try to propagate that upwards, but it encounters 1, which itself has an observed value (= true).  It seems that it cannot override 1's value.  So the effect of 3 seems to get drowned by 1.

All in all, this failure is due to my treating the Bayes links as isolated links.

Second problem:  cyclic proof tree

In principle we should solve this by searching the tree above the node-to-be-added for duplicated nodes, but such a search may be costly, depending on the branching factor of the tree.

That's why, in the past, I simply allowed the tree to become cyclic.

One cheap solution is to check only the parent's parent for duplication. That will prevent "1st-degree" duplication, and may be efficient enough.
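A sketch of that check (ProofNode is a stand-in for the actual proof-tree node type):

    // Stand-in proof-tree node with the cheap "1st-degree" cycle test.
    class ProofNode {
        Object statement;
        ProofNode parent;

        // Called on the prospective parent: refuse a child that would
        // duplicate this node's own parent (the child's grandparent).
        boolean childDuplicatesGrandparent(Object candidate) {
            return parent != null && candidate.equals(parent.statement);
        }
    }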