Getting started with ResearchCyc

My main hobby currently is looking for an open-source community focused on commonsense reasoning, specifically, the encoding of commonsense used in social interactions in a form that allows for inferring explanations and predictions. If no such community exists, I’ll create one. So far, I’ve found biannual symposia and mailing lists (see CommonsenseReasoning.org) but no open-source community.

I think predicate calculus is the only notation expressive enough to capture the richness of the domain. I’m aware of graph-based efforts like OpenMind, but they either don’t allow constraints across variables (e.g., the agent of the action must also have known he would be its primary beneficiary in order to infer that the action was self-serving) or their method of matching to support inferencing (which is likely to be some flavor of “maximal bipartite-graph matching”) will end up being functionally identical to unification for predicate calculus. So I’m placing my bet on predicate calculus-based approaches. The Stanford Encyclopedia of Philosophy (SEP) has an interesting overview.

One of the most promising of these efforts is by Andrew S. Gordon and Jerry Hobbs, which they are developing toward a book. Let’s see if we can make that part of the community once the book is published. Andrew has a video lecture about it and Jerry offers an extensive peek at the formulations.

The Cyc Project has a freely available ontology (i.e., a high-level taxonomy plus frame-like schemas or predicate definitions), OpenCyc. It’s unclear whether they encourage or permit contributions, and anyway I’m looking for general rules and facts rather than just an ontology. That kind of knowledge base is what their ResearchCyc project is said to be. Although ResearchCyc is not open, maybe they are open to contributions. It was conceived and is led by one of my AI heroes, Doug Lenat, and I’d like to help it if I can. Michael Witbrock has some interesting video lectures about Cyc. Here’s what I’ve done to have a look inside ResearchCyc…

Getting a copy of RCyc

  1. Write to rcyc@cyc.com with a few sentences describing your non-commercial research interest in the system, and asking for a (free) license. The response might take a few days.
  2. If your license request is approved, you’ll get an email with a download site url, a userid, and a password. Licenses seem to expire after a year. If I click the download link and enter the password, I get a page where I can click an “rcyc” folder link and then get a list of downloads; however, revisiting the same url and entering the userid as well as the password always fails for me.
  3. Download both the “o” tgz and sha1 files (the most up-to-date in early 2014). Getting the sha1 file is recommended because I encountered a corrupted download a few times, and you need a way to check for that. There is a free “MD5 & SHA Checksum” tool, and you can verify the download this way:
    1. Click ‘Browse’ and select researchcyc-4.0o.tgz
    2. Open the sha1 file using a text editor, then select and copy the SHA1 sum value (i.e., the part preceding “researchcyc-4.0o.tgz”, not including whitespace)
    3. Click ‘Paste’ in the application and then ‘Verify’. If verification fails, try downloading the large tgz file again. (I’ve had 5 downloads in a row fail.)
  4. Unpack the tgz.
  5. Avoid editing any files if you’re using Windows; otherwise, you might introduce Windows-specific line ending characters that will prevent the server from starting in Ubuntu.
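
The checksum comparison in step 3 can also be scripted, which helps when a download has to be repeated several times. This is a minimal sketch using Python's standard hashlib module; the function names are my own, and it assumes (as described above) that the sha1 file holds the digest first, followed by the archive name.

```python
import hashlib


def sha1_of_file(path, chunk_size=1024 * 1024):
    """Return the hex SHA-1 digest of a file, read in chunks to limit memory use."""
    digest = hashlib.sha1()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def expected_sha1(sha1_file_text):
    """Extract the digest: the first whitespace-separated token of the sha1 file."""
    return sha1_file_text.split()[0].lower()


def download_is_intact(archive_path, sha1_file_path):
    """True when the archive's SHA-1 matches the digest stored in the sha1 file."""
    with open(sha1_file_path, "r") as handle:
        return sha1_of_file(archive_path) == expected_sha1(handle.read())
```

For example, download_is_intact("researchcyc-4.0o.tgz", "[your sha1 file]") should return True for a good download; if it returns False, download the tgz again.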

Installation

I recommend installing on a publicly accessible server, say, on an Amazon Web Services (AWS) Elastic Compute Cloud (EC2) instance. Doing so will allow you access from any browser if your local machine doesn’t accept requests on port 80. For example, this is what it looks like in Chrome on a Samsung Note 2…

[Screenshot: rcyc viewed on Samsung Note 2]

  1. Configuring the host machine…
    • If installing on EC2…
      1. Go to https://console.aws.amazon.com/ec2/ and create an account if you don’t have one
      2. Click Launch Instance
      3. Select Ubuntu Server 13.10 64-bit
      4. In the left pane, select General Purpose, then click any row with close to 4 GiB RAM (under the “Memory” column)
      5. On the “Add Storage” page, set the persistent storage capacity (the “Size (GiB)” column) to about 25.
      6. When you get a chance to edit the Security Group, permit access from your clients…
        1. Note the security group of the instance
        2. In the left pane, click Security Groups
        3. In the right pane, select the row of the instance’s security group
        4. In the bottom pane, select the Inbound tab. To allow requests from all IP addresses, there should be a row with Port=22 and Source=0.0.0.0/0. Be sure to add Port=3602 and Source=0.0.0.0/0 because rcyc’s webserver uses this port. For more guidance, see http://docs.aws.amazon.com/AWSEC2/latest/UserGuide/authorizing-access-to-an-instance.html
      7. If prompted, click “Yes, I want to continue with this instance type” (It won’t be free)
      8. Click Launch
      9. Choose an existing key pair, or create a new one (and be sure to download the key pair file). Then click “Launch Instances”.
      10. Make sure you’ve installed Putty
      11. To convert the pem file to Putty’s ppk format, follow http://docs.aws.amazon.com/AWSEC2/latest/UserGuide/putty.html#putty-private-key
      12. To upload rcyc, do something like this:

        C:\Windows\system32>C:\"Program Files (x86)"\PuTTY\pscp.exe -r -i C:\[your path]\[your private key file].ppk "C:\[another local path]\researchcyc-4.0o\*.*" ubuntu@[your ec2 public dns]:

        I would have preferred WinSCP’s graphical UI, but it kept showing connection errors while transferring large files. If pscp is interrupted, you can re-run the command above. If you decided to upload the gzip instead of unpacking first, or you didn’t set the persistent storage high enough, and then find that you don’t have enough free disk space to unpack (type df -h to check), there is a way to increase disk space without having to lose your disk content.

      13. Configure Putty to make it easier to log in to the server for future maintenance…
        1. In Windows, go to Start | Putty
        2. Session | HostName = ubuntu@[Public DNS shown in AWS EC2 Instances page for this instance]
        3. Session | ConnectionType = SSH
        4. Connection | SSH | Auth | PrivateKeyFile = [browse to ppk file you created from the pem file]
        5. Window | Columns = 120
        6. Window | LinesOfScrollback = 2000
        7. Session | SavedSessions = “rcyc”, then click Save. When you attempt an ssh connection with this EC2 instance in the future, you can select “rcyc” in the sessions list, click Load, and then edit the host address to match what the ec2 console shows.
        8. Before clicking Open to start the ssh session, you might need to change to a network that doesn’t block port 22.
        9. When prompted “The server’s host key is not cached in the registry…”, click Yes.
        10. If all goes well, you should be prompted with
          Using username "ubuntu".
          Authenticating with public key "imported-openssh-key"
          
          The programs included with the Ubuntu system are free software;
          the exact distribution terms for each program are described in the
          individual files in /usr/share/doc/*/copyright.
          
          Ubuntu comes with ABSOLUTELY NO WARRANTY, to the extent permitted by
          applicable law.
          
          ubuntu@ip-NN-NN-NN-NN:~$

    • Otherwise, if you prefer installing locally…
      1. Let’s take the hard case of using Windows. The rcyc setup instructions indicate some features aren’t supported on Windows, so I set up Ubuntu 13 64-bit in VMWare Player 6 on Windows 7 Pro 64-bit. But before creating that virtual machine, go into your BIOS and ensure that “Intel Virtualization Technology” (i.e., “Intel VT-x”) is enabled; if there is another setting limiting use of VT-x to “trusted” applications, turn that off.
      2. Before powering-on the Ubuntu VM, make sure its Memory setting is at least 7680MB.
      3. To allow fast transfer of files with the Windows host, I created a “forUbuntu” folder on my Windows desktop, moved the unpacked download contents there, installed VMWare Tools in the vm, and set the vm option for Shared Folders to the new one on my host desktop. Now access the shared folder in the vm and copy the files to a local folder in the guest Ubuntu filesystem.
        1. sudo apt-get install build-essential
        2. sudo apt-get install linux-headers-`uname -r`
        3. Select Player | Manage | Install VMWare Tools…, which will mount a folder as though it were the cdrom drive
        4. cp /media/[your userid]/*.gz /tmp/
        5. cd /tmp
        6. tar xvzf VM*.gz
        7. cd vmware-tools-distrib/
        8. sudo ./vmware-install.pl and accept all defaults
        9. Verify that the mounting tool is working: lsmod | grep vmhgfs
        10. The shared folder should be listed after doing this: ls /mnt/hgfs. If it is, do this: cd /mnt/hgfs/yourSharedFolder
        11. Whether you’ve already unpacked the rcyc gzip or not, let’s move it to a non-shared folder, such as your home, to avoid any accidental edits (and attendant changes to line endings): mv *.tgz ~ and then let’s go there: cd ~/researchcyc-*.
        12. If you haven’t already unpacked the RCyc gzipped tar file, do so now, like this: tar -xvzf *.tgz

        The remainder of the instructions will assume your terminal’s working directory is this one.

      4. ResearchCyc’s scripting language is a Lisp variant called SubL that runs on top of Java. So, to install Java in your vm, open a terminal there and sudo apt-get install openjdk-7-jre-headless
  2. Make sure you have a terminal open to the Ubuntu instance (either a Putty connection to EC2 or a terminal running in your local vm instance)
  3. Edit researchcyc-4.0o/server/cyc/run/init/jrtl-release-init.lisp so that the license value you received in email is pasted in place of the XXXX in (csetq *master-license-key* "XXXX")
  4. In the same directory, I edited parameters.lisp by adding (csetq *cb-show-cure-link* t) just before (check-system-parameters) at the end. This is supposed to make a purple “CURE” button appear in the web interface, which I’m told is a knowledge entry tool that provides some guidance.
  5. Navigate to the main scripting directory cd researchcyc-4.0o/server/cyc/run/
  6. Do a test launch to verify everything was configured correctly; try ./bin/run-cyc.sh
    • If bash reports a “no such file or directory” error for the executable, one possibility is that Windows line endings made it into some of the text files; try:
      sudo apt-get install dos2unix
      researchcyc-4.0o$ find . -type f -print0 | xargs -0 dos2unix
      chmod a+x ./bin/*.sh
      sudo apt-get install openjdk-7-jre-headless

      (My initial research into this problem suggested that there might be 32-bit components in rcyc that wouldn’t run in Ubuntu 13.10 without ia32-libs but that turned out to be a red herring.)
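
To see which files actually carry Windows line endings before (or after) running dos2unix, a small scan can help. This is only a sketch: the function names are hypothetical, and limiting the scan to .sh and .lisp files is my assumption about which text files matter here.

```python
import os


def has_crlf(data):
    """True when the bytes contain a Windows (CRLF) line ending."""
    return b"\r\n" in data


def files_with_crlf(root, suffixes=(".sh", ".lisp")):
    """List files under root, with one of the given suffixes, that contain CRLF."""
    hits = []
    for folder, _dirs, names in os.walk(root):
        for name in names:
            if not name.endswith(suffixes):
                continue
            path = os.path.join(folder, name)
            with open(path, "rb") as handle:
                if has_crlf(handle.read()):
                    hits.append(path)
    return sorted(hits)
```

Calling files_with_crlf("researchcyc-4.0o") returns the affected paths; an empty list suggests line endings are not the cause of the startup error.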

    • After about five minutes of startup, you should see something like:
      Start time: Thu Nov 21 17:08:08 SGT 2013
      Lisp implementation: Cycorp Java SubL Runtime Environment
      JVM: Oracle Corporation OpenJDK 64-Bit Server VM 1.7.0_25 (23.7-b01)
      Current KB: 7163
      Patch Level: 10.145914
      Working Directory: /home/david/Desktop/researchcyc-4.0o/server/cyc/run/.
      Running on: ubuntu
      OS: Linux 3.11.0-13-generic (amd64)

      and after another five minutes, you should see:

      HTTP server listening on port 3602.  Connect via URL http://ubuntu:3602/cgi-bin/cg?cb-start
      
      SPARQL server started on port 3615.
      Jetty server started on port 3603
      Ready for services.
      Total memory allocated to VM: 5791MB.
      Memory currently used: 1651MB.
      Memory currently available: 4140MB.
      CYC(1):
    • Once a CYC(1): prompt appears, the webserver is ready. (But its public DNS might not be distributed yet, so if you’re impatient you might want to use the public IP address shown in the EC2 console.)
    • Verify that it’s accessible by using a local browser to visit http://[your ec2 instance’s public dns, or localhost if using a browser in the vm]:3602/cgi-bin/cg?cb-start
  7. If you’re going to edit any rcyc content, and you probably will eventually want to, you’ll need to create an account other than the default Guest account to do so.
    1. Check researchcyc-4.0o/server/cyc/run/init/release-specific-init.lisp to make sure the following is NOT present there.
      (noting-progress "Enabling password authentication"
         (csetq *image-requires-authentication?* T))

      If it’s present, delete it and restart the RCyc server. One way to stop the server is to enter (exit) (a SubL command) at the CYC: prompt; another way is to reboot the Ubuntu server.

    2. On the start page, there’s a textbox for entering a userid to change which user is logged in. Enter “CycAdministrator” and Submit.
    3. You should be on a new page (actually just a new frame in the lower pane) that offers a button to return to the (“now stale”) login page. Click it, enter the new userid you want, and Submit.
    4. You should be in a new frame that says “Unknown Cyclist…Do you want to create a new Cyc constant with this name?”. Click “Yes, Create Cyclist”.
          Note: On at least one occasion, the RCyc server quit after this command, returning no content to my browser and showing this in the terminal:
      CYC(1): ./bin/cyc-runner.sh: line 298:  2555 Killed                  java ${BIT_FLAG} ${SERVER_FLAG} -Xms${MIN_HEAP} -Xmx${MAX_HEAP} ${OLD_SORT_FLAG} ${CODE_CACHE_FLAG} ${PERM_SIZE_FLAG} ${EA_FLAG} ${CM_FLAG} ${PGC_FLAG} ${FAST_OPTS_FLAG} ${AGENT_LIB_FLAG} ${EXTRA_OPTIONS} ${CYC_JAVA_OPTIONS} ${LOG_FLAG} ${ASSERTS_FLAG} -cp "${CLASSPATH}" ${MAIN_CLASS} -f "${INIT_FORM}" "$@"
      Shutting down Derby which provides the SCG repository ....
      ./bin/cyc-runner.sh: line 304: /db/bin/stopNetworkServer: No such file or directory
      ... see  for log output.
  8. If the test allowed you to reach RCyc’s main webpage, then the easier way to launch the rcyc server in the future is:
    • If using EC2, the following will auto-launch rcyc whenever you Start the instance after having Stopped it (because stopping an instance when you don’t need it saves on AWS usage fees). Create /etc/init/run-cyc.conf with this content,
      start on runlevel [2345]
      stop on runlevel [!2345]
      respawn
      script
           cd /home/ubuntu/researchcyc-4.0o/server/cyc/run/
           exec ./bin/run-cyc.sh -b
      end script

      and then do initctl start run-cyc

    • Otherwise, when running in the local vm… Instead of ./bin/run-cyc.sh, do this: setsid ./bin/run-cyc.sh -b. The “setsid” command runs rcyc in a different Linux session than your terminal, so you can exit your terminal and the rcyc server will still run. The “-b” flag is Cyc’s own flag for telling it to run in the background.
  9. You might now want to configure the rcyc server to require a password.
  10. Start exploring in the web UI. For example,
    1. Type “hear” or “perceive” into the search box.
    2. Select #$hearsThat or #$perceivesThat from the autocomplete dropdown
    3. Scroll down the left pane to select Consequent or Antecedent. The right pane will show rules in which the predicate you selected is used in either an antecedent or consequent.
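
Before exploring, it can be handy to confirm from a script that the start page from step 6 responds, for instance after restarting an EC2 instance. The sketch below uses only Python's standard urllib; the function names are hypothetical, and the URL shape is copied from the server's startup message shown earlier.

```python
import urllib.error
import urllib.request


def cyc_start_url(host, port=3602):
    """Build the start-page URL in the form printed by the server at startup."""
    return "http://{}:{}/cgi-bin/cg?cb-start".format(host, port)


def is_reachable(url, timeout=10):
    """True when an HTTP GET gets any HTTP response, False on connection failure."""
    try:
        with urllib.request.urlopen(url, timeout=timeout):
            return True
    except urllib.error.HTTPError:
        # The server answered, just not with a success status.
        return True
    except (urllib.error.URLError, OSError):
        # Refused connection, unresolved host name, or timeout.
        return False
```

For example, is_reachable(cyc_start_url("[your ec2 public dns]")) returning False right after launch may only mean the server is still starting, since startup takes several minutes.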

Many thanks to the ResearchCyc team for helping me get this far!

Cycorp offers guidance about how to explore cyc through this web interface.

Hopefully, someday they’ll offer rcyc as an Amazon Machine Image (AMI), which would make these instructions much shorter!
