Paglo Crawler

Paglo is now available for everyone! And you can set up an account for FREE. Download the open source Paglo Crawler, activate your account, and our systems will instantly set you up (with no human intervention).

Once you complete the download, you can add or remove plugins that extend your Crawler, or utilize our code to build your own Crawler agents. The Paglo Crawler runs on Windows XP, Vista and 2003 Server. See below for Linux support.

A Linux Crawler is also available

Overview

The Paglo Crawler is an open source supersearcher — an agent that probes your network for devices and other IT assets, and discovers everything about them. The Crawler is part of Paglo, the first search engine for IT, a tool that specializes in searching the complex and varied data of IT networks, and in returning intelligent data in both simple text and rich quantitative form. The data that the Crawler finds is visible through a secure Paglo Web account. A single Paglo Crawler can be installed to probe an entire enterprise network.

Paglo finds what no other search engine can find: IT assets. That includes the characteristics of devices on the network, such as type, configuration, and what other devices are connected. That kind of information about a network isn't collected neatly into documents that you can hunt for. If it were, Google could find it. But you know how it is: as soon as you document something in IT, it changes!

Paglo finds this kind of information by sending the Crawler through the network and communicating with each device. There is no need for hours of manual data entry to document your network. And Paglo works continuously. The Paglo Crawler does its crawling over and over again, which keeps the data it collects fresh. The Crawler also updates itself automatically.

The Paglo Crawler stores the information that it gathers throughout your network in your private Paglo Search Index, in a separate and unassailable location that only you have access to. When you log in to your Paglo account, you get all the data that's yours, and only the data that's yours. And you're the only one who has access to it. Your login is the key — no one else can see your data unless you give them your login password.

We have also made the Crawler extendable so you can develop your own plugins to collect additional IT data or use existing add-ons that are available to the community.

Features of the Paglo Crawler:

  • Open Source — Anyone can download the source code from our Subversion repository
  • Extendable — We encourage you to develop plugins to extend the type of IT data that the Crawler can capture
  • Powerful — Employs unique scanning and probing techniques for discovering devices and other IT assets, and identifying their unique characteristics
  • Easy — Installs in minutes via free download, and provides rich and sophisticated data visible in a browser through a secure Web account

Linux Instructions

You can download a .deb package here for i386/AMD-based systems, or you can build the Crawler from the public Subversion repository at: http://svn.paglo.com/paglo_open_source/crawler/trunk/
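
As a rough sketch of the source route, the checkout could be wrapped as shown below. It assumes the svn command-line client is installed. The function name, the DRY_RUN switch, and the default directory name paglo_crawler_src are illustrative and are not part of the Paglo tooling.

```shell
# Repository URL as given in the instructions above.
PAGLO_SVN_URL="http://svn.paglo.com/paglo_open_source/crawler/trunk/"

# fetch_crawler_source [DEST]
# Checks out the crawler trunk into DEST (hypothetical default name).
# With DRY_RUN=1 the command is printed instead of executed.
fetch_crawler_source() {
    dest="${1:-paglo_crawler_src}"
    if [ "${DRY_RUN:-0}" = "1" ]; then
        echo "svn checkout $PAGLO_SVN_URL $dest"
    else
        svn checkout "$PAGLO_SVN_URL" "$dest"
    fi
}
```

Setting DRY_RUN=1 before calling fetch_crawler_source previews the command without touching the network. If you use the .deb package instead, it would normally be installed as root with dpkg -i followed by the name of the downloaded file.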

Configuring

The Paglo Crawler for Linux uses the same configuration file as the Crawler for Windows. However, it is not set up automatically, as it is by the Windows version's UI.

The installer places the Crawler in:

/opt/paglo/paglo_crawler/
All of its support files, log files, configuration files, and state files are installed there.

It also installs a startup script and logrotate script in:

/etc/init.d/paglo_crawler
and
/etc/logrotate.d/paglo_crawler

A sample configuration file is provided at:

/opt/paglo/paglo_crawler/etc/crawler.conf-example
Please consult this example file for instructions on how to configure the crawler. You will need to edit the configuration, as the one provided will not work out of the box. At a minimum you will need to set the company data key for the crawler to use.
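
One possible way to start from the example file is sketched below. It assumes the live configuration is named crawler.conf in the same etc directory, which matches the path used in the Running section. The function name is hypothetical, and the name of the company data key setting is not shown here because it has to be taken from the example file itself.

```shell
# init_crawler_conf [ETC_DIR]
# Copies crawler.conf-example to crawler.conf unless crawler.conf
# already exists, so an edited configuration is never overwritten.
init_crawler_conf() {
    etc_dir="${1:-/opt/paglo/paglo_crawler/etc}"
    example="$etc_dir/crawler.conf-example"
    target="$etc_dir/crawler.conf"

    if [ ! -f "$example" ]; then
        echo "example file not found: $example" >&2
        return 1
    fi
    if [ -f "$target" ]; then
        echo "keeping existing $target" >&2
        return 0
    fi
    cp "$example" "$target" &&
        echo "created $target; edit it before starting the crawler"
}
```

Run init_crawler_conf as root, then open crawler.conf in an editor and set the company data key as described in the file.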

If the CLI is not enabled in the configuration file then the Paglo Crawler will run as a daemon.

Building

The instructions provided below refer to Linux systems.

Prerequisites

The following libraries and their header files must be installed to successfully build the Paglo Crawler:

  • libpcap
  • OpenSSL
  • Ruby (libruby)

The three packages listed above should be available with most common Linux distributions. Installing these packages is sufficient. However, they can also be built and installed from source if necessary.

Additionally, the Paglo Crawler requires the following two packages:

  • SNMP++ 3.x
  • gSOAP

Notes on building these to work with the Paglo Crawler follow.

Building SNMP++ 3.x

In general, building SNMP++ isn't very difficult.

  1. Download the tarball from the website listed above and unpack it.
  2. Change to the "snmp++" directory that is extracted.
  3. Change to the "src" directory and locate the Makefile most appropriate for your platform (e.g., Makefile.linux for Linux).
  4. Open the Makefile and add the following defines to the "COPTIONS" line:
    -DSNMP_PP_NAMESPACE -D_USE_OPENSSL
  5. Compile by running make -f <Makefile name> (e.g., make -f Makefile.linux)
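
Step 4 can also be scripted. The following is a minimal sketch that assumes the Makefile defines the options on a line beginning with "COPTIONS =". Check your Makefile first, since the exact layout may differ between SNMP++ releases. The function name is illustrative.

```shell
# add_snmp_defines MAKEFILE
# Adds the two defines required by the Paglo Crawler to the COPTIONS line.
add_snmp_defines() {
    makefile="$1"
    defines="-DSNMP_PP_NAMESPACE -D_USE_OPENSSL"

    # Do nothing if the defines were added on an earlier run.
    if grep -q -e "-DSNMP_PP_NAMESPACE" "$makefile"; then
        return 0
    fi
    # Fail clearly if the assumed "COPTIONS =" line is missing.
    if ! grep -q '^COPTIONS[[:space:]]*=' "$makefile"; then
        echo "no COPTIONS line found in $makefile" >&2
        return 1
    fi
    # Insert the defines right after the "=" so that any line
    # continuation at the end of the line stays intact.
    sed "s/^\(COPTIONS[[:space:]]*=\)/\1 $defines/" "$makefile" > "$makefile.new" &&
        mv "$makefile.new" "$makefile"
}
```

From the src directory, this could be used as add_snmp_defines Makefile.linux followed by make -f Makefile.linux.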

Once compilation has completed you can run "make install" to install the libraries and header files, so that they are accessible system-wide. However, it's not necessary to do so. If you opt not to install SNMP++, you will need to specify the locations of the libraries and header files when running the Paglo Crawler's configure script. Use the --with-snmp-libs and --with-snmp-headers options to do this.

Building gSOAP

Building gSOAP is much simpler than building SNMP++. Just download the tarball from the gSOAP website, change to the directory that is extracted and run ./configure && make. After compilation has completed you can then run make install if you'd like the libraries, headers, and binaries to be available system-wide.

Otherwise, you can use the --with-gsoap-imports, --with-gsoap-libs, --with-gsoap-headers, --with-wsdl2h-path and --with-soapcpp2-path options to specify their locations when running the Paglo Crawler's configure script.

Building the Paglo Crawler

Once the prerequisites are out of the way, building the Paglo Crawler is as easy as running configure and telling it where all the files it needs are located.

This can be done by using the following options:

--with-pcap-libs=DIR        Path to libpcap
--with-pcap-headers=DIR     Path to pcap.h
--with-openssl-libs=DIR     Path to OpenSSL libraries
--with-snmp-libs=DIR        Path to SNMP++ library
--with-snmp-headers=DIR     Path to SNMP++ headers
--with-gsoap-imports=DIR    Directory containing gSOAP import headers
--with-gsoap-libs=DIR       Path to gSOAP libraries
--with-gsoap-headers=DIR    Path to gSOAP headers
--with-wsdl2h-path=DIR      Path to gSOAP's wsdl2h program
--with-soapcpp2-path=DIR    Path to gSOAP's soapcpp2 program
--with-ruby-libs=DIR        Path to libruby
--with-ruby-headers=DIR     Path to ruby.h

For example:

./configure --with-gsoap-imports=/gsoap-2.7/soapcpp2/import \
--with-ruby-headers=/usr/lib/ruby/1.8/i386-linux-gnu
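
When several dependencies live in non-standard locations, the configure line gets long. The sketch below assembles it from shell variables and adds only the options whose variable is set. The variable names (PCAP_LIBS, GSOAP_IMPORTS, and so on) and both function names are made up for illustration. Only the --with-... option names come from the configure script. Directories containing spaces are not handled.

```shell
# Helper: append "--with-NAME=DIR" to $cmd when DIR is non-empty.
append_opt() {
    if [ -n "$2" ]; then
        cmd="$cmd --with-$1=$2"
    fi
}

# configure_command
# Prints a ./configure command line built from whichever of the
# (hypothetical) location variables are set in the environment.
configure_command() {
    cmd="./configure"
    append_opt pcap-libs      "${PCAP_LIBS:-}"
    append_opt pcap-headers   "${PCAP_HEADERS:-}"
    append_opt openssl-libs   "${OPENSSL_LIBS:-}"
    append_opt snmp-libs      "${SNMP_LIBS:-}"
    append_opt snmp-headers   "${SNMP_HEADERS:-}"
    append_opt gsoap-imports  "${GSOAP_IMPORTS:-}"
    append_opt gsoap-libs     "${GSOAP_LIBS:-}"
    append_opt gsoap-headers  "${GSOAP_HEADERS:-}"
    append_opt wsdl2h-path    "${WSDL2H_PATH:-}"
    append_opt soapcpp2-path  "${SOAPCPP2_PATH:-}"
    append_opt ruby-libs      "${RUBY_LIBS:-}"
    append_opt ruby-headers   "${RUBY_HEADERS:-}"
    echo "$cmd"
}
```

With GSOAP_IMPORTS and RUBY_HEADERS set to the two directories from the example above, configure_command prints the same command on a single line. Review the output before running it.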

After the configure script has completed, compile the crawler by running 'make'. Once compilation has finished, you can install it by becoming root and running 'make install'. The Paglo Crawler will be installed in $PREFIX/sbin (e.g., /usr/local/sbin if --prefix wasn't used).

Running

To run the crawler you need to execute it as root, and tell it where its configuration file is. For example:

/opt/paglo/paglo_crawler/sbin/paglo_crawler -c /opt/paglo/paglo_crawler/etc/crawler.conf
The crawler must be run as root so that it can open a network interface in promiscuous mode.

By default the crawler will run as a Unix daemon. You can run it in interactive mode by specifying

enable_cli=true
in the crawler's configuration file. Please refer to the README file that is part of the Paglo Crawler's source code for what you can do via the crawler's CLI.
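
To see which mode a given configuration file would select, a check along the following lines may help. It assumes the file is line-oriented and that the setting appears as enable_cli=true at the start of a line. Allowing spaces around the "=" is an added tolerance, not something the crawler is known to accept. The function name is hypothetical.

```shell
# crawler_mode CONF
# Prints "interactive" when CONF contains an enable_cli=true line
# (not prefixed by any other character), otherwise prints "daemon".
crawler_mode() {
    if grep -q '^[[:space:]]*enable_cli[[:space:]]*=[[:space:]]*true[[:space:]]*$' "$1" 2>/dev/null; then
        echo "interactive"
    else
        echo "daemon"
    fi
}
```

For example, crawler_mode /opt/paglo/paglo_crawler/etc/crawler.conf reports "daemon" for a file that does not enable the CLI.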