Introduction

hwloc provides command line tools and a C API to obtain the
hierarchical map of key computing elements, such as: NUMA memory nodes,
shared caches, processor sockets, processor cores, processing units
(logical processors or "threads") and even I/O devices. hwloc also
gathers various attributes such as cache and memory information, and is
portable across a variety of different operating systems and platforms.
Additionally it may assemble the topologies of multiple machines into a
single one so as to let applications consult the topology of an entire
fabric or cluster at once.

hwloc primarily aims at helping high-performance computing (HPC)
applications, but is also applicable to any project seeking to exploit
code and/or data locality on modern computing platforms.

Note that the hwloc project represents the merger of the libtopology
project from inria and the Portable Linux Processor Affinity (PLPA)
sub-project from Open MPI. Both of these prior projects are now
deprecated. The first hwloc release was essentially a "re-branding" of
the libtopology code base, but with both a few genuinely new features
and a few PLPA-like features added in. Prior releases of hwloc included
documentation about switching from PLPA to hwloc; this documentation
has been dropped on the assumption that everyone who was using PLPA has
already switched to hwloc.

hwloc supports the following operating systems:
  * Linux (including old kernels not having sysfs topology information,
    with knowledge of cpusets, offline CPUs, ScaleMP vSMP, NumaScale
    NumaConnect, and Kerrighed support) on all supported hardware,
    including Intel Xeon Phi.
  * Solaris
  * AIX
  * Darwin / OS X
  * FreeBSD and its variants (such as kFreeBSD/GNU)
  * NetBSD
  * OSF/1 (a.k.a., Tru64)
  * HP-UX
  * Microsoft Windows
  * IBM BlueGene/Q Compute Node Kernel (CNK)

Because it uses standard operating system information, hwloc's support
is mostly independent of the processor type (x86, powerpc, ...) and
relies only on operating system support. The only exception is
kFreeBSD, which does not provide topology information; there hwloc
uses an x86-only CPUID-based backend (which can also be used on other
OSes, see the Components and plugins section).

To check whether hwloc works on a particular machine, just try to build
it and run lstopo or lstopo-no-graphics. If some things do not look
right (e.g. bogus or missing cache information), see Questions and Bugs
below.

hwloc only reports the number of processors on unsupported operating
systems; no topology information is available.

For development and debugging purposes, hwloc also offers the ability
to work on "fake" topologies:
  * Symmetrical tree of resources generated from a list of level
    arities
  * Remote machine simulation through the gathering of Linux sysfs
    topology files
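
As a rough illustration of the first kind of fake topology, a list of
level arities fully determines how many objects sit at each level of a
symmetrical tree. The sketch below is plain C and does not call hwloc;
the function name count_objects_per_level is hypothetical and only
used for illustration.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper, not part of hwloc: given the arity of each
 * level of a symmetrical tree, fill counts[0..nlevels] with the number
 * of objects at each depth. counts[0] is the single root and
 * counts[d + 1] is counts[d] * arities[d]. The caller must provide
 * nlevels + 1 slots in counts.
 * Returns the total number of objects in the tree. */
unsigned long count_objects_per_level(const unsigned *arities,
                                      size_t nlevels,
                                      unsigned long *counts)
{
    unsigned long total = 1;
    size_t d;

    assert(counts != NULL);
    assert(arities != NULL || nlevels == 0);

    counts[0] = 1;
    for (d = 0; d < nlevels; d++) {
        counts[d + 1] = counts[d] * arities[d];
        total += counts[d + 1];
    }
    return total;
}
```

With arities {4, 2, 2} (4 sockets, 2 cores per socket, 2 PUs per core)
the counts are 1, 4, 8 and 16, which matches the socket, core and PU
counts of the first machine in CLI Examples.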

hwloc can display the topology in a human-readable format, either in
graphical mode (X11), or by exporting in one of several different
formats, including: plain text, PDF, PNG, and FIG (see CLI Examples
below). Note that some of the export formats require additional support
libraries.

hwloc offers a programming interface for manipulating topologies and
objects. It also brings a powerful CPU bitmap API that is used to
describe the location of topology objects on physical/logical
processors. See the Programming Interface below. It may also be used to
bind applications onto certain cores or memory nodes. Several utility
programs are also provided to ease command-line manipulation of
topology objects, binding of processes, and so on.

Perl bindings are available from Bernd Kallies on CPAN.

Python bindings are available from Guy Streeter:
  * Fedora RPM and tarball.
  * git tree (html).

Installation

hwloc (http://www.open-mpi.org/projects/hwloc/) is available under the
BSD license. It is hosted as a sub-project of the overall Open MPI
project (http://www.open-mpi.org/). Note that hwloc does not require
any functionality from Open MPI -- it is a wholly separate (and much
smaller!) project and code base. It just happens to be hosted as part
of the overall Open MPI project.

Nightly development snapshots are available on the web site.
Additionally, the code can be directly cloned from Git:
shell$ git clone https://github.com/open-mpi/hwloc.git
shell$ cd hwloc
shell$ ./autogen.sh

Note that GNU Autoconf >=2.63, Automake >=1.10 and Libtool >=2.2.6 are
required when building from a Git clone.

Installation by itself is the fairly common GNU-based process:
shell$ ./configure --prefix=...
shell$ make
shell$ make install

The hwloc command-line tool "lstopo" produces human-readable topology
maps, as mentioned above. It can also export maps to the "fig" file
format. Support for PDF, Postscript, and PNG exporting is provided if
the "Cairo" development package (usually cairo-devel or libcairo2-dev)
is found when hwloc is configured and built.

The hwloc core may also benefit from the following development
packages:
  * libnuma for memory binding and migration support on Linux
    (numactl-devel or libnuma-dev package).
  * hwloc can use one of two different libraries for full I/O device
    discovery:
      1. libpciaccess (BSD). The relevant development package is
         usually libpciaccess-devel or libpciaccess-dev.
      2. libpci, from the pciutils package (GPL). The relevant
         development package is usually pciutils-devel or libpci-dev.
    On Linux, PCI discovery may still be performed even if none of the
    above libraries can be used.
  * the AMD OpenCL implementation for OpenCL device discovery.
  * the NVIDIA CUDA Toolkit for CUDA device discovery.
  * the NVIDIA Tesla Development Kit for NVML device discovery.
  * the NV-CONTROL X extension library (NVCtrl) for NVIDIA display
    discovery.
  * libxml2 for full XML import/export support (otherwise, the internal
    minimalistic parser will only be able to import XML files that were
    exported by the same hwloc release). See Importing and exporting
    topologies from/to XML files for details. The relevant development
    package is usually libxml2-devel or libxml2-dev.
  * libtool's ltdl library for dynamic plugin loading. The relevant
    development package is usually libtool-ltdl-devel or libltdl-dev.

PCI and XML support may be statically built inside the main hwloc
library, or as separate dynamically-loaded plugins (see the Components
and plugins section).

Note that because of the possibility of GPL taint (remember that hwloc
is BSD-licensed), hwloc's configure script will prefer libpciaccess to
the pciutils package. Indeed, if libpciaccess is not found, hwloc will
not use pciutils unless it is specifically requested via the
--enable-libpci flag.

Also note that if you install supplemental libraries in non-standard
locations, hwloc's configure script may not be able to find them
without some help. You may need to specify additional CPPFLAGS,
LDFLAGS, or PKG_CONFIG_PATH values on the configure command line.

For example, if libpciaccess was installed into /opt/pciaccess, hwloc's
configure script may not find it by default. Try adding PKG_CONFIG_PATH
to the ./configure command line, like this:
./configure PKG_CONFIG_PATH=/opt/pciaccess/lib/pkgconfig ...

CLI Examples

On a 4-socket 2-core machine with hyperthreading, the lstopo tool may
show the following graphical output:

                              dudley.png

Here's the equivalent output in textual form:
Machine (16GB)
  Socket L#0 + L3 L#0 (4096KB)
    L2 L#0 (1024KB) + L1 L#0 (16KB) + Core L#0
      PU L#0 (P#0)
      PU L#1 (P#8)
    L2 L#1 (1024KB) + L1 L#1 (16KB) + Core L#1
      PU L#2 (P#4)
      PU L#3 (P#12)
  Socket L#1 + L3 L#1 (4096KB)
    L2 L#2 (1024KB) + L1 L#2 (16KB) + Core L#2
      PU L#4 (P#1)
      PU L#5 (P#9)
    L2 L#3 (1024KB) + L1 L#3 (16KB) + Core L#3
      PU L#6 (P#5)
      PU L#7 (P#13)
  Socket L#2 + L3 L#2 (4096KB)
    L2 L#4 (1024KB) + L1 L#4 (16KB) + Core L#4
      PU L#8 (P#2)
      PU L#9 (P#10)
    L2 L#5 (1024KB) + L1 L#5 (16KB) + Core L#5
      PU L#10 (P#6)
      PU L#11 (P#14)
  Socket L#3 + L3 L#3 (4096KB)
    L2 L#6 (1024KB) + L1 L#6 (16KB) + Core L#6
      PU L#12 (P#3)
      PU L#13 (P#11)
    L2 L#7 (1024KB) + L1 L#7 (16KB) + Core L#7
      PU L#14 (P#7)
      PU L#15 (P#15)

Finally, here's the equivalent output in XML. Long lines were
artificially broken for document clarity (in the real output, each XML
tag is on a single line), and only socket #0 is shown for brevity:
<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE topology SYSTEM "hwloc.dtd">
<topology>
  <object type="Machine" os_index="0" cpuset="0x0000ffff"
   complete_cpuset="0x0000ffff" online_cpuset="0x0000ffff"
   allowed_cpuset="0x0000ffff"
   dmi_board_vendor="Dell Computer Corporation" dmi_board_name="0RD318"
   local_memory="16648183808">
 <page_type size="4096" count="4064498"/>
 <page_type size="2097152" count="0"/>
 <object type="Socket" os_index="0" cpuset="0x00001111" ... >
   <object type="Cache" cpuset="0x00001111" ...
       cache_size="4194304" depth="3" cache_linesize="64">
     <object type="Cache" cpuset="0x00000101" ...
         cache_size="1048576" depth="2" cache_linesize="64">
       <object type="Cache" cpuset="0x00000101" ...
           cache_size="16384" depth="1" cache_linesize="64">
         <object type="Core" os_index="0" ... >
           <object type="PU" os_index="0" cpuset="0x00000001"
               complete_cpuset="0x00000001" online_cpuset="0x00000001"
               allowed_cpuset="0x00000001"/>
           <object type="PU" os_index="8" cpuset="0x00000100"
               complete_cpuset="0x00000100" online_cpuset="0x00000100"
               allowed_cpuset="0x00000100"/>
         </object>
       </object>
     </object>
     <object type="Cache" cpuset="0x00001010" ...
         cache_size="1048576" depth="2" cache_linesize="64">
       <object type="Cache" cpuset="0x00001010"
           cache_size="16384" depth="1" cache_linesize="64">
         <object type="Core" os_index="1" cpuset="0x00001010" ... >
           <object type="PU" os_index="4" cpuset="0x00000010"
               complete_cpuset="0x00000010" online_cpuset="0x00000010"
               allowed_cpuset="0x00000010"/>
           <object type="PU" os_index="12" cpuset="0x00001000"
               complete_cpuset="0x00001000" online_cpuset="0x00001000"
               allowed_cpuset="0x00001000"/>
         </object>
       </object>
     </object>
   </object>
 </object>
 <!-- ...other sockets listed here ... -->
  </object>
</topology>
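
The cpuset attributes in the XML above are hexadecimal masks in which
bit N stands for the PU with os_index N. Here is a minimal sketch, in
plain C and without hwloc, of checking that a child object's mask is
included in its parent's. The names parse_cpuset_mask,
mask_is_included and mask_has_pu are hypothetical, and the sketch
assumes that the mask fits in one unsigned long, which real cpusets
need not.

```c
#include <assert.h>
#include <limits.h>
#include <stdlib.h>

/* Hypothetical helper: convert a mask string such as "0x00001111"
 * into an integer. strtoul with base 16 accepts the "0x" prefix. */
unsigned long parse_cpuset_mask(const char *text)
{
    assert(text != NULL);
    return strtoul(text, NULL, 16);
}

/* Return 1 if every bit set in child is also set in parent. */
int mask_is_included(unsigned long child, unsigned long parent)
{
    return (child & ~parent) == 0;
}

/* Return 1 if the PU with the given OS index is in the mask.
 * Indexes beyond the width of unsigned long are reported as absent. */
int mask_has_pu(unsigned long mask, unsigned os_index)
{
    return os_index < sizeof(mask) * CHAR_BIT
           && ((mask >> os_index) & 1UL) != 0;
}
```

For socket #0 above, the mask 0x00001111 contains PUs 0, 4, 8 and 12,
and the L2 cache mask 0x00000101 is included in it, whereas a mask such
as 0x00000202 would not be.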

On a 4-socket 2-core Opteron NUMA machine, the lstopo tool may show the
following graphical output:

                              hagrid.png

Here's the equivalent output in textual form:
Machine (32GB)
  NUMANode L#0 (P#0 8190MB) + Socket L#0
    L2 L#0 (1024KB) + L1 L#0 (64KB) + Core L#0 + PU L#0 (P#0)
    L2 L#1 (1024KB) + L1 L#1 (64KB) + Core L#1 + PU L#1 (P#1)
  NUMANode L#1 (P#1 8192MB) + Socket L#1
    L2 L#2 (1024KB) + L1 L#2 (64KB) + Core L#2 + PU L#2 (P#2)
    L2 L#3 (1024KB) + L1 L#3 (64KB) + Core L#3 + PU L#3 (P#3)
  NUMANode L#2 (P#2 8192MB) + Socket L#2
    L2 L#4 (1024KB) + L1 L#4 (64KB) + Core L#4 + PU L#4 (P#4)
    L2 L#5 (1024KB) + L1 L#5 (64KB) + Core L#5 + PU L#5 (P#5)
  NUMANode L#3 (P#3 8192MB) + Socket L#3
    L2 L#6 (1024KB) + L1 L#6 (64KB) + Core L#6 + PU L#6 (P#6)
    L2 L#7 (1024KB) + L1 L#7 (64KB) + Core L#7 + PU L#7 (P#7)

And here's the equivalent output in XML. Similar to above, line breaks
were added and only PU #0 is shown for brevity:
<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE topology SYSTEM "hwloc.dtd">
<topology>
  <object type="Machine" os_index="0" cpuset="0x000000ff"
   complete_cpuset="0x000000ff" online_cpuset="0x000000ff"
   allowed_cpuset="0x000000ff" nodeset="0x000000ff"
   complete_nodeset="0x000000ff" allowed_nodeset="0x000000ff"
   dmi_board_vendor="TYAN Computer Corp" dmi_board_name="S4881 ">
 <page_type size="4096" count="0"/>
 <page_type size="2097152" count="0"/>
 <object type="NUMANode" os_index="0" cpuset="0x00000003" ...
     nodeset="0x00000001" ... local_memory="7514177536">
   <page_type size="4096" count="1834516"/>
   <page_type size="2097152" count="0"/>
   <object type="Socket" os_index="0" cpuset="0x00000003" ... >
     <object type="Cache" cpuset="0x00000001" ...
         cache_size="1048576" depth="2" cache_linesize="64">
       <object type="Cache" cpuset="0x00000001" ...
           cache_size="65536" depth="1" cache_linesize="64">
         <object type="Core" os_index="0" ... >
           <object type="PU" os_index="0" cpuset="0x00000001"
               complete_cpuset="0x00000001" online_cpuset="0x00000001"
               allowed_cpuset="0x00000001" nodeset="0x00000001"
               complete_nodeset="0x00000001" allowed_nodeset="0x00000001"/>
         </object>
       </object>
     </object>
  <!-- ...more objects listed here ... -->
</topology>

On a 2-socket quad-core Xeon (pre-Nehalem, with 2 dual-core dies in
each socket):

                              emmett.png

Here's the same output in textual form:
Machine (16GB)
  Socket L#0
    L2 L#0 (4096KB)
      L1 L#0 (32KB) + Core L#0 + PU L#0 (P#0)
      L1 L#1 (32KB) + Core L#1 + PU L#1 (P#4)
    L2 L#1 (4096KB)
      L1 L#2 (32KB) + Core L#2 + PU L#2 (P#2)
      L1 L#3 (32KB) + Core L#3 + PU L#3 (P#6)
  Socket L#1
    L2 L#2 (4096KB)
      L1 L#4 (32KB) + Core L#4 + PU L#4 (P#1)
      L1 L#5 (32KB) + Core L#5 + PU L#5 (P#5)
    L2 L#3 (4096KB)
      L1 L#6 (32KB) + Core L#6 + PU L#6 (P#3)
      L1 L#7 (32KB) + Core L#7 + PU L#7 (P#7)

And the same output in XML (line breaks added, only PU #0 shown):
<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE topology SYSTEM "hwloc.dtd">
<topology>
  <object type="Machine" os_index="0" cpuset="0x000000ff"
   complete_cpuset="0x000000ff" online_cpuset="0x000000ff"
   allowed_cpuset="0x000000ff" dmi_board_vendor="Dell Inc."
   dmi_board_name="0NR282" local_memory="16865292288">
 <page_type size="4096" count="4117503"/>
 <page_type size="2097152" count="0"/>
 <object type="Socket" os_index="0" cpuset="0x00000055" ... >
   <object type="Cache" cpuset="0x00000011" ...
       cache_size="4194304" depth="2" cache_linesize="64">
     <object type="Cache" cpuset="0x00000001" ...
         cache_size="32768" depth="1" cache_linesize="64">
       <object type="Core" os_index="0" ... >
         <object type="PU" os_index="0" cpuset="0x00000001"
             complete_cpuset="0x00000001" online_cpuset="0x00000001"
             allowed_cpuset="0x00000001"/>
       </object>
     </object>
     <object type="Cache" cpuset="0x00000010" ...
         cache_size="32768" depth="1" cache_linesize="64">
       <object type="Core" os_index="1" ... >
         <object type="PU" os_index="4" cpuset="0x00000010" ...
             complete_cpuset="0x00000010" online_cpuset="0x00000010"
             allowed_cpuset="0x00000010"/>
       </object>
     </object>
   </object>
  <!-- ...more objects listed here ... -->
</topology>

Programming Interface

The basic interface is available in hwloc.h. Some higher-level
functions are available in hwloc/helper.h to reduce the need to
manually manipulate objects and follow links between them.
Documentation for all these is provided later in this document.
Developers may also want to look at hwloc/inlines.h which contains the
actual inline code of some hwloc.h routines, and at this document,
which provides good higher-level topology traversal examples.

To precisely define the vocabulary used by hwloc, a Terms and
Definitions section is available and should probably be read first.

Each hwloc object contains a cpuset describing the list of processing
units that it contains. These bitmaps may be used for CPU binding and
Memory binding. hwloc offers an extensive bitmap manipulation interface
in hwloc/bitmap.h.
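
To give a feel for the kind of operations such a bitmap interface
covers, here is a minimal sketch that works on a plain unsigned long
rather than on hwloc's own bitmap type. The function names are
hypothetical, and keeping the lowest set bit is this sketch's own
choice for reducing a set to a single processing unit.

```c
#include <assert.h>

/* Hypothetical helper: number of processing units in the mask. */
unsigned mask_weight(unsigned long mask)
{
    unsigned n = 0;

    while (mask != 0) {
        mask &= mask - 1;   /* clear the lowest set bit */
        n++;
    }
    return n;
}

/* Hypothetical helper: index of the lowest set bit, or -1 if the
 * mask is empty. */
int mask_first(unsigned long mask)
{
    int i = 0;

    if (mask == 0)
        return -1;
    while ((mask & 1UL) == 0) {
        mask >>= 1;
        i++;
    }
    return i;
}

/* Hypothetical helper: keep only the lowest set bit, so that the
 * result designates at most one processing unit. */
unsigned long mask_keep_one(unsigned long mask)
{
    unsigned long one = mask & (~mask + 1UL);

    assert(mask_weight(one) <= 1);
    return one;
}
```

Core #1 of the first CLI example has the mask 0x00001010 (PUs 4 and
12); keeping one PU leaves 0x00000010. The API example in this document
performs a comparable step with hwloc_bitmap_singlify before binding.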

Moreover, hwloc also comes with additional helpers for interoperability
with several commonly used environments. See the Interoperability With
Other Software section for details.

The complete API documentation is available in a full set of HTML
pages, man pages, and self-contained PDF files (formatted for both
US letter and A4 formats) in the source tarball in doc/doxygen-doc/.

NOTE: If you are building the documentation from a Git clone, you will
need to have Doxygen and pdflatex installed -- the documentation will
be built during the normal "make" process. The documentation is
installed during "make install" to $prefix/share/doc/hwloc/ and your
system's default man page tree (under $prefix, of course).

Portability

As shown in CLI Examples, hwloc can obtain information on a wide
variety of hardware topologies. However, some platforms and/or
operating system versions will only report a subset of this
information. For example, on a PPC64-based system with 32 cores (each
with 2 hardware threads) running a default 2.6.18-based kernel from
RHEL 5.4, hwloc is only able to glean information about NUMA nodes and
processor units (PUs). No information about caches, sockets, or cores
is available.

Similarly, operating systems have varying support for CPU and memory
binding. Some provide interfaces for all kinds of CPU and memory
binding, others provide interfaces for only a limited number of kinds,
and some do not provide any binding interface at all. In the
unsupported cases, hwloc's binding functions simply return the ENOSYS
error (Function not implemented), meaning that the underlying operating
system does not provide any interface for them. The CPU binding and
Memory binding sections provide more information on which hwloc binding
functions should be preferred because interfaces for them are usually
available on the supported operating systems.
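
A portable caller can therefore treat ENOSYS as "binding is not
available here" rather than as a fatal error. The sketch below shows
that pattern in plain C. The names try_bind, bind_fn, bind_outcome and
the three example callbacks are hypothetical; the callback stands in
for whichever hwloc binding function the application uses, on the
assumption (as in the API example in this document) that it returns
nonzero and sets errno on failure.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Hypothetical outcome codes for this sketch. */
enum bind_outcome { BIND_OK, BIND_UNSUPPORTED, BIND_FAILED };

/* Stand-in for a binding call: returns 0 on success, nonzero with
 * errno set on failure. */
typedef int (*bind_fn)(void *arg);

/* Run the binding call and separate "the OS offers no such interface"
 * (ENOSYS) from any other failure. */
enum bind_outcome try_bind(bind_fn fn, void *arg)
{
    assert(fn != NULL);
    errno = 0;
    if (fn(arg) == 0)
        return BIND_OK;
    return errno == ENOSYS ? BIND_UNSUPPORTED : BIND_FAILED;
}

/* Example callbacks that only simulate the three situations. */
int bind_always_ok(void *arg)
{
    (void)arg;
    return 0;
}

int bind_not_implemented(void *arg)
{
    (void)arg;
    errno = ENOSYS;
    return -1;
}

int bind_bad_argument(void *arg)
{
    (void)arg;
    errno = EINVAL;
    return -1;
}
```

An application would typically continue unbound, perhaps with a
warning, on BIND_UNSUPPORTED, and report the errno text on BIND_FAILED.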

Here's the graphical output from lstopo on this platform when
Simultaneous Multi-Threading (SMT) is enabled:

                          ppc64-with-smt.png

And here's the graphical output from lstopo on this platform when SMT
is disabled:

                         ppc64-without-smt.png

Notice that hwloc only sees half the PUs when SMT is disabled. PU #15,
for example, seems to change location from NUMA node #0 to #1. In
reality, no PUs "moved" -- they were simply re-numbered when hwloc only
saw half as many. Hence, PU #15 in the SMT-disabled picture probably
corresponds to PU #30 in the SMT-enabled picture.
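
The renumbering can be written down as simple arithmetic. This is a
worked sketch in plain C, not hwloc code. It assumes that PUs are
numbered consecutively within each core when SMT is enabled, and that
disabling SMT keeps the first hardware thread of every core. Both are
assumptions about this platform, which is why the text above says
"probably". The function names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: index that a PU seen with SMT disabled would
 * have had with SMT enabled, given threads_per_core hardware threads
 * per core (2 on the PPC64 system described above). */
unsigned pu_index_with_smt(unsigned index_without_smt,
                           unsigned threads_per_core)
{
    assert(threads_per_core > 0);
    return index_without_smt * threads_per_core;
}

/* Hypothetical helper: returns 1 and stores the new index if the PU
 * stays visible when SMT is disabled (it is the first thread of its
 * core), or returns 0 if it disappears. */
int pu_index_without_smt(unsigned index_with_smt,
                         unsigned threads_per_core,
                         unsigned *index_out)
{
    assert(threads_per_core > 0);
    assert(index_out != NULL);
    if (index_with_smt % threads_per_core != 0)
        return 0;
    *index_out = index_with_smt / threads_per_core;
    return 1;
}
```

Under these assumptions, PU #15 without SMT maps to PU #30 with SMT,
while PU #31 with SMT has no counterpart once SMT is disabled.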

This same "PUs have disappeared" effect can be seen on other platforms
-- even platforms / OSs that provide much more information than the
above PPC64 system. This is an unfortunate side-effect of how operating
systems report information to hwloc.

Note that after upgrading the Linux kernel on the same PPC64 system
mentioned above to 2.6.34, hwloc is able to discover all the topology
information. The following picture shows the entire topology layout
when SMT is enabled:

                        ppc64-full-with-smt.png

Developers using the hwloc API or XML output for portable applications
should therefore be extremely careful to not make any assumptions about
the structure of data that is returned. For example, per the above
reported PPC topology, it is not safe to assume that PUs will always be
descendants of cores.
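
One defensive pattern is to look for an ancestor of a given type and
accept that there may be none. The sketch below uses a toy node
structure, not hwloc's object type; toy_obj, toy_type and find_ancestor
are hypothetical names. Only the parent link mirrors what the API
example in this document relies on (obj->parent).

```c
#include <assert.h>
#include <stddef.h>

/* Toy stand-ins for topology objects; these are not hwloc types. */
enum toy_type { TOY_MACHINE, TOY_NUMANODE, TOY_CORE, TOY_PU };

struct toy_obj {
    enum toy_type type;
    struct toy_obj *parent;   /* NULL for the root */
};

/* Walk up from obj and return the nearest ancestor of the wanted
 * type, or NULL if the topology has no such level above obj. */
struct toy_obj *find_ancestor(struct toy_obj *obj, enum toy_type wanted)
{
    assert(obj != NULL);
    for (obj = obj->parent; obj != NULL; obj = obj->parent) {
        if (obj->type == wanted)
            return obj;
    }
    return NULL;
}
```

On a tree shaped like the reduced PPC64 topology (machine, NUMA nodes,
PUs), asking for the core above a PU yields NULL, and the caller has to
fall back to another level instead of dereferencing the result.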

Additionally, future hardware may insert new topology elements that are
not available in this version of hwloc. Long-lived applications that
are meant to span multiple different hardware platforms should also be
careful about making structure assumptions. For example, there may
someday be an element "lower" than a PU, or perhaps a new element may
exist between a core and a PU.

API Example

The following small C example (named "hwloc-hello.c") prints the
topology of the machine and binds the process to one logical processor
of the last core of the machine.
/* Example hwloc API program.
 *
 * Copyright (c) 2009-2010 inria.  All rights reserved.
 * Copyright (c) 2009-2011 Université Bordeaux 1
 * Copyright (c) 2009-2010 Cisco Systems, Inc.  All rights reserved.
 * See COPYING in top-level directory.
 *
 * hwloc-hello.c
 */

#include <hwloc.h>
#include <errno.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

static void print_children(hwloc_topology_t topology, hwloc_obj_t obj,
                        int depth)
{
 char string[128];
 unsigned i;

 hwloc_obj_snprintf(string, sizeof(string), topology, obj, "#", 0);
 printf("%*s%s\n", 2*depth, "", string);
 for (i = 0; i < obj->arity; i++) {
     print_children(topology, obj->children[i], depth + 1);
 }
}

int main(void)
{
 int depth;
 unsigned i, n;
 unsigned long size;
 int levels;
 char string[128];
 int topodepth;
 hwloc_topology_t topology;
 hwloc_cpuset_t cpuset;
 hwloc_obj_t obj;

 /* Allocate and initialize topology object. */
 hwloc_topology_init(&topology);

 /* ... Optionally, put detection configuration here to ignore
    some objects types, define a synthetic topology, etc....

    The default is to detect all the objects of the machine that
    the caller is allowed to access.  See Configure Topology
    Detection. */

 /* Perform the topology detection. */
 hwloc_topology_load(topology);

 /* Optionally, get some additional topology information
    in case we need the topology depth later. */
 topodepth = hwloc_topology_get_depth(topology);

 /*****************************************************************
  * First example:
  * Walk the topology with an array style, from level 0 (always
  * the system level) to the lowest level (always the proc level).
  *****************************************************************/
 for (depth = 0; depth < topodepth; depth++) {
     printf("*** Objects at level %d\n", depth);
     for (i = 0; i < hwloc_get_nbobjs_by_depth(topology, depth);
          i++) {
         hwloc_obj_snprintf(string, sizeof(string), topology,
                    hwloc_get_obj_by_depth(topology, depth, i),
                    "#", 0);
         printf("Index %u: %s\n", i, string);
     }
 }

 /*****************************************************************
  * Second example:
  * Walk the topology with a tree style.
  *****************************************************************/
 printf("*** Printing overall tree\n");
 print_children(topology, hwloc_get_root_obj(topology), 0);

 /*****************************************************************
  * Third example:
  * Print the number of sockets.
  *****************************************************************/
 depth = hwloc_get_type_depth(topology, HWLOC_OBJ_SOCKET);
 if (depth == HWLOC_TYPE_DEPTH_UNKNOWN) {
     printf("*** The number of sockets is unknown\n");
 } else {
     printf("*** %u socket(s)\n",
            hwloc_get_nbobjs_by_depth(topology, depth));
 }

 /*****************************************************************
  * Fourth example:
  * Compute the amount of cache that the first logical processor
  * has above it.
  *****************************************************************/
 levels = 0;
 size = 0;
 for (obj = hwloc_get_obj_by_type(topology, HWLOC_OBJ_PU, 0);
      obj;
      obj = obj->parent)
   if (obj->type == HWLOC_OBJ_CACHE) {
     levels++;
     size += obj->attr->cache.size;
   }
 printf("*** Logical processor 0 has %d caches totaling %luKB\n",
        levels, size / 1024);

 /*****************************************************************
  * Fifth example:
  * Bind to only one thread of the last core of the machine.
  *
  * First find out where cores are, or else smaller sets of CPUs if
  * the OS doesn't have the notion of a "core".
  *****************************************************************/
 depth = hwloc_get_type_or_below_depth(topology, HWLOC_OBJ_CORE);

 /* Get last core. */
 obj = hwloc_get_obj_by_depth(topology, depth,
                hwloc_get_nbobjs_by_depth(topology, depth) - 1);
 if (obj) {
     /* Get a copy of its cpuset that we may modify. */
     cpuset = hwloc_bitmap_dup(obj->cpuset);

     /* Get only one logical processor (in case the core is
        SMT/hyperthreaded). */
     hwloc_bitmap_singlify(cpuset);

     /* And try to bind ourself there. */
     if (hwloc_set_cpubind(topology, cpuset, 0)) {
         char *str;
         int error = errno;
         hwloc_bitmap_asprintf(&str, obj->cpuset);
         printf("Couldn't bind to cpuset %s: %s\n", str, strerror(error));
         free(str);
     }

     /* Free our cpuset copy */
     hwloc_bitmap_free(cpuset);
 }

 /*****************************************************************
  * Sixth example:
  * Allocate some memory on the last NUMA node, bind some existing
  * memory to the last NUMA node.
  *****************************************************************/
 /* Get last node. */
 n = hwloc_get_nbobjs_by_type(topology, HWLOC_OBJ_NODE);
 if (n) {
     void *m;
     size = 1024*1024;

     obj = hwloc_get_obj_by_type(topology, HWLOC_OBJ_NODE, n - 1);
     m = hwloc_alloc_membind_nodeset(topology, size, obj->nodeset,
             HWLOC_MEMBIND_DEFAULT, 0);
     hwloc_free(topology, m, size);

     m = malloc(size);
     hwloc_set_area_membind_nodeset(topology, m, size, obj->nodeset,
             HWLOC_MEMBIND_DEFAULT, 0);
     free(m);
 }

 /* Destroy topology object. */
 hwloc_topology_destroy(topology);

 return 0;
}

hwloc provides a pkg-config file to obtain relevant compiler and
linker flags. For example, it can be used as follows to compile
applications that utilize the hwloc library (assuming GNU Make):
CFLAGS += $(shell pkg-config --cflags hwloc)
LDLIBS += $(shell pkg-config --libs hwloc)

hwloc-hello: hwloc-hello.c
	cc hwloc-hello.c $(CFLAGS) -o hwloc-hello $(LDLIBS)

On a machine with 4GB of RAM and 2 processor sockets -- each socket of
which has two processing cores -- the output from running hwloc-hello
could be something like the following:
shell$ ./hwloc-hello
*** Objects at level 0
Index 0: Machine(3938MB)
*** Objects at level 1
Index 0: Socket#0
Index 1: Socket#1
*** Objects at level 2
Index 0: Core#0
Index 1: Core#1
Index 2: Core#3
Index 3: Core#2
*** Objects at level 3
Index 0: PU#0
Index 1: PU#1
Index 2: PU#2
Index 3: PU#3
*** Printing overall tree
Machine(3938MB)
  Socket#0
    Core#0
      PU#0
    Core#1
      PU#1
  Socket#1
    Core#3
      PU#2
    Core#2
      PU#3
*** 2 socket(s)
shell$

Questions and Bugs

Questions should be sent to the devel mailing list
(http://www.open-mpi.org/community/lists/hwloc.php). Bugs should be
reported in the tracker (https://github.com/open-mpi/hwloc/issues).

If hwloc discovers an incorrect topology for your machine, the very
first thing you should check is that you have the most recent updates
installed for your operating system. Indeed, most of hwloc's topology
discovery relies on hardware information retrieved through the
operating system (e.g., via the /sys virtual filesystem of the Linux
kernel). If upgrading your OS or Linux kernel does not solve your
problem, you may also want to ensure that you are running the most
recent version of the BIOS for your machine.

If those things fail, contact us on the mailing list for additional
help. Please attach the output of lstopo after having given the
--enable-debug option to ./configure and rebuilt completely, to get
debugging output. Also attach the /proc + /sys tarball generated by the
installed script hwloc-gather-topology when submitting problems about
Linux, or send the output of kstat cpu_info in the Solaris case, or the
output of sysctl hw in the Darwin or BSD cases.

History / Credits

hwloc is the evolution and merger of the libtopology
(http://runtime.bordeaux.inria.fr/libtopology/) project and the
Portable Linux Processor Affinity (PLPA)
(http://www.open-mpi.org/projects/plpa/) project. Because of functional
and ideological overlap, these two code bases and ideas were merged and
released under the name "hwloc" as an Open MPI sub-project.

libtopology was initially developed by the inria Runtime Team-Project
(http://runtime.bordeaux.inria.fr/), headed by Raymond Namyst
(http://dept-info.labri.fr/~namyst/). PLPA was initially developed by
the Open MPI development team as a sub-project. Both are now deprecated
in favor of hwloc, which is distributed as an Open MPI sub-project.

Further Reading

The documentation chapters include
  * Terms and Definitions
  * Command-Line Tools
  * Environment Variables
  * CPU and Memory Binding Overview
  * I/O Devices
  * Multi-node Topologies
  * Object attributes
  * Importing and exporting topologies from/to XML files
  * Synthetic topologies
  * Interoperability With Other Software
  * Thread Safety
  * Components and plugins
  * Embedding hwloc in Other Software
  * Frequently Asked Questions

Make sure to have had a look at those too!
  __________________________________________________________________


 Generated on 2 Mar 2016 for Hardware Locality (hwloc) by  doxygen
 1.6.1
