Python interface for the C API of LIKWID (https://github.com/RRZE-HPC/likwid)

.. image:: https://travis-ci.com/RRZE-HPC/pylikwid.svg?branch=master
    :target: https://travis-ci.com/RRZE-HPC/pylikwid?branch=master

I added a setup.py script for the installation. It builds the C module and copies it to the proper destination.

::

    $ git clone https://github.com/RRZE-HPC/pylikwid.git
    $ cd pylikwid
    # Build C interface
    $ python setup.py build_ext -I <include path for likwid> -L <library path for likwid> -R <library path for likwid>
    # Install module to the proper location (the --prefix option is optional)
    $ python setup.py install [--prefix=<where to install>]
    # Testing
    $ python -c "import pylikwid"
    $ ./testlib.py

After ``import pylikwid`` you can call the following functions:

- ``pylikwid.markerinit()``: Initialize the Marker API of the LIKWID
  library. Must be called before all other functions.
- ``pylikwid.markerthreadinit()``: Add the current thread to the Marker
  API. Since Python is commonly single-threaded, simply call it directly
  after ``pylikwid.markerinit()``.
- ``rr = pylikwid.registerregion(regiontag)``: Register a region to the
  Marker API. This is an optional function to reduce the overhead of
  region registration at ``pylikwid.markerstartregion``. If you don't
  call ``pylikwid.registerregion(regiontag)``, the registration is done
  at ``pylikwid.markerstartregion(regiontag)``. On success, 0 is
  returned. If you haven't called ``pylikwid.markerinit()``, a negative
  number is returned.
- ``err = pylikwid.markerstartregion(regiontag)``: Start measurements
  under the name ``regiontag``. On success, 0 is returned. If you
  haven't called ``pylikwid.markerinit()``, a negative number is
  returned.
- ``err = pylikwid.markerstopregion(regiontag)``: Stop measurements
  under the name ``regiontag`` again. On success, 0 is returned. If you
  haven't called ``pylikwid.markerinit()``, a negative number is
  returned.
- ``num_events, events[], time, count = pylikwid.markergetregion(regiontag)``:
  Get the intermediate results of the region identified by
  ``regiontag``. On success, it returns the number of events in the
  current group, a list with all the aggregated event results, the
  measurement time for the region and the number of calls.
- ``pylikwid.nextgroup()``: Switch to the next event set in a
  round-robin fashion. If you have set only one event set on the
  command line, this function performs no operation.
- ``pylikwid.markerreset(regiontag)``: Reset the values stored using
  the region name ``regiontag``. On success, 0 is returned.
- ``pylikwid.markerclose()``: Close the connection to the LIKWID Marker
  API and write out measurement data to file. This file will be
  evaluated by ``likwid-perfctr``.
- ``pylikwid.getprocessorid()``: Return the ID of the currently
  executing CPU.
- ``pylikwid.pinprocess(cpuid)``: Pin the current process to the CPU
  given as ``cpuid``.
- ``pylikwid.pinthread(cpuid)``: Pin the current thread to the CPU
  given as ``cpuid``.
- ``pylikwid.inittopology()``: Initialize the topology module (reads in
  the system topology).
- ``infodict = pylikwid.getcpuinfo()``: Return a dict with general
  information about the system (CPU model, CPU family, ...):

  - ``osname``: Name of the CPU retrieved from the CPUID leaves
  - ``name``: Name of the micro architecture
  - ``short_name``: Short name of the micro architecture
  - ``family``: ID of the CPU family
  - ``model``: Vendor-specific model number of the CPU
  - ``stepping``: Stepping (revision) of the CPU
  - ``perf_version``: Version number of the hardware performance
    monitoring capabilities
  - ``perf_num_ctr``: Number of general-purpose counter registers per
    hardware thread
  - ``perf_num_fixed_ctr``: Number of fixed-purpose counter registers
    per hardware thread
  - ``perf_width_ctr``: Bit length of the counter registers
  - ``clock``: CPU clock (non-zero only if the timer module is
    initialized)
  - ``turbo``: Is turbo mode supported?
  - ``isIntel``: Is it an Intel CPU?
  - ``supportUncore``: Does the system have performance monitoring
    counters in the Uncore?
  - ``features``: String with performance-relevant CPU features (AVX,
    SSE, ...)
  - ``featureFlags``: Bitmask of all available CPU features

- ``topodict = pylikwid.getcputopology()``: Return a dict with the
  topology of the system. The dict contains these fields:

  - ``numSockets``: Number of CPU sockets
  - ``numHWThreads``: Number of hardware threads (physical +
    hyperthreading cores)
  - ``activeHWThreads``: Number of active/usable hardware threads
  - ``numCoresPerSocket``: Number of physical CPU cores per CPU socket
  - ``numThreadsPerCore``: Number of hardware threads in every physical
    CPU core
  - ``numCacheLevels``: Number of levels in the caching hierarchy
  - ``cacheLevels``: Dict with information about the cache levels; the
    keys are the levels (1, 2, 3, ...):

    - ``level``: Level of the cache in the hierarchy
    - ``lineSize``: Size of a cache line
    - ``sets``: Number of sets
    - ``inclusive``: Is the cache inclusive or exclusive?
    - ``threads``: Number of threads attached to the cache
    - ``associativity``: Associativity of the cache
    - ``type``: Type of the cache: data (data cache) or unified (data +
      instruction cache)
    - ``size``: Size of the cache in bytes

  - ``threadPool``: Dict with information about the hardware threads;
    the keys are the OS-generated IDs of the hardware threads:

    - ``coreId``: ID of the corresponding physical core
    - ``apicId``: ID set by the operating system
    - ``threadId``: ID of the hardware thread in the physical core
    - ``packageId``: ID of the CPU socket hosting the hardware thread

- ``pylikwid.printsupportedcpus()``: Print all supported micro
  architecture names to stdout.
- ``pylikwid.finalizetopology()``: Delete all information in the
  topology module.
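
The ``threadPool`` dict returned by ``pylikwid.getcputopology()`` is enough to see which hardware threads share a socket or a physical core, e.g. before choosing CPUs for ``pylikwid.pinthread``. The following is a minimal sketch that relies only on the ``packageId`` and ``coreId`` fields listed above; the helper name ``threads_by_socket`` is hypothetical and not part of pylikwid.

```python
from collections import defaultdict


def threads_by_socket(topodict):
    """Group hardware thread IDs by socket and physical core.

    topodict is the dict returned by pylikwid.getcputopology(); the result
    maps packageId -> coreId -> sorted list of hardware thread IDs.
    """
    grouped = defaultdict(lambda: defaultdict(list))
    for hwthread, info in topodict["threadPool"].items():
        grouped[info["packageId"]][info["coreId"]].append(hwthread)
    # Convert the nested defaultdicts into plain dicts with sorted entries
    return {
        sock: {core: sorted(threads) for core, threads in sorted(cores.items())}
        for sock, cores in sorted(grouped.items())
    }
```

On a real system you would call ``pylikwid.inittopology()`` first, pass ``pylikwid.getcputopology()`` to the helper and finish with ``pylikwid.finalizetopology()``.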

- ``numadict = pylikwid.initnuma()``: Initialize the NUMA module and
  return the gathered values:

  - ``numberOfNodes``: Number of NUMA nodes in the system
  - ``nodes``: Dict holding the information about the NUMA domains; the
    keys are the NUMA domain IDs:

    - ``id``: ID of the NUMA domain (should be equal to the dict key)
    - ``numberOfProcessors``: Number of hardware threads attached to
      the NUMA domain
    - ``processors``: List of all CPU IDs attached to the NUMA domain
    - ``freeMemory``: Amount of free memory in the NUMA domain (in
      kBytes)
    - ``totalMemory``: Amount of total memory in the NUMA domain (in
      kBytes)
    - ``numberOfDistances``: Number of distances to this and the other
      NUMA domains
    - ``distances``: List with distances; the NUMA domain IDs are the
      destination indexes in the list

- ``pylikwid.finalizenuma()``: Delete all information in the NUMA
  module.
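
A common question is which NUMA domain a CPU belongs to and which other domain is closest to it. The sketch below answers both using only the ``nodes`` fields listed above, interpreting ``distances[i]`` as the distance to NUMA domain ``i`` as described. The helper names are hypothetical.

```python
def numa_node_of_cpu(numadict, cpu):
    """Return the ID of the NUMA domain whose 'processors' list contains cpu.

    numadict is the dict returned by pylikwid.initnuma(); returns None if
    the CPU is not listed in any domain.
    """
    for node_id, node in numadict["nodes"].items():
        if cpu in node["processors"]:
            return node_id
    return None


def free_memory_ratio(numadict, node_id):
    """Fraction of free memory in a NUMA domain (both values are in kBytes)."""
    node = numadict["nodes"][node_id]
    return node["freeMemory"] / node["totalMemory"]


def nearest_other_node(numadict, node_id):
    """Return the closest other NUMA domain according to 'distances'.

    distances[i] is the distance from node_id to domain i; ties are
    broken by the lower domain ID. Returns None on single-domain systems.
    """
    distances = numadict["nodes"][node_id]["distances"]
    others = [(dist, i) for i, dist in enumerate(distances) if i != node_id]
    return min(others)[1] if others else None
```

Pass the dict from ``pylikwid.initnuma()`` to these helpers and call ``pylikwid.finalizenuma()`` when done.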

- ``affdict = pylikwid.initaffinity()``: Initialize the affinity domain
  module and return the gathered values:

  - ``numberOfAffinityDomains``: Number of affinity domains
  - ``numberOfSocketDomains``: Number of CPU-socket-related affinity
    domains
  - ``numberOfNumaDomains``: Number of NUMA-related affinity domains
  - ``numberOfCacheDomains``: Number of last-level-cache-related
    affinity domains
  - ``numberOfProcessorsPerSocket``: Number of hardware threads per
    CPU socket
  - ``numberOfCoresPerCache``: Number of physical CPU cores per last
    level cache
  - ``numberOfProcessorsPerCache``: Number of hardware threads per
    last level cache
  - ``domains``: Dict holding the information about the affinity
    domains:

    - ``tag``: Name of the affinity domain (N = node, SX = socket X,
      CY = cache Y, MZ = memory domain Z)
    - ``numberOfProcessors``: Number of hardware threads in the domain
    - ``numberOfCores``: Number of physical CPU cores in the domain
    - ``processorList``: List holding the CPU IDs in the domain

- ``pylikwid.finalizeaffinity()``: Delete all information in the
  affinity domain module.
- ``pylikwid.cpustr_to_cpulist()``: Transform a valid CPU string in
  LIKWID syntax into a list of CPU IDs.
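
The ``tag`` field makes it easy to look up the CPUs of a named affinity domain, e.g. ``S0`` for socket 0. Since the keys of ``domains`` are not documented above, this sketch searches the dict values for a matching ``tag``; the helper names ``cpus_of_domain`` and ``domain_tags`` are hypothetical.

```python
def domain_tags(affdict):
    """List the tags of all affinity domains returned by pylikwid.initaffinity()."""
    return [domain["tag"] for domain in affdict["domains"].values()]


def cpus_of_domain(affdict, tag):
    """Return the CPU IDs of the affinity domain with the given tag (e.g. "S0").

    Raises KeyError if no domain carries that tag.
    """
    for domain in affdict["domains"].values():
        if domain["tag"] == tag:
            return list(domain["processorList"])
    raise KeyError("no affinity domain named %r" % tag)
```

For example, the first CPU of socket 0 could then be handed to ``pylikwid.pinprocess``.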

- ``pylikwid.getcpuclock()``: Return the CPU clock.
- ``t_start = pylikwid.startclock()``: Start the clock and return the
  current timestamp.
- ``t_end = pylikwid.stopclock()``: Stop the clock and return the
  current timestamp.
- ``t = pylikwid.getclock(t_start, t_end)``: Return the time in seconds
  between ``t_start`` and ``t_end``.
- ``c = pylikwid.getclockcycles(t_start, t_end)``: Return the number of
  CPU cycles between ``t_start`` and ``t_end``.
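
The timer functions are used as a start/stop pair around the code of interest, so a context manager is a natural wrapper. This sketch assumes ``startclock()`` and ``stopclock()`` can be called as listed above without further setup. The module is passed in as the parameter ``lk`` (pass ``pylikwid`` itself) so the hypothetical helper ``timed`` can also be tried with a stand-in object.

```python
from contextlib import contextmanager


@contextmanager
def timed(lk, results):
    """Time the enclosed block with the LIKWID timer functions.

    lk is the pylikwid module. After the block, results["seconds"] and
    results["cycles"] hold the values of getclock() and getclockcycles().
    """
    t_start = lk.startclock()
    try:
        yield results
    finally:
        # Stop the clock even if the measured block raised an exception
        t_end = lk.stopclock()
        results["seconds"] = lk.getclock(t_start, t_end)
        results["cycles"] = lk.getclockcycles(t_start, t_end)
```

Usage: create an empty dict ``res``, run the code inside ``with timed(pylikwid, res):`` and read ``res["seconds"]`` and ``res["cycles"]`` afterwards.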

- ``pylikwid.inittemp(cpu)``: Initialize the temperature module for CPU
  ``cpu``.
- ``pylikwid.readtemp(cpu)``: Read the current temperature of CPU
  ``cpu``.
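
Because the temperature module is initialized per CPU, reading several CPUs means calling ``inittemp`` once for each of them before ``readtemp``. A minimal sketch with the hypothetical helper ``read_temperatures``, which takes the module as ``lk``; the unit of the readings is not stated above, so the values are returned unchanged.

```python
def read_temperatures(lk, cpus):
    """Initialize the temperature module for each CPU and read it once.

    lk is the pylikwid module; returns a dict {cpu: reading}.
    """
    readings = {}
    for cpu in cpus:
        lk.inittemp(cpu)
        readings[cpu] = lk.readtemp(cpu)
    return readings
```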

- ``pinfo = pylikwid.getpowerinfo()``: Initialize the energy module and
  return the gathered information. If it returns ``None``, there is no
  energy support:

  - ``minFrequency``: Minimal possible frequency of a CPU core
  - ``baseFrequency``: Base frequency of a CPU core
  - ``hasRAPL``: Are energy readings supported?
  - ``timeUnit``: Time unit
  - ``powerUnit``: Power unit
  - ``domains``: Dict holding the information about the energy domains;
    the keys are ``PKG``, ``PP0``, ``PP1`` and ``DRAM``:

    - ``ID``: ID of the energy domain
    - ``energyUnit``: Unit to convert raw register counts to uJ
    - ``supportInfo``: Is the information register available?
    - ``tdp``: TDP of the domain (only if ``supportInfo == True``)
    - ``minPower``: Minimal power consumption of the domain (only if
      ``supportInfo == True``)
    - ``maxPower``: Maximal power consumption of the domain (only if
      ``supportInfo == True``)
    - ``maxTimeWindow``: Maximal time window between updates of the
      energy registers
    - ``supportStatus``: Are energy readings from the domain possible?
    - ``supportPerf``: Is power capping etc. available?
    - ``supportPolicy``: Can a power policy be set for the domain?

- ``e_start = pylikwid.startpower(cpu, domainid)``: Return the start
  value for CPU ``cpu`` and the domain with ID ``domainid``. The
  ``domainid`` can be found in ``pinfo["domains"][domainname]["ID"]``.
- ``e_stop = pylikwid.stoppower(cpu, domainid)``: Return the stop value
  for CPU ``cpu`` and the domain with ID ``domainid``. The ``domainid``
  can be found in ``pinfo["domains"][domainname]["ID"]``.
- ``e = pylikwid.getpower(e_start, e_stop, domainid)``: Calculate the
  energy in uJ from the values retrieved by ``startpower`` and
  ``stoppower``.
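
The energy functions combine as follows: look up the domain ID in the dict from ``getpowerinfo()``, take a start and a stop value around the code, and let ``getpower`` convert the difference to uJ. This is a sketch under exactly those assumptions; the helper ``measure_energy`` is hypothetical and takes the module as ``lk``.

```python
def measure_energy(lk, cpu, domainname, func):
    """Run func() and return the energy in uJ consumed in one energy domain.

    lk is the pylikwid module, domainname a key of pinfo["domains"] such
    as "PKG". Returns None if there is no energy support or the domain
    does not allow energy readings.
    """
    pinfo = lk.getpowerinfo()
    if pinfo is None:
        return None  # getpowerinfo() returns None without energy support
    domain = pinfo["domains"][domainname]
    if not domain["supportStatus"]:
        return None  # energy readings from this domain are not possible
    domainid = domain["ID"]
    e_start = lk.startpower(cpu, domainid)
    func()
    e_stop = lk.stoppower(cpu, domainid)
    return lk.getpower(e_start, e_stop, domainid)
```

Dividing the result by 1e6 gives joules; dividing the joules by the elapsed time in seconds (e.g. from the timer functions) gives the average power in watts.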

- ``pylikwid.initconfiguration()``: Read in the config file from
  different places. Default is ``/etc/likwid.cfg``.
- ``config = pylikwid.getconfiguration()``: Get the dict with the
  configuration options:

  - ``configFileName``: Path to the config file
  - ``topologyCfgFileName``: Path to the topology file, if one was
    created with ``likwid-genTopoCfg`` and found by
    ``initconfiguration()``
  - ``daemonPath``: Path to the access daemon executable
  - ``groupPath``: Path to the base directory with the performance
    group files
  - ``daemonMode``: Configured access mode (0 = direct, 1 =
    accessDaemon)
  - ``maxNumThreads``: Maximal number of hardware threads that can be
    handled by LIKWID
  - ``maxNumNodes``: Maximal number of CPU sockets that can be handled
    by LIKWID

- ``pylikwid.destroyconfiguration()``: Destroy all information about
  the configuration.
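
A short sketch that turns the configuration dict into a readable summary, decoding ``daemonMode`` with the values listed above (0 = direct, 1 = accessDaemon). The helper ``describe_configuration`` is hypothetical.

```python
ACCESS_MODES = {0: "direct", 1: "accessDaemon"}


def describe_configuration(config):
    """Summarize the dict returned by pylikwid.getconfiguration().

    Unknown daemonMode values are reported as 'unknown'; the daemon path
    is only shown when the access daemon is used.
    """
    mode = ACCESS_MODES.get(config["daemonMode"], "unknown")
    lines = [
        "config file:  %s" % config["configFileName"],
        "group path:   %s" % config["groupPath"],
        "access mode:  %s" % mode,
    ]
    if mode == "accessDaemon":
        lines.append("daemon path:  %s" % config["daemonPath"])
    return "\n".join(lines)
```

Call ``pylikwid.initconfiguration()``, print ``describe_configuration(pylikwid.getconfiguration())`` and release the data with ``pylikwid.destroyconfiguration()``.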

- ``pylikwid.hpmmode(mode)``: Set the access mode. For x86 there are
  two modes:

  - ``mode = 0``: Access the MSR and PCI devices directly. May require
    root access.
  - ``mode = 1``: Access the MSR and PCI devices through access daemon
    instances.

- ``pylikwid.hpminit()``: Initialize the access functions according to
  the access mode.
- ``pylikwid.hpmaddthread(cpu)``: Add CPU ``cpu`` to the access layer
  (opens device files or a connection to an access daemon).
- ``pylikwid.hpmfinalize()``: Unregister all CPUs from the access layer
  and close files/connections.
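
The access layer functions form an open/close pair, so a context manager can make sure ``hpmfinalize()`` runs even if the code in between fails. This is a sketch only: the helper ``access_layer`` is hypothetical, it takes the module as ``lk``, and the text above does not say whether this setup is required before the perfmon functions.

```python
from contextlib import contextmanager


@contextmanager
def access_layer(lk, cpus, mode=0):
    """Open the LIKWID access layer for the given CPUs, close it afterwards.

    lk is the pylikwid module; mode 0 accesses MSR/PCI devices directly
    (may require root), mode 1 goes through access daemon instances.
    """
    lk.hpmmode(mode)
    lk.hpminit()
    try:
        for cpu in cpus:
            lk.hpmaddthread(cpu)
        yield
    finally:
        # Unregister all CPUs and close files/connections in any case
        lk.hpmfinalize()
```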

- ``pylikwid.init(cpus)``: Initialize the perfmon module for the CPUs
  given in the list ``cpus``.
- ``pylikwid.getnumberofthreads()``: Return the number of threads
  initialized in the perfmon module.
- ``pylikwid.getnumberofgroups()``: Return the number of groups
  currently registered in the perfmon module.
- ``pylikwid.getgroups()``: Return a list of all available groups. Each
  list entry is a dict:

  - ``Name``: Name of the performance group
  - ``Short``: Short information about the performance group
  - ``Long``: Long description of the performance group

- ``gid = pylikwid.addeventset(estr)``: Add a performance group or a
  custom event set to the perfmon module. The returned ``gid`` is
  required to specify the event set later.
- ``pylikwid.getnameofgroup(gid)``: Return the name of the group
  identified by ``gid``. If it is a custom event set, the name is set
  to ``Custom``.
- ``pylikwid.getshortinfoofgroup(gid)``: Return the short information
  about a performance group.
- ``pylikwid.getlonginfoofgroup(gid)``: Return the description of a
  performance group.
- ``pylikwid.getnumberofevents(gid)``: Return the number of events in
  the group.
- ``pylikwid.getnumberofmetrics(gid)``: Return the number of derived
  metrics in the group. Always 0 for custom event sets.
- ``pylikwid.getnameofevent(gid, eidx)``: Return the name of the event
  identified by ``gid`` and the index ``eidx`` in the list of events.
- ``pylikwid.getnameofcounter(gid, eidx)``: Return the name of the
  counter register identified by ``gid`` and the index ``eidx`` in the
  list of events.
- ``pylikwid.getnameofmetric(gid, midx)``: Return the name of a derived
  metric identified by ``gid`` and the index ``midx`` in the list of
  metrics.
- ``pylikwid.setup(gid)``: Program the counter registers to measure all
  events in group ``gid``.
- ``pylikwid.start()``: Start the counter registers.
- ``pylikwid.stop()``: Stop the counter registers.
- ``pylikwid.read()``: Read the counter registers (stop->read->start).
- ``pylikwid.switch(gid)``: Switch to group ``gid``
  (stop->setup(gid)->start).
- ``pylikwid.getidofactivegroup()``: Return the ``gid`` of the
  currently configured group.
- ``pylikwid.getresult(gid, eidx, tidx)``: Return the raw counter
  register result of all measurements identified by group ``gid`` and
  the indices for event ``eidx`` and thread ``tidx``.
- ``pylikwid.getlastresult(gid, eidx, tidx)``: Return the raw counter
  register result of the last measurement cycle identified by group
  ``gid`` and the indices for event ``eidx`` and thread ``tidx``.
- ``pylikwid.getmetric(gid, midx, tidx)``: Return the derived metric
  result of all measurements identified by group ``gid`` and the
  indices for metric ``midx`` and thread ``tidx``.
- ``pylikwid.getlastmetric(gid, midx, tidx)``: Return the derived
  metric result of the last measurement cycle identified by group
  ``gid`` and the indices for metric ``midx`` and thread ``tidx``.
- ``pylikwid.gettimeofgroup(gid)``: Return the measurement time for the
  group identified by ``gid``.
- ``pylikwid.finalize()``: Reset all used registers and delete internal
  measurement results.
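
The perfmon functions above are usually called in a fixed order: ``init``, ``addeventset``, ``setup``, ``start``, run the code, ``stop``, query the results, ``finalize``. The following sketch collects all raw event results and derived metrics of one group, per CPU, into dicts. The helper ``measure_group`` is hypothetical and takes the module as ``lk``.

```python
def measure_group(lk, cpus, estr, func):
    """Measure the event set or group `estr` on `cpus` while func() runs.

    lk is the pylikwid module. Returns (events, metrics): two dicts that
    map each event/metric name to a list with one value per thread.
    """
    lk.init(cpus)
    try:
        gid = lk.addeventset(estr)
        lk.setup(gid)
        lk.start()
        func()
        lk.stop()
        nthreads = lk.getnumberofthreads()
        events = {
            lk.getnameofevent(gid, e): [lk.getresult(gid, e, t) for t in range(nthreads)]
            for e in range(lk.getnumberofevents(gid))
        }
        # Custom event sets have no derived metrics (getnumberofmetrics() == 0)
        metrics = {
            lk.getnameofmetric(gid, m): [lk.getmetric(gid, m, t) for t in range(nthreads)]
            for m in range(lk.getnumberofmetrics(gid))
        }
        return events, metrics
    finally:
        # Reset the registers and free internal results in any case
        lk.finalize()
```

For example, ``measure_group(pylikwid, [0], "L3", kernel)`` with any callable ``kernel``; for a custom event set such as ``"INSTR_RETIRED_ANY:FIXC0"`` the ``metrics`` dict stays empty.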

- ``pylikwid.markerreadfile(filename)``: Read in the result file of an
  application run instrumented with the LIKWID Marker API.
- ``pylikwid.markernumregions()``: Return the number of regions in an
  application run.
- ``pylikwid.markerregiontag(rid)``: Return the region tag for the
  region identified by ``rid``.
- ``pylikwid.markerregiongroup(rid)``: Return the group name for the
  region identified by ``rid``.
- ``pylikwid.markerregionevents(rid)``: Return the number of events for
  the region identified by ``rid``.
- ``pylikwid.markerregionthreads(rid)``: Return the number of threads
  that executed the region identified by ``rid``.
- ``pylikwid.markerregiontime(rid, tidx)``: Return the accumulated
  measurement time for the region identified by ``rid`` and the thread
  index ``tidx``.
- ``pylikwid.markerregioncount(rid, tidx)``: Return the call count for
  the region identified by ``rid`` and the thread index ``tidx``.
- ``pylikwid.markerregionresult(rid, eidx, tidx)``: Return the raw
  event result for the region identified by ``rid``, the event index
  ``eidx`` and the thread index ``tidx``.
- ``pylikwid.markerregionmetric(rid, midx, tidx)``: Return the derived
  metric result for the region identified by ``rid``, the metric index
  ``midx`` and the thread index ``tidx``.
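
The marker file functions allow evaluating a Marker API result file from Python instead of reading the tables of ``likwid-perfctr``. A minimal sketch, assuming region IDs run from 0 to ``markernumregions() - 1`` and thread indices from 0 to ``markerregionthreads(rid) - 1``; the helper ``read_marker_file`` is hypothetical and takes the module as ``lk``.

```python
def read_marker_file(lk, filename):
    """Parse a LIKWID Marker API result file into plain Python data.

    lk is the pylikwid module. Returns one dict per region with the tag,
    the group name and, per thread, the time, call count and raw results.
    """
    lk.markerreadfile(filename)
    regions = []
    for rid in range(lk.markernumregions()):
        nthreads = lk.markerregionthreads(rid)
        nevents = lk.markerregionevents(rid)
        regions.append({
            "tag": lk.markerregiontag(rid),
            "group": lk.markerregiongroup(rid),
            "time": [lk.markerregiontime(rid, t) for t in range(nthreads)],
            "count": [lk.markerregioncount(rid, t) for t in range(nthreads)],
            # results[t][e]: raw result of event e measured by thread t
            "results": [[lk.markerregionresult(rid, e, t) for e in range(nevents)]
                        for t in range(nthreads)],
        })
    return regions
```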

- ``pylikwid.initgputopology()``: Initialize the GPU topology module
  (reads in the GPU topology of the system).
- ``topolist = pylikwid.getgputopology()``: Return a list with the GPU
  topology of the system. Each GPU is represented by a dict with these
  entries:

  - ``devid``: Device identifier of the GPU
  - ``numaNode``: Identifier of the NUMA node the GPU is attached to
  - ``name``: Name of the device
  - ``mem``: Memory capacity of the device
  - ``ccapMajor``: Major number of the compute capability
  - ``ccapMinor``: Minor number of the compute capability
  - ``maxThreadsDim[3]``: Maximum sizes of each dimension of a block
  - ``maxGridSize[3]``: Maximum sizes of each dimension of a grid
  - ``maxThreadsPerBlock``: Maximum number of threads per block
  - ``sharedMemPerBlock``: Total amount of shared memory available per
    block
  - ``totalConstantMemory``: Total amount of constant memory available
    on the device
  - ``simdWidth``: SIMD width of arithmetic units (= warp size)
  - ``memPitch``: Maximum pitch allowed by the memory copy functions
    that involve memory regions allocated through ``cuMemAllocPitch()``
  - ``regsPerBlock``: Total number of registers available per block
  - ``clockRatekHz``: Clock frequency in kilohertz
  - ``textureAlign``: Alignment requirement for textures
  - ``surfaceAlign``: Alignment requirement for surfaces
  - ``l2Size``: Size of the L2 cache in bytes; 0 if the device doesn't
    have an L2 cache
  - ``memClockRatekHz``: Peak memory clock frequency in kilohertz
  - ``pciBus``: PCI bus identifier of the device
  - ``pciDev``: PCI device (also known as slot) identifier of the device
  - ``pciDom``: PCI domain identifier of the device
  - ``maxBlockRegs``: Maximum number of 32-bit registers available to a
    thread block
  - ``numMultiProcs``: Number of multiprocessors on the device
  - ``maxThreadPerMultiProc``: Maximum number of resident threads per
    multiprocessor
  - ``memBusWidth``: Global memory bus width in bits
  - ``unifiedAddrSpace``: 1 if the device shares a unified address
    space with the host, 0 if not
  - ``ecc``: 1 if error correction is enabled on the device, 0 if error
    correction is disabled or not supported by the device
  - ``asyncEngines``: Number of asynchronous engines
  - ``mapHostMem``: 1 if the device can map host memory into the CUDA
    address space
  - ``integrated``: 1 if the device is an integrated (motherboard) GPU
    and 0 if it is a discrete (card) component

- ``pylikwid.finalizegputopology()``: Delete all information in the GPU
  topology module.
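
A small sketch that produces one summary line per GPU from the list returned by ``getgputopology()``, using only fields listed above. The helper ``describe_gpus`` is hypothetical.

```python
def describe_gpus(topolist):
    """Return one summary line per GPU dict from pylikwid.getgputopology()."""
    lines = []
    for gpu in topolist:
        lines.append(
            "GPU %d: %s, compute capability %d.%d, %d multiprocessors, NUMA node %d"
            % (gpu["devid"], gpu["name"], gpu["ccapMajor"], gpu["ccapMinor"],
               gpu["numMultiProcs"], gpu["numaNode"])
        )
    return lines
```

Call it between ``pylikwid.initgputopology()`` and ``pylikwid.finalizegputopology()``.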

- ``pylikwid.nvinit(gpus)``: Initialize the nvmon module for the GPUs
  given in the list ``gpus``.
- ``pylikwid.nvgetnumberofgpus()``: Return the number of GPUs
  initialized in the nvmon module.
- ``pylikwid.nvgetnumberofgroups()``: Return the number of groups
  currently registered in the nvmon module.
- ``pylikwid.nvgetgroups()``: Return a list of all available groups.
  Each list entry is a dict:

  - ``Name``: Name of the performance group
  - ``Short``: Short information about the performance group
  - ``Long``: Long description of the performance group

- ``gid = pylikwid.nvaddeventset(estr)``: Add a performance group or a
  custom event set to the nvmon module. The returned ``gid`` is
  required to specify the event set later.
- ``pylikwid.nvgetnameofgroup(gid)``: Return the name of the group
  identified by ``gid``. If it is a custom event set, the name is set
  to ``Custom``.
- ``pylikwid.nvgetshortinfoofgroup(gid)``: Return the short information
  about a performance group.
- ``pylikwid.nvgetlonginfoofgroup(gid)``: Return the description of a
  performance group.
- ``pylikwid.nvgetnumberofevents(gid)``: Return the number of events in
  the group.
- ``pylikwid.nvgetnumberofmetrics(gid)``: Return the number of derived
  metrics in the group. Always 0 for custom event sets.
- ``pylikwid.nvgetnameofevent(gid, eidx)``: Return the name of the
  event identified by ``gid`` and the index ``eidx`` in the list of
  events.
- ``pylikwid.nvgetnameofcounter(gid, eidx)``: Return the name of the
  counter register identified by ``gid`` and the index ``eidx`` in the
  list of events.
- ``pylikwid.nvgetnameofmetric(gid, midx)``: Return the name of a
  derived metric identified by ``gid`` and the index ``midx`` in the
  list of metrics.
- ``pylikwid.nvsetup(gid)``: Program the counter registers to measure
  all events in group ``gid``.
- ``pylikwid.nvstart()``: Start the counter registers.
- ``pylikwid.nvstop()``: Stop the counter registers.
- ``pylikwid.nvread()``: Read the counter registers
  (stop->read->start).
- ``pylikwid.nvswitch(gid)``: Switch to group ``gid``
  (stop->setup(gid)->start).
- ``pylikwid.nvgetidofactivegroup()``: Return the ``gid`` of the
  currently configured group.
- ``pylikwid.nvgetresult(gid, eidx, tidx)``: Return the raw counter
  register result of all measurements identified by group ``gid`` and
  the indices for event ``eidx`` and thread ``tidx``.
- ``pylikwid.nvgetlastresult(gid, eidx, tidx)``: Return the raw counter
  register result of the last measurement cycle identified by group
  ``gid`` and the indices for event ``eidx`` and thread ``tidx``.
- ``pylikwid.nvgetmetric(gid, midx, tidx)``: Return the derived metric
  result of all measurements identified by group ``gid`` and the
  indices for metric ``midx`` and thread ``tidx``.
- ``pylikwid.nvgetlastmetric(gid, midx, tidx)``: Return the derived
  metric result of the last measurement cycle identified by group
  ``gid`` and the indices for metric ``midx`` and thread ``tidx``.
- ``pylikwid.nvgettimeofgroup(gid)``: Return the measurement time for
  the group identified by ``gid``.
- ``pylikwid.nvfinalize()``: Reset all used registers and delete
  internal measurement results.
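
The nvmon functions follow the same order as the perfmon ones: ``nvinit``, ``nvaddeventset``, ``nvsetup``, ``nvstart``, run the code, ``nvstop``, query the results, ``nvfinalize``. The sketch below collects the raw event results of one group. It assumes that the third index of ``nvgetresult()`` runs over the GPUs initialized with ``nvinit()`` (the list above calls it a thread index). The helper ``measure_gpu_events`` is hypothetical and takes the module as ``lk``.

```python
def measure_gpu_events(lk, gpus, estr, func):
    """Measure the event set or group `estr` on `gpus` while func() runs.

    lk is the pylikwid module. Returns {event name: [value per GPU]},
    assuming nvgetresult()'s last index selects the GPU.
    """
    lk.nvinit(gpus)
    try:
        gid = lk.nvaddeventset(estr)
        lk.nvsetup(gid)
        lk.nvstart()
        func()
        lk.nvstop()
        ngpus = lk.nvgetnumberofgpus()
        return {
            lk.nvgetnameofevent(gid, e): [lk.nvgetresult(gid, e, g) for g in range(ngpus)]
            for e in range(lk.nvgetnumberofevents(gid))
        }
    finally:
        # Reset the registers and free internal results in any case
        lk.nvfinalize()
```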

- ``pylikwid.gpumarkerinit()``: Initialize the Nvmon Marker API of the
  LIKWID library. Must be called before all other functions.
- ``rr = pylikwid.gpuregisterregion(regiontag)``: Register a region to
  the Nvmon Marker API. This is an optional function to reduce the
  overhead of region registration at ``pylikwid.gpumarkerstartregion``.
  If you don't call this function, the registration is done at
  ``pylikwid.gpumarkerstartregion(regiontag)``. On success, 0 is
  returned. If you haven't called ``pylikwid.gpumarkerinit()``, a
  negative number is returned.
- ``err = pylikwid.gpumarkerstartregion(regiontag)``: Start
  measurements under the name ``regiontag``. On success, 0 is returned.
  If you haven't called ``pylikwid.gpumarkerinit()``, a negative number
  is returned.
- ``err = pylikwid.gpumarkerstopregion(regiontag)``: Stop measurements
  under the name ``regiontag`` again. On success, 0 is returned. If you
  haven't called ``pylikwid.gpumarkerinit()``, a negative number is
  returned.
- ``num_gpus, num_events, events[][], time[], count[] = pylikwid.gpumarkergetregion(regiontag)``:
  Get the intermediate results of the region identified by
  ``regiontag``. On success, it returns the number of GPUs, the number
  of events in the current group, a list with all the aggregated event
  results per GPU, and the measurement time and the number of calls of
  the region per GPU.
- ``pylikwid.gpunextgroup()``: Switch to the next event set in a
  round-robin fashion. If you have set only one event set on the
  command line, this function performs no operation.
- ``pylikwid.gpumarkerreset(regiontag)``: Reset the values stored using
  the region name ``regiontag``. On success, 0 is returned.
- ``pylikwid.gpumarkerclose()``: Close the connection to the LIKWID
  Nvmon Marker API and write out measurement data to file. This file
  will be evaluated by ``likwid-perfctr``.

Code
----
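
Here is a small example Python script showing how to use the LIKWID
Marker API in Python:

::

    #!/usr/bin/env python
    import pylikwid

    # Initialize the Marker API and register the (only) Python thread
    pylikwid.markerinit()
    pylikwid.markerthreadinit()

    liste = []
    # Measure the hardware performance counters for the appending loop
    pylikwid.markerstartregion("listappend")
    for i in range(0, 1000000):
        liste.append(i)
    pylikwid.markerstopregion("listappend")

    # Read the intermediate results of the region inside the script
    nr_events, eventlist, time, count = pylikwid.markergetregion("listappend")
    for i, e in enumerate(eventlist):
        print(i, e)

    # Write the results to the file evaluated by likwid-perfctr
    pylikwid.markerclose()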

This code simply measures the hardware performance counters for
appending 1000000 elements to a list. First the API is initialized with
``pylikwid.markerinit()`` and ``pylikwid.markerthreadinit()``.
Afterwards it creates an empty list, starts the measurements with
``pylikwid.markerstartregion("listappend")`` and executes the appending
loop. When the loop has finished, we stop the measurements again using
``pylikwid.markerstopregion("listappend")``. Just for the example, we
get the values inside our script using
``pylikwid.markergetregion("listappend")`` and print out the results.
Finally, we close the connection to the LIKWID Marker API with
``pylikwid.markerclose()``.

When using the Marker API, you always have to run the script with
``likwid-perfctr``, which programs the hardware performance counters and
specifies the CPUs that should be measured. Since Python is commonly
single-threaded, the CPU set only contains one entry:
``likwid-perfctr -C 0 -g <EVENTSET> -m <PYTHONSCRIPT>``. This pins the
Python interpreter to CPU 0 and measures ``<EVENTSET>`` for all regions
in the Python script. You can set multiple event sets by adding multiple
``-g <EVENTSET>`` options to the command line. Please see the LIKWID
page for further information on how to use ``likwid-perfctr``:
https://github.com/rrze-likwid/likwid

Example
-------

Using the above Python script we can measure the L2 to L3 cache data volume:

::

    $ likwid-perfctr -C 0 -g L3 -m ./test.py
    --------------------------------------------------------------------------------
    CPU name: Intel(R) Core(TM) i7-4770 CPU @ 3.40GHz
    CPU type: Intel Core Haswell processor
    CPU clock: 3.39 GHz
    --------------------------------------------------------------------------------
    (0, 926208305.0)
    (1, 325539316.0)
    (2, 284626172.0)
    (3, 1219118.0)
    (4, 918368.0)
    Wrote LIKWID Marker API output to file /tmp/likwid_17275.txt
    --------------------------------------------------------------------------------
    ================================================================================
    Group 1 L3: Region listappend
    ================================================================================
    +-------------------+----------+
    | Region Info | Core 0 |
    +-------------------+----------+
    | RDTSC Runtime [s] | 0.091028 |
    | call count | 1 |
    +-------------------+----------+
    +-----------------------+---------+--------------+
    | Event | Counter | Core 0 |
    +-----------------------+---------+--------------+
    | INSTR_RETIRED_ANY | FIXC0 | 9.262083e+08 |
    | CPU_CLK_UNHALTED_CORE | FIXC1 | 3.255393e+08 |
    | CPU_CLK_UNHALTED_REF | FIXC2 | 2.846262e+08 |
    | L2_LINES_IN_ALL | PMC0 | 1.219118e+06 |
    | L2_TRANS_L2_WB | PMC1 | 9.183680e+05 |
    +-----------------------+---------+--------------+
    +-------------------------------+--------------+
    | Metric | Core 0 |
    +-------------------------------+--------------+
    | Runtime (RDTSC) [s] | 0.09102752 |
    | Runtime unhalted [s] | 9.596737e-02 |
    | Clock [MHz] | 3.879792e+03 |
    | CPI | 3.514753e-01 |
    | L3 load bandwidth [MBytes/s] | 8.571425e+02 |
    | L3 load data volume [GBytes] | 0.078023552 |
    | L3 evict bandwidth [MBytes/s] | 6.456899e+02 |
    | L3 evict data volume [GBytes] | 0.058775552 |
    | L3 bandwidth [MBytes/s] | 1.502832e+03 |
    | L3 data volume [GBytes] | 0.136799104 |
    +-------------------------------+--------------+

At first, a header with the current system type and clock is printed.
Afterwards, the output of the Python script lists the results of the
measurements we got internally with ``pylikwid.markergetregion``. The
next output is the region results evaluated by ``likwid-perfctr``. It
starts with a headline stating the measured event set, here ``L3``,
and the region name ``listappend``. Afterwards, 2 or 3 tables are
printed. The first contains basic information about the region like
the run time (or better, the measurement time) and the number of calls
of the region. The next table contains the raw values for each event in
the event set. These numbers match the ones we got internally with
``pylikwid.markergetregion``. If you have set a performance group (here
``L3``) instead of a custom event set, the raw results are used to
derive commonly used metrics, here the CPI (cycles per instruction,
lower is better) and different bandwidths and data volumes. You can see
that the load bandwidth for the small loop is 857 MByte/s and the evict
(write) bandwidth is 645 MByte/s. In total we have a bandwidth of 1502
MByte/s.
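
The derived metrics in the last table can be recomputed from the raw counts. The sketch below assumes that the L3 group converts cache-line counts to bytes with 64-byte cache lines and divides by the RDTSC runtime, and that CPI is core cycles divided by retired instructions. This reading of the group is an assumption, but it reproduces the printed values, e.g. 1219118 lines × 64 bytes = 78023552 bytes = 0.078023552 GBytes. The function name ``l3_metrics`` is hypothetical.

```python
# Raw counts for Core 0 from the Marker API example output above
INSTR_RETIRED_ANY = 926208305      # FIXC0
CPU_CLK_UNHALTED_CORE = 325539316  # FIXC1
L2_LINES_IN_ALL = 1219118          # PMC0, reported as L3 loads by the L3 group
L2_TRANS_L2_WB = 918368            # PMC1, reported as L3 evictions by the L3 group
RUNTIME_S = 0.09102752             # Runtime (RDTSC) [s]
CACHE_LINE_BYTES = 64              # assumption: 64-byte cache lines


def l3_metrics(instr, cycles, lines_in, lines_wb, runtime, line_bytes=CACHE_LINE_BYTES):
    """Recompute CPI and the L3 bandwidths/volume from raw counter values."""
    load_bytes = lines_in * line_bytes
    evict_bytes = lines_wb * line_bytes
    return {
        "CPI": cycles / instr,
        "L3 load bandwidth [MBytes/s]": 1e-6 * load_bytes / runtime,
        "L3 evict bandwidth [MBytes/s]": 1e-6 * evict_bytes / runtime,
        "L3 bandwidth [MBytes/s]": 1e-6 * (load_bytes + evict_bytes) / runtime,
        "L3 data volume [GBytes]": 1e-9 * (load_bytes + evict_bytes),
    }


if __name__ == "__main__":
    metrics = l3_metrics(INSTR_RETIRED_ANY, CPU_CLK_UNHALTED_CORE,
                         L2_LINES_IN_ALL, L2_TRANS_L2_WB, RUNTIME_S)
    for name, value in metrics.items():
        print("%-30s %.7g" % (name, value))
```

Running it gives a CPI of about 0.3515 and a total L3 bandwidth of about 1502.8 MBytes/s, matching the table above.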
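
Code
----

The perfmon functions can also be used directly instead of the Marker
API. The following script measures the retired instructions on CPUs 0
and 1 and is started without ``likwid-perfctr``:

::

    #!/usr/bin/env python
    import pylikwid

    liste = []
    cpus = [0, 1]
    # Initialize the perfmon module for CPUs 0 and 1
    pylikwid.init(cpus)
    # Custom event set: retired instructions on the fixed counter FIXC0
    group = pylikwid.addeventset("INSTR_RETIRED_ANY:FIXC0")
    pylikwid.setup(group)
    pylikwid.start()
    for i in range(0, 1000000):
        liste.append(i)
    pylikwid.stop()
    # Print the raw counter value (event index 0) for each CPU
    for thread in range(0, len(cpus)):
        print("Result CPU %d : %f" % (cpus[thread], pylikwid.getresult(group, 0, thread)))
    # Reset the used registers and delete the internal results
    pylikwid.finalize()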

Example
-------

::

    $ ./test.py
    Result CPU 0 : 87335.000000
    Result CPU 1 : 5222188.000000

Please be aware that Python is a high-level language and your simple
code is translated to a lot of assembly instructions. The CPI value is
commonly low (which looks good) for high-level languages, because much
of their work, such as type checking, consists of simple instructions
that execute quickly relative to the CPU clock. If you compare the
results to a lower-level language like C or Fortran, the CPI will be
worse there, but the performance will be higher because no type
checking and conversions need to be done.