In PySLM, the slicing and hatching problem is inherently parallelisable because the geometry can be discretised into discrete layers that, in most situations, behave independently. However, the underlying algorithms for slicing, offsetting the boundaries and clipping the hatch vectors are serial (single-threaded). Multi-threaded processing is therefore very desirable in order to significantly reduce the processing time.
Multi-threading in Python
Unfortunately, Python, like most scripting or interpreted languages, was not inherently designed to be multi-threaded. Perhaps this may change in the future, or other scripting languages may fill this computational void (Rust, Julia, Go). Python intentionally limits multi-threaded execution in scripts by using a construct known as the GIL – the Global Interpreter Lock. A similar situation exists in other common scripting environments, such as Matlab (ParPool) and Javascript (Worker), where the parallel computing capability of multi-core CPUs cannot be exploited in a straightforward manner.
To some extent, core numerical libraries such as numpy, scipy and the scikit libraries (for example via the Anaconda distribution) are internally multi-threaded and support vectorisation via native CPU extensions. Further mathematical operations and algorithms can, to some extent, be automatically optimised to run in parallel using numba or numexpr; however, this does not cover broader multi-functional algorithms, such as those used in PySLM.
Python has the threading module for multi-threaded processing; however, for CPU-bound processing it has very limited use. This is because of the global interpreter lock (GIL), which only allows one thread to execute Python bytecode at any instant. Threading is mainly useful for asynchronous I/O (network or filesystem) operations, which can be processed in the background.
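As a hedged illustration (the worker function and payload below are invented for demonstration, not part of PySLM), the following snippet runs a CPU-bound function across a thread pool: the results are correct, but because the GIL serialises bytecode execution, little or no speed-up over a plain loop should be expected.

```python
from concurrent.futures import ThreadPoolExecutor

def cpu_bound(n: int) -> int:
    # Purely CPU-bound work: sum of squares below n
    return sum(i * i for i in range(n))

def run_threaded(tasks):
    # Threads run concurrently, but the GIL permits only one thread to
    # execute Python bytecode at a time, so CPU-bound work is
    # effectively serialised across the pool.
    with ThreadPoolExecutor(max_workers=4) as pool:
        return list(pool.map(cpu_bound, tasks))

if __name__ == "__main__":
    print(run_threaded([10_000, 20_000, 30_000]))
```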
Use of Multiprocessing Library in PySLM
The other option is to use the multiprocessing library built into the core Python distribution. Without going into too much formality, multiprocessing spawns multiple Python processes and assigns batches of work to them. The following programming article I found is a useful reference on the pitfalls of using the library.
In this implementation, the Pool and Manager modules are used to process the geometry more optimally. The most important step is to initialise the multiprocessing library with the ‘spawn‘ method, which stops random interruptions during the operation, as discussed in the previous article.
from multiprocessing import Manager
from multiprocessing.pool import Pool
from multiprocessing import set_start_method
set_start_method("spawn")
The Manager.dict acts as a ‘proxy‘ object used to more efficiently store data shared between each launched process. Without the manager, a copy of the passed objects is made for each process launch. This is important for the geometry or Part object, which would become expensive to copy if it contained a lattice or a set of complex surfaces.
d = Manager().dict()
d['part'] = solidPart
d['layerThickness'] = layerThickness # [mm]
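As a minimal self-contained sketch (the payload and worker here are invented for illustration and are not the PySLM routines), a Manager dictionary can hold shared, read-only parameters while each worker receives only a lightweight proxy rather than a full copy:

```python
from multiprocessing import Manager, Pool

def worker(args):
    # 'shared' may be a Manager dict proxy or any plain mapping
    shared, zid = args
    return zid * shared['layerThickness']

if __name__ == "__main__":
    with Manager() as manager:
        shared = manager.dict()
        shared['layerThickness'] = 0.04  # [mm], assumed value
        tasks = [(shared, zid) for zid in range(5)]
        with Pool(processes=2) as pool:
            heights = pool.map(worker, tasks)
        print(heights)
```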
A Pool object is used to create a set number of processes via the parameter processes=8 (typically one per CPU core). This fixed pool is re-used across batches throughout the entire computation, which removes the cost of repeatedly copying and initialising many process instances. A series of z slice levels is created representing the layer z-id. These are then merged into a list of tuple pairs with the Manager dictionary and stored in processList.
Pool.map is used to perform the slice function (calculateLayer) and collect all computed layers following the computation.
p = Pool(processes=8)
numLayers = solidPart.boundingBox[5] / layerThickness
z = np.arange(0, numLayers).tolist()
processList = list(zip([d] * len(z), z))
# Run the slicing function across the process pool
layers = p.map(calculateLayer, processList)
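As an aside (a sketch, not the PySLM API), the tuple-zipping above can alternatively be expressed with functools.partial and a worker that takes separate arguments, which avoids building the intermediate list of pairs; the worker and parameters here are hypothetical:

```python
from functools import partial
from multiprocessing import Pool

def calcLayer(shared, zid):
    # Hypothetical worker taking the shared mapping and layer id separately
    return zid * shared['layerThickness']

if __name__ == "__main__":
    shared = {'layerThickness': 0.04}  # plain dict for illustration
    with Pool(processes=2) as pool:
        # partial binds the shared parameters; map only varies the layer id
        layers = pool.map(partial(calcLayer, shared), range(5))
    print(layers)
```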
The slicing function is fairly straightforward: it just unpacks the arguments and performs the slicing and hatching operations. Note: each layer currently needs to initialise its own instance of a Hatcher class because this is not shared across the processes. This carries a small cost, but means each layer can be processed entirely independently; in this example the hatchAngle is changed across layers. The layer position is calculated using the layer index (zid) and layerThickness.
def calculateLayer(input):
    # Typically the hatch angle is globally rotated by 66.7 degrees per layer
    d = input[0]
    zid = input[1]

    layerThickness = d['layerThickness']
    solidPart = d['part']

    # Create a Hatcher object for performing any hatching operations
    myHatcher = hatching.Hatcher()

    # Set the base hatching parameters which are generated within Hatcher
    layerAngleOffset = 66.7
    myHatcher.hatchAngle = 10 + zid * layerAngleOffset
    myHatcher.volumeOffsetHatch = 0.08
    myHatcher.spotCompensation = 0.06
    myHatcher.numInnerContours = 2
    myHatcher.numOuterContours = 1
    myHatcher.hatchSortMethod = hatching.AlternateSort()

    # Slice the boundary
    geomSlice = solidPart.getVectorSlice(zid * layerThickness)

    # Hatch the boundary using myHatcher
    layer = myHatcher.hatch(geomSlice)

    # The layer height is set in integer increments of microns to ensure
    # no rounding errors during manufacturing
    layer.z = int(zid * layerThickness * 1000)
    layer.layerId = int(zid)

    return layer
The final step to use multiprocessing in Python is the inclusion of the __name__ guard, i.e.:
if __name__ == '__main__':
main()
The above is unfortunately required, and it makes debugging slightly more tedious in editors, but it is the price paid for the extra performance.
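Putting the pieces together, the overall pattern can be sketched as a minimal self-contained script (with a stand-in worker instead of the PySLM slicing, and an assumed layer thickness, since the geometry routines are not included here):

```python
from multiprocessing import Manager, Pool, set_start_method

def sliceLayer(args):
    # Stand-in for the real slicing/hatching work on a single layer
    shared, zid = args
    z = zid * shared['layerThickness']
    return (zid, int(round(z * 1000)))  # layer id and height in microns

def main():
    with Manager() as manager:
        shared = manager.dict()
        shared['layerThickness'] = 0.04  # [mm], assumed value
        tasks = [(shared, zid) for zid in range(10)]
        with Pool(processes=4) as pool:
            layers = pool.map(sliceLayer, tasks)
    print(layers[:3])

if __name__ == '__main__':
    # 'spawn' must be set once, before any pools are created
    set_start_method('spawn')
    main()
```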
Performance Improvement
The performance improvement using the multiprocessing library is shown in the table below for a modest 4-core laptop (my budget doesn’t stretch that far). This was performed on the examples/inversePyramid.stl geometry with an overall bounding box size of 90 × 90 × 60 mm, a hatch distance h_d = 0.08 mm and the layer thickness set at 40 μm.
| Number of Processes | Run Time [s] |
|---|---|
| Baseline (simple for loop) | 121 |
| 1 | 108 |
| 2 | 65.4 |
| 4 | 42 |
| 6 | 37.1 |
| 8 | 31.8 |
Given these are approximate timings, the performance improvement is close to linear up to the number of physical cores for this simple example. It can also be seen that choosing more processes than cores squeezes out some extra performance – perhaps due to Intel’s Hyper-Threading. The figure below shows that the CPU is fully utilised.
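A comparison of this kind can be reproduced with a small timing harness; the sketch below uses a dummy CPU-bound worker standing in for the slicing, since absolute timings depend heavily on the machine:

```python
import time
from multiprocessing import Pool

def work(n: int) -> int:
    # Dummy CPU-bound task standing in for slicing a single layer
    return sum(i * i for i in range(n))

def timeIt(fn):
    # Return the result of fn() together with its wall-clock duration
    start = time.perf_counter()
    result = fn()
    return result, time.perf_counter() - start

if __name__ == '__main__':
    tasks = [200_000] * 64

    serial, tSerial = timeIt(lambda: [work(n) for n in tasks])
    with Pool(processes=4) as pool:
        parallel, tPool = timeIt(lambda: pool.map(work, tasks))

    assert serial == parallel  # both strategies give identical results
    print(f"serial: {tSerial:.2f}s, 4 processes: {tPool:.2f}s")
```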
Conclusions:
This post shows how one can adapt existing routines to perform multi-processing slicing and hatching with PySLM. In the future, it would be desirable to explore a more integrated class structure for hooking functions onto. Another area of interest is the potential use of GPU computing to parallelise some of the fundamental algorithms.
Example
The runnable example demonstrating this is example_3d_multithread.py in the GitHub repo.