Metadata-Version: 2.1
Name: pdal
Version: 3.2.2
Summary: Point cloud data processing
Home-page: https://pdal.io
Author: Howard Butler
Author-email: howard@hobu.co
Maintainer: Howard Butler
Maintainer-email: howard@hobu.co
License: BSD
Keywords: point cloud spatial
Classifier: Development Status :: 5 - Production/Stable
Classifier: Intended Audience :: Developers
Classifier: Intended Audience :: Science/Research
Classifier: License :: OSI Approved :: BSD License
Classifier: Operating System :: OS Independent
Classifier: Programming Language :: Python :: 3.7
Classifier: Programming Language :: Python :: 3.8
Classifier: Programming Language :: Python :: 3.9
Classifier: Programming Language :: Python :: 3.10
Classifier: Topic :: Scientific/Engineering :: GIS
Description-Content-Type: text/x-rst
License-File: LICENSE.txt
Requires-Dist: numpy

================================================================================
PDAL
================================================================================

PDAL Python support allows you to process data with PDAL into `Numpy`_ arrays.
It provides a PDAL extension module to control Python interaction with PDAL.
Additionally, you can use it to fetch `schema`_ and `metadata`_ from PDAL
operations.

Installation
--------------------------------------------------------------------------------

PyPI
................................................................................

PDAL Python support is installable via PyPI:

.. code-block::

    pip install PDAL

GitHub
................................................................................

The repository for PDAL's Python extension is available at
https://github.com/PDAL/python

Python support has been released independently from PDAL itself since PDAL 1.7.

Usage
--------------------------------------------------------------------------------

Simple
................................................................................

Given the following pipeline, which simply reads an `ASPRS LAS`_ file and
sorts it by the ``X`` dimension:

.. _`ASPRS LAS`: https://www.asprs.org/committee-general/laser-las-file-format-exchange-activities.html

.. code-block:: python

    json = """
    {
      "pipeline": [
        "1.2-with-color.las",
        {
            "type": "filters.sort",
            "dimension": "X"
        }
      ]
    }"""

    import pdal
    pipeline = pdal.Pipeline(json)
    count = pipeline.execute()
    arrays = pipeline.arrays
    metadata = pipeline.metadata
    log = pipeline.log

Programmatic Pipeline Construction
................................................................................

The previous example specified the pipeline as a JSON string. Alternatively, a
pipeline can be constructed by creating ``Stage`` instances and piping them
together. For example, the previous pipeline can be specified as:

.. code-block:: python

    pipeline = pdal.Reader("1.2-with-color.las") | pdal.Filter.sort(dimension="X")

Stage Objects
=============

- A stage is an instance of ``pdal.Reader``, ``pdal.Filter`` or ``pdal.Writer``.
- A stage can be instantiated by passing as keyword arguments the options
  applicable to the respective PDAL stage. For more on PDAL stages and their
  options, check the PDAL documentation on Stage Objects.
- The ``filename`` option of ``Readers`` and ``Writers`` as well as the
  ``type`` option of ``Filters`` can be passed positionally as the first
  argument.
- The ``inputs`` option specifies a sequence of stages to be set as input to
  the current stage. Each input can be either the string tag of another stage,
  or the ``Stage`` instance itself.
- The ``Reader``, ``Filter`` and ``Writer`` classes come with static methods
  for all the respective PDAL drivers. For example, ``pdal.Filter.head()`` is
  a shortcut for ``pdal.Filter(type="filters.head")``. These methods are
  auto-generated by introspecting ``pdal``, and the available options are
  included in each method's docstring:

.. code-block::

    >>> help(pdal.Filter.head)
    Help on function head in module pdal.pipeline:

    head(**kwargs)
        Return N points from beginning of the point cloud.

        user_data: User JSON
        log: Debug output filename
        option_file: File from which to read additional options
        where: Expression describing points to be passed to this filter
        where_merge='auto': If 'where' option is set, describes how skipped points should be merged with kept points in standard mode.
        count='10': Number of points to return from beginning.  If 'invert' is true, number of points to drop from the beginning.
        invert='false': If true, 'count' specifies the number of points to skip from the beginning.
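Putting these rules together, stages can be constructed explicitly and
assembled into a pipeline. The following is a minimal sketch; the file names,
the decimation step, and the ``input`` tag are illustrative:

.. code-block:: python

    import pdal

    # The first positional argument is 'filename' for readers/writers
    # and 'type' for filters.
    reader = pdal.Reader("1.2-with-color.las", tag="input")
    decimate = pdal.Filter("filters.decimation", step=2.0, inputs=["input"])

    # 'inputs' can also be given Stage instances directly.
    writer = pdal.Writer.las("decimated.las", inputs=[decimate])

    pipeline = pdal.Pipeline([reader, decimate, writer])
    print(pipeline.execute())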
Pipeline Objects
================

A ``pdal.Pipeline`` instance can be created from:

- a JSON string: ``Pipeline(json_string)``
- a sequence of ``Stage`` instances: ``Pipeline([stage1, stage2])``
- a single ``Stage`` with the ``Stage.pipeline`` method: ``stage.pipeline()``
- nothing: ``Pipeline()`` creates a pipeline with no stages.
- joining ``Stage`` and/or other ``Pipeline`` instances together with the pipe
  operator (``|``):

  - ``stage1 | stage2``
  - ``stage1 | pipeline1``
  - ``pipeline1 | stage1``
  - ``pipeline1 | pipeline2``

Every application of the pipe operator creates a new ``Pipeline`` instance. To
update an existing ``Pipeline`` use the respective in-place pipe operator
(``|=``):

.. code-block:: python

    # update pipeline in-place
    pipeline = pdal.Pipeline()
    pipeline |= stage
    pipeline |= pipeline2

Reading using Numpy Arrays
................................................................................

The following more complex scenario demonstrates the full cycling between PDAL
and Python:

* Read a small testfile from GitHub into a Numpy array
* Filter the array with Numpy for Intensity
* Pass the filtered array to PDAL to be filtered again
* Write the final filtered array to a LAS file and a TileDB_ array via the
  `TileDB-PDAL integration`_ using the `TileDB writer plugin`_

.. code-block:: python

    import pdal

    data = "https://github.com/PDAL/PDAL/blob/master/test/data/las/1.2-with-color.las?raw=true"

    pipeline = pdal.Reader.las(filename=data).pipeline()
    print(pipeline.execute())  # 1065 points

    # Get the data from the first array
    # [array([(637012.24, 849028.31, 431.66, 143, 1,
    #       1, 1, 0, 1, -9., 132, 7326, 245380.78254963, 68, 77, 88),
    # dtype=[('X', '<f8'), ('Y', '<f8'), ('Z', '<f8'), ('Intensity', '<u2'),
    #        ('ReturnNumber', 'u1'), ('NumberOfReturns', 'u1'),
    #        ('ScanDirectionFlag', 'u1'), ('EdgeOfFlightLine', 'u1'),
    #        ('Classification', 'u1'), ('ScanAngleRank', '<f4'), ('UserData', 'u1'),
    #        ('PointSourceId', '<u2'), ('GpsTime', '<f8'), ('Red', '<u2'),
    #        ('Green', '<u2'), ('Blue', '<u2')])]
    arr = pipeline.arrays[0]

    # Keep only the points with Intensity > 30
    intensity = arr[arr["Intensity"] > 30]
    print(len(intensity))  # 704 points

    # Now use pdal to clamp points that have intensity 100 <= v < 300
    pipeline = pdal.Filter.range(limits="Intensity[100:300)").pipeline(intensity)
    print(pipeline.execute())  # 387 points
    clamped = pipeline.arrays[0]

    # Write our intensity data to a LAS file and a TileDB array. For TileDB it is
    # recommended to use Hilbert ordering by default with geospatial point cloud data,
    # which requires specifying a domain extent. This can be determined automatically
    # from a stats filter that computes statistics about each dimension (min, max, etc.).
    pipeline = pdal.Writer.las(
        filename="clamped.las",
        offset_x="auto",
        offset_y="auto",
        offset_z="auto",
        scale_x=0.01,
        scale_y=0.01,
        scale_z=0.01,
    ).pipeline(clamped)
    pipeline |= pdal.Filter.stats() | pdal.Writer.tiledb(array_name="clamped")
    print(pipeline.execute())  # 387 points

    # Dump the TileDB array schema
    import tiledb
    with tiledb.open("clamped") as a:
        print(a.schema)
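After execution, the per-dimension statistics gathered by the stats filter can
be read back through ``Pipeline.metadata``. The following is a sketch only:
the nesting of the metadata dictionary assumed here
(``metadata`` / ``filters.stats`` / ``statistic``) may differ across PDAL
versions, so verify it against ``print(pipeline.metadata)``:

.. code-block:: python

    # Assumed metadata layout; check print(pipeline.metadata) for your PDAL version
    stats = pipeline.metadata["metadata"]["filters.stats"]["statistic"]
    for dimension in stats:
        print(dimension["name"], dimension["minimum"], dimension["maximum"])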
Executing Streamable Pipelines
................................................................................

Streamable pipelines (pipelines that consist exclusively of streamable PDAL
stages) can be executed in streaming mode via ``Pipeline.iterator()``. This
returns an iterator object that yields Numpy arrays of up to ``chunk_size``
size (default=10000) at a time.

.. code-block:: python

    import pdal

    pipeline = pdal.Reader("test/data/autzen-utm.las") | pdal.Filter.range(limits="Intensity[80:120)")
    for array in pipeline.iterator(chunk_size=500):
        print(len(array))

    # or to concatenate all arrays into one
    # full_array = np.concatenate(list(pipeline))

``Pipeline.iterator()`` also takes an optional ``prefetch`` parameter
(default=0) to allow prefetching up to this number of arrays in parallel and
buffering them until they are yielded to the caller.

If you just want to execute a streamable pipeline in streaming mode and don't
need to access the data points (typically when the pipeline has Writer
stage(s)), you can use the ``Pipeline.execute_streaming(chunk_size)`` method
instead. This is functionally equivalent to
``sum(map(len, pipeline.iterator(chunk_size)))`` but more efficient as it
avoids allocating and filling any arrays in memory.
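For example, a streamable read-filter-write pipeline can be executed end to
end without materializing point arrays in Python. A minimal sketch, with
illustrative file names:

.. code-block:: python

    import pdal

    pipeline = (
        pdal.Reader("test/data/autzen-utm.las")
        | pdal.Filter.range(limits="Intensity[80:120)")
        | pdal.Writer.las(filename="filtered.las")
    )
    # Processes the data in chunks and returns the total point count
    print(pipeline.execute_streaming(chunk_size=500))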
Accessing Mesh Data
................................................................................

Some PDAL stages (for instance ``filters.delaunay``) create TIN type mesh
data. This data can be accessed in Python using the ``Pipeline.meshes``
property, which returns a ``numpy.ndarray`` of shape (1, n) where n is the
number of Triangles in the mesh. If the PointView contains no mesh data, then
n = 0.

Each Triangle is a tuple ``(A, B, C)`` where A, B and C are indices into the
PointView identifying the point that is the vertex for the Triangle.
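A short sketch of reading the triangles back after a Delaunay triangulation;
the input file name is illustrative and the field access assumes the
``(A, B, C)`` layout described above:

.. code-block:: python

    import pdal

    pipeline = pdal.Reader("test/data/autzen-utm.las") | pdal.Filter.delaunay()
    pipeline.execute()

    triangles = pipeline.meshes[0]  # mesh of the first PointView, shape (1, n)
    points = pipeline.arrays[0]

    # Each triangle holds three indices into the PointView's points
    tri = triangles[0][0]
    print(points[tri[0]], points[tri[1]], points[tri[2]])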
Meshio Integration
................................................................................

The ``meshes`` property provides the face data but is not easy to use as a
mesh. Therefore, we have provided optional integration into the Meshio
library.

The ``pdal.Pipeline`` class provides the ``get_meshio(idx: int) -> meshio.Mesh``
method. This method creates a ``Mesh`` object from the ``PointView`` array and
mesh properties.

.. note::

    The meshio integration requires that meshio is installed (e.g. ``pip
    install meshio``). If it is not, then the method fails with an informative
    RuntimeError.

Simple use of the functionality could be as follows:

.. code-block:: python

    import pdal

    ...

    pl = pdal.Pipeline(pipeline)
    pl.execute()

    mesh = pl.get_meshio(0)
    mesh.write('test.obj')

Advanced Mesh Use Case
................................................................................

USE-CASE: Take a LiDAR map, create a mesh from the ground points, split it
into tiles and store the tiles in PostGIS.

.. note::

    Like ``Pipeline.arrays``, ``Pipeline.meshes`` returns a list of
    ``numpy.ndarray`` to provide for the case where the output from a Pipeline
    is multiple PointViews. (The example below uses 1.2-with-color.las and
    skips the ground classification for clarity.)

.. code-block:: python

    import pdal
    import psycopg2
    import io

    pl = (
        pdal.Reader(".../python/test/data/1.2-with-color.las")
        | pdal.Filter.splitter(length=1000)
        | pdal.Filter.delaunay()
    )
    pl.execute()

    conn = psycopg2.connect(%CONNECTION_STRING%)
    buffer = io.StringIO()

    for idx in range(len(pl.meshes)):
        m = pl.get_meshio(idx)
        if m:
            m.write(buffer, file_format="wkt")
            with conn.cursor() as curr:
                curr.execute(
                    "INSERT INTO %table-name% (mesh) VALUES (ST_GeomFromEWKT(%(ewkt)s))",
                    {"ewkt": buffer.getvalue()},
                )
            # Reset the buffer before writing the next tile
            buffer.seek(0)
            buffer.truncate()

    conn.commit()
    conn.close()
    buffer.close()

.. _`Numpy`: http://www.numpy.org/
.. _`schema`: http://www.pdal.io/dimensions.html
.. _`metadata`: http://www.pdal.io/development/metadata.html
.. _`TileDB`: https://tiledb.com/
.. _`TileDB-PDAL integration`: https://docs.tiledb.com/geospatial/pdal
.. _`TileDB writer plugin`: https://pdal.io/stages/writers.tiledb.html

.. image:: https://github.com/PDAL/python/workflows/Build/badge.svg
   :target: https://github.com/PDAL/python/actions?query=workflow%3ABuild

Requirements
================================================================================

* PDAL 2.4+
* Python >=3.7
* Pybind11 (e.g. :code:`pip install pybind11[global]`)
* Numpy (e.g. :code:`pip install numpy`)
* scikit-build (e.g. :code:`pip install scikit-build`)

Changes
--------------------------------------------------------------------------------

3.2.2
................................................................................

* Implement move ctor to satisfy MSVC 2019 https://github.com/PDAL/python/commit/667f56bd0ee465f55a14636986e80b0a9cefcf14

3.2.1
................................................................................

* Implement #129, add pandas DataFrame i/o for convenience by @hobu in https://github.com/PDAL/python/pull/130
* Harden getMetadata and related calls from getting non-utf-8 'json' by @hobu in https://github.com/PDAL/python/pull/140
* Ignore DataFrame test if not GeoPandas, give up on Python 3.7 builds by @hobu in https://github.com/PDAL/python/pull/137

3.2.0
................................................................................

* PDAL base library 2.4.0+ is required
* CMake project name updated to pdal-python
* `srswkt2` property added to allow fetching of SRS info
* pip builds require cmake >= 3.11
* CMAKE_CXX_STANDARD set to c++17 to match PDAL 2.4.x
* Driver and options *actually* use the library instead of shelling out to the `pdal` application :)
* _get_json renamed to toJSON and made public
* Fix #119, 'json' optional kwarg put back for now
* DEVELOPMENT_COMPONENT in CMake FindPython skipped on OSX
* Make sure 'type' gets set when serializing to JSON

3.1.0
................................................................................

* **Breaking change**: ``pipeline.metadata`` now returns a dictionary from ``json.loads`` instead of a string.
* ``pipeline.quickinfo`` will fetch the PDAL ``preview()`` information for a data source. You can use this to fetch header or other information without reading data. https://github.com/PDAL/python/pull/109
* PDAL driver and option collection now uses the PDAL library directly rather than shelling out to the ``pdal`` command https://github.com/PDAL/python/pull/107
* Pipelines now support pickling for use with things like Dask https://github.com/PDAL/python/pull/110

3.0.0
................................................................................

* Pythonic pipeline creation https://github.com/PDAL/python/pull/91
* Support streaming pipeline execution https://github.com/PDAL/python/pull/94
* Replace Cython with PyBind11 https://github.com/PDAL/python/pull/102
* Remove pdal.pio module https://github.com/PDAL/python/pull/101
* Move readers.numpy and filters.python to separate repository https://github.com/PDAL/python/pull/104
* Miscellaneous refactorings and cleanups

2.3.5
................................................................................

* Fix memory leak https://github.com/PDAL/python/pull/74
* Handle metadata with invalid unicode by erroring https://github.com/PDAL/python/pull/74

2.3.0
................................................................................

* PDAL Python support 2.3.0 requires PDAL 2.1+. Older PDAL base libraries likely will not work.
* Python support built using scikit-build
* readers.numpy and filters.python are installed along with the extension.
* Pipeline can take in a list of arrays that are passed to readers.numpy
* readers.numpy now supports functions that return arrays. See https://pdal.io/stages/readers.numpy.html for more detail.

2.0.0
................................................................................

* PDAL Python extension is now in its own repository on its own release schedule at https://github.com/PDAL/python
* Extension now builds and works under PDAL OSGeo4W64 on Windows.