To install the latest version of PyArrow from conda-forge using conda:
conda install -c conda-forge pyarrow
Install the latest version from PyPI:
pip install pyarrow
Currently, binary artifacts are only available for Linux and macOS.
On other platforms, this will only pull the Python sources and assumes an existing
installation of the C++ part of Arrow.
To retrieve the binary artifacts, you’ll need a recent
pip version that
supports features like the
Building from source
First, clone the master git repository:
git clone https://github.com/apache/arrow.git arrow
Building pyarrow requires:
- A C++11 compiler
  - Linux: gcc >= 4.8 or clang >= 3.5
  - OS X: Xcode 6.4 or higher preferred
You will need Python (CPython) 2.7, 3.4, or 3.5 installed. Earlier releases are not being targeted.
This library targets CPython only due to an emphasis on interoperability with pandas and NumPy, which are only available for CPython.
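Before going further, it can help to confirm which compiler and Python versions are on the `PATH`. The following is a minimal sketch, not part of the official build steps; `show_version` is a helper name made up for this example, and it assumes each tool accepts a `--version` flag.

```shell
# Print the first line of a tool's version banner, or note that it is missing.
# `show_version` is a hypothetical helper defined only for this sketch.
show_version() {
    if command -v "$1" >/dev/null 2>&1; then
        # Some tools (e.g. Python 2.7) write the version to stderr.
        "$1" --version 2>&1 | head -n 1
    else
        echo "$1: not found"
    fi
}

show_version gcc
show_version clang
show_version python
```

Compare the reported versions against the requirements listed above; only one of gcc or clang needs to be present.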
The build requires NumPy, Cython, and a few other Python dependencies:
pip install cython
cd arrow/python
pip install -r requirements.txt
Installing Arrow C++ library
First, you should choose an installation location for Arrow C++. In the future using the default system install location will work, but for now we are being explicit:
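As a minimal sketch of this step, the environment could be set up as follows. The prefix `$HOME/local` is an assumption chosen for illustration (any writable directory should work), and the `lib` subdirectory is likewise assumed; some systems install shared libraries under `lib64` instead.

```shell
# Hypothetical install prefix; any directory you can write to should work.
export ARROW_HOME="$HOME/local"

# Let the dynamic linker find the Arrow shared libraries at runtime.
# Assumes the libraries end up in $ARROW_HOME/lib. The ${VAR:+...} form
# avoids a trailing colon when LD_LIBRARY_PATH was previously unset.
export LD_LIBRARY_PATH="$ARROW_HOME/lib${LD_LIBRARY_PATH:+:$LD_LIBRARY_PATH}"
```

These exports only affect the current shell session; add them to your shell profile if you want them to persist.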
Now, we build Arrow:
cd arrow/cpp
mkdir dev-build
cd dev-build
cmake -DCMAKE_INSTALL_PREFIX=$ARROW_HOME ..
make
# Use sudo here if $ARROW_HOME requires it
make install
To get the optional Parquet support, you should also build and install parquet-cpp.
cd arrow/python
# --with-parquet enables the Apache Parquet support in PyArrow
# --with-jemalloc enables the jemalloc allocator support in PyArrow
# --build-type=release disables debugging information and turns on
# compiler optimizations for native code
python setup.py build_ext --with-parquet --with-jemalloc --build-type=release install
python setup.py install
On Xcode 6 and earlier, there are some known OS X @rpath issues. If you are unable to import pyarrow, upgrading Xcode may be the solution.
In development installations, you will also need to set a correct
LD_LIBRARY_PATH. This is most probably done with
In : import pyarrow
In : pyarrow.from_pylist([1,2,3])
Out: <pyarrow.array.Int64Array object at 0x7f899f3e60e8>
[
  1,
  2,
  3
]