How to replicate np.asarray()'s dtype coercion and determine a suitable dtype for a list of ndarrays of different sizes
asarray() Type Promotion Logic for Pre-Allocated Buffers
Great question! You're right that np.asarray() relies on the C-level function PyArray_DiscoverDTypeAndShape() under the hood, and that NumPy doesn't expose this function in its Python API. For numeric arrays, however, you can reproduce its type promotion behavior with a built-in NumPy function and avoid the extra data copy you're worried about. Here's how:
Step 1: Calculate the Target Dtype with np.result_type()
The key function you need is np.result_type(). It applies NumPy's type promotion rules to its arguments and returns a single dtype that all of them can be converted to. For numeric arrays, this is the dtype np.asarray() picks when given a list of equally shaped arrays.
For your example:
    import numpy as np

    # Sample dtype list matching your example
    dtypes = [np.uint16, np.uint8, np.uint16, np.uint8, np.uint16,
              np.uint16, np.uint16, np.uint16, np.int16, np.uint16]
    target_dtype = np.result_type(*dtypes)
    print(target_dtype)  # Output: int32
Even better, you can pass the arrays directly to np.result_type() instead of extracting their dtypes first:
    list_of_ndarrays = [np.array([1], dtype=np.uint16), np.array([2], dtype=np.uint8), ...]
    target_dtype = np.result_type(*list_of_ndarrays)
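For numeric arrays, this gives the same dtype that np.asarray() would choose for the stacked array. One caveat: NumPy versions before 2.0 promote 0-d arrays by value rather than by dtype, so passing the dtypes themselves is the safer choice if such arrays can occur.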
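The agreement between np.result_type() and np.asarray() can be checked on a small case. The sketch below uses made-up sample arrays of equal shape (so that np.asarray() can stack them) and passes a deduplicated set of dtypes, which is an illustrative choice rather than a requirement.

```python
import numpy as np

# Hypothetical sample data: equally shaped arrays with mixed integer dtypes.
arrays = [
    np.array([1, 2], dtype=np.uint16),
    np.array([3, 4], dtype=np.uint8),
    np.array([-5, 6], dtype=np.int16),
]

# Dtype predicted from the input dtypes alone (duplicates removed via a set).
predicted = np.result_type(*{arr.dtype for arr in arrays})

# Dtype NumPy picks when it stacks the list itself.
stacked = np.asarray(arrays)

print(predicted, stacked.dtype)
```

Both printed dtypes should match, since uint16 and int16 have no common 16-bit type and are promoted to a wider signed integer.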
Step 2: Pre-Allocate Your Buffer with the Target Dtype
Now, in your unify_sizes() method, you can create the result buffer directly using target_dtype instead of guessing or using a generic dtype. For example, if you're unifying to the maximum dimensions across all arrays:
    def unify_sizes(list_of_ndarrays, mode="max"):
        # Only the "max" mode is sketched here: every array is padded up to the
        # largest extent along each axis (all arrays must have the same ndim).
        all_shapes = [arr.shape for arr in list_of_ndarrays]
        unified_shape = tuple(max(s) for s in zip(*all_shapes))
        num_arrays = len(list_of_ndarrays)

        # Get the target dtype using NumPy's promotion logic
        target_dtype = np.result_type(*list_of_ndarrays)

        # Pre-allocate the result buffer; the zeros double as constant padding
        result = np.zeros((num_arrays,) + unified_shape, dtype=target_dtype)

        # Write each array into the leading corner of its slot. The assignment
        # casts to target_dtype on the fly (adjust placement logic as needed).
        for i, arr in enumerate(list_of_ndarrays):
            index = (i,) + tuple(slice(0, s) for s in arr.shape)
            result[index] = arr
        return result

Slice assignment converts each array to target_dtype as it is written, so no padded or converted temporary is created. An alternative is np.pad(arr, pad_width, mode="constant").astype(target_dtype, copy=False) followed by result[i] = ..., but np.pad always allocates a new array, and copy=False in astype() only skips a second copy when the array's dtype already matches target_dtype.
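What copy=False in astype() does and does not save can be checked with a small experiment. This is a minimal sketch with a made-up array; the variable names are illustrative only.

```python
import numpy as np

a = np.arange(4, dtype=np.int32)

# Dtype already matches: astype hands back the very same array object.
same = a.astype(np.int32, copy=False)

# Dtype differs: a conversion is unavoidable, so a new array is allocated.
converted = a.astype(np.int64, copy=False)

print(same is a, converted is a)
```

So copy=False helps only for the inputs that already have the target dtype; every other input is still converted into a fresh array.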
Why This Works
np.result_type() applies NumPy's dtype promotion rules, which for numeric inputs match what np.asarray() arrives at through PyArray_DiscoverDTypeAndShape(). The result depends on the input dtypes rather than on the stored values: it is the smallest dtype that every input dtype can be promoted to, covering unsigned/signed mismatches, bit-width differences, and integer versus floating-point combinations. One caveat is that no integer type covers both uint64 and int64, so that pair promotes to float64, which cannot represent every 64-bit integer exactly.
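A few concrete promotions make the rules easier to see. The dtype pairs below are chosen purely for illustration.

```python
import numpy as np

# Pairs of input dtypes chosen for illustration.
cases = [
    (np.uint8, np.uint16),   # same kind: the wider type wins
    (np.uint16, np.int16),   # mixed signedness: next wider signed type
    (np.uint64, np.int64),   # no integer type fits both: falls back to float
    (np.int32, np.float32),  # integer with float: float wide enough for int32
]

results = [np.result_type(a, b) for a, b in cases]
for (a, b), r in zip(cases, results):
    print(f"{np.dtype(a).name} + {np.dtype(b).name} -> {r.name}")
```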
Avoiding Extra Copies
By pre-allocating the buffer with the correct dtype upfront, you eliminate the need to first resize arrays to a unified shape, then call np.asarray() (which would create another copy of the data). All operations happen directly in the target buffer, minimizing memory overhead — critical for large arrays.
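As a sanity check, the direct route can be compared with the two-step route on a tiny input. This is a sketch under the assumption of zero padding at the trailing edge of each axis; the sample arrays and variable names are made up for the example.

```python
import numpy as np

# Hypothetical inputs with different shapes and dtypes.
arrays = [
    np.ones((2, 3), dtype=np.uint8),
    np.full((3, 2), -1, dtype=np.int16),
]
shape = tuple(max(dims) for dims in zip(*(a.shape for a in arrays)))
dtype = np.result_type(*{a.dtype for a in arrays})

# Direct route: one buffer, each array written straight into its slot.
direct = np.zeros((len(arrays),) + shape, dtype=dtype)
for i, a in enumerate(arrays):
    direct[(i,) + tuple(slice(0, n) for n in a.shape)] = a

# Two-step route: pad each array (temporary copies), then let np.asarray stack them.
padded = [
    np.pad(a, [(0, t - n) for n, t in zip(a.shape, shape)], mode="constant")
    for a in arrays
]
two_step = np.asarray(padded)

print(direct.dtype, two_step.dtype, np.array_equal(direct, two_step))
```

Both routes should yield the same dtype and contents; the direct one simply skips the padded intermediates and the final stacking copy.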
The question originates from Stack Exchange; asked by hans_meine.




