Data corruption when inserting OracleDataFrames that are views #574

@vladidobro

Description

Hi,

  1. What versions are you using?
con.version: 23.26.1.1.0
platform.platform: macOS-26.2-arm64-arm-64bit
sys.maxsize > 2**32: True
platform.python_version: 3.12.12
oracledb.__version__: 3.4.2
  2. Is it an error or a hang or a crash?

Error

  3. What error(s) or behavior are you seeing?

import pandas as pd
import oracledb

with oracledb.connect(...) as con:
    cur = con.cursor()
    cur.execute('create private temporary table ora$ptt_mytest (a int) on commit drop definition')
    df = pd.DataFrame({'A': [0, 1]}).convert_dtypes(dtype_backend='pyarrow')
    df = df.iloc[1:2]
    print('Inserted:\n', df)
    cur.executemany('insert into ora$ptt_mytest (a) values (:a)', df)
    print('Selected:\n', pd.DataFrame.from_arrow(con.fetch_df_all('select * from ora$ptt_mytest')))

Both the "inserted" and "selected" dataframes should be the same, but they differ:

Inserted:
    A
1  1
Selected:
      A
0  0.0

We have isolated the problem: it occurs only when the dataframe is some kind of view of another one.
Here we create a dataframe with two rows and then call df.iloc[1:2], which selects only the second row.
Instead, oracledb inserted the first row.

This does not happen if we (I believe):

  • don't call .convert_dtypes(), because then the Arrow PyCapsule interface creates new buffers
  • round trip the dataframe e.g. through a file, so that we are sure it owns its data

This leads me to believe that when I call df.iloc[1:2], the result points into the same Arrow buffer with a non-zero offset, and oracledb ignores this offset (but not the length, because only one row from the buffer is inserted).

We ran into this problem with manual batching, i.e.

big_df: pd.DataFrame
batches = [big_df.iloc[i:i+1000] for i in range(0, len(big_df), 1000)]
for b in batches:
    cur.executemany('INSERT ...', b)

which kept inserting the first batch again and again.

This could be a problem with pandas not correctly exporting the Arrow PyCapsule, but probably not, because polars.from_arrow(df) gives the correct result.

  4. Does your application call init_oracle_client()?

No, using thin mode.
