SAV File Size Increases Significantly After Metadata Update and adding Column #278

Open
Sushants2789 opened this issue Nov 29, 2024 · 0 comments


Description

When using pyreadstat to read a SAV file, add columns, and save back, the output file size increases substantially without a proportional increase in actual data content.
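
To put a number on the increase, the two files can be compared directly on disk. This is a minimal sketch using only the standard library; `size_growth` is a hypothetical helper name and the paths in the usage line are placeholders, not values from my project.

```python
from pathlib import Path


def size_growth(original_path, updated_path):
    """Return (original_bytes, updated_bytes, ratio) for two files on disk."""
    original_bytes = Path(original_path).stat().st_size
    updated_bytes = Path(updated_path).stat().st_size
    # Guard against an empty original file to avoid dividing by zero.
    ratio = updated_bytes / original_bytes if original_bytes else float("inf")
    return original_bytes, updated_bytes, ratio
```

Calling `size_growth(file_path, temp_sav_path)` after the write step gives a ratio that can be set against the number of columns actually added.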

Detailed Reproduction Steps

Read SAV file with encoding detection
`import pyreadstat
import chardet

# Encoding detection
with open(file_path, 'rb') as file:
    raw_data = file.read(10000)
result = chardet.detect(raw_data)
detected_encoding = result['encoding'] or 'utf-8'

# Read file
sav_df, meta = pyreadstat.read_sav(file_path, encoding=detected_encoding)

# Add new columns: copy the existing metadata so it can be extended
new_column_labels = meta.column_labels.copy()
new_value_labels = meta.variable_value_labels.copy()

# Write file back
temp_sav_path = file_path.replace('.sav', '_updated.sav')
pyreadstat.write_sav(
    sav_df,
    temp_sav_path,
    column_labels=new_column_labels,
    variable_value_labels=new_value_labels
)`
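
One thing that may be worth checking, as an assumption rather than a confirmed cause: `write_sav` defaults to `compress=False` and `row_compress=False`, so if the original file was stored compressed, the rewritten file would be larger even with identical data. The sketch below builds the keyword arguments for a compressed write. `build_write_kwargs` is a hypothetical helper, and using the `column_names_to_labels` dict instead of the `column_labels` list is my own choice so that newly added columns without labels do not cause a length mismatch.

```python
def build_write_kwargs(meta, zsav=False):
    """Collect write_sav keyword arguments from a pyreadstat metadata object.

    Assumption: the size difference comes from compression settings, which
    the report above does not confirm.
    """
    kwargs = {
        # Dicts keyed by variable name tolerate extra, unlabeled columns.
        "column_labels": dict(meta.column_names_to_labels),
        "variable_value_labels": dict(meta.variable_value_labels),
    }
    # compress (zsav) and row_compress are mutually exclusive in write_sav.
    if zsav:
        kwargs["compress"] = True
    else:
        kwargs["row_compress"] = True
    return kwargs


# Usage (requires pyreadstat and the variables from the steps above):
# pyreadstat.write_sav(sav_df, temp_sav_path, **build_write_kwargs(meta))
```

If the size with `row_compress=True` comes back close to the original, the growth is probably a matter of write defaults rather than the metadata update itself.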

Environment Details

pyreadstat version: 1.2.8
Platform: macOS
Python version: 3.9.18
