Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Unexpected Data Truncation in CTE with Type Casting During Bulk Update #1093

Open
takaaa220 opened this issue Jan 1, 2025 · 7 comments
Open
Assignees
Labels
enhancement New feature or request help wanted Extra attention is needed

Comments

@takaaa220
Copy link
Contributor

takaaa220 commented Jan 1, 2025

Hello,

I have encountered an issue with data truncation caused by type casting for a bulk update with CTEs. Below is a minimal reproducible example and the SQL query generated by the ORM (this code is in internal/dbtest/pg_test.go ):

func TestBulkUpdate(t *testing.T) {
    type Model struct {
        ID   int64
        Name string `bun:",type:varchar(3)"`
    }

    db := pg(t)
    t.Cleanup(func() { db.Close() })

    mustResetModel(t, ctx, db, (*Model)(nil))

    models := []*Model{
        {ID: 1, Name: "foo"},
        {ID: 2, Name: "bar"},
    }

    _, err := db.NewInsert().Model(&models).Exec(ctx)
    require.NoError(t, err)

    models[0].Name = "abcd"

    res, err := db.NewUpdate().
        With("_data", db.NewValues(&models)).
        Model((*Model)(nil)).
        TableExpr("_data").
        Set("name = _data.name").
        Where("model.id = _data.id").
        Exec(ctx)
    require.NoError(t, err) // pass

    require.Equal(t, models[0].Name, updatedModels[0].Name) // failed (expected: abcd, actual: abc)
}

generated sql:

WITH "_data" ("id", "name") AS (
    VALUES (1::BIGINT, 'abcd'::varchar(3)), (2::BIGINT, 'bar'::varchar(3))
) 
UPDATE "models" AS "model" 
SET name = _data.name 
FROM _data 
WHERE (model.id = _data.id)

The issue occurs because of the type casting in the VALUES clause ('abcd'::varchar(3)), which results in data truncation ('abcd' becomes 'abc'). This behavior seems problematic for the following reasons:

  • The ORM silently truncates data without throwing any error or warning.
  • This could lead to unexpected data loss, especially if the developer is unaware of the database type constraints.
  • It does not align with the general expectation of ORMs preventing silent data corruption.

Could the ORM be enhanced to either:

  • Avoid type casting in CTEs when it's not strictly necessary?
  • Add a configuration option to handle this behavior explicitly?

Thank you for your time and consideration. Let me know if you need any further details or clarifications.

@takaaa220 takaaa220 changed the title [Question] Unexpected Data Truncation in CTE with Type Casting During Bulk Update Jan 1, 2025
@j2gg0s j2gg0s self-assigned this Jan 1, 2025
@j2gg0s
Copy link
Collaborator

j2gg0s commented Jan 1, 2025

If you want, you can disable explict casting by turn off feature.DoubleColonCast.
Link: https://github.com/uptrace/bun/blob/master/query_values.go#L221.

With version >= v1.2.7, we add new function WithoutFeature

@takaaa220
Copy link
Contributor Author

@j2gg0s
Thanks for your reply. (And, happy new year! )

Is there a solution for handling such cases where only specific columns should be cast while others should not?

For example, VARCHAR(N) columns would encounter the truncation issue if cast, so I want to avoid casting them. However, UUID columns must be cast; otherwise, they result in an error.

@j2gg0s
Copy link
Collaborator

j2gg0s commented Jan 2, 2025

If we only check a subset of type conversions, what criteria should we use to decide which subset to select?

If we check all possible type conversions, is it necessary? The logic may become overly complex and repetitive, as the database already returns errors for most invalid cases.

The two points above are not meant as challenges but rather as considerations that need to be weighed.
I don’t have a definitive answer to this and welcome any input or suggestions.

@takaaa220
Copy link
Contributor Author

Hmm, let me think about it for a bit.

@j2gg0s j2gg0s added the enhancement New feature or request label Jan 3, 2025
@j2gg0s
Copy link
Collaborator

j2gg0s commented Jan 3, 2025

Relate #503

@j2gg0s
Copy link
Collaborator

j2gg0s commented Jan 3, 2025

@j2gg0s Thanks for your reply. (And, happy new year! )

Is there a solution for handling such cases where only specific columns should be cast while others should not?

For example, VARCHAR(N) columns would encounter the truncation issue if cast, so I want to avoid casting them. However, UUID columns must be cast; otherwise, they result in an error.

Would constructing an Append function for UUIDs be a better option?

Are there any other similar data types that must also be converted?

@takaaa220
Copy link
Contributor Author

takaaa220 commented Jan 4, 2025

Would constructing an Append function for UUIDs be a better option?

I am not sure a good decision because I’m not yet fully familiar with the design of this project...

I think the goal here is to determine whether type casting is necessary. It depends on the column type and the database.
So, how about the structure so that the dialect is responsible for providing a function to handle type casting, and the dialect performs the casting only when necessary?

Are there any other similar data types that must also be converted?

I’m sorry, I’m not very knowledgeable about databases… (The only example I could think of was UUID.)

@j2gg0s j2gg0s added the help wanted Extra attention is needed label Jan 6, 2025
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
enhancement New feature or request help wanted Extra attention is needed
Projects
None yet
Development

No branches or pull requests

2 participants