Skip to content

Add format and compression to COPY command#21

Merged
graingert merged 1 commit into
sqlalchemy-redshift:masterfrom
cpcloud:format-compression
Aug 27, 2015
Merged

Add format and compression to COPY command#21
graingert merged 1 commit into
sqlalchemy-redshift:masterfrom
cpcloud:format-compression

Conversation

@cpcloud

@cpcloud cpcloud commented Aug 24, 2015

Copy link
Copy Markdown
Contributor

replaces #1

@cpcloud

cpcloud commented Aug 24, 2015

Copy link
Copy Markdown
Contributor Author

@graingert when you get a chance can you close #1? thanks!

what's needed to merge this PR?

  1. use sa.text.
  2. add a test

anything else?

Comment thread redshift_sqlalchemy/dialect.py Outdated

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can we do this without string formatting?

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The quoted values should definitely be implemented as SQL parameters

@graingert

Copy link
Copy Markdown
Member

needs a test, and needs some review for SQL injection possibilities

@graingert

Copy link
Copy Markdown
Member

I know it's not your code, but I think this PR is a good excuse to clean this function up a bit.

@cpcloud

cpcloud commented Aug 24, 2015

Copy link
Copy Markdown
Contributor Author

@graingert how do you feel about adding enum34 as a dependency so that we can have something like this:

import enum


class Format(enum.Enum):
    CSV = 0
    JSON = 1


class Compression(enum.Enum):
    GZIP = 0
    LZOP = 1
    # more as needed

@graingert

Copy link
Copy Markdown
Member

Could do something like assert format in ['JSON', 'CSV'] rather than an enum

@cpcloud

cpcloud commented Aug 24, 2015

Copy link
Copy Markdown
Contributor Author

It doesn't appear that there's a way to use bindparams with the session_token parameter:

In [1]: sa.text(':secret_key:session_token').bindparams(secret_key='a', session_token='b')
ArgumentError: This text() construct doesn't define a bound parameter named 'secret_key'

@cpcloud

cpcloud commented Aug 24, 2015

Copy link
Copy Markdown
Contributor Author

The format for credentials is:

aws_access_key_id=<access-key-id>;aws_secret_access_key=<secret-access-key>[;token=<temporary-session-token>]

@graingert

Copy link
Copy Markdown
Member

I think the credentials string will need to be formatted in python and bound as one single SQL parameter

@cpcloud

cpcloud commented Aug 24, 2015

Copy link
Copy Markdown
Contributor Author

fine by me

@graingert

Copy link
Copy Markdown
Member

looks nice, could do with some tests though. Don't worry about uploading files to S3 to the real redshift instance for now. Just some simple compiles with verification of the text produced:

You can use http://docs.sqlalchemy.org/en/latest/faq/sqlexpressions.html#how-do-i-render-sql-expressions-as-strings-possibly-with-bound-parameters-inlined to do the compile including bound params.

@cpcloud

cpcloud commented Aug 24, 2015

Copy link
Copy Markdown
Contributor Author

i added test_format, test_compression, test_invalid_format and test_invalid_compression. What else do you want me to add?

@cpcloud

cpcloud commented Aug 24, 2015

Copy link
Copy Markdown
Contributor Author

Or rather, what else do you want me to test?

@graingert

Copy link
Copy Markdown
Member

Ah sorry I didn't spot the tests.

Comment thread redshift_sqlalchemy/dialect.py Outdated

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I prefer pep257 dosctrings:

"""
Prepares a Redshift COPY statement.

...
"""

@graingert

Copy link
Copy Markdown
Member

This changes the interfaces (to something I'd prefer), so we'll have to get some consensus with other users of this code.

Another possibility is to rename this implementation as NewCopyCommand (name TBD) and adapt the old CopyCommand interface to call this NewCopyCommand

Comment thread redshift_sqlalchemy/dialect.py Outdated

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If I do a ctrl+F "self.null" on this page I don't find anything

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

that's because it's used in the visit_copy_command as element.null

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

😳 ah of course.

@graingert

Copy link
Copy Markdown
Member

@jklukas @thisfred @bouk: Thoughts?

@cpcloud

cpcloud commented Aug 24, 2015

Copy link
Copy Markdown
Contributor Author

I had no idea so many people were developing this library! Awesome.

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why did you change the way the credentials are passed in? This seems way more brittle

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this isn't brittle, this simply fails fast with invalid data.

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd probably prefer taking the params as arguments, doing the format then doing the regex validation.

Eg: (Don't quote me on this)

def __init__(self, ..., access_key_id, secret_access_key, token=None):
    credentials = 'aws_access_key_id={access_key};aws_secret_access_key={secret_access_key}'.format(
        access_key=access_key_id,
        secret_access_key=secret_access_key,
    )
    if token is not None:
        credentials += ';token=' + token
    if not creds_rx.match(credentials):
        ...fail...

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

But why not just have them as seperate arguments like before? That's definitely more robust instead of this regex stuff

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

fine by me to have this as separate arguments

@cpcloud

cpcloud commented Aug 26, 2015

Copy link
Copy Markdown
Contributor Author

no.

you can't pass kwargs like that to sa.text

@cpcloud

cpcloud commented Aug 26, 2015

Copy link
Copy Markdown
Contributor Author

it doesn't work for the same reasons as stated above about escaping

@graingert

Copy link
Copy Markdown
Member

It shouldn't be escaped at all. It should come through as a literal null byte

@cpcloud

cpcloud commented Aug 26, 2015

Copy link
Copy Markdown
Contributor Author

Is there a place that we can chat? I feel like this is moving slower than it could.

@cpcloud

cpcloud commented Aug 26, 2015

Copy link
Copy Markdown
Contributor Author

sqla irc?

@graingert

Copy link
Copy Markdown
Member

Yup my contact details are here https://graingert.co.uk/foaf.rdf
On 26 Aug 2015 21:19, "Phillip Cloud" notifications@github.com wrote:

sqla irc?


Reply to this email directly or view it on GitHub
#21 (comment)
.

@cpcloud

cpcloud commented Aug 26, 2015

Copy link
Copy Markdown
Contributor Author

i'm in the sqla irc

@cpcloud

cpcloud commented Aug 27, 2015

Copy link
Copy Markdown
Contributor Author

I still don't see why setting compiler.dialect._backslash_escapes to False and then resetting it back to the default for the dialect isn't the right solution here. No matter what the approach is (overriding compiler.render_literal_value, or a custom type), it will involve turning off _backslash_escapes and then turning it back to the default.

Can we please get this merged today. This is dragging on far too long.

@graingert

Copy link
Copy Markdown
Member

it will involve turning off _backslash_escapes and then turning it back to the default.

I will not accept anything that breaks the thread-safety of the query compiler. Let's just use .format() here

@bouk

bouk commented Aug 27, 2015

Copy link
Copy Markdown
Contributor

Also, setting private variables is a no-no

@cpcloud

cpcloud commented Aug 27, 2015

Copy link
Copy Markdown
Contributor Author

Let's just use .format() here.

For everything, or just for null?

Also, setting private variables is a no-no

I'm aware of that.

I will not accept anything that breaks the thread-safety of the compiler.

Can you give an example of how this breaks thread safety? Is there no public state on the compiler that can be set by a user during compilation?

@graingert

Copy link
Copy Markdown
Member

Just for null, call the parameter "dangerous_null_delimiter=None" and only add the "NULL AS " if it's not None.

Thread 1 Thread 2
disable slashes
text_query = compile(obj)
text_query = compile(obj)
enable slashes
text_query is not escaped!

@cpcloud

cpcloud commented Aug 27, 2015

Copy link
Copy Markdown
Contributor Author

Nice example.

call the parameter "dangerous_null_delimiter=None"

is that a joke?

@graingert

Copy link
Copy Markdown
Member

nope

@graingert graingert added this to the 1.0.0 milestone Aug 27, 2015
@cpcloud

cpcloud commented Aug 27, 2015

Copy link
Copy Markdown
Contributor Author

Thanks, but that's unnecessary.

@graingert

Copy link
Copy Markdown
Member

Cool looks like this is ready to go, can you clean up your commits (squash and rebase as appropriate)?

@cpcloud

cpcloud commented Aug 27, 2015

Copy link
Copy Markdown
Contributor Author

Sure

@cpcloud

cpcloud commented Aug 27, 2015

Copy link
Copy Markdown
Contributor Author

@graingert Would you like me to squash further?

@graingert

Copy link
Copy Markdown
Member

probably best for one commit here I think?

@cpcloud

cpcloud commented Aug 27, 2015

Copy link
Copy Markdown
Contributor Author

done

@graingert

Copy link
Copy Markdown
Member

@jklukas @thisfred @bouk: I'm going to merge this. If you've got any objections we can change them in another PR.

graingert added a commit that referenced this pull request Aug 27, 2015
Add format and compression to COPY command
@graingert graingert merged commit c302895 into sqlalchemy-redshift:master Aug 27, 2015
@cpcloud cpcloud deleted the format-compression branch August 27, 2015 15:24
@cpcloud

cpcloud commented Aug 27, 2015

Copy link
Copy Markdown
Contributor Author

@graingert thanks for your patience.

@graingert

Copy link
Copy Markdown
Member

@graingert no, thank you for your patience, feature and improvements to the code!

haleemur pushed a commit to haleemur/redshift_sqlalchemy that referenced this pull request Sep 2, 2015
…sion

Add format and compression to COPY command
@jklukas jklukas mentioned this pull request Mar 18, 2019
4 tasks
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants