포스트 그레스 테이블에 데이터 프레임을 쓰는 방법

programing

포스트 그레스 테이블에 데이터 프레임을 쓰는 방법

closeapi 2023. 5. 16. 22:40

포스트 그레스 테이블에 데이터 프레임을 쓰는 방법

DataFrame.to _sql 메서드가 있지만 mysql, sqlite 및 oracle 데이터베이스에서만 작동합니다.이 메서드는 postgres 연결 또는 sqlalchemy 엔진으로 전달할 수 없습니다.

판다 0.14(2014년 5월 말 출시)부터 postgresql이 지원됩니다.그sql모듈이 현재 사용 중입니다.sqlalchemy다양한 데이터베이스 맛을 지원합니다.postgresql 데이터베이스에 대한 sqlalchemy 엔진을 전달할 수 있습니다(문서 참조).예:

from sqlalchemy import create_engine
engine = create_engine('postgresql://username:password@localhost:5432/mydatabase')
df.to_sql('table_name', engine)

판다에서 0.13.1 버전까지의 postgresql이 지원되지 않는 것이 맞습니다.만약 당신이 이전 버전의 판다를 사용해야 한다면, 여기 패치된 버전이 있습니다.pandas.io.sqlhttps://gist.github.com/jorisvandenbossche/10841234 .
제가 이것을 예전에 썼기 때문에 항상 작동한다고 완전히 보장할 수는 없지만, 그 근거는 거기에 있어야 합니다.)작업 디렉터리에 해당 파일을 넣고 가져올 경우 다음 작업을 수행할 수 있습니다(위치:conpostgresql 연결):

import sql  # the patched version (file is named sql.py)
sql.write_frame(df, 'table_name', con, flavor='postgresql')

더 빠른 옵션:

다음 코드는 Pandas DF를 postgres DB에 df.to _sql 메서드보다 훨씬 빠르게 복사할 것이며 DF를 저장하는 데 중간 csv 파일이 필요하지 않습니다.

DB 사양에 따라 엔진을 만듭니다.

데이터 프레임(df)과 동일한 수의 열을 가진 테이블을 postgres DB에 만듭니다.

DF의 데이터는 당신의 사후 표에 삽입될 것입니다.

from sqlalchemy import create_engine
import psycopg2 
import io

만약 당신이 테이블을 교체하고 싶다면, 우리는 우리의 df의 헤더를 사용하여 normal to_sql 메소드로 교체한 다음 전체 빅타임을 소비하는 df를 DB에 로드할 수 있습니다.

engine = create_engine(
    'postgresql+psycopg2://username:password@host:port/database')

# Drop old table and create new empty table
df.head(0).to_sql('table_name', engine, if_exists='replace',index=False)

conn = engine.raw_connection()
cur = conn.cursor()
output = io.StringIO()
df.to_csv(output, sep='\t', header=False, index=False)
output.seek(0)
contents = output.getvalue()
cur.copy_from(output, 'table_name', null="") # null values become ''
conn.commit()
cur.close()
conn.close()

Pandas 0.24.0 이상 솔루션

Pandas 0.24.0에는 Postgres에 대한 빠른 쓰기를 위해 특별히 설계된 새로운 기능이 도입되었습니다.자세한 내용은 여기에서 확인할 수 있습니다. https://pandas.pydata.org/pandas-docs/stable/user_guide/io.html#io-sql-method

import csv
from io import StringIO

from sqlalchemy import create_engine

def psql_insert_copy(table, conn, keys, data_iter):
    # gets a DBAPI connection that can provide a cursor
    dbapi_conn = conn.connection
    with dbapi_conn.cursor() as cur:
        s_buf = StringIO()
        writer = csv.writer(s_buf)
        writer.writerows(data_iter)
        s_buf.seek(0)

        columns = ', '.join('"{}"'.format(k) for k in keys)
        if table.schema:
            table_name = '{}.{}'.format(table.schema, table.name)
        else:
            table_name = table.name

        sql = 'COPY {} ({}) FROM STDIN WITH CSV'.format(
            table_name, columns)
        cur.copy_expert(sql=sql, file=s_buf)

engine = create_engine('postgresql://myusername:mypassword@myhost:5432/mydatabase')
df.to_sql('table_name', engine, method=psql_insert_copy)

이렇게 했어요.

사용하고 있기 때문에 더 빠를 수 있습니다.execute_batch:

# df is the dataframe
if len(df) > 0:
    df_columns = list(df)
    # create (col1,col2,...)
    columns = ",".join(df_columns)

    # create VALUES('%s', '%s",...) one '%s' per column
    values = "VALUES({})".format(",".join(["%s" for _ in df_columns])) 

    #create INSERT INTO table (columns) VALUES('%s',...)
    insert_stmt = "INSERT INTO {} ({}) {}".format(table,columns,values)

    cur = conn.cursor()
    psycopg2.extras.execute_batch(cur, insert_stmt, df.values)
    conn.commit()
    cur.close()

인덱스를 사용하거나 사용하지 않고 사용자 정의 스키마의 테이블에 df를 더 빠르게 쓰는 방법:

"""
Faster way to write df to table.
Slower way is to use df.to_sql()
"""

from io import StringIO

from pandas import DataFrame
from sqlalchemy.engine.base import Engine


class WriteDfToTableWithIndexMixin:
    @classmethod
    def write_df_to_table_with_index(
            cls,
            df: DataFrame,
            table_name: str,
            schema_name: str,
            engine: Engine
    ):
        """
        Truncate existing table and load df into table.
        Keep each column as string to avoid datatype conflicts.
        """
        df.head(0).to_sql(table_name, engine, if_exists='replace',
                          schema=schema_name, index=True, index_label='id')

        conn = engine.raw_connection()
        cur = conn.cursor()
        output = StringIO()
        df.to_csv(output, sep='\t', header=False,
                  index=True, index_label='id')
        output.seek(0)
        contents = output.getvalue()
        cur.copy_expert(f"COPY {schema_name}.{table_name} FROM STDIN", output)
        conn.commit()


class WriteDfToTableWithoutIndexMixin:
    @classmethod
    def write_df_to_table_without_index(
            cls,
            df: DataFrame,
            table_name: str,
            schema_name: str,
            engine: Engine
    ):
        """
        Truncate existing table and load df into table.
        Keep each column as string to avoid datatype conflicts.
        """
        df.head(0).to_sql(table_name, engine, if_exists='replace',
                          schema=schema_name, index=False)

        conn = engine.raw_connection()
        cur = conn.cursor()
        output = StringIO()
        df.to_csv(output, sep='\t', header=False, index=False)
        output.seek(0)
        contents = output.getvalue()
        cur.copy_expert(f"COPY {schema_name}.{table_name} FROM STDIN", output)
        conn.commit()

df의 열에 JSON 값이 있는 경우 위의 방법은 여전히 모든 데이터를 올바르게 로드하지만 json 열에는 이상한 형식이 있습니다.그래서 그 json 열을 다음으로 변환합니다.::json오류가 발생할 수 있습니다.사용해야 합니다.to_sql().더하다method=multi속도를 높이고 추가하기 위해chunksize기계 동결 방지:

df.to_sql(table_name, engine, if_exists='replace', schema=schema_name, index=False, method='multi', chunksize=1000)

psycopg2를 사용하여 네이티브 sql 명령을 사용하여 데이터를 postgres 테이블에 쓸 수 있습니다.

import psycopg2
import pandas as pd

conn = psycopg2.connect("dbname='{db}' user='{user}' host='{host}' port='{port}' password='{passwd}'".format(
            user=pg_user,
            passwd=pg_pass,
            host=pg_host,
            port=pg_port,
            db=pg_db))
cur = conn.cursor()    
def insertIntoTable(df, table):
        """
        Using cursor.executemany() to insert the dataframe
        """
        # Create a list of tupples from the dataframe values
        tuples = list(set([tuple(x) for x in df.to_numpy()]))
    
        # Comma-separated dataframe columns
        cols = ','.join(list(df.columns))
        # SQL query to execute
        query = "INSERT INTO %s(%s) VALUES(%%s,%%s,%%s,%%s)" % (
            table, cols)
    
        try:
            cur.executemany(query, tuples)
            conn.commit()

        except (Exception, psycopg2.DatabaseError) as error:
            print("Error: %s" % error)
            conn.rollback()
            return 1

Python 2.7 및 Pandas 0.24.2용 및 Psycopg2 사용

Psycopg2 연결 모듈

def dbConnect (db_parm, username_parm, host_parm, pw_parm):
    # Parse in connection information
    credentials = {'host': host_parm, 'database': db_parm, 'user': username_parm, 'password': pw_parm}
    conn = psycopg2.connect(**credentials)
    conn.autocommit = True  # auto-commit each entry to the database
    conn.cursor_factory = RealDictCursor
    cur = conn.cursor()
    print ("Connected Successfully to DB: " + str(db_parm) + "@" + str(host_parm))
    return conn, cur

데이터베이스에 연결

conn, cur = dbConnect(databaseName, dbUser, dbHost, dbPwd)

데이터 프레임이 이미 df로 존재한다고 가정

output = io.BytesIO() # For Python3 use StringIO
df.to_csv(output, sep='\t', header=True, index=False)
output.seek(0) # Required for rewinding the String object
copy_query = "COPY mem_info FROM STDOUT csv DELIMITER '\t' NULL ''  ESCAPE '\\' HEADER "  # Replace your table name in place of mem_info
cur.copy_expert(copy_query, output)
conn.commit()

엔진을 생성합니다(여기서 방언='포스트그레스' 또는 '스캐너' 등).):

from sqlalchemy import create_engine
engine = create_engine(f'{dialect}://{user_name}@{host}:{port}/{db_name}')
Session = sessionmaker(bind=engine) 

with Session() as session:
    df = pd.read_csv(path + f'/{file}') 
    df.to_sql('table_name', con=engine, if_exists='append',index=False)

언급URL : https://stackoverflow.com/questions/23103962/how-to-write-dataframe-to-postgres-table

'programing' 카테고리의 다른 글

변수 값을 문자열에 삽입(문자열로 보간)하려면 어떻게 해야 합니까? (0)	2023.05.16
bash 스크립트를 실행하는 도커 진입점이 "권한 거부"를 수신함 (0)	2023.05.16
Git에 단계별 변경 사항만 저장하려면 어떻게 해야 합니까? (0)	2023.05.16
grep을 사용하여 폴더 내에서 단어를 찾으려면 어떻게 해야 합니까? (0)	2023.05.16
VBA 열에 값이 있는지 확인합니다. (0)	2023.05.16

현재글포스트 그레스 테이블에 데이터 프레임을 쓰는 방법

각종 프로그래밍 정보를 다루는 블로그입니다.

SWIFT, C, ajax, Excel, mysql, mongodb, spring-boot, PowerShell, wordpress, jQuery, angularJS, ReactJS, PYTHON, asp.net, Android, Oracle, Git, MariaDB, json, sql-server,

Today :
Yesterday :

일	월	화	수	목	금	토
					1	2
3	4	5	6	7	8	9
10	11	12	13	14	15	16
17	18	19	20	21	22	23
24	25	26	27	28	29	30
31

closeapi

포스트 그레스 테이블에 데이터 프레임을 쓰는 방법

포스트 그레스 테이블에 데이터 프레임을 쓰는 방법

엔진을 생성합니다(여기서 방언='포스트그레스' 또는 '스캐너' 등).):

'programing' 카테고리의 다른 글

'programing'의 다른글

티스토리툴바

포스트 그레스 테이블에 데이터 프레임을 쓰는 방법

포스트 그레스 테이블에 데이터 프레임을 쓰는 방법

엔진을 생성합니다(여기서 방언='포스트그레스' 또는 '스캐너' 등).):

'programing' 카테고리의 다른 글

'programing'의 다른글

관련글

티스토리툴바