Import into a database

Import is supported either through an SQL input reader or through reader objects that extract tables of data.

SQL Import

InputParser(sql_or_filename_or_stream=None, sql_terminator:str=';'):

Class to parse SQL input, either from a file or from a stream (e.g., stdin). Assumes that each SQL statement is terminated with the sql_terminator character (defaults to a semicolon) at the end of a line.

Warning

This class may fail on multi-line string literals that contain the same terminator character at the end of a line.

Example:

import sys

from lwetl import Jdbc, InputParser

jdbc = Jdbc('scott/tiger@osrv01')
with InputParser(sys.stdin) as parser:
    for sql in parser.parse():
        jdbc.execute(sql)

set_sql_terminator(sql_terminator):

Specifies the SQL terminator.

Parameters:sql_terminator (str) – The specified SQL terminator
open(sql_fn_stream=None):

Opens a TextIOWrapper for input

Parameters:sql_fn_stream (TextIOWrapper) – the stream to open
close():
Closes the input stream
parse(array_size=1) → iterator:

Parses the input stream.

Parameters:array_size (int) – buffer size of the iterator
Returns:an iterator of SQL commands
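The line-terminator rule can be sketched in plain Python. This is only an illustration of the splitting behaviour the parser assumes, not the library's actual implementation:

```python
def split_statements(text: str, terminator: str = ';') -> list:
    """Split SQL text into statements: a statement ends when a line
    ends with the terminator character (illustration only)."""
    statements, current = [], []
    for line in text.splitlines():
        if line.rstrip().endswith(terminator):
            # strip the terminator and close the current statement
            current.append(line.rstrip()[:-len(terminator)])
            statements.append('\n'.join(current).strip())
            current = []
        else:
            current.append(line)
    return [s for s in statements if s]

sql_text = """CREATE TABLE t (id INT);
INSERT INTO t
VALUES (1);"""
print(split_statements(sql_text))
```

A terminator character inside a multi-line string literal that happens to fall at the end of a line would wrongly close the statement, which is the failure mode the warning above refers to.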

Data Readers

class CsvImport(filename_or_stream=None, delimiter="\t", encoding='utf-8')

Open a CSV file and extract data row by row in the form of a dictionary. Expects the first row of the file to contain the column names.

Parameters:
  • filename_or_stream – if the argument is a string, the program will try to open the file with this name. For streams, it will use the stream as-is. Defaults to stdin.
  • delimiter (str) – specifies the column delimiter of the CSV file. Defaults to the TAB character.
  • encoding (str) – character encoding to use for the input file. Defaults to utf-8.
open(filename_or_stream=None, delimiter=None, encoding=None)
Parameters:
  • filename_or_stream – specifies the input. If not specified, it uses the value specified when the object was created.
  • delimiter (str) – specifies the column delimiter. If not specified, it uses the delimiter specified when the object was created.
  • encoding (str) – specifies the character encoding. If not specified, it uses the encoding specified when the object was created.
close():

Closes the input stream. Only has an effect if the input was specified as a filename.

get_data(max_rows=1000) → iterator

Retrieves the data as a generator/iterator.

Parameters:max_rows (int) – buffer size of the iterator
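The first-row-as-header convention matches what csv.DictReader in the standard library produces; the sketch below shows the kind of per-row dictionaries get_data() is documented to yield, assuming a tab-delimited input:

```python
import csv
import io

# tab-delimited input; the first row holds the column names
data = io.StringIO("ID\tNAME\n1\tAlice\n2\tBob\n")

rows = list(csv.DictReader(data, delimiter='\t'))
print(rows)  # [{'ID': '1', 'NAME': 'Alice'}, {'ID': '2', 'NAME': 'Bob'}]
```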
class LdifImport(filename_or_stream=None, separator=None, encoding='utf-8')

Open an LDIF file and extract data as a dictionary with the attribute names as keys.

Parameters:
  • filename_or_stream – if the argument is a string, the program will try to open the file with this name. For streams, it will use the stream as-is. Defaults to stdin.
  • separator (str) – in an LDIF file, an attribute may occur multiple times in the same record. In such cases the value in the dictionary becomes a list. If a separator is specified, this list is joined into a single string with the elements separated by the specified separator.
  • encoding (str) – character encoding to use for the input file. Defaults to utf-8.
open(filename_or_stream=None, separator=None, encoding=None)

Opens the input file or stream.

Parameters:
  • filename_or_stream – specifies the input. If not specified, it uses the value specified when the object was created.
  • separator (str) – in an LDIF file, an attribute may occur multiple times in the same record. In such cases the value in the dictionary becomes a list. If a separator is specified, this list is joined into a single string with the elements separated by the specified separator.
  • encoding (str) – specifies the character encoding. If not specified, it uses the encoding specified when the object was created.
close():

Closes the input stream. Only has an effect if the input was specified as a filename.

get_data() → iterator

Retrieves the data as a generator/iterator.
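The separator behaviour can be mimicked in plain Python. This is an illustration of the documented transformation, not the library code:

```python
def merge_attribute(values, separator=None):
    """Keep a multi-valued attribute as a list, or join it into one
    string when a separator is given (illustration only)."""
    if separator is not None and isinstance(values, list):
        return separator.join(values)
    return values

# a record where the 'mail' attribute occurred twice in the LDIF input
record = {'cn': 'jdoe', 'mail': ['jdoe@example.com', 'john@example.com']}
print(merge_attribute(record['mail']))        # list stays a list
print(merge_attribute(record['mail'], '; '))  # 'jdoe@example.com; john@example.com'
```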

class XlsxImport(file_name: str = None, sheet_name: str = None)

Open an xlsx worksheet and extract the data row by row in the form of a dictionary. Expects the first row of the worksheet to contain the column names.

open(file_name: str = None, sheet_name: str = None):
close()
get_data(max_rows=1000)

Examples

Import from the stdin in CSV format and upload in native query format (see next section).

import sys

from lwetl import Jdbc, CsvImport, NativeUploader, UPLOAD_MODE_COMMIT

jdbc = Jdbc('scott')

with NativeUploader(jdbc, 'TARGET_TABLE', commit_mode=UPLOAD_MODE_COMMIT) as upl:
    # read CSV from stdin
    with CsvImport(sys.stdin) as csv:
        for r in csv.get_data():
            upl.insert(r)

Import from an excel 2007+ spreadsheet and upload using parameterized SQL syntax (see next section).

from lwetl import Jdbc, XlsxImport, ParameterUploader, UPLOAD_MODE_COMMIT

jdbc = Jdbc('scott')

table = 'TARGET_TABLE'
# alternative to the with statement
xls = XlsxImport()
xls.open(table + '.xlsx')
with ParameterUploader(jdbc, table, commit_mode=UPLOAD_MODE_COMMIT) as upl:
    for r in xls.get_data():
        upl.insert(r)
        if upl.rowcount > 1000:
            upl.commit()
    if upl.rowcount > 0:
        upl.commit()
xls.close()

Upload models

Operational modes

Import into a database has the following modes of operation:

UPLOAD_MODE_DRYRUN
SQL statements are generated, but not sent to the database.
UPLOAD_MODE_PIPE
SQL statements are generated and piped for further processing. The database itself is not touched.
UPLOAD_MODE_ROLLBACK

SQL statements are generated and executed on the database. However, the commit statement performs a rollback instead.

Warning

This mode is not compatible with a database connection in auto-commit mode. It will also fail if the user sends commit commands independently.

UPLOAD_MODE_COMMIT
SQL statements are generated and executed on the database. The commit statement commits the pending statements to the database.

Classes

class NativeUploader(jdbc: Jdbc, table: str, fstream=None, commit_mode=UPLOAD_MODE_DRYRUN, exit_on_fail=True)

Upload data into a table with native SQL (no parameters in the jdbc execute command).

Parameters:
  • jdbc (Jdbc) – The target database connection
  • table (str) – Name of the table in the database to insert the data
  • fstream
  • commit_mode (str) – The upload mode, see Operational modes.
  • exit_on_fail (bool) – Clear the commit buffer and exit if an insert, update, or delete command fails.
insert(data: dict):

Insert into the table

Parameters:data (dict) – a dictionary of keys (column names) and values. Keys that do not correspond to existing column names are ignored.
update(data: dict, where_clause):

Update an existing row in the table

Parameters:
  • data (dict) – a dictionary of keys (column names) and values. Keys that do not correspond to existing column names are ignored.
  • where_clause (None,str,dict) –

    filter for column selection. Valid formats for the where clause are:

    None
    updates all rows.
    str
    raw SQL WHERE clause (the keyword WHERE may be omitted).
    dict
    keys are column names. Non-existing column names are ignored. Multiple columns are combined with the AND operator. The value may be:
    • a value (results in COLUMN_NAME = VALUE)
    • a string with an operator and value, e.g., LIKE 'ABC%'
    • a tuple (operator,value), e.g., ('>=', 7)
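The three dict value forms can be illustrated with a hypothetical helper that renders one entry to an SQL fragment. The function below is a sketch of the documented semantics, not lwetl's internal code:

```python
def render_condition(column: str, value) -> str:
    """Render one where_clause dict entry to an SQL fragment
    (hypothetical sketch of the documented value forms)."""
    if isinstance(value, tuple):
        # tuple form: ('>=', 7) -> "ID >= 7"
        operator, operand = value
        return f"{column} {operator} {operand!r}"
    if isinstance(value, str) and value.upper().startswith(('LIKE ', '>', '<', '=')):
        # operator-and-value string: "LIKE 'ABC%'" -> "NAME LIKE 'ABC%'"
        return f"{column} {value}"
    # plain value: 'NEW' -> "STATUS = 'NEW'"
    return f"{column} = {value!r}"

where = {'STATUS': 'NEW', 'ID': ('>=', 7), 'NAME': "LIKE 'ABC%'"}
print(' AND '.join(render_condition(c, v) for c, v in where.items()))
```

Multiple keys are combined with AND, matching the description above.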
delete(where_clause):

Delete rows in the table

Parameters:where_clause (None,str,dict) – filter for the rows to delete. Formats are identical to those of the update statement.
commit()

Processes previous insert/update/delete statements depending on the Operational modes of the instance.

UPLOAD_MODE_COMMIT
sends a commit statement to the database
UPLOAD_MODE_ROLLBACK
sends a rollback statement to the database
UPLOAD_MODE_DRYRUN
does nothing
UPLOAD_MODE_PIPE
work in progress

Warning

This mode is not compatible with a database connection in auto-commit mode. It will also fail if the user sends commit commands independently.

add_counter(columns: (str, list, set, tuple)):

Marks columns as counters. Assumes the column type is numeric. Queries the maximum value of each column and then inserts the next value (+1) into the column on each insert.

Parameters:columns (str,list,set,tuple) – names of the columns to mark as counters. May be a (comma-separated) string, or a list type.
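The counter behaviour can be sketched in memory. The real class queries the current maximum from the database; the class below is only an illustration of the +1-per-insert rule:

```python
class CounterColumns:
    """In-memory sketch of the add_counter rule: each marked column
    receives its previous maximum + 1 on every insert."""

    def __init__(self, start_values: dict):
        # start_values: current maximum per counter column
        # (queried from the database by the real implementation)
        self.counters = dict(start_values)

    def fill(self, row: dict) -> dict:
        for column in self.counters:
            self.counters[column] += 1
            row[column] = self.counters[column]
        return row

counters = CounterColumns({'ID': 41})
print(counters.fill({'NAME': 'Alice'}))  # {'NAME': 'Alice', 'ID': 42}
print(counters.fill({'NAME': 'Bob'}))    # {'NAME': 'Bob', 'ID': 43}
```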
class ParameterUploader(jdbc: Jdbc, table: str, fstream=None, commit_mode=UPLOAD_MODE_DRYRUN, exit_on_fail=True)

Upload data into a table using parameterized SQL commands. See the section NativeUploader for details on the constructor arguments.

insert(data: dict):

Insert into the table, see the NativeUploader for details.

update(data: dict, where_clause):

Update an existing row in the table, see the NativeUploader for details.

delete(where_clause):

Delete existing rows from the table, see the NativeUploader for details.

commit()

Processes previous insert/update/delete statements depending on the Operational modes of the instance. See the NativeUploader for details.

add_counter(columns: (str, list, set, tuple)):

Mark columns as counters. Assumes the column type is numeric. Queries the maximum value of each column and then inserts the next value (+1) into the column on each insert. See the NativeUploader for details.

class MultiParameterUploader(jdbc: Jdbc, table: str, fstream=None, commit_mode=UPLOAD_MODE_DRYRUN, exit_on_fail=True)

Upload data into a table using the jdbc executemany parameterized command.

insert(data: dict):

Insert into the table, see the NativeUploader for details.
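The executemany batching this class builds on can be demonstrated with the standard-library sqlite3 module (lwetl uses its JDBC connection instead; this only shows the idea of one parameterized statement executed with many parameter tuples):

```python
import sqlite3

rows = [{'ID': 1, 'NAME': 'Alice'}, {'ID': 2, 'NAME': 'Bob'}]

con = sqlite3.connect(':memory:')
con.execute("CREATE TABLE target_table (ID INTEGER, NAME TEXT)")
# one parameterized INSERT, executed once per parameter tuple
con.executemany(
    "INSERT INTO target_table (ID, NAME) VALUES (?, ?)",
    [(r['ID'], r['NAME']) for r in rows])
con.commit()
count = con.execute("SELECT COUNT(*) FROM target_table").fetchone()[0]
print(count)  # 2
```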

commit()

Processes previous insert statements depending on the Operational modes of the instance. See the NativeUploader for details.

add_counter(columns: (str, list, set, tuple)):

Mark columns as counters. Assumes the column type is numeric. Queries the maximum value of each column and then inserts the next value (+1) into the column on each insert. See the NativeUploader for details.