Import into a database¶
Import comes either as a sql import reader or as objects to read tablels of data.
SQL Import¶
-
InputParser(sql_or_filename_or_stream=None, sql_terminator:str=';'):
Class to parse SQL input, either from file or from a stream (e.g., stdin) Assumes that all SQLs are terminated with an sql_terminator character (defaults to a semi-colon) at the end of a line.
Warning
This class may fail on multi-line string inputs that use the same character and the and of a CRLF.
Example:
import sys from lwetl Jdbc, InputParser jdbc = Jdbc('scott/tiger@osrv01') with InputParser(sys.stdin) as parser: for sql in parser.parse(): jdbc.execute(sql)
-
set_sql_terminator(sql_terminator):
Specifies the SQL terminator.
Parameters: sql_terminator (str) – The specified SQL terminator -
open(sql_fn_stream=None):
Opens a TextIOWrapper for input
Parameters: sql_fn_stream (TextIOWrapper) – the stream to open
-
close():
-
Closes the input stream
-
parse
(array_size=1) → iterator:¶ parses the input stream.
Parameters: array_size (int) – buffer size of the iterator Returns: an iterator of SQL commands
-
Data Readers¶
-
class
CsvImport
(filename_or_stream=None, delimiter="t", encoding='utf-8')¶ Open a CSV file and extract data by row in the form of a dictionary Expects the first row if the dictionary to contain the column names.
Parameters: - filename_or_stream – if the argument is a string, the program will try to open the file with this name. For streams, it will use the stream as-is. Defaults to the stdin.
- delimiter (str) – specifies the column delimiter of the CSV file. Defaulst to the TAB character.
- encoding (str) – character encoding to use for the input file. Defaults to utf-8
-
open
(filename_or_stream=None, delimiter=None, encoding=None)¶ Parameters: - filename_or_stream – specifies the input. If not specified, it takes the specifier when the class object was created.
- delimiter (str) – specifies the column deliter. If not specified, it takes the delimiter specified when the object was created.
- encoding (str) – specifies the character encoding. If not specified, it takes the encoding specified when the object was created.
-
close():
closes the input stream. Only has an effect, if the input was specified as a filename.
-
get_data
(max_rows=1000) → iterator¶ Parameters: max_rows (int) – retrieve the data as an generator/iterator. The parameter specifies the buffer size.
-
class
LdifImport
(filename_or_stream=None, separator=None, encoding='utf-8')¶ Open a LDIF file and extract data as a dictionary with the attribute names as keys.
Parameters: - filename_or_stream – if the argument is a string, the program will try to open the file with this name. For streams, it will use the stream as-is. Defaults to the stdin.
- separator (str) – in an ldif file, an attibure may occur multiple times in the same record. In such cases the value of the dictionary becomes a list. In the case the separator is specified, this list is transformed into a string, separating the elements with the specified separator.
- encoding (str) – character encoding to use for the input file. Defaults to utf-8
-
open
(filename_or_stream=None, separator=None, encoding=None)¶ opens the file or stream of input.
Parameters: - filename_or_stream – specifies the input. If not specified, it takes the specifier when the class object was created.
- separator (str) – in an ldif file, an attibure may occur multiple times in the same record. In such cases the value of the dictionary becomes a list. In the case the separator is specified, this list is transformed into a string, separating the elements with the specified separator.
- encoding (str) – specifies the character encoding. If not specified, it takes the encoding specified when the object was created.
-
close():
closes the input stream. Only has an effect, if the input was specified as a filename.
-
get_data
() → iterator¶ retrieve the data as an generator/iterator. The parameter specifies the buffer size.
-
class
XlsxImport
(self, file_name: str, sheet_name: str = None)¶ Open an xls worksheet and extract the data by row in the form of a dictionary Expects the first row of the worksheet to contain the column names
-
open(file_name: str = None, sheet_name: str = None):
-
close
()¶
-
get_data
(max_rows=1000)¶
-
Examples¶
Import from the stdin
in CSV format and upload in native query format (see next section).
import sys
from lwetl import Jdbc, CsvImport, NativeUploader
jdbc = Jdbc('scott')
with NativeUploader(jdbc,'TARGET_TABLE', commit_mode=lwetl.UPLOAD_MODE_COMMIT) as upl:
# read CSV from stdin
with CsvImport(sys.stdin) as csv:
for r in csv.get_data():
upl.insert(r)
Import from an excel 2007+ spreadsheet and upload using parameterized SQL syntax (see next section).
import sys
from lwetl import Jdbc, XlsxImport, ParameterUploader
jdbc = Jdbc('scott')
table = 'TARGET_TABLE'
# alternative to with statement
xls = XlsxImport()
xls.open(table + '.xlsx')
with ParameterUploader(jdbc,table, commit_mode=lwetl.UPLOAD_MODE_COMMIT) as upl:
for r in xls.get_data():
upl.insert(r)
if upl.rowcount > 1000:
upl.commit()
if upl.rowcount > 0:
upl.commit()
xls.close()
Upload models¶
Operational modes¶
Import into a database has the following modes of operation:
- UPLOAD_MODE_DRYRUN
- SQL statements are generated, but not send to the database.
- UPLOAD_MODE_PIPE
- SQL statements are generated and piped for futher processing. The database itself is not touched.
- UPLOAD_MODE_ROLLBACK
SQL statements are generated and executed to the database. However, the commit statement performs a rollback instead.
Warning
This mode is not compatible with a database connection in auto-commit mode. It will also fail if the user sends commit commands independently.
- UPLOAD_MODE_COMMIT
- SQL statements are generated and executed to the database. However, the commit statement performs a rollback instead.
Classes¶
-
class
NativeUploader
(jdbc: Jdbc, table: str, fstream=None, commit_mode=UPLOAD_MODE_DRYRUN, exit_on_fail=True)¶ Upload data into a table with native SQL (no parameters in the jdbc execute command).
Parameters: - jdbc (Jdbc) – The target database connection
- table (str) – Name of the table in the database to insert the data
- fstream –
- commit_mode (str) – The upload mode, see Operational modes.
- exit_on_fail (bool) – Clear the commit buffer and exit if an insert, update, or delete command fails.
-
insert(data: dict):
Insert into the table
Parameters: data (dict) – a dictionary of key (column name) and values. Keys, which do not correspond to an existing column names are ignored.
-
update(data: dict, where_clause):
Update an existing row in the table
Parameters: - data (dict) – a dictionary of key (column name) and values. Keys, which do not correspond to an existing column names are ignored.
- where_clause (None,str,dict) –
filter for column selection. Valid formats for the where clause are:
None
- updates all columns.
str
- raw SQL WHERE clause (the keyword WHERE may be omitted).
dict
- keys are column names. Non exisiting column names are ignored. Multiple columns are combined
with the AND statement. The value may be:
- a value (results in COLUMN_NAME = VALUE)
- a string with an operator and value, e.g.,
LIKE 'ABC%'
- a tuple (operator,value), e.g.,
('>=', 7)
-
delete(where_clause):
Delete rows in the table
Parameters: where_clause (None,str,dict) – filter for the columns to delete. Formats are identical to the update
statement.
-
commit
()¶ Processes previous insert/update/delete statements depending on the Operational modes of the instance.
- UPLOAD_MODE_COMMIT
- sends a commit statement to the database
- UPLOAD_MODE_ROLLBACK
- sends a rollback statement to the database
- UPLOAD_MODE_DRYRUN
- does nothing
- UPLOAD_MODE_PIPE
- work in progress
Warning
This mode is not compatible with a database connection in auto-commit mode. It will also fail if the user sends commit commands independently.
-
add_counter(columns: (str, list, set, tuple)):
Mark columns as counters. Assumes the column type is a number. Queries the maximum number of each column and then adds the next value (+1) in the column on each insert.
Parameters: columns (str,list,set,tuple) – names of the columns to add. May be a (comma-separated) string, or a list type.
-
class
ParameterUploader
(self, jdbc: Jdbc, table: str, fstream=None, commit_mode=UPLOAD_MODE_DRYRUN, exit_on_fail=True)¶ Upload data into a table using parameterized SQL commands. See the section NativeUploader for details on the command line arguments.
-
insert(data: dict):
Insert into the table, see the NativeUploader for details.
-
update(data: dict, where_clause):
Update an existing row in the table, see the NativeUploader for details.
-
delete(where_clause):
Delete existing rows from the table, see the NativeUploader for details.
-
commit
()¶ Processes previous insert/update/delete statements depending on the Operational modes of the instance. See the NativeUploader for details
-
add_counter(columns: (str, list, set, tuple)):
Mark columns as counters. Assumes the column type is a number. Queries the maximum number of each column and then adds the next value (+1) in the column on each insert. See the NativeUploader for details
-
-
class
MultiParameterUploader
(jdbc: Jdbc, table: str, fstream=None, commit_mode=UPLOAD_MODE_DRYRUN, exit_on_fail=True)¶ Upload data into a table using the jdbc executemany parameterized command.
-
insert(data: dict):
Insert into the table, see the NativeUploader for details.
-
commit
()¶ Processes previous insert/update/delete statements depending on the Operational modes of the instance. See the NativeUploader for details
-
add_counter(columns: (str, list, set, tuple)):
Mark columns as counters. Assumes the column type is a number. Queries the maximum number of each column and then adds the next value (+1) in the column on each insert. See the NativeUploader for details
-