Importing Data into the Database

Using Aginity Workbench, you can import data from local files into your Hadoop database, choosing whether to load the imported data into an existing table or create a table for the data.

Workbench supports import of source files in the following formats: Excel, CSV, text, fixed column widths, JSON, and XML.

Files in all the supported formats can be imported directly from an archive (zip, gzip, bzip2) without creating an intermediate disk file, which could otherwise be quite large. Multiple CSV files can be imported at a time.
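The streaming approach described above can be sketched in a few lines of Python: `gzip.open` decompresses the archive as a stream, so the uncompressed file never has to exist on disk. This is an illustration of the technique, not Workbench's actual implementation; the file name is made up for the demo.

```python
import csv
import gzip
import tempfile

def rows_from_gzip_csv(path, encoding="utf-8", delimiter=","):
    """Yield CSV rows directly from a gzip archive.

    The archive is decompressed as a stream, so no intermediate
    (and possibly very large) uncompressed file is written to disk.
    """
    with gzip.open(path, mode="rt", encoding=encoding, newline="") as handle:
        yield from csv.reader(handle, delimiter=delimiter)

# Demo: write a small gzipped CSV to a temporary file, then stream it back.
with tempfile.NamedTemporaryFile(suffix=".csv.gz", delete=False) as tmp:
    archive = tmp.name
with gzip.open(archive, mode="wt", encoding="utf-8", newline="") as out:
    csv.writer(out).writerows([["id", "name"], ["1", "alpha"], ["2", "beta"]])

rows = list(rows_from_gzip_csv(archive))
```

The same pattern works for bzip2 archives with `bz2.open`; zip archives need `zipfile.ZipFile` plus a text wrapper around the member stream.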

Import Data into a Destination Database

You can import data from a local source file into a new or existing Hadoop table.

To import data from a local file into a Hadoop table:

  1. On the Tools menu, point to Import and then click the command for the source file format (for example, from Excel).
  2. If you are importing an Excel file, browse to and select the file, and then select the worksheet to import.

    The Data Import Wizard opens.

    If you are importing from an Excel file and have used the Hive method to connect to your Hadoop cluster, note that Hive doesn't support bulk-loading of data from a client computer to a Hive table. The data is therefore first written to an intermediate .csv file and then imported from that file into the destination table. Specify an HDFS location for the intermediate file in the Location box on the first page of the Data Import Wizard.

  3. If you are importing a JSON/XML file, select the nodes that will be turned into a flattened table during import.
  4. Select a file encoding and a field delimiter (for a CSV or text file), or a file encoding and column widths (for a fixed column widths file).
  5. Specify the range of data to import by setting the Start Column, Start Row, and End Row. If you don't want to import all the rows, clear the Import all rows check box and specify the End Row.

    During import of a CSV or text file, Workbench analyzes the data in the file and makes any necessary changes: it adjusts the length of some or all of the columns to the maximum length found in the column's data, or changes the data type of columns. By default, Workbench analyzes a maximum of 1,000 rows. If the file you are importing contains more than 1,000 rows, the Data Import Wizard displays a message that only 1,000 rows were analyzed. You can select the Analyze more option later in the wizard to have Workbench analyze all the rows in the file.

  6. Specify whether the first row in the source file contains the column names to be used in the destination table. Ensure that the Include Columns box lists only the columns that you want to import. Use the < button to remove columns from the Include Columns box and the > button to add them.
    • You can specify the order of columns in the target table by using the Up and Down buttons in the Include Columns box.
    • The column that is currently selected in the Include Columns box is highlighted in the Data Preview box.
  7. Specify data types, precision, length, and scale for the columns. Leave the default values, which Aginity Workbench has determined to be valid for the source data, or change them as needed. Select the check box in the Null column if the column can contain null values.

    A negative value of Scale means that scale is not applicable to the column.

  8. Select a destination database and a schema in the database, and type a name for the Hadoop table that will hold the imported data. Under Existing Table Action, select the action to take if the specified table already exists in the destination database. Select Ignore empty lines (all NULLS) if you don't want to import empty lines or rows from the source file.
  9. On the final page of the wizard, specify the remaining import settings.
  10. Click Finish.

    The Job Monitor displays the progress of a job for the file import. When the import is complete, the Status description column displays a corresponding status.

    • If you selected the Do Not Import action under Existing Table Action and the specified destination table already exists in the database, the Job Monitor displays an error message.
    • If you have selected the Analyze option to generate statistics on the table and/or selected columns from the table, and generating or executing an ANALYZE script for the table fails for some reason, the Job Monitor displays a corresponding status.

    If you chose to import the data into a new table, the table is displayed within the specified schema node in the Object Browser.

    You may need to refresh the Object Browser for the table to be displayed.
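The Hive note in step 2 mirrors Hive's standard two-step load: stage the file in HDFS, then attach it to a table with a LOAD DATA INPATH statement. Below is a minimal sketch that only composes the statement; the path and table names are illustrative, and this is not Workbench's actual code.

```python
def hive_load_statement(hdfs_path, table, overwrite=False):
    """Build the HiveQL that loads a staged HDFS file into a table.

    LOAD DATA INPATH moves the file from its staging location into the
    table's storage; OVERWRITE replaces the table's existing data.
    """
    action = "OVERWRITE INTO" if overwrite else "INTO"
    return f"LOAD DATA INPATH '{hdfs_path}' {action} TABLE {table}"

# Hypothetical staging path and table name for illustration.
stmt = hive_load_statement("/tmp/import/sales.csv", "staging.sales")
```

In practice the first step (copying the local intermediate .csv into HDFS) would be done with a tool such as `hdfs dfs -put` before the statement is executed.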
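Step 3's flattening of selected JSON/XML nodes into table columns can be pictured with a small sketch: nested keys become dot-separated column names. The naming scheme here is an assumption for illustration, not necessarily what Workbench produces.

```python
def flatten(record, prefix=""):
    """Flatten a nested JSON-style dict into one row of column/value pairs.

    Nested keys are joined with '.', so {"a": {"b": 1}} becomes
    the single column "a.b" with value 1.
    """
    row = {}
    for key, value in record.items():
        name = f"{prefix}.{key}" if prefix else key
        if isinstance(value, dict):
            row.update(flatten(value, name))
        else:
            row[name] = value
    return row

# Demo: one nested record becomes one flat row.
flat = flatten({"id": 1, "customer": {"name": "Ada", "city": "Oslo"}})
```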
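Step 5 notes that Workbench infers column types and lengths from a sample of up to 1,000 rows. A rough Python illustration of that kind of sample-based inference follows; the type names and fallback order are assumptions for the sketch, not Workbench's actual algorithm.

```python
def infer_column(values, sample_size=1000):
    """Guess a type name and maximum length for one column from a sample.

    Only the first `sample_size` values are examined, so a wider or
    differently typed value later in the file could be missed -- the
    reason the wizard offers an "analyze more rows" option.
    """
    sample = values[:sample_size]
    max_len = max((len(v) for v in sample), default=0)

    def all_parse(cast):
        try:
            for v in sample:
                cast(v)
            return True
        except ValueError:
            return False

    if all_parse(int):
        return ("BIGINT", max_len)
    if all_parse(float):
        return ("DOUBLE", max_len)
    return ("STRING", max_len)
```

For example, a column of `["1", "2", "3"]` is inferred as an integer type, while `["1.5", "20"]` falls back to a floating-point type and anything non-numeric becomes a string.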

Aginity Workbench for Hadoop * Topic last updated August 30, 2018