Compressed Table

This is a solution to handle tables from extraction ofcsv or Excel file. It provides a compression feature, also you can choose it un-compressed. The source data could come from a CSV file or Excel spread sheet.

Each CompressedTable represents a data set extracted from either a CSV or Excel file.

A handy tool named Bold/CompressedComparator can compare two tables to figure out

which records are missed in which table
which columns/headers are missed in which table
which records are mismatched
which records are matched
which mismatched record has how many fields mismatched
which column/header has how many records mismatched

To Use compressed-table

Maven

<dependency>
  <groupId>net.zousys</groupId>
  <artifactId>compressed-table</artifactId>
  <version>1.0.1</version>
</dependency>

Gradle:

implementation 'net.zousys:compressed-table:1.0.1'

Concept

Headers

When the data is parsed, the first row (after ignored rows before it) is treated as headers. And the columns referenced by this header.

Compressed

The Data carried by Table can be compressed to reduce the size of execution profile. Also it can be disabled by setting

CompressedTableFactory
        .build("csv")
        .compressed(false)
        .build();

This is useful in handling large data set.

Table Type

CSV: the source type of CSV file to parse
Excel: the source type of Excel file to parse

Key Sets

When a Table is parsed, the key set need to be set to Table thus table knows how to index the content and later those data can be queried. Each KeySet contains one or multiple columns which are represended by headers. To understand multiple Key Sets, you can imagine the name card, one person(Row in the case) could have multiple name cards, one is business card the other may be a social name card or family card, the name or identification on the card could be different but the purpose is to reveal to link it to the real human identity. Here, in Table, a Row could have multiple Keys to represent itself.

One Bold/KeySet contains:

One or Multiple headers or Columns

You could set one KeySet or mulitple KeySet:

SINGLE_KEY (one KeySet) In this mode, the key set is called Bold/Main Key
MULTI_KEYS (multiple KeySets) In this mode, A key set is automatically generated called Bold/Native Key, it's unique. Also one or a serial KeySets are attached in you favior. The Bold/KeyHeadersList represents the KeySets.

Delimeter

This is for CSV parsing, a delimeter can be specified if it is other than comma

Comparison

The Compressed Comparator is to compare two Tables to find out the disparities.

ComparatorListener

This is a call back interface usd by comparator to let outside world known what's happened during the process.

CompressedComparatorFactory

This is the factory to initialize the Comparator you prefer.

before() and after() to specify two tables
ignoredFields() to set the Fields/Columns you don't want to compare thus fields won't appear in result
comparatorListener() to set the call back listener
trim() is to let the process trim the data fields or not
strictMissed() this is the switch in case the columns are missed in before or after, and it is to enable the comparison on this column. If it is true, then any rows in the tables will be mismatch if there are any missed columns in table.

Examples

Parse a table with single key set

In this case, the key is composition keys of two columns of "Customer Id" and "First Name":

CompressedTable beforetable = CompressedTableFactory
        .build("csv")
        .keyHeaderList(
                new KeyHeadersList()
                        .addHeaders(new String[]{"Customer Id", "First Name"})
        )
        .compressed(false)  // the content is not compressed
        .ignoredLines(0)    // ignore the first number of lines
        .delimeter(',')
        .parse(Paths.get("customers-1000b.csv")
                .toAbsolutePath()
                .toString());

Parse a table with multiple key sets

In this case, there are two key sets used to identify a row by one set of {"Customer Id", "First Name"}, the other key set is {"Phone 1", "Phone 2", "Email"}

CompressedTable beforetable = CompressedTableFactory
        .build("csv")
        .keyHeaderList(new KeyHeadersList()
                    .addHeaders(new String[]{"First Name", "Last Name"})
                    .addHeaders(new String[]{"Phone 1", "Phone 2", "Email"})
                )
                .ignoredLines(0)
                .delimeter(',')
                .parse(Paths.get("customers-1000b.csv")
                        .toAbsolutePath()
                        .toString());

Compare two tables

        CompressedComparatorFactory.builder()
                .before(beforetable)
                .after(aftertable)
                .comparatorListener(listener)
                .strictMissed(false)
                .ignoredFields(new HashSet(Arrays.asList(new String[]{})))
                .build().create()
                .compare();

Executable examples

There are couple executable examples in the Test:

samples.ParseSingleKeySet.java
samples.ParseMultipleKeySets.java
samples.CompareTwoTables.java
samples.ParseAExcelFile.java

Bucket Comparator

This is fast comparator to compare csv or text delimetered file. It can consume and compare 2 million records within 10 to 20 seconds (depends on columns number)

To use it very simple:

import net.zousys.bucketcomp.Comparator;

public class BucketComparatorSample {
    public static void main(String[] args) throws Exception {
        CompListener listener = new CompListener();
        System.out.println( Math.abs("823949076hruf398hukey".hashCode()) % 2);
        new Comparator("path to/ucompconf.yml",
                "path to before csv",
                "path to after csv", listener).compare();

    }
}

A comparator configuration need to specified as:

bucketNumber: 2   # the bucket numbers
delimeter: ','
#quote: "
#escape: //
skipEmptyLines: true
headerLine: 1     # the line no of headers, this is for the csv has leading lines before header
bucket: bucket    # bucket directory path
keys:             # supports multiple keys
  - Customer Id
ignoredHeaders:   # ignored column header names
  - City

Please also refer to the comparator listener in test:

@Log4j2
public class CompListener implements ComparatorListener {

    @Override
    public void handleMissedInBefore(String key) {
        log.error("Missed in before key: " + key);
    }

    @Override
    public void handleMissedInAfter(String key) {
        log.error("Missed in after key: " + key);
    }

    @Override
    public void handleMissedBeforeHeader(List<String> headers) {

    }

    @Override
    public void handleMissedAfterHeader(List<String> headers) {

    }

    @Override
    public void handleMisMatched(String key, ComparedRow comparedRow) {
        log.error("Mismatched by key: " + key + " - " + comparedRow);
    }

    @Override
    public void handleMatched(String key, ComparedRow comparedRow) {
    }

    @Override
    public void updateUnitedHeaders(List<String> unitedHeaders) {

    }

    @Override
    public void handleUnitedHeaderMapping(Map<String, Integer> unitedHeadermapping) {

    }

    @Override
    public void handleMarkers(Map<String, Integer> markers) {

    }

    @Override
    public void handleNotice(String key) {

    }

    @Override
    public void appendInformation() {

    }
}

Name		Name	Last commit message	Last commit date
Latest commit History 119 Commits
.github/workflows		.github/workflows
src		src
.gitignore		.gitignore
LICENSE		LICENSE
README.md		README.md
build.gradle		build.gradle
settings.gradle		settings.gradle

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

Compressed Table

To Use compressed-table

Concept

Headers

Compressed

Table Type

Key Sets

Delimeter

Comparison

ComparatorListener

CompressedComparatorFactory

Examples

Parse a table with single key set

Parse a table with multiple key sets

Compare two tables

Executable examples

Bucket Comparator

About

Uh oh!

Releases 16

Packages

Uh oh!

Uh oh!

Contributors

Uh oh!

Languages

Folders and files

Latest commit

History

Repository files navigation

Compressed Table

To Use compressed-table

Concept

Headers

Compressed

Table Type

Key Sets

Delimeter

Comparison

ComparatorListener

CompressedComparatorFactory

Examples

Parse a table with single key set

Parse a table with multiple key sets

Compare two tables

Executable examples

Bucket Comparator

About

Topics

Resources

License

Uh oh!

Stars

Watchers

Forks

Releases 16

Packages 0

Uh oh!

Uh oh!

Contributors

Uh oh!

Languages

Packages