Building Workbench

The org.bitfunnel.workbench package provides tools for converting Wikipedia database dump files into BitFunnel corpus files. Please see README.md for more information on using Workbench to process Wikipedia database dump files.

Building org.bitfunnel.workbench

Java development requires a JDK (we used jdk1.8.0_92). Our package is built with Maven (version 3.3.9). The unit tests are based on JUnit.
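Before building, it can help to confirm that the JDK and Maven are actually on the PATH. The following is a minimal sketch and not part of the project: `check_tool` is a hypothetical helper name, and the tool list (java, javac, mvn) is inferred from the requirements above. JUnit is not checked because Maven downloads it as a dependency.

```shell
#!/bin/sh
# Report whether a single command is available on the PATH.
# check_tool is a hypothetical helper, not something the project ships.
check_tool() {
    if command -v "$1" >/dev/null 2>&1; then
        echo "$1: found"
    else
        echo "$1: missing"
        return 1
    fi
}

# java and javac come from the JDK; mvn comes from Maven.
missing=0
for tool in java javac mvn; do
    check_tool "$tool" || missing=1
done

[ "$missing" -eq 0 ] || echo "install the missing tools before running mvn package"
```

Once all three are found, `java -version` and `mvn -version` print the installed versions for comparison with the ones listed above.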

OSX Configuration and Build

Install a JDK. We used Oracle's Java SE 8u92 which can be found on their downloads page.

If you don't want to manually install the exact version we used, you can take your chances with the version of Java you'll get with homebrew. The following commands worked on 8/9/2016 with 10.11.5 (El Capitan), but there's no guarantee that the build will continue to work with future updates:

brew update
brew cask install java

Use homebrew to install Maven:

% brew install maven

Build org.bitfunnel.workbench from the command line:

% mvn package
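The build step can be wrapped in a small script that fails early when it is run from the wrong directory. This is only a sketch: `build_workbench` is a hypothetical name, and the final `ls` assumes the project uses jar packaging (Maven's default), which this document does not confirm.

```shell
#!/bin/sh
# Build helper: run from the directory that contains pom.xml.
# build_workbench is a hypothetical name, not part of the project.
build_workbench() {
    if [ ! -f pom.xml ]; then
        echo "no pom.xml in $(pwd); run this from the project root" >&2
        return 1
    fi
    # The package phase comes after the test phase in Maven's default
    # lifecycle, so this also runs the JUnit tests.
    mvn package || return 1
    # Maven writes build output to target/ by default.
    # Assumption: the packaging type is jar.
    ls target/*.jar
}
```

Call `build_workbench` from the project root. To skip the unit tests during a quick rebuild, `mvn package -DskipTests` is the usual Maven option.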

Windows Configuration and Build

Install a JDK. We used Oracle's Java SE 8u92 which can be found on their downloads page.

Install Maven.

  1. Download the Maven .zip file.

  2. Extract to some location on the machine.

  3. Add the extracted folder's bin directory to the PATH.

  4. Open the System Control panel by pressing (Windows + Pause).

  5. Choose Advanced System Settings on the left.

  6. Click Environment Variables at the bottom of the dialog.

  7. Select the variable called PATH and press Edit...

  8. Add a semicolon (;) to the PATH and then the path to the extracted bin folder.

  9. OK out of all of the dialogs.

  10. Close and reopen any cmd.exe windows to get the new PATH.

  11. Tip: you can also update the PATH in an open cmd.exe window. For example,

    set PATH=%PATH%;C:\Program Files\apache-maven-3.3.9\bin
    

    This change takes effect only in the current window and lasts only until that window is closed.

  12. In a similar manner, set the JAVA_HOME to point to your JDK. For example,

    set JAVA_HOME=C:\Program Files\Java\jdk1.8.0_92
    

Build org.bitfunnel.workbench from the command line:

% mvn package

Linux Configuration and Build

sudo apt install openjdk-8-jdk python maven
mvn package
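Since the package was developed against jdk1.8.0_92, it may be worth checking which major Java version the system resolves to after installation. The sketch below is not part of the project, and `java_major` is a hypothetical helper. It relies on the fact that Java 8 and earlier report versions as `1.N.x` while later releases report `N.x.y`, and that `java -version` writes to stderr.

```shell
#!/bin/sh
# Map a Java version string to its major version number.
# Java 8 and earlier use "1.N.x" (for example 1.8.0_92);
# Java 9 and later use "N.x.y" (for example 11.0.2).
# java_major is a hypothetical helper name.
java_major() {
    case "$1" in
        1.*) echo "$1" | cut -d. -f2 ;;
        *)   echo "$1" | cut -d. -f1 ;;
    esac
}

# "java -version" prints to stderr; the version is the quoted string
# on the line that contains: version "..."
if command -v java >/dev/null 2>&1; then
    installed=$(java -version 2>&1 | grep 'version "' | head -n 1 | cut -d'"' -f2)
    echo "java major version: $(java_major "$installed")"
fi
```

A result other than 8 does not necessarily mean the build will fail, only that it differs from the configuration described here.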

IntelliJ Configuration and Build

IntelliJ Community Edition is a fairly lightweight, free IDE for Java. It has a good debugger, support for Ant, Maven, Gradle, and JUnit, and it provides a number of nice code browsing and refactoring features. IntelliJ is available on Linux, OSX, and Windows.

Start IntelliJ. From the welcome screen, select Open.

Select pom.xml and press OK.

The project will be imported. Now set up the debug and run configurations by clicking on Run => Edit Configurations ...

Then click the green + in the upper left corner to add a new configuration:

Select Application

On the configuration tab, choose a Name for the configuration, set the Main class field to org.bitfunnel.workbench.MakeCorpusFile, and set the Program arguments to reference your input and output directories. OK out of all of the dialogs.
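The same main class can also be launched from a terminal, which is handy for checking the program arguments outside the IDE. The sketch below is an assumption-laden alternative, not the project's documented workflow. It uses the `exec:java` goal of the Exec Maven Plugin, which Maven resolves by prefix even if pom.xml does not configure it. `run_make_corpus` is a hypothetical name, and the argument order (input directory, then output directory) is assumed from the description above; see README.md for the actual arguments.

```shell
#!/bin/sh
# Run MakeCorpusFile through Maven instead of an IDE run configuration.
# run_make_corpus is a hypothetical helper; the argument order
# (input directory, then output directory) is an assumption.
run_make_corpus() {
    if [ "$#" -ne 2 ]; then
        echo "usage: run_make_corpus <input-dir> <output-dir>" >&2
        return 2
    fi
    # exec.args is split on whitespace, so paths containing spaces
    # will not survive this form.
    mvn compile exec:java \
        -Dexec.mainClass=org.bitfunnel.workbench.MakeCorpusFile \
        -Dexec.args="$1 $2"
}
```

For example, `run_make_corpus /path/to/input /path/to/output`, where both paths are placeholders.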

If you plan to edit the pom.xml file, say to add additional dependencies, it helps to configure auto import. To do this, go to File => Settings ...

Expand the tree on the left to Build, Execution, Deployment/Build Tools/Maven/Importing. Select Import Maven projects automatically. OK out of the dialog.

You are now good to go! Use Build => Make Project to build and Run => Run 'MakeCorpusFile' to run.

Please see README.md for more information on using Workbench to process Wikipedia database dump files.