Java Data Science Cookbook

Chapter 1. Obtaining and Cleaning Data

In this chapter, we will cover the following recipes:

  • Retrieving all file names from hierarchical directories using Java
  • Retrieving all file names from hierarchical directories using Apache Commons IO
  • Reading contents from text files all at once using Java 8
  • Reading contents from text files all at once using Apache Commons IO
  • Extracting PDF text using Apache Tika
  • Cleaning ASCII text files using regular expressions
  • Parsing comma-separated value (CSV) files using Univocity
  • Parsing tab-separated value (TSV) files using Univocity
  • Parsing XML files using JDOM
  • Writing JSON files using JSON.simple
  • Reading JSON files using JSON.simple
  • Extracting web data from a URL using JSoup
  • Extracting web data from a website using Selenium WebDriver
  • Reading table data from a MySQL database
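The first and third recipes rely only on the JDK. The following minimal sketch shows roughly what those two tasks involve. It assumes Java 8 or later and UTF-8 text files. The class name `FileRecipesSketch` and its methods are hypothetical illustrations, not the recipes' actual code.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

// Hypothetical helper class illustrating two JDK-only file tasks.
public class FileRecipesSketch {

    // Recursively walks the directory tree under root and returns the
    // sorted names (not full paths) of every regular file it contains.
    static List<String> listFileNames(Path root) throws IOException {
        // Files.walk returns a lazily populated stream that holds open
        // directory handles, so close it with try-with-resources.
        try (Stream<Path> paths = Files.walk(root)) {
            return paths.filter(Files::isRegularFile)
                        .map(p -> p.getFileName().toString())
                        .sorted()
                        .collect(Collectors.toList());
        }
    }

    // Reads an entire text file into one String in a single call.
    // Assumes UTF-8 encoding; Files.readAllBytes is available since Java 7.
    static String readAll(Path file) throws IOException {
        return new String(Files.readAllBytes(file), StandardCharsets.UTF_8);
    }

    public static void main(String[] args) throws IOException {
        // Defaults to the current directory when no argument is given.
        Path root = Paths.get(args.length > 0 ? args[0] : ".");
        for (String name : listFileNames(root)) {
            System.out.println(name);
        }
    }
}
```

Reading the whole file at once is convenient for small to medium inputs. For very large files, a streaming approach such as `Files.lines` avoids holding the entire contents in memory.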