Using Splunk Archive Bucket Reader with Pig
This is part II in a series of posts about how to use the Splunk Archive Bucket Reader. For information about installing the app and using it to obtain jar files, please see the first post in this series.
In this post I want to show how to use Pig to read archived Splunk data. Unlike Hive, Pig cannot be configured directly to use an arbitrary InputFormat class. However, Pig provides a Java extension point, the LoadFunc abstract class, that makes it reasonably easy to use any InputFormat with a small amount of Java code. Splunk Archive Bucket Reader ships with a LoadFunc implementation: com.splunk.journal.hive.JournalLoadFunc. If you would prefer to write your own, see the Apache Pig documentation on writing load functions.
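The following is a minimal, self-contained Java sketch of the adapter pattern a LoadFunc implements: wrap an InputFormat's record reader and convert each raw record into a tuple. It deliberately avoids the real Pig and Hadoop classes so that it runs with only the JDK. `SimpleRecordReader`, `SimpleInputFormat`, `SimpleLoadFunc`, `InMemoryInputFormat`, `TabSeparatedLoader`, `loadAll`, and the tab-separated sample records are all hypothetical stand-ins, and this is not the code of JournalLoadFunc. The real `org.apache.pig.LoadFunc` has a similar shape: Pig calls `setLocation`, obtains the InputFormat from `getInputFormat()`, passes a RecordReader to `prepareToRead`, and then calls `getNext()` until it returns null.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;

// Hypothetical, simplified model of the LoadFunc/InputFormat adapter pattern.
// None of these types are the real org.apache.pig or org.apache.hadoop classes.
public class LoadFuncSketch {

    /** Stand-in for a Hadoop RecordReader: advances through raw records one at a time. */
    interface SimpleRecordReader<V> {
        boolean nextKeyValue();
        V getCurrentValue();
    }

    /** Stand-in for a Hadoop InputFormat: opens a reader over some input location. */
    interface SimpleInputFormat<V> {
        SimpleRecordReader<V> createRecordReader(String location);
    }

    /** Stand-in for Pig's LoadFunc: wraps an InputFormat and turns its records into tuples. */
    abstract static class SimpleLoadFunc<V> {
        private SimpleRecordReader<V> reader;

        /** Real Pig analogue: LoadFunc.getInputFormat(). */
        abstract SimpleInputFormat<V> getInputFormat();

        /** Converts one raw record into a tuple, modeled here as a List of fields. */
        abstract List<Object> toTuple(V value);

        /** Real Pig analogue: LoadFunc.prepareToRead(RecordReader, PigSplit). */
        void prepareToRead(SimpleRecordReader<V> reader) { this.reader = reader; }

        /** Real Pig analogue: LoadFunc.getNext(), which returns null at end of input. */
        List<Object> getNext() {
            if (!reader.nextKeyValue()) {
                return null;
            }
            return toTuple(reader.getCurrentValue());
        }
    }

    /** Hypothetical InputFormat that serves records from an in-memory list. */
    static class InMemoryInputFormat implements SimpleInputFormat<String> {
        private final List<String> records;

        InMemoryInputFormat(List<String> records) { this.records = records; }

        @Override
        public SimpleRecordReader<String> createRecordReader(String location) {
            final Iterator<String> it = records.iterator();
            return new SimpleRecordReader<String>() {
                private String current;

                @Override
                public boolean nextKeyValue() {
                    if (!it.hasNext()) {
                        return false;
                    }
                    current = it.next();
                    return true;
                }

                @Override
                public String getCurrentValue() { return current; }
            };
        }
    }

    /** Hypothetical loader that splits each tab-separated record into fields. */
    static class TabSeparatedLoader extends SimpleLoadFunc<String> {
        private final SimpleInputFormat<String> format;

        TabSeparatedLoader(SimpleInputFormat<String> format) { this.format = format; }

        @Override
        SimpleInputFormat<String> getInputFormat() { return format; }

        @Override
        List<Object> toTuple(String value) {
            // A split limit of -1 keeps trailing empty fields.
            return new ArrayList<Object>(Arrays.asList((Object[]) value.split("\t", -1)));
        }
    }

    /** Roughly what Pig does with a loader: open a reader, then pull tuples until null. */
    static <V> List<List<Object>> loadAll(SimpleLoadFunc<V> loader, String location) {
        loader.prepareToRead(loader.getInputFormat().createRecordReader(location));
        List<List<Object>> tuples = new ArrayList<>();
        for (List<Object> t = loader.getNext(); t != null; t = loader.getNext()) {
            tuples.add(t);
        }
        return tuples;
    }

    public static void main(String[] args) {
        List<String> records = Arrays.asList("1420070400\thost-a\tlogin ok", "1420070460\thost-b\tlogin failed");
        SimpleLoadFunc<String> loader = new TabSeparatedLoader(new InMemoryInputFormat(records));
        for (List<Object> tuple : loadAll(loader, "example-location")) {
            System.out.println(tuple);
        }
    }
}
```

Running `main` prints each sample record as a list of three string fields. In a real loader, `getNext()` would return an `org.apache.pig.data.Tuple` (for example, one built with `TupleFactory.getInstance().newTuple(...)`), and the loader would be invoked from Pig Latin with `LOAD ... USING`.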
Whereas Hive closely resembles a …
Splunk Archive Bucket Reader and Hive
This year was my first .conf, and it was an amazingly fun experience! During the keynote, we announced a number of new Hunk features, one of which was the Splunk Archive Bucket Reader. This tool lets you read Splunk raw data journal files with any Hadoop application that lets you configure which InputFormat implementation it uses. In particular, if you use Hunk archiving to copy your indexes onto HDFS, you can now query and analyze the archived data from those indexes with whichever Hadoop applications your organization prefers (e.g., Hive, Pig, Spark). This will hopefully be the first in a series of posts showing in detail how to integrate with these systems. This post is …