How to use Apache Pig?

Updated on July 3, 2016

Pig Latin

Pig Commands

LOAD ---> Specifies the file(s) to load into a Pig relation

PigStorage ---> Specifies the field delimiter used in the loaded file (the default is tab)

dump cust; ---> Prints the contents of the relation cust to the console

filter ---> Works like the WHERE clause in SQL: keeps only the rows that satisfy a given condition

Foreach ... generate ...; ---> Selects (projects) columns from each row

Stream ... Through `cut -f 1,2,4`; ---> Pipes the rows of a relation through an external Unix command


Example 1 :

Sample input

custs

4000017,Neal,Lawrence,72,Computer support specialist
4000018,Jean,Griffin,45,Childcare worker
4000019,Kristine,Dougherty,63,Financial analyst
4000020,Crystal,Powers,67,Engineering technician
4000021,Alex,May,39,Environmental scientist
4000022,Eric,Steele,66,Doctor

cust = load '/pig/custs' using PigStorage(',') AS (custid:chararray, firstname:chararray, lastname:chararray, age:long, profession:chararray);

filter_1 = filter cust by (age >= 40);

Sample output of dump filter_1;

(4009986,Jesse,Smith,57,Designer)
(4009991,Paul,Mullins,47,Reporter)
(4009993,Becky,Wolfe,67,Musician)
(4009994,Clyde,Welch,40,Photographer)
(4009996,Tonya,McIntosh,56,Engineering technician)
(4009998,Tracey,Bullock,60,Compute
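To make the LOAD and FILTER steps concrete, here is a rough Python sketch of what they do to the six sample input rows shown earlier. It is only an illustration of the semantics, not how Pig runs the job; the function names load_custs and filter_by_min_age are made up for this sketch.

```python
# Rough Python mirror of:
#   cust = load ... using PigStorage(',') AS (...);
#   filter_1 = filter cust by (age >= 40);
# The rows are the six sample input lines from the article.
RAW_LINES = [
    "4000017,Neal,Lawrence,72,Computer support specialist",
    "4000018,Jean,Griffin,45,Childcare worker",
    "4000019,Kristine,Dougherty,63,Financial analyst",
    "4000020,Crystal,Powers,67,Engineering technician",
    "4000021,Alex,May,39,Environmental scientist",
    "4000022,Eric,Steele,66,Doctor",
]


def load_custs(lines, delimiter=","):
    """Split each line on the delimiter and cast age to int, like the AS (...) schema."""
    rows = []
    for line in lines:
        custid, firstname, lastname, age, profession = line.split(delimiter)
        rows.append((custid, firstname, lastname, int(age), profession))
    return rows


def filter_by_min_age(rows, min_age=40):
    """Keep only the rows whose age (4th column) is at least min_age."""
    return [row for row in rows if row[3] >= min_age]


cust = load_custs(RAW_LINES)
filter_1 = filter_by_min_age(cust)
for row in filter_1:
    print(row)
```

On these six rows only Alex May (age 39) is dropped.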

NameAge = foreach cust generate firstname, age;

Sample output of dump NameAge;

(Paul,47)
(Erin,33)
(Becky,67)
(Clyde,40)
(Rebecca,37)
(Tonya,56)
(Ron,36)
(Tracey,60)
(Ray,64)
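The FOREACH ... GENERATE statement above is a column projection. A minimal Python sketch of the same idea, using the six sample input rows from Example 1 (the helper name generate_name_age is hypothetical):

```python
# Rough Python mirror of: NameAge = foreach cust generate firstname, age;
# Each tuple follows the schema (custid, firstname, lastname, age, profession).
cust = [
    ("4000017", "Neal", "Lawrence", 72, "Computer support specialist"),
    ("4000018", "Jean", "Griffin", 45, "Childcare worker"),
    ("4000019", "Kristine", "Dougherty", 63, "Financial analyst"),
    ("4000020", "Crystal", "Powers", 67, "Engineering technician"),
    ("4000021", "Alex", "May", 39, "Environmental scientist"),
    ("4000022", "Eric", "Steele", 66, "Doctor"),
]


def generate_name_age(rows):
    """Keep only the firstname and age columns of every row."""
    return [
        (firstname, age)
        for (_custid, firstname, _lastname, age, _profession) in rows
    ]


name_age = generate_name_age(cust)
for row in name_age:
    print(row)
```

Unlike filter, this keeps every row and only reduces the number of columns.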

Example 2 :

Analyse the given datasets and print the names of the students who have passed the exam.

Sample input

-> results

1 fail
2 fail
3 pass
4 pass
5 fail
6 pass
7 fail
8 pass

-> student

vineet 1
hisham 2
raj 3
ajeet 4
sujit 5
ramesh 6
priya 7

PIG SCRIPT FILE ====>  Student_results.pig

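-- Load both files (no delimiter is given, so PigStorage's default, tab, is used)
S = load '/pig/student' as (Name:chararray, Rollnumber:int);
R = load '/pig/results' as (Rollnumber:int, Result:chararray);
-- Inner join on the roll number
R1 = Join S by Rollnumber, R by Rollnumber;
-- Keep only the name and the result
R2 = Foreach R1 Generate Name, Result;
-- Drop the students who failed
R3 = Filter R2 by (Result != 'fail');
-- Write the result with '-' as the field delimiter, then print it
Store R3 into '/pig_outputs/Final_students_results' using PigStorage('-');
Dump R3;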

======> Run the above Pig script from the Unix prompt as below

pig Student_results.pig
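For readers who want to trace the script step by step, here is a rough Python sketch of the same JOIN, FOREACH and FILTER logic applied to the sample student and results data above. It assumes each roll number appears at most once in results, and the helper name inner_join is made up for this sketch. Pig does not guarantee the order of the joined rows, so only the set of rows is comparable.

```python
# Sample data from the article: (Name, Rollnumber) and (Rollnumber, Result).
students = [
    ("vineet", 1), ("hisham", 2), ("raj", 3), ("ajeet", 4),
    ("sujit", 5), ("ramesh", 6), ("priya", 7),
]
results = [
    (1, "fail"), (2, "fail"), (3, "pass"), (4, "pass"),
    (5, "fail"), (6, "pass"), (7, "fail"), (8, "pass"),
]


def inner_join(left, right):
    """Inner join on roll number; rows without a match on both sides are dropped."""
    result_by_roll = dict(right)  # assumes unique roll numbers in `right`
    joined = []
    for name, roll in left:
        if roll in result_by_roll:
            # Like Pig, keep the join key from both inputs: 4 columns in total.
            joined.append((name, roll, roll, result_by_roll[roll]))
    return joined


r1 = inner_join(students, results)                                 # Join
r2 = [(name, result) for (name, _s_roll, _r_roll, result) in r1]   # Foreach ... Generate
r3 = [row for row in r2 if row[1] != "fail"]                       # Filter
stored_lines = ["-".join(row) for row in r3]                       # PigStorage('-')

for line in stored_lines:
    print(line)
```

Roll number 8 has a result but no matching student in the sample input, so the inner join drops it.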


---------------------------or------------------------

PIG SCRIPT FILE ====>  Student_results_1.pig

S = load '/pig/student' as (Name:chararray, Rollnumber:int);
R = load '/pig/results' as (Rollnumber:int, Result:chararray);
R1 = Join S by Rollnumber, R by Rollnumber;
R2 = Filter R1 by (Result != 'fail');
-- R2 has four columns (Name, Rollnumber, Rollnumber, Result); cut keeps columns 1, 2 and 4
R3 = Stream R2 Through `cut -f 1,2,4`;
Store R3 into '/pig_outputs/Final_students_results_2' using PigStorage('-');
Dump R3;

Run the above Pig script from the Unix prompt as below

pig Student_results_1.pig

Sample output

raj-3-pass
ajeet-4-pass
ramesh-6-pass
priyanka-8-pass
suresh-9-pass
ritesh-10-pass
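The STREAM step in this script hands each joined row to cut as a tab-delimited line (tab being Pig's default for streaming). The Python sketch below imitates what `cut -f 1,2,4` does to such lines and how PigStorage('-') then writes them. The function cut_fields is a hypothetical stand-in for the real cut command, and the three input rows are built from the sample student and results data.

```python
def cut_fields(line, fields, delimiter="\t"):
    """Imitate `cut -f`: keep the listed 1-based fields, in their original order."""
    parts = line.split(delimiter)
    wanted = sorted(set(fields))
    return delimiter.join(parts[i - 1] for i in wanted if i <= len(parts))


# Joined and filtered rows: Name, Rollnumber (from S), Rollnumber (from R), Result.
joined_rows = [
    "raj\t3\t3\tpass",
    "ajeet\t4\t4\tpass",
    "ramesh\t6\t6\tpass",
]

# Stream ... Through `cut -f 1,2,4` drops the duplicated roll number (field 3).
streamed = [cut_fields(row, [1, 2, 4]) for row in joined_rows]

# Store ... using PigStorage('-') writes the fields separated by '-'.
stored = [row.replace("\t", "-") for row in streamed]

for line in stored:
    print(line)
```

The duplicated roll number exists because the join keeps the key column from both inputs, which is why field 3 is the one left out.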

To load a CSV file, register the piggybank jar and use its CSVLoader:

register /home/hadoop/pig-0.10.1/contrib/piggybank/java/piggybank.jar;
A = LOAD '/home/edureka/Desktop/f1.csv' USING org.apache.pig.piggybank.storage.CSVLoader();
Dump A;
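The article does not say why CSVLoader is used here instead of PigStorage(','). A likely reason is quoted fields that contain commas, which a plain split on ',' breaks apart. The Python sketch below shows the difference using the standard csv module; the sample line is made up and is not taken from f1.csv.

```python
import csv
import io

# Hypothetical CSV line with a quoted field that contains a comma.
line = '1,"Smith, John",42'

# Naive split on ',' (roughly what a plain delimiter-based load would do).
naive = line.split(",")

# CSV-aware parsing keeps the quoted field together.
parsed = next(csv.reader(io.StringIO(line)))

print(naive)
print(parsed)
```

The naive split yields four pieces while the CSV-aware parse yields three fields, which is the kind of mismatch a CSV-aware loader is meant to avoid.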
