November 30, 2021
Mencari Data Pada Dataframe

Mencari Data Pada DataFrame

Pada artikel ini akan dibahas mengenai cara mencari data pada Pandas DataFrame. Teknik pencarian yang dibahas yaitu mencari data yang sesuai dan pencarian menggunakan regular expression.

Jupyter Notebook

Memuat data ke Pandas

In [1]:
import pandas as pd

df = pd.read_csv('https://raw.githubusercontent.com/Onestringlab/notebook/main/pokemon_data.csv')

Melihat data awal

In [2]:
df.head()
Out[2]:
# Name Type 1 Type 2 HP Attack Defense Sp. Atk Sp. Def Speed Generation Legendary
0 1 Bulbasaur Grass Poison 45 49 49 65 65 45 1 False
1 2 Ivysaur Grass Poison 60 62 63 80 80 60 1 False
2 3 Venusaur Grass Poison 80 82 83 100 100 80 1 False
3 3 VenusaurMega Venusaur Grass Poison 80 100 123 122 120 80 1 False
4 4 Charmander Fire NaN 39 52 43 60 50 65 1 False

Mencari data menggunakan perintah loc

In [3]:
# Mencari Type 1 = Grass
df.loc[df['Type 1'] == 'Grass'].head()
Out[3]:
# Name Type 1 Type 2 HP Attack Defense Sp. Atk Sp. Def Speed Generation Legendary
0 1 Bulbasaur Grass Poison 45 49 49 65 65 45 1 False
1 2 Ivysaur Grass Poison 60 62 63 80 80 60 1 False
2 3 Venusaur Grass Poison 80 82 83 100 100 80 1 False
3 3 VenusaurMega Venusaur Grass Poison 80 100 123 122 120 80 1 False
48 43 Oddish Grass Poison 45 50 55 75 65 30 1 False
In [4]:
# Mencari Type 1 = Grass dan Type 2 = Poison 
df.loc[(df['Type 1'] == 'Grass') & (df['Type 2'] == 'Poison')].head()
Out[4]:
# Name Type 1 Type 2 HP Attack Defense Sp. Atk Sp. Def Speed Generation Legendary
0 1 Bulbasaur Grass Poison 45 49 49 65 65 45 1 False
1 2 Ivysaur Grass Poison 60 62 63 80 80 60 1 False
2 3 Venusaur Grass Poison 80 82 83 100 100 80 1 False
3 3 VenusaurMega Venusaur Grass Poison 80 100 123 122 120 80 1 False
48 43 Oddish Grass Poison 45 50 55 75 65 30 1 False
In [5]:
# Mencari Type 1 = Grass dan Type 2 = Poison dan HP > 70 
df.loc[(df['Type 1'] == 'Grass') & (df['Type 2'] == 'Poison') & (df['HP'] > 70)].head()
Out[5]:
# Name Type 1 Type 2 HP Attack Defense Sp. Atk Sp. Def Speed Generation Legendary
2 3 Venusaur Grass Poison 80 82 83 100 100 80 1 False
3 3 VenusaurMega Venusaur Grass Poison 80 100 123 122 120 80 1 False
50 45 Vileplume Grass Poison 75 80 85 110 90 50 1 False
77 71 Victreebel Grass Poison 80 105 65 100 70 70 1 False
652 591 Amoonguss Grass Poison 114 85 70 85 80 30 5 False
In [6]:
# Mencari Type 1 = Grass atau Type 2 = Water
df.loc[(df['Type 1'] == 'Grass') | (df['Type 2'] == 'Water')]
Out[6]:
# Name Type 1 Type 2 HP Attack Defense Sp. Atk Sp. Def Speed Generation Legendary
0 1 Bulbasaur Grass Poison 45 49 49 65 65 45 1 False
1 2 Ivysaur Grass Poison 60 62 63 80 80 60 1 False
2 3 Venusaur Grass Poison 80 82 83 100 100 80 1 False
3 3 VenusaurMega Venusaur Grass Poison 80 100 123 122 120 80 1 False
48 43 Oddish Grass Poison 45 50 55 75 65 30 1 False
... ... ... ... ... ... ... ... ... ... ... ... ...
741 673 Gogoat Grass NaN 123 100 62 97 81 68 6 False
758 688 Binacle Rock Water 42 52 67 39 56 50 6 False
759 689 Barbaracle Rock Water 72 105 115 54 86 68 6 False
760 690 Skrelp Poison Water 50 60 60 60 60 30 6 False
799 721 Volcanion Fire Water 80 110 120 130 90 70 6 True

84 rows × 12 columns

In [7]:
# mencari name yang terdapat kata "Mega"
df.loc[df['Name'].str.contains('Mega')].head(20)
Out[7]:
# Name Type 1 Type 2 HP Attack Defense Sp. Atk Sp. Def Speed Generation Legendary
3 3 VenusaurMega Venusaur Grass Poison 80 100 123 122 120 80 1 False
7 6 CharizardMega Charizard X Fire Dragon 78 130 111 130 85 100 1 False
8 6 CharizardMega Charizard Y Fire Flying 78 104 78 159 115 100 1 False
12 9 BlastoiseMega Blastoise Water NaN 79 103 120 135 115 78 1 False
19 15 BeedrillMega Beedrill Bug Poison 65 150 40 15 80 145 1 False
23 18 PidgeotMega Pidgeot Normal Flying 83 80 80 135 80 121 1 False
71 65 AlakazamMega Alakazam Psychic NaN 55 50 65 175 95 150 1 False
87 80 SlowbroMega Slowbro Water Psychic 95 75 180 130 80 30 1 False
102 94 GengarMega Gengar Ghost Poison 60 65 80 170 95 130 1 False
124 115 KangaskhanMega Kangaskhan Normal NaN 105 125 100 60 100 100 1 False
137 127 PinsirMega Pinsir Bug Flying 65 155 120 65 90 105 1 False
141 130 GyaradosMega Gyarados Water Dark 95 155 109 70 130 81 1 False
154 142 AerodactylMega Aerodactyl Rock Flying 80 135 85 70 95 150 1 False
163 150 MewtwoMega Mewtwo X Psychic Fighting 106 190 100 154 100 130 1 True
164 150 MewtwoMega Mewtwo Y Psychic NaN 106 150 70 194 120 140 1 True
168 154 Meganium Grass NaN 80 82 100 83 100 80 2 False
196 181 AmpharosMega Ampharos Electric Dragon 90 95 105 165 110 45 2 False
224 208 SteelixMega Steelix Steel Ground 75 125 230 55 95 30 2 False
229 212 ScizorMega Scizor Bug Steel 70 150 140 65 100 75 2 False
232 214 HeracrossMega Heracross Bug Fighting 80 185 115 40 105 75 2 False
In [8]:
# mencari name yang tidak terdapat kata "Mega"
df.loc[~df['Name'].str.contains('Mega')]
Out[8]:
# Name Type 1 Type 2 HP Attack Defense Sp. Atk Sp. Def Speed Generation Legendary
0 1 Bulbasaur Grass Poison 45 49 49 65 65 45 1 False
1 2 Ivysaur Grass Poison 60 62 63 80 80 60 1 False
2 3 Venusaur Grass Poison 80 82 83 100 100 80 1 False
4 4 Charmander Fire NaN 39 52 43 60 50 65 1 False
5 5 Charmeleon Fire NaN 58 64 58 80 65 80 1 False
... ... ... ... ... ... ... ... ... ... ... ... ...
794 718 Zygarde50% Forme Dragon Ground 108 100 121 81 95 95 6 True
795 719 Diancie Rock Fairy 50 100 150 100 150 50 6 True
797 720 HoopaHoopa Confined Psychic Ghost 80 110 60 150 130 70 6 True
798 720 HoopaHoopa Unbound Psychic Dark 80 160 60 170 130 80 6 True
799 721 Volcanion Fire Water 80 110 120 130 90 70 6 True

751 rows × 12 columns

Pemanfaatan regular expression untuk pencarian

In [9]:
import re

df.loc[df['Type 1'].str.contains('Fire|Grass', regex=True)]
Out[9]:
# Name Type 1 Type 2 HP Attack Defense Sp. Atk Sp. Def Speed Generation Legendary
0 1 Bulbasaur Grass Poison 45 49 49 65 65 45 1 False
1 2 Ivysaur Grass Poison 60 62 63 80 80 60 1 False
2 3 Venusaur Grass Poison 80 82 83 100 100 80 1 False
3 3 VenusaurMega Venusaur Grass Poison 80 100 123 122 120 80 1 False
4 4 Charmander Fire NaN 39 52 43 60 50 65 1 False
... ... ... ... ... ... ... ... ... ... ... ... ...
735 667 Litleo Fire Normal 62 50 58 73 54 72 6 False
736 668 Pyroar Fire Normal 86 68 72 109 66 106 6 False
740 672 Skiddo Grass NaN 66 65 48 62 57 52 6 False
741 673 Gogoat Grass NaN 123 100 62 97 81 68 6 False
799 721 Volcanion Fire Water 80 110 120 130 90 70 6 True

122 rows × 12 columns

In [10]:
df.loc[df['Type 1'].str.contains('fire|grass', flags=re.I, regex=True)]
Out[10]:
# Name Type 1 Type 2 HP Attack Defense Sp. Atk Sp. Def Speed Generation Legendary
0 1 Bulbasaur Grass Poison 45 49 49 65 65 45 1 False
1 2 Ivysaur Grass Poison 60 62 63 80 80 60 1 False
2 3 Venusaur Grass Poison 80 82 83 100 100 80 1 False
3 3 VenusaurMega Venusaur Grass Poison 80 100 123 122 120 80 1 False
4 4 Charmander Fire NaN 39 52 43 60 50 65 1 False
... ... ... ... ... ... ... ... ... ... ... ... ...
735 667 Litleo Fire Normal 62 50 58 73 54 72 6 False
736 668 Pyroar Fire Normal 86 68 72 109 66 106 6 False
740 672 Skiddo Grass NaN 66 65 48 62 57 52 6 False
741 673 Gogoat Grass NaN 123 100 62 97 81 68 6 False
799 721 Volcanion Fire Water 80 110 120 130 90 70 6 True

122 rows × 12 columns

Menyalin data hasil filter

In [11]:
df_new = df.loc[(df['Type 1'] == 'Grass') & (df['Type 2'] == 'Poison') & (df['HP'] > 70)].head(20)
df_new
Out[11]:
# Name Type 1 Type 2 HP Attack Defense Sp. Atk Sp. Def Speed Generation Legendary
2 3 Venusaur Grass Poison 80 82 83 100 100 80 1 False
3 3 VenusaurMega Venusaur Grass Poison 80 100 123 122 120 80 1 False
50 45 Vileplume Grass Poison 75 80 85 110 90 50 1 False
77 71 Victreebel Grass Poison 80 105 65 100 70 70 1 False
652 591 Amoonguss Grass Poison 114 85 70 85 80 30 5 False

Mengatur ulang angka index

In [12]:
df_new.reset_index()
Out[12]:
index # Name Type 1 Type 2 HP Attack Defense Sp. Atk Sp. Def Speed Generation Legendary
0 2 3 Venusaur Grass Poison 80 82 83 100 100 80 1 False
1 3 3 VenusaurMega Venusaur Grass Poison 80 100 123 122 120 80 1 False
2 50 45 Vileplume Grass Poison 75 80 85 110 90 50 1 False
3 77 71 Victreebel Grass Poison 80 105 65 100 70 70 1 False
4 652 591 Amoonguss Grass Poison 114 85 70 85 80 30 5 False
In [13]:
# menghilangkan kolom index
df_new.reset_index(drop=True)
Out[13]:
# Name Type 1 Type 2 HP Attack Defense Sp. Atk Sp. Def Speed Generation Legendary
0 3 Venusaur Grass Poison 80 82 83 100 100 80 1 False
1 3 VenusaurMega Venusaur Grass Poison 80 100 123 122 120 80 1 False
2 45 Vileplume Grass Poison 75 80 85 110 90 50 1 False
3 71 Victreebel Grass Poison 80 105 65 100 70 70 1 False
4 591 Amoonguss Grass Poison 114 85 70 85 80 30 5 False

Kesimpulan

Pencarian data pada Pandas DataFrame dapat menggunakan pencarian berdasarkan data yang dicari atau dapat juga menggunakan regular expression. Untuk artikel lain terkait dengan data science silahkan lihat kumpulan artikelnya disini.

Rajo Intan

Blogger, pemiliki Onestring Lab, menulis artikel terkait teknologi informasi dan pendidikan. Web Developer, berpengalaman lebih dari 20 tahun mengembangkan berbagai aplikasi dan sistem informasi. Kerjasama kontak di onestringlab@gmail.com atau https://forms.gle/xAGKkpi6B3BzJyzk7

View all posts by Rajo Intan →