Pada artikel ini akan dibahas mengenai cara mencari data pada Pandas DataFrame. Teknik pencarian yang dibahas yaitu mencari data yang sesuai dan pencarian menggunakan regular expression.
Jupyter Notebook
Memuat data ke Pandas
In [1]:
import pandas as pd
df = pd.read_csv('https://raw.githubusercontent.com/Onestringlab/notebook/main/pokemon_data.csv')
Melihat data awal
In [2]:
df.head()
Out[2]:
# | Name | Type 1 | Type 2 | HP | Attack | Defense | Sp. Atk | Sp. Def | Speed | Generation | Legendary | |
---|---|---|---|---|---|---|---|---|---|---|---|---|
0 | 1 | Bulbasaur | Grass | Poison | 45 | 49 | 49 | 65 | 65 | 45 | 1 | False |
1 | 2 | Ivysaur | Grass | Poison | 60 | 62 | 63 | 80 | 80 | 60 | 1 | False |
2 | 3 | Venusaur | Grass | Poison | 80 | 82 | 83 | 100 | 100 | 80 | 1 | False |
3 | 3 | VenusaurMega Venusaur | Grass | Poison | 80 | 100 | 123 | 122 | 120 | 80 | 1 | False |
4 | 4 | Charmander | Fire | NaN | 39 | 52 | 43 | 60 | 50 | 65 | 1 | False |
Mencari data menggunakan perintah loc
In [3]:
# Mencari Type 1 = Grass
df.loc[df['Type 1'] == 'Grass'].head()
Out[3]:
# | Name | Type 1 | Type 2 | HP | Attack | Defense | Sp. Atk | Sp. Def | Speed | Generation | Legendary | |
---|---|---|---|---|---|---|---|---|---|---|---|---|
0 | 1 | Bulbasaur | Grass | Poison | 45 | 49 | 49 | 65 | 65 | 45 | 1 | False |
1 | 2 | Ivysaur | Grass | Poison | 60 | 62 | 63 | 80 | 80 | 60 | 1 | False |
2 | 3 | Venusaur | Grass | Poison | 80 | 82 | 83 | 100 | 100 | 80 | 1 | False |
3 | 3 | VenusaurMega Venusaur | Grass | Poison | 80 | 100 | 123 | 122 | 120 | 80 | 1 | False |
48 | 43 | Oddish | Grass | Poison | 45 | 50 | 55 | 75 | 65 | 30 | 1 | False |
In [4]:
# Mencari Type 1 = Grass dan Type 2 = Poison
df.loc[(df['Type 1'] == 'Grass') & (df['Type 2'] == 'Poison')].head()
Out[4]:
# | Name | Type 1 | Type 2 | HP | Attack | Defense | Sp. Atk | Sp. Def | Speed | Generation | Legendary | |
---|---|---|---|---|---|---|---|---|---|---|---|---|
0 | 1 | Bulbasaur | Grass | Poison | 45 | 49 | 49 | 65 | 65 | 45 | 1 | False |
1 | 2 | Ivysaur | Grass | Poison | 60 | 62 | 63 | 80 | 80 | 60 | 1 | False |
2 | 3 | Venusaur | Grass | Poison | 80 | 82 | 83 | 100 | 100 | 80 | 1 | False |
3 | 3 | VenusaurMega Venusaur | Grass | Poison | 80 | 100 | 123 | 122 | 120 | 80 | 1 | False |
48 | 43 | Oddish | Grass | Poison | 45 | 50 | 55 | 75 | 65 | 30 | 1 | False |
In [5]:
# Mencari Type 1 = Grass dan Type 2 = Poison dan HP > 70
df.loc[(df['Type 1'] == 'Grass') & (df['Type 2'] == 'Poison') & (df['HP'] > 70)].head()
Out[5]:
# | Name | Type 1 | Type 2 | HP | Attack | Defense | Sp. Atk | Sp. Def | Speed | Generation | Legendary | |
---|---|---|---|---|---|---|---|---|---|---|---|---|
2 | 3 | Venusaur | Grass | Poison | 80 | 82 | 83 | 100 | 100 | 80 | 1 | False |
3 | 3 | VenusaurMega Venusaur | Grass | Poison | 80 | 100 | 123 | 122 | 120 | 80 | 1 | False |
50 | 45 | Vileplume | Grass | Poison | 75 | 80 | 85 | 110 | 90 | 50 | 1 | False |
77 | 71 | Victreebel | Grass | Poison | 80 | 105 | 65 | 100 | 70 | 70 | 1 | False |
652 | 591 | Amoonguss | Grass | Poison | 114 | 85 | 70 | 85 | 80 | 30 | 5 | False |
In [6]:
# Mencari Type 1 = Grass atau Type 2 = Water
df.loc[(df['Type 1'] == 'Grass') | (df['Type 2'] == 'Water')]
Out[6]:
# | Name | Type 1 | Type 2 | HP | Attack | Defense | Sp. Atk | Sp. Def | Speed | Generation | Legendary | |
---|---|---|---|---|---|---|---|---|---|---|---|---|
0 | 1 | Bulbasaur | Grass | Poison | 45 | 49 | 49 | 65 | 65 | 45 | 1 | False |
1 | 2 | Ivysaur | Grass | Poison | 60 | 62 | 63 | 80 | 80 | 60 | 1 | False |
2 | 3 | Venusaur | Grass | Poison | 80 | 82 | 83 | 100 | 100 | 80 | 1 | False |
3 | 3 | VenusaurMega Venusaur | Grass | Poison | 80 | 100 | 123 | 122 | 120 | 80 | 1 | False |
48 | 43 | Oddish | Grass | Poison | 45 | 50 | 55 | 75 | 65 | 30 | 1 | False |
... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
741 | 673 | Gogoat | Grass | NaN | 123 | 100 | 62 | 97 | 81 | 68 | 6 | False |
758 | 688 | Binacle | Rock | Water | 42 | 52 | 67 | 39 | 56 | 50 | 6 | False |
759 | 689 | Barbaracle | Rock | Water | 72 | 105 | 115 | 54 | 86 | 68 | 6 | False |
760 | 690 | Skrelp | Poison | Water | 50 | 60 | 60 | 60 | 60 | 30 | 6 | False |
799 | 721 | Volcanion | Fire | Water | 80 | 110 | 120 | 130 | 90 | 70 | 6 | True |
84 rows × 12 columns
In [7]:
# mencari name yang terdapat kata "Mega"
df.loc[df['Name'].str.contains('Mega')].head(20)
Out[7]:
# | Name | Type 1 | Type 2 | HP | Attack | Defense | Sp. Atk | Sp. Def | Speed | Generation | Legendary | |
---|---|---|---|---|---|---|---|---|---|---|---|---|
3 | 3 | VenusaurMega Venusaur | Grass | Poison | 80 | 100 | 123 | 122 | 120 | 80 | 1 | False |
7 | 6 | CharizardMega Charizard X | Fire | Dragon | 78 | 130 | 111 | 130 | 85 | 100 | 1 | False |
8 | 6 | CharizardMega Charizard Y | Fire | Flying | 78 | 104 | 78 | 159 | 115 | 100 | 1 | False |
12 | 9 | BlastoiseMega Blastoise | Water | NaN | 79 | 103 | 120 | 135 | 115 | 78 | 1 | False |
19 | 15 | BeedrillMega Beedrill | Bug | Poison | 65 | 150 | 40 | 15 | 80 | 145 | 1 | False |
23 | 18 | PidgeotMega Pidgeot | Normal | Flying | 83 | 80 | 80 | 135 | 80 | 121 | 1 | False |
71 | 65 | AlakazamMega Alakazam | Psychic | NaN | 55 | 50 | 65 | 175 | 95 | 150 | 1 | False |
87 | 80 | SlowbroMega Slowbro | Water | Psychic | 95 | 75 | 180 | 130 | 80 | 30 | 1 | False |
102 | 94 | GengarMega Gengar | Ghost | Poison | 60 | 65 | 80 | 170 | 95 | 130 | 1 | False |
124 | 115 | KangaskhanMega Kangaskhan | Normal | NaN | 105 | 125 | 100 | 60 | 100 | 100 | 1 | False |
137 | 127 | PinsirMega Pinsir | Bug | Flying | 65 | 155 | 120 | 65 | 90 | 105 | 1 | False |
141 | 130 | GyaradosMega Gyarados | Water | Dark | 95 | 155 | 109 | 70 | 130 | 81 | 1 | False |
154 | 142 | AerodactylMega Aerodactyl | Rock | Flying | 80 | 135 | 85 | 70 | 95 | 150 | 1 | False |
163 | 150 | MewtwoMega Mewtwo X | Psychic | Fighting | 106 | 190 | 100 | 154 | 100 | 130 | 1 | True |
164 | 150 | MewtwoMega Mewtwo Y | Psychic | NaN | 106 | 150 | 70 | 194 | 120 | 140 | 1 | True |
168 | 154 | Meganium | Grass | NaN | 80 | 82 | 100 | 83 | 100 | 80 | 2 | False |
196 | 181 | AmpharosMega Ampharos | Electric | Dragon | 90 | 95 | 105 | 165 | 110 | 45 | 2 | False |
224 | 208 | SteelixMega Steelix | Steel | Ground | 75 | 125 | 230 | 55 | 95 | 30 | 2 | False |
229 | 212 | ScizorMega Scizor | Bug | Steel | 70 | 150 | 140 | 65 | 100 | 75 | 2 | False |
232 | 214 | HeracrossMega Heracross | Bug | Fighting | 80 | 185 | 115 | 40 | 105 | 75 | 2 | False |
In [8]:
# mencari name yang tidak terdapat kata "Mega"
df.loc[~df['Name'].str.contains('Mega')]
Out[8]:
# | Name | Type 1 | Type 2 | HP | Attack | Defense | Sp. Atk | Sp. Def | Speed | Generation | Legendary | |
---|---|---|---|---|---|---|---|---|---|---|---|---|
0 | 1 | Bulbasaur | Grass | Poison | 45 | 49 | 49 | 65 | 65 | 45 | 1 | False |
1 | 2 | Ivysaur | Grass | Poison | 60 | 62 | 63 | 80 | 80 | 60 | 1 | False |
2 | 3 | Venusaur | Grass | Poison | 80 | 82 | 83 | 100 | 100 | 80 | 1 | False |
4 | 4 | Charmander | Fire | NaN | 39 | 52 | 43 | 60 | 50 | 65 | 1 | False |
5 | 5 | Charmeleon | Fire | NaN | 58 | 64 | 58 | 80 | 65 | 80 | 1 | False |
... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
794 | 718 | Zygarde50% Forme | Dragon | Ground | 108 | 100 | 121 | 81 | 95 | 95 | 6 | True |
795 | 719 | Diancie | Rock | Fairy | 50 | 100 | 150 | 100 | 150 | 50 | 6 | True |
797 | 720 | HoopaHoopa Confined | Psychic | Ghost | 80 | 110 | 60 | 150 | 130 | 70 | 6 | True |
798 | 720 | HoopaHoopa Unbound | Psychic | Dark | 80 | 160 | 60 | 170 | 130 | 80 | 6 | True |
799 | 721 | Volcanion | Fire | Water | 80 | 110 | 120 | 130 | 90 | 70 | 6 | True |
751 rows × 12 columns
Pemanfaatan regular expression untuk pencarian
In [9]:
import re
df.loc[df['Type 1'].str.contains('Fire|Grass', regex=True)]
Out[9]:
# | Name | Type 1 | Type 2 | HP | Attack | Defense | Sp. Atk | Sp. Def | Speed | Generation | Legendary | |
---|---|---|---|---|---|---|---|---|---|---|---|---|
0 | 1 | Bulbasaur | Grass | Poison | 45 | 49 | 49 | 65 | 65 | 45 | 1 | False |
1 | 2 | Ivysaur | Grass | Poison | 60 | 62 | 63 | 80 | 80 | 60 | 1 | False |
2 | 3 | Venusaur | Grass | Poison | 80 | 82 | 83 | 100 | 100 | 80 | 1 | False |
3 | 3 | VenusaurMega Venusaur | Grass | Poison | 80 | 100 | 123 | 122 | 120 | 80 | 1 | False |
4 | 4 | Charmander | Fire | NaN | 39 | 52 | 43 | 60 | 50 | 65 | 1 | False |
... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
735 | 667 | Litleo | Fire | Normal | 62 | 50 | 58 | 73 | 54 | 72 | 6 | False |
736 | 668 | Pyroar | Fire | Normal | 86 | 68 | 72 | 109 | 66 | 106 | 6 | False |
740 | 672 | Skiddo | Grass | NaN | 66 | 65 | 48 | 62 | 57 | 52 | 6 | False |
741 | 673 | Gogoat | Grass | NaN | 123 | 100 | 62 | 97 | 81 | 68 | 6 | False |
799 | 721 | Volcanion | Fire | Water | 80 | 110 | 120 | 130 | 90 | 70 | 6 | True |
122 rows × 12 columns
In [10]:
df.loc[df['Type 1'].str.contains('fire|grass', flags=re.I, regex=True)]
Out[10]:
# | Name | Type 1 | Type 2 | HP | Attack | Defense | Sp. Atk | Sp. Def | Speed | Generation | Legendary | |
---|---|---|---|---|---|---|---|---|---|---|---|---|
0 | 1 | Bulbasaur | Grass | Poison | 45 | 49 | 49 | 65 | 65 | 45 | 1 | False |
1 | 2 | Ivysaur | Grass | Poison | 60 | 62 | 63 | 80 | 80 | 60 | 1 | False |
2 | 3 | Venusaur | Grass | Poison | 80 | 82 | 83 | 100 | 100 | 80 | 1 | False |
3 | 3 | VenusaurMega Venusaur | Grass | Poison | 80 | 100 | 123 | 122 | 120 | 80 | 1 | False |
4 | 4 | Charmander | Fire | NaN | 39 | 52 | 43 | 60 | 50 | 65 | 1 | False |
... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
735 | 667 | Litleo | Fire | Normal | 62 | 50 | 58 | 73 | 54 | 72 | 6 | False |
736 | 668 | Pyroar | Fire | Normal | 86 | 68 | 72 | 109 | 66 | 106 | 6 | False |
740 | 672 | Skiddo | Grass | NaN | 66 | 65 | 48 | 62 | 57 | 52 | 6 | False |
741 | 673 | Gogoat | Grass | NaN | 123 | 100 | 62 | 97 | 81 | 68 | 6 | False |
799 | 721 | Volcanion | Fire | Water | 80 | 110 | 120 | 130 | 90 | 70 | 6 | True |
122 rows × 12 columns
Menyalin data hasil filter
In [11]:
df_new = df.loc[(df['Type 1'] == 'Grass') & (df['Type 2'] == 'Poison') & (df['HP'] > 70)].head(20)
df_new
Out[11]:
# | Name | Type 1 | Type 2 | HP | Attack | Defense | Sp. Atk | Sp. Def | Speed | Generation | Legendary | |
---|---|---|---|---|---|---|---|---|---|---|---|---|
2 | 3 | Venusaur | Grass | Poison | 80 | 82 | 83 | 100 | 100 | 80 | 1 | False |
3 | 3 | VenusaurMega Venusaur | Grass | Poison | 80 | 100 | 123 | 122 | 120 | 80 | 1 | False |
50 | 45 | Vileplume | Grass | Poison | 75 | 80 | 85 | 110 | 90 | 50 | 1 | False |
77 | 71 | Victreebel | Grass | Poison | 80 | 105 | 65 | 100 | 70 | 70 | 1 | False |
652 | 591 | Amoonguss | Grass | Poison | 114 | 85 | 70 | 85 | 80 | 30 | 5 | False |
Mengatur ulang angka index
In [12]:
df_new.reset_index()
Out[12]:
index | # | Name | Type 1 | Type 2 | HP | Attack | Defense | Sp. Atk | Sp. Def | Speed | Generation | Legendary | |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|
0 | 2 | 3 | Venusaur | Grass | Poison | 80 | 82 | 83 | 100 | 100 | 80 | 1 | False |
1 | 3 | 3 | VenusaurMega Venusaur | Grass | Poison | 80 | 100 | 123 | 122 | 120 | 80 | 1 | False |
2 | 50 | 45 | Vileplume | Grass | Poison | 75 | 80 | 85 | 110 | 90 | 50 | 1 | False |
3 | 77 | 71 | Victreebel | Grass | Poison | 80 | 105 | 65 | 100 | 70 | 70 | 1 | False |
4 | 652 | 591 | Amoonguss | Grass | Poison | 114 | 85 | 70 | 85 | 80 | 30 | 5 | False |
In [13]:
# menghilangkan kolom index
df_new.reset_index(drop=True)
Out[13]:
# | Name | Type 1 | Type 2 | HP | Attack | Defense | Sp. Atk | Sp. Def | Speed | Generation | Legendary | |
---|---|---|---|---|---|---|---|---|---|---|---|---|
0 | 3 | Venusaur | Grass | Poison | 80 | 82 | 83 | 100 | 100 | 80 | 1 | False |
1 | 3 | VenusaurMega Venusaur | Grass | Poison | 80 | 100 | 123 | 122 | 120 | 80 | 1 | False |
2 | 45 | Vileplume | Grass | Poison | 75 | 80 | 85 | 110 | 90 | 50 | 1 | False |
3 | 71 | Victreebel | Grass | Poison | 80 | 105 | 65 | 100 | 70 | 70 | 1 | False |
4 | 591 | Amoonguss | Grass | Poison | 114 | 85 | 70 | 85 | 80 | 30 | 5 | False |
Kesimpulan
Pencarian data pada Pandas DataFrame dapat menggunakan pencarian berdasarkan data yang dicari atau dapat juga menggunakan regular expression. Untuk artikel lain terkait dengan data science silahkan lihat kumpulan artikelnya disini.