Pandas DataFrame | set_index method
Pandas's DataFrame.set_index(~) sets the index of the DataFrame using one or more of its columns.
Parameters
1. keys | string or array-like or list<string>
The names of the column(s) used for the index.
2. drop | boolean | optional
If True, then the column used for the index will be deleted.
If False, then the column will be retained.
By default, drop=True.
3. append | boolean | optional
If True, then the columns will be appended to the current index.
If False, then the columns will replace the current index.
By default, append=False.
4. inplace | boolean | optional
If True, then the source DataFrame will be modified directly and None will be returned.
If False, then a new DataFrame will be returned.
By default, inplace=False.
5. verify_integrity | boolean | optional
If True, then an error is raised if the new index has duplicates.
If False, then duplicate indexes are allowed.
By default, verify_integrity=False.
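As the type of keys suggests, the index values do not have to come from an existing column. The following is a minimal sketch, assuming a df like the one used in the examples below; the array ids and its labels are made up for illustration. It passes a NumPy array with one entry per row:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"A": [1, 2], "B": [3, 4], "C": [5, 6]})

# Hypothetical labels, one per row; they are not taken from any column of df
ids = np.array(["x", "y"])

# The array supplies the index values directly, so no column is removed
df_by_id = df.set_index(ids)
print(df_by_id)
```

Note that a plain Python list such as ["x", "y"] would be read as a list of column labels rather than as index values, which is why the sketch uses an array.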
Return Value
A DataFrame with a new index, or None if inplace=True.
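To make the default behaviour concrete, here is a small sketch, assuming the same two-row df used in the examples below (the variable name indexed is only illustrative). It checks that the call hands back a new DataFrame and leaves the source untouched:

```python
import pandas as pd

df = pd.DataFrame({"A": [1, 2], "B": [3, 4], "C": [5, 6]})

# With the default inplace=False, a new DataFrame is returned
indexed = df.set_index("A")

print(list(indexed.index))   # index values taken from column A
print(list(df.index))        # the source keeps its default integer index
print("A" in df.columns)     # column A is still a regular column of df
```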
Examples
Consider the following DataFrame:
import pandas as pd
df = pd.DataFrame({"A": [1, 2], "B": [3, 4], "C": [5, 6]})
df
   A  B  C
0  1  3  5
1  2  4  6
Setting a single column as the index
To set column A as the index of df:
df.set_index("A")   # Returns a DataFrame
   B  C
A
1  3  5
2  4  6
Here, the name assigned to the index is the column label, that is, "A".
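As a quick check of the point above, the following sketch (assuming the same df; indexed and row are illustrative names) reads the index name back and then uses the new index for a label-based lookup:

```python
import pandas as pd

df = pd.DataFrame({"A": [1, 2], "B": [3, 4], "C": [5, 6]})
indexed = df.set_index("A")

# The index inherits the label of the column it was built from
print(indexed.index.name)

# Rows can now be selected by their value of A
row = indexed.loc[2]
print(row["B"], row["C"])
```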
Setting multiple columns as the index
To set columns A and B as the index of df:
df.set_index(["A","B"])
     C
A B
1 3  5
2 4  6
Here, the DataFrame ends up with a multi-level index (a MultiIndex) whose two levels are A and B.
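The resulting index can be inspected as in the sketch below, which assumes the same df (the names indexed and value are illustrative). It also shows that a row is then addressed by a tuple holding one label per level:

```python
import pandas as pd

df = pd.DataFrame({"A": [1, 2], "B": [3, 4], "C": [5, 6]})
indexed = df.set_index(["A", "B"])

print(type(indexed.index).__name__)   # MultiIndex
print(indexed.index.nlevels)          # 2
print(list(indexed.index.names))      # ['A', 'B']

# Look up column C for the row where A is 1 and B is 3
value = indexed.loc[(1, 3), "C"]
print(value)
```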
Keeping the column used for the index
To keep the column that will be used as the index, set drop=False:
df.set_index("A", drop=False)
   A  B  C
A
1  1  3  5
2  2  4  6
Notice how the column A is still there.
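The difference between the two settings of drop can be compared side by side. This is only a sketch, assuming the same df; kept and dropped are illustrative names:

```python
import pandas as pd

df = pd.DataFrame({"A": [1, 2], "B": [3, 4], "C": [5, 6]})

kept = df.set_index("A", drop=False)   # A is both the index and a column
dropped = df.set_index("A")            # default drop=True: A is only the index

print(list(kept.columns))      # ['A', 'B', 'C']
print(list(dropped.columns))   # ['B', 'C']
```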
For reference, here's df again:
df
   A  B  C
0  1  3  5
1  2  4  6
Appending to the current index
To append a column to the existing index, set append=True:
df.set_index("A", append=True)
     B  C
  A
0 1  3  5
1 2  4  6
Notice how the original index [0,1] is kept as the first level, with column A appended as a second level.
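A short sketch, assuming the same df (appended is an illustrative name), makes the two levels visible. The original index had no name, so the first level is unnamed:

```python
import pandas as pd

df = pd.DataFrame({"A": [1, 2], "B": [3, 4], "C": [5, 6]})
appended = df.set_index("A", append=True)

print(appended.index.nlevels)       # 2
print(list(appended.index.names))   # [None, 'A']

# Each entry pairs an original row label with the matching value of A
pairs = appended.index.tolist()
print(pairs)
```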
Setting an index in-place
To set an index in-place, supply inplace=True:
df.set_index("A", inplace=True)
df
   B  C
A
1  3  5
2  4  6
As shown in the output above, setting inplace=True modifies the source DataFrame directly. Consider inplace=True only when you're sure that you won't need the source DataFrame in its original form, since this can avoid holding a second DataFrame in memory.
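One thing worth checking when using inplace=True is the return value of the call. The sketch below assumes the same df; result is an illustrative name:

```python
import pandas as pd

df = pd.DataFrame({"A": [1, 2], "B": [3, 4], "C": [5, 6]})

# With inplace=True the method modifies df and returns None
result = df.set_index("A", inplace=True)

print(result is None)       # True
print(df.index.name)        # A
print(list(df.columns))     # ['B', 'C']
```

For this reason, writing df = df.set_index("A", inplace=True) would rebind df to None; the result should be assigned only when inplace is left at its default.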
Verifying integrity
Consider the following DataFrame:
df = pd.DataFrame({"A": [1, 1], "B": [3, 4]})
df
   A  B
0  1  3
1  1  4
By default, verify_integrity=False, which means that no error will be thrown if the resulting index contains duplicates:
df.set_index("A")   # verify_integrity=False
   B
A
1  3
1  4
Notice how the new index contains duplicate values (two 1s), but no error was thrown.
To raise an error in such cases, pass verify_integrity=True like so:
df.set_index("A", verify_integrity=True)
ValueError: Index has duplicate keys: Int64Index([1], dtype='int64', name='A')
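In a script, the error can be caught rather than left to stop the program. The following is a minimal sketch, assuming the same df with duplicate values in A; the flag caught is an illustrative name. It also shows a way to check for duplicates before calling set_index:

```python
import pandas as pd

df = pd.DataFrame({"A": [1, 1], "B": [3, 4]})

# Check up front whether column A could serve as a unique index
print(df["A"].is_unique)   # False

try:
    df.set_index("A", verify_integrity=True)
except ValueError as err:
    # Raised because the value 1 appears twice in column A
    caught = True
    print("Could not set index:", err)
else:
    caught = False
```

The exact wording of the message may differ between pandas versions (Int64Index was removed in pandas 2.0), so it is safer to catch the exception type than to match on its text.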