Pandas | DataFrame constructor
Pandas' DataFrame(~) constructor is used to initialise a new DataFrame.
Parameters
1. data | scalar or 2D ndarray or iterable or dict or DataFrame
The dict can contain scalars and array-like objects such as lists, Series and NumPy arrays.
2. index | Index or array-like | optional
The index to use for the DataFrame. By default, if index is not passed and data provides no index, then integer indices will be used.
3. columns | Index or array-like | optional
The column labels to use for the DataFrame. By default, if columns is not passed and data provides no column labels, then integer indices will be used.
4. dtype | dtype | optional
The data type to use for the DataFrame if possible. Only one type is allowed. In the pandas version used for the examples on this page, no error is thrown if type conversion is unsuccessful; newer releases may raise an error instead. By default, dtype=None, that is, the data type is inferred.
5. copy | boolean | optional
This parameter is only relevant if data is a DataFrame or a 2D ndarray.
If True, then a new DataFrame is returned. Modifying this returned DataFrame will not affect data, and vice versa.
If False, then modifying the returned DataFrame will also mutate the original data, and vice versa.
By default, copy=False.
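As a rough illustration of the copy parameter, the sketch below builds a DataFrame from a 2D NumPy array with copy=True and then changes the array. The names arr and df_copy are illustrative and do not come from the original examples. How memory is shared with copy=False can depend on the pandas version and its settings, so only the copy=True case is shown.

```python
import numpy as np
import pandas as pd

arr = np.array([[3, 4], [5, 6]])

# copy=True: the DataFrame gets its own copy of the values.
df_copy = pd.DataFrame(arr, copy=True)

# Changing the source array afterwards leaves df_copy untouched.
arr[0, 0] = 99
print(df_copy.iloc[0, 0])  # 3
```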
Return value
A DataFrame object.
Examples
Using a dictionary of arrays
To create a DataFrame using a dictionary of arrays:
import pandas as pd
df = pd.DataFrame({"A":[3,4], "B":[5,6]})
df
   A  B
0  3  5
1  4  6
Here, the key-value pair of the dictionary is as follows:
key: column label
value: values of that column
Also, since the data does not contain any index (i.e. row labels), the default integer indices are used.
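The dictionary values do not all have to be lists. As a minimal sketch, the example below mixes a list, a NumPy array, a Series and a scalar in one dictionary; the column names are illustrative. The scalar is repeated for every row, which works here because the other values define the number of rows.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "A": [3, 4],              # list
    "B": np.array([5, 6]),    # NumPy array
    "C": pd.Series([7, 8]),   # Series with the default integer index
    "D": 9,                   # scalar, repeated for every row
})
print(df)
```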
Using a nested dictionary
To create a DataFrame using a nested dictionary:
col_one = {"a":3,"b":4}
col_two = {"a":5,"b":6}
df = pd.DataFrame({"A":col_one, "B":col_two})
df
   A  B
a  3  5
b  4  6
Here, the keys of col_one and col_two ("a" and "b") are used as the index (i.e. row labels).
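When the inner dictionaries do not share the same keys, the row labels are combined and the missing entries become NaN. A minimal sketch, using illustrative dictionaries whose keys only partly overlap:

```python
import pandas as pd

col_one = {"a": 3, "b": 4}
col_two = {"b": 5, "c": 6}   # no "a", but an extra "c"

df = pd.DataFrame({"A": col_one, "B": col_two})
print(df)

# Row "c" has no value in column A, and row "a" has none in column B.
print(df.isna())
```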
Using a Series
To create a DataFrame using a Series:
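A minimal sketch of one way to do this, assuming a Series called s (an illustrative name) whose name attribute is set. The Series name becomes the column label, and the Series index becomes the row labels.

```python
import pandas as pd

s = pd.Series([3, 4], index=["a", "b"], name="A")

# The resulting DataFrame has a single column labelled "A".
df = pd.DataFrame(s)
print(df)
```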
Using a 2D array
We can pass in a 2D list or 2D NumPy array like so:
df = pd.DataFrame([[3,4],[5,6]])
df
   0  1
0  3  4
1  5  6
Notice how the default row and column labels are integer indices.
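The same applies to a 2D NumPy array. As a small sketch (arr is an illustrative name), each row of the array becomes a row of the DataFrame, and the labels again default to integers:

```python
import numpy as np
import pandas as pd

arr = np.arange(6).reshape(2, 3)   # 2 rows, 3 columns
df = pd.DataFrame(arr)

print(df.shape)          # (2, 3)
print(list(df.columns))  # [0, 1, 2]
```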
Using a constant
To initialise a DataFrame using a single constant, we need to specify both the columns and index parameters so as to define the shape of the DataFrame:
pd.DataFrame(2, index=["a","b"], columns=["A","B","C"])
   A  B  C
a  2  2  2
b  2  2  2
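To see why both parameters are needed, the sketch below passes a constant without index or columns. In the pandas versions we are aware of this raises a ValueError, since the shape cannot be determined; the exact message may vary between versions, so it is printed rather than assumed.

```python
import pandas as pd

error_message = None
try:
    pd.DataFrame(2)   # no index or columns, so the shape is unknown
except ValueError as err:
    error_message = str(err)
    print("ValueError:", error_message)
```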
Specifying column labels and index
To explicitly set the column labels and index (i.e. row labels):
df = pd.DataFrame([[3,4],[5,6]], columns=["A","B"], index=["a","b"])
df
   A  B
a  3  4
b  5  6
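When data is a dictionary, the columns parameter selects and orders the dictionary keys instead of relabelling the columns. A minimal sketch, where the key "C" is deliberately absent from the illustrative dictionary:

```python
import pandas as pd

data = {"A": [3, 4], "B": [5, 6]}

# "B" and "A" are picked from the dictionary in the requested order.
# "C" is not a key in data, so that column is filled with NaN.
df = pd.DataFrame(data, columns=["B", "A", "C"])
print(df)
```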
Specifying dtype
To set a preference for the type of all columns:
df = pd.DataFrame([["3",4],["5",6]], dtype=float)
df
     0    1
0  3.0  4.0
1  5.0  6.0
Notice how "3" was cast to a float.
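Note that, in the pandas version used for this example, no error is thrown even if the type conversion is unsuccessful. Newer pandas releases may raise an error here instead, so this behaviour should not be relied upon. For instance: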
df = pd.DataFrame([["3@@@",4],["5",6]], dtype=float)
df
      0    1
0  3@@@  4.0
1     5  6.0
Here, the dtypes of the columns are as follows:
df.dtypes
0     object
1    float64
dtype: object
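Since the dtype parameter accepts only a single type, one option for giving each column its own type is to call astype(~) on the DataFrame after it has been constructed. This is a sketch of that alternative rather than a feature of the constructor itself, and the column labels are illustrative:

```python
import pandas as pd

df = pd.DataFrame([["3", 4], ["5", 6]], columns=["A", "B"])

# Convert each column separately by passing a dict of column label -> dtype.
df = df.astype({"A": "float64", "B": "int64"})
print(df.dtypes)
```

Unlike the constructor example above, astype(~) raises an error if a value such as "3@@@" cannot be converted.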