NumPy | genfromtxt method

Last updated: Aug 11, 2023
Tags: Python, NumPy

NumPy's genfromtxt(~) method reads a text file and parses its content into a NumPy array. Unlike NumPy's loadtxt(~) method, genfromtxt(~) can handle missing values.
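
For instance, here is a minimal sketch (using an in-memory StringIO buffer in place of a real file, purely for illustration) of how genfromtxt(~) fills a missing entry with nan:

import numpy as np
from io import StringIO

# Comma-separated data with a missing value in the second row
buffer = StringIO("1,2\n3,")
a = np.genfromtxt(buffer, delimiter=",")
a
array([[ 1.,  2.],
       [ 3., nan]])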

Parameters

1. fname | string

The name of the file. If the file is not in the same directory as the script, make sure to include the path to the file as well.

2. dtype | string or type or list<string> or list<type> | optional

The desired data-type of the constructed array. By default, dtype=float64. This means that all integers will be converted to floats as well.

If you set dtype=None, then Numpy will attempt to infer the type from your values. This may be significantly slower than setting the type yourself.
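
For instance, here is a minimal sketch (assuming a hypothetical whitespace-separated file mixed_data.txt whose two columns hold integers and floats, respectively) where dtype=None makes each column take on its own inferred type, yielding a structured array:

a = np.genfromtxt("mixed_data.txt", dtype=None)
a
array([(1, 2.5), (3, 4.5)], dtype=[('f0', '<i8'), ('f1', '<f8')])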

3. comments | string | optional

If your input file contains comments, then you can specify what identifies a comment. By default, comments="#", that is, characters after the # on the same line will be treated as a comment. You can set comments=None if your text file does not include any comments.

4. delimiter | string | optional

The string used to separate your data. By default, the delimiter is a whitespace.

5. skiprows | int | optional

This parameter has been replaced by skip_header in Numpy version 1.10.

6. skip_header | int | optional

The number of rows at the beginning to skip. Note that this includes comment lines. By default, skip_header=0.

7. skip_footer | int | optional

The number of rows at the end to skip. Note that this includes comment lines. By default, skip_footer=0.

8. converters | dict<int,function> | optional

You can apply a mapping to transform your column values. The key is the integer index of the column, and the value is the desired mapping. Check the examples below for clarification. By default, converters=None.

9. missing | string | optional

This parameter has been replaced by missing_values in Numpy version 1.10.

10. missing_values | string or sequence<string> | optional

The sequence of strings that will be treated as missing values. This is only relevant when usemask=True. Consult examples for clarification.

11. filling_values | value or dict or sequence<value> | optional

If a single value is passed then all missing and invalid values will be replaced by that value. By passing a dict, you can specify different fill values for different columns. The key is the column integer index, and the value is the fill value for that column.

12. usecols | int or sequence | optional

The integer indices of the columns you want to read. By default, usecols=None, that is, all columns are read.

13. names | None or True or string or sequence<string> | optional

The field names of the resulting array. This parameter is relevant only for those who wish to create a structured array.

Type | Description
None | A standard (non-structured) array will be returned.
True | The first row after the skip_header lines will be treated as the field names.
string | A single string containing the field names separated by commas.
sequence | An array-like structure containing the field names.

By default, names=None.

As a side note, structured arrays are not commonly used since Series and DataFrames in the Pandas library are better alternatives.
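
If you do end up with a structured array, a minimal sketch of handing it off to Pandas (assuming the pandas library is installed and a hypothetical my_data.txt whose first row holds column names) would look like this:

import numpy as np
import pandas as pd

a = np.genfromtxt("my_data.txt", names=True)   # structured array with named fields
df = pd.DataFrame(a)                           # columns take on the field names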

14. excludelist | sequence<string> | optional

The passed strings will be appended to the default exclusion list of ["return", "file", "print"]. Any field name that matches an entry in this list will have an underscore appended to it (e.g. if "abc" is excluded, a field named "abc" becomes "abc_"). This is only relevant for those who wish to create a structured array.

15. deletechars | string or sequence<string> | optional

The character(s) to delete from the names.

16. defaultfmt | string | optional

The format of the resulting field names when no names are given. The syntax follows that of Python's old-style string formatting, where %i acts as a placeholder for the column index (the default is "f%i", which yields "f0", "f1" and so on). See the defaultfmt example below.

17. autostrip | boolean | optional

Whether or not to remove leading and trailing whitespace from the values. This is only applicable for values that are strings. By default, autostrip=False.

18. replace_space | string | optional

The string used to replace spaces in the field names. Note that the leading and trailing spaces will be removed. By default, replace_space="_".

19. case_sensitive | string or boolean | optional

How to handle the casing of string characters.

Value | Description
True | Leave the casing of the field names as is.
False | Convert the field names to uppercase (same as "upper").
"upper" | Convert the field names to uppercase.
"lower" | Convert the field names to lowercase.

By default, case_sensitive=True.

20. unpack | boolean | optional

Instead of having one giant Numpy array, you could retrieve column arrays individually by setting this to True. For instance, col_one, col_two = np.genfromtxt(~, unpack=True). By default, unpack=False.

21. usemask | boolean | optional

Whether or not to return a masked array. By default, usemask=False.

22. loose | boolean | optional

If True, invalid values will be converted to nan and no error will be raised. By default, loose=True.

23. invalid_raise | boolean | optional

If the number of values in a row does not match the number of columns, then an error is raised. If set to False, then invalid rows will be omitted from the resulting array. By default, invalid_raise=True.

24. max_rows | int | optional

The maximum number of rows to read. By default, all lines are read.

25. encoding | string | optional

The encoding to use when reading the file (e.g. "latin-1", "iso-8859-1"). By default, encoding="bytes".
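
As a minimal sketch (assuming a hypothetical comma-separated file latin_data.txt saved in Latin-1 encoding), you would pass the encoding name like so:

a = np.genfromtxt("latin_data.txt", delimiter=",", dtype="U", encoding="latin-1")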

Return value

A Numpy array with the imported data.

Examples

Basic usage

Suppose we have the following text-file called my_data.txt:

1 2 3 4
5 6 7 8

To import this file:

a = np.genfromtxt("my_data.txt")
a
array([[1., 2., 3., 4.],
[5., 6., 7., 8.]])

Note that this Python script resides in the same directory as my_data.txt.

Also, the default data type is float64, regardless of whether or not the numbers in the text file are all integers:

print(a.dtype)
float64

Specifying the desired data type

Once again, suppose we have the following text-file called my_data.txt:

1 2 3 4
5 6 7 8

Instead of using the default float64, we can specify a type using dtype:

a = np.genfromtxt("my_data.txt", dtype=int)
a
array([[1, 2, 3, 4],
[5, 6, 7, 8]])

Now, all the values have type int (int64 on most platforms) instead of the default float64.

You can also pass a list of types to assign different types to different columns:

a = np.genfromtxt("my_data.txt", dtype=[np.int,32 int, np.float,32 float])
a
array([(1, 2, 3., 4.), (5, 6, 7., 8.)],
dtype=[('f0', '<i4'), ('f1', '<i8'), ('f2', '<f4'), ('f3', '<f8')])

Here, the i4 represents int32 while i8 represents int64.

Note that this is a special type of NumPy array called a structured array. This type of array is not often used in practice since Series and DataFrames in the Pandas library are richer alternatives.

Specifying a custom delimiter

Suppose our my_data.txt file is as follows:

1,2
3,4

Since our data is comma-separated, set delimiter="," like so:

a = np.genfromtxt("my_data.txt", delimiter=",")
a
array([[1., 2.],
[3., 4.]])

Handling comments

Suppose our my_data.txt file is as follows:

1,2,3,4 / I'm the first row!
5,6,7,8 / I'm the second row!

To strip out comments in the text-file, specify comments:

a = np.genfromtxt("my_data.txt", delimiter=",", comments="/")
a
array([[1., 2., 3., 4.],
[5., 6., 7., 8.]])

Specifying skip_header

Suppose our my_data.txt file is as follows:

1 2 3
4 5 6
7 8 9

To skip the first row:

a = np.genfromtxt("my_data.txt", skip_header=1)
a
array([[4., 5., 6.],
[7., 8., 9.]])

Suppose our my_data.txt file is as follows:

1 2 3
4 5 6
7 8 9

To skip the last row:

a = np.genfromtxt("my_data.txt", skip_footer=1)
a
array([[1., 2., 3.],
[4., 5., 6.]])

Specifying converters

Suppose our my_data.txt file is as follows:

1 2
3 4

Just as an arbitrary example, suppose we wanted to add 10 to all values of the 1st column, and set all values of the 2nd column to 20:

a = np.genfromtxt("my_data.txt", converters={0: lambda x: int(x) + 10, 1: lambda x: 20})
a
array([(11, 20),
(13, 20)], dtype=[('f0', '<i8'), ('f1', '<i8')])

Here, the "f0" and "f1" are the field names, and the "i8" denote a int64 data type.

Specifying missing_values

Suppose our my_data.txt file is as follows:

3,??
,6

All missing and invalid values are treated as nan, so you wouldn't need to specify missing_values="??" here:

a = np.genfromtxt("my_data.txt", delimiter=",")
a
array([[ 3., nan],
[nan, 6.]])

Note that it is not possible to mark a valid value such as 6 as a missing value this way. The missing_values parameter comes into play only when you set usemask=True.

Here's usemask=True without missing_values:

a = np.genfromtxt("my_data.txt", delimiter=",", usemask=True)
a
masked_array(
data=[[3.0, nan],
[--, 6.0]],
mask=[[False, False],
[ True, False]],
fill_value=1e+20)

Notice how missing and invalid values are differentiated here: the invalid ?? has been mapped to nan with its mask entry set to False, while the actual missing value has been mapped to -- with its mask entry set to True.

Now, here's usemask=True with missing_values="??":

a = np.genfromtxt("my_data.txt", delimiter=",", missing_values="??", usemask=True)
a
masked_array(
data=[[3.0, --],
[--, 6.0]],
mask=[[False, True],
[ True, False]],
fill_value=1e+20)

The key here is that ??, which is inherently an invalid value, is now treated as a missing value.

Specifying filling_values

By default, all missing and invalid values are replaced by nan. To change this, specify the filling_values like so:

a = np.genfromtxt("my_data.txt", delimiter=",", filling_values=0)
a
array([[3., 0.],
[0., 6.]])

You could also pass in a dictionary, with the following key-value pairs:

  • key: the column integer index

  • value: the fill value

For instance, to map all missing and invalid values of the first column to -1, and those of the second column to -2:

a = np.genfromtxt("my_data.txt", delimiter=",", filling_values={0:-1, 1:-2})
a
array([[ 3., -2.],
[-1., 6.]])

Reading only certain columns

Suppose our my_data.txt file is as follows:

1 2 3
4 5 6

To read only the 1st and 3rd columns (i.e. column index 0 and 2):

a = np.genfromtxt("my_data.txt", usecols=[0,2])
a
array([[1., 3.],
[4., 6.]])

Specifying names

Suppose our my_data.txt file is as follows:

3 4
5 6

To assign a name to each column:

a = np.genfromtxt("my_data.txt", names=("A","B"))
a
array([(3., 4.),
(5., 6.)],
dtype=[('A', '<f8'), ('B', '<f8')])

Here, we have assigned the name A to the first column and B to the second. Note that f8 just denotes the type float64.

Specifying excludelist

Suppose our my_data.txt file is as follows:

3 4 5
6 7 8

To append a _ to certain names:

a = np.genfromtxt("my_data.txt", names=["A","B","C"], excludelist=["A"])
a
array([(3., 4., 5.), (6., 7., 8.)],
dtype=[('A_', '<f8'), ('B', '<f8'), ('C', '<f8')])

Notice how we have A_ as the field name for the first column.

Specifying deletechars

Suppose our my_data.txt file is as follows:

3 4
5 6

To remove the character "c" from the field names:

a = np.genfromtxt("my_data.txt", names=["Ab","BcD"], deletechars="c")
a
array([(3., 4.), (5., 6.)], dtype=[('Ab', '<f8'), ('BD', '<f8')])

To remove multiple characters:

a = np.genfromtxt("my_data.txt", names=["Ab","BcD"], deletechars=["c","A"])
a
array([(3., 4.), (5., 6.)], dtype=[('b', '<f8'), ('BD', '<f8')])

Specifying defaultfmt

Suppose our my_data.txt file is as follows:

3 4
5 6

If the returned result is a structured array, and the names parameter is not defined, then the field names take on the values "f0", "f1" and so on by default:

a = np.genfromtxt("my_data.txt", dtype=[int, float])
a
array([(3, 4.), (5, 6.)], dtype=[('f0', '<i8'), ('f1', '<f8')])

To customise this, pass the defaultfmt parameter:

a = np.genfromtxt("my_data.txt", dtype=[int, float], defaultfmt="my_var_%i")
a
array([(3, 4.), (5, 6.)], dtype=[('my_var_0', '<i8'), ('my_var_1', '<f8')])

Here, the %i is a placeholder for the column integer index.

Specifying autostrip

Suppose our my_data.txt file is as follows:

3,a, 4
5 ,b c,6

By default, all whitespace that appears in the values is kept intact:

a = np.genfromtxt("my_data.txt", delimiter=",", dtype="U")
a
array([['3', 'a', ' 4'],
['5 ', 'b c', '6']], dtype='<U5')

If you want to strip away the leading and trailing whitespaces, set autostrip=True like so:

a = np.genfromtxt("my_data.txt", delimiter=",", autostrip=True, dtype="U")
a
array([['3', 'a', '4'],
['5', 'b c', '6']], dtype='<U3')

Notice how the whitespace in "b c" is still there.

Specifying replace_space

Suppose our my_data.txt is as follows:

3 4
5 6

By default, the non-leading and non-trailing spaces are replaced by _:

a = np.genfromtxt("my_data.txt", names=["A B", " C "])
a
array([(3., 4.), (5., 6.)], dtype=[('A_B', '<f8'), ('C', '<f8')])

Notice how the leading and trailing spaces have been stripped.

To replace the spaces by a custom string, set the replace_space parameter like so:

a = np.genfromtxt("my_data.txt", names=["A B", " C "], replace_space="K")
a
array([(3., 4.), (5., 6.)], dtype=[('AKB', '<f8'), ('C', '<f8')])

Specifying case_sensitive

Suppose our my_data.txt is as follows:

3 4
5 6

By default, case_sensitive is set to True, which means that the field names are left as is.

a = np.genfromtxt("my_data.txt", names=["Ab", "dC"])
a
array([(3., 4.), (5., 6.)], dtype=[('Ab', '<f8'), ('dC', '<f8')])

To convert field names to uppercase, set case_sensitive to either "upper" or False:

a = np.genfromtxt("my_data.txt", names=["Ab", "dC"], case_sensitive=False)
a
array([(3., 4.), (5., 6.)], dtype=[('AB', '<f8'), ('DC', '<f8')])

To convert field names to lowercase, set case_sensitive="lower":

a = np.genfromtxt("my_data.txt", names=["Ab", "dC"], case_sensitive="lower")
a
array([(3., 4.), (5., 6.)], dtype=[('ab', '<f8'), ('dc', '<f8')])

Specifying unpack

Suppose our my_data.txt file is as follows:

1 2
3 4

To retrieve the data per column instead of a single Numpy array:

col_one, col_two = np.genfromtxt("my_data.txt", unpack=True)
print("col_one:", col_one)
print("col_two:", col_two)
col_one: [1. 3.]
col_two: [2. 4.]

Specifying loose

Suppose our my_data.txt file is as follows:

3 4
5 ??

By default, loose=True, which means that invalid values (e.g. the ?? here) are converted into nan:

a = np.genfromtxt("my_data.txt")
a
array([[ 3., 4.],
[ 5., nan]])

To raise an error if our file contains invalid values, set loose=False, like so:

a = np.genfromtxt("my_data.txt", loose=False)
a
ValueError: Cannot convert string '??'

Specifying invalid_raise

Suppose our my_data.txt file is as follows:

3,4
5
7,8

Here, the second row only contains 1 value even though the array seemingly has 2 columns.

By default, invalid_raise=True, which means that if the file contains invalid rows, then an error is raised:

a = np.genfromtxt("my_data.txt", delimiter=",")
a
ValueError: Some errors were detected!
Line #2 (got 1 columns instead of 2)

We can choose to omit invalid rows by setting it to False, like so:

a = np.genfromtxt("my_data.txt", delimiter=",", invalid_raise=False)
a
array([[3., 4.],
[7., 8.]])

No error is raised, but Numpy is nice enough to give us a warning:

ConversionWarning: Some errors were detected!
Line #2 (got 1 columns instead of 2)

Specifying the desired dimension

Suppose our my_data.txt file only had one row:

1 2 3 4

By default, genfromtxt(~) will generate a one-dimensional array:

a = np.genfromtxt("my_data.txt")
a
array([1., 2., 3., 4.])

We can request a two-dimensional array by setting ndmin=2 (supported by genfromtxt(~) in newer NumPy versions, 1.23 and later):

a = np.genfromtxt("my_data.txt", ndmin=2)
a
array([[1., 2., 3., 4.]])

Specifying max_rows

Suppose our my_data.txt file is as follows:

1 2
3 4
5 6

To read only the first two rows instead of the entire file:

a = np.genfromtxt("myy_data.txt", max_rows=2)
a
array([[1., 2.],
[3., 4.]])