pyspark.pandas.Series.to_csv

Series.to_csv(path: Optional[str] = None, sep: str = ',', na_rep: str = '', columns: Optional[List[Union[Any, Tuple[Any, ...]]]] = None, header: bool = True, quotechar: str = '"', date_format: Optional[str] = None, escapechar: Optional[str] = None, num_files: Optional[int] = None, mode: str = 'w', partition_cols: Union[str, List[str], None] = None, index_col: Union[str, List[str], None] = None, **options: Any) → Optional[str]

Write object to a comma-separated values (csv) file.

Note

pandas-on-Spark to_csv writes files to a path or URI. Unlike pandas, pandas-on-Spark respects HDFS properties such as 'fs.default.name'.

Note

pandas-on-Spark writes CSV files into a directory, path, and writes multiple part-... files in that directory when path is specified. This behavior was inherited from Apache Spark. The number of partitions can be controlled by num_files. This is deprecated; use DataFrame.spark.repartition instead.
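
For example, the deprecated num_files argument can be replaced by repartitioning before the write. A minimal sketch, assuming a pandas-on-Spark DataFrame df and an output directory path as in the Examples below:

>>> df.spark.repartition(1).to_csv(path=r'%s/to_csv/single.csv' % path)  # one partition, so one part- file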

Parameters
path: str, default None

File path. If None is provided the result is returned as a string.

sep: str, default ','

String of length 1. Field delimiter for the output file.

na_rep: str, default ''

Missing data representation.

columns: sequence, optional

Columns to write.

header: bool or list of str, default True

Write out the column names. If a list of strings is given it is assumed to be aliases for the column names.

quotechar: str, default '"'

String of length 1. Character used to quote fields.

date_format: str, default None

Format string for datetime objects.

escapechar: str, default None

String of length 1. Character used to escape sep and quotechar when appropriate.

num_files: int, optional

The number of partitions to be written in the `path` directory when this is a path. This is deprecated. Use DataFrame.spark.repartition instead.

mode: str

Python write mode, default 'w'.

Note

mode can accept the strings for Spark writing mode, such as 'append', 'overwrite', 'ignore', 'error', or 'errorifexists' (see the sketch after this list).

  • 'append' (equivalent to 'a'): Append the new data to existing data.

  • 'overwrite' (equivalent to 'w'): Overwrite existing data.

  • 'ignore': Silently ignore this operation if data already exists.

  • 'error' or 'errorifexists': Throw an exception if data already exists.
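
As an illustration of the Spark write modes, a minimal sketch assuming df and the path fixture from the Examples section: the second call appends new part- files to the directory instead of replacing the first output.

>>> df.to_csv(path=r'%s/to_csv/modes.csv' % path, mode='overwrite')
>>> df.to_csv(path=r'%s/to_csv/modes.csv' % path, mode='append')  # adds part- files to the same directory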

partition_cols: str or list of str, optional, default None

Names of partitioning columns.

index_col: str or list of str, optional, default: None

Column names to be used in Spark to represent pandas-on-Spark's index. The index name in pandas-on-Spark is ignored. By default, the index is always lost.

options: keyword arguments for additional options specific to PySpark.

These kwargs are passed through as PySpark CSV writer options. Check the available options in PySpark's API documentation for spark.write.csv(...). They take higher priority and overwrite all other options. This parameter works only when path is specified.
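
For instance, Spark CSV writer options such as compression can be passed through as keyword arguments. A minimal sketch, assuming df and path as in the Examples section (the option name comes from spark.write.csv, not from this method's signature):

>>> df.to_csv(path=r'%s/to_csv/compressed.csv' % path, compression='gzip')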

Returns
str or None

Examples

>>> df = ps.DataFrame(dict(
...    date=list(pd.date_range('2012-1-1 12:00:00', periods=3, freq='M')),
...    country=['KR', 'US', 'JP'],
...    code=[1, 2, 3]), columns=['date', 'country', 'code'])
>>> df.sort_values(by="date")  
                   date country  code
... 2012-01-31 12:00:00      KR     1
... 2012-02-29 12:00:00      US     2
... 2012-03-31 12:00:00      JP     3
>>> print(df.to_csv())  
date,country,code
2012-01-31 12:00:00,KR,1
2012-02-29 12:00:00,US,2
2012-03-31 12:00:00,JP,3
>>> df.cummax().to_csv(path=r'%s/to_csv/foo.csv' % path, num_files=1)
>>> ps.read_csv(
...    path=r'%s/to_csv/foo.csv' % path
... ).sort_values(by="date")  
                   date country  code
... 2012-01-31 12:00:00      KR     1
... 2012-02-29 12:00:00      US     2
... 2012-03-31 12:00:00      US     3

In the case of a Series,

>>> print(df.date.to_csv())  
date
2012-01-31 12:00:00
2012-02-29 12:00:00
2012-03-31 12:00:00
>>> df.date.to_csv(path=r'%s/to_csv/foo.csv' % path, num_files=1)
>>> ps.read_csv(
...     path=r'%s/to_csv/foo.csv' % path
... ).sort_values(by="date")  
                   date
... 2012-01-31 12:00:00
... 2012-02-29 12:00:00
... 2012-03-31 12:00:00

You can preserve the index in the roundtrip as below.

>>> df.set_index("country", append=True, inplace=True)
>>> df.date.to_csv(
...     path=r'%s/to_csv/bar.csv' % path,
...     num_files=1,
...     index_col=["index1", "index2"])
>>> ps.read_csv(
...     path=r'%s/to_csv/bar.csv' % path, index_col=["index1", "index2"]
... ).sort_values(by="date")  
                             date
index1 index2
...    ...    2012-01-31 12:00:00
...    ...    2012-02-29 12:00:00
...    ...    2012-03-31 12:00:00