readme.md: 39 additions & 20 deletions
@@ -8,7 +8,7 @@



-pdf2json is a [node.js](http://nodejs.org/) module converts binary PDF to JSON and text. Built with [pdf.js](https://github.com/mozilla/pdf.js/), it extracts text content and interactive form elements for server-side processing and command-line use.
+pdf2json is a [node.js](https://nodejs.org/) module that converts binary PDF to JSON and text. Built with [pdf.js](https://github.com/mozilla/pdf.js/), it extracts text content and interactive form elements for server-side processing and command-line use.

 ## Features

@@ -20,45 +20,62 @@ pdf2json is a [node.js](http://nodejs.org/) module converts binary PDF to JSON a

 ## Install

-> npm i pdf2json
+You can install it using npm or bun:

-Or, install it globally:
+```bash
+npm install pdf2json
+bun add pdf2json
+```

-> npm i pdf2json -g
+If you want to use the `pdf2json` CLI, you can install it globally:

-To update with latest version:
+```bash
+npm install pdf2json -g
+bun install pdf2json -g
+```

-> npm update pdf2json -g
+## Usage

-To Run in RESTful Web Service or as command line Utility
+```javascript
+import PDFParser from "pdf2json";

-- More details can be found at the bottom of this document.
+const pdfParser = new PDFParser();
+```

-## Test
+The module is tested with [Node.js](https://nodejs.org/) 18+ and [Bun](https://bun.sh/) 1+.

-After install, run command line:
+## Test

-> npm test
+You can run tests in Bun, or in Node.js using Jest.

-`pretest` step builds bundles and source maps for both ES Module and CommonJS, output to `./dist` directory. The Jest test suit is defined in `./test/_test_.cjs` with commonJS, test run will also cover `parse-r` and `parse-fd` with ES Modules via command line.
+```bash
+bun run test:bun # runs in Bun
+bun run test:node # runs in Node.js using Jest
+```

-The default Jest test suits are essential tests for all PRs. But it only covers a portion of all testing PDFs, for more broader coverage, run:
+The `pretest` script builds bundles and source maps for both ES Module and CommonJS, then outputs them to the `./dist` directory. The test suite is defined in `./test/p2j.test.js` with CommonJS, and also covers `parse-r` and `parse-fd` with ES Modules via the command line.

-> npm run test:forms
+The default test suites are essential tests for all PRs, but they cover only a portion of the test PDFs. For broader coverage, run:

-It'll scan and parse _260_ PDF AcroForm files under _*./test/pdf*_, runs with _*-s -t -c -m*_ command line options, generates primary output JSON, additional text content JSON, form fields JSON and merged text file for each PDF. It usually takes ~20s in my MacBook Pro to complete, check _*./test/target/*_ for outputs.
+```bash
+bun run test:forms
+```

-_update on 4/27/2024_: parsing 260 PDFs by `npm run test:forms` on M2 Mac takes 7~8s
+It scans and parses _260_ PDF AcroForm files under _*./test/pdf*_, runs with the _*-s -t -c -m*_ command line options, and generates the primary output JSON, additional text content JSON, form fields JSON and a merged text file for each PDF. It usually takes ~8s on my MacBook Pro to complete; check _*./test/target/*_ for outputs.

-To run Jest test suits with commonJS bundle only
+To run the test suite with CommonJS bundle only, run:

-> npm run test:jest
+```bash
+bun run test
+```

 ### Test Exception Handlings

 After install, run command line:

-> npm run test:misc
+```bash
+bun run test:misc
+```

 It'll scan and parse all PDF files under _*./test/pdf/misc*_, also running with the _*-s -t -c -m*_ command line options, and generate the primary output JSON, additional text content JSON, form fields JSON and merged text JSON file for 15 PDF files. 12 are expected to succeed, while the exceptions from the other three are expected to be caught, with stack traces, for:

@@ -70,7 +87,9 @@ It'll scan and parse all PDF files under _*./test/pdf/misc*_, also runs with _*-

 After install, run command line:

-> npm run parse-r
+```bash
+bun run parse-r
+```

 It scans 165 PDF files under _*./test/pdf/fd/form/*_, parses with [Stream API](https://nodejs.org/dist/latest-v14.x/docs/api/stream.html), then generates output to _*./test/target/fd/form/*_.
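
For readers unfamiliar with the Stream API pattern that the `parse-r` context line refers to, here is a minimal, self-contained sketch of a read → transform → write pipeline in Node.js. It does not use pdf2json: `ByteCountParser` and `parseStream` are hypothetical names invented for illustration, and the transform is only a stand-in for a real parser stream (it buffers its input and emits one summary object when the input ends).

```javascript
const { Readable, Transform, Writable } = require("node:stream");
const { pipeline } = require("node:stream/promises");

// Hypothetical stand-in for a parser stream (NOT pdf2json's API):
// buffers every input chunk and, once the input ends, emits one summary object.
class ByteCountParser extends Transform {
  constructor() {
    super({ readableObjectMode: true }); // bytes in, objects out
    this.chunks = [];
  }

  _transform(chunk, _encoding, callback) {
    this.chunks.push(chunk); // keep buffering until the whole input has arrived
    callback();
  }

  _flush(callback) {
    const data = Buffer.concat(this.chunks);
    this.push({
      byteLength: data.length,
      header: data.subarray(0, 5).toString("latin1"),
    });
    callback();
  }
}

// Runs source -> parser -> collector and resolves with the collected objects.
async function parseStream(source) {
  const results = [];
  const collector = new Writable({
    objectMode: true,
    write(obj, _encoding, callback) {
      results.push(obj);
      callback();
    },
  });
  await pipeline(source, new ByteCountParser(), collector);
  return results;
}

// Usage: feed two in-memory chunks instead of a file stream.
parseStream(Readable.from([Buffer.from("%PDF-"), Buffer.from("1.7")]))
  .then((results) => console.log(results));
```

In a real run the source would typically be `fs.createReadStream(...)` and the sink a file write stream; `pipeline` is used here because it propagates errors from any stage and cleans up the other streams.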